
Mcp Evals

@mclenhardon 9 months ago
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

Overview

What is Mcp Evals

mcp-evals is a Node.js package and GitHub Action designed for evaluating Model Context Protocol (MCP) tool implementations using LLM-based scoring. It ensures that the tools within your MCP server function correctly and perform optimally.

Use cases

Use cases for mcp-evals include evaluating the performance of various tools within an MCP server, automating the evaluation process in CI/CD pipelines, and ensuring that tools meet specified accuracy and completeness criteria before deployment.

How to use

To use mcp-evals, first install it as a Node.js package with `npm install mcp-evals`, or integrate it into your GitHub Actions workflow by adding the YAML configuration shown in the Installation section. Then create an evaluation file that exports your evaluation configuration, and run the evaluations either through the command-line interface or as part of your CI/CD pipeline.

Key features

Key features of mcp-evals include LLM-based scoring for accurate evaluations, easy integration with Node.js and GitHub Actions, and customizable evaluation configurations that allow users to define specific criteria for tool performance.

Where to use

mcp-evals can be used in software development environments, particularly in projects that implement the Model Context Protocol. It is suitable for teams looking to ensure the reliability and performance of their MCP tools during development and deployment.

Content

MCP Evals

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring, with built-in observability support. This helps ensure your MCP server’s tools are working correctly, performing well, and are fully observable with integrated monitoring and metrics.

Installation

As a Node.js Package

npm install mcp-evals

As a GitHub Action

Add the following to your workflow file:

name: Run MCP Evaluations
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          
      - name: Install dependencies
        run: npm install
        
      - name: Run MCP Evaluations
        uses: mclenhard/[email protected]
        with:
          evals_path: 'src/evals/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4'  # Optional, defaults to gpt-4

Usage – Evals

1. Create Your Evaluation File

Create a file (e.g., evals.ts) that exports your evaluation configuration:

import { openai } from "@ai-sdk/openai";
import { grade, EvalConfig, EvalFunction } from "mcp-evals";

const weatherEval: EvalFunction = {
  name: "Weather Tool Evaluation",
  description: "Evaluates the accuracy and completeness of weather information retrieval",
  run: async () => {
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  },
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval],
};

export default config;

export const evals = [
  weatherEval,
  // add other evals here
];

2. Run the Evaluations

As a Node.js Package

You can run the evaluations using the CLI:

npx mcp-eval path/to/your/evals.ts path/to/your/server.ts

As a GitHub Action

The action will automatically:

  1. Run your evaluations
  2. Post the results as a comment on the PR
  3. Update the comment if the PR is updated

Evaluation Results

Each evaluation returns an object with the following structure:

interface EvalResult {
  accuracy: number;        // Score from 1-5
  completeness: number;    // Score from 1-5
  relevance: number;       // Score from 1-5
  clarity: number;         // Score from 1-5
  reasoning: number;       // Score from 1-5
  overall_comments: string; // Summary of strengths and weaknesses
}
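For instance, a small helper (not part of mcp-evals; the `averageScore` function below is purely illustrative) could collapse the five numeric scores of a result into a single 1–5 average:

```typescript
// Local re-declaration of the documented result shape, for illustration.
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
  overall_comments: string;
}

// Hypothetical helper: averages the five numeric scores into one figure.
function averageScore(r: EvalResult): number {
  const scores = [r.accuracy, r.completeness, r.relevance, r.clarity, r.reasoning];
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}
```

A helper like this can be handy for setting a single pass/fail threshold in CI instead of checking each dimension separately.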

Configuration

Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key (required)

[!NOTE]
If you’re using this GitHub Action with open source software, enable data sharing in the OpenAI billing dashboard to claim 2.5 million free GPT-4o mini tokens per day, making this Action effectively free to use.

Evaluation Configuration

The EvalConfig interface requires:

  • model: The language model to use for evaluation (e.g., GPT-4)
  • evals: Array of evaluation functions to run

Each evaluation function must implement:

  • name: Name of the evaluation
  • description: Description of what the evaluation tests
  • run: Async function that takes a model and returns an EvalResult
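The contract above can be sketched as follows. The types are re-declared locally for illustration (the real ones come from the mcp-evals package), and `fakeGrade` is a stub standing in for mcp-evals' `grade` so the sketch is self-contained:

```typescript
// Illustrative re-declarations of the documented interfaces.
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
  overall_comments: string;
}

interface EvalFunction {
  name: string;
  description: string;
  // Per the documented contract: takes a model, returns an EvalResult.
  run: (model: unknown) => Promise<EvalResult>;
}

interface EvalConfig {
  model: unknown;
  evals: EvalFunction[];
}

// Stub grader: returns a canned JSON string, as grade() returns
// JSON that the eval parses into an EvalResult.
async function fakeGrade(model: unknown, prompt: string): Promise<string> {
  return JSON.stringify({
    accuracy: 4,
    completeness: 4,
    relevance: 5,
    clarity: 4,
    reasoning: 4,
    overall_comments: `Graded response to: ${prompt}`,
  });
}

const exampleEval: EvalFunction = {
  name: "Example Evaluation",
  description: "Shows the shape of an evaluation function",
  run: async (model) =>
    JSON.parse(await fakeGrade(model, "What is the weather in New York?")),
};

const config: EvalConfig = { model: "gpt-4", evals: [exampleEval] };
```

In a real evaluation file you would import the types and `grade` from mcp-evals and pass an actual model, as in the example earlier in this document.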

Usage – Monitoring

Note: The metrics functionality is still in alpha. Features and APIs may change, and breaking changes are possible.

  1. Add the following to your application before you initialize the MCP server:

import { metrics } from 'mcp-evals';
metrics.initialize(9090, { enableTracing: true, otelEndpoint: 'http://localhost:4318/v1/traces' });

  2. Start the monitoring stack:

docker-compose up -d

  3. Run your MCP server; it will connect to the monitoring stack automatically.

Accessing the Dashboards

Metrics Available

  • Tool Calls: Number of tool calls by tool name
  • Tool Errors: Number of errors by tool name
  • Tool Latency: Distribution of latency times by tool name

License

MIT
