MCP Evals
What is MCP Evals
mcp-evals is a Node.js package and GitHub Action designed for evaluating Model Context Protocol (MCP) tool implementations using LLM-based scoring. It ensures that the tools within your MCP server function correctly and perform optimally.
Use cases
Use cases for mcp-evals include evaluating the performance of various tools within an MCP server, automating the evaluation process in CI/CD pipelines, and ensuring that tools meet specified accuracy and completeness criteria before deployment.
How to use
To use mcp-evals, first install it as a Node.js package with `npm install mcp-evals`, or integrate it into your GitHub Actions workflow by adding the YAML configuration shown below. Then create an evaluation file that exports your evaluation configuration, and run the evaluations either through the command-line interface or as part of your CI/CD pipeline.
Key features
Key features of mcp-evals include LLM-based scoring for accurate evaluations, easy integration with Node.js and GitHub Actions, and customizable evaluation configurations that allow users to define specific criteria for tool performance.
Where to use
mcp-evals can be used in software development environments, particularly in projects that implement the Model Context Protocol. It is suitable for teams looking to ensure the reliability and performance of their MCP tools during development and deployment.
Content
MCP Evals
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring, with built-in observability support. It helps ensure your MCP server's tools work correctly, perform well, and are fully observable through integrated monitoring and metrics.
Installation
As a Node.js Package
```shell
npm install mcp-evals
```
As a GitHub Action
Add the following to your workflow file:
```yaml
name: Run MCP Evaluations

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      - name: Run MCP Evaluations
        uses: mclenhard/[email protected]
        with:
          evals_path: 'src/evals/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4'  # Optional, defaults to gpt-4
```
Usage – Evals
1. Create Your Evaluation File
Create a file (e.g., evals.ts) that exports your evaluation configuration:
```typescript
import { EvalConfig, EvalFunction, grade } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";

const weatherEval: EvalFunction = {
  name: 'Weather Tool Evaluation',
  description: 'Evaluates the accuracy and completeness of weather information retrieval',
  run: async () => {
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  }
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval]
};

export default config;

export const evals = [
  weatherEval,
  // add other evals here
];
```
2. Run the Evaluations
As a Node.js Package
You can run the evaluations using the CLI:
```shell
npx mcp-eval path/to/your/evals.ts path/to/your/server.ts
```
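For repeated local runs, you might wrap the CLI call in an npm script; the file paths below are placeholders for your own eval and server entry points, not paths the package prescribes:

```json
{
  "scripts": {
    "eval": "mcp-eval src/evals/evals.ts src/index.ts"
  }
}
```

With this in place, `npm run eval` executes the suite.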
As a GitHub Action
The action will automatically:
- Run your evaluations
- Post the results as a comment on the PR
- Update the comment if the PR is updated
Evaluation Results
Each evaluation returns an object with the following structure:
```typescript
interface EvalResult {
  accuracy: number;         // Score from 1-5
  completeness: number;     // Score from 1-5
  relevance: number;        // Score from 1-5
  clarity: number;          // Score from 1-5
  reasoning: number;        // Score from 1-5
  overall_comments: string; // Summary of strengths and weaknesses
}
```
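If you consume these results programmatically, for example to gate a CI job, a simple pass/fail check over the scores might look like the sketch below. The `MIN_SCORE` threshold and the `passes` helper are illustrative, not part of the mcp-evals API:

```typescript
// Shape returned by each evaluation, as defined above.
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
  overall_comments: string;
}

// Hypothetical gate: require every numeric score to meet a minimum.
const MIN_SCORE = 3;

function passes(result: EvalResult, min: number = MIN_SCORE): boolean {
  const scores = [
    result.accuracy,
    result.completeness,
    result.relevance,
    result.clarity,
    result.reasoning,
  ];
  return scores.every((s) => s >= min);
}

const example: EvalResult = {
  accuracy: 4,
  completeness: 5,
  relevance: 4,
  clarity: 3,
  reasoning: 4,
  overall_comments: "Solid answer with minor omissions.",
};

console.log(passes(example)); // true: every score is >= 3
```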
Configuration
Environment Variables
OPENAI_API_KEY: Your OpenAI API key (required)
[!NOTE]
If you’re using this GitHub Action with open source software, enable data sharing in the OpenAI billing dashboard to claim 2.5 million free GPT-4o mini tokens per day, making this Action effectively free to use.
Evaluation Configuration
The EvalConfig interface requires:
- model: The language model to use for evaluation (e.g., GPT-4)
- evals: Array of evaluation functions to run
Each evaluation function must implement:
- name: Name of the evaluation
- description: Description of what the evaluation tests
- run: Async function that takes a model and returns an EvalResult
Usage – Monitoring
Note: The metrics functionality is still in alpha. Features and APIs may change, and breaking changes are possible.
- Add the following to your application before you initialize the MCP server.
```typescript
import { metrics } from 'mcp-evals';

metrics.initialize(9090, { enableTracing: true, otelEndpoint: 'http://localhost:4318/v1/traces' });
```
- Start the monitoring stack:
```shell
docker-compose up -d
```
- Run your MCP server and it will automatically connect to the monitoring stack.
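The `docker-compose up -d` step assumes a compose file defining the monitoring services. As a rough illustration only (service names and image tags here are assumptions, not the configuration shipped with mcp-evals), such a stack maps the ports listed below:

```yaml
# Illustrative sketch -- use the docker-compose.yml provided by the mcp-evals repo.
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"   # Prometheus UI / scrape target
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"   # Grafana dashboards
  jaeger:
    image: jaegertracing/all-in-one
    ports:
      - "16686:16686" # Jaeger UI
      - "4318:4318"   # OTLP HTTP endpoint for traces
```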
Accessing the Dashboards
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (username: admin, password: admin)
- Jaeger UI: http://localhost:16686
Metrics Available
- Tool Calls: Number of tool calls by tool name
- Tool Errors: Number of errors by tool name
- Tool Latency: Distribution of latency times by tool name
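Once Prometheus is scraping your server, these metrics can be queried directly. The metric names below are illustrative guesses, not names documented by mcp-evals; check the exporter's /metrics endpoint for the actual names in your version:

```promql
# Tool call rate per tool over the last 5 minutes (metric name assumed)
sum by (tool_name) (rate(mcp_tool_calls_total[5m]))

# 95th-percentile tool latency (assumes a histogram metric)
histogram_quantile(0.95, sum by (le, tool_name) (rate(mcp_tool_latency_seconds_bucket[5m])))
```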
License
MIT