Smolval

A lightweight MCP server evaluation agent

What is Smolval

smolval is a lightweight Python application designed for evaluating MCP (Model Context Protocol) servers using LLM (Large Language Model) agents. It employs a ReAct (Reason + Act) pattern to systematically test MCP server implementations through structured evaluation prompts.

Use cases

Use cases for smolval include evaluating the performance of different MCP servers, comparing LLM outputs across various providers, and conducting batch evaluations for research or development purposes.

How to use

To use smolval, first set up your API key for the desired LLM provider. Then, install the necessary MCP servers and run evaluations using the command line interface. Results will be generated in various formats such as JSON, CSV, Markdown, and HTML.

Key features

Key features of smolval include support for multiple LLM providers (Anthropic Claude, OpenAI, Google Gemini, and Ollama), batch evaluations, cross-provider model comparisons, and the ability to output results in multiple formats.

Where to use

smolval can be used in fields such as AI development, machine learning research, and software testing, particularly for evaluating and comparing different MCP server implementations.

smolval

A lightweight, containerized Python application for evaluating MCP (Model Context Protocol) servers using Claude Code CLI. smolval provides a self-contained Docker environment with Claude Code CLI, development tools, and MCP server support built-in for systematic MCP server evaluation.

✨ Features

Self-Contained Container: Claude Code CLI and all tools pre-installed, no host dependencies beyond Docker
MCP Server Evaluation: Systematic testing using Claude Code’s agent capabilities
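Docker-in-Docker Support: Containerized MCP servers can be launched from inside the smolval container by mounting the host Docker socket, keeping each server isolated in its own container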
Multiple Output Formats: JSON, CSV, Markdown, and HTML results
Progress Indicators: Visual feedback during evaluation with elapsed time tracking
Standard Configuration: Uses .mcp.json format compatible with Claude Desktop/Cursor

🚀 Quick Start

Prerequisites

Docker
ANTHROPIC_API_KEY
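
A small preflight check can catch a missing prerequisite before the container is started. The sketch below is not part of smolval: the function name `check_prerequisites` is hypothetical, and it assumes only what the list above states, namely a `docker` executable on the PATH and an `ANTHROPIC_API_KEY` environment variable.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper, not part of smolval. `env` and `which` can be
    swapped out to make the check testable without touching the host.
    """
    env = os.environ if env is None else env
    problems = []
    if which("docker") is None:
        problems.append("docker executable not found on PATH")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"missing prerequisite: {problem}")
```

Running the script prints nothing when both prerequisites are present.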

One-Command Setup

# Clone repository
git clone https://github.com/austinlparker/smolval.git
cd smolval

# Build container (includes Claude Code CLI and all tools)
docker build -t ghcr.io/austinlparker/smolval .

# Run your first evaluation - no local installation required!
docker run --rm \
  -v $(pwd):/workspace \
  -e ANTHROPIC_API_KEY \
  ghcr.io/austinlparker/smolval eval /workspace/prompts/simple_test.txt

For MCP Servers Requiring Docker

# Enable Docker-in-Docker for containerized MCP servers
# (mounting the host Docker socket lets the container start them on the host daemon)
docker run --rm \
  -v $(pwd):/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e ANTHROPIC_API_KEY \
  ghcr.io/austinlparker/smolval eval /workspace/prompts/database-test.txt

📖 Documentation

Comprehensive documentation is available in the docs/ directory:

Getting Started - Installation and setup guide
CLI Reference - Complete command-line documentation
Configuration - Configuration options and examples
Writing Prompts - Guide to creating effective evaluation prompts
Examples - Sample prompts and configurations
Architecture - Technical design and implementation details

🛠️ Commands

Single Evaluation

# Basic evaluation using Claude Code's built-in tools
docker run --rm \
  -v $(pwd):/workspace \
  -e ANTHROPIC_API_KEY \
  ghcr.io/austinlparker/smolval eval /workspace/prompts/file-test.txt

# With specific output format
docker run --rm \
  -v $(pwd):/workspace \
  -e ANTHROPIC_API_KEY \
  ghcr.io/austinlparker/smolval eval /workspace/prompts/file-test.txt --format html

Custom MCP Configuration

# Use custom .mcp.json configuration
docker run --rm \
  -v $(pwd):/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e ANTHROPIC_API_KEY \
  ghcr.io/austinlparker/smolval eval /workspace/prompts/test.txt --mcp-config /workspace/.mcp.json

Run docker run --rm ghcr.io/austinlparker/smolval --help for all options.
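
When evaluations are scripted, the `docker run` invocations above can be assembled programmatically. The following is a minimal sketch, not part of smolval: `build_eval_command` and its parameters are hypothetical names, and it uses only the image name, the `eval` subcommand, and the `--format` and `--mcp-config` options shown in the commands above.

```python
import shlex
from pathlib import Path

IMAGE = "ghcr.io/austinlparker/smolval"


def build_eval_command(prompt, workspace=".", output_format=None,
                       mcp_config=None, docker_socket=False):
    """Assemble the `docker run` argv for one evaluation.

    Hypothetical helper. `prompt` and `mcp_config` are paths relative to
    the host directory that gets mounted at /workspace.
    """
    host_dir = Path(workspace).resolve()
    argv = ["docker", "run", "--rm", "-v", f"{host_dir}:/workspace"]
    if docker_socket:
        # Needed only when an MCP server is itself launched with Docker.
        argv += ["-v", "/var/run/docker.sock:/var/run/docker.sock"]
    argv += ["-e", "ANTHROPIC_API_KEY", IMAGE, "eval", f"/workspace/{prompt}"]
    if output_format:
        argv += ["--format", output_format]
    if mcp_config:
        argv += ["--mcp-config", f"/workspace/{mcp_config}"]
    return argv


if __name__ == "__main__":
    cmd = build_eval_command("prompts/file-test.txt", output_format="html")
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single string avoids quoting problems when the workspace path contains spaces.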

⚙️ Configuration

smolval uses the standard .mcp.json configuration format compatible with Claude Desktop and Cursor:

{
  "mcpServers": {
    "sqlite": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "mcp/sqlite"
      ],
      "env": {}
    },
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/workspace"
      ],
      "env": {}
    }
  }
}

Note: Claude Code has filesystem and web fetch capabilities built in, so MCP servers are only needed for additional functionality such as databases and APIs.

Example configurations are available in the docs/examples/ directory.
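
A malformed configuration file is easier to diagnose before an evaluation starts than from inside the container. The sketch below checks a `.mcp.json` document against the shape of the example above. It is not part of smolval and not a full schema: `validate_mcp_config` is a hypothetical name, and the rules (a string `command`, a list of string `args`, an object `env`) are assumptions inferred from that example only.

```python
import json


def validate_mcp_config(text):
    """Return a list of problems found in an .mcp.json document.

    Hypothetical helper; an empty list means the document matches the
    shape used in the example configuration.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    servers = data.get("mcpServers") if isinstance(data, dict) else None
    if not isinstance(servers, dict):
        return ['top-level "mcpServers" object is missing']
    problems = []
    for name, spec in servers.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(spec.get("command"), str):
            problems.append(f'{name}: "command" must be a string')
        args = spec.get("args", [])
        if not (isinstance(args, list) and all(isinstance(a, str) for a in args)):
            problems.append(f'{name}: "args" must be a list of strings')
        if not isinstance(spec.get("env", {}), dict):
            problems.append(f'{name}: "env" must be an object')
    return problems


if __name__ == "__main__":
    with open(".mcp.json", encoding="utf-8") as handle:
        for problem in validate_mcp_config(handle.read()):
            print(problem)
```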

🐳 Container Features

Pre-installed Tools

Claude Code CLI: Latest version ready to use
Node.js & npm/npx: For NPM-based MCP servers
Docker CLI: For containerized MCP servers
Development Tools: git, vim, tree, jq, uvx
MCP Servers: Common servers pre-installed for faster startup

Volume Mounting Strategy

# Basic workspace mount
-v $(pwd):/workspace

# Docker-in-Docker support
-v /var/run/docker.sock:/var/run/docker.sock

# Custom output directory
-v $(pwd)/results:/results

Environment Variables

# Required
-e ANTHROPIC_API_KEY

# Optional
-e CLAUDE_CONFIG_DIR=/app/.claude

🧪 Testing

Run the test suite in the container:

# Build development image
docker build -t ghcr.io/austinlparker/smolval:dev .

# Unit tests only
docker run --rm \
  -v $(pwd):/workspace \
  -w /workspace \
  ghcr.io/austinlparker/smolval:dev uv run pytest -m "not integration and not slow"

# All tests
docker run --rm \
  -v $(pwd):/workspace \
  -w /workspace \
  ghcr.io/austinlparker/smolval:dev uv run pytest

# With coverage
docker run --rm \
  -v $(pwd):/workspace \
  -w /workspace \
  ghcr.io/austinlparker/smolval:dev uv run pytest --cov=smolval --cov-report=html

See tests/README.md for detailed testing information.

📊 MCP Server Support

smolval supports various MCP server types through the container:

Built-in Claude Code Tools: Filesystem operations, web content fetching
Pre-installed NPM: @modelcontextprotocol/server-filesystem, @modelcontextprotocol/server-memory
Docker-based: mcp/sqlite, custom containers via Docker-in-Docker
Python-based: Any uvx-installable MCP servers
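
The server types above can usually be told apart by the launcher command in `.mcp.json`, which also shows when the Docker socket mount is required. The following sketch groups configured servers that way. It is illustrative only: `group_servers_by_kind` and the `LAUNCHER_KINDS` mapping are hypothetical, and the mapping from `docker`, `npx`, and `uvx` to a server type is an assumption based on the examples in this README.

```python
from collections import defaultdict

# Assumed mapping from launcher command to server type (not defined by smolval).
LAUNCHER_KINDS = {
    "docker": "docker-based",
    "npx": "npm-based",
    "uvx": "python-based",
}


def group_servers_by_kind(config):
    """Group server names from a parsed .mcp.json dict by launcher type."""
    groups = defaultdict(list)
    for name, spec in config.get("mcpServers", {}).items():
        kind = LAUNCHER_KINDS.get(spec.get("command"), "other")
        groups[kind].append(name)
    return {kind: sorted(names) for kind, names in groups.items()}


if __name__ == "__main__":
    config = {
        "mcpServers": {
            "sqlite": {"command": "docker", "args": ["run", "--rm", "-i", "mcp/sqlite"]},
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"],
            },
        }
    }
    groups = group_servers_by_kind(config)
    print(groups)
    if "docker-based" in groups:
        print("mount /var/run/docker.sock for:", ", ".join(groups["docker-based"]))
```

Any server that lands in the `docker-based` group needs the `-v /var/run/docker.sock:/var/run/docker.sock` mount shown earlier.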

🔧 Development

Container-First Development

# Interactive development container
docker run -it --rm \
  -v $(pwd):/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e ANTHROPIC_API_KEY \
  -w /workspace \
  ghcr.io/austinlparker/smolval:dev bash

# Code quality checks in container
docker run --rm \
  -v $(pwd):/workspace \
  -w /workspace \
  ghcr.io/austinlparker/smolval:dev bash -c "uv run black src/ tests/ && uv run isort src/ tests/ && uv run ruff check src/ tests/ && uv run mypy src/"

Project Structure

smolval/
├── src/smolval/          # Main application code
├── docs/                 # Documentation
├── tests/                # Test suite
├── prompts/              # Example evaluation prompts
└── results/              # Generated evaluation results

📋 Requirements

Docker: The only host system requirement
ANTHROPIC_API_KEY: Environment variable for Claude Code CLI

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please read the Architecture documentation and check the test suite before submitting changes.

Fork the repository
Create a feature branch
Make your changes with tests
Run the quality checks
Submit a pull request

📚 Learn More

Model Context Protocol - Learn about MCP
ReAct Pattern - The reasoning pattern used by smolval
Project Documentation - Comprehensive guides and references
