MCPWorld

@SAAgent · 10 months ago
MCPWorld is an open-source platform for evaluating Computer-Using Agents via GUI, API, or Hybrid methods.

Overview

What is MCPWorld

MCPWorld is an open-source benchmarking framework designed for evaluating Computer-Using Agents (CUAs). It allows agents to interact with software applications through GUI, API (Model Context Protocol – MCP), or Hybrid methods.

Use cases

Use cases for MCPWorld include evaluating the performance of AI agents in software applications, automating repetitive tasks in development environments, and conducting research on agent interactions with software through different modalities.

How to use

To use MCPWorld, clone the repository from GitHub, install the necessary dependencies, and run the interactive agent demo within a Docker container. Follow the quick setup and installation instructions provided in the README.

Key features

MCPWorld offers a comprehensive task suite (~170 tasks across 10+ open-source applications); supports GUI, API, and Hybrid interaction; provides white-box evaluation for precise task verification; ensures cross-platform compatibility via Docker; and includes an extensible framework for adding new tasks and agents.

Where to use

MCPWorld can be used in research and development environments focused on artificial intelligence, software automation, and testing of computer-using agents across various applications.

Content

MCPWorld: A Multi-Modal Test Platform for Computer-Using Agents (CUA)


MCPWorld is an open-source benchmarking framework designed for evaluating Computer-Using Agents (CUAs). It supports agents that interact with software applications via GUI, API (Model Context Protocol – MCP), or Hybrid methods.


🚀 Key Features

  • Comprehensive Task Suite

    • ~170 tasks across 10+ open-source applications (VSCode, OBS, Zotero, etc.).
  • GUI, API, and Hybrid Interaction

    • Integrated MCP support enables robust mixed-mode control, letting agents fall back to GUI when APIs are unavailable.
  • White-Box Evaluation

    • Built-in evaluators inspect internal app signals or outputs for precise, reproducible task verification.
  • Cross-Platform via Docker

    • Containerized environments ensure consistent setups on Linux, macOS, and Windows.
  • Extensible Framework

    • Easily add new tasks, applications, or custom agents via clear folder structure and interfaces.

📦 Installation

Prerequisites

  • Docker
  • (Optional) VS Code + DevContainers extension

Quick Setup

git clone https://github.com/SAAgent/MCPWorld.git
cd MCPWorld
git submodule update --init PC-Canary

Then open the folder in VS Code and choose "Reopen in Container", or manually build the image from the Dockerfile provided by PC-Canary.


🚩 Quickstart

🚀 Running the Interactive Agent Demo with Evaluation

These instructions assume you are running commands inside the DevContainer.

  1. Install Dependencies:

    First, ensure all Python dependencies for the agent demo are installed:

    pip install -r computer-use-demo/computer_use_demo/requirements.txt
    
  2. Start Required Services:

    You’ll need to start several services. It’s recommended to run each in a separate terminal session within the container, or run them in the background.

    • VNC Server: This provides the graphical desktop environment for the agent. The xstartup script configured in the Dockerfile will prepare an XFCE session.

      vncserver -xstartup ~/.vnc/xstartup -geometry 1024x768 :4
      

      This typically makes VNC available on port 5904: VNC display :N listens on TCP port 5900 + N, so display :4 maps to 5904.

    • noVNC Proxy: This allows you to access the VNC session via a web browser.

      /opt/noVNC/utils/novnc_proxy \
          --vnc localhost:5904 \
          --listen 0.0.0.0:6080 \
          --web /opt/noVNC > /tmp/novnc.log 2>&1 &
      
    • Main Page HTTP Server: This server provides a unified entry point to access both VNC and the Streamlit UI.

      python computer-use-demo/image/http_server.py > /tmp/http_server.log 2>&1 &
      
    • Agent Demo & Evaluator UI (Streamlit App): This application serves as the control panel for running tasks with the agent and viewing evaluation results.

      cd computer-use-demo
      STREAMLIT_SERVER_PORT=8501 python -m streamlit run computer_use_demo/streamlit.py > /tmp/streamlit.log 2>&1 &
      
  3. Accessing the Demo:

    • Unified Interface: Access the main entry page via your web browser at http://localhost:8081. This page should provide links to the VNC desktop and the Agent/Evaluator Streamlit UI.
    • VNC Desktop (Direct): Access the agent’s desktop environment directly via http://localhost:6080.
    • Agent & Evaluator UI (Direct): Open http://localhost:8501 directly to interact with the Streamlit application.

    Through the Streamlit UI (or by direct interaction if using the headless mode below), you can assign tasks to the agent. The agent will then interact with applications within the VNC desktop environment. The Evaluator will monitor and report on the agent’s performance.
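Once the services above are running, each should be listening on its port: 5904 (VNC), 6080 (noVNC), 8081 (main page), and 8501 (Streamlit). Before opening a browser, you can sanity-check them with a small port probe. This is a sketch, not part of MCPWorld; the port list is taken from the commands above and will differ if you changed the display number, `--listen` address, or `STREAMLIT_SERVER_PORT`:

```python
import socket

# Ports used by the demo services, per the startup commands above.
SERVICES = {
    "VNC server": 5904,
    "noVNC proxy": 6080,
    "main page HTTP server": 8081,
    "Streamlit UI": 8501,
}

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if port_open("localhost", port) else "DOWN"
        print(f"{name:25s} (port {port}): {status}")
```

Run this inside the container; any service reported DOWN can be diagnosed from its log file under /tmp.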

🧪 Headless Agent & Evaluator Execution (CLI-Only)

For scenarios where a UI is not needed or desired (e.g., automated batch testing), you can run the agent and evaluator directly from the command line using the run_pure_computer_use_with_eval.py script. This script handles the interaction loop and evaluation process without launching the Streamlit web interface.

Prerequisites:

  • Ensure the VNC server is running as described in the “Interactive Agent Demo” section if your tasks require GUI interaction. The VNC server provides the environment for the agent to operate in.
  • Ensure you have set your Anthropic API key, either via the --api_key argument or the ANTHROPIC_API_KEY environment variable.
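If you rely on the environment variable rather than `--api_key`, it helps to fail fast when the key is missing. A minimal sketch of such a guard (the helper name is ours, not part of MCPWorld; `ANTHROPIC_API_KEY` is the variable named above):

```python
import os
import sys

def anthropic_key_or_exit() -> str:
    """Return the Anthropic API key from the environment, or exit with a hint.

    Mirrors the fallback described above: pass --api_key explicitly,
    or export ANTHROPIC_API_KEY before launching the headless run.
    """
    key = os.environ.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        sys.exit("ANTHROPIC_API_KEY is not set; export it or pass --api_key.")
    return key
```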

Example Command:

python computer-use-demo/run_pure_computer_use_with_eval.py \
  --api_key <YOUR_ANTHROPIC_API_KEY> \
  --model claude-3-7-sonnet-20250219 \
  --task_id telegram/task01_search \
  --log_dir logs_computer_use_eval \
  --exec_mode mixed

This script will output agent interactions and evaluation events directly to the console. Final results and detailed logs will be saved in the directory specified by --log_dir.
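For automated batch testing, the example command generalizes naturally to a loop over task IDs. A sketch of such a driver, assuming the script path and flags shown above (only telegram/task01_search comes from the example; any further task IDs you add are your own):

```python
import os
import subprocess
import sys

SCRIPT = "computer-use-demo/run_pure_computer_use_with_eval.py"

def build_cmd(task_id: str, api_key: str,
              model: str = "claude-3-7-sonnet-20250219",
              log_dir: str = "logs_computer_use_eval",
              exec_mode: str = "mixed") -> list[str]:
    """Assemble the argv list for one headless evaluation run."""
    return [
        sys.executable, SCRIPT,
        "--api_key", api_key,
        "--model", model,
        "--task_id", task_id,
        "--log_dir", log_dir,
        "--exec_mode", exec_mode,
    ]

if __name__ == "__main__":
    tasks = ["telegram/task01_search"]  # extend with more task IDs for a batch
    for task in tasks:
        cmd = build_cmd(task, api_key="<YOUR_ANTHROPIC_API_KEY>")
        print("running:", " ".join(cmd))
        if os.path.exists(SCRIPT):  # only execute inside the repo checkout
            subprocess.run(cmd, check=False)
```

Each run writes its logs under the directory passed via `--log_dir`, so per-task results can be collected there afterwards.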


📚 Documentation

  • Tasks: See PC-Canary/tests/tasks/ for JSON/JS/Python configs.
  • Agents: Reference implementations in computer-use-demo/.
  • Extension: Add new apps/tasks/agents as described in docs (Update in progress).
  • Evaluation: White-box evaluators guarantee objective metrics.

📝 License

Released under the MIT License.
