MCPWorld
What is MCPWorld
MCPWorld is an open-source benchmarking framework designed for evaluating Computer-Using Agents (CUAs). It allows agents to interact with software applications through GUI, API (Model Context Protocol – MCP), or Hybrid methods.
Use cases
Use cases for MCPWorld include evaluating the performance of AI agents in software applications, automating repetitive tasks in development environments, and conducting research on agent interactions with software through different modalities.
How to use
To use MCPWorld, clone the repository from GitHub, install the necessary dependencies, and run the interactive agent demo within a Docker container. Follow the quick setup and installation instructions provided in the README.
Key features
MCPWorld offers a comprehensive task suite with approximately 170 tasks across 10+ open-source applications, supports GUI, API, and Hybrid interaction, provides white-box evaluation for precise task verification, ensures cross-platform compatibility via Docker, and features an extensible framework for adding new tasks and agents.
Where to use
MCPWorld can be used in research and development environments focused on artificial intelligence, software automation, and testing of computer-using agents across various applications.
MCPWorld: A Multi-Modal Test Platform for Computer-Using Agents (CUA)
MCPWorld is an open-source benchmarking framework designed for evaluating Computer-Using Agents (CUAs). It supports agents that interact with software applications via GUI, API (Model Context Protocol – MCP), or Hybrid methods.
🚀 Key Features
- Comprehensive Task Suite
  - ~170 tasks across 10+ open-source applications (VSCode, OBS, Zotero, etc.).
- GUI, API, and Hybrid Interaction
  - Integrated MCP support enables robust mixed-mode control, letting agents fall back to GUI when APIs are unavailable.
- White-Box Evaluation
  - Built-in evaluators inspect internal app signals or outputs for precise, reproducible task verification.
- Cross-Platform via Docker
  - Containerized environments ensure consistent setups on Linux, macOS, and Windows.
- Extensible Framework
  - Easily add new tasks, applications, or custom agents via clear folder structure and interfaces.
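The hybrid mode described above (try the API path, fall back to the GUI when no API is available) can be pictured with a small sketch. This is not MCPWorld's actual agent loop: `ApiUnavailable`, `run_step`, and the two callables are hypothetical names introduced only for illustration.

```python
from typing import Callable, Optional, Tuple


class ApiUnavailable(Exception):
    """Hypothetical signal that no API (MCP) tool can handle this step."""


def run_step(
    api_action: Optional[Callable[[], str]],
    gui_action: Callable[[], str],
) -> Tuple[str, str]:
    """Prefer the API path; fall back to the GUI path if it is missing or fails.

    Returns a (mode_used, result) pair so a caller can log which path ran.
    """
    if api_action is not None:
        try:
            return "api", api_action()
        except ApiUnavailable:
            # The API path exists but cannot serve this step: use the GUI instead.
            pass
    return "gui", gui_action()
```

Returning the mode alongside the result makes it easy to report how often a run needed the GUI fallback.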
📦 Installation
Prerequisites
- Docker
- (Optional) VS Code + DevContainers extension
Quick Setup
git clone https://github.com/SAAgent/MCPWorld.git
cd MCPWorld
git submodule update --init PC-Canary
Then open the folder in VS Code and select Reopen in Container, or manually build the image according to the Dockerfile provided by PC-Canary.
🚩 Quickstart
🚀 Running the Interactive Agent Demo with Evaluation
These instructions assume you are running commands inside the DevContainer.
- Install Dependencies:
  First, ensure all Python dependencies for the agent demo are installed:
      pip install -r computer-use-demo/computer_use_demo/requirements.txt
- Start Required Services:
  You’ll need to start several services. It’s recommended to run each in a separate terminal session within the container, or run them in the background.
  - VNC Server: This provides the graphical desktop environment for the agent. The xstartup script configured in the Dockerfile will prepare an XFCE session.
        vncserver -xstartup ~/.vnc/xstartup -geometry 1024x768 :4
    This typically makes VNC available on port 5904.
  - noVNC Proxy: This allows you to access the VNC session via a web browser.
        /opt/noVNC/utils/novnc_proxy \
            --vnc localhost:5904 \
            --listen 0.0.0.0:6080 \
            --web /opt/noVNC > /tmp/novnc.log 2>&1 &
  - Main Page HTTP Server: This server provides a unified entry point to access both VNC and the Streamlit UI.
        python computer-use-demo/image/http_server.py > /tmp/http_server.log 2>&1 &
  - Agent Demo & Evaluator UI (Streamlit App): This application serves as the control panel for running tasks with the agent and viewing evaluation results.
        cd computer-use-demo
        STREAMLIT_SERVER_PORT=8501 python -m streamlit run computer_use_demo/streamlit.py > /tmp/streamlit.log 2>&1 &
- Accessing the Demo:
  - Unified Interface: Access the main entry page via your web browser at http://localhost:8081. This page should provide links to the VNC desktop and the Agent/Evaluator Streamlit UI.
  - VNC Desktop (Direct): Access the agent’s desktop environment directly via http://localhost:6080.
  - Agent & Evaluator UI (Direct): Open http://localhost:8501 directly to interact with the Streamlit application.
  Through the Streamlit UI (or by direct interaction if using the headless mode below), you can assign tasks to the agent. The agent will then interact with applications within the VNC desktop environment. The Evaluator will monitor and report on the agent’s performance.
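Before opening the URLs above, it can help to confirm that each service is actually listening. The sketch below is an illustrative helper, not part of MCPWorld. It assumes the ports named in this section (VNC display `:4`, noVNC on 6080, the entry page on 8081, Streamlit on 8501) and the usual VNC convention that display `:N` listens on TCP port 5900 + N. Run it inside the container, or on the host if those ports are forwarded.

```python
import socket
from typing import Dict


def vnc_port(display: int) -> int:
    """VNC convention: display :N is served on TCP port 5900 + N."""
    return 5900 + display


# Ports taken from the steps above; adjust them if you changed the commands.
SERVICES: Dict[str, int] = {
    "VNC server": vnc_port(4),
    "noVNC proxy": 6080,
    "Main page HTTP server": 8081,
    "Streamlit UI": 8501,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, int] = SERVICES) -> Dict[str, bool]:
    """Probe every service and report which ones accept connections."""
    return {name: is_listening(port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:<24} {'up' if up else 'DOWN'}")
```

A service reported as down usually means its command was not started or exited early. The log files under `/tmp` named in the commands above are the first place to look.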
🧪 Headless Agent & Evaluator Execution (CLI-Only)
For scenarios where a UI is not needed or desired (e.g., automated batch testing), you can run the agent and evaluator directly from the command line using the run_pure_computer_use_with_eval.py script. This script handles the interaction loop and evaluation process without launching the Streamlit web interface.
Prerequisites:
- Ensure the VNC server is running as described in the “Interactive Agent Demo” section if your tasks require GUI interaction. The VNC server provides the environment for the agent to operate in.
- Ensure you have set your Anthropic API key, either via the --api_key argument or the ANTHROPIC_API_KEY environment variable.
Example Command:
python computer-use-demo/run_pure_computer_use_with_eval.py \
    --api_key <YOUR_ANTHROPIC_API_KEY> \
    --model claude-3-7-sonnet-20250219 \
    --task_id telegram/task01_search \
    --log_dir logs_computer_use_eval \
    --exec_mode mixed
This script will output agent interactions and evaluation events directly to the console. Final results and detailed logs will be saved in the directory specified by --log_dir.
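For batch testing, the same command can be issued once per task from a small driver. The following is a minimal sketch, not a tool shipped with MCPWorld. It uses only the flags shown in the example command, relies on `ANTHROPIC_API_KEY` being set in the environment instead of passing `--api_key`, and assumes it is run from the repository root. `build_command` and `run_tasks` are hypothetical helper names.

```python
import os
import subprocess
import sys
from typing import Dict, Iterable, List, Optional

SCRIPT = "computer-use-demo/run_pure_computer_use_with_eval.py"


def build_command(
    task_id: str,
    model: str = "claude-3-7-sonnet-20250219",
    log_dir: str = "logs_computer_use_eval",
    exec_mode: str = "mixed",
) -> List[str]:
    """Assemble the argument list for one headless run of a single task."""
    return [
        sys.executable, SCRIPT,
        "--model", model,
        "--task_id", task_id,
        "--log_dir", log_dir,
        "--exec_mode", exec_mode,
    ]


def run_tasks(task_ids: Iterable[str], dry_run: bool = True) -> Dict[str, Optional[int]]:
    """Run each task in turn and collect the process return codes.

    With dry_run=True the commands are only printed and every value is None.
    """
    if not dry_run and "ANTHROPIC_API_KEY" not in os.environ:
        raise RuntimeError("Set ANTHROPIC_API_KEY before starting real runs.")
    results: Dict[str, Optional[int]] = {}
    for task_id in task_ids:
        cmd = build_command(task_id)
        if dry_run:
            print(" ".join(cmd))
            results[task_id] = None
        else:
            # Assumption: the return code is a process-level status only;
            # evaluation results should be read from the log directory.
            results[task_id] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    run_tasks(["telegram/task01_search"], dry_run=True)
```

Starting with `dry_run=True` lets you inspect the exact commands before spending API credits on real runs.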
📚 Documentation
- Tasks: See PC-Canary/tests/tasks/ for JSON/JS/Python configs.
- Agents: Reference implementations in computer-use-demo/.
- Extension: Add new apps/tasks/agents as described in docs (Update in progress).
- Evaluation: White-box evaluators guarantee objective metrics.
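To see which task IDs are available locally, the task directory can be walked with a few lines of Python. This sketch rests on an assumption that is not confirmed by the text above: that each task lives in a directory two levels below `PC-Canary/tests/tasks/`, so that `<app>/<task>` matches IDs such as `telegram/task01_search`. `list_task_ids` is a hypothetical helper, and the real layout should be checked in the repository.

```python
from pathlib import Path
from typing import List, Union


def list_task_ids(root: Union[str, Path] = "PC-Canary/tests/tasks") -> List[str]:
    """List "<app>/<task>" IDs, assuming one directory per app and per task."""
    root = Path(root)
    if not root.is_dir():
        return []
    task_ids: List[str] = []
    for app_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for task_dir in sorted(p for p in app_dir.iterdir() if p.is_dir()):
            task_ids.append(f"{app_dir.name}/{task_dir.name}")
    return task_ids


if __name__ == "__main__":
    for task_id in list_task_ids():
        print(task_id)
```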
📝 License
Released under the MIT License.










