Web Runner Mcp Llm

@sinzy0925, 10 months ago

2 MIT

FreeCommunity

AI Systems

Web-Runner-mcp enables AI agents to automate web browsing using Playwright via MCP.

What is Web Runner Mcp Llm

Web-Runner-mcp-llm is a Python project that leverages Playwright’s browser automation capabilities, making it accessible for AI agents and applications through the Model Context Protocol (MCP).

Use cases

Use cases include automating data collection from websites, performing automated testing of web applications, and enabling AI agents to interact with web content dynamically.

How to use

To use web-runner-mcp-llm, set up the environment, start the server, create JSON data for operations, and execute commands either via command line or a GUI client. Detailed steps are provided in the README.

Key features

Key features include support for various browser actions, PDF text extraction, and robust error handling, enabling efficient web interactions.

Where to use

Web-Runner-mcp-llm can be used in fields such as AI development, web scraping, automated testing, and any application requiring web interaction.

Clients Supporting MCP

The following are the main clients that support the Model Context Protocol. Follow each link to the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

View More MCP Clients

Content

Web-Runner-mcp: Advanced Web Browser Operation Protocol for AI

Web-Runner-mcp is a Python project designed to make Playwright’s powerful browser automation capabilities easily accessible to AI agents and other applications through the standardized Model Context Protocol (MCP).

Overview
Why Web-Runner-mcp?
Key Features
Usage
JSON Format (Reference)
Comparison with Other Tools
Future Plans
Contributing
License

Overview

Information gathering and interaction with the web are essential for today’s AI agents, but existing tools have limitations. While simple content retrieval or fetching search result lists is possible, tasks like interacting with login-required sites, handling pages rendered with complex JavaScript, navigating iframe structures, and processing PDF content remain challenging. Furthermore, reliably controlling low-level APIs like Playwright directly from Large Language Models (LLMs) presents a significant hurdle.

Web-Runner-mcp proposes a new approach to tackle these challenges.

Instead of instructing the LLM to perform individual browser operations, Web-Runner-mcp allows you to define a sequence of desired operations in a JSON format and pass it to an MCP server for execution. The current version executes these operations reliably based on the JSON file instructions, without direct LLM involvement in the browser control loop itself.

This might be a “small revolution” in how AI interacts with the web, opening doors to the deeper, more complex parts of the web that were previously inaccessible to AI.

Why Web-Runner-mcp?

Advanced Web Operations:
- Login: Access and interact with websites requiring authentication.
- PDF: Download linked PDFs and extract their text content.
- Iframe: Explore and interact with elements within nested iframes (dynamic discovery).
- Multiple Tabs/Pages: Follow new pages opened by clicks.
- Dynamic Content: Wait for and interact with elements generated by JavaScript.

Versatile Data Extraction:
- Flexible text/HTML retrieval using innerText, textContent, innerHTML.
- Get specific attribute values using getAttribute.
- Efficient data collection from multiple elements using getAllAttributes, getAllTextContents (with dynamic iframe discovery).

Declarative Operation Definition:
- Describe the desired steps in JSON.
- Ensures reproducibility and simplifies debugging.

MCP Compliance:
- Standardized protocol enables integration with various MCP clients (Dify custom tools, Python AI agent frameworks, custom clients, etc.).
- Separates client and server concerns.

Reliable Execution:
- Stable browser operations powered by Playwright.
- Appropriate waiting mechanisms and error handling.

Key Features

MCP Server (web_runner_mcp_server.py): Implemented in Python (based on FastMCP), exposes Web-Runner functionality as the execute_web_runner tool.
Web-Runner Core (playwright_handler.py, utils.py, config.py): Uses Playwright (async) to execute browser operations based on input JSON. Handles core logic, settings, utility functions, dynamic iframe discovery, and PDF processing.
Web-Runner Standalone Execution (main.py): An entry point for running Web-Runner directly from the command line without the MCP server (for debugging and unit testing).
MCP Client Core (web_runner_mcp_client_core.py): Provides the core function (execute_web_runner_via_mcp) for invoking the MCP server programmatically (e.g., from AI agents).
GUI Client (web_runner_mcp_client_GUI.py): A convenient graphical interface for selecting JSON files, running tasks manually, and launching the JSON generator.

Supported Actions

click: Clicks an element.
input: Enters text into an element.
hover: Hovers over an element.
get_inner_text, get_text_content, get_inner_html: Gets text/HTML (single element).
get_attribute: Gets an attribute value (single element).
get_all_attributes, get_all_text_contents: Gets attribute values/text content as a list (multiple elements, searches within iframes).
wait_visible: Waits for an element to become visible.
select_option: Selects an option from a dropdown list.
screenshot: Saves a screenshot of the page or an element (server-side).
scroll_page_to_bottom, scroll_to_element: Performs scroll operations.
wait_page_load: Waits for the page to finish loading.
sleep: Pauses execution for a specified duration.
switch_to_iframe, switch_to_parent_frame: Moves focus between iframes (explicitly specified).
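
Since a task is just a list of named actions, it can be handy to check the names before sending anything to the server. The following is a minimal sketch of such a pre-check. The action names come from the list above and the "action" key matches the Python example in the Usage section; the helper find_unsupported_actions is hypothetical and is not part of Web-Runner-mcp, which may perform its own validation.

```python
# Action names copied from the "Supported Actions" list.
SUPPORTED_ACTIONS = {
    "click", "input", "hover",
    "get_inner_text", "get_text_content", "get_inner_html",
    "get_attribute", "get_all_attributes", "get_all_text_contents",
    "wait_visible", "select_option", "screenshot",
    "scroll_page_to_bottom", "scroll_to_element",
    "wait_page_load", "sleep",
    "switch_to_iframe", "switch_to_parent_frame",
}


def find_unsupported_actions(steps):
    """Return (index, name) pairs for steps whose action is not listed.

    Hypothetical helper for illustration; not part of the project.
    """
    problems = []
    for index, step in enumerate(steps):
        name = step.get("action")
        if name not in SUPPORTED_ACTIONS:
            problems.append((index, name))
    return problems


if __name__ == "__main__":
    steps = [
        {"action": "wait_visible", "selector": "h1"},
        {"action": "double_click", "selector": "button"},  # not a listed action
    ]
    print(find_unsupported_actions(steps))
```

Catching a misspelled action name on the client side avoids launching a browser only to fail on the first step.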

PDF Text Extraction

Automatically downloads PDFs linked via get_attribute(href=...) or get_all_attributes(href=...) and includes the extracted text in the results.
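
How the server decides that an href points at a PDF is not described here. As a rough illustration only, the sketch below assumes the simplest possible rule, a URL path ending in .pdf, and separates such links from the rest. The helpers looks_like_pdf and split_pdf_links are hypothetical; the real implementation may inspect response headers or use a different rule entirely.

```python
from urllib.parse import urlparse


def looks_like_pdf(href):
    """Guess from the URL path alone whether a link points at a PDF.

    Assumption for this sketch: a path ending in ".pdf" (any case) is a PDF.
    """
    path = urlparse(href).path
    return path.lower().endswith(".pdf")


def split_pdf_links(hrefs):
    """Split a list of href values into (pdf_links, other_links)."""
    pdf_links = [href for href in hrefs if looks_like_pdf(href)]
    other_links = [href for href in hrefs if not looks_like_pdf(href)]
    return pdf_links, other_links


if __name__ == "__main__":
    hrefs = [
        "https://example.com/files/report.pdf",
        "https://example.com/about",
    ]
    print(split_pdf_links(hrefs))
```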

Error Handling

Records error information for each step, including the screenshot path (on the server’s filesystem) if an error occurs.
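
The general pattern of recording an outcome per step can be sketched as follows. This is not the project's code: the record keys ("step", "action", "status", "error"), the run_steps and fake_execute helpers, and the choice to stop at the first failure are all assumptions made for illustration, and screenshot capture is left out.

```python
def run_steps(steps, execute):
    """Run steps in order, recording a result or an error for each one.

    `execute` is any callable that performs one step and returns its result.
    Stopping at the first failure is an assumption made for this sketch.
    """
    records = []
    for number, step in enumerate(steps, start=1):
        record = {"step": number, "action": step.get("action")}
        try:
            record["result"] = execute(step)
            record["status"] = "success"
        except Exception as exc:  # keep the failure as data instead of crashing
            record["status"] = "error"
            record["error"] = f"{type(exc).__name__}: {exc}"
            records.append(record)
            break
        records.append(record)
    return records


def fake_execute(step):
    """Stand-in for a real browser call; fails on purpose for 'click'."""
    if step["action"] == "click":
        raise TimeoutError("element not found")
    return "ok"


if __name__ == "__main__":
    demo_steps = [{"action": "sleep"}, {"action": "click"}, {"action": "hover"}]
    for record in run_steps(demo_steps, fake_execute):
        print(record)
```

Keeping failures as structured records, rather than raising, is what lets a client or an LLM inspect exactly which step went wrong.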

Usage

1. Setup

(1) Clone the repository:

git clone https://github.com/sinzy0925/web-runner-mcp.git
cd web-runner-mcp

(2) Prepare Python environment (Python 3.12+ recommended):

# Create a virtual environment (e.g., venv312)
python -m venv venv312
# Activate the virtual environment
# Windows PowerShell
.\venv312\Scripts\Activate
# Linux/macOS
source venv312/bin/activate

(3) Install dependencies:
Install using the requirements.txt file.

pip install -r requirements.txt

(4) Install Playwright browsers:

playwright install

2. Starting the Server (SSE Mode Example)

Note: This mode has not been fully verified and might require adjustments.
To allow access over the network (e.g., for Dify integration), start the server in SSE mode.

# Run web_runner_mcp_server.py directly
python web_runner_mcp_server.py --transport sse --host 0.0.0.0 --port 8000

Use --host 0.0.0.0 to allow access from other machines. Use 127.0.0.1 (default) for local access only.
--port 8000 specifies the port the server listens on.
Server logs are output to web_runner_mcp_server.log (default setting).
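
Because SSE mode is described as not fully verified, a quick reachability check can help separate network problems from server problems. The sketch below only tests whether something accepts TCP connections on the given host and port; it does not speak MCP or SSE. The helper is_port_open is hypothetical, and port 8000 is simply the value used in the command above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("listening:", is_port_open("127.0.0.1", 8000))
```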

3. Creating JSON Data for Web-Runner

You can use the included json_generator.html to interactively create the JSON file in your browser.

Step 1: Prepare the JSON Generator

Open the json_generator.html file located in the project folder with your web browser (double-click).

Step 2: Get CSS Selectors for Target Elements

Open the target website you want to automate in a separate browser tab or window.
Open the developer tools on that page (usually F12 key or right-click > “Inspect”/“Inspect Element”).
Click the element selection icon (↖) in the developer tools.
Click the element you want to interact with (button, input field, etc.) on the webpage.
In the developer tools, right-click the highlighted HTML element and select [Copy] > [Copy selector].

Step 3: Create Operation Steps in json_generator.html

Go back to the json_generator.html tab.
Enter the website’s URL in “1. Target URL:”.
In “2. Operation Steps”, fill in the following:
- Target Element CSS Selector: Paste the selector you copied.
- Operation: Choose the desired action.
- Additional Parameters: Enter values if needed (e.g., value, attribute_name).
Click “Add Step” and repeat step 3 for all required actions.
Click “Generate JSON Data” to see the generated JSON.
Click “Download input.json” to save the JSON file.

Step 4: Place the JSON File

Move the downloaded JSON file into the json/ folder within the project directory. You can rename the file as needed (e.g., my_task.json).
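
If a task is assembled in Python rather than in json_generator.html, the same placement step can be scripted. This is a minimal sketch under the assumption that the file only needs to be valid JSON inside the json/ folder; save_task is a hypothetical helper, and the demo writes to a temporary directory so it does not touch a real project checkout.

```python
import json
import tempfile
from pathlib import Path


def save_task(task, name, folder="json"):
    """Write a task dict as UTF-8 JSON into `folder` and return the file path."""
    folder_path = Path(folder)
    folder_path.mkdir(parents=True, exist_ok=True)
    target = folder_path / name
    target.write_text(
        json.dumps(task, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target


if __name__ == "__main__":
    task = {
        "target_url": "https://example.com",
        "actions": [{"action": "get_text_content", "selector": "h1"}],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = save_task(task, "my_task.json", folder=Path(tmp) / "json")
        print(path.name, path.read_text(encoding="utf-8"))
```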

4. Command-Line Execution (for Testing)

You can test the Web-Runner directly from the command line using the core client function (web_runner_mcp_client_core.py) without the GUI. This is useful for verifying programmatic calls, like those from an AI agent.

Ensure your desired JSON file is in the json/ folder (e.g., tdnet.json).
Run the following command in your activated terminal:

python web_runner_mcp_client_core.py --jsonfile json/tdnet.json --no-headless --slowmo 500

--jsonfile: Specifies the path to the JSON file to execute (default: json/tdnet.json).
--no-headless: Displays the browser during execution. The browser is visible by default, so this flag is optional; use --headless to run in the background.
--slowmo: (Optional) Adds a delay (in milliseconds) between operations (e.g., --slowmo 500).
--output: (Optional) Specifies the path for the output file (default: output_web_runner.txt).

The execution results (successful data retrieval or error information) will be printed to the console in JSON format and also written to the specified output file.
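
When the command is launched from another script, building the argument list in one place avoids quoting mistakes. The sketch below assembles the command from the flags documented above; build_command is a hypothetical helper, and the commented-out subprocess.run call assumes the script is run from the project directory.

```python
import sys


def build_command(jsonfile="json/tdnet.json", headless=False, slowmo=None, output=None):
    """Build the argument list for web_runner_mcp_client_core.py.

    Only the flags documented above are used.
    """
    cmd = [sys.executable, "web_runner_mcp_client_core.py", "--jsonfile", jsonfile]
    cmd.append("--headless" if headless else "--no-headless")
    if slowmo is not None:
        cmd += ["--slowmo", str(slowmo)]
    if output is not None:
        cmd += ["--output", output]
    return cmd


if __name__ == "__main__":
    command = build_command(slowmo=500)
    print(" ".join(command))
    # To actually run it from the project directory:
    # import subprocess
    # subprocess.run(command, check=True)
```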

5. Running from the GUI Client

For manual testing and debugging, the GUI client (web_runner_mcp_client_GUI.py) is convenient.

Run the following command in your activated terminal:

python web_runner_mcp_client_GUI.py

In the application window, select the desired JSON file from the dropdown list.
Click the “実行 ▶” (Run) button.
The execution results will be displayed in the text area below.
You can also click the “JSONジェネレーター” (JSON Generator) button to open json_generator.html.

6. Usage from AI Applications

To use Web-Runner-mcp from other Python scripts or AI agent frameworks, import and use the execute_web_runner_via_mcp function from web_runner_mcp_client_core.py.

import asyncio
import json
import sys
# Ensure web_runner_mcp_client_core.py is in the import path
try:
    from web_runner_mcp_client_core import execute_web_runner_via_mcp
except ImportError:
    print("Error: web_runner_mcp_client_core.py not found.")
    # Error handling or path configuration needed
    sys.exit(1) # Example

async def run_task():
    input_data = {
        "target_url": "https://example.com",
        "actions": [
            {"action": "get_text_content", "selector": "h1"},
            {"action": "get_attribute", "selector": "img", "attribute_name": "src"}
        ]
        # Optionally specify timeouts etc.
        # "default_timeout_ms": 15000
    }
    # Execute in headless mode with 50ms slow motion
    success, result_or_error = await execute_web_runner_via_mcp(
        input_data, headless=True, slow_mo=50 # Specify headless, slow_mo
    )

    if success and isinstance(result_or_error, str):
        print("Task successful! Result (JSON):")
        try:
            result_dict = json.loads(result_or_error)
            print(json.dumps(result_dict, indent=2, ensure_ascii=False))
            # --- Process the results, potentially pass to an LLM ---
            # llm_prompt = f"Analyze the following website operation results:\n```json\n{result_or_error}\n```"
            # llm_response = await call_llm(llm_prompt)
        except json.JSONDecodeError:
            print("Error: Response from server is not valid JSON:")
            print(result_or_error)
    else:
        print("Task failed:")
        print(result_or_error) # Display error information (dictionary)
        # --- Process the error information, potentially pass to an LLM ---
        # error_prompt = f"Website operation failed. Error details:\n{result_or_error}\nInfer the cause."
        # llm_response = await call_llm(error_prompt)

if __name__ == "__main__":
    asyncio.run(run_task())
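
Web pages fail intermittently, so an agent may want to retry a task before reporting an error. Web-Runner-mcp does not provide this itself (automatic retry is listed under Future Plans), so the wrapper below is a hypothetical sketch. It assumes only what the example above shows: an awaitable call that returns a (success, result_or_error) pair.

```python
import asyncio


async def run_with_retries(run_once, input_data, attempts=3, delay_s=1.0):
    """Call `run_once(input_data)` until it succeeds or attempts run out.

    `run_once` is any coroutine function returning (success, result_or_error).
    Returns the first successful pair, or (False, last_error) after all tries.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        success, result_or_error = await run_once(input_data)
        if success:
            return True, result_or_error
        last_error = result_or_error
        if attempt < attempts:
            await asyncio.sleep(delay_s)  # brief pause before the next try
    return False, last_error


if __name__ == "__main__":
    outcomes = iter([(False, "timeout"), (True, "{}")])

    async def fake_run(_input_data):
        """Stand-in for execute_web_runner_via_mcp: fails once, then succeeds."""
        return next(outcomes)

    print(asyncio.run(run_with_retries(fake_run, {}, delay_s=0)))
```

To use it with the real client, functools.partial(execute_web_runner_via_mcp, headless=True, slow_mo=50) could be passed as run_once.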

JSON Format (Reference)

Refer to the JSON files provided in the json/ folder for examples.
Here is the basic structure of the input JSON:
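
A minimal sketch of that structure, built in Python and printed as JSON. It assumes only the keys that appear in the Python example in the Usage section (target_url, actions, and the optional default_timeout_ms); other keys may exist, and the sample files in json/ remain the authoritative reference.

```python
import json

# Keys taken from the Python example in the Usage section.
task = {
    "target_url": "https://example.com",
    "actions": [
        {"action": "get_text_content", "selector": "h1"},
        {"action": "get_attribute", "selector": "img", "attribute_name": "src"},
    ],
    "default_timeout_ms": 15000,  # optional
}

task_json = json.dumps(task, indent=2)
print(task_json)
```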

Comparison with Other Tools

General Web Scraping Libraries (BeautifulSoup, Scrapy): Excellent for parsing static HTML, but struggle with or cannot handle JavaScript execution, logins, complex user interactions, iframes, and PDFs. Web-Runner-mcp, being Playwright-based, handles these advanced operations.
Playwright-MCP: Exposes Playwright’s low-level API directly as MCP tools. Highly flexible, but requires complex prompt engineering and state management for reliable control from LLMs. Web-Runner-mcp offers a more declarative and reliable interface by defining operation sequences in JSON.
Simple Web Fetching Tools (e.g., URL content fetchers): Easy for getting content from a single URL, but incapable of multi-step operations or interactions. Web-Runner-mcp executes multi-step workflows.

Future Plans

LLM-Powered JSON Generation: Integrate functionality to automatically generate Web-Runner JSON from natural language instructions.
Expanded Action Support: Add support for more Playwright features (e.g., file uploads, cookie manipulation).
Official Dify Custom Tool Support: Stabilize the HTTP/SSE interface aiming for potential registration in the Dify marketplace.
Enhanced Error Handling and Recovery: Implement more detailed error analysis and potentially automatic retry/recovery mechanisms.

Contributing

Bug reports, feature suggestions, and pull requests are welcome! Please see CONTRIBUTING.md for details (to be created if not present).

License

This project is licensed under the MIT License - see the LICENSE file for details.

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai

View More MCP Dev Tools
