Selenium MCP

selenium-mcp by Naveen Automation Labs

What is Selenium MCP

selenium-mcp is a Java implementation of the Model Context Protocol (MCP) designed for Selenium WebDriver, enabling browser automation through standardized MCP clients.

Use cases

selenium-mcp can be used for automated testing of web applications, data extraction from websites, and browser-based tasks triggered by AI assistants.

How to use

To use selenium-mcp, clone the repository, build it with Maven, and start the MCP Selenium server from the command line. The server then reads JSON commands from standard input and performs the corresponding browser automation tasks.

Key features

Key features of selenium-mcp include a standardized interface for browser automation, compatibility with AI assistants, and support for both Chrome and Firefox.

Where to use

selenium-mcp is suited to software testing, web scraping, and automating repetitive browser tasks in web applications.

Clients Supporting MCP

The following are the main clients that support the Model Context Protocol. Each entry ends with the address of its official website.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

MCP Selenium

A Java implementation of the Model Context Protocol (MCP) for Selenium WebDriver, enabling browser automation through standardized MCP clients.

Overview

MCP Selenium provides a bridge between the Model Context Protocol and Selenium WebDriver. It allows AI assistants and other MCP-compatible clients to perform browser automation tasks using a standardized set of tools.

Project Structure

selenium-mcp/
├── src/
│   └── main/
│       └── java/
│           └── io/
│               └── github/
│                   └── naveenautomation/
│                       └── mcpselenium/
│                           ├── McpSeleniumServer.java       # The main MCP server implementation
│                           ├── McpSeleniumLauncher.java     # Launcher script for the server
│                           └── AdvancedMcpClient.java       # GUI client for testing
├── target/
│   └── mcp-selenium-0.1.0-jar-with-dependencies.jar         # Compiled JAR with dependencies
└── pom.xml                                                  # Maven project configuration

Prerequisites

Java 11 or higher
Maven (for building)
Chrome or Firefox browser installed

Building the Project

# Clone the repository
git clone https://github.com/naveenautomation/selenium-mcp.git
cd selenium-mcp

# Build with Maven
mvn clean package

This will create the JAR file at target/mcp-selenium-0.1.0-jar-with-dependencies.jar.

Usage Options

Option 1: Command Line Usage

Starting the Server

Start the MCP Selenium server from the command line:

java -jar target/mcp-selenium-0.1.0-jar-with-dependencies.jar

The server starts, prints its server information, and then waits for commands on standard input.

Sending Commands

You can now send JSON commands to the server. Type or paste the commands directly into the terminal, one per line.

Example Commands

Start a Chrome browser:

{
  "type": "tool_call",
  "tool_call_id": "call-1",
  "name": "start_browser",
  "params": {
    "browser": "chrome",
    "options": {
      "headless": false
    }
  }
}

Navigate to a website:

{
  "type": "tool_call",
  "tool_call_id": "call-2",
  "name": "navigate",
  "params": {
    "url": "https://www.example.com"
  }
}

Find an element:

{
  "type": "tool_call",
  "tool_call_id": "call-3",
  "name": "find_element",
  "params": {
    "by": "css",
    "value": "h1"
  }
}

Get element text:

{
  "type": "tool_call",
  "tool_call_id": "call-4",
  "name": "get_element_text",
  "params": {
    "by": "css",
    "value": "h1"
  }
}

Click an element:

{
  "type": "tool_call",
  "tool_call_id": "call-5",
  "name": "click_element",
  "params": {
    "by": "id",
    "value": "submit-button"
  }
}

Type text into an element:

{
  "type": "tool_call",
  "tool_call_id": "call-6",
  "name": "send_keys",
  "params": {
    "by": "id",
    "value": "search-box",
    "text": "search query"
  }
}

Take a screenshot:

{
  "type": "tool_call",
  "tool_call_id": "call-7",
  "name": "take_screenshot",
  "params": {
    "outputPath": "screenshot.png"
  }
}

Close the browser:

{
  "type": "tool_call",
  "tool_call_id": "call-8",
  "name": "close_session",
  "params": {}
}
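
The examples above are pretty-printed, while the server is described as accepting commands one per line. As a minimal sketch, assuming each command has to arrive as a single line of JSON, a small helper could assemble commands in that form. `ToolCallLine`, `toolCall`, and `nextId` are hypothetical names and not part of selenium-mcp; the field names are copied from the examples above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of selenium-mcp): formats commands as single-line JSON. */
final class ToolCallLine {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    /** Wraps a value in double quotes, escaping backslashes and quotes only (no control characters). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds one command line; paramsJson must already be a valid single-line JSON object. */
    static String toolCall(String id, String name, String paramsJson) {
        return "{\"type\":\"tool_call\",\"tool_call_id\":" + quote(id)
                + ",\"name\":" + quote(name)
                + ",\"params\":" + paramsJson + "}";
    }

    /** Returns call-1, call-2, ... so that every command gets a distinct tool_call_id. */
    static String nextId() {
        return "call-" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        // Prints a navigate command on one line, ready to paste into the server's terminal.
        System.out.println(toolCall(nextId(), "navigate", "{\"url\":\"https://www.example.com\"}"));
    }
}
```

Running `main` prints the navigate example as one line with the id `call-1`. Generating the id from a counter also keeps each `tool_call_id` unique within a session.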

Closing the Server

To stop the server, use Ctrl+C in the terminal.

Option 2: GUI Client

For a more user-friendly experience, you can use the included GUI client.

Starting the GUI Client

Compile and run the client:

# Compile the client
javac -d target/classes src/main/java/io/github/naveenautomation/mcpselenium/AdvancedMcpClient.java

# Run the client
java -cp target/classes io.github.naveenautomation.mcpselenium.AdvancedMcpClient

Using the GUI

The GUI client automatically starts the MCP Selenium server when launched. The interface includes:

Command Selector: Dropdown menu to select the MCP command you want to execute.
JSON Parameters: Text field to enter command parameters in JSON format.
Quick Action Buttons:
- Start Chrome: Launches Chrome browser
- Navigate: Opens a dialog to enter a URL
- Screenshot: Takes a screenshot
- Close Browser: Closes the current session
Log Area: Displays server responses and logs.

Examples of GUI Usage

Find Element

Select “find_element” from the command dropdown.
Enter parameters in the JSON field:

{
  "by": "id",
  "value": "login-button"
}

Click “Send”.

Send Keys (Type Text)

Select “send_keys” from the command dropdown.
Enter parameters in the JSON field:

{
  "by": "id",
  "value": "username",
  "text": "[email protected]"
}

Click “Send”.

Click Element

Select “click_element” from the command dropdown.
Enter parameters in the JSON field:

{
  "by": "css",
  "value": "button.submit"
}

Click “Send”.

Supported Commands

The MCP Selenium Server supports the following commands:

| Command | Description | Required Parameters | Optional Parameters |
| --- | --- | --- | --- |
| `start_browser` | Launches a browser | `browser` (`"chrome"` or `"firefox"`) | `options.headless`, `options.arguments` |
| `navigate` | Navigates to a URL | `url` | - |
| `find_element` | Finds an element | `by`, `value` | `timeout` |
| `click_element` | Clicks an element | `by`, `value` | `timeout` |
| `send_keys` | Types text into an element | `by`, `value`, `text` | `timeout` |
| `get_element_text` | Gets text from an element | `by`, `value` | `timeout` |
| `hover` | Hovers over an element | `by`, `value` | `timeout` |
| `drag_and_drop` | Drags and drops an element | `by`, `value`, `targetBy`, `targetValue` | `timeout` |
| `double_click` | Double-clicks an element | `by`, `value` | `timeout` |
| `right_click` | Right-clicks an element | `by`, `value` | `timeout` |
| `press_key` | Presses a keyboard key | `key` | - |
| `upload_file` | Uploads a file | `by`, `value`, `filePath` | `timeout` |
| `take_screenshot` | Takes a screenshot | - | `outputPath` |
| `close_session` | Closes the browser | - | - |
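
A client can catch missing parameters before a command reaches the server. The sketch below encodes the "Required Parameters" column of the table as a lookup map and reports which names are absent. `RequiredParams` and `missing` are hypothetical names and not part of selenium-mcp, and the sketch only checks top-level parameter names, not their values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical client-side check (not part of selenium-mcp), transcribed from the command table. */
final class RequiredParams {
    static final Map<String, List<String>> REQUIRED = Map.ofEntries(
            Map.entry("start_browser", List.of("browser")),
            Map.entry("navigate", List.of("url")),
            Map.entry("find_element", List.of("by", "value")),
            Map.entry("click_element", List.of("by", "value")),
            Map.entry("send_keys", List.of("by", "value", "text")),
            Map.entry("get_element_text", List.of("by", "value")),
            Map.entry("hover", List.of("by", "value")),
            Map.entry("drag_and_drop", List.of("by", "value", "targetBy", "targetValue")),
            Map.entry("double_click", List.of("by", "value")),
            Map.entry("right_click", List.of("by", "value")),
            Map.entry("press_key", List.of("key")),
            Map.entry("upload_file", List.of("by", "value", "filePath")),
            Map.entry("take_screenshot", List.<String>of()),
            Map.entry("close_session", List.<String>of()));

    /** Returns the required parameter names that are absent from the supplied set, in table order. */
    static List<String> missing(String command, Set<String> supplied) {
        List<String> required = REQUIRED.get(command);
        if (required == null) {
            throw new IllegalArgumentException("Unknown command: " + command);
        }
        List<String> absent = new ArrayList<>();
        for (String name : required) {
            if (!supplied.contains(name)) {
                absent.add(name);
            }
        }
        return absent;
    }
}
```

For example, `RequiredParams.missing("send_keys", Set.of("by", "value"))` returns a list containing only `text`.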

Locator Strategies

For commands that interact with elements, use these locator strategies in the by parameter:

id: Element ID
css: CSS selector
xpath: XPath expression
name: Element name
tag: HTML tag name
class: CSS class name
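
The six strategies can be modelled as an enum so that a client rejects an unsupported `by` value early. This is a sketch under assumptions: the `Locator` type is hypothetical, the exact-match (case-sensitive) comparison is a guess about the server's behaviour, and the second field names the Selenium `By` factory each strategy presumably maps to, which the source does not confirm.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical model of the locator strategies listed above (not part of selenium-mcp). */
enum Locator {
    ID("id", "By.id"),
    CSS("css", "By.cssSelector"),
    XPATH("xpath", "By.xpath"),
    NAME("name", "By.name"),
    TAG("tag", "By.tagName"),
    CLASS("class", "By.className");

    /** Value used in the "by" parameter of a command. */
    final String by;
    /** Selenium factory this strategy presumably corresponds to (an assumption). */
    final String seleniumFactory;

    Locator(String by, String seleniumFactory) {
        this.by = by;
        this.seleniumFactory = seleniumFactory;
    }

    /** Looks up a strategy by its exact "by" value; empty if the value is not one of the six. */
    static Optional<Locator> parse(String by) {
        return Arrays.stream(values()).filter(l -> l.by.equals(by)).findFirst();
    }
}
```

`Locator.parse("css")` yields `Locator.CSS`, while a value such as `"link"` yields an empty `Optional`, so the typo can be reported before the command is sent.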

Integration with AI Systems

MCP Selenium is designed to be used with AI systems that support the Model Context Protocol. To integrate with an AI assistant like Claude:

Start the MCP Selenium server
Configure the AI system to connect to the server via stdin/stdout
Send natural language commands to the AI, which will translate them to MCP commands
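
The stdin/stdout connection in the second step can be sketched from a Java client's point of view. This only illustrates the process plumbing: how a particular AI system is configured is not covered here, `StdioBridge` and its methods are hypothetical names, and the JAR path assumes the build output described earlier.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

/** Hypothetical stdio client sketch (not part of selenium-mcp). */
final class StdioBridge {
    /** Prepares the server process; its stderr is passed through to this process's terminal. */
    static ProcessBuilder serverProcess(String jarPath) {
        return new ProcessBuilder("java", "-jar", jarPath)
                .redirectError(ProcessBuilder.Redirect.INHERIT);
    }

    /** Writes one command followed by a line break and flushes so the server sees it immediately. */
    static void sendLine(BufferedWriter stdin, String jsonLine) throws IOException {
        stdin.write(jsonLine);
        stdin.newLine();
        stdin.flush();
    }

    public static void main(String[] args) throws IOException {
        Process server = serverProcess("target/mcp-selenium-0.1.0-jar-with-dependencies.jar").start();
        try (BufferedWriter stdin = new BufferedWriter(
                     new OutputStreamWriter(server.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader stdout = new BufferedReader(
                     new InputStreamReader(server.getInputStream(), StandardCharsets.UTF_8))) {
            sendLine(stdin, "{\"type\":\"tool_call\",\"tool_call_id\":\"call-1\",\"name\":\"start_browser\","
                    + "\"params\":{\"browser\":\"chrome\",\"options\":{\"headless\":true}}}");
            // The first line read may be the server information printed at startup
            // rather than the response to the command.
            System.out.println(stdout.readLine());
        } finally {
            server.destroy();
        }
    }
}
```

The sketch reads a single line and then stops the server; a real client would keep reading responses and match them to the `tool_call_id` values it sent.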

Troubleshooting

Common Issues

Browser Not Starting:
- Ensure you have Chrome or Firefox installed
- Try using "headless":true in options
- Check server logs for detailed error messages
Element Not Found:
- Verify your locator (by and value)
- Increase the timeout value
- Check if the element is in an iframe
Server Not Responding:
- Ensure the JSON format is correct
- Check that each command has a unique tool_call_id
- Restart the server if it becomes unresponsive
Screenshot Not Saving:
- Provide an absolute file path
- Ensure the directory exists
- Check file permissions
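
The three screenshot checks can be run on the client before `take_screenshot` is sent. The sketch below uses only `java.nio.file`; `ScreenshotPath` and `problem` are hypothetical names and not part of selenium-mcp, and the checks describe the machine where this code runs, which is assumed to be the same machine as the server.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical pre-flight check for the outputPath parameter (not part of selenium-mcp). */
final class ScreenshotPath {
    /** Returns a description of the first problem found, or empty if the path looks usable. */
    static Optional<String> problem(String outputPath) {
        Path path = Paths.get(outputPath);
        if (!path.isAbsolute()) {
            return Optional.of("path is not absolute");
        }
        Path dir = path.getParent();
        if (dir == null || !Files.isDirectory(dir)) {
            return Optional.of("directory does not exist");
        }
        if (!Files.isWritable(dir)) {
            return Optional.of("directory is not writable");
        }
        return Optional.empty();
    }
}
```

A relative value such as `screenshot.png` is reported as not absolute, which matches the first suggestion in the list above.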

License

This project is licensed under the MIT License.

Acknowledgements

Built on Selenium WebDriver for browser automation
Implements the Model Context Protocol standard

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol. Each entry ends with the address of its official website.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai
