Chrome Control MCP Implementation

Implementation of the chrome-control-mcp roadmap

What is Chrome Control MCP Implementation?

chrome-control-mcp-implementation is a server that implements the chrome-control-mcp roadmap, allowing AI assistants to interact with web pages efficiently by analyzing the DOM instead of relying on screenshots.

Use cases

Use cases include automating web form submissions, extracting structured data from web pages, conducting web-based research, and enhancing user interactions in AI-driven applications.

How to use

To use chrome-control-mcp-implementation, integrate it with your AI assistant framework. The server listens for JSON-RPC requests over HTTP (at http://localhost:3001 by default) and fulfills them through its Chrome API component, which controls the browser.

Key features

Key features include real-time DOM mutation observation, semantic DOM analysis, content extraction, form handling, navigation management, error recovery, automatic Chrome process management, smart caching, race condition prevention, memory management, and access to the accessibility tree.

Where to use

chrome-control-mcp-implementation can be used in AI applications that require web browsing capabilities, such as virtual assistants, automated testing tools, and data extraction services.

Clients Supporting MCP

The following are the main client applications that support the Model Context Protocol, with their official websites.

Claude Desktop: Official desktop application from Anthropic; natively supports MCP. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, with built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source UI for ChatGPT and other LLMs; supports MCP integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP, focused on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop assistant and MCP client; supports local knowledge bases and MCP servers. 5ire.app


Chrome Control MCP Implementation

Implementation of the chrome-control-mcp roadmap, providing efficient web browsing capabilities for AI assistants without relying on screenshots.

Project Overview

The Chrome Control MCP (Model Context Protocol) server enables AI assistants to interact with web pages more efficiently and more semantically than screenshot-based approaches. By analyzing the DOM (Document Object Model) directly, it gives the assistant a structured view of a page's elements and content.

Key Features

DOM Mutation Observers - Real-time updates for dynamic content changes
Semantic DOM Analysis - Deep understanding of page structure and content
Content Extraction - Extracts structured content from web pages
Form Handling - Identifies and interacts with forms accurately
Navigation Management - Handles complex navigation scenarios reliably
Error Recovery - Sophisticated error handling with recovery strategies
Chrome Management - Automatic Chrome process monitoring and recovery
Caching - Smart, mutation-aware cache invalidation for performance
Race Condition Prevention - Mutex-based locking for concurrent operations
Memory Management - Proper resource cleanup to prevent memory leaks
Accessibility Tree - Access to Chrome’s accessibility tree for enhanced semantic understanding
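The mutex-based locking listed above can be pictured as a small promise-based lock keyed by tab ID. This is a minimal sketch, not the project's code: `KeyedMutex` and `runExclusive` are hypothetical names, and it assumes each tab is identified by a string ID.

```javascript
// Hypothetical sketch: a promise-based lock keyed by tab ID. Operations on the
// same tab run one at a time; operations on different tabs run independently.
class KeyedMutex {
  constructor() {
    // Maps each key (e.g. a tab ID) to a promise that settles once the most
    // recently queued operation for that key has finished.
    this.tails = new Map();
  }

  async runExclusive(key, task) {
    const previous = this.tails.get(key) || Promise.resolve();
    let release;
    const done = new Promise((resolve) => { release = resolve; });
    // The next caller for this key waits for the previous tail and for this task.
    const tail = previous.then(() => done);
    this.tails.set(key, tail);

    await previous;
    try {
      return await task();
    } finally {
      release(); // always release, even if the task threw
      // Remove the entry only if nobody queued behind this operation.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

Operations on the same tab are queued in arrival order and a failing operation still releases the lock, while operations on different tabs never wait on each other.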

Architecture

The implementation follows a modular architecture with these key components:

Chrome MCP Server - Handles incoming requests from AI assistants
Chrome API - Main interface to browser control
Chrome Process Manager - Manages Chrome browser lifecycle
Tab Manager - Centralized tab management
DOM Observer - Monitors real-time DOM changes
Cache System - Optimizes performance through intelligent caching
Semantic Analyzer - Builds semantic representation of pages
Content Extractor - Extracts structured content from pages
Error Handler - Provides global error handling and recovery strategies
Accessibility Tree Analyzer - Extracts and analyzes the accessibility tree
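To illustrate how the Cache System and DOM Observer might cooperate, here is a sketch of mutation-aware invalidation: everything cached for a tab is discarded as soon as a mutation is reported for that tab, with a time-to-live as a fallback. `MutationAwareCache`, `onMutation`, and the 30-second default TTL are assumptions for illustration; the project's actual invalidation rules may be finer-grained.

```javascript
// Hypothetical sketch of mutation-aware caching, not the project's implementation.
class MutationAwareCache {
  constructor({ ttlMs = 30000, now = Date.now } = {}) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful in tests
    this.byTab = new Map(); // tabId -> Map(key -> { value, expiresAt })
  }

  get(tabId, key) {
    const entry = this.byTab.get(tabId)?.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.byTab.get(tabId).delete(key); // expired: drop it
      return undefined;
    }
    return entry.value;
  }

  set(tabId, key, value) {
    if (!this.byTab.has(tabId)) this.byTab.set(tabId, new Map());
    this.byTab.get(tabId).set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Called when the DOM observer reports a change: any cached analysis of
  // that tab may now be stale, so all of it is dropped.
  onMutation(tabId) {
    this.byTab.delete(tabId);
  }
}
```

Grouping entries per tab keeps invalidation a single `Map.delete`, at the cost of discarding results that a small mutation may not actually have affected.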

Getting Started

Prerequisites

Node.js 16+
npm or yarn

Installation

Clone this repository:
```
git clone https://github.com/needsupport/chrome-control-mcp-implementation.git
cd chrome-control-mcp-implementation
```

Install dependencies:
```
npm install
```
Build the project:
```
npm run build
```
Start the server:
```
./start-chrome-mcp.sh
```

The start script will automatically:

Build the TypeScript code if needed
Find and launch Chrome with the appropriate debugging flags
Start the MCP server
Provide a health check endpoint for verification
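The exact debugging flags the start script uses are not listed here. The sketch below assumes a typical set: `--remote-debugging-port` (matching `CHROME_DEBUGGING_PORT`), a dedicated `--user-data-dir` under the system temp directory so the controlled instance does not share the user's everyday profile, and flags that skip first-run prompts. `buildChromeArgs` and the profile directory naming are hypothetical.

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: builds a Chrome command line for remote debugging.
function buildChromeArgs({ debuggingPort = 9222, profileDir, headless = false } = {}) {
  // A throwaway profile directory keeps the controlled browser isolated.
  const userDataDir = profileDir || path.join(os.tmpdir(), `chrome-mcp-profile-${debuggingPort}`);
  const args = [
    `--remote-debugging-port=${debuggingPort}`,
    `--user-data-dir=${userDataDir}`,
    '--no-first-run',              // skip the first-run welcome flow
    '--no-default-browser-check',  // skip the "make Chrome default" prompt
  ];
  if (headless) args.push('--headless=new');
  return args;
}
```

The resulting array could be passed to Node's `child_process.spawn(chromePath, args)`, where `chromePath` is the detected (or `CHROME_EXECUTABLE`) Chrome binary.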

Environment Variables

The server can be configured using environment variables:

| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 3001 |
| CHROME_DEBUGGING_PORT | Chrome debugging port | 9222 |
| MANAGE_CHROME_PROCESS | Enable automatic Chrome management | true |
| CHROME_EXECUTABLE | Path to Chrome executable | auto-detected |
| DEBUG | Enable debug mode | false |
| LOG_LEVEL | Log level (debug, info, warn, error) | info |
| HEALTHCHECK_PATH | Health check endpoint path | /health |
| AUTO_FREE_DEBUG_PORT | Kill the process on the debug port if it is in use | false |
| AUTO_FREE_SERVER_PORT | Kill the process on the server port if it is in use | false |
| ENABLE_ACCESSIBILITY_TREE | Enable accessibility tree support | true |
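As a sketch of how these variables could be read with their documented defaults, the hypothetical `loadConfig` below turns an environment object into a typed configuration. Which strings count as true (here `1`, `true`, `yes`, `on`) and the validation rules are assumptions, not the server's documented behavior.

```javascript
// Hypothetical config loader mirroring the environment variable table.
function parseBool(value, fallback) {
  if (value === undefined || value === '') return fallback;
  return ['1', 'true', 'yes', 'on'].includes(String(value).toLowerCase());
}

function parsePort(value, fallback) {
  if (value === undefined || value === '') return fallback;
  const port = Number(value);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${value}`);
  }
  return port;
}

const LOG_LEVELS = ['debug', 'info', 'warn', 'error'];

function loadConfig(env = process.env) {
  const logLevel = (env.LOG_LEVEL || 'info').toLowerCase();
  if (!LOG_LEVELS.includes(logLevel)) throw new Error(`Invalid LOG_LEVEL: ${env.LOG_LEVEL}`);
  return {
    port: parsePort(env.PORT, 3001),
    chromeDebuggingPort: parsePort(env.CHROME_DEBUGGING_PORT, 9222),
    manageChromeProcess: parseBool(env.MANAGE_CHROME_PROCESS, true),
    chromeExecutable: env.CHROME_EXECUTABLE || null, // null means auto-detect
    debug: parseBool(env.DEBUG, false),
    logLevel,
    healthcheckPath: env.HEALTHCHECK_PATH || '/health',
    autoFreeDebugPort: parseBool(env.AUTO_FREE_DEBUG_PORT, false),
    autoFreeServerPort: parseBool(env.AUTO_FREE_SERVER_PORT, false),
    enableAccessibilityTree: parseBool(env.ENABLE_ACCESSIBILITY_TREE, true),
  };
}
```

Failing fast on a malformed port or log level surfaces configuration mistakes at startup instead of as a confusing bind error later.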

Local Development

This project is designed for local deployment where the Chrome Control MCP server and the AI assistant run on the same machine. The server automatically detects and manages Chrome, handling crashes and restarts without manual intervention.

Chrome Management

The Chrome Process Manager component can:

Automatically locate Chrome on Windows, macOS, and Linux
Monitor Chrome process health and resource usage
Recover from Chrome crashes with exponential backoff
Clean up temporary profiles and resources on shutdown
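Crash recovery with exponential backoff doubles the wait between restart attempts up to a cap, so a browser that keeps crashing is not relaunched in a tight loop. The sketch below assumes a 500 ms base delay, a 30 s cap, and five attempts; `backoffDelay`, `restartWithBackoff`, and the injectable `launch` and `sleep` functions are hypothetical.

```javascript
// Hypothetical sketch of restart-with-backoff, not the project's code.
// attempt 0 -> baseMs, 1 -> 2 * baseMs, 2 -> 4 * baseMs, ... capped at maxMs
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000 } = {}) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function restartWithBackoff(launch, { maxAttempts = 5, baseMs = 500, maxMs = 30000, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await launch(); // e.g. spawn Chrome and wait for the debug port
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) await wait(backoffDelay(attempt, { baseMs, maxMs }));
    }
  }
  throw new Error(`Chrome failed to restart after ${maxAttempts} attempts: ${lastError && lastError.message}`);
}
```

Production code often adds random jitter to each delay; it is left out here to keep the schedule predictable.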

Port Management

The server includes port management that can:

Detect if the Chrome debug port or server port is in use
Automatically find alternative ports if needed
Provide detailed error messages for port conflicts
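One common way to detect a port conflict in Node.js is to try binding the port and check for `EADDRINUSE`. The hypothetical `isPortInUse` and `findAvailablePort` helpers below use the built-in `net` module and probe only `127.0.0.1`; they are a sketch, not the project's implementation.

```javascript
const net = require('node:net');

// Resolves true if something is already listening on host:port.
function isPortInUse(port, host = '127.0.0.1') {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once('error', (err) => {
      if (err.code === 'EADDRINUSE') resolve(true);
      else reject(err); // e.g. EACCES for privileged ports
    });
    // Bound successfully: release the port again and report it as free.
    probe.once('listening', () => probe.close(() => resolve(false)));
    probe.listen(port, host);
  });
}

// Tries the preferred port first, then the next few ports above it.
async function findAvailablePort(preferred, { maxTries = 10, host = '127.0.0.1' } = {}) {
  for (let port = preferred; port < preferred + maxTries && port <= 65535; port++) {
    if (!(await isPortInUse(port, host))) return port;
  }
  throw new Error(`No free port in range ${preferred}-${preferred + maxTries - 1}`);
}
```

The check is inherently racy: another process can take the port between the probe and the real `listen` call, so the server should still handle `EADDRINUSE` when it binds.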

Health Checks

The following health endpoints are available:

/health - Basic health status
/health/details - Detailed system information
/health/chrome - Chrome-specific status
/livez - Kubernetes-style liveness probe
/readyz - Kubernetes-style readiness probe
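The usual difference between the two Kubernetes-style probes is that liveness only asks whether the process is responsive, while readiness also requires its dependencies (here, a connected Chrome instance). The sketch below encodes that distinction for some of the endpoints above; the `state` shape, response bodies, and status codes are assumptions, `/health/details` is omitted, and `healthResponse` and `createHealthServer` are hypothetical names.

```javascript
const http = require('node:http');

// Hypothetical mapping from server state to health-endpoint responses.
// `state.chrome` is assumed to be 'connected', 'starting', or 'crashed'.
function healthResponse(path, state) {
  const chromeUp = state.chrome === 'connected';
  switch (path) {
    case '/livez':
      // Liveness: the process answers requests, whatever Chrome is doing.
      return { status: 200, body: { status: 'alive' } };
    case '/readyz':
      // Readiness: only accept work once Chrome is connected.
      return chromeUp
        ? { status: 200, body: { status: 'ready' } }
        : { status: 503, body: { status: 'not ready', chrome: state.chrome } };
    case '/health':
      return { status: chromeUp ? 200 : 503, body: { status: chromeUp ? 'ok' : 'degraded' } };
    case '/health/chrome':
      return { status: chromeUp ? 200 : 503, body: { chrome: state.chrome, restarts: state.restarts ?? 0 } };
    default:
      return { status: 404, body: { error: 'not found' } };
  }
}

// Wires the mapping into a plain Node.js HTTP server (not started here).
function createHealthServer(getState) {
  return http.createServer((req, res) => {
    const { pathname } = new URL(req.url, 'http://localhost');
    const { status, body } = healthResponse(pathname, getState());
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}
```

Keeping liveness independent of Chrome means an orchestrator will not kill the server merely because the browser is mid-restart.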

Usage

The server provides a JSON-RPC API that can be accessed at http://localhost:3001. Here’s a basic example:

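```
// Navigate to a URL (global fetch requires a browser or Node.js 18+)
fetch('http://localhost:3001', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    jsonrpc: '2.0',
    method: 'navigate',
    params: { url: 'https://example.com' },
    id: 1
  })
})
  .then(response => response.json())
  .then(data => {
    // JSON-RPC reports failures in an `error` member instead of `result`
    if (data.error) {
      throw new Error(`navigate failed: ${data.error.message}`);
    }
    const tabId = data.result.tabId;
    console.log(`Page loaded in tab: ${tabId}`);
  })
  .catch(err => console.error(err));
```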
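Rather than repeating the request boilerplate for every method, a client can wrap it in a small helper that assigns request IDs and turns JSON-RPC `error` members into thrown exceptions. `buildRpcRequest`, `unwrapRpcResponse`, and `callRpc` are hypothetical names; the envelope follows the JSON-RPC 2.0 specification, and the endpoint assumes the default port.

```javascript
// Hypothetical JSON-RPC 2.0 client helpers for the server's HTTP endpoint.
let nextId = 1;

function buildRpcRequest(method, params = {}) {
  return { jsonrpc: '2.0', method, params, id: nextId++ };
}

function unwrapRpcResponse(response, requestId) {
  if (response.id !== requestId) {
    throw new Error(`Mismatched response id ${response.id} (expected ${requestId})`);
  }
  if (response.error) {
    // Preserve the JSON-RPC error code and data for callers.
    const err = new Error(`RPC error ${response.error.code}: ${response.error.message}`);
    err.code = response.error.code;
    err.data = response.error.data;
    throw err;
  }
  return response.result;
}

async function callRpc(method, params, endpoint = 'http://localhost:3001') {
  const request = buildRpcRequest(method, params);
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(request),
  });
  // Parse the body even on non-2xx statuses: it may carry a JSON-RPC error.
  const body = await res.json().catch(() => null);
  if (body === null) throw new Error(`HTTP ${res.status}: response was not JSON`);
  return unwrapRpcResponse(body, request.id);
}
```

With this in place, the navigation example reduces to `const { tabId } = await callRpc('navigate', { url: 'https://example.com' });`.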

Accessibility Tree

You can access and analyze the accessibility tree using the getAccessibilityTree method:

```
// Get the accessibility tree for a tab returned by `navigate`
fetch('http://localhost:3001', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    jsonrpc: '2.0',
    method: 'getAccessibilityTree',
    params: { tabId: 'your-tab-id' },
    id: 1
  })
})
  .then(response => response.json())
  .then(data => {
    if (data.error) {
      throw new Error(`getAccessibilityTree failed: ${data.error.message}`);
    }
    const accessibilityTree = data.result.accessibilityTree;
    console.log('Accessibility issues:', accessibilityTree.issues);
    console.log('Accessibility summary:', accessibilityTree.summary);
  })
  .catch(err => console.error(err));
```
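The two calls shown in this document are typically chained, since `getAccessibilityTree` needs the `tabId` returned by `navigate`. The hypothetical `auditAccessibility` helper takes an injected `rpc(method, params)` function (any wrapper that resolves with a call's `result`) so it can be exercised without a running server. It assumes `accessibilityTree.issues` is an array; the format of its elements is not documented here.

```javascript
// Hypothetical helper: open a page, then summarize its accessibility tree.
async function auditAccessibility(rpc, url) {
  const { tabId } = await rpc('navigate', { url });
  const { accessibilityTree } = await rpc('getAccessibilityTree', { tabId });
  return {
    tabId,
    // Assumes `issues` is an array; treat anything else as "no issues".
    issueCount: Array.isArray(accessibilityTree.issues) ? accessibilityTree.issues.length : 0,
    summary: accessibilityTree.summary,
  };
}
```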

Debugging

To debug the server, you can use the following techniques:

Set LOG_LEVEL to “debug” for detailed logs:
```
LOG_LEVEL=debug ./start-chrome-mcp.sh
```
View the health check endpoint for system status:
```
curl http://localhost:3001/health/details
```

Monitor Chrome process status:
```
curl http://localhost:3001/health/chrome
```

Use Chrome DevTools to inspect the Chrome instance:
Open chrome://inspect in a separate Chrome window and look for the controlled instance in the “Remote Target” section.

Error Handling and Recovery

The server detects and recovers from the following failures:

Chrome Process Crashes: Automatically detected and restarted with exponential backoff
Connection Failures: Detected and reconnected with retry logic
Resource Leaks: Properly tracked and cleaned up during shutdown
Tab Synchronization: Mutex-based locking prevents race conditions
Graceful Shutdown: Proper cleanup of all resources, even during abnormal termination
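Graceful shutdown usually means releasing resources in the reverse order they were acquired and continuing even if one cleanup step fails. `CleanupRegistry` below is a hypothetical sketch of that pattern, not the project's shutdown code.

```javascript
// Hypothetical sketch: cleanup handlers run once, last registered first.
class CleanupRegistry {
  constructor() {
    this.handlers = [];
    this.ran = false;
  }

  register(name, fn) {
    this.handlers.push({ name, fn });
  }

  // Runs every handler even if some fail; returns the failures instead of throwing.
  async runAll() {
    if (this.ran) return []; // a second signal must not clean up twice
    this.ran = true;
    const failures = [];
    for (const { name, fn } of [...this.handlers].reverse()) {
      try {
        await fn();
      } catch (error) {
        failures.push({ name, error });
      }
    }
    return failures;
  }
}
```

In a server process, `runAll()` would typically be invoked from `process.once('SIGINT', handler)` and `process.once('SIGTERM', handler)` before calling `process.exit()`.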

Testing

Run the test suite to verify functionality:
```
npm test
```

The test suite includes:

Unit tests for key components
Integration tests for Chrome Process Manager
Tests for accessibility tree functionality

Implementation Status

[x] Chrome Process Manager - Complete implementation with health monitoring and crash recovery
[x] Intelligent port management - Detection and resolution of port conflicts
[x] Health check endpoints - Comprehensive health monitoring
[x] Enhanced error handling - Robust recovery from failures
[x] Resource cleanup - Proper management of temporary files and processes
[x] Tab management with race condition prevention
[x] DOM mutation observing
[x] Semantic DOM analysis
[x] Content extraction
[x] Form handling
[x] Navigation management
[x] Authentication and security
[x] Cache system
[x] Test suite - Basic tests for critical components
[x] Accessibility tree support - Complete implementation with issue detection

License

MIT

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol, with their official websites.

Zed: High-performance collaborative code editor; supports MCP for a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code; supports MCP for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium; integrates MCP to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin for VS Code and JetBrains; compatible with MCP. continue.dev

Trae: AI-driven code editor; supports MCP, with a focus on the developer experience. trae.ai
