Intelligent Browser Agent with MCP

This project combines modern LLMs (such as Google Gemini, Azure OpenAI, or OpenAI) with real-time web automation to provide a flexible, extensible AI-powered browsing experience.

What is Intelligent Browser Agent with MCP

Intelligent Browser Agent with MCP (intelligent-browser-agent-with-mcp) is an interactive application that uses large language models (LLMs) such as Google Gemini, Azure OpenAI, or OpenAI to automate web browsing tasks in real time. Users control a real browser with natural language commands.

Use cases

Use cases include automating repetitive web tasks, extracting information from websites, summarizing articles or content, and providing interactive browsing experiences for users. It can also be used for testing web applications and gathering insights from online data.

How to use

To use the intelligent-browser-agent-with-mcp, clone the repository, set up a virtual environment, install the required dependencies, configure your API keys in a .env file, and run the application using Streamlit. Users can then input natural language commands to navigate and interact with websites.

Key features

Key features include the ability to navigate websites using natural language, perform actions like clicking and scrolling, take screenshots, summarize content with LLMs, and switch between multiple LLM providers. The application also features a user-friendly interface built with Streamlit.

Where to use

The intelligent-browser-agent-with-mcp can be used in various fields such as web automation, data extraction, content summarization, and enhancing user interaction with web applications. It is particularly useful in research, marketing, and customer support.

Clients Supporting MCP

The following are the main client applications that support the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

🌐 Intelligent Browser Agent with MCP

An interactive application built with Streamlit and the MCP-Agent framework that controls a real browser through Puppeteer. You can send natural language commands for the agent to navigate, interact with websites, take screenshots, and summarize content using a large language model (LLM) such as Gemini, OpenAI, or Azure OpenAI.

🚀 Features

Navigate to websites with natural language commands.
Perform actions like click, scroll, type, and extract data.
Take screenshots of web elements.
Summarize content using LLMs (Gemini, Azure, OpenAI).
Visual interface built with Streamlit.
Easily switch between multiple LLM providers.

🛠️ Requirements

Python 3.10+
Node.js (for Puppeteer)
npx installed
API Key for Gemini, Azure, or OpenAI
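The requirements above can be checked before installation with a small preflight script. This is a minimal sketch and not part of the project: the function name `check_requirements` is hypothetical, and it only verifies the Python version and that `node` and `npx` are on the PATH. It does not validate API keys.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10), tools=("node", "npx")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare only (major, minor) against the minimum version from the README.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Missing requirement:", problem)
```

Running the script prints one line per missing requirement and nothing when the environment looks complete.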

📦 Installation

git clone https://github.com/your-username/your-repo.git
cd your-repo

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

🔐 Configuration

Create a .env file in the root directory with the following variables based on the LLM provider you use:

🌟 Gemini (Google AI)

GEMINI_API_KEY=your_gemini_key
GEMINI_MODEL=gemini-pro

☁️ Azure OpenAI (optional)

AZURE_API_KEY=your_azure_key
AZURE_BASE_URL=https://your-resource-name.openai.azure.com/
GITHUB_MODEL=gpt-4

🔓 OpenAI (optional)

OPENAI_API_KEY=your_openai_key
OPENAI_BASE_URL=https://api.openai.com/v1
GITHUB_MODEL=gpt-4
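One way to see which provider a given .env file would enable is to parse it and check which variable group is complete. The sketch below is illustrative only: the variable names come from the configuration above, but the helper names (`parse_env_text`, `detect_provider`) and the priority order (Gemini, then Azure, then OpenAI) are assumptions, and the application itself may load and select its provider differently.

```python
# Variable names are taken from the configuration section; the priority
# order (Gemini, then Azure, then OpenAI) is an assumption for illustration.
PROVIDER_KEYS = {
    "gemini": ("GEMINI_API_KEY", "GEMINI_MODEL"),
    "azure": ("AZURE_API_KEY", "AZURE_BASE_URL", "GITHUB_MODEL"),
    "openai": ("OPENAI_API_KEY", "OPENAI_BASE_URL", "GITHUB_MODEL"),
}


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def detect_provider(env):
    """Return the first provider whose variables are all non-empty, else None."""
    for provider, keys in PROVIDER_KEYS.items():
        if all(env.get(key) for key in keys):
            return provider
    return None


if __name__ == "__main__":
    sample = "GEMINI_API_KEY=your_gemini_key\nGEMINI_MODEL=gemini-pro\n"
    print(detect_provider(parse_env_text(sample)))
```

The parser is deliberately stdlib-only and handles just plain `KEY=value` lines; a dedicated .env loader would be a better fit for quoted values or variable expansion.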

🧠 Run the App

streamlit run browser-mcp-agent-v1.py

This opens a web interface where you can enter commands like:

Go to https://modelcontextprotocol.io/introduction
Click on the link to object detection
Take a screenshot of the section

📋 Example Commands

Go to Google.com and search for "Machine Learning"
Scroll down and summarize the content
Click on the first link and extract the heading text
Take a screenshot of the main section

📁 Project Structure

📁 mcp_browser_agent
├── browser-mcp-agent-v1.py        # Main Streamlit app
├── browser-mcp-agent-v2.py        # Main Chainlit app
├── .env.sample                    # Environment variables
├── mcp_agent.config.yaml          # mcp agent configs
├── mcp_agent.secrets.sample.yaml
├── requirements.txt               # Dependencies
└── README.md                      # This file

📄 License

This project is licensed under the MIT License - free to use, modify, and distribute.

🤝 Credits

This project uses Streamlit, the MCP-Agent framework, and Puppeteer.
