Ai Web Scraper

@PRIXYYon 10 months ago

1 MIT

FreeCommunity

AI Systems

This repo uses the Browser Use MCP together with a Gemini API key to automate data scraping: you only define the task.

What is Ai Web Scraper

AI-Web-Scraper is an autonomous web scraping solution that utilizes BrowserUse and AI models, specifically designed to automate data extraction from websites without extensive coding.

Use cases

Use cases include scraping product prices from e-commerce sites, gathering data for research projects, monitoring changes in web content, and automating data entry tasks.

How to use

To use AI-Web-Scraper, clone the repository, set up a virtual environment, install dependencies, configure your API key in a .env file, and run the main script to start scraping.

Key features

Key features include AI-powered automation for understanding web page content, flexible model support (Gemini by default), simple natural language commands for control, dynamic web navigation capabilities, and customizable data extraction.

Where to use

AI-Web-Scraper can be used in various fields such as data analysis, market research, e-commerce, content aggregation, and any area requiring automated data collection from the web.

Content

AI-Powered Web Scraping with BrowserUse

This repository contains an autonomous web scraping solution powered by BrowserUse and AI models. The implementation allows for automated browser interaction and data extraction without writing extensive DOM manipulation code.

Overview

This project demonstrates how to:

Use BrowserUse as a browser automation agent
Leverage AI models (Gemini by default) to interpret webpage content
Extract data from websites autonomously
Perform complex web tasks using natural language instructions

Features

AI-Powered Automation: Uses AI models to understand web page structure and content
Flexible Model Support: Works with Gemini by default, but can be configured to use other models
Simple Natural Language Commands: Control web scraping through plain English instructions
Dynamic Web Navigation: Handle pagination, form submission, and complex interactions
Customizable Data Extraction: Extract specific data points based on your requirements

Installation

Prerequisites

Python 3.8+ (recent BrowserUse releases require Python 3.11 or newer)
Git

Setup Instructions

Clone the repository

```
git clone https://github.com/yourusername/your-repo-name.git
cd your-repo-name
```

Create a virtual environment
```
python -m venv venv
```
Activate the virtual environment

On Windows:
```
venv\Scripts\activate
```
On macOS/Linux:
```
source venv/bin/activate
```
Install dependencies
```
pip install -r requirements.txt
```
Create a .env file
Create a file named .env in the root directory of the project and add your API key:
```
GEMINI_API_KEY=Your_API_Key_Here
```
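
A missing or placeholder key is an easy mistake at this step. The following is a minimal sketch, not part of the repository, of a fail-fast check. The helper name `require_api_key` is hypothetical, and it assumes the key has already been loaded into the process environment (for example by `load_dotenv()` from python-dotenv).

```python
import os


def require_api_key(name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper: raises a clear error when the variable is unset,
    empty, or still holds the placeholder from the .env template.
    """
    value = os.environ.get(name, "").strip()
    if not value or value == "Your_API_Key_Here":
        raise RuntimeError(
            f"{name} is not set; add it to the .env file in the project root"
        )
    return value
```

Calling it once at startup surfaces a configuration problem before a browser window is ever opened.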

Usage

Run the script
```
python main.py
```
Modify the script

The main script contains examples of web scraping tasks. You can modify these to match your specific use case.
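
One way to keep such tasks easy to modify is to build the instruction text from a few parameters. The sketch below is an assumption about how that might look, not code from main.py; `build_task` and its arguments are hypothetical names.

```python
from typing import Sequence


def build_task(url: str, fields: Sequence[str], item: str = "product") -> str:
    """Compose a plain-English scraping task for the agent (hypothetical helper)."""
    if not fields:
        raise ValueError("at least one field is required")
    # One indented bullet per field to extract
    field_lines = "\n".join(f"   - {field}" for field in fields)
    return (
        f"1. Go to {url}\n"
        f"2. Find every {item} listed on the page\n"
        f"3. For each {item}, extract:\n"
        f"{field_lines}\n"
        f"4. Return the data as a JSON list of objects"
    )


if __name__ == "__main__":
    print(build_task("https://example.com/products", ["Product name", "Price"]))
```

Changing the target site or the extracted fields then only means changing the arguments, not the wording of the task.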

Using Different AI Models

By default, the project is configured to use Google’s Gemini. To use alternative AI models, refer to the BrowserUse documentation on supported models.

To change the model, you’ll need to:

Add the appropriate API key to your .env file
Update your code to use the desired model

For example, to use OpenAI:

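```python
import asyncio
import os

from browser_use import Agent
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()  # loads OPENAI_API_KEY from the .env file

# Assumes a browser-use release that accepts LangChain chat models;
# newer releases ship their own chat classes, so check the docs for your version.
# The model name is illustrative.
llm = ChatOpenAI(model="gpt-4o", api_key=os.getenv("OPENAI_API_KEY"))
agent = Agent(task="Describe the scraping task in plain English", llm=llm)

asyncio.run(agent.run())
```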
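
The first step, adding the right key, can be checked in code before any model is constructed. This is a rough sketch under stated assumptions: the `PROVIDER_ENV_VARS` mapping and the `available_providers` helper are hypothetical and cover only the two key names that appear in this README.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical mapping from provider name to the .env variable used in this README
PROVIDER_ENV_VARS: Dict[str, str] = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def available_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List the providers whose API key is present and non-empty."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test.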

Key Components

The project consists of several key components based on the tutorial video:

Browser Initialization: Setting up the BrowserUse agent with the appropriate AI model
Task Definition: Specifying what data needs to be scraped
Browser Navigation: Instructions for browsing to specific pages
Data Extraction: Logic for retrieving and processing the target information
Data Storage: Saving the extracted data in a structured format
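
For the data storage step, one option is to write the extracted records to JSON and CSV using only the standard library. This is a sketch, assuming the records arrive as a list of dictionaries; `save_records` and the output file names are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def save_records(records: List[Dict[str, object]], out_dir: str = ".") -> None:
    """Write records to products.json and products.csv in out_dir (hypothetical names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # JSON keeps value types as they are
    (out / "products.json").write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # CSV needs one fixed header: the union of all keys, in first-seen order
    fieldnames: List[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with (out / "products.csv").open("w", newline="", encoding="utf-8") as handle:
        # restval fills in fields that a given record does not have
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
```

Records with missing fields (for example a product without a rating) end up with an empty CSV cell rather than causing an error.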

Example

Here’s a simple example of how to use the framework to scrape product information:

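```python
import asyncio
import os

from browser_use import Agent
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI

load_dotenv()  # loads GEMINI_API_KEY from the .env file

# Assumes a browser-use release that accepts LangChain chat models;
# check the BrowserUse docs for your version. The model name is illustrative.
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=os.getenv("GEMINI_API_KEY"),
)

# Navigation and extraction are both described in the task text
task = """
    1. Go to https://example.com/products
    2. Find all product items on the page
    3. For each product, extract:
       - Product name
       - Price
       - Rating (if available)
       - Description
    4. Return the data as a list of dictionaries
"""


async def main():
    agent = Agent(task=task, llm=llm)
    history = await agent.run()
    # The agent's final answer is returned as text
    print(history.final_result())


asyncio.run(main())
```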
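
An agent driven by a language model typically returns its answer as text, so a list of dictionaries usually has to be parsed out of that text. The helper below is a hedged sketch of that step; `parse_records` is a hypothetical name, and it assumes the model was asked to answer with a JSON array.

```python
import json
from typing import Any, Dict, List


def parse_records(text: str) -> List[Dict[str, Any]]:
    """Extract the first JSON array of objects found in text (hypothetical helper)."""
    decoder = json.JSONDecoder()
    start = text.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list) and all(isinstance(item, dict) for item in value):
            return value
        start = text.find("[", start + 1)
    raise ValueError("no JSON array of objects found in the agent output")
```

Scanning for the first valid array tolerates extra prose around the JSON, which models often add even when asked not to.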

Browser Integration

BrowserUse controls a Chromium-based browser such as Chrome for web automation. It drives the browser through Playwright (or, in newer releases, through the Chrome DevTools Protocol directly) rather than through ChromeDriver. BrowserUse uses this browser integration to:

Open Chrome browser instances
Navigate to specified URLs
Interact with web elements
Execute JavaScript
Handle cookies and sessions
Take screenshots
Perform complex browser interactions

The browser integration works behind the scenes, allowing the AI agent to interact with websites much as a human user would. The BrowserUse framework handles the technical details of browser communication, letting you focus on defining the scraping tasks in natural language.

Make sure Chrome or another Chromium-based browser is available on your system for the automation to work. With Playwright-based releases, running playwright install chromium downloads a compatible browser build, so no separate driver has to be managed by hand.
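
Whether a suitable browser is installed can be checked up front. The sketch below looks for a Chrome or Chromium executable on the `PATH`. The candidate names are assumptions, the helper `find_browser` is hypothetical, and a browser installed outside the `PATH` (common on Windows and macOS) will not be found this way.

```python
import shutil
from typing import Optional, Sequence

# Assumed executable names; adjust for your platform
CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_browser(names: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_browser() or "No Chrome/Chromium executable found on PATH")
```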
