MCP ExplorerExplorer

Crawl Mcp

@Kavin-kumar10on 10 months ago
1 MIT
FreeCommunity
AI Systems
Crawl4AI MCP Server enables AI models to recursively crawl websites and extract content in markdown.

Overview

What is Crawl Mcp

Crawl-mcp is a web crawler MCP server built using crawl4ai, designed to enable AI models to crawl websites and extract content in markdown format.

Use cases

Use cases for crawl-mcp include gathering data for research, collecting content for machine learning models, and aggregating information from multiple web sources for analysis.

How to use

To use crawl-mcp, install it via uv, pipx, or pip. After installation, run the server using the command ‘pipx run crawl4ai-mcp’ or ‘python -m Crawl_mcp.main’. Configure it using the provided mcp.json file for MCP-compatible clients.

Key features

Key features include recursive crawling, configurable depth, page limit settings, and customizable timeout configurations for long crawling operations.

Where to use

Crawl-mcp can be used in various fields such as data mining, web scraping, content aggregation, and AI training where extracting structured data from websites is necessary.

Content

Crawl4AI MCP Server

A web crawler MCP server using crawl4ai that allows AI models to crawl websites and extract content in markdown format.

Features

  • Recursive Crawling: Recursively crawl websites starting from a URL
  • Configurable Depth: Control how deep the crawler should go
  • Page Limit: Set maximum number of pages to crawl
  • Timeout Configuration: Set custom timeout for long crawling operations

Installation

Using uv (Recommended)

The fastest way to install and use the Crawl4AI MCP server is with uv:

# Install directly from GitHub
uv pip install git+https://github.com/Kavin-kumar10/crawl4ai-mcp.git

# Or create a virtual environment first
uv venv -p python3.10 .venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows
uv pip install git+https://github.com/Kavin-kumar10/crawl4ai-mcp.git

Using pipx

You can also install it with pipx for an isolated environment:

pipx install git+https://github.com/Kavin-kumar10/crawl4ai-mcp.git

This will install the package in an isolated environment and make the crawl4ai-mcp command available globally.

Using pip

You can also install it with pip:

pip install git+https://github.com/Kavin-kumar10/crawl4ai-mcp.git

From Source

To install from source:

git clone https://github.com/Kavin-kumar10/crawl4ai-mcp.git
cd crawl4ai-mcp

# Using uv (recommended)
uv pip install -e .

# Or using pip
pip install -e .

Usage with MCP

Running the Server

Once installed, you can run the MCP server directly:

# If installed with pipx
pipx run crawl4ai-mcp

# If installed with pip or uv
python -m Crawl_mcp.main

The server uses stdio transport (not URL-based) for communication with MCP clients.

Using the mcp.json Configuration

This repository includes an mcp.json file that you can use to configure the MCP server in MCP-compatible clients:

# Copy it to your home directory or project directory
cp mcp.json ~/.mcp.json

The mcp.json file contains:

  • Server configuration with stdio transport
  • Tool definitions with parameters and return types
  • Descriptions for tools and parameters

Available Tools

The MCP server provides the following tool:

crawl_recursive

Recursively crawl a website starting from a URL.

Parameters:

  • url (string): The URL to start crawling from
  • max_depth (integer, optional): Maximum depth for recursive crawling (default: 2)
  • max_pages (integer, optional): Maximum number of pages to crawl (default: 500)

Example:

{
  "url": "https://example.com",
  "max_depth": 3,
  "max_pages": 100
}

Example Usage with Claude

Here’s an example of how to use the MCP server with Claude after configuring it in your MCP environment:

I need to use the Crawl4AI MCP server to crawl a website.

<use_mcp_tool>
<server_name>crawl4ai-mcp</server_name>
<tool_name>crawl_recursive</tool_name>
<arguments>
{
  "url": "https://example.com",
  "max_depth": 2,
  "max_pages": 100
}
</arguments>
</use_mcp_tool>

Note: The MCP server uses stdio transport for communication, not HTTP/URL-based transport. This means it runs in the terminal and communicates through standard input/output streams.

Development

Requirements

  • Python 3.10 or higher
  • MCP CLI 1.7.1 or higher (pip install mcp[cli]>=1.7.1)
  • crawl4ai (pip install crawl4ai)

Development Setup

Set up your development environment:

# Clone the repository
git clone https://github.com/Kavin-kumar10/crawl4ai-mcp.git
cd crawl4ai-mcp

# Create a virtual environment with uv
uv venv -p python3.10 .venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows

# Install in development mode
uv pip install -e .

# Install development dependencies (if you have any)
uv pip install -e ".[dev]"

Testing

To test the MCP server locally:

# Run the server in development mode with the MCP Inspector
mcp dev Crawl_mcp/server.py

# Or run the module directly (stdio mode)
python -m Crawl_mcp.main

When running in stdio mode, the server expects JSON-RPC messages on stdin and writes responses to stdout. This is how MCP clients communicate with the server.

Dependency Management

Use uv to manage dependencies:

# Check for dependency conflicts
uv pip check

# Generate a lock file (if you have a requirements.in file)
uv pip compile requirements.in -o requirements.txt

License

MIT License

Tools

No tools

Comments

Recommend MCP Servers

View All MCP Servers