Crawl4AI RAG MCP Server

License: MIT


A powerful implementation of the [Model Context Protocol (MCP)](https://modelcontextprotocol.io) integrated with [Crawl4AI](https://crawl4ai.com) and [Supabase](https://supabase.com/) for providing AI agents and AI coding assistants with advanced web crawling and RAG capabilities.

What is Crawl4AI RAG MCP Server

Crawl4AI-RAG-MCP-Server is a powerful implementation of the Model Context Protocol (MCP) integrated with Crawl4AI and Supabase, designed to provide AI agents and coding assistants with advanced web crawling and Retrieval-Augmented Generation (RAG) capabilities.

Use cases

Use cases include building AI coding assistants that can fetch relevant code snippets from the web, creating chatbots that provide up-to-date information, and developing data analysis tools that leverage web content.

How to use

To use Crawl4AI-RAG-MCP-Server, you can deploy it using Docker, then utilize its tools to crawl web pages, store content in a vector database, and perform RAG queries on the crawled data.

Key features

Key features include smart URL detection, recursive crawling, parallel processing, content chunking, vector search, and source retrieval for precise RAG.
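Recursive crawling and parallel processing can be pictured with the minimal sketch below. It does a breadth-first crawl that follows only same-domain links and fetches each depth level concurrently, bounded by a semaphore. The fetcher is a hypothetical stand-in for the real crawler (Crawl4AI does the actual fetching in this project), and the names `crawl_recursive`, `max_depth`, and `max_concurrency` are illustrative, not the server's API.

```python
import asyncio
from collections.abc import Awaitable, Callable
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical fetcher: takes a URL and returns (page_text, raw_hrefs_on_page).
Fetcher = Callable[[str], Awaitable[tuple[str, list[str]]]]


async def crawl_recursive(start_url: str, fetch: Fetcher,
                          max_depth: int = 2, max_concurrency: int = 5) -> dict[str, str]:
    """Breadth-first crawl of internal links, fetching each depth level in parallel."""
    domain = urlparse(start_url).netloc
    semaphore = asyncio.Semaphore(max_concurrency)
    visited: set[str] = set()
    pages: dict[str, str] = {}
    frontier = [urldefrag(start_url).url]

    async def fetch_limited(url: str) -> tuple[str, list[str]]:
        async with semaphore:  # at most max_concurrency fetches in flight
            return await fetch(url)

    for _depth in range(max_depth + 1):
        # Deduplicate while keeping order, and skip pages already crawled.
        batch = [u for u in dict.fromkeys(frontier) if u not in visited]
        if not batch:
            break
        visited.update(batch)
        results = await asyncio.gather(*(fetch_limited(u) for u in batch))
        next_frontier: list[str] = []
        for url, (text, hrefs) in zip(batch, results):
            pages[url] = text
            for href in hrefs:
                # Resolve relative links and drop #fragments before comparing.
                absolute = urldefrag(urljoin(url, href)).url
                if urlparse(absolute).netloc == domain:  # internal links only
                    next_frontier.append(absolute)
        frontier = next_frontier
    return pages
```

Because each level is gathered before the next starts, `max_depth` bounds how far from the start page the crawl can wander, while the semaphore separately caps concurrent requests.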

Where to use

Crawl4AI-RAG-MCP-Server can be used in various fields such as AI development, web scraping, data analysis, and any application requiring intelligent content retrieval and processing.

Clients Supporting MCP

The following are the main client applications that support the Model Context Protocol. Visit each official website for more information.

Claude Desktop: Official desktop application from Anthropic, with native MCP support. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, with built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLM UI that supports MCP integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant compatible with MCP, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop AI assistant and MCP client that supports a local knowledge base and MCP servers. 5ire.app

Content

Crawl4AI RAG MCP Server

Web Crawling and RAG Capabilities for AI Agents and AI Coding Assistants

A powerful implementation of the Model Context Protocol (MCP) integrated with Crawl4AI and Supabase for providing AI agents and AI coding assistants with advanced web crawling and RAG capabilities.

With this MCP server, you can crawl web content and then use that knowledge for RAG from any MCP-compatible client.

Overview

This MCP server provides tools that enable AI agents to crawl websites, store content in a vector database (Supabase), and perform RAG over the crawled content.

Features

Smart URL Detection: Automatically detects and handles different URL types (regular webpages, sitemaps, text files)
Recursive Crawling: Follows internal links to discover content
Parallel Processing: Efficiently crawls multiple pages simultaneously
Content Chunking: Intelligently splits content by headers and size for better processing
Vector Search: Performs RAG over crawled content, optionally filtering by data source for precision
Source Retrieval: Retrieve sources available for filtering to guide the RAG process
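The content-chunking feature above can be illustrated with a minimal sketch that splits markdown at headers first and then caps each piece at a size limit. It prefers paragraph boundaries and hard-splits only when a single paragraph is too long. This is an assumed strategy for illustration, not the project's exact chunking code, and `chunk_markdown` and `max_chars` are hypothetical names.

```python
import re


def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split markdown at headers, then cap each piece at max_chars characters.

    Oversized sections are split on blank-line paragraph boundaries where possible;
    a single paragraph longer than max_chars is hard-split.
    """
    # Zero-width split just before any line that starts with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        current = ""
        for para in section.split("\n\n"):
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) <= max_chars:
                current = candidate
                continue
            if current:
                chunks.append(current)
            # Hard-split a paragraph that alone exceeds the limit.
            while len(para) > max_chars:
                chunks.append(para[:max_chars])
                para = para[max_chars:]
            current = para
        if current:
            chunks.append(current)
    return chunks
```

Splitting on headers first keeps each chunk topically coherent, which tends to make the embedding of each chunk more specific than a fixed-width split would.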

Tools

The server provides four essential web crawling and search tools:

crawl_single_page: Quickly crawl a single web page and store its content in the vector database
smart_crawl_url: Intelligently crawl a full website based on the type of URL provided (sitemap, llms-full.txt, or a regular webpage that needs to be crawled recursively)
get_available_sources: Get a list of all available sources (domains) in the database
perform_rag_query: Search for relevant content using semantic search with optional source filtering
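The smart_crawl_url description implies a dispatch on URL type. The sketch below shows one way to make that decision, assuming a simple path-based heuristic: an .xml filename containing "sitemap" is treated as a sitemap, any .txt file (such as llms-full.txt) as a text file, and everything else as a webpage to crawl recursively. It also shows how page URLs could be read from a standard sitemap with the standard library. The real server's rules may differ, and `UrlKind`, `classify_url`, and `urls_from_sitemap` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from enum import Enum
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


class UrlKind(Enum):
    SITEMAP = "sitemap"
    TEXT_FILE = "text_file"  # e.g. llms-full.txt, stored as-is
    WEBPAGE = "webpage"      # crawled recursively via internal links


def classify_url(url: str) -> UrlKind:
    """Guess how a URL should be crawled from its path alone (no network access)."""
    path = urlparse(url).path.lower()
    filename = path.rsplit("/", 1)[-1]
    if filename.endswith(".xml") and "sitemap" in filename:
        return UrlKind.SITEMAP
    if filename.endswith(".txt"):
        return UrlKind.TEXT_FILE
    return UrlKind.WEBPAGE


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Collect the <loc> entries of a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
```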

Prerequisites

Docker/Docker Desktop if running the MCP server as a container (recommended)
Python 3.12+ if running the MCP server directly through uv
Supabase (database for RAG)
OpenAI API key (for generating embeddings)

Installation

Using Docker (Recommended)

Clone this repository:

```
git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
cd mcp-crawl4ai-rag
```

Build the Docker image:

```
docker build -t mcp/crawl4ai-rag --build-arg PORT=8051 .
```

Create a .env file based on the configuration section below.

Using uv directly (no Docker)

Clone this repository:

```
git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
cd mcp-crawl4ai-rag
```

Install uv if you don’t have it:
```
pip install uv
```

Create and activate a virtual environment:

```
uv venv
# on Windows:
.venv\Scripts\activate
# on Mac/Linux: source .venv/bin/activate
```

Install dependencies:
```
uv pip install -e .
crawl4ai-setup
```

Create a .env file based on the configuration section below.

Running Supabase Locally with Docker (optional)

To run Supabase locally using Docker, follow these steps:

Get the Supabase code:

```
git clone --depth 1 https://github.com/supabase/supabase
```

Create your new Supabase project directory:
```
mkdir supabase-project
```

Copy the compose files to your project:

```
cp -rf supabase/docker/* supabase-project
```

Copy the example environment variables:

```
cp supabase/docker/.env.example supabase-project/.env
```

Switch to your project directory:
```
cd supabase-project
```
Pull the latest images:
```
docker compose pull
```
Start the services (in detached mode):
```
docker compose up -d
```

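After starting Supabase locally, set SUPABASE_URL and SUPABASE_SERVICE_KEY in this project's .env file so that they point to your local Supabase instance. With the default self-hosted Docker setup, SUPABASE_URL is typically http://localhost:8000 (the API gateway), and SUPABASE_SERVICE_KEY is the SERVICE_ROLE_KEY value from supabase-project/.env.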

Database Setup

Before running the server, you need to set up the database with the pgvector extension:

Go to the SQL Editor in your Supabase dashboard (create a new project first if necessary)
Create a new query and paste the contents of crawled_pages.sql
Run the query to create the necessary tables and functions

Configuration

Create a .env file in the project root with the following variables:

```
# MCP Server Configuration
HOST=0.0.0.0
PORT=8051
TRANSPORT=sse

# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key

# Supabase Configuration
SUPABASE_URL=your_supabase_project_url
SUPABASE_SERVICE_KEY=your_supabase_service_key

# Local Supabase configuration: to use a local instance instead,
# replace the two Supabase values above (keep only one of each key)
# SUPABASE_URL=your_local_supabase_url
# SUPABASE_SERVICE_KEY=your_local_supabase_service_key
```
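Before starting the server, it can help to sanity-check the .env file. The minimal sketch below uses a deliberately simplified KEY=VALUE parser; it does not handle inline comments or multi-line values and is not a replacement for a full dotenv library. It reports required variables that are missing or still hold a `your_...` placeholder. The parser gives last-assignment-wins behavior, so if a key appears twice only the later value counts. For that reason, alternative values such as a local Supabase URL should be commented out rather than duplicated. `parse_env`, `missing_vars`, and `REQUIRED_VARS` are hypothetical names.

```python
REQUIRED_VARS = ("OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_SERVICE_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and '#' comments.

    Later assignments overwrite earlier ones, so a duplicated key keeps its last value.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Names of required variables that are absent, empty, or still a placeholder."""
    return [k for k in REQUIRED_VARS
            if not values.get(k) or values[k].startswith("your_")]
```

For example, reading the file with `pathlib.Path(".env").read_text()` and passing the result through both functions gives a list of variables to fix before running `docker run --env-file .env ...`.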

Running the Server

Using Docker

```
docker run --env-file .env -p 8051:8051 mcp/crawl4ai-rag
```

Using Python

```
uv run src/crawl4ai_mcp.py
```

The server will start and listen on the configured host and port.

Integration with MCP Clients

SSE Configuration

Once you have the server running with SSE transport, you can connect to it using this configuration:

```
{
  "mcpServers": {
    "crawl4ai-rag": {
      "transport": "sse",
      "url": "http://localhost:8051/sse"
    }
  }
}
```

Note for Windsurf users: Use serverUrl instead of url in your configuration:
```
{
  "mcpServers": {
    "crawl4ai-rag": {
      "transport": "sse",
      "serverUrl": "http://localhost:8051/sse"
    }
  }
}
```

Note for Docker users: Use host.docker.internal instead of localhost if your client is running in a different container. This applies, for example, when you use this MCP server from n8n running in Docker.
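The SSE variants described above differ only in the URL key and the host name, so they can be generated from two flags. The minimal sketch below assumes the default port 8051 and the server name "crawl4ai-rag" used in this README. `sse_client_config` and its parameters are hypothetical names.

```python
import json


def sse_client_config(port: int = 8051, windsurf: bool = False,
                      client_in_docker: bool = False) -> dict:
    """Build the mcpServers entry for an SSE connection to this server.

    windsurf: use the "serverUrl" key instead of "url".
    client_in_docker: reach the host via host.docker.internal instead of localhost.
    """
    host = "host.docker.internal" if client_in_docker else "localhost"
    url_key = "serverUrl" if windsurf else "url"
    return {
        "mcpServers": {
            "crawl4ai-rag": {
                "transport": "sse",
                url_key: f"http://{host}:{port}/sse",
            }
        }
    }


# Print the configuration for a client running inside another container.
print(json.dumps(sse_client_config(client_in_docker=True), indent=2))
```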

Stdio Configuration

Add this server to your MCP configuration for Claude Desktop, Windsurf, or any other MCP client:

```
{
  "mcpServers": {
    "crawl4ai-rag": {
      "command": "python",
      "args": [
        "path/to/mcp-crawl4ai-rag/src/crawl4ai_mcp.py"
      ],
      "env": {
        "TRANSPORT": "stdio",
        "OPENAI_API_KEY": "your_openai_api_key",
        "SUPABASE_URL": "your_supabase_url",
        "SUPABASE_SERVICE_KEY": "your_supabase_service_key"
      }
    }
  }
}
```

Docker with Stdio Configuration

```
{
  "mcpServers": {
    "crawl4ai-rag": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "-e",
        "TRANSPORT",
        "-e",
        "OPENAI_API_KEY",
        "-e",
        "SUPABASE_URL",
        "-e",
        "SUPABASE_SERVICE_KEY",
        "mcp/crawl4ai-rag"
      ],
      "env": {
        "TRANSPORT": "stdio",
        "OPENAI_API_KEY": "your_openai_api_key",
        "SUPABASE_URL": "your_supabase_url",
        "SUPABASE_SERVICE_KEY": "your_supabase_service_key"
      }
    }
  }
}
```
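In the Docker stdio configuration, each variable is passed as a bare `-e NAME` flag. Docker then copies that variable's value from the environment of the `docker` process, which the MCP client populates from the "env" mapping. The `-i` flag keeps stdin open, which the stdio transport needs. The minimal sketch below generates that structure from a single dict so that the flags and the env mapping cannot drift apart. It assumes the image tag `mcp/crawl4ai-rag` from the build step, and `docker_stdio_config` is a hypothetical name.

```python
def docker_stdio_config(env: dict[str, str], image: str = "mcp/crawl4ai-rag") -> dict:
    """Build a stdio mcpServers entry that launches the server in Docker.

    Each variable becomes a bare "-e NAME" flag, so Docker reads its value from
    the environment the MCP client sets up from the "env" mapping.
    """
    args = ["run", "--rm", "-i"]  # -i keeps stdin open for the stdio transport
    for name in env:
        args += ["-e", name]
    args.append(image)  # the image name must follow all docker run options
    return {
        "mcpServers": {
            "crawl4ai-rag": {"command": "docker", "args": args, "env": dict(env)}
        }
    }
```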

Building Your Own Server

This implementation provides a foundation for building more complex MCP servers with web crawling capabilities. To build your own:

Add your own tools by creating functions decorated with @mcp.tool()
Create your own lifespan function to add your own dependencies
Modify the utils.py file for any helper functions you need
Extend the crawling capabilities by adding more specialized crawlers
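The lifespan pattern mentioned above creates shared dependencies once at startup, hands them to every tool, and releases them at shutdown. The sketch below shows that shape using only the standard library, so it runs without the MCP SDK. In the MCP Python SDK, FastMCP accepts a `lifespan` argument, and tools can reach the yielded object through the request context; check the SDK documentation for the exact signatures. `AppContext`, `app_lifespan`, and `remember_page` are hypothetical names, and the dict cache stands in for a real crawler or database client.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass, field


@dataclass
class AppContext:
    """Hypothetical shared dependencies (a real server might hold a crawler and DB client)."""
    cache: dict[str, str] = field(default_factory=dict)
    closed: bool = False


@asynccontextmanager
async def app_lifespan(server: object) -> AsyncIterator[AppContext]:
    """Create dependencies once at startup and release them at shutdown."""
    ctx = AppContext()
    try:
        yield ctx
    finally:
        ctx.closed = True  # stand-in for closing a crawler or DB connection


async def remember_page(app: AppContext, url: str, text: str) -> str:
    """Tool body: in a FastMCP server this function would carry @mcp.tool()."""
    app.cache[url] = text
    return f"stored {len(text)} characters from {url}"


async def main() -> None:
    async with app_lifespan(server=None) as app:
        print(await remember_page(app, "https://example.com", "hello"))
    print("closed:", app.closed)


if __name__ == "__main__":
    asyncio.run(main())
```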

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.

Zed: High-performance collaborative code editor with MCP support, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code that supports MCP for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium that integrates MCP to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin for VS Code and JetBrains, compatible with MCP. continue.dev

Trae: AI-driven code editor that supports MCP, focused on improving the developer programming experience. trae.ai


Recommended MCP Servers

Tavily MCP Server: Provides search, extract, map, and crawl tools: real-time web search through the tavily-search tool, intelligent data extraction from web pages via the tavily-extract tool, a web mapping tool that creates a structured map of a website, and a web crawler that systematically explores websites.

MCP Server Chart: A TypeScript-based MCP server that provides chart generation capabilities. It lets you create various types of charts through MCP tools, and it can also be used in Dify.

GitHub MCP Server: MCP server for the GitHub API, enabling file operations, repository management, search functionality, and more.

Brave Search MCP Server: Web and local search using Brave's Search API.

Firecrawl MCP Server: Advanced web scraping with JavaScript rendering, PDF support, and smart rate limiting.

Context7 MCP: LLMs rely on outdated or generic information about the libraries you use.

Slack MCP Server: Channel management and messaging capabilities.

Sequential Thinking MCP Server: Dynamic and reflective problem-solving through thought sequences.

Fetch MCP Server: A Model Context Protocol server that provides web content fetching capabilities.

Playwright MCP: A Model Context Protocol (MCP) server that provides browser automation capabilities using [Playwright](https://playwright.dev). This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.
