Doctor
What is Doctor
Doctor is a tool for discovering, crawling, and indexing websites, then exposing them as an MCP server so LLM agents can draw on up-to-date web content for reasoning and code generation.
Use cases
Use cases for Doctor include building search engines, enhancing LLMs with real-time web data, and creating applications that require dynamic content generation based on the latest web information.
How to use
To use Doctor, clone the repository, set up your environment variables including your OpenAI API key, and run the stack using Docker Compose. This will start the necessary services for crawling and indexing.
Key features
Key features of Doctor include web crawling with crawl4ai, text chunking with LangChain, creating embeddings with OpenAI, storing data in DuckDB with vector search support, and exposing a FastAPI web service for search functionality.
Where to use
Doctor can be utilized in various fields such as web development, data analysis, and artificial intelligence, particularly for applications requiring up-to-date information retrieval and processing.
🩺 Doctor
A tool for discovering, crawling, and indexing websites so they can be exposed as an MCP server for LLM agents, enabling better and more up-to-date reasoning and code generation.
🔍 Overview
Doctor provides a complete stack for:
- Crawling web pages using crawl4ai with hierarchy tracking
- Chunking text with LangChain
- Creating embeddings with OpenAI via litellm
- Storing data in DuckDB with vector search support
- Exposing search functionality via a FastAPI web service
- Making these capabilities available to LLMs through an MCP server
- Navigating crawled sites with hierarchical site maps
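To make the chunking step concrete, here is a minimal stand-in for the kind of fixed-size, overlapping splitter the stack uses (the project actually uses LangChain's text splitters, which prefer paragraph and sentence boundaries; the sizes below are arbitrary):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Simplified stand-in for LangChain's splitters: real splitters break on
    paragraph/sentence boundaries rather than fixed character offsets.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# 500 characters with 200-char chunks and 50-char overlap -> starts at 0, 150, 300, 450
chunks = chunk_text("a" * 500)
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.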
🏗️ Core Infrastructure
🗄️ DuckDB
- Database for storing document data and embeddings with vector search capabilities
- Managed by unified Database class
📨 Redis
- Message broker for asynchronous task processing
🕸️ Crawl Worker
- Processes crawl jobs
- Chunks text
- Creates embeddings
🌐 Web Server
- FastAPI service exposing endpoints for fetching, searching, and viewing data
- Exposes the MCP server
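To illustrate what the vector-search step does, here is a stripped-down similarity ranking over stored embeddings (pure Python for clarity; the actual project runs this inside DuckDB with its vector search support, and the rows shown here are invented for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# hypothetical (chunk_text, embedding) rows as they might sit in the database
rows = [
    ("duckdb supports vector search", [0.9, 0.1, 0.0]),
    ("redis brokers crawl jobs",      [0.1, 0.9, 0.0]),
]

query_embedding = [1.0, 0.0, 0.0]  # would come from the OpenAI embedder
ranked = sorted(rows, key=lambda r: cosine_similarity(r[1], query_embedding),
                reverse=True)
```

The chunk whose embedding points in the direction closest to the query embedding ranks first.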
💻 Setup
⚙️ Prerequisites
- Docker and Docker Compose
- Python 3.10+
- uv (Python package manager)
- OpenAI API key
📦 Installation
- Clone this repository
- Set up environment variables:

```shell
export OPENAI_API_KEY=your-openai-key
```

- Run the stack:

```shell
docker compose up
```
👁 Usage
- Go to http://localhost:9111/docs to see the OpenAPI docs
- Look for the `/fetch_url` endpoint and start a crawl job by providing a URL
- Use `/job_progress` to see the current job status
- Configure your editor to use `http://localhost:9111/mcp` as an MCP server
☁️ Web API
Core Endpoints
- `POST /fetch_url`: Start crawling a URL
- `GET /search_docs`: Search indexed documents
- `GET /job_progress`: Check crawl job progress
- `GET /list_doc_pages`: List indexed pages
- `GET /get_doc_page`: Get the full text of a page
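A small client sketch against these endpoints (stdlib only; the request parameter names `url` and `query` are assumptions here — the authoritative request schemas are in the OpenAPI docs at http://localhost:9111/docs):

```python
import json
from urllib import parse, request

BASE = "http://localhost:9111"

def build_fetch_request(url: str) -> request.Request:
    """POST /fetch_url: start a crawl job for the given URL."""
    body = json.dumps({"url": url}).encode()
    return request.Request(
        f"{BASE}/fetch_url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_search_request(query: str) -> request.Request:
    """GET /search_docs: search the indexed documents."""
    qs = parse.urlencode({"query": query})
    return request.Request(f"{BASE}/search_docs?{qs}")

req = build_search_request("vector search")
# Against a running stack you would then execute:
# with request.urlopen(req) as resp:
#     results = json.load(resp)
```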
Site Map Feature
The Maps feature provides a hierarchical view of crawled websites, making it easy to navigate and explore the structure of indexed sites.
Endpoints:
- `GET /map`: View an index of all crawled sites
- `GET /map/site/{root_page_id}`: View the hierarchical tree structure of a specific site
- `GET /map/page/{page_id}`: View a specific page with navigation (parent, siblings, children)
- `GET /map/page/{page_id}/raw`: Get the raw markdown content of a page
Features:
- Hierarchical Navigation: Pages maintain parent-child relationships, allowing you to navigate through the site structure
- Domain Grouping: Pages from the same domain crawled individually are automatically grouped together
- Automatic Title Extraction: Page titles are extracted from HTML or markdown content
- Breadcrumb Navigation: Easy navigation with breadcrumbs showing the path from root to current page
- Sibling Navigation: Quick access to pages at the same level in the hierarchy
- Legacy Page Support: Pages crawled before hierarchy tracking are grouped by domain for easy access
- No JavaScript Required: All navigation works with pure HTML and CSS for maximum compatibility
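The parent/child/sibling relationships described above can be modeled roughly like this (an illustrative data structure, not Doctor's actual implementation):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Page:
    page_id: str
    title: str
    parent: Optional["Page"] = None
    children: list["Page"] = field(default_factory=list)

    def add_child(self, child: "Page") -> "Page":
        child.parent = self
        self.children.append(child)
        return child

    def breadcrumbs(self) -> list[str]:
        """Titles from the root page down to this page."""
        trail, node = [], self
        while node is not None:
            trail.append(node.title)
            node = node.parent
        return list(reversed(trail))

    def siblings(self) -> list["Page"]:
        """Pages at the same level in the hierarchy."""
        if self.parent is None:
            return []
        return [p for p in self.parent.children if p is not self]

root = Page("1", "Docs Home")
guide = root.add_child(Page("2", "Guide"))
api = root.add_child(Page("3", "API"))
```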
Usage Example:
- Crawl a website using the `/fetch_url` endpoint
- Visit `/map` to see all crawled sites
- Click on a site to view its hierarchical structure
- Navigate through pages using the provided links
🔧 MCP Integration
Ensure that your Docker Compose stack is up, and then add to your Cursor or VSCode MCP Servers configuration:
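The configuration snippet itself is not included in the source; a typical entry (key names follow the common Cursor/VS Code MCP configuration shape — verify against your editor's documentation) would look like:

```json
{
  "mcpServers": {
    "doctor": {
      "url": "http://localhost:9111/mcp"
    }
  }
}
```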
🧪 Testing
Running Tests
To run all tests:
```shell
# Run all tests with coverage report
pytest
```
To run specific test categories:
```shell
# Run only unit tests
pytest -m unit

# Run only async tests
pytest -m async_test

# Run tests for a specific component
pytest tests/lib/test_crawler.py
```
Test Coverage
The project is configured to generate coverage reports automatically:
```shell
# Run tests with detailed coverage report
pytest --cov=src --cov-report=term-missing
```
Test Structure
- `tests/conftest.py`: Common fixtures for all tests
- `tests/lib/`: Tests for library components
  - `test_crawler.py`: Tests for the crawler module
  - `test_crawler_enhanced.py`: Tests for the enhanced crawler with hierarchy tracking
  - `test_chunker.py`: Tests for the chunker module
  - `test_embedder.py`: Tests for the embedder module
  - `test_database.py`: Tests for the unified Database class
  - `test_database_hierarchy.py`: Tests for database hierarchy operations
- `tests/common/`: Tests for common modules
- `tests/services/`: Tests for the service layer
  - `test_map_service.py`: Tests for the map service
- `tests/api/`: Tests for API endpoints
  - `test_map_api.py`: Tests for map API endpoints
- `tests/integration/`: Integration tests
  - `test_processor_enhanced.py`: Tests for the enhanced processor with hierarchy
🐞 Code Quality
Pre-commit Hooks
The project is configured with pre-commit hooks that run automatically before each commit:
- `ruff check --fix`: Lints code and automatically fixes issues
- `ruff format`: Formats code according to project style
- Trailing whitespace removal
- End-of-file fixing
- YAML validation
- Large file checks
Setup Pre-commit
To set up pre-commit hooks:
```shell
# Install pre-commit
uv pip install pre-commit

# Install the git hooks
pre-commit install
```
Running Pre-commit Manually
You can run the pre-commit hooks manually on all files:
```shell
# Run all pre-commit hooks
pre-commit run --all-files
```
Or on staged files only:
```shell
# Run on staged files
pre-commit run
```
⚖️ License
This project is licensed under the MIT License - see the LICENSE.md file for details.