Web2MCP
What is Web2MCP?
Web2MCP is a Django web application designed to crawl websites, extract key metadata such as URL, title, and summary, and store it in an SQLite database. It aims to create a structured data source for Model Context Protocol (MCP) agents.
Use cases
Use cases include gathering metadata for SEO analysis, compiling information for competitive research, and creating structured datasets for machine learning applications.
How to use
To use Web2MCP, clone the repository, set up a virtual environment, install dependencies, apply database migrations, and run the development server. Access the application via your browser at http://127.0.0.1:8008/.
Key features
Key features include a simple web UI for submitting URLs, same-domain crawling, metadata extraction, efficient SQLite storage, and an integrated MCP server for searching and retrieving data.
Where to use
Web2MCP can be used in various fields such as data mining, web scraping, research, and any application requiring structured data extraction from websites.
Content
Web2MCP - Website Crawler for MCP
Web2MCP is a Django web application designed to crawl websites, extract key metadata (URL, title, summary), and store it in an SQLite database. The primary goal is to create a structured data source that can be easily queried by Model Context Protocol (MCP) agents to find relevant information within the crawled websites.
Features
- Simple Web UI: Submit a starting URL via a clean web interface.
- Same-Domain Crawling: Crawls pages accessible within the same domain as the starting URL.
- Metadata Extraction: Extracts the page URL, <title>, and <meta name="description"> content.
- SQLite Storage: Stores extracted data efficiently in an SQLite database.
- Integrated MCP Server: Includes an MCP server using django-mcp to provide tools for searching and retrieving crawled data.
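The same-domain crawling and metadata extraction features can be illustrated with a minimal sketch. It is not the project's actual crawler: it uses the standard library's html.parser instead of BeautifulSoup4 so that it runs without extra dependencies, and the names PageMetadataParser and extract_metadata are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageMetadataParser(HTMLParser):
    """Collects the <title> text, the meta description, and raw <a href> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(page_url, html):
    """Return url/title/summary plus the links that stay on the page's own host."""
    parser = PageMetadataParser()
    parser.feed(html)
    parser.close()

    start_host = urlparse(page_url).netloc
    same_domain = []
    for href in parser.links:
        # Resolve relative links and drop fragments so each page appears once
        absolute = urljoin(page_url, href).split("#", 1)[0]
        parsed = urlparse(absolute)
        if (
            parsed.scheme in ("http", "https")
            and parsed.netloc == start_host
            and absolute not in same_domain
        ):
            same_domain.append(absolute)

    return {
        "url": page_url,
        "title": parser.title.strip(),
        "summary": parser.description,
        "links": same_domain,
    }
```

A crawler built on this would fetch each URL in the returned links list in turn, which is what keeps the crawl within the starting domain. Comparing netloc exactly is an assumption of this sketch; it treats subdomains such as docs.example.com and example.com as different sites.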
Tech Stack
- Backend: Python 3.x, Django 4.x
- Database: SQLite
- Libraries:
  - requests (for fetching URLs)
  - BeautifulSoup4 (for HTML parsing)
  - lxml (HTML parser)
  - django-mcp (for MCP server integration)
  - uvicorn (ASGI server)
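To make the storage side of this stack concrete, here is a rough sketch of saving and searching crawled metadata in SQLite. It is an assumption-laden illustration: the project itself presumably goes through Django models and migrations, whereas this uses the standard library's sqlite3 module directly, and the pages table and the open_store, save_page, and search_pages helpers are hypothetical names.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a database with a hypothetical 'pages' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pages (
            url TEXT PRIMARY KEY,
            title TEXT NOT NULL DEFAULT '',
            summary TEXT NOT NULL DEFAULT ''
        )
        """
    )
    return conn


def save_page(conn, url, title, summary):
    """Insert a page, replacing any earlier row for the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, summary) VALUES (?, ?, ?)",
        (url, title, summary),
    )
    conn.commit()


def search_pages(conn, query, limit=10):
    """Return pages whose title or summary contains the query text."""
    pattern = f"%{query}%"
    rows = conn.execute(
        "SELECT url, title, summary FROM pages "
        "WHERE title LIKE ? OR summary LIKE ? ORDER BY url LIMIT ?",
        (pattern, pattern, limit),
    )
    return [dict(zip(("url", "title", "summary"), row)) for row in rows]
```

Keying the table on url means a re-crawl updates existing rows rather than accumulating duplicates. The LIKE match is a simple substring search; a real deployment might prefer SQLite's full-text search.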
Setup and Installation
- Clone the repository:
    git clone <repository-url>
    cd web2mcp
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # Linux/macOS
    # .\venv\Scripts\activate  # Windows
- Install dependencies:
    pip install -r requirements.txt
- Apply database migrations:
    python manage.py migrate
- Run the development server (using ASGI):
    # Ensure uvicorn is installed via requirements.txt
    uvicorn core.asgi:application --reload --port 8008
- Access the application at http://127.0.0.1:8008/ in your browser.
Development using Docker Compose (Recommended)
This project includes Docker configuration for a consistent development environment using Docker Compose.
Prerequisites:
- Docker installed: https://docs.docker.com/get-docker/
- Docker Compose installed (usually included with Docker Desktop): https://docs.docker.com/compose/install/
Steps:
- Clone the repository (if not already done):
    git clone <repository-url>
    cd web2mcp
- Build and start the services:
    # This builds and starts the 'web' service defined in docker-compose.yml
    docker-compose up --build -d
  - --build: Builds the image if it doesn’t exist or if the Dockerfile has changed.
  - -d: Runs the container in detached mode (in the background).
- Apply database migrations (first time or after model changes):
  Open a separate terminal in the web2mcp directory and run:
    docker-compose exec web python manage.py migrate
- Access the Django application:
  Open http://127.0.0.1:8008/ in your browser. The Django application running inside the container includes the integrated MCP server capabilities.
- Connecting MCP Clients: Configure your MCP client (e.g., VS Code extension) to connect to the running Django application’s MCP endpoint, typically exposed via the mapped port (e.g., http://localhost:8008/mcp if using HTTP transport, or configure for stdio if needed, though django-mcp primarily uses HTTP/WebSocket). Refer to the documentation for django-mcp and your client.
- View logs:
    docker-compose logs -f web  # Follow logs for the web service
- Stop the service:
    docker-compose down
  - Use docker-compose down -v to also remove the volumes (including the database data if stored in a named volume, though here it’s persisted via the bind mount).
Usage
- Navigate to the running application in your web browser.
- Enter a valid starting URL (e.g., https://docs.djangoproject.com/en/4.2/) into the form.
- Click “Submit”.
- The application will begin crawling the site (currently synchronous or basic threading - see memory-bank/activeContext.md). Status updates may be basic in the initial version.
- MCP tools (find_pages, get_page_content) are available via the integrated MCP server endpoint (typically /mcp), managed by django-mcp.
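For readers curious what an MCP client sends when it invokes one of these tools, the sketch below builds the JSON-RPC 2.0 tools/call message defined by the Model Context Protocol. The tool names come from this README, but the argument names (query, url) are assumptions, as is the build_tool_call helper; check the tool schemas the server actually advertises before relying on them.

```python
import itertools
import json

# JSON-RPC requests need unique ids within a session
_request_ids = itertools.count(1)


def build_tool_call(tool_name, arguments):
    """Build an MCP 'tools/call' request body as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Argument names here are hypothetical; the real schemas may differ
    search = build_tool_call("find_pages", {"query": "middleware"})
    fetch = build_tool_call(
        "get_page_content", {"url": "https://docs.djangoproject.com/en/4.2/"}
    )
    print(json.dumps(search, indent=2))
    print(json.dumps(fetch, indent=2))
```

In practice an MCP client library handles this framing, along with the initialize handshake that must precede any tool call, so the payload is shown only to clarify what travels to the /mcp endpoint.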
Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues.
License
This project is licensed under the MIT License.