Pdf Rag Mcp Server

PDF RAG server for Cursor.

What is Pdf Rag Mcp Server

pdf-rag-mcp-server is a document knowledge base system for PDF files. It combines PDF processing, vector storage, and the Model Context Protocol (MCP) to provide semantic search over uploaded documents.

Use cases

Use cases include academic research that draws on many PDFs, law firms managing case documents, and businesses that need efficient document retrieval and knowledge management.

How to use

To use pdf-rag-mcp-server, clone the repository, install the required dependencies, and run the server. You can then upload PDF documents through the web interface or connect AI tools over MCP.

Key features

Key features include PDF document upload and processing, real-time processing status updates via WebSocket, vector-based semantic search, MCP protocol support for AI integration, a modern web interface built with React/Chakra UI, and efficient dependency management using uv.

Where to use

pdf-rag-mcp-server can be used in various fields such as education, research, legal, and any domain where PDF document management and semantic search capabilities are required.

PDF RAG MCP Server

A document knowledge base system that uses PDF processing, vector storage, and MCP (Model Context Protocol) to provide semantic search over PDF documents. You can upload, process, and query PDF documents through a web interface, or via MCP for integration with AI tools like Cursor.

Features

PDF Document Upload & Processing: Upload PDFs and automatically extract, chunk, and vectorize content
Real-time Processing Status: WebSocket-based real-time status updates during document processing
Semantic Search: Vector-based semantic search across all processed documents
MCP Protocol Support: Integrate with AI tools like Cursor using the Model Context Protocol
Modern Web Interface: React/Chakra UI frontend for document management and querying
Fast Dependency Management: Uses uv for efficient Python dependency management
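
The "chunk" step in the processing feature above can be illustrated with a minimal sketch. It is not the project's actual implementation: the function name `chunk_text`, the character-based windows, and the default sizes are all assumptions chosen for illustration. It only shows the general idea of splitting extracted text into overlapping pieces before they are embedded.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk consists only of already-covered overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which tends to help retrieval. A real pipeline may split on tokens or sentences instead of characters.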

System Architecture

The system consists of:

FastAPI Backend: Handles API requests, PDF processing, and vector storage
React Frontend: Provides a user-friendly interface for managing documents
Vector Database: Stores embeddings for semantic search
WebSocket Server: Provides real-time updates on document processing
MCP Server: Exposes knowledge base to MCP-compatible clients
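
To make the role of the vector database more concrete, here is a small sketch of vector-based semantic search using cosine similarity. It assumes an in-memory dictionary as the index and hypothetical names (`cosine_similarity`, `search`); the project's real vector store and embedding model are not shown in this README and may work differently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    index: Dict[str, List[float]],  # hypothetical index: chunk id -> embedding
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank stored chunk embeddings by similarity to the query embedding."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A dedicated vector database replaces the linear scan with an approximate nearest-neighbor index, but the ranking principle is the same.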

Quick Start

Prerequisites

Python 3.8 or later
uv - Fast Python package installer and resolver
Git
Cursor (optional, for MCP integration)

Quick Installation and Startup with uv and run.py

Clone the repository:

```
git clone https://github.com/yourusername/PdfRagMcpServer.git
cd PdfRagMcpServer
```

Install uv if you don’t have it already:

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Install dependencies using uv:

```
uv init .
uv venv
source .venv/bin/activate
uv pip install -r backend/requirements.txt
```

Start the application with the convenient script:

```
uv run run.py
```

Access the web interface at http://localhost:8000

Using with Cursor

In Cursor, go to Settings -> Cursor Settings -> MCP -> Add new global MCP server, then paste the configuration below into your ~/.cursor/mcp.json file. See the Cursor MCP docs for more information.

```
{
  "mcpServers": {
    "pdf-rag": {
      "url": "http://localhost:7800/mcp"
    }
  }
}
```
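
If mcp.json already lists other servers, pasting the snippet by hand can overwrite them. The sketch below shows one way to merge the entry instead. It is an illustration under assumptions: `add_mcp_server` is a hypothetical helper, not part of this project or of Cursor, and it only handles the `mcpServers` / `url` layout shown above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, url: str) -> dict:
    """Add or update one server entry in an mcp.json file, keeping other entries."""
    config = {}
    if config_path.exists():
        # Treat an empty file as an empty configuration.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")

    config.setdefault("mcpServers", {})[name] = {"url": url}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_mcp_server(Path.home() / ".cursor" / "mcp.json", "pdf-rag", "http://localhost:7800/mcp")` would write the entry shown above while leaving existing servers in place.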

You can also replace localhost with the IP address of the host where the service is deployed. After you add this configuration to mcp.json, the MCP server appears on the Cursor MCP settings page; switch it on to enable the server.

Building the Frontend (For Developers)

If you need to rebuild the frontend, you have two options:

Option 1: Using the provided script (recommended)

```
# Make the script executable if needed
chmod +x build_frontend.py

# Run the script
./build_frontend.py
```

This script will automatically:

Install frontend dependencies
Build the frontend
Copy the build output to the backend’s static directory
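
The three steps above can be sketched in Python as follows. This is not the contents of the provided build_frontend.py, which is not shown in this README; the function names `copy_build` and `build_frontend` are hypothetical, and the directory names are taken from the manual build process and project structure described in this document.

```python
import shutil
import subprocess
from pathlib import Path


def copy_build(dist_dir: Path, static_dir: Path) -> None:
    """Copy every file and folder from the build output into the static directory."""
    static_dir.mkdir(parents=True, exist_ok=True)
    for item in dist_dir.iterdir():
        target = static_dir / item.name
        if item.is_dir():
            # dirs_exist_ok lets a rebuild overwrite a previous copy (Python 3.8+).
            shutil.copytree(item, target, dirs_exist_ok=True)
        else:
            shutil.copy2(item, target)


def build_frontend(project_root: Path) -> None:
    """Install dependencies, build the frontend, and copy the output (assumed layout)."""
    frontend = project_root / "frontend"
    # check=True raises CalledProcessError if npm exits with a non-zero status.
    subprocess.run(["npm", "install"], cwd=frontend, check=True)
    subprocess.run(["npm", "run", "build"], cwd=frontend, check=True)
    copy_build(frontend / "dist", project_root / "backend" / "static")
```

Calling `build_frontend(Path("."))` from the repository root would run the whole sequence. On Windows the npm executable is usually `npm.cmd`, so the command list may need adjusting there.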

Option 2: Manual build process

```
# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Build the frontend
npm run build

# Create static directory if it doesn't exist
mkdir -p ../backend/static

# Copy build files
cp -r dist/* ../backend/static/
```

After building the frontend, you can start the application using the run.py script.

Simple Production Setup

For a production environment where the static files have already been built:

Place your pre-built frontend in the backend/static directory

Start the server:

```
cd backend
uv pip install -r requirements.txt
python -m app.main
```

Development Setup (Separate Services)

If you want to run the services separately for development:

Backend

Navigate to the backend directory:
```
cd backend
```
Install the dependencies with uv:
```
uv pip install -r requirements.txt
```

Run the backend server:

```
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Frontend

Navigate to the frontend directory:
```
cd frontend
```
Install the dependencies:
```
npm install
```
Run the development server:
```
npm run dev
```

Usage

Uploading Documents

Access the web interface at http://localhost:8000
Click on “Upload New PDF” and select a PDF file
The system will process the file, showing progress in real-time
Once processed, the document will be available for searching

Searching Documents

Use the search functionality in the web interface
Or integrate with Cursor using the MCP protocol

MCP Integration with Cursor

Open Cursor
Go to Settings → AI & MCP
Add Custom MCP Server with URL: http://localhost:8000/mcp/v1
Save the settings
Now you can query your PDF knowledge base directly from Cursor

Troubleshooting

Connection Issues

Verify that port 8000 is not in use by other applications
Check that the WebSocket connection is working properly
Ensure your browser supports WebSockets
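
One quick way to check the first item is to test whether anything is already accepting connections on the port. The helper below is a small sketch using only the Python standard library; `port_in_use` is a hypothetical name, and the check covers TCP listeners on the given host only.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds (something is listening)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8000)` returns True before you start the server, another process is occupying the port. If it returns True after startup, the backend is at least accepting connections.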

Processing Issues

Check if your PDF contains extractable text (some scanned PDFs may not)
Ensure the system has sufficient resources (memory and CPU)
Check the backend logs for detailed error messages

Project Structure

```
PdfRagMcpServer/
├── backend/               # FastAPI backend
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py        # Main FastAPI application
│   │   ├── database.py    # Database models
│   │   ├── pdf_processor.py # PDF processing logic
│   │   ├── vector_store.py # Vector database interface
│   │   └── websocket.py   # WebSocket handling
│   ├── static/            # Static files for the web interface
│   └── requirements.txt   # Backend dependencies
├── frontend/              # React frontend
│   ├── public/
│   ├── src/
│   │   ├── components/    # UI components
│   │   ├── context/       # React context
│   │   ├── pages/         # Page components
│   │   └── App.jsx        # Main application component
│   ├── package.json       # Frontend dependencies
│   └── vite.config.js     # Vite configuration
├── uploads/               # PDF file storage
└── README.md              # This documentation
```

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
