MCP PDF Extraction Server
What is MCP PDF Extraction Server
The mcp-pdf-extraction-server is an MCP server designed to extract contents from PDF files.
Use cases
Use cases include extracting text from scanned documents, processing invoices, and converting educational materials into editable formats.
How to use
To use the mcp-pdf-extraction-server, register it in your MCP client's settings, then call the ‘extract-pdf-contents’ tool with the required ‘pdf_path’ argument and, optionally, the ‘pages’ argument.
Key features
Key features include extracting contents from local PDF files, selecting specific pages by number (including negative indexing, such as -1 for the last page), and combining direct PDF text extraction with OCR for scanned documents.
Where to use
The mcp-pdf-extraction-server can be utilized in various fields such as document management, data extraction, and digital archiving.
PDF Extraction MCP Server (Claude Code Fork)
MCP server to extract contents from PDF files, with fixes for Claude Code CLI installation.
This fork includes critical fixes for installing and running the server with Claude Code (the CLI version).
What’s Different in This Fork
- Added __main__.py: enables the package to be run as a module with python -m pdf_extraction
- Claude Code specific instructions: clear installation steps that work with Claude Code CLI
- Tested installation process: verified working with the claude mcp add command
Components
Tools
The server implements one tool:
- extract-pdf-contents: Extract contents from a local PDF file
  - Takes pdf_path as a required string argument (local file path)
  - Takes pages as an optional string argument (comma-separated page numbers, supports negative indexing like -1 for last page)
  - Supports both PDF text extraction and OCR for scanned documents
Installation for Claude Code CLI
Prerequisites
- Python 3.11 or higher
- pip or conda
- Claude Code CLI installed (claude command)
Step 1: Clone and Install
# Clone this fork
git clone https://github.com/lh/mcp-pdf-extraction-server.git
cd mcp-pdf-extraction-server
# Install in development mode
pip install -e .
Step 2: Find the Installed Command
# Check where pdf-extraction was installed
which pdf-extraction
# Example output: /opt/homebrew/Caskroom/miniconda/base/bin/pdf-extraction
Step 3: Add to Claude Code
# Add the server using the full path from above
claude mcp add pdf-extraction /opt/homebrew/Caskroom/miniconda/base/bin/pdf-extraction
# Verify it was added
claude mcp list
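Steps 2 and 3 can be combined in a small script that resolves the absolute path and prints the matching command, which avoids copying the path by hand. This is only a sketch: `build_add_command` is a hypothetical helper, and the script prints the command rather than running it.

```python
import shlex
import shutil


def build_add_command(server_name: str, executable: str) -> list[str]:
    """Return the argv for `claude mcp add` with an absolute executable path."""
    resolved = shutil.which(executable)  # same lookup that `which` performs
    if resolved is None:
        raise FileNotFoundError(f"{executable!r} is not on PATH in this environment")
    return ["claude", "mcp", "add", server_name, resolved]


if __name__ == "__main__":
    try:
        print(shlex.join(build_add_command("pdf-extraction", "pdf-extraction")))
    except FileNotFoundError as error:
        print(error)
```

Run it from the same Python environment where the package was installed, since the lookup depends on that environment's PATH.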
Step 4: Use in Claude
# Start a new Claude session
claude
# In Claude, type:
/mcp
# You should see:
# MCP Server Status
# • pdf-extraction: connected
Usage Example
Once connected, you can ask Claude to extract PDF contents:
"Can you extract the content from the PDF at /path/to/document.pdf?"
"Extract pages 1-3 and the last page from /path/to/document.pdf"
Troubleshooting
Server Not Connecting
- Make sure you started a NEW Claude session after adding the server
- Verify the command path is correct:
ls -la $(which pdf-extraction)
- Test the command directly (it should hang waiting for input):
pdf-extraction
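The command appears to hang because an MCP server on the stdio transport waits for newline-delimited JSON-RPC messages on standard input. For a more conclusive check than watching it hang, the sketch below sends an `initialize` request and reads the first reply. The helper names and the `clientInfo` values are hypothetical, and whether the server accepts protocol version `2024-11-05` depends on the installed `mcp` package, so treat this as an assumption to adjust.

```python
import json
import subprocess


def initialize_message(request_id: int = 1) -> str:
    """Build a newline-terminated MCP `initialize` request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed; adjust if rejected
            "capabilities": {},
            "clientInfo": {"name": "manual-check", "version": "0.0.0"},
        },
    }
    return json.dumps(request) + "\n"


def probe_server(command: list[str]) -> dict:
    """Start the server, send `initialize`, and return its first reply.

    Blocks until the server writes one line to stdout.
    """
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as process:
        process.stdin.write(initialize_message())
        process.stdin.flush()
        reply = process.stdout.readline()
        process.terminate()
    if not reply:
        raise RuntimeError("server exited without replying")
    return json.loads(reply)
```

Calling `probe_server(["pdf-extraction"])` should return a dictionary with a `result` key if the server starts correctly; an exception suggests the command or its environment is the problem rather than Claude Code.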
Module Not Found Errors
If you get Python import errors:
- Make sure you’re using the same Python environment where you installed the package
- Try using the full Python path:
claude mcp add pdf-extraction /path/to/python -m pdf_extraction
Installation Issues
If pip install -e . fails:
- Make sure you have Python 3.11+:
python --version
- Try creating a fresh virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
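If several interpreters are installed, `python --version` may report a different one than pip uses. A minimal sketch of the same check from inside the interpreter follows; the helper name `meets_minimum` is hypothetical, and the 3.11 threshold comes from the prerequisites above.

```python
import sys

MINIMUM = (3, 11)  # from the prerequisites: Python 3.11 or higher


def meets_minimum(version: tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """Return True when the (major, minor, ...) version is at least 3.11."""
    return version[:2] >= MINIMUM


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"{sys.executable}: Python {sys.version.split()[0]} ({status})")
```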
For Claude Desktop Users
This fork is specifically for Claude Code CLI. If you’re using Claude Desktop (the GUI app), please refer to the original repository for installation instructions.
Dependencies
- mcp>=1.2.0
- pypdf2>=3.0.1
- pytesseract>=0.3.10 (for OCR support)
- Pillow>=10.0.0
- pydantic>=2.10.1,<3.0.0
- pymupdf>=1.24.0
Contributing
Contributions are welcome! The main change in this fork is the addition of __main__.py to make the package runnable as a module.
License
Same as the original repository.
Credits