mcp-rag-server
Overview
What is mcp-rag-server?
mcp-rag-server is a Model Context Protocol server that enables Retrieval Augmented Generation (RAG) by indexing documents and providing relevant context to Large Language Models through the MCP protocol.
Use cases
This tool is particularly useful for applications requiring efficient document retrieval based on user queries, enabling better responses and insights from Large Language Models by utilizing stored context from indexed documents.
How to use
To use mcp-rag-server, install it globally via npm or clone the repository and build it from source. Set environment variables for the base LLM API, embedding model, vector store path, and chunk size. Run the server with `mcp-rag-server` or `npx mcp-rag-server`. Index documents using the provided MCP tools and query them as needed.
Key features
Key features include the ability to index multiple document formats, customizable chunk sizes, a local SQLite vector store, support for various embedding providers, and exposed MCP tools for seamless integration with clients.
Where to use
mcp-rag-server can be utilized in environments where Large Language Models are deployed, such as chatbots, virtual assistants, and knowledge management systems, enabling these systems to provide contextual responses based on indexed document content.
Content
mcp-rag-server
A Model Context Protocol (MCP) server that enables Retrieval Augmented Generation (RAG). It indexes your documents and serves relevant context to Large Language Models via the MCP protocol.
Integration Examples
Generic MCP Client Configuration
```json
{
  "mcpServers": {
    "rag": {
      "command": "npx",
      "args": ["-y", "mcp-rag-server"],
      "env": {
        "BASE_LLM_API": "http://localhost:11434/v1",
        "EMBEDDING_MODEL": "nomic-embed-text",
        "VECTOR_STORE_PATH": "./vector_store",
        "CHUNK_SIZE": "500"
      }
    }
  }
}
```
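For clients driven from code rather than a config file, the same server can be launched over stdio with the MCP TypeScript SDK. A minimal sketch, assuming `@modelcontextprotocol/sdk` is installed and the same environment values as above:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn mcp-rag-server as a child process, mirroring the JSON config above.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "mcp-rag-server"],
  env: {
    BASE_LLM_API: "http://localhost:11434/v1",
    EMBEDDING_MODEL: "nomic-embed-text",
    VECTOR_STORE_PATH: "./vector_store",
    CHUNK_SIZE: "500",
  },
});

const client = new Client({ name: "rag-demo", version: "0.1.0" }, { capabilities: {} });
await client.connect(transport);

// List the tools the server advertises (expect embedding_documents, query_documents, ...).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));
```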
Example Interaction
```text
# Index documents
>> tool:embedding_documents {"path":"./docs"}

# Check status
>> resource:embedding-status
<< rag://embedding/status
Current Path: ./docs/file1.md
Completed: 10
Failed: 0
Total chunks: 15
Failed Reason:
```
Table of Contents
- Integration Examples
- Features
- Installation
- Quick Start
- Configuration
- Usage
- How RAG Works
- Development
- Contributing
- License
Features
- Index documents in `.txt`, `.md`, `.json`, `.jsonl`, and `.csv` formats
- Customizable chunk size for splitting text
- Local vector store powered by SQLite (via LangChain’s LibSQLVectorStore)
- Supports multiple embedding providers (OpenAI, Ollama, Granite, Nomic)
- Exposes MCP tools and resources over stdio for seamless integration with MCP clients
Installation
From npm
```bash
npm install -g mcp-rag-server
```
From Source
```bash
git clone https://github.com/kwanLeeFrmVi/mcp-rag-server.git
cd mcp-rag-server
npm install
npm run build
npm start
```
Quick Start
```bash
export BASE_LLM_API=http://localhost:11434/v1
export EMBEDDING_MODEL=granite-embedding-278m-multilingual-Q6_K-1743674737397:latest
export VECTOR_STORE_PATH=./vector_store
export CHUNK_SIZE=500

# Run (global install)
mcp-rag-server

# Or via npx
npx mcp-rag-server
```
💡 Tip: We recommend using Ollama for embedding. Install Ollama and pull the `nomic-embed-text` model:
```bash
ollama pull nomic-embed-text
export EMBEDDING_MODEL=nomic-embed-text
```
Configuration
| Variable | Description | Default |
|---|---|---|
| `BASE_LLM_API` | Base URL for embedding API | `http://localhost:11434/v1` |
| `LLM_API_KEY` | API key for your LLM provider | (empty) |
| `EMBEDDING_MODEL` | Embedding model identifier | `nomic-embed-text` |
| `VECTOR_STORE_PATH` | Directory for local vector store | `./vector_store` |
| `CHUNK_SIZE` | Characters per text chunk (number) | `500` |
💡 Recommendation: Use Ollama embedding models like `nomic-embed-text` for best performance.
Usage
MCP Tools
Once running, the server exposes these tools via MCP:
- `embedding_documents(path: string)`: Index documents under the given path
- `query_documents(query: string, k?: number)`: Retrieve top `k` chunks (default 15)
- `remove_document(path: string)`: Remove a specific document
- `remove_all_documents(confirm: boolean)`: Clear the entire index (`confirm=true`)
- `list_documents()`: List all indexed document paths
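As a rough sketch of invoking these tools from the SDK client connected in the integration example above (the path and query text are placeholders):

```typescript
// `client` is the connected Client from the integration sketch above.

// Index a folder of documents.
await client.callTool({
  name: "embedding_documents",
  arguments: { path: "./docs" },
});

// Retrieve the top 5 chunks for a query.
const result = await client.callTool({
  name: "query_documents",
  arguments: { query: "How do I change the chunk size?", k: 5 },
});

// Tool results arrive as MCP content blocks.
console.log(result.content);
```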
MCP Resources
Clients can also read resources via URIs:
- `rag://documents`: List all document URIs
- `rag://document/{path}`: Fetch full content of a document
- `rag://query-document/{numberOfChunks}/{query}`: Query documents as a resource
- `rag://embedding/status`: Check current indexing status (completed, failed, total)
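The same client can read these resources directly. A sketch, again assuming the connected `client` from the earlier example:

```typescript
// Check indexing progress via the status resource.
const status = await client.readResource({ uri: "rag://embedding/status" });
console.log(status.contents[0]);

// Query through the resource interface: top 5 chunks for the URI-encoded query.
const hits = await client.readResource({
  uri: `rag://query-document/5/${encodeURIComponent("chunk size")}`,
});
console.log(hits.contents);
```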
How RAG Works
- Indexing: Reads files, splits text into chunks based on `CHUNK_SIZE`, and queues them for embedding (see the chunking sketch below).
- Embedding: Processes each chunk sequentially against the embedding API, storing vectors in SQLite.
- Querying: Embeds the query and retrieves the nearest text chunks from the vector store, returning them to the client.
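A minimal sketch of the splitting step, assuming a plain fixed-width character chunker (the server's actual splitter may overlap chunks or respect word boundaries; this only illustrates the role of `CHUNK_SIZE`):

```typescript
// Naive fixed-width chunking: every chunkSize characters starts a new chunk.
function chunkText(text: string, chunkSize = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Each chunk is then embedded via the configured API and its vector stored in SQLite.
```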
Development
```bash
npm install
npm run build   # Compile TypeScript
npm start       # Run server
npm run watch   # Watch for changes
```
Contributing
Contributions are welcome! Please open issues or pull requests on GitHub.
License
MIT © 2025 Quan Le