
RAG-MCP

RAG-MCP optimizes LLM-tool interaction using Retrieval Augmented Generation.

Overview

What is RAG-MCP

RAG-MCP is a system that integrates Retrieval-Augmented Generation (RAG) with the Model Context Protocol (MCP) to enhance the interaction between Large Language Models (LLMs) and external tools, improving both efficiency and accuracy.

Use cases

Use cases include automating tool selection in software development environments, enhancing data retrieval processes in research, and improving user interactions in customer service applications.

How to use

Users can interact with RAG-MCP by submitting queries, which the system processes to retrieve the most relevant tools from its knowledge base, streamlining the interaction and enabling more precise tool selection.

Key features

Key features include knowledge base-driven tool selection, which retrieves relevant tools based on user queries, and multi-channel interaction capabilities that support various operations such as file system interactions, knowledge base queries, and API integrations.

Where to use

RAG-MCP can be utilized in fields that require efficient interaction with multiple tools, such as software development, data analysis, and any domain where Large Language Models are employed to enhance productivity.

Content

RAG-MCP: Retrieval-Augmented Generation with the Model Context Protocol

Project Overview

RAG-MCP is a system that combines Retrieval-Augmented Generation (RAG) with the Model Context Protocol (MCP), designed to optimize the interaction between Large Language Models (LLMs) and tools. The system intelligently retrieves relevant tools and knowledge, significantly improving token efficiency, response time, and accuracy.

This implementation is based on the methodology described in “RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation”, which addresses the challenge of “prompt bloat” when LLMs have access to numerous tools.

Research Foundation

As described in the research paper, large language models face significant challenges when presented with numerous external tools:

  1. Prompt Bloat: Including all tool descriptions in the prompt consumes a large portion of the context window, leaving less room for actual reasoning.
  2. Decision Overhead: With many tool options (often with overlapping functionality), models struggle to select the appropriate tool.
  3. Performance Degradation: As the number of tools increases, selection accuracy decreases dramatically.

Our implementation follows the three-step pipeline proposed in the paper (a minimal sketch follows the list):

  1. Tool Indexing: MCP tool schemas are stored in a knowledge base with semantic embeddings
  2. Query-Based Retrieval: When a user submits a query, the system retrieves only the most relevant tools
  3. Focused LLM Processing: The LLM receives a streamlined prompt with only the relevant tools, enabling more accurate tool selection
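
The sketch below walks through this pipeline end to end. It is a toy, self-contained example: the bag-of-words embedding stands in for the semantic embedding model used by the real knowledge base, and the tool list is illustrative rather than the project's actual tool set.

# Toy sketch of the RAG-MCP pipeline: index tool schemas, retrieve the
# closest matches for a query, and build a focused prompt. embed() is a
# deliberately simple bag-of-words stand-in for a real embedding model.
from collections import Counter
from math import sqrt

TOOLS = [
    {"name": "read_file", "description": "Read the contents of a file from the file system"},
    {"name": "write_file", "description": "Write text content to a file on disk"},
    {"name": "search_files", "description": "Search for files matching a pattern in a directory"},
    {"name": "list_directory", "description": "List the entries of a directory"},
]

def embed(text: str) -> Counter:
    """Toy embedding: lower-cased bag of words."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Tool Indexing: embed every tool description once, up front.
index = [(tool, embed(tool["description"])) for tool in TOOLS]

def retrieve_tools(query: str, top_k: int = 2) -> list[dict]:
    # 2. Query-Based Retrieval: keep only the most relevant tools.
    query_vec = embed(query)
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [tool for tool, _ in scored[:top_k]]

def build_prompt(query: str) -> str:
    # 3. Focused LLM Processing: the prompt lists only the retrieved tools.
    tools = retrieve_tools(query)
    tool_lines = "\n".join(f"- {t['name']}: {t['description']}" for t in tools)
    return f"Available tools:\n{tool_lines}\n\nUser query: {query}"

if __name__ == "__main__":
    print(build_prompt("read the file config.py and show its contents"))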

Core Features

  • Knowledge Base-Driven Tool Selection: Intelligently retrieves relevant tools based on user queries, rather than providing all available tools
  • Multi-Channel Interaction: Supports comprehensive interactions across file system operations, knowledge base queries, and API integrations
  • Efficient Token Usage: Significantly reduces token consumption by providing only relevant tools (roughly 75% average reduction in our benchmarks)
  • Flexible Configuration: Supports environment variable configuration, easily adaptable to different deployment environments
  • Comprehensive Testing: Includes detailed test suite for performance comparison and validation

System Architecture

rag-mcp/
├── chat/                   # Core chat and tool integration
│   ├── bedrock_client.py   # AWS Bedrock API client
│   ├── chat_manager.py     # Conversation management and tool coordination
│   ├── config.py           # Configuration management
│   ├── knowledge_base.py   # Knowledge base integration
│   ├── mcp_client.py       # MCP tool client
│   └── cli.py              # Command line interface
├── test/                   # Comprehensive testing system
│   ├── comprehensive_mcp_comparison_test.py  # Main comparison test
│   ├── html_report_generator.py              # HTML report generation
│   ├── run_complete_test.py                  # Test runner
│   └── [other test files]
└── .env                    # Environment variable configuration (private)

Main Components

1. Chat Management

The ChatManager class coordinates LLM conversations, tool usage, and knowledge base retrieval, providing a smooth user interaction experience. It handles message processing, tool selection, and conversation history management.
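
The skeleton below illustrates these responsibilities. It is an assumption-laden sketch rather than the project's actual API: the real class lives in chat/chat_manager.py, and its method names and client interfaces may differ.

# Illustrative skeleton of a chat manager that keeps conversation history,
# retrieves relevant tools for each message, and loops through tool calls.
# Class layout and method names are assumptions, not the project's real API.
from dataclasses import dataclass, field

@dataclass
class ChatManager:
    llm_client: object          # e.g. a Bedrock client wrapper
    knowledge_base: object      # tool retrieval backend
    max_tool_rounds: int = 15   # mirrors CHAT_MAX_TOOL_ROUNDS
    history: list[dict] = field(default_factory=list)

    def process_message(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        tools = self.knowledge_base.retrieve_tools(user_text)

        reply = self.llm_client.converse(self.history, tools)
        for _ in range(self.max_tool_rounds):
            if reply.get("tool_call") is None:
                break  # the model answered directly; stop the tool loop
            result = self.run_tool(reply["tool_call"])
            self.history.append({"role": "tool", "content": result})
            reply = self.llm_client.converse(self.history, tools)

        self.history.append({"role": "assistant", "content": reply["text"]})
        return reply["text"]

    def run_tool(self, call: dict) -> str:
        raise NotImplementedError("dispatch to the MCP client here")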

2. Tool Integration

The system integrates various tool functionalities:

  • File system operations (reading, writing, searching files)
  • Directory management
  • Knowledge base queries
  • API integrations

Available MCP tools include file management, directory operations, and search functionality.
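
For illustration, a tool definition of the kind indexed for retrieval might look like the following. The schema content is illustrative and follows the common MCP-style pattern of a name, a description, and a JSON Schema for inputs; it is not copied from the project.

# Illustrative tool definition of the kind indexed in the knowledge base.
read_file_tool = {
    "name": "read_file",
    "description": "Read the complete contents of a file from the file system.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read"},
        },
        "required": ["path"],
    },
}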

3. Knowledge Base System

Uses AWS Bedrock Knowledge Base services to retrieve relevant tools and knowledge through vector search. The knowledge base serves as both a data source and a tool selection mechanism.

This is a key implementation of the RAG-MCP methodology described in the paper. Instead of presenting all tools to the model at once, the system (see the retrieval sketch after this list):

  1. Embeds all tool definitions and schemas into a vector database
  2. Uses the user’s query to retrieve the most semantically relevant tools
  3. Presents only those tools to the LLM, dramatically reducing context usage
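
A minimal sketch of step 2, assuming the AWS Bedrock Agent Runtime retrieve API, is shown below. The knowledge base ID, region, and result count are placeholders; the project's knowledge_base.py may wrap this differently.

# Sketch of querying an AWS Bedrock Knowledge Base for the tool descriptions
# most relevant to a user query. IDs and region are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def retrieve_relevant_tools(query: str, kb_id: str, top_k: int = 5) -> list[str]:
    response = agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    # Each result carries the stored tool description text and a relevance score.
    return [r["content"]["text"] for r in response["retrievalResults"]]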

4. Configuration Management

Provides flexible settings through environment variables and configuration classes (a minimal example follows the list):

  • AWS credentials and region
  • Bedrock model parameters
  • Knowledge base IDs
  • Chat options
  • MCP tool settings
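
A minimal illustration of environment-driven configuration, using variable names from the .env example in the Usage section; the real config.py may organize this differently.

# Illustrative config classes backed by environment variables. Note that the
# defaults below are read once, when the classes are defined.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class BedrockConfig:
    region: str = os.getenv("AWS_REGION", "us-east-1")
    model_id: str = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")
    max_tokens: int = int(os.getenv("BEDROCK_MAX_TOKENS", "4096"))
    temperature: float = float(os.getenv("BEDROCK_TEMPERATURE", "0.7"))

@dataclass(frozen=True)
class KnowledgeBaseConfig:
    knowledge_base_id: str = os.getenv("KB_KNOWLEDGE_BASE_ID", "")
    data_source_id: str = os.getenv("KB_DATA_SOURCE_ID", "")
    s3_bucket: str = os.getenv("KB_S3_BUCKET", "")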

RAG-MCP Methodology Implementation

The core RAG-MCP approach is implemented through the following components:

Tool Retrieval Mechanism

The knowledge_base.py module implements the semantic retrieval system that forms the core of the RAG-MCP approach. When a user submits a query:

  1. The query is vectorized using the same embedding model as the knowledge base
  2. A similarity search retrieves the most relevant tool definitions
  3. Only these relevant tools are included in the prompt to the LLM

This approach allows the system to efficiently handle a large number of potential tools (MCP servers) without overwhelming the model’s context window.
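
The sketch below shows the focused-prompt step under the assumption that the Bedrock Converse API is used: only the tool specs returned by the retriever are passed in toolConfig, rather than every tool the system knows about. The model ID and tool spec contents are placeholders.

# Sketch: pass only the retrieved tool specs to the model. Each spec is
# expected to contain a name, a description, and an inputSchema under "json".
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask_with_relevant_tools(query: str, retrieved_tool_specs: list[dict]) -> dict:
    return bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": query}]}],
        toolConfig={
            # Only the handful of tools the retriever judged relevant.
            "tools": [{"toolSpec": spec} for spec in retrieved_tool_specs]
        },
    )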

Tool Selection Improvement

As demonstrated in our testing (and aligned with the paper’s findings), this retrieval-based approach:

  • Reduces prompt tokens by approximately 30-50%
  • Increases tool selection accuracy from ~15% (baseline with all tools) to ~40-45% (with RAG-MCP)
  • Maintains high performance even as the number of available tools increases

Performance Scaling

The system is designed to scale efficiently as more tools are added:

  • New tools can be added to the knowledge base without retraining
  • Retrieval precision remains high even with hundreds of potential tools
  • Token efficiency improves most dramatically as the tool count increases

Testing and Evaluation System

The project includes a comprehensive test suite for comparing RAG-MCP performance against full MCP tools:

  • Token Efficiency: Measures token usage reduction, with average savings of about 30%
  • Response Time: RAG-MCP is typically faster than full MCP
  • Accuracy: Tool selection accuracy and response quality
  • Success Rate: Percentage of successfully processed queries

Test Reports

The system generates detailed HTML reports including:

  • Interactive charts and visualizations
  • Key metrics cards
  • Detailed analysis sections
  • Smart recommendations based on test results

Usage

Environment Setup

  1. Create a .env file in the project root:
# AWS Configuration
AWS_REGION=region
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key

# Bedrock Configuration
BEDROCK_MODEL_ID=anthropic.claude-3-sonnet-20240229-v1:0
BEDROCK_MAX_TOKENS=4096
BEDROCK_TEMPERATURE=0.7

# Knowledge Base Configuration
KB_KNOWLEDGE_BASE_ID=your_kb_id
KB_DATA_SOURCE_ID=your_datasource_id
KB_S3_BUCKET=your_s3_bucket

# Chat Configuration
CHAT_MAX_TOOL_ROUNDS=15
CHAT_ENABLE_AUTO_TOOLS=true

# MCP Configuration
MCP_COMMAND=your_mcp_command
MCP_ARGS=your_mcp_args
  2. Install Dependencies

Using conda environment:

conda env create -f environment.yml
conda activate rag-mcp

Running Tests

# Run complete test and generate HTML report
python test/run_complete_test.py

# Run demo test (3 queries)
python test/run_complete_test.py --demo

# Specify number of test queries
python test/run_complete_test.py --queries 5

# Auto cleanup temporary files after test completion
python test/run_complete_test.py --demo --cleanup

Using CLI Interface

# Start chat interface
python -m chat.cli

# Specify custom configuration
python -m chat.cli --kb-id custom_kb_id --model-id custom_model_id

Performance Benchmarks

Based on comprehensive testing with file system operations, RAG-MCP demonstrates significant improvements over traditional full MCP:

Metric                     RAG-MCP    Full MCP    Improvement
Average Token Usage        1,670      6,630       74.8% reduction
Average Response Time      6.71s      17.69s      62.1% faster
Success Rate               100%       88.9%       +11.1%
Tool Selection Accuracy    61.1%      66.7%       -5.6%

Key Findings from Testing

  1. Dramatic Token Efficiency: RAG-MCP reduces token usage by nearly 75%, significantly exceeding our initial expectations of 30-50% reduction. This translates directly to lower API costs and faster processing.

  2. Substantial Speed Improvement: Response times with RAG-MCP are approximately 62% faster than Full MCP, with the most complex queries showing the greatest improvements (up to 75% faster for multi-tool queries).

  3. Higher Success Rate: RAG-MCP demonstrated 100% success in completing requested operations, compared to 88.9% for Full MCP, indicating greater reliability for end users.

  4. Tool Accuracy Trade-off: While Full MCP shows slightly better tool selection accuracy (+5.6%), this advantage is offset by slower response times and significantly higher token consumption. The accuracy difference is primarily seen in complex multi-tool scenarios.

  5. Query-Specific Performance:

    • For simple queries (file listing, reading): RAG-MCP achieves comparable accuracy with 45-57% token reduction
    • For complex queries (search, multi-file operations): RAG-MCP achieves 70-84% token reduction with moderate accuracy trade-offs
    • For directory operations: RAG-MCP achieves 75-80% token reduction with competitive accuracy

These results align with the findings in the RAG-MCP paper, which reported:

  • Token reduction of over 50% in some cases
  • Tool selection accuracy increasing from 13.62% (baseline) to 43.13% (with RAG-MCP)
  • More efficient prompt utilization (1084 tokens vs 2133 tokens in baseline)

Technology Stack

  • AWS Bedrock: LLM and knowledge base services
  • Python: Primary development language
  • Conda: Environment management
  • Chart.js: Report visualization
  • Async I/O: Asynchronous operations

Security Best Practices

  1. Never commit the .env file to version control (it’s in .gitignore)
  2. Use environment variables for all sensitive configuration
  3. Use IAM roles when running on AWS infrastructure
  4. Rotate credentials regularly
  5. Use least privilege principle for AWS permissions

Special Commands (CLI)

The CLI interface supports the following special commands:

  • q or exit: Exit chat
  • clear: Clear conversation history
  • history: View conversation history
  • tools: View all available tools
  • sync: Sync tools to knowledge base
  • help: Show help information

Extensibility

The architecture is designed for extensibility (a plugin-style registration sketch follows the list):

  • Can easily add new client types
  • Supports plugin-style tool integration
  • Configuration system supports dynamic extension
  • Exception handling mechanism can be customized
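
As an illustration of the plugin-style extension point, a new tool could be registered alongside its handler and later synced into the knowledge base (compare the CLI's sync command). The names below are illustrative, not part of the project.

# Illustration of plugin-style tool registration: each tool records its
# schema next to its handler so it can be indexed and dispatched later.
TOOL_REGISTRY: dict[str, dict] = {}

def register_tool(name: str, description: str, input_schema: dict):
    """Decorator that records a tool's schema next to its handler."""
    def decorator(func):
        TOOL_REGISTRY[name] = {
            "description": description,
            "inputSchema": input_schema,
            "handler": func,
        }
        return func
    return decorator

@register_tool(
    name="word_count",
    description="Count the words in a text file",
    input_schema={"type": "object",
                  "properties": {"path": {"type": "string"}},
                  "required": ["path"]},
)
def word_count(path: str) -> int:
    with open(path, encoding="utf-8") as fh:
        return len(fh.read().split())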

Troubleshooting

Common Issues

  1. MCP Connection Failed

    Solution: Check MCP server configuration and network connection
    
  2. Knowledge Base Query Failed

    Solution: Verify AWS credentials and knowledge base configuration
    
  3. Inaccurate Token Statistics

    Solution: Ensure Bedrock client is properly configured
    
  4. File Already Exists Error

    Solution: The system auto-generates unique filenames. If issues persist, run the cleanup script:
    python test/cleanup_test_files.py --execute
    

Future Work

As noted in the paper, several directions for future work include:

  1. Hierarchical Retrieval: Implementing nested retrieval mechanisms for extremely large tool sets
  2. Adaptive Selection: Dynamically adjusting the number of tools retrieved based on query complexity
  3. Multi-Tool Workflows: Enhancing the system to support sequences of tool operations for complex tasks
  4. Accuracy Improvement: Refining the knowledge retrieval system to improve tool selection accuracy, particularly for complex multi-tool operations
  5. Context Window Optimization: Further research into optimizing context window usage to maximize both efficiency and accuracy

Conclusion

RAG-MCP demonstrates exceptional performance, reducing token usage by 74.8% and improving response times by 62.1%, with minimal impact on accuracy. The approach scales effectively as the number of available tools grows, making it well suited to production environments where efficiency and scalability are critical. While there is a slight trade-off in tool selection accuracy for complex operations, the overall benefits in cost savings, speed, and reliability make RAG-MCP the recommended approach for production LLM systems with extensive tool integration needs.

License and Contribution

RAG-MCP is an internal research project, and contributions must follow the contribution guidelines.


© 2024 RAG-MCP Team
