- Explore MCP Servers
- rag-mcp
Rag Mcp
What is Rag Mcp
RAG-MCP is a system that integrates Retrieval Augmented Generation (RAG) with Multi-Channel Principles (MCP) to enhance the interaction between Large Language Models (LLMs) and various tools, improving efficiency and accuracy.
Use cases
Use cases include automating tool selection in software development environments, enhancing data retrieval processes in research, and improving user interactions in customer service applications.
How to use
Users can interact with RAG-MCP by submitting queries, which the system processes to retrieve the most relevant tools from its knowledge base, streamlining the interaction and enabling more precise tool selection.
Key features
Key features include knowledge base-driven tool selection, which retrieves relevant tools based on user queries, and multi-channel interaction capabilities that support various operations such as file system interactions, knowledge base queries, and API integrations.
Where to use
RAG-MCP can be utilized in fields that require efficient interaction with multiple tools, such as software development, data analysis, and any domain where Large Language Models are employed to enhance productivity.
Clients Supporting MCP
The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.
Overview
What is Rag Mcp
RAG-MCP is a system that integrates Retrieval Augmented Generation (RAG) with Multi-Channel Principles (MCP) to enhance the interaction between Large Language Models (LLMs) and various tools, improving efficiency and accuracy.
Use cases
Use cases include automating tool selection in software development environments, enhancing data retrieval processes in research, and improving user interactions in customer service applications.
How to use
Users can interact with RAG-MCP by submitting queries, which the system processes to retrieve the most relevant tools from its knowledge base, streamlining the interaction and enabling more precise tool selection.
Key features
Key features include knowledge base-driven tool selection, which retrieves relevant tools based on user queries, and multi-channel interaction capabilities that support various operations such as file system interactions, knowledge base queries, and API integrations.
Where to use
RAG-MCP can be utilized in fields that require efficient interaction with multiple tools, such as software development, data analysis, and any domain where Large Language Models are employed to enhance productivity.
Clients Supporting MCP
The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.
Content
RAG-MCP: Retrieval Augmented Generation with Multi-Channel Principles
Project Overview
RAG-MCP is a system that combines Retrieval Augmented Generation (RAG) with Multi-Channel Principles (MCP), designed to optimize the interaction experience between Large Language Models (LLMs) and tools. The system intelligently retrieves relevant tools and knowledge, significantly improving token efficiency, response time, and accuracy.
This implementation is based on the methodology described in “RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation”, which addresses the challenge of “prompt bloat” when LLMs have access to numerous tools.
Research Foundation
As described in the research paper, large language models face significant challenges when presented with numerous external tools:
- Prompt Bloat: Including all tool descriptions in the prompt consumes a large portion of the context window, leaving less room for actual reasoning.
- Decision Overhead: With many tool options (often with overlapping functionality), models struggle to select the appropriate tool.
- Performance Degradation: As the number of tools increases, selection accuracy decreases dramatically.
Our implementation follows the three-step pipeline proposed in the paper:
- Tool Indexing: MCP tool schemas are stored in a knowledge base with semantic embeddings
- Query-Based Retrieval: When a user submits a query, the system retrieves only the most relevant tools
- Focused LLM Processing: The LLM receives a streamlined prompt with only the relevant tools, enabling more accurate tool selection
Core Features
- Knowledge Base-Driven Tool Selection: Intelligently retrieves relevant tools based on user queries, rather than providing all available tools
- Multi-Channel Interaction: Supports comprehensive interactions across file system operations, knowledge base queries, and API integrations
- Efficient Token Usage: Significantly reduces token consumption (average 30% reduction) by providing only relevant tools
- Flexible Configuration: Supports environment variable configuration, easily adaptable to different deployment environments
- Comprehensive Testing: Includes detailed test suite for performance comparison and validation
System Architecture
rag-mcp/ ├── chat/ # Core chat and tool integration │ ├── bedrock_client.py # AWS Bedrock API client │ ├── chat_manager.py # Conversation management and tool coordination │ ├── config.py # Configuration management │ ├── knowledge_base.py # Knowledge base integration │ ├── mcp_client.py # MCP tool client │ └── cli.py # Command line interface ├── test/ # Comprehensive testing system │ ├── comprehensive_mcp_comparison_test.py # Main comparison test │ ├── html_report_generator.py # HTML report generation │ ├── run_complete_test.py # Test runner │ └── [other test files] └── .env # Environment variable configuration (private)
Main Components
1. Chat Management
The ChatManager
class coordinates LLM conversations, tool usage, and knowledge base retrieval, providing a smooth user interaction experience. It handles message processing, tool selection, and conversation history management.
2. Tool Integration
The system integrates various tool functionalities:
- File system operations (reading, writing, searching files)
- Directory management
- Knowledge base queries
- API integrations
Available MCP tools include file management, directory operations, and search functionality.
3. Knowledge Base System
Uses AWS Bedrock Knowledge Base services to retrieve relevant tools and knowledge through vector search. The knowledge base serves as both a data source and a tool selection mechanism.
This is a key implementation of the RAG-MCP methodology described in the paper. Instead of presenting all tools to the model at once, the system:
- Embeds all tool definitions and schemas into a vector database
- Uses the user’s query to retrieve the most semantically relevant tools
- Presents only those tools to the LLM, dramatically reducing context usage
4. Configuration Management
Provides flexible settings through environment variables and configuration classes:
- AWS credentials and region
- Bedrock model parameters
- Knowledge base IDs
- Chat options
- MCP tool settings
RAG-MCP Methodology Implementation
The core RAG-MCP approach is implemented through the following components:
Tool Retrieval Mechanism
The knowledge_base.py
module implements the semantic retrieval system that forms the core of the RAG-MCP approach. When a user submits a query:
- The query is vectorized using the same embedding model as the knowledge base
- A similarity search retrieves the most relevant tool definitions
- Only these relevant tools are included in the prompt to the LLM
This approach allows the system to efficiently handle a large number of potential tools (MCP servers) without overwhelming the model’s context window.
Tool Selection Improvement
As demonstrated in our testing (and aligned with the paper’s findings), this retrieval-based approach:
- Reduces prompt tokens by approximately 30-50%
- Increases tool selection accuracy from ~15% (baseline with all tools) to ~40-45% (with RAG-MCP)
- Maintains high performance even as the number of available tools increases
Performance Scaling
The system is designed to scale efficiently as more tools are added:
- New tools can be added to the knowledge base without retraining
- Retrieval precision remains high even with hundreds of potential tools
- Token efficiency improves most dramatically as the tool count increases
Testing and Evaluation System
The project includes a comprehensive test suite for comparing RAG-MCP performance against full MCP tools:
- Token Efficiency: Measures token usage reduction, with average savings of about 30%
- Response Time: RAG-MCP is typically faster than full MCP
- Accuracy: Tool selection accuracy and response quality
- Success Rate: Percentage of successfully processed queries
Test Reports
The system generates detailed HTML reports including:
- Interactive charts and visualizations
- Key metrics cards
- Detailed analysis sections
- Smart recommendations based on test results
Usage
Environment Setup
- Create a
.env
file in the project root:
# AWS Configuration
AWS_REGION=region
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
# Bedrock Configuration
BEDROCK_MODEL_ID=anthropic.claude-3-sonnet-20240229-v1:0
BEDROCK_MAX_TOKENS=4096
BEDROCK_TEMPERATURE=0.7
# Knowledge Base Configuration
KB_KNOWLEDGE_BASE_ID=your_kb_id
KB_DATA_SOURCE_ID=your_datasource_id
KB_S3_BUCKET=your_s3_bucket
# Chat Configuration
CHAT_MAX_TOOL_ROUNDS=15
CHAT_ENABLE_AUTO_TOOLS=true
# MCP Configuration
MCP_COMMAND=your_mcp_command
MCP_ARGS=your_mcp_args
- Install Dependencies
Using conda environment:
conda env create -f environment.yml
conda activate rag-mcp
Running Tests
# Run complete test and generate HTML report
python test/run_complete_test.py
# Run demo test (3 queries)
python test/run_complete_test.py --demo
# Specify number of test queries
python test/run_complete_test.py --queries 5
# Auto cleanup temporary files after test completion
python test/run_complete_test.py --demo --cleanup
Using CLI Interface
# Start chat interface
python -m chat.cli
# Specify custom configuration
python -m chat.cli --kb-id custom_kb_id --model-id custom_model_id
Performance Benchmarks
Based on comprehensive testing with file system operations, RAG-MCP demonstrates significant improvements over traditional full MCP:
Metric | RAG-MCP | Full MCP | Improvement |
---|---|---|---|
Average Token Usage | 1,670 | 6,630 | 74.8% reduction |
Average Response Time | 6.71s | 17.69s | 62.1% faster |
Success Rate | 100% | 88.9% | +11.1% |
Tool Selection Accuracy | 61.1% | 66.7% | -5.6% |
Key Findings from Testing
-
Dramatic Token Efficiency: RAG-MCP reduces token usage by nearly 75%, significantly exceeding our initial expectations of 30-50% reduction. This translates directly to lower API costs and faster processing.
-
Substantial Speed Improvement: Response times with RAG-MCP are approximately 62% faster than Full MCP, with the most complex queries showing the greatest improvements (up to 75% faster for multi-tool queries).
-
Higher Success Rate: RAG-MCP demonstrated 100% success in completing requested operations, compared to 88.9% for Full MCP, indicating greater reliability for end users.
-
Tool Accuracy Trade-off: While Full MCP shows slightly better tool selection accuracy (+5.6%), this advantage is offset by slower response times and significantly higher token consumption. The accuracy difference is primarily seen in complex multi-tool scenarios.
-
Query-Specific Performance:
- For simple queries (file listing, reading): RAG-MCP achieves comparable accuracy with 45-57% token reduction
- For complex queries (search, multi-file operations): RAG-MCP achieves 70-84% token reduction with moderate accuracy trade-offs
- For directory operations: RAG-MCP achieves 75-80% token reduction with competitive accuracy
These results align with the findings in the RAG-MCP paper, which reported:
- Token reduction of over 50% in some cases
- Tool selection accuracy increasing from 13.62% (baseline) to 43.13% (with RAG-MCP)
- More efficient prompt utilization (1084 tokens vs 2133 tokens in baseline)
Technology Stack
- AWS Bedrock: LLM and knowledge base services
- Python: Primary development language
- Conda: Environment management
- Chart.js: Report visualization
- Async I/O: Asynchronous operations
Security Best Practices
- Never commit
.env
file to version control (it’s in .gitignore) - Use environment variables for all sensitive configuration
- Use IAM roles when running on AWS infrastructure
- Rotate credentials regularly
- Use least privilege principle for AWS permissions
Special Commands (CLI)
The CLI interface supports the following special commands:
q
orexit
: Exit chatclear
: Clear conversation historyhistory
: View conversation historytools
: View all available toolssync
: Sync tools to knowledge basehelp
: Show help information
Extensibility
This architecture design has good extensibility:
- Can easily add new client types
- Supports plugin-style tool integration
- Configuration system supports dynamic extension
- Exception handling mechanism can be customized
Troubleshooting
Common Issues
-
MCP Connection Failed
Solution: Check MCP server configuration and network connection
-
Knowledge Base Query Failed
Solution: Verify AWS credentials and knowledge base configuration
-
Inaccurate Token Statistics
Solution: Ensure Bedrock client is properly configured
-
File Already Exists Error
Solution: System auto-generates unique filenames. If issues persist, run cleanup script python test/cleanup_test_files.py --execute
Future Work
As noted in the paper, several directions for future work include:
- Hierarchical Retrieval: Implementing nested retrieval mechanisms for extremely large tool sets
- Adaptive Selection: Dynamically adjusting the number of tools retrieved based on query complexity
- Multi-Tool Workflows: Enhancing the system to support sequences of tool operations for complex tasks
- Accuracy Improvement: Refining the knowledge retrieval system to improve tool selection accuracy, particularly for complex multi-tool operations
- Context Window Optimization: Further research into optimizing context window usage to maximize both efficiency and accuracy
Conclusion
RAG-MCP demonstrates exceptional performance in reducing token usage (74.8%) and improving response times (62.1% faster) with minimal impact on accuracy. The approach scales effectively with increasing tool counts, making it an ideal solution for production environments where efficiency and scalability are critical. While there is a slight trade-off in tool selection accuracy for complex operations, the overall benefits in terms of cost savings, speed, and reliability make RAG-MCP the recommended approach for production LLM systems with extensive tool integration needs.
License and Contribution
RAG-MCP is an internal research project, and contributions must follow the contribution guidelines.
© 2024 RAG-MCP Team
DevTools Supporting MCP
The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.