
RAG-MCP

RAG-MCP optimizes LLM-tool interaction using Retrieval Augmented Generation.

Overview

What is RAG-MCP

RAG-MCP is a system that integrates Retrieval-Augmented Generation (RAG) with the Model Context Protocol (MCP) to enhance the interaction between Large Language Models (LLMs) and external tools, improving both efficiency and accuracy.

Use cases

Use cases include automating tool selection in software development environments, enhancing data retrieval processes in research, and improving user interactions in customer service applications.

How to use

Users can interact with RAG-MCP by submitting queries, which the system processes to retrieve the most relevant tools from its knowledge base, streamlining the interaction and enabling more precise tool selection.

Key features

Key features include knowledge base-driven tool selection, which retrieves relevant tools based on user queries, and multi-channel interaction capabilities that support various operations such as file system interactions, knowledge base queries, and API integrations.

Where to use

RAG-MCP can be utilized in fields that require efficient interaction with multiple tools, such as software development, data analysis, and any domain where Large Language Models are employed to enhance productivity.

Content

RAG-MCP: Retrieval-Augmented Generation with the Model Context Protocol

Project Overview

RAG-MCP is a system that combines Retrieval-Augmented Generation (RAG) with the Model Context Protocol (MCP), designed to optimize the interaction between Large Language Models (LLMs) and tools. The system intelligently retrieves relevant tools and knowledge, significantly improving token efficiency, response time, and accuracy.

This implementation is based on the methodology described in “RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation”, which addresses the challenge of “prompt bloat” when LLMs have access to numerous tools.

Research Foundation

As described in the research paper, large language models face significant challenges when presented with numerous external tools:

  1. Prompt Bloat: Including all tool descriptions in the prompt consumes a large portion of the context window, leaving less room for actual reasoning.
  2. Decision Overhead: With many tool options (often with overlapping functionality), models struggle to select the appropriate tool.
  3. Performance Degradation: As the number of tools increases, selection accuracy decreases dramatically.

Our implementation follows the three-step pipeline proposed in the paper (a minimal sketch follows the list):

  1. Tool Indexing: MCP tool schemas are stored in a knowledge base with semantic embeddings
  2. Query-Based Retrieval: When a user submits a query, the system retrieves only the most relevant tools
  3. Focused LLM Processing: The LLM receives a streamlined prompt with only the relevant tools, enabling more accurate tool selection
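
The sketch below walks through this pipeline end to end. It is a toy, self-contained example: the bag-of-words embedding stands in for the semantic embedding model used by the real knowledge base, and the tool list is illustrative rather than the project's actual tool set.

# Toy sketch of the RAG-MCP pipeline: index tool schemas, retrieve the
# closest matches for a query, and build a focused prompt. embed() is a
# deliberately simple bag-of-words stand-in for a real embedding model.
from collections import Counter
from math import sqrt

TOOLS = [
    {"name": "read_file", "description": "Read the contents of a file from the file system"},
    {"name": "write_file", "description": "Write text content to a file on disk"},
    {"name": "search_files", "description": "Search for files matching a pattern in a directory"},
    {"name": "list_directory", "description": "List the entries of a directory"},
]

def embed(text: str) -> Counter:
    """Toy embedding: lower-cased bag of words."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Tool Indexing: embed every tool description once, up front.
index = [(tool, embed(tool["description"])) for tool in TOOLS]

def retrieve_tools(query: str, top_k: int = 2) -> list[dict]:
    # 2. Query-Based Retrieval: keep only the most relevant tools.
    query_vec = embed(query)
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [tool for tool, _ in scored[:top_k]]

def build_prompt(query: str) -> str:
    # 3. Focused LLM Processing: the prompt lists only the retrieved tools.
    tools = retrieve_tools(query)
    tool_lines = "\n".join(f"- {t['name']}: {t['description']}" for t in tools)
    return f"Available tools:\n{tool_lines}\n\nUser query: {query}"

if __name__ == "__main__":
    print(build_prompt("read the file config.py and show its contents"))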

Core Features

  • Knowledge Base-Driven Tool Selection: Intelligently retrieves relevant tools based on user queries, rather than providing all available tools
  • Multi-Channel Interaction: Supports comprehensive interactions across file system operations, knowledge base queries, and API integrations
  • Efficient Token Usage: Significantly reduces token consumption by providing only relevant tools (roughly 75% average reduction in our benchmarks)
  • Flexible Configuration: Supports environment variable configuration, easily adaptable to different deployment environments
  • Comprehensive Testing: Includes detailed test suite for performance comparison and validation

System Architecture

rag-mcp/
├── chat/                   # Core chat and tool integration
│   ├── bedrock_client.py   # AWS Bedrock API client
│   ├── chat_manager.py     # Conversation management and tool coordination
│   ├── config.py           # Configuration management
│   ├── knowledge_base.py   # Knowledge base integration
│   ├── mcp_client.py       # MCP tool client
│   └── cli.py              # Command line interface
├── test/                   # Comprehensive testing system
│   ├── comprehensive_mcp_comparison_test.py  # Main comparison test
│   ├── html_report_generator.py              # HTML report generation
│   ├── run_complete_test.py                  # Test runner
│   └── [other test files]
└── .env                    # Environment variable configuration (private)

Main Components

1. Chat Management

The ChatManager class coordinates LLM conversations, tool usage, and knowledge base retrieval, providing a smooth user interaction experience. It handles message processing, tool selection, and conversation history management.
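
The skeleton below illustrates these responsibilities. It is an assumption-laden sketch rather than the project's actual API: the real class lives in chat/chat_manager.py, and its method names and client interfaces may differ.

# Illustrative skeleton of a chat manager that keeps conversation history,
# retrieves relevant tools for each message, and loops through tool calls.
# Class layout and method names are assumptions, not the project's real API.
from dataclasses import dataclass, field

@dataclass
class ChatManager:
    llm_client: object          # e.g. a Bedrock client wrapper
    knowledge_base: object      # tool retrieval backend
    max_tool_rounds: int = 15   # mirrors CHAT_MAX_TOOL_ROUNDS
    history: list[dict] = field(default_factory=list)

    def process_message(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        tools = self.knowledge_base.retrieve_tools(user_text)

        reply = self.llm_client.converse(self.history, tools)
        for _ in range(self.max_tool_rounds):
            if reply.get("tool_call") is None:
                break  # the model answered directly; stop the tool loop
            result = self.run_tool(reply["tool_call"])
            self.history.append({"role": "tool", "content": result})
            reply = self.llm_client.converse(self.history, tools)

        self.history.append({"role": "assistant", "content": reply["text"]})
        return reply["text"]

    def run_tool(self, call: dict) -> str:
        raise NotImplementedError("dispatch to the MCP client here")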

2. Tool Integration

The system integrates various tool functionalities:

  • File system operations (reading, writing, searching files)
  • Directory management
  • Knowledge base queries
  • API integrations

Available MCP tools include file management, directory operations, and search functionality.
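
For illustration, a tool definition of the kind indexed for retrieval might look like the following. The schema content is illustrative and follows the common MCP-style pattern of a name, a description, and a JSON Schema for inputs; it is not copied from the project.

# Illustrative tool definition of the kind indexed in the knowledge base.
read_file_tool = {
    "name": "read_file",
    "description": "Read the complete contents of a file from the file system.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read"},
        },
        "required": ["path"],
    },
}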

3. Knowledge Base System

Uses AWS Bedrock Knowledge Base services to retrieve relevant tools and knowledge through vector search. The knowledge base serves as both a data source and a tool selection mechanism.

This is a key implementation of the RAG-MCP methodology described in the paper. Instead of presenting all tools to the model at once, the system (see the retrieval sketch after this list):

  1. Embeds all tool definitions and schemas into a vector database
  2. Uses the user’s query to retrieve the most semantically relevant tools
  3. Presents only those tools to the LLM, dramatically reducing context usage
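
A minimal sketch of step 2, assuming the AWS Bedrock Agent Runtime retrieve API, is shown below. The knowledge base ID, region, and result count are placeholders; the project's knowledge_base.py may wrap this differently.

# Sketch of querying an AWS Bedrock Knowledge Base for the tool descriptions
# most relevant to a user query. IDs and region are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def retrieve_relevant_tools(query: str, kb_id: str, top_k: int = 5) -> list[str]:
    response = agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    # Each result carries the stored tool description text and a relevance score.
    return [r["content"]["text"] for r in response["retrievalResults"]]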

4. Configuration Management

Provides flexible settings through environment variables and configuration classes (a minimal example follows the list):

  • AWS credentials and region
  • Bedrock model parameters
  • Knowledge base IDs
  • Chat options
  • MCP tool settings
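
A minimal illustration of environment-driven configuration, using variable names from the .env example in the Usage section; the real config.py may organize this differently.

# Illustrative config classes backed by environment variables. Note that the
# defaults below are read once, when the classes are defined.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class BedrockConfig:
    region: str = os.getenv("AWS_REGION", "us-east-1")
    model_id: str = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")
    max_tokens: int = int(os.getenv("BEDROCK_MAX_TOKENS", "4096"))
    temperature: float = float(os.getenv("BEDROCK_TEMPERATURE", "0.7"))

@dataclass(frozen=True)
class KnowledgeBaseConfig:
    knowledge_base_id: str = os.getenv("KB_KNOWLEDGE_BASE_ID", "")
    data_source_id: str = os.getenv("KB_DATA_SOURCE_ID", "")
    s3_bucket: str = os.getenv("KB_S3_BUCKET", "")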

RAG-MCP Methodology Implementation

The core RAG-MCP approach is implemented through the following components:

Tool Retrieval Mechanism

The knowledge_base.py module implements the semantic retrieval system that forms the core of the RAG-MCP approach. When a user submits a query:

  1. The query is vectorized using the same embedding model as the knowledge base
  2. A similarity search retrieves the most relevant tool definitions
  3. Only these relevant tools are included in the prompt to the LLM

This approach allows the system to efficiently handle a large number of potential tools (MCP servers) without overwhelming the model’s context window.
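
The sketch below shows the focused-prompt step under the assumption that the Bedrock Converse API is used: only the tool specs returned by the retriever are passed in toolConfig, rather than every tool the system knows about. The model ID and tool spec contents are placeholders.

# Sketch: pass only the retrieved tool specs to the model. Each spec is
# expected to contain a name, a description, and an inputSchema under "json".
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask_with_relevant_tools(query: str, retrieved_tool_specs: list[dict]) -> dict:
    return bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": query}]}],
        toolConfig={
            # Only the handful of tools the retriever judged relevant.
            "tools": [{"toolSpec": spec} for spec in retrieved_tool_specs]
        },
    )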

Tool Selection Improvement

As demonstrated in our testing (and aligned with the paper’s findings), this retrieval-based approach:

  • Reduces prompt tokens by approximately 30-50%
  • Increases tool selection accuracy from ~15% (baseline with all tools) to ~40-45% (with RAG-MCP)
  • Maintains high performance even as the number of available tools increases

Performance Scaling

The system is designed to scale efficiently as more tools are added:

  • New tools can be added to the knowledge base without retraining
  • Retrieval precision remains high even with hundreds of potential tools
  • Token efficiency improves most dramatically as the tool count increases

Testing and Evaluation System

The project includes a comprehensive test suite for comparing RAG-MCP performance against full MCP tools:

  • Token Efficiency: Measures token usage reduction, with average savings of about 30%
  • Response Time: RAG-MCP is typically faster than full MCP
  • Accuracy: Tool selection accuracy and response quality
  • Success Rate: Percentage of successfully processed queries

Test Reports

The system generates detailed HTML reports including:

  • Interactive charts and visualizations
  • Key metrics cards
  • Detailed analysis sections
  • Smart recommendations based on test results

Usage

Environment Setup

  1. Create a .env file in the project root:
# AWS Configuration
AWS_REGION=region
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key

# Bedrock Configuration
BEDROCK_MODEL_ID=anthropic.claude-3-sonnet-20240229-v1:0
BEDROCK_MAX_TOKENS=4096
BEDROCK_TEMPERATURE=0.7

# Knowledge Base Configuration
KB_KNOWLEDGE_BASE_ID=your_kb_id
KB_DATA_SOURCE_ID=your_datasource_id
KB_S3_BUCKET=your_s3_bucket

# Chat Configuration
CHAT_MAX_TOOL_ROUNDS=15
CHAT_ENABLE_AUTO_TOOLS=true

# MCP Configuration
MCP_COMMAND=your_mcp_command
MCP_ARGS=your_mcp_args
  2. Install Dependencies

Using conda environment:

conda env create -f environment.yml
conda activate rag-mcp

Running Tests

# Run complete test and generate HTML report
python test/run_complete_test.py

# Run demo test (3 queries)
python test/run_complete_test.py --demo

# Specify number of test queries
python test/run_complete_test.py --queries 5

# Auto cleanup temporary files after test completion
python test/run_complete_test.py --demo --cleanup

Using CLI Interface

# Start chat interface
python -m chat.cli

# Specify custom configuration
python -m chat.cli --kb-id custom_kb_id --model-id custom_model_id

Performance Benchmarks

Based on comprehensive testing with file system operations, RAG-MCP demonstrates significant improvements over traditional full MCP:

Metric                     RAG-MCP    Full MCP    Improvement
Average Token Usage        1,670      6,630       74.8% reduction
Average Response Time      6.71s      17.69s      62.1% faster
Success Rate               100%       88.9%       +11.1%
Tool Selection Accuracy    61.1%      66.7%       -5.6%

Key Findings from Testing

  1. Dramatic Token Efficiency: RAG-MCP reduces token usage by nearly 75%, significantly exceeding our initial expectations of 30-50% reduction. This translates directly to lower API costs and faster processing.

  2. Substantial Speed Improvement: Response times with RAG-MCP are approximately 62% faster than Full MCP, with the most complex queries showing the greatest improvements (up to 75% faster for multi-tool queries).

  3. Higher Success Rate: RAG-MCP demonstrated 100% success in completing requested operations, compared to 88.9% for Full MCP, indicating greater reliability for end users.

  4. Tool Accuracy Trade-off: While Full MCP shows slightly better tool selection accuracy (+5.6%), this advantage is offset by slower response times and significantly higher token consumption. The accuracy difference is primarily seen in complex multi-tool scenarios.

  5. Query-Specific Performance:

    • For simple queries (file listing, reading): RAG-MCP achieves comparable accuracy with 45-57% token reduction
    • For complex queries (search, multi-file operations): RAG-MCP achieves 70-84% token reduction with moderate accuracy trade-offs
    • For directory operations: RAG-MCP achieves 75-80% token reduction with competitive accuracy

These results align with the findings in the RAG-MCP paper, which reported:

  • Token reduction of over 50% in some cases
  • Tool selection accuracy increasing from 13.62% (baseline) to 43.13% (with RAG-MCP)
  • More efficient prompt utilization (1084 tokens vs 2133 tokens in baseline)

Technology Stack

  • AWS Bedrock: LLM and knowledge base services
  • Python: Primary development language
  • Conda: Environment management
  • Chart.js: Report visualization
  • Async I/O: Asynchronous operations

Security Best Practices

  1. Never commit the .env file to version control (it’s in .gitignore)
  2. Use environment variables for all sensitive configuration
  3. Use IAM roles when running on AWS infrastructure
  4. Rotate credentials regularly
  5. Use least privilege principle for AWS permissions

Special Commands (CLI)

The CLI interface supports the following special commands:

  • q or exit: Exit chat
  • clear: Clear conversation history
  • history: View conversation history
  • tools: View all available tools
  • sync: Sync tools to knowledge base
  • help: Show help information

Extensibility

The architecture is designed for extensibility (a plugin-style registration sketch follows the list):

  • Can easily add new client types
  • Supports plugin-style tool integration
  • Configuration system supports dynamic extension
  • Exception handling mechanism can be customized
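
As an illustration of the plugin-style extension point, a new tool could be registered alongside its handler and later synced into the knowledge base (compare the CLI's sync command). The names below are illustrative, not part of the project.

# Illustration of plugin-style tool registration: each tool records its
# schema next to its handler so it can be indexed and dispatched later.
TOOL_REGISTRY: dict[str, dict] = {}

def register_tool(name: str, description: str, input_schema: dict):
    """Decorator that records a tool's schema next to its handler."""
    def decorator(func):
        TOOL_REGISTRY[name] = {
            "description": description,
            "inputSchema": input_schema,
            "handler": func,
        }
        return func
    return decorator

@register_tool(
    name="word_count",
    description="Count the words in a text file",
    input_schema={"type": "object",
                  "properties": {"path": {"type": "string"}},
                  "required": ["path"]},
)
def word_count(path: str) -> int:
    with open(path, encoding="utf-8") as fh:
        return len(fh.read().split())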

Troubleshooting

Common Issues

  1. MCP Connection Failed

    Solution: Check MCP server configuration and network connection
    
  2. Knowledge Base Query Failed

    Solution: Verify AWS credentials and knowledge base configuration
    
  3. Inaccurate Token Statistics

    Solution: Ensure Bedrock client is properly configured
    
  4. File Already Exists Error

    Solution: The system auto-generates unique filenames. If issues persist, run the cleanup script:
    python test/cleanup_test_files.py --execute
    

Future Work

As noted in the paper, several directions for future work include:

  1. Hierarchical Retrieval: Implementing nested retrieval mechanisms for extremely large tool sets
  2. Adaptive Selection: Dynamically adjusting the number of tools retrieved based on query complexity
  3. Multi-Tool Workflows: Enhancing the system to support sequences of tool operations for complex tasks
  4. Accuracy Improvement: Refining the knowledge retrieval system to improve tool selection accuracy, particularly for complex multi-tool operations
  5. Context Window Optimization: Further research into optimizing context window usage to maximize both efficiency and accuracy

Conclusion

RAG-MCP demonstrates exceptional performance, reducing token usage by 74.8% and improving response times by 62.1%, with minimal impact on accuracy. The approach scales effectively as the number of available tools grows, making it well suited to production environments where efficiency and scalability are critical. While there is a slight trade-off in tool selection accuracy for complex operations, the overall benefits in cost savings, speed, and reliability make RAG-MCP the recommended approach for production LLM systems with extensive tool integration needs.

License and Contribution

RAG-MCP is an internal research project, and contributions must follow the contribution guidelines.


© 2024 RAG-MCP Team
