
Graphrag Hybrid

@rileylemmon 9 months ago
13 MIT
Hybrid Neo4j/Qdrant retrieval system for structured Markdown documentation with YAML frontmatter. Combines graph relationships and vector search for enhanced document retrieval, with built-in MCP integration for AI assistant platforms.

Overview

What is Graphrag Hybrid

GraphRAG is a hybrid retrieval system that combines the Neo4j graph database with the Qdrant vector database. It uses both graph relationships and vector similarity to retrieve documents.

Use cases

Use cases for GraphRAG include enhancing search functionalities in documentation platforms, enabling AI agents to retrieve relevant information efficiently, and providing a structured approach to managing and navigating complex document relationships.

How to use

To use GraphRAG, point it at running Neo4j and Qdrant instances using the provided connection parameters. Import your Markdown documents with YAML frontmatter so they are parsed and chunked, then query them through the hybrid query engine.

Key features

Key features include document processing, semantic search using vector similarity, graph-based navigation, hybrid search capabilities, and external integration tools for seamless connectivity with other systems.

Where to use

GraphRAG can be used in various fields such as knowledge management, AI assistant platforms, document management systems, and any application requiring advanced document retrieval capabilities.

Content

GraphRAG: Hybrid Neo4j and Qdrant Retrieval System

A retrieval-augmented generation (RAG) system that combines the Neo4j graph database and the Qdrant vector database for document retrieval. The hybrid approach uses both document relationships and vector similarity to improve search results.

AI Agents: If you’re an AI agent exploring this repository, start with AI_ENTRY.md for a comprehensive overview.

System Overview

GraphRAG uses two complementary databases:

  1. Neo4j Graph Database: Stores document relationships, categories, and metadata
  2. Qdrant Vector Database: Stores document chunk embeddings for semantic search

Verified Database Connection Information

| Database | Service | Port | Authentication |
|----------|---------|------|----------------|
| Neo4j    | HTTP    | 7474 | neo4j/password |
| Neo4j    | Bolt    | 7687 | neo4j/password |
| Qdrant   | HTTP    | 6333 | None (default) |
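
A quick way to confirm that the services listed above are listening is a plain TCP connect against each port. The following is a minimal sketch using only the Python standard library. The `is_port_open` helper is hypothetical and not part of the repository, and it checks reachability only, not authentication.

```python
import socket

# Hosts and ports come from the connection information above;
# the helper name and the SERVICES mapping are illustrative.
SERVICES = {
    "Neo4j HTTP": ("localhost", 7474),
    "Neo4j Bolt": ("localhost", 7687),
    "Qdrant HTTP": ("localhost", 6333),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "reachable" if is_port_open(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {status}")
```

A port that accepts connections does not guarantee the credentials are correct, so this complements rather than replaces the repository's own connection test.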

Connection Parameters

For use in applications:

# Neo4j Configuration
NEO4J_HTTP_URI=http://localhost:7474
NEO4J_BOLT_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

# Qdrant Configuration
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=document_chunks

Features

  • Document Processing: Parse and chunk Markdown documents with YAML frontmatter
  • Semantic Search: Vector-based similarity search using transformer models
  • Graph-based Navigation: Explore document relationships using Neo4j graph database
  • Hybrid Search: Combine semantic and graph-based approaches for better results
  • External Integration: Ready-to-use tools for integration with external systems
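
To make the hybrid idea concrete, the sketch below merges vector-similarity hits with documents reached through graph relationships. It is an illustration in plain Python, not the logic in `src/query_engine.py`: the function name `hybrid_rank`, the `graph_boost` weight, and the in-memory dictionaries standing in for Qdrant and Neo4j results are all assumptions.

```python
from typing import Dict, List, Tuple


def hybrid_rank(
    vector_scores: Dict[str, float],
    related: Dict[str, List[str]],
    graph_boost: float = 0.2,
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Combine vector scores with a bonus for graph-connected documents.

    vector_scores: doc_id -> similarity (stand-in for a vector search result).
    related: doc_id -> neighbouring doc_ids (stand-in for a graph traversal).
    """
    combined = dict(vector_scores)
    for doc_id, score in vector_scores.items():
        for neighbour in related.get(doc_id, []):
            # A neighbour receives a fraction of the score of the hit it is linked to.
            combined[neighbour] = combined.get(neighbour, 0.0) + graph_boost * score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    hits = {"doc_a": 0.9, "doc_b": 0.5}
    edges = {"doc_a": ["doc_c"], "doc_b": ["doc_a"]}
    for doc_id, score in hybrid_rank(hits, edges):
        print(f"{doc_id}: {score:.2f}")
```

In this scheme a document that was never returned by the vector search (here `doc_c`) can still surface because it is linked to a strong hit, which is the main benefit a graph adds over similarity alone.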

Project Structure

graphrag/
├── src/                          # Source code
│   ├── config.py                 # Configuration management
│   ├── query_engine.py           # Hybrid query engine
│   ├── database/                 # Database managers
│   │   ├── neo4j_manager.py      # Neo4j database manager
│   │   └── qdrant_manager.py     # Qdrant vector database manager
│   └── processors/               # Data processors
│       ├── document_processor.py # Document parsing and chunking
│       └── embedding_processor.py # Text embedding generation
├── scripts/                      # Utility scripts
│   ├── import_docs.py            # Document import script
│   └── query_demo.py             # Query demonstration script
├── your_docs_here/               # Add your markdown documents here
├── data/                         # Data storage directory
├── guides/                       # User guides and documentation
├── test_db_connection/           # Database connection testing
├── docker-compose.yml            # Docker-compose for Neo4j and Qdrant
├── requirements.txt              # Python dependencies
└── .env.example                  # Example environment variables

Setup

Prerequisites

  • Python 3.9+
  • Docker and Docker Compose
  • Neo4j 5.x
  • Qdrant 1.5.0+

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/graphrag.git
cd graphrag
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Create configuration file:
cp .env.example .env
# Edit .env with your configuration
  5. Start Neo4j and Qdrant using Docker:
docker-compose up -d

Importing Documents

To import documents into the system:

python scripts/import_docs.py --docs-dir ./your_docs_here --recursive

This will:

  • Process all Markdown files in the directory
  • Extract metadata from YAML frontmatter
  • Chunk the documents into manageable pieces
  • Store document metadata and relationships in Neo4j
  • Generate embeddings and store them in Qdrant
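
The chunking step can be pictured as sliding a window over each document. The sketch below splits text into overlapping word windows. The `chunk_words` name, the window size, and the overlap are assumptions for illustration, and the actual strategy in `src/processors/document_processor.py` may differ.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of at most `size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears intact in one of the two chunks.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, size=4, overlap=1):
        print(chunk)
```

Each resulting chunk would then be embedded and stored as one vector, with its parent document recorded so graph and vector results can be joined later.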

Usage

Running Queries

Use the query demo script to explore the system:

# Hybrid search
python scripts/query_demo.py --query "What is GraphRAG?" --type hybrid --limit 5

# Category search
python scripts/query_demo.py --query "documentation" --type category --category "user-guide"

# Get document by ID
python scripts/query_demo.py --document "doc_123456"

# List all categories
python scripts/query_demo.py --list-categories

# Show system statistics
python scripts/query_demo.py --stats

External Integration

To integrate with external systems, use the provided Python modules in the src directory. See the guides in the guides/mcp directory for detailed integration instructions.

Document Format Requirements

The system processes Markdown files with YAML frontmatter. For optimal results, follow this format:

Required Front Matter Format

---
title: Analytics and Monitoring              # Document title (required)
category: frontend/ux                        # Category path (required)
updated: '2023-04-01'                        # Last updated date (optional)
related:                                     # Related documents (optional)
- ui/DATA_FETCHING.md
- ui/STATE_MANAGEMENT.md
- ux/USER_FLOWS.md
key_concepts:                                # Key concepts for indexing (optional)
- analytics_integration
- user_behavior_tracking
- performance_monitoring
---

# Analytics and Monitoring

This document outlines the approach to analytics and monitoring within the application.

## Analytics Strategy

### Core Principles

The analytics implementation adheres to these principles:

- **Purpose-Driven**: Collection tied to specific business or UX questions
- **Privacy-First**: Minimal data collection with clear user consent

## Performance Monitoring

Code examples should use language identifiers:

```javascript
function trackEvent(eventName, properties) {
  analytics.track(eventName, {
    timestamp: new Date().toISOString(),
    ...properties
  });
}
```

### Document Structure Best Practices

- Start with a single `# Title` (H1) heading after the front matter
- Use proper heading hierarchy (`##`, `###`, etc.)
- Include code blocks with language identifiers
- Use lists, tables, and other markdown features as needed
- Link to related documents where appropriate
- Include key concepts that might be important for retrieval

The system will process these documents by:
1. Parsing the front matter metadata
2. Extracting hierarchical structure from headings
3. Splitting content into appropriate chunks
4. Creating relationships based on the "related" field
5. Indexing key concepts for enhanced retrieval
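
The first, second, and fourth steps above can be sketched with the standard library alone. This is a simplified illustration, not the repository's parser: it handles only the flat `key: value` and `- item` front matter shown in the example (a real implementation would more likely use a YAML library), and the function names and the `RELATED_TO` relationship label are assumptions.

```python
from typing import Dict, List, Tuple, Union

FrontMatter = Dict[str, Union[str, List[str]]]
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def split_front_matter(markdown: str) -> Tuple[FrontMatter, str]:
    """Parse a leading '---' block of flat key/value and list lines; return (metadata, body)."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, markdown
    meta: FrontMatter = {}
    list_key = None
    for raw in lines[1:end]:
        line = raw.split(" #", 1)[0].strip()  # drop trailing inline comments
        if not line:
            continue
        if line.startswith("- "):
            if list_key is not None:
                meta[list_key].append(line[2:].strip())
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip().strip("'\"")
        if value:
            meta[key], list_key = value, None
        else:
            meta[key], list_key = [], key  # a list of '- item' lines follows
    return meta, "\n".join(lines[end + 1:])


def heading_outline(body: str) -> List[Tuple[int, str]]:
    """Return (level, title) for each '#' heading, skipping fenced code blocks."""
    outline, in_fence = [], False
    for line in body.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if not in_fence and line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            if level <= 6 and line[level:level + 1] == " ":
                outline.append((level, line[level:].strip()))
    return outline


def related_edges(doc_path: str, meta: FrontMatter) -> List[Tuple[str, str, str]]:
    """Turn the `related` list into (source, 'RELATED_TO', target) triples."""
    targets = meta.get("related", [])
    if isinstance(targets, str):
        targets = [targets]
    return [(doc_path, "RELATED_TO", target) for target in targets]
```

The triples returned by `related_edges` correspond to the relationships a graph database would store between document nodes, while the outline gives natural boundaries for splitting the body into chunks.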

## Configuration

Configure the system by setting environment variables or using a `.env` file:

- **Neo4j Configuration**: 
  - `NEO4J_URI=bolt://localhost:7687`
  - `NEO4J_HTTP_URI=http://localhost:7474`
  - `NEO4J_USERNAME=neo4j`
  - `NEO4J_PASSWORD=password`

- **Qdrant Configuration**: 
  - `QDRANT_HOST=localhost`
  - `QDRANT_PORT=6333`
  - `QDRANT_COLLECTION=document_chunks`

- **Embedding Configuration**: Model settings for text embeddings
- **Chunking Configuration**: Document chunking parameters
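
One way an application might read these settings is with a small dataclass that falls back to the defaults listed above. This is a sketch, not the contents of `src/config.py`: the `Settings` class and `load_settings` function are hypothetical, the variable names follow the list in this section (check `.env.example` for the names the project actually reads), and loading the `.env` file itself is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_http_uri: str
    neo4j_username: str
    neo4j_password: str
    qdrant_host: str
    qdrant_port: int
    qdrant_collection: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the documented defaults."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_http_uri=env.get("NEO4J_HTTP_URI", "http://localhost:7474"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "password"),
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        qdrant_collection=env.get("QDRANT_COLLECTION", "document_chunks"),
    )


if __name__ == "__main__":
    settings = load_settings()
    print(settings.neo4j_uri, settings.qdrant_host, settings.qdrant_port)
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.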

## Verification

After setup, verify database connections:

```bash
python test_db_connection/test_connections.py
```

License

This project is licensed under the MIT License - see the LICENSE file for details.
