Graphrag Hybrid

@rileylemmon 9 months ago

Hybrid Neo4j/Qdrant retrieval system for structured Markdown documentation with YAML frontmatter. Combines graph relationships and vector search for enhanced document retrieval, with built-in MCP integration for AI assistant platforms.

What is Graphrag Hybrid

GraphRAG Hybrid is a retrieval system that pairs a Neo4j graph database with a Qdrant vector database. Queries draw on both the graph relationships between documents and the vector similarity between document chunks.

Use cases

GraphRAG suits three kinds of work: improving search in documentation platforms, giving AI agents a way to retrieve relevant passages, and navigating document sets whose cross-references matter as much as their text.

How to use

To use GraphRAG, point it at your Neo4j and Qdrant instances using the connection parameters listed under Connection Parameters. Import your Markdown documents (with YAML frontmatter) so they are parsed, chunked, and indexed, then retrieve them through the hybrid query engine.

Key features

Key features include document processing, semantic search using vector similarity, graph-based navigation, hybrid search that combines the two, and ready-to-use tools for integrating with external systems.

Where to use

GraphRAG can be used in various fields such as knowledge management, AI assistant platforms, document management systems, and any application requiring advanced document retrieval capabilities.

GraphRAG: Hybrid Neo4j and Qdrant Retrieval System

A retrieval-augmented generation (RAG) system that combines a Neo4j graph database and a Qdrant vector database for document retrieval. The hybrid approach uses both document relationships and vector similarity to rank results.

AI Agents: If you’re an AI agent exploring this repository, start with AI_ENTRY.md for a comprehensive overview.

System Overview

GraphRAG uses two complementary databases:

- **Neo4j Graph Database**: Stores document relationships, categories, and metadata
- **Qdrant Vector Database**: Stores document chunk embeddings for semantic search

Verified Database Connection Information

| Database | Service | Port | Authentication |
|----------|---------|------|----------------|
| Neo4j    | HTTP    | 7474 | neo4j/password |
| Neo4j    | Bolt    | 7687 | neo4j/password |
| Qdrant   | HTTP    | 6333 | None (default) |

Connection Parameters

For use in applications:

```bash
# Neo4j Configuration
NEO4J_HTTP_URI=http://localhost:7474
NEO4J_BOLT_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

# Qdrant Configuration
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=document_chunks
```
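
As a rough illustration of how an application might consume these values, the sketch below reads them from a mapping of environment variables and falls back to the defaults listed above. The `Settings` class and `load_settings` function are hypothetical names for this example; the repository's own `src/config.py` may be organised quite differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Connection settings (hypothetical container, not the repo's config class)."""
    neo4j_bolt_uri: str
    neo4j_user: str
    neo4j_password: str
    qdrant_host: str
    qdrant_port: int
    qdrant_collection: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    `env` defaults to os.environ; missing keys fall back to the
    defaults shown in the connection parameters above.
    """
    env = os.environ if env is None else env
    return Settings(
        neo4j_bolt_uri=env.get("NEO4J_BOLT_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "password"),
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        qdrant_collection=env.get("QDRANT_COLLECTION", "document_chunks"),
    )
```

Calling `load_settings()` with no argument reads the process environment; passing a plain dict makes the function easy to exercise in tests without touching real environment variables.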

Features

- **Document Processing**: Parse and chunk Markdown documents with YAML frontmatter
- **Semantic Search**: Vector-based similarity search using transformer models
- **Graph-based Navigation**: Explore document relationships using Neo4j graph database
- **Hybrid Search**: Combine semantic and graph-based approaches for better results
- **External Integration**: Ready-to-use tools for integration with external systems
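
The README does not spell out how the hybrid step merges the two signals. One plausible approach, sketched below purely as an assumption, is to take the documents returned by vector search and pass a discounted share of each score to the documents it is related to in the graph. The function name `hybrid_rank` and the `graph_weight` parameter are invented for this example and are not taken from `src/query_engine.py`.

```python
def hybrid_rank(vector_hits, related, graph_weight=0.5, limit=5):
    """Rank documents from vector hits plus one hop of graph expansion.

    vector_hits:  list of (doc_id, similarity) pairs from a vector search.
    related:      dict mapping doc_id -> iterable of related doc_ids.
    graph_weight: fraction of a hit's score passed on to each related document.
    """
    scores = {}
    # Keep the best similarity seen for each directly matched document.
    for doc_id, sim in vector_hits:
        scores[doc_id] = max(scores.get(doc_id, 0.0), sim)

    # Each neighbour receives the largest discounted score pointing at it.
    boosts = {}
    for doc_id, sim in scores.items():
        for neighbour in related.get(doc_id, ()):
            boosts[neighbour] = max(boosts.get(neighbour, 0.0), sim * graph_weight)

    # Add the graph boost on top of any direct vector score.
    for doc_id, boost in boosts.items():
        scores[doc_id] = scores.get(doc_id, 0.0) + boost

    # Highest score first; ties broken by document ID for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

With this scheme a document that is both a moderate vector match and a neighbour of a strong match can outrank the strong match itself, which is the kind of behaviour a purely vector-based search cannot produce.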

Project Structure

```text
graphrag/
├── src/                          # Source code
│   ├── config.py                 # Configuration management
│   ├── query_engine.py           # Hybrid query engine
│   ├── database/                 # Database managers
│   │   ├── neo4j_manager.py      # Neo4j database manager
│   │   └── qdrant_manager.py     # Qdrant vector database manager
│   └── processors/               # Data processors
│       ├── document_processor.py # Document parsing and chunking
│       └── embedding_processor.py # Text embedding generation
├── scripts/                      # Utility scripts
│   ├── import_docs.py            # Document import script
│   └── query_demo.py             # Query demonstration script
├── your_docs_here/               # Add your markdown documents here
├── data/                         # Data storage directory
├── guides/                       # User guides and documentation
├── test_db_connection/           # Database connection testing
├── docker-compose.yml            # Docker-compose for Neo4j and Qdrant
├── requirements.txt              # Python dependencies
└── .env.example                  # Example environment variables
```

Setup

Prerequisites

- Python 3.9+
- Docker and Docker Compose
- Neo4j 5.x
- Qdrant 1.5.0+

Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/graphrag.git
   cd graphrag
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create the configuration file:

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

5. Start Neo4j and Qdrant using Docker:

   ```bash
   docker-compose up -d
   ```

Importing Documents

To import documents into the system:

```bash
python scripts/import_docs.py --docs-dir ./your_docs_here --recursive
```

This will:

1. Process all Markdown files in the directory
2. Extract metadata from YAML frontmatter
3. Chunk the documents into manageable pieces
4. Store document metadata and relationships in Neo4j
5. Generate embeddings and store them in Qdrant
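
To make the file discovery and chunking steps more concrete, here is a minimal sketch using only the standard library. It is not the code in `scripts/import_docs.py`: the function names, the hash-based document ID scheme, and the choice of fixed-size character windows with overlap are all assumptions for illustration, and the real importer may well chunk by heading or by token count instead. The embedding and storage steps are left out because they need the running databases.

```python
import hashlib
from pathlib import Path


def find_markdown(docs_dir, recursive=True):
    """Return Markdown files under docs_dir, sorted for a stable import order."""
    root = Path(docs_dir)
    pattern = "**/*.md" if recursive else "*.md"
    return sorted(p for p in root.glob(pattern) if p.is_file())


def doc_id(path, root):
    """Derive a stable ID from the path relative to the docs root.

    Hypothetical scheme: 'doc_' plus the first 8 hex digits of a SHA-1 hash.
    """
    rel = Path(path).relative_to(root).as_posix()
    return "doc_" + hashlib.sha1(rel.encode("utf-8")).hexdigest()[:8]


def chunk_text(text, size=800, overlap=100):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.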

Usage

Running Queries

Use the query demo script to explore the system:

```bash
# Hybrid search
python scripts/query_demo.py --query "What is GraphRAG?" --type hybrid --limit 5

# Category search
python scripts/query_demo.py --query "documentation" --type category --category "user-guide"

# Get document by ID
python scripts/query_demo.py --document "doc_123456"

# List all categories
python scripts/query_demo.py --list-categories

# Show system statistics
python scripts/query_demo.py --stats
```

External Integration

To integrate with external systems, use the provided Python modules in the src directory. See the guides in the guides/mcp directory for detailed integration instructions.

Document Format Requirements

The system processes Markdown files with YAML frontmatter. For optimal results, follow this format:

Required Front Matter Format

````markdown
---
title: Analytics and Monitoring              # Document title (required)
category: frontend/ux                        # Category path (required)
updated: '2023-04-01'                        # Last updated date (optional)
related:                                     # Related documents (optional)
- ui/DATA_FETCHING.md
- ui/STATE_MANAGEMENT.md
- ux/USER_FLOWS.md
key_concepts:                                # Key concepts for indexing (optional)
- analytics_integration
- user_behavior_tracking
- performance_monitoring
---

# Analytics and Monitoring

This document outlines the approach to analytics and monitoring within the application.

## Analytics Strategy

### Core Principles

The analytics implementation adheres to these principles:

- **Purpose-Driven**: Collection tied to specific business or UX questions
- **Privacy-First**: Minimal data collection with clear user consent

## Performance Monitoring

Code examples should use language identifiers:

```javascript
function trackEvent(eventName, properties) {
  analytics.track(eventName, {
    timestamp: new Date().toISOString(),
    ...properties
  });
}
```
````

### Document Structure Best Practices

- Start with a single `# Title` (H1) heading after the front matter
- Use proper heading hierarchy (`##`, `###`, etc.)
- Include code blocks with language identifiers
- Use lists, tables, and other markdown features as needed
- Link to related documents where appropriate
- Include key concepts that might be important for retrieval

The system will process these documents by:
1. Parsing the front matter metadata
2. Extracting hierarchical structure from headings
3. Splitting content into appropriate chunks
4. Creating relationships based on the "related" field
5. Indexing key concepts for enhanced retrieval
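
The first, second, and fourth steps can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the contents of `src/processors/document_processor.py`: the tiny front matter parser handles only `key: value` pairs and `- item` lists (a real importer would use a full YAML parser), and the `RELATED_TO` relationship name is invented for the example.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # Marker that opens or closes a fenced code block.


def split_front_matter(text):
    """Return (front_matter_lines, body) for a document that starts with '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text  # No closing delimiter: treat everything as body.


def parse_simple_yaml(fm_lines):
    """Parse only `key: value` pairs and `- item` lists (not general YAML)."""
    meta, current = {}, None
    for raw in fm_lines:
        line = raw.split(" #", 1)[0].strip()  # Drop trailing comments.
        if not line:
            continue
        if line.startswith("- ") and isinstance(meta.get(current), list):
            meta[current].append(line[2:].strip())
        elif ":" in line:
            key, value = line.split(":", 1)
            current = key.strip()
            value = value.strip().strip("'\"")
            # A key with no inline value starts a list.
            meta[current] = value if value else []
    return meta


def heading_paths(body):
    """Return (level, path) per heading, where path lists its ancestor titles."""
    stack, out, in_fence = [], [], False
    for line in body.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # Ignore '#' lines inside code fences.
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2).strip()
            # Pop siblings and deeper headings before pushing this one.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, title))
            out.append((level, [t for _, t in stack]))
    return out


def related_edges(source, meta):
    """Turn the 'related' list into (source, relationship, target) triples."""
    targets = meta.get("related") or []
    if isinstance(targets, str):
        targets = [targets]
    return [(source, "RELATED_TO", target) for target in targets]
```

Keeping the heading path with each section lets a chunk carry its context (for example `Analytics and Monitoring > Analytics Strategy > Core Principles`) into the vector store, and the edge triples map naturally onto relationships between document nodes in the graph.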

## Configuration

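Configure the system by setting environment variables or using a `.env` file. The Neo4j variable names listed here (`NEO4J_URI`, `NEO4J_USERNAME`) differ from those under Connection Parameters (`NEO4J_BOLT_URI`, `NEO4J_USER`); check `.env.example` to confirm which names the code reads.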

- **Neo4j Configuration**: 
  - `NEO4J_URI=bolt://localhost:7687`
  - `NEO4J_HTTP_URI=http://localhost:7474`
  - `NEO4J_USERNAME=neo4j`
  - `NEO4J_PASSWORD=password`

- **Qdrant Configuration**: 
  - `QDRANT_HOST=localhost`
  - `QDRANT_PORT=6333`
  - `QDRANT_COLLECTION=document_chunks`

- **Embedding Configuration**: Model settings for text embeddings
- **Chunking Configuration**: Document chunking parameters

## Verification

After setup, verify database connections:

```bash
python test_db_connection/test_connections.py
```
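
If that script reports a failure, a quick way to narrow it down is to check whether the ports are reachable at all. The sketch below is a rough pre-check using only the standard library, assuming the default localhost ports from the connection table. It is not the repository's test script, and it only confirms that something is listening on each port, not that credentials or collections are valid.

```python
import socket

# Default ports from the connection table; adjust if your setup differs.
SERVICES = {
    "Neo4j HTTP": ("localhost", 7474),
    "Neo4j Bolt": ("localhost", 7687),
    "Qdrant HTTP": ("localhost", 6333),
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "reachable" if port_open(host, port) else "NOT reachable"
        print(f"{name:<12} {host}:{port}  {status}")
```

A port that is not reachable usually means the containers are not running, so `docker-compose up -d` is the first thing to retry.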

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

- Neo4j for graph database
- Qdrant for vector similarity search
- HuggingFace for transformer models
