graphrag-hybrid

Name: graphrag-hybrid
Author: rileylemm

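graphrag-hybrid is a graph processing library developed in Python that handles complex data structures efficiently. It takes a hybrid approach, integrating a range of algorithms to flexibly provide the functionality users need. It is useful to developers and data scientists as a tool for data visualization and analysis.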

README

GraphRAG: Hybrid Neo4j and Qdrant Retrieval System

A retrieval-augmented generation (RAG) system that combines the Neo4j graph database and the Qdrant vector database for document retrieval. The hybrid approach uses both document relationships and vector similarity to improve search results.

AI Agents: If you're an AI agent exploring this repository, start with AI_ENTRY.md for a comprehensive overview.

System Overview

GraphRAG uses two complementary databases:

Neo4j Graph Database: Stores document relationships, categories, and metadata
Qdrant Vector Database: Stores document chunk embeddings for semantic search

Verified Database Connection Information

Database	Service	Port	Authentication
Neo4j	HTTP	7474	neo4j/password
Neo4j	Bolt	7687	neo4j/password
Qdrant	HTTP	6333	None (default)
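
Before running the import or query scripts, it can help to confirm that the ports in the table accept connections. The following is a minimal sketch using only the Python standard library; the host name `localhost` and the helper names `is_port_open` and `check_services` are assumptions for illustration, not part of this repository. It checks TCP reachability only and does not verify credentials.

```python
import socket

# Ports come from the table above; the host is an assumption (local Docker setup).
SERVICES = {
    "neo4j-http": ("localhost", 7474),
    "neo4j-bolt": ("localhost", 7687),
    "qdrant-http": ("localhost", 6333),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(services=SERVICES, timeout: float = 1.0) -> dict:
    """Map each service name to whether its port accepted a connection."""
    return {
        name: is_port_open(host, port, timeout)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'unreachable'}")
```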

Connection Parameters

For use in applications:

# Neo4j Configuration
NEO4J_HTTP_URI=http://localhost:7474
NEO4J_BOLT_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

# Qdrant Configuration
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=document_chunks
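
An application might read these variables as in the sketch below. It is illustrative only and is not the project's `src/config.py`; the `Settings` class and `load_settings` function are hypothetical names. Because this README spells the Neo4j variables two ways (`NEO4J_BOLT_URI`/`NEO4J_USER` here, `NEO4J_URI`/`NEO4J_USERNAME` under Configuration), the sketch accepts either spelling and falls back to the defaults listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Settings:
    neo4j_bolt_uri: str
    neo4j_user: str
    neo4j_password: str
    qdrant_host: str
    qdrant_port: int
    qdrant_collection: str


def _first(env: Mapping[str, str], names: Sequence[str], default: str) -> str:
    """Return the first variable in names that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return default


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        # Both spellings appear in this README, so either is accepted.
        neo4j_bolt_uri=_first(env, ["NEO4J_BOLT_URI", "NEO4J_URI"], "bolt://localhost:7687"),
        neo4j_user=_first(env, ["NEO4J_USER", "NEO4J_USERNAME"], "neo4j"),
        neo4j_password=_first(env, ["NEO4J_PASSWORD"], "password"),
        qdrant_host=_first(env, ["QDRANT_HOST"], "localhost"),
        qdrant_port=int(_first(env, ["QDRANT_PORT"], "6333")),
        qdrant_collection=_first(env, ["QDRANT_COLLECTION"], "document_chunks"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.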

Features

Document Processing: Parse and chunk Markdown documents with YAML frontmatter
Semantic Search: Vector-based similarity search using transformer models
Graph-based Navigation: Explore document relationships using Neo4j graph database
Hybrid Search: Combine semantic and graph-based approaches for better results
External Integration: Ready-to-use tools for integration with external systems
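
To illustrate the idea behind hybrid search, the sketch below merges vector similarity scores with a boost for documents that a hit links to in the graph. This README does not describe how the project's query engine actually combines the two signals, so the scoring rule, the `graph_weight` parameter, and the function name `hybrid_rank` are all assumptions. The inputs are plain dictionaries standing in for Qdrant and Neo4j results.

```python
from typing import Dict, List, Set, Tuple


def hybrid_rank(
    vector_scores: Dict[str, float],
    related: Dict[str, Set[str]],
    graph_weight: float = 0.3,
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Rank documents by similarity plus a boost inherited from related hits."""
    # Best boost each document receives from a vector hit that links to it.
    boosts: Dict[str, float] = {}
    for doc_id, score in vector_scores.items():
        for neighbour in related.get(doc_id, set()):
            boosts[neighbour] = max(boosts.get(neighbour, 0.0), graph_weight * score)

    candidates = set(vector_scores) | set(boosts)
    combined = {
        doc_id: vector_scores.get(doc_id, 0.0) + boosts.get(doc_id, 0.0)
        for doc_id in candidates
    }
    # Highest score first; ties are broken by document ID for stable output.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical data: scores as a vector search might return them, and
    # "related" links as they might be read from the graph.
    hits = {"intro.md": 0.9, "setup.md": 0.5}
    links = {"intro.md": {"setup.md", "faq.md"}}
    for doc_id, score in hybrid_rank(hits, links):
        print(f"{doc_id}: {score:.2f}")
```

With this rule, a document that was not a vector hit can still surface if a strongly matching document links to it, which is the kind of result a purely semantic search would miss.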

Project Structure

graphrag/
├── src/                          # Source code
│   ├── config.py                 # Configuration management
│   ├── query_engine.py           # Hybrid query engine
│   ├── database/                 # Database managers
│   │   ├── neo4j_manager.py      # Neo4j database manager
│   │   └── qdrant_manager.py     # Qdrant vector database manager
│   └── processors/               # Data processors
│       ├── document_processor.py # Document parsing and chunking
│       └── embedding_processor.py # Text embedding generation
├── scripts/                      # Utility scripts
│   ├── import_docs.py            # Document import script
│   └── query_demo.py             # Query demonstration script
├── your_docs_here/               # Add your markdown documents here
├── data/                         # Data storage directory
├── guides/                       # User guides and documentation
├── test_db_connection/           # Database connection testing
├── docker-compose.yml            # Docker-compose for Neo4j and Qdrant
├── requirements.txt              # Python dependencies
└── .env.example                  # Example environment variables

Setup

Prerequisites

Python 3.9+
Docker and Docker Compose
Neo4j 5.x
Qdrant 1.5.0+

Installation

Clone the repository:

git clone https://github.com/yourusername/graphrag.git
cd graphrag

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Create configuration file:

cp .env.example .env
# Edit .env with your configuration

Start Neo4j and Qdrant using Docker:

docker-compose up -d

Importing Documents

To import documents into the system:

python scripts/import_docs.py --docs-dir ./your_docs_here --recursive

This will:

Process all Markdown files in the directory
Extract metadata from YAML frontmatter
Chunk the documents into manageable pieces
Store document metadata and relationships in Neo4j
Generate embeddings and store them in Qdrant
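
The chunking step can be pictured with the sketch below, which splits text into overlapping word windows. The window size, the overlap, and the function name `chunk_text` are assumptions chosen for illustration; the project's document processor may split on headings, tokens, or other boundaries instead.

```python
from typing import List


def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of at most max_words words.

    Consecutive windows share `overlap` words so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, max_words=4, overlap=1):
        print(chunk)
```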

Usage

Running Queries

Use the query demo script to explore the system:

# Hybrid search
python scripts/query_demo.py --query "What is GraphRAG?" --type hybrid --limit 5

# Category search
python scripts/query_demo.py --query "documentation" --type category --category "user-guide"

# Get document by ID
python scripts/query_demo.py --document "doc_123456"

# List all categories
python scripts/query_demo.py --list-categories

# Show system statistics
python scripts/query_demo.py --stats

External Integration

To integrate with external systems, use the provided Python modules in the src directory. See the guides in the guides/mcp directory for detailed integration instructions.

Document Format Requirements

The system processes Markdown files with YAML frontmatter. For optimal results, follow this format:

Required Front Matter Format

---
title: Analytics and Monitoring              # Document title (required)
category: frontend/ux                        # Category path (required)
updated: '2023-04-01'                        # Last updated date (optional)
related:                                     # Related documents (optional)
- ui/DATA_FETCHING.md
- ui/STATE_MANAGEMENT.md
- ux/USER_FLOWS.md
key_concepts:                                # Key concepts for indexing (optional)
- analytics_integration
- user_behavior_tracking
- performance_monitoring
---

# Analytics and Monitoring

This document outlines the approach to analytics and monitoring within the application.

## Analytics Strategy

### Core Principles

The analytics implementation adheres to these principles:

- **Purpose-Driven**: Collection tied to specific business or UX questions
- **Privacy-First**: Minimal data collection with clear user consent

## Performance Monitoring

Code examples should use language identifiers:

```javascript
function trackEvent(eventName, properties) {
  analytics.track(eventName, {
    timestamp: new Date().toISOString(),
    ...properties
  });
}
```


### Document Structure Best Practices

- Start with a single `# Title` (H1) heading after the front matter
- Use proper heading hierarchy (`##`, `###`, etc.)
- Include code blocks with language identifiers
- Use lists, tables, and other markdown features as needed
- Link to related documents where appropriate
- Include key concepts that might be important for retrieval

The system will process these documents by:
1. Parsing the front matter metadata
2. Extracting hierarchical structure from headings
3. Splitting content into appropriate chunks
4. Creating relationships based on the "related" field
5. Indexing key concepts for enhanced retrieval
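
Steps 1, 2, and 4 can be sketched with the standard library alone. This is not the project's `document_processor.py`; the function names are hypothetical, and the front matter parser handles only the subset of YAML shown in the example above (scalar `key: value` pairs, `- item` lists, and trailing `#` comments). A real implementation would more likely use a YAML library such as PyYAML.

```python
import re
from typing import Dict, List, Tuple, Union

FrontMatter = Dict[str, Union[str, List[str]]]
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_front_matter(text: str) -> Tuple[FrontMatter, str]:
    """Separate a leading '---' delimited block from the Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat everything as body

    meta: FrontMatter = {}
    current_key = None
    for raw in lines[1:end]:
        # Drop trailing comments; a value containing " #" would be truncated.
        line = re.sub(r"\s+#.*$", "", raw).strip()
        if not line:
            continue
        if line.startswith("- ") and isinstance(meta.get(current_key), list):
            meta[current_key].append(line[2:].strip().strip("'\""))
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip().strip("'\"")
            # A key with no value starts a list filled by the "- item" lines.
            meta[current_key] = value if value else []
    return meta, "\n".join(lines[end + 1:])


def extract_headings(body: str) -> List[Tuple[int, str]]:
    """Return (level, title) for each '#' heading outside fenced code blocks."""
    headings = []
    in_fence = False
    for line in body.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        match = HEADING.match(line)
        if match and not in_fence:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


def related_edges(doc_path: str, meta: FrontMatter) -> List[Tuple[str, str]]:
    """Turn the 'related' list into (source, target) pairs for the graph."""
    related = meta.get("related", [])
    return [(doc_path, target) for target in related] if isinstance(related, list) else []
```

Each `(source, target)` pair corresponds to one relationship that could be written to Neo4j, and the heading list gives the hierarchy a chunker could use to keep sections together.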

## Configuration

Configure the system by setting environment variables or using a `.env` file:

- **Neo4j Configuration**: 
  - `NEO4J_URI=bolt://localhost:7687`
  - `NEO4J_HTTP_URI=http://localhost:7474`
  - `NEO4J_USERNAME=neo4j`
  - `NEO4J_PASSWORD=password`

- **Qdrant Configuration**: 
  - `QDRANT_HOST=localhost`
  - `QDRANT_PORT=6333`
  - `QDRANT_COLLECTION=document_chunks`

- **Embedding Configuration**: Model settings for text embeddings
- **Chunking Configuration**: Document chunking parameters

## Verification

After setup, verify database connections:

```bash
python test_db_connection/test_connections.py
```

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Neo4j for graph database
Qdrant for vector similarity search
HuggingFace for transformer models
