Modern RAG for your codebase - semantic and regex search via MCP.
Transform your codebase into a searchable knowledge base for AI assistants, combining semantic search (built on the cAST algorithm) with regex search. ChunkHound integrates with AI assistants via the Model Context Protocol (MCP).
Features
- cAST Algorithm - Research-backed semantic code chunking
- Multi-Hop Semantic Search - Discovers interconnected code relationships beyond direct matches
- Semantic search - Natural language queries like "find authentication code"
- Regex search - Pattern matching without API keys
- Local-first - Your code stays on your machine
- 22 languages with structured parsing:
  - Programming (via Tree-sitter): Python, JavaScript, TypeScript, JSX, TSX, Java, Kotlin, Groovy, C, C++, C#, Go, Rust, Bash, MATLAB, Makefile
  - Configuration (via Tree-sitter): JSON, YAML, TOML, Markdown
  - Text-based (custom parsers): Text files, PDF
- MCP integration - Works with Claude, VS Code, Cursor, Windsurf, Zed, and other MCP clients
Documentation
Visit ofriw.github.io/chunkhound for complete guides.
Requirements
- Python 3.10+
- uv package manager
- API key for semantic search (optional - regex search works without any keys)
Installation
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install ChunkHound
uv tool install chunkhound
Quick Start
Option 1: With Embeddings (Recommended)
- Create a .chunkhound.json file in the project root:
{
  "embedding": {
    "provider": "openai",
    "api_key": "your-api-key-here"
  }
}
- Index your codebase
chunkhound index
Option 2: Without embeddings (regex search only)
chunkhound index --no-embeddings
For configuration, IDE setup, and advanced usage, see the documentation.
Real-Time Indexing
Automatic File Watching: MCP servers monitor your codebase and update the index automatically as you edit files. No manual re-indexing required.
Smart Content Diffs: Only changed code chunks get re-processed. Unchanged chunks keep their existing embeddings, making updates efficient even for large codebases.
Seamless Branch Switching: When you switch git branches, ChunkHound automatically detects and re-indexes only the files that actually changed between branches.
Live Memory Systems: Index markdown notes or documentation that update in real time while you work, creating a dynamic knowledge base.
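The "Smart Content Diffs" idea can be illustrated with content hashing. The sketch below is not ChunkHound's actual implementation; the function names (`chunk_key`, `diff_chunks`) and the hash-to-embedding dictionary are assumptions made for illustration. It shows one plausible way to decide which chunks can keep their embeddings after a file changes.

```python
import hashlib


def chunk_key(text: str) -> str:
    """Return a stable fingerprint of a chunk's content (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_chunks(old_index: dict[str, list[float]], new_chunks: list[str]):
    """Compare freshly parsed chunks against a previous index.

    old_index maps content hash -> previously computed embedding.
    Returns (reused, to_embed, stale):
      reused   - hash -> embedding for chunks whose content is unchanged
      to_embed - chunk texts that are new or modified and need an embedding
      stale    - hashes in the old index that no longer appear in the file
    """
    reused: dict[str, list[float]] = {}
    to_embed: list[str] = []
    for text in new_chunks:
        key = chunk_key(text)
        if key in old_index:
            reused[key] = old_index[key]  # unchanged: keep the embedding
        else:
            to_embed.append(text)  # changed or new: needs re-processing
    stale = set(old_index) - {chunk_key(text) for text in new_chunks}
    return reused, to_embed, stale


# Usage: one chunk unchanged, one replaced.
old = {chunk_key("def a(): pass"): [0.1], chunk_key("def b(): pass"): [0.2]}
reused, to_embed, stale = diff_chunks(old, ["def a(): pass", "def c(): pass"])
```

Only the chunks in `to_embed` would be sent to the embedding provider, which is why edits to a single function stay cheap even in a large codebase.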
Why ChunkHound?
Research Foundation: Built on the cAST (Chunking via Abstract Syntax Trees) algorithm from Carnegie Mellon University, providing:
- 4.3-point gain in Recall@5 on RepoEval retrieval
- 2.67-point gain in Pass@1 on SWE-bench generation
- Structure-aware chunking that preserves code meaning
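To give a feel for structure-aware chunking, here is a simplified sketch using Python's standard `ast` module. It is an illustration only, not ChunkHound's code (which parses with Tree-sitter): the helper names and the `max_size` budget are assumptions. It splits oversized syntax nodes into their children and greedily merges small sibling nodes, so chunk boundaries fall on statement boundaries rather than at arbitrary line counts.

```python
import ast


def size(text: str) -> int:
    """Measure a chunk by its non-whitespace character count."""
    return sum(1 for ch in text if not ch.isspace())


def node_text(source: str, node: ast.AST) -> str:
    """Return the source text covered by an AST node."""
    return ast.get_source_segment(source, node) or ""


def chunk_nodes(source: str, nodes: list, max_size: int) -> list[str]:
    """Split-then-merge over sibling statements (hypothetical helper).

    Simplification: when recursing into an oversized node, its header line
    (e.g. `class C:`) is dropped and only its body is chunked. A node that is
    oversized but has no statement body becomes a chunk of its own.
    """
    chunks: list[str] = []
    current: list[str] = []
    for node in nodes:
        text = node_text(source, node)
        body = getattr(node, "body", None)
        if size(text) > max_size and body:
            # Too large for one chunk: flush what we have, then split the node.
            if current:
                chunks.append("\n".join(current))
                current = []
            chunks.extend(chunk_nodes(source, body, max_size))
        elif current and size("\n".join(current + [text])) > max_size:
            # Merging would exceed the budget: start a new chunk.
            chunks.append("\n".join(current))
            current = [text]
        else:
            # Small sibling: merge it into the current chunk.
            current.append(text)
    if current:
        chunks.append("\n".join(current))
    return chunks


source = (
    "import os\n\nX = 1\n\n"
    "def f():\n    return os.getcwd()\n\n"
    "class C:\n"
    "    def a(self):\n        return 1\n\n"
    "    def b(self):\n        return 2\n"
)
chunks = chunk_nodes(source, ast.parse(source).body, max_size=40)
for chunk in chunks:
    print(size(chunk), repr(chunk))
```

In this toy input the import, the constant, and `f` are merged into one chunk, while the class exceeds the budget and is split into its methods, so no function is cut in half.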
Local-First Architecture:
- Your code never leaves your machine
- Works offline with Ollama local models
- No per-token charges for large codebases
Universal Language Support:
- Structured parsing for 22 languages (Tree-sitter + custom parsers)
- Same semantic concepts across all programming languages
Intelligent Code Discovery:
- Multi-hop search follows semantic relationships to find related implementations
- Automatically discovers complete feature patterns: find "authentication" to get password hashing, token validation, session management
- Convergence detection prevents semantic drift while maximizing discovery
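The multi-hop idea above can be sketched with toy vectors. This is an assumed, minimal model rather than ChunkHound's actual algorithm: the function names, the `drift_floor` threshold, and the hand-written embeddings are all hypothetical. Each hop uses the chunks found so far as new queries, a similarity floor against the original query limits drift, and the search stops once a hop discovers nothing new.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k):
    """Names of the k chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


def multi_hop_search(query, index, k=1, max_hops=3, drift_floor=0.3):
    """Expand results hop by hop (hypothetical sketch).

    drift_floor: a neighbor is kept only if it is still at least this
    similar to the ORIGINAL query, which limits semantic drift.
    """
    found = top_k(query, index, k)  # hop 1: direct matches
    frontier = list(found)
    for _ in range(max_hops - 1):
        next_frontier = []
        for name in frontier:
            # k + 1 because a chunk is always its own nearest neighbor.
            for neighbor in top_k(index[name], index, k + 1):
                if neighbor in found:
                    continue
                if cosine(query, index[neighbor]) < drift_floor:
                    continue  # too far from the original intent
                found.append(neighbor)
                next_frontier.append(neighbor)
        if not next_frontier:
            break  # converged: this hop found nothing new
        frontier = next_frontier
    return found


# Toy embeddings, invented for illustration.
index = {
    "login": [0.9, 0.1, 0.0],
    "hash_password": [0.7, 0.7, 0.0],
    "validate_token": [0.5, 0.85, 0.1],
    "render_chart": [0.0, 0.1, 1.0],
}
results = multi_hop_search([1.0, 0.0, 0.0], index)
print(results)
```

With these vectors, a direct search with `k=1` would return only `login`; the extra hops reach `hash_password` and then `validate_token`, while the unrelated `render_chart` is never pulled in.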
License
MIT