chunkhound

Author: Ofri Wolfus

Modern RAG for your codebase - semantic and regex search via MCP

README

Transform your codebase into a searchable knowledge base for AI assistants, combining semantic search (built on the cAST chunking algorithm) with regex search. ChunkHound integrates with AI assistants via the Model Context Protocol (MCP).

Features

cAST Algorithm - Research-backed semantic code chunking
Multi-Hop Semantic Search - Discovers interconnected code relationships beyond direct matches
Semantic search - Natural language queries like "find authentication code"
Regex search - Pattern matching without API keys
Local-first - Your code stays on your machine
22 languages with structured parsing
- Programming (via Tree-sitter): Python, JavaScript, TypeScript, JSX, TSX, Java, Kotlin, Groovy, C, C++, C#, Go, Rust, Bash, MATLAB, Makefile
- Configuration (via Tree-sitter): JSON, YAML, TOML, Markdown
- Text-based (custom parsers): Text files, PDF
MCP integration - Works with Claude, VS Code, Cursor, Windsurf, Zed, and others

Documentation

Visit ofriw.github.io/chunkhound for complete guides.

Requirements

Python 3.10+
uv package manager
API key for semantic search (optional - regex search works without any keys)
- OpenAI | VoyageAI | Local with Ollama

Installation

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install ChunkHound
uv tool install chunkhound

Quick Start

Option 1: With Embeddings (Recommended)

Create a .chunkhound.json file in the project root

{
  "embedding": {
    "provider": "openai",
    "api_key": "your-api-key-here"
  }
}
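If you would rather not paste the key by hand, the same file can be generated from a script. The following is a minimal sketch, not part of ChunkHound: `build_config` and `write_config` are hypothetical helper names, and only the `embedding`, `provider`, and `api_key` fields shown above are assumed to exist.

```python
import json
from pathlib import Path


def build_config(api_key: str, provider: str = "openai") -> dict:
    """Return the config structure from the Quick Start as a plain dict."""
    return {"embedding": {"provider": provider, "api_key": api_key}}


def write_config(project_root: Path, api_key: str) -> Path:
    """Write .chunkhound.json into project_root and return the file path."""
    path = project_root / ".chunkhound.json"
    path.write_text(json.dumps(build_config(api_key), indent=2) + "\n")
    return path
```

Calling `write_config(Path.cwd(), os.environ["OPENAI_API_KEY"])` (after `import os`, and with the variable name being an assumption of this example) would keep the key out of your shell history. Since the file then holds a secret, it is probably worth excluding from version control.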

Index your codebase

chunkhound index

Option 2: Without Embeddings (Regex Search Only)

chunkhound index --no-embeddings

For configuration, IDE setup, and advanced usage, see the documentation.

Real-Time Indexing

Automatic File Watching: MCP servers monitor your codebase and update the index automatically as you edit files. No manual re-indexing required.

Smart Content Diffs: Only changed code chunks get re-processed. Unchanged chunks keep their existing embeddings, making updates efficient even for large codebases.
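One common way to get this behaviour is to fingerprint each chunk's content and reuse any embedding whose fingerprint is already known. The sketch below illustrates that general idea under stated assumptions; it is not ChunkHound's actual implementation, and `chunk_key` and `plan_update` are hypothetical names.

```python
import hashlib


def chunk_key(text: str) -> str:
    """Stable fingerprint of a chunk's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(
    cache: dict[str, list[float]], chunks: list[str]
) -> tuple[dict[str, list[float]], list[str]]:
    """Split chunks into (reused embeddings, texts that need embedding).

    cache maps fingerprint -> embedding from the previous index run.
    """
    reused: dict[str, list[float]] = {}
    to_embed: list[str] = []
    for text in chunks:
        key = chunk_key(text)
        if key in cache:
            reused[key] = cache[key]  # unchanged chunk: keep its embedding
        else:
            to_embed.append(text)  # new or edited chunk: must be re-embedded
    return reused, to_embed
```

With this scheme, editing one function in a large file sends only that function's chunk back to the embedding provider; every other chunk hits the cache.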

Seamless Branch Switching: When you switch git branches, ChunkHound automatically detects and re-indexes only the files that actually changed between branches.

Live Memory Systems: Index Markdown notes or documentation that update in real time while you work, creating a dynamic knowledge base.

Why ChunkHound?

Research Foundation: Built on the cAST (Chunking via Abstract Syntax Trees) algorithm from Carnegie Mellon University, providing:

4.3-point gain in Recall@5 on RepoEval retrieval
2.67-point gain in Pass@1 on SWE-bench generation
Structure-aware chunking that preserves code meaning
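To give a feel for what structure-aware chunking means, here is a simplified sketch of a split-then-merge pass using Python's standard `ast` module. It is an illustration only: ChunkHound parses with Tree-sitter, and this is neither its implementation nor the paper's exact procedure. The helper names and the choice to measure size in non-whitespace characters are assumptions of the example.

```python
import ast


def size(text: str) -> int:
    """Chunk size measured in non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())


def chunk_nodes(source: str, nodes: list[ast.stmt], budget: int) -> list[str]:
    """Greedily merge sibling statements; recurse into oversized ones."""
    chunks: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            chunks.append("\n".join(current))
            current.clear()

    for node in nodes:
        text = ast.get_source_segment(source, node) or ""
        body = getattr(node, "body", None)
        if size(text) > budget and body:
            # Too big for one chunk: split along the node's own children.
            # Simplification: the header line and any else-branches are dropped.
            flush()
            chunks.extend(chunk_nodes(source, body, budget))
        elif size("\n".join(current + [text])) > budget:
            flush()  # adding this sibling would overflow the budget
            current.append(text)
        else:
            current.append(text)  # merge with preceding siblings
    flush()
    return chunks


def chunk_source(source: str, budget: int) -> list[str]:
    return chunk_nodes(source, ast.parse(source).body, budget)
```

The point of the design is that chunk boundaries always fall between syntactic units (statements, functions, classes), so a chunk never starts or ends in the middle of an expression the way a fixed-size window can.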

Local-First Architecture:

Your code never leaves your machine
Works offline with Ollama local models
No per-token charges for large codebases

Universal Language Support:

Structured parsing for 22 languages (Tree-sitter + custom parsers)
Same semantic concepts across all programming languages

Intelligent Code Discovery:

Multi-hop search follows semantic relationships to find related implementations
Automatically discovers complete feature patterns: find "authentication" to get password hashing, token validation, session management
Convergence detection prevents semantic drift while maximizing discovery
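The README does not spell out how multi-hop search works internally, so the following is a toy sketch of the general idea rather than ChunkHound's algorithm. It assumes an in-memory index of name-to-vector pairs, expands from each hit to its nearest neighbours, and stops when a hop yields nothing that is still similar to the original query. All names and thresholds (`multi_hop`, `min_query_sim`, `max_hops`) are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multi_hop(
    query: list[float],
    index: dict[str, list[float]],
    k: int = 2,
    min_query_sim: float = 0.3,
    max_hops: int = 3,
) -> list[str]:
    """Return chunk names in discovery order."""

    def top_k(vec: list[float], exclude: set[str]) -> list[str]:
        scored = sorted(
            ((cosine(vec, v), name) for name, v in index.items() if name not in exclude),
            reverse=True,
        )
        return [name for _, name in scored[:k]]

    found = top_k(query, set())  # hop 0: direct matches
    frontier = list(found)
    for _ in range(max_hops):
        new: list[str] = []
        for name in frontier:
            for cand in top_k(index[name], set(found) | set(new)):
                # Drift guard: a hop result must still relate to the query.
                if cosine(query, index[cand]) >= min_query_sim:
                    new.append(cand)
        if not new:  # converged: this hop discovered nothing relevant
            break
        found.extend(new)
        frontier = new
    return found
```

In this toy model, a query that directly matches only a login handler can still reach password hashing and token validation through neighbouring chunks, while an unrelated chunk is rejected by the drift guard even if it happens to sit near the last hop.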

License

MIT


Tags

agent ai duckdb mcp-server rag semantic-search tree-sitter