ChunkHound

Modern RAG for your codebase - semantic and regex search via MCP.

License: MIT | 100% AI Generated

Transform your codebase into a searchable knowledge base for AI assistants. ChunkHound combines semantic search, built on the cAST chunking algorithm, with regex search, and integrates with AI assistants via the Model Context Protocol (MCP).

Features
  • cAST Algorithm - Research-backed semantic code chunking
  • Multi-Hop Semantic Search - Discovers interconnected code relationships beyond direct matches
  • Semantic search - Natural language queries like "find authentication code"
  • Regex search - Pattern matching without API keys
  • Local-first - Your code stays on your machine
  • 22 languages with structured parsing
    • Programming (via Tree-sitter): Python, JavaScript, TypeScript, JSX, TSX, Java, Kotlin, Groovy, C, C++, C#, Go, Rust, Bash, MATLAB, Makefile
    • Configuration (via Tree-sitter): JSON, YAML, TOML, Markdown
    • Text-based (custom parsers): Text files, PDF
  • MCP integration - Works with Claude, VS Code, Cursor, Windsurf, Zed, and other MCP clients
Documentation

Visit ofriw.github.io/chunkhound for complete guides.

Requirements
Installation
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install ChunkHound
uv tool install chunkhound
Quick Start
Option 1: With Embeddings (Recommended)
  1. Create a .chunkhound.json file in the project root
{
  "embedding": {
    "provider": "openai",
    "api_key": "your-api-key-here"
  }
}
  2. Index your codebase
chunkhound index
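
If you prefer not to paste the key into the file by hand, a small script can generate it. This is a minimal sketch, not part of ChunkHound: the helper names and the OPENAI_API_KEY variable name are assumptions for illustration, and only the JSON keys (embedding, provider, api_key) come from the example above.

```python
import json
import os
from pathlib import Path


def write_chunkhound_config(project_root, api_key):
    """Write a .chunkhound.json containing only the keys shown in the Quick Start."""
    config = {"embedding": {"provider": "openai", "api_key": api_key}}
    path = Path(project_root) / ".chunkhound.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


def config_from_env(project_root=".", env_var="OPENAI_API_KEY"):
    """Read the key from an environment variable (assumed name) and write the config."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} before generating the config")
    return write_chunkhound_config(project_root, api_key)
```

Calling config_from_env() from the project root writes the file next to your code. Because the file then contains a secret, it is usually worth excluding it from version control.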
Option 2: Without embeddings (regex search only)
chunkhound index --no-embeddings

For configuration, IDE setup, and advanced usage, see the documentation.

Real-Time Indexing

Automatic File Watching: MCP servers monitor your codebase and update the index automatically as you edit files. No manual re-indexing required.

Smart Content Diffs: Only changed code chunks get re-processed. Unchanged chunks keep their existing embeddings, making updates efficient even for large codebases.
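
One common way to get this behaviour is to fingerprint each chunk by its content and reuse any embedding whose fingerprint is already stored. The sketch below illustrates that idea only; it is not ChunkHound's implementation, and chunk_key and plan_reindex are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple


def chunk_key(text: str) -> str:
    """Stable fingerprint of a chunk's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(
    old_index: Dict[str, List[float]], new_chunks: List[str]
) -> Tuple[Dict[str, List[float]], List[str], List[str]]:
    """Decide which chunks need a new embedding.

    old_index maps a chunk fingerprint to its stored embedding.
    Returns (reused embeddings, chunks to embed, stale fingerprints to delete).
    """
    reused: Dict[str, List[float]] = {}
    to_embed: List[str] = []
    for text in new_chunks:
        key = chunk_key(text)
        if key in old_index:
            reused[key] = old_index[key]  # unchanged chunk: keep its embedding
        else:
            to_embed.append(text)  # new or edited chunk: needs embedding
    new_keys = {chunk_key(text) for text in new_chunks}
    stale = sorted(key for key in old_index if key not in new_keys)
    return reused, to_embed, stale
```

With this scheme, editing one function in a large file costs one embedding call for the edited chunk, while every untouched chunk is served from the existing index.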

Seamless Branch Switching: When you switch git branches, ChunkHound automatically detects and re-indexes only the files that actually changed between branches.

Live Memory Systems: Index Markdown notes or documentation that update in real time while you work, creating a dynamic knowledge base.

Why ChunkHound?

Research Foundation: Built on the cAST (Chunking via Abstract Syntax Trees) algorithm from Carnegie Mellon University, providing:

  • A 4.3-point gain in Recall@5 on RepoEval retrieval, as reported by the cAST authors
  • A 2.67-point gain in Pass@1 on SWE-bench generation, as reported by the cAST authors
  • Structure-aware chunking that preserves code meaning
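
To make structure-aware chunking concrete, here is a simplified sketch of the split-then-merge idea behind AST-based chunking. It uses Python's standard ast module rather than Tree-sitter, measures chunk size in non-whitespace characters, and drops the enclosing def or class header (and any else branches) when it has to descend into an oversized node. The function names and the default size limit are assumptions; this is not ChunkHound's code.

```python
import ast
from typing import List


def size(text: str) -> int:
    """Chunk size as the count of non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())


def split_nodes(source: str, nodes: List[ast.stmt], max_size: int) -> List[str]:
    """Pack sibling statements greedily; recurse into statements that are too large."""
    chunks: List[str] = []
    current: List[str] = []
    for node in nodes:
        text = ast.get_source_segment(source, node) or ""
        if size(text) > max_size:
            # Flush what we have, then split the oversized node by its body.
            if current:
                chunks.append("\n".join(current))
                current = []
            body = getattr(node, "body", None)
            if isinstance(body, list) and body:
                chunks.extend(split_nodes(source, body, max_size))
            else:
                chunks.append(text)  # indivisible statement: emit as-is
        elif size("\n".join(current + [text])) > max_size:
            # Merging would exceed the budget: close the chunk, start a new one.
            chunks.append("\n".join(current))
            current = [text]
        else:
            current.append(text)  # merge with preceding siblings
    if current:
        chunks.append("\n".join(current))
    return chunks


def chunk_python(source: str, max_size: int = 200) -> List[str]:
    """Chunk Python source along statement boundaries."""
    return split_nodes(source, ast.parse(source).body, max_size)
```

Because chunk boundaries always fall between statements, a function that fits the size budget is never cut in half, which is the property fixed-size line or token windows cannot guarantee.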

Local-First Architecture:

  • Your code never leaves your machine
  • Works offline with Ollama local models
  • No per-token charges for large codebases

Universal Language Support:

  • Structured parsing for 22 languages (Tree-sitter + custom parsers)
  • Same semantic concepts across all programming languages

Intelligent Code Discovery:

  • Multi-hop search follows semantic relationships to find related implementations
  • Automatically discovers complete feature patterns: a search for "authentication" also surfaces password hashing, token validation, and session management
  • Convergence detection prevents semantic drift while maximizing discovery
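
The bullets above can be illustrated with a toy multi-hop search over precomputed embeddings. This is a sketch under stated assumptions, not ChunkHound's algorithm: the function names, the k and max_hops parameters, and the min_query_sim drift threshold are all hypothetical. Each hop expands from the latest hits to their nearest neighbours, keeps only neighbours that are still similar enough to the original query, and stops once a hop adds nothing new.

```python
import math
from typing import Dict, Iterable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int, exclude: Iterable[str] = ()) -> List[str]:
    """Names of the k entries most similar to the query vector."""
    skip = set(exclude)
    scored = [(cosine(query, vec), name) for name, vec in index.items() if name not in skip]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def multi_hop_search(query: Sequence[float], index: Dict[str, Sequence[float]],
                     k: int = 2, max_hops: int = 3,
                     min_query_sim: float = 0.3) -> List[str]:
    """Expand from direct matches to related chunks, hop by hop."""
    found = top_k(query, index, k)  # hop 0: direct matches
    frontier = list(found)
    for _ in range(max_hops):
        new: List[str] = []
        for name in frontier:
            for neighbor in top_k(index[name], index, k, exclude={name}):
                if neighbor in found or neighbor in new:
                    continue
                # Drift guard: a neighbour must still relate to the original query.
                if cosine(query, index[neighbor]) >= min_query_sim:
                    new.append(neighbor)
        if not new:
            break  # converged: this hop discovered nothing new
        found.extend(new)
        frontier = new
    return found
```

With a query vector close to a "login" chunk, such a search can reach "hash_password" and then "validate_token" through successive hops, while an unrelated chunk stays out because no hit lists it as a nearest neighbour or because it fails the drift guard.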
License

MIT