chunker-mcp
This CLI tool and Model Context Protocol (MCP) server chunks source code files using LangChain's RecursiveCharacterTextSplitter and stores the resulting chunks in a ChromaDB vector database, including line range metadata for each chunk.
Chunker
Table of Contents
- Chunker
- Features
- Installation
- Setting up ChromaDB with Docker
- Usage
- Usage (Installed CLI)
- Using Chunker MCP with Claude for Desktop
- Querying Chunks from the CLI
- Querying
- Output
- License
This tool chunks source code files using LangChain's RecursiveCharacterTextSplitter and stores the resulting chunks in a ChromaDB vector database, including line range metadata for each chunk.
Features
- Supports chunking of code files in any language supported by LangChain.
- Stores chunks with path and line range metadata for advanced querying.
- Asynchronous, batched insertion into ChromaDB for performance.
- Command-line interface using Typer.
- Progress and status logging.
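The asynchronous, batched insertion mentioned above can be pictured with a small sketch. It is not the tool's actual implementation: `batched`, `insert_in_batches`, and `fake_insert` are hypothetical names, and `fake_insert` stands in for a real ChromaDB write so the example runs without a server.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def batched(items: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


async def insert_in_batches(
    chunks: list[dict],
    insert: Callable[[list[dict]], Awaitable[None]],
    batch_size: int = 100,
    max_concurrency: int = 4,
) -> int:
    """Send chunks to insert() batch by batch, a few batches at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def send(batch: list[dict]) -> int:
        # The semaphore caps how many batches are in flight at once.
        async with semaphore:
            await insert(batch)
            return len(batch)

    counts = await asyncio.gather(*(send(b) for b in batched(chunks, batch_size)))
    return sum(counts)


async def fake_insert(batch: list[dict]) -> None:
    """Hypothetical stand-in for a ChromaDB write; it only yields to the event loop."""
    await asyncio.sleep(0)
```

Calling `asyncio.run(insert_in_batches(chunks, fake_insert))` returns the number of chunks sent. Batching keeps each request small, and the concurrency cap avoids flooding the database with simultaneous writes.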
Installation
Using uv
Install uv if you don't have it:
curl -LsSf https://astral.sh/uv/install.sh | sh
Create a virtual environment and install dependencies:
uv venv .venv
source .venv/bin/activate
uv sync
Using pipx
Install pipx:
python -m pip install --user pipx
Install this project (from a local directory):
pipx install --editable .
Setting up ChromaDB with Docker
You can run a ChromaDB server using Docker. The following command will start ChromaDB listening on port 8000:
docker run -d --name chromadb -p 8000:8000 chromadb/chroma
- This will pull the latest ChromaDB image and run it in detached mode.
- The server will be accessible at http://localhost:8000.
If you want to persist data between restarts, you can mount a local directory:
docker run -d --name chromadb -p 8000:8000 -v $(pwd)/chroma-data:/chroma/.chroma chromadb/chroma
- This will store ChromaDB data in the chroma-data directory in your current folder.
Stopping and removing the container:
docker stop chromadb
docker rm chromadb
For more information, see the ChromaDB Docker documentation.
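Before running the chunker, it can help to confirm that something is listening on the ChromaDB port. The sketch below is a generic TCP check using only the standard library. It does not call any ChromaDB API, and `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

`port_is_open()` with no arguments checks localhost:8000, the address used by the Docker commands above. A True result only shows that the port accepts connections, not that the service behind it is ChromaDB.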
Usage
python chunker.py chunk-and-vectorise <project_dir> "<pattern>" --language <language>
- <project_dir>: Root directory of the project to search for files (e.g., . or src)
- <pattern>: Glob pattern for files to process (e.g., "*.py", "src/**/*.js")
- --language <language>: Programming language for splitting (default: python). Must be supported by LangChain's Language enum.
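To see which files a given directory and pattern would select, the following sketch resolves the glob with pathlib. It assumes the pattern is evaluated relative to the project directory, which the text above does not confirm, and `find_files` is a hypothetical helper rather than part of the tool.

```python
from pathlib import Path


def find_files(project_dir: str, pattern: str) -> list[Path]:
    """Return files under project_dir matching a glob pattern, sorted for stable output."""
    root = Path(project_dir)
    # Path.glob treats "**" as "zero or more directories", so
    # "src/**/*.js" matches both src/b.js and src/pkg/c.js.
    return sorted(p for p in root.glob(pattern) if p.is_file())
```

For example, `find_files(".", "*.py")` lists Python files directly inside the current directory, while `find_files(".", "**/*.py")` also descends into subdirectories.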
Example for JavaScript files:
python chunker.py chunk-and-vectorise src "src/**/*.js" --language javascript
Usage (Installed CLI)
If you have installed this project using pipx or pip install, the chunker command will be available on your PATH.
You can use it as follows:
chunker chunk-and-vectorise <project_dir> "<pattern>" --language <language>
- <project_dir>: Root directory of the project to search for files (e.g., . or src)
- <pattern>: Glob pattern for files to process (e.g., "*.py", "src/**/*.js")
- --language <language>: Programming language for splitting (default: python). Must be supported by LangChain's Language enum.
Examples:
Chunk all Python files in the current directory:
chunker chunk-and-vectorise . "*.py"
Chunk all JavaScript files in a subdirectory:
chunker chunk-and-vectorise src "src/**/*.js" --language javascript
If installed with pipx, you can run the CLI directly:
chunker chunk-and-vectorise . "*.py" --language python
Using Chunker MCP with Claude for Desktop
You can integrate the Chunker MCP with Claude for Desktop to enable code chunking and vectorisation directly from Claude's interface. The recommended way is to install the tool globally using pipx and configure Claude for Desktop to launch it as an MCP server.
1. Install Chunker Globally with pipx
First, ensure you have pipx installed:
python -m pip install --user pipx
pipx ensurepath
Then, install your chunker project globally (from your project directory):
pipx install --editable .
This will make the chunker command available globally.
2. Configure Claude for Desktop to Use the MCP Server
Claude for Desktop registers MCP servers under the "mcpServers" key of its configuration.
Add the following to your Claude for Desktop configuration (or use the UI to add a new MCP server):
{
  "mcpServers": {
    "chunker": {
      "command": "/<home DIR>/.local/bin/chunker",
      "args": [
        "chunk-and-vectorise-mcp",
        "--project_dir",
        "/path/to/your/project",
        "--chroma_host",
        "localhost",
        "--chroma_port",
        "8000",
        "--chroma_collection_name",
        "default"
      ]
    }
  }
}
- Replace /<home DIR> with your actual home directory (e.g., /home/username or /Users/username).
- Replace /path/to/your/project with the absolute path to your codebase.
- The command field should point to the full path of the installed chunker CLI, which is typically located at ~/.local/bin/chunker when installed with pipx or pip.
- The args array specifies the subcommand and required arguments.
Note: If ~/.local/bin is in your system PATH, you can use "chunker" as the command instead of the full path.
Note: All four arguments (--project_dir, --chroma_host, --chroma_port, --chroma_collection_name) are now required for the MCP server to start.
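If you prefer to generate the configuration rather than edit it by hand, a short script can build the same structure. This is a sketch: `build_mcp_config` is a hypothetical helper, and only the subcommand and the four flags shown in the configuration above are taken from this document.

```python
import json


def build_mcp_config(
    command: str,
    project_dir: str,
    chroma_host: str = "localhost",
    chroma_port: int = 8000,
    collection: str = "default",
) -> dict:
    """Build the mcpServers entry for the chunker MCP server."""
    return {
        "mcpServers": {
            "chunker": {
                "command": command,
                "args": [
                    "chunk-and-vectorise-mcp",
                    "--project_dir", project_dir,
                    "--chroma_host", chroma_host,
                    # Arguments are passed as strings, so the port is converted.
                    "--chroma_port", str(chroma_port),
                    "--chroma_collection_name", collection,
                ],
            }
        }
    }


# Placeholder values; substitute your own command path and project directory.
print(json.dumps(build_mcp_config("chunker", "/path/to/your/project"), indent=2))
```

The printed JSON can be merged into the Claude for Desktop configuration. Passing all four arguments explicitly matches the note above that the server will not start without them.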
3. Use the Tool in Claude
Once configured, you can invoke the chunker MCP tool from Claude for Desktop.
The server runs with the project directory and ChromaDB settings given in the configuration above.
Querying Chunks from the CLI
You can query your ChromaDB collection for relevant code chunks using the query-chunks command:
chunker query-chunks "your search query" --chroma-host <host> --chroma-port <port> --collection-name <name> --n-results <N>
- "your search query": The text or code you want to search for.
- --chroma-host: ChromaDB host (default: 'localhost').
- --chroma-port: ChromaDB port (default: 8000).
- --collection-name: ChromaDB collection name (default: 'default').
- --n-results: Number of results to return (default: 10).
Example:
chunker query-chunks "def my_function" --n-results 5
Querying
After vectorising your files, you can query your ChromaDB collection for relevant code chunks using the chunker CLI.
To search for code chunks matching an expression and include chunk metadata (such as file path and line range), use:
chunker query-chunks "some expression"
- "some expression": The text or code you want to search for.
Example:
chunker query-chunks "def my_function"
This returns the stored chunks most similar to def my_function (up to --n-results, 10 by default), along with their file path and line range.
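If you query the collection from your own code instead of the CLI, ChromaDB returns results as nested lists, with one inner list per query text. The sketch below turns such a result into readable lines. `format_hits` is a hypothetical helper, the sample data is hand-written rather than real output, and the output format is not necessarily what the chunker CLI prints.

```python
def format_hits(result: dict) -> list[str]:
    """Flatten the first query's hits into 'path:start-end (distance d)' lines."""
    metadatas = result["metadatas"][0]
    distances = result["distances"][0]
    lines = []
    for meta, dist in zip(metadatas, distances):
        lines.append(f"{meta['path']}:{meta['start']}-{meta['end']} (distance {dist:.3f})")
    return lines


# Hand-written sample in the nested-list shape ChromaDB returns; not real output.
sample = {
    "ids": [["a", "b"]],
    "documents": [["def my_function():\n    pass", "def other():\n    pass"]],
    "metadatas": [[
        {"path": "src/app.py", "start": 10, "end": 11},
        {"path": "src/util.py", "start": 0, "end": 1},
    ]],
    "distances": [[0.12, 0.48]],
}

for line in format_hits(sample):
    print(line)
```

Smaller distances indicate closer matches, so the hits arrive ordered from most to least similar.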
Output
Chunks are stored in your configured ChromaDB collection, with metadata including:
- path: Full path to the source file
- start: Start line number (0-based)
- end: End line number (0-based)
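To illustrate how 0-based line ranges relate to chunk text, here is a minimal sketch. It substitutes a naive line-grouping splitter for LangChain's RecursiveCharacterTextSplitter, and it assumes that end is inclusive, which this document does not state. `chunk_with_line_ranges` is a hypothetical name.

```python
def chunk_with_line_ranges(path: str, text: str, max_chars: int = 200) -> list[dict]:
    """Group whole lines into chunks of at most max_chars and record their line range."""
    chunks: list[dict] = []
    current: list[str] = []
    start = 0
    size = 0
    for lineno, line in enumerate(text.splitlines(keepends=True)):
        # Close the current chunk if adding this line would exceed the budget.
        if current and size + len(line) > max_chars:
            chunks.append(
                {"text": "".join(current), "path": path, "start": start, "end": lineno - 1}
            )
            current, start, size = [], lineno, 0
        current.append(line)
        size += len(line)
    if current:
        # Flush the final, possibly shorter, chunk.
        chunks.append(
            {"text": "".join(current), "path": path, "start": start, "end": start + len(current) - 1}
        )
    return chunks
```

Each returned dictionary carries the same three metadata keys listed above, so a four-line file split with a small budget yields ranges such as 0-1 and 2-3.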
License
See LICENSE.