Code Context Provider
A Model Context Protocol (MCP) server that provides AI-enhanced code search and context retrieval capabilities using Sourcegraph or Zoekt search backends.
Table of Contents
- Overview
- Features
- Architecture
- Prerequisites
- Installation
- Configuration
- Server Modes
- Usage with AI Tools
- MCP Tools
- Development
Overview
Code Context Provider exposes two specialized MCP servers for code search and AI-enhanced context retrieval:
- Search Server: Direct code search with repository exploration
- Context Server: AI-enhanced search with query reformulation and intelligent code snippet extraction
Both servers support multiple search backends (Sourcegraph and Zoekt) and include comprehensive observability through Langfuse.
Features
- Multiple Search Backends: Choose between Sourcegraph (cloud/enterprise) or Zoekt (local) search engines
- Two Operation Modes: Direct search for speed or AI-enhanced context for intelligence
- Advanced Query Language: Support for regex patterns, file filters, language filters, and boolean operators
- Repository Discovery: Find repositories by name and explore their structure
- Content Fetching: Browse repository files and directories with GitLab integration
- AI Enhancement: Query reformulation and intelligent result ranking using LLMs
- Observability: Full tracing and monitoring via Langfuse (optional)
- Rate Limiting: Built-in rate limiting for API calls and token usage
Architecture
The project consists of two main MCP servers:
- Search Server (`/codesearch`): Provides direct access to search backends
- Context Server (`/contextprovider`): Adds an AI enhancement layer on top of search
Important: The Context Server depends on the Search Server. You must:
- Start the Search Server first
- Set `MCP_SERVER_URL` to point to the Search Server's streamable-http endpoint
Supported backends:
- Sourcegraph: Universal code search platform (cloud or self-hosted)
- Zoekt: Fast trigram-based code search engine (typically local)
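A quick way to sanity-check this wiring before starting the Context Server is to connect to the Search Server's streamable-http endpoint and list the tools it exposes. The sketch below uses the official MCP Python SDK (the `mcp` package); the URL matches the default endpoint described under Server Modes.
```python
# Minimal connectivity check against the Search Server using the MCP Python SDK.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

SEARCH_SERVER_URL = "http://localhost:8080/codesearch/mcp/"

async def main() -> None:
    async with streamablehttp_client(SEARCH_SERVER_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # e.g. search, fetch_content, ...

asyncio.run(main())
```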
Prerequisites
- Python 3.13+: Required for running the MCP servers
- Search Backend: Either a Sourcegraph instance or Zoekt server
- UV (optional): Modern Python package manager for easier dependency management
- Langfuse (optional): For observability and tracing
Installation
Using UV (recommended)
```bash
# Install dependencies
uv sync

# Run the search server (start this first)
uv run src/main.py search

# In another terminal, run the context server (on different ports).
# Make sure MCP_SERVER_URL points to the search server's streamable-http endpoint.
export MCP_SERVER_URL=http://localhost:8080/codesearch/mcp/
export MCP_SSE_PORT=8001
export MCP_STREAMABLE_HTTP_PORT=8081
uv run src/main.py context
```
Using pip
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install -e .

# Run the search server (start this first)
python src/main.py search

# In another terminal, run the context server
export MCP_SERVER_URL=http://localhost:8080/codesearch/mcp/
export MCP_SSE_PORT=8001
export MCP_STREAMABLE_HTTP_PORT=8081
python src/main.py context
```
Using Docker
```bash
# Build the image
docker build -t code-context-provider .
```
To run both servers together, create a `docker-compose.yml` file:
```yaml
version: '3'
services:
  search:
    image: code-context-provider
    command: search
    ports:
      - "8000:8000"
      - "8080:8080"
    environment:
      - SEARCH_BACKEND=sourcegraph
      - SRC_ENDPOINT=https://sourcegraph.com
      - LANGFUSE_ENABLED=false
  context:
    image: code-context-provider
    command: context
    ports:
      - "8001:8000"
      - "8081:8080"
    environment:
      - MCP_SERVER_URL=http://search:8080/codesearch/mcp/
      - LANGFUSE_ENABLED=false
    depends_on:
      - search
```
Then run:
```bash
docker-compose up
```
Configuration
Backend Selection
Set `SEARCH_BACKEND` to choose your search backend:
- `sourcegraph`: For Sourcegraph instances
- `zoekt`: For Zoekt search servers
Required Environment Variables
For Sourcegraph Backend:
- `SEARCH_BACKEND=sourcegraph`
- `SRC_ENDPOINT`: Sourcegraph instance URL (e.g., https://sourcegraph.com)
For Zoekt Backend:
- `SEARCH_BACKEND=zoekt`
- `ZOEKT_API_URL`: Zoekt server URL (e.g., http://localhost:6070)
For Context Server (additional):
- `MCP_SERVER_URL`: URL of the search server's streamable-http endpoint (e.g., http://localhost:8080/codesearch/mcp/)
  - Important: This must point to the running Search Server
  - Use `http://host.docker.internal:8080/codesearch/mcp/` when running the Context Server in Docker and the Search Server on the host
  - Use container names when both servers are in the same Docker network
Optional Environment Variables
- `SRC_ACCESS_TOKEN`: Authentication token for private Sourcegraph instances
- `MCP_SSE_PORT`: SSE server port (default: 8000)
- `MCP_STREAMABLE_HTTP_PORT`: HTTP server port (default: 8080)
  - Important: When running both servers on the same machine, use different ports for the Context Server (e.g., 8001 and 8081)
- `LANGFUSE_ENABLED`: Enable/disable Langfuse observability (default: false)
Observability with Langfuse
Langfuse provides comprehensive tracing and monitoring for all AI operations.
To enable Langfuse:
```bash
export LANGFUSE_ENABLED=true
export LANGFUSE_PUBLIC_KEY=your-public-key
export LANGFUSE_SECRET_KEY=your-secret-key
export LANGFUSE_HOST=your-langfuse-host
```
Note: The evaluation framework requires Langfuse to be enabled for tracking LLM calls and performance metrics.
Evaluation Models Configuration
The evaluation framework uses configurable LLM models:
```bash
# Configure the code snippet finder model
export CODE_SNIPPET_FINDER_MODEL_NAME=gpt-4o-mini  # default

# Configure the LLM judge model
export LLM_JUDGE_V2_MODEL_NAME=gpt-4o-mini  # default

# Configure the code parser model
export CODE_AGENT_TYPE_PARSER_MODEL_NAME=gpt-4o-mini  # default

# Optional: Use custom LLM endpoints
export LLM_JUDGE_V2_BASE_URL=https://your-llm-endpoint
export LLM_JUDGE_V2_API_KEY=your-api-key
```
Server Modes
Search Server
Direct access to search backends with three main tools:
```bash
# Start the search server
uv run src/main.py search
```
Available at:
- SSE: `http://localhost:8000/codesearch/sse`
- HTTP: `http://localhost:8080/codesearch/mcp/`
Context Server
AI-enhanced search with query understanding and intelligent extraction:
```bash
# Start the search server first (in one terminal)
uv run src/main.py search

# Then start the context server (in another terminal)
export MCP_SERVER_URL=http://localhost:8080/codesearch/mcp/
export MCP_SSE_PORT=8001
export MCP_STREAMABLE_HTTP_PORT=8081
uv run src/main.py context
```
Note: The Context Server connects to the Search Server via `MCP_SERVER_URL`. Ensure:
- The Search Server is running before starting the Context Server
- `MCP_SERVER_URL` points to the Search Server's streamable-http endpoint (`/codesearch/mcp/`)
Available at (when using different ports):
- SSE: `http://localhost:8001/contextprovider/sse`
- HTTP: `http://localhost:8081/contextprovider/mcp/`
Usage with AI Tools
Cursor
Add to your `.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "codesearch": {
      "url": "http://localhost:8080/codesearch/mcp/"
    },
    "contextprovider": {
      "url": "http://localhost:8081/contextprovider/mcp/"
    }
  }
}
```
MCP Tools
Search Server Tools
🔍 search
Search across codebases using advanced query syntax.
Example queries:
- `error handler` - Find "error" and "handler" in code
- `func main lang:go` - Find main functions in Go files
- `class.*Service lang:python` - Find Python service classes
- `repo:github.com/example/project` - Search within a specific repository
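These queries can also be issued programmatically over MCP. The sketch below calls the `search` tool with the MCP Python SDK; the argument name `query` is an assumption for illustration, so check the tool schema returned by `list_tools` for the actual parameter name.
```python
# Hedged sketch: invoke the "search" tool over the streamable-http endpoint.
# The "query" argument name is an assumption; inspect the tool schema for the real one.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    url = "http://localhost:8080/codesearch/mcp/"
    async with streamablehttp_client(url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search", {"query": "func main lang:go"})
            for block in result.content:
                if block.type == "text":
                    print(block.text)

asyncio.run(main())
```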
📖 search_prompt_guide
Generate a query guide based on your search objective.
📂 fetch_content
Retrieve file contents or explore directory structures.
Context Server Tools
🤖 agentic_search
AI-powered search that understands natural language queries and returns relevant code snippets with explanations.
🔄 refactor_question
Reformulate queries into multiple optimized search patterns for better coverage.
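Both tools can be called the same way as the Search Server tools, just against the Context Server's endpoint. A minimal sketch, assuming the port layout from Server Modes and a `question` argument (an assumption; check the tool schema):
```python
# Hedged sketch: call agentic_search on the Context Server.
# The "question" argument name is an assumption, not a confirmed schema.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    url = "http://localhost:8081/contextprovider/mcp/"
    async with streamablehttp_client(url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "agentic_search", {"question": "Where is rate limiting implemented?"}
            )
            print(result.content)

asyncio.run(main())
```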
Development
Linting and Formatting
```bash
# Check code style
uv run ruff check src/

# Format code
uv run ruff format src/
```
Manual Testing
For quick testing and dataset creation:
```bash
# First, ensure the search server is running:
uv run src/main.py search

# In another terminal, set the MCP server URL and run the agent:
export MCP_SERVER_URL=http://localhost:8080/codesearch/mcp/

# Run the agent interactively (prompts for a question and saves to question.json)
uv run src/main.py agent
```
This creates a `question.json` file with the question and AI-generated answer that can be used as evaluation data.
Note: The `agent` command requires the search server to be running, as it uses the Context Server's CodeSnippetFinder, which connects to the search server via MCP.
Automated Evaluation
Run comprehensive evaluations against a Langfuse dataset:
```bash
# First, ensure the search server is running:
uv run src/main.py search

# In another terminal, configure and run the evaluation:
export MCP_SERVER_URL=http://localhost:8080/codesearch/mcp/
export LANGFUSE_DATASET_NAME=your-dataset-name  # default: code-search-mcp-agentic-v2

# Run evaluation
uv run src/main.py evaluate
```
Note: The evaluation framework also requires the search server to be running.
Evaluation Framework
The evaluation framework provides comprehensive testing capabilities:
Components
- CodeSnippetFinder: The AI agent that searches and extracts code snippets
- CodeAgentTypeParser: Parses natural language responses into structured code snippets
- LLMJudge: AI-powered judge that evaluates search result quality
LLM Judge
The LLM Judge evaluates search results by comparing actual vs expected answers across multiple dimensions:
- Issues: Problems with the retrieved code
- Strengths: Positive aspects of the result
- Suggestions: Potential improvements
- Pass/Fail: Binary evaluation result
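Conceptually, a verdict combines these four dimensions into one record. The dataclass below is only an illustrative shape based on the list above, not the project's actual LLMJudge output type:
```python
from dataclasses import dataclass, field

@dataclass
class JudgeVerdict:
    # Illustrative shape only; field names mirror the dimensions above.
    issues: list[str] = field(default_factory=list)       # problems with the retrieved code
    strengths: list[str] = field(default_factory=list)    # positive aspects of the result
    suggestions: list[str] = field(default_factory=list)  # potential improvements
    passed: bool = False                                  # binary pass/fail result
```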
Dataset Format
Langfuse datasets should contain items with the following structure:
```json
{
  "input": {
    "question": "How do I implement a Redis cache in Python?"
  },
  "expected_output": {
    "snippet": "import redis\n\nclass RedisCache:\n    def __init__(self):\n        self.client = redis.Redis(host='localhost', port=6379)",
    "language": "python",
    "description": "Basic Redis cache implementation in Python"
  }
}
```
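Items in this format can be uploaded with the Langfuse Python SDK. A minimal sketch, assuming the standard `langfuse` client (credentials come from the Langfuse environment variables above) and the default evaluation dataset name:
```python
# Sketch: create a dataset item matching the format above via the Langfuse SDK.
# Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from the environment.
from langfuse import Langfuse

langfuse = Langfuse()

langfuse.create_dataset_item(
    dataset_name="code-search-mcp-agentic-v2",  # default dataset name for evaluation
    input={"question": "How do I implement a Redis cache in Python?"},
    expected_output={
        "snippet": "import redis\n\nclass RedisCache: ...",
        "language": "python",
        "description": "Basic Redis cache implementation in Python",
    },
)
```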
Features
- Parallel Evaluation: Processes multiple test cases concurrently (5 workers by default; see the sketch after this list)
- Comprehensive Logging: Results saved to `logs/evaluation-{timestamp}.json`
- Langfuse Integration: Full tracing of all LLM calls and evaluations
- Scoring: Tracks pass/fail rates and generates aggregate metrics
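The 5-worker concurrency can be pictured as a simple semaphore pattern. This is only a sketch of the general approach; `evaluate_case` and `cases` are hypothetical placeholders, not the project's actual evaluation loop:
```python
# Sketch of 5-worker parallel evaluation with an asyncio semaphore.
# evaluate_case() and cases are hypothetical placeholders.
import asyncio

async def evaluate_case(case: dict) -> bool:
    ...  # run the agent, parse the answer, ask the LLM judge
    return True

async def run_all(cases: list[dict], workers: int = 5) -> list[bool]:
    sem = asyncio.Semaphore(workers)

    async def bounded(case: dict) -> bool:
        async with sem:
            return await evaluate_case(case)

    return await asyncio.gather(*(bounded(c) for c in cases))

# Usage: results = asyncio.run(run_all(cases)); pass rate = sum(results) / len(results)
```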
Requirements
- Langfuse must be enabled for the evaluation framework to work
- A Langfuse dataset with the correct format must exist
- All Langfuse environment variables must be configured
Output
The evaluation generates:
- JSON logs in the `logs/` directory with detailed results
- Langfuse traces for each evaluation run
- Console output with aggregate metrics (pass rate, average score)
Environment Variables Reference
| Variable | Description | Required | Default |
|---|---|---|---|
| `SEARCH_BACKEND` | Search backend (sourcegraph/zoekt) | Yes | - |
| `SRC_ENDPOINT` | Sourcegraph URL | Yes (Sourcegraph) | - |
| `SRC_ACCESS_TOKEN` | Sourcegraph token | No | - |
| `ZOEKT_API_URL` | Zoekt server URL | Yes (Zoekt) | - |
| `MCP_SERVER_URL` | Search server URL | Yes (Context) | - |
| `MCP_SSE_PORT` | SSE server port | No | 8000 |
| `MCP_STREAMABLE_HTTP_PORT` | HTTP server port | No | 8080 |
| `LANGFUSE_ENABLED` | Enable Langfuse | No | false |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key | If enabled | - |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key | If enabled | - |
| `LANGFUSE_HOST` | Langfuse host URL | If enabled | - |
| `LANGFUSE_DATASET_NAME` | Dataset name for evaluation | For evaluation | code-search-mcp-agentic-v2 |
| `CODE_SNIPPET_FINDER_MODEL_NAME` | Model for code snippet extraction | No | gpt-4o-mini |
| `LLM_JUDGE_V2_MODEL_NAME` | Model for LLM judge | No | gpt-4o-mini |
| `CODE_AGENT_TYPE_PARSER_MODEL_NAME` | Model for code parsing | No | gpt-4o-mini |
| `LLM_JUDGE_V2_BASE_URL` | Custom LLM endpoint for judge | No | - |
| `LLM_JUDGE_V2_API_KEY` | API key for custom LLM judge | No | - |
License
MIT License - see LICENSE file for details.