h-codex

A semantic code search tool for intelligent, cross-repo context retrieval.

✨ Features
  • AST-Based Chunking: Parses code into Abstract Syntax Trees so chunk boundaries follow syntactic units rather than arbitrary line counts (see the sketch after this list)
  • Embedding & Semantic Search: Uses OpenAI's text-embedding-3-small model (support for voyage-code-3 planned)
  • Vector Database: PostgreSQL with the pgvector extension for efficient similarity search
  • Multi-Language Support: TypeScript and JavaScript today, extensible to other languages
  • Multi-Project Support: Index and search across multiple projects
  • MCP Integration: Connects with AI coding assistants through the Model Context Protocol
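
As a rough illustration of the AST-based chunking above, the sketch below uses the node-tree-sitter bindings to split a TypeScript file at top-level declarations, capped at a maximum chunk size. It is not h-codex's actual Chunker: the Chunk shape, the handling of oversized nodes, and the constant name are assumptions.

import Parser from "tree-sitter";
import TypeScript from "tree-sitter-typescript";

// Hypothetical chunk shape; the real Chunker likely records richer metadata
// (file path, language, symbol names, etc.).
interface Chunk {
  content: string;
  startIndex: number;
  endIndex: number;
  nodeType: string;
}

const MAX_CHUNK_SIZE = 1000; // mirrors the CHUNK_SIZE default documented below

export function chunkSource(source: string): Chunk[] {
  const parser = new Parser();
  parser.setLanguage(TypeScript.typescript);
  const tree = parser.parse(source);

  const chunks: Chunk[] = [];
  // Walk top-level declarations (functions, classes, interfaces, ...) so that
  // chunk boundaries follow syntactic units instead of arbitrary line counts.
  for (const node of tree.rootNode.namedChildren) {
    const text = source.slice(node.startIndex, node.endIndex);
    if (text.length <= MAX_CHUNK_SIZE) {
      chunks.push({ content: text, startIndex: node.startIndex, endIndex: node.endIndex, nodeType: node.type });
      continue;
    }
    // Oversized nodes are split on a fixed character count here; a real
    // chunker would more likely recurse into the node's children.
    for (let offset = 0; offset < text.length; offset += MAX_CHUNK_SIZE) {
      chunks.push({
        content: text.slice(offset, offset + MAX_CHUNK_SIZE),
        startIndex: node.startIndex + offset,
        endIndex: Math.min(node.startIndex + offset + MAX_CHUNK_SIZE, node.endIndex),
        nodeType: node.type,
      });
    }
  }
  return chunks;
}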
🚀 Demo

[demo recording]

💻 Getting Started

h-codex can be integrated with AI assistants through the Model Context Protocol.

Example with Claude Desktop

Edit your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "h-codex": {
      "command": "npx",
      "args": ["@hpbyte/h-codex-mcp"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here", 
        "LLM_BASE_URL": "your_llm_base_url_here (default is openai baseurl: https://api.openai.com/v1)",
        "DB_CONNECTION_STRING": "postgresql://postgres:password@localhost:5432/h-codex"
      }
    }
  }
}
🛠️ Development
Prerequisites
  • Node.js (v18+)
  • pnpm - Package manager
  • Docker - For running PostgreSQL with pgvector
  • OpenAI API key for embeddings
Getting Started
  1. Clone the repository

    git clone https://github.com/hpbyte/h-codex.git
    cd h-codex
    
  2. Set up environment variables

    cp packages/core/.env.example packages/core/.env
    

    Edit the .env file with your OpenAI API key and other configuration options (a sample is shown after this list).

  3. Install dependencies

    pnpm install
    
  4. Start PostgreSQL database

    cd dev && docker compose up -d
    
  5. Set up the database

    pnpm run db:migrate
    
  6. Start development server

    pnpm dev
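
A .env along the following lines covers step 2; the keys and defaults mirror the Configuration Options table below, but the shipped .env.example remains the authoritative template and may differ.

LLM_API_KEY=your_llm_api_key_here
LLM_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small
CHUNK_SIZE=1000
SEARCH_RESULTS_LIMIT=10
SIMILARITY_THRESHOLD=0.5
DB_CONNECTION_STRING=postgresql://postgres:password@localhost:5432/h-codex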
    
🔧 Configuration Options
| Environment Variable | Description | Default |
| --- | --- | --- |
| LLM_API_KEY | LLM API key used for embeddings | Required |
| LLM_BASE_URL | LLM base URL used for embeddings | https://api.openai.com/v1 |
| EMBEDDING_MODEL | OpenAI model used for embeddings | text-embedding-3-small |
| CHUNK_SIZE | Maximum chunk size in characters | 1000 |
| SEARCH_RESULTS_LIMIT | Maximum number of search results returned | 10 |
| SIMILARITY_THRESHOLD | Minimum similarity score for results | 0.5 |
| DB_CONNECTION_STRING | PostgreSQL connection string | postgresql://postgres:password@localhost:5432/h-codex |
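
To make the roles of EMBEDDING_MODEL, SIMILARITY_THRESHOLD, and SEARCH_RESULTS_LIMIT concrete, here is a minimal sketch of how a query might be embedded and matched with pgvector. The table and column names (code_chunks, content, file_path, embedding) are assumptions for illustration; h-codex's actual schema and queries live in its Repository layer.

import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI({
  apiKey: process.env.LLM_API_KEY,
  baseURL: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
});
const pool = new Pool({ connectionString: process.env.DB_CONNECTION_STRING });

export async function searchCode(query: string) {
  // 1. Embed the natural-language query with the configured model.
  const { data } = await openai.embeddings.create({
    model: process.env.EMBEDDING_MODEL ?? "text-embedding-3-small",
    input: query,
  });
  const embedding = JSON.stringify(data[0].embedding); // pgvector accepts '[...]' literals

  // 2. Rank stored chunks by cosine similarity (pgvector's <=> is cosine distance),
  //    filter by the similarity threshold, and cap the number of results.
  const { rows } = await pool.query(
    `SELECT content, file_path, 1 - (embedding <=> $1::vector) AS similarity
       FROM code_chunks
      WHERE 1 - (embedding <=> $1::vector) >= $2
      ORDER BY embedding <=> $1::vector
      LIMIT $3`,
    [
      embedding,
      Number(process.env.SIMILARITY_THRESHOLD ?? 0.5),
      Number(process.env.SEARCH_RESULTS_LIMIT ?? 10),
    ]
  );
  return rows;
}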
🏗️ Architecture
graph TD
    subgraph "Core Package"
        subgraph "Ingestion Pipeline"
            Explorer["Explorer<br/>(file discovery)"]
            Chunker["Chunker<br/>(AST parsing & chunking)"]
            Embedder["Embedder<br/>(semantic embeddings)"]
            Indexer["Indexer<br/>(orchestration)"]

            Explorer --> Chunker
            Chunker --> Embedder
            Embedder --> Indexer
        end

        subgraph "Storage Layer"
            Repository["Repository"]
        end

        Indexer --> Repository
        Repository --> Database[(PostgreSQL Vector Database)]
    end

    subgraph "MCP Package"
        MCPServer["MCP Server"]
        CodeIndexTool["Code Index Tool"]
        CodeSearchTool["Code Search Tool"]

        MCPServer --> CodeIndexTool
        MCPServer --> CodeSearchTool
    end

    CodeIndexTool --> Indexer
    CodeSearchTool --> Repository
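
The MCP package in the diagram can be outlined roughly as follows using the official @modelcontextprotocol/sdk. This is an illustrative sketch, not h-codex's actual server: the tool names, parameter shapes, and the imported helpers (indexProject and searchCode from a hypothetical @hpbyte/h-codex-core package) are assumptions.

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical imports standing in for the core package's Indexer and Repository.
import { indexProject, searchCode } from "@hpbyte/h-codex-core";

const server = new McpServer({ name: "h-codex", version: "0.0.0" });

// Code Index Tool: hands a project path to the ingestion pipeline.
server.tool("index_code", { path: z.string() }, async ({ path }) => {
  const summary = await indexProject(path);
  return { content: [{ type: "text" as const, text: `Indexed ${summary.chunks} chunks` }] };
});

// Code Search Tool: runs a semantic query against the vector store.
server.tool("search_code", { query: z.string() }, async ({ query }) => {
  const results = await searchCode(query);
  return { content: [{ type: "text" as const, text: JSON.stringify(results, null, 2) }] };
});

// Claude Desktop launches the server over stdio (see the Getting Started config above).
await server.connect(new StdioServerTransport());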
🗺️ Roadmap
  • Support for additional embedding providers (Voyage AI)
  • Enhanced language support with more tree-sitter parsers
📄 License

This project is licensed under the MIT License.