h-codex

A semantic code search tool for intelligent, cross-repo context retrieval.

✨ Features
  • AST-Based Chunking: Parses code into Abstract Syntax Trees so chunk boundaries follow syntactic units rather than arbitrary line counts (see the sketch after this list)
  • Embedding & Semantic Search: Uses OpenAI's text-embedding-3-small model (support for voyage-code-3 planned)
  • Vector Database: PostgreSQL with the pgvector extension for efficient similarity search
  • Multi-Language Support: TypeScript and JavaScript today, extensible to other languages
  • Multi-Project Support: Index and search across multiple projects
  • MCP Integration: Connects with AI coding assistants through the Model Context Protocol
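
As a rough illustration of the AST-based chunking above, the sketch below uses the node-tree-sitter bindings to split a TypeScript file at top-level declarations, capped at a maximum chunk size. It is not h-codex's actual Chunker: the Chunk shape, the handling of oversized nodes, and the constant name are assumptions.

import Parser from "tree-sitter";
import TypeScript from "tree-sitter-typescript";

// Hypothetical chunk shape; the real Chunker likely records richer metadata
// (file path, language, symbol names, etc.).
interface Chunk {
  content: string;
  startIndex: number;
  endIndex: number;
  nodeType: string;
}

const MAX_CHUNK_SIZE = 1000; // mirrors the CHUNK_SIZE default documented below

export function chunkSource(source: string): Chunk[] {
  const parser = new Parser();
  parser.setLanguage(TypeScript.typescript);
  const tree = parser.parse(source);

  const chunks: Chunk[] = [];
  // Walk top-level declarations (functions, classes, interfaces, ...) so that
  // chunk boundaries follow syntactic units instead of arbitrary line counts.
  for (const node of tree.rootNode.namedChildren) {
    const text = source.slice(node.startIndex, node.endIndex);
    if (text.length <= MAX_CHUNK_SIZE) {
      chunks.push({ content: text, startIndex: node.startIndex, endIndex: node.endIndex, nodeType: node.type });
      continue;
    }
    // Oversized nodes are split on a fixed character count here; a real
    // chunker would more likely recurse into the node's children.
    for (let offset = 0; offset < text.length; offset += MAX_CHUNK_SIZE) {
      chunks.push({
        content: text.slice(offset, offset + MAX_CHUNK_SIZE),
        startIndex: node.startIndex + offset,
        endIndex: Math.min(node.startIndex + offset + MAX_CHUNK_SIZE, node.endIndex),
        nodeType: node.type,
      });
    }
  }
  return chunks;
}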
🚀 Demo

[demo recording]

💻 Getting Started

h-codex can be integrated with AI assistants through the Model Context Protocol.

Example with Claude Desktop

Edit your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "h-codex": {
      "command": "npx",
      "args": ["@hpbyte/h-codex-mcp"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here", 
        "LLM_BASE_URL": "your_llm_base_url_here (default is openai baseurl: https://api.openai.com/v1)",
        "DB_CONNECTION_STRING": "postgresql://postgres:password@localhost:5432/h-codex"
      }
    }
  }
}
🛠️ Development
Prerequisites
  • Node.js (v18+)
  • pnpm - Package manager
  • Docker - For running PostgreSQL with pgvector
  • OpenAI API key for embeddings
Getting Started
  1. Clone the repository

    git clone https://github.com/hpbyte/h-codex.git
    cd h-codex
    
  2. Set up environment variables

    cp packages/core/.env.example packages/core/.env
    

    Edit the .env file with your OpenAI API key and other configuration options (a sample is shown after this list).

  3. Install dependencies

    pnpm install
    
  4. Start PostgreSQL database

    cd dev && docker compose up -d
    
  5. Set up the database

    pnpm run db:migrate
    
  6. Start development server

    pnpm dev
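
A .env along the following lines covers step 2; the keys and defaults mirror the Configuration Options table below, but the shipped .env.example remains the authoritative template and may differ.

LLM_API_KEY=your_llm_api_key_here
LLM_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small
CHUNK_SIZE=1000
SEARCH_RESULTS_LIMIT=10
SIMILARITY_THRESHOLD=0.5
DB_CONNECTION_STRING=postgresql://postgres:password@localhost:5432/h-codex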
    
🔧 Configuration Options
| Environment Variable | Description | Default |
| --- | --- | --- |
| LLM_API_KEY | LLM API key used for embeddings | Required |
| LLM_BASE_URL | LLM base URL used for embeddings | https://api.openai.com/v1 |
| EMBEDDING_MODEL | OpenAI model used for embeddings | text-embedding-3-small |
| CHUNK_SIZE | Maximum chunk size in characters | 1000 |
| SEARCH_RESULTS_LIMIT | Maximum number of search results returned | 10 |
| SIMILARITY_THRESHOLD | Minimum similarity score for results | 0.5 |
| DB_CONNECTION_STRING | PostgreSQL connection string | postgresql://postgres:password@localhost:5432/h-codex |
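
To make the roles of EMBEDDING_MODEL, SIMILARITY_THRESHOLD, and SEARCH_RESULTS_LIMIT concrete, here is a minimal sketch of how a query might be embedded and matched with pgvector. The table and column names (code_chunks, content, file_path, embedding) are assumptions for illustration; h-codex's actual schema and queries live in its Repository layer.

import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI({
  apiKey: process.env.LLM_API_KEY,
  baseURL: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
});
const pool = new Pool({ connectionString: process.env.DB_CONNECTION_STRING });

export async function searchCode(query: string) {
  // 1. Embed the natural-language query with the configured model.
  const { data } = await openai.embeddings.create({
    model: process.env.EMBEDDING_MODEL ?? "text-embedding-3-small",
    input: query,
  });
  const embedding = JSON.stringify(data[0].embedding); // pgvector accepts '[...]' literals

  // 2. Rank stored chunks by cosine similarity (pgvector's <=> is cosine distance),
  //    filter by the similarity threshold, and cap the number of results.
  const { rows } = await pool.query(
    `SELECT content, file_path, 1 - (embedding <=> $1::vector) AS similarity
       FROM code_chunks
      WHERE 1 - (embedding <=> $1::vector) >= $2
      ORDER BY embedding <=> $1::vector
      LIMIT $3`,
    [
      embedding,
      Number(process.env.SIMILARITY_THRESHOLD ?? 0.5),
      Number(process.env.SEARCH_RESULTS_LIMIT ?? 10),
    ]
  );
  return rows;
}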
🏗️ Architecture
graph TD
    subgraph "Core Package"
        subgraph "Ingestion Pipeline"
            Explorer["Explorer<br/>(file discovery)"]
            Chunker["Chunker<br/>(AST parsing & chunking)"]
            Embedder["Embedder<br/>(semantic embeddings)"]
            Indexer["Indexer<br/>(orchestration)"]

            Explorer --> Chunker
            Chunker --> Embedder
            Embedder --> Indexer
        end

        subgraph "Storage Layer"
            Repository["Repository"]
        end

        Indexer --> Repository
        Repository --> Database[(PostgreSQL Vector Database)]
    end

    subgraph "MCP Package"
        MCPServer["MCP Server"]
        CodeIndexTool["Code Index Tool"]
        CodeSearchTool["Code Search Tool"]

        MCPServer --> CodeIndexTool
        MCPServer --> CodeSearchTool
    end

    CodeIndexTool --> Indexer
    CodeSearchTool --> Repository
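
The MCP package in the diagram can be outlined roughly as follows using the official @modelcontextprotocol/sdk. This is an illustrative sketch, not h-codex's actual server: the tool names, parameter shapes, and the imported helpers (indexProject and searchCode from a hypothetical @hpbyte/h-codex-core package) are assumptions.

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical imports standing in for the core package's Indexer and Repository.
import { indexProject, searchCode } from "@hpbyte/h-codex-core";

const server = new McpServer({ name: "h-codex", version: "0.0.0" });

// Code Index Tool: hands a project path to the ingestion pipeline.
server.tool("index_code", { path: z.string() }, async ({ path }) => {
  const summary = await indexProject(path);
  return { content: [{ type: "text" as const, text: `Indexed ${summary.chunks} chunks` }] };
});

// Code Search Tool: runs a semantic query against the vector store.
server.tool("search_code", { query: z.string() }, async ({ query }) => {
  const results = await searchCode(query);
  return { content: [{ type: "text" as const, text: JSON.stringify(results, null, 2) }] };
});

// Claude Desktop launches the server over stdio (see the Getting Started config above).
await server.connect(new StdioServerTransport());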
🗺️ Roadmap
  • Support for additional embedding providers (Voyage AI)
  • Enhanced language support with more tree-sitter parsers
📄 License

This project is licensed under the MIT License.