FastAPI-BitNet

This project provides a robust REST API built with FastAPI and Docker to manage and interact with llama.cpp-based BitNet model instances. It allows developers and researchers to programmatically control llama-cli and llama-server processes for automated testing, benchmarking, and interactive chat sessions.

Key Features
  • Session Management: Start, stop, and check the status of multiple persistent llama-cli and llama-server chat sessions.
  • Batch Operations: Initialize, shut down, and chat with multiple instances in a single API call.
  • Interactive Chat: Send prompts to running BitNet sessions and receive cleaned model responses (see the sketch after this list).
  • Model Benchmarking: Programmatically run benchmarks and calculate perplexity on GGUF models.
  • Resource Estimation: Estimate maximum server capacity based on available system RAM and CPU threads.
  • VS Code Integration: Connects directly to GitHub Copilot Chat as a tool via the Model Context Protocol.
  • Automatic API Docs: Interactive API documentation powered by Swagger UI and ReDoc.
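
As a rough illustration of the session workflow above, here is a minimal Python sketch of starting a session, sending a prompt, and shutting the session down. The endpoint paths and payload fields are hypothetical placeholders, not the project's documented routes; consult the Swagger UI (see API Usage below) for the real ones.

import json
import urllib.request

BASE = "http://127.0.0.1:8080"

def post(path: str, payload: dict) -> dict:
    # Small helper: POST a JSON body and decode the JSON response.
    req = urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Hypothetical routes, for illustration only.
session = post("/sessions/start", {"model": "BitNet-b1.58-2B-4T"})
reply = post(f"/sessions/{session['id']}/chat", {"prompt": "Hello!"})
print(reply)
post(f"/sessions/{session['id']}/stop", {})
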
Technology Stack
  • FastAPI for the core web framework.
  • Uvicorn as the ASGI server.
  • Docker for containerization and easy deployment.
  • Pydantic for data validation and settings management.
  • fastapi-mcp for VS Code Copilot tool integration.
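
The MCP integration follows the usual fastapi-mcp pattern: wrap the FastAPI app and mount an MCP endpoint alongside the regular routes. The sketch below shows that typical library usage, not necessarily this project's exact wiring.

from fastapi import FastAPI
from fastapi_mcp import FastApiMCP

app = FastAPI()

@app.get("/ping", operation_id="ping")
def ping() -> dict:
    # Endpoints become MCP tools, named by their operation_id.
    return {"status": "ok"}

mcp = FastApiMCP(app)
mcp.mount()  # serves the MCP endpoint at /mcp by default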

Getting Started
Prerequisites
1. Set Up the Python Environment

Create and activate a Conda environment:

conda create -n bitnet python=3.11
conda activate bitnet

Install the Hugging Face CLI tool to download the models:

pip install -U "huggingface_hub[cli]"

Download Microsoft's official BitNet model:

huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir app/models/BitNet-b1.58-2B-4T
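
If you prefer to script the download, the same package exposes a Python API; this is equivalent to the CLI command above:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/BitNet-b1.58-2B-4T-gguf",
    local_dir="app/models/BitNet-b1.58-2B-4T",
)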

Running the Application
Using Docker (Recommended)

This is the easiest and recommended way to run the application.

  1. Build the Docker image:

    docker build -t fastapi_bitnet .
    
  2. Run the Docker container: This command runs the container in detached mode (-d) and maps port 8080 on your host to port 8080 in the container.

    docker run -d --name ai_container -p 8080:8080 fastapi_bitnet
    
Local Development

For development, you can run the application directly with Uvicorn, which enables auto-reloading.

uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload

API Usage

Once the server is running, you can access the interactive API documentation at FastAPI's standard paths:

  • Swagger UI: http://127.0.0.1:8080/docs
  • ReDoc: http://127.0.0.1:8080/redoc
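
Since the exact routes are easiest to read from the live schema, the short Python sketch below lists every endpoint the running server exposes. It relies only on FastAPI's default /openapi.json schema path.

import json
import urllib.request

BASE = "http://127.0.0.1:8080"

with urllib.request.urlopen(f"{BASE}/openapi.json") as resp:
    schema = json.load(resp)

# Print each route with the HTTP methods it accepts.
for path, ops in schema["paths"].items():
    methods = [m.upper() for m in ops if m in ("get", "post", "put", "delete", "patch")]
    print(" ".join(methods), path)
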
VS Code Integration
As a Copilot Tool (MCP)

You can connect this API directly to VS Code's Copilot Chat to create and interact with models.

  1. Run the application using Docker or locally.
  2. In VS Code, open the Copilot Chat panel.
  3. Click the wrench icon ("Configure Tools...").
  4. Scroll to the bottom and select + Add MCP Server, then choose HTTP.
  5. Enter the URL: http://127.0.0.1:8080/mcp

Copilot will now be able to use the API to launch and chat with BitNet instances.
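
Alternatively, recent VS Code builds can load MCP servers from a workspace .vscode/mcp.json file; assuming that config format (it may vary between VS Code versions), the equivalent entry looks like this:

{
  "servers": {
    "fastapi-bitnet": {
      "type": "http",
      "url": "http://127.0.0.1:8080/mcp"
    }
  }
}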

See Also - VS Code Extension!

For a more integrated experience, check out the companion VS Code extension:

License

This project is licensed under the MIT License. See the LICENSE file for details.
