FastAPI-BitNet
This project provides a robust REST API built with FastAPI and Docker to manage and interact with `llama.cpp`-based BitNet model instances. It allows developers and researchers to programmatically control `llama-cli` and `llama-server` processes for automated testing, benchmarking, and interactive chat sessions.
Key Features
- Session Management: Start, stop, and check the status of multiple persistent `llama-cli` and `llama-server` chat sessions.
- Batch Operations: Initialize, shut down, and chat with multiple instances in a single API call.
- Interactive Chat: Send prompts to running BitNet sessions and receive cleaned model responses.
- Model Benchmarking: Programmatically run benchmarks and calculate perplexity on GGUF models.
- Resource Estimation: Estimate maximum server capacity based on available system RAM and CPU threads (see the sketch after this list).
- VS Code Integration: Connects directly to GitHub Copilot Chat as a tool via the Model Context Protocol.
- Automatic API Docs: Interactive API documentation powered by Swagger UI and ReDoc.
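The resource-estimation feature can be illustrated with a short sketch. This is not the project's implementation: the per-instance memory footprint and thread count below are assumed values, and the real endpoint derives its numbers from the actual model and host.

```python
import os

import psutil  # third-party: pip install psutil

# Assumed footprint of one BitNet instance; illustrative values only.
MODEL_RAM_GB = 1.5
THREADS_PER_INSTANCE = 2

def estimate_max_instances() -> int:
    """Rough capacity estimate, bounded by free RAM and by CPU threads."""
    free_ram_gb = psutil.virtual_memory().available / 1024**3
    by_ram = int(free_ram_gb // MODEL_RAM_GB)
    by_cpu = (os.cpu_count() or 1) // THREADS_PER_INSTANCE
    return max(0, min(by_ram, by_cpu))

print(f"Estimated capacity: {estimate_max_instances()} instances")
```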
Technology Stack
- FastAPI for the core web framework.
- Uvicorn as the ASGI server.
- Docker for containerization and easy deployment.
- Pydantic for data validation and settings management.
- fastapi-mcp for VS Code Copilot tool integration.
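As a minimal sketch of how these pieces fit together (the route and request model below are illustrative, not the project's actual schema), a FastAPI endpoint validates its request body with a Pydantic model and is served by Uvicorn:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    # Pydantic validates types and required fields before the handler runs.
    session_id: str
    prompt: str

@app.post("/chat")  # illustrative route, not the project's real path
async def chat(req: ChatRequest) -> dict:
    # A real handler would forward req.prompt to a llama-cli/llama-server
    # process and return the cleaned model output.
    return {"session_id": req.session_id, "response": "..."}

# Run with: uvicorn example:app --host 0.0.0.0 --port 8080
```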
Getting Started
Prerequisites
- Docker Desktop
- Conda (or another Python environment manager)
- Python 3.10+
1. Set Up the Python Environment
Create and activate a Conda environment:

```bash
conda create -n bitnet python=3.11
conda activate bitnet
```

Install the Hugging Face CLI to download the models:

```bash
pip install -U "huggingface_hub[cli]"
```

Download Microsoft's official BitNet model:

```bash
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir app/models/BitNet-b1.58-2B-4T
```
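Equivalently, the download can be scripted with the `huggingface_hub` Python API (same repository and target directory as the CLI command above):

```python
from huggingface_hub import snapshot_download

# Fetches the GGUF model files into the directory the API expects.
snapshot_download(
    repo_id="microsoft/BitNet-b1.58-2B-4T-gguf",
    local_dir="app/models/BitNet-b1.58-2B-4T",
)
```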
Running the Application
Using Docker (Recommended)
This is the easiest and recommended way to run the application.
Build the Docker image:

```bash
docker build -t fastapi_bitnet .
```

Run the Docker container. This command runs the container in detached mode (`-d`) and maps port 8080 on your host to port 8080 in the container:

```bash
docker run -d --name ai_container -p 8080:8080 fastapi_bitnet
```
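Once the container is running, a quick probe of the interactive docs page (served by FastAPI at `/docs`) confirms the API is reachable; this check is illustrative:

```python
import urllib.request

# A 200 response from the Swagger UI page means the API is up.
with urllib.request.urlopen("http://127.0.0.1:8080/docs", timeout=5) as resp:
    print("API is up" if resp.status == 200 else f"Unexpected status: {resp.status}")
```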
Local Development
For development, you can run the application directly with Uvicorn, which enables auto-reloading.
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
```
API Usage
Once the server is running, you can access the interactive API documentation:
- Swagger UI: http://127.0.0.1:8080/docs
- ReDoc: http://127.0.0.1:8080/redoc
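Beyond the browser UIs, the endpoint list can also be read programmatically from the OpenAPI schema that FastAPI serves by default at `/openapi.json`:

```python
import json
import urllib.request

# FastAPI publishes its OpenAPI schema at /openapi.json by default.
with urllib.request.urlopen("http://127.0.0.1:8080/openapi.json") as resp:
    schema = json.load(resp)

# Print every route and the HTTP methods it accepts.
for path, methods in schema["paths"].items():
    print(path, "->", ", ".join(m.upper() for m in methods))
```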
VS Code Integration
As a Copilot Tool (MCP)
You can connect this API directly to VS Code's Copilot Chat to create and interact with models.
- Run the application using Docker or locally.
- In VS Code, open the Copilot Chat panel.
- Click the wrench icon ("Configure Tools...").
- Scroll to the bottom and select `+ Add MCP Server`, then choose `HTTP`.
- Enter the URL: `http://127.0.0.1:8080/mcp`
Copilot will now be able to use the API to launch and chat with BitNet instances.
See Also - VSCode Extension!
For a more integrated experience, check out the companion VS Code extension:
- GitHub: https://github.com/grctest/BitNet-VSCode-Extension
- Marketplace: https://marketplace.visualstudio.com/items?itemName=nftea-gallery.bitnet-vscode-extension
License
This project is licensed under the MIT License. See the LICENSE file for details.