FastAPI-BitNet
This project provides a robust REST API built with FastAPI and Docker to manage and interact with llama.cpp-based BitNet model instances. It allows developers and researchers to programmatically control llama-cli and llama-server processes for automated testing, benchmarking, and interactive chat sessions.
Key Features
- Session Management: Start, stop, and check the status of multiple persistent `llama-cli` and `llama-server` session-based chats.
- Batch Operations: Initialize, shut down, and chat with multiple instances in a single API call.
- Interactive Chat: Send prompts to running BitNet sessions and receive cleaned model responses.
- Model Benchmarking: Programmatically run benchmarks and calculate perplexity on GGUF models.
- Resource Estimation: Estimate maximum server capacity based on available system RAM and CPU threads (a minimal sketch follows this list).
- VS Code Integration: Connects directly to GitHub Copilot Chat as a tool via the Model Context Protocol.
- Automatic API Docs: Interactive API documentation powered by Swagger UI and ReDoc.
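
The resource estimate is simple capacity math: divide available RAM and CPU threads by a per-instance footprint and take the smaller result. Below is a minimal sketch of that idea using `psutil`; the per-instance figures are hypothetical placeholders, not the project's actual heuristic.

```python
# Illustrative capacity-estimation sketch; NOT the project's actual logic.
# ASSUMPTIONS: ~1.5 GB RAM and 2 threads per server instance (hypothetical).
import os

import psutil

RAM_PER_INSTANCE_GB = 1.5   # hypothetical per-instance memory footprint
THREADS_PER_INSTANCE = 2    # hypothetical per-instance thread count

def estimate_max_instances() -> int:
    available_gb = psutil.virtual_memory().available / (1024 ** 3)
    cpu_threads = os.cpu_count() or 1
    by_ram = int(available_gb // RAM_PER_INSTANCE_GB)
    by_cpu = cpu_threads // THREADS_PER_INSTANCE
    return max(0, min(by_ram, by_cpu))

print(f"Estimated capacity: {estimate_max_instances()} instances")
```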
Technology Stack
- FastAPI for the core web framework.
- Uvicorn as the ASGI server.
- Docker for containerization and easy deployment.
- Pydantic for data validation and settings management.
- fastapi-mcp for VS Code Copilot tool integration.
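
To see how these pieces fit together, here is a minimal, self-contained sketch of the pattern (illustrative only, not this project's actual code): a Pydantic model validates the request body, FastAPI exposes the route, and Uvicorn serves the app.

```python
# Minimal FastAPI + Pydantic + Uvicorn pattern; illustrative, not this project's code.
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

app = FastAPI()

class ChatRequest(BaseModel):  # hypothetical request schema
    prompt: str
    n_predict: int = 128

@app.post("/chat")  # hypothetical route, for illustration only
def chat(req: ChatRequest) -> dict:
    # A real handler would forward req.prompt to a llama.cpp process.
    return {"echo": req.prompt, "tokens_requested": req.n_predict}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)
```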
Getting Started
Prerequisites
- Docker Desktop
- Conda (or another Python environment manager)
- Python 3.10+
1. Set Up the Python Environment
Create and activate a Conda environment:
```bash
conda create -n bitnet python=3.11
conda activate bitnet
```
Install the Hugging Face CLI to download the models:
```bash
pip install -U "huggingface_hub[cli]"
```
Download Microsoft's official BitNet model:
```bash
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir app/models/BitNet-b1.58-2B-4T
```
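
If you prefer to script the download instead of using the CLI, the same snapshot can be fetched with the `huggingface_hub` Python API:

```python
# Equivalent download via the huggingface_hub Python API.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/BitNet-b1.58-2B-4T-gguf",
    local_dir="app/models/BitNet-b1.58-2B-4T",
)
```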
Running the Application
Using Docker (Recommended)
This is the easiest and recommended way to run the application.
Build the Docker image:
```bash
docker build -t fastapi_bitnet .
```
Run the Docker container:
```bash
docker run -d --name ai_container -p 8080:8080 fastapi_bitnet
```
This command runs the container in detached mode (-d) and maps port 8080 on your host to port 8080 in the container.
Local Development
For development, you can run the application directly with Uvicorn, which enables auto-reloading.
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
```
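
Equivalently, the server can be launched from Python, which can be convenient in scripts or tests; note that reload mode requires passing the app as an import string:

```python
# Programmatic equivalent of the uvicorn command above.
import uvicorn

if __name__ == "__main__":
    # The import-string form ("app.main:app") is required for reload to work.
    uvicorn.run("app.main:app", host="0.0.0.0", port=8080, reload=True)
```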
API Usage
Once the server is running, you can access the interactive API documentation:
- Swagger UI: http://127.0.0.1:8080/docs
- ReDoc: http://127.0.0.1:8080/redoc
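
Because this is a standard FastAPI application, the machine-readable OpenAPI schema is also served at /openapi.json. A short script like the following (using the third-party `requests` package) lists every available route without hard-coding any endpoint names:

```python
# Discover the API's routes from its OpenAPI schema (a FastAPI default).
import requests

schema = requests.get("http://127.0.0.1:8080/openapi.json", timeout=10).json()
for path, methods in schema["paths"].items():
    for method, operation in methods.items():
        print(f"{method.upper():6} {path}  - {operation.get('summary', '')}")
```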
VS Code Integration
As a Copilot Tool (MCP)
You can connect this API directly to VS Code's Copilot Chat to create and interact with models.
- Run the application using Docker or locally.
- In VS Code, open the Copilot Chat panel.
- Click the wrench icon ("Configure Tools...").
- Scroll to the bottom and select "+ Add MCP Server", then choose "HTTP".
- Enter the URL: http://127.0.0.1:8080/mcp
Copilot will now be able to use the API to launch and chat with BitNet instances.
See Also - VSCode Extension!
For a more integrated experience, check out the companion VS Code extension:
- GitHub: https://github.com/grctest/BitNet-VSCode-Extension
- Marketplace: https://marketplace.visualstudio.com/items?itemName=nftea-gallery.bitnet-vscode-extension
License
This project is licensed under the MIT License. See the LICENSE file for details.