FastAPI-BitNet
This project provides a robust REST API built with FastAPI and Docker to manage and interact with `llama.cpp`-based BitNet model instances. It allows developers and researchers to programmatically control `llama-cli` and `llama-server` processes for automated testing, benchmarking, and interactive chat sessions.
Key Features
- Session Management: Start, stop, and check the status of multiple persistent `llama-cli` and `llama-server` chat sessions.
- Batch Operations: Initialize, shut down, and chat with multiple instances in a single API call.
- Interactive Chat: Send prompts to running BitNet sessions and receive cleaned model responses.
- Model Benchmarking: Programmatically run benchmarks and calculate perplexity on GGUF models.
- Resource Estimation: Estimate maximum server capacity based on available system RAM and CPU threads (see the sketch after this list).
- VS Code Integration: Connects directly to GitHub Copilot Chat as a tool via the Model Context Protocol.
- Automatic API Docs: Interactive API documentation powered by Swagger UI and ReDoc.
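The resource-estimation feature can be illustrated with a short sketch. This is not the project's implementation: the per-instance memory footprint and thread count below are assumed values, and the real endpoint derives its numbers from the actual model and host.

```python
import os

import psutil  # third-party: pip install psutil

# Assumed footprint of one BitNet instance; illustrative values only.
MODEL_RAM_GB = 1.5
THREADS_PER_INSTANCE = 2

def estimate_max_instances() -> int:
    """Rough capacity estimate, bounded by free RAM and by CPU threads."""
    free_ram_gb = psutil.virtual_memory().available / 1024**3
    by_ram = int(free_ram_gb // MODEL_RAM_GB)
    by_cpu = (os.cpu_count() or 1) // THREADS_PER_INSTANCE
    return max(0, min(by_ram, by_cpu))

print(f"Estimated capacity: {estimate_max_instances()} instances")
```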
Technology Stack
- FastAPI for the core web framework.
- Uvicorn as the ASGI server.
- Docker for containerization and easy deployment.
- Pydantic for data validation and settings management.
- fastapi-mcp for VS Code Copilot tool integration.
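As a minimal sketch of how these pieces fit together (the route and request model below are illustrative, not the project's actual schema), a FastAPI endpoint validates its request body with a Pydantic model and is served by Uvicorn:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    # Pydantic validates types and required fields before the handler runs.
    session_id: str
    prompt: str

@app.post("/chat")  # illustrative route, not the project's real path
async def chat(req: ChatRequest) -> dict:
    # A real handler would forward req.prompt to a llama-cli/llama-server
    # process and return the cleaned model output.
    return {"session_id": req.session_id, "response": "..."}

# Run with: uvicorn example:app --host 0.0.0.0 --port 8080
```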
Getting Started
Prerequisites
- Docker Desktop
- Conda (or another Python environment manager)
- Python 3.10+
1. Set Up the Python Environment
Create and activate a Conda environment:

```bash
conda create -n bitnet python=3.11
conda activate bitnet
```

Install the Hugging Face CLI to download the models:

```bash
pip install -U "huggingface_hub[cli]"
```

Download Microsoft's official BitNet model:

```bash
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir app/models/BitNet-b1.58-2B-4T
```
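Equivalently, the download can be scripted with the `huggingface_hub` Python API (same repository and target directory as the CLI command above):

```python
from huggingface_hub import snapshot_download

# Fetches the GGUF model files into the directory the API expects.
snapshot_download(
    repo_id="microsoft/BitNet-b1.58-2B-4T-gguf",
    local_dir="app/models/BitNet-b1.58-2B-4T",
)
```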
Running the Application
Using Docker (Recommended)
This is the easiest and recommended way to run the application.
Build the Docker image:

```bash
docker build -t fastapi_bitnet .
```

Run the Docker container. This command runs the container in detached mode (`-d`) and maps port 8080 on your host to port 8080 in the container:

```bash
docker run -d --name ai_container -p 8080:8080 fastapi_bitnet
```
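Once the container is running, a quick probe of the interactive docs page (served by FastAPI at `/docs`) confirms the API is reachable; this check is illustrative:

```python
import urllib.request

# A 200 response from the Swagger UI page means the API is up.
with urllib.request.urlopen("http://127.0.0.1:8080/docs", timeout=5) as resp:
    print("API is up" if resp.status == 200 else f"Unexpected status: {resp.status}")
```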
Local Development
For development, you can run the application directly with Uvicorn, which enables auto-reloading.
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
```
API Usage
Once the server is running, you can access the interactive API documentation:
- Swagger UI: http://127.0.0.1:8080/docs
- ReDoc: http://127.0.0.1:8080/redoc
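Beyond the browser UIs, the endpoint list can also be read programmatically from the OpenAPI schema that FastAPI serves by default at `/openapi.json`:

```python
import json
import urllib.request

# FastAPI publishes its OpenAPI schema at /openapi.json by default.
with urllib.request.urlopen("http://127.0.0.1:8080/openapi.json") as resp:
    schema = json.load(resp)

# Print every route and the HTTP methods it accepts.
for path, methods in schema["paths"].items():
    print(path, "->", ", ".join(m.upper() for m in methods))
```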
VS Code Integration
As a Copilot Tool (MCP)
You can connect this API directly to VS Code's Copilot Chat to create and interact with models.
- Run the application using Docker or locally.
- In VS Code, open the Copilot Chat panel.
- Click the wrench icon ("Configure Tools...").
- Scroll to the bottom and select `+ Add MCP Server`, then choose `HTTP`.
- Enter the URL: `http://127.0.0.1:8080/mcp`
Copilot will now be able to use the API to launch and chat with BitNet instances.
See Also - VSCode Extension!
For a more integrated experience, check out the companion VS Code extension:
- GitHub: https://github.com/grctest/BitNet-VSCode-Extension
- Marketplace: https://marketplace.visualstudio.com/items?itemName=nftea-gallery.bitnet-vscode-extension
License
This project is licensed under the MIT License. See the LICENSE file for details.