Model Context Protocol Server with Llama Integration
This repository contains a Model Context Protocol (MCP) server implementation that integrates with a locally running Llama model. The MCP server provides a standardized interface for context retrieval, enhancing AI applications with relevant information from a local LLM.
Overview
The project consists of two main components:
- MCP Server - A FastAPI-based server that implements the Model Context Protocol and forwards queries to a local Llama model
- Python Client - A sample client application that demonstrates how to interact with the MCP server
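To make the flow concrete, here is a minimal sketch of how such a server can forward a query to a locally running Ollama instance. This is an illustration only, not the actual contents of llama_mcp_server.py: the model name, timeout values, and error-case strings are assumptions, while the endpoint shapes follow the API documented below and query_llama mirrors the function named in the Customization section.
```
import time
from typing import Optional

import requests
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

OLLAMA_URL = "http://localhost:11434/api/generate"  # local Llama server (Ollama)

app = FastAPI()


class ContextRequest(BaseModel):
    query_text: str
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    additional_context: dict = {}


def query_llama(text: str) -> str:
    """Forward the prompt to the local Llama model and return its reply."""
    payload = {"model": "llama3.2", "prompt": text, "stream": False}
    response = requests.post(OLLAMA_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json().get("response", "")


@app.post("/context")
def get_context(request: ContextRequest):
    started = time.time()
    answer = query_llama(request.query_text)
    return {
        "context_elements": [
            {"content": answer, "source": "llama_model", "relevance_score": 0.9}
        ],
        "metadata": {
            "processing_time_ms": int((time.time() - started) * 1000),
            "model": "llama3.2",
            "query": request.query_text,
        },
    }


@app.get("/health")
def health():
    # Report whether the Llama backend is reachable.
    # The failure-case strings below are assumptions for this sketch.
    try:
        requests.get("http://localhost:11434/api/tags", timeout=5).raise_for_status()
        return {"status": "healthy", "llama_status": "connected"}
    except requests.RequestException:
        return {"status": "unhealthy", "llama_status": "disconnected"}


if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
```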
Prerequisites
- Python 3.7 or higher
- A running Llama model server (e.g., Ollama) at http://localhost:11434/
- Git installed on your machine
- GitHub account
Installation
Clone the Repository
git clone https://github.com/EXPESRaza/mcp-llama-integration.git
cd mcp-llama-integration
Install Dependencies
pip install -r requirements.txt
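The dependency list is defined by requirements.txt in the repository. Based on the FastAPI server and HTTP-based client described below, it will contain roughly the following packages (this is an assumption, not the pinned file from the repo):
```
fastapi
uvicorn
pydantic
requests
```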
File Structure
mcp-llama-integration/
├── llama_mcp_server.py # MCP server with Llama integration
├── llama_client_app.py # Sample client application
└── README.md # Project documentation
Setting Up the Llama Model
- If you haven't already, install Ollama
- Pull the Llama model:
ollama pull llama3.2
- Verify the model is running:
curl http://localhost:11434/api/tags
You can also open http://localhost:11434 or http://localhost:11434/api/tags in a browser.
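If the model has been pulled, the response is a JSON document listing the locally installed models; abbreviated, it looks roughly like this (extra fields such as size and digest are omitted here):
```
{
  "models": [
    { "name": "llama3.2:latest" }
  ]
}
```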
Running the MCP Server
Start the server:
python llama_mcp_server.py
The server will start running on http://localhost:8000
You can verify the server is running by checking the health endpoint:
curl http://localhost:8000/health
Using the Client Application
In a separate terminal, start the client application:
python llama_client_app.py
- The application will prompt you for input
- Type your queries and receive responses from the Llama model
- Type 'exit' to quit the application
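If you prefer to call the server from your own code instead of the bundled client, a minimal interaction looks like the following sketch. It only approximates what llama_client_app.py does; the exact behaviour of the bundled client may differ.
```
import requests

MCP_SERVER_URL = "http://localhost:8000/context"  # MCP server from this repo


def ask(query: str) -> str:
    """Send a query to the MCP server and return the first context element."""
    response = requests.post(MCP_SERVER_URL, json={"query_text": query}, timeout=120)
    response.raise_for_status()
    elements = response.json()["context_elements"]
    return elements[0]["content"] if elements else ""


if __name__ == "__main__":
    while True:
        query = input("Query (or 'exit' to quit): ").strip()
        if query.lower() == "exit":
            break
        print(ask(query))
```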
API Documentation
MCP Server Endpoints
POST /context
Retrieve context for a given query.
Request Body:
{
"query_text": "Your query here",
"user_id": "optional-user-id",
"session_id": "optional-session-id",
"additional_context": {}
}
Response:
{
"context_elements": [
{
"content": "Response from Llama model",
"source": "llama_model",
"relevance_score": 0.9
}
],
"metadata": {
"processing_time_ms": 150,
"model": "llama3",
"query": "Your query here"
}
}
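For a quick test of this endpoint without the Python client, you can post a query directly with curl (the query text here is just an example):
```
curl -X POST http://localhost:8000/context \
  -H "Content-Type: application/json" \
  -d '{"query_text": "What is the Model Context Protocol?"}'
```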
GET /health
Check the health status of the MCP server and its connection to the Llama model.
Response:
{
"status": "healthy",
"llama_status": "connected"
}
Customization
Changing the Llama Model
If you want to use a different Llama model, modify the model parameter in the query_llama function in llama_mcp_server.py:
payload = {
"model": "your-model-name", # Change this to your model name
"prompt": text,
"stream": False
}
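Whichever name you put here must correspond to a model that Ollama has already pulled. For example, to try a smaller variant (shown purely as an illustration):
```
ollama pull llama3.2:1b
```
Then set "model": "llama3.2:1b" in the payload above and restart the server.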
Modifying the Prompt Template
To change how queries are formatted before they are sent to Llama, update the prompt template in the get_context function:
prompt = f"""Please provide relevant information for the following query:
{request.query_text}
Respond with factual, helpful information."""
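For example, to push the model toward shorter answers, the template could be tightened like this (the wording is just one possibility, not part of the repository):
```
prompt = f"""Answer the following query in two or three sentences,
using only factual, verifiable information:
{request.query_text}"""
```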
Troubleshooting
Common Issues
Connection Refused Error
- Make sure the Llama model is running at http://localhost:11434/
- Verify Ollama is properly installed and running
Model Not Found Error
- Ensure you've pulled the correct model with Ollama
- Check available models with
ollama list
Slow Responses
- Llama model inference can be resource-intensive
- Consider using a smaller model if performance is an issue
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.