mcp-vector-database
mcp-vector-database is a Python library designed for efficiently managing and searching vector data. It specializes in handling high-dimensional data, making it ideal for data storage in machine learning and AI projects. It offers an easy-to-use API and provides scalable database functionalities.
PhiloGraph - Philosophical Knowledge Platform (Tier 0 MVP)
PhiloGraph is a specialized knowledge platform combining semantic search and relationship modeling for philosophical texts. This README covers the setup and usage for the Tier 0 Minimum Viable Product (MVP).
Tier 0 Architecture Overview:
- Local Deployment: Runs via Docker Compose.
- Database: PostgreSQL + pgvector (Docker container).
- Backend API: Python FastAPI application (Docker container).
- Embeddings: Google Vertex AI (text-embedding-large-exp-03-07), accessed via a local LiteLLM Proxy.
- Middleware: LiteLLM Proxy (Docker container) acts as the gateway to Vertex AI.
- Text Processing: CPU-based tools (GROBID, PyMuPDF, semchunk) run within the backend container or potentially separate containers (GROBID).
- Text Acquisition: Relies on an external, separately running zlibrary-mcp server for acquiring missing texts.
- Interfaces: CLI and PhiloGraph MCP Server.
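To illustrate the embeddings path in the overview above, here is a minimal sketch of how a client could request vectors through the LiteLLM Proxy. It assumes the proxy is reachable at localhost:4000 and serves an OpenAI-compatible /embeddings route; the URL, the function names, and the model alias passed in are illustrative assumptions, not taken from the PhiloGraph code.

```python
import json
import urllib.request

# Assumption (not from this README): the proxy listens on localhost:4000 and
# exposes an OpenAI-compatible /embeddings route. Adjust to your setup.
PROXY_URL = "http://localhost:4000/embeddings"


def build_embedding_request(texts, model, api_key=None, url=PROXY_URL):
    """Build (but do not send) an embeddings request for the proxy."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when virtual keys are configured
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def embed(texts, model, api_key=None, url=PROXY_URL):
    """Send the request and return one vector per input text."""
    request = build_embedding_request(texts, model, api_key, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # OpenAI-style responses carry the vectors under data[i].embedding.
    return [item["embedding"] for item in payload["data"]]
```

The model argument should match whatever name your litellm_config.yaml maps to the Vertex AI embedding model; that mapping is not shown in this README.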
Prerequisites
- Docker & Docker Compose: Install Docker Desktop or Docker Engine with Docker Compose.
- Python: (Optional, for direct script execution or development) Python 3.11+.
- Google Cloud Platform (GCP) Account:
  - Enable the Vertex AI API.
  - Create a Service Account with permissions to use Vertex AI (e.g., "Vertex AI User" role).
  - Download the Service Account key file (JSON).
  - Enable Billing: Even for free tier usage, GCP often requires billing to be enabled for API quotas.
- zlibrary-mcp Server:
  - Clone the zlibrary-mcp repository separately.
  - Follow its README instructions for setup (install Node.js, dependencies, Python venv, configure Z-Library credentials via environment variables).
  - Ensure the zlibrary-mcp server is running before attempting text acquisition via PhiloGraph.
Setup Instructions
Clone this Repository:
    git clone <repository_url>
    cd mcp-vector-database  # Or your project directory name
Create .env File:
- Copy the example environment file:
      cp .env.example .env
- Edit .env and fill in the required values:
  - DB_PASSWORD: Choose a secure password for the PostgreSQL database.
  - VERTEX_AI_PROJECT_ID: Your GCP project ID.
  - VERTEX_AI_LOCATION: Your GCP region (e.g., us-central1).
  - GOOGLE_APPLICATION_CREDENTIALS: Absolute path to your downloaded GCP service account key JSON file on your host machine. This file will be mounted into the LiteLLM container.
  - LITELLM_API_KEY: (Optional) If you configure virtual keys in litellm_config.yaml, set the key here for the backend to use.
  - SOURCE_FILE_DIR: (Optional) Change the default path (./data/source_documents) where PhiloGraph looks for local files to ingest. Ensure this directory exists.
  - Review other variables and adjust if necessary (ports, model names, etc.).
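Before starting the containers, it can help to check that the non-optional variables are actually filled in. The following is a minimal sketch, not part of PhiloGraph: parse_env and missing_keys are hypothetical helper names, and the parser only handles plain KEY=VALUE lines rather than the full dotenv syntax.

```python
# The four variables this README lists without an "(Optional)" marker.
REQUIRED_KEYS = (
    "DB_PASSWORD",
    "VERTEX_AI_PROJECT_ID",
    "VERTEX_AI_LOCATION",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in listed order."""
    return [key for key in required if not values.get(key)]
```

A typical use would be to read the .env file as text, pass it to parse_env, and stop if missing_keys returns anything.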
Create Source Directory:
- If you changed SOURCE_FILE_DIR in .env, create that directory. Otherwise, create the default:
      mkdir -p ./data/source_documents
- Place your initial philosophical texts (PDF, EPUB, MD, TXT) into this directory.
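To preview which files in the source directory are candidates for ingestion, a small script along these lines can help. It is a sketch under assumptions: the suffix set is derived from the formats named above, and find_ingestable_files is a hypothetical helper, not a PhiloGraph function. The backend may apply its own rules.

```python
from pathlib import Path

# Suffixes derived from the formats named above (PDF, EPUB, MD, TXT).
SUPPORTED_SUFFIXES = {".pdf", ".epub", ".md", ".txt"}


def find_ingestable_files(source_dir):
    """Return supported files under source_dir as sorted relative paths."""
    root = Path(source_dir)
    return sorted(
        path.relative_to(root)
        for path in root.rglob("*")  # recursive walk
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

The relative paths it returns have the same shape the ingest command expects, since that command takes paths relative to SOURCE_FILE_DIR.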
Build and Start Docker Containers:
    docker-compose up --build -d
- This will build the philograph-backend image and start the db, litellm-proxy, and philograph-backend containers.
- The first time db starts, it might take a moment to initialize. The philograph-backend and litellm-proxy services depend on the database being ready (via healthcheck).
Initialize Database Schema (First Time Only):
- The db_layer.py module includes an initialize_schema function. You can run this manually via the backend container or integrate it into an application startup hook (the current FastAPI lifespan attempts this, but manual execution might be safer initially).
- To run manually:
      docker-compose exec philograph-backend python -m src.philograph.data_access.db_layer
  (Note: This assumes an if __name__ == "__main__": block is added to db_layer.py to call initialize_schema.)
- Alternatively, if schema initialization is robustly handled in the FastAPI startup lifespan event, this step might not be needed.
Usage
Command Line Interface (CLI)
Access the CLI by executing commands within the running philograph-backend container:
docker-compose exec philograph-backend python -m src.philograph.cli.main [COMMAND] [ARGS]...
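Because every CLI call shares the same long prefix, scripting it from the host is easier with a small wrapper. This is a sketch only: philograph_command and as_shell are hypothetical helpers, and the service and module names are copied from the invocation above.

```python
import shlex

# Service and module names are copied from the invocation above.
BASE_COMMAND = [
    "docker-compose", "exec", "philograph-backend",
    "python", "-m", "src.philograph.cli.main",
]


def philograph_command(command, *args):
    """Return the argv list for one PhiloGraph CLI call."""
    return BASE_COMMAND + [command, *map(str, args)]


def as_shell(argv):
    """Render argv as a copy-pasteable, safely quoted shell line."""
    return shlex.join(argv)
```

The argv list can be handed to subprocess.run(argv, check=True). When running without a terminal (for example from cron or CI), docker-compose exec may additionally need its -T flag to disable pseudo-TTY allocation.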
Common Commands:
Ingest a file or directory:
    # Ingest a single file (path relative to SOURCE_FILE_DIR)
    docker-compose exec philograph-backend python -m src.philograph.cli.main ingest path/to/your/document.pdf
    # Ingest all supported files in a directory (recursive)
    docker-compose exec philograph-backend python -m src.philograph.cli.main ingest path/to/your/directory
Search:
    docker-compose exec philograph-backend python -m src.philograph.cli.main search "concept of Being in Heidegger" --limit 5
    docker-compose exec philograph-backend python -m src.philograph.cli.main search "critique of judgment" --author Kant --limit 10
Show Document Details:
    docker-compose exec philograph-backend python -m src.philograph.cli.main show document <document_id>
Manage Collections:
    # Create a collection
    docker-compose exec philograph-backend python -m src.philograph.cli.main collection create --name "My Essay Notes"
    # Add a document to a collection
    docker-compose exec philograph-backend python -m src.philograph.cli.main collection add --collection-id <coll_id> --item-type document --item-id <doc_id>
    # List items in a collection
    docker-compose exec philograph-backend python -m src.philograph.cli.main collection list --collection-id <coll_id>
Acquire Text (Requires running zlibrary-mcp):
Attempts to acquire and ingest a text via the connected zlibrary-mcp server. It can search for a specific text using title/author or identify potentially missing texts based on citation frequency within the existing PhiloGraph database.
    # Search for a specific book by title and author
    docker-compose exec philograph-backend python -m src.philograph.cli.main acquire --title "Being and Time" --author "Heidegger"
    # Search for a specific book and auto-confirm if only one match is found
    docker-compose exec philograph-backend python -m src.philograph.cli.main acquire --title "Critique of Pure Reason" --author "Kant" --yes
    # Find and potentially acquire missing texts cited at least 5 times
    docker-compose exec philograph-backend python -m src.philograph.cli.main acquire --find-missing-threshold 5  # Alias: --threshold 5
- Options:
  - --title TEXT: Title of the text (use with --author).
  - --author TEXT: Author of the text (use with --title).
  - --find-missing-threshold INTEGER: Minimum citation count to find missing texts (use instead of --title/--author). Alias: --threshold.
  - --yes / -y: Automatically confirm if only one match is found. Errors if multiple matches exist with --yes.
- Workflow:
  - Initiates a search via the backend and zlibrary-mcp.
  - If matches are found, they are displayed in a table.
  - If multiple matches exist, prompts for selection (enter number) or cancellation (enter 0).
  - If one match exists and --yes is used, confirms automatically.
  - On confirmation, triggers download and ingestion via the backend and zlibrary-mcp.
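The confirmation rules in the acquire workflow can be summarized in a few lines of Python. This is an illustrative sketch, not the CLI's actual implementation: resolve_selection is a hypothetical function, the choice argument stands in for the number typed at the prompt, and prompting for a single match without --yes is an assumption, since the README does not spell out that case.

```python
def resolve_selection(matches, auto_confirm=False, choice=None):
    """Pick a match following the acquire workflow described above.

    Returns the chosen match, or None when nothing was found or the
    user cancelled. `choice` stands in for the number typed at the prompt.
    """
    if not matches:
        return None
    if auto_confirm:
        # --yes only works when the search result is unambiguous.
        if len(matches) > 1:
            raise ValueError("--yes requires exactly one match")
        return matches[0]
    if choice is None:
        raise ValueError("a selection is required without --yes")
    if choice == 0:  # entering 0 cancels
        return None
    if not 1 <= choice <= len(matches):
        raise ValueError(f"choice must be between 0 and {len(matches)}")
    return matches[choice - 1]  # the prompt numbers matches from 1
```

Separating the decision from the prompt in this way keeps the rule easy to test without a terminal.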
PhiloGraph MCP Server
- The PhiloGraph MCP server tools (philograph_ingest, philograph_search, philograph_acquire_missing) can be called by compatible MCP clients (like RooCode) if the PhiloGraph backend container is running.
- Ensure the MCP client is configured to connect to the philograph-mcp-server (the name defined in the simulated MCP framework in src/philograph/mcp/main.py). The actual connection mechanism (stdio, network) depends on the MCP client runner.
Development
- Hot Reloading: The docker-compose.yml mounts the src directory into the philograph-backend container. If you run the FastAPI app directly inside the container with the command below (or modify the CMD accordingly), changes to the Python code should trigger automatic reloading:
      uvicorn src.philograph.api.main:app --reload --host 0.0.0.0 --port 8000
- Testing: Run tests using pytest inside the container:
      docker-compose exec philograph-backend pytest tests/
- Database Access: You can connect to the PostgreSQL database directly using tools like psql or a GUI client via the exposed port (default 5432) using the credentials from your .env file:
      psql -h localhost -p ${DB_PORT:-5432} -U ${DB_USER:-philograph_user} -d ${DB_NAME:-philograph_db}
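GUI clients and scripts often want a single connection URI instead of separate flags. The sketch below assembles one from the same variables and defaults as the psql command above. build_dsn is a hypothetical helper, and DB_HOST is an assumed variable name that does not appear in this README; it falls back to localhost.

```python
import os
from urllib.parse import quote


def build_dsn(env=None):
    """Build a postgresql:// URI from .env-style variables."""
    env = os.environ if env is None else env
    # `or` mirrors the shell's ${VAR:-default}: unset and empty both fall back.
    user = env.get("DB_USER") or "philograph_user"
    password = env.get("DB_PASSWORD") or ""
    host = env.get("DB_HOST") or "localhost"  # DB_HOST is a hypothetical name
    port = env.get("DB_PORT") or "5432"
    name = env.get("DB_NAME") or "philograph_db"
    # Percent-encode credentials so characters like @ or / stay unambiguous.
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{name}"
```

The resulting string can be passed to psql as its only argument. Avoid printing it to logs when it contains the password.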
Stopping Services
docker-compose down
To remove the database volume (WARNING: Deletes all data):
docker-compose down -v