mcp-vector-database
mcp-vector-databaseは、ベクトルデータを効率的に管理・検索するためのPythonライブラリです。高次元データの処理に特化しており、機械学習やAIプロジェクトでのデータストレージに最適です。使いやすいAPIを提供し、スケーラブルなデータベース機能を実現しています。
GitHubスター
0
ユーザー評価
未評価
お気に入り
0
閲覧数
13
フォーク
0
イシュー
0
PhiloGraph - Philosophical Knowledge Platform (Tier 0 MVP)
PhiloGraph is a specialized knowledge platform combining semantic search and relationship modeling for philosophical texts. This README covers the setup and usage for the Tier 0 Minimum Viable Product (MVP).
Tier 0 Architecture Overview:
- Local Deployment: Runs via Docker Compose.
- Database: PostgreSQL + pgvector (Docker container).
- Backend API: Python FastAPI application (Docker container).
- Embeddings: Google Vertex AI (
text-embedding-large-exp-03-07
) accessed via a local LiteLLM Proxy. - Middleware: LiteLLM Proxy (Docker container) acts as the gateway to Vertex AI.
- Text Processing: CPU-based tools (GROBID, PyMuPDF, semchunk) run within the backend container or potentially separate containers (GROBID).
- Text Acquisition: Relies on an external, separately running
zlibrary-mcp
server for acquiring missing texts. - Interfaces: CLI and PhiloGraph MCP Server.
Prerequisites
- Docker & Docker Compose: Install Docker Desktop or Docker Engine with Docker Compose.
- Python: (Optional, for direct script execution or development) Python 3.11+.
- Google Cloud Platform (GCP) Account:
- Enable the Vertex AI API.
- Create a Service Account with permissions to use Vertex AI (e.g., "Vertex AI User" role).
- Download the Service Account key file (JSON).
- Enable Billing: Even for free tier usage, GCP often requires billing to be enabled for API quotas.
zlibrary-mcp
Server:- Clone the
zlibrary-mcp
repository separately. - Follow its README instructions for setup (install Node.js, dependencies, Python venv, configure Z-Library credentials via environment variables).
- Ensure the
zlibrary-mcp
server is running before attempting text acquisition via PhiloGraph.
- Clone the
Setup Instructions
Clone this Repository:
git clone <repository_url> cd mcp-vector-database # Or your project directory name
Create
.env
File:- Copy the example environment file:
cp .env.example .env
- Edit
.env
and fill in the required values:DB_PASSWORD
: Choose a secure password for the PostgreSQL database.VERTEX_AI_PROJECT_ID
: Your GCP project ID.VERTEX_AI_LOCATION
: Your GCP region (e.g.,us-central1
).GOOGLE_APPLICATION_CREDENTIALS
: Absolute path to your downloaded GCP service account key JSON file on your host machine. This file will be mounted into the LiteLLM container.LITELLM_API_KEY
: (Optional) If you configure virtual keys inlitellm_config.yaml
, set the key here for the backend to use.SOURCE_FILE_DIR
: (Optional) Change the default path (./data/source_documents
) where PhiloGraph looks for local files to ingest. Ensure this directory exists.- Review other variables and adjust if necessary (ports, model names, etc.).
- Copy the example environment file:
Create Source Directory:
- If you changed
SOURCE_FILE_DIR
in.env
, create that directory. Otherwise, create the default:mkdir -p ./data/source_documents
- Place your initial philosophical texts (PDF, EPUB, MD, TXT) into this directory.
- If you changed
Build and Start Docker Containers:
docker-compose up --build -d
- This will build the
philograph-backend
image and start thedb
,litellm-proxy
, andphilograph-backend
containers. - The first time
db
starts, it might take a moment to initialize. Thephilograph-backend
andlitellm-proxy
services depend on the database being ready (via healthcheck).
- This will build the
Initialize Database Schema (First Time Only):
- The
db_layer.py
includes aninitialize_schema
function. You can run this manually via the backend container or integrate it into an application startup hook (the current FastAPI lifespan attempts this, but manual execution might be safer initially). - To run manually:
(Note: This assumesdocker-compose exec philograph-backend python -m src.philograph.data_access.db_layer
if __name__ == "__main__":
block is added todb_layer.py
to callinitialize_schema
)
Alternatively, if schema initialization is robustly handled in the FastAPI startup lifespan event, this step might not be needed.
- The
Usage
Command Line Interface (CLI)
Access the CLI by executing commands within the running philograph-backend
container:
docker-compose exec philograph-backend python -m src.philograph.cli.main [COMMAND] [ARGS]...
Common Commands:
Ingest a file or directory:
# Ingest a single file (path relative to SOURCE_FILE_DIR) docker-compose exec philograph-backend python -m src.philograph.cli.main ingest path/to/your/document.pdf # Ingest all supported files in a directory (recursive) docker-compose exec philograph-backend python -m src.philograph.cli.main ingest path/to/your/directory
Search:
docker-compose exec philograph-backend python -m src.philograph.cli.main search "concept of Being in Heidegger" --limit 5 docker-compose exec philograph-backend python -m src.philograph.cli.main search "critique of judgment" --author Kant --limit 10
Show Document Details:
docker-compose exec philograph-backend python -m src.philograph.cli.main show document <document_id>
Manage Collections:
# Create a collection docker-compose exec philograph-backend python -m src.philograph.cli.main collection create --name "My Essay Notes" # Add a document to a collection docker-compose exec philograph-backend python -m src.philograph.cli.main collection add --collection-id <coll_id> --item-type document --item-id <doc_id> # List items in a collection docker-compose exec philograph-backend python -m src.philograph.cli.main collection list --collection-id <coll_id>
Acquire Text (Requires running
zlibrary-mcp
):
Attempts to acquire and ingest a text via the connectedzlibrary-mcp
server. It can search for a specific text using title/author or identify potentially missing texts based on citation frequency within the existing PhiloGraph database.# Search for a specific book by title and author docker-compose exec philograph-backend python -m src.philograph.cli.main acquire --title "Being and Time" --author "Heidegger" # Search for a specific book and auto-confirm if only one match is found docker-compose exec philograph-backend python -m src.philograph.cli.main acquire --title "Critique of Pure Reason" --author "Kant" --yes # Find and potentially acquire missing texts cited at least 5 times docker-compose exec philograph-backend python -m src.philograph.cli.main acquire --find-missing-threshold 5 # Alias: --threshold 5
- Options:
--title TEXT
: Title of the text (use with--author
).--author TEXT
: Author of the text (use with--title
).--find-missing-threshold INTEGER
: Minimum citation count to find missing texts (use instead of--title
/--author
). Alias:--threshold
.--yes
/-y
: Automatically confirm if only one match is found. Errors if multiple matches exist with--yes
.
- Workflow:
- Initiates a search via the backend and
zlibrary-mcp
. - If matches are found, they are displayed in a table.
- If multiple matches exist, prompts for selection (enter number) or cancellation (enter 0).
- If one match exists and
--yes
is used, confirms automatically. - On confirmation, triggers download and ingestion via the backend and
zlibrary-mcp
.
- Initiates a search via the backend and
- Options:
PhiloGraph MCP Server
- The PhiloGraph MCP server tools (
philograph_ingest
,philograph_search
,philograph_acquire_missing
) can be called by compatible MCP clients (like RooCode) if the PhiloGraph backend container is running. - Ensure the MCP client is configured to connect to the
philograph-mcp-server
(the name defined in the simulated MCP framework insrc/philograph/mcp/main.py
). The actual connection mechanism (stdio, network) depends on the MCP client runner.
Development
- Hot Reloading: The
docker-compose.yml
mounts thesrc
directory into thephilograph-backend
container. If you run the FastAPI app directly withuvicorn src.philograph.api.main:app --reload --host 0.0.0.0 --port 8000
inside the container (or modify the CMD), changes to the Python code should trigger automatic reloading. - Testing: Run tests using
pytest
inside the container:docker-compose exec philograph-backend pytest tests/
- Database Access: You can connect to the PostgreSQL database directly using tools like
psql
or a GUI client via the exposed port (default 5432) using the credentials from your.env
file.psql -h localhost -p ${DB_PORT:-5432} -U ${DB_USER:-philograph_user} -d ${DB_NAME:-philograph_db}
Stopping Services
docker-compose down
To remove the database volume (WARNING: Deletes all data):
docker-compose down -v