uniprot-mcp-server

This MCP server provides unified access to the UniProt and EBI Proteins APIs. It includes a data staging feature that uses SQLite to handle complex queries efficiently, and it is built on Cloudflare Workers for a scalable architecture.

UniProt & Proteins API MCP Server

A comprehensive Model Context Protocol server for UniProt and EBI Proteins APIs, built on Cloudflare Workers with advanced data staging capabilities using Durable Objects and SQLite.

Overview

This MCP server provides unified access to:

  • UniProtKB: Search and retrieve protein sequence and functional information
  • EBI Proteins API: Detailed protein features, variations, and structural data
Key Features

  • 🚀 Unified Interface: Single tool for searching UniProt and fetching detailed protein data
  • 📊 Advanced Data Staging: Large datasets automatically staged in SQLite for complex queries
  • 🔍 Smart Query Generation: Automatic suggestions for exploring staged data
  • 📈 Intelligent Bypassing: Small datasets returned directly for efficiency
  • 🏗️ Scalable Architecture: Built on Cloudflare Workers with Durable Objects
  • Rate Limit Aware: Intelligent handling of API rate limits

Tools Available
UniProt Database Tools
uniprot_search

Advanced UniProtKB search with comprehensive filtering and pagination:

  • Query: Complex search queries with UniProt syntax
  • Formats: JSON, TSV, FASTA, XML
  • Features: Sorting, facets, compression, isoforms
  • Pagination: Up to 500 results per page with automatic staging for large datasets
{
  "query": "organism_id:9606 AND reviewed:true",
  "format": "json",
  "fields": "accession,protein_name,gene_names,organism_name",
  "size": 100,
  "sort": "score desc",
  "compressed": true
}
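As a rough illustration of how these arguments could map onto the public UniProtKB search endpoint, here is a minimal Python sketch. The function name `build_search_url`, the default page size of 25, and the assumption that the tool forwards each argument as a query-string parameter are illustrative only; the server's real mapping may differ.

```python
from urllib.parse import urlencode

# Search endpoint under the UniProtKB base URL listed in this README.
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(args: dict) -> str:
    """Translate uniprot_search arguments into a UniProtKB search URL.

    Assumption: arguments are passed through as query-string parameters.
    """
    params = {
        "query": args["query"],
        "format": args.get("format", "json"),
        # Clamp to the documented maximum of 500 results per page.
        "size": min(int(args.get("size", 25)), 500),
    }
    # Optional parameters are only sent when the caller supplied them.
    for key in ("fields", "sort"):
        if args.get(key):
            params[key] = args[key]
    if args.get("compressed"):
        params["compressed"] = "true"
    return f"{UNIPROT_SEARCH_URL}?{urlencode(params)}"


url = build_search_url({
    "query": "organism_id:9606 AND reviewed:true",
    "format": "json",
    "fields": "accession,protein_name,gene_names,organism_name",
    "size": 100,
    "sort": "score desc",
    "compressed": True,
})
print(url)
```

Requesting only the fields you need keeps responses small, which in turn makes it more likely that results are returned directly instead of being staged.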
uniprot_stream

Bulk download tool for large datasets with automatic staging:

  • Purpose: Stream large datasets efficiently
  • Auto-staging: Always stages responses for SQL querying
  • Compression: Built-in compression support
  • Formats: JSON, TSV, FASTA, XML
{
  "query": "organism_id:9606 AND reviewed:true",
  "format": "fasta",
  "compressed": true
}
uniprot_entry

Retrieve individual UniProtKB entries by accession:

  • Direct Access: Get specific protein entries
  • Multiple Formats: JSON, TSV, FASTA, XML
  • Isoforms: Include protein isoforms
  • Field Selection: Choose specific data fields
{
  "accession": "P04637",
  "format": "json",
  "fields": "accession,protein_name,sequence,organism_name",
  "include_isoforms": true
}
uniprot_id_mapping

Map IDs between different database systems:

  • Batch Processing: Up to 100,000 IDs per job
  • Cross-Database: Map between UniProt, Ensembl, PDB, etc.
  • Job-Based: Asynchronous processing with status tracking
  • Filtering: Taxonomy-based filtering
{
  "from_db": "Gene_Name",
  "to_db": "UniProtKB",
  "ids": ["TP53", "BRCA1", "BRCA2"],
  "taxon_id": "9606"
}
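The job-based flow can be sketched in Python as follows. The two URLs are the public UniProt ID mapping endpoints; the helper names, the `taxId` form field, and the polling limits are assumptions for illustration, and the status function is injected so the sketch runs without network access.

```python
import time
from typing import Callable

# Public UniProt ID mapping endpoints (submit a job, then poll its status).
RUN_URL = "https://rest.uniprot.org/idmapping/run"
STATUS_URL = "https://rest.uniprot.org/idmapping/status/{job_id}"


def build_run_form(args: dict) -> dict:
    """Turn uniprot_id_mapping arguments into form fields for RUN_URL."""
    form = {
        "from": args["from_db"],
        "to": args["to_db"],
        "ids": ",".join(args["ids"]),
    }
    if args.get("taxon_id"):
        form["taxId"] = args["taxon_id"]  # assumed field name for the taxonomy filter
    return form


def wait_for_job(get_status: Callable[[], dict], interval: float = 1.0,
                 max_polls: int = 30,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job leaves the NEW/RUNNING states or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status.get("jobStatus") not in ("NEW", "RUNNING"):
            return status
        sleep(interval)
    raise TimeoutError("ID mapping job did not finish in time")


# Simulated status responses stand in for GET requests to STATUS_URL.
responses = iter([{"jobStatus": "RUNNING"}, {"jobStatus": "FINISHED"}])
final = wait_for_job(lambda: next(responses), sleep=lambda _seconds: None)
form = build_run_form({
    "from_db": "Gene_Name",
    "to_db": "UniProtKB",
    "ids": ["TP53", "BRCA1", "BRCA2"],
    "taxon_id": "9606",
})
print(form, final)
```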
uniprot_blast

Perform BLAST searches against UniProtKB:

  • Programs: BLASTP, BLASTX, TBLASTN
  • Databases: UniProtKB, UniRef90, UniRef50
  • Parameters: E-value, matrix, hit limits
  • Async Processing: Job-based with polling
{
  "program": "blastp",
  "sequence": "MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAGEGEN",
  "database": "uniprotkb",
  "threshold": 0.001,
  "hits": 50
}
EBI Proteins API Tools
proteins_api_details

Detailed protein information from EBI Proteins API:

  • Rich Data: Sequence, functional annotations, isoforms
  • Formats: JSON, XML
  • Isoforms: Include protein variants
{
  "accession": "P04637",
  "format": "json",
  "include_isoforms": true
}
proteins_api_features

Protein sequence features and annotations:

  • Categories: Domains, sites, regions, PTMs
  • Formats: JSON, XML, GFF
  • Filtering: Specific feature categories
{
  "accession": "P04637",
  "categories": ["DOMAINS_AND_SITES", "PTM"],
  "format": "json"
}
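For orientation, the request this tool presumably issues against the EBI Proteins API can be sketched in Python. The `/features/{accession}` path and the `categories` query parameter follow the public API; the helper name and the format-to-header mapping are assumptions, and GFF is left out because its media type is not stated in this README.

```python
from urllib.parse import urlencode

PROTEINS_API_BASE = "https://www.ebi.ac.uk/proteins/api"

# Assumed mapping from the tool's "format" argument to an Accept header.
ACCEPT = {"json": "application/json", "xml": "application/xml"}


def build_features_request(args: dict) -> tuple[str, dict]:
    """Return (url, headers) for a protein features lookup."""
    url = f"{PROTEINS_API_BASE}/features/{args['accession']}"
    categories = args.get("categories") or []
    if categories:
        # Multiple categories are sent as one comma-separated parameter.
        url += "?" + urlencode({"categories": ",".join(categories)})
    headers = {"Accept": ACCEPT[args.get("format", "json")]}
    return url, headers


features_url, features_headers = build_features_request({
    "accession": "P04637",
    "categories": ["DOMAINS_AND_SITES", "PTM"],
    "format": "json",
})
print(features_url, features_headers)
```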
proteins_api_variation

Protein sequence variations and disease variants:

  • Sources: UniProt, large-scale studies
  • Consequences: Missense, nonsense, synonymous
  • Disease Filter: Disease-associated variants only
  • Clinical Data: ClinVar, COSMIC integration
{
  "accession": "P04637",
  "sources": ["uniprot", "large_scale_studies"],
  "consequences": ["missense", "nonsense"],
  "disease_filter": true
}
proteins_api_proteomics

Proteomics data from various studies:

  • Studies: PeptideAtlas, MaxQB, ProteomicsDB
  • Tissues: Brain, liver, heart, etc.
  • Quantitative: Expression levels and modifications
{
  "accession": "P04637",
  "tissues": ["brain", "liver"],
  "format": "json"
}
proteins_api_genome

Genome coordinate mappings:

  • Assemblies: GRCh38, GRCh37
  • Coordinates: Protein to genomic position mapping
  • Exon Structure: Gene structure information
{
  "accession": "P04637",
  "assembly": "GRCh38",
  "format": "json"
}
Data Management Tools
data_manager

Query, analyze, and manage staged datasets:

  • Operations: Query, schema, cleanup, export
  • SQL Interface: Full SQLite support
  • Export: JSON, CSV, TSV formats
  • Analytics: Built-in query suggestions
{
  "operation": "query",
  "data_access_id": "uniprot_1234567890_abc123",
  "sql": "SELECT * FROM protein WHERE JSON_EXTRACT(data, '$.organism.scientificName') = 'Homo sapiens' LIMIT 10"
}
Quick Start
1. Setup
npm install
2. Development
npm run dev

The server will be available at:

  • MCP Endpoint: http://localhost:8787/mcp
  • SSE Endpoint: http://localhost:8787/sse
3. Testing Examples
Search for Human Proteins
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "uniprot_search",
    "arguments": {
      "query": "organism_id:9606 AND reviewed:true",
      "size": 10
    }
  }
}
Get Protein Details
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "proteins_api_details",
    "arguments": {
      "accession": "P04637",
      "format": "json"
    }
  }
}
Stage and Query Multiple Proteins
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "uniprot_stream",
    "arguments": {
      "query": "accession:P04637 OR accession:Q92793",
      "format": "json",
      "fields": "accession,protein_name,gene_names,organism_name"
    }
  }
}
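MCP messages are JSON-RPC 2.0 requests, so each `tools/call` needs a `jsonrpc` version and an `id` in addition to `method` and `params`. The Python sketch below builds such an envelope for one of the tools documented above; the helper name `tools_call` and the incrementing id scheme are illustrative choices, not part of the server.

```python
import itertools
import json

# Simple incrementing request ids; any unique id scheme would do.
_request_ids = itertools.count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for an MCP server."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request, indent=2)


payload = tools_call(
    "uniprot_search",
    {"query": "organism_id:9606 AND reviewed:true", "size": 10},
)
print(payload)
```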
Data Staging & SQL Querying

For large datasets, the server automatically stages data in SQLite tables within Durable Objects, enabling complex analytical queries:

Automatic Table Creation

Data is normalized into tables like:

  • proteins: Core protein information
  • gene_names: Gene names and synonyms
  • features: Protein sequence features
  • keywords: Functional keywords
  • references: Literature references
Example SQL Queries
-- Query staged JSON using SQLite JSON1
SELECT 
  json_extract(data, '$.primaryAccession') as accession,
  json_extract(data, '$.genes[0].geneName.value') as gene_name,
  json_extract(data, '$.sequence.length') as length
FROM protein
WHERE json_extract(data, '$.organism.scientificName') = 'Homo sapiens'
LIMIT 10;
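To try the query above outside the server, the following Python sketch stages two hand-written records in an in-memory SQLite database. It assumes a single `protein` table with a JSON `data` column, as the query implies, and a SQLite build with the JSON1 functions; the second record is a made-up placeholder, not real UniProt data.

```python
import json
import sqlite3

# Hand-written records shaped like UniProtKB JSON entries.
records = [
    {
        "primaryAccession": "P04637",
        "genes": [{"geneName": {"value": "TP53"}}],
        "sequence": {"length": 393},
        "organism": {"scientificName": "Homo sapiens"},
    },
    {
        "primaryAccession": "EXAMPLE1",  # placeholder entry for a second organism
        "genes": [{"geneName": {"value": "exampleGene"}}],
        "sequence": {"length": 100},
        "organism": {"scientificName": "Mus musculus"},
    },
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (data TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO protein (data) VALUES (?)",
    [(json.dumps(record),) for record in records],
)

rows = conn.execute(
    """
    SELECT
      json_extract(data, '$.primaryAccession') AS accession,
      json_extract(data, '$.genes[0].geneName.value') AS gene_name,
      json_extract(data, '$.sequence.length') AS length
    FROM protein
    WHERE json_extract(data, '$.organism.scientificName') = 'Homo sapiens'
    LIMIT 10
    """
).fetchall()
print(rows)
```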
API Endpoints and Rate Limits
UniProtKB REST API
  • Base URL: https://rest.uniprot.org/uniprotkb/
  • Rate Limits: IP-based, ~3 requests/second recommended
  • Formats: JSON, TSV, FASTA, GFF, XML
EBI Proteins API
  • Base URL: https://www.ebi.ac.uk/proteins/api/
  • Rate Limits: ~10 requests/second per IP
  • Authentication: None required for public data
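One simple way to stay under these limits on the client side is to enforce a minimum interval between requests. The Python sketch below is an illustrative throttle, not the server's actual implementation; the class name and the injectable `clock` and `sleep` parameters are assumptions that make the behavior easy to test.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Space out calls so they never exceed a requests-per-second budget."""

    def __init__(self, requests_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the next slot one interval after this request's slot.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Demo with a frozen clock and a recording sleep, so nothing actually blocks.
slept = []
uniprot_throttle = MinIntervalThrottle(3.0, clock=lambda: 0.0, sleep=slept.append)
delays = [uniprot_throttle.wait() for _ in range(3)]
print(delays)
```

A separate instance per API (for example 3 requests/second for UniProt and 10 for the Proteins API) keeps the two budgets independent.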
Architecture
Components
  • UniProtMCP: Main MCP agent implementing ToolContext interface
  • ToolRegistry: Manages and registers all available tools
  • JsonToSqlDO: Durable Object for data staging and SQL operations
  • ChunkingEngine: Handles large dataset chunking for efficient processing
  • DataInsertionEngine: Optimized bulk data insertion with conflict resolution
  • SchemaInferenceEngine: Automatic schema discovery and documentation
Data Flow
  1. Request: Tool receives search/fetch request
  2. API Call: Fetches data from UniProt/Proteins APIs
  3. Parsing: Normalizes JSON responses into structured entities
  4. Staging Decision: Determines if staging is beneficial
  5. Storage: Creates optimized SQLite tables in Durable Objects
  6. Querying: Enables complex SQL analysis of staged data
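The staging decision in step 4 can be pictured with a small Python sketch. The thresholds and function names here are hypothetical; this README does not state the server's real cut-offs or response shape.

```python
import json

# Hypothetical thresholds; the server's actual limits are not documented here.
MAX_INLINE_RECORDS = 50
MAX_INLINE_BYTES = 64 * 1024


def should_stage(records: list) -> bool:
    """Stage when the result is too large to return comfortably inline."""
    if len(records) > MAX_INLINE_RECORDS:
        return True
    payload_size = len(json.dumps(records).encode("utf-8"))
    return payload_size > MAX_INLINE_BYTES


def handle_response(records: list) -> dict:
    """Return small results directly; otherwise report that they were staged."""
    if should_stage(records):
        # A real implementation would write to SQLite and return an access id.
        return {"staged": True, "record_count": len(records)}
    return {"staged": False, "results": records}


small = handle_response([{"primaryAccession": "P04637"}])
large = handle_response([{"primaryAccession": f"ID{i}"} for i in range(100)])
print(small["staged"], large["staged"])
```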
Deployment
Cloudflare Workers
npm run deploy
Configuration

Ensure wrangler.jsonc includes:

  • Durable Object bindings for UniProtMCP and JsonToSqlDO
  • Node.js compatibility flags
  • Proper migration configuration
Environment Variables

No API keys required - both UniProt and EBI Proteins APIs are open access.

Connect to Claude Desktop

You can connect to your remote MCP server from Claude Desktop using the mcp-remote proxy.

Update your Claude Desktop configuration. For a deployed server, replace the local URL with your Worker's URL (for example your-uniprot-server.workers.dev/sse):

{
  "mcpServers": {
    "uniprot": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:8787/sse"
      ]
    }
  }
}
Rate Limiting Strategy

See RATE_LIMITING.md for detailed information about:

  • API-specific rate limits and best practices
  • Intelligent request throttling and retry logic
  • Monitoring and optimization strategies
  • Bulk operation handling
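As a generic illustration of retry logic for rate-limited calls, the Python sketch below retries with exponential backoff and jitter. It is not taken from RATE_LIMITING.md; the exception type, function name, and default delays are assumptions.

```python
import random
import time
from typing import Callable


class RateLimited(Exception):
    """Raised by the caller's request function on an HTTP 429 response."""


def call_with_backoff(request: Callable[[], object], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: Callable[[], float] = random.random) -> object:
    """Retry `request` on RateLimited, doubling the delay cap each attempt."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter picks a wait between 50% and 100% of the delay cap.
            sleep(delay * (0.5 + 0.5 * jitter()))


# Demo: a request that is rate limited twice, then succeeds.
outcomes = iter([RateLimited(), RateLimited(), "ok"])


def flaky_request() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waits = []
result = call_with_backoff(flaky_request, sleep=waits.append, jitter=lambda: 1.0)
print(result, waits)
```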
Examples
Research Workflow: Cancer-related Proteins
  1. Search for cancer-related proteins (uniprot_search):
{
  "query": "keyword:Cancer AND organism_id:9606",
  "size": 100
}
  2. Stage for analysis (uniprot_stream):
{
  "query": "accession:P04637 OR accession:P38398 OR accession:P51587",
  "format": "json"
}
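  3. Analyze with SQL:
SELECT
  p.accession,
  p.protein_name,
  -- DISTINCT avoids counting each feature once per joined keyword row
  COUNT(DISTINCT f.feature_id) AS feature_count,
  GROUP_CONCAT(DISTINCT k.keyword) AS keywords
FROM proteins p
LEFT JOIN features f ON p.accession = f.accession
LEFT JOIN keywords k ON p.accession = k.accession
-- Filter proteins by keyword without dropping their other keywords
WHERE EXISTS (
  SELECT 1 FROM keywords kc
  WHERE kc.accession = p.accession AND kc.keyword LIKE '%cancer%'
)
GROUP BY p.accession, p.protein_name
ORDER BY feature_count DESC;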
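Joining `features` and `keywords` to the same protein multiplies rows, so a plain `COUNT(f.feature_id)` can overstate the number of features. The Python sketch below shows the effect on a tiny in-memory database; the table columns are a hypothetical schema inferred from the query, and all rows are hand-written sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema matching the column names used in the query.
conn.executescript(
    """
    CREATE TABLE proteins (accession TEXT PRIMARY KEY, protein_name TEXT);
    CREATE TABLE features (feature_id INTEGER PRIMARY KEY, accession TEXT, type TEXT);
    CREATE TABLE keywords (accession TEXT, keyword TEXT);
    """
)
conn.executemany("INSERT INTO proteins VALUES (?, ?)", [
    ("P04637", "sample protein A"),
    ("P38398", "sample protein B"),
    ("P06493", "sample protein C"),
])
conn.executemany("INSERT INTO features (accession, type) VALUES (?, ?)", [
    ("P04637", "DOMAIN"), ("P04637", "BINDING"), ("P04637", "ACT_SITE"),
    ("P38398", "DOMAIN"),
    ("P06493", "DOMAIN"), ("P06493", "BINDING"),
])
conn.executemany("INSERT INTO keywords VALUES (?, ?)", [
    ("P04637", "sample cancer keyword"), ("P04637", "another keyword"),
    ("P38398", "sample cancer keyword"),
    ("P06493", "unrelated keyword"),
])

summary = conn.execute(
    """
    SELECT p.accession,
           COUNT(DISTINCT f.feature_id) AS feature_count,
           COUNT(f.feature_id) AS naive_count
    FROM proteins p
    LEFT JOIN features f ON p.accession = f.accession
    LEFT JOIN keywords k ON p.accession = k.accession
    WHERE EXISTS (
      SELECT 1 FROM keywords kc
      WHERE kc.accession = p.accession AND kc.keyword LIKE '%cancer%'
    )
    GROUP BY p.accession
    ORDER BY feature_count DESC
    """
).fetchall()
print(summary)
```

For the first sample protein, three features joined with two keywords yield six rows, which is why the naive count is twice the distinct count.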
Protein Family Analysis
  1. Search protein family (uniprot_search):
{
  "query": "family:\"protein kinase\" AND reviewed:true",
  "fields": "accession,protein_name,gene_names,ec"
}
  2. Get detailed features (proteins_api_features):
{
  "accession": "P06493",
  "categories": ["DOMAINS_AND_SITES"],
  "format": "json"
}
Contributing
  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly with both APIs
  5. Submit a pull request
License

MIT License - see LICENSE file for details.

Related Projects