NCBI-Database-MCP
NCBI-Database-MCP is a Python-based MCP (Model Context Protocol) server for accessing NCBI databases and retrieving bioinformatics data. It provides an easy-to-use interface that simplifies searching for and retrieving data for analysis and research.
NCBI Database MCP
MCP server for NCBI bioinformatics tools and disease-focused gene expression research
Enable AI assistants to discover gene expression datasets by disease/condition and access comprehensive NCBI databases through natural language. Perfect for researchers studying disease mechanisms and therapeutic targets.
Features
- Disease-Focused GEO Search - Discover gene expression datasets by disease/condition and organism
- Comprehensive Study Metadata - Get detailed methodology, platform, and sample information
- Gene-to-Genomic Conversion - Convert gene names to genomic DNA sequences
- Multi-Species Support - Human, mouse, and rat datasets
- Research Methodology Details - RNA-Seq, microarray, ChIP-Seq, and other techniques
- Direct Database Links - Easy access to full datasets and original studies
Quick Start
Installation
# Clone repository
git clone https://github.com/hpend2373/NCBI-Database-MCP.git
cd NCBI-Database-MCP
# Install dependencies
pip install -r requirements.txt
Basic Usage
RECOMMENDED: Use the FastMCP Server for Best Performance
# Start the FastMCP server (RECOMMENDED)
./run_fastmcp_gene_server.sh
# Alternative: Standard MCP server (slower startup)
python src/gene_to_genomic_server.py
Why FastMCP?
- Faster startup - Initializes faster than the standard MCP server
- Easier debugging - Better error messages and logging
- Built-in monitoring - Performance metrics included
- Optimized for research - Designed specifically for bioinformatics workflows
Configuration
Add to your MCP client config:
{
  "mcpServers": {
    "ncbi-database": {
      "command": "python",
      "args": ["src/gene_to_genomic_server.py"],
      "cwd": "/path/to/NCBI-Database-MCP",
      "env": {
        "NCBI_API_KEY": "your_api_key_here"
      }
    }
  }
}
Alternatively, set the API key as a global environment variable:
export NCBI_API_KEY="your_api_key_here"
Then use a simpler config without the "env" block:
{
  "mcpServers": {
    "ncbi-database": {
      "command": "python",
      "args": ["src/gene_to_genomic_server.py"],
      "cwd": "/path/to/NCBI-Database-MCP"
    }
  }
}
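If you maintain this config on several machines, you can generate it instead of editing it by hand. The following is a minimal sketch, assuming Python 3.8+; build_mcp_config is a hypothetical helper that is not part of this repository. It adds the "env" block only when a key is supplied, matching the two config variants above.

```python
import json
import os
from typing import Any, Dict, Optional


def build_mcp_config(repo_path: str, api_key: Optional[str] = None) -> Dict[str, Any]:
    """Return an mcpServers config for the standard server.

    The "env" block is added only when an API key is supplied, which
    mirrors the two config variants shown in this README.
    """
    server: Dict[str, Any] = {
        "command": "python",
        "args": ["src/gene_to_genomic_server.py"],
        "cwd": repo_path,
    }
    if api_key:
        server["env"] = {"NCBI_API_KEY": api_key}
    return {"mcpServers": {"ncbi-database": server}}


if __name__ == "__main__":
    # Take the key from the environment so it never has to be written into source code.
    config = build_mcp_config("/path/to/NCBI-Database-MCP", os.environ.get("NCBI_API_KEY"))
    print(json.dumps(config, indent=2))
```

Run it with NCBI_API_KEY exported (or unset, for the simpler variant) and paste the printed JSON into your MCP client's config file.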
Usage Examples
Disease Expression Research (Primary Use Case)
User: "Find gene expression datasets for Alzheimer's disease in humans"
AI: [calls search_geo_datasets] →
Returns up to 10 datasets (the default max_results) with:
- Study methodology (RNA-Seq, Microarray)
- Sample sizes and experimental design
- Platform information (Illumina, Affymetrix)
- Research summaries and direct GEO links
User: "Show me cancer expression studies in mice using RNA sequencing"
AI: [calls search_geo_datasets] →
Filtered results showing:
- RNA-Seq datasets only
- Mouse-specific cancer studies
- Detailed experimental protocols
Gene-to-Genomic Analysis
User: "Get the genomic sequence for BRCA1"
AI: [calls gene_to_genomic_sequence] → Returns genomic DNA sequence in FASTA format
Gene Information & Location
User: "Find information about TP53 gene"
AI: [calls search_gene_info] → Returns gene location, function, and coordinates
Coordinate-Based Sequence Retrieval
User: "Get sequence from chr17:43044295-43125483"
AI: [calls get_genomic_sequence] → Returns DNA sequence for specified coordinates
Available Tools
search_geo_datasets (Primary Tool)
Discover gene expression datasets by disease/condition and organism
Parameters:
- disease (required) - Disease or condition name. Examples: "cancer", "diabetes", "Alzheimer", "heart disease", "depression"
- organism - Target organism (default: "Homo sapiens"). Options: "Homo sapiens", "Mus musculus", "Rattus norvegicus"
- study_type - Expression study methodology (optional; default: "Expression profiling by high throughput sequencing", i.e. RNA-Seq). Options: "Expression profiling by array", "Expression profiling by high throughput sequencing". The default restricts results to sequencing-based studies.
- max_results - Maximum number of results to return (1-50, default: 10)
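search_geo_datasets presumably queries GEO through NCBI's E-utilities. The sketch below shows how the four parameters could map onto an esearch request against the GEO DataSets database (db=gds). It is an assumption, not the server's actual code: build_geo_search_url is hypothetical, and the [Organism] and [DataSet Type] field tags come from NCBI's GEO search syntax, so check them against the GEO help pages before relying on them. It only builds the URL and makes no network call.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Allowed values, taken from the search_geo_datasets parameter list in this README.
ORGANISMS = {"Homo sapiens", "Mus musculus", "Rattus norvegicus"}
STUDY_TYPES = {
    "Expression profiling by array",
    "Expression profiling by high throughput sequencing",
}


def build_geo_search_url(
    disease: str,
    organism: str = "Homo sapiens",
    study_type: Optional[str] = "Expression profiling by high throughput sequencing",
    max_results: int = 10,
    api_key: Optional[str] = None,
) -> str:
    """Return an esearch URL for the GEO DataSets (gds) database (hypothetical helper)."""
    if not disease.strip():
        raise ValueError("disease is required")
    if organism not in ORGANISMS:
        raise ValueError(f"unsupported organism: {organism!r}")
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")

    # Leave the disease term untagged so Entrez can apply its own term mapping,
    # then restrict by organism and (optionally) by data set type.
    terms = [f"({disease.strip()})", f'"{organism}"[Organism]']
    if study_type:
        if study_type not in STUDY_TYPES:
            raise ValueError(f"unsupported study_type: {study_type!r}")
        terms.append(f'"{study_type}"[DataSet Type]')

    params = {"db": "gds", "term": " AND ".join(terms), "retmax": max_results, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Validating parameters before building the request keeps obviously bad input (an unknown organism, max_results=0) from costing a round trip against the rate limit.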
Detailed Output:
- Dataset Information: GDS accession numbers and titles
- Study Methodology:
  - RNA-Seq (High-throughput transcriptome sequencing) - DEFAULT
  - Microarray (Hybridization-based gene expression)
  - ChIP-Seq (Chromatin immunoprecipitation sequencing)
  - SAGE (Serial analysis of gene expression)
- Data Type Classification:
  - Single-Cell RNA-Seq - Individual cell-level gene expression
  - Bulk RNA-Seq - Tissue/population-level gene expression
  - Spatial Transcriptomics - Location-aware gene expression
- Platform Details: Illumina, Affymetrix, Agilent technologies
- Experimental Design: Sample counts, tissue types, treatment conditions
- Research Context: Study summaries and disease relevance
- Direct Access: Links to full datasets on NCBI GEO
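The README does not say how datasets are sorted into single-cell, bulk, and spatial categories. One plausible approach is a keyword heuristic over each dataset's title and summary; the sketch below assumes that approach, and both classify_data_type and its keyword rules are hypothetical. Spatial keywords are checked first because spatial studies often also mention single-cell references, and anything unmatched falls back to bulk, which assumes the dataset is already known to be RNA-Seq.

```python
import re

# Hypothetical keyword rules, checked in order; the server's real classification logic is not documented.
RULES = [
    ("Spatial Transcriptomics", re.compile(r"\bspatial(ly)?\b|visium|merfish|slide-seq", re.IGNORECASE)),
    ("Single-Cell RNA-Seq", re.compile(r"single[- ]cell|single[- ]nucle(us|i)|scRNA-seq|snRNA-seq", re.IGNORECASE)),
]


def classify_data_type(title: str, summary: str = "") -> str:
    """Label an RNA-Seq dataset as spatial, single-cell, or bulk from its free text."""
    text = f"{title} {summary}"
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    # No spatial or single-cell keyword found: assume a bulk (tissue-level) experiment.
    return "Bulk RNA-Seq"
```

A keyword heuristic is cheap and transparent, but it will misclassify studies whose text is terse; treat its labels as hints rather than ground truth.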
gene_to_genomic_sequence
Convert a gene name to a genomic DNA sequence
Parameters:
- gene_name (required) - Gene symbol (e.g., "BRCA1", "TP53")
- organism - Target organism (default: "human")
- sequence_type - "genomic", "cds", "mrna", or "protein"
- output_format - "fasta", "genbank", or "json"
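With output_format "fasta", sequences come back in FASTA format: a header line starting with ">", followed by the sequence split into fixed-width lines. A minimal sketch of that formatting, assuming a plain nucleotide or protein string as input; to_fasta is a hypothetical helper, and the default width of 70 is an assumption (60 and 70 are both common).

```python
def to_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Format one sequence as a FASTA record: '>' header line plus fixed-width sequence lines."""
    if width < 1:
        raise ValueError("width must be positive")
    # Remove any whitespace or line breaks and normalise to upper case.
    seq = "".join(sequence.split()).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{header}", *lines]) + "\n"
```

Slicing is used instead of textwrap because textwrap may break at hyphens, which appear as gap characters in aligned sequences.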
search_gene_info
Search for gene information and genomic location
Parameters:
- gene_name (required) - Gene symbol or name
- organism - Target organism (default: "human")
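A lookup like search_gene_info typically starts by resolving the symbol to an Entrez Gene ID. The sketch below builds such an esearch request, assuming the server talks to E-utilities directly; build_gene_search_url is hypothetical, and the gene ID returned would still need a follow-up request (for example esummary) to obtain location and function.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(gene_name: str, organism: str = "human", api_key: Optional[str] = None) -> str:
    """Return an esearch URL that looks up a gene symbol in the Entrez Gene database."""
    if not gene_name.strip():
        raise ValueError("gene_name is required")
    # [Gene Name] and [Organism] are Entrez Gene field tags; quoting the organism
    # keeps multi-word names such as "Homo sapiens" together as one phrase.
    term = f'{gene_name.strip()}[Gene Name] AND "{organism}"[Organism]'
    params = {"db": "gene", "term": term, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"
```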
get_genomic_sequence
Get genomic sequence from chromosome coordinates
Parameters:
- chromosome (required) - Chromosome accession (e.g., "NC_000017.11")
- start (required) - Start position
- end (required) - End position
- output_format - "fasta" or "json"
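Coordinate-based retrieval maps naturally onto E-utilities efetch, whose seq_start and seq_stop parameters select a 1-based, inclusive range of a nucleotide record. The sketch assumes get_genomic_sequence works this way; parse_region, build_region_fetch_url, and the chromosome-to-accession mapping passed in are hypothetical. The example region is the chr17 range from the usage examples, which lies on NC_000017.11 (GRCh38 chromosome 17).

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_region(region: str, accessions: Dict[str, str]) -> Tuple[str, int, int]:
    """Split 'chr17:43044295-43125483' into (accession, start, end); commas in numbers are ignored."""
    name, _, span = region.strip().partition(":")
    start_text, _, end_text = span.partition("-")
    if not (name and start_text and end_text):
        raise ValueError(f"expected chrom:start-end, got {region!r}")
    if name not in accessions:
        raise ValueError(f"no accession known for {name!r}")
    return accessions[name], int(start_text.replace(",", "")), int(end_text.replace(",", ""))


def build_region_fetch_url(accession: str, start: int, end: int, api_key: Optional[str] = None) -> str:
    """Return an efetch URL for a 1-based, inclusive range of a nucleotide record, as FASTA."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    params = {
        "db": "nuccore",
        "id": accession,
        "seq_start": start,
        "seq_stop": end,
        "rettype": "fasta",
        "retmode": "text",
    }
    if api_key:
        params["api_key"] = api_key
    return f"{EFETCH_URL}?{urlencode(params)}"
```

Passing the mapping in explicitly avoids hard-coding one assembly; GRCh37 uses different accession versions for the same chromosomes.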
Configuration
Environment Variables
You can configure the server using environment variables:
# Copy example file and edit
cp .env.example .env
# Or set directly
export NCBI_API_KEY="your_api_key_here"
# Get your free API key from: https://www.ncbi.nlm.nih.gov/account/
# Without API key: 3 requests/second
# With API key: 10 requests/second
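If you call E-utilities yourself alongside the server, or want to see what staying under these limits involves, a simple approach is to enforce a minimum interval between requests: 1/3 s without a key and 1/10 s with one. This is a minimal single-threaded sketch; RateLimiter is a hypothetical class, and its clock and sleep functions are injectable so the spacing can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Space out calls so they never exceed NCBI's per-second request limit."""

    def __init__(self, api_key: Optional[str] = None,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # 10 requests/second with an API key, 3 without (the limits listed above).
        self.interval = 1.0 / (10 if api_key else 3)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next request may be sent, then reserve that slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Call limiter.wait() before each request. Fixed spacing is slightly more conservative than a true per-second window, which is usually acceptable for interactive use.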
Project Structure
NCBI-Database-MCP/
├── README.md                      # Documentation
├── requirements.txt               # Python dependencies
├── pyproject.toml                 # Project configuration
├── .env.example                   # Environment variables template
├── run_fastmcp_gene_server.sh     # Launch script
└── src/
    ├── gene_to_genomic_server.py  # Standard MCP server
    └── fastmcp_gene_server.py     # FastMCP server (recommended)
Performance Tips
GEO Dataset Search Optimization
- Use specific disease terms: prefer "lung cancer" over "cancer" and "type 2 diabetes" over "diabetes"
- Combine with study types: filter by study_type to restrict results to one methodology
- Start with small result sets: use max_results=5-10 for initial exploration
- Organism specificity: use exact scientific names ("Homo sapiens", not "human")
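Because the gene tools default to "human" while GEO search expects scientific names, it can help to normalize organism names before searching. A minimal sketch covering the three supported species; the SCIENTIFIC_NAMES mapping and normalize_organism are hypothetical, not part of the server.

```python
# Hypothetical mapping from common names to the scientific names GEO search expects.
SCIENTIFIC_NAMES = {
    "human": "Homo sapiens",
    "mouse": "Mus musculus",
    "rat": "Rattus norvegicus",
}


def normalize_organism(name: str) -> str:
    """Map a common or scientific name (any case) to the canonical scientific name."""
    key = name.strip().lower()
    if key in SCIENTIFIC_NAMES:
        return SCIENTIFIC_NAMES[key]
    # Accept scientific names case-insensitively and return the canonical spelling.
    for scientific in SCIENTIFIC_NAMES.values():
        if key == scientific.lower():
            return scientific
    raise ValueError(f"unsupported organism {name!r}; expected one of {sorted(SCIENTIFIC_NAMES.values())}")
```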
Troubleshooting
Common Issues
Gene not found
# Check gene name spelling
# Try alternative gene symbols
# Verify organism specification
No GEO datasets found
# Try broader disease terms (e.g., "cancer" instead of "lung adenocarcinoma")
# Check organism name (use "Homo sapiens" not "human")
# Try without study_type filter
# Verify disease spelling and terminology
API rate limiting
# Get free NCBI API key: https://www.ncbi.nlm.nih.gov/account/
# Set NCBI_API_KEY environment variable
# Without key: 3 requests/second limit
# With key: 10 requests/second limit
Network timeouts
# Check internet connection
# Increase timeout values
# Retry failed requests
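For network timeouts, the two usual remedies are an explicit timeout on each request and a bounded number of retries with exponential backoff. The sketch below uses only the standard library; with_retries and fetch_text are hypothetical helpers, and retrying only on connection errors, HTTP 429, and 5xx responses is an assumption (client errors such as 404 are re-raised immediately).

```python
import time
import urllib.error
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient failures with exponential backoff (base_delay, 2x, 4x, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except urllib.error.HTTPError as exc:
            # Retry only rate limiting (429) and server errors (5xx).
            retryable = exc.code == 429 or exc.code >= 500
            if not retryable or attempt == attempts - 1:
                raise
        except OSError:
            # URLError, socket timeouts, and connection resets are all OSError subclasses.
            if attempt == attempts - 1:
                raise
        sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL with an explicit timeout in seconds and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, with_retries(lambda: fetch_text(url, timeout=60)) allows up to three attempts with a 60-second timeout each, waiting 1 s and then 2 s between them.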
Resources
Support
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Issues
- Documentation: README.md
Happy genomics research!