arxiv-mcp

Author: Tejas242

arxiv-mcp is a Model Context Protocol (MCP) server that retrieves paper data from arXiv and exposes it to AI assistants through a set of tools. Users can pull paper metadata for analysis and research, and retrieval is automated so it fits into an existing workflow.

arXiv MCP Server

Access the world's largest repository of academic papers through the Model Context Protocol

A streamlined Model Context Protocol server that connects AI assistants to arXiv's vast collection of academic papers. Search, analyze, and download research papers directly from your AI workflow.

🚀 Quick Start

Prerequisites

Python 3.12+
uv package manager

Installation

Option 1: Docker (Recommended)

# Pull and run the Docker image
docker run --rm -it ghcr.io/tejas242/arxiv-mcp:latest

# Or using docker-compose
git clone https://github.com/tejas242/arxiv-mcp.git
cd arxiv-mcp
docker compose up

Option 2: Local Development

# Clone and setup
git clone https://github.com/tejas242/arxiv-mcp.git
cd arxiv-mcp
uv sync

# Test the server
uv run main.py

🛠️ Available Functions

Function	Status	Description	Parameters
`search_papers`	✅ Working	Search arXiv papers with flexible query syntax	`query`, `max_results`, `sort_by`, `sort_order`
`get_paper_details`	✅ Working	Retrieve complete metadata for any arXiv paper	`arxiv_id`
`build_advanced_query`	✅ Working	Construct complex search queries with multiple fields	`title_keywords`, `author_name`, `category`, `abstract_keywords`
`get_arxiv_categories`	✅ Working	List all available arXiv subject categories	None
`search_by_author`	⚠️ Limited	Find papers by specific author (use search_papers instead)	`author_name`, `max_results`
`search_by_category`	⚠️ Limited	Browse papers by category (use search_papers instead)	`category`, `max_results`
`download_paper_pdf`	🔧 Needs Fix	Download paper PDFs (redirect handling issue)	`arxiv_id`, `save_path`

Function Details

✅ Fully Working Functions

search_papers - The primary search function

Supports full arXiv query syntax
Handles keywords, authors, categories, titles
Configurable sorting and result count (sort_by, sort_order, max_results)
Returns formatted results with abstracts and links
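For orientation, a search with these parameters corresponds roughly to a request against arXiv's public query API. The sketch below only builds the request URL. The function name `build_search_url` is hypothetical, and this README does not show how arxiv-mcp issues its requests internally, so treat it as an illustration rather than the server's code.

```python
from urllib.parse import urlencode

# Public arXiv query endpoint (documented in the arXiv API manual).
ARXIV_API = "http://export.arxiv.org/api/query"


def build_search_url(query, max_results=10, sort_by="relevance",
                     sort_order="descending", start=0):
    """Build an arXiv API URL for a query string (hypothetical helper).

    sort_by is one of: relevance, lastUpdatedDate, submittedDate.
    sort_order is one of: ascending, descending.
    """
    params = {
        "search_query": query,
        "start": start,              # offset of the first result
        "max_results": max_results,  # number of results to return
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    # urlencode percent-escapes quotes, colons, and spaces in the query.
    return f"{ARXIV_API}?{urlencode(params)}"
```

Calling `build_search_url('ti:"attention mechanism" AND cat:cs.LG', max_results=5)` yields a URL that can be opened in a browser to inspect the raw Atom feed.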

get_paper_details - Detailed paper information

Complete metadata extraction
Author information with affiliations
Category classifications and links
Publication dates and updates
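The arXiv API returns each paper as an Atom `<entry>` element, so metadata extraction of this kind amounts to reading a handful of child elements. The following is a minimal sketch under that assumption. `parse_entry` and `SAMPLE_ENTRY` are hypothetical names, the sample values are placeholders rather than a real paper, and the fields returned by `get_paper_details` may differ.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written placeholder entry, not real arXiv data.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://arxiv.org/abs/0000.00000v1</id>
  <published>2020-01-01T00:00:00Z</published>
  <updated>2020-01-02T00:00:00Z</updated>
  <title>Placeholder
    Title</title>
  <author><name>First Author</name></author>
  <author><name>Second Author</name></author>
  <category term="cs.LG"/>
</entry>"""


def parse_entry(xml_text):
    """Extract basic metadata from a single Atom entry (hypothetical helper)."""
    entry = ET.fromstring(xml_text)
    entry_id = entry.findtext("atom:id", default="", namespaces=ATOM)
    title = entry.findtext("atom:title", default="", namespaces=ATOM)
    return {
        # Keep everything after "/abs/" so old-style IDs with a slash survive.
        "arxiv_id": entry_id.split("/abs/")[-1],
        # Titles may be wrapped across lines; collapse the whitespace.
        "title": " ".join(title.split()),
        "published": entry.findtext("atom:published", default="", namespaces=ATOM),
        "updated": entry.findtext("atom:updated", default="", namespaces=ATOM),
        "authors": [a.findtext("atom:name", default="", namespaces=ATOM)
                    for a in entry.findall("atom:author", ATOM)],
        "categories": [c.get("term") for c in entry.findall("atom:category", ATOM)],
    }
```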

build_advanced_query - Query construction helper

Combines multiple search criteria
Supports title, author, category, and abstract searches
Returns properly formatted query strings
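To make the idea concrete, here is one way such a query string could be assembled from arXiv's field prefixes (`ti:`, `au:`, `cat:`, `abs:`). This is a sketch, not the project's implementation: `build_query_sketch` is a hypothetical stand-in, and the exact string returned by `build_advanced_query` may be formatted differently.

```python
def build_query_sketch(title_keywords=None, author_name=None,
                       category=None, abstract_keywords=None):
    """Combine search criteria into one arXiv query string (hypothetical helper)."""
    parts = []
    if title_keywords:
        parts.append(f'ti:"{title_keywords}"')     # match in the title
    if author_name:
        parts.append(f'au:"{author_name}"')        # match an author name
    if category:
        parts.append(f"cat:{category}")            # restrict to a subject category
    if abstract_keywords:
        parts.append(f'abs:"{abstract_keywords}"') # match in the abstract
    if not parts:
        raise ValueError("at least one search criterion is required")
    # All criteria must hold, so join them with AND.
    return " AND ".join(parts)
```

Joining with `AND` is a design choice for this sketch; a helper that also supported `OR` or `ANDNOT` would need an extra parameter.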

get_arxiv_categories - Category reference

Complete list of arXiv subject categories
Descriptions for each category
Helpful for constructing targeted searches

⚠️ Limited Functions (Workarounds Available)

search_by_author - Use search_papers('au:"Author Name"') instead
search_by_category - Use search_papers('cat:category_code') instead

🔧 Functions Needing Fixes

download_paper_pdf - HTTP redirect handling needs improvement

Currently fails due to HTTPS/HTTP redirect issues
PDFs can be accessed directly via the links provided in search results
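Until the fix lands, a direct link can also be derived from the paper ID, since arXiv serves PDFs under `https://arxiv.org/pdf/<id>`. The helpers below are a sketch of that workaround and are not part of arxiv-mcp: `pdf_url` and `download_pdf` are hypothetical names, and the download relies on `urllib.request.urlopen` following redirects by default.

```python
import shutil
import urllib.request


def pdf_url(arxiv_id):
    """Return the direct PDF link for an ID such as '1706.03762' or 'arXiv:1706.03762v5'."""
    cleaned = arxiv_id.strip()
    # Drop an optional "arXiv:" prefix, case-insensitively.
    if cleaned.lower().startswith("arxiv:"):
        cleaned = cleaned[len("arxiv:"):]
    return f"https://arxiv.org/pdf/{cleaned}"


def download_pdf(arxiv_id, save_path):
    """Stream the PDF to save_path; urlopen follows HTTP redirects automatically."""
    with urllib.request.urlopen(pdf_url(arxiv_id), timeout=30) as response, \
            open(save_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return save_path
```

For example, `download_pdf("1706.03762", "attention.pdf")` would save the paper used in the Research Workflow example, assuming network access.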

⚙️ Configuration

Claude Desktop Setup

Configuration Instructions

For Local Installation:

Add to your Claude Desktop config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "arxiv-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/arxiv-mcp",
        "run",
        "main.py"
      ]
    }
  }
}

For Docker Installation:

{
  "mcpServers": {
    "arxiv-mcp": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "ghcr.io/tejas242/arxiv-mcp:latest"
      ]
    }
  }
}
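If the config file already lists other servers, the entry has to be merged in rather than pasted over the existing content. A small script can do that safely. This is a minimal sketch: `add_mcp_server` is a hypothetical helper, not something shipped with arxiv-mcp, and it assumes the file is either absent, empty, or valid JSON.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, command, args):
    """Add or replace one entry under "mcpServers", keeping other entries intact."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Create the "mcpServers" section if missing, then set this server's entry.
    config.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call for the Docker setup shown above (path is the macOS location):
# add_mcp_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     "arxiv-mcp",
#     "docker",
#     ["run", "--rm", "-i", "ghcr.io/tejas242/arxiv-mcp:latest"],
# )
```

Restart Claude Desktop after editing the file so the new server entry is picked up.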

VS Code MCP Extension

VS Code Configuration

{
  "mcp": {
    "servers": {
      "arxiv-mcp": {
        "command": "uv",
        "args": ["--directory", "/absolute/path/to/arxiv-mcp", "run", "main.py"]
      }
    }
  }
}

💡 Usage Examples

Core Search Operations

# Search for papers about transformers
search_papers("transformer architecture")

# Advanced query with specific fields
search_papers('ti:"attention mechanism" AND cat:cs.LG')

# Author-specific search (recommended approach)
search_papers('au:"Geoffrey Hinton"')

# Category browsing (recommended approach)
search_papers('cat:cs.AI')

Research Workflow

# 1. Find the famous "Attention" paper
search_papers('ti:"Attention Is All You Need"')
get_paper_details("1706.03762")

# 2. Explore related work
search_papers("transformer neural networks")

# 3. Build complex queries
query = build_advanced_query(
    title_keywords="few-shot learning",
    author_name="Tom Brown",
    category="cs.LG"
)
search_papers(query)

📊 arXiv Categories Reference

Popular Categories

Code	Description	Example Topics
`cs.AI`	Artificial Intelligence	Machine learning, neural networks, AI theory
`cs.LG`	Machine Learning	Deep learning, reinforcement learning, statistical learning
`cs.CV`	Computer Vision	Image processing, object detection, visual recognition
`cs.CL`	Computation and Language	NLP, language models, text processing
`cs.CR`	Cryptography and Security	Security protocols, encryption, privacy
`stat.ML`	Machine Learning (Statistics)	Statistical learning theory, Bayesian methods
`physics.gen-ph`	General Physics	Theoretical physics, quantum mechanics
`math.NA`	Numerical Analysis	Computational mathematics, algorithms
`q-bio.NC`	Neurons and Cognition (Quantitative Biology)	Neuroscience, computational biology

Use get_arxiv_categories() for the complete list of available categories.

🧪 Testing Results

Based on comprehensive testing of all functions:

✅ Reliable Functions

Paper search with keywords, authors, categories: 100% success rate
Paper detail retrieval: Complete metadata extraction working
Query construction: All syntax combinations supported
Category listing: All arXiv categories accessible

⚠️ Alternative Approaches Recommended

Author search: Use search_papers('au:"Author Name"') instead of search_by_author()
Category browsing: Use search_papers('cat:category') instead of search_by_category()

🔧 Known Issues

PDF downloads: Redirect handling needs improvement (PDFs accessible via direct links)

🔧 Development

Project Structure

arxiv-mcp/
├── src/arxiv_mcp/          # Main package
│   ├── server.py           # MCP server implementation
│   ├── arxiv_client.py     # arXiv API wrapper
│   ├── models.py           # Pydantic data models
│   └── utils.py            # Helper functions
├── tests/                  # Test suite
├── main.py                 # Entry point
└── pyproject.toml          # Project config

Running Tests

uv run pytest tests/ -v

Debug Mode

# Enable detailed logging
PYTHONPATH=src uv run python -c "
import logging
logging.basicConfig(level=logging.DEBUG)
from arxiv_mcp.server import main
main()
"