MCPCorpus

MCPCorpus is a comprehensive dataset for analyzing the Model Context Protocol (MCP) ecosystem, containing ~14K MCP servers and 300 MCP clients with 20+ normalized metadata attributes.

MCPCorpus: A Large-Scale Evolvable Dataset for Model Context Protocol Ecosystem and Security Analysis

📊 Dataset Overview
  • Scale: ~14K MCP servers + 300 MCP clients
  • Attributes: 20+ normalized metadata attributes per artifact
  • Sources: GitHub, community hubs, package managers
  • Applications: Ecosystem analysis, security research, interoperability studies
šŸ“ Structure
```
MCPCorpus/
├── Crawler/                       # Data collection tools
│   ├── Servers/                   # Server data
│   ├── Clients/                   # Client data
│   ├── github_info_collector.py   # GitHub metadata collector
│   └── data_cleaner.py            # Data normalization
└── Website/                       # Web search interface
    ├── server.py                  # Local web server
    ├── index.html                 # Search interface
    ├── mcpso_servers_cleaned.json
    └── mcpso_clients_cleaned.json
```
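The tree lists `data_cleaner.py` for normalization, but its exact rules are not described here. As an illustration only, one common cleaning step for a corpus crawled from several hubs is collapsing entries that point to the same GitHub repository under different URL spellings. The sketch below is not the project's implementation, and the field name `github_url` is a hypothetical placeholder for whatever key the cleaned JSON actually uses.

```python
from urllib.parse import urlparse

# Hypothetical field name: adjust to whatever key the cleaned JSON actually uses.
URL_KEY = "github_url"


def normalize_github_url(url):
    """Return 'https://github.com/<owner>/<repo>' in lowercase, or None if url is not a GitHub repo link."""
    if not url:
        return None
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to populate netloc
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    # GitHub resolves owner/repo names case-insensitively, so lowercase them for comparison.
    return f"https://github.com/{owner.lower()}/{repo.lower()}"


def deduplicate(records, url_key=URL_KEY):
    """Keep the first record per normalized GitHub URL; records without a usable URL are kept unchanged."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_github_url(record.get(url_key))
        if key is None:
            unique.append(record)
        elif key not in seen:
            seen.add(key)
            unique.append({**record, url_key: key})
    return unique
```

Keeping the first occurrence makes the result depend on input order; a real cleaner might instead merge fields from all duplicates.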
🚀 Quick Start
Explore Dataset
```bash
cd Website
python server.py
# Then open http://localhost:8000 in a browser
```
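The local server exposes a search page over the cleaned JSON files. If you would rather query the data from a script, a minimal offline stand-in is a case-insensitive keyword filter. This sketch does not reproduce the logic of `index.html`; it assumes each JSON file holds a list of objects and searches every string value, so it does not depend on any particular field names.

```python
import json


def load_records(path):
    """Load a cleaned JSON file (assumed to be a list of objects)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def iter_strings(value):
    """Yield every string nested anywhere inside a JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_strings(v)


def search(records, query):
    """Return records in which every whitespace-separated query term appears in some string field."""
    terms = [t.lower() for t in query.split()]
    results = []
    for record in records:
        haystack = " ".join(iter_strings(record)).lower()
        if all(term in haystack for term in terms):
            results.append(record)
    return results
```

For example, `search(load_records("Website/mcpso_servers_cleaned.json"), "postgres")` returns the server entries mentioning "postgres" anywhere in their text fields. An empty query matches every record.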
Access Data Programmatically
```python
import json

import pandas as pd

# Load the cleaned datasets (paths are relative to the repository root)
with open('Crawler/Servers/mcpso_servers_cleaned.json', 'r', encoding='utf-8') as f:
    servers = json.load(f)
with open('Crawler/Clients/mcpso_clients_cleaned.json', 'r', encoding='utf-8') as f:
    clients = json.load(f)

# Convert to DataFrames for analysis
servers_df = pd.DataFrame(servers)
clients_df = pd.DataFrame(clients)
```
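Because the corpus advertises 20+ normalized attributes per artifact, a useful first check after loading is how completely each attribute is populated. The helper below is a sketch that works on the raw `servers` and `clients` lists loaded above, assuming each is a list of objects; it makes no assumption about which field names exist.

```python
from collections import Counter


def attribute_coverage(records):
    """Map each attribute name to the fraction of records where it is present and non-empty."""
    seen = Counter()
    filled = Counter()
    for record in records:
        for key, value in record.items():
            seen[key] += 1
            # Treat None, empty strings, and empty containers as missing.
            if value not in (None, "", [], {}):
                filled[key] += 1
    total = len(records)
    if total == 0:
        return {}
    return {key: filled[key] / total for key in sorted(seen)}
```

Calling `attribute_coverage(servers)` gives one ratio per attribute. With the DataFrames above, `servers_df.notna().mean()` produces a similar table, but it counts empty strings and empty lists as present.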
Update Dataset (Optional)
```bash
# Collect new data
cd Crawler/Servers && python Server_request.py
cd ../Clients && python Client_request.py

# Add GitHub metadata (replace YOUR_GITHUB_TOKEN with your own token)
cd .. && python github_info_collector.py --token YOUR_GITHUB_TOKEN
```
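The collector takes a GitHub token, presumably because GitHub's REST API allows far more requests per hour to authenticated callers than to anonymous ones. The sketch below shows the general shape of such a lookup against the public `GET /repos/{owner}/{repo}` endpoint using only the standard library. It is an assumption-labeled illustration, not the code in `github_info_collector.py`, and the helper names and the selection of fields are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

API_ROOT = "https://api.github.com/repos"


def owner_repo(url):
    """Split a GitHub URL such as https://github.com/owner/repo into (owner, repo)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a GitHub repository URL: {url}")
    repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
    return parts[0], repo


def fetch_repo(owner, repo, token):
    """Call GitHub's REST endpoint GET /repos/{owner}/{repo} with a token and return the parsed JSON."""
    request = urllib.request.Request(
        f"{API_ROOT}/{owner}/{repo}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(payload):
    """Pick a few standard fields from the /repos response (hypothetical selection)."""
    license_info = payload.get("license") or {}
    return {
        "stars": payload.get("stargazers_count"),
        "forks": payload.get("forks_count"),
        "open_issues": payload.get("open_issues_count"),
        "language": payload.get("language"),
        "license": license_info.get("spdx_id"),
        "pushed_at": payload.get("pushed_at"),
    }
```

A single lookup would be `summarize(fetch_repo(*owner_repo("https://github.com/<owner>/<repo>"), token))`. A corpus-scale collector would also need to handle rate-limit responses and repositories that have been deleted or renamed.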
📚 Citation

If you use MCPCorpus in your research, please cite it as:

```bibtex
@misc{lin2025largescaleevolvabledatasetmodel,
      title={A Large-Scale Evolvable Dataset for Model Context Protocol Ecosystem and Security Analysis},
      author={Zhiwei Lin and Bonan Ruan and Jiahao Liu and Weibo Zhao},
      year={2025},
      eprint={2506.23474},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2506.23474},
}
```
Author Information
Snakinya
National University of Singapore, Singapore