LiveMCPBench
LiveMCPBench is a benchmark for evaluating agents on real-world tasks within a large-scale MCP (Model Context Protocol) toolset. It includes the tool collection (LiveMCPTool), an MCP Copilot baseline agent, and an automated evaluator (LiveMCPEval).
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
Benchmarking agents on real-world tasks within a large-scale MCP toolset.
Website | Paper | Dataset | Docker | Leaderboard | Citation
News
- [8/18/2025] We release Docker images and add evaluation results to the leaderboard for three new models: GLM 4.5, GPT-5-Mini, and Kimi-K2.
- [8/3/2025] We release LiveMCPBench.
Getting Started
Prerequisites
We recommend using our Docker image, but if you want to run the code locally, you will need to install the following tools (see the sketch after this list for one way to install them):
- npm
- uv
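If you are skipping Docker, a minimal sketch of a local install is shown below. The apt-get line assumes a Debian/Ubuntu host (adapt it to your package manager); the uv line uses its official standalone installer.
# Install Node.js and npm (Debian/Ubuntu assumption; use your platform's package manager otherwise)
sudo apt-get update && sudo apt-get install -y nodejs npm
# Install uv via the official standalone installer
curl -LsSf https://astral.sh/uv/install.sh | sh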
Installation
Pull the docker image
docker pull hysdhlx/livemcpbench:latest
Clone the repo and run the Docker image
git clone https://github.com/icip-cas/LiveMCPBench.git
cd LiveMCPBench
docker run -itd \
    -v "$(pwd):/outside" \
    --gpus all \
    --ipc=host \
    --net=host \
    --name LiveMCPBench_container \
    hysdhlx/livemcpbench:latest \
    bash
Prepare the .env file
cp .env_template .env
You can modify the .env file to set your own environment variables.
# MCP Copilot Agent Configuration
BASE_URL=
OPENAI_API_KEY=
MODEL=

# Tool Retrieval Configuration
EMBEDDING_MODEL=
EMBEDDING_BASE_URL=
EMBEDDING_API_KEY=
EMBEDDING_DIMENSIONS=1024
TOP_SERVERS=5
TOP_TOOLS=3

# Abstract API Configuration (optional)
ABSTRACT_MODEL=
ABSTRACT_API_KEY=
ABSTRACT_BASE_URL=

# Proxy Configuration (optional)
http_proxy=
https_proxy=
no_proxy=127.0.0.1,localhost
HTTP_PROXY=
HTTPS_PROXY=
NO_PROXY=127.0.0.1,localhost

# lark report (optional)
LARK_WEBHOOK_URL=
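As a rough illustration, a filled-in .env pointing at an OpenAI-compatible endpoint might look like the lines below; the URL, key, and model names are placeholders rather than values shipped with the project.
# Placeholder values for an OpenAI-compatible endpoint; replace with your own
BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY=sk-your-key-here
MODEL=gpt-4o
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_API_KEY=sk-your-key-here
EMBEDDING_DIMENSIONS=1024
TOP_SERVERS=5
TOP_TOOLS=3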
Enter the container & Reset the environment
As we have mounted the code repo to /outside, you can access the code repo in the container at /outside/.
docker exec -it LiveMCPBench_container bash
Because the agent may change the environment, we recommend resetting it before running the agent. To do so, run the following command:
cd /LiveMCPBench/
bash scripts/env_reset.sh
This will copy the repo code in /outside to /LiveMCPBench and link annotated_data to /root/.
Check the MCP tools
bash ./tools/scripts/tool_check.sh
After running this command, you can check ./tools/test/tools.json to see the tools. You can run this script multiple times if you find that some tools are not working.
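If you prefer to skim the file from the shell, one option (assuming python3 is on your PATH) is to pretty-print the first part of it:
# Pretty-print tools.json and show only the first 40 lines
python3 -m json.tool ./tools/test/tools.json | head -n 40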
Index the servers
The MCP Copilot Agent requires the servers to be indexed before running. Run the following command to warm up the agent:
uv run -m baseline.mcp_copilot.arg_generation
Quick Start
MCP Copilot Agent
Example Run
bash ./baseline/scripts/run_example.sh
This will run the agent with a simple example and save the results in ./baseline/output/.
Full Run
By default, we use the /root directory to store the data that the agent will access. If you want to run locally, you need to make sure the files are in the right path.
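For a local (non-Docker) run, one way to mirror the layout that scripts/env_reset.sh sets up inside the container is to link the annotated data under /root yourself. This is only a sketch: the exact target path /root/annotated_data is an assumption, so check scripts/env_reset.sh for the authoritative paths.
# Run from the repo root; needs permission to write under /root
sudo ln -sfn "$(pwd)/annotated_data" /root/annotated_data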
Run the MCP Copilot Agent
Be sure you have set the environment variables in the .env file.
bash ./baseline/scripts/run_baselines.sh
Check the results
After running the agent, you can check the trajectories in ./baseline/output.
Evaluation using the LiveMCPEval
Modify MODEL in .env to change the evaluation model.
Run the evaluation script
bash ./evaluator/scripts/run_baseline.sh
Check the results
After running the evaluation, you can check the results in ./evaluator/output.
Calculate the success rate
uv run ./evaluator/stat_success_rate.py --result_path /path/to/evaluation/
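For example, assuming the evaluation results were written to the default ./evaluator/output directory from the previous step:
uv run ./evaluator/stat_success_rate.py --result_path ./evaluator/output/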
Project Structure
LiveMCPBench/
├── annotated_data/      # Tasks and task files
├── baseline/            # MCP Copilot Agent
│   ├── scripts/         # Scripts for running the agent
│   ├── output/          # Output for the agent
│   └── mcp_copilot/     # Source code for the agent
├── evaluator/           # LiveMCPEval
│   ├── scripts/         # Scripts for evaluation
│   └── output/          # Output for evaluation
├── tools/               # LiveMCPTool
│   ├── LiveMCPTool/     # Tool data
│   └── scripts/         # Scripts for the tools
├── scripts/             # Path prepare scripts
├── utils/               # Utility functions
└── .env_template        # Template for environment
Citation
If you find this project helpful, please use the following to cite it:
@misc{mo2025livemcpbenchagentsnavigateocean,
title={LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?},
author={Guozhao Mo and Wenliang Zhong and Jiawei Chen and Xuanang Chen and Yaojie Lu and Hongyu Lin and Ben He and Xianpei Han and Le Sun},
year={2025},
eprint={2508.01780},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2508.01780},
}