Scorecard MCP Server
Connect Claude to Scorecard's AI evaluation platform through natural language conversations.
Test, measure, and improve your AI systems without switching between tools.
Quick Start
Add to your MCP client configuration:
{
  "mcpServers": {
    "scorecard": {
      "command": "npx",
      "args": ["mcp-remote", "https://app.scorecard.io/api/mcp"]
    }
  }
}
You'll authenticate with your Scorecard account on first use via Clerk OAuth.
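If you manage client configuration from a script, the entry above can be merged into an existing configuration object. The following is a minimal TypeScript sketch under stated assumptions: `McpServerEntry`, `McpClientConfig`, and `addScorecardServer` are hypothetical names introduced for illustration, and only the `command`, `args`, and URL values come from the snippet above.

```typescript
// Hypothetical types describing the shape of the configuration shown above.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpClientConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Values copied from the Quick Start snippet.
const SCORECARD_ENTRY: McpServerEntry = {
  command: "npx",
  args: ["mcp-remote", "https://app.scorecard.io/api/mcp"],
};

// Returns a new config with the "scorecard" server added,
// leaving any servers that are already configured untouched.
function addScorecardServer(config: Partial<McpClientConfig>): McpClientConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      scorecard: { ...SCORECARD_ENTRY, args: [...SCORECARD_ENTRY.args] },
    },
  };
}

// Example: merge into a config that already lists another (made-up) server.
const merged = addScorecardServer({
  mcpServers: { other: { command: "node", args: ["server.js"] } },
});
console.log(JSON.stringify(merged, null, 2));
```

The function copies rather than mutates its input, so the original configuration object can still be compared or restored if the merge needs to be undone.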
What You Can Do
Ask Claude to help with AI evaluation tasks:
- "Create a new project for evaluating my chatbot"
- "Set up test cases for customer service scenarios"
- "Configure accuracy and helpfulness metrics"
- "Run an evaluation against my latest model"
- "Show me the performance results"
Available Operations
- Projects: Create and manage evaluation projects
- Test Sets: Build comprehensive test suites
- Test Cases: Add and organize individual test scenarios
- Metrics: Configure custom evaluation criteria
- Systems: Manage AI system configurations and versions
- Runs: Execute evaluations and analyze results
Technical Details
Built using the Model Context Protocol specification (2025-06-18 revision, see https://modelcontextprotocol.io/specification/2025-06-18/changelog) on
Scorecard's Next.js frontend:
- Clerk OAuth for secure authentication
- JWT tokens passed to Scorecard's backend
- Auto-generated MCP tools from OpenAPI spec
- Deployed on Vercel's edge infrastructure
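To illustrate what generating MCP tools from an OpenAPI spec can look like, here is a rough TypeScript sketch. It is not Scorecard's actual generator: the `toolsFromOpenApi` function, the trimmed-down OpenAPI types, and the `/projects` example operation are all assumptions made for illustration. The only fixed point is that an MCP tool definition carries a `name`, a `description`, and a JSON Schema `inputSchema`.

```typescript
// Minimal subset of an OpenAPI document, just enough for this sketch.
interface OpenApiParameter {
  name: string;
  required?: boolean;
  description?: string;
  schema: { type: string };
}

interface OpenApiOperation {
  operationId: string;
  summary?: string;
  parameters?: OpenApiParameter[];
}

interface OpenApiSpec {
  paths: Record<string, Record<string, OpenApiOperation>>;
}

// An MCP tool definition: name, description, and a JSON Schema for its inputs.
interface McpTool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required: string[];
  };
}

// Turns each OpenAPI operation into one tool definition.
function toolsFromOpenApi(spec: OpenApiSpec): McpTool[] {
  const tools: McpTool[] = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    for (const [method, op] of Object.entries(methods)) {
      const params = op.parameters ?? [];
      const properties: McpTool["inputSchema"]["properties"] = {};
      for (const p of params) {
        properties[p.name] = p.description
          ? { type: p.schema.type, description: p.description }
          : { type: p.schema.type };
      }
      tools.push({
        name: op.operationId,
        // Fall back to "METHOD /path" when the operation has no summary.
        description: op.summary ?? `${method.toUpperCase()} ${path}`,
        inputSchema: {
          type: "object",
          properties,
          required: params.filter((p) => p.required === true).map((p) => p.name),
        },
      });
    }
  }
  return tools;
}

// Hypothetical spec fragment; the path and operationId are invented for this example.
const exampleSpec: OpenApiSpec = {
  paths: {
    "/projects": {
      post: {
        operationId: "createProject",
        summary: "Create an evaluation project",
        parameters: [{ name: "name", required: true, schema: { type: "string" } }],
      },
    },
  },
};

const exampleTools = toolsFromOpenApi(exampleSpec);
console.log(exampleTools.map((t) => t.name));
```

A real generator would also need to handle request bodies, `$ref` resolution, and nested schemas, which this sketch leaves out.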
Security
- OAuth 2.0 authentication through Clerk
- Access limited to your authenticated Scorecard account
- Tokens passed through but never stored
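The pass-through behaviour described above can be pictured with a small TypeScript sketch. This is an assumption about the general pattern, not Scorecard's code: `extractBearerToken` and `buildBackendHeaders` are hypothetical helpers that read the bearer token from an incoming `Authorization` header and copy it onto the outgoing backend request, without writing it to any cache, log, or database.

```typescript
// Extracts the token from an "Authorization: Bearer <token>" header value.
// Returns null when the header is missing or uses a different scheme.
function extractBearerToken(authorization: string | undefined): string | null {
  if (!authorization) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  return match?.[1] ?? null;
}

// Builds the headers for the backend call. The token lives only in the
// returned object for the duration of the request; nothing is persisted.
function buildBackendHeaders(authorization: string | undefined): Record<string, string> {
  const token = extractBearerToken(authorization);
  if (token === null) {
    throw new Error("Missing or malformed Authorization header");
  }
  return {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };
}

// Example with a placeholder token value.
const backendHeaders = buildBackendHeaders("Bearer example.jwt.token");
console.log(Object.keys(backendHeaders));
```

Rejecting requests with no valid bearer token before calling the backend keeps unauthenticated traffic from reaching it at all.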
Transform Scorecard into a conversational AI evaluation assistant - comprehensive model
testing through natural conversation.
Special thanks to Dustin Moore for his engineering leadership in developing this MCP implementation.