AnalysisAlpaca 🦙
A production-ready MCP (Model Context Protocol) server that enables comprehensive research and analysis capabilities for Claude and other MCP-compatible AI assistants. This server integrates web and academic search functionality with an optional web interface for interactive research and AI-powered report generation.
🚀 Quick Start
# 1. Clone and navigate to the project
git clone https://github.com/DeepKariaX/Analysis-Alpaca-Researcher.git
cd Analysis-Alpaca-Researcher
# 2. Install dependencies (virtual environment recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
# 3. Start the MCP server
python http_server.py
# Server runs on http://localhost:8001
# API documentation: http://localhost:8001/docs
📋 Table of Contents
- Features
- Architecture
- Installation
- Configuration
- Usage
- Web Interface
- API Reference
- Development
- Testing
- Deployment
- Troubleshooting
- Contributing
✨ Features
Core Research Capabilities
- Multi-Source Search: Combines DuckDuckGo web search and Semantic Scholar academic research
- Content Extraction: Intelligent extraction of relevant information from web pages
- Academic Integration: Direct access to scholarly articles and research papers
- Smart Formatting: Properly formatted research with citations and structured output
- Rate Limiting: Built-in retry logic and graceful handling of API limits
Web Interface Features
- Interactive Research: User-friendly web interface for conducting research
- Job Management: Track multiple research jobs with progress monitoring
- AI-Powered Reports: Generate comprehensive PDF reports using OpenAI, Anthropic, or Groq
- PDF Export: Download research results as properly named PDF files
- Real-time Updates: Live progress tracking with WebSocket-like polling
Production Features
- Comprehensive Error Handling: Graceful degradation when services are unavailable
- Extensive Logging: Detailed logging for debugging and monitoring
- Configurable Settings: Environment-based configuration management
- Auto-Dependency Installation: Automatic installation of missing dependencies
- Modular Architecture: Easy to extend and customize
🏗 Architecture
Components Overview
analysis_alpaca/
├── src/analysis_alpaca/     # Core MCP server implementation
│   ├── core/                # Server and research orchestration
│   ├── search/              # Search engine implementations
│   ├── models/              # Data models and schemas
│   ├── utils/               # Utility functions and helpers
│   └── exceptions/          # Custom exception handling
├── web_ui/                  # Optional web interface
│   ├── frontend/            # React.js frontend application
│   └── backend/             # FastAPI backend for web UI
├── tests/                   # Test suite
├── http_server.py           # HTTP API wrapper for MCP server
└── requirements.txt         # Unified dependencies
Core Components
MCP Server (src/analysis_alpaca/core/server.py)
- FastMCP-based server exposing research tools to Claude
- Main tool: deep_research() for comprehensive research
- Built-in prompt templates for structured research methodology
Research Service (src/analysis_alpaca/core/research_service.py)
- Orchestrates the entire research workflow
- Coordinates web and academic searches
- Manages content extraction and result formatting
- Handles parallel execution and error recovery
Search Implementations
- WebSearcher: DuckDuckGo web search with result parsing
- AcademicSearcher: Semantic Scholar API integration with retry logic
- ContentExtractor: Web page content extraction and processing
HTTP Server (http_server.py)
- REST API wrapper for MCP functionality
- Enables direct HTTP access to research capabilities
- CORS-enabled for web interface integration
Web Interface
- Frontend: React.js application with PDF generation
- Backend: FastAPI server for job management and AI report generation
🔧 Installation
Prerequisites
- Python 3.8+ (recommended: Python 3.11+)
- Node.js 16+ (only if using web interface)
- npm or yarn (only if using web interface)
Basic Installation
# Clone the repository
git clone https://github.com/DeepKariaX/Analysis-Alpaca-Researcher.git
cd Analysis-Alpaca-Researcher
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install the package
pip install -e .
# Or install with all optional dependencies
pip install -e ".[dev,ai]"
Web Interface Setup
# Install frontend dependencies
cd web_ui/frontend
npm install
# Return to project root
cd ../..
Dependencies Overview
Core Dependencies:
- httpx>=0.25.0 - HTTP client for API requests
- beautifulsoup4>=4.12.0 - HTML parsing for content extraction
- mcp>=0.1.0 - Model Context Protocol server framework
- fastapi>=0.104.0 - Web framework for the HTTP API
- uvicorn>=0.24.0 - ASGI server for FastAPI
Optional AI Dependencies:
pip install -e ".[ai]" # Installs OpenAI, Anthropic, and Groq clients
Development Dependencies:
pip install -e ".[dev]" # Installs testing and linting tools
⚙️ Configuration
Environment Variables
Create a .env file in the project root:
# Search Configuration
AA_MAX_RESULTS=5 # Maximum results per search
AA_DEFAULT_NUM_RESULTS=3 # Default number of results
AA_WEB_TIMEOUT=15.0 # Web search timeout (seconds)
AA_USER_AGENT="AnalysisAlpaca 1.0"
# Content Configuration
AA_MAX_CONTENT_SIZE=10000 # Maximum response size
AA_MAX_EXTRACTION_SIZE=150000 # Maximum content to extract
# Server Configuration
AA_LOG_LEVEL=INFO # Logging level (DEBUG, INFO, WARNING, ERROR)
AA_LOG_FILE="logs/research.log" # Optional log file path
AA_AUTO_INSTALL_DEPS=true # Auto-install missing dependencies
# AI Provider API Keys (Optional - for web interface)
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GROQ_API_KEY=your_groq_key_here
# Web UI Configuration
MCP_SERVER_URL=http://localhost:8001 # URL of the MCP HTTP server
Configuration Files
The system uses a hierarchical configuration approach:
- Default values in config.py
- Environment variables (override defaults)
- Optional .env file (overrides environment variables)
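The resulting lookup order can be sketched in a few lines. The helper below is illustrative only and does not mirror the actual config.py implementation; the defaults shown are assumptions.

```python
import os
from typing import Dict, Optional

# Illustrative defaults; the real values live in config.py
DEFAULTS = {"AA_MAX_RESULTS": "5", "AA_LOG_LEVEL": "INFO"}


def load_dotenv_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines from an optional .env file."""
    values: Dict[str, str] = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip().strip('"')
    except FileNotFoundError:
        pass  # The .env file is optional
    return values


def get_setting(name: str) -> Optional[str]:
    """Resolve a setting: .env file > environment variable > default."""
    return load_dotenv_file().get(name) or os.environ.get(name) or DEFAULTS.get(name)
```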
🚀 Usage
MCP Server (for Claude Desktop)
Add to your Claude Desktop configuration:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "analysis-alpaca": {
      "command": "/path/to/python",
      "args": ["/path/to/analysis_alpaca/http_server.py"],
      "env": {
        "AA_MAX_RESULTS": "5",
        "AA_LOG_LEVEL": "INFO"
      }
    }
  }
}
Standalone HTTP Server
# Start the HTTP API server
python http_server.py
# Server runs on http://localhost:8001
# API documentation available at http://localhost:8001/docs
Web Interface (Optional)
The web interface provides a user-friendly way to interact with AnalysisAlpaca through a browser.
Requirements:
- Node.js 16+ and npm for the frontend
- The MCP HTTP server must be running (see above)
Setup:
# Install frontend dependencies
cd web_ui/frontend
npm install
cd ../..
Manual Startup (2 terminals required):
Terminal 1 - Backend API Server:
cd web_ui/backend
python main.py
# Backend runs on http://localhost:8000
# API documentation: http://localhost:8000/docs
Terminal 2 - Frontend Development Server:
cd web_ui/frontend
npm start
# Frontend runs on http://localhost:3000
# Access the web interface at http://localhost:3000
Complete Setup (3 servers total):
- MCP Server (Terminal 1): python http_server.py → http://localhost:8001
- Backend API (Terminal 2): cd web_ui/backend && python main.py → http://localhost:8000
- Frontend UI (Terminal 3): cd web_ui/frontend && npm start → http://localhost:3000
Research Tool Usage
The main deep_research tool accepts these parameters:
- query (required): The research question or topic
- sources (optional): "web", "academic", or "both" (default: "both")
- num_results (optional): Number of sources to examine (default: 2)
Example Prompts for Claude
Research the latest developments in quantum computing using both web and academic sources.
Can you do comprehensive research on climate change mitigation strategies? Focus on academic sources and examine 3 results.
I need detailed information about the impact of artificial intelligence on healthcare. Use the deep_research tool with web sources only.
Direct API Usage
# Research via HTTP API
curl -X POST "http://localhost:8001/deep_research" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "artificial intelligence in healthcare",
    "sources": "both",
    "num_results": 3
  }'
🌐 Web Interface
Features
- Research Form: Interactive form to submit research queries
- Progress Tracking: Real-time progress updates with detailed logs
- Job Management: View and manage multiple research jobs
- AI Report Generation: Generate comprehensive reports using various LLM providers
- PDF Export: Download reports as properly named PDF files
- History: Browse previous research jobs and results
Supported LLM Providers
- OpenAI: GPT-4, GPT-3.5-turbo, and other models
- Anthropic: Claude 3 (Sonnet, Opus, Haiku)
- Groq: Fast inference with various open-source models
File Naming Convention
Downloaded reports use the format: {sanitized_title}_{source_type}.pdf
Example: artificial_intelligence_healthcare_web_academic.pdf
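A sanitizer along these lines would produce such names; the exact rules applied by the frontend are an assumption, so treat this as a sketch rather than the actual implementation.

```python
import re


def report_filename(title: str, source_type: str) -> str:
    """Build a {sanitized_title}_{source_type}.pdf name.

    Illustrative sketch; the frontend's actual sanitization rules may differ.
    """
    # Collapse anything that is not a letter or digit into underscores
    sanitized = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    # "both" is rendered as "web_academic" in the example above
    source = source_type.replace("both", "web_academic")
    return f"{sanitized}_{source}.pdf"
```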
📚 API Reference
MCP Tools
deep_research
Perform comprehensive research on a topic.
Parameters:
- query (string, required): Research question or topic
- sources (string, optional): Source type ("web", "academic", "both")
- num_results (integer, optional): Number of sources to examine
Returns: Formatted research results with sources and content
research_prompt
Generate a structured research prompt for multi-stage research.
Parameters:
- topic (string, required): Topic to research
Returns: Comprehensive research prompt with methodology
HTTP API Endpoints
POST /deep_research
Execute research query via HTTP.
{
  "query": "string",
  "sources": "both",
  "num_results": 2
}
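The same request can be issued from Python for scripting. This sketch uses only the standard library and assumes the default localhost port shown above; the server must be running for the `__main__` section to succeed.

```python
import json
import urllib.request


def build_request(query: str, sources: str = "both", num_results: int = 2,
                  base_url: str = "http://localhost:8001") -> urllib.request.Request:
    """Prepare a POST /deep_research request matching the body shown above."""
    payload = {"query": query, "sources": sources, "num_results": num_results}
    return urllib.request.Request(
        f"{base_url}/deep_research",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires the HTTP server from http_server.py to be running locally.
    req = build_request("artificial intelligence in healthcare", num_results=3)
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read().decode("utf-8")))
```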
GET /health
Health check endpoint.
GET /docs
Interactive API documentation (Swagger UI).
Web UI API Endpoints
POST /research
Start a new research job.
GET /research/{job_id}
Get research job status and results.
GET /research/{job_id}/progress
Get detailed progress for a research job.
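The job endpoints above suggest a simple client-side polling loop. The sketch below is generic: the fetch_status callable stands in for the HTTP GET against /research/{job_id}, and the "status" values checked are assumptions about the backend's response shape, not a documented schema.

```python
import time
from typing import Callable, Dict


def poll_job(fetch_status: Callable[[], Dict], interval: float = 1.0,
             timeout: float = 300.0) -> Dict:
    """Poll a research job until it finishes or the timeout expires.

    fetch_status should perform the GET /research/{job_id} call and return
    the decoded JSON; the terminal status values here are assumptions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("research job did not finish in time")
```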
🛠 Development
Project Structure
analysis_alpaca/
├── src/analysis_alpaca/
│   ├── __init__.py
│   ├── config.py                # Configuration management
│   ├── core/
│   │   ├── __init__.py
│   │   ├── server.py            # MCP server implementation
│   │   └── research_service.py  # Research orchestration
│   ├── search/
│   │   ├── __init__.py
│   │   ├── base.py              # Base searcher class
│   │   ├── web_search.py        # DuckDuckGo implementation
│   │   ├── academic_search.py   # Semantic Scholar implementation
│   │   └── content_extractor.py # Content extraction
│   ├── models/
│   │   ├── __init__.py
│   │   └── research.py          # Data models
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── logging.py           # Logging utilities
│   │   └── text.py              # Text processing
│   └── exceptions/
│       ├── __init__.py
│       └── base.py              # Custom exceptions
├── web_ui/
│   ├── frontend/                # React.js application
│   └── backend/                 # FastAPI backend
├── tests/                       # Test suite
├── http_server.py               # HTTP wrapper
├── requirements.txt             # Dependencies
├── pyproject.toml               # Package configuration
└── Makefile                     # Development commands
Development Setup
# Install with development dependencies
pip install -e ".[dev,ai]"
# Set up pre-commit hooks (optional)
pre-commit install
# Run tests
make test
# Code formatting
make format
# Linting
make lint
# Type checking
make type-check
Adding New Search Providers
- Create a new searcher class inheriting from BaseSearcher
- Implement the search() method
- Add the searcher to ResearchService
- Update configuration and documentation
Example:
from typing import List

from .base import BaseSearcher
from ..models.research import SearchResult  # import path assumed from the project layout


class NewSearcher(BaseSearcher):
    async def search(self, query: str, num_results: int) -> List[SearchResult]:
        # Implement the provider-specific search logic here
        ...
🧪 Testing
Running Tests
# Run all tests
make test
# Run with coverage
make test-cov
# Run specific test file
pytest tests/test_models.py
# Run with verbose output
pytest -v
Test Structure
- tests/test_models.py - Data model tests
- tests/test_utils.py - Utility function tests
- tests/conftest.py - Test configuration and fixtures
Writing Tests
Tests use pytest and pytest-asyncio for async testing:
import pytest

from analysis_alpaca.models.research import ResearchQuery


@pytest.mark.asyncio
async def test_research_query():
    query = ResearchQuery(query="test", sources="web", num_results=2)
    assert query.query == "test"
🚀 Deployment
Production Deployment
Docker Deployment
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install -e .
EXPOSE 8001
CMD ["python", "http_server.py"]
Environment Configuration
For production, set these environment variables:
AA_LOG_LEVEL=WARNING
AA_LOG_FILE=/var/log/analysis-alpaca.log
AA_AUTO_INSTALL_DEPS=false
AA_MAX_RESULTS=3
AA_WEB_TIMEOUT=20.0
Reverse Proxy Setup (Nginx)
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
Monitoring
The application provides comprehensive logging. Monitor these key metrics:
- Research request rates
- Search success/failure rates
- Content extraction success rates
- Response times
- Error patterns
Scaling Considerations
- The application is stateless and can be horizontally scaled
- Consider implementing Redis for caching search results
- Use a proper message queue for background processing in high-traffic scenarios
🔍 Troubleshooting
Common Issues
Import Errors
# Ensure proper installation
pip install -e .
# Check Python path
python -c "import analysis_alpaca; print('OK')"
Search Timeouts
# Increase timeout values
export AA_WEB_TIMEOUT=30.0
export AA_ACADEMIC_TIMEOUT=30.0
Academic Search Rate Limiting
The system automatically handles Semantic Scholar rate limits with:
- Exponential backoff retry logic
- Graceful degradation (returns web results only)
- Request spacing
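Exponential backoff of this kind can be sketched as follows; the retry counts and delays used by the real AcademicSearcher are assumptions here, not values taken from academic_search.py.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], retries: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a rate-limited call with exponential backoff and jitter.

    Illustrative sketch; the actual retry parameters are internal to the
    searcher implementations.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # Out of retries: surface the last error
            # Double the delay each attempt, plus a little jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError("unreachable")
```

Injecting the sleep function keeps the helper testable without real delays.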
Content Extraction Failures
- Check network connectivity
- Verify target site availability
- Some sites may block automated requests
Large Response Truncation
# Increase content size limits
export AA_MAX_CONTENT_SIZE=15000
export AA_MAX_EXTRACTION_SIZE=200000
Debug Mode
Enable detailed logging:
export AA_LOG_LEVEL=DEBUG
export AA_LOG_FILE="debug.log"
python http_server.py
View logs:
tail -f debug.log
Getting Help
- Check the logs for detailed error messages
- Verify your configuration against the examples
- Test with simple queries first
- Ensure all dependencies are properly installed
🤝 Contributing
Development Workflow
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes with tests
- Run quality checks: make check-all
- Submit a pull request
Code Style
The project uses:
- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking
Run all checks:
make check-all
Commit Guidelines
Use conventional commits:
- feat: for new features
- fix: for bug fixes
- docs: for documentation
- test: for tests
- refactor: for refactoring
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Semantic Scholar for academic search API
- DuckDuckGo for web search functionality
- Model Context Protocol for the integration framework
- FastMCP for the server implementation
- React.js and FastAPI for the web interface
📊 Roadmap
Planned Features
Additional Search Providers
- Google Scholar integration
- Bing Academic search
- ArXiv direct integration
Enhanced Content Processing
- PDF content extraction
- Image and chart analysis
- Table data extraction
Performance Improvements
- Redis caching layer
- Async processing optimization
- Response streaming
Advanced Features
- Citation graph analysis
- Research trend detection
- Multi-language support
Enterprise Features
- User authentication
- Usage analytics
- API rate limiting
- Custom search domains
Version History
- v1.0.0 - Initial release with core research functionality
- v1.1.0 - Added web interface and PDF export
- v1.2.0 - Enhanced error handling and rate limiting
- Current - Comprehensive cleanup and documentation
For the latest updates and detailed changelog, visit the GitHub repository.