documentation-mcp-using-pocketbase

説明なし

GitHubスター

0

ユーザー評価

未評価

フォーク

0

イシュー

1

閲覧数

0

お気に入り

0

README
Document Extractor MCP Server

A Model Context Protocol (MCP) server that extracts document content from Microsoft Learn and GitHub URLs, storing them in PocketBase for easy retrieval and search.

Features

Latest MCP SDK Features (v1.12.0+)

  • Modern McpServer architecture with enhanced capabilities
  • Multiple transport protocols: STDIO, Streamable HTTP, SSE
  • Dynamic tool management with lazy loading
  • Session management for stateful connections
  • Server-Sent Events support with backwards compatibility
  • Real-time server statistics and metrics

Content Extraction

  • Microsoft Learn articles with rich metadata
  • GitHub files (README, documentation, code files)
  • Intelligent content parsing and cleaning
  • Duplicate detection and updates

PocketBase Integration

  • Persistent document storage
  • Full-text search capabilities
  • Metadata preservation
  • CRUD operations

Advanced Server Features

  • Multiple transport modes (STDIO/HTTP)
  • Health check and info endpoints
  • Read-only mode support
  • Enhanced error handling and debugging
  • Resource endpoints for server metrics

Rich Metadata

  • Word counts and content statistics
  • Source attribution and URLs
  • Extraction timestamps
  • Content headers and descriptions
Requirements
  • Node.js 18+ with ES modules support
  • PocketBase server running
  • Network access for content extraction
Installation
1. Install Dependencies
# Navigate to the project directory
cd c:\powershell_scripts\pocketbase_document_mcp\document-extractor-mcp

# Install dependencies
npm install
2. PocketBase Setup

The MCP server supports both local and remote PocketBase instances. Choose the setup that best fits your needs:

Option A: Local PocketBase Instance
  1. Download and install PocketBase:

    # Download from https://pocketbase.io/docs/
    # Extract the executable to your preferred directory
    
  2. Start local PocketBase server:

    # Run from the directory containing pocketbase.exe
    .\pocketbase.exe serve
    
    # Or specify custom port and data directory
    .\pocketbase.exe serve --http="127.0.0.1:8090" --dir="./pb_data"
    
  3. Set up admin account:

    • Access PocketBase Admin UI at http://127.0.0.1:8090/_/
    • Create your admin account
    • Note the email/password for configuration
Option B: Remote PocketBase Instance
  1. Deploy PocketBase to your preferred hosting:

    • Railway, Fly.io, DigitalOcean, AWS, etc.
    • Follow your hosting provider's deployment guide
    • Ensure HTTPS is enabled for production
  2. Configure your remote instance:

Option C: Docker PocketBase
  1. Using Docker Compose:

    version: '3.8'
    services:
      pocketbase:
        image: ghcr.io/muchobien/pocketbase:latest
        ports:
          - "8090:8090"
        volumes:
          - ./pb_data:/pb/pb_data
    
  2. Collection Management (Automatic for all setups):

    • The server will automatically create the required documents collection on startup
    • If AUTO_CREATE_COLLECTION=true (default), no manual setup needed
    • Use the ensure_collection tool to manually verify/create collections
    • Use the collection_info tool to check collection status
  3. Manual Collection Setup (if needed):

    • Access PocketBase Admin UI
    • Create a new collection named documents
    • Add these fields:
      title (Text, required)
      content (Text, required)
      metadata (JSON, required)
      created (Date, auto-generated)
      updated (Date, optional)
      
3. Environment Configuration

Create a .env file in the project root. The server supports both local and remote PocketBase instances:

For Local PocketBase Instance:
# PocketBase Configuration - Local
POCKETBASE_URL=http://127.0.0.1:8090
POCKETBASE_ADMIN_EMAIL=admin@example.com
POCKETBASE_ADMIN_PASSWORD=your-secure-password

# Collection Settings
DOCUMENTS_COLLECTION=documents

# Transport Configuration
TRANSPORT_MODE=stdio
HTTP_PORT=3000

# Development Settings
DEBUG=true
NODE_ENV=development
READ_ONLY_MODE=false

# Collection Management ✨ New!
AUTO_CREATE_COLLECTION=true
For Remote PocketBase Instance:
# PocketBase Configuration - Remote
POCKETBASE_URL=https://your-pocketbase-instance.com
POCKETBASE_ADMIN_EMAIL=admin@yourdomain.com
POCKETBASE_ADMIN_PASSWORD=your-secure-password

# Collection Settings
DOCUMENTS_COLLECTION=documents

# Transport Configuration
TRANSPORT_MODE=stdio
HTTP_PORT=3000

# Production Settings
DEBUG=false
NODE_ENV=production
READ_ONLY_MODE=false

# Collection Management
AUTO_CREATE_COLLECTION=true
For Dockerized PocketBase:
# PocketBase Configuration - Docker
POCKETBASE_URL=http://pocketbase:8090
POCKETBASE_ADMIN_EMAIL=admin@localhost
POCKETBASE_ADMIN_PASSWORD=admin123

# Collection Settings
DOCUMENTS_COLLECTION=documents

# Transport Configuration
TRANSPORT_MODE=stdio
HTTP_PORT=3000

# Container Settings
DEBUG=false
NODE_ENV=production
READ_ONLY_MODE=false

# Collection Management
AUTO_CREATE_COLLECTION=true
Usage
Starting the Server

The server supports multiple transport modes:

# STDIO mode (default) - for Claude Desktop and CLI clients
npm start
# or explicitly
npm run start:stdio

# HTTP mode - for web clients and testing
npm run start:http

# Development modes with debug logging
npm run dev              # STDIO mode with debugging
npm run dev:http         # HTTP mode with debugging
npm run dev:stdio        # STDIO mode with debugging

# Test the setup
npm run test
Transport Modes
STDIO Mode (Default)

Perfect for Claude Desktop and command-line MCP clients:

npm start
HTTP Mode

Enables web-based clients and testing with multiple protocols:

npm run start:http

Available endpoints in HTTP mode:

  • POST /mcp - Streamable HTTP transport (modern protocol 2025-03-26)
  • GET /sse - Server-Sent Events transport (legacy protocol 2024-11-05)
  • POST /messages - SSE message endpoint
  • GET /health - Health check endpoint
  • GET /info - Server information endpoint
Available Tools
1. extract_document

Extract and store content from URLs.

Parameters:

  • url (string, required): Microsoft Learn or GitHub URL

Example:

{
  "url": "https://learn.microsoft.com/en-us/azure/cognitive-services/openai/"
}
2. list_documents

List stored documents with pagination.

Parameters:

  • limit (number, optional): Max results per page (1-100, default: 20)
  • page (number, optional): Page number (default: 1)
3. search_documents

Search documents by title or content.

Parameters:

  • query (string, required): Search query
  • limit (number, optional): Max results (1-100, default: 50)
4. get_document

Retrieve a specific document by ID.

Parameters:

  • id (string, required): Document ID
5. delete_document

Delete a document by ID.

Parameters:

  • id (string, required): Document ID to delete
6. ensure_collection ✨ New!

Check if the documents collection exists and create it if needed.

Parameters: None

Description: Automatically verifies the documents collection exists in PocketBase. If not found, creates the collection with the proper schema including all required fields and indexes.

7. collection_info ✨ New!

Get detailed information about the documents collection including statistics.

Parameters: None

Description: Returns comprehensive collection information including schema details, record counts, indexes, and timestamps.

Available Resources
1. stats://server

Real-time server statistics and metrics.

Content:

  • Total document count
  • Server information (name, version, uptime)
  • Memory usage statistics
  • Environment information
  • Read-only mode status
Dynamic Tool Management

The server supports dynamic tool management with lazy loading:

// Tools can be dynamically enabled/disabled
if (process.env.READ_ONLY_MODE === 'true') {
  // Write operations are disabled in read-only mode
  deleteDocumentTool.disable();
  extractDocumentTool.disable();
}

// Tools can be re-enabled at runtime
tool.enable();
Session Management

In HTTP mode, the server supports session management:

  • Streamable HTTP: Modern session management with automatic session ID generation
  • SSE (Legacy): Backwards compatible session handling
  • Session persistence: Sessions are maintained across requests
  • Automatic cleanup: Sessions are cleaned up when connections close
Supported Sources
Microsoft Learn
  • Full article extraction
  • Metadata preservation (description, keywords, author)
  • Section headers extraction
  • Content cleaning and formatting

Example URLs:

  • https://learn.microsoft.com/en-us/azure/cognitive-services/openai/
  • https://learn.microsoft.com/en-us/dotnet/core/introduction
GitHub
  • File content extraction (README, docs, code)
  • Repository metadata
  • Branch handling (main/master fallback)
  • File type detection

Supported URL formats:

  • https://github.com/owner/repo (assumes README.md)
  • https://github.com/owner/repo/blob/main/file.md
  • https://raw.githubusercontent.com/owner/repo/main/file.md
Configuration Options
Environment Variables
Variable Description Default
POCKETBASE_URL PocketBase server URL http://127.0.0.1:8090
POCKETBASE_ADMIN_EMAIL Admin email for authentication Required
POCKETBASE_ADMIN_PASSWORD Admin password Required
DOCUMENTS_COLLECTION Collection name for documents documents
DEBUG Enable debug logging false
NODE_ENV Environment mode development
READ_ONLY_MODE Disable write operations false
AUTO_CREATE_COLLECTION Auto-create collections on startup true
Debug Mode

Enable detailed logging:

$env:DEBUG="true"; node server.js

Debug logs include:

  • Authentication status
  • Content extraction details
  • Database operations
  • Error context
Error Handling

The server implements comprehensive error handling:

  • Network errors: Timeout and connection issues
  • Authentication errors: PocketBase connection problems
  • Validation errors: Invalid input parameters
  • Content errors: Extraction failures
  • Database errors: Storage and retrieval issues

All errors are returned as structured MCP responses with appropriate error codes.

Development
Scripts
# Start in development mode
npm run dev

# Start in production mode
npm start

# Install dependencies
npm run install-deps
Testing the Server
# Test basic functionality
$env:DEBUG="true"; node server.js

# In another terminal, you can test with MCP tools or:
# Use Claude Desktop with MCP configuration
# Use other MCP-compatible clients
Troubleshooting
Common Issues
  1. Authentication Failed

    • Verify PocketBase is running: http://127.0.0.1:8090
    • Check admin credentials in .env
    • Ensure admin user exists in PocketBase
  2. Content Extraction Errors

    • Check network connectivity
    • Verify URL accessibility
    • Review debug logs for details
  3. Collection Not Found

    • Use the ensure_collection tool to automatically create the collection
    • Check collection name in environment variables
    • Verify AUTO_CREATE_COLLECTION is enabled
    • Check collection permissions
  4. Module Import Errors

    • Ensure "type": "module" in package.json
    • Use Node.js 18+ with ES modules support
    • Check all dependencies are installed
Debug Information

Enable debug mode to see detailed logs:

$env:DEBUG="true"; node server.js
PocketBase Collection Schema

If you need to recreate the collection, use this schema:

{
  "name": "documents",
  "type": "base",
  "schema": [
    {
      "name": "title",
      "type": "text",
      "required": true,
      "options": {
        "max": 255
      }
    },
    {
      "name": "content",
      "type": "text",
      "required": true
    },
    {
      "name": "metadata",
      "type": "json",
      "required": true
    },
    {
      "name": "created",
      "type": "date",
      "required": false
    },
    {
      "name": "updated",
      "type": "date",
      "required": false
    }
  ]
}
MCP Client Configuration
Claude Desktop Configuration

Add this to your Claude Desktop MCP settings:

{
  "mcpServers": {
    "document-extractor": {
      "command": "node",
      "args": ["c:\\powershell_scripts\\pocketbase_document_mcp\\document-extractor-mcp\\server.js"],
      "env": {
        "POCKETBASE_URL": "http://127.0.0.1:8090",
        "POCKETBASE_ADMIN_EMAIL": "your-admin@example.com",
        "POCKETBASE_ADMIN_PASSWORD": "your-password",
        "DEBUG": "false"
      }
    }
  }
}
License

MIT License - see LICENSE file for details.

Contributing
  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request
Changelog
v1.1.0 ✨ Latest Update
  • Latest MCP SDK v1.13.1+: Upgraded to the newest Model Context Protocol SDK
  • Latest PocketBase SDK v0.26.1+: Updated to the latest PocketBase features
  • Collection Management Tools: Added ensure_collection and collection_info tools
  • Auto-Collection Creation: Automatic database schema setup on startup
  • Enhanced Lazy Loading: Improved dynamic tool management
  • Latest SSE Features: Modern Server-Sent Events implementation
  • Improved Error Handling: Better collection management error recovery
  • Enhanced Documentation: Comprehensive usage examples and troubleshooting
v1.0.0
  • Updated to latest Anthropic MCP SDK
  • Added comprehensive error handling
  • Implemented input validation with Zod
  • Enhanced metadata extraction
  • Added debug logging
  • Improved documentation
  • Added PocketBase integration
  • Support for Microsoft Learn and GitHub
Deployment
Smithery Deployment

This MCP server supports deployment on Smithery, a platform for hosting MCP servers.

TypeScript Deploy (Recommended)

The fastest way to deploy this server on Smithery:

  1. Fork or Clone this repository to your GitHub account
  2. Connect GitHub to Smithery (or claim your server if already listed)
  3. Navigate to the Deployments tab on your server page
  4. Click Deploy - Smithery will automatically build and host your server

The smithery.yaml file is already configured for TypeScript/Node.js deployment.

Note: Despite being called "TypeScript Deploy", this method works perfectly for Node.js projects with ES modules.

Custom Deploy (Docker)

For advanced deployment with full Docker control:

  1. Replace smithery.yaml with the container configuration:
    cp smithery-container.yaml smithery.yaml
    
  2. Push to GitHub with the updated configuration
  3. Deploy via Smithery's Deployments tab

The Dockerfile is optimized for production deployment with security best practices.

Configuration

When deploying on Smithery, you'll configure:

  • PocketBase URL: Your PocketBase instance URL
  • Admin Credentials: Email and password for PocketBase admin
  • Collection Settings: Default collection name and auto-creation
  • Debug Mode: Enable detailed logging (optional)
Best Practices for Smithery
  • Tool Discovery: All tools are available without authentication for discovery
  • Lazy Authentication: API validation occurs only when tools are invoked
  • Environment Variables: Configuration is handled via Smithery's config schema
  • Health Checks: Built-in health monitoring at /health endpoint
作者情報
Kameron
Dynamic EndpointsVancouver

10

フォロワー

76

リポジトリ

4

Gist

26

貢献数

トップ貢献者

スレッド