dataset-viewer

MCP server for Hugging Face dataset viewer

GitHubスター

23

ユーザー評価

未評価

フォーク

10

イシュー

3

閲覧数

0

お気に入り

0

README
Dataset Viewer MCP Server

An MCP server for interacting with the Hugging Face Dataset Viewer API, providing capabilities to browse and analyze datasets hosted on the Hugging Face Hub.

Features
Resources
  • Uses dataset:// URI scheme for accessing Hugging Face datasets
  • Supports dataset configurations and splits
  • Provides paginated access to dataset contents
  • Handles authentication for private datasets
  • Supports searching and filtering dataset contents
  • Provides dataset statistics and analysis
Tools

The server provides the following tools:

  1. validate

    • Check if a dataset exists and is accessible
    • Parameters:
      • dataset: Dataset identifier (e.g. 'stanfordnlp/imdb')
      • auth_token (optional): For private datasets
  2. get_info

    • Get detailed information about a dataset
    • Parameters:
      • dataset: Dataset identifier
      • auth_token (optional): For private datasets
  3. get_rows

    • Get paginated contents of a dataset
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • page (optional): Page number (0-based)
      • auth_token (optional): For private datasets
  4. get_first_rows

    • Get first rows from a dataset split
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • auth_token (optional): For private datasets
  5. get_statistics

    • Get statistics about a dataset split
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • auth_token (optional): For private datasets
  6. search_dataset

    • Search for text within a dataset
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • query: Text to search for
      • auth_token (optional): For private datasets
  7. filter

    • Filter rows using SQL-like conditions
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • where: SQL WHERE clause (e.g. "score > 0.5")
      • orderby (optional): SQL ORDER BY clause
      • page (optional): Page number (0-based)
      • auth_token (optional): For private datasets
  8. get_parquet

    • Download entire dataset in Parquet format
    • Parameters:
      • dataset: Dataset identifier
      • auth_token (optional): For private datasets
Installation
Prerequisites
  • Python 3.12 or higher
  • uv - Fast Python package installer and resolver
Setup
  1. Clone the repository:
git clone https://github.com/privetin/dataset-viewer.git
cd dataset-viewer
  1. Create a virtual environment and install:
# Create virtual environment
uv venv

# Activate virtual environment
# On Unix:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install in development mode
uv add -e .
Configuration
Environment Variables
  • HUGGINGFACE_TOKEN: Your Hugging Face API token for accessing private datasets
Claude Desktop Integration

Add the following to your Claude Desktop config file:

On Windows: %APPDATA%\Claude\claude_desktop_config.json

On MacOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "dataset-viewer": {
      "command": "uv",
      "args": [
        "--directory",
        "parent_to_repo/dataset-viewer",
        "run",
        "dataset-viewer"
      ]
    }
  }
}
License

MIT License - see LICENSE for details

作者情報
김정석

I used to teach myself natural languages. Now I am more into processing them.

1

フォロワー

7

リポジトリ

0

Gist

3

貢献数

トップ貢献者

スレッド