Dataset Viewer

Browse and analyze Hugging Face datasets with features like search, filtering, statistics, and data export

What is Dataset Viewer

Dataset Viewer is an MCP server that provides an interface to the Hugging Face Dataset Viewer API, allowing users to browse and analyze datasets hosted on the Hugging Face Hub.

Use cases

Users can check that a dataset exists and is accessible, retrieve detailed information about it, page through its contents, search for text, filter rows using SQL-like conditions, and download the dataset in Parquet format.

How to use

To use the server, clone the repository, create a virtual environment with uv, activate it, and install the package in development mode. Then set the HUGGINGFACE_TOKEN environment variable if you need access to private datasets, and register the MCP server in the Claude Desktop config file.

Key features

Key features include support for private datasets, paginated access, exploration of dataset configurations and splits, text search, SQL-like filtering, per-split statistics, and Parquet downloads.

Where to use

This server can be used for any application or research scenario that requires dataset access and analysis, especially in machine learning and data science, where Hugging Face datasets are commonly utilized.

Clients Supporting MCP

The following are the main clients that support the Model Context Protocol, each listed with its official website.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app


Content

Dataset Viewer MCP Server

An MCP server for interacting with the Hugging Face Dataset Viewer API, providing capabilities to browse and analyze datasets hosted on the Hugging Face Hub.

Features

Resources

Uses dataset:// URI scheme for accessing Hugging Face datasets
Supports dataset configurations and splits
Provides paginated access to dataset contents
Handles authentication for private datasets
Supports searching and filtering dataset contents
Provides dataset statistics and analysis

Tools

The server provides the following tools:

validate
- Check if a dataset exists and is accessible
- Parameters:
  - dataset: Dataset identifier (e.g. 'stanfordnlp/imdb')
  - auth_token (optional): For private datasets
get_info
- Get detailed information about a dataset
- Parameters:
  - dataset: Dataset identifier
  - auth_token (optional): For private datasets
get_rows
- Get paginated contents of a dataset
- Parameters:
  - dataset: Dataset identifier
  - config: Configuration name
  - split: Split name
  - page (optional): Page number (0-based)
  - auth_token (optional): For private datasets
get_first_rows
- Get first rows from a dataset split
- Parameters:
  - dataset: Dataset identifier
  - config: Configuration name
  - split: Split name
  - auth_token (optional): For private datasets
get_statistics
- Get statistics about a dataset split
- Parameters:
  - dataset: Dataset identifier
  - config: Configuration name
  - split: Split name
  - auth_token (optional): For private datasets
search_dataset
- Search for text within a dataset
- Parameters:
  - dataset: Dataset identifier
  - config: Configuration name
  - split: Split name
  - query: Text to search for
  - auth_token (optional): For private datasets
filter
- Filter rows using SQL-like conditions
- Parameters:
  - dataset: Dataset identifier
  - config: Configuration name
  - split: Split name
  - where: SQL WHERE clause (e.g. "score > 0.5")
  - orderby (optional): SQL ORDER BY clause
  - page (optional): Page number (0-based)
  - auth_token (optional): For private datasets
get_parquet
- Download entire dataset in Parquet format
- Parameters:
  - dataset: Dataset identifier
  - auth_token (optional): For private datasets
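
The tool names above line up closely with the endpoints of the public Hugging Face Dataset Viewer API. The following is a minimal sketch of how a request for each tool could be built, not the server's actual implementation. The mapping in TOOL_ENDPOINTS, the helper build_request_url, and the page size of 100 rows are assumptions made for illustration; only the base URL and the endpoint names come from the Dataset Viewer API itself.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"
PAGE_SIZE = 100  # assumed page size; the API caps `length` at 100 rows

# Assumed mapping from tool names to Dataset Viewer API endpoints.
TOOL_ENDPOINTS = {
    "validate": "/is-valid",
    "get_info": "/info",
    "get_rows": "/rows",
    "get_first_rows": "/first-rows",
    "get_statistics": "/statistics",
    "search_dataset": "/search",
    "filter": "/filter",
    "get_parquet": "/parquet",
}

# Tools that accept a 0-based `page` parameter.
PAGINATED_TOOLS = {"get_rows", "filter"}


def build_request_url(tool, dataset, page=None, **params):
    """Build the API URL for a tool call (hypothetical helper)."""
    query = {"dataset": dataset}
    # Drop optional parameters that were not supplied.
    query.update({k: v for k, v in params.items() if v is not None})
    if tool in PAGINATED_TOOLS:
        # Translate the 0-based page number into offset/length.
        query["offset"] = (page or 0) * PAGE_SIZE
        query["length"] = PAGE_SIZE
    return f"{BASE_URL}{TOOL_ENDPOINTS[tool]}?{urlencode(query)}"
```

For example, build_request_url("filter", "stanfordnlp/imdb", page=2, config="plain_text", split="train", where="score > 0.5") produces a /filter URL with offset=200 and length=100. The config name and the score column here are placeholders; the columns available depend on the dataset.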

Installation

Prerequisites

Python 3.12 or higher
uv - Fast Python package installer and resolver

Setup

Clone the repository:

git clone https://github.com/privetin/dataset-viewer.git
cd dataset-viewer

Create a virtual environment and install:

# Create virtual environment
uv venv

# Activate virtual environment
# On Unix:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install in development mode (editable install into the active environment)
uv pip install -e .

Configuration

Environment Variables

HUGGINGFACE_TOKEN: Your Hugging Face API token for accessing private datasets
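
As a rough illustration of how the token might be used, the sketch below resolves a token and turns it into the Bearer header that the Hugging Face APIs accept. The helper name auth_headers and the precedence rule (an explicit auth_token argument wins over the HUGGINGFACE_TOKEN variable) are assumptions, not confirmed behavior of this server.

```python
import os


def auth_headers(auth_token=None, env=os.environ):
    """Return HTTP headers for an authenticated request (hypothetical helper).

    Assumes an explicit auth_token takes precedence over HUGGINGFACE_TOKEN.
    """
    token = auth_token or env.get("HUGGINGFACE_TOKEN")
    # Public datasets need no header at all.
    return {"Authorization": f"Bearer {token}"} if token else {}
```

Passing env as a parameter keeps the function easy to test without touching the real process environment.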

Claude Desktop Integration

Add the following to your Claude Desktop config file:

On Windows: %APPDATA%\Claude\claude_desktop_config.json

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "dataset-viewer": {
      "command": "uv",
      "args": [
        "--directory",
        "parent_to_repo/dataset-viewer",
        "run",
        "dataset-viewer"
      ]
    }
  }
}
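
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that from Python. The function names and the example path are hypothetical, and the optional "env" block assumes the server reads HUGGINGFACE_TOKEN from its process environment.

```python
import json
from pathlib import Path


def add_dataset_viewer(config, repo_dir, token=None):
    """Return a copy of a Claude Desktop config with the dataset-viewer entry added."""
    updated = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    entry = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "dataset-viewer"],
    }
    if token:
        # Assumption: the server picks the token up from its environment.
        entry["env"] = {"HUGGINGFACE_TOKEN": token}
    updated.setdefault("mcpServers", {})["dataset-viewer"] = entry
    return updated


def update_config_file(path, repo_dir, token=None):
    """Read the config file (if present), merge the entry, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_dataset_viewer(config, repo_dir, token), indent=2))
```

Existing entries under "mcpServers" are preserved, and an earlier "dataset-viewer" entry is overwritten. Restart Claude Desktop after changing the file so that the new configuration is loaded.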

License

MIT License - see LICENSE for details

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol, each listed with its official website.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai


Tools

get_info

Get detailed information about a Hugging Face dataset including description, features, splits, and statistics. Run validate first to check if the dataset exists and is accessible.

get_rows

Get paginated rows from a Hugging Face dataset

get_first_rows

Get first rows from a Hugging Face dataset split

search_dataset

Search for text within a Hugging Face dataset

filter

Filter rows in a Hugging Face dataset using SQL-like conditions

get_statistics

Get statistics about a Hugging Face dataset
