
Sitemcp

@ryoppippi, 18 days ago
596 MIT
Free, Community
AI Systems
Fetch an entire site and use it as an MCP Server

Overview

What is Sitemcp

sitemcp is a tool that allows users to fetch an entire website and utilize it as an MCP Server. It is a fork of the sitefetch project.

Use cases

Use cases for sitemcp include creating local copies of websites for development, serving static sites as MCP Servers, and extracting content from specific pages for analysis or documentation.

How to use

To use sitemcp, install it globally or run it as a one-off command with bunx, npx, or pnpx. For example, run "sitemcp https://daisyui.com" to fetch the site. Options such as --concurrency let you customize the run.

Key features

Key features of sitemcp include the ability to set a custom tool name strategy, specify the maximum length of content to fetch, and match specific pages using patterns.

Where to use

sitemcp can be used in web development, content management, and data scraping scenarios where fetching and serving a complete website as an MCP Server is required.

Content

SiteMCP

Fetch an entire site and use it as an MCP Server

https://github.com/user-attachments/assets/ebe2d7c6-4ddc-4a37-8e1e-d80fac49d8ae

Demo in Japanese

https://github.com/user-attachments/assets/24288140-be2a-416c-9e7c-c49be056a373

Note: sitemcp is a fork of sitefetch by @egoist.

Install

One-off usage (choose one of the following):

bunx sitemcp
npx sitemcp
pnpx sitemcp

Install globally (choose one of the following):

bun i -g sitemcp
npm i -g sitemcp
pnpm i -g sitemcp

Usage

sitemcp https://daisyui.com

# or with higher concurrency
sitemcp https://daisyui.com --concurrency 10
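
To illustrate what a concurrency limit does during a crawl, here is a minimal Python sketch of bounded-concurrency fetching. It is not sitemcp's implementation, and sitemcp's default concurrency is not stated here; `crawl`, `fake_fetch`, and the example.com URLs are hypothetical names used only for illustration.

```python
import asyncio


async def crawl(urls, fetch, concurrency=3):
    """Fetch all urls, allowing at most `concurrency` fetches in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url):
        # Each task waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input urls.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def fake_fetch(url):
    # Stand-in for a real HTTP request (hypothetical, no network access).
    await asyncio.sleep(0.01)
    return f"content of {url}"


urls = [f"https://example.com/page-{i}" for i in range(5)]
pages = asyncio.run(crawl(urls, fake_fetch, concurrency=2))
print(len(pages))  # 5
```

A higher limit finishes large sites sooner but sends more simultaneous requests to the target server.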

Tool Name Strategy

Use -t, --tool-name-strategy to choose how the tool name is derived from the URL (default: domain).
The derived name is used as the MCP server name.

sitemcp https://vite.dev -t domain # indexOfVite / getDocumentOfVite
sitemcp https://react-tweet.vercel.app/ -t subdomain # indexOfReactTweet / getDocumentOfReactTweet
sitemcp https://ryoppippi.github.io/vite-plugin-favicons/ -t pathname # indexOfVitePluginFavicons / getDocumentOfVitePluginFavicons
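
The three strategies can be pictured with a small Python sketch that reproduces the names shown in the comments above. This is a guess at the naming rule inferred from those three examples only, not sitemcp's actual code; `pascal_case` and `tool_names` are hypothetical helpers, and the domain rule here naively takes the second-to-last hostname label.

```python
import re
from urllib.parse import urlparse


def pascal_case(text):
    """Turn 'vite-plugin-favicons' into 'VitePluginFavicons'."""
    parts = re.split(r"[^A-Za-z0-9]+", text)
    return "".join(part[:1].upper() + part[1:] for part in parts if part)


def tool_names(url, strategy="domain"):
    """Return (index tool name, document tool name) for a URL."""
    parsed = urlparse(url)
    labels = parsed.hostname.split(".")
    if strategy == "domain":
        # Assumption: label just before the TLD (ignores cases like co.uk).
        base = labels[-2] if len(labels) >= 2 else labels[0]
    elif strategy == "subdomain":
        base = labels[0]
    elif strategy == "pathname":
        base = parsed.path.strip("/")
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    name = pascal_case(base)
    return f"indexOf{name}", f"getDocumentOf{name}"


print(tool_names("https://vite.dev", "domain"))
print(tool_names("https://react-tweet.vercel.app/", "subdomain"))
print(tool_names("https://ryoppippi.github.io/vite-plugin-favicons/", "pathname"))
```

Picking the strategy that yields a distinct name matters when several sitemcp servers are registered in the same client.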

Max Length of Content

Use -l, --max-length to specify the maximum content length (default: 2000 characters).
This is useful for sites with long content, such as blogs or documentation.
The acceptable content length depends on your MCP client, so check its documentation for details.
Feel free to open an issue if you have any questions.

sitemcp https://vite.dev -l 10000
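
As a rough picture of what a character limit means for a long page, the Python sketch below clips text to a maximum length and reports whether anything was cut. How sitemcp itself handles content beyond the limit is not described here, so plain truncation is an assumption made for illustration; `clip` is a hypothetical helper.

```python
DEFAULT_MAX_LENGTH = 2000  # the default stated for -l, --max-length


def clip(content, max_length=DEFAULT_MAX_LENGTH):
    """Return at most max_length characters and whether anything was cut."""
    return content[:max_length], len(content) > max_length


page = "x" * 7500  # stand-in for a long documentation page

text, was_cut = clip(page)
print(len(text), was_cut)  # 2000 True

text, was_cut = clip(page, max_length=10000)
print(len(text), was_cut)  # 7500 False
```

Checking typical page lengths this way can help pick a value for -l that your MCP client can still accept.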

Match specific pages

Use the -m, --match flag to specify the pages you want to fetch:

sitemcp https://vite.dev -m "/blog/**" -m "/guide/**"

The match pattern is tested against the pathname of each target page. Matching is powered by micromatch; see its documentation for all supported matching features.
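
To preview which URLs a set of patterns would select, the Python sketch below tests patterns against each URL's pathname. It uses the standard library's `fnmatch`, which only approximates micromatch: brace expansion, negation, and some `**` edge cases behave differently, so treat the result as a rough guide. `is_selected` and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_selected(url, patterns):
    """True if the URL's pathname matches at least one glob pattern."""
    pathname = urlparse(url).path or "/"
    return any(fnmatchcase(pathname, pattern) for pattern in patterns)


patterns = ["/blog/**", "/guide/**"]
for url in [
    "https://vite.dev/blog/announcing-vite6",
    "https://vite.dev/guide/features",
    "https://vite.dev/config/",
]:
    print(url, is_selected(url, patterns))
```

Only the pathname is compared, so the scheme, host, and query string do not affect the outcome.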

Content selector

We use mozilla/readability to extract readable content from each web page. On some pages it might return irrelevant content; in that case, specify a CSS selector so we know where to find the readable content:

sitemcp https://vite.dev --content-selector ".content"
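
Before settling on a selector, it can help to check what text it would capture on a saved copy of a page. The Python sketch below is a deliberately simplified stand-in that supports only a single class selector such as ".content" and assumes well-formed HTML with explicit closing tags; it is unrelated to how sitemcp or readability work internally. `ClassTextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect text inside elements carrying a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0  # greater than 0 while inside a selected element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, class_name):
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = ('<nav class="menu">Home</nav>'
          '<div class="content"><h1>Guide</h1><p>Install it<br>first</p></div>'
          '<footer>Footer</footer>')
print(extract_text(sample, "content"))
```

If the printed text still contains navigation or footer items, a narrower selector is probably needed.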

How to configure with MCP Client

You can launch the server from your MCP client (e.g. Claude Desktop).

Below is an example configuration for Claude Desktop:

{
  "mcpServers": {
    "daisy-ui": {
      "command": "npx",
      "args": [
        "-y",
        "sitemcp",
        "https://daisyui.com",
        "-m",
        "/components/**"
      ]
    }
  }
}
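
When several sites need to be registered, writing these entries by hand gets repetitive. The Python sketch below builds the same structure as the configuration above; `sitemcp_entry` is a hypothetical helper, and it uses only the `-m` flag and the `npx -y sitemcp` invocation already shown.

```python
import json


def sitemcp_entry(url, match_patterns=()):
    """Build one mcpServers entry that launches sitemcp through npx."""
    args = ["-y", "sitemcp", url]
    for pattern in match_patterns:
        # One -m flag per pattern, as in the match example.
        args += ["-m", pattern]
    return {"command": "npx", "args": args}


config = {
    "mcpServers": {
        "daisy-ui": sitemcp_entry("https://daisyui.com", ["/components/**"]),
    }
}
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's configuration; add further keys under "mcpServers" for additional sites.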

Tips

  • Some sites have a lot of pages, so it is better to run sitemcp before registering the server with the MCP client. sitemcp caches the pages in ~/.cache/sitemcp by default. You can disable caching with the --no-cache flag.
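
To confirm that a warm-up run actually populated the cache, a small Python sketch like the one below can count what is stored under the default cache directory. The internal layout of the cache is not documented here, so the sketch only counts files and bytes; `cache_summary` is a hypothetical helper.

```python
from pathlib import Path


def cache_summary(cache_dir=None):
    """Return (file count, total bytes) for the sitemcp cache directory."""
    if cache_dir is None:
        # Default location stated in the tip above.
        cache_dir = Path.home() / ".cache" / "sitemcp"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return 0, 0
    files = [path for path in cache_dir.rglob("*") if path.is_file()]
    return len(files), sum(path.stat().st_size for path in files)


count, size = cache_summary()
print(f"{count} cached files, {size} bytes")
```

A count of zero after a run suggests the cache was disabled or the run did not complete.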

License

MIT.
