
External Doc Reader MCP

An MCP server that helps find relevant documentation pages based on user queries.

Overview

What is External Doc Reader MCP

The external-doc-reader-mcp is an MCP server designed to help users find relevant documentation pages from a specified website based on their queries. It aims to streamline the process of locating specific information within extensive documentation sites.

Use cases

Use cases include assisting developers in finding API documentation, helping technical writers locate specific guidelines, and enabling customer support agents to quickly access troubleshooting documents.

How to use

To use external-doc-reader-mcp, a user invokes the @find_relevant_doc_pages tool with the root URL of the documentation site, the query, and an optional cap on how many pages to discover. An MCP client such as Cursor facilitates this interaction.
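
For example, in an MCP-enabled chat you might ask (the full parameter reference appears in the Content section below):

@find_relevant_doc_pages(root_url="https://docs.example.com", query="how to configure authentication")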

Key features

Key features include the ability to discover and crawl documentation pages, cache the list of pages for faster access, filter results using a Large Language Model (LLM) to identify the most relevant pages, and serve this functionality within an MCP-compatible environment.

Where to use

external-doc-reader-mcp can be used in various fields that require extensive technical documentation, such as software development, technical writing, and customer support, where quick access to specific information is crucial.

Content

External Documentation URL Relevance Engine

This project implements an MCP (Model Context Protocol) server that provides a tool to find relevant documentation pages from a given website based on a user’s query. It’s designed to help users quickly locate specific information within large documentation sites.

Project Idea

The core idea is to build an intelligent assistant that can:

  1. Discover: Crawl a specified root documentation URL to find all available pages and their titles.
  2. Cache: Store the list of discovered pages to speed up subsequent requests for the same site.
  3. Filter: Use a Large Language Model (LLM) to compare the user’s query against the discovered pages (titles and potentially content snippets in a more advanced version) to identify the most relevant ones.
  4. Serve: Provide this functionality as a tool within an MCP-compatible environment (like Cursor), allowing AI agents or users to easily invoke it.

This engine is particularly useful when dealing with extensive technical documentation where finding the right page can be time-consuming.
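
Conceptually, these four steps reduce to one small orchestration function. The TypeScript sketch below is illustrative only: the declared helpers stand in for the real modules (src/cacheManager.ts, src/scraper.ts, src/llmService.ts) and their signatures are simplified assumptions.

    // Conceptual sketch of the discover -> cache -> filter flow (simplified signatures).
    interface PageMetadata {
      url: string;
      title: string;
    }

    // Assumed stand-ins for src/cacheManager.ts, src/scraper.ts and src/llmService.ts.
    declare const cache: {
      get(key: string): PageMetadata[] | undefined;
      set(key: string, value: PageMetadata[]): void;
    };
    declare function discoverPageUrlsAndMetadata(rootUrl: string, maxPages: number): Promise<PageMetadata[]>;
    declare function filterRelevantPages(pages: PageMetadata[], query: string): Promise<string[]>;

    export async function findRelevantDocPages(
      rootUrl: string,
      query: string,
      maxPagesToDiscover = 20
    ): Promise<string[]> {
      // 1. Discover / 2. Cache: reuse a previously discovered page list for the same site.
      const cacheKey = `${rootUrl}::${maxPagesToDiscover}`;
      let pages = cache.get(cacheKey);
      if (!pages) {
        pages = await discoverPageUrlsAndMetadata(rootUrl, maxPagesToDiscover);
        cache.set(cacheKey, pages);
      }
      if (pages.length === 0) {
        return []; // nothing discovered; the real handler reports this back to the client
      }

      // 3. Filter: let the LLM pick the pages most relevant to the query.
      return filterRelevantPages(pages, query);
    }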

Architecture

The system follows the architecture depicted below:

┌───────────────────────────────────────────────────────────────────┐
│          External Documentation URL Relevance Engine MCP Server   │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  1. MCP Client (e.g., Cursor)                                     │
│      │ Calls @find_relevant_doc_pages(root_url, query, limit)     │
│      ▼                                                            │
│  2. MCP Server (`src/index.ts` - `Server` class)                  │
│      │ Tool: `FIND_RELEVANT_DOC_PAGES_TOOL` defined               │
│      │ Routes to: `handleFindRelevantPages(request)`              │
│      ▼                                                            │
│  3. Tool Handler (`handleFindRelevantPages` in `src/index.ts`)    │
│      │ Receives: { root_url, query, max_pages_to_discover }       │
│      │ Validates args using `FindRelevantDocPagesArgsSchema`      │
│      │ Creates `cacheKey` for discovered pages list               │
│      │                                                            │
│      ├─► 4. CacheManager (`src/cacheManager.ts`)                  │
│      │      │ Input: `cacheKey`                                   │
│      │      │                                                     │
│      │      └─► `cache.get(cacheKey)`                             │
│      │          │ (Tries to get `PageMetadata[]`)                 │
│      │          │                                                 │
│      │          ├─► If cache hit:                                 │
│      │          │     `discoveredPages` = cached_list             │
│      │          │     `source`  = "cache (discovered URLs)"       │
│      │          └─► If cache miss:                                │
│      │                │                                           │
│      │                ▼                                           │
│      │            5. URL Discoverer (`src/scraper.ts`)            │
│      │                │ Uses: `discoverPageUrlsAndMetadata(...)`  │
│      │                │                                           │
│      │                └─► Fetches URLs & titles via Firecrawl     │
│      │                │   (`crawlUrl`)                            │
│      │                └─► `discoveredPages` = list of `PageMetadata`│
│      │                └─► `source` = "live discovery (URLs)"      │
│      │                └─► `cache.set(cacheKey, discoveredPages)`  │
│      │                                                            │
│      │ (If no pages found, returns empty list/message to client)  │
│      │                                                            │
│      ▼                                                            │
│  6. LLM Service (`src/llmService.ts`)                             │
│      │ Input: `discoveredPages` (`PageMetadata[]`), `query`       │
│      │ Uses: `filterRelevantPages(discoveredPages, query)`        │
│      │                                                            │
│      └─► Queries Gemini (via LangChain) with prompt:              │
│            "Given [list of URLs & titles] & [query],              │
│             return comma-separated list of relevant URLs or NONE" │
│      │                                                            │
│      ▼                                                            │
│  7. Response Formatting (`handleFindRelevantPages` in `src/index.ts`)│
│      │ Output: `{ content: [{text: message}, ...], isError }`     │
│      │ Returns to MCP Client (list of relevant URL strings)       │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘

Key Components:

  • MCP Client: Any MCP-compliant client (e.g., Cursor) that can call the exposed tool.
  • MCP Server (src/index.ts): The backbone of the service. It defines and exposes the find_relevant_doc_pages tool (a registration sketch follows this list).
  • Tool Handler (handleFindRelevantPages in src/index.ts): The core logic orchestrating the request. It validates arguments, interacts with the cache, triggers URL discovery, and invokes the LLM service.
  • Cache Manager (src/cacheManager.ts): Implements a simple file-based cache to store lists of discovered page URLs and metadata, reducing redundant calls to the discovery service for the same root URL.
  • URL Discoverer (src/scraper.ts): Leverages the Firecrawl service to crawl a given root_url and extract page URLs and their titles (PageMetadata).
  • LLM Service (src/llmService.ts): Uses a Google Gemini model (via LangChain) to process the list of discovered pages and the user’s query. It then identifies and returns only the URLs that are most relevant to the query (a sketch follows this list).
  • Configuration (src/config.ts): Manages API keys (Google API Key for Gemini, Firecrawl API Key) loaded from environment variables.
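
To make the wiring between the MCP Server and the Tool Handler concrete, here is a sketch of how the tool might be registered using the low-level Server API of the MCP TypeScript SDK. The tool description, version string, and the findRelevantDocPages helper (the orchestration sketched under “Project Idea”) are assumptions; the project’s actual src/index.ts may differ.

    // Illustrative tool registration (not the project's actual src/index.ts).
    import { Server } from "@modelcontextprotocol/sdk/server/index.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { CallToolRequestSchema, ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";

    // Orchestration helper from the "Project Idea" sketch above (assumed signature).
    declare function findRelevantDocPages(rootUrl: string, query: string, maxPages?: number): Promise<string[]>;

    const FIND_RELEVANT_DOC_PAGES_TOOL = {
      name: "find_relevant_doc_pages",
      description: "Find documentation pages under a root URL that are relevant to a query.",
      inputSchema: {
        type: "object" as const,
        properties: {
          root_url: { type: "string" },
          query: { type: "string" },
          max_pages_to_discover: { type: "number" },
        },
        required: ["root_url", "query"],
      },
    };

    const server = new Server(
      { name: "external-doc-reader-mcp", version: "0.1.0" },
      { capabilities: { tools: {} } }
    );

    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [FIND_RELEVANT_DOC_PAGES_TOOL],
    }));

    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      const { root_url, query, max_pages_to_discover } = request.params.arguments as {
        root_url: string;
        query: string;
        max_pages_to_discover?: number;
      };
      const urls = await findRelevantDocPages(root_url, query, max_pages_to_discover ?? 20);
      return {
        content: [{ type: "text", text: urls.length > 0 ? urls.join("\n") : "No relevant pages found." }],
        isError: false,
      };
    });

    // The server talks to MCP clients over stdio.
    server.connect(new StdioServerTransport()).catch((err) => {
      console.error("Server error:", err);
      process.exit(1);
    });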
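
Similarly, the LLM Service step can be pictured as follows, using LangChain’s Gemini chat model. The model name, prompt wording, and response parsing are assumptions modelled on the prompt shown in the architecture diagram, not the project’s actual src/llmService.ts.

    // Illustrative LLM filtering step.
    import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

    interface PageMetadata {
      url: string;
      title: string;
    }

    export async function filterRelevantPages(pages: PageMetadata[], query: string): Promise<string[]> {
      const llm = new ChatGoogleGenerativeAI({
        model: "gemini-1.5-flash", // illustrative model choice
        apiKey: process.env.GOOGLE_API_KEY,
        temperature: 0,
      });

      const pageList = pages.map((p) => `${p.url} | ${p.title}`).join("\n");
      const prompt =
        `Given the following documentation pages (URL | title):\n${pageList}\n\n` +
        `And the user query: "${query}"\n\n` +
        `Return a comma-separated list of the URLs most relevant to the query, or NONE.`;

      const response = await llm.invoke(prompt);
      const text = typeof response.content === "string" ? response.content : "";
      if (text.trim().toUpperCase() === "NONE") {
        return [];
      }
      return text.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
    }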

Setup and Usage

1. Prerequisites:

  • Node.js (v18 or newer)
  • npm or yarn

2. Installation:
If you want to run the server from source for development:

git clone https://github.com/DwcQuocXa/external-doc-reader-mcp.git
cd external-doc-reader-mcp
npm install

Then you can run it locally using npm start (see “Running Locally” below).

For usage with MCP clients like Cursor or Claude Desktop, it’s recommended to use npx (once the package is published to npm) or Docker.

3. Configuration:

This MCP server requires two API keys to function:

  • Google API Key: For accessing Google Gemini LLM via LangChain to filter relevant pages.
  • Firecrawl API Key: For crawling websites to discover pages and their metadata.

Getting API Keys:

  • Google API Key:
    1. Go to Google AI Studio (or Google Cloud Console for Vertex AI).
    2. Create an API key.
  • Firecrawl API Key:
    1. Sign up at Firecrawl.dev.
    2. Obtain your API key from your dashboard.

Setting Environment Variables:
These keys must be available as environment variables to the server.

  • If running locally from source: Create a .env file in the project root:
    GOOGLE_API_KEY="your_google_api_key_here"
    FIRECRAWL_API_KEY="your_firecrawl_api_key_here"
    
  • If using NPX/Docker with MCP clients: The environment variables will be configured in the client (see examples below).
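
In both cases the server simply reads the keys from the process environment at startup. A minimal sketch of what src/config.ts might do (assumed; the actual file may differ):

    // Illustrative configuration loading (assumed shape of src/config.ts).
    import "dotenv/config"; // picks up a .env file when running from source

    export const config = {
      googleApiKey: process.env.GOOGLE_API_KEY ?? "",
      firecrawlApiKey: process.env.FIRECRAWL_API_KEY ?? "",
    };

    // Warn loudly if a key is missing (matches the troubleshooting notes below).
    if (!config.googleApiKey) {
      console.error("GOOGLE_API_KEY not set: LLM calls will fail.");
    }
    if (!config.firecrawlApiKey) {
      console.error("FIRECRAWL_API_KEY not set: page discovery will fail.");
    }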

4. Running Locally (for development/from source):
After cloning, installing dependencies, and setting up your .env file:

npm start

This will start the MCP server, listening for requests via stdio.

5. Usage with MCP Clients:

Once the package is published to npm as @dwcquocxa/external-doc-reader-mcp and/or a Docker image is available (e.g., dwcquocxa/external-doc-reader-mcp), configure it in your MCP client as shown below.

A. Usage with Claude Desktop:
Add this to your claude_desktop_config.json:

  • NPX:

    {
      "mcpServers": {
        "external-doc-reader": {
          "command": "npx",
          "args": [
            "-y",
            "@dwcquocxa/external-doc-reader-mcp"
          ],
          "env": {
            "GOOGLE_API_KEY": "YOUR_GOOGLE_API_KEY_HERE",
            "FIRECRAWL_API_KEY": "YOUR_FIRECRAWL_API_KEY_HERE"
          }
        }
      }
    }
  • Docker (Optional - if you publish a Docker image):
    Replace dwcquocxa/external-doc-reader-mcp with your actual Docker image name if different.

    {
      "mcpServers": {
        "external-doc-reader": {
          "command": "docker",
          "args": [
            "run",
            "-i",
            "--rm",
            "-e",
            "GOOGLE_API_KEY",
            "-e",
            "FIRECRAWL_API_KEY",
            "dwcquocxa/external-doc-reader-mcp"
          ],
          "env": {
            "GOOGLE_API_KEY": "YOUR_GOOGLE_API_KEY_HERE",
            "FIRECRAWL_API_KEY": "YOUR_FIRECRAWL_API_KEY_HERE"
          }
        }
      }
    }

B. Usage with VS Code:
For quick installation, use the one-click installation buttons below (ensure your package is on npm and your Docker image, if used, is public):

Install with NPX in VS Code
Install with NPX in VS Code Insiders

Install with Docker in VS Code
Install with Docker in VS Code Insiders
(Note: Docker buttons assume you’ve published an image named dwcquocxa/external-doc-reader-mcp to a public registry like Docker Hub. Update the image name in the links if necessary.)

For manual installation, add the following JSON block to your User Settings (JSON) file in VS Code (Ctrl+Shift+P, then “Preferences: Open User Settings (JSON)”) or to a .vscode/mcp.json file in your workspace.

  • NPX:

    {
      "mcp": {
        "inputs": [
          {
            "type": "promptString",
            "id": "google_api_key",
            "description": "Google API Key (for Gemini)",
            "password": true
          },
          {
            "type": "promptString",
            "id": "firecrawl_api_key",
            "description": "Firecrawl API Key",
            "password": true
          }
        ],
        "servers": {
          "external-doc-reader": {
            "command": "npx",
            "args": [
              "-y",
              "@dwcquocxa/external-doc-reader-mcp"
            ],
            "env": {
              "GOOGLE_API_KEY": "${input:google_api_key}",
              "FIRECRAWL_API_KEY": "${input:firecrawl_api_key}"
            }
          }
        }
      }
    }
  • Docker (Optional - if you publish a Docker image):
    Replace dwcquocxa/external-doc-reader-mcp with your actual Docker image name if different.

    {
      "mcp": {
        "inputs": [
          {
            "type": "promptString",
            "id": "google_api_key",
            "description": "Google API Key (for Gemini)",
            "password": true
          },
          {
            "type": "promptString",
            "id": "firecrawl_api_key",
            "description": "Firecrawl API Key",
            "password": true
          }
        ],
        "servers": {
          "external-doc-reader": {
            "command": "docker",
            "args": [
              "run",
              "-i",
              "--rm",
              "-e",
              "GOOGLE_API_KEY",
              "-e",
              "FIRECRAWL_API_KEY",
              "dwcquocxa/external-doc-reader-mcp"
            ],
            "env": {
              "GOOGLE_API_KEY": "${input:google_api_key}",
              "FIRECRAWL_API_KEY": "${input:firecrawl_api_key}"
            }
          }
        }
      }
    }

6. Tool Invocation (Example):
Once the server is running and your MCP client (Cursor, VS Code with MCP extension, Claude Desktop) is configured, you can invoke the tool:

@find_relevant_doc_pages(root_url="https://docs.example.com", query="how to configure authentication", max_pages_to_discover=25)

Parameters:

  • root_url (string, required): The base URL of the documentation site you want to search.
  • query (string, required): The question or search terms to find relevant pages for.
  • max_pages_to_discover (integer, optional, default: 20): The maximum number of pages the tool should discover within the root_url. Min: 1, Max: 50.

The tool will return a list of URLs deemed most relevant to your query.
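
For reference, the validation performed by FindRelevantDocPagesArgsSchema (see the architecture diagram) can be pictured as a zod schema along the following lines. This is an illustrative reconstruction rather than the project’s actual code; the non-empty constraint on query is an assumption.

    // Illustrative reconstruction of the argument schema (the real one lives in src/index.ts).
    import { z } from "zod";

    export const FindRelevantDocPagesArgsSchema = z.object({
      root_url: z.string().url(),          // must be a valid URL
      query: z.string().min(1),            // question or search terms (non-empty: assumption)
      max_pages_to_discover: z.number().int().min(1).max(50).default(20),
    });

    export type FindRelevantDocPagesArgs = z.infer<typeof FindRelevantDocPagesArgsSchema>;

    // Parsing the example invocation above:
    const args = FindRelevantDocPagesArgsSchema.parse({
      root_url: "https://docs.example.com",
      query: "how to configure authentication",
      max_pages_to_discover: 25,
    });
    console.log(args.max_pages_to_discover); // 25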

Troubleshooting

  • FIRECRAWL_API_KEY not set / Scraper not initialized: Ensure FIRECRAWL_API_KEY is correctly set in your .env file (or in your MCP client’s env configuration) and accessible to the application. The scraper relies on this key.
  • GOOGLE_API_KEY not set / LLM calls will fail: Ensure GOOGLE_API_KEY is correctly set in the same way. The LLM filtering service needs this key to function.
  • Invalid arguments: Double-check that root_url is a valid URL and other parameters meet the schema requirements.
  • Failed to discover pages: The target website might be blocking crawlers, or the root_url might be incorrect or inaccessible.
  • LLM processing failed: This could be due to issues with the LLM service itself, API key problems, or the prompt/data sent to the LLM. Check server logs for more details.
