- Explore MCP Servers
- mcp-academic-rag-system
Mcp Academic Rag System
What is Mcp Academic Rag System
The mcp-academic-rag-system is a Model Context Protocol (MCP) server designed for academic literature processing, providing OCR capabilities, intelligent retrieval, and RAG (Retrieval-Augmented Generation) functionalities.
Use cases
Use cases include converting scanned academic papers into searchable formats, categorizing literature for research projects, facilitating literature reviews, and enabling interactive Q&A sessions based on academic content.
How to use
Users can interact with the mcp-academic-rag-system through its API, allowing them to upload documents for OCR processing, perform intelligent searches using keywords or semantic queries, and manage conversations based on literature content.
Key features
Key features include document OCR processing, automatic classification of literature, intelligent retrieval through natural language queries, structured content access, and conversation management tools.
Where to use
The mcp-academic-rag-system is applicable in academic research, libraries, educational institutions, and any field that requires efficient literature management and retrieval.
Clients Supporting MCP
The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.
Overview
What is Mcp Academic Rag System
The mcp-academic-rag-system is a Model Context Protocol (MCP) server designed for academic literature processing, providing OCR capabilities, intelligent retrieval, and RAG (Retrieval-Augmented Generation) functionalities.
Use cases
Use cases include converting scanned academic papers into searchable formats, categorizing literature for research projects, facilitating literature reviews, and enabling interactive Q&A sessions based on academic content.
How to use
Users can interact with the mcp-academic-rag-system through its API, allowing them to upload documents for OCR processing, perform intelligent searches using keywords or semantic queries, and manage conversations based on literature content.
Key features
Key features include document OCR processing, automatic classification of literature, intelligent retrieval through natural language queries, structured content access, and conversation management tools.
Where to use
The mcp-academic-rag-system is applicable in academic research, libraries, educational institutions, and any field that requires efficient literature management and retrieval.
Clients Supporting MCP
The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.
Content
MCP学术文献RAG服务器
基于Model Context Protocol (MCP)的学术文献检索增强生成(RAG)服务器,提供文献OCR处理、自动分类、智能检索与AI交互功能。
⚠️ 开发状态警告
当前项目状态:原型开发中 - 尚未完成
本项目目前处于积极开发阶段,尚未准备好用于生产环境。API、功能和结构可能会发生重大变化。欢迎贡献和反馈,但请注意当前的不稳定性。
MCP集成功能
作为MCP服务器,本项目将提供以下MCP功能:
工具 (Tools)
Currently implemented tools (some are placeholders):
-
echo- Description: Echoes back the input message.
- MCP Command Parameters (
tool_params):message(string, required): The message to echo.
- Example Result:
{"echo_response": "your message"}
-
document_search- Description: Searches a persistent list of academic documents. The document store is loaded from
documents.jsonon server start and saved to this file when new documents are added. Documents added via theadd_document_to_storetool will persist across server restarts. The search is case-insensitive and covers document titles, abstracts, and keywords. - MCP Command Parameters (
tool_params):query(string, required): The search term or question.max_results(integer, optional, default: 3): The maximum number of search results to return.
- Example MCP Command (for POST to
/mcp_commandor STDIO input):{ "command": "execute_tool", "tool_name": "document_search", "tool_params": { "query": "healthcare", "max_results": 1 } } - Example Result (in
datafield oftool_resultSSE event or STDIO output):(Note: The actual results will depend on the query and the content of the{ "search_results": [ { "id": "doc101", "title": "Exploring Artificial Intelligence in Modern Healthcare", "abstract": "This paper discusses the impact of AI on diagnostics and treatment, highlighting machine learning advancements.", "keywords": [ "ai", "healthcare", "diagnostics", "machine learning", "treatment" ] } ], "query_received": "healthcare" }documents.jsonfile.)
- Description: Searches a persistent list of academic documents. The document store is loaded from
-
add_document_to_store- Description: Adds a new document to the persistent document store (saved in
documents.json) from its raw text content. A title is automatically derived from the first line of the text. Keywords are optional. Added documents will be available after server restarts. - MCP Command Parameters (
tool_params):document_text(string, required): The full text content of the document.keywords(string, optional): Comma-separated list of keywords.
- Example MCP Command:
{ "command": "execute_tool", "tool_name": "add_document_to_store", "tool_params": { "document_text": "First line as derived title.\nThis is the rest of the document content, which will be stored as the abstract.", "keywords": "text processing, auto-title, mcp" } } - Example Result (in
datafield oftool_resultSSE event or STDIO output):
Success:
Error (e.g., empty text):{ "error": "Missing required parameter: document_text cannot be empty." }
- Description: Adds a new document to the persistent document store (saved in
-
add_document_from_file- Description: Adds a new document to the store from an uploaded text file (.txt). The file content is provided as a Base64 encoded string. The server decodes the text, derives a title (from the first line or filename), and stores the document.
- MCP Command Parameters (
tool_params):file_content_base64(string, required): Base64 encoded content of the .txt file.filename(string, required): The original name of the file (e.g., “mypaper.txt”).keywords(string, optional): Comma-separated list of keywords.
- Example MCP Command:
(Note: The Base64 string is “First line as derived title.\nThis is the rest of the document content, which will be stored as the abstract.”){ "command": "execute_tool", "tool_name": "add_document_from_file", "tool_params": { "filename": "example_document.txt", "file_content_base64": "Rmlyc3QgbGluZSBhcyBkZXJpdmVkIHRpdGxlLgpUaGlzIGlzIHRoZSByZXN0IG9mIHRoZSBkb2N1bWVudCBjb250ZW50LCB3aGljaCB3aWxsIGJlIHN0b3JlZCBhcyB0aGUgYWJzdHJhY3Qu", "keywords": "file upload, base64, example" } } - Example Result (in
datafield oftool_resultSSE event or STDIO output):
Success:
Error (e.g., missing parameters or invalid base64):{ "error": "Missing required parameter: file_content_base64 or filename cannot be empty." }{ "error": "Invalid Base64 content." }
-
(Planned) 文献搜索工具:Through keyword, topic, or semantic queries to find relevant documents from a larger, persistent database.
-
(Planned) 文献处理工具:Advanced OCR processing, and structuring of various document formats (PDF, DOCX). Current basic .txt upload is a step towards this.
-
(Planned) 聊天会话工具:管理基于文献内容的对话交互
资源 (Resources)
The server can register and serve various resources. Resource content is not included in the initial capabilities discovery but can be fetched using the get_resource command (see “MCP Commands” section).
In addition to any statically defined resources, all documents stored by the server (from documents.json) are dynamically exposed as MCP resources.
Dynamic Document Resources
Documents stored in the server’s documents.json file (including default documents and any added via tools) are automatically registered as MCP resources.
- URI Scheme:
mcp://resources/documents/{document_id}- Example:
mcp://resources/documents/doc101
- Example:
{document_id}: This corresponds to theidfield of a document in thedocuments.jsonstore.- Content: The
contentof such a resource is the full JSON object of the document itself, including itsid,title,abstract, andkeywords. - Example
get_resourcefor a dynamic document:
Command:Expected{ "command": "get_resource", "uri": "mcp://resources/documents/doc101" }resource_datain the response:{ "uri": "mcp://resources/documents/doc101", "name": "Document: Exploring Artificial Intelligence in Modern Healthcare", "description": "Access to document doc101 - 'Exploring Artificial Intelligence in Modern Healthcare'", "mime_type": "application/json", "content": { "id": "doc101", "title": "Exploring Artificial Intelligence in Modern Healthcare", "abstract": "This paper discusses the impact of AI on diagnostics and treatment, highlighting machine learning advancements.", "keywords": [ "ai", "healthcare", "diagnostics", "machine learning", "treatment" ] } }
Static Sample Resource
-
Sample Resource (Static Example):
- URI:
mcp://resources/literature/doc123 - Name: Sample Document 123
- Description: A sample academic paper providing placeholder content, distinct from the dynamically available document resources from
documents.json. Its content includes fields liketitle,author,abstract, etc. - This resource is registered by default for demonstration and can be retrieved using the
get_resourcecommand.
- URI:
-
(Planned) 会话历史:查看和继续之前的交互记录
-
(Planned) 文献集合:管理主题相关的文献分组
提示模板 (Prompts)
The server can register and provide definitions for various prompt templates. Prompt definitions (including name, description, and arguments) can be retrieved using the get_prompt_definition command (see “MCP Commands” section). The actual execution of prompts (i.e., generating text based on a template and arguments) is a planned feature.
-
Sample Prompt:
summarize_document_abstract- Description: Generates a brief summary of a document’s abstract. Requires the document’s resource URI.
- Arguments:
document_uri(string, required): The MCP URI of the document resource (e.g.,mcp://resources/literature/doc123) whose abstract needs summarizing.
- This prompt is registered by default. Its full definition can be fetched using the
get_prompt_definitioncommand. - Execution Example:
{ "command": "execute_prompt", "name": "summarize_document_abstract", "arguments": { "document_uri": "mcp://resources/literature/doc123" } } - Expected Result (from
prompt_resultevent or STDIO):{ "mcp_protocol_version": "1.0", "status": "success", "prompt_name": "summarize_document_abstract", "result": { "summary": "Summary of abstract: This paper explores the fundamental principles of sciences that don't actually exist." } }
-
(Planned) 文献分析提示:用于分析和总结文献内容
-
(Planned) 比较研究提示:比较多篇文献的内容和观点
-
(Planned) 论文撰写辅助提示:帮助构建论文结构和引用
系统功能
本系统是一个基于API的学术文献OCR电子化、自动分类与智能检索平台,采用流水线架构处理学术文献,将扫描文档转换为结构化电子格式,并提供基于向量数据库的智能检索与自然语言对话功能。
- 文档OCR处理:将扫描的学术文献转换为可搜索文本 (Planned, current support is for .txt uploads)
- 文档内容导入: 支持从纯文本文件 (.txt) 上传文档内容,并自动提取标题。
- 文档结构识别:自动识别标题、摘要、章节等结构元素
- 内容自动分类:基于内容对文献进行主题分类和标签标注
- 格式转换:生成Markdown和PDF输出,保留原文排版
- 向量化存储:将文档内容转换为向量表示并存入向量数据库
- 智能检索:通过自然语言查询检索相关文献内容
- 知识对话:基于文献内容回答用户问题,提供引用来源
- 持久化存储:学术文献元数据和内容(或其引用)通过
documents.json文件进行持久化存储,确保服务器重启后数据不丢失。
开发路线图
- [x] 基础文档处理流水线实现
- [x] 命令行工具开发
- [x] 基本RAG功能实现
- [x] MCP服务器接口实现 (STDIO transport, basic tool execution, basic SSE transport)
- [x] MCP工具 (Tools) 功能开发 (echo, document_search, add_document_to_store, add_document_from_file core logic implemented; persistent storage for docs)
- [x] MCP资源 (Resources) 功能开发 (Documents in store dynamically available as resources via mcp://resources/documents/{id}; sample static resource ‘literature/doc123’ also present. ‘get_resource’ command implemented.)
- [x] MCP提示 (Prompts) 功能开发 (sample ‘summarize_document_abstract’ definition and execution implemented)
- [x] Web界面开发 (interactive viewer: can execute echo, summarize_abstract, document_search, add_document_to_store, and add_document_from_file via .txt upload)
- [x] 高级RAG功能增强 (document_search, add_document_to_store, add_document_from_file use a persistent JSON-based document store ‘documents.json’)
- [ ] 文献处理工具 (Advanced OCR, structuring for PDF/DOCX. Basic .txt upload via
add_document_from_fileimplemented as a first step) - [ ] 安全性和性能优化
- [ ] 文档与教程完善
Basic Usage (STDIO)
The server can be run using app.py and defaults to STDIO transport.
-
Start the server:
python3 app.py -
Interact with the server via STDIN/STDOUT:
Once the server is running, you can send commands to its standard input and receive JSON responses on its standard output.-
Discover capabilities:
Send the plain text command:discoverThe server will respond with a JSON object detailing its capabilities, including available tools. Example (structure may vary):
-
Execute the echo tool:
Send the following JSON command:{ "command": "execute_tool", "tool_name": "echo", "tool_params": { "message": "Hello MCP" } }Receive (example):
{ "mcp_protocol_version": "1.0", "status": "success", "tool_name": "echo", "result": { "echo_response": "Hello MCP" } } -
Stop the server:
Send the plain text command:quit
-
SSE Usage
Starting the Server in SSE Mode
To use Server-Sent Events (SSE) for communication, run the server with the --transport sse flag. You can also specify a port (defaults to 3000):
python3 app.py --transport sse --port 8000
Interacting over SSE
Once the server is running in SSE mode (e.g., on port 8000):
-
Listen for events (including initial capabilities):
Usecurlor a similar tool to connect to the SSE endpoint. The server will stream events here.
The correct path for SSE is/mcp_sse.curl -N http://localhost:8000/mcp_sseYou should immediately receive an
event: capabilitieswith the server details. Subsequent events (like tool results or errors) will appear here. -
Send commands:
Commands are sent via HTTP POST requests to the/mcp_commandendpoint.# Example: Execute the "echo" tool curl -X POST -H "Content-Type: application/json" \ -d '{"command": "execute_tool", "tool_name": "echo", "tool_params": {"message": "Hello from SSE client"}}' \ http://localhost:8000/mcp_commandThe POST request will receive an HTTP 202 Accepted response:
{"status": "accepted", "message": "Tool execution initiated."}.
The actual result of the “echo” tool will then be broadcast as an SSE event (e.g.,event: tool_result) to all connected SSE clients (including yourcurl -Nsession).
MCP Commands
This section details common MCP commands supported by the server across different transports.
get_resource
- Description: Retrieves a registered MCP resource, including its content.
- Parameters (in JSON payload):
command(string, required): Must be"get_resource".uri(string, required): The URI of the resource to retrieve.
- Example MCP Command (for POST to
/mcp_commandor STDIO input):{ "command": "get_resource", "uri": "mcp://resources/documents/doc101" } - Success Response (STDIO or
resource_dataSSE event data):
The full resource object, including itsuri,name,description,mime_type, andcontent.
Example formcp://resources/documents/doc101(assumingdoc101is from the default document store):(For the static sample{ "mcp_protocol_version": "1.0", "status": "success", "uri": "mcp://resources/documents/doc101", "resource_data": { "uri": "mcp://resources/documents/doc101", "name": "Document: Exploring Artificial Intelligence in Modern Healthcare", "description": "Access to document doc101 - 'Exploring Artificial Intelligence in Modern Healthcare'", "mime_type": "application/json", "content": { "id": "doc101", "title": "Exploring Artificial Intelligence in Modern Healthcare", "abstract": "This paper discusses the impact of AI on diagnostics and treatment, highlighting machine learning advancements.", "keywords": [ "ai", "healthcare", "diagnostics", "machine learning", "treatment" ] } } }mcp://resources/literature/doc123, the structure would be similar but with its specific content.) - Error Responses (STDIO or
resource_errorSSE event data):- If resource not found:
{"mcp_protocol_version": "1.0", "status": "error", "uri": "<requested_uri>", "error": "Resource not found"} - If URI missing:
{"mcp_protocol_version": "1.0", "status": "error", "error": "Missing URI for get_resource"}
- If resource not found:
get_prompt_definition
- Description: Retrieves the definition (name, description, arguments) of a registered MCP prompt.
- Parameters (in JSON payload):
command(string, required): Must be"get_prompt_definition".name(string, required): The name of the prompt to retrieve.
- Example MCP Command (for POST to
/mcp_commandor STDIO input):{ "command": "get_prompt_definition", "name": "summarize_document_abstract" } - Success Response (STDIO or
prompt_definition_dataSSE event data):
Contains the prompt’s full definition.
Example forsummarize_document_abstract:{ "mcp_protocol_version": "1.0", "status": "success", "name": "summarize_document_abstract", "prompt_definition": { "name": "summarize_document_abstract", "description": "Generates a brief summary of a document's abstract. Requires the document's resource URI.", "arguments": [ { "name": "document_uri", "type": "string", "description": "The MCP URI of the document resource (e.g., mcp://resources/literature/doc123) whose abstract needs summarizing.", "required": true } ] } } - Error Responses (STDIO or
prompt_definition_errorSSE event data):- If prompt not found:
{"mcp_protocol_version": "1.0", "status": "error", "name": "<requested_name>", "error": "Prompt not found"} - If name missing:
{"mcp_protocol_version": "1.0", "status": "error", "error": "Missing name for get_prompt_definition"}
- If prompt not found:
execute_prompt
- Description: Executes a registered MCP prompt with the provided arguments. (Currently, only “summarize_document_abstract” has implemented execution logic).
- Parameters (in JSON payload):
command(string, required): Must be"execute_prompt".name(string, required): The name of the prompt to execute.arguments(object, required): An object containing key-value pairs for the arguments required by the prompt.
- Example MCP Command (for POST to
/mcp_commandor STDIO input to execute “summarize_document_abstract”):{ "command": "execute_prompt", "name": "summarize_document_abstract", "arguments": { "document_uri": "mcp://resources/literature/doc123" } } - Success Response (STDIO or
prompt_resultSSE event data):
Contains the result of the prompt execution.
Example for “summarize_document_abstract”:{ "mcp_protocol_version": "1.0", "status": "success", "prompt_name": "summarize_document_abstract", "result": { "summary": "Summary of abstract: This paper explores the fundamental principles of sciences that don't actually exist." } } - Error Responses (STDIO or
prompt_errorSSE event data):- Prompt not found:
{"mcp_protocol_version": "1.0", "status": "error", "name": "<prompt_name>", "error": "Prompt not found"} - Argument missing:
{"mcp_protocol_version": "1.0", "status": "error", "name": "<prompt_name>", "error": "Missing <argument_name> argument for <prompt_name>"}(e.g., “Missing document_uri argument for summarize_document_abstract”) - Resource not found (if applicable to prompt):
{"mcp_protocol_version": "1.0", "status": "error", "name": "<prompt_name>", "error": "Resource not found: <uri>"} - Abstract not found (if applicable):
{"mcp_protocol_version": "1.0", "status": "error", "name": "<prompt_name>", "error": "Abstract not found in resource: <uri>"} - Prompt execution not implemented:
{"mcp_protocol_version": "1.0", "status": "error", "name": "<prompt_name>", "error": "Prompt execution not implemented yet"}
- Prompt not found:
Web Interface
A web interface is available to display the server’s capabilities and interact with some of its features. It currently allows:
- Viewing available tools, resources, and prompts.
- Executing the “echo” tool by providing a message.
- Executing the “summarize_document_abstract” prompt by providing a document URI.
- Executing the “document_search” tool by providing a query and maximum number of results.
- Adding a new document via direct text input using the
add_document_to_storetool. - Adding a new document from a .txt file upload using the
add_document_from_filetool (title derived from filename or first line, content stored as abstract).
Results of executions are displayed on the page, updated via Server-Sent Events.
How to Access:
-
Start the MCP server in SSE mode (as this also enables the web server functionality on the same port):
python3 app.py --transport sse --port 8000(Replace
8000with your desired port if different). -
Open your web browser and navigate to:
http://localhost:8000/(Or
http://127.0.0.1:8000/)
The page will connect to the server’s SSE endpoint and display the information it receives.
Dev Tools Supporting MCP
The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.










