MCP Image Recognition
What is MCP Image Recognition
mcp-image-recognition is an MCP server that provides image recognition capabilities using the vision APIs from Anthropic and OpenAI.
Use cases
Use cases for mcp-image-recognition include automated image tagging, content moderation, accessibility enhancements through image descriptions, and extracting text from images for further processing.
How to use
To use mcp-image-recognition, clone the repository, configure your environment file with API keys, and run the server using Python. You can describe images by providing either Base64-encoded data or file paths.
Key features
Key features include image description using Anthropic Claude Vision or OpenAI GPT-4 Vision, support for multiple image formats (JPEG, PNG, GIF, WebP), configurable primary and fallback providers, Base64 and file-based image input support, and optional text extraction using Tesseract OCR.
Clients Supporting MCP
The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.
Content
MCP Image Recognition Server
An MCP server that provides image recognition capabilities using Anthropic and OpenAI vision APIs. Version 0.1.2.
Features
- Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
- Support for multiple image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Base64 and file-based image input support
- Optional text extraction using Tesseract OCR
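The configurable primary/fallback behavior can be sketched roughly as follows. This is a simplified illustration, not the server's actual internals; the function and parameter names here are hypothetical:

```python
import os

def describe_with_fallback(image_b64: str, mime_type: str, providers: dict) -> str:
    """Try the primary vision provider first; fall back to the secondary on error.

    `providers` maps provider names ("anthropic", "openai") to callables that
    take (image_b64, mime_type) and return a description string. This mirrors
    the VISION_PROVIDER / FALLBACK_PROVIDER configuration described below.
    """
    primary = os.environ.get("VISION_PROVIDER", "anthropic")
    fallback = os.environ.get("FALLBACK_PROVIDER")
    try:
        return providers[primary](image_b64, mime_type)
    except Exception:
        # Only retry with the fallback if one is configured and available.
        if fallback and fallback in providers:
            return providers[fallback](image_b64, mime_type)
        raise
```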
Requirements
- Python 3.8 or higher
- Tesseract OCR (optional) - Required for text extraction feature
- Windows: Download and install from UB-Mannheim/tesseract
- Linux: sudo apt-get install tesseract-ocr
- macOS: brew install tesseract
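Since Tesseract is optional, it can help to check up front that the executable is reachable. A minimal sketch (the helper name is hypothetical) that honors the optional TESSERACT_CMD override and otherwise searches the PATH:

```python
import os
import shutil
from typing import Optional

def find_tesseract() -> Optional[str]:
    """Return the path to the Tesseract executable, or None if not found.

    Prefers the TESSERACT_CMD environment variable when it points to an
    existing file; otherwise falls back to searching the PATH.
    """
    custom = os.environ.get("TESSERACT_CMD")
    if custom and os.path.isfile(custom):
        return custom
    return shutil.which("tesseract")
```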
Installation
- Clone the repository:
git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition
- Create and configure your environment file:
cp .env.example .env
# Edit .env with your API keys and preferences
- Build the project:
build.bat
Usage
Running the Server
Start the server with Python:
python -m image_recognition_server.server
Alternatively, start it with the batch script:
run.bat server
Start the server in development mode with the MCP Inspector:
run.bat debug
Available Tools
- describe_image
  - Input: Base64-encoded image data and MIME type
  - Output: Detailed description of the image
- describe_image_from_file
  - Input: Path to an image file
  - Output: Detailed description of the image
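For describe_image, the Base64 payload and MIME type can be prepared with the standard library alone. A small sketch (the helper name is illustrative):

```python
import base64
import mimetypes

def load_image_as_base64(path: str):
    """Read an image file and return (base64_data, mime_type) in the shape
    the describe_image tool expects: ASCII Base64 plus a guessed MIME type."""
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return data, mime_type
```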
Environment Configuration
- ANTHROPIC_API_KEY: Your Anthropic API key.
- OPENAI_API_KEY: Your OpenAI API key.
- VISION_PROVIDER: Primary vision provider (anthropic or openai).
- FALLBACK_PROVIDER: Optional fallback provider.
- LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
- ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
- TESSERACT_CMD: Optional custom path to the Tesseract executable.
- OPENAI_MODEL: OpenAI model (default: gpt-4o-mini). Can use OpenRouter format for other models (e.g., anthropic/claude-3.5-sonnet:beta).
- OPENAI_BASE_URL: Optional custom base URL for the OpenAI API. Set to https://openrouter.ai/api/v1 for OpenRouter.
- OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.
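Putting these together, a minimal .env might look like the following (the keys are placeholders and the values are illustrative, not defaults the server requires):

```
ANTHROPIC_API_KEY=<your-anthropic-key>
OPENAI_API_KEY=<your-openai-key>
VISION_PROVIDER=anthropic
FALLBACK_PROVIDER=openai
LOG_LEVEL=INFO
ENABLE_OCR=false
```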
Using OpenRouter
OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:
- Obtain an API key from OpenRouter.
- Set OPENAI_API_KEY in your .env file to your OpenRouter API key.
- Set OPENAI_BASE_URL to https://openrouter.ai/api/v1.
- Set OPENAI_MODEL to the desired model using the OpenRouter format (e.g., anthropic/claude-3.5-sonnet:beta).
- Set VISION_PROVIDER to openai.
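Applied to the .env file, the OpenRouter steps above would look like this (the key is a placeholder):

```
OPENAI_API_KEY=<your-openrouter-key>
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-3.5-sonnet:beta
VISION_PROVIDER=openai
```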
Default Models
- Anthropic: claude-3.5-sonnet-beta
- OpenAI: gpt-4o-mini
- OpenRouter: Use the anthropic/claude-3.5-sonnet:beta format in OPENAI_MODEL.
Development
Running Tests
Run all tests:
run.bat test
Run specific test suite:
run.bat test server
run.bat test anthropic
run.bat test openai
Docker Support
Build the Docker image:
docker build -t mcp-image-recognition .
Run the container:
docker run -it --env-file .env mcp-image-recognition
License
MIT License - see LICENSE file for details.
Release History
- 0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
- 0.1.1 (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
- 0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support
Dev Tools Supporting MCP
The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.