Mcp Openvision

1 MIT

FreeCommunity

AI Systems

MCP OpenVision is a Model Context Protocol server for enhanced compatibility.

What is Mcp Openvision

MCP OpenVision is a Model Context Protocol (MCP) server that provides image analysis capabilities powered by OpenRouter vision models, enabling AI assistants to analyze images through a simple interface within the MCP ecosystem.

Use cases

Use cases for mcp-openvision include developing AI assistants that can interpret images, enhancing user interactions in applications with visual content, and automating image analysis tasks in various industries.

How to use

To use mcp-openvision, you can install it via Smithery, pip, or UV. Ensure you have an OpenRouter API key and configure it through environment variables. The installation commands are: npx -y @smithery/cli install @Nazruden/mcp-openvision --client claude, pip install mcp-openvision, or uv pip install mcp-openvision.

Key features

Key features of mcp-openvision include compatibility with various OpenRouter vision models, a simple installation process, and the ability to analyze images effectively within the MCP ecosystem.

Where to use

MCP OpenVision can be used in fields such as AI development, image processing, and any application requiring image analysis capabilities.

Clients Supporting MCP

The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

View More MCP Clients

Overview

What is Mcp Openvision

Use cases

How to use

Key features

Key features of mcp-openvision include compatibility with various OpenRouter vision models, a simple installation process, and the ability to analyze images effectively within the MCP ecosystem.

Where to use

MCP OpenVision can be used in fields such as AI development, image processing, and any application requiring image analysis capabilities.

Clients Supporting MCP

The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

View More MCP Clients

Content

mcp-openvision-main

Hosted by Modl, any commits or changes made by the Modl team is to ensure compatibility

MCP OpenVision

Overview

MCP OpenVision is a Model Context Protocol (MCP) server that provides image analysis capabilities powered by OpenRouter vision models. It enables AI assistants to analyze images via a simple interface within the MCP ecosystem.

Installation

Installing via Smithery

To install mcp-openvision for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @Nazruden/mcp-openvision --client claude

Using pip

pip install mcp-openvision

Using UV (recommended)

uv pip install mcp-openvision

Configuration

MCP OpenVision requires an OpenRouter API key and can be configured through environment variables:

OPENROUTER_API_KEY (required): Your OpenRouter API key
OPENROUTER_DEFAULT_MODEL (optional): The vision model to use

OpenRouter Vision Models

MCP OpenVision works with any OpenRouter model that supports vision capabilities. The default model is qwen/qwen2.5-vl-32b-instruct:free, but you can specify any other compatible model.

Some popular vision models available through OpenRouter include:

qwen/qwen2.5-vl-32b-instruct:free (default)
anthropic/claude-3-5-sonnet
anthropic/claude-3-opus
anthropic/claude-3-sonnet
openai/gpt-4o

You can specify custom models by setting the OPENROUTER_DEFAULT_MODEL environment variable or by passing the model parameter directly to the image_analysis function.

Usage

Testing with MCP Inspector

The easiest way to test MCP OpenVision is with the MCP Inspector tool:

npx @modelcontextprotocol/inspector uvx mcp-openvision

Integration with Claude Desktop or Cursor

Edit your MCP configuration file:
- Windows: %USERPROFILE%\.cursor\mcp.json
- macOS: ~/.cursor/mcp.json or ~/Library/Application Support/Claude/claude_desktop_config.json
Add the following configuration:

{
  "mcpServers": {
    "openvision": {
      "command": "uvx",
      "args": [
        "mcp-openvision"
      ],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here",
        "OPENROUTER_DEFAULT_MODEL": "anthropic/claude-3-sonnet"
      }
    }
  }
}

Running Locally for Development

# Set the required API key
export OPENROUTER_API_KEY="your_api_key"

# Run the server module directly
python -m mcp_openvision

Features

MCP OpenVision provides the following core tool:

image_analysis: Analyze images with vision models, supporting various parameters:
- image: Can be provided as:
  - Base64-encoded image data
  - Image URL (http/https)
  - Local file path
- query: User instruction for the image analysis task
- system_prompt: Instructions that define the model’s role and behavior (optional)
- model: Vision model to use
- temperature: Controls randomness (0.0-1.0)
- max_tokens: Maximum response length

Crafting Effective Queries

The query parameter is crucial for getting useful results from the image analysis. A well-crafted query provides context about:

Purpose: Why you’re analyzing this image
Focus areas: Specific elements or details to pay attention to
Required information: The type of information you need to extract
Format preferences: How you want the results structured

Examples of Effective Queries

Basic Query	Enhanced Query
“Describe this image”	“Identify all retail products visible in this store shelf image and estimate their price range”
“What’s in this image?”	“Analyze this medical scan for abnormalities, focusing on the highlighted area and providing possible diagnoses”
“Analyze this chart”	“Extract the numerical data from this bar chart showing quarterly sales, and identify the key trends from 2022-2023”
“Read the text”	“Transcribe all visible text in this restaurant menu, preserving the item names, descriptions, and prices”

By providing context about why you need the analysis and what specific information you’re seeking, you help the model focus on relevant details and produce more valuable insights.

Example Usage

# Analyze an image from a URL
result = await image_analysis(
    image="https://example.com/image.jpg",
    query="Describe this image in detail"
)

# Analyze an image from a local file with a focused query
result = await image_analysis(
    image="path/to/local/image.jpg",
    query="Identify all traffic signs in this street scene and explain their meanings for a driver education course"
)

# Analyze with a base64-encoded image and a specific analytical purpose
result = await image_analysis(
    image="SGVsbG8gV29ybGQ=...",  # base64 data
    query="Examine this product packaging design and highlight elements that could be improved for better visibility and brand recognition"
)

# Customize the system prompt for specialized analysis
result = await image_analysis(
    image="path/to/local/image.jpg",
    query="Analyze the composition and artistic techniques used in this painting, focusing on how they create emotional impact",
    system_prompt="You are an expert art historian with deep knowledge of painting techniques and art movements. Focus on formal analysis of composition, color, brushwork, and stylistic elements."
)

Image Input Types

The image_analysis tool accepts several types of image inputs:

Base64-encoded strings
Image URLs - must start with http:// or https://
File paths:
- Absolute paths: full paths starting with / (Unix) or drive letter (Windows)
- Relative paths: paths relative to the current working directory
- Relative paths with project_root: use the project_root parameter to specify a base directory

Using Relative Paths

When using relative file paths (like “examples/image.jpg”), you have two options:

The path must be relative to the current working directory where the server is running
Or, you can specify a project_root parameter:

# Example with relative path and project_root
result = await image_analysis(
    image="examples/image.jpg",
    project_root="/path/to/your/project",
    query="What is in this image?"
)

This is particularly useful in applications where the current working directory may not be predictable or when you want to reference files using paths relative to a specific directory.

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/modelcontextprotocol/mcp-openvision.git
cd mcp-openvision

# Install development dependencies
pip install -e ".[dev]"

Code Formatting

This project uses Black for automatic code formatting. The formatting is enforced through GitHub Actions:

All code pushed to the repository is automatically formatted with Black
For pull requests from repository collaborators, Black formats the code and commits directly to the PR branch
For pull requests from forks, Black creates a new PR with the formatted code that can be merged into the original PR

You can also run Black locally to format your code before committing:

# Format all Python code in the src and tests directories
black src tests

Run Tests

pytest

Release Process

This project uses an automated release process:

Update the version in pyproject.toml following Semantic Versioning principles
- You can use the helper script: python scripts/bump_version.py [major|minor|patch]
Update the CHANGELOG.md with details about the new version
- The script also creates a template entry in CHANGELOG.md that you can fill in
Commit and push these changes to the main branch
The GitHub Actions workflow will:
- Detect the version change
- Automatically create a new GitHub release
- Trigger the publishing workflow that publishes to PyPI

This automation helps maintain a consistent release process and ensures that every release is properly versioned and documented.

Support

If you find this project helpful, consider buying me a coffee to support ongoing development and maintenance.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai

View More MCP Dev Tools

Tools

No tools

Comments

Recommend MCP Servers

Tavily MCP Server The Tavily MCP server provides: search, extract, map, crawl tools Real-time web search capabilities through the tavily-search tool Intelligent data extraction from web pages via the tavily-extract tool Powerful web mapping tool that creates a structured map of website Web crawler that systematically explores websites.

MCP Server Chart This is a TypeScript-based MCP server that provides chart generation capabilities. It allows you to create various types of charts through MCP tools. You can also use it in Dify.

GitHub MCP Server MCP Server for the GitHub API, enabling file operations, repository management, search functionality, and more.

Brave Search MCP Server Web and local search using Brave's Search API

Firecrawl MCP Server Advanced web scraping with JavaScript rendering, PDF support, and smart rate limiting

Context7 MCP LLMs rely on outdated or generic information about the libraries you use. You get:

Slack MCP server Channel management and messaging capabilities

Sequential Thinking MCP Server Dynamic and reflective problem-solving through thought sequences

Fetch MCP Server A Model Context Protocol server that provides web content fetching capabilities.

Playwright MCP A Model Context Protocol (MCP) server that provides browser automation capabilities using [Playwright](https://playwright.dev). This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.

View All MCP Servers