AppMCP
What is AppMCP
AppMCP is an AI-powered macOS application automation tool that utilizes the Model Context Protocol to bridge the gap between AI models and macOS applications, enabling intelligent automation through visual inspection and UI interpretation.
Use cases
Use cases for AppMCP include automating repetitive tasks in applications, conducting automated testing of user interfaces, and enhancing workflows in software development environments.
How to use
To use AppMCP, clone the repository from GitHub, build the project with Swift, and run the daemon. Make sure the macOS Accessibility and Screen Recording permissions are granted.
Key features
Key features of AppMCP include Smart Screenshots for high-resolution captures, UI Tree Analysis for detailed accessibility hierarchies, precise interaction automation, intelligent waiting mechanisms, robust error recovery, and strong privacy and security measures.
Where to use
AppMCP can be used in various fields including software testing, UI automation, and any scenario where intelligent interaction with macOS applications is required.
🤖 AppMCP
AI-Powered macOS Application Automation via Model Context Protocol
AppMCP bridges the gap between AI models and macOS applications, enabling intelligent automation through visual inspection, UI interpretation, and precise control of native applications.
✨ Features
🎯 Visual Intelligence
- Smart Screenshots: Capture high-resolution app windows using ScreenCaptureKit
- OCR Text Recognition: Extract text from screenshots using Apple’s Vision Framework
- UI Tree Analysis: Extract detailed accessibility hierarchies for precise element targeting
- Multi-App Discovery: Identify and monitor multiple running applications simultaneously
🛠 Automation Toolkit
- Precise Interactions: Mouse clicks, keyboard input, and gesture automation
- Smart Waiting: Intelligent delays and condition-based waiting mechanisms
- Error Recovery: Robust fallback strategies for reliable automation
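Condition-based waiting usually means polling a check until it succeeds or a deadline passes. The sketch below illustrates that general pattern on the client side; it is not AppMCP's own implementation, and `wait_until` and its parameters are hypothetical names chosen for the example.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep between checks so the loop does not spin the CPU.
        time.sleep(interval)
```

A caller could pass a function that re-reads the accessibility tree and returns True once the expected element appears, instead of sleeping for a fixed time.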
🔒 Privacy & Security
- Permission Management: Seamless TCC (Transparency, Consent, and Control) integration
- Secure Communication: JSON-RPC over STDIO with structured error handling
- Bundle ID Validation: Verified application targeting for enhanced security
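In JSON-RPC 2.0, a failed call returns an `error` object with `code`, `message`, and an optional `data` member instead of `result`. A client might surface that as an exception, as in the minimal sketch below. It assumes each response arrives as a single line of JSON on the server's stdout; `JsonRpcError` and `parse_response` are hypothetical helper names, not part of AppMCP.

```python
import json


class JsonRpcError(Exception):
    """Raised when a JSON-RPC 2.0 response carries an `error` member."""

    def __init__(self, code: int, message: str, data=None):
        super().__init__(f"JSON-RPC error {code}: {message}")
        self.code = code
        self.message = message
        self.data = data


def parse_response(line: str):
    """Decode one response line and return its `result`, or raise JsonRpcError."""
    response = json.loads(line)
    if "error" in response:
        err = response["error"]
        raise JsonRpcError(err["code"], err["message"], err.get("data"))
    return response["result"]
```

Raising on `error` keeps calling code from indexing into a `result` that is not there.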
🚀 Quick Start
Prerequisites
- macOS 15.0+ (Sequoia or later)
- Swift 6.1+
- Xcode 16.0+
Installation
# Clone the repository
git clone https://github.com/your-username/AppMCP.git
cd AppMCP
# Build the project
swift build -c release
# Run the daemon
./.build/release/appmcpd --stdio
Permissions Setup
AppMCP requires the following macOS permissions:
- 🔓 Accessibility: System Settings → Privacy & Security → Accessibility
- 📺 Screen Recording: System Settings → Privacy & Security → Screen Recording
The application will guide you through the permission setup process.
🎮 Usage Examples
Weather App Automation
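import json
import subprocess
# Start AppMCP server with pipes so JSON-RPC messages can be exchanged
# (path assumes the release build from the Installation step)
process = subprocess.Popen(
    ['./.build/release/appmcpd', '--stdio'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
def send_mcp_request(request):
    """Write one JSON-RPC request as a line and read one response line."""
    process.stdin.write(json.dumps(request) + "\n")
    process.stdin.flush()
    return json.loads(process.stdout.readline())
# Take screenshot of Weather app
# (a real MCP session starts with an initialize handshake, omitted here)
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "app://app_screenshot"}
}
# Send request and get response
response = send_mcp_request(request)
print(f"📸 Screenshot captured: {response['result']['contents'][0]['text']}")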
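The request above assumes a session that is already initialized. Under the Model Context Protocol, a client first sends an `initialize` request and then a `notifications/initialized` notification before any other call. The sketch below only builds those two messages. The protocol revision AppMCP negotiates is not stated here, so `"2024-11-05"` is a placeholder for one published MCP revision, and `build_handshake` and the client name are hypothetical.

```python
import json


def build_handshake(client_name: str, client_version: str,
                    protocol_version: str = "2024-11-05"):
    """Return the two messages an MCP client sends before any other request."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 0,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,  # placeholder revision
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }
    # Notifications carry no "id", so the server sends no response to them.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    return initialize, initialized


initialize, initialized = build_handshake("example-client", "0.1.0")
for message in (initialize, initialized):
    print(json.dumps(message))
```

The client writes the first message, waits for the server's reply, and then writes the second before issuing `resources/read` or `tools/call` requests.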
UI Element Discovery
# Get accessibility tree
request = {
"jsonrpc": "2.0",
"id": 2,
"method": "resources/read",
"params": {"uri": "app://app_accessibility_tree"}
}
tree = send_mcp_request(request)
print(f"🌳 UI Elements: {tree['result']['contents'][0]['text']}")
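Once the tree text is available, a client typically searches it for the element it wants to act on. The exact format AppMCP returns is not documented here, so the sketch below assumes a JSON object with hypothetical `role`, `title`, and `children` keys and uses a hand-written sample in place of a real response.

```python
import json
from typing import Iterator


def find_elements(node: dict, role: str) -> Iterator[dict]:
    """Yield every node in the tree whose "role" equals `role` (depth-first)."""
    if node.get("role") == role:
        yield node
    for child in node.get("children", []):
        yield from find_elements(child, role)


# Hypothetical payload standing in for the text returned by the server.
sample = json.loads("""
{"role": "AXWindow", "title": "Weather", "children": [
  {"role": "AXTextField", "title": "Search", "children": []},
  {"role": "AXGroup", "children": [
    {"role": "AXButton", "title": "Add", "children": []}
  ]}
]}
""")

buttons = [n["title"] for n in find_elements(sample, "AXButton")]
print(buttons)
```

If the real payload uses different key names, only the two `node.get(...)` lookups need to change.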
Automated Interactions
# Click on coordinates
request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "mouse_click",
        "arguments": {"x": 300, "y": 150}
    }
}
send_mcp_request(request)
# Type text
request = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "type_text",
        "arguments": {"text": "Tokyo"}
    }
}
send_mcp_request(request)
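Writing each `tools/call` request by hand gets repetitive and makes it easy to reuse an `id`. A small builder can generate the envelope and number the requests automatically. This is a sketch only: `tool_call` is a hypothetical helper, and the tool names and arguments are the ones from the example above.

```python
import itertools
import json

# Shared counter so every request in the session gets a unique id.
_next_id = itertools.count(1)


def tool_call(name: str, **arguments) -> dict:
    """Build a JSON-RPC `tools/call` request with a fresh sequential id."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The same click-then-type sequence, expressed as a list of steps.
steps = [
    tool_call("mouse_click", x=300, y=150),
    tool_call("type_text", text="Tokyo"),
]
for step in steps:
    print(json.dumps(step))
```

Each step can then be written to the server's stdin in order, waiting for the matching response before sending the next one.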
🏗 Architecture
graph TB
    A[🤖 AI Model] -->|JSON-RPC| B[📡 MCP Server]
    B --> C[🔍 Resources]
    B --> D[🛠 Tools]
    C --> E[📱 App Screenshot]
    C --> F[🌳 Accessibility Tree]
    C --> G[📋 Running Apps]
    D --> H[🖱 Mouse Control]
    D --> I[⌨️ Keyboard Input]
    D --> J[⏱ Wait Functions]
    E --> K[📱 macOS Apps]
    F --> K
    G --> K
    H --> K
    I --> K
    J --> K
Core Components
| Component | Description | Technology |
|---|---|---|
| 🎯 AppSelector | Application discovery and targeting | AppKit, NSWorkspace |
| 📸 ScreenCaptureProvider | High-quality screenshot capture | ScreenCaptureKit (macOS 15+) |
| 🌳 AppAXTreeProvider | Accessibility tree extraction | Accessibility API |
| 🖱 MouseClickTool | Precise mouse automation | CGEvent, Quartz |
| ⌨️ KeyboardTool | Text input and shortcuts | CGEvent, Carbon |
| 🔒 TCCManager | Permission management | TCC Framework |
📦 Package Structure
AppMCP/
├── 📁 Sources/
│   └── 📁 AppMCP/
│       ├── 🎯 AppMCP.swift       # Core protocols & types
│       ├── 🖥 MCPServer.swift    # Main MCP server
│       ├── 📁 Resources/         # Data providers
│       ├── 📁 Tools/             # Automation tools
│       └── 📁 Permissions/       # Security management
├── 📁 Sources/appmcpd/
│   └── 🚀 Command.swift          # CLI daemon
├── 📁 Tests/
│   └── 📁 AppMCPTests/           # Comprehensive test suite
├── 📋 Package.swift              # Swift Package configuration
└── 📖 CLAUDE.md                  # Development guidelines
🧪 Testing
Run All Tests
swift test
Test Categories
- 🔧 Unit Tests: Core functionality validation
- 🔗 Integration Tests: End-to-end workflow testing
- ⚡ Performance Tests: Response time benchmarking
- 🛡 Security Tests: Permission and validation checks
Example Test Results
Test Suite 'AppMCPTests' passed at 2025-06-04 16:42:04.049
Executed 19 tests, with 0 failures (0 unexpected) in 0.015 seconds
✅ All tests passing
🛠 API Reference
MCP Tools
AppMCP provides the following specialized tools for macOS automation:
Screenshot & UI Analysis
- capture_ui_snapshot: Capture screenshot with UI element hierarchy
  - Optional text recognition via Vision Framework
  - Element filtering with queries
  - Returns base64 screenshot + structured UI data
- recognize_text_in_screenshot: 🆕 OCR text extraction from app windows
  - Multi-language support (en-US, ja-JP, zh-Hans, etc.)
  - Fast vs accurate recognition modes
  - Confidence scores and bounding boxes
Automation Controls
- click_element: Element-based clicking with multi-button support
- input_text: Text input with setValue/type methods
- drag_drop: Drag and drop between elements
- scroll_window: Scrolling at specific element locations
App Discovery
- list_running_applications: Get all running apps with metadata
- list_application_windows: List windows with bounds and visibility
Text Recognition Features
The Vision Framework integration provides powerful OCR capabilities:
{
"bundleID": "com.apple.TextEdit",
"includeTextRecognition": true,
"recognitionLanguages": [
"en-US",
"ja-JP"
],
"recognitionLevel": "accurate"
}
Recognition Results:
- Full text extraction in reading order
- Individual text regions with confidence scores
- Bounding boxes in normalized coordinates
- Support for 50+ languages
- Handwritten text detection
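Apple's Vision framework reports bounding boxes in normalized coordinates (0 to 1) with the origin at the bottom-left of the image, while screenshots and click targets are usually addressed in pixels from the top-left. Assuming AppMCP passes Vision's boxes through unchanged, which is not confirmed here, a client could convert them as sketched below. `PixelRect` and `to_pixels` are hypothetical names.

```python
from typing import NamedTuple


class PixelRect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def to_pixels(x: float, y: float, width: float, height: float,
              image_width: int, image_height: int) -> PixelRect:
    """Map a normalized, bottom-left-origin box to top-left-origin pixels."""
    return PixelRect(
        x=x * image_width,
        # Flip the vertical axis: the box's top edge sits at y + height.
        y=(1.0 - y - height) * image_height,
        width=width * image_width,
        height=height * image_height,
    )


rect = to_pixels(0.25, 0.5, 0.5, 0.25, image_width=800, image_height=600)
print(rect)
```

For this 800×600 example the box spans x from 200 to 600 and y from 150 to 300, measured from the top-left corner.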
🎯 Roadmap
🌟 Current (v0.1.0)
- [x] Weather app automation PoC
- [x] Basic screenshot & UI tree extraction
- [x] Mouse & keyboard automation
- [x] Permission management
- [x] Vision Framework OCR text recognition
🚀 Near Future (v0.2.0)
- [ ] Multi-app simultaneous control
- [ ] DevTools integration
- [ ] Enhanced error recovery
- [ ] Performance optimizations
🔮 Long Term (v1.0.0)
- [ ] HTTP transport support
- [ ] Shortcuts.app integration
- [ ] Plugin SDK for extensions
- [ ] Real-time UI streaming
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Setup
# Install dependencies
swift package resolve
# Run tests
swift test
# Format code
swift-format -i -r Sources/ Tests/
# Build for development
swift build
📄 License
AppMCP is released under the MIT License. See LICENSE for details.
🙏 Acknowledgments
- Model Context Protocol - For the excellent MCP Swift SDK
- Apple Developer Team - For the powerful macOS automation APIs
- Swift Community - For the robust Swift ecosystem
Built with ❤️ for the AI automation community