local-stt-mcp
Overview
What is local-stt-mcp
local-stt-mcp is a high-performance Model Context Protocol (MCP) server that provides local speech-to-text transcription using whisper.cpp, specifically optimized for Apple Silicon devices.
Use cases
Use cases for local-stt-mcp include transcribing meetings, creating subtitles for videos, developing voice-activated applications, and providing real-time transcription for live events.
How to use
To use local-stt-mcp, make sure you have Node.js 18+ and install whisper.cpp and ffmpeg (used for audio format conversion). Clone the repository, install dependencies, build the project, and set up the models. Then configure your MCP client to connect to the local-stt-mcp server.
Key features
Key features include 100% local processing for privacy, Apple Silicon optimization for 15x+ real-time transcription speed, speaker diarization for identifying multiple speakers, universal audio support with automatic format conversion, multiple output formats (txt, json, vtt, srt, csv), low memory usage (<2GB), and full type safety with TypeScript.
Where to use
local-stt-mcp can be used in various fields such as transcription services, accessibility tools for the hearing impaired, voice command applications, and any scenario requiring efficient and private speech-to-text conversion.
Content
Local Speech-to-Text MCP Server
A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
🎯 Features
- 🏠 100% Local Processing: No cloud APIs, complete privacy
- 🚀 Apple Silicon Optimized: 15x+ real-time transcription speed
- 🎤 Speaker Diarization: Identify and separate multiple speakers
- 🎵 Universal Audio Support: Automatic conversion from MP3, M4A, FLAC, and more
- 📝 Multiple Output Formats: txt, json, vtt, srt, csv
- 💾 Low Memory Footprint: <2GB memory usage
- 🔧 TypeScript: Full type safety and modern development
🚀 Quick Start
Prerequisites
- Node.js 18+
- whisper.cpp (`brew install whisper-cpp`)
- For audio format conversion: ffmpeg (`brew install ffmpeg`), which automatically handles MP3, M4A, FLAC, OGG, etc.
- For speaker diarization: Python 3.8+ and a HuggingFace token (free)
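On macOS with Homebrew, the prerequisites above can be installed roughly as follows (a sketch; adjust to your environment, for example if you manage Node.js with nvm):

```bash
# Assumes Homebrew is already installed
brew install node whisper-cpp ffmpeg
node --version   # should report v18 or newer
```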
Supported Audio Formats
- Native whisper.cpp formats: WAV, FLAC
- Auto-converted formats: MP3, M4A, AAC, OGG, WMA, and more
- Automatic conversion: Powered by ffmpeg with 16kHz/mono optimization for whisper.cpp (see the sketch below)
- Format detection: Automatic format detection and conversion when needed
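The server runs this conversion automatically when it detects a non-native format. For reference, a roughly equivalent manual ffmpeg command (a sketch, not necessarily the server's exact invocation) would be:

```bash
# Convert any supported input to the 16 kHz mono WAV that whisper.cpp expects
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```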
Installation
```bash
git clone https://github.com/your-username/local-stt-mcp.git
cd local-stt-mcp/mcp-server
npm install
npm run build

# Download whisper models
npm run setup:models

# For speaker diarization, set HuggingFace token
export HF_TOKEN="your_token_here"  # Get free token from huggingface.co
```
Speaker Diarization Note: requires a HuggingFace account and acceptance of the pyannote/speaker-diarization-3.1 model license.
MCP Client Configuration
Add to your MCP client configuration:
```json
{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": [
        "path/to/local-stt-mcp/mcp-server/dist/index.js"
      ]
    }
  }
}
```
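Many MCP clients that read this configuration format also accept an env block per server entry; if yours does (check your client's documentation), the HuggingFace token needed for speaker diarization can be supplied there instead of exporting it in your shell. A hypothetical example:

```json
{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"],
      "env": {
        "HF_TOKEN": "your_token_here"
      }
    }
  }
}
```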
🛠️ Available Tools
| Tool | Description |
|---|---|
| transcribe | Basic audio transcription with automatic format conversion |
| transcribe_long | Long audio file processing with chunking and format conversion |
| transcribe_with_speakers | Speaker diarization and transcription with format support |
| list_models | Show available whisper models |
| health_check | System diagnostics |
| version | Server version information |
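These tools are invoked through standard MCP tool calls from whichever client is connected. The sketch below uses the official MCP TypeScript SDK and assumes the server has been built as described above; the tool name comes from the table, but the argument name (`file_path`) is hypothetical, so inspect the tool's input schema (for example via `tools/list`) before relying on it.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the local-stt-mcp server over stdio and connect to it
const transport = new StdioClientTransport({
  command: "node",
  args: ["path/to/local-stt-mcp/mcp-server/dist/index.js"],
});
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call the basic transcription tool; the argument name here is hypothetical,
// so check the schema reported by the server for the real parameter names.
const result = await client.callTool({
  name: "transcribe",
  arguments: { file_path: "/path/to/recording.wav" },
});
console.log(result.content);
```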
📊 Performance
Apple Silicon Benchmarks:
- Processing Speed: 15.8x real-time (vs WhisperX 5.5x)
- Memory Usage: <2GB (vs WhisperX ~4GB)
- GPU Acceleration: ✅ Apple Neural Engine
- Setup: Medium complexity but superior performance
See `/benchmarks/` for detailed performance comparisons.
🏗️ Project Structure
```
mcp-server/
├── src/            # TypeScript source code
│   ├── tools/      # MCP tool implementations
│   ├── whisper/    # whisper.cpp integration
│   ├── utils/      # Speaker diarization & utilities
│   └── types/      # Type definitions
├── dist/           # Compiled JavaScript
└── python/         # Python dependencies
```
🔧 Development
```bash
# Build
npm run build

# Development mode (watch)
npm run dev

# Linting & formatting
npm run lint
npm run format

# Type checking
npm run type-check
```
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- whisper.cpp for optimized inference
- OpenAI Whisper for the original models
- Model Context Protocol for the framework
- Pyannote.audio for speaker diarization