local-stt-mcp
Overview
What is local-stt-mcp
local-stt-mcp is a high-performance Model Context Protocol (MCP) server that provides local speech-to-text transcription using whisper.cpp, specifically optimized for Apple Silicon devices.
Use cases
Use cases for local-stt-mcp include transcribing meetings, creating subtitles for videos, developing voice-activated applications, and providing real-time transcription for live events.
How to use
To use local-stt-mcp, make sure you have Node.js 18+ and install whisper.cpp and ffmpeg (used for audio format conversion). Clone the repository, install dependencies, build the project, and set up the models. Then configure your MCP client to connect to the local-stt-mcp server.
Key features
Key features include 100% local processing for privacy, Apple Silicon optimization for 15x+ real-time transcription speed, speaker diarization for identifying multiple speakers, universal audio support with automatic format conversion, multiple output formats (txt, json, vtt, srt, csv), low memory usage (<2GB), and full type safety with TypeScript.
Where to use
local-stt-mcp can be used in various fields such as transcription services, accessibility tools for the hearing impaired, voice command applications, and any scenario requiring efficient and private speech-to-text conversion.
Content
Local Speech-to-Text MCP Server
A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
🎯 Features
- 🏠 100% Local Processing: No cloud APIs, complete privacy
- 🚀 Apple Silicon Optimized: 15x+ real-time transcription speed
- 🎤 Speaker Diarization: Identify and separate multiple speakers
- 🎵 Universal Audio Support: Automatic conversion from MP3, M4A, FLAC, and more
- 📝 Multiple Output Formats: txt, json, vtt, srt, csv
- 💾 Low Memory Footprint: <2GB memory usage
- 🔧 TypeScript: Full type safety and modern development
🚀 Quick Start
Prerequisites
- Node.js 18+
- whisper.cpp (`brew install whisper-cpp`)
- For audio format conversion: ffmpeg (`brew install ffmpeg`), which automatically handles MP3, M4A, FLAC, OGG, etc.
- For speaker diarization: Python 3.8+ and a HuggingFace token (free)
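On macOS with Homebrew, the prerequisites above can be installed roughly as follows (a sketch; adjust to your environment, for example if you manage Node.js with nvm):

```bash
# Assumes Homebrew is already installed
brew install node whisper-cpp ffmpeg
node --version   # should report v18 or newer
```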
Supported Audio Formats
- Native whisper.cpp formats: WAV, FLAC
- Auto-converted formats: MP3, M4A, AAC, OGG, WMA, and more
- Automatic conversion: Powered by ffmpeg with 16kHz/mono optimization for whisper.cpp (see the sketch below)
- Format detection: Automatic format detection and conversion when needed
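The server runs this conversion automatically when it detects a non-native format. For reference, a roughly equivalent manual ffmpeg command (a sketch, not necessarily the server's exact invocation) would be:

```bash
# Convert any supported input to the 16 kHz mono WAV that whisper.cpp expects
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```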
Installation
```bash
git clone https://github.com/your-username/local-stt-mcp.git
cd local-stt-mcp/mcp-server
npm install
npm run build

# Download whisper models
npm run setup:models

# For speaker diarization, set HuggingFace token
export HF_TOKEN="your_token_here"  # Get free token from huggingface.co
```
Speaker Diarization Note: requires a HuggingFace account and acceptance of the pyannote/speaker-diarization-3.1 model license.
MCP Client Configuration
Add to your MCP client configuration:
```json
{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": [
        "path/to/local-stt-mcp/mcp-server/dist/index.js"
      ]
    }
  }
}
```
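Many MCP clients that read this configuration format also accept an env block per server entry; if yours does (check your client's documentation), the HuggingFace token needed for speaker diarization can be supplied there instead of exporting it in your shell. A hypothetical example:

```json
{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"],
      "env": {
        "HF_TOKEN": "your_token_here"
      }
    }
  }
}
```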
🛠️ Available Tools
| Tool | Description |
|---|---|
| transcribe | Basic audio transcription with automatic format conversion |
| transcribe_long | Long audio file processing with chunking and format conversion |
| transcribe_with_speakers | Speaker diarization and transcription with format support |
| list_models | Show available whisper models |
| health_check | System diagnostics |
| version | Server version information |
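These tools are invoked through standard MCP tool calls from whichever client is connected. The sketch below uses the official MCP TypeScript SDK and assumes the server has been built as described above; the tool name comes from the table, but the argument name (`file_path`) is hypothetical, so inspect the tool's input schema (for example via `tools/list`) before relying on it.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the local-stt-mcp server over stdio and connect to it
const transport = new StdioClientTransport({
  command: "node",
  args: ["path/to/local-stt-mcp/mcp-server/dist/index.js"],
});
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call the basic transcription tool; the argument name here is hypothetical,
// so check the schema reported by the server for the real parameter names.
const result = await client.callTool({
  name: "transcribe",
  arguments: { file_path: "/path/to/recording.wav" },
});
console.log(result.content);
```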
📊 Performance
Apple Silicon Benchmarks:
- Processing Speed: 15.8x real-time (vs WhisperX 5.5x)
- Memory Usage: <2GB (vs WhisperX ~4GB)
- GPU Acceleration: ✅ Apple Neural Engine
- Setup: Medium complexity but superior performance
See `/benchmarks/` for detailed performance comparisons.
🏗️ Project Structure
```
mcp-server/
├── src/            # TypeScript source code
│   ├── tools/      # MCP tool implementations
│   ├── whisper/    # whisper.cpp integration
│   ├── utils/      # Speaker diarization & utilities
│   └── types/      # Type definitions
├── dist/           # Compiled JavaScript
└── python/         # Python dependencies
```
🔧 Development
```bash
# Build
npm run build

# Development mode (watch)
npm run dev

# Linting & formatting
npm run lint
npm run format

# Type checking
npm run type-check
```
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- whisper.cpp for optimized inference
- OpenAI Whisper for the original models
- Model Context Protocol for the framework
- Pyannote.audio for speaker diarization