
Local Stt Mcp

@SmartLittleAppson 17 days ago
A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.

Overview

What is local-stt-mcp

local-stt-mcp is a high-performance Model Context Protocol (MCP) server that provides local speech-to-text transcription using whisper.cpp, specifically optimized for Apple Silicon devices.

Use cases

Use cases for local-stt-mcp include transcribing meetings, creating subtitles for videos, developing voice-activated applications, and providing real-time transcription for live events.

How to use

To use local-stt-mcp, you need Node.js 18+ and whisper.cpp, plus ffmpeg if you want automatic audio format conversion. Clone the repository, install dependencies, build the project, and download the models. Then configure your MCP client to launch the local-stt-mcp server.

Key features

Key features include 100% local processing for privacy, Apple Silicon optimization for 15x+ real-time transcription speed, speaker diarization for identifying multiple speakers, universal audio support with automatic format conversion, multiple output formats (txt, json, vtt, srt, csv), low memory usage (<2GB), and full type safety with TypeScript.

Where to use

local-stt-mcp can be used in various fields such as transcription services, accessibility tools for the hearing impaired, voice command applications, and any scenario requiring efficient and private speech-to-text conversion.

Content

Local Speech-to-Text MCP Server

A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.

🎯 Features

  • 🏠 100% Local Processing: No cloud APIs, complete privacy
  • 🚀 Apple Silicon Optimized: 15x+ real-time transcription speed
  • 🎤 Speaker Diarization: Identify and separate multiple speakers
  • 🎵 Universal Audio Support: Automatic conversion from MP3, M4A, FLAC, and more
  • 📝 Multiple Output Formats: txt, json, vtt, srt, csv
  • 💾 Low Memory Footprint: <2GB memory usage
  • 🔧 TypeScript: Full type safety and modern development
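The srt and vtt output formats differ mainly in their timestamp syntax: SRT writes milliseconds after a comma and numbers each cue, while WebVTT uses a period and starts with a `WEBVTT` header. The sketch below shows that difference. The `TranscriptSegment` shape and the function names are assumptions for illustration, not the server's actual types.

```typescript
// Hypothetical segment shape; the server's real JSON output may differ.
interface TranscriptSegment {
  startSec: number; // segment start, in seconds
  endSec: number;   // segment end, in seconds
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(value: number, width: number): string {
  let out = String(value);
  while (out.length < width) out = "0" + out;
  return out;
}

// Format seconds as HH:MM:SS<sep>mmm. SRT uses "," as the separator, WebVTT uses ".".
function formatTimestamp(totalSeconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  return zeroPad(h, 2) + ":" + zeroPad(m, 2) + ":" + zeroPad(s, 2) + msSeparator + zeroPad(ms, 3);
}

// SRT: numbered cues separated by a blank line.
function toSrt(segments: TranscriptSegment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatTimestamp(seg.startSec, ",") + " --> " + formatTimestamp(seg.endSec, ",") + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}

// WebVTT: "WEBVTT" header, then unnumbered cues separated by a blank line.
function toVtt(segments: TranscriptSegment[]): string {
  const cues = segments.map((seg) =>
    formatTimestamp(seg.startSec, ".") + " --> " + formatTimestamp(seg.endSec, ".") + "\n" +
    seg.text.trim() + "\n");
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Calling `toSrt(segments)` or `toVtt(segments)` on the same segment list yields two subtitle files that differ only in header, cue numbering, and the millisecond separator.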

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • whisper.cpp (brew install whisper-cpp)
  • For audio format conversion: ffmpeg (brew install ffmpeg) - automatically handles MP3, M4A, FLAC, OGG, etc.
  • For speaker diarization: Python 3.8+ and HuggingFace token (free)
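A small preflight check can catch missing prerequisites before the first transcription. The sketch below covers only the two items that can be verified from inside Node: the Node.js major version and, when diarization is wanted, the `HF_TOKEN` variable used in the installation steps. The function names and the report shape are hypothetical and not part of the server.

```typescript
// Extract the major version from strings such as "v18.17.0" (the format of process.version).
function nodeMajorVersion(versionString: string): number {
  return parseInt(versionString.replace(/^v/, ""), 10);
}

interface PrereqReport {
  ok: boolean;
  problems: string[];
}

// Hypothetical helper: checks Node.js 18+ and, optionally, the HuggingFace token.
function checkPrerequisites(
  nodeVersion: string,
  env: { [key: string]: string | undefined },
  wantDiarization: boolean
): PrereqReport {
  const problems: string[] = [];
  const major = nodeMajorVersion(nodeVersion);
  if (isNaN(major) || major < 18) {
    problems.push("Node.js 18+ is required, found " + nodeVersion);
  }
  if (wantDiarization && !env["HF_TOKEN"]) {
    problems.push("HF_TOKEN is not set; speaker diarization needs a HuggingFace token");
  }
  return { ok: problems.length === 0, problems: problems };
}

// Typical call inside a Node script:
//   checkPrerequisites(process.version, process.env, true)
```

The check does not cover whisper.cpp, ffmpeg, or Python; those would need a separate lookup on the system PATH.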

Supported Audio Formats

  • Native whisper.cpp formats: WAV, FLAC
  • Auto-converted formats: MP3, M4A, AAC, OGG, WMA, and more
  • Automatic conversion: Powered by ffmpeg with 16kHz/mono optimization for whisper.cpp
  • Format detection: Automatic format detection and conversion when needed
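The conversion step described above can be sketched as two small helpers: one decides from the file extension whether conversion is needed, the other builds an ffmpeg argument list that resamples to 16 kHz mono 16-bit PCM WAV. The helper names are hypothetical, and the server's actual ffmpeg invocation may use different options; the flags shown (`-ar`, `-ac`, `-c:a`, `-y`) are standard ffmpeg options.

```typescript
// Extensions the list above names as native to whisper.cpp.
const NATIVE_EXTENSIONS = ["wav", "flac"];

// Lower-cased extension of the file name, or "" if there is none.
function fileExtension(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot === -1 ? "" : base.slice(dot + 1).toLowerCase();
}

// True when the file is not in a native format and should go through ffmpeg first.
function needsConversion(path: string): boolean {
  return NATIVE_EXTENSIONS.indexOf(fileExtension(path)) === -1;
}

// Arguments for: ffmpeg -y -i <input> -ar 16000 -ac 1 -c:a pcm_s16le <output>
function buildFfmpegArgs(inputPath: string, outputWavPath: string): string[] {
  return [
    "-y",               // overwrite the output file if it exists
    "-i", inputPath,    // input file in any format ffmpeg can decode
    "-ar", "16000",     // resample to 16 kHz
    "-ac", "1",         // downmix to mono
    "-c:a", "pcm_s16le", // 16-bit little-endian PCM
    outputWavPath
  ];
}
```

In a Node script the argument list could be passed to `child_process.spawn("ffmpeg", buildFfmpegArgs(input, output))`, which avoids shell quoting problems with file names that contain spaces.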

Installation

git clone https://github.com/your-username/local-stt-mcp.git
cd local-stt-mcp/mcp-server
npm install
npm run build

# Download whisper models
npm run setup:models

# For speaker diarization, set HuggingFace token
export HF_TOKEN="your_token_here"  # Get free token from huggingface.co

Speaker Diarization Note: Requires a HuggingFace account and acceptance of the pyannote/speaker-diarization-3.1 license terms.
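Diarization produces speaker turns (who spoke when), and transcription produces timed text segments, so the two need to be combined. One common way to do that is to give each text segment the speaker whose turn overlaps it the most. The sketch below illustrates that idea only; it is not confirmed to be how this server merges the results, and the type names and the `SPEAKER_00` style labels are assumptions.

```typescript
interface TimedSpan {
  startSec: number;
  endSec: number;
}
interface SpeakerTurn extends TimedSpan {
  speaker: string; // illustrative label, e.g. "SPEAKER_00"
}
interface TextSegment extends TimedSpan {
  text: string;
}
interface LabeledSegment extends TextSegment {
  speaker: string;
}

// Length of the time interval shared by two spans (0 if they do not overlap).
function overlapSeconds(a: TimedSpan, b: TimedSpan): number {
  return Math.max(0, Math.min(a.endSec, b.endSec) - Math.max(a.startSec, b.startSec));
}

// Label each text segment with the speaker whose turn overlaps it the longest.
// Segments that overlap no turn are labeled "UNKNOWN".
function assignSpeakers(segments: TextSegment[], turns: SpeakerTurn[]): LabeledSegment[] {
  return segments.map((seg) => {
    let bestSpeaker = "UNKNOWN";
    let bestOverlap = 0;
    for (const turn of turns) {
      const overlap = overlapSeconds(seg, turn);
      if (overlap > bestOverlap) {
        bestOverlap = overlap;
        bestSpeaker = turn.speaker;
      }
    }
    return { startSec: seg.startSec, endSec: seg.endSec, text: seg.text, speaker: bestSpeaker };
  });
}
```

This runs in time proportional to segments times turns, which is usually acceptable for a single recording; sorted inputs would allow a linear sweep if that ever became a bottleneck.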

MCP Client Configuration

Add to your MCP client configuration:

{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": [
        "path/to/local-stt-mcp/mcp-server/dist/index.js"
      ]
    }
  }
}
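Many MCP clients start the server from a working directory you do not control, so an absolute path to `dist/index.js` is usually the safer choice. The sketch below generates the same configuration from the directory where the repository was cloned. The helper name and the example path are hypothetical; where the configuration file lives depends on your MCP client.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}
interface McpClientConfig {
  mcpServers: { [serverName: string]: McpServerEntry };
}

// Build the client configuration from the absolute path of the cloned repository.
function buildWhisperMcpConfig(repoRoot: string): McpClientConfig {
  const root = repoRoot.replace(/\/+$/, ""); // drop trailing slashes
  return {
    mcpServers: {
      "whisper-mcp": {
        command: "node",
        args: [root + "/mcp-server/dist/index.js"]
      }
    }
  };
}

// Example with a made-up clone location; paste the printed JSON into the client's config file.
const exampleConfigJson = JSON.stringify(buildWhisperMcpConfig("/Users/me/local-stt-mcp"), null, 2);
console.log(exampleConfigJson);
```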

🛠️ Available Tools

| Tool | Description |
| --- | --- |
| transcribe | Basic audio transcription with automatic format conversion |
| transcribe_long | Long audio file processing with chunking and format conversion |
| transcribe_with_speakers | Speaker diarization and transcription with format support |
| list_models | Show available whisper models |
| health_check | System diagnostics |
| version | Server version information |
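MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and an `arguments` object. The sketch below builds such a request for the tool names listed above. The argument names are not documented in this README, so `audio_path` in the usage line is purely an assumption; check the schema the server reports via `tools/list` for the real parameter names.

```typescript
// Tool names taken from the list above.
type ToolName =
  | "transcribe"
  | "transcribe_long"
  | "transcribe_with_speakers"
  | "list_models"
  | "health_check"
  | "version";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: ToolName;
    arguments: { [key: string]: unknown };
  };
}

// Build a JSON-RPC request that asks the server to run one tool.
function buildToolCall(
  id: number,
  tool: ToolName,
  toolArgs: { [key: string]: unknown } = {}
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: tool, arguments: toolArgs }
  };
}

// "audio_path" is a hypothetical argument name used only for illustration.
const exampleCall = buildToolCall(1, "transcribe", { audio_path: "/path/to/meeting.m4a" });
console.log(JSON.stringify(exampleCall));
```

In practice an MCP client library sends this message for you; the sketch is mainly useful for reading server logs or debugging with a raw stdio connection.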

📊 Performance

Apple Silicon Benchmarks:

  • Processing Speed: 15.8x real-time (vs WhisperX 5.5x)
  • Memory Usage: <2GB (vs WhisperX ~4GB)
  • Hardware Acceleration: ✅ Apple Neural Engine
  • Setup: Medium complexity but superior performance

See /benchmarks/ for detailed performance comparisons.
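A speed of N times real-time means that audio of duration T takes roughly T / N to process. The sketch below applies that arithmetic to the two figures quoted above; the function name is hypothetical, and actual times will vary with model size, audio content, and hardware.

```typescript
// Estimated wall-clock processing time for a given real-time speed factor.
function estimateProcessingSeconds(audioSeconds: number, realTimeFactor: number): number {
  if (realTimeFactor <= 0) {
    throw new RangeError("realTimeFactor must be positive");
  }
  return audioSeconds / realTimeFactor;
}

const oneHour = 60 * 60; // 3600 seconds of audio

// 3600 / 15.8 is about 227.8 s, or roughly 3.8 minutes
const atQuotedSpeed = estimateProcessingSeconds(oneHour, 15.8);

// 3600 / 5.5 is about 654.5 s, or roughly 10.9 minutes
const atWhisperXSpeed = estimateProcessingSeconds(oneHour, 5.5);

console.log(atQuotedSpeed.toFixed(1), atWhisperXSpeed.toFixed(1));
```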

🏗️ Project Structure

mcp-server/
├── src/                    # TypeScript source code
│   ├── tools/             # MCP tool implementations
│   ├── whisper/           # whisper.cpp integration
│   ├── utils/             # Speaker diarization & utilities
│   └── types/             # Type definitions
├── dist/                  # Compiled JavaScript
└── python/                # Python dependencies

🔧 Development

# Build
npm run build

# Development mode (watch)
npm run dev

# Linting & formatting
npm run lint
npm run format

# Type checking
npm run type-check

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments
