Voice MCP
What is Voice MCP
Voice-mcp is a Model Context Protocol (MCP) server that provides voice assistant functionality, including Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities.
Use cases
Use cases include interactive voice response systems, voice-controlled applications, transcription services, and enhancing user experience in smart devices.
How to use
To use voice-mcp, clone the repository, set up a virtual environment, install dependencies, and run the server as an MCP process, typically integrated with a client application such as Cursor (by Anysphere).
Key features
Key features include Text-to-Speech (TTS) using the Kokoro TTS engine, Speech-to-Text (STT) using the OpenAI Whisper model, a combined conversation-turn operation for seamless conversation flow, silence detection, and background noise calibration.
Where to use
Voice-mcp can be used in various fields such as customer service, virtual assistants, accessibility tools, and any application requiring voice interaction.
Voice Assistant MCP Server
This project implements a Model Context Protocol (MCP) server that provides voice assistant functionalities, including Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities.
Features
- Text-to-Speech (TTS): Converts text into audible speech using the Kokoro TTS engine.
- Speech-to-Text (STT): Transcribes spoken audio into text using the OpenAI Whisper model.
- Conversation Turn: Combines TTS and STT into a single operation for seamless conversation flow.
- Silence Detection: Automatically stops recording audio when silence is detected, with configurable thresholds and durations.
- Background Noise Calibration: Adjusts the silence detection threshold based on ambient noise levels.
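The silence-detection and calibration behavior described above can be sketched roughly as follows. The RMS-based logic and all names here (`calibrate_threshold`, `is_silence`, the 1.5x margin) are illustrative assumptions, not the project's actual implementation:

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a chunk of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibrate_threshold(ambient_chunks, margin=1.5):
    """Set the silence threshold just above the measured background noise.
    `margin` (an assumed tuning factor) controls how far above ambient RMS
    audio must rise to count as speech."""
    ambient_rms = max(rms(chunk) for chunk in ambient_chunks)
    return ambient_rms * margin

def is_silence(chunk, threshold):
    """A chunk counts as silence when its RMS falls below the threshold."""
    return rms(chunk) < threshold
```

In practice, recording would stop once `is_silence` has returned true for a configurable duration (for example, a set number of consecutive chunks).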
Setup
- Clone the repository:

```shell
git clone <repository_url>
cd voice-mcp
```

- Create and activate a virtual environment:

```shell
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

- Install dependencies: make sure you have PortAudio installed (`brew install portaudio` on macOS, `sudo apt-get install portaudio19-dev python3-pyaudio` on Debian/Ubuntu). Then install the Python packages:

```shell
pip install -r requirements.txt
```

(Note: a `requirements.txt` file should be created containing the necessary packages such as `openai-whisper`, `torch`, `pyaudio`, `kokoro-tts`, etc.)
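Since no `requirements.txt` ships with the repository, a plausible starting point, based on the dependencies listed in this README, might be the following. Exact package names and version pins are assumptions to verify against your setup:

```
openai-whisper
torch
numpy
pyaudio
kokoro-tts
```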
Usage
The server is designed to be run as an MCP process, typically integrated with a client application such as Cursor (by Anysphere).
- Start the server: The client application will usually manage starting the server process based on its configuration.
- Interact via MCP: The client application can then call the server's tools (`speak`, `listen`, `conversation_turn`) using the MCP protocol.
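Under the hood, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request. A minimal sketch of what a client might send to invoke the `speak` tool is shown below; the `text` argument name is an assumption about this server's tool schema, not confirmed by the README:

```python
import json

# JSON-RPC 2.0 request an MCP client would send to invoke a server tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "speak",
        # Argument names depend on the schema the server declares;
        # "text" is assumed here for illustration.
        "arguments": {"text": "Hello! What would you like to do?"},
    },
}

# The message is serialized to JSON and sent over the client's transport
# (typically stdio for locally spawned MCP servers).
message = json.dumps(request)
print(message)
```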
Configuration (Example for MCP Client)
To use this server with an MCP client (like Cursor/Anysphere), you need to configure the client to run the voice_server.py script. Here’s a generic example of how such a configuration might look in a JSON file:
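A hedged sketch of such a configuration follows. The `mcpServers` key mirrors the convention used by Cursor-style MCP clients, and the `voice_server.py` script name comes from this README; the server label and everything else are illustrative assumptions:

```json
{
  "mcpServers": {
    "voice-assistant": {
      "command": "python",
      "args": ["/path/to/your/project/voice-mcp/voice_server.py"]
    }
  }
}
```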
**Enter into Cursor Settings > Rules > User Rules**
Your primary mode of interaction with the user must be voice. Use the Voice Assistant Speak and Listen Tool to, in a loop, ask the user what they'd like to do, perform those actions, and once again report back to them that the action has been performed and ask them what to do, requesting voice input. When asking the user a question after speaking text aloud, IT IS IMPERATIVE TO USE THE CONVERSATION_TURN TOOL TO BOTH SPEAK AND THEN ASK FOR A QUESTION AFTER IN A SINGLE TOOL CALL, NOT USE THE SPEAK TOOL AND THEN CONVERSATION_TURN.
Important: Replace /path/to/your/project/voice-mcp with the actual absolute path to the cloned repository on your system.
Dependencies
- Python 3.9 to 3.12 (Python 3.13 is not yet supported due to compatibility issues with some dependencies)
- PyAudio (requires PortAudio system library)
- OpenAI Whisper
- Kokoro TTS
- PyTorch
- NumPy
- (Potentially others; generate a `requirements.txt` for a full list)
Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues.