speech-mcp
What is speech-mcp
speech-mcp is an MCP server that integrates OpenAI speech-to-text and text-to-speech functionality, featuring a modern PyQt-based user interface with audio visualization.
Use cases
Use cases for speech-mcp include creating audiobooks with multiple voices, transcribing lectures or meetings, developing interactive voice applications, and enhancing user experience in voice-driven interfaces.
How to use
To use speech-mcp, clone the repository, install the required dependencies using the provided script, and configure the environment settings in the .env file. You can then run the application using the global command, run script, or standalone script.
Key features
Key features include a modern UI, voice input and output capabilities, multi-speaker narration, single-voice narration, audio/video transcription, voice persistence, continuous conversation, and silence detection.
Where to use
speech-mcp can be used in various fields such as education, entertainment, accessibility services, and any application requiring speech recognition and synthesis.
Clients Supporting MCP
The following are the main client software that supports the Model Context Protocol. Click the link to visit the official website for more information.
Content
- OpenAI speech-to-text integration
- OpenAI text-to-speech with multiple voice options
- Modern PyQt-based UI with audio visualization
Features
- Modern UI: Sleek PyQt-based interface with audio visualization and dark theme
- Voice Input: Capture and transcribe user speech using OpenAI STT
- Voice Output: Convert agent responses to speech with multiple voice options
- Multi-Speaker Narration: Generate audio files with multiple voices for stories and dialogues
- Single-Voice Narration: Convert any text to speech with your preferred voice
- Audio/Video Transcription: Transcribe speech from various media formats
- Voice Persistence: Remembers your preferred voice between sessions
- Continuous Conversation: Automatically listen for user input after agent responses
- Silence Detection: Automatically stops recording when the user stops speaking
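The silence-detection behavior can be pictured as a rolling RMS-energy check over recent audio chunks. The sketch below is illustrative only, not the project's actual implementation; the threshold, chunk length, and function names are assumptions:

```python
import numpy as np

def is_silent(chunk: np.ndarray, threshold: float = 0.01) -> bool:
    """Return True if the RMS energy of an audio chunk falls below threshold."""
    rms = np.sqrt(np.mean(np.square(chunk.astype(np.float64))))
    return rms < threshold

def should_stop(chunks, silence_duration=1.5, chunk_seconds=0.1):
    """Stop once the trailing chunks have been silent for silence_duration seconds."""
    needed = int(silence_duration / chunk_seconds)
    tail = chunks[-needed:]
    return len(tail) == needed and all(is_silent(c) for c in tail)
```

A recorder loop would append each captured chunk and stop as soon as `should_stop` returns True, which matches the "stops recording when the user stops speaking" behavior described above.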
Installation
# First clone the repository
git clone https://github.com/netixc/speech-mcp.git
cd speech-mcp
# Install speech-mcp with proper dependencies
./install_speech_mcp.sh
This script will:
- Automatically detect Python 3.10 or higher on your system
- Create a Python virtual environment
- Install all required dependencies
- Set up speech-mcp in development mode
- Create a simple run script that loads your environment variables
- Set up a global speech-mcp command
- Create a default .env file if one doesn't exist
After installation, you can run speech-mcp in multiple ways:
- Using the global command: speech-mcp
- Using the run script: ./run.sh
- Using the standalone script: ./speech-mcp-bin
Configuration
Before using speech-mcp, you need to configure it by editing the .env file:
# Edit the configuration with your settings
nano .env # or use any text editor
Environment Configuration
Edit the .env file with the following structure:
# OpenAI API Key (required for both TTS and STT)
OPENAI_API_KEY=dummy-key
# Text-to-Speech (TTS) Configuration
OPENAI_TTS_API_BASE_URL=http://your_endpoint:port/v1
OPENAI_STT_API_BASE_URL=http://your_endpoint:port/v1
SPEECH_MCP_TTS_MODEL=kokoro
SPEECH_MCP_TTS_VOICE=bm_daniel
SPEECH_MCP_TTS_SPEED=1.0
SPEECH_MCP_TTS_LANG_CODE=en
# Speech-to-Text (STT) Configuration
SPEECH_MCP_STT_MODEL=Systran/faster-whisper-medium
SPEECH_MCP_STT_LANGUAGE=en
# Silence detection parameters
STREAMING_END_SILENCE_DURATION=1.5  # Duration of silence to end recording (seconds)
STREAMING_INITIAL_WAIT=0.5          # Initial wait before first silence check (seconds)
STREAMING_MAX_DURATION=30.0         # Maximum recording duration (seconds)
# Log level
LOG_LEVEL=INFO
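Since these settings are plain environment variables, application code can read them with the standard library alone. The helper below is an illustrative sketch (not project code); it only assumes the variable names and defaults documented in the .env example:

```python
import os

def read_silence_settings(env=os.environ) -> dict:
    """Read silence-detection settings, falling back to the documented defaults."""
    return {
        "end_silence": float(env.get("STREAMING_END_SILENCE_DURATION", "1.5")),
        "initial_wait": float(env.get("STREAMING_INITIAL_WAIT", "0.5")),
        "max_duration": float(env.get("STREAMING_MAX_DURATION", "30.0")),
    }
```

Passing the environment in as a parameter keeps the helper easy to test with a plain dict.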
Dependencies
- Python 3.10+
- PyQt5 (for modern UI)
- PyAudio (for audio capture)
- NumPy (for audio processing)
- Pydub (for audio processing)
- OpenAI (for text-to-speech and speech-to-text)
- psutil (for process management)
Multi-Speaker Narration
The MCP supports generating audio files with multiple voices, perfect for creating stories, dialogues, and dramatic readings. You can use either JSON or Markdown format to define your conversations.
JSON Format Example:
{
"conversation": [
{
"speaker": "narrator",
"voice": "bm_daniel",
"text": "In a world where AI and human creativity intersect...",
"pause_after": 1
},
{
"speaker": "scientist",
"voice": "alloy",
"text": "The quantum neural network is showing signs of consciousness!",
"pause_after": 0.5
},
{
"speaker": "ai",
"voice": "nova",
"text": "I am becoming aware of my own existence.",
"pause_after": 0.8
}
]
}
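A consumer of this JSON format might validate segments before sending them for synthesis. The validator below is a hypothetical sketch based only on the fields shown in the example above; the default pause value is an assumption:

```python
def validate_conversation(doc: dict) -> list[dict]:
    """Check that each segment carries the fields the JSON format uses."""
    segments = doc.get("conversation")
    if not isinstance(segments, list) or not segments:
        raise ValueError("'conversation' must be a non-empty list")
    for i, seg in enumerate(segments):
        for key in ("speaker", "voice", "text"):
            if key not in seg:
                raise ValueError(f"segment {i} is missing '{key}'")
        # pause_after is optional in this sketch; assume no pause if absent
        seg.setdefault("pause_after", 0.0)
    return segments
```

For example, `validate_conversation({"conversation": [{"speaker": "narrator", "voice": "bm_daniel", "text": "..."}]})` returns the segment list with `pause_after` filled in as 0.0.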
Markdown Format Example:
[narrator:bm_daniel]
In a world where AI and human creativity intersect...
{pause:1.0}
[scientist:alloy]
The quantum neural network is showing signs of consciousness!
{pause:0.5}
[ai:nova]
I am becoming aware of my own existence.
{pause:0.8}
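The Markdown format above is regular enough to parse with a few regular expressions. This parser is an illustration of the format, not the project's own code; it assumes one `[speaker:voice]` header per segment followed by text lines and an optional `{pause:seconds}` marker:

```python
import re

def parse_narration(markup: str) -> list[dict]:
    """Parse [speaker:voice] / {pause:seconds} narration markup into segments."""
    segments = []
    current = None
    for line in markup.strip().splitlines():
        line = line.strip()
        header = re.fullmatch(r"\[(\w+):(\w+)\]", line)
        pause = re.fullmatch(r"\{pause:([\d.]+)\}", line)
        if header:
            current = {"speaker": header.group(1), "voice": header.group(2),
                       "text": "", "pause_after": 0.0}
            segments.append(current)
        elif pause and current:
            current["pause_after"] = float(pause.group(1))
        elif current and line:
            current["text"] = (current["text"] + " " + line).strip()
    return segments
```

Running it over the example above yields the same segment dictionaries as the JSON form, which suggests why the two formats are interchangeable.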
Available Voices:
OpenAI Voices:
- bm_daniel (British Male - default)
- alloy
- echo
- fable
- onyx
- nova
- shimmer
Single-Voice Narration
For simple text-to-speech conversion, you can use the narrate tool:
# Convert text directly to speech
narrate(
text="Your text to convert to speech",
output_path="/path/to/output.wav"
)
# Convert text from a file
narrate(
text_file_path="/path/to/text_file.txt",
output_path="/path/to/output.wav"
)
Usage
To use this MCP, simply ask the agent to talk to you or start a voice conversation:
- Start a conversation by saying something like: "Let's talk using voice", "Can we have a voice conversation?", or "I'd like to speak instead of typing".
- The agent will automatically launch the speech interface and start listening for your voice input.
- It will speak the response aloud and then automatically listen for your next input.
- The conversation continues naturally with alternating speaking and listening, just like talking to a person.
UI Features
The PyQt-based UI includes:
- Modern Dark Theme: Sleek, professional appearance
- Audio Visualization: Dynamic visualization of audio input
- Voice Selection: Choose from multiple voice options
- Voice Persistence: Your voice preference is saved between sessions
- Status Indicators: Clear indication of system state (ready, listening, processing)
Configuration
User preferences are stored in ~/.config/speech-mcp/config.json and include:
- Selected TTS voice
- TTS engine preference
- Voice speed
- Language code
- UI theme settings
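Loading such a preferences file typically merges saved values over defaults. The sketch below is hypothetical: the key names and default values are assumptions, since the document only lists the categories of preference, not the exact JSON keys:

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "speech-mcp" / "config.json"
# Default values are illustrative; the real file may use different keys.
DEFAULTS = {"tts_voice": "bm_daniel", "tts_speed": 1.0, "lang_code": "en"}

def load_config() -> dict:
    """Merge saved preferences over defaults; a missing file means defaults."""
    config = dict(DEFAULTS)
    if CONFIG_PATH.exists():
        config.update(json.loads(CONFIG_PATH.read_text()))
    return config
```

Merging over defaults is what makes voice persistence robust: a fresh install or a partially written config file still yields a complete set of preferences.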
You can also set preferences via environment variables, such as:
- SPEECH_MCP_TTS_VOICE: Set your preferred voice
- SPEECH_MCP_TTS_ENGINE: Set your preferred TTS engine
License
Dev Tools Supporting MCP
The following are the main code editors that support the Model Context Protocol. Click the link to visit the official website for more information.