Speech Mcp

@netixcon 9 months ago

Speech MCP Project

What is speech-mcp

speech-mcp is a Model Context Protocol (MCP) project that integrates OpenAI's speech-to-text and text-to-speech functionality, with a PyQt-based user interface that includes audio visualization.

Use cases

Use cases for speech-mcp include creating audiobooks with multiple voices, transcribing lectures or meetings, developing interactive voice applications, and enhancing user experience in voice-driven interfaces.

How to use

To use speech-mcp, clone the repository, install the required dependencies using the provided script, and configure the environment settings in the .env file. You can then run the application using the global command, run script, or standalone script.

Key features

Key features include a modern UI, voice input and output capabilities, multi-speaker narration, single-voice narration, audio/video transcription, voice persistence, continuous conversation, and silence detection.

Where to use

speech-mcp can be used in various fields such as education, entertainment, accessibility services, and any application requiring speech recognition and synthesis.

Content

OpenAI speech-to-text integration
OpenAI text-to-speech with multiple voice options
Modern PyQt-based UI with audio visualization

Features

Modern UI: Sleek PyQt-based interface with audio visualization and dark theme
Voice Input: Capture and transcribe user speech using OpenAI STT
Voice Output: Convert agent responses to speech with multiple voice options
Multi-Speaker Narration: Generate audio files with multiple voices for stories and dialogues
Single-Voice Narration: Convert any text to speech with your preferred voice
Audio/Video Transcription: Transcribe speech from various media formats
Voice Persistence: Remembers your preferred voice between sessions
Continuous Conversation: Automatically listen for user input after agent responses
Silence Detection: Automatically stops recording when the user stops speaking

Installation

# First clone the repository
git clone https://github.com/netixc/speech-mcp.git
cd speech-mcp

# Install speech-mcp with proper dependencies
./install_speech_mcp.sh

This script will:

Automatically detect Python 3.10 or higher on your system
Create a Python virtual environment
Install all required dependencies
Set up speech-mcp in development mode
Create a simple run script that loads your environment variables
Set up a global speech-mcp command
Create a default .env file if one doesn’t exist

After installation, you can run speech-mcp in multiple ways:

Using the global command: speech-mcp
Using the run script: ./run.sh
Using the standalone script: ./speech-mcp-bin

Configuration

Before using speech-mcp, you need to configure it by editing the .env file:

# Edit the configuration with your settings
nano .env  # or use any text editor

Environment Configuration

Edit the .env file with the following structure:

# OpenAI API Key (required for both TTS and STT)
OPENAI_API_KEY=dummy-key

# API base URLs for the TTS and STT endpoints
OPENAI_TTS_API_BASE_URL=http://your_endpoint:port/v1
OPENAI_STT_API_BASE_URL=http://your_endpoint:port/v1

# Text-to-Speech (TTS) Configuration
SPEECH_MCP_TTS_MODEL=kokoro
SPEECH_MCP_TTS_VOICE=bm_daniel
SPEECH_MCP_TTS_SPEED=1.0
SPEECH_MCP_TTS_LANG_CODE=en

# Speech-to-Text (STT) Configuration
SPEECH_MCP_STT_MODEL=Systran/faster-whisper-medium
SPEECH_MCP_STT_LANGUAGE=en

# Silence detection parameters
STREAMING_END_SILENCE_DURATION=1.5  # Duration of silence to end recording (seconds)
STREAMING_INITIAL_WAIT=0.5  # Initial wait before first silence check (seconds)
STREAMING_MAX_DURATION=30.0  # Maximum recording duration (seconds)

# Log level
LOG_LEVEL=INFO
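
The three silence detection parameters above can be read as a simple stopping rule. The sketch below is one plausible interpretation, not the project's actual implementation: the frame length, the RMS threshold, and the function names are hypothetical, and only the three STREAMING_* defaults come from the configuration above.

```python
import math
from typing import Iterable, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def recording_duration(
    frames: Iterable[Sequence[float]],
    frame_ms: int = 100,               # hypothetical frame length
    silence_threshold: float = 0.01,   # hypothetical RMS threshold
    end_silence: float = 1.5,          # STREAMING_END_SILENCE_DURATION
    initial_wait: float = 0.5,         # STREAMING_INITIAL_WAIT
    max_duration: float = 30.0,        # STREAMING_MAX_DURATION
) -> float:
    """Return how many seconds of audio are consumed before recording stops."""
    # Work in integer milliseconds to avoid floating-point drift.
    end_ms = round(end_silence * 1000)
    wait_ms = round(initial_wait * 1000)
    max_ms = round(max_duration * 1000)

    elapsed_ms = 0
    silent_ms = 0
    for frame in frames:
        elapsed_ms += frame_ms
        if elapsed_ms >= max_ms:
            return max_ms / 1000       # hard cap on recording length
        if elapsed_ms <= wait_ms:
            continue                   # no silence check during the initial wait
        if rms(frame) < silence_threshold:
            silent_ms += frame_ms
            if silent_ms >= end_ms:
                return elapsed_ms / 1000
        else:
            silent_ms = 0              # speech resumed, restart the silence timer
    return elapsed_ms / 1000
```

With these defaults, one second of speech followed by silence would stop the recording after 2.5 seconds, and uninterrupted speech would be cut off at 30 seconds.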

Dependencies

Python 3.10+
PyQt5 (for modern UI)
PyAudio (for audio capture)
NumPy (for audio processing)
Pydub (for audio processing)
OpenAI (for text-to-speech and speech-to-text)
psutil (for process management)

Multi-Speaker Narration

speech-mcp can generate audio files with multiple voices, which suits stories, dialogues, and dramatic readings. Define the conversation in either JSON or Markdown format.

JSON Format Example:

{
  "conversation": [
    {
      "speaker": "narrator",
      "voice": "bm_daniel",
      "text": "In a world where AI and human creativity intersect...",
      "pause_after": 1
    },
    {
      "speaker": "scientist",
      "voice": "alloy",
      "text": "The quantum neural network is showing signs of consciousness!",
      "pause_after": 0.5
    },
    {
      "speaker": "ai",
      "voice": "nova",
      "text": "I am becoming aware of my own existence.",
      "pause_after": 0.8
    }
  ]
}
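
Before passing a JSON script like the one above to the narration tool, it can help to check its shape. The following is a minimal sketch under stated assumptions: the field names come from the example, but treating "pause_after" as optional seconds with a default of 0, and the names Segment and load_conversation, are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    voice: str
    text: str
    pause_after: float = 0.0  # assumed to be seconds, optional


def load_conversation(raw: str) -> list[Segment]:
    """Parse a JSON conversation script and check the required fields."""
    data = json.loads(raw)
    segments = []
    for i, item in enumerate(data["conversation"]):
        missing = {"speaker", "voice", "text"} - item.keys()
        if missing:
            raise ValueError(f"segment {i} is missing {sorted(missing)}")
        segments.append(
            Segment(
                speaker=item["speaker"],
                voice=item["voice"],
                text=item["text"],
                pause_after=float(item.get("pause_after", 0.0)),
            )
        )
    return segments
```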

Markdown Format Example:

[narrator:bm_daniel]
In a world where AI and human creativity intersect...
{pause:1.0}

[scientist:alloy]
The quantum neural network is showing signs of consciousness!
{pause:0.5}

[ai:nova]
I am becoming aware of my own existence.
{pause:0.8}
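
The Markdown format carries the same information as the JSON format, so one can be converted into the other. The sketch below infers the grammar from the example alone ([speaker:voice] headers, text lines, an optional {pause:N} line); the project's real parser may accept more or behave differently, and the function name is hypothetical.

```python
import re

HEADER = re.compile(r"^\[([^:\]]+):([^\]]+)\]$")      # [speaker:voice]
PAUSE = re.compile(r"^\{pause:([0-9]*\.?[0-9]+)\}$")  # {pause:1.0}


def parse_markdown_script(script: str) -> dict:
    """Convert a Markdown-style script into the JSON conversation structure."""
    conversation = []
    current = None
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        header = HEADER.match(line)
        pause = PAUSE.match(line)
        if header:
            # A header starts a new segment for the named speaker and voice.
            current = {
                "speaker": header.group(1).strip(),
                "voice": header.group(2).strip(),
                "text": "",
                "pause_after": 0.0,
            }
            conversation.append(current)
        elif current is None:
            raise ValueError(f"content before first [speaker:voice] header: {line!r}")
        elif pause:
            current["pause_after"] = float(pause.group(1))
        else:
            # Consecutive text lines are joined into one passage.
            current["text"] = (current["text"] + " " + line).strip()
    return {"conversation": conversation}
```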

Available Voices:

Kokoro voice:

bm_daniel (British Male - default)

OpenAI Voices:

alloy
echo
fable
onyx
nova
shimmer

Single-Voice Narration

For simple text-to-speech conversion, you can use the narrate tool:

# Convert text directly to speech
narrate(
    text="Your text to convert to speech",
    output_path="/path/to/output.wav"
)

# Convert text from a file
narrate(
    text_file_path="/path/to/text_file.txt",
    output_path="/path/to/output.wav"
)
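
For orientation, this is roughly what a text-to-speech request to an OpenAI-compatible endpoint looks like when built from the .env settings. It is a sketch only: speech-mcp lists the OpenAI Python library as its dependency, so the narrate tool most likely does not build requests by hand, and build_speech_request is a hypothetical helper. The /audio/speech path and the model, voice, input, speed, and response_format fields follow the OpenAI speech API.

```python
import json
import urllib.request
from typing import Mapping


def build_speech_request(text: str, env: Mapping[str, str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible TTS endpoint."""
    base_url = env["OPENAI_TTS_API_BASE_URL"].rstrip("/")
    payload = {
        "model": env["SPEECH_MCP_TTS_MODEL"],
        "voice": env["SPEECH_MCP_TTS_VOICE"],
        "input": text,
        "speed": float(env.get("SPEECH_MCP_TTS_SPEED", "1.0")),
        "response_format": "wav",
    }
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {env['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with urllib.request.urlopen and writing the response bytes to output_path would produce the audio file, assuming the endpoint is reachable.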

Usage

To use this MCP, simply ask the assistant to talk to you or to start a voice conversation.

Start a conversation by saying something like:

"Let's talk using voice"
"Can we have a voice conversation?"
"I'd like to speak instead of typing"

The assistant will automatically launch the speech interface and start listening for your voice input.
It will speak the response aloud and then automatically listen for your next input.
The conversation continues naturally with alternating speaking and listening, just like talking to a person.
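
The alternating listen and speak cycle can be pictured as a small loop. This is an illustrative sketch, not the project's code: listen, respond, and speak are hypothetical callables standing in for speech-to-text, the agent, and text-to-speech, and ending the conversation on an empty transcription is an assumption.

```python
from typing import Callable


def converse(
    listen: Callable[[], str],        # records audio and returns the transcription
    respond: Callable[[str], str],    # produces the agent's reply text
    speak: Callable[[str], None],     # plays the reply aloud
    max_turns: int = 10,
) -> list[tuple[str, str]]:
    """Alternate listening and speaking until nothing is heard or max_turns is hit."""
    transcript = []
    for _ in range(max_turns):
        heard = listen()
        if not heard.strip():
            break  # assumption: an empty transcription ends the conversation
        reply = respond(heard)
        speak(reply)
        transcript.append((heard, reply))
    return transcript
```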

UI Features

The PyQt-based UI includes:

Modern Dark Theme: Sleek, professional appearance
Audio Visualization: Dynamic visualization of audio input
Voice Selection: Choose from multiple voice options
Voice Persistence: Your voice preference is saved between sessions
Status Indicators: Clear indication of system state (ready, listening, processing)

Configuration

User preferences are stored in ~/.config/speech-mcp/config.json and include:

Selected TTS voice
TTS engine preference
Voice speed
Language code
UI theme settings

You can also set preferences via environment variables, such as:

SPEECH_MCP_TTS_VOICE - Set your preferred voice
SPEECH_MCP_TTS_ENGINE - Set your preferred TTS engine
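
One way to combine the two sources is to read config.json first and let environment variables take precedence. The precedence order and the JSON key names ("voice", "engine") in this sketch are assumptions; the text above only states that both mechanisms exist.

```python
import json
import os
from pathlib import Path
from typing import Mapping

CONFIG_PATH = Path.home() / ".config" / "speech-mcp" / "config.json"

# Hypothetical mapping from environment variable to config.json key.
ENV_TO_KEY = {
    "SPEECH_MCP_TTS_VOICE": "voice",
    "SPEECH_MCP_TTS_ENGINE": "engine",
}


def merge_preferences(file_prefs: dict, env: Mapping[str, str]) -> dict:
    """Overlay non-empty environment variables on top of stored preferences."""
    prefs = dict(file_prefs)
    for var, key in ENV_TO_KEY.items():
        if env.get(var):
            prefs[key] = env[var]
    return prefs


def load_preferences(path: Path = CONFIG_PATH, env: Mapping[str, str] = os.environ) -> dict:
    """Read config.json if it exists, then apply environment overrides."""
    file_prefs = {}
    if path.is_file():
        file_prefs = json.loads(path.read_text(encoding="utf-8"))
    return merge_preferences(file_prefs, env)
```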

License

MIT License
