Voice MCP
What is Voice MCP
Voice-mcp is a Model Context Protocol (MCP) server that provides voice assistant functionality, including Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities.
Use cases
Use cases include interactive voice response systems, voice-controlled applications, transcription services, and enhancing user experience in smart devices.
How to use
To use voice-mcp, clone the repository, set up a virtual environment, install dependencies, and run the server as an MCP process, typically integrated with a client application such as Cursor (by Anysphere).
Key features
Key features include Text-to-Speech (TTS) using the Kokoro TTS engine, Speech-to-Text (STT) using the OpenAI Whisper model, a combined conversation-turn operation for seamless conversation flow, silence detection, and background noise calibration.
Where to use
Voice-mcp can be used in various fields such as customer service, virtual assistants, accessibility tools, and any application requiring voice interaction.
Voice Assistant MCP Server
This project implements a Model Context Protocol (MCP) server that provides voice assistant functionalities, including Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities.
Features
- Text-to-Speech (TTS): Converts text into audible speech using the Kokoro TTS engine.
- Speech-to-Text (STT): Transcribes spoken audio into text using the OpenAI Whisper model.
- Conversation Turn: Combines TTS and STT into a single operation for seamless conversation flow.
- Silence Detection: Automatically stops recording audio when silence is detected, with configurable thresholds and durations.
- Background Noise Calibration: Adjusts the silence detection threshold based on ambient noise levels.
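The silence-detection and calibration behavior described above can be sketched roughly as follows. The RMS-based logic and all names here (`calibrate_threshold`, `is_silence`, the 1.5x margin) are illustrative assumptions, not the project's actual implementation:

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a chunk of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibrate_threshold(ambient_chunks, margin=1.5):
    """Set the silence threshold just above the measured background noise.
    `margin` (an assumed tuning factor) controls how far above ambient RMS
    audio must rise to count as speech."""
    ambient_rms = max(rms(chunk) for chunk in ambient_chunks)
    return ambient_rms * margin

def is_silence(chunk, threshold):
    """A chunk counts as silence when its RMS falls below the threshold."""
    return rms(chunk) < threshold
```

In practice, recording would stop once `is_silence` has returned true for a configurable duration (for example, a set number of consecutive chunks).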
Setup
- Clone the repository:

```shell
git clone <repository_url>
cd voice-mcp
```

- Create and activate a virtual environment:

```shell
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

- Install dependencies: make sure you have PortAudio installed (`brew install portaudio` on macOS, `sudo apt-get install portaudio19-dev python3-pyaudio` on Debian/Ubuntu). Then install the Python packages:

```shell
pip install -r requirements.txt
```

(Note: a `requirements.txt` file should be created containing the necessary packages such as `openai-whisper`, `torch`, `pyaudio`, `kokoro-tts`, etc.)
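Since no `requirements.txt` ships with the repository, a plausible starting point, based on the dependencies listed in this README, might be the following. Exact package names and version pins are assumptions to verify against your setup:

```
openai-whisper
torch
numpy
pyaudio
kokoro-tts
```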
Usage
The server is designed to be run as an MCP process, typically integrated with a client application such as Cursor (by Anysphere).
- Start the server: The client application will usually manage starting the server process based on its configuration.
- Interact via MCP: The client application can then call the server's tools (`speak`, `listen`, `conversation_turn`) using the MCP protocol.
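Under the hood, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request. A minimal sketch of what a client might send to invoke the `speak` tool is shown below; the `text` argument name is an assumption about this server's tool schema, not confirmed by the README:

```python
import json

# JSON-RPC 2.0 request an MCP client would send to invoke a server tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "speak",
        # Argument names depend on the schema the server declares;
        # "text" is assumed here for illustration.
        "arguments": {"text": "Hello! What would you like to do?"},
    },
}

# The message is serialized to JSON and sent over the client's transport
# (typically stdio for locally spawned MCP servers).
message = json.dumps(request)
print(message)
```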
Configuration (Example for MCP Client)
To use this server with an MCP client (like Cursor/Anysphere), you need to configure the client to run the voice_server.py script. Here’s a generic example of how such a configuration might look in a JSON file:
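A hedged sketch of such a configuration follows. The `mcpServers` key mirrors the convention used by Cursor-style MCP clients, and the `voice_server.py` script name comes from this README; the server label and everything else are illustrative assumptions:

```json
{
  "mcpServers": {
    "voice-assistant": {
      "command": "python",
      "args": ["/path/to/your/project/voice-mcp/voice_server.py"]
    }
  }
}
```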
**Enter into Cursor Settings > Rules > User Rules**
Your primary mode of interaction with the user must be voice. Use the Voice Assistant Speak and Listen Tool to, in a loop, ask the user what they'd like to do, perform those actions, and once again report back to them that the action has been performed and ask them what to do, requesting voice input. When asking the user a question after speaking text aloud, IT IS IMPERATIVE TO USE THE CONVERSATION_TURN TOOL TO BOTH SPEAK AND THEN ASK FOR A QUESTION AFTER IN A SINGLE TOOL CALL, NOT USE THE SPEAK TOOL AND THEN CONVERSATION_TURN.
Important: Replace /path/to/your/project/voice-mcp with the actual absolute path to the cloned repository on your system.
Dependencies
- Python 3.9 to 3.12 (Python 3.13 is not yet supported due to compatibility issues with some dependencies)
- PyAudio (requires PortAudio system library)
- OpenAI Whisper
- Kokoro TTS
- PyTorch
- NumPy
- (Potentially others; generate a `requirements.txt` for a full list)
Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues.