Chatterbox Mcp
What is Chatterbox Mcp
Chatterbox-mcp is a simplified Model Context Protocol (MCP) server designed for text-to-speech (TTS) generation using the Chatterbox TTS model. It automates model loading and playback, providing real-time progress notifications.
Use cases
Use cases for chatterbox-mcp include creating voiceovers for videos, generating audio content for visually impaired users, developing interactive voice applications, and enhancing user engagement in educational platforms.
How to use
To use chatterbox-mcp, simply invoke the speak_text tool with the required text parameter. Optionally, you can adjust the exaggeration and cfg_weight parameters to customize the speech output. The server will handle model loading and playback automatically.
Key features
Key features include automatic model loading with progress notifications, real-time updates during speech generation, audio playback on macOS, and temporary file management with auto-cleanup.
Where to use
Chatterbox-mcp can be used in various fields such as education, entertainment, accessibility tools, and any application requiring text-to-speech functionality.
Chatterbox TTS MCP Server
A simplified Model Context Protocol (MCP) server that provides text-to-speech generation with automatic playback using the Chatterbox TTS model. The server loads the model automatically on first use and provides real-time progress notifications to keep users informed throughout the process.
Overview
This MCP server exposes Chatterbox TTS functionality through a single, streamlined tool that generates speech from text and plays it automatically. The server handles model loading, progress reporting, temporary file management, and audio playback seamlessly.
Features
Single Tool: speak_text
The speak_text tool provides complete text-to-speech functionality:
Parameters:
- text (required): The text to convert to speech
- exaggeration (optional): Controls expressiveness (0.0-1.0, default 0.5)
- cfg_weight (optional): Controls classifier-free guidance (0.0-1.0, default 0.5)
Features:
- Automatic model loading with progress notifications
- Generates speech using temporary files (auto-cleanup)
- Plays audio automatically on macOS using afplay
- Real-time progress updates during all phases:
  - Model initialization and loading
  - Speech generation
  - Audio playback
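As a rough illustration of the parameters above, the sketch below builds the JSON-RPC `tools/call` message that an MCP client would send to invoke `speak_text`. The helper name `build_speak_text_request` and the client-side range check are assumptions made for illustration, not part of the server; most MCP clients construct this message for you.

```python
import json


def build_speak_text_request(text, exaggeration=0.5, cfg_weight=0.5, request_id=1):
    """Build an MCP `tools/call` request for speak_text (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    # The check mirrors the documented 0.0-1.0 bounds; the server itself
    # may validate differently.
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "speak_text",
            "arguments": {
                "text": text,
                "exaggeration": exaggeration,
                "cfg_weight": cfg_weight,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_speak_text_request("Hello!", exaggeration=0.8), indent=2))
```

Only `text` is required; leaving out the other two arguments falls back to the documented default of 0.5 for each.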
Resource: chatterbox://model-info
Get information about the TTS model status and device capabilities:
- Model loading status (loaded/not loaded)
- Device information (MPS/CUDA/CPU)
- Hardware acceleration availability
Progress Notifications
The server provides detailed progress notifications throughout the speech generation process:
Model Loading Phase:
- “Loading Chatterbox TTS model…”
- “Initializing PyTorch device…”
- “Loading model weights…”
- “Model loaded successfully!”
Speech Generation Phase:
- “Starting speech generation…”
- “Speech generated, saving to temporary file…”
Playback Phase:
- “Playing audio…”
- “Audio playback completed!”
Status Updates:
- Device selection (MPS/CUDA/CPU)
- Voice prompt usage when applicable
- Success/error messages
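The ordering of the phases above can be sketched as a small driver that emits each documented message in turn. This is only an illustration: the `notify` callback, the `run_with_progress` name, and the fractional progress value are assumptions, and the real server reports progress through MCP notifications rather than a plain callback.

```python
from typing import Callable, List, Tuple

# Messages copied from the phases listed above (ASCII "..." used for the ellipsis).
PHASES: List[Tuple[str, List[str]]] = [
    ("model_loading", [
        "Loading Chatterbox TTS model...",
        "Initializing PyTorch device...",
        "Loading model weights...",
        "Model loaded successfully!",
    ]),
    ("speech_generation", [
        "Starting speech generation...",
        "Speech generated, saving to temporary file...",
    ]),
    ("playback", [
        "Playing audio...",
        "Audio playback completed!",
    ]),
]


def run_with_progress(notify: Callable[[str, float], None]) -> None:
    """Call notify(message, fraction_done) once per documented message."""
    messages = [msg for _, msgs in PHASES for msg in msgs]
    total = len(messages)
    for index, message in enumerate(messages, start=1):
        # The fraction is an assumption of this sketch, not server behavior.
        notify(message, index / total)


if __name__ == "__main__":
    run_with_progress(lambda msg, frac: print(f"[{frac:.0%}] {msg}"))
```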
Installation
- Install dependencies:
  pip install mcp torch torchaudio
- Install Chatterbox TTS:
  Follow the Chatterbox TTS installation instructions to ensure the chatterbox.tts module is available.
Configuration
Audio File Storage
By default, the server stores audio files in ~/.chatterbox/audio. You can configure a custom location using:
Command line argument:
python chatterbox_mcp_server.py --audio-dir /path/to/custom/audio/directory
Environment variable:
export CHATTERBOX_AUDIO_DIR="/path/to/custom/audio/directory"
python chatterbox_mcp_server.py
Priority order:
- Command line --audio-dir argument (highest priority)
- CHATTERBOX_AUDIO_DIR environment variable
- Default: ~/.chatterbox/audio (lowest priority)
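The priority order above can be sketched as a small resolver. This is a minimal sketch, not the server's actual code: the function name `resolve_audio_dir` and its signature are assumptions, while the environment variable name and default path come from this README.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_AUDIO_DIR = "~/.chatterbox/audio"


def resolve_audio_dir(cli_value: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the audio directory: CLI argument, then env var, then default."""
    env = os.environ if env is None else env
    chosen = cli_value or env.get("CHATTERBOX_AUDIO_DIR") or DEFAULT_AUDIO_DIR
    # Expand "~" and turn relative paths into absolute ones.
    return Path(chosen).expanduser().resolve()


if __name__ == "__main__":
    print(resolve_audio_dir())
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.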
Audio File TTL (Time To Live)
By default, audio files are automatically cleaned up after 1 hour. You can configure a custom TTL:
Command line argument:
python chatterbox_mcp_server.py --audio-ttl-hours 24 # Keep files for 24 hours
Environment variable:
export CHATTERBOX_AUDIO_TTL_HOURS=24
python chatterbox_mcp_server.py
Priority order:
- Command line --audio-ttl-hours argument (highest priority)
- CHATTERBOX_AUDIO_TTL_HOURS environment variable
- Default: 1 hour (lowest priority)
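To make the TTL behavior concrete, here is a hedged sketch of what a cleanup pass could look like. The function name `cleanup_expired_audio` and the use of file modification time to judge age are assumptions; the server may track expiry differently.

```python
import time
from pathlib import Path
from typing import List, Optional


def cleanup_expired_audio(audio_dir: Path, ttl_hours: float = 1.0,
                          now: Optional[float] = None) -> List[Path]:
    """Delete regular files in audio_dir older than ttl_hours; return them."""
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600
    removed = []
    for path in audio_dir.iterdir():
        # Age is judged by modification time (an assumption of this sketch).
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

With the default of 1 hour, a file last modified two hours ago would be removed, while one written a minute ago would be kept.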
Model Auto-Loading
By default, the TTS model is loaded on first use to minimize startup time. You can pre-load it at startup:
Command line argument:
python chatterbox_mcp_server.py --auto-load-model
This loads the model during server startup. Startup takes a few seconds longer, but the first TTS request is faster.
Audio Storage Features:
- Audio files are stored persistently with configurable automatic cleanup
- Files are accessible via chatterbox://audio/{resource_id} resources
- Directory is created automatically if it doesn’t exist
- Supports relative paths (will be expanded) and ~ home directory notation
Usage
Running the Server
Standalone:
python chatterbox_mcp_server.py
With MCP tools:
mcp dev chatterbox_mcp_server.py
Integration with Claude Desktop
Add to your Claude Desktop MCP configuration:
Basic configuration:
{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": [
        "/path/to/chatterbox_mcp_server.py"
      ],
      "env": {}
    }
  }
}
With custom configuration:
{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": [
        "/path/to/chatterbox_mcp_server.py",
        "--audio-dir",
        "/custom/audio/path",
        "--auto-load-model",
        "--audio-ttl-hours",
        "24"
      ],
      "env": {
        "CHATTERBOX_AUDIO_DIR": "/custom/audio/path",
        "CHATTERBOX_AUDIO_TTL_HOURS": "24"
      }
    }
  }
}
Example Usage from LLM
- Basic text-to-speech:
  Please use the speak_text tool to say "Hello, welcome to the Chatterbox TTS demonstration!"
- Expressive speech:
  Use speak_text to generate enthusiastic speech for "This is amazing!" with high expressiveness
The tool will automatically:
- Load the model if needed (with progress updates)
- Generate the speech
- Play the audio
- Clean up temporary files
- Provide status updates throughout
Technical Details
Device Support
- Apple Silicon (M1/M2/M3/M4): Uses MPS acceleration when available
- NVIDIA GPUs: Uses CUDA when available
- CPU fallback: Works on any system
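A minimal sketch of the device choice follows, using the MPS > CUDA > CPU preference stated under Troubleshooting. The function name `select_device` is hypothetical, and it takes plain booleans so that it runs without PyTorch installed.

```python
def select_device(mps_available: bool, cuda_available: bool) -> str:
    """Return the preferred device name given which backends are available."""
    # With PyTorch installed, the flags would typically come from
    # torch.backends.mps.is_available() and torch.cuda.is_available().
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(select_device(mps_available=False, cuda_available=True))
```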
Audio Processing
- Uses temporary files for audio storage
- Automatic cleanup after playback
- WAV format output
- High-quality audio generation
Model Management
- Model loads once on first use
- Shared across all subsequent requests
- Thread-safe loading with progress tracking
- Automatic device detection and optimization
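The load-once, thread-safe behavior described above is commonly implemented with a lock and a second check inside it. The sketch below shows that pattern under stated assumptions: the `LazyModel` class and its `loader` argument are hypothetical stand-ins, not the server's actual code.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Load a model once, on first use, and share it across callers."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Fast path: skip the lock once the model is loaded.
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so only one thread runs the loader.
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

Calling `get()` once during startup would correspond to the pre-loading behavior of `--auto-load-model`; otherwise the first request pays the loading cost.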
File Structure
chatterbox-mcp/
├── chatterbox_mcp_server.py    # MCP server implementation
└── README.md                   # This documentation
Development
Key Improvements in This Version
- Simplified Interface: Single speak_text tool instead of multiple tools
- Automatic Playback: No need to manually play generated files
- Progress Notifications: Real-time updates on model loading and generation
- Persistent Audio Storage: Audio files are stored with configurable automatic cleanup
- Better Error Handling: Comprehensive error reporting and recovery
- Streamlined Workflow: One command generates and plays speech
Troubleshooting
Common Issues:
Model loading slow:
- First-time loading downloads model weights
- Progress notifications show current status
- Subsequent uses are much faster
Audio playback issues:
- afplay command is macOS-specific
- Ensure system audio is working
- Check volume settings
Memory issues:
- Model requires significant GPU/CPU memory
- Monitor system resources during loading
- Consider closing other applications
Device selection:
- Server automatically selects best available device
- Check model info resource for current device
- Device priority: MPS (Apple Silicon) > CUDA (NVIDIA) > CPU
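For the audio playback issue above, a guard like the following shows why playback is limited to macOS. It is a sketch only: the helper names `build_playback_command` and `play_audio` are assumptions, and the server's own playback code may differ.

```python
import platform
import shutil
import subprocess
from typing import List, Optional


def build_playback_command(audio_path: str,
                           system: Optional[str] = None) -> Optional[List[str]]:
    """Return the afplay command on macOS, or None on other platforms."""
    system = platform.system() if system is None else system
    if system != "Darwin":
        return None
    return ["afplay", audio_path]


def play_audio(audio_path: str) -> bool:
    """Play a file with afplay if possible; return True when playback ran."""
    command = build_playback_command(audio_path)
    if command is None or shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True
```

On Linux or Windows, `play_audio` returns False without attempting playback, so a caller can fall back to reporting the generated file's location.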
License
This MCP server implementation follows the same license as the underlying Chatterbox TTS model.