Chatterbox Mcp

3 MIT

Chatterbox TTS MCP Server for real-time text-to-speech generation and playback.

What is Chatterbox Mcp

Chatterbox-mcp is a simplified Model Context Protocol (MCP) server designed for text-to-speech (TTS) generation using the Chatterbox TTS model. It automates model loading and playback, providing real-time progress notifications.

Use cases

Use cases for chatterbox-mcp include creating voiceovers for videos, generating audio content for visually impaired users, developing interactive voice applications, and enhancing user engagement in educational platforms.

How to use

To use chatterbox-mcp, simply invoke the speak_text tool with the required text parameter. Optionally, you can adjust the exaggeration and cfg_weight parameters to customize the speech output. The server will handle model loading and playback automatically.

Key features

Key features include automatic model loading with progress notifications, real-time updates during speech generation, audio playback on macOS, and temporary file management with auto-cleanup.

Where to use

Chatterbox-mcp can be used in various fields such as education, entertainment, accessibility tools, and any application requiring text-to-speech functionality.

Clients Supporting MCP

The following are the main clients that support the Model Context Protocol.

Claude Desktop: Official desktop application from Anthropic, natively supports MCP protocol. claude.ai

Cherry Studio: Cross-platform desktop client supporting multiple LLM providers, built-in MCP server support. cherry-ai.com

LobeChat: Modern open-source ChatGPT/LLMs UI, supports MCP protocol integration. lobehub.com

DeepChat: Cross-platform desktop AI assistant, compatible with MCP protocol, focusing on privacy and efficiency. deepchat.thinkinai.xyz

5ire: Cross-platform open-source desktop intelligent assistant MCP client, supports local knowledge base and MCP server. 5ire.app

Chatterbox TTS MCP Server

A simplified Model Context Protocol (MCP) server that provides text-to-speech generation with automatic playback using the Chatterbox TTS model. The server loads the model automatically on first use and provides real-time progress notifications to keep users informed throughout the process.

Overview

This MCP server exposes Chatterbox TTS functionality through a single, streamlined tool that generates speech from text and plays it automatically. The server handles model loading, progress reporting, temporary file management, and audio playback seamlessly.

Features

Single Tool: speak_text

The speak_text tool provides complete text-to-speech functionality:

Parameters:
- text (required): The text to convert to speech
- exaggeration (optional): Controls expressiveness (0.0-1.0, default 0.5)
- cfg_weight (optional): Controls classifier-free guidance (0.0-1.0, default 0.5)
Features:
- Automatic model loading with progress notifications
- Generates speech using temporary files (auto-cleanup)
- Plays audio automatically on macOS using afplay
- Real-time progress updates during all phases:
  - Model initialization and loading
  - Speech generation
  - Audio playback
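
As a rough illustration of the parameter contract above, the following sketch checks arguments against the documented ranges and defaults before a call. The names `SpeakTextArgs` and `validate_speak_text_args` are hypothetical, and this document does not state whether the server itself rejects out-of-range values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakTextArgs:
    """Hypothetical container for speak_text arguments (not from the server source)."""
    text: str
    exaggeration: float = 0.5  # documented default
    cfg_weight: float = 0.5    # documented default


def validate_speak_text_args(text, exaggeration=0.5, cfg_weight=0.5):
    """Check the arguments against the documented 0.0-1.0 ranges and return them."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if not 0.0 <= float(value) <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return SpeakTextArgs(text, float(exaggeration), float(cfg_weight))
```

Validating on the client side like this gives an early, readable error instead of relying on however the model handles unusual values.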

Resource: chatterbox://model-info

Get information about the TTS model status and device capabilities:

- Model loading status (loaded/not loaded)
- Device information (MPS/CUDA/CPU)
- Hardware acceleration availability

Progress Notifications

The server provides detailed progress notifications throughout the speech generation process:

Model Loading Phase:
- “Loading Chatterbox TTS model…”
- “Initializing PyTorch device…”
- “Loading model weights…”
- “Model loaded successfully!”
Speech Generation Phase:
- “Starting speech generation…”
- “Speech generated, saving to temporary file…”
Playback Phase:
- “Playing audio…”
- “Audio playback completed!”
Status Updates:
- Device selection (MPS/CUDA/CPU)
- Voice prompt usage when applicable
- Success/error messages
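
The phase ordering above can be sketched as a small driver that emits the documented messages through a callback. This is only an illustration: `PHASE_MESSAGES`, `run_with_progress`, and the `model_loaded` flag are hypothetical names, and the real server presumably sends these messages through MCP notifications rather than a plain callable.

```python
from typing import Callable

# Messages copied from the phase lists above; grouping them in a dict is illustrative.
PHASE_MESSAGES = {
    "model_loading": [
        "Loading Chatterbox TTS model…",
        "Initializing PyTorch device…",
        "Loading model weights…",
        "Model loaded successfully!",
    ],
    "speech_generation": [
        "Starting speech generation…",
        "Speech generated, saving to temporary file…",
    ],
    "playback": [
        "Playing audio…",
        "Audio playback completed!",
    ],
}


def run_with_progress(notify: Callable[[str], None], model_loaded: bool = False) -> None:
    """Emit the messages for each phase in order; skip loading if the model is ready."""
    phases = ["speech_generation", "playback"]
    if not model_loaded:
        phases.insert(0, "model_loading")
    for phase in phases:
        for message in PHASE_MESSAGES[phase]:
            notify(message)  # the actual work for each step would run around these calls
```

Calling `run_with_progress(print)` prints the full sequence for a first request; passing `model_loaded=True` shows the shorter sequence for later requests.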

Installation

Install dependencies:
```
pip install mcp torch torchaudio
```
Install Chatterbox TTS:
Follow the Chatterbox TTS installation instructions to ensure the chatterbox.tts module is available.

Configuration

Audio File Storage

By default, the server stores audio files in ~/.chatterbox/audio. You can configure a custom location using:

Command line argument:

python chatterbox_mcp_server.py --audio-dir /path/to/custom/audio/directory

Environment variable:

export CHATTERBOX_AUDIO_DIR="/path/to/custom/audio/directory"
python chatterbox_mcp_server.py

Priority order:

- Command line --audio-dir argument (highest priority)
- CHATTERBOX_AUDIO_DIR environment variable
- Default: ~/.chatterbox/audio (lowest priority)
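
One way the priority order above could be implemented is sketched below with the standard library. The function name `resolve_audio_dir` and its `create` parameter are hypothetical; only the flag name, the environment variable, and the default path come from this document.

```python
import argparse
import os
from pathlib import Path

DEFAULT_AUDIO_DIR = "~/.chatterbox/audio"


def resolve_audio_dir(argv=None, environ=None, create=True):
    """Return the audio directory: CLI flag first, then env var, then the default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--audio-dir", default=None)
    args, _unknown = parser.parse_known_args(argv)  # ignore flags this sketch does not know

    raw = args.audio_dir or environ.get("CHATTERBOX_AUDIO_DIR") or DEFAULT_AUDIO_DIR
    path = Path(raw).expanduser().resolve()  # expands ~ and relative paths
    if create:
        path.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    return path
```

Passing `argv` and `environ` explicitly keeps the function easy to exercise without touching the real process environment.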

Audio File TTL (Time To Live)

By default, audio files are automatically cleaned up after 1 hour. You can configure a custom TTL:

Command line argument:

python chatterbox_mcp_server.py --audio-ttl-hours 24  # Keep files for 24 hours

Environment variable:

export CHATTERBOX_AUDIO_TTL_HOURS=24
python chatterbox_mcp_server.py

Priority order:

- Command line --audio-ttl-hours argument (highest priority)
- CHATTERBOX_AUDIO_TTL_HOURS environment variable
- Default: 1 hour (lowest priority)
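
A minimal sketch of TTL resolution and cleanup follows. It assumes a file's age is measured by its modification time and that only `.wav` files are managed; neither detail is confirmed by this document, and `resolve_ttl_hours` and `remove_expired_audio` are hypothetical names.

```python
import os
import time
from pathlib import Path


def resolve_ttl_hours(cli_value=None, environ=None, default=1.0):
    """Return the TTL in hours: CLI value first, then env var, then the default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return float(cli_value)
    env_value = environ.get("CHATTERBOX_AUDIO_TTL_HOURS")
    return float(env_value) if env_value else default


def remove_expired_audio(audio_dir, ttl_hours, now=None):
    """Delete .wav files older than the TTL and return their names, sorted."""
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600  # anything modified before this is expired
    removed = []
    for path in Path(audio_dir).glob("*.wav"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A server could call `remove_expired_audio` periodically or before each new request; which of these the actual implementation does is not stated here.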

Model Auto-Loading

By default, the TTS model is loaded on first use to minimize startup time. You can pre-load it at startup:

Command line argument:

python chatterbox_mcp_server.py --auto-load-model

This will load the model during server startup, which takes a few seconds but ensures the first TTS request is faster.

Audio Storage Features:

- Audio files are stored persistently with configurable automatic cleanup
- Files are accessible via chatterbox://audio/{resource_id} resources
- Directory is created automatically if it doesn’t exist
- Supports relative paths (will be expanded) and ~ home directory notation

Usage

Running the Server

Standalone:

python chatterbox_mcp_server.py

With MCP tools:

mcp dev chatterbox_mcp_server.py

Integration with Claude Desktop

Add to your Claude Desktop MCP configuration:

Basic configuration:

{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": [
        "/path/to/chatterbox_mcp_server.py"
      ],
      "env": {}
    }
  }
}

With custom configuration:

{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": [
        "/path/to/chatterbox_mcp_server.py",
        "--audio-dir",
        "/custom/audio/path",
        "--auto-load-model",
        "--audio-ttl-hours",
        "24"
      ],
      "env": {
        "CHATTERBOX_AUDIO_DIR": "/custom/audio/path",
        "CHATTERBOX_AUDIO_TTL_HOURS": "24"
      }
    }
  }
}

Example Usage from LLM

Basic text-to-speech:

Please use the speak_text tool to say "Hello, welcome to the Chatterbox TTS demonstration!"

Expressive speech:

Use speak_text to generate enthusiastic speech for "This is amazing!" with high expressiveness

The tool will automatically:

- Load the model if needed (with progress updates)
- Generate the speech
- Play the audio
- Clean up temporary files
- Provide status updates throughout

Technical Details

Device Support

- Apple Silicon (M1/M2/M3/M4): Uses MPS acceleration when available
- NVIDIA GPUs: Uses CUDA when available
- CPU fallback: Works on any system

Audio Processing

- Uses temporary files for audio storage
- Automatic cleanup after playback
- WAV format output
- High-quality audio generation
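
The temporary-file flow described above (write a WAV, play it, then delete it) might look roughly like this. The sketch writes silence as a stand-in for generated speech; the function names, the 24000 Hz sample rate, and the injectable `play` callable are assumptions for illustration, not details of the server.

```python
import os
import shutil
import subprocess
import tempfile
import wave


def write_silent_wav(path, n_frames=2400, sample_rate=24000):
    """Write a short mono 16-bit silent WAV as a stand-in for generated speech."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


def afplay(path):
    """Play a file with afplay, which ships with macOS only."""
    player = shutil.which("afplay")
    if player is None:
        raise RuntimeError("afplay not found; it is a macOS-specific command")
    subprocess.run([player, path], check=True)  # blocks until playback ends


def speak_via_temp_file(write_audio, play=afplay):
    """Write audio to a temp file, play it, and always delete the file afterwards."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # reopen by path below; avoids holding two handles
    try:
        write_audio(path)
        play(path)
    finally:
        os.remove(path)  # cleanup runs even if writing or playback fails
    return path
```

Keeping the player as a parameter makes it possible to swap in another command on non-macOS systems or a stub in tests.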

Model Management

- Model loads once on first use
- Shared across all subsequent requests
- Thread-safe loading with progress tracking
- Automatic device detection and optimization
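
Load-once, thread-safe behaviour of this kind is commonly built with a lock and a second check inside it. The sketch below shows that pattern with a hypothetical `LazyModel` wrapper and a caller-supplied `loader`; it is not taken from the server's source.

```python
import threading


class LazyModel:
    """Load a model once, on first use, and share it across threads."""

    def __init__(self, loader):
        self._loader = loader          # callable that builds and returns the model
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        if self._model is None:        # fast path: no locking once loaded
            with self._lock:
                if self._model is None:  # re-check: another thread may have loaded it
                    self._model = self._loader()
        return self._model
```

The `loader` would wrap whatever call actually constructs the TTS model. Pre-loading at startup then amounts to calling `get()` once before serving requests.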

File Structure

chatterbox-mcp/
├── chatterbox_mcp_server.py    # MCP server implementation
└── README.md                   # This documentation

Development

Key Improvements in This Version

- Simplified Interface: Single speak_text tool instead of multiple tools
- Automatic Playback: No need to manually play generated files
- Progress Notifications: Real-time updates on model loading and generation
- Persistent Audio Storage: Audio files are stored with configurable automatic cleanup
- Better Error Handling: Comprehensive error reporting and recovery
- Streamlined Workflow: One command generates and plays speech

Troubleshooting

Common Issues:

Model loading slow:
- First-time loading downloads model weights
- Progress notifications show current status
- Subsequent uses are much faster
Audio playback issues:
- afplay command is macOS-specific
- Ensure system audio is working
- Check volume settings
Memory issues:
- Model requires significant GPU/CPU memory
- Monitor system resources during loading
- Consider closing other applications
Device selection:
- Server automatically selects best available device
- Check model info resource for current device
- Priority order: MPS (Apple Silicon) > CUDA (NVIDIA) > CPU
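
The device priority in the last item can be expressed as a small pure function. `select_device` is a hypothetical helper; the availability flags are passed in so the sketch runs without PyTorch installed.

```python
def select_device(mps_available: bool, cuda_available: bool) -> str:
    """Pick a device name following the order MPS > CUDA > CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"  # fallback that works on any system


# With PyTorch installed, the flags could be supplied by
# torch.backends.mps.is_available() and torch.cuda.is_available().
```

Comparing the result with the device reported by the chatterbox://model-info resource is a quick way to check whether hardware acceleration is being used.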

License

This MCP server implementation follows the same license as the underlying Chatterbox TTS model.

Dev Tools Supporting MCP

The following are the main code editors that support the Model Context Protocol.

Zed: High-performance collaborative code editor, supports MCP protocol, providing a smooth programming experience. zed.dev

Cursor: AI code editor built on VS Code, supports MCP protocol for context-aware programming. cursor.com

Windsurf: AI code editor from Codeium, integrates MCP protocol to provide intelligent code assistance. windsurf.com

Continue: Open-source AI programming assistant plugin, supports VS Code and JetBrains, compatible with MCP protocol. continue.dev

Trae: AI-driven code editor, supports MCP protocol, focusing on enhancing developer programming experience. trae.ai
