Web Voice Assistant
What is Web Voice Assistant
Web-voice-assistant is a web-based voice assistant that captures audio from the microphone, transcribes it with Whisper, processes the transcript through OpenAI's language models, and responds with synthesized speech via text-to-speech (TTS).
Use cases
Use cases include voice-activated customer support, interactive voice response systems, language learning applications, and accessibility tools for individuals with disabilities.
How to use
To use web-voice-assistant, either run it locally with Docker or build it from source. Once it is set up, you can interact with the assistant by speaking into a microphone or over a phone call, and it will respond with audio output.
Key features
Key features include real-time speech-to-text conversion using Whisper, integration with OpenAI’s language models for intelligent responses, and high-quality text-to-speech output using Coqui or Google TTS.
Where to use
Web-voice-assistant can be used in various fields such as customer service, virtual assistants, educational tools, and any application requiring voice interaction.
Clients Supporting MCP
The following are the main client applications that support the Model Context Protocol. Follow the links to their official websites for more information.
Content
High-Level Voice Bot Flow:
User speaks into mic / phone call
Voice stream → Speech-to-Text (STT)
Transcribed text → LLM (e.g., GPT or agentic model)
LLM response → Text-to-Speech (TTS)
Send synthesized voice stream back to user
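The flow above can be sketched as a single async pipeline. This is a minimal sketch, not the project's actual implementation: `stt`, `queryLLM`, and `tts` are hypothetical stand-ins for the real Whisper, LLM, and Coqui TTS calls.

```javascript
// Minimal sketch of the voice-bot pipeline; stt(), queryLLM(), and tts()
// are hypothetical stand-ins for the real Whisper, LLM, and TTS integrations.
async function handleVoiceTurn(audioIn, { stt, queryLLM, tts }) {
  const transcript = await stt(audioIn);    // 1. voice stream -> text
  const reply = await queryLLM(transcript); // 2. text -> LLM response
  const audioOut = await tts(reply);        // 3. response -> synthesized speech
  return { transcript, reply, audioOut };   // 4. stream audioOut back to user
}

// Example with stubbed dependencies:
const stubs = {
  stt: async () => "what is my balance",
  queryLLM: async (text) => `You asked: ${text}`,
  tts: async (text) => Buffer.from(text), // pretend synthesized audio
};

handleVoiceTurn(Buffer.from("fake-audio"), stubs).then((r) => {
  console.log(r.reply); // "You asked: what is my balance"
});
```

Keeping each stage behind a function boundary like this makes it easy to swap the STT, LLM, or TTS backend (e.g. OpenAI vs. a local Ollama model) without touching the rest of the pipeline.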
🎙️ Dockerized Voice Assistant (STT + OpenAI + TTS)
This project uses:
- 🗣️ Whisper (speech-to-text)
- 🤖 OpenAI GPT (agentic assistant for fintech)
- 🔊 Coqui TTS (text-to-speech)
- ⚡ Node.js Express server + HTML frontend
🚀 Run with Docker (UI only; for the full end-to-end setup, use Docker Compose as described below).
- cd web-va
- docker build -t voice-assistant .
- docker run -p 3000:3000 --name va voice-assistant
- Visit: http://localhost:3000

For a local build:
- npm install
- npm start
From the root folder, use Docker Compose to start both the project and Ollama (which runs the Ollama LLM image):
docker-compose up --build
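A minimal `docker-compose.yml` along these lines would wire the two services together; this is a sketch under assumed service names (the repo's actual file may differ), keeping the `ollama-va` container name used in the pull command below:

```yaml
services:
  web-va:
    build: ./web-va        # Node.js Express server + HTML frontend
    ports:
      - "3000:3000"
    depends_on:
      - ollama-va
  ollama-va:
    image: ollama/ollama   # local LLM server
    container_name: ollama-va
    ports:
      - "11434:11434"
```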
- Once the containers are up, run the command below to pull a model into the Ollama server running inside the Docker container:
docker exec -it ollama-va ollama pull llama3
Dev Tools Supporting MCP
The following are the main code editors that support the Model Context Protocol. Follow the links to their official websites for more information.