Web Voice Assistant
What is Web Voice Assistant
Web-voice-assistant is a web-based voice assistant that captures audio from the microphone, transcribes it with Whisper, processes the transcript through OpenAI's language models, and responds with synthesized speech via text-to-speech (TTS).
Use cases
Use cases include voice-activated customer support, interactive voice response systems, language learning applications, and accessibility tools for individuals with disabilities.
How to use
To use web-voice-assistant, either run it locally with Docker or build it from source. Once it is set up, you can interact with the assistant by speaking into a microphone or over a phone call, and it will respond with audio output.
Key features
Key features include real-time speech-to-text conversion using Whisper, integration with OpenAI’s language models for intelligent responses, and high-quality text-to-speech output using Coqui or Google TTS.
Where to use
Web-voice-assistant can be used in various fields such as customer service, virtual assistants, educational tools, and any application requiring voice interaction.
Clients Supporting MCP
The following are the main client applications that support the Model Context Protocol. Follow the links to their official websites for more information.
Content
High-Level Voice Bot Flow:
User speaks into mic / phone call
Voice stream → Speech-to-Text (STT)
Transcribed text → LLM (e.g., GPT or agentic model)
LLM response → Text-to-Speech (TTS)
Send synthesized voice stream back to user
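The flow above can be sketched as a single async pipeline. This is a minimal sketch, not the project's actual implementation: `stt`, `queryLLM`, and `tts` are hypothetical stand-ins for the real Whisper, LLM, and Coqui TTS calls.

```javascript
// Minimal sketch of the voice-bot pipeline; stt(), queryLLM(), and tts()
// are hypothetical stand-ins for the real Whisper, LLM, and TTS integrations.
async function handleVoiceTurn(audioIn, { stt, queryLLM, tts }) {
  const transcript = await stt(audioIn);    // 1. voice stream -> text
  const reply = await queryLLM(transcript); // 2. text -> LLM response
  const audioOut = await tts(reply);        // 3. response -> synthesized speech
  return { transcript, reply, audioOut };   // 4. stream audioOut back to user
}

// Example with stubbed dependencies:
const stubs = {
  stt: async () => "what is my balance",
  queryLLM: async (text) => `You asked: ${text}`,
  tts: async (text) => Buffer.from(text), // pretend synthesized audio
};

handleVoiceTurn(Buffer.from("fake-audio"), stubs).then((r) => {
  console.log(r.reply); // "You asked: what is my balance"
});
```

Keeping each stage behind a function boundary like this makes it easy to swap the STT, LLM, or TTS backend (e.g. OpenAI vs. a local Ollama model) without touching the rest of the pipeline.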
🎙️ Dockerized Voice Assistant (STT + OpenAI + TTS)
This project uses:
- 🗣️ Whisper (speech-to-text)
- 🤖 OpenAI GPT (agentic assistant for fintech)
- 🔊 Coqui TTS (text-to-speech)
- ⚡ Node.js Express server + HTML frontend
🚀 Run with Docker (UI only; for the full end-to-end setup, use Docker Compose as described below).
- cd web-va
- docker build -t voice-assistant .
- docker run -p 3000:3000 --name va voice-assistant
- Visit: http://localhost:3000

For a local build:
- npm install
- npm start
From the root folder, use Docker Compose to start both the project and Ollama (which runs the Ollama LLM image):
docker-compose up --build
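A minimal `docker-compose.yml` along these lines would wire the two services together; this is a sketch under assumed service names (the repo's actual file may differ), keeping the `ollama-va` container name used in the pull command below:

```yaml
services:
  web-va:
    build: ./web-va        # Node.js Express server + HTML frontend
    ports:
      - "3000:3000"
    depends_on:
      - ollama-va
  ollama-va:
    image: ollama/ollama   # local LLM server
    container_name: ollama-va
    ports:
      - "11434:11434"
```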
- Once the containers are up, run the command below to pull a model into the Ollama server running inside the Docker container:
docker exec -it ollama-va ollama pull llama3
Dev Tools Supporting MCP
The following are the main code editors that support the Model Context Protocol. Follow the links to their official websites for more information.