mcp-realtime-voice
mcp-realtime-voice is a Python library designed for real-time audio processing. It offers functionalities for speech recognition and synthesis, allowing integration with other applications via an API. It is particularly suitable for developing voice chat applications and voice assistants.
MCP Realtime Voice
Turn Claude into a voice assistant that listens and speaks. This MCP server handles speech recognition and text-to-speech for a natural voice interface.
⚠️ IMPORTANT:
- This has only been tested on macOS.
- Launch Claude from the terminal (see "Connecting to Claude" under Usage) instead of clicking the app icon. This fixes microphone permission issues.
Features
- Speech recognition with silence detection
- Text-to-speech for AI responses
- Voice Activity Detection using Silero
- Intended to work on Windows, macOS, and Linux (only macOS has been tested so far)
- Audio device management
- Simple voice conversation interface
Prerequisites
- Python 3.8+
- A microphone and speakers
- MCP client (Claude)
Installation
Clone this repo:
git clone https://github.com/yourusername/mcp-realtime-voice.git
cd mcp-realtime-voice
Set up a virtual environment:
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
System dependencies:
- Ubuntu/Debian:
sudo apt-get install portaudio19-dev
- macOS:
brew install portaudio
Usage
Connecting to Claude
Launch Claude from the terminal:
# On macOS
/Applications/Claude.app/Contents/MacOS/Claude
# On Windows
# Use the path to your Claude executable
start "" "C:\Path\to\Claude.exe"
Install the MCP server:
mcp install voice_server.py --name "Realtime Voice"
Or test with the MCP Inspector:
mcp dev voice_server.py
Available Tools
- list_audio_devices: Shows all audio input/output devices
- listen_for_speech: Records and transcribes speech
- speak_text: Converts text to spoken audio
- voice_mode: Starts interactive voice conversation
Voice Conversation Mode
To start:
- Connect the MCP server to Claude
- Ask Claude to "enter voice mode"
- Start talking. The system will:
  - Listen for your speech
  - Detect when you finish speaking
  - Send transcribed text to Claude
  - Speak Claude's response
To exit, just say "exit voice mode" or "stop voice mode".
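The conversation loop described above can be sketched in plain Python. This is only an illustration, not the code in voice_server.py: the names `run_voice_loop`, `is_exit_command`, and the `listen` / `respond` / `speak` callables are hypothetical stand-ins for the real recording, Claude, and text-to-speech steps, and matching the exit phrases by case-insensitive substring is an assumption.

```python
from typing import Callable, List, Tuple

# Exit phrases taken from the README; how the server matches them is assumed.
EXIT_PHRASES = ("exit voice mode", "stop voice mode")


def is_exit_command(transcript: str) -> bool:
    """Return True if the transcript contains one of the exit phrases."""
    normalized = transcript.lower().strip().strip(".!?")
    return any(phrase in normalized for phrase in EXIT_PHRASES)


def run_voice_loop(
    listen: Callable[[], str],       # hypothetical: records and transcribes one utterance
    respond: Callable[[str], str],   # hypothetical: gets the assistant's reply
    speak: Callable[[str], None],    # hypothetical: plays the reply as audio
    max_turns: int = 100,
) -> List[Tuple[str, str]]:
    """Run listen -> respond -> speak turns until an exit phrase is heard."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        heard = listen()
        if is_exit_command(heard):
            break  # leave voice mode without replying
        reply = respond(heard)
        speak(reply)
        history.append((heard, reply))
    return history
```

Passing the three steps in as callables keeps the loop testable without a microphone: fakes that return canned strings are enough to exercise the exit handling.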
Configuration
Edit these values in voice_server.py if needed:
- VAD_THRESHOLD: Voice detection sensitivity (default: 0.2)
- SILENCE_DURATION: Seconds of silence before recording stops (default: 3)
- Audio sample rate and format settings
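To show how the two settings interact, here is a minimal sketch of silence-based end-of-speech detection. It is not the logic in voice_server.py: the function `find_end_of_speech`, the `frame_seconds` parameter, and the choice to treat a probability at or above the threshold as speech are assumptions made for illustration. The per-frame speech probabilities would come from a VAD model such as Silero; here they are plain floats.

```python
from typing import Iterable, Optional

VAD_THRESHOLD = 0.2      # default from the README
SILENCE_DURATION = 3.0   # seconds, default from the README


def find_end_of_speech(
    speech_probs: Iterable[float],
    frame_seconds: float,                    # hypothetical: duration of one VAD frame
    threshold: float = VAD_THRESHOLD,
    silence_duration: float = SILENCE_DURATION,
) -> Optional[int]:
    """Return the index of the frame where recording would stop, or None.

    Silence before the first speech frame is ignored, so recording does not
    stop before the user has said anything.
    """
    speech_started = False
    silent_frames = 0
    for index, prob in enumerate(speech_probs):
        if prob >= threshold:
            # Speech frame: reset the silence counter.
            speech_started = True
            silent_frames = 0
        elif speech_started:
            silent_frames += 1
            if silent_frames * frame_seconds >= silence_duration:
                return index
    return None
```

Under these assumptions, lowering VAD_THRESHOLD makes quiet or noisy frames count as speech, which keeps the recording open longer, while lowering SILENCE_DURATION ends the recording sooner after the last speech frame.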