mcp-realtime-voice
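mcp-realtime-voice is a Python library for processing audio in real time. It provides speech recognition and speech synthesis, and can be integrated with other applications through its API. It is particularly well suited to building voice chat and voice assistant applications.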
README
MCP Realtime Voice
Turn Claude into a voice assistant that listens and speaks. This MCP server handles speech recognition and text-to-speech for a natural voice interface.
⚠️ IMPORTANT:
- This has only been tested on macOS.
- Launch Claude from the terminal, using the command under "Connecting to Claude", instead of clicking the app icon. This works around microphone permission issues.
Features
- Speech recognition with silence detection
- Text-to-speech for AI responses
- Voice Activity Detection using Silero
- Intended to run on Windows, macOS, and Linux (only macOS has been tested so far)
- Audio device management
- Simple voice conversation interface
Prerequisites
- Python 3.8+
- A microphone and speakers
- MCP client (Claude)
Installation
Clone this repo:
git clone https://github.com/yourusername/mcp-realtime-voice.git
cd mcp-realtime-voice
Set up a virtual environment:
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
System dependencies:
- Ubuntu/Debian:
sudo apt-get install portaudio19-dev
- macOS:
brew install portaudio
Usage
Connecting to Claude
Launch Claude from the terminal:
# On macOS
/Applications/Claude.app/Contents/MacOS/Claude
# On Windows
# Use the path to your Claude executable
start "" "C:\Path\to\Claude.exe"
Install the MCP server:
mcp install voice_server.py --name "Realtime Voice"
Or test with the MCP Inspector:
mcp dev voice_server.py
Available Tools
- list_audio_devices: Shows all audio input/output devices
- listen_for_speech: Records and transcribes speech
- speak_text: Converts text to spoken audio
- voice_mode: Starts interactive voice conversation
Voice Conversation Mode
To start:
- Connect the MCP server to Claude
- Ask Claude to "enter voice mode"
- Start talking - the system will:
  - Listen for your speech
  - Detect when you finish speaking
  - Send transcribed text to Claude
  - Speak Claude's response
To exit, just say "exit voice mode" or "stop voice mode".
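The listen, transcribe, respond, speak cycle described above can be sketched as a small loop. This is only an illustration, not the code in voice_server.py: the function names `run_voice_loop` and `is_exit_command`, the `max_attempts` limit, and the idea of passing the listen/respond/speak steps in as callables are all assumptions made so the example runs without a microphone.

```python
from typing import Callable, Iterator, List

# Exit phrases taken from the README; matching here is case-insensitive.
EXIT_PHRASES = ("exit voice mode", "stop voice mode")


def is_exit_command(transcript: str) -> bool:
    """Return True if the transcript contains one of the exit phrases."""
    text = transcript.strip().lower()
    return any(phrase in text for phrase in EXIT_PHRASES)


def run_voice_loop(
    listen: Callable[[], str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    max_attempts: int = 100,  # hypothetical safety limit, not a real setting
) -> int:
    """Run listen -> respond -> speak until an exit phrase is heard.

    Returns the number of completed conversation turns.
    """
    turns = 0
    for _ in range(max_attempts):
        transcript = listen()
        if not transcript.strip():
            continue  # nothing was recognized; listen again
        if is_exit_command(transcript):
            break
        speak(respond(transcript))
        turns += 1
    return turns


if __name__ == "__main__":
    # Scripted stand-ins for the microphone, the assistant, and the speaker.
    scripted: Iterator[str] = iter(["hello", "", "Exit voice mode."])
    spoken: List[str] = []
    count = run_voice_loop(
        listen=lambda: next(scripted),
        respond=lambda text: f"You said: {text}",
        speak=spoken.append,
    )
    print(count, spoken)
```

In a real session, `listen` would wrap something like the listen_for_speech tool and `speak` the speak_text tool, with Claude producing the response in between.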
Configuration
Edit these values in voice_server.py if needed:
- VAD_THRESHOLD: Voice detection sensitivity (default: 0.2)
- SILENCE_DURATION: Seconds of silence before recording stops (default: 3)
- Audio sample rate and format settings
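To make the two settings concrete, here is a minimal sketch of how a speech-probability threshold and a silence timeout can work together to decide when a recording ends. It is not the logic from voice_server.py. It assumes the voice activity detector yields one speech probability between 0 and 1 per audio frame, and that silence only counts once some speech has been heard. The function name `frames_until_stop` and the `frame_seconds` parameter are hypothetical.

```python
import math
from typing import Iterable

VAD_THRESHOLD = 0.2     # README default: probabilities at or above this count as speech
SILENCE_DURATION = 3.0  # README default: seconds of silence before recording stops


def frames_until_stop(
    speech_probs: Iterable[float],
    frame_seconds: float,
    threshold: float = VAD_THRESHOLD,
    silence_duration: float = SILENCE_DURATION,
) -> int:
    """Return how many frames are consumed before recording would stop.

    Recording stops once speech has been heard at least once and the frames
    that follow stay below `threshold` for `silence_duration` seconds.
    If that never happens, every frame is consumed.
    """
    # Number of consecutive silent frames that add up to the silence timeout.
    silent_frames_needed = math.ceil(silence_duration / frame_seconds)
    heard_speech = False
    silent_frames = 0
    consumed = 0
    for prob in speech_probs:
        consumed += 1
        if prob >= threshold:
            heard_speech = True
            silent_frames = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_frames += 1
            if silent_frames >= silent_frames_needed:
                break
    return consumed


if __name__ == "__main__":
    # Two quiet frames, two speech frames, then sustained silence (0.5 s frames).
    probs = [0.05, 0.05, 0.9, 0.8] + [0.1] * 10
    print(frames_until_stop(probs, frame_seconds=0.5))
```

Under these assumptions, lowering VAD_THRESHOLD makes quieter sounds count as speech, and lowering SILENCE_DURATION makes the recording end sooner after you stop talking.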