mcp-realtime-voice

Author: Helw150

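mcp-realtime-voice is a Python library for processing audio in real time. It provides speech recognition and speech synthesis, and it can integrate with other applications through an API. It is particularly well suited to building voice chat and voice assistant applications.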

README

MCP Realtime Voice

Turn Claude into a voice assistant that listens and speaks. This MCP server handles speech recognition and text-to-speech for a natural voice interface.

⚠️ IMPORTANT:

This has only been tested on macOS.

Launch Claude from the terminal (see the command under "Connecting to Claude") instead of clicking the app icon. This fixes microphone permission issues.

Features

Speech recognition with silence detection
Text-to-speech for AI responses
Voice Activity Detection using Silero
Intended to work on Windows, macOS, and Linux (only macOS has been tested so far)
Audio device management
Simple voice conversation interface

Prerequisites

Python 3.8+
A microphone and speakers
MCP client (Claude)

Installation

Clone this repo:

git clone https://github.com/yourusername/mcp-realtime-voice.git
cd mcp-realtime-voice

Set up a virtual environment:

python -m venv venv

# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```
System dependencies:
- Ubuntu/Debian: sudo apt-get install portaudio19-dev
- macOS: brew install portaudio

Usage

Connecting to Claude

Launch Claude from the terminal:

# On macOS
/Applications/Claude.app/Contents/MacOS/Claude

# On Windows
# Use the path to your Claude executable
start "" "C:\Path\to\Claude.exe"

Install the MCP server:

mcp install voice_server.py --name "Realtime Voice"

Or test with the MCP Inspector:

mcp dev voice_server.py
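
If `mcp install` is not an option, an MCP server can also be registered by hand in the client's configuration. The sketch below builds one such entry for the `mcpServers` section of Claude Desktop's config file. It is an illustration under assumptions, not the layout this project generates: the helper name `build_server_entry`, the project path, and the choice to start the server through the virtual environment's `mcp run` command (macOS/Linux layout) are all hypothetical.

```python
import json
from pathlib import PurePosixPath


def build_server_entry(project_dir: str, name: str = "Realtime Voice") -> dict:
    """Build a hypothetical `mcpServers` entry for a manually registered server.

    Assumes the virtual environment from the installation steps lives in
    `<project_dir>/venv` and that the `mcp` CLI is installed inside it.
    """
    project = PurePosixPath(project_dir)
    return {
        "mcpServers": {
            name: {
                # Assumption: launch through the venv's `mcp` CLI.
                "command": str(project / "venv" / "bin" / "mcp"),
                "args": ["run", str(project / "voice_server.py")],
            }
        }
    }


# Placeholder path; replace it with the directory you cloned the repo into.
print(json.dumps(build_server_entry("/path/to/mcp-realtime-voice"), indent=2))
```

The printed JSON would be merged into the existing config file rather than replacing it, so that other registered servers are preserved.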

Available Tools

list_audio_devices: Shows all audio input/output devices
listen_for_speech: Records and transcribes speech
speak_text: Converts text to spoken audio
voice_mode: Starts interactive voice conversation

Voice Conversation Mode

To start:

Connect the MCP server to Claude
Ask Claude to "enter voice mode"
Start talking - the system will:
- Listen for your speech
- Detect when you finish speaking
- Send transcribed text to Claude
- Speak Claude's response

To exit, just say "exit voice mode" or "stop voice mode".
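
The conversation flow described above can be sketched as a simple loop. This is a minimal illustration, not the code in voice_server.py: `run_voice_loop`, `is_exit_phrase`, and the three callables passed in are hypothetical stand-ins for the recording, Claude, and text-to-speech steps.

```python
from typing import Callable

# Exit phrases taken from the README text.
EXIT_PHRASES = ("exit voice mode", "stop voice mode")


def is_exit_phrase(transcript: str) -> bool:
    """Return True when the transcript contains one of the exit phrases."""
    normalized = " ".join(transcript.lower().split())
    return any(phrase in normalized for phrase in EXIT_PHRASES)


def run_voice_loop(
    listen: Callable[[], str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Repeat listen -> respond -> speak until an exit phrase is heard.

    Returns the number of completed exchanges.
    """
    turns = 0
    while True:
        transcript = listen()
        if is_exit_phrase(transcript):
            return turns
        if not transcript.strip():
            continue  # nothing was recognised, so listen again
        speak(respond(transcript))
        turns += 1


# Usage with canned input in place of a microphone and a speaker.
canned = iter(["Hello", "", "Please exit voice mode."])
spoken = []
completed = run_voice_loop(lambda: next(canned), lambda t: "echo: " + t, spoken.append)
print(completed, spoken)
```

Passing the three steps in as callables keeps the loop testable without audio hardware.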

Configuration

Edit these values in voice_server.py if needed:

VAD_THRESHOLD: Voice detection sensitivity (default: 0.2)
SILENCE_DURATION: Seconds of silence before recording stops (default: 3)
Audio sample rate and format settings
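
To show how the two values interact, here is a minimal sketch of threshold-based end-of-speech detection. It assumes the VAD yields one speech probability per audio frame and that silence before the first spoken frame is ignored. The function `find_stop_frame` and its `frame_seconds` parameter are hypothetical and may not match how voice_server.py is written.

```python
from typing import Iterable, Optional

VAD_THRESHOLD = 0.2     # default from the README
SILENCE_DURATION = 3.0  # seconds, default from the README


def find_stop_frame(
    speech_probs: Iterable[float],
    frame_seconds: float,
    threshold: float = VAD_THRESHOLD,
    silence_duration: float = SILENCE_DURATION,
) -> Optional[int]:
    """Return the index of the frame at which recording would stop, or None.

    A frame counts as speech when its probability is >= threshold.
    Recording stops once speech has been heard and the trailing run of
    non-speech frames lasts at least silence_duration seconds.
    """
    heard_speech = False
    silent_frames = 0
    for index, prob in enumerate(speech_probs):
        if prob >= threshold:
            heard_speech = True
            silent_frames = 0  # speech resets the silence counter
        elif heard_speech:
            silent_frames += 1
            if silent_frames * frame_seconds >= silence_duration:
                return index
    return None


# One quiet frame, two speech frames, then six quiet frames of 0.5 s each.
probs = [0.05, 0.9, 0.8] + [0.1] * 6
print(find_stop_frame(probs, frame_seconds=0.5))
```

In this sketch, lowering the threshold makes more frames count as speech, and raising the silence duration makes the recorder wait longer before stopping.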
