For Agents¶
Reference for AI agents installing, maintaining, or extending ClawMux. If you're a human, see the human guide.
System Requirements & Compatibility Check¶
Before installing, verify the target system meets these requirements. Run these checks:
GPU¶
- Required: NVIDIA GPU with at least 4 GB VRAM
- Tested on: RTX 3090 (24 GB)
- CUDA must be available (`nvidia-smi` should work)
VRAM Budget¶
| Service | VRAM | RAM | Notes |
|---|---|---|---|
| Whisper STT | ~640 MB | ~360 MB | whisper.cpp with CUDA, base model |
| Kokoro TTS | ~2 GB | ~3 GB | kokoro-fastapi with GPU inference |
| Total | ~3 GB | ~3.5 GB | Plus whatever else is running |
OS¶
- Linux required (tested on Ubuntu 24.04)
- Python 3.10+
- tmux installed (`which tmux`)
Claude Code¶
Must be installed and authenticated. The hub spawns Claude sessions with `claude --dangerously-skip-permissions`.
Tailscale (optional, for remote access)¶
Needed if the user wants to access from a phone, laptop, or another machine. Not required for localhost use.
Whisper STT¶
Check whether a Whisper server is already running on port 2022. If not, install one via VoiceMode, or run any OpenAI-compatible STT server on port 2022.
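A quick reachability probe works as a check; the `/health` path is an assumption (any HTTP 2xx on port 2022 is enough to show the server is up):

```shell
# Probe the Whisper STT port (2022). The /health path is an assumption;
# a connection refusal means the service is not running.
if curl -sf --max-time 2 http://127.0.0.1:2022/health >/dev/null 2>&1; then
  echo "whisper: running"
else
  echo "whisper: not running"
fi
```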
Kokoro TTS¶
Check whether Kokoro is already running on port 8880. If not, install kokoro-fastapi, or run any OpenAI-compatible TTS server on port 8880.
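The same probe pattern works here; `/v1/models` assumes an OpenAI-compatible server and may need adjusting:

```shell
# Probe the Kokoro TTS port (8880). The /v1/models path assumes an
# OpenAI-compatible server; connection refusal means it is not running.
if curl -sf --max-time 2 http://127.0.0.1:8880/v1/models >/dev/null 2>&1; then
  echo "kokoro: running"
else
  echo "kokoro: not running"
fi
```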
Installation¶
```shell
git clone https://github.com/zeulewan/voice-hub.git
cd voice-hub
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Register MCP Server¶
```shell
claude mcp add -s user voice-hub -- /path/to/voice-hub/.venv/bin/python /path/to/voice-hub/mcp_server.py
```
Install Slash Commands¶
```shell
mkdir -p ~/.claude/commands
cp .claude/commands/voice-hub.md ~/.claude/commands/voice-hub.md
```
Tailscale HTTPS (for remote access)¶
Enable Tailscale HTTPS on the hub machine, then access the hub at https://<hostname>.ts.net:3460.
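One way to serve the hub over HTTPS on the tailnet is `tailscale serve`; the flags below assume a recent Tailscale CLI, so verify against `tailscale serve --help`:

```shell
# Proxy tailnet HTTPS on port 3460 to the local hub.
# Flag syntax varies across Tailscale versions -- check locally.
tailscale serve --bg --https=3460 http://127.0.0.1:3460
```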
Start the Hub¶
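The start command did not survive extraction; given that `hub.py` is the main service, a likely invocation (an assumption, not the project's documented command) is:

```shell
cd /path/to/voice-hub
source .venv/bin/activate
# Run in the background, logging where the Debugging section expects.
nohup python hub.py >> /tmp/voice-hub.log 2>&1 &
```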
File Map¶
```
voice-hub/
├── hub.py                 # Main service — FastAPI, REST API, browser WS, MCP WS, TTS/STT
├── hub_config.py          # Constants — ports, timeouts, voice list, service URLs
├── hub_mcp_server.py      # Thin MCP server — runs inside each Claude session, proxies converse() to hub
├── session_manager.py     # Session lifecycle — tmux spawn/kill, temp dirs, health checks, timeout loop
├── history_store.py       # Per-voice persistent message history (JSON files in data/history/)
├── mcp_server.py          # Legacy single-session MCP server (not used by hub)
├── static/
│   ├── hub.html           # Hub browser UI — single file (HTML + CSS + JS)
│   └── index.html         # Legacy single-session browser UI
├── data/
│   └── history/           # Per-voice JSON history files (gitignored)
│       ├── af_sky.json
│       └── ...
├── docs/
│   ├── agents/
│   │   ├── index.md       # Agent docs landing page
│   │   ├── web-dev.md     # Web development entry point
│   │   ├── ios-dev.md     # iOS development entry point
│   │   └── reference/
│   │       ├── agent-reference.md  # This file
│   │       ├── protocol.md         # WebSocket protocol reference
│   │       ├── ui-behavior.md      # UI behavior reference
│   │       ├── hub.md              # Hub architecture
│   │       └── orchestration.md    # Sub-agent orchestration details
│   ├── humans/
│   │   └── index.md       # Human-friendly guide
│   └── roadmap/
│       ├── v0.3.0.md      # Current release
│       └── v0.4.0.md      # Next release
└── .claude/
    ├── commands/
    │   └── voice-hub.md   # Slash command for direct voice mode
    └── skills/voice-hub/skill.md
```
Core Flow¶
Session Spawn (session_manager.py:spawn_session)¶
- Allocate a unique session ID: `voice-{counter}-{uuid6}`
- Pick the next unused voice from `hub_config.VOICES`
- Create a temp dir at `/tmp/voice-hub-sessions/{session_id}/`
- Write `.mcp.json` with `VOICE_HUB_SESSION_ID` and `VOICE_CHAT_HUB_PORT` env vars
- Write `CLAUDE.md` with agent name, greeting, and conversation history from `history_store`
- `tmux new-session` starting in the temp dir
- `tmux send-keys` to launch `claude --dangerously-skip-permissions`
- Wait 10 s, then send the `/voice-hub` slash command
- Poll for MCP WebSocket connection (45 s timeout)
- Session status → `ready`
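The tmux portion of the steps above can be sketched as shell commands; the session ID is a placeholder and the exact keystrokes live in `session_manager.py`:

```shell
# Sketch of the spawn sequence (timing and names mirror the list above).
SID="voice-1-abc123"                      # voice-{counter}-{uuid6}
DIR="/tmp/voice-hub-sessions/$SID"
mkdir -p "$DIR"                           # temp dir holding .mcp.json + CLAUDE.md
tmux new-session -d -s "$SID" -c "$DIR"   # detached session starting in the temp dir
tmux send-keys -t "$SID" 'claude --dangerously-skip-permissions' Enter
sleep 10                                  # give Claude time to start
tmux send-keys -t "$SID" '/voice-hub' Enter
```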
Converse Flow (hub.py:handle_converse)¶
- Receive `{"type": "converse", "message": "...", "wait_for_response": true}` from MCP WS
- Send `assistant_text` to browser (for chat transcript) and persist to `history_store`
- TTS via Kokoro (`hub.py:tts`) using the session's voice and speed
- Send base64 MP3 to browser tagged with `session_id`
- Wait for `playback_done` from browser (via `session.playback_done` asyncio.Event)
- If `wait_for_response=false`: send `session_ended`, return
- Send `listening` to browser
- Wait for audio from browser (via `session.audio_queue`)
- If empty audio (muted): return `"(session muted)"`
- STT via Whisper (`hub.py:stt`)
- Send `user_text` to browser, return text to MCP server
Browser State Machine (static/hub.html)¶
Key JS state variables:
- `sessions` (Map) — session_id → `{label, status, voice, speed, messages[], audioBuffer[], pausedAudio, pendingListen, hasUnread}`
- `activeSessionId` — currently visible session (null = voice grid)
- `recording` / `recordingSessionId` — mic state
- `currentAudio` — `{audio, sessionId, url}` for playing audio
- `currentBufferedPlayer` — active buffered playback chain
- `persistentStream` — reusable MediaStream (acquired once)
- `autoMode` / `vadEnabled` / `micMuted` — toggle states
Main button states managed by `updateMicUI()`:
- Playing → "Interrupt" (orange)
- Recording → "Send" (green) + Cancel visible
- Idle → "Record" (blue)
WebSocket Endpoints¶
| Endpoint | Who connects | Purpose |
|---|---|---|
| `GET /ws` | Browser (single connection) | All browser ↔ hub communication |
| `GET /mcp/{session_id}` | `hub_mcp_server.py` instances | Per-session MCP ↔ hub communication |
REST Endpoints¶
| Method | Path | Body | Purpose |
|---|---|---|---|
| `GET` | `/api/sessions` | — | List all sessions |
| `POST` | `/api/sessions` | `{"voice": "am_adam"}` | Spawn session (voice optional) |
| `DELETE` | `/api/sessions/{id}` | — | Terminate session |
| `PUT` | `/api/sessions/{id}/voice` | `{"voice": "am_adam"}` | Change voice |
| `PUT` | `/api/sessions/{id}/speed` | `{"speed": 1.5}` | Change TTS speed |
| `GET` | `/api/history/{voice_id}` | — | Get per-voice message history |
| `DELETE` | `/api/history/{voice_id}` | — | Clear per-voice message history |
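The endpoints can be exercised with curl; the port assumes the default `HUB_PORT` of 3460, and the session ID is a placeholder:

```shell
# Spawn a session with a specific voice (voice is optional).
curl -s -X POST http://127.0.0.1:3460/api/sessions \
  -H 'Content-Type: application/json' -d '{"voice": "am_adam"}'

# Change the session's TTS speed.
curl -s -X PUT http://127.0.0.1:3460/api/sessions/voice-1-abc123/speed \
  -H 'Content-Type: application/json' -d '{"speed": 1.5}'

# Terminate it.
curl -s -X DELETE http://127.0.0.1:3460/api/sessions/voice-1-abc123
```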
Per-Session Bridge State (session_manager.py:Session)¶
Each session has asyncio primitives for hub ↔ browser synchronization:
- `audio_queue` (asyncio.Queue) — browser sends recorded audio here
- `playback_done` (asyncio.Event) — set when browser signals playback finished
- `mcp_ws` — WebSocket connection to hub_mcp_server.py
Config (hub_config.py)¶
| Constant | Default | Env var | Purpose |
|---|---|---|---|
| `HUB_PORT` | 3460 | `VOICE_CHAT_HUB_PORT` | Hub listen port |
| `WHISPER_URL` | `http://127.0.0.1:2022` | `VOICE_CHAT_WHISPER_URL` | Whisper STT endpoint |
| `KOKORO_URL` | `http://127.0.0.1:8880` | `VOICE_CHAT_KOKORO_URL` | Kokoro TTS endpoint |
| `SESSION_TIMEOUT_MINUTES` | 30 | `VOICE_CHAT_TIMEOUT` | Idle session timeout |
| `VOICES` | 7 entries | — | Voice rotation list |
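Each constant with an env var can be overridden in the environment before launching the hub, for example:

```shell
# Override defaults via environment variables before starting hub.py.
export VOICE_CHAT_HUB_PORT=3461
export VOICE_CHAT_WHISPER_URL=http://127.0.0.1:2022
export VOICE_CHAT_TIMEOUT=60   # idle timeout in minutes
```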
Debugging¶
```shell
# Hub logs (all sessions)
tail -f /tmp/voice-hub.log

# MCP server logs (all sessions share one log)
tail -f /tmp/voice-hub-mcp.log

# List tmux sessions
tmux ls

# Attach to a session to see Claude's output
tmux attach -t voice-1-abc123

# Check session temp dirs
ls /tmp/voice-hub-sessions/

# Kill a stuck session manually
tmux kill-session -t voice-1-abc123
```