For Agents¶
Reference for AI agents installing, maintaining, or extending ClawMux. If you're a human, see the human guide.
System Requirements & Compatibility Check¶
Before installing, verify the target system meets these requirements. Run these checks:
GPU¶
- Required: NVIDIA GPU with at least 4 GB VRAM
- Tested on: RTX 3090 (24 GB)
- CUDA must be available (`nvidia-smi` should work)
VRAM Budget¶
| Service | VRAM | RAM | Notes |
|---|---|---|---|
| Whisper STT | ~640 MB | ~360 MB | whisper.cpp with CUDA, base model |
| Kokoro TTS | ~2 GB | ~3 GB | kokoro-fastapi with GPU inference |
| Total | ~3 GB | ~3.5 GB | Plus whatever else is running |
OS¶
- Linux required (tested on Ubuntu 24.04)
- Python 3.10+
- tmux installed (`which tmux`)
Claude Code¶
Must be installed and authenticated. The hub spawns Claude sessions with `claude --dangerously-skip-permissions`.
Tailscale (optional, for remote access)¶
Needed if the user wants to access from a phone, laptop, or another machine. Not required for localhost use.
Whisper STT¶
Check whether a Whisper server is already running on port 2022. If not, install one via VoiceMode, or run any OpenAI-compatible STT server on port 2022.
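A quick reachability probe works as a check; the `/health` path is an assumption (any HTTP 2xx on port 2022 is enough to show the server is up):

```shell
# Probe the Whisper STT port (2022). The /health path is an assumption;
# a connection refusal means the service is not running.
if curl -sf --max-time 2 http://127.0.0.1:2022/health >/dev/null 2>&1; then
  echo "whisper: running"
else
  echo "whisper: not running"
fi
```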
Kokoro TTS¶
Check whether Kokoro is already running on port 8880. If not, install kokoro-fastapi, or run any OpenAI-compatible TTS server on port 8880.
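The same probe pattern works here; `/v1/models` assumes an OpenAI-compatible server and may need adjusting:

```shell
# Probe the Kokoro TTS port (8880). The /v1/models path assumes an
# OpenAI-compatible server; connection refusal means it is not running.
if curl -sf --max-time 2 http://127.0.0.1:8880/v1/models >/dev/null 2>&1; then
  echo "kokoro: running"
else
  echo "kokoro: not running"
fi
```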
Installation¶
```shell
git clone https://github.com/zeulewan/voice-hub.git
cd voice-hub
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Register MCP Server¶
```shell
claude mcp add -s user voice-hub -- /path/to/voice-hub/.venv/bin/python /path/to/voice-hub/mcp_server.py
```
Install Slash Commands¶
```shell
mkdir -p ~/.claude/commands
cp .claude/commands/voice-hub.md ~/.claude/commands/voice-hub.md
```
Tailscale HTTPS (for remote access)¶
Enable Tailscale HTTPS on the hub machine, then access the hub at https://<hostname>.ts.net:3460.
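One way to serve the hub over HTTPS on the tailnet is `tailscale serve`; the flags below assume a recent Tailscale CLI, so verify against `tailscale serve --help`:

```shell
# Proxy tailnet HTTPS on port 3460 to the local hub.
# Flag syntax varies across Tailscale versions -- check locally.
tailscale serve --bg --https=3460 http://127.0.0.1:3460
```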
Start the Hub¶
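The start command did not survive extraction; given that `hub.py` is the main service, a likely invocation (an assumption, not the project's documented command) is:

```shell
cd /path/to/voice-hub
source .venv/bin/activate
# Run in the background, logging where the Debugging section expects.
nohup python hub.py >> /tmp/voice-hub.log 2>&1 &
```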
File Map¶
```
voice-hub/
├── hub.py                 # Main service — FastAPI, REST API, browser WS, MCP WS, TTS/STT
├── hub_config.py          # Constants — ports, timeouts, voice list, service URLs
├── hub_mcp_server.py      # Thin MCP server — runs inside each Claude session, proxies converse() to hub
├── session_manager.py     # Session lifecycle — tmux spawn/kill, temp dirs, health checks, timeout loop
├── history_store.py       # Per-voice persistent message history (JSON files in data/history/)
├── mcp_server.py          # Legacy single-session MCP server (not used by hub)
├── static/
│   ├── hub.html           # Hub browser UI — single file (HTML + CSS + JS)
│   └── index.html         # Legacy single-session browser UI
├── data/
│   └── history/           # Per-voice JSON history files (gitignored)
│       ├── af_sky.json
│       └── ...
├── docs/
│   ├── agents/
│   │   ├── index.md       # Agent docs landing page
│   │   ├── web-dev.md     # Web development entry point
│   │   ├── ios-dev.md     # iOS development entry point
│   │   └── reference/
│   │       ├── agent-reference.md  # This file
│   │       ├── protocol.md         # WebSocket protocol reference
│   │       ├── ui-behavior.md      # UI behavior reference
│   │       ├── hub.md              # Hub architecture
│   │       └── orchestration.md    # Sub-agent orchestration details
│   ├── humans/
│   │   └── index.md       # Human-friendly guide
│   └── roadmap/
│       ├── v0.3.0.md      # Current release
│       └── v0.4.0.md      # Next release
└── .claude/
    ├── commands/
    │   └── voice-hub.md   # Slash command for direct voice mode
    └── skills/voice-hub/skill.md
```
Core Flow¶
Session Spawn (session_manager.py:spawn_session)¶
- Allocate a unique session ID: `voice-{counter}-{uuid6}`
- Pick the next unused voice from `hub_config.VOICES`
- Create a temp dir at `/tmp/voice-hub-sessions/{session_id}/`
- Write `.mcp.json` with `VOICE_HUB_SESSION_ID` and `VOICE_CHAT_HUB_PORT` env vars
- Write `CLAUDE.md` with agent name, greeting, and conversation history from `history_store`
- `tmux new-session` starting in the temp dir
- `tmux send-keys` to launch `claude --dangerously-skip-permissions`
- Wait 10 s, then send the `/voice-hub` slash command
- Poll for MCP WebSocket connection (45 s timeout)
- Session status → `ready`
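The tmux portion of the steps above can be sketched as shell commands; the session ID is a placeholder and the exact keystrokes live in `session_manager.py`:

```shell
# Sketch of the spawn sequence (timing and names mirror the list above).
SID="voice-1-abc123"                      # voice-{counter}-{uuid6}
DIR="/tmp/voice-hub-sessions/$SID"
mkdir -p "$DIR"                           # temp dir holding .mcp.json + CLAUDE.md
tmux new-session -d -s "$SID" -c "$DIR"   # detached session starting in the temp dir
tmux send-keys -t "$SID" 'claude --dangerously-skip-permissions' Enter
sleep 10                                  # give Claude time to start
tmux send-keys -t "$SID" '/voice-hub' Enter
```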
Converse Flow (hub.py:handle_converse)¶
- Receive `{"type": "converse", "message": "...", "wait_for_response": true}` from MCP WS
- Send `assistant_text` to browser (for chat transcript) and persist to `history_store`
- TTS via Kokoro (`hub.py:tts`) using the session's voice and speed
- Send base64 MP3 to browser tagged with `session_id`
- Wait for `playback_done` from browser (via `session.playback_done` asyncio.Event)
- If `wait_for_response=false`: send `session_ended`, return
- Send `listening` to browser
- Wait for audio from browser (via `session.audio_queue`)
- If empty audio (muted): return `"(session muted)"`
- STT via Whisper (`hub.py:stt`)
- Send `user_text` to browser, return text to MCP server
Browser State Machine (static/hub.html)¶
Key JS state variables:
- `sessions` (Map) — session_id → `{label, status, voice, speed, messages[], audioBuffer[], pausedAudio, pendingListen, hasUnread}`
- `activeSessionId` — currently visible session (null = voice grid)
- `recording` / `recordingSessionId` — mic state
- `currentAudio` — `{audio, sessionId, url}` for playing audio
- `currentBufferedPlayer` — active buffered playback chain
- `persistentStream` — reusable MediaStream (acquired once)
- `autoMode` / `vadEnabled` / `micMuted` — toggle states
Main button states managed by `updateMicUI()`:
- Playing → "Interrupt" (orange)
- Recording → "Send" (green) + Cancel visible
- Idle → "Record" (blue)
WebSocket Endpoints¶
| Endpoint | Who connects | Purpose |
|---|---|---|
| `GET /ws` | Browser (single connection) | All browser ↔ hub communication |
| `GET /mcp/{session_id}` | `hub_mcp_server.py` instances | Per-session MCP ↔ hub communication |
REST Endpoints¶
| Method | Path | Body | Purpose |
|---|---|---|---|
| `GET` | `/api/sessions` | — | List all sessions |
| `POST` | `/api/sessions` | `{"voice": "am_adam"}` | Spawn session (voice optional) |
| `DELETE` | `/api/sessions/{id}` | — | Terminate session |
| `PUT` | `/api/sessions/{id}/voice` | `{"voice": "am_adam"}` | Change voice |
| `PUT` | `/api/sessions/{id}/speed` | `{"speed": 1.5}` | Change TTS speed |
| `GET` | `/api/history/{voice_id}` | — | Get per-voice message history |
| `DELETE` | `/api/history/{voice_id}` | — | Clear per-voice message history |
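The endpoints can be exercised with curl; the port assumes the default `HUB_PORT` of 3460, and the session ID is a placeholder:

```shell
# Spawn a session with a specific voice (voice is optional).
curl -s -X POST http://127.0.0.1:3460/api/sessions \
  -H 'Content-Type: application/json' -d '{"voice": "am_adam"}'

# Change the session's TTS speed.
curl -s -X PUT http://127.0.0.1:3460/api/sessions/voice-1-abc123/speed \
  -H 'Content-Type: application/json' -d '{"speed": 1.5}'

# Terminate it.
curl -s -X DELETE http://127.0.0.1:3460/api/sessions/voice-1-abc123
```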
Per-Session Bridge State (session_manager.py:Session)¶
Each session has asyncio primitives for hub ↔ browser synchronization:
- `audio_queue` (asyncio.Queue) — browser sends recorded audio here
- `playback_done` (asyncio.Event) — set when browser signals playback finished
- `mcp_ws` — WebSocket connection to hub_mcp_server.py
Config (hub_config.py)¶
| Constant | Default | Env var | Purpose |
|---|---|---|---|
| `HUB_PORT` | 3460 | `VOICE_CHAT_HUB_PORT` | Hub listen port |
| `WHISPER_URL` | `http://127.0.0.1:2022` | `VOICE_CHAT_WHISPER_URL` | Whisper STT endpoint |
| `KOKORO_URL` | `http://127.0.0.1:8880` | `VOICE_CHAT_KOKORO_URL` | Kokoro TTS endpoint |
| `SESSION_TIMEOUT_MINUTES` | 30 | `VOICE_CHAT_TIMEOUT` | Idle session timeout |
| `VOICES` | 7 entries | — | Voice rotation list |
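Each constant with an env var can be overridden in the environment before launching the hub, for example:

```shell
# Override defaults via environment variables before starting hub.py.
export VOICE_CHAT_HUB_PORT=3461
export VOICE_CHAT_WHISPER_URL=http://127.0.0.1:2022
export VOICE_CHAT_TIMEOUT=60   # idle timeout in minutes
```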
Debugging¶
```shell
# Hub logs (all sessions)
tail -f /tmp/voice-hub.log

# MCP server logs (all sessions share one log)
tail -f /tmp/voice-hub-mcp.log

# List tmux sessions
tmux ls

# Attach to a session to see Claude's output
tmux attach -t voice-1-abc123

# Check session temp dirs
ls /tmp/voice-hub-sessions/

# Kill a stuck session manually
tmux kill-session -t voice-1-abc123
```