WebSocket Protocol¶

Complete message reference for browser and MCP clients connecting to the hub.

Endpoints¶

Endpoint	Who connects	Purpose
`ws(s)://{host}:{port}/ws`	Browser (single connection, last-wins)	All browser-hub communication
`ws://{host}:{port}/mcp/{session_id}`	`hub_mcp_server.py` (one per session)	Per-session MCP-hub communication

Client WebSocket (`/ws`)¶

Multiple clients can connect simultaneously (browser, iOS app, etc.). All receive the same messages. When the last client disconnects, all sessions with pending playback_done waits are unblocked.

On Connect¶

Hub sends the full session list:

{"type": "session_list", "sessions": [<session_object>, ...]}

Browser → Hub¶

All messages must include session_id:

Type	Fields	Description
`playback_done`	`session_id`	Audio playback finished, hub can proceed to listen phase
`audio`	`session_id`, `data` (base64 webm)	Recorded user audio. Empty `data` = muted/cancelled

{"session_id": "voice-1-abc123", "type": "playback_done"}
{"session_id": "voice-1-abc123", "type": "audio", "data": "base64..."}

Hub → Browser (Session-scoped)¶

All include session_id:

Type	Fields	Description
`audio`	`data` (base64 MP3)	TTS audio to play
`assistant_text`	`text`	Claude's spoken text for chat display
`user_text`	`text`	Transcribed user speech for chat display
`listening`	—	Browser should start recording (or queue pending listen)
`status`	`text`	Status update (e.g. "Speaking...", "Transcribing...")
`done`	—	Converse turn complete, session returns to ready
`session_ended`	—	Agent said goodbye, session will terminate

{"session_id": "voice-1-abc123", "type": "audio", "data": "base64mp3..."}
{"session_id": "voice-1-abc123", "type": "assistant_text", "text": "Hello!"}
{"session_id": "voice-1-abc123", "type": "user_text", "text": "Hi there"}
{"session_id": "voice-1-abc123", "type": "listening"}
{"session_id": "voice-1-abc123", "type": "status", "text": "Speaking..."}
{"session_id": "voice-1-abc123", "type": "done"}
{"session_id": "voice-1-abc123", "type": "session_ended"}

Hub → Browser (Hub-level)¶

No session_id:

Type	Fields	Description
`session_list`	`sessions` (array of session objects)	Full session list, sent on connect
`session_status`	`session_id`, `status`	Session state changed (e.g. "ready")
`project_status`	`session_id`, `project`, `area`	Agent's current project context updated
`session_terminated`	`session_id`	Session was terminated
`ping`	—	Heartbeat every 30s. Clients should ignore (no response needed).

{"type": "session_list", "sessions": [{...}, {...}]}
{"type": "session_status", "session_id": "voice-1-abc123", "status": "ready"}
{"type": "project_status", "session_id": "voice-1-abc123", "project": "voice-hub", "area": "frontend"}
{"type": "session_terminated", "session_id": "voice-1-abc123"}
{"type": "ping"}

MCP WebSocket (`/mcp/{session_id}`)¶

One connection per session. The hub_mcp_server.py instance connects here after being spawned by Claude Code.

MCP Server → Hub¶

Type	Fields	Description
`converse`	`message`, `wait_for_response` (bool), `voice` (ignored)	Speak and optionally listen
`set_project_status`	`project`, `area` (optional)	Update sidebar with current project context
`status_check`	—	Check if browser is connected

{"type": "converse", "message": "Hello!", "wait_for_response": true, "voice": "af_sky"}
{"type": "set_project_status", "project": "voice-hub", "area": "frontend"}
{"type": "status_check"}

Note: The voice field in converse is ignored by the hub. The hub uses the session's voice setting from the browser UI.

Hub → MCP Server¶

Type	Fields	Description
`converse_result`	`text`	User's transcribed speech, or status string
`status_result`	`connected` (bool)	Whether browser is connected

{"type": "converse_result", "text": "User said something"}
{"type": "status_result", "connected": true}

`converse_result` values¶

Value	Meaning
User's speech	Normal transcribed text
`(no speech detected)`	STT returned empty text
`Message delivered.`	`wait_for_response` was false, audio played
`Error: ...`	Something went wrong

Note: (session muted) is no longer returned. The hub now retries internally when audio is empty or clients disconnect — the converse call blocks until real audio arrives or a client reconnects.

Session Object¶

Returned by REST API and included in session_list:

{
  "session_id": "voice-1-abc123",
  "tmux_session": "voice-1-abc123",
  "status": "ready",
  "created_at": 1739980000.0,
  "last_activity": 1739980100.0,
  "label": "Sky",
  "voice": "af_sky",
  "speed": 1.0,
  "mcp_connected": true,
  "status_text": ""
}

Field	Type	Description
`session_id`	string	Unique ID (`voice-{n}-{uuid6}`)
`tmux_session`	string	tmux session name (same as session_id)
`status`	string	`starting`, `ready`, `active`, or `dead`
`created_at`	float	Unix timestamp
`last_activity`	float	Unix timestamp of last converse or browser interaction
`label`	string	Display name (voice name)
`voice`	string	Kokoro voice ID
`speed`	float	TTS speed multiplier
`mcp_connected`	bool	Whether the MCP server WebSocket is connected
`status_text`	string	Current activity: `"Speaking..."`, `"Listening..."`, `"Transcribing..."`, `"Waiting for client..."`, or `""` (idle)
`project`	string	Current project/repo name (set by agent via `set_project_status`)
`project_area`	string	Current sub-area (e.g. `"frontend"`, `"docs"`)

REST API¶

Method	Path	Body	Response
`GET`	`/api/sessions`	—	`[<session_object>, ...]`
`POST`	`/api/sessions`	`{"voice": "am_adam"}` (optional)	`<session_object>`
`DELETE`	`/api/sessions/{id}`	—	`{"status": "terminated"}`
`PUT`	`/api/sessions/{id}/voice`	`{"voice": "am_adam"}`	`{"voice": "am_adam"}`
`PUT`	`/api/sessions/{id}/speed`	`{"speed": 1.5}`	`{"speed": 1.5}`
`GET`	`/api/history/{voice_id}`	—	`{"voice_id": "...", "messages": [{role, text, ts}, ...]}`
`DELETE`	`/api/history/{voice_id}`	—	`{"status": "cleared", "voice_id": "..."}`
`GET`	`/api/settings`	—	`{"model": "opus", "auto_record": false, "auto_end": true, "auto_interrupt": false}`
`PUT`	`/api/settings`	`{"model": "haiku"}` (partial update)	Full settings object
`GET`	`/api/debug`	—	Hub info, sessions, tmux, services
`GET`	`/api/debug/log`	—	`{"lines": [...]}` (last 50 hub log lines)

Converse Flow¶

The full sequence for one converse() call:

MCP Server                    Hub                         Browser
    |                          |                              |
    |-- converse(msg) -------->|                              |
    |                          |-- assistant_text ----------->|
    |                          |-- status "Speaking..." ----->|
    |                          |-- [TTS via Kokoro] --------->|
    |                          |-- audio (base64 MP3) ------->|
    |                          |                              |-- plays audio
    |                          |<-------- playback_done ------|
    |                          |-- listening ---------------->|
    |                          |                              |-- records audio
    |                          |<-------- audio (base64) -----|
    |                          |-- [STT via Whisper] -------->|
    |                          |-- user_text ---------------->|
    |                          |-- done --------------------->|
    |<-- converse_result ------|                              |

If wait_for_response=false, the flow ends after audio playback with done + session_ended.

Resilience¶

The hub handles client disconnects gracefully during converse:

No clients when sending audio — playback_done is auto-set, flow continues to listen phase
Client disconnects during playback — playback_done is set by the disconnect handler
Audio arrives during playback wait — the hub skips the playback_done wait and uses the audio immediately (supports device switching mid-flow)
No clients during listen phase — hub waits for a client to reconnect, then re-sends listening
Empty/muted audio — hub retries listening instead of returning to Claude
listening re-sent every 5 seconds — ensures newly connected clients pick up the pending listen