UI Behavior Reference
Detailed reference for every state, button, toggle, and audio behavior in the ClawMux browser UI.
Pages
The UI has three views, controlled by the tab bar:
| View | Tab | Description |
|---|---|---|
| Home | Home (always visible, left side) | Voice card grid showing all voices and their status |
| Session | Per-session tab (created on spawn) | Chat transcript + controls for one agent |
| Debug | Debug (always visible, right side) | Hub internals, services, tmux, logs |
Only one view is visible at a time. The Home tab is active by default on page load.
Voice Cards (Home Page)
Each of the 7 voices (Sky, Alloy, Sarah, Adam, Echo, Onyx, Fable) gets a card. Cards show real-time status.
Card States
| State | Dot Color | Label | Trigger |
|---|---|---|---|
| Available | Grey | "Available" | No session exists for this voice |
| Spawning | Yellow | "Spawning..." | User clicked the card, REST call in progress |
| Starting | Yellow | "Starting..." | Session created, Claude booting, MCP not yet connected |
| Ready | Green | "Ready" | MCP connected, session idle |
| Thinking | Orange (pulsing) | "Thinking..." | Agent is processing (between user send and agent response) |
| Speaking | Blue | "Speaking..." | Agent audio is playing (or buffered for background session) |
| Listening | Red | "Listening..." | Session is recording or waiting for Record tap |
| Waiting | Red | "Waiting..." | Background session needs mic input but isn't focused |
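The state table above can be modelled as a lookup keyed by state. This is a minimal sketch: the type names, the `pulsing` flag, and the `cardAppearance` helper are hypothetical, and only the colors and labels come from the table.

```typescript
// Hypothetical model of the card state table; names are illustrative only.
type CardState =
  | "available" | "spawning" | "starting" | "ready"
  | "thinking" | "speaking" | "listening" | "waiting";

interface CardAppearance {
  dotColor: "grey" | "yellow" | "green" | "orange" | "blue" | "red";
  label: string;
  pulsing: boolean; // only the Thinking dot pulses, per the table
}

const CARD_APPEARANCE: Record<CardState, CardAppearance> = {
  available: { dotColor: "grey", label: "Available", pulsing: false },
  spawning: { dotColor: "yellow", label: "Spawning...", pulsing: false },
  starting: { dotColor: "yellow", label: "Starting...", pulsing: false },
  ready: { dotColor: "green", label: "Ready", pulsing: false },
  thinking: { dotColor: "orange", label: "Thinking...", pulsing: true },
  speaking: { dotColor: "blue", label: "Speaking...", pulsing: false },
  listening: { dotColor: "red", label: "Listening...", pulsing: false },
  waiting: { dotColor: "red", label: "Waiting...", pulsing: false },
};

// Look up how a card should render for a given state.
function cardAppearance(state: CardState): CardAppearance {
  return CARD_APPEARANCE[state];
}
```

Using `Record<CardState, CardAppearance>` makes the compiler reject a table that misses a state.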
Card Interactions
- Click an Available card — Spawns a new session with that voice. Card immediately shows "Spawning..." and won't accept double-clicks.
- Click a connected card — Switches to that session's tab.
Session Tabs
Each active session gets a tab in the tab bar between Home and Debug.
Tab Elements
- Status dot — Color matches session state (green=ready, yellow=starting, blue=active)
- Label — Voice display name (e.g. "Sky", "Adam")
- Badge (!) — Red badge, appears when a background session has buffered audio or is waiting for mic input
- Close (x) — Terminates the session (kills tmux, cleans up temp dir)
Tab Switching Behavior
When switching away from a session:
- Currently playing audio pauses and is saved on the session for resume
- Any buffered playback chain stops (remaining chunks stay in buffer)
- Active recording is discarded (not sent)
- Thinking sound stops
- Pending listen request is cleared
When switching to a session:
- If the session had paused audio, it resumes playback from where it stopped
- If the session has buffered audio (received while in background), it plays all chunks in sequence
- If the session has a pending listen request:
  - Mic muted → sends silent audio automatically
  - Auto Record on → starts recording immediately
  - Auto Record off → shows "Tap Record" status, waits for manual click
- Tab badge clears on switch
- Thinking sound resumes if the session is in thinking state
- Voice and speed dropdowns update to match the session's settings
- Tmux session name appears in the bottom bar
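The pending-listen branch of the switch-to behavior can be sketched as a small pure function. The names below (`ListenContext`, `resolvePendingListen`, the action strings) are hypothetical. Checking mute before Auto Record is an assumption, based on the Mic Mute rule that any mic request from a muted session receives silent audio.

```typescript
// Hypothetical action names; the UI's real identifiers are not documented here.
type ListenAction = "none" | "send_silence" | "start_recording" | "show_tap_record";

interface ListenContext {
  pendingListen: boolean; // hub asked for mic input while the tab was in background
  micMuted: boolean;      // global Mic Mute toggle
  autoRecord: boolean;    // Auto Record toggle
}

// Decide what to do with a deferred listen request when its tab gains focus.
function resolvePendingListen(ctx: ListenContext): ListenAction {
  if (!ctx.pendingListen) return "none";
  if (ctx.micMuted) return "send_silence"; // assumed to win over Auto Record
  return ctx.autoRecord ? "start_recording" : "show_tap_record";
}
```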
Main Button
The center button cycles through states based on context:
| Button State | Color | Label | When |
|---|---|---|---|
| Record | Blue | "Record" | Idle, ready for input |
| Send | Green | "Send" | Currently recording |
| Interrupt | Orange | "Interrupt" | Audio is playing |
| Processing | Grey | "Processing..." | Audio sent, waiting for Claude |
Button Click Actions
- Record → Starts recording. If there's a pending listen from the hub, records for that request. Otherwise starts a freeform recording for the active session.
- Send → Stops recording and sends audio to the hub for STT.
- Interrupt → Immediately stops audio playback and sends `playback_done` to the hub so it can proceed to the listening phase. Status resets to "Ready".
- Processing → Disabled (no click action). The button shows this state after audio is sent until Claude responds.
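One way to derive the main button state from session context is a priority check, sketched below. The `ButtonContext` fields and the priority order (recording, then playback, then waiting) are assumptions. The source only states which condition maps to which state.

```typescript
type MainButtonState = "record" | "send" | "interrupt" | "processing";

// Hypothetical context flags describing the active session.
interface ButtonContext {
  recording: boolean;        // mic capture in progress
  audioPlaying: boolean;     // TTS playback in progress
  awaitingResponse: boolean; // audio sent, Claude has not responded yet
}

// Assumed priority: an active recording outranks playback, which outranks waiting.
function mainButtonState(ctx: ButtonContext): MainButtonState {
  if (ctx.recording) return "send";
  if (ctx.audioPlaying) return "interrupt";
  if (ctx.awaitingResponse) return "processing";
  return "record";
}

// The Processing state is the only one without a click action.
function isClickable(state: MainButtonState): boolean {
  return state !== "processing";
}
```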
Cancel Button (X)
Visible only during recording (next to the main button). Discards the recording and sends silent audio to the hub so it doesn't hang waiting for input.
Toggles
Auto Record
Default: off. When enabled, recording starts automatically after Claude finishes speaking (after playback_done). When disabled, the user must click Record manually — the status shows "Tap Record" and a listening cue plays.
Auto End (VAD)
Default: on. Uses Voice Activity Detection (VAD): while recording, the UI monitors the mic for silence. Once speech has been detected and is followed by 3 seconds of continuous silence (RMS < 10), recording stops automatically and the audio is sent. The Send button remains available for an earlier manual send.
VAD constants:
- `SILENCE_THRESHOLD = 10`: RMS level below which input counts as silence
- `SILENCE_DURATION = 3000`: Milliseconds of continuous silence before auto-stop
- Only triggers after speech has been detected (won't auto-stop on initial silence)
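The Auto End rule can be sketched as a small state machine fed with periodic RMS readings. The two constants come from the list above. The class name, the `update` signature, and treating a reading exactly at the threshold as speech are assumptions. How the RMS value is computed and scaled is not specified here, so the sketch takes it as an input.

```typescript
const SILENCE_THRESHOLD = 10;  // RMS below this counts as silence
const SILENCE_DURATION = 3000; // ms of continuous silence before auto-stop

// Hypothetical detector: call update() on each mic level check.
class AutoEndDetector {
  private speechDetected = false;
  private silenceStartMs: number | null = null;

  // Returns true when recording should stop and the audio should be sent.
  update(rms: number, nowMs: number): boolean {
    if (rms >= SILENCE_THRESHOLD) {
      // Speech (or noise) heard: arm the detector and reset the silence timer.
      this.speechDetected = true;
      this.silenceStartMs = null;
      return false;
    }
    // Initial silence never triggers an auto-stop.
    if (!this.speechDetected) return false;
    if (this.silenceStartMs === null) this.silenceStartMs = nowMs;
    return nowMs - this.silenceStartMs >= SILENCE_DURATION;
  }
}
```

Passing the timestamp in, rather than reading a clock inside the class, keeps the logic easy to test.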
Auto Interrupt
Default: off. When enabled, monitors the mic during audio playback for speech. If sustained speech is detected (300ms above threshold), automatically interrupts playback and starts recording. Designed for natural conversation flow — speak over Claude to interrupt.
Playback VAD constants:
- `SPEECH_THRESHOLD = 25`: Higher than Auto End's threshold to avoid speaker audio bleeding into the mic and causing false triggers
- `SPEECH_DURATION = 300`: Milliseconds of sustained speech before interrupting
- Check interval: 50ms (more responsive than Auto End's 100ms)
- Only active when Auto Interrupt is on AND mic is not muted AND audio is playing
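The playback-side check can be sketched the same way, this time looking for sustained speech rather than sustained silence. The constants come from the list above. The class and function names are hypothetical, and resetting the timer on any reading at or below the threshold is one reading of "sustained".

```typescript
const SPEECH_THRESHOLD = 25;  // RMS above this counts as speech during playback
const SPEECH_DURATION = 300;  // ms of sustained speech before interrupting
const CHECK_INTERVAL_MS = 50; // polling interval stated above

// The monitor only runs when all three conditions hold.
function shouldMonitorPlayback(autoInterrupt: boolean, micMuted: boolean, audioPlaying: boolean): boolean {
  return autoInterrupt && !micMuted && audioPlaying;
}

// Hypothetical detector: call update() once per check interval.
class PlaybackSpeechDetector {
  private speechStartMs: number | null = null;

  // Returns true when playback should be interrupted and recording started.
  update(rms: number, nowMs: number): boolean {
    if (rms <= SPEECH_THRESHOLD) {
      this.speechStartMs = null; // a dip below the threshold restarts the timer
      return false;
    }
    if (this.speechStartMs === null) this.speechStartMs = nowMs;
    return nowMs - this.speechStartMs >= SPEECH_DURATION;
  }
}
```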
Mic Mute
Global toggle (not per-session). When muted:
- The Mic button shows "Muted" with a red border
- Persistent mic stream tracks are disabled
- Any session that requests mic input receives silent audio automatically
- The hub treats empty audio as "(session muted)" and Claude gets that text
Audio Behavior
Focused Session
When the active session tab is focused:
- TTS audio plays immediately through the browser
- After playback, the hub sends `listening` and the browser either auto-records or waits for a Record click
- Thinking sound plays: Soft double-tick pattern (1200Hz + 900Hz tones, every 800ms) while Claude is processing
- Thinking indicator shows: Three pulsing dots in the chat transcript
Background Session
When a session is NOT the active tab:
- Audio is buffered: TTS audio chunks are stored in `s.audioBuffer` instead of playing. Badge (!) appears on the tab.
- Listen requests are deferred: Session is marked with `pendingListen = true`, badge appears. Listen activates when user switches to that tab.
- Thinking sound does NOT play: Only plays for the focused session
- Thinking indicator is tracked: `s.isThinking` is set, so the indicator appears immediately when switching to the tab
- Status text updates: `s.statusText` is kept current even for background sessions so the Home page voice cards show accurate state
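The focused/background split can be sketched as two routing functions over per-session state. The field names `audioBuffer`, `pendingListen`, `isThinking`, and `statusText` follow the text above. `hasBadge`, the function names, the chunk type, and the status strings written here are assumptions.

```typescript
// Per-session state; hasBadge is a hypothetical stand-in for the tab badge (!).
interface BackgroundableSession {
  audioBuffer: Uint8Array[];
  pendingListen: boolean;
  isThinking: boolean;
  statusText: string;
  hasBadge: boolean;
}

function createSessionState(): BackgroundableSession {
  return { audioBuffer: [], pendingListen: false, isThinking: false, statusText: "Ready", hasBadge: false };
}

// Returns the chunk to play now, or null when it was buffered for later.
function routeAudioChunk(s: BackgroundableSession, chunk: Uint8Array, focused: boolean): Uint8Array | null {
  s.statusText = "Speaking..."; // kept current even in the background
  if (focused) return chunk;
  s.audioBuffer.push(chunk);
  s.hasBadge = true;
  return null;
}

// Returns true when the listen phase can start immediately.
function routeListenRequest(s: BackgroundableSession, focused: boolean): boolean {
  s.statusText = focused ? "Listening..." : "Waiting...";
  if (focused) return true;
  s.pendingListen = true; // activated later, when the user switches to this tab
  s.hasBadge = true;
  return false;
}
```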
Tab Switch Audio Resume
When switching to a session with buffered or paused audio:
- Paused audio (was playing when you switched away): resumes from the pause point
- Buffered audio (received while in background): plays all chunks sequentially, then sends `playback_done`
- Both paths support Auto Interrupt if enabled
Audio Cues
Short tones played via Web Audio API:
| Cue | Sound | When |
|---|---|---|
| Listening | Ascending two-tone (660Hz → 880Hz) | Hub requests mic input |
| Processing | Single soft low tone (440Hz) | Audio sent to hub |
| Session ready | Three-note chime (C5 → E5 → G5) | MCP connects, session becomes ready |
| Thinking | Double-tick (1200Hz + 900Hz, repeating) | Claude is processing (focused session only) |
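The cue table can be expressed as data plus a scheduler that turns a cue into timed tones. The frequencies come from the table, with C5, E5, and G5 written as their equal-temperament pitches. Every offset and duration, and all names below, are assumed values for illustration.

```typescript
// One tone in a cue: frequency in Hz, start offset and length in seconds.
interface Tone { freq: number; offset: number; duration: number; }

type CueName = "listening" | "processing" | "sessionReady" | "thinkingTick";

// Frequencies follow the table; offsets and durations are assumptions.
const CUES: Record<CueName, Tone[]> = {
  listening: [
    { freq: 660, offset: 0, duration: 0.1 },
    { freq: 880, offset: 0.12, duration: 0.1 },
  ],
  processing: [{ freq: 440, offset: 0, duration: 0.15 }],
  sessionReady: [
    { freq: 523.25, offset: 0, duration: 0.12 },    // C5
    { freq: 659.25, offset: 0.14, duration: 0.12 }, // E5
    { freq: 783.99, offset: 0.28, duration: 0.12 }, // G5
  ],
  thinkingTick: [
    { freq: 1200, offset: 0, duration: 0.03 },
    { freq: 900, offset: 0.08, duration: 0.03 },
  ],
};

const THINKING_REPEAT_MS = 800; // the double-tick repeats at this interval

interface ScheduledTone { freq: number; startAt: number; stopAt: number; }

// Convert a cue into absolute start/stop times relative to `now` (seconds).
function scheduleCue(name: CueName, now: number): ScheduledTone[] {
  return CUES[name].map((t) => ({
    freq: t.freq,
    startAt: now + t.offset,
    stopAt: now + t.offset + t.duration,
  }));
}
```

In a browser, each scheduled tone maps onto one Web Audio `OscillatorNode`: create it with `createOscillator()`, set `frequency.value`, connect it to the context's `destination`, then call `start(startAt)` and `stop(stopAt)` with times based on the context's `currentTime`.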
Chat Transcript
Each session has its own message history, persisted to localStorage.
Message Types
| Type | Alignment | Style | Source |
|---|---|---|---|
| User | Right | Blue bubble | user_text from hub (after STT) |
| Assistant | Left | Dark bubble | assistant_text from hub (before TTS) |
| System | Center | Grey, no bubble | Session events (connected, ended) |
| Thinking | Left | Dark bubble with pulsing dots | Shown while Claude is processing |
Thinking Indicator
Three animated dots that pulse in sequence. Appears after user sends audio (when user_text is received from hub). Disappears when assistant_text or done is received. Survives tab switches — if you switch away and back, the dots reappear if the session is still thinking.
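The indicator's lifecycle amounts to a tiny reducer over hub events. The event names `user_text`, `assistant_text`, and `done` come from the text. The function name and the `session_status` member, included as an example of an event that leaves the flag untouched, are illustrative.

```typescript
type TranscriptEvent = "user_text" | "assistant_text" | "done" | "session_status";

// Compute the thinking flag after one hub event.
function thinkingAfter(isThinking: boolean, event: TranscriptEvent): boolean {
  switch (event) {
    case "user_text":
      return true; // STT result arrived, Claude is now processing
    case "assistant_text":
    case "done":
      return false; // response arrived or the turn finished
    default:
      return isThinking; // unrelated events leave the indicator as it was
  }
}
```

Because the flag lives on the session rather than in the DOM, it can be re-read on every tab switch, which is what lets the dots reappear.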
Session Lifecycle
Spawn Flow
- User clicks voice card or "+ New Session"
- Card shows "Spawning..." immediately (before server responds)
- REST `POST /api/sessions` creates the tmux session
- Tab appears, switches to session view, shows "Waiting for Claude..."
- Claude boots in tmux, MCP server connects to hub
- Hub sends `session_status: ready` → tab dot turns green, "Claude connected." system message appears, ready chime plays
- Hub sends `/voice-hub` to tmux → Claude enters voice mode and greets user
Session End
Two ways a session ends:
- User closes tab: `DELETE /api/sessions/{id}`, kills tmux, removes temp dir
- Agent says goodbye: Hub sends `session_ended`, "Session ended." system message appears, session auto-terminates after 3 seconds
Inactivity Timeout
Sessions auto-terminate after 30 minutes of no activity (configurable via VOICE_CHAT_TIMEOUT). Activity is tracked by session.touch() on converse calls and browser messages.
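The touch-and-expire pattern described above can be illustrated as follows. This is not the hub's code: the class name and millisecond units are assumptions, and the format of `VOICE_CHAT_TIMEOUT` is not stated here, so the sketch takes the timeout as a plain parameter.

```typescript
const DEFAULT_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes, the documented default

// Hypothetical tracker mirroring the session.touch() idea.
class ActivityTracker {
  private lastActivityMs: number;
  private timeoutMs: number;

  constructor(nowMs: number, timeoutMs: number = DEFAULT_TIMEOUT_MS) {
    this.lastActivityMs = nowMs;
    this.timeoutMs = timeoutMs;
  }

  // Record activity, e.g. on a converse call or a browser message.
  touch(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // True once the session has been idle for the full timeout.
  isExpired(nowMs: number): boolean {
    return nowMs - this.lastActivityMs >= this.timeoutMs;
  }
}
```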
WebSocket Connection
Single WebSocket between browser and hub. Shown in the header:
- Green dot + "Connected" — WebSocket is open
- Red dot + "Disconnected" — WebSocket is closed, auto-reconnects every 2 seconds
On disconnect, all sessions with pending playback_done waits are unblocked so converse() calls don't hang forever.
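The browser-side reconnect loop can be sketched with the socket factory and the timer injected, so the logic stays independent of the real `WebSocket` and `setTimeout`. The 2-second delay and the header labels come from the text. `LinkSocket`, `ReconnectingLink`, and the callback shapes are hypothetical, and the hub-side unblocking of `playback_done` waits is not modelled.

```typescript
const RECONNECT_DELAY_MS = 2000; // retry interval stated above

// Minimal socket surface the sketch needs.
interface LinkSocket {
  onopen: (() => void) | null;
  onclose: (() => void) | null;
}

type OpenSocket = () => LinkSocket;
type Schedule = (fn: () => void, delayMs: number) => void;
type StatusListener = (label: "Connected" | "Disconnected") => void;

class ReconnectingLink {
  private openSocket: OpenSocket;
  private schedule: Schedule;
  private onStatus: StatusListener;

  constructor(openSocket: OpenSocket, schedule: Schedule, onStatus: StatusListener) {
    this.openSocket = openSocket;
    this.schedule = schedule;
    this.onStatus = onStatus;
  }

  // Open a socket and arrange for another attempt whenever it closes.
  start(): void {
    const sock = this.openSocket();
    sock.onopen = () => this.onStatus("Connected");
    sock.onclose = () => {
      this.onStatus("Disconnected");
      this.schedule(() => this.start(), RECONNECT_DELAY_MS);
    };
  }
}
```

In a browser, the factory would create a `WebSocket` and the scheduler would wrap `setTimeout`.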
Debug Panel
Shows hub internals with auto-refresh every 5 seconds:
- Hub — Port, uptime, browser connection status, session count
- Services — Whisper and Kokoro URLs and connectivity
- Hub Sessions — ID, voice, status, MCP connected, idle time, age, work directory
- tmux Sessions — All tmux sessions (voice and non-voice), window count, attached status
- Hub Log: Last 50 lines of `/tmp/voice-hub.log`
Switching to the Debug tab stops audio and recording from the current session (same cleanup as switching to any other view).
Voice and Speed
Voice Selection
Per-session dropdown. 7 Kokoro voices available:
| Voice ID | Display Name | Gender |
|---|---|---|
| `af_sky` | Sky | F |
| `af_alloy` | Alloy | F |
| `af_sarah` | Sarah | F |
| `am_adam` | Adam | M |
| `am_echo` | Echo | M |
| `am_onyx` | Onyx | M |
| `bm_fable` | Fable | - |
When spawning sessions, the hub auto-rotates through unused voices so each session starts with a different voice. Changing the voice updates the tab label and sends a PUT /api/sessions/{id}/voice to the hub.
Speed Selection
Per-session dropdown. Options: 0.75x, 1x (default), 1.25x, 1.5x, 2x. Sends PUT /api/sessions/{id}/speed to the hub. Affects Kokoro TTS generation speed.
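The voice and speed updates can be sketched as request descriptors built before handing them to `fetch` or another HTTP client. The methods and paths come from this page. The JSON body shapes (`{ voice }`, `{ speed }`), the `ApiRequest` type, and the function names are assumptions, since the payload format is not documented here.

```typescript
// Hypothetical request descriptor; not a documented ClawMux type.
interface ApiRequest {
  method: "PUT";
  path: string;
  body: Record<string, unknown>; // body shape is an assumption
}

const SPEED_OPTIONS: readonly number[] = [0.75, 1, 1.25, 1.5, 2];
const VOICE_IDS: readonly string[] = [
  "af_sky", "af_alloy", "af_sarah", "am_adam", "am_echo", "am_onyx", "bm_fable",
];

function setVoiceRequest(sessionId: string, voiceId: string): ApiRequest {
  if (VOICE_IDS.indexOf(voiceId) === -1) throw new RangeError("unknown voice: " + voiceId);
  return {
    method: "PUT",
    path: "/api/sessions/" + encodeURIComponent(sessionId) + "/voice",
    body: { voice: voiceId },
  };
}

function setSpeedRequest(sessionId: string, speed: number): ApiRequest {
  if (SPEED_OPTIONS.indexOf(speed) === -1) throw new RangeError("unsupported speed: " + speed);
  return {
    method: "PUT",
    path: "/api/sessions/" + encodeURIComponent(sessionId) + "/speed",
    body: { speed: speed },
  };
}
```

Validating against the dropdown options on the client keeps obviously invalid values from reaching the hub, though the hub's own validation rules are not described here.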
Persistence
- Chat messages: Saved to `localStorage` per session. Restored on page reload if the session still exists.
- Toggle states: Not persisted. Auto Record defaults to off, Auto End defaults to on, Auto Interrupt defaults to off on each page load.
- Sessions: Hub sends the session list on WebSocket connect, so existing sessions appear on page reload.
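The chat persistence rule, save per session and restore only if the session still exists, can be sketched against a storage interface that the browser's `localStorage` satisfies. The key format, the message shape, and the function names are hypothetical.

```typescript
// Subset of the Web Storage interface used by the sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface StoredMessage {
  type: "user" | "assistant" | "system";
  text: string;
}

// Hypothetical key format; the real key naming is not documented here.
function storageKey(sessionId: string): string {
  return "clawmux.messages." + sessionId;
}

function saveTranscript(store: KeyValueStore, sessionId: string, messages: StoredMessage[]): void {
  store.setItem(storageKey(sessionId), JSON.stringify(messages));
}

// Restore messages only for sessions the hub still reports on connect.
function restoreTranscript(store: KeyValueStore, sessionId: string, liveSessionIds: string[]): StoredMessage[] {
  if (liveSessionIds.indexOf(sessionId) === -1) return [];
  const raw = store.getItem(storageKey(sessionId));
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as StoredMessage[]) : [];
  } catch {
    return []; // corrupt entry: start with an empty transcript
  }
}
```

In the browser the calls would pass `localStorage` as the store; tests can pass an in-memory stand-in instead.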