Early Releases (v0.0.1 – v0.4.0)¶

v0.0.1 - Initial Release¶

Initial release — standalone FastAPI server using the Claude API directly.

Features¶

Claude API integration — Used the Anthropic Python SDK (anthropic.Anthropic()) to call Claude Sonnet 4.5 directly. Each message was a new API call with conversation history passed in.
FastAPI REST endpoints — Browser communicated via HTTP POST to /api/transcribe, /api/chat, and /api/speak.
Browser client — Hold-to-talk mic button, audio playback, conversation managed client-side in JavaScript.
Whisper STT + Kokoro TTS — GPU-accelerated speech services on localhost.

Limitations¶

API costs — Every message was a direct Claude API call, adding up quickly.
No tool access — Claude could only chat, not use tools or access the filesystem.

Superseded Architecture¶

After v0.0.1, an intermediate approach using claude -p (Claude Code CLI pipe mode) was tried but never committed. Each invocation loaded all skills and MCP servers from scratch, making it extremely slow. This was abandoned in favor of the persistent MCP server in v0.1.0.

v0.1.0 - MCP Rewrite¶

Architecture rewrite — replaced HTTP proxy with MCP server + WebSocket bridge.

Features¶

MCP server architecture — Replaced HTTP proxy with FastMCP stdio server that communicates directly with Claude Code.
WebSocket bridge — Embedded FastAPI + WebSocket server for real-time browser audio exchange.
Logging — Dual logging to stderr and /tmp/clawmux-mcp.log.
Port retry — Server retries every 5 seconds if port is in use.
Integration tests — Test suite for server functionality.

v0.2.0 - Voice Modes¶

Released features and improvements.

Features¶

MCP server + WebSocket bridge — Single-process server with FastMCP stdio transport and embedded FastAPI + WebSocket for browser audio.
Browser client — Vanilla JavaScript UI with tap-to-record mic, connection status indicator, and auto-reconnect.
Whisper STT integration — OpenAI-compatible speech-to-text via whisper.cpp on GPU.
Kokoro TTS integration — OpenAI-compatible text-to-speech via kokoro-fastapi on GPU.
Tailscale remote access — HTTPS + WSS proxy via tailscale serve for access from any device on the tailnet.
No recording timeout — Removed 120-second and 60-second timeouts on recording and playback waits.
Hardware requirements documentation — VRAM and RAM usage table in getting started guide.
Roadmap and project documentation — Structured docs site with Zensical, folder hierarchy, and navigation tabs.

v0.3.0 - Multi-Session¶

Multi-session hub, simplified controls, and background conversation support.

Hub¶

Controls & Recording¶

Simplified button flow — Single main button cycles: Record (blue) → Send (green) → Interrupt (orange) → Processing (grey). Cancel button separate.
Mic mute — Toggle to mute microphone input across all sessions.
Auto Record toggle — Auto-start recording after Claude speaks.
Auto End (VAD) toggle — Voice activity detection auto-stops recording on 1.5s silence. Send button always available for early send.
Interrupt support — Tap Interrupt button during audio playback to stop it immediately. Hub receives playback_done and proceeds to listen.
Cancel recording — Separate cancel button discards audio and sends silence so hub doesn't hang.
Persistent mic stream — Mic permission acquired once and stream reused. Fixes background tab recording (no re-prompt when unfocused).

Multi-Session¶

Background audio buffering — Background sessions buffer TTS audio. Plays on tab switch before resuming conversation.
Pause/resume on tab switch — Switching away pauses audio, switching back resumes from where it stopped.
Background pending listen — If a background session wants mic input, it activates when you switch to that tab.
Stop recording on tab switch — Discards in-progress recording when switching away to prevent cross-session audio.
Tab badge notifications — Badge appears on tabs with pending audio or listen requests.

Audio¶

Audio feedback cues — Ascending tone for listening, soft tone for processing, chime for session ready.
No recording timeout — Removed the 120-second timeout on voice recording.
Per-session speed control — Adjustable TTS speed (0.75x–2x) per session.
Voice interrupt — Interrupt Claude by speaking instead of tapping the button. VAD detects speech during playback, stops audio, and starts recording immediately. Toggle: "Auto Interrupt".
Thinking sounds — Periodic double-tick audio cue via Web Audio API while Claude is processing. Only plays on the focused session tab.
Thinking indicator in chat — Animated pulsing dots in the chat transcript while Claude is processing. Disappears when the agent responds.

Debug¶

Inline debug panel — Debug tab showing hub info, service connectivity, active sessions, tmux sessions, and log tail. Auto-refreshes every 5 seconds.

Branding & Docs¶

Favicon and tab title — Custom SVG mic favicon, tab title "ClawMux".
Human guide — docs/guide/overview.md — what it is, how to use it, controls reference.
Agent reference — docs/agents/agent-reference.md — file map, code pointers, endpoints, debugging.

v0.4.0 - iOS App¶

iOS companion app, browser UI overhaul, and hub reliability fixes.

iOS Companion App¶

Browser UI¶

Visual overhaul — Dark/light mode with CSS custom properties, iOS-style design language, safe area support.
Voice grid landing page — Grid of voice cards with live session status, spawning feedback, and click-to-connect.
Home tab — Navigate back to voice grid without losing sessions.
Settings panel — Model selection, auto-record, auto-end, auto-interrupt toggles.
Debug panel — tmux sessions table, hub log viewer, Kill All Sessions button.
Text input mode — Type messages to the agent instead of speaking.
Thinking sounds — Double-tick audio pattern while agent processes.
Chat persistence — Messages saved to localStorage and restored on reload.

Hub Improvements¶

Goodbye parameter — converse() accepts goodbye=true to explicitly end sessions. Prevents premature session_ended on wait_for_response=false calls.
Thinking signal — Hub sends thinking message at start of handle_converse so clients show immediate feedback.
Session timeout — Increased to 120 minutes.
Resilient converse flow — v0.3.1 improvements to error handling and faster session spawning.
Multi-client broadcast fix — send_to_browser iterates over list(browser_clients) to avoid set-changed-during-iteration errors.
No-cache headers — Index route sends Cache-Control: no-cache so browsers always get fresh HTML.

Audio Fixes¶

Safari autoplay — Switched from new Audio() to Web Audio API (audioCtx.decodeAudioData) so second+ voice responses play on iOS Safari.
Repeated listening cue — Added pendingListenSessionId guard to ignore re-sent listening messages.
Listening cue cooldown — 2-second cooldown prevents rapid-fire cue sounds.
VAD grace period — 0.8s delay to ignore audio cue bleedthrough on mic open.

Documentation¶

CLAUDE.md — Added project instructions warning about hub.html syntax fragility.
Conversation dynamics — New doc explaining the converse cycle, message flow, and who controls what.
Agent reference docs — WebSocket protocol, UI behavior reference, iOS dev and web dev guides.