iOS App¶

Native iPhone companion app for ClawMux. Connects to the hub over WebSocket and provides a multi-session voice interface with three input modes.

Status: Beta (v0.5.0)

Features¶

Input Modes¶

Auto - Mic opens automatically after the agent speaks, VAD auto-stops on silence
Push to Talk (PTT) - Hold the mic button to record, then finish with one of four gestures:
Swipe up: send audio immediately
Swipe left: cancel recording
Swipe right: open keyboard with transcription for editing
Release: show inline transcript preview (tap to edit, send, or dismiss)
Typing - Keyboard input, no voice/TTS
Mode toggle in the session view (tap the mode pill below the mic button)
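
The PTT gesture mapping above can be sketched as a pure function over the drag offset at release. This is a minimal illustration, not the app's actual gesture code: the `PTTAction` enum, the `classifyPTTRelease` name, and the 60-point threshold are all hypothetical.

```swift
import Foundation

/// Actions a push-to-talk drag can resolve to, mirroring the list above.
enum PTTAction: Equatable {
    case sendImmediately   // swipe up
    case cancel            // swipe left
    case editInKeyboard    // swipe right
    case previewTranscript // plain release
}

/// Classifies the drag offset at the moment the finger lifts.
/// `dx`/`dy` use UIKit-style coordinates (positive y points down).
/// `threshold` is a hypothetical minimum travel, in points, for a swipe.
func classifyPTTRelease(dx: Double, dy: Double, threshold: Double = 60) -> PTTAction {
    let horizontal = abs(dx)
    let vertical = abs(dy)
    // Short drags count as a plain release.
    guard max(horizontal, vertical) >= threshold else { return .previewTranscript }
    if vertical > horizontal {
        // Only an upward swipe has a meaning; downward drags fall back to preview.
        return dy < 0 ? .sendImmediately : .previewTranscript
    }
    return dx < 0 ? .cancel : .editInKeyboard
}
```

Keeping the classification free of UI types makes it easy to unit test; a SwiftUI `DragGesture` handler could pass `value.translation.width` and `value.translation.height` into it.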

Voice Sessions¶

Multi-session support with voice grid landing page
Voice card states: Thinking (orange), Speaking (blue), Listening (red), Ready (green)
Chat transcript with persistence across restarts
Voice selection (7 Kokoro voices) and speed control (0.75x-2x) per session
Context menu on voice cards to terminate or reset history

Audio¶

Background audio with dual keepalive (AVAudioEngine input tap + silent audio loop)
Background recording with auto-record in auto mode
Audio buffering for background sessions, played on switch
Interrupt playback by tapping mic during speech
Audio cues: thinking tick, listening cue, processing cue, session ready chime
Per-mode sound and haptics toggles
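
Buffering audio for background sessions and playing it on switch could look roughly like the following. This is a sketch under assumptions: the `BackgroundAudioBuffer` type is hypothetical, session IDs are modeled as strings, and chunks are treated as opaque `Data`.

```swift
import Foundation

/// Holds audio chunks that arrive for sessions that are not on screen.
/// Session IDs are modeled as plain strings, which is an assumption.
struct BackgroundAudioBuffer {
    private var pending: [String: [Data]] = [:]

    init() {}

    /// Queue a chunk for a session that is currently in the background.
    mutating func enqueue(_ chunk: Data, for sessionID: String) {
        pending[sessionID, default: []].append(chunk)
    }

    /// Remove and return everything queued for a session, in arrival order.
    mutating func drain(for sessionID: String) -> [Data] {
        pending.removeValue(forKey: sessionID) ?? []
    }
}
```

On a session switch, the caller would drain the buffer for the newly focused session and hand each chunk to the player in order.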

Live Activity¶

Dynamic Island: session status dot + voice name
Lock Screen: voice name, status, last message preview
Per-mode toggle (auto and PTT only; typing mode uses notifications instead)

PTT Extras¶

Four gesture outcomes: swipe up = send, swipe left = cancel, swipe right = keyboard, release = preview transcript
Inline transcript preview with send/edit/dismiss after recording
Keyboard mode with mic button for additional voice-to-text
Dismissing keyboard returns to transcript preview if text exists
Parallel transcription on send shows what you said while waiting for response

Left-edge swipe to go back to hub from session view
Swipe-back reveals home view underneath (no black flash)

Requirements¶

iPhone running iOS 17.0+
Xcode 16+ (with iOS platform SDK)
XcodeGen (brew install xcodegen)
Apple Developer account (a free account works, but apps signed with it expire after 7 days and must be reinstalled)

Build & Deploy¶

The development workflow uses three commands: build, install, launch.

One-time setup¶

cd ios/
xcodegen generate

Find your device ID:

xcrun xctrace list devices

Build¶

cd ios/
xcodebuild -project VoiceHub.xcodeproj -scheme VoiceHub \
  -destination 'id=DEVICE_ID' 2>&1 | grep -E '(BUILD|error:)'

Use generic/platform=iOS as the destination if you just want to verify compilation without a connected device.

Install¶

xcrun devicectl device install app \
  --device DEVICE_ID \
  ~/Library/Developer/Xcode/DerivedData/VoiceHub-*/Build/Products/Debug-iphoneos/VoiceHub.app

Launch¶

xcrun devicectl device process launch \
  --device DEVICE_ID \
  com.zeul.voicehub

Phone must be unlocked for remote launch to work.

On first install, go to Settings > General > VPN & Device Management and trust the developer certificate.

All-in-one¶

# pipefail stops the chain when xcodebuild fails, even though grep matched a line
( set -o pipefail; cd ios/ && \
xcodebuild -project VoiceHub.xcodeproj -scheme VoiceHub \
  -destination 'id=DEVICE_ID' 2>&1 | grep -E '(BUILD|error:)' && \
xcrun devicectl device install app --device DEVICE_ID \
  ~/Library/Developer/Xcode/DerivedData/VoiceHub-*/Build/Products/Debug-iphoneos/VoiceHub.app && \
xcrun devicectl device process launch --device DEVICE_ID com.zeul.voicehub )

Configuration¶

On launch, tap the gear icon and enter your hub URL with port:

workstation.tailee9084.ts.net:3460

The app connects via WebSocket (ws:// or wss:// depending on scheme). Tailscale direct connections work.
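
One way to turn the typed hub address into a WebSocket URL is sketched below. The mapping is an assumption rather than the app's confirmed behavior: `http` becomes `ws`, `https` becomes `wss`, and a bare `host:port` defaults to `ws`. The function name `webSocketURL(fromHubInput:)` is hypothetical, and no path is appended because the hub's WebSocket path is not stated here.

```swift
import Foundation

/// Turns the text typed into the settings field into a WebSocket URL.
/// Assumed mapping (not confirmed by the app): http -> ws, https -> wss,
/// and a bare host:port defaults to ws.
func webSocketURL(fromHubInput input: String) -> URL? {
    let trimmed = input.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else { return nil }
    // URLComponents needs a scheme to parse "host:port" as host and port.
    let withScheme = trimmed.contains("://") ? trimmed : "ws://" + trimmed
    guard var components = URLComponents(string: withScheme),
          let host = components.host, !host.isEmpty else { return nil }
    switch components.scheme?.lowercased() {
    case "ws", "http": components.scheme = "ws"
    case "wss", "https": components.scheme = "wss"
    default: return nil   // unsupported scheme
    }
    return components.url
}
```

The resulting URL could then be passed to `URLSession.webSocketTask(with:)`.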

Project Structure¶

ios/
  project.yml                  # XcodeGen project definition
  VoiceHub/
    Info.plist                 # Background modes, permissions, URL schemes
    VoiceHubApp.swift         # SwiftUI app entry point
    VoiceHubViewModel.swift   # WebSocket, audio, state, Live Activity, recording
    ContentView.swift          # UI (voice grid, session view, settings, debug)
    Assets.xcassets/           # App icon, colors
  VoiceHubShared/
    VoiceHubActivityAttributes.swift  # ActivityKit attributes (shared with widget)
  VoiceHubWidget/
    Info.plist                 # Widget extension Info.plist
    VoiceHubWidgetBundle.swift  # Widget entry point
    VoiceHubLiveActivity.swift  # Dynamic Island + Lock Screen UI

Key architecture¶

VoiceHubViewModel (~2400 lines) - All state management. WebSocket connection, audio session, recording (AVAudioRecorder), playback (AVAudioPlayer), Live Activity lifecycle, background keepalive, VAD, tone player, notifications.
ContentView (~1350 lines) - All UI. Voice grid, session view with chat, three bottom control variants (auto/PTT controls, PTT text input, typing text input), settings (per-mode pages), debug panel.

Settings Structure¶

Settings are organized by input mode:

Auto - Auto record, VAD + tuning, auto interrupt, record while thinking, sounds, haptics, notifications, Live Activity
PTT - Record while thinking, sounds, haptics, notifications, Live Activity
Typing - Haptics, notifications

Global settings (server, model, background mode) are on the root settings page.

Background Audio¶

The app uses UIBackgroundModes: audio with a layered keepalive strategy (modeled after the OpenClaw approach):

Primary: AVAudioEngine with a continuous input tap that keeps the audio processing pipeline alive. In practice, iOS does not suspend an app while its audio engine is actively processing.
Secondary: Silent audio loop via AVAudioPlayer (8kHz, 1s, near-silent WAV, volume 0).
Audio session: .playAndRecord category with .spokenAudio mode, Bluetooth support, 48kHz preferred sample rate.
Interruption recovery: On audio session interruption end, re-activates the session and restarts the keepalive engine if it died.

Both keepalive mechanisms start when the app backgrounds with active sessions, and stop when returning to foreground.
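
The silent loop used by the secondary keepalive needs a small WAV file. The sketch below generates one in memory; the 8 kHz rate and 1 s length come from the description above, while mono 16-bit PCM and the `makeSilentWAV` name are assumptions. It writes pure zeros, so a genuinely near-silent file would substitute small non-zero samples.

```swift
import Foundation

/// Builds a mono 16-bit PCM WAV file of digital silence.
/// The 8 kHz rate and 1 s length come from the description above;
/// mono and 16-bit samples are assumptions.
func makeSilentWAV(sampleRate: UInt32 = 8_000, seconds: UInt32 = 1) -> Data {
    let channels: UInt16 = 1
    let bitsPerSample: UInt16 = 16
    let blockAlign = channels * bitsPerSample / 8
    let byteRate = sampleRate * UInt32(blockAlign)
    let dataSize = byteRate * seconds

    var wav = Data()
    // WAV header fields are little-endian.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { wav.append(contentsOf: $0) }
    }
    wav.append(contentsOf: Array("RIFF".utf8))
    append(UInt32(36) + dataSize)           // bytes remaining after this field
    wav.append(contentsOf: Array("WAVE".utf8))
    wav.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))                      // PCM fmt chunk size
    append(UInt16(1))                       // format tag 1 = linear PCM
    append(channels)
    append(sampleRate)
    append(byteRate)
    append(blockAlign)
    append(bitsPerSample)
    wav.append(contentsOf: Array("data".utf8))
    append(dataSize)
    wav.append(Data(count: Int(dataSize)))  // zeroed samples = silence
    return wav
}
```

On device, the returned data could be handed to `AVAudioPlayer(data:)` with `numberOfLoops = -1` and `volume = 0` to loop indefinitely.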

Persistence¶

All state is saved to UserDefaults:

Key	Type	Description
`serverURL`	String	Hub connection URL
`inputMode`	String	auto, ptt, or typing
`autoRecord`	Bool	Auto-record after assistant speaks
`vadEnabled`	Bool	Voice activity detection
`backgroundMode`	Bool	Background keepalive enabled
`liveActivityAuto`	Bool	Live Activity for auto mode
`liveActivityPTT`	Bool	Live Activity for PTT mode
`voice-hub-chats`	JSON	Chat messages per session
`sessionPrefs`	JSON	Per-session voice and speed
Sound/haptic toggles	Bool	Per-mode audio cue and haptic settings
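
As an illustration of how a JSON-typed entry such as `sessionPrefs` might be stored, the sketch below round-trips a per-session map through `JSONEncoder` and `UserDefaults`. The `SessionPrefs` struct, its field names, the string session IDs, and storing the JSON as `Data` are all assumptions; only the `sessionPrefs` key and the 0.75x-2x speed range come from this page.

```swift
import Foundation

/// Hypothetical shape for the `sessionPrefs` entry: voice and speed per session.
struct SessionPrefs: Codable, Equatable {
    var voice: String
    var speed: Double
}

/// Encodes the per-session map to JSON, clamping speed to the 0.75x-2x range.
func encodePrefs(_ prefs: [String: SessionPrefs]) throws -> Data {
    let clamped = prefs.mapValues { p in
        SessionPrefs(voice: p.voice, speed: min(max(p.speed, 0.75), 2.0))
    }
    return try JSONEncoder().encode(clamped)
}

/// Decodes the JSON produced by `encodePrefs`.
func decodePrefs(_ data: Data) throws -> [String: SessionPrefs] {
    try JSONDecoder().decode([String: SessionPrefs].self, from: data)
}

/// Writes the encoded map under the `sessionPrefs` key.
func savePrefs(_ prefs: [String: SessionPrefs], to defaults: UserDefaults = .standard) throws {
    defaults.set(try encodePrefs(prefs), forKey: "sessionPrefs")
}

/// Reads the map back, returning an empty map if nothing valid is stored.
func loadPrefs(from defaults: UserDefaults = .standard) -> [String: SessionPrefs] {
    guard let data = defaults.data(forKey: "sessionPrefs"),
          let prefs = try? decodePrefs(data) else { return [:] }
    return prefs
}
```

Falling back to an empty map on a decode failure keeps a corrupted or outdated entry from blocking launch.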

Pending Feature Parity¶

Features added to the web client that are not yet in the iOS app. These are tracked here so that future iOS development stays in sync.

Karaoke Word Highlighting¶

The web client highlights each word in assistant messages in real-time as it is spoken, using word-level timestamps from Kokoro's /dev/captioned_speech endpoint.

How it works (web):
- The hub calls /dev/captioned_speech instead of plain TTS, which returns {audio: base64_mp3, timestamps: [{word, start_time, end_time}]}
- The audio WebSocket message now includes a words field alongside data
- The browser wraps each word of the latest assistant message in a span, then a 60fps requestAnimationFrame (RAF) loop highlights the current word based on audioCtx.currentTime - startTime
- The active word gets a text-shadow (bold effect without layout shift) plus a voice-color background highlight
- Words are saved and re-applied when switching sessions mid-playback

iOS implementation notes:
- Use the new /api/tts-captioned endpoint (POST {text, voice, speed}, returns {audio_b64, words}) instead of /api/tts
- For live speech from the hub, parse the words field from the audio WebSocket message
- Use AVAudioPlayer.currentTime as the clock, and drive updates with a CADisplayLink (60fps)
- Highlight the current word by applying an AttributedString overlay or animating text color/weight in the chat view
- Preserve the word list across session switches, and re-apply it to the last assistant message when switching back
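
The per-frame lookup behind the highlighting can be sketched as a binary search over the word timestamps. This is a minimal example under assumptions: the `TimedWord` type and `activeWordIndex` function are hypothetical, the entries of `words` are assumed to use the same `word`, `start_time`, and `end_time` fields as the Kokoro response, and the words are assumed to be sorted and non-overlapping.

```swift
import Foundation

/// One entry of the `words` array; field names follow the
/// `{word, start_time, end_time}` shape described above.
struct TimedWord: Codable, Equatable {
    let word: String
    let startTime: Double
    let endTime: Double

    enum CodingKeys: String, CodingKey {
        case word
        case startTime = "start_time"
        case endTime = "end_time"
    }
}

/// Returns the index of the word being spoken at `time` (seconds from the
/// start of the clip), or nil during gaps and outside the clip.
/// Assumes `words` is sorted by start time and does not overlap.
func activeWordIndex(in words: [TimedWord], at time: Double) -> Int? {
    var low = 0
    var high = words.count - 1
    while low <= high {
        let mid = (low + high) / 2
        if time < words[mid].startTime {
            high = mid - 1
        } else if time >= words[mid].endTime {
            low = mid + 1
        } else {
            return mid
        }
    }
    return nil
}
```

A `CADisplayLink` callback could call `activeWordIndex(in:at:)` with `AVAudioPlayer.currentTime` and update the highlight only when the returned index changes.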

Audio Resume on Session Switch (Seek)¶

The web client saves the playback offset when you switch away from a session mid-speech, then seeks to that offset when you return.

How it works (web):
- On session switch, remaining audio chunks are stashed to s.audioBuffer with an {offset: elapsed} marker
- On return, playAudio calls source.start(0, offset) to seek into the buffer
- Karaoke timestamps are adjusted by the same offset so highlighting stays in sync

iOS implementation notes:
- iOS already buffers audio for background sessions and replays on switch
- Add seek: save player.currentTime before stopping, store it with the audio data, and set player.currentTime = savedOffset before play() on resume
- AVAudioPlayer supports seeking via its currentTime property
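
The save-then-seek flow can be sketched against a small player abstraction. The `SeekablePlayer` protocol, the `StashedPlayback` struct, and the `stash`/`resume` functions are hypothetical names introduced for illustration; `AVAudioPlayer` exposes a settable `currentTime` along with `play()` and `stop()`, so it can likely adopt such a protocol through an empty extension.

```swift
import Foundation

/// Minimal player surface needed for seek-on-resume. This protocol is a
/// hypothetical abstraction so the logic can be exercised without audio hardware.
protocol SeekablePlayer: AnyObject {
    var currentTime: TimeInterval { get set }
    @discardableResult func play() -> Bool
    func stop()
}

/// Audio stashed for a session together with the offset reached so far.
struct StashedPlayback {
    let audio: Data
    let offset: TimeInterval
}

/// Called when switching away: remember where playback was, then stop.
func stash<P: SeekablePlayer>(_ player: P, audio: Data) -> StashedPlayback {
    let offset = player.currentTime   // read before stop()
    player.stop()
    return StashedPlayback(audio: audio, offset: offset)
}

/// Called when switching back: seek first, then start playing.
func resume<P: SeekablePlayer>(_ player: P, from stashed: StashedPlayback) {
    player.currentTime = stashed.offset
    player.play()
}
```

Because karaoke timestamps are relative to the start of the clip, reading `currentTime` after the seek keeps highlighting aligned without a separate offset adjustment.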

Mute Button¶

The web client has a mute toggle that suppresses mic input (auto-record is suspended) without changing the input mode. The button occupies the same space as the cancel button so the layout doesn't shift when recording starts.

iOS implementation notes:
- Add a micMuted boolean to the ViewModel
- When muted: skip auto-record, send silent audio to unblock the agent if it is waiting for input, and show a visual indicator (mic with slash icon)
- Keep the button at a fixed position alongside the mic button; it should always be present rather than appearing and disappearing

Hub Reconnect Toast¶

When the WebSocket reconnects (not on first connect), the web client briefly shows a "Hub reconnected" toast that dismisses automatically.

iOS implementation notes:
- Track whether a previous connection was established before the reconnect
- On reconnect, show a brief system toast or overlay label that auto-dismisses after 2s
- Do not show on first connection
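
The first-connect versus reconnect distinction needs only one bit of state. A minimal sketch, with the hypothetical `ReconnectTracker` type standing in for whatever the ViewModel ends up using:

```swift
import Foundation

/// Decides whether a WebSocket "connected" event should show the
/// "Hub reconnected" toast: yes for every connect except the first.
struct ReconnectTracker {
    private(set) var hasConnectedBefore = false

    init() {}

    /// Call on each successful connect. Returns true when a toast is due.
    mutating func didConnect() -> Bool {
        defer { hasConnectedBefore = true }
        return hasConnectedBefore
    }
}
```

The caller would show the overlay when `didConnect()` returns true and schedule its dismissal after the 2-second delay mentioned above.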

Per-Session Model Selection¶

The web client exposes a model picker per session (claude-opus-4-5, claude-sonnet-4-5, etc.) that overrides the global default.

iOS implementation notes:
- Add a model selector to the session view settings (or the session detail area)
- Send the model selection via the existing session settings WebSocket message or via the /api/sessions/{id}/model REST endpoint
- Persist it per session in the sessionPrefs UserDefaults key alongside voice and speed