iOS Implementation Backlog¶
Work items for the iOS app. Delete each item after completing it. When this file is empty, the app is caught up with the web client.
Note: Some of these features may already be partially or fully implemented. Read the current codebase first and use your discretion — skip or delete items that are already done, and adapt the remaining ones to fit the existing implementation.
Sync with Web Client¶
These items bring the iOS app in line with the current web client behavior.
Remove Tab Bar¶
The web client no longer has a tab bar. Navigation is via the voice grid (landing page). Remove:
- sessionTabBar view and sessionTab() function from ContentView.swift
- The "+" New button (spawning is done from voice cards)
- The Debug tab button (move to header or settings)
Replace with a debug link in the header bar (small text, like the web client's header).
Show Active Voice Name in Header¶
When viewing a session, display the voice name (e.g. "Sky") in the header bar next to "ClawMux". Hide it when on the voice grid.
Fetch Message History from Server¶
Messages are persisted server-side per voice. On session open, fetch history instead of starting with an empty chat.
- GET /api/history/{voice_id} returns {"voice_id": "...", "messages": [{role, text, ts}, ...]}
- In addSessionFromDict(), fetch history and populate session.messages with it
- In switchToSession(), optionally re-fetch to pick up messages from other clients
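A minimal sketch of decoding the history payload described above. The type names and the decodeHistory helper are hypothetical, and the sketch assumes ts is a numeric timestamp; adjust the field type if the server sends a string.

```swift
import Foundation

// One chat message, following the {role, text, ts} shape from the backlog.
// Assumption: ts is numeric (for example Unix seconds).
struct HistoryMessage: Decodable, Equatable {
    let role: String
    let text: String
    let ts: Double
}

// Response body of GET /api/history/{voice_id}.
struct HistoryResponse: Decodable {
    let voiceId: String
    let messages: [HistoryMessage]

    enum CodingKeys: String, CodingKey {
        case voiceId = "voice_id"
        case messages
    }
}

// Hypothetical helper: turn a raw response body into a message list.
func decodeHistory(_ data: Data) throws -> [HistoryMessage] {
    try JSONDecoder().decode(HistoryResponse.self, from: data).messages
}
```

The decoded array could then be mapped into whatever message type session.messages already uses.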
Reset History¶
Add a way to clear a voice's history:
- DELETE /api/history/{voice_id} clears it
- Add to a context menu (long-press on voice card) or a button in the session view
- After clearing, empty local session.messages and re-render
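As a rough sketch, the clear request could be built like this. The makeClearHistoryRequest helper and the baseURL parameter are hypothetical; only the DELETE /api/history/{voice_id} route comes from the backlog.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical helper: build the DELETE request that clears one voice's history.
func makeClearHistoryRequest(baseURL: URL, voiceId: String) -> URLRequest {
    let url = baseURL
        .appendingPathComponent("api/history")
        .appendingPathComponent(voiceId)
    var request = URLRequest(url: url)
    request.httpMethod = "DELETE"
    return request
}
```

Once the request succeeds, the caller would empty session.messages and re-render.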
Unread Badge on Voice Cards¶
When a background session has activity (audio buffered, pending listen), show a small red dot on its voice card. The web client uses session.hasUnread for this.
Background Audio Buffering¶
The web client buffers audio for background sessions and plays it when you switch to that tab. The iOS app currently ignores audio for non-active sessions (line 497: if sid == activeSessionId). Add:
- audioBuffer: [Data] to VoiceSession
- Buffer audio when sid != activeSessionId
- Play buffered audio on switchToSession()
- Send playback_done after all buffered audio plays
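The buffering steps above can be sketched as a small value type. This is an illustration only: AudioRouter is a hypothetical name, and it keys buffers by session id in a dictionary rather than storing audioBuffer on VoiceSession, purely to keep the example self-contained.

```swift
import Foundation

// Hypothetical router: plays audio for the active session, buffers the rest.
struct AudioRouter {
    var activeSessionId: String? = nil
    private(set) var buffers: [String: [Data]] = [:]

    // Returns the chunk if it should play now; otherwise buffers it and returns nil.
    mutating func receive(_ chunk: Data, for sid: String) -> Data? {
        if sid == activeSessionId { return chunk }
        buffers[sid, default: []].append(chunk)
        return nil
    }

    // Makes sid the active session and returns its buffered chunks in arrival order.
    mutating func switchToSession(_ sid: String) -> [Data] {
        activeSessionId = sid
        return buffers.removeValue(forKey: sid) ?? []
    }
}
```

The caller would play the returned chunks in order and send playback_done only after the last one finishes.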
Auto Interrupt Toggle¶
The web client has an "Auto Interrupt" toggle (voice-based interrupt during playback). The iOS app is missing this. Add:
- autoInterruptEnabled published var
- During playback, monitor mic for sustained speech (like the web client's startPlaybackVAD)
- If speech detected for 300ms, interrupt and start recording
- Toggle in settings or controls
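One possible shape for the 300ms check is a detector that is fed mic levels with timestamps. SustainedSpeechDetector and its level threshold are assumptions for illustration; this is not a port of the web client's startPlaybackVAD, whose details are not given here.

```swift
import Foundation

// Hypothetical detector: reports true once the mic level has stayed above
// a threshold for the required duration without dropping below it.
struct SustainedSpeechDetector {
    let requiredDuration: TimeInterval   // 0.3 s per the backlog
    let levelThreshold: Float            // assumption: tune per device
    private var speechStart: TimeInterval?

    init(requiredDuration: TimeInterval = 0.3, levelThreshold: Float = 0.05) {
        self.requiredDuration = requiredDuration
        self.levelThreshold = levelThreshold
        self.speechStart = nil
    }

    // Feed one level sample taken at the given time (in seconds).
    mutating func process(level: Float, at time: TimeInterval) -> Bool {
        guard level >= levelThreshold else {
            speechStart = nil            // silence resets the run
            return false
        }
        let start = speechStart ?? time
        speechStart = start
        return time - start >= requiredDuration
    }
}
```

When process returns true and autoInterruptEnabled is on, the app would stop playback and start recording.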
Controls Bar Cleanup¶
Match the web client's minimal controls:
- Remove tmux session name display (lines 308-312)
- Move toggles and voice/speed pickers into an options menu or sheet
- Big mic button with colored glow matching state (blue=record, green=send, orange=interrupt)
Mic Mute¶
The web client has a mic mute toggle. When muted, sendSilentAudio() is called instead of recording. Add:
- micMuted published var
- When muted and listening received, send empty audio
- Toggle in options/settings
Voice Card States¶
The web client shows detailed voice card states (thinking, speaking, listening). The iOS app shows basic states. Update voiceCardLabel and voiceCardDotColor to match:
- Thinking (orange, pulsing): when session.isThinking
- Speaking (blue): when status text is "Playing..." or "Speaking..."
- Listening (red): when status text is "Recording..." or "Tap Record" or pendingListen
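The mapping above could be expressed as a pure function that voiceCardLabel and voiceCardDotColor both consult. The VoiceCardState enum, the idle fallback, and the precedence order (thinking first, then speaking, then listening) are assumptions, not something the backlog specifies.

```swift
// Hypothetical state enum; idle is an assumed fallback for everything else.
enum VoiceCardState: Equatable {
    case thinking, speaking, listening, idle
}

// Derives the card state from the three inputs named in the backlog.
func voiceCardState(isThinking: Bool, statusText: String, pendingListen: Bool) -> VoiceCardState {
    if isThinking { return .thinking }
    if ["Playing...", "Speaking..."].contains(statusText) { return .speaking }
    if pendingListen || ["Recording...", "Tap Record"].contains(statusText) {
        return .listening
    }
    return .idle
}
```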
Handle Heartbeat Pings¶
The hub sends {"type": "ping"} every 30 seconds to all connected clients. The iOS app should:
- Ignore ping messages in the WebSocket message handler (don't process them as unknown types)
- Track the last ping time. If no ping received for ~60 seconds, assume the connection is dead and reconnect
- On reconnect, the hub sends a fresh session_list; use it to rebuild state
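A small sketch of the ping bookkeeping described above. HeartbeatMonitor is a hypothetical type; it only tracks timestamps, and the actual reconnect would be triggered by whatever timer or task the app already uses.

```swift
import Foundation

// Hypothetical watchdog for hub pings.
struct HeartbeatMonitor {
    let timeout: TimeInterval
    private(set) var lastPing: Date

    init(timeout: TimeInterval = 60, now: Date = Date()) {
        self.timeout = timeout
        self.lastPing = now
    }

    // Returns true if the message was a ping and needs no further handling.
    mutating func handle(messageType: String, now: Date = Date()) -> Bool {
        guard messageType == "ping" else { return false }
        lastPing = now
        return true
    }

    // True when no ping has arrived within the timeout window.
    func isConnectionStale(now: Date = Date()) -> Bool {
        now.timeIntervalSince(lastPing) > timeout
    }
}
```

A periodic check of isConnectionStale() could drive the reconnect, with lastPing reset when the new connection opens.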
Multi-Client Sync¶
The hub now supports multiple simultaneous clients. The iOS app no longer "replaces" the browser — both can be connected at the same time and receive the same messages. Remove any "single client" assumptions if present.
Voice Grid as Landing Page¶
On app launch and reconnect, show the voice grid (not a session view). When tapping a voice card:
- If the voice already has an active session → switch to its chat view immediately
- If no session exists → spawn one (see Spawn Flow below)
The server rejects duplicate voice spawns (returns 503 if that voice already has a session).
Spawn Flow¶
Spawning a session is a long-running operation (~30-60 seconds). The POST /api/sessions request blocks until the session is ready or times out. The iOS app must handle this correctly:
- User taps an inactive voice card
- Show "Spawning..." on the card immediately (local UI state)
- Call POST /api/sessions with {"voice": "af_sky"} and set a long URLSession timeout (90s+), since this request takes ~30-60s
- The response comes back with status: "ready" and the full session object
- Add the session to local state and switch to its chat view
- If the request fails (503 = duplicate voice, 504 = timeout, network error), show an error and clear the "Spawning..." state
Important: While the POST is pending, the WebSocket may also send session_status messages for the session. You can use either the REST response or the WebSocket session_status with status: "ready" to trigger the switch — whichever arrives first. The session object from the REST response is the authoritative one.
Do NOT use a short timeout on the spawn request — the default URLSession timeout (60s) may be too short. Use at least 90 seconds.
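The request side of the spawn flow might look like the sketch below. makeSpawnRequest, SpawnFailure, and the baseURL parameter are hypothetical names; the route, body, timeout, and status codes are taken from the steps above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical helper: build the blocking spawn request with a 90 s timeout.
func makeSpawnRequest(baseURL: URL, voice: String) throws -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("api/sessions"))
    request.httpMethod = "POST"
    request.timeoutInterval = 90   // spawn takes ~30-60 s; 60 s default is too tight
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(["voice": voice])
    return request
}

// Hypothetical classification of the failure codes named in the backlog.
enum SpawnFailure: Equatable {
    case duplicateVoice   // 503
    case timedOut         // 504
    case other(Int)
}

// Returns nil for 2xx responses, otherwise the matching failure.
func spawnFailure(forStatus status: Int) -> SpawnFailure? {
    switch status {
    case 200..<300: return nil
    case 503: return .duplicateVoice
    case 504: return .timedOut
    default: return .other(status)
    }
}
```

Depending on how the session is configured, it may also be necessary to raise timeoutIntervalForRequest on the URLSessionConfiguration. Any failure, including a thrown network error, should clear the "Spawning..." state on the card.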
Connection Status Indicator¶
The header shows a connection status dot and label:
- Connecting — pulsing yellow dot, "Connecting..." text (on app launch / reconnect)
- Connected — green dot, "Connected" text
- Disconnected — red dot, "Disconnected" text
Show this in the app's header or status area.
Waveform Visualizer¶
The web client shows a live audio waveform while recording. Use AVAudioEngine's input tap to get audio levels and render a waveform (oscilloscope style) or level meter above the mic button during recording. Color it with the active voice's color. Hide when not recording.
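For a level meter, one common approach is to reduce each tap buffer to a root-mean-square value. The rmsLevel function below is a hypothetical helper operating on plain Float samples; copying samples out of the buffer delivered by the input tap is left to the caller.

```swift
// Hypothetical helper: root-mean-square level of one buffer of samples.
// Returns 0 for an empty buffer.
func rmsLevel(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}
```

An oscilloscope-style view would instead draw the raw samples, while a meter can simply scale its height by this value.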
Chat Display Cap¶
Only render the last 50 messages in the chat view. The server stores up to 200 per voice, but displaying all of them is unnecessary. New messages during the session still append live.
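A sketch of the cap, assuming it is applied as a sliding window at render time so the full array stays intact. visibleMessages is a hypothetical helper, and whether live messages should extend the window past 50 is not specified here.

```swift
// Hypothetical helper: the slice of messages the chat view should render.
func visibleMessages<Message>(_ all: [Message], cap: Int = 50) -> [Message] {
    Array(all.suffix(cap))
}
```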
Settings Page¶
The web client has a dedicated settings page (accessible from header) with persistent server-side settings. Add a settings view to the iOS app:
- Fetch settings on launch: GET /api/settings
- Model picker (Opus/Sonnet/Haiku); changes apply to new sessions only
- Auto Record, Auto End, Auto Interrupt toggles
- Save changes via PUT /api/settings with partial JSON
- Settings are shared across all clients (browser + iOS)
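Partial JSON for the PUT can be produced with optional fields, since Swift's synthesized Encodable conformance omits nil optionals. SettingsPatch and its JSON key names (auto_record and so on) are assumptions; check them against what GET /api/settings actually returns.

```swift
import Foundation

// Hypothetical patch type: only non-nil fields end up in the JSON body.
// Assumption: the server uses snake_case keys like these.
struct SettingsPatch: Encodable {
    var model: String? = nil
    var autoRecord: Bool? = nil
    var autoEnd: Bool? = nil
    var autoInterrupt: Bool? = nil

    enum CodingKeys: String, CodingKey {
        case model
        case autoRecord = "auto_record"
        case autoEnd = "auto_end"
        case autoInterrupt = "auto_interrupt"
    }
}

// Hypothetical helper: body for PUT /api/settings.
func settingsPatchBody(_ patch: SettingsPatch) throws -> Data {
    try JSONEncoder().encode(patch)
}
```

Flipping a single toggle then sends only that one key, leaving the other shared settings untouched.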
Device Switching¶
The hub supports seamless device switching mid-conversation. If the user closes the browser while an agent is speaking or listening, and then opens the iOS app:
- The hub waits for a client to reconnect (no timeout, no error)
- When the iOS app connects, the hub re-sends listening every 5 seconds until a client responds
- Audio that was playing on the old device is skipped; the conversation continues from the listen phase
- The app should handle receiving listening messages for sessions it didn't initiate: switch to the session or mark it as pending
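One way to decide what to do with an incoming listening message is sketched below. ListenAction and actionForListening are hypothetical, and the policy (switch when nothing is open, mark pending when another session is open) is only one reading of "switch to the session or mark it as pending".

```swift
// Hypothetical outcomes for an incoming listening message.
enum ListenAction: Equatable {
    case startListening            // message is for the session on screen
    case switchToSession(String)   // nothing open: jump to that session
    case markPending(String)       // another session is open: flag this one
}

// Assumed policy; repeated listening messages yield the same action,
// so the hub's 5-second re-sends are harmless.
func actionForListening(sid: String, activeSessionId: String?) -> ListenAction {
    guard let active = activeSessionId else { return .switchToSession(sid) }
    return active == sid ? .startListening : .markPending(sid)
}
```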
Voice Colors¶
Each voice has a unique accent color. Use these for the voice name text on cards and as a left border on assistant chat bubbles:
| Voice | Color |
|---|---|
| Sky | #3a86ff (blue) |
| Alloy | #e67e22 (orange) |
| Sarah | #e63946 (red) |
| Adam | #2ecc71 (green) |
| Echo | #9b59b6 (purple) |
| Onyx | #1abc9c (teal) |
| Fable | #f1c40f (gold) |
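The table could be carried into the app as a lookup of hex values plus a converter to RGB components. The voiceColorHex and rgbComponents names are hypothetical, and the lookup is keyed by display name as an assumption; the hex values are copied from the table.

```swift
// Accent colors from the table, keyed by display name (assumed key).
let voiceColorHex: [String: UInt32] = [
    "Sky":   0x3A86FF,
    "Alloy": 0xE67E22,
    "Sarah": 0xE63946,
    "Adam":  0x2ECC71,
    "Echo":  0x9B59B6,
    "Onyx":  0x1ABC9C,
    "Fable": 0xF1C40F,
]

// Splits a 0xRRGGBB value into components in the 0...1 range.
func rgbComponents(_ hex: UInt32) -> (red: Double, green: Double, blue: Double) {
    (
        red: Double((hex >> 16) & 0xFF) / 255,
        green: Double((hex >> 8) & 0xFF) / 255,
        blue: Double(hex & 0xFF) / 255
    )
}
```

The resulting components can be passed to whichever color type the UI layer uses.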