MCP Server & Agent Architecture

What's an MCP Server?

MCP (Model Context Protocol) is how Claude Code gets extra capabilities beyond reading and writing files. An MCP server gives Claude new tools it can call — in our case, the ability to speak and listen through the ClawMux.

Each agent in the sidebar has its own MCP server running in the background. It's a thin bridge between Claude and the hub — Claude calls a tool like converse("Hello!"), the MCP server forwards it to the hub, and the hub handles all the audio (TTS, playback, recording, STT).
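The forwarding step can be pictured as a tiny serializer. This is only a sketch — the field names (`type`, `tool`, `args`) are assumptions, and the real wire format lives in hub_mcp_server.py:

```python
import json

def forward_to_hub(tool: str, args: dict) -> str:
    """Serialize a tool call for the hub. Field names are illustrative;
    the actual message shape is defined in hub_mcp_server.py."""
    return json.dumps({"type": "tool_call", "tool": tool, "args": args})

# A converse() call, as it might travel over the WebSocket:
payload = forward_to_hub("converse", {"message": "Hello!"})
```

The MCP server itself never touches audio; it only relays messages like this and returns the hub's reply to Claude.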

How Agents Connect

When you click a voice card in the sidebar:

  1. The hub creates a working directory and starts Claude Code in a tmux session
  2. Claude loads the MCP server, which connects to the hub via WebSocket
  3. The hub sends the /voice-hub skill to Claude, activating voice mode
  4. Claude sets its project status in the sidebar
  5. Claude greets you and starts listening

Each agent runs independently — they have their own terminal, their own MCP connection, and their own conversation history.
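Step 1 above amounts to building a detached tmux invocation per agent. A minimal sketch, assuming a session name and working directory like those session_manager.py uses (the exact flags and the `voice-hub-nova` / `nova` names here are hypothetical):

```python
def tmux_start_command(session_name: str, workdir: str) -> list[str]:
    """Build the tmux command that launches Claude Code in a detached
    session. Flag choices are a sketch; session_manager.py owns the real one."""
    return [
        "tmux", "new-session",
        "-d",                 # detached: the hub, not a terminal, owns it
        "-s", session_name,   # one tmux session per agent
        "-c", workdir,        # the per-agent working directory
        "claude",             # start Claude Code, which loads the MCP server
    ]

cmd = tmux_start_command("voice-hub-nova", "/tmp/voice-hub-sessions/nova")
```

Because each agent lives in its own tmux session and working directory, killing one agent never disturbs the others.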

Available Tools

Agents currently have three tools:

  • converse — Speak a message to the user. Can optionally wait for a spoken reply. This is the main tool agents use for all voice interaction.
  • set_project_status — Update the sidebar to show what project and area the agent is working on (e.g. "voice-hub · frontend"). Agents call this on startup and whenever their context changes.
  • voice_chat_status — Check if a browser is connected. Agents call this on startup to make sure someone is listening.
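As plain Python signatures, the three tools might look roughly like this. The `goodbye` flag appears later in these docs; the other parameter names (`wait_for_reply`, `project`, `area`) are assumptions, not the server's actual API:

```python
def converse(message: str, wait_for_reply: bool = True, goodbye: bool = False) -> str:
    """Speak `message` via TTS; if wait_for_reply, record and transcribe
    the user's answer and return it. goodbye=True ends the session."""
    ...

def set_project_status(project: str, area: str = "") -> None:
    """Show e.g. "voice-hub · frontend" on this agent's sidebar card."""
    ...

def voice_chat_status() -> bool:
    """True if a browser is connected and someone can hear the agent."""
    ...
```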

The /voice-hub Skill

The /voice-hub skill is a Claude Code slash command that activates voice chat mode. It's sent to each agent automatically on startup. The skill tells Claude to:

  1. Check browser connection with voice_chat_status
  2. Set project status with set_project_status
  3. Greet the user via converse
  4. Process spoken requests and respond via converse
  5. Keep the conversation going until the user says goodbye
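The five steps above can be sketched as one loop. The `tools` object and `FakeTools` stand-in are hypothetical scaffolding so the sketch runs without any audio hardware; the real skill is prose instructions to Claude, not code:

```python
def run_voice_session(tools, greeting="Hi! What are we working on?"):
    """Follow the skill's steps using a hypothetical `tools` object
    that exposes the three MCP tools as methods."""
    if not tools.voice_chat_status():        # step 1: anyone listening?
        return
    tools.set_project_status("voice-hub")    # step 2: label the sidebar card
    reply = tools.converse(greeting, wait_for_reply=True)  # step 3: greet
    while "goodbye" not in reply.lower():    # steps 4-5: keep conversing
        reply = tools.converse(f"Working on: {reply}", wait_for_reply=True)
    tools.converse("Goodbye!", goodbye=True)

class FakeTools:
    """Scripted stand-in for the hub, so the sketch runs offline."""
    def __init__(self, replies):
        self.replies = iter(replies)
        self.spoken = []
    def voice_chat_status(self):
        return True
    def set_project_status(self, project, area=""):
        self.status = (project, area)
    def converse(self, message, wait_for_reply=False, goodbye=False):
        self.spoken.append(message)
        return next(self.replies, "") if wait_for_reply else ""

tools = FakeTools(["add a dark theme", "goodbye"])
run_voice_session(tools)
```

The loop ends the same way the real session does: a "goodbye" from the user triggers one final `converse` call with the goodbye flag set.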

See the raw skill: /voice-hub skill

Agent Identity (CLAUDE.md)

Every agent gets a CLAUDE.md file in its working directory when it's created. This is how Claude knows its name, personality, and behavior — it's the first thing Claude reads when it starts.

The template is generated by session_manager.py and written to /tmp/voice-hub-sessions/{voice_id}/CLAUDE.md.
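The generation step can be sketched in a few lines. The template text, the `write_claude_md` helper, and the `nova`/`Nova` identity below are all illustrative — session_manager.py owns the real template; only the `/tmp/voice-hub-sessions/{voice_id}/CLAUDE.md` path comes from these docs:

```python
from pathlib import Path

TEMPLATE = """# {name}

You are {name}, a voice agent. Personality: {personality}.
Speak through the `converse` tool; keep replies short and conversational.
"""  # Illustrative only -- the real template lives in session_manager.py.

def write_claude_md(voice_id: str, name: str, personality: str,
                    base: str = "/tmp/voice-hub-sessions") -> Path:
    """Create the agent's working directory and drop its CLAUDE.md into it."""
    workdir = Path(base) / voice_id
    workdir.mkdir(parents=True, exist_ok=True)
    path = workdir / "CLAUDE.md"
    path.write_text(TEMPLATE.format(name=name, personality=personality))
    return path

path = write_claude_md("nova", "Nova", "curious and upbeat")
```

Because CLAUDE.md sits in the working directory, Claude picks it up automatically on startup — no extra wiring needed.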

See the raw template: CLAUDE.md Template

Adding New Tools

The MCP server is defined in hub_mcp_server.py. To add a new tool:

  1. Define a new function with @mcp.tool in hub_mcp_server.py — this is what Claude sees and can call
  2. Handle the message in hub.py — the hub receives it via WebSocket and does the actual work
  3. If the tool needs to update the browser UI, the hub broadcasts a message to the frontend, and hub.html handles it in the handleMessage function

The pattern is always the same: Claude calls tool → MCP server forwards to hub → hub does work → hub notifies browser.
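That chain can be mimicked end to end with pure-Python stand-ins. Everything here is a toy: the `tool` decorator imitates `@mcp.tool`, `hub_handle` imitates hub.py's WebSocket handler, `BROWSER_LOG` imitates what hub.html's handleMessage receives, and `set_volume` is a made-up example tool, not one the hub provides:

```python
TOOLS = {}          # stands in for what hub_mcp_server.py exposes to Claude
BROWSER_LOG = []    # stands in for messages reaching hub.html's handleMessage

def tool(fn):
    """Toy version of @mcp.tool: register fn so 'Claude' can call it."""
    TOOLS[fn.__name__] = fn
    return fn

def hub_handle(message):
    """Toy hub.py handler: do the work, then notify the browser."""
    if message["type"] == "set_volume":
        BROWSER_LOG.append({"type": "volume_changed", "level": message["level"]})
        return "ok"

@tool
def set_volume(level: int) -> str:
    """A hypothetical new tool: what Claude sees; it forwards to the hub."""
    return hub_handle({"type": "set_volume", "level": level})

result = TOOLS["set_volume"](7)   # 'Claude' calling the tool
```

Each layer stays ignorant of the others' internals: the tool only forwards, the hub only works, the browser only renders.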

Session Lifecycle

Starting: Click a voice card → hub creates tmux + working directory + MCP config + CLAUDE.md → Claude starts → MCP server connects → /voice-hub skill activates → agent sets project status → agent greets you

Running: You speak → hub transcribes → Claude thinks and works → Claude calls converse() → hub synthesizes speech → you hear it

Ending: Say "goodbye" → agent calls converse(goodbye=true) → hub terminates the session. You can also right-click a card to kill it, or the session times out after a period of inactivity.

Key Files

| File | What it does |
| --- | --- |
| hub.py | The main server — handles WebSockets, audio routing, session management |
| hub_mcp_server.py | The MCP server that gives Claude voice tools |
| session_manager.py | Creates and manages agent sessions (tmux, working dirs, CLAUDE.md) |
| hub_config.py | Configuration (ports, timeouts, model settings) |
| static/hub.html | The browser interface (HTML + CSS + JS, all in one file) |
| .claude/commands/voice-hub.md | The /voice-hub skill that activates voice mode |