🎙️

※ /labs/voice system

Voice System

Whisper-in / ElevenLabs-out: my agent talks back in my own voice

work in progress

※ vision

A bidirectional voice loop wired into every phase of my agent's lifecycle. I talk; Whisper transcribes; the agent acts; ElevenLabs speaks back in a voice cloned from my own, so the agent sounds like a friend, not a robot. Phase transitions, status updates, error messages, and meaningful state changes all get a brief verbal note.
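One turn of that loop can be sketched in a few lines. Everything below is a hypothetical stand-in: `transcribe`, `run_agent`, and `speak` are placeholders for the real Whisper, agent, and ElevenLabs calls, not the project's actual code.

```python
# A minimal sketch of one voice-loop turn. All three helpers are
# hypothetical placeholders; the real versions would call Whisper,
# the agent runtime, and the ElevenLabs TTS API respectively.

def transcribe(audio_path: str) -> str:
    # placeholder: a real version would run Whisper on the audio file
    return "run the migration"

def run_agent(text: str) -> str:
    # placeholder: the agent acts on the transcript, returns a status line
    return f"Done: {text}"

def speak(text: str, voice_id: str) -> None:
    # placeholder: a real version would send `text` to ElevenLabs
    # with the cloned voice ID and play the resulting audio
    print(f"[{voice_id}] {text}")

def voice_turn(audio_path: str, voice_id: str = "my-cloned-voice") -> str:
    heard = transcribe(audio_path)   # speech -> text
    result = run_agent(heard)        # agent acts on it
    speak(result, voice_id)          # text -> speech in the cloned voice
    return result
```

The same `speak` seam is where phase-transition and error notes would enter: any lifecycle event reduces to a string handed to the TTS side.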

※ why

Most agent UIs assume you're staring at a screen. I'm usually doing something else (pacing, dishes, walking to a thing) and I want my agent to talk to me like a co-founder, not a chatbot. ElevenLabs voice cloning got uncomfortably good, so I cloned my own voice and now my agent narrates itself.

※ what it does today

  • TTS at every PAI phase boundary ('Entering the Build phase')
  • speech-to-text via Whisper for the input side
  • voice ID lookup so the same voice plays consistently
  • a `notify` HTTP endpoint any tool can hit
  • graceful muting when I plug in headphones (most of the time)
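A minimal sketch of what a `notify` endpoint like that could look like, using only the Python standard library: tools POST a short line of text, the handler queues it, and a TTS worker (not shown) drains the queue and speaks. The route, port, and JSON shape here are assumptions for illustration, not the project's actual API.

```python
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

# Messages wait here until a TTS worker thread (not shown) drains them
# and sends each line to the speech side.
speech_queue: "queue.Queue[str]" = queue.Queue()

class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/notify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            text = payload["text"]
        except (ValueError, KeyError):
            self.send_error(400, "expected JSON body with a 'text' field")
            return
        speech_queue.put(text)      # hand off to the TTS worker
        self.send_response(202)     # accepted; spoken asynchronously
        self.end_headers()

    def log_message(self, *args):   # keep the terminal quiet
        pass

def make_server(port: int = 8777) -> HTTPServer:
    # bind to localhost only; any local tool can hit it
    return HTTPServer(("127.0.0.1", port), NotifyHandler)
```

Any tool can then announce itself with a one-liner, e.g. `curl -X POST localhost:8777/notify -d '{"text": "Entering the Build phase"}'`. Returning 202 rather than 200 reflects that speech happens after the request completes.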

※ next steps

always-on conversational mode (11.ai integration), screen-share context for the agent, and the long-running goal: a co-pilot I can pair-program with by talking, not typing.

   voice loop
   ─────────────
        you ─ whisper ─→ claude ─ tools ─→ result
                                              │
   ┌──────────────────────────────────────────┘
   ▼
   ElevenLabs (cloned voice) ─→ speakers
   "I finished the migration. You should look at the schema file."