vision
A bidirectional voice loop wired into every phase of my agent's lifecycle. I talk; Whisper transcribes; the agent acts; ElevenLabs speaks back in a voice cloned from my own, so the agent sounds like a friend, not a robot. Phase transitions, status updates, error messages, and meaningful state changes all get a brief verbal note.
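The loop above can be sketched as one function. This is a minimal sketch, not the real implementation: all four callables are stand-ins (`transcribe` for Whisper, `run_agent` for the agent yielding status events, `speak` for ElevenLabs TTS in the cloned voice).

```python
# One turn of the voice loop. Every callable here is a placeholder for
# the real Whisper / agent / ElevenLabs pieces, not their actual APIs.
def voice_turn(audio, transcribe, run_agent, speak):
    """Audio in, spoken status events out; returns what was said."""
    text = transcribe(audio)
    spoken = []
    for event in run_agent(text):  # phase changes, errors, results
        speak(event)
        spoken.append(event)
    return spoken
```

In practice `run_agent` would stream events as phases start and finish, and `speak` would queue audio rather than block the agent.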
why
Most agent UIs assume you're staring at a screen. I'm usually doing something else (pacing, dishes, walking to a thing) and I want my agent to talk to me like a co-founder, not a chatbot. ElevenLabs voice cloning got uncomfortably good, so I cloned my own voice and now my agent narrates itself.
what it does today
- TTS at every PAI phase boundary ("Entering the Build phase")
- speech-to-text via Whisper for the input side
- voice ID lookup so the same voice plays consistently
- a `notify` HTTP endpoint any tool can hit
- graceful muting when I plug in headphones (most of the time)
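The phase-boundary TTS can be sketched against ElevenLabs' public HTTP API (`POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header). The voice ID, API key, and the `announce` helper below are placeholders for illustration, not my actual config.

```python
# Hedged sketch of the TTS call behind the phase announcements.
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"

def build_tts_request(text, voice_id, api_key):
    """Build (but do not send) a POST request for one spoken phrase."""
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=json.dumps({"text": text}).encode(),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def announce(phase, send=urllib.request.urlopen):
    """Speak a phase transition, e.g. 'Entering the Build phase'."""
    req = build_tts_request(f"Entering the {phase} phase", "VOICE_ID", "API_KEY")
    with send(req) as resp:  # response body is the audio to play
        return resp.read()
```

Building the request separately from sending it keeps the speaking path testable without hitting the network.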
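The `notify` endpoint can be sketched as a tiny HTTP server. The payload shape (`{"message": ...}`), the `/notify` path, and the port are assumptions for illustration; in the real loop the `speak` callback would be the TTS call rather than `print`.

```python
# Minimal sketch of a notify endpoint any tool can hit.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_notify(body, speak):
    """Parse {"message": ...} and hand the text to a speak callback."""
    try:
        message = json.loads(body)["message"]
    except (ValueError, KeyError, TypeError):
        return 400, "expected JSON with a 'message' field"
    speak(message)
    return 200, "ok"

class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/notify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_notify(self.rfile.read(length), print)
        self.send_response(status)
        self.end_headers()
        self.wfile.write(reply.encode())

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8765), NotifyHandler).serve_forever()
```

Keeping the parse-and-speak logic in a plain function makes it easy to test and to reuse from other entry points.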
next steps
Always-on conversational mode (11.ai integration), screen-share context for the agent, and the long-term goal: a co-pilot I can pair-program with by talking, not typing.
voice loop
──────────
you → whisper → claude → tools → result
                                    │
    ┌───────────────────────────────┘
    ▼
ElevenLabs (cloned voice) → speakers
"I finished the migration. You should look at the schema file."