Talk to Zai like a person. Get back an interface that fits whatever you're trying to do.
Zai speaks and listens in real time. Powered by Deepgram for speech recognition and ElevenLabs for voice synthesis, with LiveKit Agents handling the conversational pipeline. Sub-second latency, natural turn-taking, no walkie-talkie feel.
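The pipeline above amounts to a tight listen-think-speak loop. A minimal sketch, with hypothetical stand-in functions for the Deepgram, language-model, and ElevenLabs stages (in the real system, LiveKit Agents streams each stage so no step waits for the previous one to finish):

```python
# Hypothetical sketch of one conversational turn: audio in -> text -> reply -> audio out.
# transcribe, generate_reply, and synthesize are illustrative stubs, not real APIs;
# they stand in for Deepgram STT, the language model, and ElevenLabs TTS.

def transcribe(audio: bytes) -> str:
    # Stand-in for Deepgram speech recognition.
    return audio.decode("utf-8")

def generate_reply(text: str) -> str:
    # Stand-in for the language-model turn.
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    # Stand-in for ElevenLabs voice synthesis.
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    """One turn: listen, think, speak."""
    return synthesize(generate_reply(transcribe(audio)))
```

In practice the stages are streamed and interruptible rather than run sequentially, which is what keeps latency sub-second and turn-taking natural.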
The interface adapts to the task. Ask for a comparison and get a comparison view. Ask for a checklist and get a checklist. Ask for a calendar and get a calendar. The UI is generated on the fly to fit the work, not selected from a fixed menu of templates.
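One common way to build this kind of generative UI is to have the model emit a structured spec naming a component, which the client then renders. The schema and renderer below are purely illustrative, not Zai's actual format:

```python
import json

# Hypothetical UI spec: the model emits JSON naming a component type plus its
# data, and the client dispatches to a renderer. The schema here is invented
# for illustration; these renderers emit plain text for brevity.

RENDERERS = {
    "checklist": lambda spec: "\n".join(f"[ ] {item}" for item in spec["items"]),
    "comparison": lambda spec: " | ".join(spec["columns"]),
}

def render(raw: str) -> str:
    """Parse a model-emitted spec and render the named component."""
    spec = json.loads(raw)
    renderer = RENDERERS.get(spec["type"])
    if renderer is None:
        raise ValueError(f"unknown component: {spec['type']}")
    return renderer(spec)

# Ask for a checklist, get a checklist:
print(render('{"type": "checklist", "items": ["pack bags", "book flight"]}'))
```

The key design point is that the component set is open-ended: the model composes a spec to fit the task rather than picking from a fixed menu of templates.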
Screen and webcam vision powered by Gemini 3.1 Flash-Lite. Zai can see what you see, from shared screens to webcam feeds, and reason about images and video alongside the conversation.
Voice without UI is good for reminders and chat but bad for anything visual. UI without voice is fast but excludes anyone who can't or doesn't want to type. Together they cover everything — speak when speaking is faster, look when looking is faster, switch fluidly between the two.