
How your assistant is built

A map of the assistant’s anatomy. Each part below explains how it works and what powers it.

  • Every assistant belongs to one organization — an isolated tenant with its own data, members and policies.

    Isolated multi-tenant data: A dedicated database and storage per organization, with strict isolation by design.
    Feature & usage governance: Admins enable capabilities and cap usage so behaviour stays consistent for everyone.
    How it works

    Nothing is shared between organizations. Each one keeps its own knowledge, conversations, integrations and audit trail, so your data never mixes with anyone else’s.

    Admins decide which capabilities are switched on and set sensible usage limits, keeping the assistant predictable and costs in check across the whole team.

    The body that every other ability lives inside — it shapes what the Brain and Connection are allowed to do.

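
As a rough illustration of the isolation and governance described above, here is a minimal sketch. The `Organization` record, its field names, the `authorize` helper and the request cap are all hypothetical and are not taken from the product; the sketch only shows the shape of "one store per tenant, with admin-set switches and limits".

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """Hypothetical tenant record: its own data plus admin-set policy."""
    org_id: str
    enabled_features: set = field(default_factory=set)
    monthly_request_cap: int = 1000  # illustrative limit chosen by an admin
    requests_used: int = 0
    # Each tenant gets its own store; default_factory avoids a shared dict.
    knowledge: dict = field(default_factory=dict)


class PolicyError(Exception):
    """Raised when a request falls outside what the admins allowed."""


def authorize(org: Organization, feature: str) -> None:
    """Check the feature switch and the usage cap, then count the request."""
    if feature not in org.enabled_features:
        raise PolicyError(f"{feature!r} is not enabled for {org.org_id}")
    if org.requests_used >= org.monthly_request_cap:
        raise PolicyError(f"usage cap reached for {org.org_id}")
    org.requests_used += 1
```

In this sketch every capability call passes through `authorize` first, which is one way to picture how the organization shapes what the other parts are allowed to do.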
  • One or more AI models do the thinking — reading what you typed or said, and writing the reply.

    Multi-model LLMs: OpenAI, Gemini, local Ollama and others run side by side; the assistant routes each request to a capable model automatically, or uses the one you pick.
    Realtime voice models: Gemini Live or OpenAI Realtime carry a true two-way spoken conversation with very low latency.
    How it works

    You can wire up several models side by side (OpenAI, Google Gemini, local Ollama and more) and switch which one is active in the moment. You get the right model for the job without being locked in to a single provider.

    For natural spoken conversation, a realtime model can listen and speak in one continuous flow instead of passing audio between separate steps.

    Takes everything the senses gather, reasons over it, and hands its answer to the Mouth to be spoken.

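
The routing idea above ("automatically or on your pick") could be sketched roughly as follows. The registry, the capability tags and the `route` function are assumptions made for illustration; no real provider SDK is called, and the actual routing logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str  # a label only; nothing here talks to a real provider
    capabilities: frozenset


# Hypothetical registry: the names and capability tags are illustrative.
MODELS = [
    Model("openai", frozenset({"text", "vision"})),
    Model("gemini-live", frozenset({"realtime-voice"})),
    Model("ollama-local", frozenset({"text"})),
]


def route(needs: set, pick: Optional[str] = None) -> Model:
    """Return the user's pick if it can do the job, else the first capable model."""
    if pick is not None:
        for model in MODELS:
            if model.name == pick:
                if needs <= model.capabilities:
                    return model
                raise ValueError(f"{pick} cannot handle {sorted(needs)}")
        raise ValueError(f"unknown model: {pick}")
    for model in MODELS:
        if needs <= model.capabilities:
            return model
    raise ValueError(f"no configured model can handle {sorted(needs)}")
```

For example, `route({"text"}, pick="ollama-local")` honours an explicit choice, while `route({"realtime-voice"})` falls through to whichever registered model advertises that capability.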
  • Share your screen and microphone so the assistant can see what you’re doing and help in context.

    Live screen + audio capture: Streams your screen and microphone to the assistant in real time over the LiveKit connection.
    Shared vision model: Uses the model you already set up in Thinking, with no separate vision pipeline.
    How it works

    Perfect for walkthroughs, demos and getting unstuck while you work — the assistant follows along live rather than guessing from a description.

    It reasons with the same AI model as the rest of chat, and can talk you through what it sees when spoken replies are available.

    Feeds what it sees straight to the Brain so the reasoning has real visual context.

  • Your microphone audio is turned into text the assistant can read.

    Streaming cloud speech-to-text: Real-time transcription via Deepgram Nova-3, with Google Cloud Speech as an alternative.
    On-device fallback: Private, offline transcription that needs no account or network connection.
    How it works

    On-device transcription runs privately, works offline and needs no third-party service — ideal when speech should never leave the device.

    For natural, flowing conversation, a streaming cloud service transcribes speech the instant you say it.

    Sends the words it heard to the Brain to be understood.

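
One way to picture the cloud-first, on-device-fallback behaviour described above is the sketch below. The `transcribe_with_fallback` function, the `private` flag and the transcriber callables are hypothetical stand-ins; real Deepgram or Google Cloud Speech clients are not shown, and the product's actual fallback rules are an assumption here.

```python
from typing import Callable, Tuple

# A transcriber takes raw audio bytes and returns text (hypothetical interface).
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes,
    cloud: Transcriber,
    on_device: Transcriber,
    private: bool = False,
) -> Tuple[str, str]:
    """Return (text, source).

    Uses the on-device engine when `private` is set, so audio never leaves
    the device, or when the cloud call fails with a network error.
    """
    if not private:
        try:
            return cloud(audio), "cloud"
        except (ConnectionError, TimeoutError):
            pass  # network trouble: fall through to the local engine
    return on_device(audio), "on-device"
```

Passing the engines in as plain callables keeps the sketch independent of any particular speech service.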
  • The assistant turns its written answer into a natural spoken voice.

    Neural text-to-speech: Natural voices from Google Cloud TTS or Deepgram Aura-2, chosen for quality or lower latency.
    Wide language coverage: Many languages and voices, so the assistant speaks the way your team does.
    How it works

    A neural text-to-speech voice reads the reply aloud, so a voice session feels like a real conversation rather than reading from a screen.

    You pick the voice provider that best fits your needs for quality, latency and language coverage.

    Rides the Connection out to your speakers — the final step of a spoken reply.

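
Choosing a voice provider by quality, latency and language coverage might look something like this sketch. The `Voice` record, the `pick_voice` helper, the provider names and every number below are invented for illustration; they are not measurements of Google Cloud TTS or Deepgram Aura-2.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Voice:
    provider: str          # placeholder name, not a real service
    languages: frozenset   # language codes this voice can speak
    quality: int           # 1 to 5, made-up score
    latency_ms: int        # made-up time to first audio


def pick_voice(voices: Iterable[Voice], language: str, prefer: str = "quality") -> Voice:
    """Filter by language, then rank by the preferred trade-off."""
    candidates = [v for v in voices if language in v.languages]
    if not candidates:
        raise LookupError(f"no configured voice speaks {language!r}")
    if prefer == "latency":
        return min(candidates, key=lambda v: v.latency_ms)
    return max(candidates, key=lambda v: v.quality)


# Hypothetical catalogue used only to exercise the function.
CATALOGUE = [
    Voice("provider-a", frozenset({"en", "de", "ja"}), quality=5, latency_ms=400),
    Voice("provider-b", frozenset({"en", "es"}), quality=4, latency_ms=150),
]
```

With this catalogue, an English session that prefers latency would get `provider-b`, while a Japanese session can only be served by `provider-a`.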
  • Every voice or video stream flows through a LiveKit media server.

    LiveKit Cloud or self-hosted: Use the managed service, or run your own server for full control and privacy.
    Real-time WebRTC media: Low-latency, encrypted audio and video built for live conversation, not file transfer.
    How it works

    LiveKit moves audio and video between you and the assistant — and between teammates — with the low latency that real conversation needs.

    Run it as a managed cloud service with nothing to maintain, or self-host it for full control and data residency. Either way it carries the ears, mouth and eyes for everyone in the organization.

    Carries all voice and video between you and the assistant — Listening, Speaking and Seeing all ride on it.
