Run any number of tenant-server replicas behind a load balancer with no "must be on the same pod" constraints. Every service component is designed to be stateless: shared state lives in Postgres and Redis.
Point TENANT_BASE_URL at your load balancer. All replicas share the same Postgres and Redis. No sticky sessions required for voice. The customization-agent worker can reach any replica.
The tenant server holds no in-memory state that must survive a round-robin to a different replica. JWT signing keys live in Postgres (via the livekit_server namespace) and in your deploy-time passwords.yaml. Every replica reads from the same source, so any pod can mint a LiveKit JWT for any user.
The customization-agent (voice/avatar worker) calls back to /internal/customization-agent/* routes. These routes look up the org user in Postgres, validate the tenant, and proceed; no per-replica in-memory state is consulted. Point TENANT_BASE_URL at the load balancer and any replica can answer.
Anywhere replicas would otherwise step on each other under load balancing, YOffice coordinates through Redis. Each concern below maps to a distributed primitive that keeps every replica in lockstep:
| Coordination concern | Why it matters across replicas | How YOffice handles it |
|---|---|---|
| Periodic sweeps in server.dart | Without coordination, every replica would fire every sweep, causing N× the work and duplicate triggers | LeaderOnlyTimer.periodic(taskName: …): a Redis-elected leader runs the sweep |
| Per-org request rate limiting | An in-memory counter would let each replica enforce the cap on its own slice, so N replicas allow N× the limit | DistributedRateLimiter: atomic Redis INCRBY per tenant + time bucket |
| Per-IP rate limit for public forms | Requests could otherwise bypass the limit by hitting different replicas | DistributedRateLimiter keyed by IP + bucket |
| Embedding concurrency | A per-replica semaphore would multiply by N replicas and exceed provider quotas | DistributedSemaphore with cluster-wide capacity |
| Directive cache invalidation | A local-only invalidate would leave other replicas serving stale directives for up to 5 minutes | Redis generation counter: any replica's invalidate bumps it; all replicas see the bump |
| Real-time pub/sub fan-out | postMessage without global: true reaches only the publishing replica's subscribers | All 137 call sites use ClusterPubsub.broadcast (global hardwired) |
All primitives live in command_center_tenant_server/lib/src/scaling/horizontal_scaling.dart. They require only a shared Redis cluster, with no separate coordination service.
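The generation-counter approach from the directive-cache row in the table above can be illustrated with a small sketch. It is written in Python for brevity rather than the server's Dart, and everything in it is an assumption made for illustration: `FakeRedis` is an in-memory stand-in for the GET and INCR commands, and `GenerationCache`, the key `directives:gen`, and the `store` dictionary are hypothetical names, not YOffice code.

```python
class FakeRedis:
    """In-memory stand-in for the two Redis commands used here (GET, INCR)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def incr(self, key):
        self._data[key] = int(self._data.get(key, 0)) + 1
        return self._data[key]


class GenerationCache:
    """Per-replica cache that is discarded whenever the shared generation moves."""

    def __init__(self, redis, gen_key, loader):
        self._redis = redis
        self._gen_key = gen_key
        self._loader = loader          # fetches the fresh value, e.g. from Postgres
        self._seen_gen = None
        self._entries = {}

    def _current_gen(self):
        return int(self._redis.get(self._gen_key) or 0)

    def get(self, key):
        gen = self._current_gen()
        if gen != self._seen_gen:      # another replica invalidated: drop local copies
            self._entries.clear()
            self._seen_gen = gen
        if key not in self._entries:
            self._entries[key] = self._loader(key)
        return self._entries[key]

    def invalidate(self):
        """Bump the shared counter so every replica reloads on its next read."""
        self._redis.incr(self._gen_key)


shared = FakeRedis()
store = {"greeting": "v1"}               # stands in for the source of truth
replica_a = GenerationCache(shared, "directives:gen", store.__getitem__)
replica_b = GenerationCache(shared, "directives:gen", store.__getitem__)

first_read = replica_b.get("greeting")   # replica B caches "v1"
store["greeting"] = "v2"
stale_read = replica_b.get("greeting")   # still cached, no invalidation yet
replica_a.invalidate()                   # replica A bumps the generation
fresh_read = replica_b.get("greeting")   # replica B notices and reloads
```

In this sketch every read costs one round trip to check the generation. A real implementation might check it less often and accept a short staleness window in exchange.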
The distributed lock is a Redis SET NX PX advisory lock used for cross-replica mutual exclusion. It auto-expires after a configurable TTL so a crashed owner cannot block the cluster forever. Long-running holders must call renew() within the TTL window.
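As a rough illustration of the acquire, renew, and expire cycle, here is a minimal Python sketch (not the Dart implementation). `FakeRedis` is an in-memory stand-in with a manual clock; its `set(..., nx=True, px=...)` mirrors the SET NX PX command, while `pexpire_if_equal` and `delete_if_equal` are hypothetical helpers standing in for what would be owner-checked Lua scripts on a real server. `AdvisoryLock` and the `lock:` key prefix are likewise assumptions.

```python
import uuid


class FakeRedis:
    """In-memory stand-in for Redis with a manual clock (milliseconds)."""

    def __init__(self):
        self.now_ms = 0
        self._data = {}                      # key -> (value, expires_at_ms)

    def _live(self, key):
        entry = self._data.get(key)
        if entry and entry[1] <= self.now_ms:
            del self._data[key]              # lazily expire, as a TTL would
            entry = None
        return entry

    def set(self, key, value, nx=False, px=None):
        if nx and self._live(key):
            return None                      # key held by someone else
        expires = self.now_ms + px if px is not None else float("inf")
        self._data[key] = (value, expires)
        return True

    # On a real server the two methods below would each be one Lua script,
    # so the owner check and the write happen atomically.
    def pexpire_if_equal(self, key, value, px):
        entry = self._live(key)
        if not entry or entry[0] != value:
            return False
        self._data[key] = (value, self.now_ms + px)
        return True

    def delete_if_equal(self, key, value):
        entry = self._live(key)
        if not entry or entry[0] != value:
            return False
        del self._data[key]
        return True


class AdvisoryLock:
    def __init__(self, redis, name, ttl_ms):
        self._redis, self._key, self._ttl_ms = redis, f"lock:{name}", ttl_ms
        self._token = uuid.uuid4().hex       # unique owner token per holder

    def acquire(self):
        return self._redis.set(self._key, self._token, nx=True, px=self._ttl_ms) is True

    def renew(self):
        return self._redis.pexpire_if_equal(self._key, self._token, self._ttl_ms)

    def release(self):
        return self._redis.delete_if_equal(self._key, self._token)


r = FakeRedis()
pod_a = AdvisoryLock(r, "nightly-sweep", ttl_ms=5000)
pod_b = AdvisoryLock(r, "nightly-sweep", ttl_ms=5000)

a_got_it = pod_a.acquire()        # key was free, pod A takes it
b_blocked = pod_b.acquire()       # pod A holds the lock, pod B is refused
r.now_ms += 4000
a_renewed = pod_a.renew()         # TTL pushed out to t=9000
r.now_ms += 6000                  # t=10000: pod A "crashed" and never renewed again
b_after_expiry = pod_b.acquire()  # the lock auto-expired, pod B takes over
a_late_renew = pod_a.renew()      # pod A no longer owns the key
```

The owner token is what makes renew and release safe: a holder whose lease has already lapsed cannot extend or delete a lock that now belongs to another pod.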
LeaderOnlyTimer is a drop-in replacement for Timer.periodic that uses a per-task Redis leader lease. Only the elected leader replica runs the body; all other replicas sit idle. Each task elects its own leader independently, which tends to spread the work across pods.
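The per-task lease idea can be sketched as follows, in Python rather than Dart and with ticks driven by hand instead of a real timer. All names here are assumptions for illustration: `FakeRedis.acquire_or_renew` stands in for an atomic "take the lease if free, extend it if mine" operation, and `LeaderOnlyTicker` and the `leader:` key prefix are hypothetical, not the LeaderOnlyTimer API.

```python
class FakeRedis:
    """In-memory stand-in with a manual clock (milliseconds)."""

    def __init__(self):
        self.now_ms = 0
        self._data = {}                       # key -> (owner, expires_at_ms)

    def acquire_or_renew(self, key, owner, ttl_ms):
        """Take the lease if free or expired, extend it if already ours.

        On a real server this would be SET NX PX plus an owner-checked
        PEXPIRE, wrapped in a Lua script to keep it atomic.
        """
        holder = self._data.get(key)
        if holder and holder[1] > self.now_ms and holder[0] != owner:
            return False
        self._data[key] = (owner, self.now_ms + ttl_ms)
        return True


class LeaderOnlyTicker:
    """Runs `body` on a tick only while this instance holds the task's lease."""

    def __init__(self, redis, instance_id, task_name, lease_ms, body):
        self._redis = redis
        self._instance_id = instance_id
        self._key = f"leader:{task_name}"     # one lease per task
        self._lease_ms = lease_ms
        self._body = body

    def tick(self):
        if self._redis.acquire_or_renew(self._key, self._instance_id, self._lease_ms):
            self._body()
            return True
        return False                          # follower: stay idle this round


r = FakeRedis()
runs = []
pod_a = LeaderOnlyTicker(r, "pod-a", "cleanup", 3000, lambda: runs.append("pod-a"))
pod_b = LeaderOnlyTicker(r, "pod-b", "cleanup", 3000, lambda: runs.append("pod-b"))

for _ in range(3):                # three ticks, one second apart
    pod_a.tick()
    pod_b.tick()
    r.now_ms += 1000

r.now_ms += 5000                  # pod A goes silent and its lease lapses
pod_b.tick()                      # pod B takes over the task
```

Because the lease key includes the task name, a second task such as `leader:reports` could be led by a different pod at the same time.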
DistributedSemaphore is a counting semaphore that bounds cluster-wide concurrency with Redis slot keys. It replaces per-isolate semaphores, which only enforce limits locally. This matters for bulk knowledge-folder embeddings and similar provider-quota-sensitive operations.
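One way to build a counting semaphore from slot keys is to give each permit its own key and claim it with SET NX PX. The Python sketch below shows that idea under stated assumptions: `FakeRedis` is an in-memory stand-in, and `SlotSemaphore`, the `sem:` key layout, and the 30-second TTL are hypothetical choices, not the DistributedSemaphore implementation.

```python
import uuid


class FakeRedis:
    """In-memory stand-in for SET NX PX, GET and DEL with a manual clock."""

    def __init__(self):
        self.now_ms = 0
        self._data = {}                       # key -> (value, expires_at_ms)

    def set(self, key, value, nx=False, px=None):
        entry = self._data.get(key)
        live = entry is not None and entry[1] > self.now_ms
        if nx and live:
            return None
        expires = self.now_ms + px if px is not None else float("inf")
        self._data[key] = (value, expires)
        return True

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self.now_ms:
            return None
        return entry[0]

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


class SlotSemaphore:
    def __init__(self, redis, name, capacity, ttl_ms):
        self._redis = redis
        self._slots = [f"sem:{name}:{i}" for i in range(capacity)]
        self._ttl_ms = ttl_ms                 # a crashed holder frees its slot after this

    def try_acquire(self):
        """Return a (slot_key, token) permit, or None when all slots are taken."""
        token = uuid.uuid4().hex
        for slot in self._slots:
            if self._redis.set(slot, token, nx=True, px=self._ttl_ms):
                return (slot, token)
        return None

    def release(self, permit):
        slot, token = permit
        # Owner check then delete; a real client would do this in one Lua script.
        if self._redis.get(slot) == token:
            self._redis.delete(slot)


r = FakeRedis()
# Two "replicas" sharing one cluster-wide budget of 2 concurrent embedding calls.
replica_a = SlotSemaphore(r, "embeddings", capacity=2, ttl_ms=30_000)
replica_b = SlotSemaphore(r, "embeddings", capacity=2, ttl_ms=30_000)

p1 = replica_a.try_acquire()
p2 = replica_b.try_acquire()
p3 = replica_a.try_acquire()      # None: both slots are in use cluster-wide
replica_b.release(p2)
p4 = replica_a.try_acquire()      # succeeds again after the release
```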
DistributedRateLimiter uses an atomic INCRBY in Redis keyed by tenant + time bucket. It enforces per-org request caps across every replica simultaneously. Per-org overrides are written by the admin rate-limit settings screen and pushed to Redis within ~30 seconds.
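A fixed-window counter is the simplest way to read "INCRBY keyed by tenant + time bucket", and the Python sketch below assumes that interpretation. `FakeRedis` is an in-memory stand-in for INCRBY, and `FixedWindowLimiter`, the `ratelimit:` key format, and the limit of 3 are hypothetical values chosen to keep the example short.

```python
class FakeRedis:
    """In-memory stand-in for INCRBY; key expiry is omitted for brevity."""

    def __init__(self):
        self._counts = {}

    def incrby(self, key, amount):
        self._counts[key] = self._counts.get(key, 0) + amount
        return self._counts[key]


class FixedWindowLimiter:
    def __init__(self, redis, limit, window_s=60):
        self._redis, self._limit, self._window_s = redis, limit, window_s

    def allow(self, tenant_id, now_s):
        bucket = int(now_s // self._window_s)          # same bucket on every replica
        key = f"ratelimit:{tenant_id}:{bucket}"
        # One atomic increment; a real deployment would also set a TTL on the key
        # so old buckets get cleaned up.
        return self._redis.incrby(key, 1) <= self._limit


shared = FakeRedis()
replica_a = FixedWindowLimiter(shared, limit=3)
replica_b = FixedWindowLimiter(shared, limit=3)

# Four requests for one tenant in the same minute, alternating replicas.
decisions = [
    replica_a.allow("org-1", now_s=10),
    replica_b.allow("org-1", now_s=20),
    replica_a.allow("org-1", now_s=30),
    replica_b.allow("org-1", now_s=40),   # fourth request exceeds the cap of 3
]
other_tenant = replica_a.allow("org-2", now_s=40)   # separate counter
next_minute = replica_b.allow("org-1", now_s=61)    # new bucket, counter restarts
```

Because both replicas increment the same key, the cap holds for the tenant as a whole rather than per replica.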
ClusterPubsub is a thin wrapper around Serverpod session.messages that passes global: true so pub/sub events reach subscribers on every replica, not just the one that published. All real-time chat, presence, and workflow stream events use this wrapper.
A typical production setup looks like this:
Clients (Flutter / browser)
    │ HTTPS / WSS
Load balancer (nginx / Cloudflare / ALB)
    │                     │
tenant-server pod     tenant-server pod … × N
    │                     │
    ├── Postgres (shared)
    ├── Redis (shared)
    ├── LiveKit Cloud
    └── MinIO / Supabase Storage
customization-agent worker(s)
    └── HTTP callbacks → TENANT_BASE_URL/internal/…
        (any pod can answer)

Set TENANT_BASE_URL to your load-balancer hostname (e.g. https://tenant.your-domain). Every callback the customization-agent worker makes to /internal/customization-agent/* carries tenantId, organizationUserId, and a secret header. Any replica can respond; no in-memory state from the JWT-minting replica is required.
All replicas must read from the same Postgres instance and the same Redis cluster. The distributed primitives (locks, leader elections, semaphores, rate limits, pub/sub) only work correctly when all pods share a single Redis keyspace.
Sticky sessions are not required for voice: LiveKit Cloud connects clients directly to its own media plane after JWT minting. Sticky sessions ARE helpful for the Serverpod WebSocket so chat clients do not churn through reconnects when a pod restarts. Use ip_hash on nginx or a session-affinity policy on a managed load balancer.
Each pod derives a stable identity from CC_INSTANCE_ID → HOSTNAME → platform hostname → random fallback. Setting CC_INSTANCE_ID explicitly in your deployment manifest gives you consistent pod names in logs, distributed-lock owner tokens, and the /internal/instance-info debug route.
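The precedence chain above amounts to "use the first source that yields a non-empty value". A minimal Python sketch of that lookup follows; the function name `resolve_instance_id`, its parameters, and the `instance-` prefix on the random fallback are assumptions for illustration, not the server's actual behavior.

```python
import os
import socket
import uuid


def resolve_instance_id(env=None, platform_hostname=socket.gethostname):
    """Pick the first non-empty source: CC_INSTANCE_ID, HOSTNAME, OS hostname, random."""
    env = os.environ if env is None else env
    for name in ("CC_INSTANCE_ID", "HOSTNAME"):
        value = (env.get(name) or "").strip()
        if value:
            return value
    try:
        host = (platform_hostname() or "").strip()
    except OSError:
        host = ""
    # Hypothetical fallback format; it changes on every restart.
    return host or f"instance-{uuid.uuid4().hex[:12]}"


explicit = resolve_instance_id({"CC_INSTANCE_ID": "tenant-0", "HOSTNAME": "pod-abc"})
from_hostname = resolve_instance_id({"HOSTNAME": "pod-abc"})
from_platform = resolve_instance_id({}, platform_hostname=lambda: "node-7")
random_fallback = resolve_instance_id({}, platform_hostname=lambda: "")
```

Only the random fallback is unstable across restarts, which is one reason to set CC_INSTANCE_ID explicitly.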
LiveKit Cloud connects clients directly to its own WebRTC media plane after the tenant server issues a JWT. The tenant server is not in the media path after token issuance, so session affinity does not affect voice quality.
The customization-agent worker registers with LiveKit Cloud using the agent_name=customization identifier. It dispatches callbacks to TENANT_BASE_URL (your load balancer), and any replica can handle them. The worker itself can also run as multiple instances; each instance registers with LiveKit independently.
Organization admins can adjust the per-org callback rate cap in Settings → Rate Limits. The new value is written to Postgres and pushed to Redis within ~30 seconds, so all replicas pick it up without a restart. The default is 240 requests per minute.
The distributed primitives (locks, leader elections, rate limits, pub/sub) all require Redis. If Redis is not available, the server falls back to in-process behavior, which is safe for single-replica deployments but will exhibit the bugs described above under load balancing.