Status: Accepted
Date: 2026-02-06
Deciders: Reflections Maintainers
Context
Realtime voice response latency and background learning throughput have different reliability and safety requirements. Combining them in one execution path risks write-side side effects impacting user-facing turn responsiveness.
Decision
Use a two-plane architecture:
- Realtime plane (apps/api + packages/brain-core) is read-only for truth tables and optimized for latency. Voice orchestration is delegated to a managed provider (ElevenLabs Conversations API); the API serves session bootstrap, server-tool callbacks, and webhook receivers. There is no in-process agent service.
- Background plane (apps/workers) performs ingestion, extraction, evaluation, and patch application. Transcript capture from the managed voice provider enters this plane via webhook-triggered ingestion dispatch.
- Learning remains gated: candidate facts are only activated after evaluation and patch application.
- Each plane uses a functional-core/imperative-shell split: orchestrators manage IO/state transitions, while lifecycle rules and transformation logic remain in focused, testable core modules.
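The gating rule and the functional-core split can be illustrated with a small pure transition function. This is a minimal sketch, not the project's actual model: the state names (`candidate`, `evaluated`, `active`, `rejected`), the event names, and `nextFactState` are all hypothetical, chosen only to show that a candidate fact cannot become active without passing evaluation and then patch application.

```typescript
// Hypothetical state and event names, for illustration only.
// The real lifecycle semantics live in the patch/fact state models.
type FactState = "candidate" | "evaluated" | "active" | "rejected";

type FactEvent =
  | { kind: "evaluation_passed" }
  | { kind: "evaluation_failed" }
  | { kind: "patch_applied" };

// Functional core: a pure transition rule with no IO.
// Returns the next state, or null when the event is not allowed
// in the current state.
function nextFactState(state: FactState, event: FactEvent): FactState | null {
  switch (state) {
    case "candidate":
      if (event.kind === "evaluation_passed") return "evaluated";
      if (event.kind === "evaluation_failed") return "rejected";
      return null; // a candidate can never be activated directly
    case "evaluated":
      return event.kind === "patch_applied" ? "active" : null;
    case "active":
    case "rejected":
      return null; // treated as terminal in this sketch
  }
}
```

Under this split, an orchestrator in the imperative shell would load the current state, call the pure function, and persist the result only when it is non-null, which keeps the rule itself testable without a database.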
Alternatives considered
Alternative 1: Single unified plane for read + write
Pros:
- Simple conceptual model.
- Fewer services to deploy.
Cons:
- Higher blast radius from ingestion/extraction failures.
- Latency regressions from background workload contention.
- Weaker enforcement of read-only invariants for live turns.
Alternative 2: Eventual write-back directly from realtime plane
Pros:
- Immediate capture of conversational learning opportunities.
Cons:
- High risk of unvetted writes to truth tables.
- Harder to reason about correctness and rollback.
Alternative 3: Human approval only, no automated eval gate
Pros:
- Maximal manual control over learning.
Cons:
- Operational bottleneck and low throughput.
- Delayed model memory updates.
Consequences
Benefits:
- Tight safety boundary: realtime can retrieve but not mutate truth.
- Better operational isolation and clearer failure domains.
- Cleaner ownership split for low-latency vs batch concerns.
Costs:
- More services and pipeline orchestration to maintain.
- Cross-plane status propagation complexity.
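One way to picture the "retrieve but not mutate" boundary is to give realtime code a narrower interface than worker code. The sketch below is an assumption for illustration: `Fact`, `TruthReader`, `TruthWriter`, `InMemoryTruthStore`, and `retrieveContext` are hypothetical names, and the in-memory store stands in for the real truth tables.

```typescript
// Hypothetical shapes, for illustration only.
interface Fact {
  id: string;
  statement: string;
}

// The only capability the realtime plane is given.
interface TruthReader {
  findFacts(query: string): readonly Fact[];
}

// The background plane additionally gets the write capability.
interface TruthWriter extends TruthReader {
  applyPatch(facts: Fact[]): void;
}

// In-memory stand-in for the truth tables.
class InMemoryTruthStore implements TruthWriter {
  private facts: Fact[] = [];

  findFacts(query: string): readonly Fact[] {
    const needle = query.toLowerCase();
    return this.facts.filter(
      (f) => f.statement.toLowerCase().indexOf(needle) !== -1
    );
  }

  applyPatch(facts: Fact[]): void {
    this.facts.push(...facts);
  }
}

// Realtime-side function: it accepts a TruthReader, so calling
// store.applyPatch(...) here would be a compile-time type error.
function retrieveContext(store: TruthReader, query: string): string[] {
  return store.findFacts(query).map((f) => f.statement);
}
```

A type-level boundary like this only guards against accidental writes within one codebase; it complements, rather than replaces, the operational isolation between the two planes.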
Implementation notes
- Worker orchestration is implemented via Inngest.
- Realtime read-only behavior is guarded by import contract tests in packages/brain-core.
- Lifecycle semantics are anchored in patch/fact state models in DB migrations.
- The realtime plane exposes a server-tool endpoint (POST /v1/tools/retrieve-context) that the managed voice provider calls during conversations to fetch knowledge-graph evidence. A webhook endpoint and a reconciliation endpoint bridge conversation transcripts from the voice provider into the background ingestion pipeline.
- Prior to 2026-02-09, the realtime plane included a separate in-process agent service using LiveKit Agents SDK. This was removed in favor of delegating voice orchestration to ElevenLabs. See ADR-0019: Voice runtime provider strategy.
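The import contract tests mentioned in the notes above are not shown in this record. As a rough sketch of what such a check could look like, the function below scans source text for import specifiers and reports any that match a forbidden list. The function name `findForbiddenImports` and the module name `@example/truth-writer` are hypothetical, and the regular expression is a simplification that a real test might replace with a proper parser.

```typescript
// Scans source text for import/require specifiers and returns those
// that match a forbidden module name or one of its subpaths.
// Simplified: a regex is used instead of a real TypeScript parser.
function findForbiddenImports(
  source: string,
  forbidden: readonly string[]
): string[] {
  // Matches: from "x", import "x", import("x"), require("x")
  const importPattern =
    /(?:\bfrom\s+|\bimport\s*\(\s*|\bimport\s+|\brequire\s*\(\s*)["']([^"']+)["']/g;
  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(source)) !== null) {
    const specifier: string = match[1] || "";
    const isForbidden = forbidden.some(
      (name) => specifier === name || specifier.indexOf(name + "/") === 0
    );
    if (isForbidden) hits.push(specifier);
  }
  return hits;
}

// Hypothetical usage: a realtime module must not import the writer package.
const exampleSource = [
  'import { findFacts } from "@example/truth-reader";',
  'import { applyPatch } from "@example/truth-writer/patches";',
].join("\n");

const violations = findForbiddenImports(exampleSource, ["@example/truth-writer"]);
```

A contract test would then read each source file in the realtime package and fail when `violations` is non-empty for any of them.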

