Status: Accepted
Date: 2026-02-06
Deciders: Reflections Maintainers

Context

Realtime voice responses and background learning have different reliability and safety requirements: the former must stay low-latency, while the latter is optimized for throughput. Running both in one execution path risks write-side work degrading user-facing turn responsiveness.

Decision

Use a two-plane architecture:
  • Realtime plane (apps/api + packages/brain-core) is read-only for truth tables and optimized for latency. Voice orchestration is delegated to a managed provider (ElevenLabs Conversations API); the API serves session bootstrap, server-tool callbacks, and webhook receivers. There is no in-process agent service.
  • Background plane (apps/workers) performs ingestion, extraction, evaluation, and patch application. Transcript capture from the managed voice provider enters this plane via webhook-triggered ingestion dispatch.
  • Learning remains gated: candidate facts are activated only after evaluation and patch application.
  • Each plane uses a functional-core/imperative-shell split: orchestrators manage IO/state transitions, while lifecycle rules and transformation logic remain in focused, testable core modules.
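
The gating rule and the functional-core/imperative-shell split described above can be sketched together. This is a minimal illustration, not the project's code: the state names (`candidate`, `evaluated`, `active`, `rejected`), the event names, and the `FactStore` interface are all hypothetical. The pure `transition` function holds the lifecycle rule, and the `applyEvent` shell owns IO.

```typescript
// Hypothetical lifecycle states and events; the real models live in the DB migrations.
type FactState = "candidate" | "evaluated" | "active" | "rejected";

interface Fact {
  id: string;
  state: FactState;
}

type LifecycleEvent =
  | { kind: "evaluation_passed" }
  | { kind: "evaluation_failed" }
  | { kind: "patch_applied" };

// Functional core: a pure transition rule with no IO, so it is trivial to unit test.
// Events that are not valid for the current state leave the fact unchanged.
function transition(fact: Fact, event: LifecycleEvent): Fact {
  switch (event.kind) {
    case "evaluation_passed":
      return fact.state === "candidate" ? { ...fact, state: "evaluated" } : fact;
    case "evaluation_failed":
      return fact.state === "candidate" ? { ...fact, state: "rejected" } : fact;
    case "patch_applied":
      // The gate: activation requires a prior passing evaluation.
      return fact.state === "evaluated" ? { ...fact, state: "active" } : fact;
  }
}

// Imperative shell: loads and saves state, delegating every decision to the core.
// FactStore is an assumed interface for illustration only.
interface FactStore {
  load(id: string): Promise<Fact>;
  save(fact: Fact): Promise<void>;
}

async function applyEvent(
  store: FactStore,
  id: string,
  event: LifecycleEvent,
): Promise<Fact> {
  const current = await store.load(id);
  const next = transition(current, event);
  // Persist only when the core actually produced a new state.
  if (next !== current) {
    await store.save(next);
  }
  return next;
}
```

Under these assumptions, a `patch_applied` event on a fact that is still a `candidate` is a no-op, so a fact cannot become active without passing evaluation first.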

Alternatives considered

Alternative 1: Single unified plane for read + write

Pros:
  • Simple conceptual model.
  • Fewer services to deploy.
Cons:
  • Higher blast radius from ingestion/extraction failures.
  • Latency regressions from background workload contention.
  • Weaker enforcement of read-only invariants for live turns.

Alternative 2: Eventual write-back directly from the realtime plane

Pros:
  • Immediate capture of conversational learning opportunities.
Cons:
  • High risk of unvetted writes to truth tables.
  • Harder to reason about correctness and rollback.

Alternative 3: Human approval only, no automated eval gate

Pros:
  • Maximal manual control over learning.
Cons:
  • Operational bottleneck and low throughput.
  • Delayed model memory updates.

Consequences

Benefits:
  • Tight safety boundary: realtime can retrieve but not mutate truth.
  • Better operational isolation and clearer failure domains.
  • Cleaner ownership split for low-latency vs batch concerns.
Costs:
  • More services and pipeline orchestration to maintain.
  • Cross-plane status propagation complexity.

Implementation notes

  • Worker orchestration is implemented via Inngest.
  • Realtime read-only behavior is guarded by import contract tests in packages/brain-core.
  • Lifecycle semantics are anchored in patch/fact state models in DB migrations.
  • The realtime plane exposes a server-tool endpoint (POST /v1/tools/retrieve-context) that the managed voice provider calls during conversations to fetch knowledge-graph evidence. A webhook endpoint and a reconciliation endpoint bridge conversation transcripts from the voice provider into the background ingestion pipeline.
  • Prior to 2026-02-09, the realtime plane included a separate in-process agent service built on the LiveKit Agents SDK. That service was removed in favor of delegating voice orchestration to ElevenLabs. See ADR-0019: Voice runtime provider strategy.
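
One way an import contract test like the one mentioned above could work is to scan realtime-plane source text for imports of write-side modules. The sketch below is an assumption about the approach, not the actual test in packages/brain-core: the forbidden module names are placeholders, and the regex handles only static `import` statements and `require(...)` calls, not dynamic `import()`.

```typescript
// Placeholder write-side module names; the real forbidden list is not given in this ADR.
const FORBIDDEN_IMPORTS: readonly string[] = [
  "@example/db-write",
  "@example/patch-apply",
];

// Collects module specifiers from `import ... from "x"`, `import "x"`, and `require("x")`.
function listImports(source: string): string[] {
  const pattern =
    /(?:import\s+(?:[^"']*?\s+from\s+)?|require\(\s*)["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier !== undefined) {
      found.push(specifier);
    }
  }
  return found;
}

// Returns every imported specifier that is a forbidden module or a subpath of one.
function findViolations(source: string, forbidden: readonly string[]): string[] {
  return listImports(source).filter((specifier) =>
    forbidden.some(
      (name) => specifier === name || specifier.startsWith(name + "/"),
    ),
  );
}
```

A test suite would read each realtime-plane source file, pass its contents to `findViolations`, and fail if the result is non-empty. A text scan is cheap but approximate; a parser-based or lint-rule check would be stricter.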