ADR-0003: Two-plane system architecture

Status: Accepted
Date: 2026-02-06
Deciders: Reflections Maintainers

Context

Realtime voice responses are latency-sensitive, while background learning is throughput-oriented, and the two have different reliability and safety requirements. Combining them in one execution path risks letting write-side work (ingestion, extraction, patch application) degrade the responsiveness of user-facing turns.

Decision

Use a two-plane architecture:

Realtime plane (apps/api + packages/brain-core) is read-only with respect to truth tables and optimized for latency. Voice orchestration is delegated to a managed provider (ElevenLabs Conversations API); the API serves session bootstrap, server-tool callbacks, and webhook receivers. There is no in-process agent service.
Background plane (apps/workers) performs ingestion, extraction, evaluation, and patch application. Conversation transcripts from the managed voice provider enter this plane via webhook-triggered ingestion dispatch.
Learning remains gated: candidate facts are only activated after evaluation and patch application.
Each plane uses a functional-core/imperative-shell split: orchestrators manage IO/state transitions, while lifecycle rules and transformation logic remain in focused, testable core modules.
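A minimal TypeScript sketch of the gated lifecycle combined with the functional-core/imperative-shell split. It assumes hypothetical state names (`candidate`, `evaluated_pass`, `evaluated_fail`, `active`), hypothetical event names, and hypothetical `load`/`save` persistence callbacks; the real patch/fact state models are defined in the DB migrations and may differ. The pure `transition` function encodes the rule that a candidate fact becomes active only after passing evaluation and then having a patch applied, while `applyEvent` is the thin shell that performs IO.

```typescript
// Hypothetical fact lifecycle states; the real models live in the DB migrations.
type FactState = "candidate" | "evaluated_pass" | "evaluated_fail" | "active";

// Hypothetical lifecycle events emitted by the background plane.
type FactEvent =
  | { kind: "evaluation_passed" }
  | { kind: "evaluation_failed" }
  | { kind: "patch_applied" };

type TransitionResult =
  | { ok: true; state: FactState }
  | { ok: false; error: string };

// Functional core: a pure transition function with no IO, so every rule is unit-testable.
function transition(state: FactState, event: FactEvent): TransitionResult {
  switch (state) {
    case "candidate":
      if (event.kind === "evaluation_passed") return { ok: true, state: "evaluated_pass" };
      if (event.kind === "evaluation_failed") return { ok: true, state: "evaluated_fail" };
      break;
    case "evaluated_pass":
      // Only a fact that passed evaluation can be activated, and only by a patch.
      if (event.kind === "patch_applied") return { ok: true, state: "active" };
      break;
    case "evaluated_fail":
    case "active":
      break; // terminal states in this sketch
  }
  return { ok: false, error: `illegal transition: ${state} --${event.kind}-->` };
}

// Imperative shell: load current state, ask the core, persist the result.
// `load` and `save` are hypothetical stand-ins for the worker's persistence layer.
async function applyEvent(
  factId: string,
  event: FactEvent,
  load: (id: string) => Promise<FactState>,
  save: (id: string, state: FactState) => Promise<void>,
): Promise<FactState> {
  const current = await load(factId);
  const result = transition(current, event);
  if (!result.ok) throw new Error(result.error);
  await save(factId, result.state);
  return result.state;
}
```

Because `transition` does no IO, the gating rule can be covered exhaustively by plain unit tests, and the shell stays small enough to review by inspection.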

Alternatives considered

Alternative 1: Single unified plane for read + write

Pros:

Simple conceptual model.
Fewer services to deploy.

Cons:

Higher blast radius from ingestion/extraction failures.
Latency regressions from background workload contention.
Weaker enforcement of read-only invariants for live turns.

Alternative 2: Eventual write-back directly from realtime plane

Pros:

Immediate capture of conversational learning opportunities.

Cons:

High risk of unvetted writes to truth tables.
Harder to reason about correctness and rollback.

Alternative 3: Human approval only, no automated eval gate

Pros:

Maximal manual control over learning.

Cons:

Operational bottleneck and low throughput.
Delayed model memory updates.

Consequences

Benefits:

Tight safety boundary: realtime can retrieve but not mutate truth.
Better operational isolation and clearer failure domains.
Cleaner ownership split for low-latency vs batch concerns.
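The retrieve-but-not-mutate boundary can be checked mechanically by scanning realtime-plane source for imports of write-capable modules. The sketch below assumes hypothetical package names (`@reflections/truth-writer`, `@reflections/workers`) and uses a simple regex over static `import`/`export ... from` statements; it is not the contract test in packages/brain-core, which may use different rules or a proper parser.

```typescript
// Hypothetical write-capable module specifiers that realtime code must never import.
const FORBIDDEN_IN_REALTIME: readonly string[] = [
  "@reflections/truth-writer",
  "@reflections/workers",
];

// Collect specifiers from `import ... from "x"`, `export ... from "x"`, and `import "x"`.
// Dynamic import() and require() are not detected by this regex-based sketch.
function importSpecifiers(source: string): string[] {
  const re = /\b(?:import|export)\s[^'"]*?\bfrom\s*['"]([^'"]+)['"]|\bimport\s*['"]([^'"]+)['"]/g;
  const out: string[] = [];
  for (const m of source.matchAll(re)) {
    const spec = m[1] ?? m[2];
    if (spec !== undefined) out.push(spec);
  }
  return out;
}

// Return every import that hits a forbidden package or one of its subpaths.
function forbiddenImports(
  source: string,
  forbidden: readonly string[] = FORBIDDEN_IN_REALTIME,
): string[] {
  return importSpecifiers(source).filter((spec) =>
    forbidden.some((f) => spec === f || spec.startsWith(`${f}/`)),
  );
}
```

A production check would more likely walk the AST via the TypeScript compiler API, but the contract is the same: the list of forbidden specifiers found in realtime code must be empty.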

Costs:

More services and pipeline orchestration to maintain.
Cross-plane status propagation complexity.

Implementation notes

Worker orchestration is implemented via Inngest.
Realtime read-only behavior is guarded by import contract tests in packages/brain-core.
Lifecycle semantics are anchored in patch/fact state models in DB migrations.
The realtime plane exposes a server-tool endpoint (POST /v1/tools/retrieve-context) that the managed voice provider calls during conversations to fetch knowledge-graph evidence. A webhook endpoint and a reconciliation endpoint bridge conversation transcripts from the voice provider into the background ingestion pipeline.
Prior to 2026-02-09, the realtime plane included a separate in-process agent service built on the LiveKit Agents SDK. It was removed in favor of delegating voice orchestration to ElevenLabs. See ADR-0019: Voice runtime provider strategy.
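A minimal sketch of how the transcript webhook receiver can stay thin: validate the payload, map it to an ingestion event, and hand it to the background plane without touching truth tables. The payload type is a simplified, hypothetical shape rather than the real ElevenLabs webhook schema, and the event name `conversation/ingestion.requested` and the `dispatch` callback are assumptions; the event is loosely modeled on an Inngest event payload (`name`, `data`, and an `id` intended for deduplication). Webhook signature verification is omitted.

```typescript
// Simplified, hypothetical webhook body (not the real ElevenLabs schema).
interface TranscriptWebhook {
  conversationId: string;
  endedAt: string; // ISO-8601 timestamp
  turns: { role: "user" | "agent"; text: string }[];
}

// Hypothetical event consumed by the background ingestion pipeline.
interface IngestionRequested {
  name: "conversation/ingestion.requested";
  id: string; // deterministic key so retries and reconciliation can deduplicate
  data: TranscriptWebhook;
}

// Functional core: validate an untrusted body and build the event, or return null.
// Individual turn contents are not validated in this sketch.
function toIngestionEvent(body: unknown): IngestionRequested | null {
  if (typeof body !== "object" || body === null) return null;
  const b = body as Partial<TranscriptWebhook>;
  if (typeof b.conversationId !== "string" || b.conversationId === "") return null;
  if (typeof b.endedAt !== "string" || !Array.isArray(b.turns)) return null;
  return {
    name: "conversation/ingestion.requested",
    id: `ingest:${b.conversationId}`,
    data: { conversationId: b.conversationId, endedAt: b.endedAt, turns: b.turns },
  };
}

// Imperative shell: validates and dispatches only; performs no truth-table writes.
async function handleTranscriptWebhook(
  body: unknown,
  dispatch: (event: IngestionRequested) => Promise<void>,
): Promise<number> {
  const event = toIngestionEvent(body);
  if (event === null) return 400; // reject malformed payloads
  await dispatch(event);
  return 202; // accepted; ingestion runs asynchronously in apps/workers
}
```

Deriving the key from the conversation id means the webhook path and the reconciliation path can both emit an event for the same conversation without ingesting it twice, provided the dispatcher deduplicates on `id`.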
