# ADR-0003: Two-plane system architecture

> Separate low-latency inference from asynchronous learning to preserve reliability and control risk.

<Info>**Status:** Accepted | **Date:** 2026-02-06 | **Deciders:** Reflections Maintainers</Info>

## Context

Realtime voice responses and background learning have different reliability and safety requirements: the former is bound by latency, the latter by throughput. Combining them in one execution path risks letting write-side effects degrade user-facing turn responsiveness.

## Decision

Use a two-plane architecture:

* **Realtime plane** (`apps/api` + `packages/brain-core`) is read-only for truth tables and optimized for latency. Voice orchestration is delegated to a managed provider (ElevenLabs Conversations API); the API serves session bootstrap, server-tool callbacks, and webhook receivers. There is no in-process agent service.
* **Background plane** (`apps/workers`) performs ingestion, extraction, evaluation, and patch application. Transcript capture from the managed voice provider enters this plane via webhook-triggered ingestion dispatch.
* Learning remains gated: candidate facts are only activated after evaluation and patch application.
* Each plane uses a functional-core/imperative-shell split: orchestrators manage IO/state transitions, while lifecycle rules and transformation logic remain in focused, testable core modules.
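The gating rule and the functional-core/imperative-shell split can be illustrated with a minimal sketch. Every name here (`FactStatus`, `LifecycleEvent`, `nextStatus`, `FactStore`, `applyLifecycleEvent`) and the exact set of states are assumptions made for illustration; the actual lifecycle model is defined in ADR-0005 and the DB migrations, not here.

```typescript
// Hypothetical lifecycle states and events; the real model may differ.
type FactStatus = "candidate" | "evaluated" | "active" | "rejected";

type LifecycleEvent =
  | { kind: "evaluation_passed" }
  | { kind: "evaluation_failed" }
  | { kind: "patch_applied" };

// Functional core: a pure transition rule with no IO.
// A fact becomes "active" only after it was evaluated AND a patch was applied.
function nextStatus(current: FactStatus, event: LifecycleEvent): FactStatus {
  switch (event.kind) {
    case "evaluation_passed":
      return current === "candidate" ? "evaluated" : current;
    case "evaluation_failed":
      return current === "candidate" ? "rejected" : current;
    case "patch_applied":
      // Patch application on an unevaluated candidate is ignored.
      return current === "evaluated" ? "active" : current;
  }
}

// Imperative shell: performs IO around the pure rule.
// FactStore is a hypothetical persistence interface.
interface FactStore {
  load(factId: string): Promise<FactStatus>;
  save(factId: string, status: FactStatus): Promise<void>;
}

async function applyLifecycleEvent(
  store: FactStore,
  factId: string,
  event: LifecycleEvent,
): Promise<FactStatus> {
  const current = await store.load(factId);
  const next = nextStatus(current, event);
  if (next !== current) {
    await store.save(factId, next);
  }
  return next;
}
```

Because `nextStatus` is pure, the gating invariant can be unit-tested without a database, while the shell stays thin enough to cover with a small number of integration tests.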

## Alternatives considered

### Alternative 1: Single unified plane for read + write

Pros:

* Simple conceptual model.
* Fewer services to deploy.

Cons:

* Higher blast radius from ingestion/extraction failures.
* Latency regressions from background workload contention.
* Weaker enforcement of read-only invariants for live turns.

### Alternative 2: Eventual write-back directly from the realtime plane

Pros:

* Immediate capture of conversational learning opportunities.

Cons:

* High risk of unvetted writes to truth tables.
* Harder to reason about correctness and rollback.

### Alternative 3: Human approval only, no automated eval gate

Pros:

* Maximal manual control over learning.

Cons:

* Operational bottleneck and low throughput.
* Delayed model memory updates.

## Consequences

**Benefits:**

* Tight safety boundary: the realtime plane can read truth tables but cannot mutate them.
* Better operational isolation and clearer failure domains.
* Cleaner ownership split for low-latency vs batch concerns.

**Costs:**

* More services and pipeline orchestration to maintain.
* Added complexity in propagating status across planes.

## Implementation notes

* Worker orchestration is implemented via Inngest.
* Realtime read-only behavior is guarded by import contract tests in `packages/brain-core`.
* Lifecycle semantics are anchored in patch/fact state models in DB migrations.
* The realtime plane exposes a server-tool endpoint (`POST /v1/tools/retrieve-context`) that the managed voice provider calls during conversations to fetch knowledge-graph evidence. A webhook endpoint and a reconciliation endpoint bridge conversation transcripts from the voice provider into the background ingestion pipeline.
* Prior to 2026-02-09, the realtime plane included a separate in-process agent service built on the LiveKit Agents SDK. This was removed in favor of delegating voice orchestration to ElevenLabs. See [ADR-0019: Voice runtime provider strategy](/decisions/adr-0019).
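As an illustration of what an import contract test might check, the sketch below scans a source file for imports of write-side modules. It is not the project's actual test: `findForbiddenImports` and the `db/write` and `db/read` module specifiers are hypothetical, and a regex scan is a deliberately simple stand-in for a real module-graph analysis.

```typescript
// Returns every import specifier in `source` that equals a forbidden
// prefix or lives beneath it. All module names used with this function
// are hypothetical; the ADR does not list the real forbidden modules.
function findForbiddenImports(
  source: string,
  forbiddenPrefixes: readonly string[],
): string[] {
  // Matches `import ... from "x"`, bare `import "x"`,
  // `require("x")`, and dynamic `import("x")`.
  const importPattern =
    /(?:import\s+(?:[^'"]*?\s+from\s+)?|(?:require|import)\s*\(\s*)['"]([^'"]+)['"]/g;

  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier === undefined) continue;
    const isForbidden = forbiddenPrefixes.some(
      (prefix) => specifier === prefix || specifier.startsWith(prefix + "/"),
    );
    if (isForbidden) hits.push(specifier);
  }
  return hits;
}

// Usage: a contract test would run this over each file in the package
// and fail if any hits are returned.
const sample = [
  'import { getFacts } from "db/read";',
  'import { applyPatch } from "db/write";',
  'const patches = require("db/write/patches");',
].join("\n");

const violations = findForbiddenImports(sample, ["db/write"]);
```

A check like this enforces the read-only invariant at build time rather than at runtime, so a violation blocks the change before it can reach a live turn. A regex scan can misfire on imports inside comments or strings, which is why a production check would more likely rely on a parser or a dependency-graph tool.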

## Related ADRs

* [ADR-0005: Temporal fact and patch lifecycle](/decisions/adr-0005)
* [ADR-0006: DB query surface segregation](/decisions/adr-0006)
* [ADR-0010: Ingestion orchestration, idempotency, and recovery](/decisions/adr-0010)
* [ADR-0019: Voice runtime provider strategy](/decisions/adr-0019)
