The Reflections platform splits execution into two planes — realtime and background — because voice response latency and knowledge ingestion throughput have fundamentally different reliability and safety requirements.

Why two planes

Combining read and write paths in a single execution flow creates two problems:
  1. Latency contamination — background workload contention (LLM extraction, embedding generation, database writes) slows down user-facing voice responses.
  2. Safety risk — if the realtime path can write to truth tables, unvetted facts could enter the knowledge graph without evaluation.
Separating the planes gives each side clear failure boundaries: a crash in the ingestion pipeline does not affect a live conversation, and the realtime plane cannot accidentally mutate active truth.

Realtime plane

Components: apps/api + packages/brain-core

The realtime plane handles everything that happens during a live voice conversation:
  • Session bootstrap — the API creates a signed session URL for the managed voice provider (ElevenLabs Conversations API).
  • Server-tool callbacks — during a conversation, the voice provider calls back to the API to retrieve knowledge-graph evidence (POST /v1/tools/retrieve-context).
  • Webhook receivers — conversation lifecycle events (e.g., conversation ended) are received and dispatched to the background plane.
The realtime plane is read-only against truth tables (facts, entities, chunks, sources). It may read these tables for retrieval but must never write to them directly. This invariant is enforced by ESLint import rules, architecture guard scripts, and file-scan tests.
The realtime plane can write to operational tables — conversations, messages, and retrieval traces — but these are session data, not truth.
There is no in-process agent service. Voice orchestration is fully delegated to ElevenLabs. The API serves as a thin coordination layer between the voice provider and the knowledge graph.
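The read-only invariant above can be made concrete in the type system: if the realtime plane only ever receives a read interface, there is no write method to misuse. The sketch below is illustrative only, assuming hypothetical names (`ReadQueries`, `retrieveContext`) rather than the real module API:

```typescript
// Hypothetical sketch of the realtime plane's read-only boundary.
// All names here are assumptions, not the actual @reflection/db API.

interface Fact {
  id: string;
  statement: string;
  status: "candidate" | "active" | "rejected";
}

// The realtime plane is handed only a read interface:
// no insert/update methods exist on this type at all.
interface ReadQueries {
  searchActiveFacts(query: string): Promise<Fact[]>;
}

// Stand-in for the handler behind POST /v1/tools/retrieve-context.
async function retrieveContext(db: ReadQueries, query: string): Promise<string[]> {
  const facts = await db.searchActiveFacts(query);
  // Only active facts reach the voice session; candidates stay invisible.
  return facts.filter((f) => f.status === "active").map((f) => f.statement);
}
```

Because `ReadQueries` exposes no mutation methods, the "never write to truth tables" rule holds by construction; the ESLint and guard-script checks then only need to verify that the realtime plane imports the read interface and nothing else.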

Background plane

Components: apps/workers (Inngest pipeline)

The background plane handles everything that happens after a source is uploaded or a conversation ends:
  1. Ingestion — new sources (documents, transcripts) are chunked and stored.
  2. Extraction — an LLM extracts candidate facts, entities, and predicates from chunks.
  3. Evaluation — candidate facts are scored and evaluated against existing knowledge.
  4. Patch application — approved facts transition from candidate to active state, becoming visible to the realtime retrieval path.
Worker orchestration runs on Inngest, which provides durable step functions with built-in retry and idempotency.
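The four stages above can be sketched as named, independently retryable steps. This is a self-contained shim, not the real worker code: actual workers would use Inngest's `createFunction` and `step.run`, and the chunking/extraction/evaluation logic here is a trivial stand-in:

```typescript
// Illustrative shim of the four-stage pipeline. Real code uses Inngest's
// step.run, which memoizes completed steps for durability; this shim just
// preserves the named-step shape. All names are assumptions.

type Candidate = { statement: string; status: "candidate" };

async function runStep<T>(name: string, fn: () => Promise<T>): Promise<T> {
  // Inngest would checkpoint `name` so a retry skips finished steps.
  return fn();
}

async function ingestSource(text: string) {
  // 1. Ingestion: split the source into chunks (stand-in chunker).
  const chunks = await runStep("chunk", async () =>
    text.split(/\n\n+/).filter((t) => t.length > 0)
  );
  // 2. Extraction: stand-in for LLM fact extraction.
  const candidates = await runStep("extract", async () =>
    chunks.map((c): Candidate => ({ statement: c, status: "candidate" }))
  );
  // 3. Evaluation: stand-in scoring/filtering.
  const approved = await runStep("evaluate", async () =>
    candidates.filter((c) => c.statement.trim().length > 0)
  );
  // 4. Patch application: approved facts become active.
  return runStep("apply", async () =>
    approved.map((c) => ({ statement: c.statement, status: "active" as const }))
  );
}
```

The point of the step boundaries is that a crash inside "evaluate" retries only that step; the chunking and extraction results are not recomputed.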

The learning gate

The learning gate is the core safety mechanism connecting the two planes. It ensures that no extracted information becomes active truth without passing through evaluation.
  1. Candidate creation — the extraction step inserts new facts with candidate status, linked to a patch_batch.
  2. Evaluation — each candidate fact is evaluated: scored for confidence, checked against existing facts for contradictions, and assessed for quality.
  3. Approval — patch batches that pass evaluation are approved for application.
  4. Application — approved facts transition to active status with temporal validity fields (valid_from, valid_to) and become visible to the realtime retrieval path.
Facts that fail evaluation are marked as rejected and never enter active truth.
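The gate reduces to a small state machine: candidate facts either become active (gaining temporal validity fields) or rejected, and no other transition is legal. A minimal sketch, with assumed field names matching the valid_from/valid_to columns described above:

```typescript
// Sketch of the learning-gate state machine. Field and function names
// are assumptions for illustration, not the real schema API.

type FactStatus = "candidate" | "active" | "rejected";

interface FactRow {
  status: FactStatus;
  valid_from?: string;
  valid_to?: string | null;
}

// Only evaluated candidates pass through the gate; active and
// rejected are terminal states from the gate's point of view.
function applyGate(fact: FactRow, passedEvaluation: boolean, now: string): FactRow {
  if (fact.status !== "candidate") {
    throw new Error("only candidate facts pass through the learning gate");
  }
  return passedEvaluation
    ? { status: "active", valid_from: now, valid_to: null } // open-ended validity
    : { ...fact, status: "rejected" };
}
```

Encoding the transition as a single function makes the invariant auditable: any path that turns a fact active must go through this gate.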

How the planes communicate

The planes communicate through the database and event dispatch — never through direct function calls or shared in-process state.
| Direction | Mechanism | Example |
| --- | --- | --- |
| Realtime to background | Event dispatch (Inngest) | Conversation-ended webhook triggers transcript ingestion |
| Background to realtime | Database state | Newly active facts appear in retrieval queries |
| Background internal | Inngest step functions | Extraction step feeds evaluation step |
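The first row of the table — the webhook receiver dispatching an event rather than invoking worker code — can be sketched as follows. The event name and payload shape are assumptions; the real client would be Inngest's `send`, replaced here by an in-memory stand-in:

```typescript
// Hypothetical sketch: the realtime plane's webhook receiver emits an
// event and returns. It never imports or calls ingestion code directly.

type DispatchedEvent = { name: string; data: Record<string, unknown> };

// Stand-in for a durable event client (the real one delivers to Inngest).
function makeDispatcher(queue: DispatchedEvent[]) {
  return async (event: DispatchedEvent) => {
    queue.push(event);
  };
}

// Handler for the provider's conversation-ended lifecycle webhook.
async function onConversationEnded(
  send: (e: DispatchedEvent) => Promise<void>,
  conversationId: string
) {
  // The background plane subscribes to this event and runs ingestion;
  // the realtime handler's only job is to dispatch and acknowledge.
  await send({ name: "conversation/ended", data: { conversationId } });
}
```

Keeping the coupling at the event boundary is what makes the failure domains independent: if the workers are down, the event waits, and the webhook still returns promptly.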

Operational isolation

Each plane has independent:
  • Failure domains — a worker crash does not affect live conversations.
  • Scaling characteristics — the API scales for concurrent sessions, workers scale for ingestion throughput.
  • Query access — the realtime plane uses @reflection/db/queries/read, workers use @reflection/db/queries/admin.

Further reading

  • Boundary matrix — full list of allowed and forbidden operations per plane.
  • System invariants — the non-negotiable rules that govern both planes.
  • Knowledge graph — how temporal facts and the patch lifecycle work.
  • ADR-0003 — the original decision record for the two-plane architecture.