Skip to main content
Status: Accepted Date: 2026-02-06 Deciders: Reflections Maintainers

Context

Voice turns need grounded responses that combine chunk-level semantic retrieval and structured fact context, with traceability for debugging quality issues.

Decision

Use a staged retrieval pipeline:
  • Embed user utterance (embedOne) and perform vector search over chunks.
  • Resolve entity candidates heuristically from user text and prior turn context.
  • Fetch active graph facts via neighborhood hops from resolved entities.
  • Build a system append with explicit untrusted-evidence warning.
  • Persist retrieval traces with fail-soft behavior (warn on trace insert failure, do not fail turn).

Alternatives considered

Alternative 1: Pure vector RAG without entity/fact hops

Pros:
  • Simpler and fast.
Cons:
  • Weaker structured memory grounding.
  • Harder to answer relation-heavy questions consistently.
Pros:
  • Strong structured consistency.
Cons:
  • Misses evidence present only in source text chunks.
  • Lower recall for long-tail narrative information.

Alternative 3: LLM-only direct answer with no retrieval

Pros:
  • Lowest engineering complexity.
Cons:
  • Hallucination risk and poor factual grounding.
  • No evidence traceability.

Consequences

Benefits:
  • Balanced recall from both semantic chunks and graph facts.
  • Better explainability through trace payload persistence.
  • Safer prompt construction via explicit evidence warning banner.
Costs:
  • Multi-step retrieval introduces operational complexity.
  • Heuristic entity extraction can miss variants or noisy names.

Implementation notes

  • Pipeline orchestration lives in packages/brain-core/src/retrieve.ts.
  • Vector query behavior uses HNSW indexing for nearest-neighbor search.
  • Active fact lookup and hop expansion are implemented in the DB queries layer.
  • Trace persistence fallback behavior is intentionally fail-soft: retrieval continues even if trace writes fail.