Status: Accepted Date: 2026-02-06 Deciders: Reflections Maintainers
Context
Voice turns need grounded responses that combine chunk-level semantic retrieval and structured fact context, with traceability for debugging quality issues.Decision
Use a staged retrieval pipeline:- Embed user utterance (
embedOne) and perform vector search over chunks. - Resolve entity candidates heuristically from user text and prior turn context.
- Fetch active graph facts via neighborhood hops from resolved entities.
- Build a system append with explicit untrusted-evidence warning.
- Persist retrieval traces with fail-soft behavior (warn on trace insert failure, do not fail turn).
Alternatives considered
Alternative 1: Pure vector RAG without entity/fact hops
Pros:- Simpler and fast.
- Weaker structured memory grounding.
- Harder to answer relation-heavy questions consistently.
Alternative 2: Pure graph retrieval without semantic chunk search
Pros:- Strong structured consistency.
- Misses evidence present only in source text chunks.
- Lower recall for long-tail narrative information.
Alternative 3: LLM-only direct answer with no retrieval
Pros:- Lowest engineering complexity.
- Hallucination risk and poor factual grounding.
- No evidence traceability.
Consequences
Benefits:- Balanced recall from both semantic chunks and graph facts.
- Better explainability through trace payload persistence.
- Safer prompt construction via explicit evidence warning banner.
- Multi-step retrieval introduces operational complexity.
- Heuristic entity extraction can miss variants or noisy names.
Implementation notes
- Pipeline orchestration lives in
packages/brain-core/src/retrieve.ts. - Vector query behavior uses HNSW indexing for nearest-neighbor search.
- Active fact lookup and hop expansion are implemented in the DB queries layer.
- Trace persistence fallback behavior is intentionally fail-soft: retrieval continues even if trace writes fail.

