What the graph stores
The knowledge graph is built from four primary data types:

| Data type | What it represents | Example |
|---|---|---|
| Facts | Atomic statements about the user's world, with temporal validity and confidence scores. | "User works at Acme Corp" (valid from 2024-01, confidence 0.92) |
| Entities | Named things referenced by facts: people, places, organizations, concepts. | "Acme Corp" (type: organization) |
| Predicates | The relationship types that connect entities in facts. | "works_at", "lives_in", "enjoys" |
| Chunks | Segmented pieces of source documents, embedded as vectors for similarity search. | A 500-token passage from an uploaded PDF |
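
As a rough illustration of how facts, entities, and predicates relate, the sketch below models a fact as a subject/predicate/object triple. All type and field names here (`GraphEntity`, `GraphFact`, `subjectId`, `describeFact`, and so on) are hypothetical and chosen for the example; they are not the real schema, and chunks are left out for brevity.

```typescript
// Hypothetical shapes for illustration only; not the real schema.
type EntityType = "person" | "place" | "organization" | "concept";

interface GraphEntity {
  id: string;
  name: string;
  type: EntityType;
}

interface GraphFact {
  id: string;
  subjectId: string;  // entity the fact is about
  predicate: string;  // relationship type, e.g. "works_at"
  objectId: string;   // entity on the other end of the relationship
  validFrom: string;  // when the fact became true
  confidence: number; // 0..1
}

const user: GraphEntity = { id: "e1", name: "User", type: "person" };
const acme: GraphEntity = { id: "e2", name: "Acme Corp", type: "organization" };

const worksAt: GraphFact = {
  id: "f1",
  subjectId: user.id,
  predicate: "works_at",
  objectId: acme.id,
  validFrom: "2024-01",
  confidence: 0.92,
};

// Render a fact as a readable statement by resolving its entity references.
function describeFact(fact: GraphFact, entities: GraphEntity[]): string {
  const names = new Map(entities.map((e) => [e.id, e.name] as const));
  return `${names.get(fact.subjectId)} ${fact.predicate} ${names.get(fact.objectId)}`;
}
```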
Temporal fact model
Facts in the knowledge graph are not static rows that get overwritten. They use a temporal model that preserves the full history of what was known and when. Each fact carries three temporal fields:

- `valid_from`: when this fact became true (or was first observed).
- `valid_to`: when this fact stopped being true. `null` means it is still current.
- `supersedes_fact_id`: a reference to the previous fact this one replaces, preserving lineage.
Facts are append-only in spirit. Old facts are never deleted — they are superseded by newer facts
with updated temporal validity. This preserves a complete audit trail of how knowledge evolved
over time.
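
A minimal sketch of supersession under this model follows. It assumes that superseding a fact closes the old fact's `valid_to` at the new fact's `valid_from`; that rule, the `TemporalFact` shape beyond the three temporal fields, and the helper names `supersede` and `lineage` are assumptions for illustration, and "Globex" is invented example data.

```typescript
interface TemporalFact {
  id: string;
  statement: string;
  valid_from: string;                // ISO date the fact became true
  valid_to: string | null;           // null means still current
  supersedes_fact_id: string | null; // previous fact in the lineage, if any
}

// Append a replacement fact and close the validity window of the old one.
// Nothing is removed from the history.
function supersede(
  facts: readonly TemporalFact[],
  oldId: string,
  next: { id: string; statement: string; valid_from: string },
): TemporalFact[] {
  const updated = facts.map((f) =>
    f.id === oldId && f.valid_to === null ? { ...f, valid_to: next.valid_from } : f,
  );
  return [...updated, { ...next, valid_to: null, supersedes_fact_id: oldId }];
}

// Follow supersedes_fact_id links back to the first fact in the lineage.
function lineage(facts: readonly TemporalFact[], id: string): string[] {
  const byId = new Map(facts.map((f) => [f.id, f] as const));
  const chain: string[] = [];
  let current = byId.get(id);
  while (current) {
    chain.push(current.id);
    current = current.supersedes_fact_id ? byId.get(current.supersedes_fact_id) : undefined;
  }
  return chain;
}

const firstJob: TemporalFact = {
  id: "f1",
  statement: "User works at Acme Corp",
  valid_from: "2024-01-01",
  valid_to: null,
  supersedes_fact_id: null,
};

const factHistory = supersede([firstJob], "f1", {
  id: "f2",
  statement: "User works at Globex",
  valid_from: "2025-03-01",
});
```

After the call, `factHistory` holds both facts: the original with a closed validity window and the replacement pointing back at it, so the audit trail stays intact.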
Fact states
Every fact in the system is in one of three states:

| State | Meaning | Visible to retrieval? |
|---|---|---|
| `candidate` | Extracted from a source but not yet evaluated. Linked to a `patch_batch`. | No |
| `active` | Passed evaluation and was promoted through patch application. | Yes |
| `rejected` | Failed evaluation. Permanently excluded from active truth. | No |
Only `active` facts are returned by the retrieval pipeline during voice conversations. This is the core safety property of the knowledge graph.
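
The visibility rule can be expressed as a single filter. This is a sketch only: `StatefulFact` and `visibleToRetrieval` are hypothetical names, and the real retrieval pipeline is not described here.

```typescript
type FactState = "candidate" | "active" | "rejected";

interface StatefulFact {
  id: string;
  state: FactState;
}

// Retrieval sees a fact only once it has been promoted to "active".
function visibleToRetrieval<T extends StatefulFact>(facts: readonly T[]): T[] {
  return facts.filter((f) => f.state === "active");
}
```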
Patch lifecycle
The patch lifecycle is the process by which extracted information becomes active knowledge. It is the implementation of the eval gate invariant.

Extraction
When a source is ingested, the background pipeline uses an LLM to extract candidate facts, entities, and predicates from document chunks. Each extracted fact is inserted with `candidate` status and linked to a `patch_batch`.

Evaluation
Each candidate fact is evaluated for quality and consistency:
- Confidence scoring — how likely is this fact to be accurate based on the source material?
- Contradiction detection — does this fact conflict with existing active facts?
- Quality assessment — is the extracted statement well-formed and meaningful?
Approval
Patch batches that pass evaluation thresholds are approved for application. The batch groups related facts from a single extraction so they can be promoted atomically.
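
The steps above can be sketched as an all-or-nothing promotion. Several things here are assumptions made for the example: the `0.7` confidence threshold, the rule that a batch is approved only when every candidate in it passes, and the names `EvaluatedFact`, `passesEvaluation`, `isApproved`, and `applyPatch`. The document does not specify the real thresholds or what happens to a batch that fails.

```typescript
type FactState = "candidate" | "active" | "rejected";

interface EvaluatedFact {
  id: string;
  state: FactState;
  confidence: number;         // confidence scoring result, 0..1
  contradictsActive: boolean; // contradiction detection result
  wellFormed: boolean;        // quality assessment result
}

interface PatchBatch {
  id: string;
  facts: EvaluatedFact[];
}

// Hypothetical threshold; the real evaluation thresholds are not given here.
const MIN_CONFIDENCE = 0.7;

function passesEvaluation(fact: EvaluatedFact): boolean {
  return fact.confidence >= MIN_CONFIDENCE && !fact.contradictsActive && fact.wellFormed;
}

// Assumption: a batch is approved only when every candidate in it passes.
function isApproved(batch: PatchBatch): boolean {
  return batch.facts.length > 0 && batch.facts.every(passesEvaluation);
}

// Promote the whole batch in one step, or leave it untouched.
function applyPatch(batch: PatchBatch): PatchBatch {
  if (!isApproved(batch)) return batch;
  return { ...batch, facts: batch.facts.map((f) => ({ ...f, state: "active" as const })) };
}
```

Returning a new batch rather than mutating facts one by one mirrors the atomic promotion described above: a reader never observes a half-applied batch.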
Aspect taxonomy
The aspect taxonomy provides user-facing domain structure over the knowledge graph. Without it, the graph is a flat collection of facts with no navigational hierarchy. With it, users can see how their knowledge distributes across life domains and where depth or gaps exist.

The 14 aspects
The taxonomy defines 14 fixed aspects across 5 categories:

| Category | Aspects |
|---|---|
| Cultural | Film & TV, Music, Books, Creativity |
| Identity | Career, Education, Identity, Childhood |
| Relationships | Family, Relationships |
| Worldview | Philosophy, Values, Spirituality |
| Lifestyle | Health |
The taxonomy is defined in `@reflection/schemas` as `ASPECT_TAXONOMY`: a readonly array of aspect objects with `slug`, `displayName`, `category`, and `icon` fields. Aspect slugs are a Zod enum used as the cross-layer contract.
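
To make the shape concrete, here is a sketch of what such an array might look like. The slug strings and icon names are placeholders invented for this example, not the values shipped in `@reflection/schemas`, and a plain TypeScript union stands in for the Zod enum so the snippet has no dependencies.

```typescript
// Placeholder slugs and icons; only display names and categories come from the table above.
const ASPECT_TAXONOMY_SKETCH = [
  { slug: "film-tv", displayName: "Film & TV", category: "Cultural", icon: "film" },
  { slug: "music", displayName: "Music", category: "Cultural", icon: "music" },
  { slug: "books", displayName: "Books", category: "Cultural", icon: "book" },
  { slug: "creativity", displayName: "Creativity", category: "Cultural", icon: "palette" },
  { slug: "career", displayName: "Career", category: "Identity", icon: "briefcase" },
  { slug: "education", displayName: "Education", category: "Identity", icon: "school" },
  { slug: "identity", displayName: "Identity", category: "Identity", icon: "person" },
  { slug: "childhood", displayName: "Childhood", category: "Identity", icon: "toy" },
  { slug: "family", displayName: "Family", category: "Relationships", icon: "home" },
  { slug: "relationships", displayName: "Relationships", category: "Relationships", icon: "heart" },
  { slug: "philosophy", displayName: "Philosophy", category: "Worldview", icon: "thought" },
  { slug: "values", displayName: "Values", category: "Worldview", icon: "compass" },
  { slug: "spirituality", displayName: "Spirituality", category: "Worldview", icon: "star" },
  { slug: "health", displayName: "Health", category: "Lifestyle", icon: "pulse" },
] as const;

// Union of all slug literals, derived from the array so the two cannot drift apart.
type AspectSlug = (typeof ASPECT_TAXONOMY_SKETCH)[number]["slug"];

// Runtime guard for untrusted input, e.g. tags returned by an LLM.
function isAspectSlug(value: string): value is AspectSlug {
  return ASPECT_TAXONOMY_SKETCH.some((aspect) => aspect.slug === value);
}
```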
Hybrid classification
Facts are classified into aspects through two complementary mechanisms that merge their results:

Deterministic regex classifier
The `classifyFactToAspects` function in `@reflection/shared/aspect-classification` matches fact predicates, subject types, object types, and memory types against keyword patterns for each of the 14 aspects. It is intentionally fuzzy: common terms like "think" match philosophy. This classifier is fast, has zero cost, and is always available as a baseline.

LLM-provided aspect tags
The two result sets are combined with `mergeAspectSlugs()`: the union of both results, deduplicated. If the LLM does not provide tags (e.g., legacy data or fallback paths), the regex classifier still provides baseline aspect associations.
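
The merge behaviour can be sketched in a few lines. `classifyByRegex` and `mergeSlugs` below are simplified stand-ins written for this example, not the real `classifyFactToAspects` or `mergeAspectSlugs()`; the three keyword patterns are invented, and the sketch matches only against a predicate string rather than subject, object, and memory types.

```typescript
type AspectSlug = "philosophy" | "career" | "music";

// Invented keyword patterns for three aspects; deliberately loose.
const KEYWORD_PATTERNS: Record<AspectSlug, RegExp> = {
  philosophy: /\b(think|believe|meaning)/i,
  career: /\b(work|job|employ)/i,
  music: /\b(listen|song|band)/i,
};

// Deterministic baseline: every aspect whose pattern matches the text.
function classifyByRegex(text: string): AspectSlug[] {
  const slugs = Object.keys(KEYWORD_PATTERNS) as AspectSlug[];
  return slugs.filter((slug) => KEYWORD_PATTERNS[slug].test(text));
}

// Union of both sources, deduplicated. LLM tags are optional so the
// regex result alone still yields a classification.
function mergeSlugs(regexSlugs: AspectSlug[], llmSlugs: AspectSlug[] = []): AspectSlug[] {
  return Array.from(new Set([...regexSlugs, ...llmSlugs]));
}
```

For example, `mergeSlugs(classifyByRegex("works_at"), ["career", "music"])` yields `["career", "music"]`, while omitting the second argument falls back to the regex result alone.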
Staged tagging
Aspect classifications follow the same temporal lifecycle as facts:

- During extraction, classifications are computed and stored on the fact row in an `aspect_slugs` staging column.
- At patch application time, `fact_aspects` join rows are created atomically from the staging column.
- Aspect associations only become visible when facts transition to `active` state.
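
A sketch of the staging step, assuming in-memory rows in place of database tables: the row shapes and the `applyWithAspects` helper are hypothetical, and a real implementation would presumably do this inside a database transaction.

```typescript
interface StagedFact {
  id: string;
  state: "candidate" | "active" | "rejected";
  aspect_slugs: string[]; // staging column filled during extraction
}

interface FactAspectRow {
  fact_id: string;
  aspect_slug: string;
}

// At patch application time: promote the facts and create their join rows
// in the same step, so aspect links never exist for non-active facts.
function applyWithAspects(
  candidates: readonly StagedFact[],
): { facts: StagedFact[]; factAspects: FactAspectRow[] } {
  const facts: StagedFact[] = [];
  const factAspects: FactAspectRow[] = [];
  for (const fact of candidates) {
    facts.push({ ...fact, state: "active" });
    for (const slug of fact.aspect_slugs) {
      factAspects.push({ fact_id: fact.id, aspect_slug: slug });
    }
  }
  return { facts, factAspects };
}
```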
Scoring
Each aspect receives a score from 0 to 100 that reflects the depth of knowledge in that domain. The score is a weighted sum of four normalized dimensions:

| Dimension | Weight | Saturation point | What it measures |
|---|---|---|---|
| Fact volume | 40% | 25 facts | How many facts exist in this aspect |
| Entity breadth | 25% | 12 entities | How many distinct entities are referenced |
| Predicate diversity | 20% | 8 predicates | How varied the relationship types are |
| Average confidence | 15% | 1.0 | How confident the system is in these facts |
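
The weights and saturation points in the table can be combined as in the sketch below. Two details are assumptions, since the text does not state them: each dimension is normalized linearly and capped at its saturation point, and the final score is rounded to an integer. The names `AspectStats` and `aspectScore` are hypothetical.

```typescript
interface AspectStats {
  factCount: number;
  entityCount: number;
  predicateCount: number;
  averageConfidence: number; // 0..1
}

// Weights and saturation points taken from the table above.
const DIMENSIONS = [
  { weight: 0.4, saturation: 25, value: (s: AspectStats) => s.factCount },
  { weight: 0.25, saturation: 12, value: (s: AspectStats) => s.entityCount },
  { weight: 0.2, saturation: 8, value: (s: AspectStats) => s.predicateCount },
  { weight: 0.15, saturation: 1.0, value: (s: AspectStats) => s.averageConfidence },
];

function aspectScore(stats: AspectStats): number {
  let total = 0;
  for (const dimension of DIMENSIONS) {
    // Assumption: linear growth up to the saturation point, then capped at 1.
    const normalized = Math.min(dimension.value(stats) / dimension.saturation, 1);
    total += dimension.weight * normalized;
  }
  // Assumption: scores are reported as whole numbers.
  return Math.round(total * 100);
}
```

Under these assumptions, an aspect with 10 facts, 6 entities, 4 predicates, and an average confidence of 0.9 scores 0.40 × 0.4 + 0.25 × 0.5 + 0.20 × 0.5 + 0.15 × 0.9 = 0.52, which is 52 out of 100. Adding facts beyond 25 no longer raises the score, so further gains have to come from breadth, diversity, or confidence.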
The web dashboard uses pre-computed scores stored on the `reflection_aspects` table for instant reads. The mobile API computes scores at query time, scoped to user-selected namespaces. Score values may differ between web and mobile when namespace filtering changes the fact population.

Further reading
- ADR-0005 — the decision record for the temporal fact and patch lifecycle.
- ADR-0025 — the decision record for the aspect taxonomy and scoring architecture.
- Two-plane architecture — how the realtime and background planes interact with the knowledge graph.
- System invariants — the eval gate and append-only fact rules.

