The knowledge graph is the long-term memory of a Reflections agent. It stores everything the system learns from user-uploaded sources and conversations — facts, entities, predicates, and document chunks — organized with temporal validity and domain structure.

What the graph stores

The knowledge graph is built from four primary data types:
| Data type | What it represents | Example |
| --- | --- | --- |
| Facts | Atomic statements about the user’s world, with temporal validity and confidence scores. | “User works at Acme Corp” (valid from 2024-01, confidence 0.92) |
| Entities | Named things referenced by facts — people, places, organizations, concepts. | “Acme Corp” (type: organization) |
| Predicates | The relationship types that connect entities in facts. | “works_at”, “lives_in”, “enjoys” |
| Chunks | Segmented pieces of source documents, embedded as vectors for similarity search. | A 500-token passage from an uploaded PDF |
During a voice conversation, the retrieval pipeline queries these four data types to assemble relevant context for the LLM.
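The four data types can be sketched as TypeScript shapes. These are illustrative only; the actual schema and field names may differ:

```typescript
// Illustrative shapes for the four primary data types; not the actual schema.
interface Fact {
  subject: string;        // entity reference, e.g. "user"
  predicate: string;      // relationship type, e.g. "works_at"
  object: string;         // entity reference, e.g. "Acme Corp"
  confidence: number;     // 0..1
  validFrom: string;      // when the fact became true
  validTo: string | null; // null while the fact is still current
}

interface Entity {
  name: string; // e.g. "Acme Corp"
  type: string; // e.g. "organization"
}

interface Predicate {
  name: string; // e.g. "works_at"
}

interface Chunk {
  sourceId: string;    // the uploaded document this passage came from
  text: string;        // a ~500-token segment
  embedding: number[]; // vector used for similarity search
}

const example: Fact = {
  subject: "user",
  predicate: "works_at",
  object: "Acme Corp",
  confidence: 0.92,
  validFrom: "2024-01-15",
  validTo: null,
};
```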

Temporal fact model

Facts in the knowledge graph are not static rows that get overwritten. They use a temporal model that preserves the full history of what was known and when. Each fact carries three temporal fields:
  • valid_from — when this fact became true (or was first observed).
  • valid_to — when this fact stopped being true. null means it is still current.
  • supersedes_fact_id — a reference to the previous fact this one replaces, preserving lineage.
-- A fact that was later superseded
INSERT INTO facts (subject, predicate, object, valid_from, valid_to, status)
VALUES ('user', 'works_at', 'Acme Corp', '2024-01-15', '2025-06-01', 'active');

-- The superseding fact
INSERT INTO facts (subject, predicate, object, valid_from, supersedes_fact_id, status)
VALUES ('user', 'works_at', 'Globex Inc', '2025-06-01', <previous_fact_id>, 'active');
Facts are append-only in spirit. Old facts are never deleted — they are superseded by newer facts with updated temporal validity. This preserves a complete audit trail of how knowledge evolved over time.
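The supersession rule can be sketched as a pure function over an in-memory fact list. This illustrates the behavior described above, not the actual implementation; all names are hypothetical:

```typescript
interface TemporalFact {
  id: string;
  subject: string;
  predicate: string;
  object: string;
  validFrom: string;
  validTo: string | null; // null = still current
  supersedesFactId: string | null;
}

// Supersede: never delete. Close the old fact's validity window and link
// the replacement back to it, preserving the full lineage.
function supersede(
  facts: TemporalFact[],
  oldId: string,
  replacement: Omit<TemporalFact, "id" | "validTo" | "supersedesFactId">,
  newId: string
): TemporalFact[] {
  return [
    ...facts.map((f) =>
      f.id === oldId ? { ...f, validTo: replacement.validFrom } : f
    ),
    { ...replacement, id: newId, validTo: null, supersedesFactId: oldId },
  ];
}

const v1: TemporalFact = {
  id: "f1",
  subject: "user",
  predicate: "works_at",
  object: "Acme Corp",
  validFrom: "2024-01-15",
  validTo: null,
  supersedesFactId: null,
};

const timeline = supersede(
  [v1],
  "f1",
  { subject: "user", predicate: "works_at", object: "Globex Inc", validFrom: "2025-06-01" },
  "f2"
);
// Both facts survive: the Acme fact is closed at 2025-06-01, the Globex fact is current.
```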

Fact states

Every fact in the system is in one of three states:
| State | Meaning | Visible to retrieval? |
| --- | --- | --- |
| candidate | Extracted from a source but not yet evaluated. Linked to a patch_batch. | No |
| active | Passed evaluation and was promoted through patch application. | Yes |
| rejected | Failed evaluation. Permanently excluded from active truth. | No |
Only active facts are returned by the retrieval pipeline during voice conversations. This is the core safety property of the knowledge graph.
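That safety property amounts to a single predicate every retrieval path must apply. A minimal sketch (the real pipeline enforces this in SQL):

```typescript
type FactStatus = "candidate" | "active" | "rejected";

interface StatefulFact {
  id: string;
  status: FactStatus;
}

// The core safety property: retrieval only ever sees active facts.
function visibleToRetrieval(fact: StatefulFact): boolean {
  return fact.status === "active";
}

const facts: StatefulFact[] = [
  { id: "f1", status: "candidate" },
  { id: "f2", status: "active" },
  { id: "f3", status: "rejected" },
];

const retrievable = facts.filter(visibleToRetrieval);
// retrievable contains only f2
```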

Patch lifecycle

The patch lifecycle is the process by which extracted information becomes active knowledge. It is the implementation of the eval gate invariant.
1. Extraction

When a source is ingested, the background pipeline uses an LLM to extract candidate facts, entities, and predicates from document chunks. Each extracted fact is inserted with candidate status and linked to a patch_batch.
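A minimal sketch of this step, assuming hypothetical field names for the batch linkage:

```typescript
interface ExtractedTriple {
  subject: string;
  predicate: string;
  object: string;
}

interface CandidateFact extends ExtractedTriple {
  status: "candidate";
  patchBatchId: string;
}

// Every extracted fact enters the graph as a candidate tied to its batch;
// nothing becomes active until the batch passes evaluation.
function toCandidates(
  extracted: ExtractedTriple[],
  patchBatchId: string
): CandidateFact[] {
  return extracted.map((t) => ({ ...t, status: "candidate", patchBatchId }));
}

const batch = toCandidates(
  [{ subject: "user", predicate: "enjoys", object: "jazz" }],
  "batch-1"
);
```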
2. Evaluation

Each candidate fact is evaluated for quality and consistency:
  • Confidence scoring — how likely is this fact to be accurate based on the source material?
  • Contradiction detection — does this fact conflict with existing active facts?
  • Quality assessment — is the extracted statement well-formed and meaningful?
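The three checks can be combined into a single gate. This is a hypothetical sketch; the threshold below is illustrative, not the production value:

```typescript
interface EvalResult {
  confidence: number;         // confidence scoring, 0..1
  contradictsActive: boolean; // contradiction detection
  wellFormed: boolean;        // quality assessment
}

// Illustrative gate: a candidate passes only if it is well-formed,
// consistent with active facts, and confident enough.
function passesEvaluation(e: EvalResult, minConfidence = 0.7): boolean {
  return e.wellFormed && !e.contradictsActive && e.confidence >= minConfidence;
}
```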
3. Approval

Patch batches that pass evaluation thresholds are approved for application. The batch groups related facts from a single extraction so they can be promoted atomically.
4. Application

Approved facts transition from candidate to active status. Temporal validity fields are set. If a new fact supersedes an existing one, the old fact’s valid_to is updated and the new fact’s supersedes_fact_id links them together.
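The application step can be sketched as a pure function (the names and batch shape are assumptions; the real system performs this transactionally in the database):

```typescript
type Status = "candidate" | "active" | "rejected";

interface PatchFact {
  id: string;
  status: Status;
  validFrom: string | null;
  validTo: string | null;
  supersedesFactId: string | null;
}

// Apply an approved batch: promote candidates to active, stamp validity,
// and close out any superseded predecessor in the same logical step.
function applyBatch(
  facts: PatchFact[],
  approvedIds: Set<string>,
  appliedAt: string
): PatchFact[] {
  // Collect predecessors closed by the newly approved facts.
  const closedAt = new Map<string, string>();
  for (const f of facts) {
    if (approvedIds.has(f.id) && f.supersedesFactId) {
      closedAt.set(f.supersedesFactId, f.validFrom ?? appliedAt);
    }
  }
  return facts.map((f) => {
    if (approvedIds.has(f.id) && f.status === "candidate") {
      return { ...f, status: "active" as Status, validFrom: f.validFrom ?? appliedAt };
    }
    if (closedAt.has(f.id)) {
      return { ...f, validTo: closedAt.get(f.id)! };
    }
    return f;
  });
}

const applied = applyBatch(
  [
    { id: "f1", status: "active", validFrom: "2024-01-15", validTo: null, supersedesFactId: null },
    { id: "f2", status: "candidate", validFrom: "2025-06-01", validTo: null, supersedesFactId: "f1" },
  ],
  new Set(["f2"]),
  "2025-06-01"
);
```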
The patch lifecycle is the only path from extracted content to active truth. There is no shortcut that bypasses evaluation — this is enforced by the boundary matrix and tested by invariant regression tests.

Aspect taxonomy

The aspect taxonomy provides user-facing domain structure over the knowledge graph. Without it, the graph is a flat collection of facts with no navigational hierarchy. With it, users can see how their knowledge distributes across life domains and where depth or gaps exist.

The 14 aspects

The taxonomy defines 14 fixed aspects across 5 categories:
| Category | Aspects |
| --- | --- |
| Cultural | Film & TV, Music, Books, Creativity |
| Identity | Career, Education, Identity, Childhood |
| Relationships | Family, Relationships |
| Worldview | Philosophy, Values, Spirituality |
| Lifestyle | Health |
The taxonomy is defined in @reflection/schemas as ASPECT_TAXONOMY — a readonly array of aspect objects with slug, displayName, category, and icon fields. Aspect slugs are a Zod enum used as the cross-layer contract.
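The shape of that definition can be sketched as follows. The slugs and icons shown here are illustrative, and only three of the 14 aspects are listed; the authoritative array lives in @reflection/schemas:

```typescript
// An illustrative three-aspect slice; the real array defines all 14.
const ASPECT_TAXONOMY = [
  { slug: "film-tv", displayName: "Film & TV", category: "Cultural", icon: "clapperboard" },
  { slug: "career", displayName: "Career", category: "Identity", icon: "briefcase" },
  { slug: "family", displayName: "Family", category: "Relationships", icon: "users" },
] as const;

// The slug union serves as the cross-layer contract (a Zod enum in the real code).
type AspectSlug = (typeof ASPECT_TAXONOMY)[number]["slug"];

function isAspectSlug(s: string): s is AspectSlug {
  return ASPECT_TAXONOMY.some((a) => a.slug === s);
}
```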

Hybrid classification

Facts are classified into aspects through two complementary mechanisms that merge their results:
The classifyFactToAspects function in @reflection/shared/aspect-classification matches fact predicates, subject types, object types, and memory types against keyword patterns for each of the 14 aspects. It is intentionally fuzzy — common terms like “think” match philosophy. This classifier is fast, costs nothing, and is always available as a baseline.
During extraction, the LLM prompt instructs the model to tag each fact with aspect slugs. These tags are validated against the known slug set before merging. LLM tags augment and correct the imprecision of regex patterns, providing higher-quality classifications when available.
The two classifiers merge via mergeAspectSlugs() — the union of both results, deduplicated. If the LLM does not provide tags (e.g., legacy data or fallback paths), the regex classifier still provides baseline aspect associations.
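A minimal sketch of that merge behavior (the real signature in @reflection/shared/aspect-classification may differ):

```typescript
// Union of regex-classifier slugs and validated LLM tags, deduplicated.
function mergeAspectSlugs(
  regexSlugs: string[],
  llmSlugs: string[] | undefined,
  knownSlugs: Set<string>
): string[] {
  // LLM tags are validated against the known slug set; unknown tags are dropped.
  const validated = (llmSlugs ?? []).filter((s) => knownSlugs.has(s));
  return [...new Set([...regexSlugs, ...validated])];
}

const known = new Set(["philosophy", "values"]);
const merged = mergeAspectSlugs(["philosophy"], ["values", "bogus"], known);
// merged: ["philosophy", "values"]
```

If the LLM supplies no tags, the second argument is undefined and the regex slugs pass through unchanged.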

Staged tagging

Aspect classifications follow the same temporal lifecycle as facts:
  1. During extraction, classifications are computed and stored on the fact row in an aspect_slugs staging column.
  2. At patch application time, fact_aspects join rows are created atomically from the staging column.
  3. Aspect associations only become visible when facts transition to active state.
This prevents aspects from referencing facts that are later rejected during evaluation.
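The staged-to-join projection can be sketched as a filter that only ever reads active facts (field and function names here are hypothetical):

```typescript
interface StagedFact {
  id: string;
  status: "candidate" | "active" | "rejected";
  aspectSlugs: string[]; // staging column, populated at extraction time
}

interface FactAspect {
  factId: string;
  aspectSlug: string;
}

// fact_aspects join rows are materialized only for active facts, so a fact
// rejected during evaluation never leaks aspect associations.
function buildFactAspects(facts: StagedFact[]): FactAspect[] {
  return facts
    .filter((f) => f.status === "active")
    .flatMap((f) =>
      f.aspectSlugs.map((aspectSlug) => ({ factId: f.id, aspectSlug }))
    );
}

const rows = buildFactAspects([
  { id: "f1", status: "active", aspectSlugs: ["career", "values"] },
  { id: "f2", status: "rejected", aspectSlugs: ["health"] },
]);
// rows contains only the two associations for the active fact f1
```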

Scoring

Each aspect receives a score from 0 to 100 that reflects the depth of knowledge in that domain. The score is a weighted sum of four normalized dimensions:
| Dimension | Weight | Saturation point | What it measures |
| --- | --- | --- | --- |
| Fact volume | 40% | 25 facts | How many facts exist in this aspect |
| Entity breadth | 25% | 12 entities | How many distinct entities are referenced |
| Predicate diversity | 20% | 8 predicates | How varied the relationship types are |
| Average confidence | 15% | 1.0 | How confident the system is in these facts |
Fact volume is weighted highest so that early knowledge enrichment feels rewarding to users — even a few facts in a new domain produce a visible score.
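The scoring formula follows directly from the table: each dimension is normalized against its saturation point, capped at 1, and combined by weight. A sketch (the function name is illustrative):

```typescript
// Weighted, saturating aspect score. Weights and saturation points are
// taken from the scoring table; values past a saturation point add nothing.
function aspectScore(
  factCount: number,
  entityCount: number,
  predicateCount: number,
  avgConfidence: number // 0..1
): number {
  const sat = (value: number, point: number) => Math.min(value / point, 1);
  return (
    100 *
    (0.40 * sat(factCount, 25) +
     0.25 * sat(entityCount, 12) +
     0.20 * sat(predicateCount, 8) +
     0.15 * sat(avgConfidence, 1.0))
  );
}

// A few facts in a new domain already produce a visible score, because
// fact volume carries the largest weight.
```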
The web dashboard uses pre-computed scores stored on the reflection_aspects table for instant reads. The mobile API computes scores at query time, scoped to user-selected namespaces. Score values may differ between web and mobile when namespace filtering changes the fact population.

Further reading

  • ADR-0005 — the decision record for the temporal fact and patch lifecycle.
  • ADR-0025 — the decision record for the aspect taxonomy and scoring architecture.
  • Two-plane architecture — how the realtime and background planes interact with the knowledge graph.
  • System invariants — the eval gate and append-only fact rules.