Knowledge graph - Reflections

The knowledge graph is the long-term memory of a Reflections agent. It stores everything the system learns from user-uploaded sources and conversations — facts, entities, predicates, and document chunks — organized with temporal validity and domain structure.

What the graph stores

The knowledge graph is built from four primary data types:

Data type	What it represents	Example
Facts	Atomic statements about the user’s world, with temporal validity and confidence scores.	“User works at Acme Corp” (valid from 2024-01, confidence 0.92)
Entities	Named things referenced by facts — people, places, organizations, concepts.	“Acme Corp” (type: organization)
Predicates	The relationship types that connect entities in facts.	“works_at”, “lives_in”, “enjoys”
Chunks	Segmented pieces of source documents, embedded as vectors for similarity search.	A 500-token passage from an uploaded PDF

During a voice conversation, the retrieval pipeline queries these four data types to assemble relevant context for the LLM.
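One way to picture how the four data types relate is as a set of record shapes. The sketch below is illustrative only: the interface and field names (`EntityRecord`, `FactRecord`, `ChunkRecord`, `embedding`, and so on) are assumptions made for this example, not the actual schema.

```typescript
// Illustrative shapes only; every name here is an assumption, not the real schema.
type PredicateName = string; // e.g. "works_at", "lives_in", "enjoys"

interface EntityRecord {
  name: string; // e.g. "Acme Corp"
  type: string; // e.g. "organization"
}

interface FactRecord {
  subject: string;
  predicate: PredicateName;
  object: EntityRecord;
  validFrom: string; // e.g. "2024-01"
  confidence: number; // 0..1
}

interface ChunkRecord {
  text: string; // passage from a source document
  embedding: number[]; // vector used for similarity search
}

// The example fact from the table above, expressed with these shapes.
const exampleFact: FactRecord = {
  subject: "user",
  predicate: "works_at",
  object: { name: "Acme Corp", type: "organization" },
  validFrom: "2024-01",
  confidence: 0.92,
};
```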

Temporal fact model

Facts in the knowledge graph are not static rows that get overwritten. They use a temporal model that preserves the full history of what was known and when. Each fact carries three temporal fields:

valid_from — when this fact became true (or was first observed).
valid_to — when this fact stopped being true. null means it is still current.
supersedes_fact_id — a reference to the previous fact this one replaces, preserving lineage.

-- A fact that was later superseded
INSERT INTO facts (subject, predicate, object, valid_from, valid_to, status)
VALUES ('user', 'works_at', 'Acme Corp', '2024-01-15', '2025-06-01', 'active');

-- The superseding fact
INSERT INTO facts (subject, predicate, object, valid_from, supersedes_fact_id, status)
VALUES ('user', 'works_at', 'Globex Inc', '2025-06-01', <previous_fact_id>, 'active');

Facts are append-only in spirit. Old facts are never deleted — they are superseded by newer facts with updated temporal validity. This preserves a complete audit trail of how knowledge evolved over time.
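The supersession behavior described above can be sketched in memory as follows. This is a minimal illustration, not the real storage layer: the `TemporalFact` shape, the `supersede` and `factsAsOf` helpers, and the half-open interval rule (`validFrom <= date < validTo`) are all assumptions chosen for the example.

```typescript
// Hypothetical in-memory model of the three temporal fields.
interface TemporalFact {
  id: number;
  subject: string;
  predicate: string;
  object: string;
  validFrom: string; // ISO date, e.g. "2024-01-15"
  validTo: string | null; // null means still current
  supersedesFactId: number | null;
}

type NewFact = Omit<TemporalFact, "id" | "validTo" | "supersedesFactId">;

// Close the previous fact and append its successor; no row is removed.
function supersede(facts: TemporalFact[], previousId: number, next: NewFact): TemporalFact[] {
  const nextId = Math.max(0, ...facts.map((f) => f.id)) + 1;
  const closed = facts.map((f) =>
    f.id === previousId ? { ...f, validTo: next.validFrom } : f,
  );
  return [...closed, { ...next, id: nextId, validTo: null, supersedesFactId: previousId }];
}

// Facts valid on a date (assumed rule: validFrom <= date < validTo).
// ISO date strings compare correctly as plain strings.
function factsAsOf(facts: TemporalFact[], date: string): TemporalFact[] {
  return facts.filter(
    (f) => f.validFrom <= date && (f.validTo === null || date < f.validTo),
  );
}

const history = supersede(
  [
    {
      id: 1,
      subject: "user",
      predicate: "works_at",
      object: "Acme Corp",
      validFrom: "2024-01-15",
      validTo: null,
      supersedesFactId: null,
    },
  ],
  1,
  { subject: "user", predicate: "works_at", object: "Globex Inc", validFrom: "2025-06-01" },
);
```

Because the old row is kept, `factsAsOf(history, "2024-12-31")` still answers with Acme Corp, while a query for a date on or after 2025-06-01 answers with Globex Inc.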

Fact states

Every fact in the system is in one of three states:

State	Meaning	Visible to retrieval?
`candidate`	Extracted from a source but not yet evaluated. Linked to a `patch_batch`.	No
`active`	Passed evaluation and was promoted through patch application.	Yes
`rejected`	Failed evaluation. Permanently excluded from active truth.	No

Only active facts are returned by the retrieval pipeline during voice conversations. This is the core safety property of the knowledge graph.
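A minimal sketch of that visibility rule, assuming a hypothetical `StatefulFact` shape and a `retrievable` helper (neither name comes from the codebase):

```typescript
type FactState = "candidate" | "active" | "rejected";

// Mirrors the table above. Record<FactState, boolean> forces every state to
// declare its visibility, so adding a new state without deciding is a type error.
const VISIBLE_TO_RETRIEVAL: Record<FactState, boolean> = {
  candidate: false,
  active: true,
  rejected: false,
};

// Hypothetical fact shape for this example.
interface StatefulFact {
  statement: string;
  state: FactState;
}

// Keep only the facts that retrieval is allowed to see.
function retrievable(facts: readonly StatefulFact[]): StatefulFact[] {
  return facts.filter((f) => VISIBLE_TO_RETRIEVAL[f.state]);
}
```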

Patch lifecycle

The patch lifecycle is the process by which extracted information becomes active knowledge. It is the implementation of the eval gate invariant.

Extraction

When a source is ingested, the background pipeline uses an LLM to extract candidate facts, entities, and predicates from document chunks. Each extracted fact is inserted with candidate status and linked to a patch_batch.

Evaluation

Each candidate fact is evaluated for quality and consistency:

Confidence scoring — how likely is this fact to be accurate based on the source material?
Contradiction detection — does this fact conflict with existing active facts?
Quality assessment — is the extracted statement well-formed and meaningful?

Approval

Patch batches that pass evaluation thresholds are approved for application. The batch groups related facts from a single extraction so they can be promoted atomically.

Application

Approved facts transition from candidate to active status. Temporal validity fields are set. If a new fact supersedes an existing one, the old fact’s valid_to is updated and the new fact’s supersedes_fact_id links them together.

The patch lifecycle is the only path from extracted content to active truth. There is no shortcut that bypasses evaluation — this is enforced by the boundary matrix and tested by invariant regression tests.
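The four stages can be sketched as a single gate function. Everything here is an assumption made for illustration: the `CandidateFact` and `PatchBatch` shapes, the `MIN_CONFIDENCE` threshold, and the all-or-nothing policy in which one failing fact rejects the whole batch. The source does not specify the thresholds or how contradictions are resolved, so the contradiction check is reduced to a boolean flag.

```typescript
type Status = "candidate" | "active" | "rejected";

// Hypothetical candidate fact carrying the three evaluation results.
interface CandidateFact {
  id: string;
  confidence: number; // 0..1, from confidence scoring
  hasUnresolvedContradiction: boolean; // from contradiction detection
  wellFormed: boolean; // from quality assessment
  status: Status;
}

interface PatchBatch {
  id: string;
  facts: CandidateFact[];
}

const MIN_CONFIDENCE = 0.7; // assumed threshold, not taken from the source

function passesEvaluation(f: CandidateFact): boolean {
  return f.confidence >= MIN_CONFIDENCE && !f.hasUnresolvedContradiction && f.wellFormed;
}

// Assumed policy: the batch is promoted atomically only if every fact passes;
// otherwise every fact in it is rejected. Returns a new batch, never mutates.
function evaluateAndApply(batch: PatchBatch): PatchBatch {
  const approved = batch.facts.every(passesEvaluation);
  const status: Status = approved ? "active" : "rejected";
  return { ...batch, facts: batch.facts.map((f) => ({ ...f, status })) };
}
```

In this sketch there is no code path that sets `active` without going through `passesEvaluation`, which is the property the eval gate is meant to guarantee.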

Aspect taxonomy

The aspect taxonomy provides user-facing domain structure over the knowledge graph. Without it, the graph is a flat collection of facts with no navigational hierarchy. With it, users can see how their knowledge distributes across life domains and where depth or gaps exist.

The 14 aspects

The taxonomy defines 14 fixed aspects across 5 categories:

Category	Aspects
Cultural	Film & TV, Music, Books, Creativity
Identity	Career, Education, Identity, Childhood
Relationships	Family, Relationships
Worldview	Philosophy, Values, Spirituality
Lifestyle	Health

The taxonomy is defined in @reflection/schemas as ASPECT_TAXONOMY — a readonly array of aspect objects with slug, displayName, category, and icon fields. Aspect slugs are a Zod enum used as the cross-layer contract.
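The shape described above might look roughly like the following. This is an abbreviated, dependency-free sketch: the slug and icon values are invented for illustration, only three of the 14 aspects are shown, and a plain type guard stands in for the Zod enum that the real package uses.

```typescript
// Abbreviated and hypothetical; slug and icon values are assumptions.
const ASPECT_TAXONOMY_SKETCH = [
  { slug: "career", displayName: "Career", category: "Identity", icon: "briefcase" },
  { slug: "music", displayName: "Music", category: "Cultural", icon: "music" },
  { slug: "health", displayName: "Health", category: "Lifestyle", icon: "heart" },
] as const;

// Union of slug literals derived from the array, so the array stays the single
// source of truth for every layer that imports it.
type AspectSlugSketch = (typeof ASPECT_TAXONOMY_SKETCH)[number]["slug"];

// Runtime check for untrusted input (for example, tags produced by an LLM).
function isAspectSlug(value: string): value is AspectSlugSketch {
  return ASPECT_TAXONOMY_SKETCH.some((a) => a.slug === value);
}
```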

Hybrid classification

Facts are classified into aspects through two complementary mechanisms that merge their results:

Deterministic regex classifier

The classifyFactToAspects function in @reflection/shared/aspect-classification matches fact predicates, subject types, object types, and memory types against keyword patterns for each of the 14 aspects. It is intentionally fuzzy: common terms like “think” match philosophy. This classifier is fast, costs nothing to run, and is always available as a baseline.
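A keyword-pattern classifier of this kind can be sketched as below. The patterns, the three aspects shown, the `ClassifiableFact` shape, and the function name `classifySketch` are all assumptions for illustration; the real classifyFactToAspects function and its pattern set are not reproduced here.

```typescript
// Hypothetical patterns for three aspects; the real pattern set is not shown here.
const ASPECT_PATTERNS: { slug: string; pattern: RegExp }[] = [
  { slug: "career", pattern: /\b(work|job|employ|career)/i },
  { slug: "philosophy", pattern: /\b(think|believe|philosoph)/i },
  { slug: "music", pattern: /\b(music|song|band|listen)/i },
];

// Hypothetical input shape covering the four matched fields.
interface ClassifiableFact {
  predicate: string;
  subjectType: string;
  objectType: string;
  memoryType: string;
}

// Join the four fields into one string and return every aspect whose pattern matches.
function classifySketch(fact: ClassifiableFact): string[] {
  const haystack = [fact.predicate, fact.subjectType, fact.objectType, fact.memoryType].join(" ");
  return ASPECT_PATTERNS.filter((p) => p.pattern.test(haystack)).map((p) => p.slug);
}
```

With these patterns, a fact whose predicate is `works_at` lands in career, and any predicate starting with "think" lands in philosophy, which shows the deliberate fuzziness described above.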

LLM-provided aspect tags

During extraction, the LLM prompt instructs the model to tag each fact with aspect slugs. These tags are validated against the known slug set before merging. LLM tags augment and correct the imprecision of regex patterns, providing higher-quality classifications when available.

The two classifiers merge via mergeAspectSlugs() — the union of both results, deduplicated. If the LLM does not provide tags (e.g., legacy data or fallback paths), the regex classifier still provides baseline aspect associations.
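The merge step amounts to a validated, deduplicated union. The sketch below assumes a hypothetical `KNOWN_SLUGS` set and a function named `mergeSketch`; the real mergeAspectSlugs() signature may differ.

```typescript
// Hypothetical subset of the known slug set.
const KNOWN_SLUGS = new Set<string>(["career", "philosophy", "music"]);

// Union of regex-derived and LLM-provided slugs. LLM tags are validated first;
// a missing LLM result falls back to the regex baseline alone.
function mergeSketch(regexSlugs: string[], llmSlugs?: string[]): string[] {
  const validLlm = (llmSlugs ?? []).filter((slug) => KNOWN_SLUGS.has(slug));
  // A Set removes duplicates while keeping first-seen order.
  return Array.from(new Set([...regexSlugs, ...validLlm]));
}
```

For example, `mergeSketch(["career"], ["career", "music", "bogus"])` yields `["career", "music"]`: the duplicate collapses and the unknown tag is dropped.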

Staged tagging

Aspect classifications follow the same staged lifecycle as facts:

During extraction, classifications are computed and stored on the fact row in an aspect_slugs staging column.
At patch application time, fact_aspects join rows are created atomically from the staging column.
Aspect associations only become visible when facts transition to active state.

This prevents aspects from referencing facts that are later rejected during evaluation.
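The staging behavior can be sketched as follows. The `StagedFact` and `FactAspectRow` shapes and the `promoteWithAspects` helper are hypothetical; in the real system this would happen inside the patch application transaction rather than in a pure function.

```typescript
// Hypothetical shapes: a fact with its staging column, and a join row.
interface StagedFact {
  id: string;
  status: "candidate" | "active" | "rejected";
  aspectSlugs: string[]; // staging column filled during extraction
}

interface FactAspectRow {
  factId: string;
  aspectSlug: string;
}

// Promote a candidate and emit its join rows in one step. Facts that are not
// candidates (for example, rejected ones) are returned unchanged with no rows.
function promoteWithAspects(fact: StagedFact): { fact: StagedFact; rows: FactAspectRow[] } {
  if (fact.status !== "candidate") {
    return { fact, rows: [] };
  }
  const rows = fact.aspectSlugs.map((slug) => ({ factId: fact.id, aspectSlug: slug }));
  return { fact: { ...fact, status: "active" }, rows };
}
```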

Scoring

Each aspect receives a score from 0 to 100 that reflects the depth of knowledge in that domain. The score is a weighted sum of four normalized dimensions:

Dimension	Weight	Saturation point	What it measures
Fact volume	40%	25 facts	How many facts exist in this aspect
Entity breadth	25%	12 entities	How many distinct entities are referenced
Predicate diversity	20%	8 predicates	How varied the relationship types are
Average confidence	15%	1.0	How confident the system is in these facts

Fact volume is weighted highest so that early knowledge enrichment feels rewarding to users — even a few facts in a new domain produce a visible score.
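As a worked example of the weighted sum, the sketch below uses the weights and saturation points from the table. Two details are assumptions, since the source does not state them: each dimension is normalized linearly and capped at its saturation point, and the final score is rounded to the nearest integer. The `AspectStats` shape and function names are also hypothetical.

```typescript
// Hypothetical per-aspect statistics.
interface AspectStats {
  factCount: number;
  entityCount: number;
  predicateCount: number;
  avgConfidence: number; // 0..1
}

// Assumed normalization: linear up to the saturation point, then capped at 1.
const saturate = (value: number, saturation: number): number =>
  Math.min(value / saturation, 1);

function aspectScore(s: AspectStats): number {
  const weighted =
    0.4 * saturate(s.factCount, 25) +
    0.25 * saturate(s.entityCount, 12) +
    0.2 * saturate(s.predicateCount, 8) +
    0.15 * saturate(s.avgConfidence, 1.0);
  return Math.round(weighted * 100);
}

// 5 facts, 3 entities, 2 predicates, average confidence 0.8:
// 0.4*0.2 + 0.25*0.25 + 0.2*0.25 + 0.15*0.8 = 0.3125
const earlyScore = aspectScore({
  factCount: 5,
  entityCount: 3,
  predicateCount: 2,
  avgConfidence: 0.8,
}); // 31
```

Under these assumptions, five facts already contribute 8 of the 31 points, and the score reaches 100 only when all four dimensions are saturated.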

The web dashboard uses pre-computed scores stored on the reflection_aspects table for instant reads. The mobile API computes scores at query time, scoped to user-selected namespaces. Score values may differ between web and mobile when namespace filtering changes the fact population.
