# Knowledge graph

> How the Reflections knowledge graph stores temporal facts, manages the patch lifecycle, and organizes knowledge through the aspect taxonomy.

The knowledge graph is the long-term memory of a Reflections agent. It stores everything the system learns from user-uploaded sources and conversations — facts, entities, predicates, and document chunks — organized with temporal validity and domain structure.

## What the graph stores

The knowledge graph is built from four primary data types:

| Data type      | What it represents                                                                      | Example                                                         |
| -------------- | --------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| **Facts**      | Atomic statements about the user's world, with temporal validity and confidence scores. | "User works at Acme Corp" (valid from 2024-01, confidence 0.92) |
| **Entities**   | Named things referenced by facts — people, places, organizations, concepts.             | "Acme Corp" (type: organization)                                |
| **Predicates** | The relationship types that connect entities in facts.                                  | "works\_at", "lives\_in", "enjoys"                              |
| **Chunks**     | Segmented pieces of source documents, embedded as vectors for similarity search.        | A 500-token passage from an uploaded PDF                        |

During a voice conversation, the retrieval pipeline queries these four data types to assemble relevant context for the LLM.
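The **Chunks** row above says document chunks are embedded as vectors for similarity search. The following brute-force cosine-similarity sketch in TypeScript illustrates that ranking step. The `Chunk` shape and the `topKChunks` helper are hypothetical. A production retrieval pipeline would more likely delegate this to a vector index than scan every chunk in memory.

```typescript
// Hypothetical chunk shape; the real schema is not documented on this page.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the k most similar, best first.
function topKChunks(query: number[], chunks: readonly Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```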

## Temporal fact model

Facts in the knowledge graph are not static rows that get overwritten. They use a temporal model that preserves the full history of what was known and when.

Each fact carries three fields that record its validity period and lineage:

* **`valid_from`** — when this fact became true (or was first observed).
* **`valid_to`** — when this fact stopped being true. `null` means it is still current.
* **`supersedes_fact_id`** — a reference to the previous fact this one replaces, preserving lineage.

```sql theme={null}
-- A fact that was later superseded (shown with valid_to already closed)
INSERT INTO facts (subject, predicate, object, valid_from, valid_to, status)
VALUES ('user', 'works_at', 'Acme Corp', '2024-01-15', '2025-06-01', 'active');

-- The superseding fact, linked to the fact it replaces.
-- Assumes each fact row has an `id` primary key.
INSERT INTO facts (subject, predicate, object, valid_from, supersedes_fact_id, status)
SELECT 'user', 'works_at', 'Globex Inc', '2025-06-01', id, 'active'
FROM facts
WHERE subject = 'user' AND predicate = 'works_at' AND object = 'Acme Corp';
```

<Info>
  Facts are append-only in spirit: old facts are never deleted. A newer fact supersedes them, and
  the old fact's `valid_to` is set to close its validity window. This preserves a complete audit
  trail of how knowledge evolved over time.
</Info>
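Because superseded facts keep their rows, "what was true on a given date" becomes a filter over validity windows, and lineage becomes a walk along `supersedes_fact_id`. The sketch below assumes validity is the half-open interval [`valid_from`, `valid_to`). That matches the example above, where the old fact's `valid_to` equals the new fact's `valid_from`. The camelCase field names and the `factsAt` and `lineage` helpers are hypothetical.

```typescript
// Hypothetical in-memory mirror of the facts columns used in the SQL above.
interface TemporalFact {
  id: number;
  subject: string;
  predicate: string;
  object: string;
  validFrom: string; // ISO date, e.g. "2024-01-15"
  validTo: string | null; // null means still current
  supersedesFactId: number | null;
}

// Validity is assumed to be half-open: [validFrom, validTo).
// Same-format ISO-8601 dates compare correctly as plain strings.
function isValidAt(fact: TemporalFact, date: string): boolean {
  return fact.validFrom <= date && (fact.validTo === null || date < fact.validTo);
}

// Facts about (subject, predicate) that were valid on the given date.
function factsAt(
  facts: readonly TemporalFact[],
  subject: string,
  predicate: string,
  date: string,
): TemporalFact[] {
  return facts.filter(
    (f) => f.subject === subject && f.predicate === predicate && isValidAt(f, date),
  );
}

// Walk supersedes_fact_id links from a fact back to its oldest ancestor.
function lineage(facts: readonly TemporalFact[], start: TemporalFact): TemporalFact[] {
  const byId = new Map<number, TemporalFact>();
  for (const f of facts) byId.set(f.id, f);
  const chain: TemporalFact[] = [start];
  const seen = new Set<number>([start.id]);
  let current = start;
  while (current.supersedesFactId !== null && !seen.has(current.supersedesFactId)) {
    const previous = byId.get(current.supersedesFactId);
    if (previous === undefined) break; // dangling reference: stop the walk
    chain.push(previous);
    seen.add(previous.id);
    current = previous;
  }
  return chain;
}
```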

## Fact states

Every fact in the system is in one of three states:

| State       | Meaning                                                                   | Visible to retrieval? |
| ----------- | ------------------------------------------------------------------------- | --------------------- |
| `candidate` | Extracted from a source but not yet evaluated. Linked to a `patch_batch`. | No                    |
| `active`    | Passed evaluation and was promoted through patch application.             | Yes                   |
| `rejected`  | Failed evaluation. Permanently excluded from active truth.                | No                    |

Only `active` facts are returned by the retrieval pipeline during voice conversations. This is the core safety property of the knowledge graph.
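The table implies a small state machine, sketched below with hypothetical names. The sketch assumes a candidate can only move to `active` or `rejected` and that `rejected` is terminal. The page does not say whether an `active` fact can change status, so the sketch allows no transitions out of it. The retrieval filter at the end is the safety property in one line.

```typescript
type FactStatus = "candidate" | "active" | "rejected";

// Assumed transitions, read off the table above.
const ALLOWED_TRANSITIONS: Record<FactStatus, readonly FactStatus[]> = {
  candidate: ["active", "rejected"],
  active: [],
  rejected: [],
};

function canTransition(from: FactStatus, to: FactStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

interface StatusedFact {
  id: number;
  status: FactStatus;
}

// Retrieval sees only active facts; candidates and rejected facts are filtered out.
function retrievableFacts(facts: readonly StatusedFact[]): StatusedFact[] {
  return facts.filter((f) => f.status === "active");
}
```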

## Patch lifecycle

The patch lifecycle is the process by which extracted information becomes active knowledge. It is the implementation of the [eval gate invariant](/architecture/invariants).

<Steps>
  <Step title="Extraction">
    When a source is ingested, the background pipeline uses an LLM to extract candidate facts, entities, and predicates from document chunks. Each extracted fact is inserted with `candidate` status and linked to a `patch_batch`.
  </Step>

  <Step title="Evaluation">
    Each candidate fact is evaluated for quality and consistency:

    * **Confidence scoring** — how likely is this fact to be accurate based on the source material?
    * **Contradiction detection** — does this fact conflict with existing active facts?
    * **Quality assessment** — is the extracted statement well-formed and meaningful?
  </Step>

  <Step title="Approval">
    Patch batches that pass evaluation thresholds are approved for application. The batch groups related facts from a single extraction so they can be promoted atomically.
  </Step>

  <Step title="Application">
    Approved facts transition from `candidate` to `active` status. Temporal validity fields are set. If a new fact supersedes an existing one, the old fact's `valid_to` is updated and the new fact's `supersedes_fact_id` links them together.
  </Step>
</Steps>

<Warning>
  The patch lifecycle is the only path from extracted content to active truth. There is no shortcut
  that bypasses evaluation — this is enforced by the [boundary matrix](/architecture/boundaries) and
  tested by invariant regression tests.
</Warning>
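The following in-memory sketch evaluates and applies one patch batch to make the four steps concrete. Several details are assumptions rather than documented behavior:

* The `MIN_CONFIDENCE` threshold is invented.
* Confidence is the only evaluation check. The real gate also runs contradiction detection and quality assessment.
* Failing facts are rejected individually rather than failing the whole batch.
* A fact supersedes the current active fact with the same subject only when its predicate is in the hypothetical `SINGLE_VALUED_PREDICATES` set.

Atomicity is modeled by building a new fact list and returning it in one step. A database-backed implementation would use a transaction instead.

```typescript
type FactStatus = "candidate" | "active" | "rejected";

interface Fact {
  id: number;
  subject: string;
  predicate: string;
  object: string;
  status: FactStatus;
  confidence: number;
  validFrom: string;
  validTo: string | null;
  supersedesFactId: number | null;
}

interface PatchBatch {
  id: string;
  factIds: readonly number[];
}

// Hypothetical values; the real thresholds and supersession rules are not documented here.
const MIN_CONFIDENCE = 0.7;
const SINGLE_VALUED_PREDICATES = new Set(["works_at", "lives_in"]);

// Stand-in for the evaluation step: confidence only.
function passesEvaluation(fact: Fact): boolean {
  return fact.confidence >= MIN_CONFIDENCE;
}

// Evaluates and applies one batch on a copy, so callers see either none of
// the batch's changes (if they discard the result) or all of them.
function applyPatchBatch(facts: readonly Fact[], batch: PatchBatch): Fact[] {
  const next = facts.map((f) => ({ ...f }));
  for (const fact of next) {
    if (fact.status !== "candidate" || batch.factIds.indexOf(fact.id) === -1) continue;
    if (!passesEvaluation(fact)) {
      fact.status = "rejected";
      continue;
    }
    if (SINGLE_VALUED_PREDICATES.has(fact.predicate)) {
      // Close the validity window of the fact being replaced and link the two.
      const previous = next.find(
        (f) =>
          f.status === "active" &&
          f.validTo === null &&
          f.subject === fact.subject &&
          f.predicate === fact.predicate,
      );
      if (previous !== undefined) {
        previous.validTo = fact.validFrom;
        fact.supersedesFactId = previous.id;
      }
    }
    fact.status = "active";
  }
  return next;
}
```

Run on the Acme/Globex example, this closes the Acme fact at 2025-06-01, promotes the Globex fact with `supersedesFactId` pointing at it, and rejects any low-confidence candidate in the same batch.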

## Aspect taxonomy

The aspect taxonomy provides user-facing domain structure over the knowledge graph. Without it, the graph is a flat collection of facts with no navigational hierarchy. With it, users can see how their knowledge distributes across life domains and where depth or gaps exist.

### The 14 aspects

The taxonomy defines 14 fixed aspects across 5 categories:

| Category          | Aspects                                |
| ----------------- | -------------------------------------- |
| **Cultural**      | Film & TV, Music, Books, Creativity    |
| **Identity**      | Career, Education, Identity, Childhood |
| **Relationships** | Family, Relationships                  |
| **Worldview**     | Philosophy, Values, Spirituality       |
| **Lifestyle**     | Health                                 |

The taxonomy is defined in `@reflection/schemas` as `ASPECT_TAXONOMY` — a readonly array of aspect objects with `slug`, `displayName`, `category`, and `icon` fields. Aspect slugs are a Zod enum used as the cross-layer contract.
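The following dependency-free sketch reproduces that shape. The display names and categories come from the table above, but the slug strings, category keys, and icon names are hypothetical. The real contract uses a Zod enum; to avoid the dependency, this sketch derives a string-literal union and uses a plain type guard instead.

```typescript
// Display names and categories follow the table above; slugs and icons are illustrative.
const ASPECT_TAXONOMY = [
  { slug: "film-tv", displayName: "Film & TV", category: "cultural", icon: "film" },
  { slug: "music", displayName: "Music", category: "cultural", icon: "music" },
  { slug: "books", displayName: "Books", category: "cultural", icon: "book" },
  { slug: "creativity", displayName: "Creativity", category: "cultural", icon: "palette" },
  { slug: "career", displayName: "Career", category: "identity", icon: "briefcase" },
  { slug: "education", displayName: "Education", category: "identity", icon: "graduation-cap" },
  { slug: "identity", displayName: "Identity", category: "identity", icon: "fingerprint" },
  { slug: "childhood", displayName: "Childhood", category: "identity", icon: "baby" },
  { slug: "family", displayName: "Family", category: "relationships", icon: "home" },
  { slug: "relationships", displayName: "Relationships", category: "relationships", icon: "heart" },
  { slug: "philosophy", displayName: "Philosophy", category: "worldview", icon: "brain" },
  { slug: "values", displayName: "Values", category: "worldview", icon: "scale" },
  { slug: "spirituality", displayName: "Spirituality", category: "worldview", icon: "sparkles" },
  { slug: "health", displayName: "Health", category: "lifestyle", icon: "activity" },
] as const;

// Union of the 14 slug literals, derived from the array itself.
type AspectSlug = (typeof ASPECT_TAXONOMY)[number]["slug"];

const ASPECT_SLUGS: readonly string[] = ASPECT_TAXONOMY.map((aspect) => aspect.slug);

// Runtime check playing the role the Zod enum plays in the real contract.
function isAspectSlug(value: string): value is AspectSlug {
  return ASPECT_SLUGS.indexOf(value) !== -1;
}
```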

### Hybrid classification

Facts are classified into aspects through two complementary mechanisms that merge their results:

<AccordionGroup>
  <Accordion title="Deterministic regex classifier">
    The `classifyFactToAspects` function in `@reflection/shared/aspect-classification` matches fact
    predicates, subject types, object types, and memory types against keyword patterns for each of
    the 14 aspects. It is intentionally fuzzy: common terms like "think" match the philosophy
    aspect. This classifier is fast, costs nothing to run, and is always available as a baseline.
  </Accordion>

  <Accordion title="LLM-provided aspect tags">
    During extraction, the LLM prompt instructs the model to tag each fact with aspect slugs. These
    tags are validated against the known slug set before merging. LLM tags compensate for the
    imprecision of the regex patterns by supplying aspects the keywords miss, giving
    higher-quality classifications when available.
  </Accordion>
</AccordionGroup>
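The deterministic classifier works as a table of regular expressions tested against a fact's predicate and type fields. The sketch below is hypothetical: `classifyByKeywords`, the field names, and the patterns are invented for illustration, cover only four aspects, and are not the actual `classifyFactToAspects` implementation. It shows why the approach is cheap (a few regex tests per fact) and why it is fuzzy (a predicate such as `thinks_about` lands in philosophy).

```typescript
interface ClassifiableFact {
  predicate: string;
  subjectType?: string;
  objectType?: string;
  memoryType?: string;
}

// Illustrative patterns for four aspects; deliberately loose substring matches.
const KEYWORD_PATTERNS: ReadonlyArray<readonly [string, RegExp]> = [
  ["career", /work|job|employ|career|colleague/i],
  ["music", /music|song|album|band|listen/i],
  ["philosophy", /think|believe|philosoph/i],
  ["family", /mother|father|sibling|parent|family/i],
];

// Returns every aspect whose pattern matches any of the available fields.
function classifyByKeywords(fact: ClassifiableFact): string[] {
  const text = [fact.predicate, fact.subjectType, fact.objectType, fact.memoryType]
    .filter((field): field is string => field !== undefined)
    .join(" ");
  return KEYWORD_PATTERNS.filter(([, pattern]) => pattern.test(text)).map(([slug]) => slug);
}
```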

The two classifiers merge via `mergeAspectSlugs()` — the union of both results, deduplicated. If the LLM does not provide tags (e.g., legacy data or fallback paths), the regex classifier still provides baseline aspect associations.
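Validating and merging reduce to a filtered, deduplicated set union. The sketch below uses a hypothetical name, `mergeAspectSlugsSketch`, and an abbreviated, made-up slug set standing in for the shared Zod enum. It is not the real `mergeAspectSlugs()` signature. It only shows the behavior described above: unknown LLM tags are dropped, the result is deduplicated, and missing LLM tags fall back to the regex result.

```typescript
// Abbreviated, illustrative slug set; the real one comes from the shared schema.
const KNOWN_SLUGS = new Set(["career", "music", "philosophy", "family", "health"]);

function mergeAspectSlugsSketch(
  regexSlugs: readonly string[],
  llmTags: readonly string[] | undefined,
): string[] {
  // Drop any LLM tag that is not a known slug.
  const validLlmTags = (llmTags ?? []).filter((tag) => KNOWN_SLUGS.has(tag));
  // Union of both sources, deduplicated, keeping first-seen order.
  return Array.from(new Set([...regexSlugs, ...validLlmTags]));
}
```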

### Staged tagging

Aspect classifications follow the same patch lifecycle as facts:

1. During extraction, classifications are computed and stored on the fact row in an `aspect_slugs` staging column.
2. At patch application time, `fact_aspects` join rows are created atomically from the staging column.
3. Aspect associations become visible only when their facts transition to the `active` state.

This prevents aspects from referencing facts that are later rejected during evaluation.
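The staging pattern can be sketched as one function that promotes facts and emits their join rows together. The field names, the `FactAspectRow` shape, and treating every unapproved candidate as rejected are assumptions for illustration. In a SQL-backed store, both writes would sit in one transaction.

```typescript
type FactStatus = "candidate" | "active" | "rejected";

interface StagedFact {
  id: number;
  status: FactStatus;
  aspectSlugs: readonly string[]; // staging column written at extraction time
}

interface FactAspectRow {
  factId: number;
  aspectSlug: string;
}

interface ApplyResult {
  facts: StagedFact[];
  factAspects: FactAspectRow[];
}

// Promotes approved candidates and derives their fact_aspects rows in the same
// step, so a rejected fact never gets an aspect association.
function applyWithAspects(facts: readonly StagedFact[], approvedIds: ReadonlySet<number>): ApplyResult {
  const factAspects: FactAspectRow[] = [];
  const next = facts.map((fact): StagedFact => {
    if (fact.status !== "candidate") return fact;
    if (!approvedIds.has(fact.id)) return { ...fact, status: "rejected" };
    for (const slug of fact.aspectSlugs) {
      factAspects.push({ factId: fact.id, aspectSlug: slug });
    }
    return { ...fact, status: "active" };
  });
  return { facts: next, factAspects };
}
```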

### Scoring

Each aspect receives a score from 0 to 100 that reflects the depth of knowledge in that domain. The score is a weighted sum of four normalized dimensions:

| Dimension           | Weight | Saturation point | What it measures                           |
| ------------------- | ------ | ---------------- | ------------------------------------------ |
| Fact volume         | 40%    | 25 facts         | How many facts exist in this aspect        |
| Entity breadth      | 25%    | 12 entities      | How many distinct entities are referenced  |
| Predicate diversity | 20%    | 8 predicates     | How varied the relationship types are      |
| Average confidence  | 15%    | 1.0              | How confident the system is in these facts |

Fact volume is weighted highest so that early knowledge enrichment feels rewarding to users — even a few facts in a new domain produce a visible score.
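The score is 100 times the sum, over the four dimensions, of weight × min(value / saturation, 1). This sketch assumes that reading: each dimension ramps linearly up to its saturation point and is then capped at 1. The page does not spell out the normalization, and it does not say whether stored scores are rounded. All names below are hypothetical.

```typescript
interface AspectStats {
  factCount: number;
  distinctEntities: number;
  distinctPredicates: number;
  averageConfidence: number; // already in the range 0..1
}

// Weights and saturation points from the table above.
const SCORE_DIMENSIONS: ReadonlyArray<{
  weight: number;
  saturation: number;
  value: (stats: AspectStats) => number;
}> = [
  { weight: 0.4, saturation: 25, value: (s) => s.factCount },
  { weight: 0.25, saturation: 12, value: (s) => s.distinctEntities },
  { weight: 0.2, saturation: 8, value: (s) => s.distinctPredicates },
  { weight: 0.15, saturation: 1.0, value: (s) => s.averageConfidence },
];

// Assumed normalization: linear up to the saturation point, capped at 1.
function normalize(value: number, saturation: number): number {
  return Math.min(Math.max(value / saturation, 0), 1);
}

// Weighted sum of the normalized dimensions, scaled to 0..100.
function aspectScore(stats: AspectStats): number {
  const weighted = SCORE_DIMENSIONS.reduce(
    (sum, d) => sum + d.weight * normalize(d.value(stats), d.saturation),
    0,
  );
  return 100 * weighted;
}
```

For example, 10 facts, 6 entities, 4 predicates, and an average confidence of 0.8 give 100 × (0.4 × 0.4 + 0.25 × 0.5 + 0.2 × 0.5 + 0.15 × 0.8) = 50.5 under these assumptions.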

<Note>
  The web dashboard uses pre-computed scores stored on the `reflection_aspects` table for instant
  reads. The mobile API computes scores at query time, scoped to user-selected namespaces. Score
  values may differ between web and mobile when namespace filtering changes the fact population.
</Note>
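The divergence described in the note comes from the inputs, not the formula: the four dimensions are counted over whichever facts are in scope. The sketch below derives them from a fact list, optionally restricted to a set of namespaces. The `namespace`, `entityIds`, and `aspectSlugs` fields and the `statsForAspect` helper are hypothetical; the page does not document how namespaces attach to facts.

```typescript
interface ScopedFact {
  namespace: string; // hypothetical field
  aspectSlugs: readonly string[];
  entityIds: readonly string[];
  predicate: string;
  confidence: number;
}

interface AspectStats {
  factCount: number;
  distinctEntities: number;
  distinctPredicates: number;
  averageConfidence: number;
}

// Counts the scoring inputs for one aspect; omitting `namespaces` means "all facts".
function statsForAspect(
  facts: readonly ScopedFact[],
  aspect: string,
  namespaces?: ReadonlySet<string>,
): AspectStats {
  const inScope = facts.filter(
    (f) =>
      f.aspectSlugs.indexOf(aspect) !== -1 &&
      (namespaces === undefined || namespaces.has(f.namespace)),
  );
  const entities = new Set<string>();
  const predicates = new Set<string>();
  let confidenceSum = 0;
  for (const f of inScope) {
    for (const id of f.entityIds) entities.add(id);
    predicates.add(f.predicate);
    confidenceSum += f.confidence;
  }
  return {
    factCount: inScope.length,
    distinctEntities: entities.size,
    distinctPredicates: predicates.size,
    averageConfidence: inScope.length === 0 ? 0 : confidenceSum / inScope.length,
  };
}
```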

## Further reading

* [ADR-0005](/decisions/adr-0005) — the decision record for the temporal fact and patch lifecycle.
* [ADR-0025](/decisions/adr-0025) — the decision record for the aspect taxonomy and scoring architecture.
* [Two-plane architecture](/architecture/two-plane-architecture) — how the realtime and background planes interact with the knowledge graph.
* [System invariants](/architecture/invariants) — the eval gate and append-only fact rules.
