Status: Accepted
Date: 2026-02-24
Deciders: Reflections Maintainers

Context

The knowledge graph stores facts, entities, and predicates but lacks user-facing domain structure. Users need to see how their knowledge is distributed across life domains, where it has depth, and where it has gaps. Without aspects, the graph is a flat collection of facts with no navigational hierarchy.

Decision

Adopt a fixed 14-aspect taxonomy across 5 categories (Cultural, Identity, Relationships, Worldview, Lifestyle) with hybrid classification, staged tagging, and dual scoring:
  • Fixed taxonomy: 14 aspects defined in @reflection/schemas as ASPECT_TAXONOMY — a readonly array of { slug, displayName, category, icon }. Slugs are a Zod enum used as the cross-layer contract. Categories: Cultural (film_tv, music, books, creativity), Identity (career, education, identity, childhood), Relationships (family, relationships), Worldview (philosophy, values, spirituality), Lifestyle (health).
  • Hybrid classification: Two complementary classifiers merge their results:
    1. Deterministic regex patterns: fact predicates, subject types, object types, and memory types are matched against 14 keyword patterns, one per aspect. The patterns are intentionally fuzzy (common verbs like “think” match philosophy). Fast, zero-cost, always available.
    2. LLM-provided aspect_tags — the extraction prompt instructs the LLM to tag each fact with aspect slugs. Tags are validated against the known slug set before merging. This augments and corrects regex imprecision.
  • Staged tagging: Facts carry an aspect_slugs staging column through the ingestion pipeline. Classifications are computed during extraction and stored on the fact row. At patch application time, fact_aspects join rows are created atomically from the staging column. This follows the existing temporal fact and patch lifecycle (ADR-0005): aspect associations only become visible when facts transition to active state.
  • Dual scoring paths:
    1. Pre-computed reflection_aspects.score — aggregates metrics (fact count, entity count, predicate diversity, average confidence) and stores a 0-100 score. Used by the web dashboard for instant reads.
    2. Runtime computeAspectScore() — the mobile API computes scores at query time, scoping to user-selected namespaces. Score values may differ between web and mobile when namespace filtering changes the fact population.
  • Scoring formula: weighted sum of four normalized dimensions — fact volume (40%, saturates at 25 facts), entity breadth (25%, saturates at 12 entities), predicate diversity (20%, saturates at 8 predicates), and average confidence (15%). Returns an integer 0-100. Volume is weighted highest because early enrichment should feel rewarding.
  • Two new tables:
    • reflection_aspects — one row per aspect per reflection, seeded on reflection creation. Stores pre-computed metrics and scores. RLS-enforced.
    • fact_aspects — many-to-many join between facts and aspects with a confidence column. Created atomically at patch application. Unique on (fact_id, aspect_slug).
  • API surface: Two endpoints:
    • GET /reflections/:id/aspects — all aspects with scores, ordered by score descending.
    • GET /reflections/:id/aspects/:slug — detail view with top entities for a specific aspect.
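A minimal TypeScript sketch of the taxonomy shape, for illustration only. The 14 slugs and their categories come from the list above; the `displayName` and `icon` values are placeholders, and `CATEGORY_BY_SLUG`, `KNOWN_SLUGS`, and `isAspectSlug` are hypothetical helpers. The ADR says the real slugs are a Zod enum; this sketch substitutes a plain type guard so it runs without dependencies.

```typescript
// Sketch only: display names and icons are placeholders, and the real
// @reflection/schemas module wraps the slugs in a Zod enum.
type AspectCategory =
  | "Cultural"
  | "Identity"
  | "Relationships"
  | "Worldview"
  | "Lifestyle";

// The 14 slugs from the ADR, as a readonly tuple of string literals.
const ASPECT_SLUGS = [
  "film_tv", "music", "books", "creativity",
  "career", "education", "identity", "childhood",
  "family", "relationships",
  "philosophy", "values", "spirituality",
  "health",
] as const;

type AspectSlug = (typeof ASPECT_SLUGS)[number];

interface AspectDefinition {
  readonly slug: AspectSlug;
  readonly displayName: string;
  readonly category: AspectCategory;
  readonly icon: string;
}

// Hypothetical helper: a Record keyed by AspectSlug fails to compile
// if any slug is missing or misspelled.
const CATEGORY_BY_SLUG: Record<AspectSlug, AspectCategory> = {
  film_tv: "Cultural", music: "Cultural", books: "Cultural", creativity: "Cultural",
  career: "Identity", education: "Identity", identity: "Identity", childhood: "Identity",
  family: "Relationships", relationships: "Relationships",
  philosophy: "Worldview", values: "Worldview", spirituality: "Worldview",
  health: "Lifestyle",
};

const ASPECT_TAXONOMY: readonly AspectDefinition[] = ASPECT_SLUGS.map((slug) => ({
  slug,
  displayName: slug, // placeholder; real display names are hand-written
  category: CATEGORY_BY_SLUG[slug],
  icon: `icon-${slug}`, // placeholder icon key
}));

// Stand-in for validating a value against the Zod slug enum.
const KNOWN_SLUGS: ReadonlySet<string> = new Set<string>(ASPECT_SLUGS);

function isAspectSlug(value: string): value is AspectSlug {
  return KNOWN_SLUGS.has(value);
}
```

Typing the category map as `Record<AspectSlug, AspectCategory>` makes the compiler reject a taxonomy that omits or misspells a slug, which is one way to keep the fixed list and the cross-layer contract in sync.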
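The hybrid classifier can be sketched as two passes merged into one set. This is a sketch under stated assumptions: the merge is assumed to be a set union, the keyword patterns are placeholders rather than the real 14, and the fact field names and helpers (`classifyByRegex`, `classifyFact`) are hypothetical.

```typescript
// Sketch of hybrid aspect classification. Patterns and field names are
// illustrative placeholders; the merge is assumed to be a set union.
const ASPECT_SLUGS = [
  "film_tv", "music", "books", "creativity",
  "career", "education", "identity", "childhood",
  "family", "relationships",
  "philosophy", "values", "spirituality",
  "health",
] as const;
type AspectSlug = (typeof ASPECT_SLUGS)[number];

// Hypothetical fact fields read by the classifier.
interface ClassifiableFact {
  predicate: string;
  subjectType: string;
  objectType: string;
  memoryType: string;
  aspectTags?: readonly string[]; // raw LLM output, not yet validated
}

// Placeholder keyword patterns, one per aspect; intentionally fuzzy.
const ASPECT_PATTERNS: Record<AspectSlug, RegExp> = {
  film_tv: /\b(film|movie|tv|series|watch)/i,
  music: /\b(music|song|album|band|listen)/i,
  books: /\b(book|novel|author|read)/i,
  creativity: /\b(creat|paint|draw|craft)/i,
  career: /\b(job|career|work|employ|profession)/i,
  education: /\b(school|college|universit|degree|stud)/i,
  identity: /\b(identity|gender|heritage|ethnic)/i,
  childhood: /\b(child|grew_up|kid)/i,
  family: /\b(family|parent|mother|father|sibling)/i,
  relationships: /\b(friend|partner|spouse|married|dating)/i,
  philosophy: /\b(think|believe|philosoph|meaning)/i,
  values: /\b(value|principle|priorit)/i,
  spirituality: /\b(spiritual|faith|pray|religio)/i,
  health: /\b(health|exercise|diet|sleep|medical)/i,
};

const KNOWN_SLUGS: ReadonlySet<string> = new Set<string>(ASPECT_SLUGS);

// Pass 1: deterministic regex over predicate, subject/object type, memory type.
function classifyByRegex(fact: ClassifiableFact): Set<AspectSlug> {
  const text = [fact.predicate, fact.subjectType, fact.objectType, fact.memoryType].join(" ");
  const matches = new Set<AspectSlug>();
  for (const slug of ASPECT_SLUGS) {
    if (ASPECT_PATTERNS[slug].test(text)) matches.add(slug);
  }
  return matches;
}

// Pass 2: keep only LLM tags that are known slugs, then union with pass 1.
function classifyFact(fact: ClassifiableFact): AspectSlug[] {
  const merged = classifyByRegex(fact);
  for (const tag of fact.aspectTags ?? []) {
    const normalized = tag.trim().toLowerCase();
    if (KNOWN_SLUGS.has(normalized)) merged.add(normalized as AspectSlug);
  }
  return Array.from(merged).sort();
}
```

Because the regex pass never depends on the LLM, a fact with no `aspectTags` still gets its baseline classification; unknown LLM tags are simply ignored rather than failing the fact.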
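At patch application, the staged `aspect_slugs` values become `fact_aspects` rows. The sketch below covers only the row-building step and rests on assumptions: the row shapes and `buildFactAspectRows` are hypothetical, and each join row is assumed to copy its fact's confidence (the ADR does not say where `fact_aspects.confidence` comes from). In the described design these inserts are atomic with the fact's transition to active; the transaction itself is out of scope here.

```typescript
// Sketch: turn staged aspect_slugs into fact_aspects rows for accepted facts.
// Row shapes and the confidence source are assumptions.
interface StagedFact {
  id: string;
  confidence: number;
  aspect_slugs: readonly string[]; // staging column written during extraction
}

interface FactAspectRow {
  fact_id: string;
  aspect_slug: string;
  confidence: number;
}

function buildFactAspectRows(
  facts: readonly StagedFact[],
  acceptedFactIds: ReadonlySet<string>,
): FactAspectRow[] {
  const seen = new Set<string>(); // mirrors UNIQUE (fact_id, aspect_slug)
  const rows: FactAspectRow[] = [];
  for (const fact of facts) {
    // Facts not accepted in this patch (pending or rejected) get no join rows.
    if (!acceptedFactIds.has(fact.id)) continue;
    for (const slug of fact.aspect_slugs) {
      const key = JSON.stringify([fact.id, slug]);
      if (seen.has(key)) continue;
      seen.add(key);
      rows.push({ fact_id: fact.id, aspect_slug: slug, confidence: fact.confidence });
    }
  }
  return rows;
}
```

Deriving the rows only from accepted facts is what keeps aspect associations invisible until their facts are active.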
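The scoring formula can be written out directly. A minimal sketch, assuming linear saturation (`min(x / cap, 1)`), average confidence on a 0 to 1 scale, and `Math.round` for the final integer. The ADR names `computeAspectScore()` but not its signature, so the `AspectMetrics` parameter shape is hypothetical.

```typescript
// Sketch of the weighted aspect score. Weights and saturation points come
// from the ADR; linear saturation and rounding are assumptions.
interface AspectMetrics {
  factCount: number;
  entityCount: number;
  predicateDiversity: number; // distinct predicates
  avgConfidence: number; // assumed to be in [0, 1]
}

const WEIGHTS = { volume: 0.4, entities: 0.25, diversity: 0.2, confidence: 0.15 } as const;
const SATURATION = { facts: 25, entities: 12, predicates: 8 } as const;

// Normalize a count to [0, 1], reaching 1 at the saturation point.
function saturate(value: number, cap: number): number {
  return Math.min(Math.max(value, 0) / cap, 1);
}

function computeAspectScore(m: AspectMetrics): number {
  const raw =
    WEIGHTS.volume * saturate(m.factCount, SATURATION.facts) +
    WEIGHTS.entities * saturate(m.entityCount, SATURATION.entities) +
    WEIGHTS.diversity * saturate(m.predicateDiversity, SATURATION.predicates) +
    WEIGHTS.confidence * Math.min(Math.max(m.avgConfidence, 0), 1);
  return Math.round(raw * 100); // integer in 0..100
}
```

With 10 facts, 6 entities, 4 distinct predicates, and 0.9 average confidence, the components are 0.16 + 0.125 + 0.1 + 0.135 = 0.52, so the score is 52.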

Alternatives considered

Alternative 1: Dynamic/user-defined taxonomy

Pros:
  • Users could create custom life domains.
  • More flexible categorization.
Cons:
  • No shared vocabulary across reflections for cross-user analytics.
  • Increases UI complexity (taxonomy management).
  • Harder to tune classification and scoring when categories are unbounded.
  • Violates the 0-user operating principle: add complexity only where there is a measured need.

Alternative 2: Single scoring path (pre-computed only)

Pros:
  • Simpler — one source of truth for scores.
  • No runtime computation on read path.
Cons:
  • Cannot support namespace-filtered views (mobile shows scores scoped to selected namespaces).
  • Would require pre-computing scores for every namespace combination.
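A rough count illustrates the cost. Assuming, for illustration only, that a reflection has n namespaces and a user may select any non-empty subset of them, pre-computing every filtered view would need this many stored scores per reflection, compared with 14 for the unfiltered view:

```latex
\text{scores per reflection} = 14 \cdot (2^{n} - 1), \qquad n = 5 \;\Rightarrow\; 14 \cdot 31 = 434
```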

Alternative 3: Upstream classification only (no deterministic fallback)

Pros:
  • Higher-quality classifications from LLM.
  • Simpler code — no regex patterns to maintain.
Cons:
  • Facts ingested without LLM aspect tags (e.g., legacy data, fallback paths) would have no aspect associations.
  • Loses the regex baseline, which catches obvious domain signals even when LLM tags are absent.
  • No graceful degradation: if the LLM tagging step is missing or fails, facts end up untagged.

Alternative 4: Post-hoc tagging (classify after facts are active)

Pros:
  • Simpler ingestion pipeline — no staging column needed.
Cons:
  • Breaks the temporal fact lifecycle: aspect associations would be written separately from fact activation instead of atomically with it, so active facts could briefly appear without aspects.
  • Race condition: aspects could reference facts that are later rejected during patch review.
  • Diverges from the existing patch-based promotion pattern, which the staging column follows.

Consequences

Benefits:
  • Users can navigate their knowledge graph by life domain, seeing depth and gaps at a glance.
  • Fixed taxonomy enables consistent cross-reflection analytics and future comparative features.
  • Hybrid classification provides resilience — deterministic patterns cover the baseline, LLM tags add precision.
  • Dual scoring supports both pre-computed dashboard reads and runtime namespace-filtered mobile views.
  • Seeded aspects ensure every reflection has a complete aspect grid from creation, even before any facts exist.
Costs:
  • 14 regex patterns require maintenance as domain definitions evolve.
  • Scoring weight tuning (saturation thresholds, dimension weights) is a new parameter surface.
  • Namespace-aware mobile queries add JOIN complexity to the read path.
  • Two tables add to the schema surface.

Implementation notes

  • Taxonomy definition lives in @reflection/schemas.
  • Classification logic consists of pure functions with no I/O and no database or vendor imports.
  • Scoring formula uses weights: volume 40%, entities 25%, diversity 20%, confidence 15%.
  • DB operations split into read (aspects, scores, detail) and admin (seeding, insertion, recomputation) following the query segregation pattern.
  • API routes provide list and detail endpoints with auth + role middleware.
  • Mobile scoring computes at query time scoped to namespace array, using the same formula as the pre-computed path.
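The namespace-scoped path differs from the pre-computed one only in which facts feed the metrics. A minimal sketch of that aggregation step, assuming hypothetical fact fields (`namespace`, `subjectEntityId`, `objectEntityId`, `aspectSlugs`) and a hypothetical helper `aspectMetricsForNamespaces`, and assuming "entity count" means distinct subject and object entities. The resulting metrics would then go through the same weighted formula.

```typescript
// Sketch of namespace-scoped metric aggregation for one aspect.
// Field names and the entity-count definition are assumptions.
interface ActiveFact {
  id: string;
  namespace: string;
  predicate: string;
  subjectEntityId: string;
  objectEntityId: string | null; // null when the object is a literal value
  confidence: number;
  aspectSlugs: readonly string[]; // joined from fact_aspects
}

interface AspectMetrics {
  factCount: number;
  entityCount: number;
  predicateDiversity: number;
  avgConfidence: number;
}

function aspectMetricsForNamespaces(
  facts: readonly ActiveFact[],
  aspectSlug: string,
  namespaces: readonly string[],
): AspectMetrics {
  const selected = new Set(namespaces);
  // Only facts in a selected namespace and tagged with this aspect count.
  const scoped = facts.filter(
    (f) => selected.has(f.namespace) && f.aspectSlugs.includes(aspectSlug),
  );
  const entities = new Set<string>();
  const predicates = new Set<string>();
  let confidenceSum = 0;
  for (const f of scoped) {
    entities.add(f.subjectEntityId);
    if (f.objectEntityId !== null) entities.add(f.objectEntityId);
    predicates.add(f.predicate);
    confidenceSum += f.confidence;
  }
  return {
    factCount: scoped.length,
    entityCount: entities.size,
    predicateDiversity: predicates.size,
    avgConfidence: scoped.length === 0 ? 0 : confidenceSum / scoped.length,
  };
}
```

Narrowing the namespace array shrinks the fact population that feeds the formula, which is why a mobile score can differ from the pre-computed web score for the same aspect.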