Status: Accepted
Date: 2026-02-24
Deciders: Reflections Maintainers
Context
The knowledge graph stores facts, entities, and predicates but lacks user-facing domain structure. Users need to see how their knowledge distributes across life domains and where depth exists vs. gaps. Without aspects, the graph is a flat collection of facts with no navigational hierarchy.
Decision
Adopt a fixed 14-aspect taxonomy across 5 categories (Cultural, Identity, Relationships, Worldview, Lifestyle) with hybrid classification, staged tagging, and dual scoring:
- Fixed taxonomy: 14 aspects defined in @reflection/schemas as ASPECT_TAXONOMY, a readonly array of { slug, displayName, category, icon }. Slugs are a Zod enum used as the cross-layer contract. Categories: Cultural (film_tv, music, books, creativity), Identity (career, education, identity, childhood), Relationships (family, relationships), Worldview (philosophy, values, spirituality), Lifestyle (health).
- Hybrid classification: Two complementary classifiers merge their results:
  - Deterministic regex patterns match fact predicates, subject types, object types, and memory types against 14 keyword patterns. Intentionally fuzzy: common verbs like “think” match philosophy. Fast, zero-cost, always available.
  - LLM-provided aspect_tags: the extraction prompt instructs the LLM to tag each fact with aspect slugs. Tags are validated against the known slug set before merging. This augments and corrects regex imprecision.
- Staged tagging: Facts carry an aspect_slugs staging column through the ingestion pipeline. Classifications are computed during extraction and stored on the fact row. At patch application time, fact_aspects join rows are created atomically from the staging column. This follows the existing temporal fact and patch lifecycle (ADR-0005): aspect associations only become visible when facts transition to active state.
- Dual scoring paths:
  - Pre-computed reflection_aspects.score: aggregates metrics (fact count, entity count, predicate diversity, average confidence) and stores a 0-100 score. Used by the web dashboard for instant reads.
  - Runtime computeAspectScore(): the mobile API computes scores at query time, scoping to user-selected namespaces. Score values may differ between web and mobile when namespace filtering changes the fact population.
- Scoring formula: weighted sum of four normalized dimensions: fact volume (40%, saturates at 25 facts), entity breadth (25%, saturates at 12 entities), predicate diversity (20%, saturates at 8 predicates), and average confidence (15%). Returns an integer 0-100. Volume is weighted highest because early enrichment should feel rewarding.
- Two new tables:
  - reflection_aspects: one row per aspect per reflection, seeded on reflection creation. Stores pre-computed metrics and scores. RLS-enforced.
  - fact_aspects: many-to-many join between facts and aspects with a confidence column. Created atomically at patch application. Unique on (fact_id, aspect_slug).
- API surface: Two endpoints:
  - GET /reflections/:id/aspects returns all aspects with scores, ordered by score descending.
  - GET /reflections/:id/aspects/:slug returns a detail view with top entities for a specific aspect.
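The scoring formula in the Decision list can be made concrete with a small sketch. This is one possible reading, not the actual computeAspectScore() implementation: the AspectMetrics shape, the linear saturation ramp, the assumption that confidence lies in [0, 1], and the use of Math.round are all assumptions that the ADR does not specify.

```typescript
// Hypothetical input shape; field names are assumptions for illustration.
interface AspectMetrics {
  factCount: number;
  entityCount: number;
  predicateCount: number; // number of distinct predicates
  avgConfidence: number; // assumed to lie in [0, 1]
}

// Linear ramp that reaches 1 at `cap` and stays there.
// The ADR states the saturation points but not the curve; linear is an assumption.
function saturate(value: number, cap: number): number {
  return Math.min(Math.max(value, 0) / cap, 1);
}

// Weighted sum of the four normalized dimensions, scaled to an integer 0-100.
function computeAspectScoreSketch(m: AspectMetrics): number {
  const weighted =
    0.4 * saturate(m.factCount, 25) +
    0.25 * saturate(m.entityCount, 12) +
    0.2 * saturate(m.predicateCount, 8) +
    0.15 * Math.min(Math.max(m.avgConfidence, 0), 1);
  return Math.round(weighted * 100);
}
```

Under these assumptions, 5 facts, 3 entities, 2 distinct predicates, and an average confidence of 0.8 give 0.08 + 0.0625 + 0.05 + 0.12 = 0.3125, which rounds to 31. Reaching 25 facts with every other dimension at zero yields 40, which illustrates why volume dominates early enrichment.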
Alternatives considered
Alternative 1: Dynamic/user-defined taxonomy
Pros:
- Users could create custom life domains.
- More flexible categorization.
Cons:
- No shared vocabulary across reflections for cross-user analytics.
- Increases UI complexity (taxonomy management).
- Harder to tune classification and scoring when categories are unbounded.
- Violates the 0-user operating principle: add complexity only where there’s measured need.
Alternative 2: Single scoring path (pre-computed only)
Pros:
- Simpler: one source of truth for scores.
- No runtime computation on read path.
Cons:
- Cannot support namespace-filtered views (mobile shows scores scoped to selected namespaces).
- Would require pre-computing scores for every namespace combination.
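To illustrate why namespace filtering pushes scoring to query time, the sketch below derives the four scoring inputs from a namespace-filtered fact set and counts how many selections a pre-computed approach would have to cover. The ScopedFact shape and both function names are hypothetical; the real schema and query layer are not described in this ADR.

```typescript
// Hypothetical fact shape; all field names are assumptions for illustration.
interface ScopedFact {
  namespace: string;
  entityId: string;
  predicate: string;
  confidence: number;
}

interface ScopedMetrics {
  factCount: number;
  entityCount: number;
  predicateCount: number;
  avgConfidence: number;
}

// Aggregate the scoring inputs over only the facts in the selected namespaces.
function metricsForNamespaces(facts: ScopedFact[], namespaces: string[]): ScopedMetrics {
  const selected = new Set(namespaces);
  const inScope = facts.filter((f) => selected.has(f.namespace));
  const confidenceSum = inScope.reduce((sum, f) => sum + f.confidence, 0);
  return {
    factCount: inScope.length,
    entityCount: new Set(inScope.map((f) => f.entityId)).size,
    predicateCount: new Set(inScope.map((f) => f.predicate)).size,
    avgConfidence: inScope.length === 0 ? 0 : confidenceSum / inScope.length,
  };
}

// A user with n namespaces can pick 2^n - 1 non-empty selections.
function nonEmptySelections(namespaceCount: number): number {
  return Math.pow(2, namespaceCount) - 1;
}
```

Because every selection can change the fact population, each one can produce a different score. With 10 namespaces there are already 1023 non-empty selections per aspect, which is the combinatorial cost behind the last point above.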
Alternative 3: Upstream classification only (no deterministic fallback)
Pros:
- Higher-quality classifications from LLM.
- Simpler code: no regex patterns to maintain.
Cons:
- Facts ingested without LLM aspect tags (e.g., legacy data, fallback paths) would have no aspect associations.
- Regex provides a baseline that catches obvious domain signals even when LLM tags are absent.
- Hybrid approach allows graceful degradation.
Alternative 4: Post-hoc tagging (classify after facts are active)
Pros:
- Simpler ingestion pipeline: no staging column needed.
Cons:
- Breaks the temporal fact lifecycle: aspect associations would appear before fact activation.
- Race condition: aspects could reference facts that are later rejected during patch review.
- Staging column aligns with existing patch-based promotion pattern.
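For contrast, the staged promotion that the decision adopts can be sketched with an in-memory model. This is only an illustration under stated assumptions: the "pending" and "rejected" status names, the StagedFact and FactAspectRow shapes, and applyPatchSketch itself are hypothetical, and a real implementation would presumably run inside a database transaction, not copy arrays.

```typescript
// Only "active" is named in the ADR; the other states are assumptions.
type FactStatus = "pending" | "active" | "rejected";

interface StagedFact {
  id: string;
  status: FactStatus;
  aspectSlugs: string[]; // stands in for the aspect_slugs staging column
}

interface FactAspectRow {
  factId: string;
  aspectSlug: string;
}

// Promotes approved pending facts to active and creates their join rows in the
// same step. New arrays are returned and the inputs are never mutated, which
// mimics all-or-nothing behavior in this in-memory model.
function applyPatchSketch(
  facts: StagedFact[],
  approvedIds: string[],
  existingRows: FactAspectRow[],
): { facts: StagedFact[]; rows: FactAspectRow[] } {
  const approved = new Set(approvedIds);
  const seen = new Set(existingRows.map((r) => JSON.stringify([r.factId, r.aspectSlug])));
  const rows = [...existingRows];
  const nextFacts = facts.map((fact) => {
    if (!approved.has(fact.id) || fact.status !== "pending") return fact;
    for (const slug of fact.aspectSlugs) {
      const key = JSON.stringify([fact.id, slug]);
      if (!seen.has(key)) {
        // Enforce uniqueness on (fact_id, aspect_slug).
        seen.add(key);
        rows.push({ factId: fact.id, aspectSlug: slug });
      }
    }
    return { ...fact, status: "active" as FactStatus };
  });
  return { facts: nextFacts, rows };
}
```

In this model a fact that is never approved keeps its staged slugs but gets no join rows, so no aspect association can point at a fact that was rejected in review.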
Consequences
Benefits:
- Users can navigate their knowledge graph by life domain, seeing depth and gaps at a glance.
- Fixed taxonomy enables consistent cross-reflection analytics and future comparative features.
- Hybrid classification provides resilience — deterministic patterns cover the baseline, LLM tags add precision.
- Dual scoring supports both pre-computed dashboard reads and runtime namespace-filtered mobile views.
- Seeded aspects ensure every reflection has a complete aspect grid from creation, even before any facts exist.
Drawbacks:
- 14 regex patterns require maintenance as domain definitions evolve.
- Scoring weight tuning (saturation thresholds, dimension weights) is a new parameter surface.
- Namespace-aware mobile queries add JOIN complexity to the read path.
- Two tables add to the schema surface.
Implementation notes
- Taxonomy definition lives in @reflection/schemas.
- Classification logic is pure functions with no I/O or db/vendor imports.
- Scoring formula uses weights: volume 40%, entities 25%, diversity 20%, confidence 15%.
- DB operations split into read (aspects, scores, detail) and admin (seeding, insertion, recomputation) following the query segregation pattern.
- API routes provide list and detail endpoints with auth + role middleware.
- Mobile scoring is computed at query time, scoped to the selected namespace array, using the same formula as the pre-computed path.
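A minimal sketch of the hybrid classification as pure functions follows. The 14 slugs come from the taxonomy above, but the three regex patterns are illustrative placeholders (the ADR does not list the real ones), the function names are hypothetical, and matching against a single text string is a simplification of matching predicates, subject types, object types, and memory types. A plain const array stands in for the Zod enum so that the block has no dependencies.

```typescript
// The 14 slugs from the taxonomy; a plain const array stands in for the Zod enum.
const KNOWN_SLUGS = [
  "film_tv", "music", "books", "creativity",
  "career", "education", "identity", "childhood",
  "family", "relationships",
  "philosophy", "values", "spirituality",
  "health",
] as const;
type AspectSlug = (typeof KNOWN_SLUGS)[number];

// Illustrative patterns only; the real keyword patterns are not given in the ADR.
const PATTERNS: Partial<Record<AspectSlug, RegExp>> = {
  music: /\b(song|album|band|listen)/i,
  career: /\b(job|work|employ|promot)/i,
  philosophy: /\b(think|believe|meaning)/i,
};

// Type guard used to validate LLM-provided tags against the known slug set.
function isKnownSlug(value: string): value is AspectSlug {
  return (KNOWN_SLUGS as readonly string[]).indexOf(value) !== -1;
}

// Deterministic baseline: every slug whose pattern matches the text.
function classifyByRegex(text: string): AspectSlug[] {
  return KNOWN_SLUGS.filter((slug) => PATTERNS[slug]?.test(text) ?? false);
}

// Union of regex matches and validated LLM tags, in taxonomy order.
function mergeAspectTags(text: string, llmTags: string[] = []): AspectSlug[] {
  const merged = new Set<AspectSlug>(classifyByRegex(text));
  for (const tag of llmTags) {
    if (isKnownSlug(tag)) merged.add(tag);
  }
  return KNOWN_SLUGS.filter((slug) => merged.has(slug));
}
```

With these placeholder patterns, mergeAspectTags("thinks about", ["music", "not_a_slug"]) returns ["music", "philosophy"]: the unknown tag is dropped, and the fuzzy "think" match still contributes philosophy. Omitting the LLM tags falls back to the regex result alone, which is the graceful degradation the hybrid approach is meant to provide.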

