ADR-0025: Aspect taxonomy and scoring architecture

Status: Accepted
Date: 2026-02-24
Deciders: Reflections Maintainers

Context

The knowledge graph stores facts, entities, and predicates but lacks user-facing domain structure. Users need to see how their knowledge is distributed across life domains, where it runs deep, and where it has gaps. Without aspects, the graph is a flat collection of facts with no navigational hierarchy.

Decision

Adopt a fixed 14-aspect taxonomy across 5 categories (Cultural, Identity, Relationships, Worldview, Lifestyle) with hybrid classification, staged tagging, and dual scoring:

Fixed taxonomy: 14 aspects defined in @reflection/schemas as ASPECT_TAXONOMY — a readonly array of { slug, displayName, category, icon }. Slugs are a Zod enum used as the cross-layer contract. Categories: Cultural (film_tv, music, books, creativity), Identity (career, education, identity, childhood), Relationships (family, relationships), Worldview (philosophy, values, spirituality), Lifestyle (health).
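As a rough illustration of the shape described above, the taxonomy could look like the following sketch. The slugs and categories are taken from this ADR; the displayName and icon values are placeholders, and a plain `as const` array with a type guard stands in for the Zod enum so the snippet has no dependencies.

```typescript
// Sketch only: displayName and icon values are hypothetical placeholders.
// The real definition in @reflection/schemas uses a Zod enum for the slugs.
const ASPECT_TAXONOMY = [
  { slug: "film_tv", displayName: "Film & TV", category: "Cultural", icon: "film" },
  { slug: "music", displayName: "Music", category: "Cultural", icon: "music" },
  { slug: "books", displayName: "Books", category: "Cultural", icon: "book" },
  { slug: "creativity", displayName: "Creativity", category: "Cultural", icon: "palette" },
  { slug: "career", displayName: "Career", category: "Identity", icon: "briefcase" },
  { slug: "education", displayName: "Education", category: "Identity", icon: "school" },
  { slug: "identity", displayName: "Identity", category: "Identity", icon: "person" },
  { slug: "childhood", displayName: "Childhood", category: "Identity", icon: "child" },
  { slug: "family", displayName: "Family", category: "Relationships", icon: "home" },
  { slug: "relationships", displayName: "Relationships", category: "Relationships", icon: "heart" },
  { slug: "philosophy", displayName: "Philosophy", category: "Worldview", icon: "idea" },
  { slug: "values", displayName: "Values", category: "Worldview", icon: "compass" },
  { slug: "spirituality", displayName: "Spirituality", category: "Worldview", icon: "star" },
  { slug: "health", displayName: "Health", category: "Lifestyle", icon: "pulse" },
] as const;

// Union of the 14 slug literals, derived from the array itself.
type AspectSlug = (typeof ASPECT_TAXONOMY)[number]["slug"];

const ASPECT_SLUGS = new Set<string>(ASPECT_TAXONOMY.map((a) => a.slug));

// Type guard playing the role the Zod enum plays as the cross-layer contract.
function isAspectSlug(value: string): value is AspectSlug {
  return ASPECT_SLUGS.has(value);
}
```

Deriving the slug type from the array keeps the list of aspects in one place, so adding or renaming an aspect cannot leave the type and the data out of sync.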
Hybrid classification: Two complementary classifiers merge their results:
1. Deterministic regex patterns match fact predicates, subject types, object types, and memory types against 14 keyword patterns. Intentionally fuzzy — common verbs like “think” match philosophy. Fast, zero-cost, always available.
2. LLM-provided aspect_tags — the extraction prompt instructs the LLM to tag each fact with aspect slugs. Tags are validated against the known slug set before merging. This augments and corrects regex imprecision.
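The two classifiers above might be combined along these lines. This is a minimal sketch, not the actual implementation: the three patterns are invented for illustration (the ADR does not list the real 14), the function and field names are hypothetical, and the merge is assumed to be a simple set union of regex matches and validated LLM tags.

```typescript
// Known slug set, copied from the taxonomy in this ADR.
const KNOWN_SLUGS = new Set<string>([
  "film_tv", "music", "books", "creativity", "career", "education", "identity",
  "childhood", "family", "relationships", "philosophy", "values", "spirituality", "health",
]);

// Illustrative patterns only; the real patterns are not given in this ADR.
const ASPECT_PATTERNS: ReadonlyArray<readonly [string, RegExp]> = [
  ["philosophy", /\b(think|believe|philosoph)/i],
  ["music", /\b(music|song|album|band)/i],
  ["career", /\b(work|job|career|employ)/i],
];

// Hypothetical view of the fact fields the regex classifier inspects.
interface FactSignals {
  predicate: string;
  subjectType: string;
  objectType: string;
  memoryType: string;
}

// Deterministic pass: match every pattern against the concatenated signals.
function classifyByPattern(fact: FactSignals): string[] {
  const haystack = [fact.predicate, fact.subjectType, fact.objectType, fact.memoryType].join(" ");
  return ASPECT_PATTERNS.filter(([, pattern]) => pattern.test(haystack)).map(([slug]) => slug);
}

// Merge pass: drop LLM tags that are not known slugs, then union with regex hits.
function mergeAspectTags(fact: FactSignals, llmTags: readonly string[]): string[] {
  const validLlmTags = llmTags.filter((tag) => KNOWN_SLUGS.has(tag));
  return Array.from(new Set([...classifyByPattern(fact), ...validLlmTags]));
}
```

Because both functions are pure, the regex path still produces tags when the LLM returns none, which is the graceful degradation the hybrid design relies on.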
Staged tagging: Facts carry an aspect_slugs staging column through the ingestion pipeline. Classifications are computed during extraction and stored on the fact row. At patch application time, fact_aspects join rows are created atomically from the staging column. This follows the existing temporal fact and patch lifecycle (ADR-0005): aspect associations only become visible when facts transition to active state.
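The promotion step can be pictured as a pure function that expands the staging column into join rows for facts that have just become active. The sketch below rests on several assumptions: the status values, the field names other than aspect_slugs, and the choice to copy the fact's confidence onto each join row are all hypothetical. In a real system the insert would run in the same database transaction as the status transition.

```typescript
// Hypothetical row shapes; only aspect_slugs, fact_id, aspect_slug, and
// the confidence column are named in the ADR.
interface StagedFact {
  id: string;
  status: "pending" | "active" | "rejected";
  aspect_slugs: readonly string[];
  confidence: number;
}

interface FactAspectRow {
  fact_id: string;
  aspect_slug: string;
  confidence: number;
}

// Expand staged slugs into join rows, skipping facts that are not active and
// de-duplicating so the (fact_id, aspect_slug) uniqueness constraint holds.
function buildFactAspectRows(facts: readonly StagedFact[]): FactAspectRow[] {
  const seen = new Set<string>();
  const rows: FactAspectRow[] = [];
  for (const fact of facts) {
    if (fact.status !== "active") continue;
    for (const slug of fact.aspect_slugs) {
      const key = fact.id + ":" + slug;
      if (seen.has(key)) continue;
      seen.add(key);
      // Assumption: the join row inherits the fact's confidence.
      rows.push({ fact_id: fact.id, aspect_slug: slug, confidence: fact.confidence });
    }
  }
  return rows;
}
```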
Dual scoring paths:
1. Pre-computed reflection_aspects.score — aggregates metrics (fact count, entity count, predicate diversity, average confidence) and stores a 0-100 score. Used by the web dashboard for instant reads.
2. Runtime computeAspectScore() — the mobile API computes scores at query time, scoping to user-selected namespaces. Score values may differ between web and mobile when namespace filtering changes the fact population.
Scoring formula: weighted sum of four normalized dimensions — fact volume (40%, saturates at 25 facts), entity breadth (25%, saturates at 12 entities), predicate diversity (20%, saturates at 8 predicates), and average confidence (15%). Returns an integer 0-100. Volume is weighted highest because early enrichment should feel rewarding.
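To make the formula concrete, here is one way it could be written. The weights and saturation points come from this ADR; the linear ramp up to each saturation point, the use of Math.round, and the assumption that average confidence lies in [0, 1] are guesses, and the function name is hypothetical rather than the real computeAspectScore() signature.

```typescript
interface AspectMetrics {
  factCount: number;
  entityCount: number;
  predicateCount: number;
  avgConfidence: number; // assumed to be in [0, 1]
}

// Linear ramp from 0 to 1 that saturates at `cap` (assumed normalization).
function saturate(value: number, cap: number): number {
  return Math.min(Math.max(value, 0) / cap, 1);
}

// Weighted sum of four normalized dimensions, returned as an integer 0-100.
function computeAspectScoreSketch(m: AspectMetrics): number {
  const volume = 40 * saturate(m.factCount, 25);
  const breadth = 25 * saturate(m.entityCount, 12);
  const diversity = 20 * saturate(m.predicateCount, 8);
  const confidence = 15 * saturate(m.avgConfidence, 1);
  return Math.round(volume + breadth + diversity + confidence);
}
```

For example, under these assumptions 5 facts, 3 entities, 2 predicates, and an average confidence of 0.5 contribute 8 + 6.25 + 5 + 7.5 = 26.75, which rounds to 27. Because volume carries 40 of the 100 points and ramps over only 25 facts, the first few facts move the score noticeably, which matches the stated goal of making early enrichment feel rewarding.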
Two new tables:
- reflection_aspects — one row per aspect per reflection, seeded on reflection creation. Stores pre-computed metrics and scores. RLS-enforced.
- fact_aspects — many-to-many join between facts and aspects with a confidence column. Created atomically at patch application. Unique on (fact_id, aspect_slug).
API surface: Two endpoints:
- GET /reflections/:id/aspects — all aspects with scores, ordered by score descending.
- GET /reflections/:id/aspects/:slug — detail view with top entities for a specific aspect.

Alternatives considered

Alternative 1: Dynamic/user-defined taxonomy

Pros:

Users could create custom life domains.
More flexible categorization.

Cons:

No shared vocabulary across reflections for cross-user analytics.
Increases UI complexity (taxonomy management).
Harder to tune classification and scoring when categories are unbounded.
Violates 0-user operating principle: add complexity only where there’s measured need.

Alternative 2: Single scoring path (pre-computed only)

Pros:

Simpler — one source of truth for scores.
No runtime computation on read path.

Cons:

Cannot support namespace-filtered views (mobile shows scores scoped to selected namespaces).
Would require pre-computing scores for every namespace combination.

Alternative 3: Upstream classification only (no deterministic fallback)

Pros:

Higher-quality classifications from LLM.
Simpler code — no regex patterns to maintain.

Cons:

Facts ingested without LLM aspect tags (e.g., legacy data, fallback paths) would have no aspect associations.
Loses the regex baseline, which catches obvious domain signals even when LLM tags are absent.
No graceful degradation of the kind the hybrid approach provides.

Alternative 4: Post-hoc tagging (classify after facts are active)

Pros:

Simpler ingestion pipeline — no staging column needed.

Cons:

Breaks the temporal fact lifecycle: aspect associations would no longer be created atomically with fact activation.
Race condition: aspects could reference facts that are later rejected during patch review.
Departs from the existing patch-based promotion pattern, which the staging column follows.

Consequences

Benefits:

Users can navigate their knowledge graph by life domain, seeing depth and gaps at a glance.
Fixed taxonomy enables consistent cross-reflection analytics and future comparative features.
Hybrid classification provides resilience — deterministic patterns cover the baseline, LLM tags add precision.
Dual scoring supports both pre-computed dashboard reads and runtime namespace-filtered mobile views.
Seeded aspects ensure every reflection has a complete aspect grid from creation, even before any facts exist.

Costs:

14 regex patterns require maintenance as domain definitions evolve.
Scoring weight tuning (saturation thresholds, dimension weights) is a new parameter surface.
Namespace-aware mobile queries add JOIN complexity to the read path.
Two tables add to the schema surface.

Implementation notes

Taxonomy definition lives in @reflection/schemas.
Classification logic is pure functions with no I/O or db/vendor imports.
Scoring formula uses weights: volume 40%, entities 25%, diversity 20%, confidence 15%.
DB operations split into read (aspects, scores, detail) and admin (seeding, insertion, recomputation) following the query segregation pattern.
API routes provide list and detail endpoints with auth + role middleware.
Mobile scoring computes at query time scoped to namespace array, using the same formula as the pre-computed path.
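The namespace scoping on the mobile path amounts to filtering the fact population before aggregating the four metrics that feed the scoring formula. The sketch below shows that aggregation step in memory; in practice it would more likely be a SQL query with JOINs, and the fact shape, field names, and function name here are hypothetical.

```typescript
// Hypothetical in-memory view of a fact already associated with one aspect.
interface ScopedFact {
  namespace: string;
  predicate: string;
  entityIds: readonly string[];
  confidence: number;
}

interface NamespaceMetrics {
  factCount: number;
  entityCount: number;
  predicateCount: number;
  avgConfidence: number;
}

// Aggregate metrics over only the facts in the user-selected namespaces.
function aggregateForNamespaces(
  facts: readonly ScopedFact[],
  namespaces: readonly string[],
): NamespaceMetrics {
  const allowed = new Set<string>(namespaces);
  const entities = new Set<string>();
  const predicates = new Set<string>();
  let factCount = 0;
  let confidenceSum = 0;
  for (const fact of facts) {
    if (!allowed.has(fact.namespace)) continue;
    factCount += 1;
    confidenceSum += fact.confidence;
    predicates.add(fact.predicate);
    for (const id of fact.entityIds) entities.add(id);
  }
  return {
    factCount,
    entityCount: entities.size,
    predicateCount: predicates.size,
    // Avoid dividing by zero when no facts fall in the selected namespaces.
    avgConfidence: factCount === 0 ? 0 : confidenceSum / factCount,
  };
}
```

Narrowing the namespace array shrinks every metric at once, which is why a mobile score can legitimately differ from the pre-computed web score for the same aspect.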
