# ADR-0025: Aspect taxonomy and scoring architecture

> Provide user-facing domain structure over the knowledge graph so users can see how their knowledge distributes across life domains and where depth exists vs. gaps.

<Info>**Status:** Accepted **Date:** 2026-02-24 **Deciders:** Reflections Maintainers</Info>

## Context

The knowledge graph stores facts, entities, and predicates but has no user-facing domain structure. Users need to see how their knowledge is distributed across life domains, and where it is deep versus where it has gaps. Without aspects, the graph is a flat collection of facts with no navigational hierarchy.

## Decision

Adopt a fixed 14-aspect taxonomy across 5 categories (Cultural, Identity, Relationships, Worldview, Lifestyle) with hybrid classification, staged tagging, and dual scoring:

* **Fixed taxonomy:** 14 aspects defined in `@reflection/schemas` as `ASPECT_TAXONOMY` -- a readonly array of `{ slug, displayName, category, icon }`. Slugs are a Zod enum used as the cross-layer contract. Categories: Cultural (film\_tv, music, books, creativity), Identity (career, education, identity, childhood), Relationships (family, relationships), Worldview (philosophy, values, spirituality), Lifestyle (health).

* **Hybrid classification:** Two complementary classifiers merge their results:
  1. **Deterministic regex patterns** match fact predicates, subject types, object types, and memory types against 14 keyword patterns. Intentionally fuzzy -- common verbs like "think" match philosophy. Fast, zero-cost, always available.
  2. **LLM-provided `aspect_tags`** -- the extraction prompt instructs the LLM to tag each fact with aspect slugs. Tags are validated against the known slug set before merging. This augments and corrects regex imprecision.

* **Staged tagging:** Facts carry an `aspect_slugs` staging column through the ingestion pipeline. Classifications are computed during extraction and stored on the fact row. At patch application time, `fact_aspects` join rows are created atomically from the staging column. This follows the existing temporal fact and patch lifecycle ([ADR-0005](/decisions/adr-0005)): aspect associations only become visible when facts transition to `active` state.

* **Dual scoring paths:**
  1. **Pre-computed `reflection_aspects.score`** -- aggregates metrics (fact count, entity count, predicate diversity, average confidence) and stores a 0-100 score. Used by the web dashboard for instant reads.
  2. **Runtime `computeAspectScore()`** -- the mobile API computes scores at query time, scoping to user-selected namespaces. Score values may differ between web and mobile when namespace filtering changes the fact population.

* **Scoring formula:** weighted sum of four normalized dimensions -- fact volume (40%, saturates at 25 facts), entity breadth (25%, saturates at 12 entities), predicate diversity (20%, saturates at 8 predicates), and average confidence (15%). Returns an integer 0-100. Volume is weighted highest because early enrichment should feel rewarding.

* **Two new tables:**
  * `reflection_aspects` -- one row per aspect per reflection, seeded on reflection creation. Stores pre-computed metrics and scores. RLS-enforced.
  * `fact_aspects` -- many-to-many join between facts and aspects with a confidence column. Created atomically at patch application. Unique on `(fact_id, aspect_slug)`.

* **API surface:** Two endpoints:
  * `GET /reflections/:id/aspects` -- all aspects with scores, ordered by score descending.
  * `GET /reflections/:id/aspects/:slug` -- detail view with top entities for a specific aspect.
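
The scoring formula above can be sketched as a pure function. This is a minimal illustration, not the shipped `computeAspectScore()`: the `AspectMetrics` field names, the linear normalization up to each saturation point, the assumption that average confidence lies in [0, 1], and the use of `Math.round` are all assumptions. Only the weights, saturation thresholds, and the integer 0-100 range come from this ADR.

```typescript
// Hypothetical metric shape; field names are assumptions, not the real schema.
interface AspectMetrics {
  factCount: number;
  entityCount: number;
  predicateCount: number; // number of distinct predicates
  avgConfidence: number; // assumed to lie in [0, 1]
}

// Weights and saturation points as stated in the scoring formula.
const WEIGHTS = { volume: 0.4, entities: 0.25, diversity: 0.2, confidence: 0.15 };
const SATURATION = { facts: 25, entities: 12, predicates: 8 };

// Linear ramp clamped to [0, 1]: a dimension stops contributing once it saturates.
// (Linear normalization is an assumption; the ADR only names the saturation points.)
const saturate = (value: number, cap: number): number =>
  Math.min(Math.max(value / cap, 0), 1);

function computeAspectScore(m: AspectMetrics): number {
  const raw =
    WEIGHTS.volume * saturate(m.factCount, SATURATION.facts) +
    WEIGHTS.entities * saturate(m.entityCount, SATURATION.entities) +
    WEIGHTS.diversity * saturate(m.predicateCount, SATURATION.predicates) +
    WEIGHTS.confidence * saturate(m.avgConfidence, 1);
  // Rounding mode is an assumption; the ADR only says the result is an integer 0-100.
  return Math.round(raw * 100);
}
```

Under these assumptions, an aspect with 5 facts, 3 entities, 2 distinct predicates, and an average confidence of 0.9 scores 33, and any aspect at or beyond all three saturation points with full confidence scores 100.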

## Alternatives considered

### Alternative 1: Dynamic/user-defined taxonomy

Pros:

* Users could create custom life domains.
* More flexible categorization.

Cons:

* No shared vocabulary across reflections for cross-user analytics.
* Increases UI complexity (taxonomy management).
* Harder to tune classification and scoring when categories are unbounded.
* Violates the 0-user operating principle: add complexity only where there is a measured need.

### Alternative 2: Single scoring path (pre-computed only)

Pros:

* Simpler -- one source of truth for scores.
* No runtime computation on read path.

Cons:

* Cannot support namespace-filtered views (mobile shows scores scoped to selected namespaces).
* Would require pre-computing scores for every namespace combination.
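
To make the namespace constraint concrete, here is a rough sketch of how the four scoring inputs could be aggregated over a namespace-filtered fact set at query time. The `ActiveFact` shape, the `aggregateMetrics` name, and the definition of entity breadth as distinct subject and object ids are assumptions for illustration; the real query and schema are not shown in this ADR.

```typescript
// Hypothetical fact shape; the real row schema is not reproduced in this ADR.
interface ActiveFact {
  namespace: string;
  predicate: string;
  subjectId: string;
  objectId: string | null; // null when the object is a literal (assumption)
  confidence: number;
}

interface AspectMetrics {
  factCount: number;
  entityCount: number;
  predicateCount: number;
  avgConfidence: number;
}

// Aggregate scoring inputs over only the facts in the selected namespaces.
function aggregateMetrics(
  facts: readonly ActiveFact[],
  namespaces: readonly string[],
): AspectMetrics {
  const allowed = new Set(namespaces);
  const scoped = facts.filter((f) => allowed.has(f.namespace));

  const entities = new Set<string>();
  const predicates = new Set<string>();
  let confidenceSum = 0;

  for (const f of scoped) {
    entities.add(f.subjectId);
    if (f.objectId !== null) entities.add(f.objectId);
    predicates.add(f.predicate);
    confidenceSum += f.confidence;
  }

  return {
    factCount: scoped.length,
    entityCount: entities.size,
    predicateCount: predicates.size,
    // Avoid dividing by zero when the filter leaves no facts.
    avgConfidence: scoped.length === 0 ? 0 : confidenceSum / scoped.length,
  };
}
```

Because every metric depends on which facts survive the filter, each distinct namespace selection can yield a different score, which is why a single stored score per aspect cannot serve these views.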

### Alternative 3: Upstream classification only (no deterministic fallback)

Pros:

* Higher-quality classifications from LLM.
* Simpler code -- no regex patterns to maintain.

Cons:

* Facts ingested without LLM aspect tags (e.g., legacy data, fallback paths) would have no aspect associations.
* Regex provides a baseline that catches obvious domain signals even when LLM tags are absent.
* Hybrid approach allows graceful degradation.
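
The graceful degradation described above can be sketched as follows. This is an illustrative sketch only: the two regex patterns are made up (the real set has 14 and is not reproduced here), the function names are hypothetical, and combining the two classifiers with a plain set union is an assumption about how results are merged.

```typescript
// The 14 slugs listed in the Decision section.
const KNOWN_SLUGS = new Set<string>([
  "film_tv", "music", "books", "creativity",
  "career", "education", "identity", "childhood",
  "family", "relationships",
  "philosophy", "values", "spirituality",
  "health",
]);

// Two illustrative patterns; these are invented examples, not the real patterns.
const PATTERNS: ReadonlyArray<{ slug: string; pattern: RegExp }> = [
  { slug: "philosophy", pattern: /\b(think|believe|philosoph)/i },
  { slug: "music", pattern: /\b(listen|song|album|band)/i },
];

// Deterministic baseline: always available, intentionally fuzzy.
function classifyByRegex(text: string): string[] {
  return PATTERNS.filter((p) => p.pattern.test(text)).map((p) => p.slug);
}

// Merge regex results with LLM tags, dropping any tag outside the known slug set.
// When llmTags is absent (legacy data, fallback paths), the regex baseline still applies.
function mergeAspectTags(text: string, llmTags?: readonly string[]): string[] {
  const validLlm = (llmTags ?? []).filter((tag) => KNOWN_SLUGS.has(tag));
  return Array.from(new Set([...classifyByRegex(text), ...validLlm]));
}
```

With this shape, a fact that arrives without `aspect_tags` still gets whatever the patterns match, and an unknown tag from the LLM is discarded rather than written through.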

### Alternative 4: Post-hoc tagging (classify after facts are active)

Pros:

* Simpler ingestion pipeline -- no staging column needed.

Cons:

* Breaks the temporal fact lifecycle -- aspect associations would be written outside patch application, so their visibility would no longer be tied to the fact's transition to `active`.
* Race condition: aspects could reference facts that are later rejected during patch review.
* Staging column aligns with existing patch-based promotion pattern.
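
For contrast, the staged approach that was chosen can be sketched as a pure function that expands the `aspect_slugs` staging column into `fact_aspects` rows for the facts a patch activates. The `StagedFact` shape, the state names other than `active`, and the function name are assumptions; the confidence column on `fact_aspects` is omitted for brevity, and in practice the resulting rows would be inserted in the same transaction as the state change.

```typescript
// Hypothetical shapes; only `aspect_slugs`, `fact_id`, `aspect_slug`, and the
// `active` state are named in this ADR.
interface StagedFact {
  id: string;
  state: "pending" | "active" | "rejected";
  aspect_slugs: string[];
}

interface FactAspectRow {
  fact_id: string;
  aspect_slug: string;
}

// Expand staged slugs into join rows, but only for facts that became active.
function buildFactAspectRows(facts: readonly StagedFact[]): FactAspectRow[] {
  const seen = new Set<string>();
  const rows: FactAspectRow[] = [];

  for (const fact of facts) {
    if (fact.state !== "active") continue; // rejected or pending facts never surface

    for (const slug of fact.aspect_slugs) {
      const key = `${fact.id}:${slug}`;
      if (seen.has(key)) continue; // mirrors the unique (fact_id, aspect_slug) constraint
      seen.add(key);
      rows.push({ fact_id: fact.id, aspect_slug: slug });
    }
  }
  return rows;
}
```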

## Consequences

**Benefits:**

* Users can navigate their knowledge graph by life domain, seeing depth and gaps at a glance.
* Fixed taxonomy enables consistent cross-reflection analytics and future comparative features.
* Hybrid classification provides resilience -- deterministic patterns cover the baseline, LLM tags add precision.
* Dual scoring supports both pre-computed dashboard reads and runtime namespace-filtered mobile views.
* Seeded aspects ensure every reflection has a complete aspect grid from creation, even before any facts exist.

**Costs:**

* 14 regex patterns require maintenance as domain definitions evolve.
* Scoring weight tuning (saturation thresholds, dimension weights) is a new parameter surface.
* Namespace-aware mobile queries add JOIN complexity to the read path.
* Two tables add to the schema surface.

## Implementation notes

* Taxonomy definition lives in `@reflection/schemas`.
* Classification logic is pure functions with no I/O or db/vendor imports.
* Scoring formula uses weights: volume 40%, entities 25%, diversity 20%, confidence 15%.
* DB operations split into read (aspects, scores, detail) and admin (seeding, insertion, recomputation) following the query segregation pattern.
* API routes provide list and detail endpoints with auth + role middleware.
* Mobile scoring computes at query time, scoped to the selected namespace array, using the same formula as the pre-computed path.
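
As a rough sketch of the taxonomy contract from the first note: the slug-to-category mapping below is taken from the Decision section, but everything else is illustrative. The real `ASPECT_TAXONOMY` entries also carry `displayName` and `icon`, which are omitted because their values are not given here, and a plain type guard stands in for the Zod enum so the example has no dependencies. The names `ASPECT_CATEGORY`, `isAspectSlug`, and `slugsInCategory` are hypothetical.

```typescript
// Slug -> category mapping as listed in the Decision section.
const ASPECT_CATEGORY = {
  film_tv: "Cultural",
  music: "Cultural",
  books: "Cultural",
  creativity: "Cultural",
  career: "Identity",
  education: "Identity",
  identity: "Identity",
  childhood: "Identity",
  family: "Relationships",
  relationships: "Relationships",
  philosophy: "Worldview",
  values: "Worldview",
  spirituality: "Worldview",
  health: "Lifestyle",
} as const;

type AspectSlug = keyof typeof ASPECT_CATEGORY;
type AspectCategory = (typeof ASPECT_CATEGORY)[AspectSlug];

const ASPECT_SLUGS = Object.keys(ASPECT_CATEGORY) as AspectSlug[];

// Type guard standing in for the Zod enum check used as the cross-layer contract.
function isAspectSlug(value: string): value is AspectSlug {
  return Object.prototype.hasOwnProperty.call(ASPECT_CATEGORY, value);
}

// List the slugs that belong to one category, in declaration order.
function slugsInCategory(category: AspectCategory): AspectSlug[] {
  return ASPECT_SLUGS.filter((slug) => ASPECT_CATEGORY[slug] === category);
}
```

Deriving both the slug type and the runtime check from one constant keeps the compile-time and runtime views of the taxonomy from drifting apart, which is the same property the Zod enum provides.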

## Related ADRs

* [ADR-0005: Temporal fact and patch lifecycle](/decisions/adr-0005)
* [ADR-0006: DB query surface segregation](/decisions/adr-0006)
* [ADR-0021: Mobile API contracts via shared Zod schemas](/decisions/adr-0021)
