# ADR-0019: Voice runtime provider strategy

> Replace self-hosted voice agent with a managed conversation provider while preserving knowledge-graph authority and the two-plane read-only invariant.

<Info>**Status:** Accepted **Date:** 2026-02-09 **Deciders:** Reflections Maintainers</Info>

## Context

The platform requires real-time voice conversations grounded in a user's knowledge graph. The previous implementation used a self-hosted LiveKit Agents service that ran in-process Anthropic LLM streaming, turn-state orchestration, and LiveKit protocol adaptation. This worked but carried significant operational cost:

* A dedicated Railway service with its own deployment pipeline, health checks, and scaling.
* A complex turn-state machine and LLM streaming adapter totaling \~1,200 LOC.
* Tight coupling between voice protocol (LiveKit SDK) and reasoning logic (brain-core retrieval).
* Fragile streaming: partial token delivery, backpressure, and silence-detection edge cases.

The knowledge graph, ingestion gate, temporal facts, and RBAC model must remain authoritative regardless of which voice runtime is used.

## Decision

Use ElevenLabs Conversations API as the managed voice runtime provider:

* **Session bootstrap:** `POST /v1/sessions` creates an internal conversation record, builds a capsule prompt from the user's knowledge graph, and returns a signed ElevenLabs conversation URL to the client.
* **Knowledge retrieval:** ElevenLabs calls `POST /v1/tools/retrieve-context` (server-tool callback) during conversations. The API authenticates the request via a shared tool secret (constant-time comparison), runs brain-core retrieval with a 700ms timeout, and returns bounded evidence. An LRU cache (1,000 entries, 90s TTL) absorbs repeated queries.
* **Transcript capture:** ElevenLabs sends a `conversation-ended` webhook. The handler validates the HMAC-SHA256 signature, fetches the full transcript, creates a source record, and dispatches ingestion into the existing Inngest pipeline.
* **Reconciliation:** A backfill endpoint lists recent ElevenLabs conversations and processes any that were not recorded. Processing is idempotent, keyed by provider conversation ID.
* **Capsule caching:** A `reflection_capsules` table caches serialized persona/context payloads with token estimates to bound session-bootstrap latency. Capsules are regenerated when the underlying knowledge graph changes. The capsule includes a persona narrative compiled during regeneration and injected into the ElevenLabs agent template.
* **Persona compilation:** During capsule regeneration, a worker pipeline compiles a structured natural-language persona from knowledge graph data using an Anthropic LLM. The persona narrative captures the user's identity anchors, speaking style, and areas to explore. The step is fail-open: capsule regeneration succeeds even if persona compilation fails.
* **Authenticity gating:** A composite 0-100 score assesses onboarding readiness: coverage (55 points), confidence breakdown (25 points), persona/style profile (10 points), and voice clone state (10 points). The threshold of 85 points gates the transition from onboarding to conversation-ready.
* **No in-process agent service.** The previous LiveKit Agents service was deleted entirely.
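
The authenticity gate described above can be illustrated with a small scoring sketch. The weights (55/25/10/10) and the threshold of 85 come from this ADR; everything else is an assumption made for illustration, including the `AuthenticitySignals` type, the normalization of each signal to the range 0.0-1.0, the linear weighting, and treating the threshold as inclusive.

```python
from dataclasses import dataclass

# Weights and threshold are taken from this ADR.
WEIGHTS = {"coverage": 55, "confidence": 25, "persona_style": 10, "voice_clone": 10}
READINESS_THRESHOLD = 85


@dataclass(frozen=True)
class AuthenticitySignals:
    """Hypothetical inputs; each is assumed to be normalized to 0.0-1.0."""

    coverage: float
    confidence: float
    persona_style: float
    voice_clone: float


def authenticity_score(signals: AuthenticitySignals) -> float:
    """Return the weighted composite on a 0-100 scale (linear weighting assumed)."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(1.0, max(0.0, getattr(signals, name)))  # clamp out-of-range input
        total += weight * value
    return total


def is_conversation_ready(
    signals: AuthenticitySignals, threshold: float = READINESS_THRESHOLD
) -> bool:
    """Assumes the threshold is inclusive (score >= threshold)."""
    return authenticity_score(signals) >= threshold


# Full coverage and confidence alone give 80 points, which is below the gate.
partial = AuthenticitySignals(coverage=1.0, confidence=1.0, persona_style=0.0, voice_clone=0.0)
print(authenticity_score(partial), is_conversation_ready(partial))
```

Under these assumptions, coverage and confidence together contribute at most 80 points, so the gate cannot be passed without some contribution from the persona/style profile or the voice clone state.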

## Alternatives considered

### Alternative 1: Keep LiveKit Agents (self-hosted)

Pros:

* Full control over voice pipeline, model selection, and turn logic.
* No external runtime dependency for conversation orchestration.

Cons:

* Significant ops burden: dedicated service, scaling, health monitoring.
* Complex turn-state machine and LLM streaming adapter (\~1,200 LOC).
* Voice protocol coupling makes it hard to swap STT/TTS providers independently.

### Alternative 2: ElevenLabs with ElevenLabs Knowledge Base

Pros:

* Fully managed retrieval and voice, with a minimal API surface.

Cons:

* Loses knowledge-graph authority (violates the gated learning invariant).
* Cannot enforce eval gate, temporal fact validity, or RBAC scoping on retrieval.
* Duplicates truth source between ElevenLabs KB and Postgres.

### Alternative 3: Other managed voice providers (Vapi, Retell)

Pros:

* Similar managed benefits to ElevenLabs.
* Potentially different pricing models.

Cons:

* Less mature server-tool and webhook APIs at time of evaluation.
* Weaker documentation for custom tool-calling patterns.
* Would require the same integration work with a less proven contract.

## Consequences

**Benefits:**

* Eliminated the self-hosted agent service entirely (34 files, \~1,200 LOC removed).
* Deployment simplified from 4 Railway services to 3 (`web` + `api` + `workers`).
* Voice orchestration (STT, turn detection, TTS) is fully managed.
* Knowledge graph remains the single source of truth via server-tool callback.
* Reconciliation endpoint provides a safety net for missed webhooks.
* Persona compilation provides natural-language identity grounding for the voice agent, reducing generic responses.
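
The reconciliation safety net relies on processing being idempotent by provider conversation ID. A minimal sketch of that property follows, assuming a hypothetical `reconcile` helper and an in-memory set standing in for a uniqueness check on `conversations.provider_conversation_id`; the real backfill endpoint and storage layer are not shown in this ADR.

```python
from typing import Callable, Iterable, List, Set


def reconcile(
    provider_ids: Iterable[str],
    recorded_ids: Set[str],
    ingest: Callable[[str], None],
) -> List[str]:
    """Process only conversations that have not been recorded yet.

    `recorded_ids` is a stand-in for a database lookup or unique constraint
    on the provider conversation ID (an assumption for this sketch).
    """
    processed: List[str] = []
    for conversation_id in provider_ids:
        if conversation_id in recorded_ids:
            continue  # already captured by the webhook or an earlier backfill
        ingest(conversation_id)
        recorded_ids.add(conversation_id)
        processed.append(conversation_id)
    return processed


# The webhook already recorded conv-1; conv-2 was missed.
recorded = {"conv-1"}
ingested: List[str] = []
first = reconcile(["conv-1", "conv-2"], recorded, ingested.append)
second = reconcile(["conv-1", "conv-2"], recorded, ingested.append)
print(first, second)
```

Running the backfill a second time processes nothing, which is what makes it safe to run alongside webhook delivery.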

**Costs:**

* ElevenLabs vendor lock-in for voice runtime: conversation lifecycle, transcript format, and webhook contract are provider-specific.
* Tool callback introduces a \~700ms latency budget constraint for retrieval.
* Transcript capture is asynchronous (webhook-driven): a missed webhook that reconciliation does not catch would lose the transcript.
* New authentication surfaces: tool secret, webhook HMAC, internal API secret.
* Session response contract changed (a signed URL replaces the LiveKit room token), which requires a coordinated client update.
* Persona compilation adds an Anthropic LLM call to the capsule regeneration path (fail-open, non-blocking to capsule success).
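
Two of the new authentication surfaces, the shared tool secret and the webhook HMAC, can be sketched with Python's standard `hmac` module. The function names are hypothetical, and the sketch assumes the signature arrives as a hex-encoded HMAC-SHA256 of the raw request body; the provider's actual header format and any timestamp handling are not modeled here.

```python
import hashlib
import hmac


def tool_secret_matches(presented: str, expected: str) -> bool:
    """Compare the presented tool secret in constant time."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def sign_payload(secret: str, body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def webhook_signature_valid(secret: str, body: bytes, signature_hex: str) -> bool:
    """Recompute the signature and compare it in constant time.

    Assumes a bare hex signature; a real provider header may carry
    additional fields that must be parsed first.
    """
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


body = b'{"conversation_id": "abc"}'
signature = sign_payload("example-webhook-secret", body)
print(webhook_signature_valid("example-webhook-secret", body, signature))
```

The signature should be computed over the raw bytes as received, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate it.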

## Implementation notes

* ElevenLabs HTTP client handles signed URL creation, conversation listing, transcript fetch, and webhook signature verification.
* The tool endpoint implements the LRU cache and returns a degraded fallback if retrieval fails.
* Webhook + reconciliation processing is idempotent by provider conversation ID.
* DB schema includes `conversations.runtime_provider` and `conversations.provider_conversation_id` columns to bind internal conversations to external provider IDs. The `reflection_capsules` table caches bootstrap payloads.
* Web client replaces the LiveKit room lifecycle hook with ElevenLabs SDK session management.
* Environment secrets are validated at startup: API key, agent ID, webhook secret, tool secret, internal API secret, and base URL.
* Persona compiler is split into a pure narrative builder, an LLM-backed compilation phase, and an Inngest phase wrapper.
* Authenticity assessment uses a composite scoring model with a configurable readiness threshold.
* Agent config drift detection runs as an hourly cron job that compares the live ElevenLabs agent configuration against expected snapshots.
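
The tool endpoint behavior noted above (LRU cache, bounded retrieval, degraded fallback) can be sketched as follows. The 1,000-entry capacity, 90-second TTL, and 700 ms budget come from this ADR; the `TTLLRUCache` class, the `retrieve_context` signature, the thread-pool timeout, the `(user_id, query)` cache key, and the shape of the degraded response are all assumptions for illustration.

```python
import time
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, Optional


class TTLLRUCache:
    """LRU cache whose entries also expire after a fixed TTL."""

    def __init__(self, max_entries: int = 1000, ttl_seconds: float = 90.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop and report a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict least recently used


_executor = ThreadPoolExecutor(max_workers=4)


def retrieve_context(user_id: str, query: str,
                     retrieve: Callable[[str, str], Dict[str, Any]],
                     cache: TTLLRUCache,
                     timeout_seconds: float = 0.7) -> Dict[str, Any]:
    """Serve from cache, else retrieve within the budget, else degrade."""
    key = (user_id, query)  # scope by user so cached evidence is never shared
    cached = cache.get(key)
    if cached is not None:
        return cached
    future = _executor.submit(retrieve, user_id, query)
    try:
        result = future.result(timeout=timeout_seconds)
    except Exception:  # timeout or retrieval error: return a degraded response
        return {"evidence": [], "degraded": True}
    cache.put(key, result)
    return result
```

Degraded responses are deliberately not cached in this sketch, so a transient retrieval failure does not mask a later successful lookup for the full TTL.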

## Related ADRs

* [ADR-0003: Two-plane system architecture](/decisions/adr-0003)
* [ADR-0007: Retrieval pipeline design](/decisions/adr-0007)
* [ADR-0010: Ingestion orchestration, idempotency, and recovery](/decisions/adr-0010)
* [ADR-0016: AI vendor and model strategy](/decisions/adr-0016)
