Status: Accepted
Date: 2026-02-09
Deciders: Reflections Maintainers
Context
The platform requires real-time voice conversations grounded in a user’s knowledge graph. The previous implementation used a self-hosted LiveKit Agents service that ran Anthropic LLM streaming, turn-state orchestration, and LiveKit protocol adaptation in-process. This worked but carried significant operational cost:
- A dedicated Railway service with its own deployment pipeline, health checks, and scaling.
- A complex turn-state machine and LLM streaming adapter totaling ~1,200 LOC.
- Tight coupling between voice protocol (LiveKit SDK) and reasoning logic (brain-core retrieval).
- Fragile streaming: partial token delivery, backpressure, and silence-detection edge cases.
Decision
Use the ElevenLabs Conversations API as the managed voice runtime provider:
- Session bootstrap: `POST /v1/sessions` creates an internal conversation record, builds a capsule prompt from the user’s knowledge graph, and returns a signed ElevenLabs conversation URL to the client.
- Knowledge retrieval: ElevenLabs calls `POST /v1/tools/retrieve-context` (a server-tool callback) during conversations. The API authenticates the request via a shared tool secret (constant-time comparison), runs brain-core retrieval with a 700ms timeout, and returns bounded evidence. An LRU cache (1,000 entries, 90s TTL) absorbs repeated queries.
- Transcript capture: ElevenLabs sends a `conversation-ended` webhook. The handler validates the HMAC-SHA256 signature, fetches the full transcript, creates a source record, and dispatches ingestion into the existing Inngest pipeline.
- Reconciliation: A backfill endpoint lists recent ElevenLabs conversations and processes unrecorded ones. This is idempotent by provider conversation ID.
- Capsule caching: A `reflection_capsules` table caches serialized persona/context payloads with token estimates to bound session-bootstrap latency. Capsules are regenerated when the underlying knowledge graph changes. The capsule includes a persona narrative compiled during regeneration and injected into the ElevenLabs agent template.
- Persona compilation: During capsule regeneration, a worker pipeline compiles a structured natural-language persona from knowledge-graph data via an Anthropic LLM. The persona narrative captures the user’s identity anchors, speaking style, and areas to explore. Fail-open: capsule regeneration succeeds even if persona compilation fails.
- Authenticity gating: A composite 0-100 score assesses onboarding readiness: coverage (55pts), confidence breakdown (25pts), persona/style profile (10pts), and voice clone state (10pts). The threshold of 85 points gates the transition from onboarding to conversation-ready.
- No in-process agent service. The previous LiveKit Agents service was deleted entirely.
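The tool-callback guard rails above (shared-secret check, 700ms budget, bounded LRU cache with TTL) could be sketched roughly as follows. This is a sketch, not the real implementation: `runRetrieval` stands in for brain-core retrieval, the sentinel return value is illustrative, and hashing both sides before comparing is one common way to keep the comparison constant-time for variable-length secrets.

```typescript
import { timingSafeEqual, createHash } from "node:crypto";

// Constant-time shared-secret check: hash both sides so lengths always match.
function secretsMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Minimal LRU with TTL, matching the ADR's bounds (1,000 entries, 90s).
const MAX_ENTRIES = 1000;
const TTL_MS = 90_000;
const cache = new Map<string, { at: number; value: string }>();

function cacheGet(key: string): string | undefined {
  const hit = cache.get(key);
  if (!hit) return undefined;
  if (Date.now() - hit.at > TTL_MS) {
    cache.delete(key);
    return undefined;
  }
  cache.delete(key); // re-insert to mark as most recently used
  cache.set(key, hit);
  return hit.value;
}

function cacheSet(key: string, value: string): void {
  if (cache.size >= MAX_ENTRIES) {
    cache.delete(cache.keys().next().value!); // evict least recently used
  }
  cache.set(key, { at: Date.now(), value });
}

// Race retrieval against the 700ms budget; on timeout the caller serves a
// degraded response ("degraded" is a sentinel for this sketch only).
async function retrieveWithBudget(
  query: string,
  runRetrieval: (q: string) => Promise<string>, // brain-core stand-in
): Promise<string> {
  const cached = cacheGet(query);
  if (cached !== undefined) return cached;
  const timeout = new Promise<string>((resolve) =>
    setTimeout(() => resolve("degraded"), 700),
  );
  const result = await Promise.race([runRetrieval(query), timeout]);
  if (result !== "degraded") cacheSet(query, result);
  return result;
}
```

Caching only successful retrievals means a timed-out query gets retried on the next callback rather than pinning a degraded answer for 90 seconds.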
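The authenticity gate can be illustrated with a minimal sketch, assuming each component signal arrives as a 0..1 fraction (the real breakdown is likely more granular); the weights and the 85-point threshold come from the decision above, while the `AuthenticitySignals` shape is hypothetical.

```typescript
interface AuthenticitySignals {
  coverage: number;   // knowledge-graph coverage, 0..1 (assumed shape)
  confidence: number; // confidence breakdown, 0..1
  persona: number;    // persona/style profile completeness, 0..1
  voiceClone: number; // voice clone state, 0..1
}

// Weights from the ADR: 55 + 25 + 10 + 10 = 100 points.
const WEIGHTS = { coverage: 55, confidence: 25, persona: 10, voiceClone: 10 };
const READINESS_THRESHOLD = 85; // configurable in the real system

function authenticityScore(s: AuthenticitySignals): number {
  const clamp = (x: number) => Math.min(1, Math.max(0, x));
  return Math.round(
    clamp(s.coverage) * WEIGHTS.coverage +
      clamp(s.confidence) * WEIGHTS.confidence +
      clamp(s.persona) * WEIGHTS.persona +
      clamp(s.voiceClone) * WEIGHTS.voiceClone,
  );
}

function isConversationReady(s: AuthenticitySignals): boolean {
  return authenticityScore(s) >= READINESS_THRESHOLD;
}
```

Note that coverage dominates: a user with a full persona profile and voice clone but weak graph coverage still fails the gate, which matches the intent of grounding conversations in the knowledge graph.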
Alternatives considered
Alternative 1: Keep LiveKit Agents (self-hosted)
Pros:
- Full control over voice pipeline, model selection, and turn logic.
- No external runtime dependency for conversation orchestration.

Cons:
- Significant ops burden: dedicated service, scaling, health monitoring.
- Complex turn-state machine and LLM streaming adapter (~1,200 LOC).
- Voice protocol coupling makes it hard to swap STT/TTS providers independently.
Alternative 2: ElevenLabs with ElevenLabs Knowledge Base
Pros:
- Fully managed retrieval and voice — minimal API surface.

Cons:
- Loses knowledge-graph authority (violates the gated learning invariant).
- Cannot enforce eval gate, temporal fact validity, or RBAC scoping on retrieval.
- Duplicates truth source between ElevenLabs KB and Postgres.
Alternative 3: Other managed voice providers (Vapi, Retell)
Pros:
- Similar managed benefits to ElevenLabs.
- Potentially different pricing models.

Cons:
- Less mature server-tool and webhook APIs at time of evaluation.
- Weaker documentation for custom tool-calling patterns.
- Would require the same integration work with a less proven contract.
Consequences
Benefits:
- Eliminated the self-hosted agent service entirely (34 files, ~1,200 LOC removed).
- Deployment simplified from 4 Railway services to 3 (`web` + `api` + `workers`).
- Voice orchestration (STT, turn detection, TTS) is fully managed.
- Knowledge graph remains the single source of truth via server-tool callback.
- Reconciliation endpoint provides a safety net for missed webhooks.
- Persona compilation provides natural-language identity grounding for the voice agent, reducing generic responses.
Costs and risks:
- ElevenLabs vendor lock-in for voice runtime: conversation lifecycle, transcript format, and webhook contract are provider-specific.
- Tool callback introduces a ~700ms latency budget constraint for retrieval.
- Transcript capture is asynchronous (webhook-driven), not synchronous — a missed webhook without reconciliation could lose a transcript.
- New authentication surfaces: tool secret, webhook HMAC, internal API secret.
- Session response contract changed (signed URL replaces LiveKit room token) — requires coordinated client update.
- Persona compilation adds an Anthropic LLM call to the capsule regeneration path (fail-open, non-blocking to capsule success).
Implementation notes
- ElevenLabs HTTP client handles signed URL creation, conversation listing, transcript fetch, and webhook signature verification.
- Tool endpoint implements LRU cache and degraded fallback if retrieval fails.
- Webhook + reconciliation processing is idempotent by provider conversation ID.
- DB schema includes `conversations.runtime_provider` and `conversations.provider_conversation_id` columns to bind internal conversations to external provider IDs. The `reflection_capsules` table caches bootstrap payloads.
- Web client replaces the LiveKit room lifecycle hook with ElevenLabs SDK session management.
- Environment secrets are validated at startup: API key, agent ID, webhook secret, tool secret, internal API secret, and base URL.
- Persona compiler is split into a pure narrative builder, an LLM-backed compilation phase, and an Inngest phase wrapper.
- Authenticity assessment uses a composite scoring model with a configurable readiness threshold.
- Agent config drift detection runs as an hourly cron comparing live ElevenLabs agent configuration against expected snapshots.
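The webhook signature verification mentioned above can be sketched along these lines. This assumes a hex-encoded HMAC-SHA256 digest over the raw request body; the actual ElevenLabs header format may differ (for example, by including a timestamp to limit replay), so treat the shape as an assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an HMAC-SHA256 webhook signature in constant time.
// Always verify the raw received bytes, never re-serialized JSON.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string, // hex digest presented by the caller (assumed format)
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const presented = Buffer.from(signatureHex, "hex");
  // Length check first: timingSafeEqual throws on mismatched lengths,
  // and a wrong-length signature can never be valid anyway.
  if (presented.length !== expected.length) return false;
  return timingSafeEqual(presented, expected);
}
```

Verifying before parsing the payload keeps unauthenticated input away from the JSON parser and the ingestion pipeline.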
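The idempotent reconciliation pass described above can be sketched as follows, keyed by provider conversation ID. All three function parameters (`listProviderConversations`, `recordedIds`, `ingest`) are hypothetical stand-ins for the ElevenLabs client, a DB lookup, and the Inngest dispatch respectively.

```typescript
interface ProviderConversation {
  id: string;      // provider conversation ID (the idempotency key)
  endedAt: string; // ISO timestamp from the provider
}

// Process only conversations not yet recorded internally; duplicate IDs
// within one batch and across runs are skipped, making the pass idempotent.
async function reconcile(
  listProviderConversations: () => Promise<ProviderConversation[]>,
  recordedIds: () => Promise<Set<string>>,
  ingest: (c: ProviderConversation) => Promise<void>,
): Promise<string[]> {
  const [remote, seen] = await Promise.all([
    listProviderConversations(),
    recordedIds(),
  ]);
  const processed: string[] = [];
  for (const conv of remote) {
    if (seen.has(conv.id)) continue; // already recorded: skip
    await ingest(conv);
    seen.add(conv.id); // guard against duplicates within this batch
    processed.push(conv.id);
  }
  return processed;
}
```

In production the "already recorded" check would typically also be enforced by a unique constraint on `provider_conversation_id`, so a webhook and a backfill racing on the same conversation cannot both insert.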

