# ADR-0026: Voice clone pipeline architecture

> Enable personalized voice output by cloning a user voice from conversation audio, manual uploads, or dedicated sample sessions, with quality gating, metrics tracking, and reliable async processing via an outbox pattern.

**Status:** Superseded by [ADR-0029](/decisions/adr-0029)

**Date:** 2026-03-04

**Deciders:** Reflections Maintainers

## Successor

See [ADR-0029: Voice Clone Attempt Lifecycle Authority and Readiness Gate](/decisions/adr-0029).

## Context

The platform uses ElevenLabs as the managed voice runtime provider (ADR-0019). For personalized voice output, the platform needs to clone a user's voice from audio samples. Voice cloning introduces several concerns not covered by the base voice runtime integration:

* Multiple audio sources (post-call conversation audio, manual upload, dedicated sample sessions) require different validation and processing paths.
* Audio quality varies: clipping, noise floors, silence ratios, and duration all affect clone fidelity. Poor clones degrade the experience.
* ElevenLabs clone creation is asynchronous and can fail or require verification. The system needs reliable retry and status tracking.
* User IDs must not leak to external providers in plaintext.
* Session start must be gated on clone readiness to prevent sessions with un-cloned or failed voices, while still allowing onboarding/interviewer sessions that generate the audio.


Evidence in code/config:

* `apps/api/src/lib/voice-clone-state.ts` (state machine, status derivation)
* `apps/api/src/lib/voice-clone-audio.ts` (audio validation, quality risk classification)
* `apps/api/src/lib/voice-clone-metrics.ts` (attempt/success/failure counters in voiceConfig JSONB)
* `apps/api/src/lib/voice-clone-gate.ts` (session gate, pure function)
* `packages/db/src/queries/voice-clone-outbox.ts` (outbox pattern, lease-based claiming)
* `apps/workers/src/functions/voice-clone-outbox-relay.ts` (Inngest cron relay)
* `packages/schemas/src/index.ts` (`VoiceCloneSourceSchema`, `VoiceCloneStatusSchema`, `VoiceConfigSchema`)

## Decision

### State Machine

Voice clone status is derived from the `voiceConfig` JSONB blob on each reflection row. A pure function `deriveVoiceCloneState(voiceConfig)` implements a priority ladder:

1. If `voiceId` is a non-empty string: always `ready` (ground truth; overrides all other fields).
2. If `cloneStatus` cannot be parsed and `optedIn` is false: `not_requested`.
3. If `optedIn` and status is `not_requested`: surface default guidance message.
4. If status is `failed` with an error: surface the error.
5. Otherwise: return status as-is with optional authenticity metadata.

Status enum: `not_requested | waiting_for_audio | processing | ready | failed | verification_required`.

`hasVoiceCloneStaleState()` detects any non-clean state requiring cleanup before re-clone. `VOICE_CLONE_RESET_PATCH` nulls all 20 voiceConfig metric fields atomically.

### Audio Validation and Quality Risk

All audio validation is pure (no I/O):

* **Duration windowing** via `validateVoiceCloneDurationWindow()`: 60s minimum (hard gate), 120s preferred max (soft checkpoint), 180s hard maximum.
* **Quality risk classification** via `inferVoiceCloneQualityRisk()`: three tiers (`good`, `review`, `poor`) based on clipping ratio, peak dB, RMS dB, noise floor dB, and silence ratio. Missing metrics default to `review` (fail-safe).
* **Denoise resolution** via `resolveVoiceCloneDenoiseEnabled()`: tri-state `auto | on | off`. In `auto` mode, the decision uses source-aware heuristics: conversation audio always denoises; manual uploads denoise only on measured poor signal quality.

### Metrics Tracking

Clone lifecycle metrics (attempt count, success count, failure count, re-record count, last duration, last quality risk, last failure code) are stored in the `voiceConfig` JSONB blob. Dedicated patch builders (`buildVoiceCloneAttemptMetricsPatch`, `buildVoiceCloneReadyMetricsPatch`, `buildVoiceCloneFailureMetricsPatch`, `buildVoiceCloneVerificationMetricsPatch`) construct patches atomically. All builders take the current metrics snapshot as input (read-then-patch) to prevent concurrent-write drift.

### Session Gate

`checkVoiceCloneGate(voiceConfig, agentType)` is a pure discriminated-union gate:

* `agentType === 'interviewer'`: always pass. Interviewer sessions must never be blocked, because they are the mechanism for capturing audio.
* Otherwise: pass only if derived status is `ready`. The failure response includes the precise current state for client UX.

The async shell (`enforceVoiceCloneGate`) reads from DB and throws `ApiHttpError(403)` with `voice_clone_required`.

### Outbox Pattern for Async Processing

Post-call audio events from ElevenLabs webhooks are enqueued into a `voice_clone_outbox` table:

* **Idempotent upsert** by `(provider_conversation_id, event_type)`. On conflict: `failed` rows reset to `pending`; `done`/`leased` rows with a new audio URL reset to `pending`; all other states are no-ops. Payloads merge via JSONB `||`.
* **Skip-locked claiming** via CTE: eligible rows are `pending`/`retry` with `next_attempt_at <= NOW()`, plus expired leased rows (stale worker recovery). Each claim assigns a UUID lease token.
* **Exponential backoff**: `base * 2^(attempt-1)` (default 30s base). Terminal failure after 6 attempts (configurable).
* **Lease token verification**: `markDone` and `reschedule` require a matching lease token, so stale workers cannot corrupt state.

The `relayVoiceCloneOutbox` Inngest function runs every 2 minutes, claims a batch, POSTs each row to the API's internal processing endpoint, and partitions results into done/retry. `retries: 0` is deliberate: the outbox table is the retry surface, not Inngest.

### Security

* **User ID hashing**: User IDs sent to ElevenLabs are SHA-256 hashed. Plaintext Clerk user IDs never leave the system boundary.
* **ElevenLabs response validation**: All API responses are Zod-validated before use.
* **Webhook authentication**: HMAC-SHA256 signature verification on incoming webhooks.
* **Internal API authentication**: `x-internal-secret` header for communication from the outbox relay to the API.

## Alternatives Considered

### Alternative 1: Synchronous clone creation only

Pros:

* Simpler architecture: no outbox, no cron relay.

Cons:

* Post-call audio webhook delivery is asynchronous and unreliable. Without an outbox, missed deliveries have no recovery path.
* Clone creation can take seconds; blocking the webhook handler risks timeouts.

### Alternative 2: Store audio quality thresholds in configuration

Pros:

* Tunable without code changes.

Cons:

* Thresholds are tightly coupled to audio science and clone provider behavior. Configuration implies they are user-tunable when they are engineering decisions.
* Adds configuration surface without current need (0-user mode).

### Alternative 3: Separate voice clone status table instead of JSONB

Pros:

* Normalized schema, queryable status history.

Cons:

* Adds a table and join for every session gate check and status query.
* The voiceConfig JSONB is already the canonical voice configuration surface; co-locating clone state avoids schema sprawl.
* Status history is not currently needed.

## Consequences

Benefits:

* Reliable async voice cloning with automatic retry and exponential backoff.
* Pure state derivation and quality classification enable thorough unit testing without mocks.
* Session gate prevents degraded voice experiences while preserving onboarding flow.
* Metrics tracking provides visibility into clone pipeline health without a separate analytics system.
* Outbox pattern reuses proven idempotency patterns from the ingestion pipeline ([ADR-0010](/decisions/adr-0010)).

Costs:

* Outbox cron adds a 2-minute processing latency for post-call audio clones.
* 20 voiceConfig JSONB fields require defensive parsing throughout the codebase.
* `VoiceCloneStatusSchema` is defined in both `voice-clone-state.ts` and `packages/schemas`, and the two definitions must stay in sync manually.
* Quality thresholds are hardcoded and require code changes to tune.

## Implementation Notes

* State machine: `apps/api/src/lib/voice-clone-state.ts` (pure derivation, reset patch constant).
* Audio validation: `apps/api/src/lib/voice-clone-audio.ts` (duration windowing, quality risk, denoise resolution).
* Metrics: `apps/api/src/lib/voice-clone-metrics.ts` (read/patch builders for voiceConfig JSONB counters).
* Session gate: `apps/api/src/lib/voice-clone-gate.ts` (pure gate) + `apps/api/src/routes/sessions/start-session.ts` (async enforcement shell).
* Outbox table: `packages/db/src/queries/voice-clone-outbox.ts` (upsert, claim, mark-done, reschedule). Admin-plane access only (per [ADR-0006](/decisions/adr-0006)).
* Outbox relay: `apps/workers/src/functions/voice-clone-outbox-relay.ts` (Inngest cron, 2-min interval, skip-locked batch processing).
* Outbox processing endpoint: `apps/api/src/routes/internal/elevenlabs-webhooks.ts` (`POST /webhooks/elevenlabs/voice-clone-outbox/process`).
* Clone source enum: `packages/schemas/src/index.ts` (`VoiceCloneSourceSchema`: `onboarding_post_call_audio | dashboard_voice_sample_session | conversation_audio | manual_upload | custom_voice_id`).
* Voice config schema: `packages/schemas/src/index.ts` (`VoiceConfigSchema`: passthrough Zod object, 20+ fields).
* Clone creation routes: `apps/api/src/routes/reflections-voice.ts` (manual upload, clone-from-conversation).
* Environment config: `VOICE_CLONE_OUTBOX_ENABLED`, `VOICE_CLONE_OUTBOX_BATCH_LIMIT`, `VOICE_CLONE_OUTBOX_LEASE_SECONDS`, `VOICE_CLONE_OUTBOX_MAX_ATTEMPTS`, `VOICE_CLONE_OUTBOX_BACKOFF_SECONDS`, `VOICE_CLONE_OUTBOX_REQUEST_TIMEOUT_MS` (all in `packages/shared/src/env.ts`).
* DB migration: `supabase/migrations/20260302100000_voice_clone_outbox.sql`.

## Related ADRs

* [ADR-0003: Two-Plane System Architecture](/decisions/adr-0003)
* [ADR-0006: DB Query Surface Segregation](/decisions/adr-0006)
* [ADR-0010: Ingestion Orchestration, Idempotency, and Recovery](/decisions/adr-0010)
* [ADR-0018: Ingestion Source Security and Content Safety](/decisions/adr-0018)
* [ADR-0019: Voice Runtime Provider Strategy](/decisions/adr-0019)
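
## Appendix: Backoff schedule sketch

The exponential backoff rule from the Outbox Pattern section, `base * 2^(attempt-1)` with a 30s default base and terminal failure after 6 attempts, can be illustrated with a small worked example. This is a minimal sketch and not the code in `packages/db/src/queries/voice-clone-outbox.ts`: the names `BackoffConfig`, `RetryDecision`, and `nextRetryDecision` are hypothetical, and it assumes that `attempt` is the 1-based count of attempts made so far and that the sixth failed attempt is the terminal one.

```typescript
// Hypothetical config shape; the defaults mirror the values stated in this ADR.
interface BackoffConfig {
  baseSeconds: number; // backoff base (ADR default: 30s)
  maxAttempts: number; // attempts allowed before terminal failure (ADR default: 6)
}

// Hypothetical result type: either schedule another attempt or give up.
type RetryDecision =
  | { kind: "retry"; delaySeconds: number }
  | { kind: "failed" };

const DEFAULT_BACKOFF: BackoffConfig = { baseSeconds: 30, maxAttempts: 6 };

// `attempt` is assumed to be the 1-based number of attempts made so far,
// including the one that just failed.
function nextRetryDecision(
  attempt: number,
  config: BackoffConfig = DEFAULT_BACKOFF,
): RetryDecision {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  // Assumption: reaching maxAttempts means the row becomes terminally failed.
  if (attempt >= config.maxAttempts) {
    return { kind: "failed" };
  }
  // base * 2^(attempt-1)
  return {
    kind: "retry",
    delaySeconds: config.baseSeconds * 2 ** (attempt - 1),
  };
}

// Delays scheduled after failed attempts 1 through 5 under the defaults.
const schedule = [1, 2, 3, 4, 5].map((attempt) => {
  const decision = nextRetryDecision(attempt);
  return decision.kind === "retry" ? decision.delaySeconds : null;
});

console.log(schedule); // delays in seconds: 30, 60, 120, 240, 480
```

Under these assumptions the five retries wait 930 seconds in total (15.5 minutes) before the row fails terminally. Because the relay only claims rows on its 2-minute cron tick, the real interval between attempts would likely be somewhat longer than the computed delay.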