Status: Superseded by ADR-0029
Date: 2026-03-04
Deciders: Reflections Maintainers
Successor
See ADR-0029: Voice Clone Attempt Lifecycle Authority and Readiness Gate.
Context
The platform uses ElevenLabs as the managed voice runtime provider (ADR-0019). For personalized voice output, the platform needs to clone a user’s voice from audio samples. Voice cloning introduces several concerns not covered by the base voice runtime integration:
- Multiple audio sources (post-call conversation audio, manual upload, dedicated sample sessions) require different validation and processing paths.
- Audio quality varies — clipping, noise floors, silence ratios, and duration all affect clone fidelity. Poor clones degrade the experience.
- ElevenLabs clone creation is asynchronous and can fail or require verification. The system needs reliable retry and status tracking.
- User IDs must not leak to external providers in plaintext.
- Session start must be gated on clone readiness to prevent sessions with un-cloned or failed voices, while still allowing onboarding/interviewer sessions that generate the audio.
Relevant files:
- apps/api/src/lib/voice-clone-state.ts (state machine, status derivation)
- apps/api/src/lib/voice-clone-audio.ts (audio validation, quality risk classification)
- apps/api/src/lib/voice-clone-metrics.ts (attempt/success/failure counters in voiceConfig JSONB)
- apps/api/src/lib/voice-clone-gate.ts (session gate, pure function)
- packages/db/src/queries/voice-clone-outbox.ts (outbox pattern, lease-based claiming)
- apps/workers/src/functions/voice-clone-outbox-relay.ts (Inngest cron relay)
- packages/schemas/src/index.ts (VoiceCloneSourceSchema, VoiceCloneStatusSchema, VoiceConfigSchema)
Decision
State Machine
Voice clone status is derived from the voiceConfig JSONB blob on each reflection row. A pure function deriveVoiceCloneState(voiceConfig) implements a priority ladder:
- If voiceId is a non-empty string: always ready (ground truth; overrides all other fields).
- If cloneStatus cannot be parsed and optedIn is false: not_requested.
- If optedIn and status is not_requested: surface default guidance message.
- If status is failed with an error: surface the error.
- Otherwise: return status as-is with optional authenticity metadata.
Status values: not_requested | waiting_for_audio | processing | ready | failed | verification_required.
hasVoiceCloneStaleState() detects any non-clean state requiring cleanup before re-clone. VOICE_CLONE_RESET_PATCH nulls all 20 voiceConfig metric fields atomically.
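The priority ladder above can be sketched roughly as follows. This is a minimal illustration, not the code in voice-clone-state.ts: the cloneError field name, the guidance text, and the handling of an unparseable status when optedIn is true are assumptions, and authenticity metadata is omitted.

```typescript
// Hypothetical shapes for illustration; the real VoiceConfigSchema has 20+ fields.
const STATUSES = [
  "not_requested",
  "waiting_for_audio",
  "processing",
  "ready",
  "failed",
  "verification_required",
] as const;
type VoiceCloneStatus = (typeof STATUSES)[number];

interface VoiceConfigLike {
  voiceId?: unknown;
  cloneStatus?: unknown;
  optedIn?: unknown;
  cloneError?: unknown; // assumed field name
}

interface DerivedState {
  status: VoiceCloneStatus;
  message?: string;
}

// Returns the status if it is one of the known values, otherwise null.
function parseStatus(value: unknown): VoiceCloneStatus | null {
  return typeof value === "string" && (STATUSES as readonly string[]).includes(value)
    ? (value as VoiceCloneStatus)
    : null;
}

function deriveVoiceCloneState(config: VoiceConfigLike | null | undefined): DerivedState {
  const cfg = config ?? {};
  // Rung 1: a non-empty voiceId is ground truth and overrides everything else.
  if (typeof cfg.voiceId === "string" && cfg.voiceId.trim() !== "") {
    return { status: "ready" };
  }
  const parsed = parseStatus(cfg.cloneStatus);
  const optedIn = cfg.optedIn === true;
  // Rung 2: unparseable status and no opt-in.
  if (parsed === null && !optedIn) {
    return { status: "not_requested" };
  }
  // Assumption: an unparseable status with opt-in falls back to not_requested.
  const status: VoiceCloneStatus = parsed ?? "not_requested";
  // Rung 3: opted in but nothing requested yet, so surface guidance (placeholder text).
  if (optedIn && status === "not_requested") {
    return { status, message: "Record or upload a voice sample to create your clone." };
  }
  // Rung 4: failed with a recorded error.
  if (status === "failed" && typeof cfg.cloneError === "string" && cfg.cloneError !== "") {
    return { status, message: cfg.cloneError };
  }
  // Rung 5: return the status as-is.
  return { status };
}
```

Because the function takes plain data and returns plain data, each rung can be exercised with a literal object and no database or provider mock.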
Audio Validation and Quality Risk
All audio validation is pure (no I/O):
- Duration windowing via validateVoiceCloneDurationWindow(): 60s minimum (hard gate), 120s preferred max (soft checkpoint), 180s hard maximum.
- Quality risk classification via inferVoiceCloneQualityRisk(): three tiers (good, review, poor) based on clipping ratio, peak dB, RMS dB, noise floor dB, and silence ratio. Missing metrics default to review (fail-safe).
- Denoise resolution via resolveVoiceCloneDenoiseEnabled(): tri-state auto | on | off. In auto mode, source-aware heuristics apply: conversation audio always denoises; manual uploads denoise only on measured poor signal quality.
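A sketch of the duration window and denoise resolution described above, under stated assumptions: the return shapes, the inclusive boundaries at 60s and 180s, and the restriction to two sources are illustrative choices, not taken from voice-clone-audio.ts. Quality risk is accepted as an input because the ADR does not list the classification thresholds.

```typescript
type DurationVerdict =
  | { ok: false; reason: "invalid" | "too_short" | "too_long" }
  | { ok: true; overPreferredMax: boolean };

const MIN_SECONDS = 60; // hard gate
const PREFERRED_MAX_SECONDS = 120; // soft checkpoint
const HARD_MAX_SECONDS = 180; // hard maximum

function validateVoiceCloneDurationWindow(durationSeconds: number): DurationVerdict {
  if (!Number.isFinite(durationSeconds)) return { ok: false, reason: "invalid" };
  if (durationSeconds < MIN_SECONDS) return { ok: false, reason: "too_short" };
  if (durationSeconds > HARD_MAX_SECONDS) return { ok: false, reason: "too_long" };
  // Between the preferred and hard maximum the sample is accepted but flagged.
  return { ok: true, overPreferredMax: durationSeconds > PREFERRED_MAX_SECONDS };
}

type DenoiseMode = "auto" | "on" | "off";
type QualityRisk = "good" | "review" | "poor";
// Only the two sources whose auto behavior the ADR describes are modeled here.
type DenoiseSource = "conversation_audio" | "manual_upload";

function resolveVoiceCloneDenoiseEnabled(
  mode: DenoiseMode,
  source: DenoiseSource,
  risk: QualityRisk,
): boolean {
  if (mode === "on") return true;
  if (mode === "off") return false;
  // auto: conversation audio always denoises.
  if (source === "conversation_audio") return true;
  // auto: manual uploads denoise only when quality was measured as poor.
  // A "review" result from missing metrics is not a measurement, so it does not denoise.
  return risk === "poor";
}
```

Keeping the soft checkpoint as a flag on the success branch lets the caller decide whether to warn, trim, or proceed, while the hard limits stay non-negotiable.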
Metrics Tracking
Clone lifecycle metrics (attempt count, success count, failure count, re-record count, last duration, last quality risk, last failure code) are stored in the voiceConfig JSONB blob. Dedicated patch builders (buildVoiceCloneAttemptMetricsPatch, buildVoiceCloneReadyMetricsPatch, buildVoiceCloneFailureMetricsPatch, buildVoiceCloneVerificationMetricsPatch) construct patches atomically. All builders take the current metrics snapshot as input (read-then-patch) to prevent concurrent-write drift.
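The read-then-patch shape can be illustrated with two of the builders. The JSONB key names (cloneAttemptCount, cloneFailureCount, cloneLastFailureCode) and the builder signatures are hypothetical; only the builder names come from this ADR.

```typescript
// Hypothetical subset of the metrics snapshot.
interface VoiceCloneMetricsSnapshot {
  attemptCount: number;
  failureCount: number;
  lastFailureCode: string | null;
}

// Defensive read: anything that is not a non-negative integer counts as 0.
function toCount(value: unknown): number {
  return typeof value === "number" && Number.isInteger(value) && value >= 0 ? value : 0;
}

function readVoiceCloneMetrics(voiceConfig: Record<string, unknown>): VoiceCloneMetricsSnapshot {
  const code = voiceConfig["cloneLastFailureCode"];
  return {
    attemptCount: toCount(voiceConfig["cloneAttemptCount"]),
    failureCount: toCount(voiceConfig["cloneFailureCount"]),
    lastFailureCode: typeof code === "string" ? code : null,
  };
}

// Each builder returns only the keys it changes, computed from the snapshot.
function buildVoiceCloneAttemptMetricsPatch(
  current: VoiceCloneMetricsSnapshot,
): Record<string, unknown> {
  return { cloneAttemptCount: current.attemptCount + 1 };
}

function buildVoiceCloneFailureMetricsPatch(
  current: VoiceCloneMetricsSnapshot,
  failureCode: string,
): Record<string, unknown> {
  return {
    cloneFailureCount: current.failureCount + 1,
    cloneLastFailureCode: failureCode,
  };
}
```

A caller would read the snapshot, build the patch, and merge it, for example `{ ...voiceConfig, ...patch }` in memory or a JSONB merge in the database.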
Session Gate
checkVoiceCloneGate(voiceConfig, agentType) is a pure discriminated-union gate:
- agentType === 'interviewer': always pass. Interviewer sessions must never be blocked, because they are the mechanism for capturing audio.
- Otherwise: pass only if derived status is ready. The failure response includes the precise current state for client UX.
The async enforcement shell (enforceVoiceCloneGate) reads from DB and throws ApiHttpError(403) with voice_clone_required.
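A sketch of the pure gate and its async shell. To stay self-contained, the gate here takes the already-derived status rather than the voiceConfig blob, and the ApiHttpError class and the loadStatus callback are stand-ins for the real error type and DB read.

```typescript
type VoiceCloneStatus =
  | "not_requested"
  | "waiting_for_audio"
  | "processing"
  | "ready"
  | "failed"
  | "verification_required";

// Discriminated union: callers branch on `allowed`.
type GateResult =
  | { allowed: true }
  | { allowed: false; code: "voice_clone_required"; status: VoiceCloneStatus };

function checkVoiceCloneGate(derivedStatus: VoiceCloneStatus, agentType: string): GateResult {
  // Interviewer sessions capture the audio, so they are never blocked.
  if (agentType === "interviewer") return { allowed: true };
  if (derivedStatus === "ready") return { allowed: true };
  // Include the current state so the client can show the right UX.
  return { allowed: false, code: "voice_clone_required", status: derivedStatus };
}

// Stand-in for the API's error type.
class ApiHttpError extends Error {
  readonly httpStatus: number;
  readonly code: string;
  constructor(httpStatus: number, code: string) {
    super(code);
    this.httpStatus = httpStatus;
    this.code = code;
  }
}

// Async shell: performs the read, delegates the decision to the pure gate.
async function enforceVoiceCloneGate(
  loadStatus: () => Promise<VoiceCloneStatus>, // hypothetical DB read
  agentType: string,
): Promise<void> {
  const result = checkVoiceCloneGate(await loadStatus(), agentType);
  if (!result.allowed) throw new ApiHttpError(403, result.code);
}
```

Splitting the decision from the I/O keeps every branch testable with literals, while the shell only wires the DB read to the HTTP error.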
Outbox Pattern for Async Processing
Post-call audio events from ElevenLabs webhooks are enqueued into a voice_clone_outbox table:
- Idempotent upsert by (provider_conversation_id, event_type). On conflict: failed rows reset to pending; done/leased rows with new audio URL reset to pending; all other states are no-ops. Payloads merge via JSONB ||.
- Skip-locked claiming via CTE: eligible rows are pending/retry with next_attempt_at <= NOW(), plus expired leased rows (stale worker recovery). Each claim assigns a UUID lease token.
- Exponential backoff: base * 2^(attempt-1) (default 30s base). Terminal failure after 6 attempts (configurable).
- Lease token verification: markDone and reschedule require matching lease token, so stale workers cannot corrupt state.
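The conflict rule, backoff formula, and lease check above can be modeled as pure functions over an in-memory row. This is an illustrative model only: the real logic lives in SQL in voice-clone-outbox.ts, the row shape is hypothetical, and it assumes `attempt` counts the attempt that just failed.

```typescript
type OutboxState = "pending" | "leased" | "retry" | "done" | "failed";

// Hypothetical in-memory view of one outbox row.
interface OutboxRow {
  state: OutboxState;
  attempt: number;
  leaseToken: string | null;
  audioUrl: string | null;
}

// base * 2^(attempt-1): with a 30s base, attempts 1, 2, 3 wait 30s, 60s, 120s.
function backoffSeconds(attempt: number, baseSeconds = 30): number {
  return baseSeconds * 2 ** (attempt - 1);
}

// State after an upsert hits an existing row.
function resolveUpsertConflict(existing: OutboxRow, incomingAudioUrl: string | null): OutboxState {
  if (existing.state === "failed") return "pending";
  const hasNewAudio = incomingAudioUrl !== null && incomingAudioUrl !== existing.audioUrl;
  if ((existing.state === "done" || existing.state === "leased") && hasNewAudio) {
    return "pending";
  }
  return existing.state; // all other combinations are no-ops
}

type RescheduleResult =
  | { applied: false }
  | { applied: true; state: "failed" }
  | { applied: true; state: "retry"; delaySeconds: number };

// Only the worker holding the current lease may move the row.
function reschedule(
  row: OutboxRow,
  leaseToken: string,
  maxAttempts = 6,
  baseSeconds = 30,
): RescheduleResult {
  if (row.state !== "leased" || row.leaseToken !== leaseToken) return { applied: false };
  if (row.attempt >= maxAttempts) return { applied: true, state: "failed" };
  return { applied: true, state: "retry", delaySeconds: backoffSeconds(row.attempt, baseSeconds) };
}
```

With the defaults, the waits after attempts 1 through 5 are 30s, 60s, 120s, 240s, and 480s, and the sixth failure is terminal.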
The relayVoiceCloneOutbox Inngest function runs every 2 minutes, claims a batch, POSTs each to the API’s internal processing endpoint, and partitions results into done/retry. Setting retries: 0 is deliberate, since the outbox table is the retry surface, not Inngest.
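The relay's claim, process, and partition loop might look like the sketch below. The item shape and the processItem callback (standing in for the POST to the internal endpoint) are assumptions, and the Inngest wiring is omitted.

```typescript
// Hypothetical shape of a claimed outbox item.
interface ClaimedItem {
  id: string;
  leaseToken: string;
}

interface RelayOutcome {
  item: ClaimedItem;
  ok: boolean;
}

// Pure partition step: successes go to done, everything else to retry.
function partitionOutcomes(outcomes: RelayOutcome[]): { done: ClaimedItem[]; retry: ClaimedItem[] } {
  const done: ClaimedItem[] = [];
  const retry: ClaimedItem[] = [];
  for (const { item, ok } of outcomes) {
    (ok ? done : retry).push(item);
  }
  return { done, retry };
}

// Processes each claimed item; a thrown error is treated as a retryable failure
// so one bad item cannot fail the whole batch.
async function relayBatch(
  items: ClaimedItem[],
  processItem: (item: ClaimedItem) => Promise<boolean>,
): Promise<{ done: ClaimedItem[]; retry: ClaimedItem[] }> {
  const outcomes: RelayOutcome[] = [];
  for (const item of items) {
    let ok = false;
    try {
      ok = await processItem(item);
    } catch {
      ok = false;
    }
    outcomes.push({ item, ok });
  }
  return partitionOutcomes(outcomes);
}
```

The done list would then be passed to markDone and the retry list to reschedule, each with the item's lease token.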
Security
- User ID hashing: User IDs sent to ElevenLabs are SHA-256 hashed. Plaintext Clerk user IDs never leave the system boundary.
- ElevenLabs response validation: All API responses are Zod-validated before use.
- Webhook authentication: HMAC-SHA256 signature verification on incoming webhooks.
- Internal API authentication: x-internal-secret header for outbox relay to API communication.
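The hashing and signature checks can be sketched with Node's built-in crypto module. This sketch assumes an unsalted hex digest for the user ID and a bare hex signature over the raw body; the provider's actual signature header format is not reproduced here, and signBody exists only to make the example testable.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// SHA-256 hash of the user ID, hex-encoded (encoding and lack of salt are assumptions).
function hashUserId(userId: string): string {
  return createHash("sha256").update(userId, "utf8").digest("hex");
}

// Helper for the example: HMAC-SHA256 of the raw body, hex-encoded.
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Verifies a hex HMAC-SHA256 signature in constant time.
function verifyWebhookSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check the length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Verification should run against the raw request body before any JSON parsing, since re-serialized JSON may not match the bytes that were signed.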
Alternatives Considered
Alternative 1: Synchronous clone creation only
Pros:
- Simpler architecture: no outbox, no cron relay.
Cons:
- Post-call audio webhook delivery is asynchronous and unreliable. Without an outbox, missed deliveries have no recovery path.
- Clone creation can take seconds; blocking the webhook handler risks timeouts.
Alternative 2: Store audio quality thresholds in configuration
Pros:
- Tunable without code changes.
Cons:
- Thresholds are tightly coupled to audio science and clone provider behavior. Configuration implies they are user-tunable when they are engineering decisions.
- Adds configuration surface without current need (0-user mode).
Alternative 3: Separate voice clone status table instead of JSONB
Pros:
- Normalized schema, queryable status history.
Cons:
- Adds a table and join for every session gate check and status query.
- The voiceConfig JSONB is already the canonical voice configuration surface; co-locating clone state avoids schema sprawl.
- Status history is not currently needed.
Consequences
Benefits:
- Reliable async voice cloning with automatic retry and exponential backoff.
- Pure state derivation and quality classification enable thorough unit testing without mocks.
- Session gate prevents degraded voice experiences while preserving onboarding flow.
- Metrics tracking provides visibility into clone pipeline health without a separate analytics system.
- Outbox pattern reuses proven idempotency patterns from the ingestion pipeline (ADR-0010).
Trade-offs:
- Outbox cron adds up to 2 minutes of processing latency for post-call audio clones.
- 20 voiceConfig JSONB fields require defensive parsing throughout the codebase.
- VoiceCloneStatusSchema is defined in both voice-clone-state.ts and packages/schemas, and the two definitions must be kept in sync manually.
- Quality thresholds are hardcoded and require code changes to tune.
Implementation Notes
- State machine: apps/api/src/lib/voice-clone-state.ts (pure derivation, reset patch constant).
- Audio validation: apps/api/src/lib/voice-clone-audio.ts (duration windowing, quality risk, denoise resolution).
- Metrics: apps/api/src/lib/voice-clone-metrics.ts (read/patch builders for voiceConfig JSONB counters).
- Session gate: apps/api/src/lib/voice-clone-gate.ts (pure gate) + apps/api/src/routes/sessions/start-session.ts (async enforcement shell).
- Outbox table: packages/db/src/queries/voice-clone-outbox.ts (upsert, claim, mark-done, reschedule). Admin-plane access only (per ADR-0006).
- Outbox relay: apps/workers/src/functions/voice-clone-outbox-relay.ts (Inngest cron, 2-min interval, skip-locked batch processing).
- Outbox processing endpoint: apps/api/src/routes/internal/elevenlabs-webhooks.ts (POST /webhooks/elevenlabs/voice-clone-outbox/process).
- Clone source enum: packages/schemas/src/index.ts (VoiceCloneSourceSchema: onboarding_post_call_audio | dashboard_voice_sample_session | conversation_audio | manual_upload | custom_voice_id).
- Voice config schema: packages/schemas/src/index.ts (VoiceConfigSchema: passthrough Zod object, 20+ fields).
- Clone creation routes: apps/api/src/routes/reflections-voice.ts (manual upload, clone-from-conversation).
- Environment config: VOICE_CLONE_OUTBOX_ENABLED, VOICE_CLONE_OUTBOX_BATCH_LIMIT, VOICE_CLONE_OUTBOX_LEASE_SECONDS, VOICE_CLONE_OUTBOX_MAX_ATTEMPTS, VOICE_CLONE_OUTBOX_BACKOFF_SECONDS, VOICE_CLONE_OUTBOX_REQUEST_TIMEOUT_MS (all in packages/shared/src/env.ts).
- DB migration: supabase/migrations/20260302100000_voice_clone_outbox.sql.

