# ADR-0016: AI vendor and model strategy

> Separate inference, embedding, and voice-runtime vendor responsibilities while keeping model and provider selection configurable through validated environment contracts.

<Info>**Status:** Accepted | **Date:** 2026-02-06 | **Deciders:** Reflections Maintainers</Info>

## Context

The platform uses several AI capabilities: realtime and background LLM generation, embedding generation for retrieval, and a managed voice runtime. A single-vendor strategy would simplify integration, but it would also increase concentration risk and limit fit-for-purpose model selection.

## Decision

Use a split vendor/model strategy with three provider roles and two cross-cutting policies:

* **Generation:** Anthropic is the LLM provider for worker JSON/text tasks and brain-core reasoning.
* **Embeddings:** OpenAI is the embeddings provider, with a fixed v1 embedding dimensionality of `1536` that is tied to the DB schema.
* **Voice runtime:** ElevenLabs Conversations API is the managed voice orchestration provider. It handles STT, turn orchestration, and TTS, calling back to the Reflections API for knowledge retrieval via server-tool endpoints. See [ADR-0019: Voice runtime provider strategy](/decisions/adr-0019) for the full decision record.
* **Configuration:** Vendor and model identifiers are configured via validated environment variables in the shared runtime config.
* **Vendor dependency-injection (DI) boundary policy:** direct vendor SDK imports are confined to `packages/vendors/src/*`. The ElevenLabs HTTP client is confined to the API layer; it lives at app level rather than in a shared package because only the API consumes it. All other modules depend on wrapper entrypoints for testability and swap control.
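
As a rough illustration of how validated configuration and the wrapper boundary might fit together, the sketch below parses model settings from the environment, enforces the `1536` dimensionality contract, and exposes embeddings through a small interface that callers depend on instead of an SDK. The environment variable names (`ANTHROPIC_MODEL`, `OPENAI_EMBEDDING_MODEL`, `EMBEDDING_DIMENSIONS`) and all type and function names are hypothetical; this ADR does not specify them, and the real shared config may be structured differently.

```typescript
// Fixed v1 contract from this ADR; changing it implies a DB schema change.
const EMBEDDING_DIMENSIONS = 1536;

interface AiConfig {
  generationModel: string;
  embeddingModel: string;
  embeddingDimensions: number;
}

// Hypothetical variable names; the ADR does not list the real ones.
function parseAiConfig(env: Record<string, string | undefined>): AiConfig {
  const errors: string[] = [];
  const read = (name: string): string => {
    const value = (env[name] ?? "").trim();
    if (value === "") errors.push(`${name} is required`);
    return value;
  };

  const generationModel = read("ANTHROPIC_MODEL");
  const embeddingModel = read("OPENAI_EMBEDDING_MODEL");
  const rawDimensions = read("EMBEDDING_DIMENSIONS");
  if (rawDimensions !== "" && Number(rawDimensions) !== EMBEDDING_DIMENSIONS) {
    errors.push(
      `EMBEDDING_DIMENSIONS must be ${EMBEDDING_DIMENSIONS}, got "${rawDimensions}"`,
    );
  }

  // Report every problem at once so a misconfigured deploy fails at startup.
  if (errors.length > 0) {
    throw new Error(`Invalid AI config: ${errors.join("; ")}`);
  }
  return { generationModel, embeddingModel, embeddingDimensions: EMBEDDING_DIMENSIONS };
}

// Wrapper contract that non-vendor modules would depend on instead of an SDK.
interface EmbeddingProvider {
  embed(texts: string[]): Promise<number[][]>;
}

// Decorates any provider so a vector of the wrong length is rejected early.
function withDimensionGuard(inner: EmbeddingProvider, config: AiConfig): EmbeddingProvider {
  return {
    async embed(texts: string[]): Promise<number[][]> {
      const vectors = await inner.embed(texts);
      for (const vector of vectors) {
        if (vector.length !== config.embeddingDimensions) {
          throw new Error(
            `Expected ${config.embeddingDimensions} dimensions, got ${vector.length}`,
          );
        }
      }
      return vectors;
    },
  };
}

// Test double: calling code is unchanged whether this or a real wrapper is injected.
const fakeProvider: EmbeddingProvider = {
  embed: async (texts) =>
    texts.map(() => new Array<number>(EMBEDDING_DIMENSIONS).fill(0)),
};
```

In this sketch a retrieval or ingestion module would receive `withDimensionGuard(realProvider, parseAiConfig(process.env))` in production and `fakeProvider` in tests, which is one way to get the testability and swap control the policy describes.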

## Alternatives considered

### Alternative 1: Single vendor for generation and embeddings

Pros:

* Fewer SDKs and contracts.
* Simpler procurement/key management.

Cons:

* Higher lock-in and outage concentration risk.
* Reduced ability to optimize per capability.

### Alternative 2: Provider-agnostic abstraction layer from day one

Pros:

* Easier swaps in theory.

Cons:

* Adds abstraction complexity before proven need.
* Can hide vendor-specific quality/performance behavior.

### Alternative 3: Self-hosted open model stack

Pros:

* Maximum control over model behavior and cost surface.

Cons:

* Significant infra and MLOps burden.
* Higher reliability and latency risk for the current team size.

## Consequences

**Benefits:**

* Better fit-for-purpose selection by capability domain.
* Reduced single-provider risk.
* Explicit environment validation for model-critical settings.

**Costs:**

* More integration surfaces and credential management (three providers).
* Need to monitor reliability and cost profiles for three providers.
* Voice runtime vendor lock-in: conversation lifecycle, transcript format, and webhook contract are ElevenLabs-specific.

## Implementation notes

* Embedding dimensionality guardrails are enforced in the shared environment configuration.
* Retrieval and ingestion paths consume provider wrappers through package entrypoints, not direct SDK usage.
* Provider wrappers are isolated in `packages/vendors/src/*`. This boundary is mechanically enforced by ESLint `no-restricted-imports` (IDE + pre-commit) and `simplicity-guard.mjs` (CI). See [ADR-0022: Lint policy and enforcement layer strategy](/decisions/adr-0022).
* ElevenLabs integration is confined to the API layer: HTTP client, server-tool callback, webhook, and reconciliation endpoints. It is not a shared package because only the API service interacts with ElevenLabs directly.
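
The import boundary described above could be expressed with ESLint's `no-restricted-imports` rule roughly as follows. This is a sketch under assumptions, not the project's actual configuration: it assumes ESLint's flat config format, it assumes the vendor SDK packages are `@anthropic-ai/sdk` and `openai`, and the constant names are hypothetical. It does not attempt to reproduce what `simplicity-guard.mjs` checks in CI, since that script is not described here.

```typescript
// Assumed SDK package names; substitute whatever the vendor wrappers actually import.
const VENDOR_SDKS = ["@anthropic-ai/sdk", "openai"];

const BOUNDARY_MESSAGE =
  "Import the wrapper entrypoint from the vendors package instead of the vendor SDK.";

// In a real eslint.config file this array would be the default export.
const vendorBoundaryConfig = [
  {
    files: ["**/*.ts"],
    // The wrappers themselves are the one place allowed to import the SDKs.
    ignores: ["packages/vendors/src/**"],
    rules: {
      "no-restricted-imports": [
        "error",
        {
          // Exact package imports, such as `import OpenAI from "openai"`.
          paths: VENDOR_SDKS.map((name) => ({ name, message: BOUNDARY_MESSAGE })),
          // Deep imports into an SDK's internals.
          patterns: [
            {
              group: VENDOR_SDKS.map((name) => `${name}/*`),
              message: BOUNDARY_MESSAGE,
            },
          ],
        },
      ],
    },
  },
];
```

Listing both `paths` and `patterns` covers the bare package import as well as subpath imports, so a module outside `packages/vendors/src` would be flagged in either case.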

## Related ADRs

* [ADR-0002: Runtime and build standards](/decisions/adr-0002)
* [ADR-0007: Retrieval pipeline design](/decisions/adr-0007)
* [ADR-0010: Ingestion orchestration, idempotency, and recovery](/decisions/adr-0010)
* [ADR-0019: Voice runtime provider strategy](/decisions/adr-0019)
