Status: Accepted
Date: 2026-02-06
Deciders: Reflections Maintainers

Context

Ingestion accepts external source content in three forms: inline metadata text, files in Supabase Storage, and HTTP(S) URLs. These inputs open several attack vectors, namely server-side request forgery (SSRF), redirect abuse, oversized payloads, and storage path mismatch. They also add reliability risk to the downstream extraction and embedding stages.

Decision

Enforce source-safety controls directly at the ingest and upload boundaries:
  • URL fetching is restricted to the http and https schemes, rejects hosts that match blocked or private patterns, and resolves each hostname via DNS to reject any that map to private or internal addresses.
  • Fetching enforces a maximum redirect count, a request timeout, and a maximum source size in bytes.
  • Before retrieval, storage downloads verify that the object path starts with the owning prefix (reflectionId/sourceId/...).
  • Upload URL creation accepts only allowed MIME types and writes to deterministic storage paths derived from the reflection and source IDs.
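The URL checks in the URL-fetch bullet of the Decision list can be sketched in TypeScript for Node.js as follows. This is a minimal illustration under stated assumptions, not the worker's actual code: the error class, its codes, the blocklist patterns, and the injectable `Resolver` are hypothetical, and the private-range test covers common IPv4/IPv6 cases rather than every special-use block.

```typescript
import { isIP } from "node:net";
import { lookup } from "node:dns/promises";

// Hypothetical error type; the ADR mentions explicit error codes but does not list them.
export class SourceUrlError extends Error {
  constructor(public readonly code: string, message: string) {
    super(message);
    this.name = "SourceUrlError";
  }
}

const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);
// Illustrative blocklist; the real pattern list is not given in the ADR.
const BLOCKED_HOST_PATTERNS: RegExp[] = [/^localhost$/i, /\.localhost$/i, /\.local$/i, /\.internal$/i];

// Strips IPv6 brackets and a trailing root dot, so "[::1]" and "localhost." are checked.
function normalizeHost(url: URL): string {
  return url.hostname.replace(/^\[|\]$/g, "").replace(/\.$/, "");
}

// True for loopback, private, link-local, CGNAT, unspecified, multicast and reserved
// ranges, including IPv4-mapped IPv6 addresses. Not an exhaustive special-use list.
export function isPrivateAddress(address: string): boolean {
  const family = isIP(address);
  if (family === 4) {
    const [a, b] = address.split(".").map(Number);
    return (
      a === 0 || a === 10 || a === 127 ||
      (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 (CGNAT)
      (a === 169 && b === 254) || // link-local, includes the 169.254.169.254 metadata endpoint
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
      (a === 192 && b === 168) || // 192.168.0.0/16
      a >= 224 // multicast and reserved
    );
  }
  if (family === 6) {
    const lower = address.toLowerCase();
    const dotted = lower.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
    if (dotted) return isPrivateAddress(dotted[1]);
    // WHATWG URL serializes mapped addresses in hex, e.g. "::ffff:7f00:1".
    const hex = lower.match(/^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/);
    if (hex) {
      const [hi, lo] = [parseInt(hex[1], 16), parseInt(hex[2], 16)];
      return isPrivateAddress(`${hi >> 8}.${hi & 0xff}.${lo >> 8}.${lo & 0xff}`);
    }
    if (lower === "::" || lower === "::1") return true;
    const first = parseInt(lower.split(":")[0] || "0", 16);
    return (
      (first & 0xfe00) === 0xfc00 || // fc00::/7 unique local
      (first & 0xffc0) === 0xfe80 || // fe80::/10 link-local
      (first & 0xff00) === 0xff00 // ff00::/8 multicast
    );
  }
  return false;
}

// Checks that need no network access: URL syntax, scheme, host patterns, IP literals.
export function checkSourceUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new SourceUrlError("INVALID_URL", `not a valid URL: ${raw}`);
  }
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) {
    throw new SourceUrlError("UNSUPPORTED_SCHEME", `scheme ${url.protocol} is not allowed`);
  }
  const host = normalizeHost(url);
  if (BLOCKED_HOST_PATTERNS.some((p) => p.test(host))) {
    throw new SourceUrlError("BLOCKED_HOST", `host ${host} is blocked`);
  }
  if (isIP(host) !== 0 && isPrivateAddress(host)) {
    throw new SourceUrlError("PRIVATE_ADDRESS", `${host} is a private address`);
  }
  return url;
}

export type Resolver = (host: string) => Promise<string[]>;
// Default: every address the system resolver returns for the host.
const systemResolver: Resolver = async (host) =>
  (await lookup(host, { all: true })).map((entry) => entry.address);

// Rejects the URL if the host resolves to nothing or to any private/internal address.
export async function assertPublicResolution(url: URL, resolve: Resolver = systemResolver): Promise<void> {
  const host = normalizeHost(url);
  const addresses = isIP(host) !== 0 ? [host] : await resolve(host);
  if (addresses.length === 0) throw new SourceUrlError("DNS_NO_ADDRESSES", `${host} did not resolve`);
  const bad = addresses.find((address) => isPrivateAddress(address));
  if (bad !== undefined) {
    throw new SourceUrlError("PRIVATE_ADDRESS", `${host} resolves to private address ${bad}`);
  }
}
```

One limitation of this shape: resolving a hostname and then connecting in a separate step leaves a window for DNS rebinding, where the connection-time lookup returns a different address than the one validated. Closing that window fully requires the HTTP client to connect to the validated address itself (for example, by pinning it in the client's DNS lookup); the sketch does not attempt that.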
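The fetch limits in the Decision list (redirect count, timeout, maximum size) could be enforced roughly as below, assuming Node.js 18+ with the global `fetch`. `FetchLimits`, `SourceFetchError`, its codes, and the `validate` hook are hypothetical names; the hook stands in for whatever SSRF checks the worker actually runs, and it is called again for every redirect target.

```typescript
// Hypothetical shape; the ADR says these limits come from shared environment config.
export interface FetchLimits {
  maxRedirects: number;
  timeoutMs: number;
  maxBytes: number;
}

export class SourceFetchError extends Error {
  constructor(public readonly code: string, message: string) {
    super(message);
    this.name = "SourceFetchError";
  }
}

// Hypothetical hook for per-hop SSRF checks (scheme, host patterns, DNS).
export type UrlValidator = (url: URL) => Promise<void>;

const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

export async function fetchSourceBounded(
  rawUrl: string,
  limits: FetchLimits,
  validate: UrlValidator,
  fetchImpl: typeof fetch = fetch,
): Promise<Uint8Array> {
  // One deadline shared by every hop, so redirects cannot extend the total time.
  const signal = AbortSignal.timeout(limits.timeoutMs);
  let url = new URL(rawUrl);
  for (let hop = 0; ; hop++) {
    // Re-validate every hop: a public URL may redirect to an internal one.
    await validate(url);
    const res = await fetchImpl(url, { redirect: "manual", signal });
    if (REDIRECT_STATUSES.has(res.status)) {
      await res.body?.cancel();
      const location = res.headers.get("location");
      if (location === null) throw new SourceFetchError("BAD_REDIRECT", `redirect without Location from ${url}`);
      if (hop >= limits.maxRedirects) {
        throw new SourceFetchError("TOO_MANY_REDIRECTS", `more than ${limits.maxRedirects} redirects`);
      }
      url = new URL(location, url); // a relative Location resolves against the current URL
      continue;
    }
    if (!res.ok) {
      await res.body?.cancel();
      throw new SourceFetchError("HTTP_ERROR", `status ${res.status} from ${url}`);
    }
    return readCapped(res, limits.maxBytes);
  }
}

// Reads the body, failing as soon as it exceeds maxBytes (Content-Length may be absent or wrong).
async function readCapped(res: Response, maxBytes: number): Promise<Uint8Array> {
  const declared = Number(res.headers.get("content-length") ?? NaN);
  if (Number.isFinite(declared) && declared > maxBytes) {
    await res.body?.cancel();
    throw new SourceFetchError("SOURCE_TOO_LARGE", `declared ${declared} bytes > ${maxBytes}`);
  }
  if (res.body === null) return new Uint8Array(0);
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    total += result.value.byteLength;
    if (total > maxBytes) {
      await reader.cancel();
      throw new SourceFetchError("SOURCE_TOO_LARGE", `body exceeded ${maxBytes} bytes`);
    }
    chunks.push(result.value);
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

Redirects are followed manually rather than with `redirect: "follow"` so that each hop is both counted and re-validated. The byte cap is enforced while streaming because a server can omit or understate `Content-Length`.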
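The storage and upload bullets of the Decision list might be implemented along these lines. The MIME allowlist, the UUID ID format, the fixed `source.<ext>` object name, and the error codes are all assumptions made for illustration; the ADR states only that upload paths are deterministic and tied to the reflection and source IDs, and that downloads are restricted to the `reflectionId/sourceId/...` prefix.

```typescript
export class StoragePathError extends Error {
  constructor(public readonly code: string, message: string) {
    super(message);
    this.name = "StoragePathError";
  }
}

// Hypothetical allowlist mapping each MIME type to a fixed extension.
const UPLOAD_EXTENSIONS = new Map<string, string>([
  ["application/pdf", "pdf"],
  ["text/plain", "txt"],
  ["text/markdown", "md"],
]);

// Assumed ID format (UUID); the ADR does not specify one.
const ID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function assertId(name: string, value: string): void {
  if (!ID_PATTERN.test(value)) throw new StoragePathError("INVALID_ID", `${name} is not a valid ID`);
}

// Same inputs always give the same path, and no client-supplied file name reaches storage.
export function buildUploadPath(reflectionId: string, sourceId: string, mimeType: string): string {
  assertId("reflectionId", reflectionId);
  assertId("sourceId", sourceId);
  const essence = mimeType.split(";")[0].trim().toLowerCase(); // drop parameters such as charset
  const ext = UPLOAD_EXTENSIONS.get(essence);
  if (ext === undefined) throw new StoragePathError("UNSUPPORTED_MIME_TYPE", `${essence} is not allowed`);
  return `${reflectionId}/${sourceId}/source.${ext}`;
}

// Rejects any path outside reflectionId/sourceId/ or containing empty, "." or ".." segments.
export function assertOwnedStoragePath(path: string, reflectionId: string, sourceId: string): void {
  const segments = path.split("/");
  const malformed = segments.some((s) => s === "" || s === "." || s === ".." || s.includes("\\"));
  if (malformed || segments.length < 3 || segments[0] !== reflectionId || segments[1] !== sourceId) {
    throw new StoragePathError("PATH_NOT_OWNED", `path ${path} is outside ${reflectionId}/${sourceId}/`);
  }
}
```

Deriving the object name from the MIME type rather than from the client's file name keeps the path deterministic and removes one route for path manipulation; if the original name matters, it could be kept as metadata instead.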

Alternatives considered

Alternative 1: Trust source URIs and rely on infrastructure network controls only

Pros:
  • Minimal application code.
Cons:
  • Higher SSRF and data-exfiltration risk at the application layer.
  • Harder to reason about content safety behavior during audits.

Alternative 2: Proxy all ingestion through a separate sandbox service

Pros:
  • Strong isolation boundary.
Cons:
  • Significant infrastructure complexity and latency overhead.
  • More operational burden than the current complexity budget allows.

Alternative 3: Allow only uploaded files, no external URL ingestion

Pros:
  • Smaller attack surface.
Cons:
  • Reduced product flexibility for source onboarding.
  • Worse user experience for web and document ingestion workflows.

Consequences

Benefits:
  • Reduced SSRF and oversized payload risk.
  • More deterministic failure modes with explicit error codes.
  • Better alignment between API upload controls and worker ingestion checks.
Costs:
  • Additional validation logic and maintenance.
  • Potential false positives for unusual but valid network sources.

Implementation notes

  • SSRF/content-size/redirect controls are implemented in the worker ingestion pipeline.
  • Runtime safety limits are configured and validated via the shared environment configuration.
  • API upload path constraints are enforced in the sources route handler.
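A sketch of how the runtime safety limits might be read and validated from the environment follows. The variable names, defaults, and bounds are hypothetical; the ADR does not publish them.

```typescript
// Hypothetical variable names, defaults, and bounds; the ADR does not list them.
export interface SourceSafetyLimits {
  maxRedirects: number;
  fetchTimeoutMs: number;
  maxSourceBytes: number;
}

type Env = Record<string, string | undefined>;

// Parses an optional integer variable, rejecting non-integers and out-of-range values.
function readInt(env: Env, name: string, fallback: number, min: number, max: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new Error(`${name} must be an integer in [${min}, ${max}], got "${raw}"`);
  }
  return value;
}

// Intended to run once at worker and API startup so a bad value fails fast.
export function loadSourceSafetyLimits(env: Env = process.env): SourceSafetyLimits {
  return {
    maxRedirects: readInt(env, "SOURCE_FETCH_MAX_REDIRECTS", 3, 0, 10),
    fetchTimeoutMs: readInt(env, "SOURCE_FETCH_TIMEOUT_MS", 15_000, 1_000, 120_000),
    maxSourceBytes: readInt(env, "SOURCE_MAX_BYTES", 25 * 1024 * 1024, 1, 200 * 1024 * 1024),
  };
}
```

Validating at startup turns a misconfigured limit into a deployment failure rather than a silent change in ingestion behavior, and sharing one loader between the worker and the API keeps their limits aligned.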