Conveyor — proposed ingestion & storage architecture

A target design that resolves the bulk-ingest / read-fan-out tension surfaced reviewing #117 (sharding) & #120 (async submit).
stage → accept → ingest → process → serve async-by-default for large jobs conditional sharding polyglot storage
The problem in one line. One job can carry up to ~1M tasks. Writing them synchronously blows the 60 s ALB wall and DynamoDB's ~1000 WCU/s per-partition cap. Sharding fixes writes but taxes reads; doing it on every job taxes the 99% that are small. The fix is to stop writing on the request path, shard only when it pays, and put each kind of state on the engine built for it.

1Target architecture

flowchart TB
  C(["Client / SDK"])
  C -->|"submit (small) · 201"| API
  C -->|"presign + PUT large list"| S3S[("S3 — staged task lists")]
  API["API · Fargate · stateless
(decides sync vs async)"] API -->|"large/open: 202 + ingest msg"| CQ[["SQS — control queue"]] API -->|"small closed: sync write"| DDB API -->|"leases · counters · hot-read cache"| RDS[("Redis / MemoryDB
coordination")] CQ --> IW["Ingest worker
(stream · write · enqueue · latch)"] IW -->|"read chunks"| S3S IW -->|"write tasks
(shard only if large)"| DDB[("DynamoDB
jobs · runs · tasks")] IW -->|"enqueue refs"| RQ[["SQS — per-run queues"]] CQ -. "permanent fail" .-> DLQ[["DLQ + reconcile sweeper"]] DLQ -.-> DDB RQ --> TW["Task workers · Fargate"] TW -->|"scrape"| GW(["gateway"]) TW -->|"result bodies"| S3R[("S3 — results")] TW -->|"stats INCR"| RDS TW -->|"terminal status"| DDB TW -->|"on run complete"| NOT["Step Functions / webhooks"] EXP["Results export (offline)"] -->|"full-set reads → zip"| S3R

Each arrow is a job for the engine best at it: S3 absorbs the giant body, SQS is the durable buffer, Redis owns hot coordination, DynamoDB stays the point-read/TTL system of record, the gateway/result flow is unchanged. The request thread returns in milliseconds.

2Submit decides sync vs async

flowchart TD
  S["POST /v1/jobs"] --> Q{"size & type?"}
  Q -->|"small closed
(tasks < threshold)"| SY["sync: write + enqueue
→ 201, single-partition"] Q -->|"large closed / open / async"| AS["stage list → 202
+ ingest_job message"] SY --> RS["tasks queryable now
cheap single-Query reads"] AS --> RA["tasks appear after ingest
sharded writes · poll job status"]

Today both ceilings are hit because submit is always synchronous and every job is stamped shard_count=16. Here, sharding and async are switched on only when the job is actually large or open — so the common small job keeps the old, fast, single-partition path (the regression we found in #117).

3Ingestion pipeline — durable, resumable

flowchart LR
  A["ingest_job (SQS)"] --> B["stream staged
source, chunk N"] B --> C["write tasks
deterministic ids"] C --> D["enqueue refs"] D --> E{"more
chunks?"} E -->|yes| B E -->|no| F["SetRunTotal
(absolute, idempotent)"] F --> G["latch last_batch"] G --> H["run completes via
normal lifecycle"]

At-least-once safe (this is #120's core, already tested): deterministic task ids (sha256(run_id:index)) make re-writes overwrite not duplicate; absolute SetRunTotal (not ADD) can't double-count and is set before the latch so completion never fires early; re-enqueued refs are deduped by the task worker's status guard. A crash at any step resumes on redelivery and converges; permanent failures land in a DLQ with the run left recoverable via /rerun or DELETE.

4State on the right engine (polyglot)

StateEngineWhy
Jobs · runs · idempotency · file-inputsDynamoDBlow-volume point reads, native TTL, serverless, optimistic concurrency
Tasks + events (the bulk collection)DynamoDB, conditionally sharded
→ Aurora Postgres if bulk ever dominates
shard writes only when large; small jobs single-partition. PG erases both write-cap & read-fan-out if it becomes the defining workload
Concurrency semaphore (leases) · rate limits · counters · hot-read cacheRedis / MemoryDBatomic INCR/SET NX PX, native expiry — coordination is what it's best at
Large task lists (ingest source)S3 (presigned)keeps 1M URLs off the request body; streamed by the ingest worker
Result bodiesS3unchanged; presigned URLs, 24 h TTL

5Read paths — bounded by design

6What changes vs today

ConcernTodayProposed
Large submitsync, hits 60 s wall202 async, wall gone
ShardingN=16 on every jobonly large / open jobs
Small-job reads16-way fan-outsingle Query
1M-URL bodyinline / cappedS3-staged stream
Full-set readshot scan, OOM riskoffline export
Semaphore / countersDDB hot partitionRedis
Ingest failuren/a (sync 503)DLQ + reconcile sweeper

7Evolution path — ship in slices

Now (in-stack, low risk). Land #117 with conditional sharding (small closed jobs stay single-partition) and a lower default N (4–8). Add the offline-export route for full-set reads. No new infra.
Beta (the 1M path). Land #120's ingest_job op + the S3-streamed carrier (extend file_input) + DLQ & reconcile sweeper. Optionally move the semaphore/counters to Redis. This is where the 60 s wall actually disappears for 1M.
If/when bulk dominates. Move only the task + event store to Aurora Postgres (polyglot) — erases write-cap, read-fan-out, and the GSI dilemma in one move, while DDB keeps the metadata layer. A full DB swap is reserved for if Dynamo fights you in more than this one place.

8Why not the other options

ConsideredVerdict
GSI to fix read fan-outre-creates the write hot-partition on the index — moves the bottleneck
DDB on-demand / adaptive capacitytrap — per-key cap is physical; one job = one key
Redis as task storenot durable enough for paid 14-day results; RAM-priced retention
Wholesale DB swap to Postgresbest engine for the bulk problem, but rewrites storage + worker + loses serverless/TTL — too big for one spike
Fat/chunked task rowsbreaks per-task status/results/pagination model
Raise ALB timeoutholds HTTP open for minutes, still WCU-bound — anti-pattern
Client chunking (AddTasks)free baseline, works today — but pushes batching + partial-failure to the client