Case Study

RabbitHole

An event-driven video-streaming platform on AWS — upload a video, an autoscaling fleet of workers transcodes it into adaptive-bitrate HLS, and you stream it back through a CDN, with live status pushed the whole way.

Built as a portfolio piece to demonstrate cloud architecture: event-driven design, a serverless + container hybrid, autoscaling to zero, real-time status updates, and cost awareness, all defined in Terraform. Live demo: rabbithole.stephsimmons.dev ↗ · Code: GitHub ↗

The problem

A streaming service is a textbook asynchronous workload: uploads are fast, but transcoding is slow and bursty. That mismatch is exactly what event-driven, autoscaling infrastructure exists to solve, so RabbitHole is built to demonstrate that architecture rather than fake it with a CRUD app. The goal: take a raw upload to adaptive playback through a fully decoupled pipeline whose compute cost drops to roughly zero when no one is using it.

What it does

Direct-to-S3 upload — the browser requests a presigned URL and PUTs the file straight to S3; the API never proxies the bytes.
Async transcode — the S3 upload emits an event that fans out through EventBridge and SQS to a fleet of Fargate workers running ffmpeg, producing multi-rendition HLS.
Adaptive playback — hls.js streams the renditions from CloudFront, switching quality to match the viewer's bandwidth.
Real-time status — a DynamoDB Stream drives a broadcaster Lambda that pushes Transcoding → Ready updates to the UI over a WebSocket, with no polling.
Cost dashboard — each transcode's Fargate cost (vCPU·s + GB·s at current rates) is measured and surfaced in the UI.
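
The cost figure in the last item is simple arithmetic, and a minimal sketch may make it concrete. The rates below are assumptions (on-demand Linux/x86 Fargate pricing in us-east-1 at the time of writing), and `TaskSize` and `transcode_cost_usd` are hypothetical names, not taken from the project. The sketch also ignores Fargate's one-minute minimum billing.

```python
from dataclasses import dataclass

# Assumed on-demand Fargate rates (Linux/x86, us-east-1); verify against
# the current AWS pricing page before relying on them.
VCPU_PER_HOUR_USD = 0.04048
GB_PER_HOUR_USD = 0.004445


@dataclass(frozen=True)
class TaskSize:
    vcpu: float       # e.g. 2.0 for a 2 vCPU task
    memory_gb: float  # e.g. 4.0 for a 4 GB task


def transcode_cost_usd(size: TaskSize, duration_s: float) -> float:
    """Price one transcode as vCPU-seconds plus GB-seconds."""
    vcpu_seconds = size.vcpu * duration_s
    gb_seconds = size.memory_gb * duration_s
    # Hourly rates divided by 3600 give per-second rates.
    return (vcpu_seconds * VCPU_PER_HOUR_USD
            + gb_seconds * GB_PER_HOUR_USD) / 3600
```

Under these assumed rates, a 90-second transcode on a 2 vCPU / 4 GB task comes to roughly a quarter of a cent.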

Architecture

An event-driven pipeline that decouples the fast path (upload) from the slow path (transcode), with a serverless API and a container worker fleet — the right tool for each job.

Upload → transcode → stream

Browser (React · hls.js)
1 · presigned URL → API Gateway → Lambda (FastAPI · Mangum)
2 · PUT file → S3 uploads bucket (raw video)
3 · ObjectCreated event → EventBridge → SQS (+ DLQ · retries)
4 · poll · autoscale 0→N → ECS Fargate workers (ffmpeg · HLS renditions)
HLS renditions → S3 streaming bucket (private · OAC)
CloudFront (adaptive playback → UI)

Real-time path: DynamoDB (videos) → Stream → Broadcaster Lambda → API Gateway WebSocket → live status in the UI
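
As an illustration of the real-time path, here is a sketch of how a broadcaster might turn DynamoDB Stream records into status messages. It assumes the stream delivers both old and new images, and that items carry `video_id` and `status` string attributes; those attribute names, and the `send` callable standing in for the WebSocket push, are hypothetical.

```python
import json
from typing import Callable, Iterator


def status_updates(event: dict) -> Iterator[dict]:
    """Yield one message per stream record whose status actually changed."""
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE records carry no new status
        images = record.get("dynamodb", {})
        new = images.get("NewImage", {})
        old = images.get("OldImage", {})
        # Stream images use DynamoDB's typed JSON, e.g. {"S": "READY"}.
        new_status = new.get("status", {}).get("S")
        if new_status is None or new_status == old.get("status", {}).get("S"):
            continue
        yield {"video_id": new["video_id"]["S"], "status": new_status}


def broadcast(event: dict, send: Callable[[str], None]) -> int:
    """Serialize each update and hand it to `send`; return the count sent.

    In a real Lambda, `send` would wrap a call to the API Gateway
    Management API for each connected client (an assumption here).
    """
    count = 0
    for update in status_updates(event):
        send(json.dumps(update))
        count += 1
    return count
```

Keeping the transport behind `send` mirrors the design point: the code that detects a status change does not need to know how the message reaches the browser.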

Highlights

Autoscale to zero — workers step-scale on SQS queue depth, up from zero on demand and back to zero when the queue drains. No running tasks and no NAT gateway means ~$0 when idle.
Event-driven decoupling — EventBridge → SQS (with a DLQ and retries) fully separates upload from transcode: resilient to worker failure, and fan-out-ready.
Serverless + container hybrid — Lambda runs the lightweight API; Fargate runs the long-running, CPU-heavy ffmpeg. Each layer uses the right compute model.
Real-time without coupling — a DynamoDB Stream → Lambda → WebSocket pushes status to the client; the worker never needs to know about the transport.
Private by default — CloudFront with Origin Access Control keeps the streaming bucket fully private; the browser only ever talks to the CDN.
Cost-aware by design — per-transcode Fargate cost is computed and shown, making the economics of the architecture visible.

Engineering decisions & trade-offs

Lambda API + Fargate workers

Right tool per job: serverless for the lightweight, bursty API; containers for the long-running, CPU-heavy ffmpeg transcode, which can easily exceed Lambda's 15-minute timeout and its size limits.
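
To give a sense of the worker's job, here is a sketch of a rendition ladder mapped to ffmpeg arguments and an HLS master playlist. The ladder values, file naming, and function names are assumptions for illustration, not the project's actual settings. The `RESOLUTION` values assume a 16:9 source, since the scale filter only fixes the height.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    video_kbps: int
    audio_kbps: int = 128


# Hypothetical ladder; real bitrates would be tuned per content type.
LADDER = [
    Rendition("360p", 640, 360, 800),
    Rendition("720p", 1280, 720, 2800),
    Rendition("1080p", 1920, 1080, 5000),
]


def ffmpeg_args(src: str, out_dir: str, r: Rendition) -> List[str]:
    """Build the argv for one VOD HLS rendition (one ffmpeg run each)."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{r.height}",  # keep aspect ratio, even width
        "-c:v", "libx264", "-b:v", f"{r.video_kbps}k",
        "-c:a", "aac", "-b:a", f"{r.audio_kbps}k",
        "-f", "hls", "-hls_time", "6", "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/{r.name}_%03d.ts",
        f"{out_dir}/{r.name}.m3u8",
    ]


def master_playlist(ladder: List[Rendition]) -> str:
    """Write the master playlist that lets the player pick a rendition."""
    lines = ["#EXTM3U"]
    for r in ladder:
        bandwidth = (r.video_kbps + r.audio_kbps) * 1000  # bits per second
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}.m3u8")
    return "\n".join(lines) + "\n"
```

Each argument list could be passed to `subprocess.run(args, check=True)`. Several full encodes of one file is the kind of long, CPU-bound work that suits a container better than a function.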

Direct-to-S3 upload (presigned)

The API issues a presigned URL and the browser uploads straight to S3, so the API never proxies file bytes. That is cheaper, faster, and far friendlier to the Lambda execution model, where request payloads are capped at a few megabytes.
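
A minimal sketch of the URL-issuing side, assuming a boto3 S3 client is passed in (for example `boto3.client("s3")`). The `uploads/<uuid>/<filename>` key layout, the 15-minute expiry, and the function name are hypothetical choices. Filename sanitization is left out.

```python
import uuid


def create_upload_url(s3_client, bucket: str, filename: str,
                      content_type: str, expires_in: int = 900) -> dict:
    """Return an object key and a presigned PUT URL for a direct upload.

    `s3_client` is expected to behave like a boto3 S3 client; it is
    injected so the function has no import-time AWS dependency.
    """
    # A random prefix avoids collisions between uploads with the same name.
    key = f"uploads/{uuid.uuid4()}/{filename}"
    url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
        ExpiresIn=expires_in,  # seconds
    )
    return {"key": key, "url": url}
```

Because `ContentType` is part of the signed parameters, the browser's PUT has to send the same `Content-Type` header or S3 rejects the request.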

EventBridge → SQS → workers

Decoupling the pipeline through a queue makes it resilient and fan-out-ready: a DLQ and retries absorb worker failures, and the upload path doesn't block on transcode capacity.
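
The worker's side of that contract can be sketched as follows, assuming the SQS message body is the unmodified EventBridge "Object Created" event from S3 (no input transformer). `TranscodeJob`, `parse_job`, and `handle_message` are hypothetical names, and `transcode` and `delete` are injected stand-ins for the ffmpeg run and the SQS delete call.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TranscodeJob:
    bucket: str
    key: str
    size_bytes: int


def parse_job(sqs_body: str) -> Optional[TranscodeJob]:
    """Extract bucket and key from an S3 'Object Created' event, else None."""
    event = json.loads(sqs_body)
    if (event.get("source") != "aws.s3"
            or event.get("detail-type") != "Object Created"):
        return None
    detail = event["detail"]
    return TranscodeJob(
        bucket=detail["bucket"]["name"],
        key=detail["object"]["key"],
        size_bytes=detail["object"].get("size", 0),
    )


def handle_message(body: str,
                   transcode: Callable[[TranscodeJob], None],
                   delete: Callable[[], None]) -> bool:
    """Process one message; delete it only after a successful transcode."""
    job = parse_job(body)
    if job is None:
        delete()  # not a transcode job: drop it rather than retry forever
        return False
    # If transcode raises, delete() is skipped, so SQS redelivers the
    # message after the visibility timeout; once the queue's
    # maxReceiveCount is exceeded, the redrive policy moves it to the DLQ.
    transcode(job)
    delete()
    return True
```

Deleting only on success is what lets the queue absorb worker failures: a crashed task simply never acknowledges its message.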

Autoscale on queue depth, down to zero

Step scaling on SQS depth can scale workers up from zero, so there's no compute cost when idle — the single biggest lever for a bursty workload's bill.
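
The mapping from queue depth to worker count can be sketched as a small lookup. The thresholds and the cap are hypothetical. In AWS the equivalent is expressed as step adjustments on a CloudWatch alarm over the queue's `ApproximateNumberOfMessagesVisible` metric, which this sketch does not model.

```python
import bisect

# Hypothetical step table: (queue depth lower bound, desired worker count).
STEPS = [(0, 0), (1, 1), (5, 2), (20, 4)]
MAX_WORKERS = 4


def desired_workers(queue_depth: int) -> int:
    """Pick the worker count for the highest step the depth has reached."""
    bounds = [lower for lower, _ in STEPS]
    # bisect_right finds the first bound greater than the depth; the step
    # just before it is the one that applies.
    idx = bisect.bisect_right(bounds, max(queue_depth, 0)) - 1
    return min(STEPS[idx][1], MAX_WORKERS)
```

An empty queue maps to zero workers, which is where the idle savings come from; the first message moves the service to one task.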

No NAT gateway (a documented cost trade-off)

Workers run in public subnets with a zero-ingress security group instead of private subnets behind a NAT gateway. That keeps idle cost near zero for a demo; the production trade-off is noted below.

What I'd change at scale

The demo deliberately optimizes for cost and clarity. Honest production trade-offs:

AWS Elemental MediaConvert instead of self-managed ffmpeg — less operational surface, per-job billing.
Private subnets + VPC endpoints for the workers — defense-in-depth over the public-subnet demo.
A GSI on created_at instead of a Scan for the library listing.
CloudFront + auth in front of the API, and CloudFront signed URLs / cookies in front of the streaming content.
Multi-region streaming origins with latency-based routing.

Stack at a glance

Layer	Tech
Frontend	React + TypeScript (Vite), hls.js → S3 + CloudFront
API	FastAPI on Lambda (Mangum) + API Gateway
Workers	ECS Fargate + ffmpeg, step-autoscaling on SQS depth (min 0)
Real-time	DynamoDB Streams → Lambda → API Gateway WebSocket
Messaging	SQS + DLQ, EventBridge, S3 notifications
Data	S3 (uploads + streaming), DynamoDB
CDN	CloudFront (Origin Access Control)
IaC	Terraform
CI/CD	GitHub Actions
Observability	CloudWatch metrics / alarms, structured logs

My role — Sole architect and engineer: the event-driven AWS architecture, the FastAPI/Lambda API, the Fargate ffmpeg worker, the real-time WebSocket layer, the React + hls.js frontend, and the Terraform that provisions all of it.
