Case Study
RabbitHole
An event-driven video-streaming platform on AWS — upload a video, an autoscaling fleet of workers transcodes it into adaptive-bitrate HLS, and you stream it back through a CDN, with live status pushed the whole way.
Built as a portfolio piece to demonstrate cloud architecture (event-driven design, a serverless + container hybrid, autoscaling to zero, real-time updates, and cost-awareness), all defined in Terraform.
Live demo: rabbithole.stephsimmons.dev · Code: GitHub
The problem
A streaming service is a textbook asynchronous workload: uploads are fast, but transcoding is slow and bursty. That mismatch is exactly what event-driven, autoscaling infrastructure exists to solve — so RabbitHole is built to demonstrate that architecture rather than fake it with a CRUD app. The goal: take a raw upload to adaptive playback through a fully decoupled pipeline that costs nothing when no one is using it.
What it does
- Direct-to-S3 upload — the browser requests a presigned URL and PUTs the file straight to S3; the API never proxies the bytes.
- Async transcode — the S3 upload emits an event that fans out through EventBridge and SQS to a fleet of Fargate workers running ffmpeg, producing multi-rendition HLS.
- Adaptive playback — hls.js streams the renditions from CloudFront, switching quality to match the viewer's bandwidth.
- Real-time status — a DynamoDB Stream drives a broadcaster Lambda that pushes Transcoding → Ready updates to the UI over a WebSocket, with no polling.
- Cost dashboard — each transcode's Fargate cost (vCPU·s + GB·s at current rates) is measured and surfaced in the UI.
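As an illustration of what "multi-rendition HLS" means for the player, here is a minimal sketch of building the master playlist that hls.js reads to pick a quality level. The rendition ladder, the directory names, and the `index.m3u8` file name are assumptions made for this example; the write-up does not list the project's actual renditions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    name: str       # sub-directory holding this rendition's media playlist
    width: int
    height: int
    bandwidth: int  # peak bits per second advertised to the player


# Hypothetical ladder, chosen only for illustration.
LADDER = [
    Rendition("1080p", 1920, 1080, 5_000_000),
    Rendition("360p", 640, 360, 800_000),
    Rendition("720p", 1280, 720, 2_800_000),
]


def master_playlist(renditions: List[Rendition]) -> str:
    """Build an HLS master playlist with one variant stream per rendition."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    # List variants from lowest to highest bandwidth.
    for r in sorted(renditions, key=lambda r: r.bandwidth):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/index.m3u8")
    return "\n".join(lines) + "\n"
```

The player fetches this one file from CloudFront, then switches between the listed variants as measured bandwidth changes.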
Architecture
An event-driven pipeline that decouples the fast path (upload) from the slow path (transcode), with a serverless API and a container worker fleet — the right tool for each job.
Upload → transcode → stream
1 · presigned URL: API Gateway → Lambda (FastAPI · Mangum)
2 · PUT file: browser → S3 (uploads)
3 · ObjectCreated event: EventBridge → SQS (+ DLQ · retries)
4 · poll · autoscale 0→N: ECS Fargate workers (ffmpeg · HLS renditions)
HLS renditions → S3 streaming (private · OAC) → CloudFront (adaptive playback) → UI
Real-time path: DynamoDB (videos) → Stream → Broadcaster Lambda → API Gateway WebSocket → live status in the UI
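The real-time path can be sketched as two small functions: one that pulls status changes out of a DynamoDB Streams batch, and one that fans them out to connected clients. This is a sketch under assumptions, not the project's code: the attribute names `video_id` and `status` are hypothetical, the stream is assumed to carry both old and new images, and `send` stands in for whatever wraps the API Gateway Management API's `post_to_connection` call.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def status_changes(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return one update per stream record whose status actually changed."""
    updates = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # deletions carry no new status to broadcast
        images = record.get("dynamodb", {})
        new = images.get("NewImage", {})
        old = images.get("OldImage", {})
        # Hypothetical attribute names, in DynamoDB's typed JSON form.
        status = new.get("status", {}).get("S")
        if status is None or status == old.get("status", {}).get("S"):
            continue
        updates.append({"video_id": new["video_id"]["S"], "status": status})
    return updates


def broadcast(
    updates: Iterable[Dict[str, str]],
    connection_ids: Iterable[str],
    send: Callable[[str, bytes], None],
) -> int:
    """Send every update to every connection; return the number of sends."""
    connections = list(connection_ids)
    sent = 0
    for update in updates:
        payload = json.dumps(update).encode("utf-8")
        for connection_id in connections:
            send(connection_id, payload)
            sent += 1
    return sent
```

Keeping `send` injectable means the worker, which only writes to DynamoDB, stays unaware of WebSockets, and the broadcaster can be exercised without a live API Gateway endpoint.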
Highlights
- Autoscale to zero — workers step-scale on SQS queue depth, up from zero on demand and back to zero when the queue drains. No running tasks and no NAT gateway means ~$0 when idle.
- Event-driven decoupling — EventBridge → SQS (with a DLQ and retries) fully separates upload from transcode: resilient to worker failure, and fan-out-ready.
- Serverless + container hybrid — Lambda runs the lightweight API; Fargate runs the long-running, CPU-heavy ffmpeg. Each layer uses the right compute model.
- Real-time without coupling — a DynamoDB Stream → Lambda → WebSocket pushes status to the client; the worker never needs to know about the transport.
- Private by default — CloudFront with Origin Access Control keeps the streaming bucket fully private; the browser only ever talks to the CDN.
- Cost-aware by design — per-transcode Fargate cost is computed and shown, making the economics of the architecture visible.
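The per-transcode cost figure is simple arithmetic over vCPU-seconds and GB-seconds. A minimal sketch follows, assuming Linux/x86 on-demand rates for us-east-1 at the time of writing; rates vary by region and change over time, so treat the constants as placeholders. The function name and signature are hypothetical.

```python
# Assumed on-demand Fargate rates (Linux/x86, us-east-1); check current pricing.
VCPU_PER_HOUR_USD = 0.04048
GB_PER_HOUR_USD = 0.004445
SECONDS_PER_HOUR = 3600
MINIMUM_BILLED_SECONDS = 60.0  # Fargate bills per second with a one-minute minimum


def transcode_cost_usd(vcpu: float, memory_gb: float, seconds: float) -> float:
    """Estimate the Fargate cost of one task run from its size and duration."""
    billable = max(seconds, MINIMUM_BILLED_SECONDS)
    vcpu_cost = vcpu * billable * VCPU_PER_HOUR_USD / SECONDS_PER_HOUR
    memory_cost = memory_gb * billable * GB_PER_HOUR_USD / SECONDS_PER_HOUR
    return vcpu_cost + memory_cost
```

For example, a 2 vCPU / 4 GB task that runs for an hour comes to 2 × 0.04048 + 4 × 0.004445 = 0.09874 USD under these assumed rates.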
Engineering decisions & trade-offs
Lambda API + Fargate workers
Right tool per job: serverless for the lightweight, bursty API; containers for the long-running, CPU-heavy ffmpeg transcode, which would not fit within Lambda's 15-minute execution limit or its size limits.
Direct-to-S3 upload (presigned)
The API issues a presigned URL and the browser uploads straight to S3, so the API never proxies file bytes. That is cheaper, faster, and far friendlier to Lambda's execution model, where request payloads are capped at a few megabytes.
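A minimal sketch of the presigned-URL step, assuming a boto3 S3 client is passed in. The function name, the `uploads/` key prefix, and the 15-minute expiry are illustrative choices, not details taken from the project.

```python
import uuid
from typing import Any, Dict


def create_upload_url(
    s3_client: Any,
    bucket: str,
    content_type: str,
    expires_in: int = 900,
) -> Dict[str, str]:
    """Return a fresh object key and a presigned PUT URL for it."""
    # Hypothetical key scheme: a random ID under an "uploads/" prefix.
    key = f"uploads/{uuid.uuid4()}"
    url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
        ExpiresIn=expires_in,
    )
    return {"key": key, "url": url}
```

In the Lambda handler the client would typically come from `boto3.client("s3")`. Because `ContentType` is part of the signed parameters, the browser has to send the same `Content-Type` header on its PUT request.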
EventBridge → SQS → workers
Decoupling the pipeline through a queue makes it resilient and fan-out-ready: a DLQ and retries absorb worker failures, and the upload path doesn't block on transcode capacity.
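On the worker side, the retry behavior comes down to one rule: delete a message only after its transcode succeeds. The sketch below assumes the SQS message body is the unmodified EventBridge envelope for an S3 "Object Created" event; `transcode` and `delete` are hypothetical callables standing in for the ffmpeg job and the SQS `delete_message` call.

```python
import json
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def parse_upload_event(body: str) -> Optional[Tuple[str, str]]:
    """Extract (bucket, key) from an EventBridge S3 event, or None if unrelated."""
    event = json.loads(body)
    if event.get("source") != "aws.s3":
        return None
    if event.get("detail-type") != "Object Created":
        return None
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]


def drain(
    messages: Iterable[Dict[str, Any]],
    transcode: Callable[[str, str], None],
    delete: Callable[[str], None],
) -> int:
    """Process a batch of SQS messages; return how many were deleted."""
    handled = 0
    for message in messages:
        target = parse_upload_event(message["Body"])
        try:
            if target is not None:
                transcode(*target)
        except Exception:
            # Leave the message on the queue: SQS redelivers it after the
            # visibility timeout, and the redrive policy eventually moves
            # it to the DLQ.
            continue
        delete(message["ReceiptHandle"])
        handled += 1
    return handled
```

A failed transcode therefore costs nothing but a redelivery, and the upload path never sees the failure.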
Autoscale on queue depth, down to zero
Step scaling on SQS depth can scale workers up from zero, so there's no compute cost when idle — the single biggest lever for a bursty workload's bill.
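To make the step-scaling idea concrete, here is a small model of the mapping from queue depth to worker count. In the real system this mapping would live in CloudWatch alarms and Application Auto Scaling step adjustments defined in Terraform; the thresholds and the task cap below are invented for illustration.

```python
from typing import List, Tuple

# (queue depth lower bound, desired task count); hypothetical thresholds.
STEPS: List[Tuple[int, int]] = [(1, 1), (10, 3), (50, 6)]


def desired_workers(
    queue_depth: int,
    steps: List[Tuple[int, int]] = STEPS,
    max_tasks: int = 6,
) -> int:
    """Map the visible-message count to a task count, starting from zero."""
    desired = 0  # an empty queue means no running tasks
    for lower_bound, tasks in sorted(steps):
        if queue_depth >= lower_bound:
            desired = tasks
    return min(desired, max_tasks)
```

With these example steps, an empty queue maps to 0 tasks, a single message to 1, a backlog of 12 to 3, and anything at or above 50 to the cap of 6.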
No NAT gateway (a documented cost trade-off)
Workers run in public subnets with a zero-ingress security group instead of private subnets behind a NAT gateway. That keeps idle cost near zero for a demo; the production trade-off is noted below.
What I'd change at scale
The demo deliberately optimizes for cost and clarity. Honest production trade-offs:
- AWS Elemental MediaConvert instead of self-managed ffmpeg — less operational surface, per-job billing.
- Private subnets + VPC endpoints for the workers — defense-in-depth over the public-subnet demo.
- A GSI on created_at instead of a Scan for the library listing.
- CloudFront + auth in front of the API, and signed URLs / cookies on the streaming bucket.
- Multi-region streaming origins with latency-based routing.
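For the GSI item above, a sketch of what the library listing query might look like once it replaces the Scan. The index name `by_created_at`, the constant partition attribute `listing` with value `VIDEO`, and the use of `created_at` as the index sort key are all assumptions; only the `videos` table name and the `created_at` attribute appear in this write-up.

```python
from typing import Any, Dict


def library_query(limit: int = 20) -> Dict[str, Any]:
    """Build keyword arguments for a low-level DynamoDB client query() call."""
    return {
        "TableName": "videos",
        "IndexName": "by_created_at",  # hypothetical GSI name
        # Every item shares one partition value so the index can be read
        # in created_at order (assumed to be the index's sort key).
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "listing"},
        "ExpressionAttributeValues": {":pk": {"S": "VIDEO"}},
        "ScanIndexForward": False,  # newest first
        "Limit": limit,
    }
```

It would be used as `dynamodb_client.query(**library_query())`. A single shared partition value is fine for a small library but becomes a hot partition at higher volume, so a sharded or time-bucketed partition key would be the next step.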
Stack at a glance
| Layer | Tech |
| --- | --- |
| Frontend | React + TypeScript (Vite), hls.js → S3 + CloudFront |
| API | FastAPI on Lambda (Mangum) + API Gateway |
| Workers | ECS Fargate + ffmpeg, step-autoscaling on SQS depth (min 0) |
| Real-time | DynamoDB Streams → Lambda → API Gateway WebSocket |
| Messaging | SQS + DLQ, EventBridge, S3 notifications |
| Data | S3 (uploads + streaming), DynamoDB |
| CDN | CloudFront (Origin Access Control) |
| IaC | Terraform |
| CI/CD | GitHub Actions |
| Observability | CloudWatch metrics / alarms, structured logs |