Case Study

PureZen

A full-stack spa booking platform on AWS — a natural-language concierge that books appointments for guests, and an operations console with an AI assistant for staff, on a fully serverless backend.

A concept build for a fictional spa brand, architected and built solo. The stack is fully serverless on AWS (API Gateway, Lambda, S3, CloudFront, and DynamoDB), with Anthropic's Claude powering the conversational and agentic layers and everything defined in AWS CDK. Live demo: purezen.stephsimmons.dev · Code: GitHub

The problem

Spa booking is a transactional domain where wrong answers have real cost — a concierge that invents availability or double-books is worse than no concierge at all. The goal: a natural-language booking experience that feels effortless for guests, but is grounded in live data and structurally unable to confirm a slot that doesn't exist.

PureZen runs the full lifecycle — browse services, check availability, book, reschedule, cancel, and look up history — through conversation, with a separate staff console for schedule management and operational insight.

Two products, one backend

Guest concierge

A chat-first booking experience. Guests describe what they want in plain language (“a deep-tissue massage next Tuesday afternoon”); the system resolves the service against the real catalog, checks live availability, and walks them through booking, rescheduling, cancelling, or reviewing past appointments — with conversation state persisted per session.

Operations console

A staff portal with dashboards for daily overview, schedule, guests, users, and analytics — plus an AI assistant that answers operational questions by querying live data (“how many bookings tomorrow?”, “what are this week’s trends?”, “show me this guest’s history”). Each answer is produced from a real database query, not the model’s memory.

Architecture

A fully serverless design on AWS — no servers to patch, automatic scaling, and pay-per-request economics that scale to zero when idle. The static frontend is delivered from S3 via CloudFront; API traffic flows through API Gateway to a containerized Lambda.

Now: serverless on AWS (architecture diagram)

Browser: guest & admin UI
(static · HTTPS)
CloudFront: global CDN · TLS
S3: static frontend
(/api calls)
API Gateway: REST · LambdaRestApi
AWS Lambda: container · FastAPI + Mangum
Anthropic Claude: concierge & admin agent
DynamoDB: bookings · services · staff · sessions
CloudWatch: logs · metrics

AWS CDK infrastructure as code · least-privilege IAM roles · scales to zero

Highlights

Serverless compute — a containerized Lambda (Python 3.11) runs FastAPI through Mangum behind API Gateway; nothing to patch or keep warm, and zero cost when idle.
Global delivery — the static frontend ships from S3 behind CloudFront, with TLS and edge caching for low-latency access anywhere.
Booking integrity — slots are claimed with a DynamoDB conditional write that only succeeds while the slot is still marked available, so two guests can never confirm the same time.
Security by design — least-privilege IAM (no credentials in code), bcrypt-hashed passwords, and UUID session tokens rotated on login with server-side invalidation on logout and a 24-hour DynamoDB TTL.
Reproducible infra — the entire stack (Lambda image, API Gateway, S3, CloudFront, DynamoDB, IAM) is declared in AWS CDK and deployed from one source of truth.
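
As a rough illustration of the conditional write behind booking integrity, the sketch below claims a slot only while it is still marked available. The item layout (a `slot_id` key with `status` and `booking_id` attributes) and the `claim_slot` helper are assumptions made for this example, not PureZen's actual schema; `table` is expected to behave like a boto3 DynamoDB Table resource.

```python
from typing import Any


def claim_slot(table: Any, slot_id: str, booking_id: str) -> bool:
    """Try to claim a slot; return True only if this caller won it."""
    try:
        table.update_item(
            Key={"slot_id": slot_id},  # hypothetical key name
            # "status" is a DynamoDB reserved word, so it is aliased as #s.
            UpdateExpression="SET #s = :booked, booking_id = :bid",
            # The write is applied only if the slot is still available.
            ConditionExpression="#s = :available",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={
                ":booked": "booked",
                ":available": "available",
                ":bid": booking_id,
            },
        )
        return True
    except Exception as exc:
        # botocore's ClientError carries the error code in exc.response.
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "ConditionalCheckFailedException":
            return False  # someone else claimed the slot first
        raise
```

Because the check and the update happen in a single DynamoDB request, two concurrent callers cannot both see the slot as available and both succeed; the loser gets `False` and can be offered another time.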

AI design — grounded, not guessing

The model formats data; it never invents it

The concierge never answers from training knowledge. Services are resolved against the real catalog, availability and bookings come straight from DynamoDB, and the booking, reschedule, and cancel flows are deterministic state machines. Claude’s job is to understand intent and phrase the response; by construction, it cannot surface a slot the database didn’t return. That keeps every booking action traceable back to a specific query.
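
One way to picture a deterministic booking flow is as a small state machine whose transitions accept only values the data layer returned. The state names, the `step` function, and the "yes"/"no" confirmation convention below are illustrative assumptions, not the project's actual code.

```python
from enum import Enum


class State(Enum):
    CHOOSE_SERVICE = "choose_service"
    CHOOSE_SLOT = "choose_slot"
    CONFIRM = "confirm"
    BOOKED = "booked"


def step(state: State, user_choice: str, allowed: list[str]) -> State:
    """Advance the flow only when the choice is one the data layer returned.

    `allowed` holds the catalog services or open slots fetched for this turn.
    """
    if state is State.BOOKED:
        return state  # terminal state
    if state is State.CONFIRM:
        # Anything other than an explicit "yes" sends the guest back to slots.
        return State.BOOKED if user_choice == "yes" else State.CHOOSE_SLOT
    if user_choice not in allowed:
        return state  # unknown service or slot: stay put and re-prompt
    return State.CHOOSE_SLOT if state is State.CHOOSE_SERVICE else State.CONFIRM
```

In this sketch the model can influence only how `user_choice` is extracted from the guest's message; whether the flow advances is decided by membership in `allowed`.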

Deterministic routing first, model second

A regex-based intent router handles the common, well-formed requests — dates, times, service names, booking IDs — without a model call, and Claude (Haiku 4.5) is invoked for the genuinely conversational or ambiguous turns. Keeping the model a fallback rather than the front door means lower latency and predictable cost on the hot paths.
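
A minimal sketch of regex-first routing, assuming hypothetical patterns and a made-up `BK-1234` booking ID format (the source does not show the real rules). `route` returns a structured intent when a pattern matches and `None` otherwise, which is the caller's signal to fall back to the model.

```python
import re
from typing import Optional

# Hypothetical patterns, checked in order; the first match wins.
_PATTERNS = [
    ("cancel", re.compile(r"\bcancel\b.*\b(?P<booking_id>BK-\d+)\b", re.I)),
    ("reschedule", re.compile(r"\breschedule\b.*\b(?P<booking_id>BK-\d+)\b", re.I)),
    ("check_availability", re.compile(r"\b(?P<date>\d{4}-\d{2}-\d{2})\b")),
]


def route(message: str) -> Optional[dict]:
    """Return an intent dict for well-formed requests, or None to defer."""
    for intent, pattern in _PATTERNS:
        match = pattern.search(message)
        if match:
            return {"intent": intent, **match.groupdict()}
    return None  # no pattern matched: hand the turn to the model
```

For example, `route("Please cancel BK-1042")` yields a `cancel` intent without any model call, while an open-ended message such as "I'm feeling tense, what do you recommend?" returns `None`.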

An agentic admin assistant — MCP-inspired

The staff assistant is built on Anthropic tool use, in a structure inspired by the Model Context Protocol: a clean separation between the model, a registry of typed tools, and the live data behind them — implemented as a focused in-app tool layer rather than a full MCP server. The tools — get_bookings_by_date, get_staff_roster, get_customer_history, get_trends, get_upcoming_bookings, and more — are typed data operations the model can call to answer a question, then summarize. The model decides which data to fetch; the tools guarantee it’s always real.
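
The separation between model, tool registry, and data might look roughly like the sketch below. The `tool` decorator, `TOOLS` registry, `dispatch` helper, and the in-memory `_BOOKINGS` list are assumptions for illustration; only the `name` / `description` / `input_schema` shape of a tool definition follows Anthropic's tool-use format, and the real tools would query DynamoDB.

```python
from typing import Any, Callable

TOOLS: dict[str, dict[str, Any]] = {}


def tool(name: str, description: str, input_schema: dict) -> Callable:
    """Register a function together with the schema the model is shown."""
    def decorator(fn: Callable) -> Callable:
        TOOLS[name] = {
            "spec": {"name": name, "description": description,
                     "input_schema": input_schema},
            "handler": fn,
        }
        return fn
    return decorator


# Stand-in data for the example; not real bookings.
_BOOKINGS = [
    {"id": "b1", "date": "2025-03-04", "service": "Deep-tissue massage"},
    {"id": "b2", "date": "2025-03-05", "service": "Facial"},
]


@tool(
    name="get_bookings_by_date",
    description="List bookings on a given ISO date (YYYY-MM-DD).",
    input_schema={
        "type": "object",
        "properties": {"date": {"type": "string"}},
        "required": ["date"],
    },
)
def get_bookings_by_date(date: str) -> list[dict]:
    return [b for b in _BOOKINGS if b["date"] == date]


def tool_specs() -> list[dict]:
    """Tool definitions to send to the model alongside the conversation."""
    return [entry["spec"] for entry in TOOLS.values()]


def dispatch(name: str, tool_input: dict) -> Any:
    """Run the tool the model asked for; unknown names are rejected."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name]["handler"](**tool_input)
```

When the model responds with a tool-use request, the application would call `dispatch` with the requested name and input, then return the result to the model for summarizing. The model never touches the data store directly.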

Engineering decisions & trade-offs

Decide when not to use the LLM

The most reliable features use the least AI. Anything with an exact answer (availability, trend counts, a guest’s history) is a deterministic function reading live data, which is faster, cheaper, and more predictable than asking a model. The LLM is reserved for language understanding and phrasing, where it’s genuinely the right tool.
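
To make the point concrete, a trend count can be an ordinary function over booking records. The `bookings_per_day` helper and the record shape (a dict with an ISO `date` string) are assumptions for this sketch; in the real system the records would come from a DynamoDB query.

```python
from collections import Counter
from datetime import date, timedelta


def bookings_per_day(bookings: list[dict], start: date, days: int) -> dict[str, int]:
    """Count bookings for each day in the window, including days with none."""
    counts = Counter(b["date"] for b in bookings)
    window = [(start + timedelta(days=i)).isoformat() for i in range(days)]
    return {day: counts.get(day, 0) for day in window}
```

The same input always produces the same counts, so the model's only remaining job is to describe the numbers in a sentence.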

Working within — then beyond — AWS Academy

PureZen began as a course project inside an AWS Academy sandbox, with no GPU instances, a capped budget, and short-lived session credentials. Within those limits it was built as a textbook three-tier AWS architecture: a custom VPC in us-east-1 across two Availability Zones (four subnets), an Application Load Balancer and a bastion host in the public subnets, and an EC2 Auto Scaling Group in the private subnets (no public IPs). Each backend instance ran FastAPI under uvicorn on port 8000 alongside a self-hosted Ollama model (llama3.2:3b) on an r7a.large (2 vCPU, no GPU). Inference was CPU-only, so generation took 10–40 seconds, which is exactly what motivated the regex-first routing. DynamoDB (point-in-time recovery, GSIs, conditional writes) held the data and was accessed through IAM instance profiles with no credentials in code, while CloudWatch and SNS handled observability and alerts.

Before: the AWS Academy architecture, three-tier on EC2 (architecture diagram)

Browser: guest & admin UI
(HTTP :80)
VPC · us-east-1 · 2 Availability Zones
Public subnets (AZ-a / AZ-b)
Application Load Balancer: HTTP :80 · health checks
Bastion host: EC2 · SSH :22
SG: ALB :80 → app :8000 · SSH :22 from bastion
Private subnets (AZ-a / AZ-b) · no public IPs
EC2 Auto Scaling Group: FastAPI · uvicorn :8000
Ollama (self-hosted): llama3.2:3b · r7a.large CPU
IAM instance profile · SDK
DynamoDB: PITR 35d · GSIs · conditional writes
S3 + CloudWatch + SNS: static assets · logs · alerts

AWS Academy limits: no GPU · capped budget · session credentials · CPU inference 10–40s

Rebuilding outside the Academy constraints replaced the EC2-and-Ollama tier entirely: the conversational layer moved to Anthropic's Claude, which removed the inference bottleneck and unlocked the fully serverless backend the project runs on today (API Gateway and a containerized Lambda, with nothing to keep warm). See the current architecture above.

Infrastructure as code with CDK

An early environment loss — when the system existed only in its running state — made reproducibility a first-class concern. The stack is now declared in AWS CDK, so the entire environment (compute, gateway, CDN, data, IAM, and frontend deploy with cache invalidation) rebuilds from code rather than memory.

Stack at a glance

Layer	Tech
Frontend	Vanilla HTML / JS / CSS on S3, served via CloudFront
API	Amazon API Gateway (REST) → AWS Lambda
Backend	Python · FastAPI · Mangum, containerized Lambda (Python 3.11)
Data	Amazon DynamoDB — bookings, services, staff, availability, customers, sessions
AI	Anthropic Claude (Haiku 4.5) — grounded concierge + MCP-inspired tool-using admin agent
Auth	bcrypt password hashing, session-scoped guest & admin access
Infra	AWS CDK (TypeScript), least-privilege IAM, CloudWatch logs & metrics

My role: sole architect and engineer, covering the serverless architecture (API Gateway, Lambda, S3, CloudFront), DynamoDB data modeling, the FastAPI backend, the grounded concierge and the agentic admin assistant on Anthropic's Claude, the frontend, and the AWS CDK deployment. (This was a team course, but I was the only person with access to the AWS environment, and I designed and built all of the architecture and code.)
