A full-stack spa booking platform on AWS — a natural-language concierge that books appointments for guests, and an operations console with an AI assistant for staff, on a fully serverless backend.
Spa booking is a transactional domain where wrong answers have real cost — a concierge that invents availability or double-books is worse than no concierge at all. The goal: a natural-language booking experience that feels effortless for guests, but is grounded in live data and structurally unable to confirm a slot that doesn't exist.
PureZen runs the full lifecycle — browse services, check availability, book, reschedule, cancel, and look up history — through conversation, with a separate staff console for schedule management and operational insight.
A chat-first booking experience. Guests describe what they want in plain language (“a deep-tissue massage next Tuesday afternoon”); the system resolves the service against the real catalog, checks live availability, and walks them through booking, rescheduling, cancelling, or reviewing past appointments — with conversation state persisted per session.
A staff portal with dashboards for daily overview, schedule, guests, users, and analytics — plus an AI assistant that answers operational questions by querying live data (“how many bookings tomorrow?”, “what are this week’s trends?”, “show me this guest’s history”). Each answer is produced from a real database query, not the model’s memory.
A fully serverless design on AWS — no servers to patch, automatic scaling, and pay-per-request economics that scale to zero when idle. The static frontend is delivered from S3 via CloudFront; API traffic flows through API Gateway to a containerized Lambda.
Now — serverless on AWS
The concierge never answers from training knowledge. Services are resolved against the real catalog, availability and bookings come straight from DynamoDB, and the booking, reschedule, and cancel flows are deterministic state machines. Claude’s job is to understand intent and phrase the response; it is structurally unable to surface a slot the database didn’t return. That keeps every booking action traceable back to a specific query.
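To make the grounding idea concrete, here is a minimal sketch of a booking flow written as a deterministic state machine. The `BookingFlow` class, the in-memory catalog, and the `fetch_slots` callable (a stand-in for the live DynamoDB availability query) are all hypothetical names chosen for illustration, not PureZen's actual code. The point is only that a slot can be confirmed if, and only if, the availability query returned it.

```python
from enum import Enum, auto
from typing import Callable, Iterable, Optional


class State(Enum):
    CHOOSE_SERVICE = auto()
    CHOOSE_SLOT = auto()
    CONFIRM = auto()
    BOOKED = auto()


class BookingFlow:
    """Hypothetical booking flow: every transition is fixed in code."""

    def __init__(self, catalog: set, fetch_slots: Callable[[str], Iterable[str]]):
        self.catalog = catalog          # service names from the real catalog
        self.fetch_slots = fetch_slots  # stand-in for the live availability query
        self.state = State.CHOOSE_SERVICE
        self.service: Optional[str] = None
        self.offered: list = []
        self.slot: Optional[str] = None

    def _expect(self, state: State) -> None:
        # Reject any step that is attempted out of order.
        if self.state is not state:
            raise RuntimeError(f"expected {state.name}, currently in {self.state.name}")

    def choose_service(self, name: str) -> list:
        self._expect(State.CHOOSE_SERVICE)
        if name not in self.catalog:
            raise ValueError(f"unknown service: {name}")
        self.service = name
        # The only slots that ever enter the flow come from this query.
        self.offered = list(self.fetch_slots(name))
        self.state = State.CHOOSE_SLOT
        return self.offered

    def choose_slot(self, slot: str) -> None:
        self._expect(State.CHOOSE_SLOT)
        if slot not in self.offered:
            raise ValueError("slot was not returned by the availability query")
        self.slot = slot
        self.state = State.CONFIRM

    def confirm(self) -> dict:
        self._expect(State.CONFIRM)
        self.state = State.BOOKED
        return {"service": self.service, "slot": self.slot}
```

In a design like this, the model can suggest which method to call next, but it has no path to a confirmed booking that bypasses `choose_slot`, so an invented time is rejected before anything is written.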
A regex-based intent router handles the common, well-formed requests (dates, times, service names, booking IDs) without a model call, and Claude (Haiku 4.5) is invoked only for the genuinely conversational or ambiguous turns. Keeping the model as a fallback rather than the front door means lower latency and predictable cost on the hot paths.
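The regex-first routing described above can be sketched roughly as follows. The patterns, the `BK-` booking ID format, the `Route` type, and the `llm_fallback` callable are assumptions made for this example; the real router's rules are not shown in this write-up. The fallback is passed in as a plain function so the sketch runs without any model call.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical slot patterns (the ID and date formats are assumed).
SLOT_PATTERNS = {
    "booking_id": re.compile(r"\b(BK-\d{4,})\b", re.IGNORECASE),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "time": re.compile(r"\b(\d{1,2}:\d{2})\b"),
}

# Checked in order, so "cancel" wins over "book" if both appear.
INTENT_PATTERNS = [
    ("cancel", re.compile(r"\bcancel\b", re.IGNORECASE)),
    ("reschedule", re.compile(r"\b(reschedule|move)\b", re.IGNORECASE)),
    ("book", re.compile(r"\b(book|reserve)\b", re.IGNORECASE)),
]


@dataclass
class Route:
    intent: str
    slots: dict = field(default_factory=dict)
    used_model: bool = False


def route(text: str, llm_fallback: Callable[[str], Route]) -> Route:
    """Resolve well-formed requests with regexes; defer the rest to the model."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[name] = match.group(1)

    # Only take the fast path when both an intent and at least one slot are clear.
    if slots:
        for intent, pattern in INTENT_PATTERNS:
            if pattern.search(text):
                return Route(intent, slots)

    # Ambiguous or conversational turn: hand off to the model.
    return llm_fallback(text)
```

A request such as "Cancel booking BK-10423" is resolved entirely by the patterns, while something like "something relaxing for my mum maybe?" matches nothing and goes to the fallback.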
The staff assistant is built on Anthropic tool use, in a structure inspired by the Model Context Protocol: a clean separation between the model, a registry of typed tools, and the live data behind them — implemented as a focused in-app tool layer rather than a full MCP server. The tools — get_bookings_by_date, get_staff_roster, get_customer_history, get_trends, get_upcoming_bookings, and more — are typed data operations the model can call to answer a question, then summarize. The model decides which data to fetch; the tools guarantee it’s always real.
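As an illustration of that separation, the sketch below shows one way a small in-app tool registry could look. The `tool` decorator, the `TOOLS` dictionary, the sample `BOOKINGS` rows (an in-memory stand-in for DynamoDB), and the schema for `get_bookings_by_date` are assumptions for this example. Only the general shapes are taken from Anthropic tool use: a tool definition with `name`, `description`, and `input_schema`, a `tool_use` block from the model, and a `tool_result` block sent back. No SDK call is made here.

```python
import json
from typing import Any, Callable

# In-memory stand-in for the live bookings table (hypothetical rows).
BOOKINGS = [
    {"booking_id": "BK-1001", "date": "2025-03-04", "service": "Deep Tissue Massage"},
    {"booking_id": "BK-1002", "date": "2025-03-04", "service": "Facial"},
    {"booking_id": "BK-1003", "date": "2025-03-05", "service": "Facial"},
]

# Registry mapping tool name -> callable plus the spec advertised to the model.
TOOLS: dict[str, dict[str, Any]] = {}


def tool(name: str, description: str, input_schema: dict) -> Callable:
    """Register a function as a typed tool."""
    def register(fn: Callable) -> Callable:
        spec = {"name": name, "description": description, "input_schema": input_schema}
        TOOLS[name] = {"fn": fn, "spec": spec}
        return fn
    return register


@tool(
    name="get_bookings_by_date",
    description="Return all bookings on a given ISO date (YYYY-MM-DD).",
    input_schema={
        "type": "object",
        "properties": {"date": {"type": "string"}},
        "required": ["date"],
    },
)
def get_bookings_by_date(date: str) -> list[dict]:
    return [b for b in BOOKINGS if b["date"] == date]


def tool_specs() -> list[dict]:
    """Tool definitions to pass to the model alongside the conversation."""
    return [entry["spec"] for entry in TOOLS.values()]


def run_tool_use(block: dict) -> dict:
    """Execute one tool_use block and wrap the output as a tool_result block."""
    entry = TOOLS.get(block["name"])
    if entry is None:
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"Unknown tool: {block['name']}",
            "is_error": True,
        }
    result = entry["fn"](**block["input"])
    return {"type": "tool_result", "tool_use_id": block["id"], "content": json.dumps(result)}
```

The model only ever sees `tool_specs()` and the serialized results, so whatever it summarizes is limited to rows the tool functions actually returned.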
The most reliable features used the least AI. Anything with an exact answer — availability, trend counts, a guest’s history — is a deterministic function reading live data, which is faster, cheaper, and more predictable than asking a model. The LLM is reserved for language understanding and phrasing, where it’s genuinely the right tool.
PureZen began as a course project inside an AWS Academy sandbox, with no GPU instances, a capped budget, and short-lived session credentials. Within those limits it was built as a textbook three-tier AWS architecture: a custom VPC in us-east-1 across two Availability Zones (four subnets), an Application Load Balancer and a bastion host in the public subnets, and an EC2 Auto Scaling Group in the private subnets (no public IPs). Each backend instance ran FastAPI under uvicorn on port 8000 alongside a self-hosted Ollama model (llama3.2:3b) on an r7a.large (2 vCPU, no GPU). Inference was CPU-only, so generation took 10–40 seconds, which is exactly what motivated the regex-first routing. DynamoDB (point-in-time recovery, GSIs, conditional writes) held the data and was accessed through IAM instance profiles with no credentials in code, while CloudWatch and SNS handled observability and alerts.
Before — the AWS Academy architecture (three-tier on EC2)
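The conditional writes mentioned for DynamoDB are the usual way to stop two guests from claiming the same slot. The sketch below shows the idea under stated assumptions: the table name, the `staff_id` / `start` key schema, and the helper names are hypothetical, and `client` is expected to behave like a boto3 DynamoDB client, whose `put_item` accepts a `ConditionExpression` and raises an error with code `ConditionalCheckFailedException` when the condition fails.

```python
from typing import Any


def build_claim_request(table: str, staff_id: str, start: str, booking_id: str) -> dict:
    """Build put_item arguments that succeed only if the slot is unclaimed."""
    return {
        "TableName": table,
        "Item": {
            "staff_id": {"S": staff_id},      # hypothetical partition key
            "start": {"S": start},            # hypothetical sort key
            "booking_id": {"S": booking_id},
        },
        # Fails if an item with this exact primary key already exists.
        "ConditionExpression": "attribute_not_exists(staff_id)",
    }


def claim_slot(client: Any, table: str, staff_id: str, start: str, booking_id: str) -> bool:
    """Return True if the slot was claimed, False if someone else holds it."""
    try:
        client.put_item(**build_claim_request(table, staff_id, start, booking_id))
    except Exception as exc:
        # botocore's ClientError carries the service error code in exc.response.
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "ConditionalCheckFailedException":
            return False
        raise
    return True
```

Because the existence check and the write happen in one request, two concurrent bookings for the same slot cannot both succeed; the loser gets `False` and can be offered another time.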
Rebuilding outside the Academy constraints replaced that EC2 tier entirely. The conversational layer moved to Anthropic's Claude, which removed the CPU inference bottleneck and unlocked the fully serverless backend the project runs on today: API Gateway and a containerized Lambda, with nothing to keep warm. (See the current architecture above.)
An early environment loss — when the system existed only in its running state — made reproducibility a first-class concern. The stack is now declared in AWS CDK, so the entire environment (compute, gateway, CDN, data, IAM, and frontend deploy with cache invalidation) rebuilds from code rather than memory.
| Layer | Tech |
|---|---|
| Frontend | Vanilla HTML / JS / CSS on S3, served via CloudFront |
| API | Amazon API Gateway (REST) → AWS Lambda |
| Backend | Python · FastAPI · Mangum, containerized Lambda (Python 3.11) |
| Data | Amazon DynamoDB — bookings, services, staff, availability, customers, sessions |
| AI | Anthropic Claude (Haiku 4.5) — grounded concierge + MCP-inspired tool-using admin agent |
| Auth | bcrypt password hashing, session-scoped guest & admin access |
| Infra | AWS CDK (TypeScript), least-privilege IAM, CloudWatch logs & metrics |