The challenge
A fast-growing B2B SaaS was hitting a wall. Their Python/FastAPI API powered dashboards for thousands of seat-based accounts, and the read path had grown into a tangle of N+1 ORM queries, repeated permission checks, and per-request feature-flag lookups. p95 latency on the busiest endpoints had drifted from 380ms at launch to 1.8s, RDS CPU was pinned at 80% during business hours, and the engineering team was already pricing a vertical RDS upgrade that would have added ~$3,400/mo to the bill.
Worse, the slowdown was invisible to half the customer base because the dashboard rendered progressively; by the time customers complained, the team had already lost an enterprise renewal over "the dashboard feels broken." They needed a fix in weeks, not a six-month re-platform.
Our solution
We dropped a disciplined, three-tier Redis caching layer in front of the hottest read paths and reshaped the data model so it could be cached safely. The investment went into cache key design, invalidation contracts, and observability, not into throwing memory at the problem.
Tier 1: per-request memoization inside the FastAPI dependency container, killing duplicate lookups within a single request.
Tier 2: a shared Redis 7 cluster (cluster-mode, AWS ElastiCache) holding hot reads — permission sets, feature flags, account metadata, dashboard aggregates — with explicit TTLs and a typed Pydantic envelope so cached payloads are versioned and safe to evolve.
Tier 3: an out-of-band Celery worker that pre-warms the most-requested aggregates immediately after writes, so the next user request is already a cache hit.
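The three tiers compose as a single read-through path. Here is a minimal, dependency-free sketch of that flow: an in-memory dict stands in for the ElastiCache cluster, and the function names (`cached_read`, `prewarm`) are illustrative, not the actual module API.

```python
import time
from typing import Any, Callable

# Tier 2 stand-in: a shared TTL'd store. In production this is a Redis
# cluster and these entries are SET with an explicit EX/TTL.
_shared_cache: dict[str, tuple[float, Any]] = {}

def cached_read(key: str, ttl: float, loader: Callable[[], Any],
                request_memo: dict[str, Any]) -> Any:
    """Read-through: request memo first, shared store second, DB last."""
    if key in request_memo:                     # Tier 1 hit (per-request)
        return request_memo[key]
    entry = _shared_cache.get(key)
    if entry and entry[0] > time.monotonic():   # Tier 2 hit, not expired
        request_memo[key] = entry[1]
        return entry[1]
    value = loader()                            # miss: hit the database
    _shared_cache[key] = (time.monotonic() + ttl, value)
    request_memo[key] = value
    return value

def prewarm(key: str, ttl: float, loader: Callable[[], Any]) -> None:
    """Tier 3: invoked from a worker right after a write, so the next
    user request finds the aggregate already cached."""
    _shared_cache[key] = (time.monotonic() + ttl, loader())
```

In the real system the request memo lives in a FastAPI dependency scoped to the request, tier 2 is `GET`/`SET ... EX` against Redis, and `prewarm` runs as a Celery task triggered by the write path.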
Every cache key is namespaced by tenant + entity + version, every read records a hit/miss to Datadog, and every write goes through a single invalidation module so a future engineer can't silently bypass it.
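The key scheme and the single-funnel invalidation rule look roughly like this; the `SCHEMA_VERSION` constant, entity names, and class name are illustrative stand-ins, and a plain dict again substitutes for the Redis client.

```python
SCHEMA_VERSION = 3  # bumped whenever the cached payload shape changes

def cache_key(tenant_id: str, entity: str, entity_id: str) -> str:
    """tenant + entity + version namespacing: a deploy that bumps the
    schema version simply stops reading old-shape keys."""
    return f"t:{tenant_id}:{entity}:v{SCHEMA_VERSION}:{entity_id}"

class Invalidator:
    """The single choke point: write paths call invalidate(); nothing
    else in the codebase is allowed to delete cache keys directly."""

    def __init__(self, store: dict) -> None:
        self.store = store  # stand-in for a Redis client

    def invalidate(self, tenant_id: str, entity: str, entity_id: str) -> None:
        self.store.pop(cache_key(tenant_id, entity, entity_id), None)
```

Because every key embeds the tenant, one tenant's invalidation can never evict another tenant's data, and the version segment means a bad deploy can be rolled back without serving stale-shaped payloads.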
- Three-tier caching: in-process memoization, Redis cluster, and pre-warming Celery workers
- Typed Pydantic cache envelopes with explicit version field for safe schema evolution
- Namespaced keys (tenant + entity + version) so deploys never serve mixed-shape data
- Single invalidation module — every write path funnels through it; no silent bypass
- Shadow-read rollout with cache-vs-DB diffing on 10% of live traffic before cutover
- Datadog dashboards for hit rate, p95, eviction rate, and Redis memory headroom
- PagerDuty alerts on hit-rate drop, eviction spikes, and Redis primary failover
- k6 load tests reproducing 3x peak traffic, with a written capacity model