
Labs

Choose a lab and navigate the decisions that shaped the outcome.

These are real decisions I've navigated — not textbook scenarios. Walk through the trade-offs, make the calls, and see how the outcomes unfold.

Architecture | Incident Response | Featured

Production Pods Were Restarting Randomly

A production incident involving intermittent connection failures and pod restarts under normal traffic patterns.

Architecture | Performance | Featured

GraphQL Performance Was Deteriorating

API response times climbing steadily under normal load. The database is getting blamed. Infrastructure spend is on the table.

Leadership | Security | Featured

Security Vulnerabilities Were Accumulating in Our GraphQL Stack

Active CVEs in a production compliance platform. Audit scheduled. Limited team capacity. Every new feature built on a deprecated foundation.

Leadership | Migration | Featured

The Cloud Migration That Almost Broke Our Export Service

3 weeks to migrate file storage from AWS S3 to Azure. The codebase has a storage abstraction layer. It looks clean. It isn't.

Architecture | architecture | microservices | strangler-fig | scaling

Scaling Crisis: Your Monolithic Worker Has Hit the Wall

Six critical workloads share one process, one deployment, and one busy flag — and customers are feeling the pain.

Leadership | stakeholder-management | migration | legacy-systems | team-dynamics

How do you handle a lead developer's resistance to migrating their legacy system?

The technical migration plan is solid, but the person who built what you're replacing isn't on board.

Architecture | observability | alerting | incident-response

We Had 400 Alerts and Missed the One That Mattered

A memory leak alert fired 90 minutes before the database OOM'd, but was buried in 400 weekly alerts.

Architecture | performance | caching | debugging | profiling

We Built a Cache That Made the System Slower

A team added Redis caching to speed up a slow API endpoint, but response times got worse.

Architecture | database | migration | downtime | architecture

A Database Migration Took Down the Entire Platform

A routine schema migration brought down a multi-tenant SaaS platform for 47 minutes during business hours.

Architecture | incident-response | dependency-management | financial-services

A Minor Dependency Update Broke Production for 12 Hours

A semver-compliant patch update silently corrupted financial reports through changed locale handling.

Architecture | incident-response | feature-flags | data-corruption

A Feature Flag We Forgot About Caused a Production Incident

A stale flag's default value routes financial transactions through deprecated code, corrupting data for 6 hours.

Labs | John Tolar (JT)