
A Database Migration Took Down the Entire Platform

A routine schema migration brought down a multi-tenant SaaS platform for 47 minutes during business hours.

Situation

You're the senior engineer at a B2B SaaS company. A new feature requires adding a non-nullable column with a default value to your largest Postgres table, which holds 80 million rows. The migration works perfectly in staging, where the table has only 50,000 rows.

Stakes

  • Multi-tenant platform serving 200+ enterprise customers
  • Schema change required on the largest table in the system
  • SLA breach threshold is 30 minutes of downtime per quarter

Production deployment is scheduled for Tuesday morning. The migration worked perfectly in staging. What's your deployment strategy?
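
One strategy worth weighing is to split the change into small steps instead of a single ALTER TABLE. The sketch below is only an illustration under stated assumptions, not the approach taken in this scenario. The table name `events`, the column `status`, its default, and a dense integer primary key `id` are all hypothetical, and the final step relies on Postgres 12 or later being able to skip the full-table scan for SET NOT NULL when a validated CHECK constraint already rules out NULLs. The function only generates the SQL statements; each one would be run in its own transaction.

```python
from typing import Iterator

# Hypothetical names for illustration only; the scenario does not name them.
TABLE = "events"
COLUMN = "status"
DEFAULT = "'active'"
CONSTRAINT = f"{TABLE}_{COLUMN}_not_null"


def staged_migration(max_id: int, batch_size: int = 10_000) -> Iterator[str]:
    """Yield SQL statements for a staged add-column migration.

    Assumes an integer primary key `id` running from 1 to max_id.
    """
    # Step 1: add the column as nullable with no default (a catalog-only change).
    yield f"ALTER TABLE {TABLE} ADD COLUMN {COLUMN} text;"
    # Step 2: set the default so newly inserted rows get a value.
    yield f"ALTER TABLE {TABLE} ALTER COLUMN {COLUMN} SET DEFAULT {DEFAULT};"
    # Step 3: backfill existing rows in small id ranges, so that each
    # transaction touches at most batch_size rows.
    for start in range(1, max_id + 1, batch_size):
        end = min(start + batch_size - 1, max_id)
        yield (
            f"UPDATE {TABLE} SET {COLUMN} = {DEFAULT} "
            f"WHERE id BETWEEN {start} AND {end} AND {COLUMN} IS NULL;"
        )
    # Step 4: add the check without scanning existing rows, then validate it
    # in a separate statement.
    yield (
        f"ALTER TABLE {TABLE} ADD CONSTRAINT {CONSTRAINT} "
        f"CHECK ({COLUMN} IS NOT NULL) NOT VALID;"
    )
    yield f"ALTER TABLE {TABLE} VALIDATE CONSTRAINT {CONSTRAINT};"
    # Step 5: promote to a real NOT NULL column.
    yield f"ALTER TABLE {TABLE} ALTER COLUMN {COLUMN} SET NOT NULL;"


if __name__ == "__main__":
    # Small demo: 25 rows in batches of 10 produces three backfill statements.
    for statement in staged_migration(max_id=25, batch_size=10):
        print(statement)
```

With 80 million rows and the default batch size of 10,000, the backfill becomes 8,000 short UPDATE statements rather than one long-running operation. That makes it possible to pause between batches or stop the backfill entirely if production load climbs. Real tables often have gaps in their id sequence, so some batches will touch fewer rows than the batch size suggests.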