The Cloud Migration That Almost Broke Our Export Service
We had three weeks to migrate file storage from AWS S3 to Azure before our AWS contract renewed. The codebase had a clean storage abstraction built for exactly this scenario. We almost shipped without checking whether everything actually used it. Three constraints raised the stakes:
- Hard deadline — AWS contract renewal in three weeks with no extension option
- Customer-facing file exports silently broken if the migration went wrong
- No equivalent staging environment to validate Azure behavior before go-live
File exports weren't a background feature — customers downloaded invoices, reports, and audit records through them. A broken export that fails silently doesn't produce an error page. It produces a missing file when someone needs it most.
The Scenario
You're the lead engineer at a B2B SaaS company. You have three weeks to migrate file storage from AWS S3 to Azure before your AWS contract auto-renews at a rate leadership won't approve. The codebase has a storage abstraction layer — an enum-routed system designed years ago with exactly this kind of migration in mind. It looks clean. How do you approach it?
No hints. Just judgment.
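For concreteness, an enum-routed storage abstraction of the kind described might look something like this. This is a hypothetical sketch, not the actual codebase; `StorageBackend`, `S3Store`, and `AzureBlobStore` are illustrative names, and the stores are stand-ins for real SDK wrappers:

```python
from enum import Enum


class StorageBackend(Enum):
    S3 = "s3"
    AZURE_BLOB = "azure_blob"


class S3Store:
    """Stand-in for the real S3-backed implementation (would wrap boto3)."""

    def get(self, key):
        return f"s3:{key}"


class AzureBlobStore:
    """Stand-in for the Azure Blob implementation added during the migration."""

    def get(self, key):
        return f"azure:{key}"


_BACKENDS = {
    StorageBackend.S3: S3Store(),
    StorageBackend.AZURE_BLOB: AzureBlobStore(),
}


def storage_for(backend: StorageBackend):
    """Every caller is supposed to route through this, never to a concrete client."""
    return _BACKENDS[backend]
```

The design intent is that swapping backends means changing one enum value at the routing layer. That intent only holds if no caller imports a concrete client directly, which is exactly the assumption worth auditing.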
Most engineers start a cloud storage migration by updating the abstraction layer and deploying — it feels clean because the abstraction was designed for exactly this. But that collapses two separate risks into one deployment: whether all traffic actually flows through the abstraction, and whether the new backend behaves identically under real load. An audit addresses the first. A hard cutover leaves you exposed on the second. Both need to be solved in the design before you write a line of migration code.
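The coverage audit can be largely mechanized: scan the tree for direct cloud-SDK imports outside the abstraction module. A minimal sketch, assuming a Python codebase where only a hypothetical `storage/backends.py` is allowed to touch boto3:

```python
import re
from pathlib import Path

# Direct-SDK imports that should only ever appear inside the abstraction layer.
DIRECT_SDK = re.compile(r"^\s*(import boto3|from boto3|import botocore)", re.MULTILINE)

# Hypothetical path of the one module allowed to use the SDK directly.
ALLOWED = {"storage/backends.py"}


def find_bypasses(root: Path):
    """Return source files that import the cloud SDK without going through
    the abstraction — the exact traffic a backend swap would silently miss."""
    hits = []
    for path in root.rglob("*.py"):
        rel = path.relative_to(root).as_posix()
        if rel in ALLOWED:
            continue
        if DIRECT_SDK.search(path.read_text()):
            hits.append(rel)
    return sorted(hits)
```

A grep is not proof of coverage (dynamic imports and vendored helpers can still slip through), but it turns "we believe everything uses the abstraction" into a checkable list, which is how a bypassing export service gets caught before cutover rather than after.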
The lessons that generalize:

- Abstraction layers only protect what actually flows through them — audit before any migration
- Design for parallel backends from the start, not as a contingency after problems appear
- Rollback should be a config change throughout the migration, not a code revert
- Silent failures in customer-facing features are discovered by customers, not monitoring
- Two risks in one deployment doubles your exposure — solve coverage and behavioral parity separately
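The parallel-backend and config-change-rollback lessons can be sketched together. This is an illustrative pattern under stated assumptions, not the actual implementation: `StorageRouter` and `DictStore` are hypothetical names, and the mutable `config` dict stands in for a runtime config service:

```python
class StorageRouter:
    """Route reads by runtime config so rollback is a config flip, not a
    redeploy. During the transition window, a miss on the new backend falls
    back to the old one, so a half-copied bucket never means a missing file."""

    def __init__(self, backends, config):
        self.backends = backends  # backend name -> client
        self.config = config      # mutable runtime config (e.g. from a config service)

    def get(self, key):
        primary = self.backends[self.config["primary"]]
        blob = primary.get(key)
        if blob is None and self.config.get("fallback"):
            # The object may not have been copied to the new backend yet.
            blob = self.backends[self.config["fallback"]].get(key)
        return blob


class DictStore:
    """Toy in-memory client standing in for a real S3 or Azure SDK wrapper."""

    def __init__(self, data):
        self.data = data

    def get(self, key):
        return self.data.get(key)
```

With this shape, cutover is `config["primary"] = "azure"` and rollback is the same line in reverse; neither touches code, and both backends stay live until the fallback is retired.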
The outcome:

- Export service bypass caught before migration — zero customer-facing file failures
- Migration completed on deadline with no production incidents
- Parallel backend pattern adopted as the standard for future infrastructure migrations
- Rollback capability maintained throughout — cutover was a config change, not a deployment event
Related
A Database Migration Took Down the Entire Platform
A routine schema migration brought down a multi-tenant SaaS platform for 47 minutes during business hours. The migration itself was correct. The deployment strategy was the failure.
We Split the Monolith and Made Everything Worse
A team extracted a billing service from a monolith to improve deploy velocity. Deploys got faster. Everything else got slower, harder to debug, and more fragile. The architecture was right. The boundary was wrong.