Taming 50 Million Callbacks with Event-Driven Architecture
A legacy .NET HttpHandler buried inside the customer portal was processing webhook callbacks synchronously — and at 20M+ messages a month, vendor retry storms inflated that to 75 million callbacks with 90-second processing latency. We replaced it with an Azure Function that acknowledges in milliseconds and routes to channel-isolated processors via Service Bus, dropping latency to sub-second and eliminating the retry cascade entirely.
Key Tradeoffs
- Speed over completeness. Acknowledge first, process later.
- Isolation over simplicity.
- Serverless cost for predictable scale.
- Standardization over flexibility.
- Independent deploys, coordinated schemas.
What happened
The platform was a multi-channel communications SaaS — customers composed HTML messages and broadcast them as email, SMS, voice, or any combination across their contact base.
- 20M+ SMS / month
- 5M+ emails / month
- Thousands of voice minutes / month
Customers reported delivery results were missing or delayed by hours. Outbound delivery was confirmed working — the bottleneck was the callback webhook handler ingesting delivery receipts, status updates, and opt-outs from vendors.
Root Cause
The callback processor was a .NET Framework HttpHandler baked into the main portal's IIS app pool. Application Insights showed ~75M callbacks/month — far exceeding actual message volume — with 90–120 second processing latency at peak. The excess was vendor retries: every unacknowledged callback triggered exponential backoff re-sends, creating a feedback loop that worsened under load.
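The feedback loop yields to back-of-envelope arithmetic. A minimal sketch, assuming a simplified model where each callback attempt independently goes unacknowledged with probability p before the vendor's timeout and is then re-sent (the probability value is illustrative, chosen to match the observed ~3x gap between actual message volume and the ~75M callbacks):

```python
def expected_attempts(p_timeout: float) -> float:
    # Geometric series: an attempt is acknowledged with probability 1 - p,
    # so the expected sends per callback are 1 + p + p^2 + ... = 1 / (1 - p).
    return 1.0 / (1.0 - p_timeout)

# If roughly two-thirds of attempts time out before acknowledgment,
# every genuine callback turns into about three inbound requests.
print(round(expected_attempts(2 / 3), 2))  # → 3.0
```

The model also shows why the loop worsens under load: slower processing raises p, which raises volume, which slows processing further.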
How it was addressed
This wasn't a code problem — it was a missing architectural boundary. A synchronous handler embedded in a portal cannot absorb millions of async vendor callbacks. We defined three constraints for the replacement:
Decouple ingestion from processing. Vendors only need an HTTP 200 — acknowledge receipt instantly, process later.
Eliminate the retry cascade. Fast acknowledgment kills retries at the source. Handle the rest with idempotency.
Isolate failure domains. Email processor failures shouldn't impact SMS status tracking. Each channel needs independent scaling and failure boundaries.
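The second constraint hinges on idempotency: once acknowledgment is fast, any retries that still arrive must be harmless. A minimal sketch, assuming each vendor callback carries a stable message id and status (field names and the in-memory store are illustrative, not the production schema):

```python
def make_dedup_key(vendor: str, payload: dict) -> str:
    # A vendor retry re-sends the same (vendor, message id, status) triple,
    # so that triple serves as a natural deduplication key.
    return f"{vendor}:{payload['message_id']}:{payload['status']}"

class CallbackProcessor:
    def __init__(self):
        # Stand-in for a durable store keyed on the dedup key.
        self._seen: set = set()
        self.processed: list = []

    def handle(self, vendor: str, payload: dict) -> bool:
        """Process a callback at most once; return False for duplicates."""
        key = make_dedup_key(vendor, payload)
        if key in self._seen:
            return False  # retry of an already-processed callback: no-op
        self._seen.add(key)
        self.processed.append(payload)
        return True
```

With this in place, a duplicate delivery costs one key lookup instead of a second round of business logic and storage writes.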
The solution
The redesign replaced the monolithic handler with a three-layer event-driven pipeline on Azure serverless infrastructure.
Ingestion: A lightweight Azure Function accepts webhooks, normalizes payloads into a generic envelope, and pushes to a Service Bus topic. Response time: milliseconds. The URL convention /incoming/{vendor}/{servicetype}/{*requesttype} made vendor onboarding a config change, not an infrastructure change.
Routing: The Service Bus topic fans out via subscriptions — SMS to SMS processors, email to email, voice to voice — each with independent retry policies and dead-letter queues.
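A minimal sketch of the ingestion step, assuming an illustrative envelope shape (the field names are not the production schema; publishing to the topic and returning HTTP 200 are elided):

```python
from datetime import datetime, timezone

def parse_route(path: str) -> dict:
    # Mirrors the /incoming/{vendor}/{servicetype}/{*requesttype} convention;
    # the catch-all requesttype segment may itself contain slashes.
    parts = path.strip("/").split("/")
    if parts[0] != "incoming" or len(parts) < 3:
        raise ValueError(f"unexpected callback path: {path}")
    return {
        "vendor": parts[1],
        "servicetype": parts[2],
        "requesttype": "/".join(parts[3:]),
    }

def to_envelope(path: str, raw_body: str) -> dict:
    """Normalize an arbitrary vendor callback into a generic envelope."""
    envelope = parse_route(path)
    envelope["received_at"] = datetime.now(timezone.utc).isoformat()
    # The body stays opaque at this layer; only the channel processor
    # parses vendor-specific fields.
    envelope["payload"] = raw_body
    return envelope
```

Because the route alone identifies vendor and channel, onboarding a new vendor means handing out a new URL, not deploying new ingestion code.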
Processing: Vendor-scoped processor functions own the full event lifecycle — routing, business logic, and storage writes — with Azure SQL for transactional data, Cosmos DB for audit trails, and Azure Storage for overflow.
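The fan-out contract can be sketched in miniature, assuming the envelope's servicetype field doubles as the subscription filter key (handler names are illustrative; in production each handler is a separate function app consuming its own subscription, so the isolation is physical, not just logical):

```python
HANDLERS = {}

def processor(channel: str):
    """Register a channel-scoped processor, mirroring one Service Bus
    subscription per channel with its own retries and dead-letter queue."""
    def register(fn):
        HANDLERS[channel] = fn
        return fn
    return register

@processor("sms")
def handle_sms(envelope: dict) -> str:
    return f"sms:{envelope['requesttype']}"

@processor("email")
def handle_email(envelope: dict) -> str:
    return f"email:{envelope['requesttype']}"

def dispatch(envelope: dict) -> str:
    # Each channel sees only its own traffic: an exception or backlog in
    # one handler never delays another channel's messages.
    return HANDLERS[envelope["servicetype"]](envelope)
```

The same key that routes the message also scopes its failure domain, which is what makes per-channel scaling decisions possible.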
Result
Callback latency dropped from 90–120 seconds to sub-second. Inbound volume fell from ~75M/month to the expected range, confirming that the majority of the prior load was vendor retries. Customer-reported delays resolved completely.
1. Architectural debt compounds under load. The handler worked at lower volumes. Recognizing a boundary problem vs. a code problem determined the right investment.
2. Fast acknowledgment is a force multiplier. Decoupling receipt from processing eliminated the retry cascade and cut total system load by ~65%.
3. Isolation enables independent scaling. Per-channel processors turned capacity planning from an all-or-nothing upgrade into a targeted decision.
Tradeoffs
Speed over completeness. Acknowledge first, process later — cutting response time from minutes to milliseconds eliminated the retry storm entirely.
Isolation over simplicity. More moving parts, but a failure in one channel no longer drags down every other channel with it.
Serverless cost for predictable scale. Trading free IIS capacity for consumption-based Azure infrastructure, but gaining the ability to absorb traffic spikes without manual intervention.
Standardization over flexibility. A generic message envelope simplifies routing at the cost of flattening vendor-specific nuance at the front door.
Independent deploys, coordinated schemas. Each processor ships on its own schedule, but envelope changes still require a synchronized rollout.
Related
From Busy Flag to Service Boundaries: Scaling a Monolithic Worker with Strangler Fig
A monolithic background worker processing compliance-critical events across six domains hit its scaling wall — but the real challenge wasn't technical. It was building the organizational confidence to decompose a system that couldn't go down, one boundary at a time.
Read more
Why I Don't Do Technical Interviews for Senior Engineers
By the time someone has a decade of experience on their resume, I already know they can code — and write correct code at that. What I don't know, and what actually determines whether they'll thrive on my team, is whether they're curious, passionate, and wired to build great things. That's what the interview is for.
Read more