Duration: 2h 14m, Nov 24 8:08 AM UTC - 10:22 AM UTC
Severity: P2
Impact: Reduced message processing rates across US infrastructure
On November 24th, our message rendering service experienced degraded performance during a period of high traffic. The service, which renders your messages for delivery, encountered memory constraints that caused intermittent service restarts and slower processing rates for priority queues across our US infrastructure.
Date: November 24, 2025
Our message rendering service runs on infrastructure that auto-scales capacity based on workload. During this incident, sudden traffic spikes caused individual servers to consume memory faster than auto-scaling could add capacity. When servers reached their memory limits, they restarted automatically (as designed for resilience), but these rolling restarts reduced overall processing capacity at a time of peak demand, compounding the slowdown.
Immediate fix: We deployed updated code to our rendering service that better manages memory consumption during traffic bursts, preventing the cascade of restarts that degraded performance.
Why this works: The update implements more efficient memory allocation patterns and adds throttling mechanisms that prevent any single traffic burst from overwhelming individual servers, regardless of auto-scaling speed.
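For readers who want a concrete picture of per-node throttling, the following is a minimal sketch and not our production code. It assumes each render job carries a rough memory estimate; the class name `MemoryBudgetThrottle` and its parameters are hypothetical and chosen only for illustration. A node admits a job only if the estimate fits within a fixed memory budget, so a burst leaves excess work queued instead of pushing the node toward its memory limit.

```python
import threading


class MemoryBudgetThrottle:
    """Hypothetical per-node admission control based on estimated memory use."""

    def __init__(self, budget_bytes: int):
        self.budget_bytes = budget_bytes  # assumed per-node memory budget
        self.in_use = 0                   # sum of estimates for admitted jobs
        self._lock = threading.Lock()

    def try_acquire(self, estimated_bytes: int) -> bool:
        """Admit the job only if its estimate fits in the remaining budget."""
        with self._lock:
            if self.in_use + estimated_bytes > self.budget_bytes:
                return False  # caller leaves the job queued and retries later
            self.in_use += estimated_bytes
            return True

    def release(self, estimated_bytes: int) -> None:
        """Return a finished job's estimate to the budget."""
        with self._lock:
            self.in_use = max(0, self.in_use - estimated_bytes)
```

A worker would call `try_acquire` before rendering and `release` when the job finishes. Rejected jobs stay in the queue, which trades some latency for node stability.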
* Deployed code optimizations that prevent memory exhaustion during traffic spikes
* Implemented per-node workload throttling to maintain stability
* Tuning our auto-scaling to be more predictive rather than reactive
* Increasing baseline capacity to handle larger bursts without scaling delays
* Adding memory pressure alerts that trigger before critical thresholds
* Implementing graduated responses to traffic spikes (pre-scaling based on queue depth trends)
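
To illustrate what pre-scaling on queue depth trends can look like, here is a simplified sketch under stated assumptions; it does not describe our actual scaling policy. The function name `desired_nodes` and all of its parameters are hypothetical. It projects queue depth a short time ahead from the recent trend and sizes the fleet to drain that projected backlog within the same window.

```python
import math


def desired_nodes(depth_samples, interval_s, lookahead_s, per_node_rate, min_nodes):
    """Estimate node count from the recent queue depth trend (illustrative only).

    depth_samples: queue depths, oldest first, taken every interval_s seconds
    lookahead_s:   how far ahead to project, in seconds
    per_node_rate: assumed messages per second one node can render
    min_nodes:     baseline capacity that is always kept
    """
    if not depth_samples:
        return min_nodes
    if len(depth_samples) < 2:
        slope = 0.0  # no trend available from a single sample
    else:
        elapsed = (len(depth_samples) - 1) * interval_s
        slope = (depth_samples[-1] - depth_samples[0]) / elapsed  # messages/second
    projected = max(0.0, depth_samples[-1] + slope * lookahead_s)
    # Nodes needed to drain the projected backlog within the lookahead window.
    needed = math.ceil(projected / (per_node_rate * lookahead_s))
    return max(min_nodes, needed)
```

Because the estimate uses the slope of queue depth, not only its current value, a rising queue requests capacity before the backlog peaks. A flat or falling queue returns to the baseline.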
While our platform maintained data integrity throughout this incident (no messages were lost), we understand that processing delays impact your customer engagement timing. We are actively working to ensure all customers’ messages are processed and delivered as efficiently and reliably as possible.
Your Customer Success Manager has details specific to your workspace's impact during this incident. For technical questions or to discuss our infrastructure roadmap, please reach out to your account team.