On Friday, November 25th, 2022, between 14:57 and 18:53 UTC, customers may have experienced a delay in outbound message delivery. This incident affected only the EU datacenter. The US datacenter was functioning properly.
This issue did not affect inbound data ingestion to our system. Once the issue was resolved, outbound message delivery resumed. No messages were lost, because they were queued during the incident.
Customer.io would like to apologize for the impact of this outage. We are committed to learning from this event and using it to drive improvement across our services.
Starting at 11:00 UTC on November 25th, an abnormally large number of customers initiated broadcasts. Our autoscaling system normally handles this kind of rise in volume well. However, because of two unusually large sends, autoscaling failed to keep up with demand, and we had to fall back to scaling the system manually.
Once the issue was identified, the team disabled some large sends and manually scaled the system. The manual corrections allowed the vast majority of sends to complete promptly, and we then worked to manually mitigate the remainder.
At 18:53 UTC on November 25th, 2022, the backlog was cleared, all messages were delivered, and the incident was marked as resolved.
We are working on improving autoscaling for message sending so that it better handles sudden increases in load.