Customers may be impacted by data delays

Incident Report for Customer.io Status

Postmortem

Incident Summary

On December 8th, 2025, beginning at 17:38 UTC, some customers experienced delays in data processing and message delivery. Normal functionality was fully restored at 18:19 UTC, for a total duration of 41 minutes. No data was lost.

An internal processing service failed during startup and never became fully operational. This reduced throughput and increased retry activity in upstream components.

Root Cause

During startup, one of our processing services loads information about message queues before beginning normal operation. An unexpected queue state left over from a previous configuration caused the service to encounter an error during this process, leading it to restart repeatedly without successfully completing initialization.
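As an illustration of this failure mode, here is a minimal sketch of a startup loader that sets aside queue records it does not recognize instead of raising an error. The report does not describe the service's actual code, so every name here (`QueueRecord`, `KNOWN_STATES`, `load_queues`) and the list of states are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical set of queue states the service knows how to handle.
KNOWN_STATES = {"active", "paused", "draining"}


@dataclass
class QueueRecord:
    name: str
    state: str


def load_queues(records):
    """Split queue records into usable ones and ones to quarantine.

    Records with an unknown state are set aside and returned to the caller
    instead of raising, so one stale record cannot stop initialization.
    """
    usable, quarantined = [], []
    for record in records:
        if record.state in KNOWN_STATES:
            usable.append(record)
        else:
            quarantined.append(record)
    return usable, quarantined
```

With this shape, the caller can start serving the usable queues and log or alert on the quarantined ones, rather than exiting and being restarted into the same error.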

Because this service is responsible for handing off work to downstream processors, its unavailability resulted in a drop in throughput and a rise in retry traffic. The elevated retries added load to our underlying data layer and contributed to further delays.
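One common way to limit the load that retries place on a shared dependency is exponential backoff with jitter. The sketch below shows the general technique only. The report does not say how upstream components schedule retries, so the function name `backoff_delay` and its defaults are assumptions.

```python
import random


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Return a delay in seconds for the given retry attempt (0-based).

    Uses "full jitter": a random value between 0 and the capped exponential
    ceiling, which spreads retries out instead of sending them in bursts.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

The `rng` parameter exists only so the delay can be made deterministic in tests. In normal use the default random source is enough.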

Resolution and Recovery

Engineers identified the failing service, corrected the underlying queue state, and restored the service to full operation. Once stabilized, normal processing resumed and retry volumes returned to expected levels. The system was monitored to confirm full recovery.

Corrective and Preventative Measures

To prevent recurrence, the team is improving validation during service startup to better handle unexpected queue conditions, refining deployment procedures to detect stalled services sooner, and enhancing monitoring for repeated restart patterns. These improvements are being incorporated into ongoing reliability work.
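To make "monitoring for repeated restart patterns" concrete, here is a minimal sketch of a sliding-window detector that flags a service restarting too often. It is an illustration under assumed thresholds, not the team's monitoring implementation. The class name `RestartLoopDetector` and its default limits are hypothetical.

```python
from collections import deque


class RestartLoopDetector:
    """Flag a service that restarts too many times within a time window."""

    def __init__(self, max_restarts=3, window_seconds=300.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._restarts = deque()

    def record_restart(self, timestamp):
        """Record a restart; return True if the service looks stuck in a loop."""
        self._restarts.append(timestamp)
        # Drop restarts that have fallen out of the sliding window.
        while timestamp - self._restarts[0] > self.window_seconds:
            self._restarts.popleft()
        return len(self._restarts) >= self.max_restarts
```

Timestamps are passed in as seconds, for example from `time.monotonic()`, so the detector itself stays free of clock calls and is easy to test.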

We apologize for any disruption this caused.

Posted Dec 11, 2025 - 15:47 UTC

Resolved

This incident has been resolved.
Posted Dec 08, 2025 - 20:19 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Dec 08, 2025 - 18:28 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Dec 08, 2025 - 18:24 UTC

Investigating

We identified a recent change that might cause data delays. We are investigating.
Posted Dec 08, 2025 - 18:12 UTC

This incident affected: Data Processing and Message Sending.