Partial data processing outage
Incident Report for Customer.io
Resolved
Everything continues to operate normally. Thank you for your patience today while our team investigated and isolated a very tricky edge case.

Your friends at Customer.io
Posted Nov 07, 2019 - 21:38 UTC
Monitoring
Great news! The edge case we identified and patched does appear to be the cause of today's incident. Our team is monitoring to ensure everything continues operating normally.

Unless we identify additional issues, we'll resolve this incident at 21:30 UTC
Posted Nov 07, 2019 - 20:39 UTC
Update
We've identified an edge case in our backend that may cause these processing slowdowns and are deploying a fix for it. We will monitor to determine if this resolves the issue.

We'll update again by 21:00 UTC
Posted Nov 07, 2019 - 20:32 UTC
Update
We are still investigating. Currently we're working to isolate the processing slowdown to a specific customer environment. Once we've identified the source we'll be able to remediate the issue.

We'll update again by 20:30 UTC
Posted Nov 07, 2019 - 20:05 UTC
Update
We're continuing to investigate. Nothing new or exciting to share in this update.

We'll update again by 20:00 UTC
Posted Nov 07, 2019 - 19:31 UTC
Update
We are continuing to investigate. The rollback was successful and we've eliminated a recent change as a possible cause. We will provide an update by 19:30 UTC
Posted Nov 07, 2019 - 19:02 UTC
Update
We're still investigating the problem database and are rolling back to an earlier deploy to isolate the root cause. We will provide another update by 19:00 UTC
Posted Nov 07, 2019 - 18:21 UTC
Update
Investigation is still ongoing. We've deployed changes to restore processing and will monitor.

We will provide an update by 18:00 UTC
Posted Nov 07, 2019 - 17:35 UTC
Update
We're still investigating and are attempting to restore processing for a subset of the affected workspaces.

We will provide an update by 17:30 UTC
Posted Nov 07, 2019 - 17:08 UTC
Update
Investigation is still ongoing.

We will provide another update by 17:00 UTC
Posted Nov 07, 2019 - 16:31 UTC
Update
Investigation is still ongoing; we have not isolated the issue yet.

We will provide another update by 16:30 UTC
Posted Nov 07, 2019 - 16:04 UTC
Investigating
We're having issues with one shard in our database, resulting in no data processing and no messages being sent for a portion of our customers. Additionally, the management interface for the affected workspaces is not functional. Data collection is not affected.

We will provide an update by 16:00 UTC
Posted Nov 07, 2019 - 15:29 UTC
This incident affected: Data Processing, Email Sending, and Management Interface.