Mar 7, 16:46 UTC
Our cloud provider confirmed that a recent rollout of their HTTP LB software was the cause of these elevated errors. They rolled back the update and we confirmed recovery at 2019-03-06 05:36:00 UTC. Over the past hour we have not seen any regression in the eastern API servers.
We will publish a post mortem for this incident within the next 48 hours.
Mar 6, 06:31 UTC
We've finished redeploying our eastern data collection API and can confirm the 502 errors are no longer being returned. We are working with our cloud provider to determine the root cause of the elevated errors and will continue monitoring to ensure the issue doesn't return.
Mar 6, 05:50 UTC
We're continuing to investigate the issue with our eastern API cluster. Our SRE team has redeployed our API service on the affected cluster. We'll update the incident by 2019-03-06 05:30:00 UTC.
Mar 6, 05:16 UTC
Our SRE team is continuing to investigate the elevated failures to our eastern API servers. We'll update by 2019-03-06 05:15:00 UTC.
Mar 6, 04:52 UTC
We are currently investigating elevated rates of failure to our edge API servers located in the eastern United States. Our SRE team is investigating the failures and we'll provide an update by 2019-03-06 04:50:00 UTC.
Mar 6, 04:35 UTC