Friday, 2017-10-20

Unplanned Service Interruption due to Database Issue

17:37 PST: We are currently experiencing a major outage and are working on restoring service. We will post details shortly.

18:05 PST: Everything appears to be coming back online now. We're continuing to monitor and will post more details as we're able.

18:39 PST: Service is fully restored. We have identified the system that failed, one of our databases, but have not yet determined the precise cause. Although we do account for this situation, the particular way in which the database failed led to confusion about the root cause, and that slowed our diagnosis.

We are still diagnosing the precise cause of the failure, but believe it to be related to a routine security scan that wasn't as routine as it should have been. We will continue to dig in, and will also look into ways to improve our failure handling in this area.

2017-10-25 4:50pm PST: After another incident on the server, we are nearly certain the cause was a hardware issue on that server. We have replaced the hardware and are continuing to monitor.

5:09pm PST: AWS has confirmed that the hardware failed.