There are reports that Amazon’s Simple Storage Service – called S3 – suffered a “massive” outage this morning, beginning at about 7:30 a.m. Eastern time. At 9:03, an Amazon official posted a note in the service’s customer forum, saying, “We can confirm the high error rate you’re experiencing. While we don’t have an ETA at this point, we’re working as quickly as possible to restore performance. We’ll provide updates as soon as we have them.” One poster at Hacker News reports that the service is now “back up,” but another says that it remains “spotty.”
The outage, which may have spread to other Amazon web services, appears to be affecting many web businesses, including prominent ones like Twitter, which uses the service to store the images that appear on its site. Writes one blogger: “Amazon S3 goes down … panic ensues.” But another S3 customer, also posting in the service’s forum, is more sanguine: “This is the first outage I have experienced since I joined the service nearly a year ago. Yes it sucks, yes I hope they get it fixed very soon… but, the sky is not in fact falling at the moment.”
As someone who believes in the growth of the utility model of computing, I feel compelled to point out the inevitable glitches that are going to happen along the way. How the supplier responds – in keeping customers apprised of the situation and explaining precisely what went wrong and how the source of the problem is being addressed – is crucial to building the trust of current and would-be users. When Salesforce.com suffered a big outage two years ago, it was justly criticized for an incomplete explanation; the company subsequently became much more forthright about the status of its services and the reasons behind outages. Given that entire businesses run on S3 and related services, Amazon has a particularly heavy responsibility not only to fix the problem quickly but to explain it fully.
UPDATE: As of 10:17, Amazon reports, “We’ve resolved this issue, and performance is returning to normal levels for all Amazon Web Services that were impacted. We apologize for the inconvenience. Please stay tuned to this thread for more information about this issue.” An S3 user suggests: “A health monitor would be useful – something to show what amazon thinks the status of the services are and to post official information. Maybe even proactive alerts or something I could tie our other infrastructure notifications into so I could be proactive in alerting our downstream affected users.” Another complains: “Amazon’s response was substandard in this case. I should, minimally, see a message on the front page at aws.amazon.com when there’s a complete outage.” I would expect that Amazon will roll out additional tools for monitoring service status and alerting users about problems in fairly short order.