On November 25, 2020, Amazon Web Services (AWS) operations in North America were extensively disrupted, impacting wide swaths of businesses and, in turn, their customers.
The outage hit AWS clients for much of the day and into the evening. Ultimately, Amazon reported complete problem resolution early the next morning.
High-profile companies such as Adobe, Twilio, and Roku were affected, along with countless smaller business operations that rely on AWS to keep themselves humming along.
If you had problems signing on to one of your online accounts, or trying to make an online transaction, or noticed a smart device in your home wasn’t functioning properly, then chances are you were directly affected (as I believe I was).
Now imagine a similar but far larger-scale impact on a manufacturing facility whose sensitive software systems rely on AWS. Think about the hours or even a full day’s worth of lost productivity and business revenue!
Is this really a big deal?
In the grand scheme of things, IT system-related outages are not unusual at all. They happen all the time, whether in public cloud infrastructure, an on-premises data center, or an individual server.
While multi-hour service disruptions are not uncommon with cloud or online services, they’re not exactly desirable, either. In the IT world, “five nines” or 99.999% reliability is the gold standard. But achieving availability at that level means downtime of no more than about 5.3 minutes per year.
“Three nines” or 99.9% is more achievable, allowing up to about 8.8 hours of downtime per year. This is probably close to realistic expectations for AWS’s uptime reliability.
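To make the arithmetic behind those "nines" concrete, here is a minimal sketch that converts an availability percentage into a yearly downtime budget. The function name `allowed_downtime_minutes` and the choice of a 365.25-day average year are assumptions made for illustration, not part of any AWS service-level definition.

```python
# Assumption: an average year of 365.25 days (accounts for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


# The two availability levels discussed in the text.
for label, pct in [("three nines", 99.9), ("five nines", 99.999)]:
    minutes = allowed_downtime_minutes(pct)
    print(f"{label} ({pct}%): {minutes:.2f} minutes/year ({minutes / 60:.2f} hours)")
```

Under these assumptions, three nines works out to a little under 9 hours a year, while five nines leaves only a bit more than 5 minutes. A single multi-hour outage therefore uses up most or all of a three-nines budget for the year.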
There hasn’t been a notable disruption to AWS’s reliability or service for about two years. But when one does occur, it should always serve as an important reminder:
AWS, by far the largest cloud computing provider by market share, has the remote but non-zero probability of taking down tens if not hundreds of thousands of clients along with it – including corporations valued at billions of dollars.
Many companies have distributed, or are seriously considering distributing, their operations across multiple cloud providers for redundancy and fail-safe reasons.
Another common approach is to keep at least some of the system architecture housed within an on-premises data center.
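As a rough illustration of the redundancy idea, the sketch below tries a list of backends in order and falls through to the next one when a call fails. Everything in it is hypothetical: the provider names, the `call_with_failover` helper, and the simulated outage stand in for real service calls, and a production setup would also need health checks, timeouts, and data replication.

```python
from typing import Callable, Sequence


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[], str]]]
) -> tuple[str, str]:
    """Try each provider in order; return (name, result) from the first success."""
    errors = []
    for name, operation in providers:
        try:
            return name, operation()
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical backends standing in for real service calls.
def primary_cloud() -> str:
    raise ConnectionError("simulated regional outage")


def secondary_cloud() -> str:
    return "order accepted"


def on_premises() -> str:
    return "order accepted"


used, result = call_with_failover([
    ("primary-cloud", primary_cloud),
    ("secondary-cloud", secondary_cloud),
    ("on-premises", on_premises),
])
print(f"served by {used}: {result}")
```

Here the primary backend fails, so the request is served by the second entry in the list. The ordering encodes preference: the customer still gets an answer, and the failure stays an internal detail.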
It’s vital to remember that if your business runs on the public cloud, and something happens to it, your customers don’t care that it was the cloud’s fault – or that it’s beyond your control.
All they think about is the fact that your company screwed them over in some way.