Earlier this month AWS went down (well, parts of it), and, as you might expect from the biggest of the cloud providers, it took a number of other services offline with it. And of course, it was DNS’s fault. The Domain Name System (DNS) is the service that translates human-friendly names into computer-friendly numbers: IP addresses. It’s one of the fundamental services that run our modern digital world.
Unfortunately, the cloud does not mean you get 100% uptime. No one offers true 100% uptime; even these giant companies only offer up to 99.999% (“five nines”), which means you should still expect some outages despite their best efforts.
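To put five nines in perspective, here is a small sketch that converts an availability percentage into the downtime it still permits per year. It assumes a 365-day year and ignores leap years; the function name is just for illustration.

```python
# Convert an availability percentage into the downtime it still permits.
# Assumes a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

At 99.999% that works out to roughly five and a quarter minutes a year, while 99.9% still allows close to nine hours.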
Back in 2019, Verizon knocked huge parts of the internet offline, not because of DNS but because of the Border Gateway Protocol (BGP). This was a routing issue: a duff route was passed up and out rather than being filtered, so the incorrect route ended up distributed across the internet.
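Filtering is the safeguard that failed here. As a minimal sketch, assuming a provider keeps a hypothetical allow-list of the prefixes each customer is entitled to announce, a customer-facing filter might look like the following. Real routers do this in their own configuration, typically with prefix lists built from routing registry data, not in Python; the prefixes below are documentation-only ranges.

```python
import ipaddress

# Hypothetical allow-list of prefixes this customer is registered to announce.
# Uses documentation-only address ranges (RFC 5737), not real networks.
CUSTOMER_PREFIXES = [
    ipaddress.ip_network("203.0.113.0/24"),
]


def accept_announcement(prefix: str) -> bool:
    """Accept a route only if it sits inside one of the customer's allowed prefixes."""
    announced = ipaddress.ip_network(prefix)
    return any(
        # subnet_of() raises TypeError across IPv4/IPv6, so compare versions first.
        announced.version == allowed.version and announced.subnet_of(allowed)
        for allowed in CUSTOMER_PREFIXES
    )


for prefix in ("203.0.113.0/24", "203.0.113.128/25", "198.51.100.0/24"):
    verdict = "accept" if accept_announcement(prefix) else "reject (do not propagate)"
    print(f"{prefix}: {verdict}")
```

Anything outside the allow-list is dropped at the edge instead of being passed on to the rest of the internet.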
After a security issue, the PHP project switched to using GitHub, rather than a homebrew system, to control who has write access to the repository. It’s not that they couldn’t manage the system themselves; they just realised it was a distraction from their primary goal.
The key thing about these large-scale outages is that they are well documented, and people understand they are beyond your control. As a society we understand that things break and that we need to wait for them to come back online; we don’t like it, but we accept it more readily than when it’s an internal problem. Most people and companies aren’t in the infrastructure game. DNS, BGP and all those other critical things are taxes we all must pay, and we pay them to these massive providers, who do an excellent job running the services but who, unfortunately, occasionally remind us they are human.