Unless you've been living under a rock, or enjoying a serene life away from social media, you're probably aware of the Facebook, WhatsApp and Instagram outages that happened at the start of this week.
The outage lasted close to six hours; here in the UK, it ran from around 16:00 GMT until 22:00 GMT on Monday.
Users trying to load Facebook’s site were met with nothing, while Instagram and WhatsApp, despite technically being accessible, would not load new content or send messages.
Approximately 3.5 billion people use Messenger, Instagram, Facebook and WhatsApp to communicate with family, friends and colleagues, meaning the outage had a major social impact. Small business owners who use these sites to communicate with customers and make sales also faced the prospect of a financial hit.
Sharing in this financial hit is Facebook itself. Fortune is reporting that the outage cost Mark Zuckerberg, the founder of Facebook, an estimated £4.4bn as company shares plummeted last night.
On a positive note, there are no signs that user data has been compromised by this outage. Rather than being left exposed, many users have simply been left confused by the incident.
Why did this outage happen? How does an outage on this scale happen to one of the biggest companies in the world? And why did it affect not only Facebook but also WhatsApp, Messenger and Instagram?
Well, it’s got to do with how Facebook runs its sites. Everything, including WhatsApp, Instagram and Messenger, is run by the company through Facebook’s own infrastructure.
This becomes a major problem when a situation like the one Facebook faced on Monday arises: the outage was reportedly caused by a configuration change to the company’s backbone routers.
These routers coordinate network traffic between the company’s data centres, meaning if they go down, everything goes down.
You see, the internet consists of a lot of connected networks that rely on two things: DNS (Domain Name System) and BGP (Border Gateway Protocol).
DNS is like the internet’s address book: it translates a website’s name, such as facebook.com, into the IP address that indicates its location on the web. Meanwhile, BGP is more like a roadmap, or even Google Maps, in that it works to find the most efficient route for you to get to that IP address.
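To make the two roles concrete, here is a minimal Python sketch of the idea. It is a toy model rather than how real DNS or BGP software works: the domain name, the IP address, the network numbers and the "shortest path wins" rule are all assumptions chosen for illustration, and real BGP weighs many more factors when picking a route.

```python
import ipaddress

# Hypothetical DNS records: website name -> IP address.
# (The name and address are made up; 203.0.113.0/24 is a documentation range.)
DNS_RECORDS = {"example-social.test": "203.0.113.10"}

# Hypothetical BGP-style table: address block -> candidate paths,
# where each path is the list of networks traffic would pass through.
ROUTES = {
    "203.0.113.0/24": [
        ["AS64500", "AS64501", "AS64510"],
        ["AS64502", "AS64510"],
    ],
}


def resolve(name, records=DNS_RECORDS):
    """DNS step: translate a name into an IP address (None if unknown)."""
    return records.get(name)


def best_path(ip, routes=ROUTES):
    """BGP-like step: find the block containing `ip`, then pick the shortest path."""
    addr = ipaddress.ip_address(ip)
    for prefix, paths in routes.items():
        if addr in ipaddress.ip_network(prefix) and paths:
            return min(paths, key=len)
    return None  # nobody is advertising a way to reach this address


ip = resolve("example-social.test")
print(ip, best_path(ip))
```

The point of the sketch is that reaching a site takes both steps: DNS supplies the address, and the routing table supplies a way to get there.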
Essentially, by making a series of updates to its routers, the company inadvertently told BGP that the established paths to Facebook no longer existed, meaning users and staff alike trying to reach Facebook couldn’t find a path to it.
The lost routes had a knock-on effect on Instagram, Messenger and WhatsApp, which, as mentioned before, are run through Facebook.
Facebook staff couldn’t access the company’s own communications platform, Workplace, during the outage or, excruciatingly enough, their own offices.
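The effect of withdrawing a route can be sketched in a few lines of Python. This is a simplified illustration under assumed values, not a reconstruction of Facebook's actual configuration: the address blocks, network numbers and the `withdraw` helper are hypothetical.

```python
import ipaddress


def find_path(ip, routes):
    """Return the shortest advertised path to `ip`, or None if there is none."""
    addr = ipaddress.ip_address(ip)
    for prefix, paths in routes.items():
        if addr in ipaddress.ip_network(prefix) and paths:
            return min(paths, key=len)
    return None


def withdraw(prefix, routes):
    """Return a copy of the table in which `prefix` is no longer advertised."""
    return {p: paths for p, paths in routes.items() if p != prefix}


# Hypothetical routing table with two unrelated address blocks.
routes = {
    "203.0.113.0/24": [["AS64502", "AS64510"]],  # stands in for the affected company
    "198.51.100.0/24": [["AS64503"]],            # some other site
}

before = find_path("203.0.113.10", routes)
after = find_path("203.0.113.10", withdraw("203.0.113.0/24", routes))
print("before:", before)
print("after:", after)
```

In this toy model the servers behind the withdrawn block are still running. The rest of the internet has simply lost the directions to them, while the unrelated block stays reachable.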
Facebook’s security pass system was also caught up in the outage, preventing staff from entering the office to physically diagnose and resolve the outage. A more frustrating situation is difficult to imagine.
Multiple reports say that Facebook eventually brought everything back online by sending a technical team out to manually reset its servers in California, where the problem originated.
This demonstrates the myriad problems that can come with having a single point of failure for a large number of online services.
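The single point of failure can be pictured as a dependency map. The Python sketch below is a hypothetical simplification: the service names come from this article, but the idea that each one depends directly on a single "backbone" entry is an assumption made for illustration, not a description of Facebook's real architecture.

```python
# Hypothetical dependency map: each service lists what it needs in order to work.
DEPENDS_ON = {
    "backbone": [],
    "facebook": ["backbone"],
    "instagram": ["backbone"],
    "whatsapp": ["backbone"],
    "messenger": ["backbone"],
    "workplace": ["backbone"],
    "security_passes": ["backbone"],
}


def affected_by(failed, depends_on=DEPENDS_ON):
    """Return every service that goes down, directly or indirectly, when `failed` does."""
    down = {failed}
    changed = True
    while changed:  # keep sweeping until no new service is pulled down
        changed = False
        for service, deps in depends_on.items():
            if service not in down and any(dep in down for dep in deps):
                down.add(service)
                changed = True
    return sorted(down - {failed})


print(affected_by("backbone"))   # everything that relies on the backbone
print(affected_by("instagram"))  # nothing else relies on Instagram in this model
```

In this model, losing any one app affects only that app, whereas losing the shared backbone takes every service with it, including the tools staff would need to fix the problem.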