Configuration Saboutage —

Facebook’s outage likely cost the company over $60 million

Configuration change cascaded through the data centers, bringing systems to a halt.


In a major outage yesterday, Facebook, along with its sibling sites, WhatsApp and Instagram, became unreachable for hours. Real-time website status tracker DownDetector received over 14 million reports from users who couldn't use the social media giant's apps and services.

But beyond the obvious inconvenience to those cut off from these services, yesterday's outage has had financial repercussions not only for Facebook but also for the many small businesses that rely on the platform.

Downtime estimated to cost Facebook over $60 million

Facebook's 2020 revenue was $86 billion. Experts have used this number to approximate the average loss incurred by the company yesterday at $163,565 for every minute of the outage. Over the six-hour period, this adds up to roughly $60 million in lost revenue. Another report by Fortune pinned the loss at $100 million, stating that "for many companies, a $100 million drop in revenue over any time period would be a financial event of significant concern. For Facebook, it is (for now) a drop in the bucket that investors will likely shrug off."
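The per-minute figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes revenue is spread evenly across every minute of a 365-day year, which is a simplification, and its function names are illustrative rather than taken from any of the cited reports.

```python
# Rough outage-cost estimate. Assumption: annual revenue accrues evenly
# over every minute of a 365-day year (real revenue is not this uniform).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def revenue_per_minute(annual_revenue_usd: float) -> float:
    """Average revenue earned per minute, given annual revenue."""
    return annual_revenue_usd / MINUTES_PER_YEAR


def outage_cost(per_minute_usd: float, outage_hours: float) -> float:
    """Revenue lost over an outage lasting `outage_hours` hours."""
    return per_minute_usd * outage_hours * 60


# Per-minute rate implied by the rounded $86 billion revenue figure.
rate_from_rounded_revenue = revenue_per_minute(86e9)

# Six-hour loss using the $163,565-per-minute rate quoted in the article.
cost_from_quoted_rate = outage_cost(163_565, 6)

print(f"Per-minute rate: ${rate_from_rounded_revenue:,.0f}")
print(f"Six-hour loss:   ${cost_from_quoted_rate:,.0f}")
```

With the rounded $86 billion input, the rate comes out at about $163,600 per minute, slightly above the quoted $163,565, which suggests the experts started from a less rounded revenue number. Six hours at the quoted rate is $58,883,400, which is where "roughly $60 million" comes from.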

Beyond the lost revenue, the outage also hit Facebook's shares, which fell by 4.9 percent on Monday, translating into $47.3 billion in lost market cap.

Julian Dunn, director of product marketing at PagerDuty, helps companies address outages and told Ars, "Outages [at companies] like Facebook and Instagram mean big money for companies. Some companies are estimated to lose nearly five million dollars for every hour of the outage to their website. Although multi-hour outages are relatively rare, even short ones—15 minutes or half an hour—have an outsized impact, as impatient consumers are all too eager to leave a down site and go elsewhere. Plus, there’s a huge effect on the IT and developer teams that keep the systems running on the sites we visit every day."

And the losses don't end there. Some small businesses and firms had the equivalent of a "snow day" yesterday. Boutiques and shops that rely largely on social media platforms to communicate with clients, schedule appointments, and take payments were left without means to run operations.

“Configuration changes” blamed for implosion

Facebook has apologized for the inconvenience caused by the incident. "To all the people and businesses around the world who depend on us, we are sorry for the inconvenience caused by today’s outage across our platforms. We’ve been working as hard as we can to restore access, and our systems are now back up and running. The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem," said Santosh Janardhan, VP for infrastructure at Facebook.

New York Times technology reporter Sheera Frenkel also reported that some Facebook employees could not enter office buildings because the outage had taken down the badge access systems as well.

While cybersecurity experts initially pointed to Facebook's missing DNS records, the likely cause of the disruption was later attributed to a BGP misconfiguration. An extensive analysis by Celso Martinho and Tom Strickx of Cloudflare explains how engineers determined that Facebook's BGP routes had been withdrawn from the Internet:

Routes were withdrawn, Facebook’s DNS servers went offline, and one minute after the problem occurred, Cloudflare engineers were in a room wondering why [our DNS service,] 1.1.1.1 couldn’t resolve facebook.com and worrying that it was somehow a fault with our systems. With those [BGP route] withdrawals, Facebook and its sites had effectively disconnected themselves from the Internet.

But what appeared to outside observers as BGP and DNS problems was actually the result of a configuration change that affected the entire internal backbone.
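The chain of events outside observers saw, with routes withdrawn first and DNS lookups failing second, can be illustrated with a toy model. This is only a sketch under stated assumptions: the `RoutingTable` class, the `resolve` function, the hostname, and the addresses (taken from ranges reserved for documentation) are all hypothetical, and real BGP and DNS are far more involved.

```python
import ipaddress


class RoutingTable:
    """Toy stand-in for the set of prefixes currently announced via BGP."""

    def __init__(self):
        self.prefixes = set()

    def announce(self, prefix: str) -> None:
        self.prefixes.add(ipaddress.ip_network(prefix))

    def withdraw(self, prefix: str) -> None:
        self.prefixes.discard(ipaddress.ip_network(prefix))

    def is_reachable(self, address: str) -> bool:
        addr = ipaddress.ip_address(address)
        return any(addr in prefix for prefix in self.prefixes)


def resolve(name, table, nameservers, records):
    """Return the record for `name` if any authoritative server is reachable.

    Returns None when no name server can be reached, which models a failed
    lookup even though the servers themselves may still be running.
    """
    for server in nameservers:
        if table.is_reachable(server):
            return records.get(name)
    return None


# Hypothetical setup: one prefix that contains both name servers.
table = RoutingTable()
table.announce("192.0.2.0/24")
nameservers = ["192.0.2.53", "192.0.2.54"]
records = {"example.test": "198.51.100.10"}

before = resolve("example.test", table, nameservers, records)

# Withdrawing the route makes the name servers unreachable from outside.
table.withdraw("192.0.2.0/24")
after = resolve("example.test", table, nameservers, records)

print("before withdrawal:", before)
print("after withdrawal: ", after)
```

In this model the DNS records never change. The lookup fails only because no path to the name servers remains, which is why the outage first looked like a DNS problem.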

In a postmortem update posted yesterday, Facebook's Janardhan stated that "configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt."

Facebook's services were restored by approximately 7 pm ET yesterday. The company has explicitly pointed out that the root cause of this outage was a faulty configuration change and that there is no indication at this time that any user data was compromised. Regardless, the incident is a testament to how much of our lives and commerce now depends on social media and messaging platforms, whose availability is no longer optional.

Channel Ars Technica