What caused the Optus outage?

By David Swan and Ben Grubb

A major outage affected millions of Optus customers nationally on Wednesday, including businesses, Melbourne’s rail network and hospitals. Services started to return to normal around 1pm on Wednesday, almost nine hours after the problems began, and full restoration of the network took about 16 hours.

Here’s what we know about the outage.

What caused the Optus outage?

Optus is yet to confirm the reason behind the outage. According to network engineers, the fact that it started at about 4am points to a likely problem with a software or firmware update, or to an incorrect message, sent from inside or outside the Optus network, that provided dodgy traffic routing instructions. The vast majority of network updates occur overnight, between 2am and 4am, while most of us are asleep.

In 2012, Dodo took the blame for an outage affecting many Australian internet connections, saying it was caused by a hardware fault on a router that triggered crippling flow-on effects at Telstra. That outage lasted only 45 minutes; the Optus outage went on for many hours longer.

Matt Tett, the managing director of Enex TestLab, said the issue appeared to involve a so-called “BGP [Border Gateway Protocol] prefix flood”.

In essence, this would mean that one of Optus’ routers was likely fed incorrect routing information in an update, leading to total network gridlock. That could have been caused by Optus or by an external party. Optus has been contacted to confirm whether this is the case.

Customers line up outside an Optus shop front on George Street in Sydney during a country-wide network outage. Credit: Dominic Lorrimer

“Take for example the fence you share with your neighbour. I have a note for you to pass to the neighbour next door that says ‘I love them’. Instead of putting the one address to next door, I accidentally put ‘everyone’, then you go off and try to deliver that note that I love them to EVERYONE, which results in you handing off that note to everyone on your BGP routes,” Tett said.

Border Gateway Protocol is how network owners and operators’ routers share routing information.
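To make Tett’s analogy concrete, here is a toy Python sketch of how one announcement spreads between BGP neighbours. It assumes a made-up four-router topology (the routers "A" to "D" are hypothetical) and ignores BGP’s real route-selection rules: each router simply passes the announcement on to every peer that has not already heard it, so a single message from one router reaches the whole network.

```python
from collections import deque


def propagate(peers: dict[str, list[str]], origin: str) -> dict[str, list[str]]:
    """Flood one route announcement from `origin` to every reachable router.

    Returns the path (router names, nearest first) each router learns.
    Simplification: a router keeps the first route it hears and ignores
    later copies, which also stops the announcement looping forever.
    """
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        for peer in peers[router]:
            if peer in paths:  # already has a route: drop the duplicate
                continue
            paths[peer] = [peer] + paths[router]
            queue.append(peer)
    return paths


# Hypothetical topology: A peers with B and C, which both peer with D.
topology = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(propagate(topology, "A"))
# {'A': ['A'], 'B': ['B', 'A'], 'C': ['C', 'A'], 'D': ['D', 'B', 'A']}
```

Router D never talks to A directly, yet it still learns A’s route through B, which is why a bad announcement does not stay local.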

Network operators suggested this possible scenario after Optus sent a message to them stating that the suspected root cause of the issue lay with “route reflectors, which are currently handling an excessive number of routes, leading to session shutdown and a complete traffic halt”.

“Our on-site technician is actively prioritising establishing a console connection [a physical cable connection to routing hardware],” the message to Optus’ partners early on Wednesday said. “Rest assured that said technician is also being provided additional technical support remotely.”

Publicly available data appears to support the theory that a BGP flood occurred. Just before 4am, BGP route announcements were hovering between 10,000 and 20,000. Then, at about 4am, they rocketed to 940,000.
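One mechanism that would fit Optus’ description of route reflectors “handling an excessive number of routes, leading to session shutdown” is a per-session prefix limit: many BGP routers can be configured to close a session once a neighbour sends more routes than an operator-set maximum. Optus has not said whether this is what happened. The Python sketch below models that assumption only; the class name, the limit of 100,000 and the prefix labels are hypothetical, and each announcement in the figures above is treated as a distinct route.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class BgpSession:
    """Toy model of one BGP session with a maximum-prefix limit."""
    peer: str
    max_prefixes: int  # hypothetical, operator-configured limit
    routes: set[str] = field(default_factory=set)
    established: bool = True

    def receive(self, prefixes: Iterable[str]) -> None:
        for prefix in prefixes:
            if not self.established:
                return  # a closed session accepts nothing more
            self.routes.add(prefix)
            if len(self.routes) > self.max_prefixes:
                # Limit exceeded: close the session and discard every route
                # learned over it, so traffic relying on them has nowhere to go.
                self.established = False
                self.routes.clear()


session = BgpSession(peer="route-reflector-1", max_prefixes=100_000)
session.receive(f"prefix-{i}" for i in range(20_000))   # roughly the pre-4am level
print(session.established, len(session.routes))        # True 20000
session.receive(f"prefix-{i}" for i in range(940_000))  # the spike to ~940,000
print(session.established, len(session.routes))        # False 0
```

Real routers differ in detail: many can be set to only log a warning at the limit, or to retry the session after a timer. The sketch shows only that an oversized burst of routes can cut a session off entirely rather than merely slow it down.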

The dodgy instruction could have been sent from an internet exchange (a physical location similar to a data centre, where multiple internet providers and network operators interconnect their networks) or directly from an internet provider itself or a content provider.

American internet routing infrastructure expert Doug Madory said on X, formerly known as Twitter, that the large increase in BGP announcements could have been a symptom of the outage but not necessarily its initial cause.

Madory said that just before the outage began on Wednesday, Optus withdrew 150 of the 271 BGP prefixes (blocks of IP addresses on its network) that it normally announces to the internet.

“The withdrawal of a prefix triggers a flurry of messages as [other networks] search in vain for a [new] route to replace a lost route. The more routes withdrawn, the larger the flurry of messages.”
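Madory’s point that more withdrawals mean more messages can be illustrated with a deliberately crude counting model. In the Python sketch below, every router is assumed to try a fixed number of stale backup paths for each lost prefix and to tell all of its peers about each change, plus the final withdrawal. The function and all parameter values are hypothetical; only the figure of 150 withdrawn prefixes comes from Madory.

```python
def withdrawal_messages(withdrawn: int, routers: int, peers_each: int, backups_tried: int) -> int:
    """Toy estimate of BGP update messages after `withdrawn` prefixes disappear.

    Assumes each router cycles through `backups_tried` invalid alternative
    paths per prefix, then gives up, announcing every step to all its peers.
    """
    updates_per_router = backups_tried + 1  # each failed backup, then the withdrawal
    return withdrawn * routers * peers_each * updates_per_router


# Hypothetical network: 50 routers, 4 peers each, 2 stale backups tried.
one = withdrawal_messages(1, routers=50, peers_each=4, backups_tried=2)
optus = withdrawal_messages(150, routers=50, peers_each=4, backups_tried=2)
print(one, optus)  # 600 90000
```

The strictly linear growth is an artefact of the simplification; real path exploration depends on topology and timers, but the direction matches Madory’s description.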

Is a cyberattack to blame?

It’s too early to rule out a malicious attack, though Optus chief executive Kelly Bayer Rosmarin says there are no indications that the outage was caused by a hack or cyberattack. Optus suffered one of Australia’s most significant data breaches late last year.

How long will it take to fix?

Before services began to return to normal just before 1pm on Wednesday, Optus had not provided a firm timeline for resolving the outage. According to Tett, routing issues often take hours to fix.

“The problem with routers, particularly if it is configuration [issue and] not an attack, is that one major change has to propagate throughout the network and then a fix also then needs to propagate,” he said.

“[The fix] can take hours, particularly if the network is slammed through mis-routing.”

Why is the Optus network outage connected to public transport and hospitals?

Optus’ network infrastructure runs far deeper than its 4G and 5G mobile phone towers. Fibre networks such as Optus’ form the backbone of telecommunications services, including 4G and 5G, and also carry eftpos, public transport infrastructure and hospital systems, all of which were affected on Wednesday.

What’s the government saying?

NSW Premier Chris Minns said the failure caused major disruption to Service NSW, but rail network issues were limited in Sydney.

Minns said the government would be requesting a full explanation from Optus for the outage, which he described to the ABC as “deeply regrettable”.

The Greens, meanwhile, called for a Senate inquiry into the outage.

“This is not a small matter and the parliament will have to look at what Optus can and should be doing, what they knew, how this failure happened and there needs to be … consequences for this type of outage,” Greens Senator Sarah Hanson-Young said on Wednesday.

Hanson-Young said the Greens did not have the numbers to establish an inquiry on their own, but she hoped it would win the support of all major parties.

South Australian Premier Peter Malinauskas said his government was disappointed with Optus.

“They have let their customers down throughout the state, including the government,” he told reporters in Adelaide.

Earlier, Communications Minister Michelle Rowland called on Optus to “step up” its public communications to customers, saying people were “hungry for information”.

Michelle Rowland called on Optus to “step up” its response to millions of disconnected customers. Credit: Alex Ellinghausen

Will customers be eligible for compensation?

If you have been disadvantaged or lost money due to a phone or internet outage, you might be able to claim compensation, according to ACCAN, the peak advocacy group for Australian communications consumers.

“Compensation should make up for your loss,” ACCAN says. “For example, if your internet is out for one week you could ask for your money back for that week. You may be able to claim for costs incurred, like getting your internet fixed or using extra mobile data.”

“Work out how much money you or your business has lost because of the outage, including any costs for an interim service. Keep documents such as bills and receipts as evidence, record when the outages happened, and how long they lasted.”

“Contact your service provider to explain the problem, and to ask for compensation, and give your service provider the evidence you have collected.”

Original URL: https://www.smh.com.au/link/follow-20170101-p5eiep