
1400 triple-0 calls failed in major Telstra outage as report reveals telco missed early warning sign

AT LEAST 1400 emergency phone calls to Triple Zero failed during a Telstra outage in May, and an investigation has pinpointed the chain of events that led to it.

AT LEAST 1400 emergency calls couldn’t get through to triple-0 in May this year due to a Telstra network outage, a government report into the incident has shown.

The report also revealed the telco had overlooked a significant early warning sign for one of three issues that led to the mass outage.

On the day of the May 4 outage, the Department of Communications announced it would investigate how it happened, with help from the Australian Communications and Media Authority (ACMA).

A detailed report released today — Investigation Report into the Triple Zero Service disruptions of 4 & 26 May — revealed three key events that led to the outage, which started about 2am and lasted several hours.

The investigation found 3912 emergency calls were made during the outage. Of those, 1433 could not be connected because of the network failure, while the rest were not connected because the caller voluntarily ended the call while waiting for an operator.

At the time it was claimed the outage was due to a fire at a cable pit near Orange, NSW, which is now considered to be only one part of the problem.

The damage in Orange was identified at 2.42am on May 4 and a fibre repair team was tasked to fix it 20 minutes later.

TRIPLE-0 CALL FAILURES

One result of the May 4 outage was that some triple-0 calls failed to connect to an operator in Telstra-managed emergency call centres, from where they would normally be transferred to emergency services.

In the days after, Telstra refused to reveal how many emergency calls had gone unconnected that morning.

Typically about 25,000 calls are made to triple-0 each day across the country — of those about 25 per cent are not considered to be emergencies.

Telstra is contracted to provide triple-0 connections for all calls.

The outage affected services in the ACT, NSW, Victoria, Western Australia, South Australia and Tasmania.

THE FIRST SIGN OF A PROBLEM

The investigation also found a link between the mass outage and a transmission controller card failure at a Victorian exchange, which was flagged the day before.

Despite Telstra being alerted to this in the early hours of May 3, it took the telco more than a day to send out a technician to fix it.

People using telephones in Telstra booths. Picture: News Corp Australia

It was one of the three failures that together led to the widespread outage.

The report said the failure triggered a ‘loss of communication’ alarm at 3.46am on May 3.

The loss of communication alarm meant that no subsequent alarms would be presented to Telstra if the transmission equipment failed further, because that equipment had been disconnected from the monitoring system.

Telstra says the loss of communication alarm was not identified by its operators at the time because it was one of about 26,000 alarms, 13,000 of which were considered critical issues.

The large volume of alarms flooding Telstra at the time was considered “typical”, but it appears that volume led to this alarm being overlooked as a sign of a potentially bigger problem for the network.

“There were indications of problems with Telstra’s network on 3 May. If the Telstra alarming system had recognised the significance of the alarms indicating a partial transmission loss, there may have been an opportunity to restore transmission on the coastal route before the Orange fire disrupted the inland route. This may have prevented or minimised the significant impact to the Triple-0 service that arose on 4 May,” the report states.

Telstra said that at the time the controller card failure was considered “fleeting” and indicated the loss of communications was “intermittent”. At that point there was no impact on triple-0 services.

At 5.38pm on May 3, Telstra raised an internal ticket for the issue to be repaired on the next business day.

The investigation highlighted that Telstra’s response to the incident showed the telco had not identified how critical the fault was until 7.30am the next day, about 28 hours after the first alert and well after the triple-0 chaos had started.

It wasn’t until then that Telstra technicians were on site to look at the issue.

The first sign of problems for emergency services was flagged at 1.20am on May 4, when an ACT fire supervisor reported they were receiving overflow calls from ACT police and ambulance. Forty minutes later, multiple Telstra services had failed.

Full triple-0 connectivity was restored about midday on May 4.

The investigation found Telstra had failed to identify a problem with calls being routed to the correct emergency services in the ACT. When the issue was raised by the ACT emergency services agency, Telstra conducted a test but could not identify any problem with transferring calls at the time.

TELSTRA FORCED TO IMPROVE

The report highlights that the outage was due to three key factors:

— the partial failure of an interstate transmission device

— the fire in Orange that led to issues with the interstate fibre cable

— previously missed software faults in two sets of Internet Protocol (IP) core network routers.

“It was the combined effect of these three events that resulted in the service disruptions experienced by callers to Triple Zero,” the report states.

“Telstra advises that the combination of these events presented a highly unusual and complex scenario involving Telstra’s Public Switched Telephone Network (PSTN), IP network, and physical damage to infrastructure from a fire that was outside Telstra’s control, which made technical diagnosis and resolution of the disruptions difficult.”

Since the outage, Telstra has entered into a court-enforceable undertaking to improve its network in response to the regulator’s findings.

The investigation found Telstra had contravened a rule that requires providers to ensure triple-0 calls on their networks are carried to the operator.

The report acknowledged Telstra has made multiple improvements to its network alarm system in response to the incident.

The ACMA has accepted a court-enforceable undertaking from Telstra to improve the redundancy and diversity of its network, develop new disruption protocols and ensure its systems are up to international standards.

“The actions Telstra has already taken, and is undertaking, will help strengthen the emergency call service and minimise the risk of another disruption to this critical service,” ACMA chair Nerida O’Loughlin said.

“Triple Zero is the lifeline for Australians in life-threatening or emergency situations.

“Community confidence in the emergency call service must be maintained.”

TIMELINE OF THE TELSTRA OUTAGE

MAY 3

•3.46am: A Telstra alarm goes off for a transmission device at Exchange One in Victoria signalling “loss of communication”. Telstra describes the failure of the controller card as “fleeting”, indicating the loss of communications was intermittent.

•4.55pm: Telstra lost its redundant links over Link 1 for the transmission of PSTN services, including 000. There was also a partial loss of transmission capacity for the interstate links over another link (Link 3). At the time, Telstra understood there was no known impact on triple-0 calls.

MAY 4

•1.20am: Telstra’s emergency call centre in Melbourne gets a call from an ACT fire supervisor saying they’re getting overflow calls from ACT police and ambulance.

•2.05am: Multiple Telstra services fail. Telstra focuses on the IP core network, which is flagged as the most likely cause.

•2.42am: Telstra identifies the cause as fibre damage near Orange, NSW.

•3.02am: Technicians sent to repair cable.

•3.18am: “Memory issues” identified in IP core network routers located in Exchange Two (VIC) and Exchange Three (NSW).

•5.33am: A similar issue was identified in an IP core network router located at Exchange Five (QLD).

•7.30am: Telstra identified the potential significance of the transmission controller card failure that commenced on May 3. Actions initiated to replace the controller card.

•About 8.32am: Another incident in the IP core network. This time a router in Exchange Four (NSW) encountered a memory-related issue.

•10.38am: Repair of the damaged fibre cable at Orange completed. This restored the majority of the impacted PSTN services to normal, including calls to the triple-0 service. However, connectivity issues between the triple-0 call centre in Melbourne and ACT emergency services continued.

•12.10pm: Repair of the faulty controller card for the transmission device completed. Triple-0 call routing between Victoria and NSW restored.

