Chapter 4 Risk and resiliency
4.1 This chapter details the steps taken by Optus to prevent and prepare for any future outage of the type that occurred on 8 November 2023. It also outlines broader risk and resiliency across the industry and discusses possible measures to enhance the resilience of Australia's telecommunications networks.
The state of Australia's telecommunications networks
4.2 Evidence to the committee from telecommunications carriers Optus, Telstra, and TPG Telecom (TPG) emphasised that all networks are likely to face outages. For example, Optus submitted that all telecommunications networks 'suffer from infrequent disruptions to service', and stated that:
While every communications network provider wants to avoid such outcomes, it is an unfortunate reality in our reliant digital age that no communications network can completely protect against, nor prevent, these types of occurrences from ever happening – despite the investments made or resiliency efforts undertaken.
4.3 Telstra advised that:
While all network operators strive to minimise network problems by building in redundancy and resiliency into their networks, all networks can, and do, suffer major outages from time to time. It is not realistic to expect that major outages can be eliminated entirely.
4.4 TPG submitted that its mobile and fixed networks operate on 'separate core networks', limiting the potential for its networks to face widespread outages.
4.5 The Australian Mobile Telecommunications Association (AMTA) cautioned that no telecommunications network is entirely resilient, arguing that the scale, complexity, and interconnectedness of today's telecommunications networks mean 'it's unrealistic to expect that major outages will never occur'. AMTA pointed to recent outages in the United States, Canada, Japan, and various European markets that had ranged from five to 61 hours in length, some of which are discussed in Box 4.1.
Box 4.1 Examples of recent international network outages
Canada
On 8 July 2022, Canadian telecommunications provider Rogers Communications experienced a major service outage lasting over 16 hours. According to NetBlocks, a cybersecurity watchdog, 12 million customers (around a quarter of Canada's population) lost internet connectivity. As compensation for the service outage, Rogers credited customers with five days of service, estimated to be valued at CAD $50 million.
Directly following the outage, the Canadian Minister of Innovation, Science and Industry, the Honourable François-Philippe Champagne, directed Rogers Communications and other Canadian telecommunications companies to develop a backup plan to prevent a similar outage, giving them 60 days to do so.
On 7 September 2022, Rogers Communications and 13 other telecommunications service providers signed a Memorandum of Understanding (MoU) intended to ensure and guarantee emergency roaming and other mutual assistance in the case of a major outage. The MoU cited the need for 'additional measures to improve network reliability and resiliency and to mitigate the impact of service outages'.
Japan
On 2 July 2022, Japanese telecommunications provider KDDI, the second largest carrier in Japan, experienced a major service outage that lasted up to 61 hours before being resolved. An incident report published by KDDI and provided to the Japanese Ministry of Internal Affairs and Communications indicated that about 30 million people were affected by the outage. As in the Optus outage, affected consumers were unable to connect to the Japanese emergency service lines, 110 and 119.
Following the outage, the Ministry of Internal Affairs and Communications announced the establishment of a 'study group' that would assess and report on the practicality of network roaming during emergencies.
After the outage, KDDI announced that it would compensate 2.71 million consumers 'heavily affected' by the network outage by refunding two days' worth of their monthly basic fees. Additionally, it would provide ¥200 to all 35.89 million subscribers, including those receiving the extra compensation, to be deducted from their bills.
4.6 The Australian Communications Consumer Action Network (ACCAN) drew the committee's attention to the Canadian Rogers outage and how the Canadian Radio-television and Telecommunications Commission responded by issuing an interim directive that required carriers, among other things, to:
notify the Commission within two hours of becoming aware of a 'major service outage'; and
within 14 days of an outage, provide a comprehensive report detailing the cause, resolution steps, impact on emergency and accessibility services, and plans to prevent future outages.
4.7 ACCAN suggested that this case study could guide the Australian Government in considering regulatory reform to ensure that Australian telecommunications carriers have clear guidelines for responding to outages, thereby helping consumers, businesses and governments better adapt when such events occur.
4.8 The Department of Home Affairs submitted that the Optus outage highlighted the deep interconnections between Australia's critical infrastructure sectors:
Significant disruption in one sector may have severe, cascading and compounding impacts and consequences on the delivery or support of other critical infrastructure services. The Optus outage has highlighted the consequences which can occur as a result of this interdependency, with the event impacting the availability of other critical infrastructure services, including hospital services, financial services and public transport. It also demonstrated the flow-on impact to businesses, which also act as key third-party service providers to other sectors (as well as individuals) across the economy. This adds to the significant impact of the outage on the day-to-day operation of tens of thousands of businesses and millions of individual Australians.
4.9 Optus submitted that telecommunications services are provided on a commercial basis and are consequently not regulated, and as such, there is no guaranteed return on investment for telecommunications providers.
Optus' preparedness and resilience
4.10 In evidence, Optus outlined its crisis management preparedness. Optus CEO Ms Kelly Bayer Rosmarin told the committee:
Even though we'd done very detailed reviews into our architecture, processes and risks to make sure we could be as resilient as possible, we did not have an articulated, clear risk that each one of our 90 routers would independently shut itself down at the same time using a fail-safe on those Cisco routers that our team was not aware of. That specific risk that caused this outage was not identified in any of our reviews.
4.11 Optus reported that, as a redundancy measure, some of its staff had backup physical SIMs or eSIMs that operated on non-Optus networks in the event of a network outage, but acknowledged that most Australians do not have access to such a fallback.
4.12 In evidence to the committee, Mr Lambo Kanagaratnam, Optus Managing Director of Networks, conceded:
We didn't have a plan in place for that specific scale of outage. It was unexpected. We have high levels of redundancy and it's not something that we expect to happen.
4.13 Mr Kanagaratnam subsequently stated that the scale of the outage was 'something that we didn't anticipate'. He said that Optus had the correct defence mechanism in place, and told the committee, '[f]or us to lose 90 routers in one outage is not something that we contemplated'.
4.14 Ms Bayer Rosmarin stated that it was Optus' intention to provide customers with connectivity 24/7 for 365 days a year, and that Optus has multiple layers of geographical, physical, and power redundancy. She insisted that it was 'highly unusual' for all of Optus' segregated networks to go down simultaneously.
4.15 Optus detailed a number of measures it had undertaken prior to the outage to enhance the resilience of its network, including simulating a state-wide outage, assessing a potential attack on a state's internet exchange, and conducting regular internal and independent third-party reviews of its resilience network architecture. The latest major network architecture review was completed in October 2023. Ms Bayer Rosmarin told the committee that Optus regularly tests its failover (its ability to seamlessly switch to a backup system in the event of failure), and has identified 42 tests that are performed on a regular basis to test the resilience and redundancy of its network.
4.16 Ms Bayer Rosmarin detailed how different areas within the company perform their own regular testing, and that Optus updates its risk process at least once a year. She also told the committee that an executive risk committee, which meets at least quarterly, was in place to oversee measures to address identified risks.
4.17 In answers to questions taken on notice, Optus claimed that it has 'an established, comprehensive risk management framework and supporting processes' which align with industry standards to facilitate the continuous identification, quantification, monitoring, and controlling of risks.
4.18 Optus detailed the following measures intended to manage risks to its network:
designing and maintaining a robust network architecture with multiple layers of redundancy;
performing exercises to identify, plan for and test risks;
maintaining processes and systems to identify and address incidents as quickly as possible; and
conducting regular crisis management exercises to test Optus infrastructure, systems, processes, and capabilities.
4.19 Additionally, Optus confirmed that it implemented its crisis management procedures (CMP) on the day of the outage, the objectives of which were to 'mitigate the effects of a crisis, minimise disruption to operations and recover operations'.
4.20 Ms Bayer Rosmarin also insisted that the company had invested more in network resilience. Mr Kanagaratnam provided further details of this at the hearing:
We have multiple layers of redundancy in the network. If we look at how we connect the services to our customers, we have multiple exchanges or sites across the country and so we segregate different parts of the network according to that. At any time, if one exchange is isolated, it will only impact a certain amount of customers if the whole exchange is lost. Within that exchange we have multiple layers of redundancy in terms of connectivity … Then what we do for mobile voice data and fixed voice is that we provide geographical redundancy so that traffic can switch seamlessly across the country … For all our intercity fibre between our major cities, we have at least three different routes of connectivity, and we have multiple other layers of redundancy.
Enhancing the resilience of Australia's networks
4.21 Submitters to the inquiry contended that the Optus network outage highlighted significant potential risks to Australia's telecommunications networks, and that reform was needed to help prevent and manage outages, as well as other threats to the networks. Key proposals for reform included:
the implementation of mandatory network roaming in cases of emergency;
improvements to the Emergency Call Service, Triple Zero;
the introduction of legislation to class telecommunication providers as 'critical infrastructure', imposing further security obligations on them in cases of emergency; and
other tangential matters such as improved industry collaboration.
Roaming and network sharing
4.22 The outage prompted renewed calls for mandatory emergency network sharing and network roaming during outages, whereby customers from one network would be able to connect to a rival network in the event of a significant outage.
4.23 On 30 June 2023, the Australian Competition and Consumer Commission published a report which found that the provision of temporary mobile roaming during emergencies was 'feasible'. The Australian Government subsequently announced that it would pursue trials to determine the feasibility of network sharing with industry. Submitters advised the committee that these trials were already in progress.
4.24 However, Optus noted the potential limitations on these roaming solutions:
Optus is working with the other [mobile network operators] to assess the viability of temporary disaster roaming. If a roaming solution was in place, it would have likely resulted in other mobile networks being unable to accommodate the extra traffic given the number of users trying to roam. If the capacity issue could have been addressed, roaming would likely not have worked as the Optus core network was down and Optus subscribers would not have been able to be authenticated for roaming.
4.25 Similarly, Telstra took the view that the temporary disaster roaming solution currently under discussion would not have compensated for the Optus outage, because disrupted communications with the Optus core network would have prevented users from being authenticated.
4.26 The committee also received evidence on the viability of activating roaming during nation-wide network outages.
4.27 Ms Bayer Rosmarin told the committee that Optus had discussed, during its first crisis meeting, whether it was possible to temporarily roam its customers to another network, but had concluded that there was currently no capability to do so at that scale in Australia.
4.28 Some submitters noted the 'significant technical and operational challenges' related to roaming in the event of a major outage. For example, AMTA and Telstra cautioned that wide-scale roaming may risk the integrity of the host network, which could fail or become congested due to a surge in users. Moreover, AMTA stated that if an outage affects the core of the network, roaming to other networks would not be possible because devices could not be authenticated.
4.29 Telstra submitted that solving these issues would likely be 'very expensive' and 'highly complex' and claimed that it was not aware of any countries currently implementing network roaming in this way. TPG similarly advised that such a capability would require government funding.
4.30 However, other submitters noted that there was a basis for considering avenues to mandate domestic roaming, particularly for regional Australia. TPG Telecom argued that roaming would 'have the benefit of providing a much higher degree of network resilience for Australians living in regional areas', and further, that it would incentivise investment and 'reverse the trend of weakened competitive dynamics'. The NSW Government similarly noted the benefits of a 'permanent, ongoing roaming capability'.
4.31 Similarly, Mr Mark Gregory of RMIT submitted that there are many benefits to implementing emergency roaming and that telecommunications carriers are 'typically not in favour of domestic mobile roaming as there is a perception that it reduces competition'. He insisted that domestic roaming can be implemented despite the protests of telecommunications carriers, and further, that arguments that high cost is an inhibitor to doing so are unfounded.
The Triple Zero network
4.32 Evidence to the inquiry emphasised that the Emergency Call Service, also known as Triple Zero, plays a fundamental role in the safety of the Australian community and that the inability of many Optus customers to contact the service represented a critical failure and risk to public safety.
4.33 As articulated by the NSW Government, a situation:
…where there is no ability for emergency management communications, including for Triple Zero, Emergency Alert and inter-community communications, constitutes a significant, material, unmitigated risk to public safety and lives.
4.34 The Police Federation of Australia similarly submitted that the inability to contact emergency services can be the 'difference between life and death'.
4.35 The merits of integrating different technologies into the Triple Zero service were considered by some submitters. For example, ACCAN recommended that the committee consider 'innovative ways to enhance Triple Zero services', such as the facilitation of a 'next generation' Triple Zero service. ACCAN considered that, in doing so:
…this is an opportunity for Optus and other telcos to review their redundancy and accessibility arrangements to ensure that in case of future large-scale network outages people with disability would not be at extended risk.
4.36 The University of Melbourne Centre for Disaster Management and Public Safety explained what a 'next generation' Triple Zero service would entail:
The current ‘voice only’ Triple Zero service will be upgraded to allow callers to transfer data (pictures, videos, live feeds) and enhanced location information in support of their calls for emergencies and response by Public Safety Agencies.
4.37 However, it submitted that such an upgrade would make the 'ecosystem even more complex', which would need to be addressed by increased investment in 'risk mitigation and redundancy arrangements'.
4.38 Additionally, both the University of Melbourne Centre for Disaster Management and Public Safety and the Police Federation of Australia drew the committee's attention to a finding of the Royal Commission into National Disaster Arrangements, which called for the establishment of a Public Safety Mobile Broadband service, and both voiced their support for such a capability. The Police Federation of Australia emphasised that:
Until that recommendation is finally delivered, police and other emergency services heavily rely on telco providers for broadband capabilities. When those telco services fail, not only are the lives of Australia’s emergency services first responders put at risk, so are members of the Australian community who may be affected by an issue requiring emergency services.
4.39 As of May 2023, the Australian Government had committed $10.1 million to establish a taskforce to deliver a Public Safety Mobile Broadband (PSMB) service. The taskforce, comprising the Department of Infrastructure, Transport, Regional Development, Communications and the Arts, and state and territory government agencies, is currently 'creating the framework that drives the delivery of a national PSMB capability'.
Bringing telecommunications companies under security of critical infrastructure legislation
4.40 As outlined in Chapter 1, the security obligations of telecommunications providers, including Optus, are articulated in the Telecommunications Act 1997 and the Security of Critical Infrastructure Act 2018 (SOCI Act). While there is an obligation for some entities that fall under the SOCI Act to comply with a critical infrastructure Risk Management Program (RMP), Optus and other carriers are not currently considered to be 'critical infrastructure' as defined in the SOCI Act, and therefore are not obliged to maintain an RMP.
4.41 Mr Mark Gregory submitted in favour of more stringent auditing and reporting for critical infrastructure operators, such as telecommunications carriers. He recommended the Australian Government develop more comprehensive regulations to ensure that there is 'more transparency and improved reporting to the regulator … on network design, management practices, redundancy and resiliency'.
4.42 On 13 November 2023, the Australian Government announced its intention to introduce legislation to class carriers as 'critical infrastructure' and align their obligations with the SOCI Act. The then-Minister for Home Affairs, the Hon Clare O'Neil MP, said that telecommunications companies should be required to adhere to the same standards as other critical infrastructure operators.
4.43 The Department of Home Affairs' submission further explained that an obligation under the RMP rule would require an entity to assess and develop appropriate strategies to minimise or eliminate risks, implement robust procedures to mitigate the impact of hazards, and continuously update its RMP. At the time of writing, the legislation had not been introduced.
4.44 The Department of Home Affairs also noted that the Australian Government is working with the telecommunications sector to 'consider the options for developing a dedicated risk management program and broader obligations for the telecommunications sector'. It submitted:
By placing the onus of adequate risk management on the entity's Board, the RMP obligation enhances engagement and review of internal auditing risk mitigation processes. Anecdotal evidence from industry suggests the Board exposure resulting from the SOCI legislative framework, including the RMP obligation, helps to elevate risk management to the Board level.
4.45 TPG explained that it had been working with the Australian Government regarding this potential amendment and noted its intention to 'ensure that the reforms promote accountability without duplication and the addition of unnecessary regulatory burdens'.
4.46 The Internet Association of Australia welcomed new legislation addressing critical infrastructure but urged that it should only be introduced after adequate industry consultation. It stated that the introduction of 'reactive legislation' will not contribute to substantial outcomes.