NorNet Edge monitored today’s Telenor service outage

Earlier today, Telenor Norway experienced an outage that affected mobile broadband users for several minutes. At 08:49, Telenor Norge posted a statement on its Facebook page reporting the outage and telling customers that it was aware of the problem and was looking into it. The outage was resolved shortly afterwards.

Telenor-Outage-2016-08-11-Figure1
Figure 1. Telenor’s Facebook post

To understand the impact and extent of the outage, we use the measurements of Telenor’s performance collected by the NorNet Edge (NNE) infrastructure. At the time of the outage, NNE had 59 active Telenor subscriptions spread across Norway. NNE attempts to maintain an active data session over each connection at all times. Over each session, it sends a data packet every second to a server hosted in Oslo, which echoes the packet back.
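The per-second echo measurement described above can be sketched roughly as follows. This is only an illustration, not NNE's actual probing code: the use of UDP, the 8-byte sequence-number payload, the one-second timeout, and the function name `probe_once` are all assumptions made for the example.

```python
import socket
import time


def probe_once(sock, server, seq, timeout=1.0):
    """Send one sequence-numbered packet and wait for its echo.

    Returns the round-trip time in seconds, or None if no matching
    echo arrives before the timeout (counted as a lost probe).
    """
    payload = seq.to_bytes(8, "big")  # hypothetical payload format
    sock.settimeout(timeout)
    start = time.monotonic()
    try:
        sock.sendto(payload, server)
        while True:
            data, _ = sock.recvfrom(64)
            if data == payload:
                return time.monotonic() - start
            # A late echo of an earlier probe: keep waiting for ours,
            # but only for the time that is left.
            remaining = timeout - (time.monotonic() - start)
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
    except OSError:
        # Covers timeouts as well as send/receive errors that occur
        # when the data connection is down.
        return None
```

Calling `probe_once` once per second with an increasing `seq` yields a stream of success/loss samples per connection, which is the kind of record the rest of the analysis relies on.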

About 40% of NNE Telenor connections were unable to exchange data during the outage. Figure 2 below shows the percentage of impacted Telenor connections, aggregated every five minutes. The percentage of affected nodes started ramping up around 08:15 and kept increasing steadily until it peaked at 08:40. The outage seems to have been resolved around 08:45. The severity of the impact varied across connections: two thirds of the affected connections suffered an outage at least five minutes long, while the rest suffered shorter degradations.

Almost all affected nodes lost their data connection completely at some point during the outage. Nodes that lost connectivity early on kept trying to re-establish it. They often succeeded in receiving an IP address from the network but quickly lost the data connection again. Looking further into this, we found that these nodes were unable to complete all the steps needed to establish a data connection. These functions are managed by elements in the mobile core network (e.g. the MME, HSS, HLR). Our observations therefore suggest that the outage was caused by a failure in the core network (e.g. between the MME and HSS in 4G).
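The five-minute aggregation behind a plot like Figure 2 could be computed along these lines. This is a minimal sketch under stated assumptions: the sample format `(timestamp, connection_id, echoed)`, the function name `impacted_percentage`, and the rule that a connection counts as impacted in a bin once it loses at least `min_lost` probes are all hypothetical, since the post does not give the exact definition used.

```python
from collections import defaultdict

BIN_SECONDS = 300  # five-minute bins


def impacted_percentage(samples, total_connections, min_lost=1):
    """Aggregate per-second probe results into five-minute bins.

    samples: iterable of (timestamp_seconds, connection_id, echoed),
             where echoed is True if the probe was echoed back.
    Returns {bin_start_seconds: percent of connections that lost at
    least min_lost probes within that bin}.
    """
    lost = defaultdict(lambda: defaultdict(int))
    for ts, conn, echoed in samples:
        bin_start = int(ts) // BIN_SECONDS * BIN_SECONDS
        per_conn = lost[bin_start]  # ensures loss-free bins still appear
        if not echoed:
            per_conn[conn] += 1
    return {
        b: 100.0 * sum(1 for n in conns.values() if n >= min_lost)
        / total_connections
        for b, conns in sorted(lost.items())
    }


# Example with made-up samples from two of four connections.
samples = [
    (0, "a", True), (1, "a", False), (2, "b", True),
    (300, "a", True), (301, "b", True),
]
print(impacted_percentage(samples, total_connections=4))
# {0: 25.0, 300: 0.0}
```

Raising `min_lost` makes the metric less sensitive to isolated packet loss, which matters if short degradations should be separated from complete loss of connectivity.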

To gain further insight into the root cause of the failure, we checked the locations of the impacted nodes. Figure 3 shows their geographic distribution, and we can clearly see that they are spread all over the country. This supports our earlier hypothesis that the cause of the failure lies in the core network.

In conclusion, today’s failure lasted for about half an hour and had a nationwide impact. Its root cause seems to lie in the core network. Telenor was quick to fix the failure and appears to have a decent degree of diversity within its network, which limited the impact. However, more diversity, faster failover and quicker failure resolution are needed to ensure that mobile broadband networks live up to the expectations of users and services that require more than best-effort reliability.

Figure 2. The percentage of Telenor connections unable to send data
Telenor-Outage-2016-08-11-Figure3
Figure 3. Geographic distribution of impacted nodes