NATS: It was a server failure

A server failure has been revealed as the root cause of the UK air traffic control problems that occurred last weekend.

On 7 December, NATS (National Air Traffic Services) said that a “technical problem” had left it unable to switch from night time to daytime operation, leading to severe delays at UK airports.

A NATS spokesperson told ComputerworldUK that the organisation uses a Frequentis voice communication system, which has a touchscreen interface that automatically loads all the contacts a controller needs for the particular piece of airspace they are controlling at that time. These contacts include those around NATS and in other agencies involved in the air navigation network.

“It therefore ensures they can always immediately reach the person they need to speak to and will reconfigure itself with settings specific to the sector that the controller is responsible for when they log in for their shift,” the spokesperson said.

However, a server failure meant that the necessary contacts were not automatically loaded up at the required time.

“In the early hours of 7 December 2013 there was a server failure within the control and monitoring system connected to the Frequentis Voice Communication System at Swanwick Area Control, which caused significant disruption to the UK air traffic operation,” said Frequentis.

“At the time of the failure, the air traffic operation was in a night time configuration. At no time (was) radio or telephone communication lost, only the ability to switch to daytime mode and therefore open additional workstations.

“As a result, it was not possible from 05:00 to start splitting the airspace up into the additional 15-20 sectors that would be used on a typical day.”

The vendor said that its engineers were supporting the NATS team in fixing the problem within an hour of the company being informed of the issue by NATS, and that its CEO, Hannes Bardach, and technical director, Hermann Mattanovich, were in “regular dialogue” with NATS during the incident. Normal operations resumed after 14 hours.

According to Frequentis, the NATS recovery procedure, which was tested in its facility in Vienna before deployment, “worked as designed” and recovered the service, with critical radio communications with aircraft maintained throughout.

“There were no safety incidents resulting from this system failure,” it said, adding that NATS was able to support 90 percent of normal Saturday flights once the contingency measures were implemented.

Meanwhile, NATS chief executive Richard Deakin said that the firm had “never” experienced this technical issue in over 10 years of operation at Swanwick, and apologised to customers for the incident.

“This is something we deeply regret and are determined to do all we can to avoid it happening again.”

NATS launched an internal major incident inquiry immediately after the incident, and its board, through its Technical Review Committee, instigated an investigation led by Peter Read, independent non-executive member of the board and chairman of the Airline Group.

Despite this, Deakin welcomed further investigations: “Some of the comments over the weekend show that some parties believe our contingency was insufficient and instead we should be able to continue at 100 percent capacity in any eventuality.

“In addition to these measures, we believe it would now be to everyone’s benefit for the Civil Aviation Authority (CAA) to conduct an open and transparent review to confirm whether the level of contingency we have in place meets reasonable operational expectations at reasonable cost, or whether further measures need to be adopted, and if so how these further measures should be funded within the regulatory regime.”

Copyright © 2013 IDG Communications, Inc.
