No more historic SLA reports: Get it (and fix it) in real time

This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

Businesses of almost all types are increasingly dependent on service providers for network connectivity that consistently delivers certain performance characteristics. In addition to basic service availability, these characteristics increasingly include peak, average and minimum bandwidth utilization, latency (delay) and latency variation (jitter), and packet loss -- all of which can affect operational efficiency and end user satisfaction. This is especially the case with Ethernet-based service offerings employing packet transport to deliver E-LINE services operating at speeds up to tens of gigabits per second.

Where third-party network services or facilities are concerned, these performance requirements are often written into contractual service level agreements (SLAs) that entail penalties to the service provider if they're not met. In most cases, service providers are required to supply periodic reports detailing the degree to which SLA targets were achieved.

But it's of little value for IT managers to discover at month-end that certain SLA violations occurred and that they may be due financial compensation. The damage -- to customer satisfaction especially -- has already been done and cannot be undone by collecting penalties from the service provider.

What network managers need is not just backward-looking SLA reporting -- they need real-time SLA assurance reporting. They need to know about an SLA violation the instant it occurs, and where it occurred, so they can correlate and assess exactly which applications and end users have been adversely affected.
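As a rough illustration of what checking for violations "the instant they occur" could look like, the sketch below compares one incoming measurement against contracted targets and reports which metrics are out of bounds and at which site. The class names, field names and threshold values are all hypothetical; they are not taken from any particular provider's portal or contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaTargets:
    """Contracted upper limits (hypothetical values and field names)."""
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float


@dataclass(frozen=True)
class Sample:
    """One measurement for one service at one site (hypothetical layout)."""
    service_id: str
    site: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float


def find_violations(sample, targets):
    """Return the names of the SLA metrics that this sample exceeds."""
    checks = [
        ("latency", sample.latency_ms, targets.max_latency_ms),
        ("jitter", sample.jitter_ms, targets.max_jitter_ms),
        ("packet loss", sample.loss_pct, targets.max_loss_pct),
    ]
    return [name for name, observed, limit in checks if observed > limit]


targets = SlaTargets(max_latency_ms=20.0, max_jitter_ms=5.0, max_loss_pct=0.1)
sample = Sample("eline-001", "branch-a", latency_ms=34.2, jitter_ms=3.1, loss_pct=0.4)

violations = find_violations(sample, targets)
if violations:
    print(f"{sample.service_id} at {sample.site}: {', '.join(violations)} out of SLA")
```

Because each sample carries a service and site identifier, a violation can be tied back to the applications and users that depend on that service.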

Better still, if network managers can track actual bandwidth utilization against committed and peak bandwidth agreements, they will be in a position to forecast when additional bandwidth will be required on certain routes. By doing so, they may be able to prevent SLA violations from occurring, which is the preferred outcome for all parties.
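One simple way to turn utilization tracking into a forecast is to fit a straight line to recent peak readings and see when it reaches the committed rate. The sketch below does that with an ordinary least-squares fit. The function name, the weekly sampling and the sample figures are assumptions for illustration only, and real traffic rarely grows this linearly.

```python
def weeks_until_committed_rate(weekly_peaks_mbps, committed_mbps):
    """Estimate weeks from the latest sample until peak use reaches the committed rate.

    Fits a least-squares line to the weekly peaks (week index 0, 1, 2, ...).
    Returns None when there are too few samples or no upward trend.
    """
    n = len(weekly_peaks_mbps)
    if n < 2:
        return None

    mean_x = (n - 1) / 2
    mean_y = sum(weekly_peaks_mbps) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(weekly_peaks_mbps))

    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or falling utilization: no crossing to forecast

    intercept = mean_y - slope * mean_x
    crossing_week = (committed_mbps - intercept) / slope
    # Measure from the most recent sample; clamp at 0 if already over the rate.
    return max(0.0, crossing_week - (n - 1))


# Hypothetical data: peaks growing 5 Mbps per week against a 100 Mbps commitment.
print(weeks_until_committed_rate([60, 65, 70, 75], committed_mbps=100))
```

A result of a few weeks is a prompt to start the bandwidth conversation with the provider before congestion shows up as an SLA violation.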

For example, network delay is one of the most critical performance attributes since it is visible to users. High network delay is most frequently the result of packet discards and TCP retransmissions, which almost always stem from bandwidth congestion. Network managers with the right real-time monitoring tools can better forecast bandwidth needs and reduce the probability that such events will occur.

Going beyond historic SLA reports

Getting at this highly granular level of information in real time requires much closer collaboration with service providers because detailed SLA performance information only exists within service provider networks. However, in the past service providers have lacked many of the mechanisms required to accurately, securely and cost-effectively provide their customers with real-time access to SLA-impacting performance data.

Cloud-based SLA assurance services, along with evolving standards for Ethernet operations, administration and management (OAM), are beginning to change that. Tools now exist that allow service providers to very quickly prototype and deliver secure Web portals that their customers can use to access detailed SLA performance data. In some cases these capabilities are fee-based additions to core Ethernet services; in others they are bundled in as part of the service. In either case, IT network managers are able to securely access real-time SLA-impacting performance metrics from their laptops, tablets or smartphones.

How it works

Virtually all service providers routinely collect mountains of data -- measurements of network performance such as capacity utilization, packet loss rates, unidirectional and round-trip delay, and jitter. This information, which is captured for the service provider's own network management systems, can be filtered and selectively shared with enterprise customers. In order to accomplish this, certain accessibility, security and virtualization steps must be taken:

(a) Applicable data must be pushed into the cloud and made accessible via customer-specific Web portals. In the interests of security and maintaining high network availability, service providers will want to extract and isolate this information from the network operations domain.

(b) The frequency of data uploads determines the degree of "real-timeness." More frequent uploads provide fresher information but could have implications for network resources and data storage. Many service providers have found that 15-minute upload intervals best balance these competing needs, but other options are available.

(c) Whereas the service provider typically looks at performance information in aggregate, an organization's IT department should be able to access only the performance data for its own services. Each enterprise customer should be able to securely view its own topology over the service provider's virtual network and examine individual services. Ideally, this capability will be integrated with a mapping service such as Google Maps to provide topologically accurate service-layer network views.
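Two of the steps above lend themselves to a short sketch: the arithmetic behind the upload interval in (b), and the per-customer filtering in (c). The record layout, customer names and service names below are hypothetical, and a real portal would enforce this separation with authentication and access control rather than a simple in-memory grouping.

```python
from collections import defaultdict


def uploads_per_day(interval_minutes):
    """Step (b): how many uploads a given interval produces per day."""
    return (24 * 60) // interval_minutes


def partition_by_customer(records):
    """Step (c): group provider-wide records so each customer gets only its own."""
    views = defaultdict(list)
    for record in records:
        views[record["customer"]].append(record)
    return dict(views)


# Hypothetical provider-wide measurements, as collected in aggregate.
records = [
    {"customer": "acme", "service": "eline-001", "loss_pct": 0.02},
    {"customer": "globex", "service": "eline-007", "loss_pct": 0.00},
    {"customer": "acme", "service": "eline-002", "loss_pct": 0.15},
]

views = partition_by_customer(records)

print(uploads_per_day(15))                      # uploads per day at 15 minutes
print(uploads_per_day(1))                       # uploads per day at 1 minute
print([r["service"] for r in views["acme"]])    # only acme's services
```

Moving from a 15-minute to a 1-minute interval multiplies the number of uploads, and therefore the stored records per service, by 15, which is the trade-off described in (b).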

One of the key enabling technologies that make real-time SLA monitoring feasible is International Telecommunication Union (ITU) Y.1731, which includes OAM performance specifications for Ethernet. This standards-based approach provides clear and consistent metrics for Ethernet services, so equipment vendors, service providers and their enterprise customers can speak a common language when measuring and reporting SLA performance.
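Y.1731 performance monitoring covers frame loss ratio, frame delay and frame delay variation. The sketch below shows the basic arithmetic behind those three metrics using plain counters and delay samples. It is a simplified illustration, not the measurement procedure defined in the standard: the function names are hypothetical, and delay variation is approximated here as the difference between consecutive delay samples.

```python
def frame_loss_ratio(frames_sent, frames_received):
    """Fraction of sent frames that were not received."""
    if frames_sent == 0:
        return 0.0
    return (frames_sent - frames_received) / frames_sent


def mean_frame_delay(delays_ms):
    """Average of the measured frame delays, in milliseconds."""
    if not delays_ms:
        raise ValueError("at least one delay sample is required")
    return sum(delays_ms) / len(delays_ms)


def frame_delay_variation(delays_ms):
    """Absolute difference between each pair of consecutive delay samples."""
    return [abs(later - earlier) for earlier, later in zip(delays_ms, delays_ms[1:])]


# Hypothetical measurement window for one service.
delays_ms = [10.0, 12.0, 11.0, 15.0]

print(frame_loss_ratio(frames_sent=1000, frames_received=990))
print(mean_frame_delay(delays_ms))
print(frame_delay_variation(delays_ms))
```

When every party computes the same quantities the same way, an SLA target such as a maximum loss ratio means the same thing to the vendor, the provider and the customer.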

In fact, standards like ITU Y.1731 remove one of the primary obstacles service providers have faced collecting and exposing SLA performance data -- the high cost of developing proprietary interfaces into different network elements.

ITU Y.1731 is widely supported in the latest generation of carrier-grade network equipment. For networks that are not Y.1731 enabled, relatively low-cost Ethernet network interface devices (NIDs) supporting this standard are available from a variety of vendors. These NIDs can be deployed at client sites and key network transition points across the network to provide end-to-end SLA monitoring and reporting with minimal additions or changes to existing equipment.

New cloud-hosted SLA assurance services make it practical for virtually any service provider to extract the performance metrics for key applications in multi-vendor environments and share that data with enterprises in real time. These new SLA assurance services enable IT managers and other authorized personnel to easily yet securely view the performance data of their own enterprisewide Ethernet services online -- anytime, anywhere.

IT managers with real-time visibility into services and SLA performance are better equipped to more effectively:

• Communicate and collaborate with service providers and end users when service-impacting conditions occur.

• Isolate and resolve non-network-related issues faster by ruling out the network.

• Monitor and track resource utilization for more accurate budget and capacity planning.

• Identify the most deterministic metrics of application performance and negotiate more meaningful SLAs.

• Verify they're getting the level of service they're paying for.

Continued innovation in standards bodies and in key enabling technologies now makes it faster, easier and less expensive for organizations to take advantage of reliably performing business Ethernet services with guaranteed SLAs and real-time SLA assurance reporting. New cloud-hosted SLA assurance services are now available or undergoing trials with dozens of service providers across the United States and Europe. The cloud-hosted, multi-vendor nature of these solutions makes it practical to turn up SLA assurance services in a single day for existing "SLA aware" networks, or in a matter of weeks when starting from scratch.

The trend is clear -- IT managers are increasingly demanding (and receiving) real-time access to SLA data, resulting in better performing, more reliable and more deterministic enterprise network services.

This story, "No more historic SLA reports: Get it (and fix it) in real time" was originally published by Network World.


Copyright © 2012 IDG Communications, Inc.
