Why do my WAN links slow down?

WAN links have been around for many years and they continue to be one of the most critical parts of any IT infrastructure. When I first started my career as a network engineer in the nineties, link speeds of 32Kbps were not uncommon. Thankfully we have moved on, and link speeds of 10Mbps and above are common today. However, for most networks, bandwidth on WAN links is still limited, mostly because of the expense of upgrading to higher link speeds.

In today’s world there are lots of demands for more bandwidth as more and more services are centralized in main data centres. Couple this with VoIP and Internet services and the result can be that WAN links get overloaded very frequently. Often, the first indicator of a problem is when end users start complaining that their applications are slow or are crashing. There can be a number of reasons why this is happening:

  • Traffic levels too high

The most common reason for an increase in latency on a WAN connection is excessive volumes of data being sent and received. Just like on a highway, if you have too many cars the whole system slows down, and this is especially true during rush hour. Networks also have their rush hours, such as when users first log on or when backups kick off.

You should have a system in place that will allow you to find the top users of bandwidth on your network. Sometimes it can be down to things like video streaming or applications like BitTorrent. Network monitoring systems which use SNMP can be great for letting you know that there is a problem. You can then leverage technologies like packet capture or NetFlow-based applications to get more detail as to what is causing the problem.  
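
As a rough illustration of the "top users" idea, here is a minimal Python sketch that totals bytes per source address. The record layout (source, destination, byte count) and the sample addresses are assumptions made up for this example; a real NetFlow collector or packet capture tool will export data in its own format.

```python
from collections import Counter

def top_talkers(flows, n=5):
    """Return the n source addresses that sent the most bytes.

    `flows` is an iterable of (src_ip, dst_ip, byte_count) tuples. This
    layout is assumed for illustration; it is not a real NetFlow format.
    """
    totals = Counter()
    for src, _dst, byte_count in flows:
        totals[src] += byte_count
    return totals.most_common(n)

# Made-up sample records using documentation address ranges.
sample_flows = [
    ("10.0.0.5", "192.0.2.10", 1_200_000),
    ("10.0.0.7", "192.0.2.10", 300_000),
    ("10.0.0.5", "198.51.100.4", 800_000),
    ("10.0.0.9", "192.0.2.20", 50_000),
]

for src, total in top_talkers(sample_flows, n=2):
    print(f"{src}: {total / 1_000_000:.1f} MB")
```

Grouping by source address is only one choice. The same counting approach works if you key on destination, port, or application instead.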

  • Application problems

One of the most common arguments between network and server folks is the question of “is it the network or is it the application that is causing the problem?” Sometimes it’s too easy to blame the network as it is perceived as being the most complex element, but application issues can often be the source of the problem. When I say application I mean the server, operating system and whatever software or service the user is accessing. You need to be looking at things like CPU usage, disk throughput, and memory utilization. If you are hosting the server as a virtual appliance, make sure that other virtual machines on the same hypervisor are not slowing things down.

You should also be aware that some applications handle latency better than others. Applications which use HTTP usually keep working, although pages may take a bit longer to load. File sharing applications which use SMB or CIFS need many round trips to complete an operation, so they can slow down to the point of being unusable.
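
To see why latency hurts some applications more than others, a back-of-the-envelope estimate helps: total time is roughly the time spent waiting on round trips plus the time to push the bytes through the link. The Python sketch below does that arithmetic. The round-trip counts (10 and 500) are illustrative assumptions, not measurements of HTTP or SMB.

```python
def transfer_time(round_trips, rtt_ms, size_bytes, link_bps):
    """Estimate the seconds needed to complete an exchange.

    Adds the time spent waiting on round trips to the time needed to
    serialise the data onto the link. Ignores loss, windowing and overhead.
    """
    latency_part = round_trips * rtt_ms / 1000
    bandwidth_part = size_bytes * 8 / link_bps
    return latency_part + bandwidth_part

# Same 1 MB payload, same 10 Mbps link, same 100 ms round-trip time.
few_round_trips = transfer_time(10, 100, 1_000_000, 10_000_000)
many_round_trips = transfer_time(500, 100, 1_000_000, 10_000_000)

print(f"10 round trips:  {few_round_trips:.1f} s")
print(f"500 round trips: {many_round_trips:.1f} s")
```

In both cases the link itself only needs 0.8 seconds to carry the data. Everything beyond that is time spent waiting, which is why adding bandwidth does little for a chatty application on a high-latency link.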

  • Slow DNS

The domain name system (DNS) is still widely used because computers cannot connect directly to hostnames like www.computerworld.com or server names like AppServer1. The host or server name must be converted to an IP address before a connection can be established. A slow response from a DNS server will cause delays and application timeouts. You should be aware of the DNS hierarchy on your network and make sure that the DNS settings are correct on client systems. These settings may have been changed accidentally, or maliciously by something like the DNSChanger malware.
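
One quick way to check whether name resolution is slow from a given client is to time a few lookups. The Python sketch below uses the standard library's `socket.getaddrinfo`, which goes through the operating system's resolver, so results may come from a local cache rather than the DNS server. The function names and the 0.5 second threshold are my own assumptions for this example.

```python
import socket
import time

def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, error) for one lookup via the OS resolver."""
    start = time.perf_counter()
    try:
        resolver(hostname, None)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.perf_counter() - start, error

def slow_lookups(hostnames, threshold_s=0.5, resolver=socket.getaddrinfo):
    """Return the hostnames whose lookup failed or exceeded threshold_s."""
    flagged = []
    for name in hostnames:
        elapsed, error = time_lookup(name, resolver)
        if error is not None or elapsed > threshold_s:
            flagged.append(name)
    return flagged

# Example call (performs real lookups, so it is left commented out):
# print(slow_lookups(["www.computerworld.com", "AppServer1"]))
```

The `resolver` parameter exists only so the functions can be exercised without touching the network. In normal use you would leave it at its default.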

  • Routing issue

Most WAN links contain many hops from your network, through an ISP or cloud service, and then on to the remote network. Routing protocols are used to determine the most efficient path between two end points across these hops. In some cases links can go down but the routing protocol figures out a new way to get to the destination. This can result in a remote site staying connected but with more latency, as data has to be routed through different parts of the network.

You should be familiar with what hops exist between you and your remote sites. Most operating systems come with an application which allows you to trace the route between two points on a network. Capture the output of this when the links are operating normally so that you have something to compare it to when things go wrong.
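
Once you have a saved baseline, comparing it with a fresh trace is easy to script. The Python sketch below assumes each hop line starts with the hop number and contains an IPv4 address, which is a simplification; real traceroute and tracert output varies by operating system, and the sample traces are made up.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def parse_hops(trace_text):
    """Map hop number -> first IPv4 address on that line, or '*' if none."""
    hops = {}
    for line in trace_text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line
        found = IPV4.search(match.group(2))
        hops[int(match.group(1))] = found.group(0) if found else "*"
    return hops

def changed_hops(baseline_text, current_text):
    """Return [(hop, baseline_ip, current_ip)] where the traces differ."""
    baseline, current = parse_hops(baseline_text), parse_hops(current_text)
    changes = []
    for hop in sorted(set(baseline) | set(current)):
        before, after = baseline.get(hop, "-"), current.get(hop, "-")
        if before != after:
            changes.append((hop, before, after))
    return changes

# Made-up traces: the path has gained a hop and changed at hop 2.
baseline_trace = """\
 1  10.0.0.1  1.2 ms
 2  192.0.2.1  8.4 ms
 3  198.51.100.7  22.9 ms
"""
current_trace = """\
 1  10.0.0.1  1.1 ms
 2  203.0.113.9  35.0 ms
 3  192.0.2.77  61.3 ms
 4  198.51.100.7  80.2 ms
"""

for hop, before, after in changed_hops(baseline_trace, current_trace):
    print(f"hop {hop}: {before} -> {after}")
```

A hop that is present in only one trace is shown with "-" on the other side, which makes a longer or shorter path easy to spot.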

  • Issues with QoS

One solution to the problem of too much traffic on WAN links is to implement QoS. If I go back to my highway analogy, implementing QoS is like installing a bus lane: certain types of traffic get priority over everything else. When correctly implemented it can be a great way to maximise your available bandwidth. However, the prioritization of the traffic must be consistent throughout its journey. I have heard of cases where service providers changed certain fields within packets, such as the ToS (Type of Service) byte, which meant that prioritization was messed up. If you suspect this is happening on your network, take a look at the network packets at both ends of the link and check for any changes in the QoS elements.
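
If you have captures from both ends of the link, the comparison comes down to checking whether the DSCP value, the upper six bits of the ToS byte in the IPv4 header, is the same on arrival as it was on departure. The Python sketch below assumes you have already extracted the ToS byte for each packet into a dictionary keyed by some packet identifier; that structure and the identifiers are hypothetical.

```python
def split_tos(tos_byte):
    """Split the IPv4 ToS byte into (dscp, ecn): upper 6 bits and lower 2."""
    return tos_byte >> 2, tos_byte & 0b11

def remarked_packets(sent, received):
    """Return [(packet_id, dscp_sent, dscp_received)] where DSCP changed.

    `sent` and `received` map a packet identifier to the ToS byte seen at
    each end of the link. This layout is assumed for illustration only.
    """
    changes = []
    for packet_id, tos_out in sent.items():
        tos_in = received.get(packet_id)
        if tos_in is None:
            continue  # packet not seen at the far end
        dscp_out, dscp_in = split_tos(tos_out)[0], split_tos(tos_in)[0]
        if dscp_out != dscp_in:
            changes.append((packet_id, dscp_out, dscp_in))
    return changes

# Made-up example: a voice packet marked 0xB8 (DSCP 46) arrives as 0x00.
sent = {"call-1": 0xB8, "web-1": 0x00}
received = {"call-1": 0x00, "web-1": 0x00}

for packet_id, before, after in remarked_packets(sent, received):
    print(f"{packet_id}: DSCP {before} -> {after}")
```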

  • Malware on end user computers

So you have checked the bandwidth on the link, latency looks okay and the application servers are under no abnormal load but users are still complaining. What else could be going wrong?

The next step I would take is to check what is happening on the remote LAN that the users are connected to. It could be down to a zombie host on that network slowing things down for everyone else. Malware or scareware on the end users’ systems can also slow down other applications, so check for the presence of these.

Do you have any more tips for troubleshooting WAN problems? Comments welcome.

Copyright © 2012 IDG Communications, Inc.
