Server's down: How do I find out what's wrong?

Track down Linux server problems with this step-by-step troubleshooting guide


Note: ethtool has uses beyond simply checking for a link. It can also be used to diagnose and correct duplex issues. When a Linux server connects to a network, it typically autonegotiates with the network to see what speeds it can use and whether the network supports full duplex. The Speed and Duplex lines in the example ethtool output illustrate what a 100Mb/s, full-duplex network should report. If you notice slow network speeds on a host, its speed and duplex settings are a good place to look. Run ethtool as in the previous example, and if you see Duplex set to Half, run

$ sudo ethtool -s eth0 autoneg off duplex full

Replace eth0 with your Ethernet device.
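If you check a lot of hosts, it can help to script the duplex check. The following is a minimal sketch and not part of ethtool itself: duplex_of and is_half_duplex are hypothetical helper names, and the parsing assumes ethtool prints a line of the form "Duplex: Half" or "Duplex: Full", as in the example output.

```shell
# Hypothetical helpers for spotting a duplex problem in ethtool output.
# Both read the output of "ethtool <iface>" on standard input.

# Print the value of the "Duplex:" line, e.g. "Full" or "Half".
duplex_of() {
    # Split on the colon, strip spaces/tabs from the value, print it.
    awk -F':' '/^[ \t]*Duplex:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed (exit status 0) only when ethtool reports half duplex.
is_half_duplex() {
    [ "$(duplex_of)" = "Half" ]
}
```

Once these functions are sourced into your shell, something like sudo ethtool eth0 | is_half_duplex && echo "eth0 is at half duplex" flags the problem before you reach for ethtool -s.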

Is the interface up?

Once you have established that you are physically connected to the network, the next step is to confirm that the network interface is configured correctly on your host. The best way to check this is to run the ifconfig command with your interface as an argument. So to test eth0's settings, you would run

$ sudo ifconfig eth0

eth0      Link encap:Ethernet  HWaddr 00:17:42:1f:18:be  
          inet addr:10.1.1.7  Bcast:10.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::217:42ff:fe1f:18be/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:229 (229.0 B)  TX bytes:2178 (2.1 KB)
          Interrupt:10 

Probably the most important line in this output is the second one, which tells us our host has an IP address (10.1.1.7) and subnet mask (255.255.255.0) configured. Whether these are the correct settings for this host is something you will need to confirm. If the interface is not configured, try running sudo ifup eth0 and then run ifconfig again to see if the interface comes up. If the settings are wrong or the interface won't come up, inspect /etc/network/interfaces on Debian-based systems or /etc/sysconfig/network-scripts/ifcfg-<interface> on Red Hat-based systems. These files are where you can correct any errors in the network settings. If the host gets its IP through DHCP, you will need to move your troubleshooting to the DHCP server to find out why you aren't getting a lease.
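To run this check from a script, you can parse the same ifconfig output. Here is a sketch under a couple of assumptions: ipv4_of and is_up are hypothetical helper names, and the parsing handles both the "inet addr:" format shown above and the "inet ... netmask ..." format that newer versions of ifconfig print.

```shell
# Hypothetical helpers for checking ifconfig output read on standard input.

# Print the interface's IPv4 address, or nothing if none is configured.
# Handles the older "inet addr:10.1.1.7" format and the newer
# "inet 10.1.1.7  netmask ..." format. The "inet6" line is ignored.
ipv4_of() {
    awk '$1 == "inet" { addr = $2; sub(/^addr:/, "", addr); print addr; exit }'
}

# Succeed only if the interface flags include UP, either as a word on the
# older "UP BROADCAST ..." line or inside the newer "flags=...<UP,...>".
is_up() {
    grep -Eq '(^|[[:space:]<,])UP([[:space:],>]|$)'
}
```

For example, sudo ifconfig eth0 | ipv4_of prints 10.1.1.7 for the interface above and prints nothing if no IPv4 address is configured.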

Is it on the local network?

Once you see that the interface is up, the next step is to see if a default gateway has been set and whether you can access it. The route command will display your current routing table, including your default gateway:

$ sudo route -n

Kernel IP routing table
Destination     Gateway      Genmask          Flags Metric Ref     Use Iface
10.1.1.0        *             255.255.255.0    U     0      0        0 eth0
default         10.1.1.1     0.0.0.0           UG    100    0        0 eth0

The line you are interested in is the last line, which starts with default. (Depending on your version of route, the -n option may display this destination as 0.0.0.0 instead.) Here you can see that the host has a gateway of 10.1.1.1. Note that the -n option was used with route so it wouldn't try to resolve any of these IP addresses into hostnames. For one thing, the command runs more quickly, but more importantly, you don't want to cloud your troubleshooting with any potential DNS errors. If you don't see a default gateway configured here, and the host you want to reach is on a different subnet (say, web1 at 10.1.2.5), that is the likely cause of your problem. To fix this, set the gateway in /etc/network/interfaces on Debian-based systems or in /etc/sysconfig/network-scripts/ifcfg-<interface> on Red Hat-based systems. If you get your IP via DHCP, make sure the gateway is set correctly on the DHCP server instead. Then reset your interface with the following on Debian-based systems:
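The two checks in this paragraph, finding the default gateway and deciding whether a destination such as web1 is on your local subnet, can also be done in the shell. This is a sketch only: default_gw, ip_to_int, and same_subnet are hypothetical helper names, and same_subnet simply ANDs each address with the netmask and compares the resulting network addresses.

```shell
# Hypothetical helpers for the routing checks described above.

# Print the default gateway from "route -n" output on standard input.
# Accepts the destination printed either as "default" or as "0.0.0.0".
default_gw() {
    awk '$1 == "default" || $1 == "0.0.0.0" { print $2; exit }'
}

# Convert a dotted-quad IPv4 address such as 10.1.1.7 to a single integer.
ip_to_int() {
    local old_ifs
    old_ifs=$IFS
    IFS=.
    # Unquoted on purpose: split the address on dots into $1..$4.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if two addresses fall in the same subnet for the given netmask,
# by comparing (address AND netmask) for each.
same_subnet() {
    local a b m
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

With the example addresses, same_subnet 10.1.1.7 10.1.2.5 255.255.255.0 fails (the network addresses are 10.1.1.0 and 10.1.2.0), which tells you traffic to web1 has to go through the default gateway.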

$ sudo service networking restart

The following would be used on Red Hat-based systems:

$ sudo service network restart

On a side note, it's amazing that these distributions have to differ even on something this fundamental.
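If you'd rather not remember which variant applies, a small wrapper can pick one for you. This sketch assumes Debian-based systems provide /etc/debian_version and Red Hat-based systems provide /etc/redhat-release; network_restart_cmd is a hypothetical helper name, and it only prints the command rather than running it. The optional argument is a directory to look under instead of /, which is mainly there for testing.

```shell
# Hypothetical helper: print the network-restart command for this host's
# distribution family. Assumes Debian-based systems have /etc/debian_version
# and Red Hat-based systems have /etc/redhat-release. The optional first
# argument is a directory to look under instead of /, mainly for testing.
network_restart_cmd() {
    local root
    root=${1:-}
    if [ -f "$root/etc/debian_version" ]; then
        echo "service networking restart"
    elif [ -f "$root/etc/redhat-release" ]; then
        echo "service network restart"
    else
        echo "network_restart_cmd: unrecognized distribution" >&2
        return 1
    fi
}
```

You could then run sudo $(network_restart_cmd) on either family of distributions.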

Once you have identified the gateway, use the ping command to confirm that you can communicate with the gateway:

$ ping -c 5 10.1.1.1

PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=3.13 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=64 time=1.43 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=64 time=1.79 ms
64 bytes from 10.1.1.1: icmp_seq=5 ttl=64 time=1.50 ms

--- 10.1.1.1 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 4020ms
rtt min/avg/max/mdev = 1.436/1.966/3.132/0.686 ms

As you can see, we were able to successfully ping the gateway, which means that we can at least communicate with the 10.1.1.0 network. If you couldn't ping the gateway, it could mean a few things. It could mean that your gateway is blocking ICMP packets. If so, tell your network administrator that blocking ICMP is an annoying practice with negligible security benefits, and then try to ping another Linux host on the same subnet. If ICMP isn't being blocked, then it's possible that the switch port your host is plugged into is set to the wrong VLAN, so you will need to inspect the configuration of that switch.
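When you're checking reachability from a script, ping's summary line and exit status are the parts to look at. The sketch below uses hypothetical helper names: packet_loss pulls the percentage out of ping's summary, and gateway_reachable relies on ping's exit status, which for the iputils ping found on most Linux distributions is 0 if at least one reply came back.

```shell
# Hypothetical helpers for interpreting ping results.

# Print the packet-loss percentage from ping output on standard input,
# e.g. "20" for "5 packets transmitted, 4 received, 20% packet loss".
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Succeed if at least one of COUNT (default 5) pings to HOST got a reply.
# This relies on iputils ping exiting with status 0 in that case.
gateway_reachable() {
    ping -c "${2:-5}" "$1" > /dev/null 2>&1
}
```

Run against the output above, ping -c 5 10.1.1.1 | packet_loss prints 20. Keep in mind that a 0 exit status from gateway_reachable only means some replies arrived; it doesn't rule out the sort of partial loss seen in this example.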
