Automating System Comparisons

One of the first questions that enters the mind of someone troubleshooting a system problem is how the system with the problem differs from systems that don't exhibit the problem. Even when systems are very similar with respect to the software they are running, the patches installed, the particular architecture and applications, they can be different in ways that lead to problems. One system might have more swap space. Another might have a newer library. They might mount different file systems or bind to different name servers. Automating the comparisons that are important in your environment can help pinpoint important differences when you need to resolve a problem in a pinch. In this week's post, we're going to look at a script that makes some initial comparisons between two systems to flush out some of the issues that might provide insights during troubleshooting.

In order to run commands remotely without requiring the user to repeatedly enter a password, we're going to assume that password-free ssh or rsh access has been enabled from the system on which the script is being run. The command used is selected by setting the SH variable near the top of the script.

# compare two systems for important differences

# set script to use ssh or rsh
SH=ssh

The compare script expects to be passed two system names and exits with a usage statement when this is not the case.

if [ $# -ne 2 ]; then
    echo "USAGE: $0 sys1 sys2"
    exit 1
fi
sys1=$1
sys2=$2

To verify that our commands are going to run without our output being punctuated by password requests, we check the effectiveness of the selected remote command protocol like this:

# check systems for remote command usage
dt1=`$SH $sys1 date 2>/dev/null` || echo "OOPS: cannot $SH to $sys1"
dt2=`$SH $sys2 date 2>/dev/null` || echo "OOPS: cannot $SH to $sys2"

If either of these commands fails to produce a date, the script exits:

if [ -z "$dt1" ] || [ -z "$dt2" ]; then
    exit 1
fi

If the script fails at this point, the user will see something like this if using rsh:

# ./compare boson fermion
OOPS: cannot rsh to fermion

If using ssh when password-free access is not allowed, the user will be prompted for a password three times for each system before the script exits.
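
One way to avoid those password prompts, assuming OpenSSH is in use, is to run ssh in batch mode so that it fails immediately instead of asking. The sketch below is not part of the original script; the REMOTE variable and the can_reach and check_hosts functions are names made up for this illustration.

```shell
# Sketch only: probe hosts without ever stopping at a password prompt.
# Assumes OpenSSH; REMOTE, can_reach and check_hosts are illustrative names.
REMOTE=${REMOTE:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

# can_reach HOST: succeed only if a trivial remote command returns output
can_reach() {
    local out
    out=$($REMOTE "$1" date 2>/dev/null) && [ -n "$out" ]
}

# check_hosts HOST...: report every unreachable host on stderr,
# return non-zero if at least one host could not be reached
check_hosts() {
    local host status=0
    for host in "$@"; do
        if ! can_reach "$host"; then
            echo "OOPS: cannot reach $host" >&2
            status=1
        fi
    done
    return $status
}
```

A script could then start with something like check_hosts "$sys1" "$sys2" || exit 1 and report both failures before giving up.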

The first differences the script looks for are differences in the versions and patch levels of the operating systems.

# OS
rev1=`$SH $sys1 "uname -r"`
rev2=`$SH $sys2 "uname -r"`

if [ "$rev1" != "$rev2" ]; then
    echo "Revisions differ"
    echo "$sys1 $rev1"
    echo "$sys2 $rev2"
fi

ver1=`$SH $sys1 "uname -v"`
ver2=`$SH $sys2 "uname -v"`
if [ "$ver1" != "$ver2" ]; then
    echo "Versions differ"
    echo "$sys1 $ver1"
    echo "$sys2 $ver2"
fi

The output from this portion of the script might look like this:

Revisions differ
boson 5.9
fermion 5.10

Versions differ
boson Generic_112233-11
fermion Generic_127127-11
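
Since the revision and version checks follow the same pattern (run one command on both systems, then report if the results differ), that pattern could be folded into a helper. This is only a sketch; compare_cmd is a hypothetical name, and SH is assumed to hold the remote command as in the script.

```shell
# Sketch only: compare_cmd is an illustrative helper, not part of the script.
SH=${SH:-ssh}

# compare_cmd LABEL SYS1 SYS2 COMMAND
# Runs COMMAND on both systems and prints a short report if the output differs.
compare_cmd() {
    local label="$1" sys1="$2" sys2="$3" cmd="$4" out1 out2
    out1=$($SH "$sys1" "$cmd" 2>/dev/null)
    out2=$($SH "$sys2" "$cmd" 2>/dev/null)
    if [ "$out1" != "$out2" ]; then
        echo "$label differ"
        echo "$sys1 $out1"
        echo "$sys2 $out2"
        echo
    fi
}
```

With that in place, the two checks reduce to compare_cmd "Revisions" $sys1 $sys2 "uname -r" and compare_cmd "Versions" $sys1 $sys2 "uname -v".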

The next thing we look at is what file systems are mounted from remote systems.

# what is mounted
$SH $sys1 "df -k | grep :" | sort > /tmp/df-k-1-$$
$SH $sys2 "df -k | grep :" | sort > /tmp/df-k-2-$$
diff /tmp/df-k-1-$$ /tmp/df-k-2-$$ | grep : && echo

The output of these commands will highlight file systems that are mounted on only one of the two systems:

< quark:/data        173570555 130418936 41415914    76%    /net/quark/data
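
In diff output, a line starting with "<" appears only in the first file (here, the first system) and a line starting with ">" appears only in the second. If those markers are easy to misread in a hurry, they could be swapped for the system names. The label_diff function below is a hypothetical helper sketched for this purpose, not something from the original script.

```shell
# Sketch only: label_diff is an illustrative helper.
# label_diff SYS1 SYS2 FILE1 FILE2
# Prints only the differing lines, prefixed with the name of the system
# on which each line was found.
label_diff() {
    diff "$3" "$4" | sed -n -e "s/^< /$1: /p" -e "s/^> /$2: /p"
}
```

Called as label_diff $sys1 $sys2 /tmp/df-k-1-$$ /tmp/df-k-2-$$, the line above would be reported with the first system's name in place of the "<".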

We then use the prtconf command, which provides information on the system components, including installed memory:

# system components
$SH $sys1 "/usr/sbin/prtconf | grep :" > /tmp/prtconf-1-$$
$SH $sys2 "/usr/sbin/prtconf | grep :" > /tmp/prtconf-2-$$
diff /tmp/prtconf-1-$$ /tmp/prtconf-2-$$ | egrep "<|>" && echo

We might see output like this, showing that the systems have different amounts of RAM installed:

< Memory size: 1024 Megabytes
> Memory size: 8192 Megabytes

We also look at swap space.

# swap space
swap1=`$SH $sys1 "/usr/sbin/swap -l" | grep -v swapfile | awk '{print $4,$5}'`
swap2=`$SH $sys2 "/usr/sbin/swap -l" | grep -v swapfile | awk '{print $4,$5}'`
if [ "$swap1" != "$swap2" ]; then
    echo "swapfile blocks free"
    echo "$sys1 $swap1"
    echo "$sys2 $swap2"
fi

The output from a difference in swap space will look something like this:

swapfile blocks free
boson 1052624 946544
fermion 2097392 2097392
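
On a system with more than one swap device, swap -l prints one line per device, so the comparison above would be looking at several pairs of numbers at once. A possible refinement, sketched here under the assumption that the fourth and fifth columns are blocks and free as in the output above, is to total them first. The swap_totals name is made up for this example.

```shell
# Sketch only: swap_totals is an illustrative helper.
# Reads "swap -l" output on stdin and prints two numbers:
# total blocks and total free blocks across all swap devices.
swap_totals() {
    awk '$1 != "swapfile" { blocks += $4; unused += $5 }
         END { print blocks + 0, unused + 0 }'
}
```

It would be used in place of the grep and awk pair, for example swap1=`$SH $sys1 "/usr/sbin/swap -l" | swap_totals`.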

Another important factor when comparing systems is performance. To get a very quick but essential statistic on system performance, we compare the 15-minute load measurement:

# compare load
load1=`$SH $sys1 "uptime" | awk '{print $NF}'`
load2=`$SH $sys2 "uptime" | awk '{print $NF}'`
echo "load: $sys1 $load1 vs $sys2 $load2"

The single line of output tells us a lot about how much strain each system is under.

load: boson 2.35 vs fermion 0.01
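
If we only want to hear about load when the two systems are far apart, the numbers can be compared with awk, since the shell's own test command does not handle decimals. The load_gap function and its default threshold of 1.0 are assumptions chosen for this sketch.

```shell
# Sketch only: load_gap and the 1.0 default threshold are illustrative.
# load_gap LOAD1 LOAD2 [THRESHOLD]
# Succeeds (exit status 0) when the two load averages differ by more
# than THRESHOLD, and fails otherwise.
load_gap() {
    awk -v a="$1" -v b="$2" -v t="${3:-1.0}" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        exit !(d > t)
    }'
}
```

For example, load_gap "$load1" "$load2" && echo "load differs noticeably" would print its message for the boson and fermion figures above.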

The last thing we check in this script is who is logged on. While this may not be an important system difference, it's nearly always useful to know how many people are using the system and who they are:

# who is logged in
echo $sys1
$SH $sys1 who | awk '{print $1}' | sort | uniq -c
echo $sys2
$SH $sys2 who | awk '{print $1}' | sort | uniq -c

This gives us a count of login sessions for each user on each system:

   1 jd
   1 chrissie
   2 donboy
   1 ellie
   3 billh
   1 sandra
   1 bigdoe
   1 jonp
  11 godiva
   6 godiva

Lastly, we clean up any temporary files that we created during our system comparisons.

# clean up
rm /tmp/df-k-1-$$
rm /tmp/df-k-2-$$
rm /tmp/prtconf-1-$$
rm /tmp/prtconf-2-$$
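
An alternative to removing each file by hand, assuming the mktemp command is available, is to put all temporary files in one private directory and let the shell remove it when the script exits, even if it exits early. This is a sketch rather than the original script's approach; make_workdir and workdir are names invented here.

```shell
# Sketch only: make_workdir and workdir are illustrative names.
# Creates a private temporary directory, stores its path in $workdir,
# and arranges for it to be removed when the shell exits.
make_workdir() {
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/compare.XXXXXX") || return 1
    trap 'rm -rf "$workdir"' EXIT
}
```

After calling make_workdir near the top of the script, the comparisons could write to files such as "$workdir/df-k-1" and no explicit cleanup section would be needed.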

This article is published as part of the IDG Contributor Network.
