Looking into system performance: The artistry of top

Credit: flickr / Porter Rockwell

Several decades after its introduction in 1984, top is still one of the most popular tools for looking into system performance. It can help you spotlight processes that are consuming system resources, gauge resource limitations, or get a quick and very useful glimpse into how well your systems are handling their processing loads.

The basic form of the command -- what you see when you simply type “top” -- shows the most useful performance statistics you’re likely to find on a Unix system. You get a view of how much memory is being used, the system load, the processes that are using the most resources (and who is running those processes), how long the system has been up and whether any serious swapping is going on.

Take the following top output. This system has been running for over two years (863 days) since its last reboot. Some of the numbers that quickly tell you that the system is not having any problems include:

  • 99.9% idle time -- the system is not sweating and, instead, is spending a lot of time waiting for something to do
  • load averages all under 1.0 -- only once in a while does some process have to wait for access to the CPU
  • there’s a lot of free memory (more than 50%)
  • little swap is being used

You’ll probably notice that top just happens to be the process that is listed first. It's almost always a sign that the system doesn’t have much to do if the top command gets more access to the CPU than other processes.

# top
top - 15:41:30 up 863 days,  4:21,  2 users,  load average: 0.25, 0.08, 0.03
Tasks: 403 total,   1 running, 402 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  37037804k total, 17297860k used, 19739944k free,   387076k buffers
Swap: 16778232k total,   415288k used, 16362944k free, 13402116k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9727 root      15   0 12868 1324  812 R  0.7  0.0   0:00.03 top
 1330 root      10  -5     0    0    0 S  0.3  0.0  28:43.74 scsi_eh_1
 5499 oracle    15   0 8416m  35m  31m S  0.3  0.1   0:00.50 oracle
 6507 root      21   0 1694m 486m  10m S  0.3  1.3 894:02.16 java
 7201 oracle    15   0  659m  55m  15m S  0.3  0.2 455:12.66 oraagent.bin
 7557 oracle    15   0 8419m 314m 306m S  0.3  0.9  40:36.22 oracle
12632 oracle    19   0  9.8g 685m  19m S  0.3  1.9   7:39.88 java
    1 root      15   0 10344  684  572 S  0.0  0.0   2:46.64 init

This command updates the display every 3 seconds, shows you additional lines of process data if you stretch out your window, and runs until you type “q” for quit or ^c.
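The health checks listed above can also be scripted. What follows is only a minimal sketch, assuming the older procps summary layout shown in this article (lines starting with Cpu(s): and Mem:). The name summary_stats is a hypothetical helper, and newer versions of top label these lines differently, so the patterns would need adjusting there.

```shell
#!/bin/sh
# Sketch: pull the idle CPU percentage and the free-memory percentage
# out of top's summary lines. Assumes the "Cpu(s):" / "Mem:" layout
# shown in this article; summary_stats is a hypothetical helper name.
summary_stats() {
    awk '
        /^Cpu\(s\):/ {
            # the field containing "%id" holds the idle percentage;
            # adding 0 keeps only its leading number
            for (i = 2; i <= NF; i++)
                if ($i ~ /%id/) idle = $i + 0
        }
        /^Mem:/ {
            # field 2 is total memory, field 6 is free memory (both in k)
            total = $2 + 0
            free = $6 + 0
        }
        END {
            if (total > 0)
                printf "idle=%.1f%% free_mem=%.1f%%\n", idle, 100 * free / total
        }
    '
}
```

Fed from batch mode (for example, top -b -n 1 | summary_stats), this reports how idle the CPU is and what share of memory is free.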

That first line, as you might have noticed, is the same as what you get from the uptime command.

# uptime
 15:49:24 up 863 days,  4:29,  1 user,  load average: 0.25, 0.08, 0.03
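If you only want the three load figures, they can be cut out of that line. This is a rough sketch rather than a robust parser: load_averages is a hypothetical helper name, and it assumes the "load average:" wording shown above (some systems word the label slightly differently).

```shell
#!/bin/sh
# Sketch: print the 1-, 5- and 15-minute load averages from an
# uptime-style line read on standard input.
# load_averages is a hypothetical helper name.
load_averages() {
    # drop everything up to the label, then remove the commas
    sed -n 's/.*load average: *//p' | tr -d ','
}

# Sample line copied from the uptime output above
echo ' 15:49:24 up 863 days,  4:29,  1 user,  load average: 0.25, 0.08, 0.03' | load_averages
```

On a live system, uptime | load_averages should produce the same three-column result.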

The load averages are among the most useful measures for gauging overall system performance. Each of these numbers represents the average number of processes that are having to wait to get access to the CPU. If the load average were 1, it would mean that there’s generally a process waiting all of the time. The 0.25 measurement means that, on average, there’s a process waiting only about a quarter of the time.

There are three numbers -- one that shows the average over the last minute, one that shows the average over the last five minutes, and one that shows the average over the last fifteen minutes.

Getting a sense of whether the system is slowing down or speeding up, therefore, only takes a quick look at the three load statistics. In this case, the system is getting busier. You can tell this because the one-minute average is larger than the five-minute statistic and the five-minute average is larger than the fifteen-minute statistic. Keep in mind, however, that we’re only looking at fifteen minutes’ worth of data. You’ll need to capture a lot more load data to get a feel for how the system is doing over a longer period of time.
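The comparison described above is easy to automate. Here is a small sketch; load_trend is a hypothetical helper name, and the "rising", "falling" and "mixed" labels are simply an illustrative choice.

```shell
#!/bin/sh
# Sketch: classify the load trend from the three load averages.
# load_trend is a hypothetical helper name.
#   $1 = 1-minute average, $2 = 5-minute average, $3 = 15-minute average
load_trend() {
    awk -v one="$1" -v five="$2" -v fifteen="$3" 'BEGIN {
        if (one > five && five > fifteen)      print "rising"
        else if (one < five && five < fifteen) print "falling"
        else                                   print "mixed"
    }'
}

# Values from the first top listing in this article
load_trend 0.25 0.08 0.03
```

For a longer view, one option is to append the output of uptime to a log file at regular intervals (from cron, say) and run the same comparison across the saved lines.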

The top command’s default is to organize the processes it reports by CPU usage (highest usage first). However, you can change that interactively or when you start:

  • Interactively, press M to sort by memory usage while top is running, and P to go back to sorting by CPU usage
  • When you start, use top -a to sort by memory usage

Press T if you want to sort by the CPU time that processes have accumulated, as shown in the TIME+ column (largest first).
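The -a option belongs to the older procps version of top used in this article. As far as I know, newer procps-ng builds dropped it in favor of -o %MEM, so check your man page. If neither suits, the process lines can be sorted after the fact. The sketch below assumes the column order shown in the listings here, with %MEM as the tenth field; by_mem is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sort top-style process lines by the %MEM column (10th field),
# largest first. by_mem is a hypothetical helper name; it expects
# process lines only (no summary lines and no header).
by_mem() {
    sort -k10,10nr
}

# Three process lines copied from the first listing in this article
by_mem <<'EOF'
 6507 root      21   0 1694m 486m  10m S  0.3  1.3 894:02.16 java
12632 oracle    19   0  9.8g 685m  19m S  0.3  1.9   7:39.88 java
    1 root      15   0 10344  684  572 S  0.0  0.0   2:46.64 init
EOF
```

With a layout like the one shown here, something along the lines of top -b -n 1 | tail -n +8 | by_mem | head would skip the summary and header lines and list the biggest memory users first.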

You can also change the interval that top uses -- the number of seconds that it waits between display updates. Use the -d option followed by the number of seconds you want each interval to take. Note that you can specify fractions of a second if you’re so inclined, though 0.05 is going to produce quite a frenetic display.

And you can control how many views you see -- a continuous stream or some specific number of updates. Add -n # (e.g., -n 10) if you want to see only a specific number of updates before your top command completes.
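The -d and -n options are most useful together with batch mode (-b), which lets top's output be redirected to a file. The following is a sketch only; capture_top and count_updates are hypothetical helper names, and the file name in the usage note is just an example.

```shell
#!/bin/sh
# Sketch: capture a fixed number of top updates to a file for later review.
# capture_top and count_updates are hypothetical helper names.

# Run top in batch mode (-b) so its output can be redirected.
#   $1 = seconds between updates, $2 = number of updates, $3 = output file
capture_top() {
    top -b -d "$1" -n "$2" > "$3"
}

# Count how many updates a capture holds; each one starts with a "top - " line.
#   $1 = capture file
count_updates() {
    grep -c '^top - ' "$1"
}
```

For example, capture_top 5 3 top.log would collect three updates five seconds apart, and count_updates top.log should then report 3.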

The command below shows performance data organized by memory usage with only one view.

$ top -a -n 1
top - 12:07:50 up 233 days,  2:38,  1 user,  load average: 0.00, 0.01, 0.05
Tasks:  64 total,   1 running,  63 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1020188k total,   902120k used,   118068k free,   160296k buffers
Swap:        0k total,        0k used,        0k free,   597904k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1907 root      20   0  242m 5676 1012 S  0.0  0.6   0:30.66 rsyslogd
11723 root      20   0  111m 4268 3252 S  0.0  0.4   0:00.00 sshd
 2163 root      20   0 91436 2696  860 S  0.0  0.3   5:58.59 sendmail
 2170 smmsp     20   0 82888 2108  656 S  0.0  0.2   0:01.78 sendmail
 2148 ntp       20   0 29244 2040 1468 S  0.0  0.2   0:19.46 ntpd
11726 ec2-user  20   0  112m 2028 1572 S  0.0  0.2   0:00.00 bash
11725 ec2-user  20   0  111m 1884  868 S  0.0  0.2   0:00.01 sshd
    1 root      20   0 19596 1616 1292 S  0.0  0.2   0:11.44 init

To look at one specific user, use the -u option.

# top -u shs -n 2
top - 16:51:13 up 4 days,  1:54,  1 user,  load average: 0.10, 0.06, 0.00
Tasks: 336 total,   1 running, 335 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.1%sy,  0.0%ni, 99.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  37037804k total, 31024460k used,  6013344k free,   445068k buffers
Swap: 16777208k total,        0k used, 16777208k free, 27962132k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13250 shs       15   0 88216 1836 1096 S  0.0  0.0   0:00.06 sshd
13251 shs       15   0  4536 1456 1216 S  0.0  0.0   0:00.01 bash
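Per-user output like this lends itself to quick totals. Here is a minimal sketch, assuming the column order shown above (USER in the second field, RES in the sixth, with plain RES values in KiB and larger ones carrying an m or g suffix). The name user_summary is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: count a user's processes and total their resident memory (RES).
# user_summary is a hypothetical helper name.
#   $1 = user name; process lines are read from standard input
user_summary() {
    awk -v user="$1" '
        # convert values such as "35m" or "9.8g" to KiB;
        # plain numbers are taken to be KiB already
        function kib(v) {
            if (v ~ /g$/) return (v + 0) * 1024 * 1024
            if (v ~ /m$/) return (v + 0) * 1024
            return v + 0
        }
        $2 == user { count++; total += kib($6) }
        END { printf "%s: %d processes, %d KiB resident\n", user, count, total }
    '
}
```

Piping the two shs process lines above into user_summary shs gives a one-line total for that user.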

You can show cumulative time (how much CPU time processes and their dead child processes have used) using the -S option.

$ top -S
top - 20:32:25 up 233 days, 11:02,  1 user,  load average: 0.00, 0.01, 0.05
Tasks:  60 total,   1 running,  59 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1020188k total,   882788k used,   137400k free,   160300k buffers
Swap:        0k total,        0k used,        0k free,   579692k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0 19596 1616 1292 S  0.0  0.2 118:34.36 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:07.72 ksoftirqd/0
    5 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0     0    0    0 S  0.0  0.0   0:36.43 kworker/u30:0
    7 root      20   0     0    0    0 S  0.0  0.0   1:11.05 rcu_sched
    8 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rcu_bh
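The TIME+ values above are in minutes:seconds.hundredths form, which is awkward to compare or add up. A small sketch of a conversion to seconds follows; time_to_seconds is a hypothetical helper name, and it assumes only the format shown in this listing.

```shell
#!/bin/sh
# Sketch: convert a TIME+ value (minutes:seconds.hundredths) to seconds.
# time_to_seconds is a hypothetical helper name.
time_to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Cumulative time for init from the listing above: 118 minutes, 34.36 seconds
time_to_seconds 118:34.36
```

Here 118 minutes is 7080 seconds, so the result is 7114.36 seconds, or just under two hours of CPU time.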
    9 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0

You can actually terminate a process within top if you have sufficient privilege. To do that, type k. You will then be prompted to enter the process ID and the signal to send. Similarly, you can change the nice setting of a process by typing r. You will be prompted to enter the process ID and the nice value. Setting a "nice" value, by the way, is sort of like ordering something from Amazon Prime and saying you don’t need it in two days. You're lowering the process's priority to be "nice" to other processes, giving them more opportunities to run. Some processes run with a "nice" value by default and you'll spot these when using top.
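The same two actions are available from the shell with renice and kill. The sketch below practices on a throwaway sleep process that stands in for a real job; the nice value of 10 is an arbitrary choice for illustration.

```shell
#!/bin/sh
# Sketch: the command-line counterparts of top's r and k keys,
# tried out on a throwaway process.

# Start a harmless background process to practice on
sleep 300 &
pid=$!

# Like pressing r in top: raise the nice value so the process
# yields the CPU to others more readily
renice -n 10 -p "$pid" > /dev/null

# Like pressing k in top: send the default TERM signal (15)
kill -TERM "$pid"

# Collect the exit status; a process ended by a signal reports 128 + signal number
status=0
wait "$pid" 2> /dev/null || status=$?
echo "process $pid exited with status $status"
```

While the process is still running, its new nice value would show up in top's NI column.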

One other interesting aspect of top is that you can save your top settings by pressing W (for write) when you're running top. This will store your settings in a file called .toprc (~/.toprc). If you want to change your default top settings, you can try editing that file. For example, you can change the interval between updates (look for Delay_time in the output below) or the order in which top's columns are displayed (reverse the fieldscur settings). On the other hand, making these changes interactively is easier and more flexible.

$ more .toprc
RCfile for "top with windows"           # shameless braggin'
Id:a, Mode_altscr=0, Mode_irixps=1, Delay_time=3.000, Curwin=0
Def     fieldscur=AEHIOQTWKNMbcdfgjplrsuvyzX
        winflags=62777, sortindx=10, maxtasks=0
        summclr=1, msgsclr=1, headclr=3, taskclr=1
Job     fieldscur=ABcefgjlrstuvyzMKNHIWOPQDX
        winflags=62777, sortindx=0, maxtasks=0
        summclr=6, msgsclr=6, headclr=7, taskclr=6
Mem     fieldscur=ANOPQRSTUVbcdefgjlmyzWHIKX
        winflags=62777, sortindx=13, maxtasks=0
        summclr=5, msgsclr=5, headclr=4, taskclr=5
Usr     fieldscur=ABDECGfhijlopqrstuvyzMKNWX
        winflags=62777, sortindx=4, maxtasks=0
        summclr=3, msgsclr=3, headclr=2, taskclr=3
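If you do decide to edit the file, a scripted change is less error-prone than hand editing. This is a sketch under the assumption that the file uses the Delay_time=… format shown above (newer versions of top write a different rc file format). The name set_toprc_delay is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: change Delay_time in a .toprc file, keeping a backup copy.
# set_toprc_delay is a hypothetical helper name.
#   $1 = path to the rc file, $2 = new delay in seconds (e.g. 5.000)
set_toprc_delay() {
    file=$1
    delay=$2
    # keep the original as <file>.bak, then rewrite the file from the backup
    cp "$file" "$file.bak" &&
        sed "s/Delay_time=[0-9.]*/Delay_time=$delay/" "$file.bak" > "$file"
}
```

For example, set_toprc_delay ~/.toprc 5.000 would switch the update interval to five seconds and leave the previous settings in ~/.toprc.bak.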

And, of course, you can always turn your favorite top command variations into aliases to make them easier to remember and use.

alias topmem='top -a -n 1'
alias topcum='top -S'

This article is published as part of the IDG Contributor Network.