Some slick awk Built-ins

The awk utility is still one of the most useful tools for separating out columns within text so that they can be used in other commands. And GNU awk (gawk), with its wide range of added features, only makes this better. We can still do the traditional awk text manipulation. For example, the -F (field separator) option makes it easy to extract columns of data whether those columns are separated by white space, colons, commas, or other characters. But there are a lot of other really useful things that we couldn't do before awk took on all the new features of gawk. In this post, we're going to look at some very handy things that you can do with today's awk.

But, first, just a bit of nostalgia. Over the last couple of decades, we've all probably seen hundreds of commands that do something like this:

awk -F: '{print $1}' /etc/passwd

Some of my earliest excitement about learning Unix was seeing how tricks like this could save me so much time and effort. This good old command displays the first field in the colon-separated passwd file. The -F is the command line option that allows us to specify a field separator other than the default of white space.

Either of the following two commands takes this just a bit further by displaying the last or rightmost field in the passwd file, but the second shows the last field regardless of how many fields there are.

awk -F: '{print $7}' /etc/passwd
awk -F: '{print $NF}' /etc/passwd

In the second command, we're making use of the NF built-in, which holds the number of fields in the current line, and prefixing it with a $ so that we're retrieving the value of the rightmost field. Thus, for each line of the passwd file, $NF is the same as $7. This construct is most useful when the number of fields in the file you're working with is irregular. You might want to see the rightmost field even if it's the 5th field in one line and the 15th field in the next.
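
To see how $NF behaves with irregular input, here's a small sketch. The three sample lines are made up for illustration and fed in with printf rather than read from a real file. Each output line shows the field count followed by the last field.

```shell
# Hypothetical sample input: three colon-separated lines of different lengths.
printf 'a:b:c\nd:e\nf:g:h:i:j\n' | awk -F: '{ print NF, $NF }'
# Prints:
# 3 c
# 2 e
# 5 j
```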

And, of course, we could also do the same thing like this:

awk 'BEGIN{FS=":"}{print $NF}' /etc/passwd

In this case, however, we're using two awk built-ins, FS and NF. Enclosed within the single quotes is our very simple awk script. The BEGIN block sets the input field separator to a colon before any input is read, and the main block then displays the last field in each line.

A similar built-in that you might not yet have used is OFS, which is available in standard awk as well as gawk. The "FS" part of this acronym still means "field separator" as it does with FS, but the "O" signifies that this field separator applies to awk command output. The command below shows the first and 7th fields from the passwd file, but replaces the colon with a tab in the output. So, we're selecting our fields and replacing our field separator in one simple command.

awk 'BEGIN {FS=":";OFS="\t"}{print $1,$7}' /etc/passwd
root    /bin/bash
bin     /sbin/nologin
daemon  /sbin/nologin
...
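
One detail worth knowing is when OFS actually gets used. As a rough sketch (the a:b:c input line is invented for illustration), awk inserts OFS between items separated by commas in a print statement, and also when it rebuilds the line after a field is assigned. Printing an untouched line leaves the original separators alone.

```shell
# Made-up input line; a comma in print inserts OFS between the items.
printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }'
# Prints: a-c

# Printing the line untouched keeps the original separators ...
printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print }'
# Prints: a:b:c

# ... until a field assignment (even a no-op like $1 = $1) rebuilds the line.
printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }'
# Prints: a-b-c
```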

The FIELDWIDTHS built-in is another interesting option that gawk provides. This built-in allows you to specify field widths completely independently of separators. The effect is similar to using a series of substring operators to divide your input into the right-sized chunks. In the script below, we're taking the first 12 characters of each input line and breaking them into three fields with 3, 4, and 5 characters in them.

BEGIN {FIELDWIDTHS="3 4 5"}
{
  print $1
  print $2
  print $3
}

The output might look like this:

012
3456
789ab
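
For comparison, here's roughly what the substring approach looks like. This is a sketch that assumes the input line was 0123456789ab (inferred from the output above) and uses the standard substr function, so it should work even where FIELDWIDTHS isn't available.

```shell
# Assumed input line; substr(string, start, length) counts from position 1.
printf '0123456789ab\n' | awk '{
  print substr($0, 1, 3)   # characters 1-3
  print substr($0, 4, 4)   # characters 4-7
  print substr($0, 8, 5)   # characters 8-12
}'
# Prints:
# 012
# 3456
# 789ab
```

With more than a few columns, keeping the start positions straight gets tedious, which is where FIELDWIDTHS earns its keep.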

Another useful built-in is RS (record separator). This is like FS, but it represents the character that separates records (lines, by default) rather than the one that separates fields. So, by default, it's a newline. Need every colon-separated field in a file to be treated as a separate record? No problem. Do something like this:

BEGIN {RS=":"}
{
  print $0
}
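
Here's a quick sketch of that idea run against a single made-up line, with NR (the record number) printed in front of each record so the splitting is easy to see.

```shell
# Invented one-line input; each colon now ends a record.
printf 'root:x:0\n' | awk 'BEGIN { RS = ":" } { print NR ": " $0 }'
# Prints:
# 1: root
# 2: x
# 3: 0
# followed by an empty line, because the newline that ended the
# original line is no longer a separator and stays inside record 3.
```

That leftover newline is worth remembering if you plan to pass the results on to other commands.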

There's also a built-in for treating uppercase and lowercase identically. IGNORECASE is specific to gawk. Set it to 1 in your gawk scripts and your string comparisons and regular expression matches will be case insensitive.

$ cat file.txt
This April will be sunny but cool.
$ awk 'BEGIN {IGNORECASE = 1} {if ($2 == "april") print $0 }' file.txt
This April will be sunny but cool.
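
If your awk isn't gawk, one portable way to get a similar result is to lowercase the field before comparing it. This is only a sketch of that alternative, using the same sample sentence piped in with echo, and it doesn't involve IGNORECASE at all.

```shell
# Lowercase the second field, then compare it against a lowercase literal.
echo "This April will be sunny but cool." | awk 'tolower($2) == "april"'
# Prints: This April will be sunny but cool.
```

A pattern with no action prints the matching line, so no explicit print is needed here.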

You can also use the built-in functions tolower and toupper to selectively convert text to all lowercase or all uppercase.

$ echo "MyString" | awk '{print tolower($1)}'
mystring
$ echo "MyString" | awk '{print toupper($1)}'
MYSTRING

There's a nice pile of other built-ins as well. I just picked out a few that seemed especially useful.

So, if you've only been using awk in the traditional ways -- getting it to extract a column or two from delimited files -- you might be surprised by some of the other things it can do. Take a look at the awk/gawk man page and you might find some features you're simply going to love.
