The best tools and techniques for finding data on Unix systems

...when you know what you're looking for and when you don't


Finding needles in a haystack

Credit: flickr / Michael Gil

Sometimes looking for information on a Unix system is like looking for needles in haystacks. Even important messages can be difficult to notice when they're buried in huge piles of text. And so many of us are dealing with "big data" these days -- log files that are multiple gigabytes in size and huge record collections in any form that might be mined for business intelligence.

Fortunately, there are only two times when you need to dig through piles of data to get your job done -- when you know what you're looking for and when you don't. ;-) The best tools and techniques will depend on which of these two situations you're facing.

When you do

When you know what you're looking for, grep is your friend. And not just when you're looking for specific text. The grep command can help you find arbitrary text, specific words, text patterns, and text with context. Finding text when you know just what it looks like is generally easy. The grep this that command will display every line in a file named "that" containing the string "this". Add the -w option and it will only show those lines to you when they contain "this" as a separate word. In other words, it won't show you lines containing "thistle" or "erethism" unless those lines also contain the word "this".

The most straightforward grep command takes little brain power to digest:

$ grep find poem
finding meaning, finding comfort,
finding someone to adore
Can we find a way to be

Finding whole words can be done by adding the -w option:

$ grep -w find poem
Can we find a way to be
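
If you'd like to try the difference between a plain match and a whole-word match yourself, here's a minimal sketch. The file name words.txt and its three lines are made up for illustration; they are not part of the poem used above.

```shell
# Build a small, hypothetical sample file in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'this is a line' 'a thistle grows' 'erethism is rare' > "$tmpdir/words.txt"

# Plain match: every line containing the string "this" anywhere.
plain=$(grep this "$tmpdir/words.txt")

# Whole-word match: only lines where "this" stands alone as a word.
whole=$(grep -w this "$tmpdir/words.txt")

echo "$plain"
echo "---"
echo "$whole"

rm -rf "$tmpdir"
```

The plain search should report all three lines, while the -w search should report only the first.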

Finding patterns gets a little trickier. Our first example allows us to display lines with the word "find" whether it starts with a lowercase or an uppercase "f":

$ grep '[Ff]ind' poem
Finding answers
finding meaning, finding comfort,
finding someone to adore
Can we find a way to be

If you want to match on text only at the beginnings or ends of lines, you can anchor your matches using ^ (beginning) or $ (end).

$ grep ^find poem
finding meaning, finding comfort,
finding someone to adore
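
The end-of-line anchor works the same way. Here's a small sketch, assuming a made-up file named sample.txt. The single quotes are there so the shell leaves the ^ and $ characters alone before grep sees them.

```shell
# Build a small, hypothetical sample file in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'finding comfort' 'what we find' 'find' > "$tmpdir/sample.txt"

# Lines that END with "find".
ends=$(grep 'find$' "$tmpdir/sample.txt")

# Lines that contain nothing but "find" (both anchors at once).
only=$(grep '^find$' "$tmpdir/sample.txt")

echo "$ends"
echo "---"
echo "$only"

rm -rf "$tmpdir"
```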

If you want to find lines that contain words with two consecutive vowels, you can use the "AEIOUaeiou" character set as shown below.

$ grep -E "[AEIOUaeiou]{2}" poem | head -3
All our days are filled with searching
wondering what we're looking for
finding meaning, finding comfort,

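Finding lines that contain a run of nine or more letters (the pattern matches 9 or 10 consecutive letters, even when they sit inside a longer word):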

$ grep -E "[[:alpha:]]{9,10}" poem
All our days are filled with searching
wondering what we're looking for
All our days are filled with searching
that makes the searching more productive
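
Since a {9,10} pattern can also match part of a longer word, a whole-word variant may be closer to what you want. The sketch below uses a made-up sample file and combines -w (whole words only) with -o (print only the matched text). Both options are available in GNU and BSD grep, though neither is guaranteed on every Unix.

```shell
# Build a small, hypothetical sample file in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'searching for productive' 'extraordinarily long' 'short words' > "$tmpdir/sample.txt"

# Loose: the 15-letter word on the second line still matches,
# because it contains a run of nine letters.
loose=$(grep -E '[[:alpha:]]{9,10}' "$tmpdir/sample.txt")

# Strict: -w insists that the match be a whole word, and -o prints
# only the matching words rather than the full lines.
strict=$(grep -owE '[[:alpha:]]{9,10}' "$tmpdir/sample.txt")

echo "$loose"
echo "---"
echo "$strict"

rm -rf "$tmpdir"
```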

Finding the word "find" when it's part of a longer word:

$ grep -E "find[^[:space:]]+" poem
finding meaning, finding comfort,
finding someone to adore

Most of us are not searching through poetry, of course, but these same techniques can be used to pull relevant information from our system files. In the example below, we're searching on the term "processor" and displaying our finds in five-line groups (the matching line plus two lines before and two lines after) to provide some context. Change -C 2 to -C 4 and you'll get nine-line groups instead.

$ grep -C 2 processor /var/log/dmesg
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7ec00000)
Detected 3400.426 MHz processor.
Built 1 zonelists.  Total pages: 524275
Kernel command line: ro root=LABEL=/1
--
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2071140k/2097100k available (2223k kernel code, 24616k reserved, 922k data, 232k init, 1179596k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop (skipped), value calculated using timer frequency.. 6800.85 BogoMIPS (lpj=3400426)
Security Framework v1.0.0 initialized
--
CPU0: Intel(R) Xeon(TM) CPU 3.40GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 11000
CPU 1 irqstacks, hard=c0779000 soft=c0759000
Initializing CPU#1
--
CPU1: Intel(R) Xeon(TM) CPU 3.40GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 2/6 eip 11000
CPU 2 irqstacks, hard=c077a000 soft=c075a000
Initializing CPU#2
--
CPU2: Intel(R) Xeon(TM) CPU 3.40GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 3/7 eip 11000
CPU 3 irqstacks, hard=c077b000 soft=c075b000
Initializing CPU#3
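
The context doesn't have to be symmetrical. As a sketch, assuming a GNU or BSD grep and a made-up log file named boot.log, the -B and -A options set the number of lines shown before and after each match separately, and -n adds line numbers:

```shell
# Build a small, hypothetical log file in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'Detected processor' 'line four' 'line five' > "$tmpdir/boot.log"

# Show one line before and two lines after each match.
# With -n, matching lines are numbered with ":" and context lines with "-".
ctx=$(grep -n -B 1 -A 2 processor "$tmpdir/boot.log")

echo "$ctx"

rm -rf "$tmpdir"
```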

When you don't

If you don't know what text you're going to find but you do know where to look, such as when Perl tells you that your script ran into problems on line 73 or on line 1892 of the file you're processing with it, you can use sed to display just that line (I just hate counting to 1,892!). And, with just a bit of extra effort, you can display some of the surrounding lines as well.

The error might look like this:

"syntax error line 73 near "} else""

You can use a sed command to display the line in question:

$ sed -n 73p showvars
else

OK, that's the line, but we don't know much more than we knew before. Add a little context by showing some of the preceding lines and we might just spot the error. Here's a similar command that will display the line plus the preceding ten lines:

$ sed -n 63,73p showvars
if $password eq "a_secret";
{
    foreach $var (sort(keys(%ENV))) {
        $val = $ENV{$var};
        $val =~ s|\n|\\n|g;
        $val =~ s|"|\\"|g;
        print '${var}=\"${val}\"\n'
    };
}
else

Oops! It looks like someone's having trouble writing if clauses! And we have an obvious fix to make.
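
When the line number comes out of an error message and the file is large, it can help to let the shell do the arithmetic and to tell sed to stop reading once it has printed what you asked for. Here's a minimal sketch; the file numbers.txt and the variable names line and start are made up for illustration.

```shell
# Build a hypothetical 2000-line file in which each line holds its own number.
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 2000; i++) print i }' > "$tmpdir/numbers.txt"

line=1892               # the line number reported in the error
start=$((line - 3))     # show three lines of leading context

# Print lines start..line, then quit so sed doesn't read the rest of the file.
out=$(sed -n "${start},${line}p;${line}q" "$tmpdir/numbers.txt")

echo "$out"

rm -rf "$tmpdir"
```

The q command matters most on multi-gigabyte files, where reading past the lines of interest is wasted time.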

You can also use a sed command to highlight lines that have some particular content. In the command shown below, we add an "arrow" to highlight every line that contains the foreach command:

$ sed '/foreach/{b label1; {:label1 ; s/$/ <===/ ;} }' showvars
#!/bin/bash

if $password eq "a_secret";
{
    foreach $var (sort(keys(%ENV))) { <===
        $val = $ENV{$var};
        $val =~ s|\n|\\n|g;
        $val =~ s|"|\\"|g;
        print '${var}=\"${val}\"\n'
    };
}
else
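
The branch-to-a-label form works, but as far as I can tell the label isn't doing any real work here, since the branch simply jumps to the very next command. A plain address followed by a substitution should give the same result. Here's a sketch using a made-up file named snippet:

```shell
# Build a small, hypothetical script fragment in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'if (x) {' '    foreach $var (@list) {' '        print $var;' '    }' '}' > "$tmpdir/snippet"

# On every line matching /foreach/, append an arrow at the end of the line.
marked=$(sed '/foreach/ s/$/ <===/' "$tmpdir/snippet")

echo "$marked"

rm -rf "$tmpdir"
```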

You could use a similar command to comment out your print commands:

$ sed '/print/{b label1; {:label1 ; s/^/# / ; s/$/ <===/ ;} }' showvars
#!/bin/bash

if $password eq "a_secret";
{
    foreach $var (sort(keys(%ENV))) {
        $val = $ENV{$var};
        $val =~ s|\n|\\n|g;
        $val =~ s|"|\\"|g;
#         print '${var}=\"${val}\"\n' <===
    };
}
else
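
If you actually want to keep the commented-out version, one cautious approach is to write the result to a new file and compare it with the original before replacing anything. This is only a sketch; the file names snippet and snippet.new are made up for illustration.

```shell
# Build a small, hypothetical script fragment in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'foreach $var (@list) {' '    print $var;' '}' > "$tmpdir/snippet"

# Prefix every line containing "print" with "# " and save the result
# to a new file, leaving the original untouched.
sed '/print/ s/^/# /' "$tmpdir/snippet" > "$tmpdir/snippet.new"

# Count the lines that differ and collect the commented-out lines.
changed=$(diff "$tmpdir/snippet" "$tmpdir/snippet.new" | grep -c '^>')
commented=$(grep '^# ' "$tmpdir/snippet.new")

echo "$changed line(s) changed:"
echo "$commented"

rm -rf "$tmpdir"
```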

Needles in haystacks are hard to find. In fact, needles in carpet are hard to find. But variations on some of the most commonly used Unix commands can make what you're looking for easier to find -- even when you're not sure what that might be.

This article is published as part of the IDG Contributor Network.
