Using Bash's regular expressions

Bash has quietly made scripting on Unix systems a lot easier with its own regular expressions. If you're still leaning on grep and sed commands to get your scripts to do what you need from them, maybe it's time to look into what bash can do on its own.

Since version 3 (circa 2004), bash has a built-in regular expression comparison operator, represented by =~. A lot of scripting tricks that use grep or sed can now be handled by bash expressions and the bash expressions might just give you scripts that are easier to read and maintain. As with other comparison operators (e.g., -lt or ==), bash will return a zero if an expression like $digit =~ "[[0-9]]" shows that the variable on the left matches the expression on the right and a one otherwise. This example test asks whether the value of $digit matches a single digit.

if [[ $digit =~ [0-9] ]]; then
    echo "$digit is a digit"
    echo "oops"

You can also check whether a reply to a prompt is numeric with similar syntax:

echo -n "Your answer> "
read REPLY
if [[ $REPLY =~ ^[0-9]+$ ]]; then
    echo Numeric
    echo Non-numeric

Bash's regex can be fairly complicated. In the test below, we're asking whether the value of our $email variable looks like an email address. Notice that the first expression (the account name) can contain letters, digits and some special characters. The + to the right of the first ] means that we can have any number of such characters. We then see the @ sign sitting between the username and the email domain -- and a literal dot (\.) between the primary part of the domain name and the "com", "net", "gov", etc. part. The comparison is then enclosed in double brackets.

if [[ "$email" =~ "^[A-Za-z0-9._%+-]+<b>@</b>[A-Za-z0-9.-]+<b>\.</b>[A-Za-z]{2,4}$" ]]
    echo "This email address looks fine: $email"
    echo "This email address is flawed: $email"

Similarly, you can construct tests that determine whether the value of variables is in the proper format for an IP address:


if [ $# != 1 ]; then
    echo "Usage: $0 address"
    exit 1

if [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
    echo "Looks like an IPv4 IP address"
elif [[ $ip =~ ^[A-Fa-f0-9:]+$ ]]; then
    echo "Could be an IPv6 IP address"
    echo "oops"

Bash also provides for some simplified looping. Want to loop 100 times? Just do something like this:

for n in {1..100}
    echo $n

And you can loop through letters or through various ranges of letters or numbers using expressions such as these. You don't have to start with 1 or a and you can move backwards through the list.


Want to see how these ranges work? You can also just try expanding them with the echo command.

$echo {a..z}
a b c d e f g h i j k l m n o p q r s t u v w x y z
$ echo {5..-1}
5 4 3 2 1 0 -1

What a swell shell!

Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.

This article is published as part of the IDG Contributor Network. Want to Join?

To express your thoughts on Computerworld content, visit Computerworld's Facebook page, LinkedIn page and Twitter stream.
10 super-user tricks to boost Windows 10 productivity
Shop Tech Products at Amazon