Unix: Under the spell of magic numbers

If you've ever wondered how Unix systems identify files, you might be surprised to learn that file names really aren't an important factor. Unix systems reach into files looking for special codes called "magic numbers" to figure out what your files really are. They also examine other file content to pick out additional details.

Magic numbers play a very important role on Unix systems. They help the OS recognize details about files that aren't obvious even to Unix wizards.

What are magic numbers?

Magic numbers are values that serve as signatures for identifying file types. They are not generally visible unless you dump a file's contents with a command like od, which can display them in octal or hex. Some are fairly obvious once you see them, such as "\x89PNG" in a file dump. Others won't offer much of a clue. In general, they are special values stored at or near the beginning of files that allow commands like file to distinguish files by type, so even if you call a JPEG file my.image, the system can still figure out what it is.

How are they used?

The tool for file identification is a command named "file". Use the file command to examine your oddly named my.image file and it might just tell you something like this:

$ file my.image
my.image: JPEG image data, JFIF standard 1.01

The "JFIF" tag in this description stands for JPEG File Interchange Format. This output tells you that the file follows that format: the magic number associated with JPEG files is both stored in this file and stored in the right location to identify it. The same value at some other location would have no effect and would likely be coincidental. Use file to examine a file that isn't a JPEG file but has been given a .jpg extension, and the file command comes through and identifies it as an executable.

$ file bash.jpg
bash.jpg: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), stripped

If we follow up this inquiry by examining the beginning of the file with the od command, we can see the identifier for ELF files clearly present in the first four bytes.

$ od -bc bash.jpg | head -4
0000000 177 105 114 106 001 001 001 000 000 000 000 000 000 000 000 000
        177   E   L   F 001 001 001  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020 002 000 003 000 001 000 000 000 340 307 005 010 064 000 000 000
        002  \0 003  \0 001  \0  \0  \0 340 307 005  \b   4  \0  \0  \0

Do the same thing with our oddly named my.image file and we see the "JFIF" tag fairly clearly near the start of the file.

$ od -bc my.image | head -4
0000000 377 330 377 340 000 020 112 106 111 106 000 001 001 001 001 054
        377 330 377 340  \0 020   J   F   I   F  \0 001 001 001 001   ,
0000020 001 054 000 000 377 341 035 122 105 170 151 146 000 000 111 111
        001   ,  \0  \0 377 341 035   R   E   x   i   f  \0  \0   I   I
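
The kind of check these two dumps illustrate can be sketched in a few lines of Python. This is only a rough illustration and not how the file command is implemented: the SIGNATURES table and the identify function are made-up names, and the table covers just the two file types shown above. The header bytes are copied from the od output, which prints them in octal.

```python
# Hypothetical signature table: (leading bytes, description).
# Only the two types from the dumps above are included.
SIGNATURES = [
    (b"\x7fELF", "ELF executable or object file"),
    (b"\xff\xd8\xff", "JPEG image data"),
]

def identify(header: bytes) -> str:
    """Describe a file from its leading bytes alone; the name is never consulted."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "data"  # fallback label chosen for this sketch

def identify_file(path: str) -> str:
    """Read the first few bytes of a file and identify them."""
    with open(path, "rb") as f:
        return identify(f.read(16))

# First eight bytes of each dump, written in octal just as od shows them.
elf_header = bytes([0o177, 0o105, 0o114, 0o106, 0o001, 0o001, 0o001, 0o000])
jpeg_header = bytes([0o377, 0o330, 0o377, 0o340, 0o000, 0o020, 0o112, 0o106])

print(identify(elf_header))
print(identify(jpeg_header))
```

Because only the content is examined, calling identify_file on bash.jpg or my.image would give the same answer no matter what the files were renamed to.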

Most magic numbers aren't readable unless you use a tool such as od to examine the file's contents. The exception is the shebang line that we use in scripts to identify the shell to be used when they are run. This serves as a kind of magic number on its own and is used by the file command as such. For most files, lines in the /usr/share/file/magic file describe what the file command needs to look for. The line for a PNG file, for example, looks like this:

0       string          \x89PNG         PNG image data,

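This line tells us that there's no offset (the magic number appears at the beginning of the file), that the value to look for is the string "\x89PNG", and that a matching file should be described as "PNG image data". Looking at tux.png, an image of the beloved penguin, file tells us: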

$ file tux.png
tux.png: PNG image data, 400 x 479, 8-bit/color RGBA, non-interlaced

Now that's a lot more than just the magic number indicates. The file command, once it has determined that the file is a PNG file, can look at other content in the file to determine its dimensions and color depth.
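
The first step, matching an entry like the PNG line from the magic file, can be sketched as follows. This is a heavily simplified illustration: parse_rule and matches are hypothetical helpers, only the "string" test type is handled, and the real magic format supports far more than this.

```python
def parse_rule(line: str):
    """Split a simplified magic entry into (offset, kind, value, message)."""
    offset, kind, value, message = line.split(None, 3)
    # Turn escapes such as \x89 into the raw bytes they stand for.
    raw = value.encode("latin-1").decode("unicode_escape").encode("latin-1")
    return int(offset, 0), kind, raw, message

def matches(rule, data: bytes) -> bool:
    """Check whether the rule's value sits at the rule's offset in data."""
    offset, kind, raw, _message = rule
    if kind != "string":
        raise ValueError("this sketch only handles 'string' tests")
    return data[offset:offset + len(raw)] == raw

rule = parse_rule(r"0       string          \x89PNG         PNG image data,")
png_start = b"\x89PNG\r\n\x1a\n"

print(matches(rule, png_start))         # value found at offset 0
print(matches(rule, b"x" + png_start))  # same value, wrong location
```

The second call shows why location matters: the same bytes one position later do not count as a match.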

$ od -bc tux.png | head -4
0000000 211 120 116 107 015 012 032 012 000 000 000 015 111 110 104 122
        211   P   N   G  \r  \n 032  \n  \0  \0  \0  \r   I   H   D   R
0000020 000 000 001 220 000 000 001 337 010 006 000 000 000 176 261 216
         \0  \0 001 220  \0  \0 001 337  \b 006  \0  \0  \0   ~ 261 216

PNG files begin with this 8-byte signature, which is very obvious in the output shown above:

\211 P N G \r \n \032 \n

Magic numbers help ensure that Unix systems have more to go on than file names when trying to identify file content. Examine the /usr/share/file/magic file on your Linux system and you'll notice that there's a lot of information there about different types of files.
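
To see where the extra details in the tux.png description can come from, here is a minimal Python sketch that decodes the bytes immediately following the signature. It assumes the standard PNG layout, in which the first chunk is IHDR and holds the width, height, bit depth, color type and interlace flag. The png_info function and the COLOR_TYPES table are names invented for this example, and the input is the first 32 bytes of the od dump of tux.png, written in octal as od prints them.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Color type codes defined by the PNG format.
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "colormap",
               4: "gray+alpha", 6: "RGBA"}

def png_info(header: bytes) -> dict:
    """Decode the IHDR fields that follow the 8-byte PNG signature."""
    if not header.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Chunk header: 4-byte big-endian length, then 4-byte chunk type.
    _length, chunk_type = struct.unpack(">I4s", header[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # IHDR body: width, height (32-bit big-endian), then five single bytes.
    width, height, depth, color, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", header[16:29])
    return {
        "width": width,
        "height": height,
        "bit_depth": depth,
        "color": COLOR_TYPES.get(color, "unknown"),
        "interlaced": interlace != 0,
    }

# The 32 bytes from the od dump of tux.png, in octal.
dump = ("211 120 116 107 015 012 032 012 000 000 000 015 111 110 104 122 "
        "000 000 001 220 000 000 001 337 010 006 000 000 000 176 261 216")
header = bytes(int(tok, 8) for tok in dump.split())

print(png_info(header))
```

The decoded values line up with what file reported: 400 x 479, 8-bit RGBA, non-interlaced.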

Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.

This article is published as part of the IDG Contributor Network.
