Windows vs. Unix: Those pesky line terminators

[Image: pencils. Credit: Sandra H-S]

Other than having to switch between "/" and "\" when I'm moving from Unix systems to Windows systems, one of the most annoying (and seemingly pointless) differences is the way that these two competing operating systems insist on terminating lines in text files.

Unix systems use a single character -- the linefeed -- and Windows systems use both a carriage return and a linefeed (often referred to as "CRLF"). Being a "simpler is better" kind of person, I prefer the Unix approach. Knowing that I am going to have to deal with CRLF files from time to time, however, I try to be ready with some easy commands to detect and, as needed, convert the files I'm working with.

The easiest way to tell if a file on your Unix system makes use of the CRLF convention is to ask the question using the file command.

$ file DarkBeers.txt
DarkBeers.txt: ASCII text, with CRLF line terminators

OK, so that was pretty easy. The file command clearly identifies files that use the Windows convention. For files that use only linefeeds, you would instead see something like this:

$ file junk2
junk2: ASCII text

If you want to look through a group of files with one handy command, you can just send the output of your file command to grep:

$ file * | grep CRLF
DarkBeers.txt:              ASCII text, with CRLF line terminators
Lab1:                       ASCII text, with CRLF, LF line terminators
lab6:                       ASCII English text, with very long lines, with CRLF, CR, LF line terminators, with escape sequences, with overstriking
me-dos.txt:                 ASCII text, with CRLF line terminators
typescript:                 ASCII text, with CRLF, LF line terminators
whatever:                   ASCII text, with CRLF, LF line terminators

If you want to look through an entire directory tree, use the find command with its -exec option to run the file command on each file. Then tack on a pipe and look for the CRLF identifier that we saw in the output above.

$  find . -type f -exec file "{}" ";" | grep CRLF
./whatever: ASCII text, with CRLF, LF line terminators
./lab6: ASCII English text, with very long lines, with CRLF, CR, LF line terminators, with escape sequences, with overstriking
./bin/crlf: ASCII text, with CRLF line terminators
./bin/typescript: ASCII text, with CRLF, LF line terminators
./DarkBeers.txt: ASCII text, with CRLF line terminators
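
The find command above starts one file process per file. Ending the -exec clause with "+" instead of ";" hands find's results over in batches, and the same form works with file. As a rough sketch of the idea that does not depend on file's wording, the commands below let grep list the files in which some line ends with a carriage return. The scratch directory and file names are made up for illustration.

```shell
# Build a small scratch tree (hypothetical names) with one CRLF file and one LF file.
workdir=$(mktemp -d)
mkdir -p "$workdir/bin"
printf 'a\r\nb\r\n' > "$workdir/bin/crlf"
printf 'a\nb\n'     > "$workdir/notes"

# A literal carriage return; a portable stand-in for bash's $'\r'.
cr=$(printf '\r')

# "{} +" passes many file names to each grep run; -l prints only the names
# of files in which at least one line ends with a carriage return.
found=$(cd "$workdir" && find . -type f -exec grep -l "${cr}\$" {} + | sort)
printf '%s\n' "$found"
```

Unlike the file command, this check says nothing about mixed terminators; it only reports that at least one line ends in a carriage return.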

If you would like to view the carriage return linefeed sequence within your files, you can use the od (octal dump) command to show them to you. In the figure below, the carriage return (15 in octal and displayed as \r) and the linefeed or "newline" as it is often called (12 in octal and displayed as \n) are highlighted in red.

[Figure: od output for a CRLF file, with the \r and \n characters highlighted in red. Credit: Sandra H-S]
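
To try this yourself, here is a minimal sketch that feeds two CRLF-terminated lines (made-up sample text, not the file from the figure) through od. The -c option prints characters with escapes such as \r and \n, and -b prints each byte in octal.

```shell
# Dump two CRLF-terminated lines as characters; each line should end in \r \n.
dump=$(printf 'Stout\r\nPorter\r\n' | od -c)
printf '%s\n' "$dump"

# Dump just a carriage return and a linefeed as octal bytes (-An drops the offsets).
octal=$(printf '\r\n' | od -An -b)
printf '%s\n' "$octal"
```

The second command should show the two octal values mentioned above, 015 for the carriage return and 012 for the linefeed.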

So how do you go about converting files from one form of line terminator to the other? Most Unix systems include, or can easily install, two utilities -- dos2unix and unix2dos -- that convert files from one format to the other. And they make the changes "in place." In other words, you don't have to generate a second file and then rename it to get back to the original file.

Note that the file's modification time will then reflect the date and time you made the change. If you don't want the timestamp to change, use the -k (or --keepdate) option, as shown in the second set of commands below, which starts over with a CRLF copy of the file.

$ dos2unix DarkBeers.txt
dos2unix: converting file DarkBeers.txt to UNIX format ...
$ file DarkBeers.txt
DarkBeers.txt: ASCII text
$ ls -l DarkBeers.txt
-rw-r----- 1 shs staff 116 Aug 15 17:06 DarkBeers.txt

$ file DarkBeers.txt
DarkBeers.txt: ASCII text, with CRLF line terminators
$ ls -l DarkBeers.txt
-rw-r----- 1 shs staff 120 Jan  5  2015 DarkBeers.txt
$ dos2unix -k DarkBeers.txt
dos2unix: converting file DarkBeers.txt to UNIX format ...
$ ls -l DarkBeers.txt
-rw-r----- 1 shs staff 116 Jan  5  2015 DarkBeers.txt
$ file DarkBeers.txt
DarkBeers.txt: ASCII text
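
For comparison, here is roughly what the "second file and rename" routine looks like by hand on a system without dos2unix. This is only a sketch under a few assumptions: the file is a scratch copy created for the example, tr removes every carriage return (not just those before a linefeed), and the file's permissions and ownership are not carried over.

```shell
# Create a scratch CRLF file and give it an old timestamp worth preserving.
workdir=$(mktemp -d)
target="$workdir/DarkBeers.txt"      # scratch copy, not the article's real file
printf 'Porter\r\nStout\r\n' > "$target"
touch -t 201501050000 "$target"

tmp=$(mktemp "$workdir/convert.XXXXXX")
tr -d '\r' < "$target" > "$tmp"      # strip every carriage return
touch -r "$target" "$tmp"            # copy the original's timestamps to the new file
mv "$tmp" "$target"                  # rename the new file over the original

# Count any carriage returns that are left; there should be none.
remaining=$(( $(tr -cd '\r' < "$target" | wc -c) ))
echo "carriage returns left: $remaining"
```

That is three steps where dos2unix -k needs one, which is a good argument for keeping the utility around.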

There are also some other ways that you can determine if a file contains Windows style line terminations. The grep and file checks below print "dos" when they find a match. With the awk command, we have to look at the return code, which is 1 if awk found a carriage return (\r) at the end of a line.

$ grep -q $'\r' DarkBeers.txt && echo dos
dos
$ [[ $(file -b - < DarkBeers.txt) =~ CRLF ]] && echo dos
dos
$ awk '/\r$/ { exit(1) }' DarkBeers.txt
$ echo $?
1
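
Since these checks don't agree on what their return codes mean (grep returns 0 on a match, while the awk command above returns 1), it can help to wrap one of them in a small function. The sketch below is one way to do that. The function name has_crlf and the sample files are made up for illustration.

```shell
# has_crlf FILE: succeed (return 0) if any line in FILE ends with a carriage return.
has_crlf() {
    cr=$(printf '\r')            # literal carriage return, portable stand-in for $'\r'
    grep -q "${cr}\$" "$1"       # grep's exit status becomes the function's
}

# Try it on one CRLF file and one LF file in a scratch directory.
workdir=$(mktemp -d)
printf 'Porter\r\n' > "$workdir/dos.txt"
printf 'Porter\n'   > "$workdir/unix.txt"

for f in "$workdir/dos.txt" "$workdir/unix.txt"; do
    if has_crlf "$f"; then
        echo "$(basename "$f"): dos"
    else
        echo "$(basename "$f"): unix"
    fi
done
```

Because the function follows the usual shell convention (0 means "yes"), it can be used directly in if statements and with && or ||.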

You can also automate the process of converting files from Windows (DOS) to Unix format to save yourself a little trouble. Here's a script that checks whether the conversion is needed, performs it as required, and ensures that the timestamp doesn't change.

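#!/bin/bash
# fixit: convert a file from DOS to Unix line endings, keeping its timestamp

# Take the file name from the first argument, or prompt for it
if [ $# -eq 0 ]; then
    echo -n "file> "
    read -r file
else
    file=$1
fi

# grep -q exits 0 when "CRLF" appears in the output of the file command,
# so convert only in that case
if file "$file" | grep -q CRLF; then
    dos2unix -k "$file"
fi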

In the commands below, the script is being used to convert the DarkBeers.txt file. The other commands are meant to show that the file's timestamp hasn't changed. They also show that the size of the file has changed slightly. The Unix version of the file, after all, has been stripped of its carriage returns so it's four characters smaller.

$ file DarkBeers.txt
DarkBeers.txt: ASCII text, with CRLF line terminators
$ ls -l DarkBeers.txt
-rw-r----- 1 shs staff 120 Jan  5  2015 DarkBeers.txt
$ ./fixit
file> DarkBeers.txt
dos2unix: converting file DarkBeers.txt to UNIX format ...
$ file DarkBeers.txt
DarkBeers.txt: ASCII text
$ ls -l DarkBeers.txt
-rw-r----- 1 shs staff 116 Jan  5  2015 DarkBeers.txt
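
The four-byte difference (120 versus 116) suggests that DarkBeers.txt has four CRLF-terminated lines, since each one gives up a single carriage return. You can check that kind of arithmetic by counting the carriage returns directly. The sketch below uses a made-up four-line stand-in, not the real file.

```shell
# Build a hypothetical four-line CRLF file in a scratch directory.
workdir=$(mktemp -d)
sample="$workdir/beers.txt"
printf 'Porter\r\nStout\r\nDunkel\r\nSchwarzbier\r\n' > "$sample"

size_before=$(( $(wc -c < "$sample") ))               # bytes with carriage returns
cr_count=$(( $(tr -cd '\r' < "$sample" | wc -c) ))    # keep only the \r bytes and count them
size_after=$(( $(tr -d '\r' < "$sample" | wc -c) ))   # bytes once they are stripped

echo "before=$size_before carriage_returns=$cr_count after=$size_after"
```

The difference between the two sizes should always equal the number of carriage returns removed.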

I'd still prefer a world in which there's only one way to end a line in a text file, but at least moving files from one format to the other is simple and you can easily avoid changing the files' timestamps.

This article is published as part of the IDG Contributor Network.