Reading a file, line by line

November 24th, 2009

nixcraft has a link on how to read a file line by line. The method is a great way to read a file, but there are some trouble spots I thought I would point out.

In the script, the special variable IFS is set:

# set the Internal Field Separator to a pipe symbol
IFS='|'

This tells the read command to split “cyberciti.biz|74.86.48.99” into “cyberciti.biz” and “74.86.48.99”, filling both the domain and ip variables here:

while read domain ip
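As a minimal sketch of that splitting, the snippet below feeds one pipe-delimited line to read. The helper name split_line is hypothetical and not part of the nixcraft script. Its body is wrapped in ( ... ) so the IFS change stays inside a subshell, and printf stands in for print so the snippet runs in bash, ksh, and plain sh.

```shell
#!/bin/sh
# Sketch only: split_line is a hypothetical helper, not from the original script.
# The ( ... ) function body runs in a subshell, so IFS is changed only in there.
split_line() (
    IFS='|'
    # read assigns the first field to domain and the rest of the line to ip
    read -r domain ip <<EOF
$1
EOF
    printf '%s has address %s\n' "$domain" "$ip"
)

split_line 'cyberciti.biz|74.86.48.99'
# prints: cyberciti.biz has address 74.86.48.99
```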

Using BASH to split strings is much faster than doing something like this:

while read line
do
   domain=$(echo "$line" | awk -F'|' '{print $1}')
   ip=$(echo "$line" | awk -F'|' '{print $2}')
done

This is what new script writers typically do. However, setting IFS and forgetting to reset it can cause odd problems in longer scripts. For example, let's say you need to read a second file later in the script, this one delimited by spaces. For simplicity, I will take the same file and just replace the pipe characters with spaces.

/tmp/domains-using-space.txt

root@b92 [~]# cat /tmp/domains-using-space.txt
cyberciti.biz 74.86.48.99
nixcraft.com 75.126.168.152
theos.in 75.126.168.153
cricketnow.in 75.126.168.154
vivekgite.com 75.126.168.155

Now, here is my new script:

#!/bin/ksh
# set the Internal Field Separator to a pipe symbol
IFS='|'

# file name
file=/tmp/domains.txt

# use while loop to read domain and ip
while read domain ip
do
    print "$domain has address $ip"
done <"$file"

echo ------------------------
file=/tmp/domains-using-space.txt

# use while loop to read domain and ip
while read domain ip
do
    print "$domain has address $ip"
done <"$file"

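As you can see, the output of the second loop is incorrect. IFS is still set to the pipe symbol, so each space-delimited line lands whole in domain and ip stays empty: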

root@b92 [~]# ./test.sh
cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155
------------------------
cyberciti.biz 74.86.48.99 has address
nixcraft.com 75.126.168.152 has address
theos.in 75.126.168.153 has address
cricketnow.in 75.126.168.154 has address
vivekgite.com 75.126.168.155 has address
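The failure can be reproduced on a single line. The sketch below uses a hypothetical helper, show_fields, that takes an IFS value and a line and prints the two variables in brackets so that an empty ip is visible. It is an illustration under those assumptions, not a part of the script above.

```shell
#!/bin/sh
# Sketch only: show_fields is a hypothetical helper for illustration.
# $1 = value to use for IFS, $2 = the line to split.
# The ( ... ) body keeps the IFS change inside a subshell.
show_fields() (
    IFS=$1
    read -r domain ip <<EOF
$2
EOF
    printf 'domain=[%s] ip=[%s]\n' "$domain" "$ip"
)

# IFS left over from the first loop: no pipe in the line, so nothing is split
show_fields '|' 'cyberciti.biz 74.86.48.99'
# prints: domain=[cyberciti.biz 74.86.48.99] ip=[]

# IFS set to a space: the line splits as intended
show_fields ' ' 'cyberciti.biz 74.86.48.99'
# prints: domain=[cyberciti.biz] ip=[74.86.48.99]
```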

By saving and resetting the special variable IFS, we can eliminate this problem:

#!/bin/ksh
# file name
file=/tmp/domains.txt

# save the current Internal Field Separator, then set it to a pipe symbol
oldIFS="$IFS"
IFS='|'

# use while loop to read domain and ip
while read domain ip
do
    print "$domain has address $ip"
done <"$file"
IFS="$oldIFS"

echo ------------------------
file=/tmp/domains-using-space.txt

# use while loop to read domain and ip
while read domain ip
do
    print "$domain has address $ip"
done <"$file"

The output from the new script, which saves and resets IFS:

cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155
------------------------
cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155
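A related option, which is not what the scripts above do, is to put the IFS assignment on the read command itself. An assignment placed in front of a regular command such as read applies only to that command, so there is nothing to save or restore. The function names below are hypothetical, and the sample lines are piped in rather than read from the /tmp files so the sketch stands alone.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; input arrives on stdin.

print_pipe_file() {
    # IFS='|' applies to this read only; the shell's own IFS is untouched
    while IFS='|' read -r domain ip
    do
        printf '%s has address %s\n' "$domain" "$ip"
    done
}

print_space_file() {
    # no override here: read splits on the default space, tab, and newline
    while read -r domain ip
    do
        printf '%s has address %s\n' "$domain" "$ip"
    done
}

printf '%s\n' 'cyberciti.biz|74.86.48.99' | print_pipe_file
printf '%s\n' '------------------------'
printf '%s\n' 'cyberciti.biz 74.86.48.99' | print_space_file
```

Both loops print "cyberciti.biz has address 74.86.48.99". Saving and resetting IFS is still the right habit whenever the variable has to be changed for more than one command.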

In short, IFS is a great way to split strings. My next article will be a more in-depth discussion of this topic. In the meantime, one thing to remember when using IFS is to always save and reset this variable.

7 Responses to “Reading a file, line by line”

  1. Seth Says:

    Good tip, but I thought this was _BASH_ cures cancer. :)

    For the above to work as a bash script the “print” command needs to be changed to “echo”.

    Also, it would be nice if read would accept a word delimiter as a parameter the way AWK does.

  2. Alex Says:

    Commands like this are pretty risky:
    oldIFS=$IFS
    IFS=$oldIFS
    If IFS included the newline character (as it does by default), or the tab character, it won’t do any more after this command. For copying variables which might contain whitespace other than spaces, you need to enclose the RHS of the assignment in quotes:
    oldIFS="$IFS"
    IFS="$oldIFS"
    But yes, manipulating IFS can be a nifty little trick from time to time. You can do the splitting with cut rather than awk, like this:
    domain=$(echo "$line" | cut -d '|' -f 1)
    ip=$(echo "$line" | cut -d '|' -f 2)
    But in this case manipulating IFS is probably still quicker and even easier.

  3. Brock Noland Says:

    Seth, As I am sure you are aware, this works in BASH as well! ;)

    Alex,

    Your first point is excellent. I will update the post. However, using a subshell, whether it be awk, cut, sed or something else, is much, much slower.

  4. Brock Noland Says:

    Actually, Jeff my test does not show you are correct. At least with BASH 3.1.17 and KSH 1993-12-28 r.

    $ cat test.sh
    #!/bin/bash
    echo \$IFS = "'$IFS'"
    oldIFS=$IFS
    echo \$oldIFS = "'$oldIFS'"
    
    $ ./test.sh
    $IFS = '
    '
    $oldIFS = '
    '
    
  5. Cris F.A. Johnson Says:

    “you need to enclose the RHS of the assignment in quotes”

    You only need to quote the right-hand side when there is a literal whitespace character. (This is true for any Bourne-type shell.)

    “the “print” command needs to be changed to “echo”.”

    The print command should be changed to printf; echo is deprecated.

  6. Splitting Strings Natively with the Shell: Why Says:

    [...] strings into tokens or “words”. I previously discussed how to do this with the IFS variable and promised a more in depth discussion. Today, I will make the case on WHY to use IFS to split [...]

  7. Joe Says:

    Well, well.
