Reading a file, line by line
November 24th, 2009
nixcraft has a link on how to read a file line by line. The method is a great way to read a file, but there are some trouble spots I thought I would point out.
In the script, the special variable IFS is set:
# set the Internal Field Separator to a pipe symbol
IFS='|'
This tells the read command to split “cyberciti.biz|74.86.48.99” into “cyberciti.biz” and “74.86.48.99”, filling both the domain and ip variables here:
while read domain ip
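To see the split in isolation, here is a minimal sketch that feeds a single sample line to read through a here-document. The sample line is taken from the example above; the here-document is just an illustrative stand-in for the real file.

```shell
#!/bin/sh
# Split on the pipe symbol, as in the original script
IFS='|'

# read assigns the first field to domain and the rest to ip
read domain ip <<EOF
cyberciti.biz|74.86.48.99
EOF

printf '%s has address %s\n' "$domain" "$ip"
# prints: cyberciti.biz has address 74.86.48.99
```

Note that IFS stays set to the pipe symbol after this point, just as it does in the original script.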
Using BASH to split strings is much faster than doing something like this:
while read line
do
domain=$(echo "$line" | awk -F'|' '{print $1}')
ip=$(echo "$line" | awk -F'|' '{print $2}')
This is what new script writers typically do. However, setting IFS and forgetting to reset it can cause some odd problems in longer scripts. For example, let's say you need to read a second file later on in the script, this one delimited by spaces. For simplicity, I will take the same file and just replace the pipe characters with spaces.
/tmp/domains-using-space.txt
root@b92 [~]# cat /tmp/domains-using-space.txt
cyberciti.biz 74.86.48.99
nixcraft.com 75.126.168.152
theos.in 75.126.168.153
cricketnow.in 75.126.168.154
vivekgite.com 75.126.168.155
Now, here is my new script:
#!/bin/ksh
# set the Internal Field Separator to a pipe symbol
IFS='|'
# file name
file=/tmp/domains.txt
# use while loop to read domain and ip
while read domain ip
do
print "$domain has address $ip"
done <"$file"
echo ------------------------
file=/tmp/domains-using-space.txt
# use while loop to read domain and ip
while read domain ip
do
print "$domain has address $ip"
done <"$file"
As you can see, the output is incorrect:
root@b92 [~]# ./test.sh
cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155
------------------------
cyberciti.biz 74.86.48.99 has address
nixcraft.com 75.126.168.152 has address
theos.in 75.126.168.153 has address
cricketnow.in 75.126.168.154 has address
vivekgite.com 75.126.168.155 has address
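The reason for the broken second half is that the space-delimited lines contain no pipe symbol, so read puts the whole line into domain and leaves ip empty. A small sketch of that single step, using a here-document as an illustrative stand-in for the file:

```shell
#!/bin/sh
# IFS is still the pipe symbol left over from the first loop
IFS='|'

# The line has no '|', so nothing is split
read domain ip <<EOF
cyberciti.biz 74.86.48.99
EOF

# Brackets make the empty second field visible
printf 'domain=[%s] ip=[%s]\n' "$domain" "$ip"
# prints: domain=[cyberciti.biz 74.86.48.99] ip=[]
```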
By saving and resetting the special variable IFS, we can eliminate this problem:
#!/bin/ksh
# file name
file=/tmp/domains.txt
# set the Internal Field Separator to a pipe symbol
oldIFS="$IFS"
IFS='|'
# use while loop to read domain and ip
while read domain ip
do
print "$domain has address $ip"
done <"$file"
IFS="$oldIFS"
echo ------------------------
file=/tmp/domains-using-space.txt
# use while loop to read domain and ip
while read domain ip
do
print "$domain has address $ip"
done <"$file"
The output from the new script, which saves and resets IFS:
cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155
------------------------
cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155
In short, IFS is a great way to split strings. My next article will be a more in-depth discussion of this topic. In the meantime, one thing to remember when using IFS is to always save and reset this variable.
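A common alternative to saving and resetting, not used in the scripts above, is to put the IFS assignment directly in front of read so that it applies to that one command only. The sketch below assumes a POSIX-style shell and uses here-documents with sample lines in place of the two files; domain2 and ip2 are names made up for the illustration.

```shell
#!/bin/sh
# IFS='|' applies only to each read call in this loop
while IFS='|' read -r domain ip
do
    printf '%s has address %s\n' "$domain" "$ip"
done <<EOF
cyberciti.biz|74.86.48.99
nixcraft.com|75.126.168.152
EOF

echo ------------------------

# IFS was never changed globally, so a space-delimited line still splits
read -r domain2 ip2 <<EOF
theos.in 75.126.168.153
EOF
printf '%s has address %s\n' "$domain2" "$ip2"
```

With this form there is nothing to restore afterwards, which removes the chance of forgetting the reset in a longer script.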


November 25th, 2009 at 1:05 am
Good tip, but I thought this was _BASH_ cures cancer.
For the above to work as a bash script the “print” command needs to be changed to “echo”.
Also, it would be nice if read would accept a word delimiter as a parameter the way AWK does.
November 28th, 2009 at 8:52 am
Commands like this are pretty risky:
oldIFS=$IFS
IFS=$oldIFS
If IFS included the newline character (as it does by default), or the tab character, it won’t any more after these commands. For copying variables which might contain whitespace other than spaces, you need to enclose the RHS of the assignment in quotes:
oldIFS="$IFS"
IFS="$oldIFS"
But yes, manipulating IFS can be a nifty little trick from time to time. You can do the splitting with cut rather than awk, like this:
domain=$(echo "$line" | cut -d '|' -f 1)
ip=$(echo "$line" | cut -d '|' -f 2)
But in this case manipulating IFS is probably still quicker and even easier.
November 29th, 2009 at 1:08 pm
Seth, As I am sure you are aware, this works in BASH as well!
Alex,
Your first point is excellent. I will update the post. However, using a subshell, whether it be awk, cut, sed or something else, is much, much slower.
November 29th, 2009 at 1:13 pm
Actually, Jeff, my test does not show that you are correct, at least with BASH 3.1.17 and KSH 1993-12-28 r.
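A check along these lines can be sketched as follows. This is not the test referred to above, only an illustration; the names quoted, unquoted and result are made up for it. It relies on the fact that POSIX shells do not perform word splitting on the right-hand side of a variable assignment.

```shell
#!/bin/sh
# Copy IFS once with quotes and once without
quoted="$IFS"
unquoted=$IFS

# If the unquoted copy lost the tab or newline, the two would differ
if [ "$quoted" = "$unquoted" ]; then
    result="identical"
else
    result="different"
fi

printf 'quoted and unquoted copies are %s\n' "$result"
```

Quoting the right-hand side is still harmless, and it is a good habit because the same expansion does need quotes almost everywhere outside of an assignment.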
November 30th, 2009 at 3:56 am
“you need to enclose the RHS of the assignment in quotes”
You only need to quote the right-hand side when there is a literal whitespace character. (This is true for any Bourne-type shell.)
“the “print” command needs to be changed to “echo”.”
The print command should be changed to printf; echo is deprecated.
December 9th, 2009 at 6:18 pm
[...] strings into tokens or “words”. I previously discussed how to do this with the IFS variable and promised a more in depth discussion. Today, I will make the case on WHY to use IFS to split [...]
January 17th, 2010 at 9:17 am
…
However …