Splitting Strings Natively with the Shell: Native vs Native
December 9th, 2009
In my previous post on why to split strings with bash itself, I used set to split the string.
This was much faster than using a sub-shell and awk or cut. However, we can do better! The read command accepts a list of variable names and splits each input line across them. Combined with a per-command variable assignment, this gives an even more elegant solution.
The magic is here:
while IFS=: read username x uid gid gecos home shell
Two things are happening. First, IFS=: is set only for the execution of read, so there is no need to reset it after splitting the string. Second, read assigns each field (separated by : via IFS) directly to a named variable.
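To see both points in isolation, here is a minimal sketch that splits a single record instead of looping over /etc/passwd. The sample line is made up for illustration, and the -r flag and the here-string are my additions, not part of the script below.

```shell
#!/bin/bash
# Hypothetical record in /etc/passwd format (not taken from a real system).
line='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

# IFS=: applies only to this one read; the here-string feeds it the record.
IFS=: read -r username x uid gid gecos home shell <<< "$line"

echo "user=$username uid=$uid shell=$shell"

# IFS in the current shell is untouched: still space, tab, newline.
printf '%q\n' "$IFS"
```

The last line should print the default IFS in quoted form, which shows that nothing had to be saved or restored.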
Below is the script we will use to compare the two methods. You will notice I had to raise the iteration count to roughly 100 in order to see a difference in execution speed:
[root@sandbox ~]# cat ifs-test2.sh
#!/bin/bash

split_words_native() {
    # execute 101 times: {0..100} is inclusive
    for i in {0..100}
    do
        while read line
        do
            oldIFS=$IFS
            IFS=:
            set -- $line
            IFS=$oldIFS
            # at this point $1 is the username, $3
            # is the uid, and $7 is the shell
            if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == "$7" ]]
            then
                echo $1
            fi
        done < /etc/passwd
    done
}

split_words_native_read() {
    # execute 101 times: {0..100} is inclusive
    for i in {0..100}
    do
        while IFS=: read username x uid gid gecos home shell
        do
            if [[ $uid -gt 10 ]] && [[ '/sbin/nologin' == "$shell" ]]
            then
                echo $username
            fi
        done < /etc/passwd
    done
}

echo "---Native---"
time split_words_native >/dev/null
echo -e "\n---Read---"
time split_words_native_read >/dev/null
Using read is more elegant and a little faster, roughly 18% in wall-clock time on this run:
[root@sandbox ~]# ./ifs-test2.sh
---Native---

real    0m0.179s
user    0m0.168s
sys     0m0.010s

---Read---

real    0m0.147s
user    0m0.135s
sys     0m0.012s
December 10th, 2009 at 12:36 am
Glad to see you back!
December 10th, 2009 at 12:54 am
Thanks!
Woops, execute 101 times:
# execute 100 times
for i in {0..100}
December 11th, 2009 at 8:19 am
I enjoyed the analysis and the walkthrough! Splitting the fields by IFS is a great technique.
December 11th, 2009 at 11:30 am
Please excuse my ignorance, but… why is no semicolon needed between the command setting IFS and read?
ie why isn’t it “while IFS=:; read username…”
December 27th, 2009 at 1:39 am
Please note that:
while IFS=: read username x uid gid gecos home shell
uses IFS not for the duration of the read, but the loop, since a subshell is created. Which implies that the read values are lost outside the loop.
Very often you may split a string and put the result in scalar variables or in an array. The “set --” which does not use a while loop would be more effective.
thanks for your informative posts.
rahul
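For the array case raised above, read can fill an array directly with -a. The sketch below also checks where the values survive. As far as I can tell, a loop fed by redirection runs in the current shell, and it is a loop on the receiving end of a pipe that gets a subshell under bash's default settings. The sample record and variable names are hypothetical.

```shell
#!/bin/bash
# Hypothetical record in /etc/passwd format, used only for illustration.
line='bob:x:1002:1002:Bob Example:/home/bob:/sbin/nologin'

# -a stores the split fields in an array; IFS=: is scoped to this read.
IFS=: read -r -a fields <<< "$line"
echo "${#fields[@]} fields, shell=${fields[6]}"

# A loop fed by redirection runs in the current shell, so a value
# copied out inside the loop is still set afterwards.
last=
while IFS=: read -r name _; do
    last=$name
done <<< "$line"
echo "last=$last"

# A loop on the right side of a pipe runs in a subshell by default,
# so its assignments are gone when the loop ends.
piped=
printf '%s\n' "$line" | while IFS=: read -r name _; do
    piped=$name
done
echo "piped=${piped:-<empty>}"
```

Note that the variables named on the read command itself are cleared by the final read that hits end of input, which is why the sketch copies the value into last inside the loop.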
January 1st, 2010 at 10:06 pm
Hi Brock,
I have been searching the net for an answer about something in your code, but I haven’t come up with a satisfactory answer. It has to do with this:
while IFS=: read username x uid gid gecos home shell
We set IFS=: only for the execution of read, so there is no need to reset it once done splitting the string. Second we read each field (separated by : via IFS) into a variable directly.
Why is IFS only changed for the read execute? Is read done in a subshell? That is my best guess, but I hate guessing compared to _knowing_ . Even if I am wrong, I have learned a lot about subshells in BASH, as to how you get them, either intentionally, or unintentionally.
Thanks for this post – it has been a thought provoking exercise so far.
January 7th, 2010 at 9:04 am
Rob: see bash man page (http://linux.die.net/man/1/bash) and search for ‘SIMPLE COMMAND EXPANSION’.
quote:
“If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment. If any of the assignments attempts to assign a value to a readonly variable, an error occurs, and the command exits with a non-zero status.”
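The quoted rule also answers the semicolon question: with a semicolon, IFS=: is a statement of its own, so it changes the current shell. A small sketch of the difference, assuming bash in its default (non-POSIX) mode and a made-up sample record:

```shell
#!/bin/bash
# Hypothetical record in /etc/passwd format, used only for illustration.
line='carol:x:1003:1003:Carol Example:/home/carol:/bin/sh'

# Prefix form: the assignment belongs to the read command only.
IFS=: read -r user _ <<< "$line"
after_prefix=$IFS

# Semicolon form: "IFS=:" stands alone (no command name results),
# so it changes the current shell until it is reset.
saved_ifs=$IFS
IFS=:; read -r user2 _ <<< "$line"
after_semicolon=$IFS
IFS=$saved_ifs

printf 'prefix: %q\nsemicolon: %q\n' "$after_prefix" "$after_semicolon"
```

Both reads split the record the same way; only the semicolon form leaves IFS changed and needs the save and restore step.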
May 18th, 2011 at 11:16 pm
You address the speed merits of the native form, but you don’t mention that it is also much easier to read and does a better job of self-documenting. It is much nicer to compare $uid to something than $3 with a comment explaining what’s in $3.
Thanks.