Splitting Strings Natively with the Shell: Native vs Native

In my previous post on why to split strings with bash itself, I used set to split the string.

This was much faster than using a subshell and awk or cut. However, we can do better! The read builtin accepts a list of variable names and splits its input across them. Combined with a per-command variable assignment, this gives an even more elegant solution.

The magic is here:

while IFS=: read username x uid gid gecos home shell

First, we set IFS=: only for the execution of read, so there is no need to reset it once the string has been split. Second, read assigns each field (separated by : because of IFS) directly to a named variable.
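Here is a minimal sketch of that scoping, assuming a made-up passwd-style record rather than the real /etc/passwd. The variable names before, first and second are purely illustrative. It contrasts the prefix form with the semicolon form, where the assignment becomes a separate command and sticks:

```shell
#!/bin/bash
# Hypothetical sample record in /etc/passwd format (not taken from a real system).
line='daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# Remember the current IFS so it can be compared afterwards.
before=$IFS

# The assignment is a prefix of the read command, so it applies to read only.
IFS=: read -r username x uid gid gecos home shell <<< "$line"
echo "username=$username uid=$uid shell=$shell"

if [[ $IFS == "$before" ]]; then
    echo "IFS unchanged after 'IFS=: read'"
fi

# With a semicolon the assignment is its own command and changes the shell's IFS.
IFS=:; read -r first second <<< "one:two"
if [[ $IFS == ":" ]]; then
    echo "IFS is now ':' after 'IFS=:; read'"
fi

# Restore the original value by hand, which the prefix form makes unnecessary.
IFS=$before
```

I added -r here so that read leaves backslashes alone; the loop in this post works without it on typical passwd data.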

Below is the script we will use to compare the two methods. You will notice I had to raise the iteration count to roughly 100 (the {0..100} range is inclusive, so the loop actually runs 101 times) in order to see a difference in execution speed:

[root@sandbox ~]# cat ifs-test2.sh
#!/bin/bash
split_words_native() {
    # execute 100 times
    for i in {0..100}
    do
        while read line
        do
            oldIFS=$IFS
            IFS=:
            set -- $line
            IFS=$oldIFS
            # at this point $1 is the username, $3
            # is the uid, and $7 is the shell
            if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == "$7" ]]
            then
                echo "$1"
            fi
        done < /etc/passwd
    done
}

split_words_native_read() {
    # execute 100 times
    for i in {0..100}
    do
        while IFS=: read username x uid gid gecos home shell
        do
            if [[ $uid -gt 10 ]] && [[ '/sbin/nologin' == "$shell" ]]
            then
                echo "$username"
            fi
        done < /etc/passwd
    done
}
echo "---Native---"
time split_words_native >/dev/null
echo -e "\n---Read---"
time split_words_native_read >/dev/null

Using read is more elegant and a little faster, taking about 18% less wall-clock time in this run:

[root@sandbox ~]# ./ifs-test2.sh
---Native---

real    0m0.179s
user    0m0.168s
sys     0m0.010s

---Read---

real    0m0.147s
user    0m0.135s
sys     0m0.012s
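
Before trusting timings like these, it is worth checking that both variants select the same accounts. The sketch below does that against a small made-up passwd-style file instead of the real /etc/passwd. The function names filter_with_set and filter_with_read, and the sample records, are assumptions for illustration only:

```shell
#!/bin/bash
# Hypothetical sample data in /etc/passwd format, written to a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
EOF

# Variant 1: split with set, saving and restoring IFS by hand.
filter_with_set() {
    local file=$1 line oldIFS
    while read -r line
    do
        oldIFS=$IFS
        IFS=:
        set -- $line          # unquoted on purpose: split on ':'
        IFS=$oldIFS
        if [[ $3 -gt 10 && $7 == "/sbin/nologin" ]]
        then
            echo "$1"
        fi
    done < "$file"
}

# Variant 2: let read do the splitting with a per-command IFS.
filter_with_read() {
    local file=$1 username x uid gid gecos home shell
    while IFS=: read -r username x uid gid gecos home shell
    do
        if [[ $uid -gt 10 && $shell == "/sbin/nologin" ]]
        then
            echo "$username"
        fi
    done < "$file"
}

out_set=$(filter_with_set "$sample")
out_read=$(filter_with_read "$sample")
rm -f "$sample"

if [[ $out_set == "$out_read" ]]; then
    echo "outputs match"
else
    echo "outputs differ"
fi
```

With this sample data both functions should print ftp and nobody, since those are the only records with a uid above 10 and /sbin/nologin as the shell.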

8 Responses to “Splitting Strings Natively with the Shell: Native vs Native”

  1. Adam Says:

    Glad to see you back!

  2. Brock Noland Says:

    Thanks!

    Whoops, that actually executes 101 times:

        # execute 100 times
        for i in {0..100}
    
  3. Rick Umali Says:

    I enjoyed the analysis and the walkthrough! Splitting the fields by IFS is a great technique.

  4. johnraff Says:

    Please excuse my ignorance, but… why is no semicolon needed between the command setting IFS and read?
    i.e., why isn’t it “while IFS=:; read username…”?

  5. rahul benegal Says:

    Please note that:

    while IFS=: read username x uid gid gecos home shell

    uses IFS not for the duration of the read, but the loop, since a subshell is created. Which implies that the read values are lost outside the loop.

    Very often you may split a string and put the result in scalar variables or in an array. The “set --” approach, which does not use a while loop, would be more effective.

    thanks for your informative posts.
    rahul

  6. Rob S. Says:

    Hi Brock,

    I have been searching the net for an answer about something in your code, but I haven’t come up with a satisfactory answer. It has to do with this:

    while IFS=: read username x uid gid gecos home shell

    We set IFS=: only for the execution of read, so there is no need to reset it once done splitting the string. Second we read each field (separated by : via IFS) into a variable directly.

    Why is IFS only changed for the read execute? Is read done in a subshell? That is my best guess, but I hate guessing compared to _knowing_ . Even if I am wrong, I have learned a lot about subshells in BASH, as to how you get them, either intentionally, or unintentionally.

    Thanks for this post – it has been a thought provoking exercise so far.

  7. Redy Says:

    Rob: see bash man page (http://linux.die.net/man/1/bash) and search for ‘SIMPLE COMMAND EXPANSION’.

    quote:
    “If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment. If any of the assignments attempts to assign a value to a readonly variable, an error occurs, and the command exits with a non-zero status.”

  8. Ben Hartshorne Says:

    You address the speed merits of the native form, but you don’t mention that it is also much easier to read and does a better job of self-documenting. It is much nicer to compare $uid to something than $3 with a comment explaining what’s in $3.

    Thanks.
