Today I want to discuss splitting strings into tokens or “words”. I previously discussed how to do this with the IFS variable and promised a more in-depth discussion. Today, I will make the case for WHY you should use IFS to split strings rather than a subshell combined with awk or cut.

I wrote this script, which reads the /etc/passwd file line by line and prints the username of every user that has a UID greater than 10 and a shell of /sbin/nologin. Each test function performs this task 10 times to lengthen the test:

[root@sandbox ~]# cat ifs-test.sh
#!/bin/bash
split_words_cut() {
        # execute 10 times
        for i in {0..9}
        do
                while read line
                do
                        # get uid
                        id=$(echo $line | cut -d: -f3)
                        if [[ $id -gt 10 ]]
                        then
                                # get shell
                                shell=$(echo $line | cut -d: -f7)
                                if [[ '/sbin/nologin' == "$shell" ]]
                                then
                                        # print username
                                        echo $line | cut -d: -f1
                                fi
                        fi
                done < /etc/passwd
        done
}

split_words_awk() {
        # execute 10 times
        for i in {0..9}
        do
                while read line
                do
                        # get uid
                        id=$(echo $line | awk -F: '{print $3}')
                        if [[ $id -gt 10 ]]
                        then
                                # get shell
                                shell=$(echo $line | awk -F: '{print $NF}')
                                if [[ '/sbin/nologin' == "$shell" ]]
                                then
                                        # print username
                                        echo $line | awk -F: '{print $1}'
                                fi
                        fi
                done < /etc/passwd
        done
}
split_words_native() {
        # execute 10 times
        for i in {0..9}
        do
                while read line
                do
                        oldIFS=$IFS
                        IFS=:
                        set -- $line
                        IFS=$oldIFS
                        # at this point $1 is the username, $3
                        # is the uid, and $7 is the shell
                        if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == "$7" ]]
                        then
                              echo $1
                        fi
                done < /etc/passwd
        done
}
echo -e "---Cut---"
time split_words_cut >/dev/null
echo -e "\n---Awk---"
time split_words_awk >/dev/null
echo -e "\n---Native---"
time split_words_native >/dev/null

As you can see, using the shell itself is roughly 65 to 70 times faster, nearly two orders of magnitude, than spawning a subshell with awk or cut for every field:

[root@sandbox ~]# ./ifs-test.sh
---Cut---

real    0m1.184s
user    0m0.118s
sys     0m0.676s

---Awk---

real    0m1.279s
user    0m0.151s
sys     0m0.750s

---Native---

real    0m0.018s
user    0m0.014s
sys     0m0.003s
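
To make the difference concrete, here is a minimal sketch that extracts one field both ways. The sample line and the variable names `uid_cut` and `uid_native` are made up for illustration. The cut version needs a command substitution and an external process for every field, which is the likely source of the overhead, while the read version never leaves the shell:

```shell
#!/bin/bash
# Made-up passwd-style line, used only for illustration
line='nobody:x:99:99:Nobody:/:/sbin/nologin'

# External: a command substitution plus one cut process per field
uid_cut=$(echo "$line" | cut -d: -f3)

# Native: IFS=: applies only to this read, and no new process is started
IFS=: read -r _ _ uid_native _ <<< "$line"

echo "cut: $uid_cut, native: $uid_native"
```

Both variables end up holding the third field of the line.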

This is why you should use IFS when splitting strings.
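
A variation worth sketching: instead of saving and restoring IFS around `set --`, the assignment can be placed directly in front of `read`, so it only affects that one command. The function name `nologin_users` and its optional file argument are hypothetical, added so the logic can be tried on a copy of the file; this is a sketch under those assumptions, not a replacement for the benchmark above:

```shell
#!/bin/bash
# Hypothetical helper: print users with a UID > 10 and a shell of /sbin/nologin.
# The file argument is an assumption for illustration; it defaults to /etc/passwd.
nologin_users() {
        local file=${1:-/etc/passwd}
        local user pass uid gid gecos home shell
        # IFS=: applies only to this read, so the global IFS is never changed
        while IFS=: read -r user pass uid gid gecos home shell
        do
                if [[ $uid -gt 10 && $shell == /sbin/nologin ]]
                then
                        printf '%s\n' "$user"
                fi
        done < "$file"
}
```

Calling `nologin_users` with no argument reads /etc/passwd; passing a path reads that file instead. Naming the fields also removes the need to remember that $3 is the UID and $7 is the shell.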

9 Responses to “Splitting Strings Natively with the Shell: Why”

  1. Ben Says:

    The overhead is most likely due to the system call overhead of spawning and starting (fork()ing and exec()ing) the awk and cut processes. It would be interesting to see how “already started” cut and awk compare with bash/IFS speedwise.

  2. Brock Noland Says:

    Yes, you are correct!

    However I do not see how that would work. One thought would be to set them up to listen to a fifo, ensure they are line buffered, and have them ignore EOF, you could send them a single line and get a single response.

    Like this:

    # make fifo
    mkfifo awk-input-username
    mkfifo awk-output-username
    awk -F: '{print $1}' < awk-input-username > awk-output-username &
    ….
    # get username
    echo $line > awk-input-username &
    read username < awk-output-username

    Of course they do NOT ignore EOF so the awk process will exit after reading a single line.

  3. Splitting Strings Natively with the Shell: Native vs Native Says:

    [...] my previous post on why to split strings with bash itself, I used set to split the [...]

  4. Ben Says:

    Doh. I should have thought of using fifos…

    Dumb suggestion: if you were totally desperate to use cut or awk in this way, could you convince tr to filter out EOF (that is, ^D = 0x04) to something less destructive? I wonder if tr is coded to allow this behavior. Just send a signal later to cut to get it to die.

  5. Brock Noland Says:

    I do not believe there is a way to “filter” out the EOF.

  6. Chris F.A. Johnson Says:

    tr -d ’4′

  7. Chris F.A. Johnson Says:

    That should be:

    tr -d '\04'

  8. Катя Says:

    Hmm

  9. Luke Shumaker Says:

    Sorry to burst your bubble, but:

    sed -n 's@^\(.*\):\(.*\):\(...*\):\(.*\):\(.*\):\(.*\):/bin/nologin$@\1@p' /etc/passwd

    ---Sed---

    real 0m0.015s
    user 0m0.000s
    sys 0m0.004s

    ---Cut---

    real 0m4.066s
    user 0m1.336s
    sys 0m2.004s

    ---Awk---

    real 0m2.523s
    user 0m0.576s
    sys 0m1.716s

    ---Native---

    real 0m0.066s
    user 0m0.060s
    sys 0m0.000s

    Although, it wouldn’t have been possible if you’d specified a uid of, say, 15. I was able to rely on the fact that >10 = at least 2 digits for mine.
