Yesterday a 20-line shell script caught a race condition in one of the applications I work on. Our best engineers had looked at the problem, a vendor case had been filed, and there were thoughts of automatically restarting the application to work around the issue.

In the end, a 20-line shell script that continuously ran strace on the application's file operations provided the needed visibility into the issue.

Update: The shell script.

The shell script itself, though trivial, is owned by the company I work for, but the basic concept was as follows:

#!/bin/bash

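# loop forever, checking once a minute
while sleep 60
do
    # find the PID of the application, excluding the grep itself
    pid=$(ps -ef | grep application | grep -v grep | awk '{print $2}')
    if [[ -n $pid ]]
    then
        # %m is the month (%M would be the minute)
        outfile=strace-application-$(date +%Y%m%d-%H:%M:%S).out.gz
        # trace file-related system calls until the process exits
        strace -tt -f -e trace=file -p $pid 2>&1 | gzip -c > "$outfile"
    fi
done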

Splitting Strings Natively with the Shell: Native vs Native

In my previous post on why to split strings with bash itself, I used set to split the string.

This was much faster than using a sub-shell and awk or cut. However, we can do better! The read command accepts a list of variables and splits its input across them. Combined with setting a per-command variable, we can write an even more elegant solution.

The magic is here:

while IFS=: read username x uid gid gecos home shell

First, we set IFS=: only for the execution of read, so there is no need to reset it once we are done splitting the string. Second, we read each field (separated by : via IFS) directly into a variable.
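To see that the per-command assignment really does leave IFS alone, here is a minimal sketch. It assumes bash and uses a made-up sample line in /etc/passwd format rather than a real account:

```shell
#!/bin/bash
# Hypothetical sample line in /etc/passwd format (not from a real system).
line='daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# Remember the current IFS so we can compare afterwards.
before=$IFS

# IFS=: applies only to this single invocation of read.
IFS=: read -r username x uid gid gecos home shell <<< "$line"

echo "user=$username uid=$uid shell=$shell"

# The shell's own IFS was never touched.
if [[ $IFS == "$before" ]]
then
    echo "IFS unchanged"
fi
```

The same prefix-assignment form works for any command: the variable is placed in that command's environment only.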

Below is the script we will use to compare the two methods. You will notice I had to up the iterations to 100 in order to see a difference in execution speed:

[root@sandbox ~]# cat ifs-test2.sh
#!/bin/bash
split_words_native() {
    # execute 100 times
    for i in {1..100}
    do
        while read line
        do
            oldIFS=$IFS
            IFS=:
            set -- $line
            IFS=$oldIFS
            # at this point $1 is the username, $3
            # is the uid, and $7 is the shell
            if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == "$7" ]]
            then
                echo $1
            fi
        done < /etc/passwd
    done
}

split_words_native_read() {
    # execute 100 times
    for i in {1..100}
    do
        while IFS=: read username x uid gid gecos home shell
        do
            if [[ $uid -gt 10 ]] && [[ '/sbin/nologin' == "$shell" ]]
            then
                echo $username
            fi
        done < /etc/passwd
    done
}
echo "---Native---"
time split_words_native >/dev/null
echo -e "\n---Read---"
time split_words_native_read >/dev/null

Using read is more elegant and a little faster:

[root@sandbox ~]# ./ifs-test2.sh
---Native---

real    0m0.179s
user    0m0.168s
sys     0m0.010s

---Read---

real    0m0.147s
user    0m0.135s
sys     0m0.012s

Today I want to discuss splitting strings into tokens or “words”. I previously discussed how to do this with the IFS variable and promised a more in-depth discussion. Today, I will make the case for WHY to use IFS to split strings as opposed to using a subshell combined with awk or cut.

I wrote this script, which reads the /etc/passwd file line by line and prints the username of any user that has a UID greater than 10 and a shell of /sbin/nologin. Each test function performs this task 10 times to increase the length of the test:

[root@sandbox ~]# cat ifs-test.sh
#!/bin/bash
split_words_cut() {
        # execute 10 times
        for i in {0..9}
        do
                while read line
                do
                        # get uid
                        id=$(echo $line | cut -d: -f3)
                        if [[ $id -gt 10 ]]
                        then
                                # get shell
                                shell=$(echo $line | cut -d: -f7)
                                if [[ '/sbin/nologin' == "$shell" ]]
                                then
                                        # print username
                                        echo $line | cut -d: -f1
                                fi
                        fi
                done < /etc/passwd
        done
}

split_words_awk() {
        # execute 10 times
        for i in {0..9}
        do
                while read line
                do
                        # get uid
                        id=$(echo $line | awk -F: '{print $3}')
                        if [[ $id -gt 10 ]]
                        then
                                # get shell
                                shell=$(echo $line | awk -F: '{print $NF}')
                                if [[ '/sbin/nologin' == "$shell" ]]
                                then
                                        # print username
                                        echo $line | awk -F: '{print $1}'
                                fi
                        fi
                done < /etc/passwd
        done
}
split_words_native() {
        # execute 10 times
        for i in {0..9}
        do
                while read line
                do
                        oldIFS=$IFS
                        IFS=:
                        set -- $line
                        IFS=$oldIFS
                        # at this point $1 is the username, $3
                        # is the uid, and $7 is the shell
                        if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == "$7" ]]
                        then
                              echo $1
                        fi
                done < /etc/passwd
        done
}
echo -e "---Cut---"
time split_words_cut >/dev/null
echo -e "\n---Awk---"
time split_words_awk >/dev/null
echo -e "\n---Native---"
time split_words_native >/dev/null

As you can see, using the shell itself is roughly 65 to 70 times faster than the subshell awk/cut method, approaching two orders of magnitude:

[root@sandbox ~]# ./ifs-test.sh
---Cut---

real    0m1.184s
user    0m0.118s
sys     0m0.676s

---Awk---

real    0m1.279s
user    0m0.151s
sys     0m0.750s

---Native---

real    0m0.018s
user    0m0.014s
sys     0m0.003s

This is why you should use IFS when splitting strings.

Reading a file, line by line

November 24th, 2009

nixcraft has a link on how to read a file line by line. The method is a great way to read a file, but there are some trouble spots I thought I would point out.

In the script, the special variable IFS is set:

# set the Internal Field Separator to a pipe symbol
IFS='|'

This tells the read command to split “cyberciti.biz|74.86.48.99” into “cyberciti.biz” and “74.86.48.99” and thus fill both the domain and ip variables here:

while read domain ip

Using BASH to split strings is much faster than doing something like this:

while read line
do
   domain=$(echo $line | awk -F'|' '{print $1}')
   ip=$(echo $line | awk -F'|' '{print $2}')

New script writers typically do exactly that. However, setting IFS and forgetting to reset the special variable can cause some odd problems in longer scripts. For example, let's say you need to read a second file later on in the script, this one delimited by spaces. For simplicity, I will take the same file and just replace the pipe characters with spaces.

/tmp/domains-using-space.txt

root@b92 [~]# cat /tmp/domains-using-space.txt
cyberciti.biz 74.86.48.99
nixcraft.com 75.126.168.152
theos.in 75.126.168.153
cricketnow.in 75.126.168.154
vivekgite.com 75.126.168.155

Now, here is my new script:

#!/bin/ksh
# set the Internal Field Separator to a pipe symbol
IFS='|'

# file name
file=/tmp/domains.txt

# use while loop to read domain and ip
while read domain ip
do
    print "$domain has address $ip"
done <"$file"

echo ------------------------
file=/tmp/domains-using-space.txt

# use while loop to read domain and ip
while read domain ip
do
    print "$domain has address $ip"
done <"$file"

As you can see, the output is incorrect:

root@b92 [~]# ./test.sh
cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155
------------------------
cyberciti.biz 74.86.48.99 has address
nixcraft.com 75.126.168.152 has address
theos.in 75.126.168.153 has address
cricketnow.in 75.126.168.154 has address
vivekgite.com 75.126.168.155 has address

By saving and resetting the special variable IFS, we can eliminate this problem:

#!/bin/ksh
# file name
file=/tmp/domains.txt

# set the Internal Field Separator to a pipe symbol
oldIFS="$IFS"
IFS='|'

# use while loop to read domain and ip
while read domain ip
do
    print "$domain has address $ip"
done <"$file"
IFS="$oldIFS"

echo ------------------------
file=/tmp/domains-using-space.txt

# use while loop to read domain and ip
while read domain ip
do
    print "$domain has address $ip"
done <"$file"

The output from the new script, which saves and resets IFS:

cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155
------------------------
cyberciti.biz has address 74.86.48.99
nixcraft.com has address 75.126.168.152
theos.in has address 75.126.168.153
cricketnow.in has address 75.126.168.154
vivekgite.com has address 75.126.168.155

In short, IFS is a great way to split strings. My next article will be a more in-depth discussion of this topic. In the meantime, one thing to remember when using IFS is to always save and reset this variable.
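An alternative to saving and restoring is to scope IFS to the read command itself, so there is nothing to reset. The following is only a sketch under a few assumptions: it is written for bash, the function names are hypothetical, and the sample records are fed inline rather than read from the two files in /tmp:

```shell
#!/bin/bash
# Hypothetical helper: read pipe-delimited "domain|ip" records from stdin.
read_pipe_delimited() {
    # IFS='|' is in effect only while read itself runs.
    while IFS='|' read -r domain ip
    do
        echo "$domain has address $ip"
    done
}

# Hypothetical helper: read whitespace-delimited "domain ip" records.
read_space_delimited() {
    # Relies on the default IFS, which the function above never changed.
    while read -r domain ip
    do
        echo "$domain has address $ip"
    done
}

read_pipe_delimited <<'EOF'
cyberciti.biz|74.86.48.99
nixcraft.com|75.126.168.152
EOF

echo ------------------------

read_space_delimited <<'EOF'
cyberciti.biz 74.86.48.99
nixcraft.com 75.126.168.152
EOF
```

Because the assignment is attached to read, the second loop sees the default separator even though it runs after the first.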

After a long break I have some more ideas…. Will be posting them soon. For now, I leave you with a command to clear your current environment:

root@67 [~]# printenv | wc -l
26
root@67 [~]# env -i printenv | wc -l
0

This is very useful when you want to run a command while ignoring any environment variables you have set. I use this command with curl nearly every day to ignore the http_proxy environment variable I have set. Another, longer, option is:

root@67 [~]# http_proxy="" curl ....

I prefer env -i as it's simpler.
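One thing to keep in mind is that env -i drops everything, including PATH and HOME. A small sketch of the difference, assuming bash and a made-up proxy URL (proxy.example.invalid is a placeholder, not a real host):

```shell
#!/bin/bash
# Hypothetical proxy setting, for illustration only.
export http_proxy='http://proxy.example.invalid:3128'

# A normal child process inherits http_proxy.
inherited=$(printenv http_proxy)

# env -i starts the child with an empty environment. printenv exits
# non-zero when the variable is missing, hence the "|| true".
cleared=$(env -i printenv http_proxy || true)

# env -i NAME=VALUE hands selected variables back to the child.
kept_path=$(env -i PATH="$PATH" printenv PATH)

echo "inherited: ${inherited:-<unset>}"
echo "cleared:   ${cleared:-<unset>}"
echo "PATH kept: $([[ $kept_path == "$PATH" ]] && echo yes || echo no)"
```

If a command needs a few variables to work, passing them explicitly after env -i keeps the rest of the environment out of the way.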

Quite some time ago I wrote about using xsltproc to process XML on the command line. Thankfully, someone pointed out XMLStarlet. I now use XMLStarlet almost every day. I work with a variety of REST-based APIs to gather information. XMLStarlet along with a simple for loop or xargs gives you an exceedingly powerful set of tools.

Here is a quick introduction to the power of XMLStarlet. This is just a teaser as I cannot share the data I work with. However, you should be able to see the power of this tool.

All the links from my RSS feed:

$ curl -s 'http://bashcurescancer.com/rss/' | xml sel -t -m '//link' -v '.' -n

http://bashcurescancer.com
http://bashcurescancer.com/processing-xml-on-the-command-line.html

http://bashcurescancer.com/do-not-close-stderr.html
http://bashcurescancer.com/prepend-to-a-file-with-sponge-from-moreutils.html
http://bashcurescancer.com/bug-in-curl-is-fixed.html
http://bashcurescancer.com/using-kill-to-see-if-a-process-is-alive.html
http://bashcurescancer.com/performance-testing-with-curl.html
http://bashcurescancer.com/new-command-prepend.html
http://bashcurescancer.com/shell-function-which-webserver-does-that-site-run.html
http://bashcurescancer.com/exposing-command-line-programs-as-web-services.html

http://bashcurescancer.com/wrapping-dynamic-languages-in-shell-without-an-extra-script.html

Or how about “Title: link”

$ curl -s 'http://bashcurescancer.com/rss/' | xml sel -t -m '//item' -v 'title' -o ': ' -v 'link' -n
Processing XML on the Command Line: http://bashcurescancer.com/processing-xml-on-the-command-line.html
Do not close stderr: http://bashcurescancer.com/do-not-close-stderr.html
prepend to a file with sponge from moreutils: http://bashcurescancer.com/prepend-to-a-file-with-sponge-from-moreutils.html
Bug in Curl is fixed: http://bashcurescancer.com/bug-in-curl-is-fixed.html
using kill to see if a process is alive: http://bashcurescancer.com/using-kill-to-see-if-a-process-is-alive.html
Performance testing - with curl: http://bashcurescancer.com/performance-testing-with-curl.html
New command: prepend: http://bashcurescancer.com/new-command-prepend.html
Shell Function - Which Webserver Does That Site Run?: http://bashcurescancer.com/shell-function-which-webserver-does-that-site-run.html
Exposing command line programs as web services: http://bashcurescancer.com/exposing-command-line-programs-as-web-services.html
Wrapping dynamic languages in shell without an extra script: http://bashcurescancer.com/wrapping-dynamic-languages-in-shell-without-an-extra-script.html

You may need to do some reading on xpaths and xsl stylesheets to use the full power of the tool.

The other day on the cURL email list, someone asked:

Could someone please tell me (preferably with an example) of how I could parse and xml like the following:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<FileRetriever>
<FileList>
<File name="AMERI08.D4860.ZIP" />
<File name="DTCCRSF.D4861.ZIP" />
<File name="DTGSS01.D4862.ZIP" />
<File name="DTGSS02.D4863.ZIP" />
<File name="DTGSS03.D4864.ZIP" />
</FileList>
</FileRetriever>

This is not appropriate for the cURL list, but I thought it a fair question. You could do this:

$ grep '<File ' config.xml  | awk -F'"' '{print $2}' | xargs -l -I {} echo curl -I "http://bashcurescancer.com/{}"
curl -I http://bashcurescancer.com/AMERI08.D4860.ZIP
curl -I http://bashcurescancer.com/DTCCRSF.D4861.ZIP
curl -I http://bashcurescancer.com/DTGSS01.D4862.ZIP
curl -I http://bashcurescancer.com/DTGSS02.D4863.ZIP
curl -I http://bashcurescancer.com/DTGSS03.D4864.ZIP

Or, you could use the xsltproc command with an associated style sheet. This is really the correct method and much more effective when you're processing complex XML or XML that is not easily grep’able:

$ xsltproc --nonet config.xsl config.xml | xargs -l -I {} echo curl -I "http://bashcurescancer.com/{}"
curl -I http://bashcurescancer.com/AMERI08.D4860.ZIP
curl -I http://bashcurescancer.com/DTCCRSF.D4861.ZIP
curl -I http://bashcurescancer.com/DTGSS01.D4862.ZIP
curl -I http://bashcurescancer.com/DTGSS02.D4863.ZIP
curl -I http://bashcurescancer.com/DTGSS03.D4864.ZIP

Links to config.xml and config.xsl.
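For XML as simple as this, the grep and awk pair can also be replaced by the shell's own field splitting. This is only a sketch: it assumes bash, one File element per line with the name as the first quoted attribute, and a hypothetical function name list_file_names. Like the grep version, it is not a real XML parser:

```shell
#!/bin/bash
# Hypothetical helper: print the first quoted attribute value of every
# line that contains a <File element, reading XML from stdin.
list_file_names() {
    local before name rest
    # Split each line on double quotes: text before the first quote,
    # the quoted value, and whatever follows.
    while IFS='"' read -r before name rest
    do
        # The trailing space keeps <FileRetriever> and <FileList> out.
        if [[ $before == *'<File '* ]]
        then
            echo "$name"
        fi
    done
}

list_file_names <<'EOF'
<?xml version="1.0" encoding="ISO-8859-1" ?>
<FileRetriever>
<FileList>
<File name="AMERI08.D4860.ZIP" />
<File name="DTCCRSF.D4861.ZIP" />
</FileList>
</FileRetriever>
EOF
```

The output can be piped to xargs exactly as in the commands above.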

Do not close stderr

April 22nd, 2008

A few years ago, I wrote a post commenting on how ugly this was:

$ someprog 2>/dev/null

I was nearly imploring the reader to close stderr:

$ someprog 2>&-

Some very knowledgeable anonymous commenter explained why that was a bad idea. At the time, I didn’t understand exactly what they were saying. As such, I deleted the post. Yesterday, for no particular reason, the implications of closing stderr popped into my head. In the shower no less.

I wrote a simple little C program named do-not-close-stderr.c. It takes two parameters, a string you want written to a file and the file you want said string written to. After opening the file, it prints “some kind of warning message” to stderr. Here we are:

$ gcc -Wall do-not-close-stderr.c -o do-not-close-stderr
$ ./do-not-close-stderr "Brock was here." output
Some kind of warning message.
$ cat output
Brock was here.

Now let's close standard error when executing:

$ ./do-not-close-stderr "Brock was here." output 2>&-
$ cat output
Some kind of warning message.
Brock was here.

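With stderr closed, file descriptor 2 was free, so opening the output file handed the program descriptor 2, and the warning written to stderr landed in the file. Thanks to whoever that commenter was.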

A few weeks ago I wrote about a tool that helps you easily prepend to a file. I submitted prepend to moreutils and Joey was kind enough to point out that this could be done with sponge. sponge reads standard input and, when done, writes it to a file:

Probably the most general purpose tool in moreutils so far is sponge(1), which lets you do things like this:

% sed "s/root/toor/" /etc/passwd | grep -v joey | sponge /etc/passwd

Two days ago Joey released version 0.29 of moreutils including a patch by yours truly (with much help from Joey).

sponge: Handle large data sizes by using a temp file rather than by consuming arbitrary amounts of memory. Patch by Brock Noland. (version 0.29 changelog)
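To illustrate the idea of soaking up all input before touching the target file, here is a rough shell approximation. It is not how sponge itself is implemented; the function name soak and the sample data are made up for the example, and bash and mktemp are assumed:

```shell
#!/bin/bash
# soak FILE: hypothetical stand-in for sponge. Collects all of stdin in a
# temporary file and only then overwrites FILE with it.
soak() {
    local target=$1 tmp
    tmp=$(mktemp) || return 1
    # The target is not opened for writing until stdin reaches EOF.
    cat > "$tmp" && cat "$tmp" > "$target"
    local status=$?
    rm -f "$tmp"
    return $status
}

# Made-up sample data in a scratch file.
demo=$(mktemp)
printf '%s\n' 'root:x:0:0' 'joey:x:1000:1000' > "$demo"

# Read from and write to the same file in one pipeline.
sed 's/root/toor/' "$demo" | grep -v joey | soak "$demo"
cat "$demo"
rm -f "$demo"
```

A plain `> "$demo"` at the end of the pipeline would truncate the file before sed had a chance to read it, which is the problem sponge solves.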

Also, on a non-command line note, I found a video on Joey’s site which I thought was pretty cool, Joey Learns to Fly.

Bug in Curl is fixed

April 14th, 2008

I love curl. I use it quite often to perform HTTP HEAD requests:

$ curl -I http://bashcurescancer.com
HTTP/1.1 200 OK
Date: Mon, 14 Apr 2008 03:11:35 GMT
Server: Apache/2.2.6 (Unix)
X-Pingback: http://bashcurescancer.com/wordpress/xmlrpc.php
Last-Modified: Mon, 14 Apr 2008 02:38:11 GMT
Connection: close
Content-Type: text/html; charset=UTF-8

However, I sometimes forget whether a HEAD request is -I or -i, so I usually specify both. Lowercase i is “include headers in output” and uppercase I tells curl to use HEAD instead of GET. When you use -I, -i is implied.

Given all this, there should be no problems specifying both options. However, if you place -I before -i, curl doesn’t actually display the response. Here is the output from my bug report to curl-users:

$ curl -I -i http://bashcurescancer.com
$ curl -i -I http://bashcurescancer.com
HTTP/1.1 200 OK
Date: Mon, 14 Apr 2008 03:11:35 GMT
Server: Apache/2.2.6 (Unix)
X-Pingback: http://bashcurescancer.com/wordpress/xmlrpc.php
Last-Modified: Mon, 14 Apr 2008 02:38:11 GMT
Connection: close
Content-Type: text/html; charset=UTF-8

Curl uses a long integer for configuration flags via bit masking. The problem arises because the -I option sets two bits and the -i option XORs one of those same bits, so -I followed by -i clears the header bit again and curl prints neither headers nor body:

src/main.c
case 'i':
  config->conf ^= CONF_HEADER; /* include the HTTP header as well */
  break;
...
case 'I':
  /*
   * This is a bit tricky. We either SET both bits, or we clear both
   * bits. Let's not make any other outcomes from this.
   */
  if((CONF_HEADER|CONF_NOBODY) !=
     (config->conf&(CONF_HEADER|CONF_NOBODY)) ) {
    /* one of them weren't set, set both */
    config->conf |= (CONF_HEADER|CONF_NOBODY);
    if(SetHTTPrequest(config, HTTPREQ_HEAD, &config->httpreq))
      return PARAM_BAD_USE;
  }
  else {
    /* both were set, clear both */
    config->conf &= ~(CONF_HEADER|CONF_NOBODY);
    if(SetHTTPrequest(config, HTTPREQ_GET, &config->httpreq))
      return PARAM_BAD_USE;
  }
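The order dependence is easy to reproduce with shell arithmetic. The sketch below mimics the two case branches above. The bit values 1 and 2 are assumptions chosen for illustration, not curl's real constants, and apply_options is a hypothetical helper:

```shell
#!/bin/bash
# Illustrative bit values; curl's actual constants may differ.
CONF_HEADER=1
CONF_NOBODY=2
BOTH=$(( CONF_HEADER | CONF_NOBODY ))

# Apply a sequence of options (i or I) the way the switch statement
# does and print the resulting flag word.
apply_options() {
    local conf=0 opt
    for opt in "$@"
    do
        case $opt in
            i)
                # -i toggles the header bit.
                conf=$(( conf ^ CONF_HEADER ))
                ;;
            I)
                # -I sets both bits unless both are already set.
                if (( (conf & BOTH) != BOTH ))
                then
                    conf=$(( conf | BOTH ))
                else
                    conf=$(( conf & ~BOTH ))
                fi
                ;;
        esac
    done
    echo "$conf"
}

echo "-i -I => $(apply_options i I)"
echo "-I -i => $(apply_options I i)"
```

With these values, -i -I ends with both bits set (3), while -I -i ends with only the no-body bit set (2), which matches the silent output in the bug report.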

Thanks to Daniel Stenberg, the fix “is now committed!”