Bug in Curl is fixed

April 14th, 2008

I love curl. I use it quite often to perform HTTP HEAD requests:

$ curl -I http://bashcurescancer.com
HTTP/1.1 200 OK
Date: Mon, 14 Apr 2008 03:11:35 GMT
Server: Apache/2.2.6 (Unix)
X-Pingback: http://bashcurescancer.com/wordpress/xmlrpc.php
Last-Modified: Mon, 14 Apr 2008 02:38:11 GMT
Connection: close
Content-Type: text/html; charset=UTF-8

However, I sometimes forget whether a HEAD request is -I or -i, so I usually specify both. Lowercase -i means “include headers in output” and uppercase -I tells curl to use HEAD instead of GET. When you use -I, -i is implied.

Given all this, there should be no problems specifying both options. However, if you place -I before -i, curl doesn’t actually display the response. Here is the output from my bug report to curl-users:

$ curl -I -i http://bashcurescancer.com
$ curl -i -I http://bashcurescancer.com
HTTP/1.1 200 OK
Date: Mon, 14 Apr 2008 03:11:35 GMT
Server: Apache/2.2.6 (Unix)
X-Pingback: http://bashcurescancer.com/wordpress/xmlrpc.php
Last-Modified: Mon, 14 Apr 2008 02:38:11 GMT
Connection: close
Content-Type: text/html; charset=UTF-8

Curl stores its configuration flags in a long integer and manipulates them with bit masks. The problem is that the -I option sets two bits while the -i option XORs one of those same bits:

src/main.c
case 'i':
  config->conf ^= CONF_HEADER; /* include the HTTP header as well */
  break;
...
case 'I':
  /*
   * This is a bit tricky. We either SET both bits, or we clear both
   * bits. Let's not make any other outcomes from this.
   */
  if((CONF_HEADER|CONF_NOBODY) !=
     (config->conf&(CONF_HEADER|CONF_NOBODY)) ) {
    /* one of them weren't set, set both */
    config->conf |= (CONF_HEADER|CONF_NOBODY);
    if(SetHTTPrequest(config, HTTPREQ_HEAD, &config->httpreq))
      return PARAM_BAD_USE;
  }
  else {
    /* both were set, clear both */
    config->conf &= ~(CONF_HEADER|CONF_NOBODY);
    if(SetHTTPrequest(config, HTTPREQ_GET, &config->httpreq))
      return PARAM_BAD_USE;
  }
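
The interaction between the two options can be replayed with shell arithmetic. This is only a sketch: the bit values and the apply_flags helper are made up for illustration and are not taken from curl, but the toggling logic mirrors the two case branches quoted above.

```shell
# Hypothetical bit values for illustration; curl's real constants may differ.
CONF_HEADER=1   # include response headers in the output
CONF_NOBODY=2   # suppress the body (what a HEAD request wants)

# apply_flags OPT...: replay the pre-fix handling of -i and -I in order
# and print the resulting bit mask.
apply_flags() {
  conf=0
  both=$((CONF_HEADER | CONF_NOBODY))
  for opt in "$@"; do
    case $opt in
      -i) conf=$((conf ^ CONF_HEADER)) ;;      # toggle the header bit
      -I) if [ $((conf & both)) -ne "$both" ]; then
            conf=$((conf | both))              # at least one bit unset: set both
          else
            conf=$((conf & ~both))             # both set: clear both
          fi ;;
    esac
  done
  echo "$conf"
}

apply_flags -I -i   # prints 2: header bit toggled off, body still suppressed
apply_flags -i -I   # prints 3: both bits set, so the headers are shown
```

With -I first, the later -i flips the header bit back off while the no-body bit stays set, which would be consistent with the empty output shown earlier.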

Thanks to Daniel Stenberg, the fix “is now committed!”

I am making some changes to the moreutils sponge command. Sponge provides a method of prepending which is less specialized than my prepend util. However, it has trouble with large amounts of input.

Regardless, while testing my changes, I want to watch it operate. Normally you would do so from a second terminal, which is a pain. kill -0 can be very useful for this. After backgrounding the command, I assign the pid (via the variable $!) to $pid using eval. eval defers the expansion of $! until after the background job has started; a plain pid=$! placed after the & behaves the same way, since BASH expands it only when that assignment runs.

After that, I enter a while loop on kill -0 $pid, which will not kill $pid, but will return successfully until $pid has died:

# cat large-file-GB | ./sponge large-file-GB-copy & eval 'pid=$!'; while kill -0 $pid; do sleep 10; ls -lh large-file* /tmp/sponge.*; echo;done
[1] 7937
-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 128M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 384M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 877M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root  20M 2008-04-09 17:24 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 413M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 836M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 920M 2008-04-09 17:25 large-file-GB-copy
[1]+  Done                    cat large-file-GB | ./sponge large-file-GB-copy
ls: cannot access /tmp/sponge.*: No such file or directory

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 977M 2008-04-09 17:25 large-file-GB-copy
-bash: kill: (7937) - No such process
# md5sum large-file-GB*
b5c667a723a10a3485a33263c4c2b978  large-file-GB
b5c667a723a10a3485a33263c4c2b978  large-file-GB-copy
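
The same idea can be wrapped in a small helper. This is a sketch rather than what was run above: watch_pid is a hypothetical name, it uses a plain $! instead of eval, and sleep stands in for the long-running job.

```shell
# watch_pid PID INTERVAL CMD...: run CMD every INTERVAL seconds for as long
# as PID is alive. kill -0 sends no signal; it only checks that the process
# exists and that we are allowed to signal it.
watch_pid() {
  wp_pid=$1
  wp_interval=$2
  shift 2
  while kill -0 "$wp_pid" 2>/dev/null; do
    "$@"
    sleep "$wp_interval"
  done
}

sleep 2 &                           # stand-in for the long-running job
watch_pid $! 1 echo "still running"
```

Redirecting the error output of kill hides the final “No such process” message that appears in the transcript above.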

Often I need or want to do some type of performance testing. Given my ideas on software development, I can usually do this by making simple HTTP requests. I use curl for this. While you may be tempted to do this in a for loop (or worse, actually write something!):

$ time for i in {1..1000}; do curl -s "http://bashcurescancer.com/blank.html";done
real    0m23.436s
user    0m6.416s
sys     0m7.351s

Curl provides the same functionality:

$ time curl -s "http://bashcurescancer.com/blank.html?[1-1000]"
real    0m6.561s
user    0m0.294s
sys     0m0.494s

Here are the details from the curl manual:

The URL syntax is protocol dependent. You’ll find a detailed description in RFC 3986.

You can specify multiple URLs or parts of URLs by writing part sets within braces as in:

http://site.{one,two,three}.com

or you can get sequences of alphanumeric series by using [ ] as in:

ftp://ftp.numericals.com/file[1-100].txt
ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
ftp://ftp.letters.com/file[a-z].txt

No nesting of the sequences is supported at the moment, but you can use several ones next to each other:

http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

You can specify any amount of URLs on the command line. They will be fetched in a sequential manner in the specified order.

Since curl 7.15.1 you can also specify step counter for the ranges, so that you can get every Nth number or letter:

http://www.numericals.com/file[1-100:10].txt

http://www.letters.com/file[a-z:2].txt

If you specify URL without protocol:// prefix, curl will attempt to guess what protocol you might want. It will then default to HTTP but try other protocols based on often-used host name prefixes. For example, for host names starting with “ftp.” curl will assume you want to speak FTP.

Curl will attempt to re-use connections for multiple file transfers, so that getting many files from the same server will not do multiple connects / handshakes. This improves speed. Of course this is only done on files specified on a single command line and cannot be used between separate curl invokes.

This is important as it helps measure the actual change being tested. A for loop, by creating a new process every loop, will fill up your test with “local” time. Using a single curl process eliminates this – which should allow you to see the results of your test in a more transparent manner.

For example, let’s say you have a change that reduces page production time. You’re not sure by how much, so you decide to run 1000 tests. Eliminating a second from a 23-second test is less than 5 percent, while removing a second from a 6-second test is almost 17 percent.
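
Working the numbers out makes the difference concrete. The pct helper below is a made-up name for this sketch; it prints how large a saving is relative to the total run time.

```shell
# pct SAVED TOTAL: print SAVED as a percentage of TOTAL, to one decimal place.
pct() {
  awk -v saved="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * saved / total }'
}

pct 1 23   # prints 4.3: one second saved out of the 23-second for loop
pct 1 6    # prints 16.7: the same second out of the 6-second single curl run
```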

New command: prepend

April 6th, 2008

I am utilizing Google’s project hosting to host software which I create and feel is useful or want to keep track of. I called the project Brock’s Tools. The code that led me to create this project was a command I am calling prepend 1.1. (UPDATE: See this post on sponge, as it’s a better general-case tool.)

prepend prepends files or standard input to a file. For example, say you have three files:

$ echo BROCK > a
$ echo DAVID > b
$ echo NOLAND > c

And you want to combine them into one file:

$ echo "My name is:" | prepend - a b c
$ cat c
My name is:
BROCK
DAVID
NOLAND

Or let’s say you just want to append a file to itself:

$ cat a
BROCK
$ cat a >> a
cat: a: input file is output file

prepend does this:

$ prepend a
$ cat a
BROCK
BROCK

I come across situations where this would be useful quite often. Of course, prepending can be done in the shell:

$ { echo "My name is:"; cat a b c; } > tmp && mv -f tmp c
$ cat c
My name is:
BROCK
DAVID
NOLAND

However, that is unsafe and I have lost data that way. I perform this operation most often when dealing with XML. In this example, it’s trivial to open the file in an editor, but with a large file, it’s quite nasty to do so:

$ cat something.xml
<entry><blah/><more>stuff 1</more></entry>
<entry><blah/><more>stuff 2</more></entry>
<entry><blah/><more>stuff 3 </more></entry>
<entry><blah/><more>stuff 4</more></entry>
$ echo "</entries>" >> something.xml
$ cat something.xml
<entry><blah/><more>stuff 1</more></entry>
<entry><blah/><more>stuff 2</more></entry>
<entry><blah/><more>stuff 3 </more></entry>
<entry><blah/><more>stuff 4</more></entry>
</entries>
$ echo "<entries>" | prepend - something.xml
$ cat something.xml
<entries>
<entry><blah/><more>stuff 1</more></entry>
<entry><blah/><more>stuff 2</more></entry>
<entry><blah/><more>stuff 3 </more></entry>
<entry><blah/><more>stuff 4</more></entry>
</entries>
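
For comparison, here is a somewhat more careful version of the shell one-liner shown earlier. It is a sketch, not the prepend utility itself: prepend_stdin is a hypothetical helper that writes to a mktemp file next to the target and only replaces the target when the copy succeeded.

```shell
# prepend_stdin FILE: put standard input in front of FILE's current contents.
prepend_stdin() {
  ps_target=$1
  # Create the temporary file beside the target so mv stays on one filesystem.
  ps_tmp=$(mktemp "${ps_target}.XXXXXX") || return 1
  if cat - "$ps_target" > "$ps_tmp"; then
    # Note: the result carries the temporary file's permissions, not the original's.
    mv -f "$ps_tmp" "$ps_target"
  else
    rm -f "$ps_tmp"
    return 1
  fi
}
```

It would be used as in: echo "<entries>" | prepend_stdin something.xml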

I just read the following post Python – Script – Which Webserver Does That Site Run? by blogger Corey Goldberg.

I prefer the shell version:

$ what-http-server() { curl -s -I "http://$1" | awk -F': ' '/^Server:/ {print $2}'; }
$ what-http-server www.pylot.org
Apache/2.0.52
$ what-http-server() { curl -s -I "$@" | awk -F': ' '/^Server:/ {print $2}'; }
$ what-http-server www.pylot.org google.com bashcurescancer.com
Apache/2.0.52
gws
Apache/2.2.6 (Unix)

That works but this version is more correct:

what-http-server() { curl -s -I $(for h in "$@"; do printf "http://%s " "$h"; done) | awk -F': ' '/^Server:/ {print $2}'; }

In the version which works for multiple hosts, we are letting curl assume the protocol is HTTP. This works fine most of the time. However, there are exceptions:

If you specify URL without protocol:// prefix, curl will attempt to guess what protocol you might want. It will then default to HTTP but try other protocols based on often-used host name prefixes. For example, for host names starting with “ftp.” curl will assume you want to speak FTP. – man curl
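
One way to avoid the guessing is to add the scheme only when the argument does not already carry one. The names with_http and what_http_server below are hypothetical, and the sketch assumes host names without spaces.

```shell
# with_http HOST...: print each argument as a URL, adding http:// only when
# the argument has no scheme of its own.
with_http() {
  for h in "$@"; do
    case $h in
      *://*) printf '%s\n' "$h" ;;
      *)     printf 'http://%s\n' "$h" ;;
    esac
  done
}

# Variant of the function above; the unquoted $( ) is deliberate so that
# each URL becomes its own argument to curl.
what_http_server() {
  curl -s -I $(with_http "$@") | awk -F': ' '/^Server:/ {print $2}'
}
```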

The web services paradigm of development is based on the Unix philosophy of “small is good”.  Web services should do one job, and do it well, allowing users to develop complex solutions by combining small, reliable and proven services.
Why not then, expose the power of familiar Unix commands like sort, grep, gzip… to the web?

Here is a proof of concept python script (Python 2.3 version) to demonstrate.

Start services:

$ ./to_web.py -p8008 sort &
Thu Mar 27 13:45:54 2008 sort server started - 8008
$ ./to_web.py -p8009 gzip &
Thu Mar 27 13:46:29 2008 gzip server started - 8009

Use the services:

$ for i in {1..10}; do echo ${RANDOM:0:2}; done | \
> curl --data-binary @- "http://swat:8008/sort+-nr" | \
> curl --data-binary @- "http://swat:8009/gzip" | \
> gunzip
97
37
23
23
21
18
11
11
10
10

In my position, we have a database with host information – which has a command line interface. This tool has dependencies which are painful to resolve. With to_web.py, we can turn the command line tool into a web service and access the data without having to satisfy those additional dependencies.

This is a guest post by my esteemed colleague Adam Fokken. He can be reached here: Sadly, he does not have a blog.

There are situations where, if you want a Python, PERL, PHP, etc script to be portable among a few different servers, it makes sense to wrap the script in shell. A few years ago I was trying to use the Python cx_Oracle module. This module is a wrapper for the native Oracle database driver. However, it requires the driver library directory be in the LD_LIBRARY_PATH environment variable.

No problem I thought. I’ll use the os.environ dict to set the variable.  Example script:

$ cat python-only.sh
#!/usr/bin/python
import sys, os
sys.path.append("/usr/local/lib/python2.4/site-packages/")
if not os.environ.has_key('LD_LIBRARY_PATH'):
        os.environ['LD_LIBRARY_PATH'] = "/home/noland/oracle-lib"
else:
        os.environ['LD_LIBRARY_PATH'] = "/home/noland/oracle-lib:" + os.environ['LD_LIBRARY_PATH']
print "LD_LIBRARY_PATH looks OK in Python: LD_LIBRARY_PATH = ", os.environ['LD_LIBRARY_PATH']
os.system('echo LD_LIBRARY_PATH looks OK via os.system: LD_LIBRARY_PATH = $LD_LIBRARY_PATH')
try:
        import cx_Oracle
        print "Imported cx_Oracle! LD_LIBRARY_PATH was set correctly."
except ImportError, e:
        print "Woops, LD_LIBRARY_PATH was not set correctly: ", e

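This method does not work. The dynamic linker reads LD_LIBRARY_PATH when the process starts, so changing os.environ afterwards affects child processes (as the os.system line shows) but not the Python interpreter that is already running: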

$ ./python-only.sh
LD_LIBRARY_PATH looks OK in Python: LD_LIBRARY_PATH =  /home/noland/oracle-lib
LD_LIBRARY_PATH looks OK via os.system: LD_LIBRARY_PATH = /home/noland/oracle-lib
Woops, LD_LIBRARY_PATH was not set correctly:  libclntsh.so.10.1: cannot open shared object file: No such file or directory

This seems to be a common problem. However, when I was dealing with this a few years ago, I could not find a good resource on Google. I bit the bullet and wrote a separate shell script wrapper, hating every invocation of it. However, there was absolutely no reason I needed a separate shell script. I could have embedded the Python within a shell script. Example:

$ cat python-and-bash.sh
#!/bin/bash
export LD_LIBRARY_PATH=/home/noland/oracle-lib:$LD_LIBRARY_PATH
/usr/bin/python<<END_OF_PYTHON
import sys
sys.path.append("/usr/local/lib/python2.4/site-packages/")
try:
        import cx_Oracle
        print "Imported cx_Oracle! LD_LIBRARY_PATH was set correctly."
except ImportError, e:
        print "Woops, LD_LIBRARY_PATH was not set correctly: ", e
END_OF_PYTHON

Ahh, much better:

$ ./python-and-bash.sh
Imported cx_Oracle! LD_LIBRARY_PATH was set correctly.

Of course I could have just set this variable in my profile. However, this creates an additional external dependency – which is what I was trying to avoid.
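
Two details of the wrapper are worth a sketch. First, when LD_LIBRARY_PATH starts out empty, appending :$LD_LIBRARY_PATH leaves a trailing colon, and the dynamic linker treats an empty entry as the current directory. Second, quoting the here-document delimiter stops the shell from expanding any $ inside the embedded script. The prepend_dir helper is a made-up name, and sh stands in for the Python interpreter so the example runs anywhere.

```shell
# prepend_dir DIR CURRENT: print DIR in front of CURRENT, joined by a colon,
# without leaving an empty entry when CURRENT is empty.
prepend_dir() {
  printf '%s\n' "$1${2:+:$2}"
}

(
  # Library directory reused from the example above; purely illustrative here.
  LD_LIBRARY_PATH=$(prepend_dir /home/noland/oracle-lib "${LD_LIBRARY_PATH:-}")
  export LD_LIBRARY_PATH
  # The quoted delimiter keeps this shell from expanding $ in the child script.
  sh -s <<'END_OF_CHILD'
echo "child sees: $LD_LIBRARY_PATH"
END_OF_CHILD
)
```

The subshell keeps the changed variable from leaking into the calling shell.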

Process Substitution

March 23rd, 2008

Quite some time ago, someone wrote me to ask about a possible article on process substitution. Sadly, I could not find the email so I cannot credit them. As you likely have guessed, I am finally writing a post on process substitution.

Many times I have used pipelines and temporary files when process substitution would be a much cleaner solution.

First, I am going to create two test files:

$ dd if=/dev/urandom of=file-small count=750001
$ dd if=/dev/urandom of=file-large count=1000000
$ ls -l file-*
-rw-r--r-- 1 noland noland 512000000 Mar 23 08:53 file-large
-rw-r--r-- 1 noland noland 384000512 Mar 23 08:49 file-small

I thought of writing this article while writing a script to test ftp servers and file locking. As such I will upload the small file to a file named append-example:

$ curl -T file-small --user noland ftp://localhost/append-example
Enter host password for user 'noland':
$ ls -l append-example
-rw-r--r-- 1 noland noland 384000512 Mar 23 11:52 append-example

Now I will append  the large file:

$ curl -s -a -T file-large --user noland ftp://localhost/append-example
Enter host password for user 'noland':
$ ls -l append-example
-rw-r--r-- 1 noland noland 896000512 Mar 23 11:54 append-example

I am going to use dd and process substitution to calculate the MD5 hash of the first upload:

$ md5sum file-small <(dd if=append-example count=750001 status=noxfer)
dfabff7441bd814145a804e03d333864  file-small
750001+0 records in
750001+0 records out
dfabff7441bd814145a804e03d333864  /dev/fd/63

Now the portion that was appended:

$ md5sum file-large <(dd if=append-example  skip=750001 status=noxfer)
1b8daed9e435fc90b4a49d74b55f96f4  file-large
1000000+0 records in
1000000+0 records out
1b8daed9e435fc90b4a49d74b55f96f4  /dev/fd/63

When you place a command inside <( ), the shell connects the command’s standard output to a pipe under /dev/fd/ and replaces the <( ) expression with the path of that pipe. Here is the classic example:

$ echo <(echo) <(echo) <(echo) <(echo)
/dev/fd/63 /dev/fd/62 /dev/fd/61 /dev/fd/60

In my script I use process substitution as below (effectively), which feels exceedingly clean:

$ read hash name < <(md5sum <(dd if=append-example skip=750001 status=noxfer))
1000000+0 records in
1000000+0 records out
$ printf "hash=%s name=%s\n" $hash $name
hash=1b8daed9e435fc90b4a49d74b55f96f4 name=/dev/fd/63
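
The reason for read ... < <( ) rather than a plain pipe is that bash, by default, runs every part of a pipeline in a subshell, so variables set by read at the end of a pipe are gone afterwards. A minimal sketch, assuming bash with its default settings (lastpipe off):

```shell
#!/bin/bash
# Piping into read: read runs in a subshell, so the assignment is lost.
unset first rest
printf 'alpha beta\n' | read first rest
echo "after pipe: '${first:-}'"      # prints: after pipe: ''

# Redirecting from a process substitution keeps read in the current shell.
read first rest < <(printf 'alpha beta\n')
echo "after <( ): '$first'"          # prints: after <( ): 'alpha'
```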

UPDATE: Including the one I added after posting and Elias’ quoting example in the comments, we are up to eight.

After reading Shell Scripting Recipes, I became more interested in the speed of shell operations. In his book, Chris says “Command Substitution Is Slow.” He is correct!

$ f() { echo -n; }; time for i in {0..100}; do v=$( f ); done

real    0m4.189s
user    0m0.000s
sys     0m4.188s
$ f() { _F=""; }; time for i in {0..100}; do f; v=$_F; done

real    0m0.006s
user    0m0.000s
sys     0m0.000s

I found a few other equivalent operations which can be used to speed up shell scripts to varying degrees (none like the above) depending on the task at hand. As Chris says, “the extra few milliseconds … may not seem significant, but scripts often loop hundreds or even thousands of times.”

Assigning to index ${#array[@]} is faster than rebuilding with () when appending to an array (#7)

$ a=(); time for i in {0..1000}; do a=(${a[@]} $i);done; echo ${#a[@]}

real    0m3.545s
user    0m3.544s
sys     0m0.000s
1001
$ a=(); time for i in {0..1000}; do a[${#a[@]}]=$i;done; echo ${#a[@]}

real    0m0.043s
user    0m0.040s
sys     0m0.003s
1001
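
Bash also has a += operator for arrays (bash 3.1 or later, as far as I know), which appends in place like the indexed assignment above. It was not part of the timings here, so treat this as a sketch of the syntax only, not as a measured result:

```shell
#!/bin/bash
# Append each value in place instead of rebuilding the whole array.
a=()
for i in {0..1000}; do
  a+=("$i")
done
echo "${#a[@]}"   # prints 1001
```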

< is faster than cat

$ time for i in {0..10000}; do var=`cat out`;done

real    0m9.328s
user    0m2.892s
sys     0m6.436s
$ time for i in {0..10000}; do var=`<out`;done
real    0m5.930s
user    0m1.412s
sys     0m4.520s

echo is faster than printf (though not nearly as powerful)

$ time for i in {0..100000}; do printf "\n"; done >/dev/null

real    0m4.446s
user    0m4.076s
sys     0m0.236s

$ time for i in {0..100000}; do echo; done >/dev/null

real    0m3.291s
user    0m3.100s
sys     0m0.184s

Arithmetic Evaluation is faster than let

$ i=0; time while :; do let "i = i + 1"; [[ $i -gt 100000 ]] && break;done
real    0m8.211s
user    0m7.900s
sys     0m0.304s
$ i=0; time while :; do ((i++)); [[ $i -gt 100000 ]] && break;done

real    0m5.287s
user    0m4.980s
sys     0m0.304s

UPDATE: This appears to still be true, but by a different margin. See comments.

List expansion is faster than seq and command substitution (though not always available)

$ time for i in $(seq 0 1000000); do :; done

real    0m28.482s
user    0m28.066s
sys     0m0.412s

$ time for i in {0..1000000}; do :; done

real    0m24.563s
user    0m24.402s
sys     0m0.156s

UPDATE: On BSD systems the apparent seq equivalent (jot) is faster than list expansion. See comments.

: is faster than true

$ i=0; time while true; do ((i++)); [[ $i -gt 1000000 ]] && break;done

real    0m57.360s
user    0m53.967s
sys     0m3.392s
$ i=0; time while :; do ((i++)); [[ $i -gt 1000000 ]] && break;done

real    0m54.138s
user    0m50.571s
sys     0m3.560s

Missing space – deleting open files

I ran into this one again today. If a file is open when it is deleted, it no longer appears in a directory listing but keeps taking up space until the last process holding it open closes it or exits.

# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      72G   58G   11G  86% /
# cat - >>large-file &
[1] 8958
# lsof large-file
COMMAND  PID USER   FD   TYPE DEVICE       SIZE    NODE NAME
cat     8958 root    1w   REG  253,0 5120000000 4300883 large-file
# rm -f large-file
# lsof | grep large-file
cat       8958      root    1w      REG      253,0 5120000000    4300883 /root/large-file (deleted)
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      72G   58G   11G  86% /
# kill -9 8958
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      72G   53G   15G  79% /
[1]+  Killed                  cat - >>large-file
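
The same behaviour can be seen on a small scale without lsof. In this sketch the read_after_delete function is a hypothetical name; it keeps a temporary file open on descriptor 3, removes its name, and shows that the data is still there.

```shell
# read_after_delete: show that an open file outlives its directory entry.
read_after_delete() {
  rad_tmp=$(mktemp) || return 1
  echo "still here" > "$rad_tmp"
  exec 3< "$rad_tmp"               # keep the file open on descriptor 3
  rm -f "$rad_tmp"                 # the name is gone from the directory ...
  [ -e "$rad_tmp" ] || cat <&3     # ... but the contents can still be read
  exec 3<&-                        # closing the last descriptor frees the space
}

read_after_delete   # prints: still here
```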

uuencode/uudecode on RHEL (CentOS)

Earlier today I was looking to use uuencode on my RHEL host. Unfortunately, yum did not help:

# yum search uuencode
Loading "installonlyn" plugin
Setting up repositories
base                      100% |=========================| 1.1 kB    00:00
updates                   100% |=========================|  951 B    00:00
addons                    100% |=========================|  951 B    00:00
extras                    100% |=========================| 1.1 kB    00:00
Reading repository metadata in from local files
No Matches found

Furthermore, I struggled to find the correct search terms for Google to provide me with an answer. The correct package is “sharutils.” Anyways, for good measure, here is a quick demo of uuencode/uudecode:

$ echo "BASH Cures Cancer" > test.txt
$ zip test.zip test.txt
  adding: test.txt (stored 0%)
$ uuencode < test.zip -
begin 664 -
M4$L#!`H``````-%9=3@7HDD\$@```!(````(`!4`=&5S="YT>'155`D``^G>
MXT?IWN-'57@$`/0!]`%"05-(($-U<F5S($-A;F-E<@I02P$"%P,*``````#1
M674X%Z))/!(````2````"``-```````!````M($`````=&5S="YT>'155`4`
?`^G>XT=5>```4$L%!@`````!``$`0P```$T`````````
`
end
$ uuencode < test.zip - | uudecode > test2.zip
$ unzip test2.zip
Archive:  test2.zip
replace test.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
 extracting: test.txt
$ cat test.txt
BASH Cures Cancer

From the manual: “Uuencode reads file (or by default the standard input) and writes an encoded version to the standard output. The encoding uses only printing ASCII characters and includes the mode of the file and the operand name for use by uudecode.”