Bug in Curl is fixed

April 14th, 2008

I love curl. I use it quite often to perform HTTP HEAD requests:

$ curl -I http://bashcurescancer.com
HTTP/1.1 200 OK
Date: Mon, 14 Apr 2008 03:11:35 GMT
Server: Apache/2.2.6 (Unix)
X-Pingback: http://bashcurescancer.com/wordpress/xmlrpc.php
Last-Modified: Mon, 14 Apr 2008 02:38:11 GMT
Connection: close
Content-Type: text/html; charset=UTF-8

However, I sometimes forget whether a HEAD request is -I or -i, so I usually specify both. Lowercase -i means “include headers in output”, and uppercase -I tells curl to use HEAD instead of GET. When you use -I, -i is implied.

Given all this, there should be no problems specifying both options. However, if you place -I before -i, curl doesn’t actually display the response. Here is the output from my bug report to curl-users:

$ curl -I -i http://bashcurescancer.com
$ curl -i -I http://bashcurescancer.com
HTTP/1.1 200 OK
Date: Mon, 14 Apr 2008 03:11:35 GMT
Server: Apache/2.2.6 (Unix)
X-Pingback: http://bashcurescancer.com/wordpress/xmlrpc.php
Last-Modified: Mon, 14 Apr 2008 02:38:11 GMT
Connection: close
Content-Type: text/html; charset=UTF-8

Curl uses a long integer for configuration flags via bit masking. The problem arises because the -I option sets two bits and the -i option XORs one of those same bits:

src/main.c
case 'i':
  config->conf ^= CONF_HEADER; /* include the HTTP header as well */
  break;
…
case 'I':
  /*
   * This is a bit tricky. We either SET both bits, or we clear both
   * bits. Let's not make any other outcomes from this.
   */
  if((CONF_HEADER|CONF_NOBODY) !=
     (config->conf&(CONF_HEADER|CONF_NOBODY)) ) {
    /* one of them weren't set, set both */
    config->conf |= (CONF_HEADER|CONF_NOBODY);
    if(SetHTTPrequest(config, HTTPREQ_HEAD, &config->httpreq))
      return PARAM_BAD_USE;
  }
  else {
    /* both were set, clear both */
    config->conf &= ~(CONF_HEADER|CONF_NOBODY);
    if(SetHTTPrequest(config, HTTPREQ_GET, &config->httpreq))
      return PARAM_BAD_USE;
  }
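
To see why the order matters, the flag handling can be replayed with shell arithmetic. This is only a sketch: the bit values assigned to CONF_HEADER and CONF_NOBODY are placeholders rather than curl's real constants, and apply_flags is a made-up helper that mimics the two case branches quoted above.

```shell
#!/usr/bin/env bash
# Placeholder bit values for illustration; the real constants live in curl's source.
CONF_HEADER=$((1 << 0))
CONF_NOBODY=$((1 << 1))

# Hypothetical helper: replay the -i / -I handling for the options given, in order.
apply_flags() {
  local conf=0 both=$((CONF_HEADER | CONF_NOBODY)) flag
  for flag in "$@"; do
    case "$flag" in
      -i) conf=$((conf ^ CONF_HEADER)) ;;       # toggle the header bit
      -I)
        if (( (conf & both) != both )); then
          conf=$((conf | both))                 # at least one bit was clear: set both
        else
          conf=$((conf & ~both))                # both were set: clear both
        fi
        ;;
    esac
  done
  printf 'header=%d nobody=%d\n' \
    $(( (conf & CONF_HEADER) != 0 )) $(( (conf & CONF_NOBODY) != 0 ))
}

apply_flags -I -i   # prints: header=0 nobody=1
apply_flags -i -I   # prints: header=1 nobody=1
```

With -I first, the later -i flips the header bit back off while the "no body" bit stays set, so there is neither a header nor a body left to print. That matches the empty output in the bug report.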

Thanks to Daniel Stenberg, the fix is now committed!

Until tonight, I had never been forced to create a patch. However, it’s certainly a good skill to have, and it would likely have helped me when I submitted a bug to Vsftpd’s maintainer. Tonight I needed to create a patch, and Google provided the answer.

$ diff -Naur olddir newdir > new-patch
- or -
$ diff -Naur oldfile newfile >new-patch

Here is a quick example I cooked up:

$ mkdir prepatch
$ nano -w prepatch/test
$ cat prepatch/test
BASH
Cancer
$ cp -R prepatch/ postpatch
$ nano -w postpatch/test
$ diff -Naur prepatch/ postpatch/ > patch
$ cat patch
diff -Naur prepatch/test postpatch/test
--- prepatch/test       2008-02-11 23:15:46.000000000 -0600
+++ postpatch/test      2008-02-11 23:16:15.000000000 -0600
@@ -1,2 +1,3 @@
 BASH
+Cures
 Cancer
$ cp -R prepatch/ testpatch
$ cd testpatch/
$ patch -p1 < ../patch
patching file test
$ cat test
BASH
Cures
Cancer
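
For anyone who wants to repeat this without opening an editor, here is a scripted sketch of the same round trip. It assumes diff and patch are installed; printf stands in for the nano edits, and the scratch directory and the my.patch file name are arbitrary choices of mine.

```shell
#!/usr/bin/env bash
# Scripted version of the walkthrough above; printf replaces the nano edits.
work=$(mktemp -d)   # scratch directory, any empty directory would do
(
  cd "$work" || exit 1
  mkdir prepatch
  printf 'BASH\nCancer\n' > prepatch/test
  cp -R prepatch postpatch
  printf 'BASH\nCures\nCancer\n' > postpatch/test

  # diff exits with status 1 when it finds differences; that is expected here.
  diff -Naur prepatch postpatch > my.patch

  cp -R prepatch testpatch
  ( cd testpatch && patch -p1 < ../my.patch )

  # No output from a recursive diff means the patched copy matches postpatch.
  diff -r postpatch testpatch && echo "round trip OK"
)
```

The final diff -r is a cheap sanity check: if the patch applied cleanly, the patched copy and the hand-edited tree are identical.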

If you’re interested in why I had to create a patch, read on. Otherwise, you may be interested in an older post, 10 Steps to Beautiful Shell Scripts.

I consult for Idologic on the weekends. Idologic, like many website hosting companies, uses suPHP to run clients’ PHP scripts as the user who owns the file. That is, if user A owns a virtual host, its PHP scripts are executed as user A. With mod_php, by contrast, the script is executed as the user running the web server, e.g. nobody.

suPHP is a good solution. However, it comes at a cost: it creates a new process for every PHP page request. To help reduce this cost, Telena Internet Services wrote a patch for Apache 2, called the Peruser MPM, which creates an httpd process and performs a setuid() to the configured user. That process handles the current request and afterwards remains available to process future requests. After a configurable period of inactivity, the process is killed. For busy sites, this significantly reduces the number of processes the web server has to create, which can improve performance if configured correctly.

Beyond performance, suPHP is separate from Apache and must be built and configured correctly, creating extra work.

I am a member of the mailing list, and admins using this patch have complained of intermittent segmentation faults. As with any new patch, there are problems. The easiest way to troubleshoot them is to turn on core dumping. The Apache directive CoreDumpDirectory does exactly this. (By default, the kernel disables core dumps after setuid() operations; “echo 1 > /proc/sys/fs/suid_dumpable” enables them.)
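
Before chasing a missing core file, it can help to look at the two settings involved. The snippet below is a rough sketch for a Linux host; report_core_settings is a name I made up. Note that it reports the core size limit of the current shell, which is not necessarily the limit the httpd processes run with.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report two settings that commonly suppress core files on Linux.
report_core_settings() {
  local f=/proc/sys/fs/suid_dumpable
  if [ -r "$f" ]; then
    # 0 = processes that changed credentials do not dump core, 1 = they do
    echo "suid_dumpable=$(cat "$f")"
  else
    echo "suid_dumpable=unavailable"
  fi
  # A limit of 0 means no core file is written for processes started from this shell.
  echo "core_limit=$(ulimit -c)"
}

report_core_settings
```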

However, after peruser’s setuid() system call, the CoreDumpDirectory directive was not being respected. I decided to fix this:

The modified patch is for a vanilla 2.2.3 and as such includes the peruser 0.3.0 patch.

The patch against a patched version of 2.2.3.

Users of CGI scripts often encounter the dreaded “premature end of script headers” error. A quick Google search on that phrase shows how common it is. The cause is often the same as that of the “bad interpreter” error received on the command line.

A very common cause is that the file was created on a Windows host and then uploaded to a Unix host for execution. (Think of the millions of websites using shared hosting.) The problem here, as you are no doubt aware, is that the file contains the dreaded “carriage return” before every newline. This is so common that there is even a standard command, dos2unix, for converting these files.
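
When dos2unix is not installed, the same cleanup can be approximated with standard tools. Treat this as a sketch: count_cr_lines and strip_cr are names I made up, the sample file is generated on the spot, and tr removes every carriage return in the file, not only those at the end of a line.

```shell
#!/usr/bin/env bash
# Count the lines in a file that end with a carriage return.
count_cr_lines() {
  grep -c $'\r$' "$1"
}

# Write a copy of the file ($1) to $2 with every carriage return removed.
strip_cr() {
  tr -d '\r' < "$1" > "$2"
}

sample=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$sample"   # a script with DOS line endings

echo "before: $(count_cr_lines "$sample") line(s) with CR"
strip_cr "$sample" "$sample.unix"
echo "after: $(count_cr_lines "$sample.unix") line(s) with CR"
```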

However, I had yet to see a reason as to WHY that carriage return after the hash bang causes a problem. I just accepted it as fact. Today, I decided to figure out why and where it failed.

As it turns out, it fails in the kernel. To figure this out, I downloaded the latest kernel (2.6.23.12) to my RHEL 5 host and compiled it with the standard config file that comes with RHEL 5. The configuration file was for a much older kernel, but worked for my purposes. I booted it to make sure I had a working copy of 2.6.23.12. Then I went into the kernel source tree and started modifying stuff!

To be sure, I am no kernel hacker, nor even a C hacker. I had never successfully modified the kernel source, though I hadn’t tried in many years. But after three compilations, I was able to figure out where in 2.6.23.12 the exec system call takes place and insert some code to print a custom message. The file I modified was “fs/exec.c”:

[root@test1 linux-2.6.23.12]# ls -l fs/exec.c
-rw-rw-r-- 1 root root 42231 Jan 7 17:44 fs/exec.c

In that file, inside the function search_binary_handler() I inserted the following code:

1139 printk("{");
1140 int z = 0;
1141 for(; z < strlen(bprm->buf); z++) {
1142   if(bprm->buf[z] == '\r') {
1143     printk("(carriage return)");
1144   } else {
1145     printk("%c", bprm->buf[z]);
1146   }
1147 }
1148 printk("}\n");

The line numbers may be slightly off, as I had inserted and removed code above. After compilation and booting, I was able to give it a shot. I created a sample script with a carriage return after the hash bang interpreter portion.

[root@test1 ~]# echo -e '#!/bin/bash\r\n/usr/bin/id' >id.sh
[root@test1 ~]# chmod +x id.sh

And BAM, it worked!

[root@test1 ~]# ./id.sh
{#!/bin/bash(carriage return)}
-bash: ./id.sh: /bin/bash^M: bad interpreter: No such file or directory

As you can see, the kernel was actually trying to execute “/bin/bash\r”, which of course is not a valid file. Here’s a screenshot:

[Screenshot: Why do CGI Scripts and Shell Scripts fail when they contain carriage returns]
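
The stray byte can also be spotted from user space, without rebuilding a kernel, by dumping the first line of the script with od. This is a small sketch; show_shebang_bytes is a hypothetical helper, and the demo file is recreated with printf rather than taken from the session above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of a file character by character.
# od -c writes a carriage return as \r, so it is easy to see.
show_shebang_bytes() {
  head -n 1 "$1" | od -c
}

demo=$(mktemp)
printf '#!/bin/bash\r\n/usr/bin/id\n' > "$demo"
show_shebang_bytes "$demo"
```

If a \r appears just before the \n in the dump, the interpreter path the kernel sees ends in a carriage return.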

Update: In response to reader requests, I wrote an explanation of my investigation: On the case of carriage returns and kernel exec system calls.

Win a copy of Chris F.A. Johnson’s Shell Scripting Recipes by telling me why the script below does not work.

UPDATE: Quite a few people responded correctly (see comments).  I will sort it out tonight and decide who wins.

#!/bin/bash
doRead()
{
  local retVal=1
  ps -e -o user | grep apache | \
  while read user
  do
  echo $user
  retVal=0
  done
  return $retVal
}
doRead
echo "doRead exited with retVal = $?"

When the script runs, assuming the host has a user named apache that is running something, you should see “apache” a few times and then “doRead exited with retVal = 0”. However, the actual output is below:

# ./readReturnValExample.sh
apache
apache
apache
apache
apache
apache
apache
apache
doRead exited with retVal = 1

I ran into this problem when writing the monitor CPU usage shell script. I could have put echo statements all over the place to find out what the value of retVal is at various points in the script. However, I used the shell’s built-in debug option instead. I just added “set -x” to the top of the script, like so:

#!/bin/bash
set -x
doRead()

This prints out exactly what the shell is doing.

# ./readReturnValExample.sh
++ doRead
++ local retVal=1
++ ps -e -o user
++ grep apache
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ return 1
++ echo 'doRead exited with retVal = 1'
doRead exited with retVal = 1

As you can see, the function returns with $retVal equal to 1, directly after the last, unsuccessful read statement. The $25.99 question is: why? I remember reading something about this a long, long time ago, but it’s quite hazy and a cursory Google search did not resolve my question. The person who answers that question (except Chris, as he won my last contest) gets the book. If multiple people answer, the best single answer gets the book.
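
For readers comparing notes with the comments, here is a sketch of one way to get the expected result. It relies on bash running each part of a pipeline in its own subshell by default, so an assignment made inside the piped while loop never reaches the function’s own retVal. The listUsers function is a hypothetical stand-in for “ps -e -o user | grep apache” so that the example runs on any host, and process substitution is only one of several possible workarounds.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: ps -e -o user | grep apache
listUsers() { printf 'apache\napache\n'; }

doReadPipe() {
  local retVal=1
  listUsers | while read -r user; do
    retVal=0            # assigned in the pipeline's subshell, lost when it exits
  done
  return $retVal
}

doReadRedirect() {
  local retVal=1
  while read -r user; do
    retVal=0            # assigned in the function's own shell
  done < <(listUsers)   # process substitution: no pipe around the loop
  return $retVal
}

doReadPipe;     echo "pipe:     retVal = $?"
doReadRedirect; echo "redirect: retVal = $?"
```

The first call reports 1 and the second reports 0, because only the second loop runs in the same shell as the return statement.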