The other day, I began wondering which comparator is fastest: test, [, or [[. Here are the results:

$ time for i in {1..100000}; do [[ -d . ]];done
real    0m1.256s
user    0m1.018s
sys     0m0.238s
$ time for i in {1..100000}; do [ -d . ];done
real    0m3.407s
user    0m2.704s
sys     0m0.703s
$ time for i in {1..100000}; do test -d .;done

real    0m3.223s
user    0m2.607s
sys     0m0.616s

The double bracket is a “compound command”, whereas test and the single bracket are shell built-ins (and are in fact the same command). Thus, the single bracket and double bracket execute different code.
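To see this distinction for yourself, Bash's type built-in reports how it will resolve each name. This is a small sketch, assuming it runs under Bash itself rather than another shell:

```shell
#!/usr/bin/env bash
# Ask Bash how it classifies each of the three comparators.
# The names are quoted so the shell treats them as plain words here.
for word in test '[' '[['; do
    printf '%-4s %s\n' "$word" "$(type -t "$word")"
done
# Under Bash this prints "builtin" for test and [, and "keyword" for [[.
```

Because [[ is a keyword, it is parsed as part of the shell grammar rather than looked up and called like a command.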

test and the single bracket are the most portable, as they also exist as separate, external commands. However, if you’re using any remotely modern version of Bash, the double bracket is supported.

Here are the performance numbers for the external versions of test and the single bracket:

$ time for i in {1..100000}; do /usr/bin/test -d .;done

real    5m49.324s
user    0m51.771s
sys     4m48.013s
$ time for i in {1..100000}; do /usr/bin/[ -d . ];done

real    5m45.728s
user    0m52.536s
sys     4m46.259s

Wow! This shows the high cost of process creation!
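If you want to repeat the comparison on your own machine with fewer iterations, a small helper along these lines may be handy. It is only a sketch: the function name bench is made up for illustration, and it can only time ordinary commands (built-in or external), not the [[ keyword.

```shell
#!/usr/bin/env bash
# bench N CMD [ARGS...]: run CMD N times and print elapsed wall-clock seconds.
# The timing is written to stderr by the shell's time keyword.
bench() {
    local n=$1; shift
    local TIMEFORMAT='%R'   # report only the "real" time
    local i
    time for ((i = 0; i < n; i++)); do "$@"; done
}

# Example usage (numbers will vary by machine):
#   bench 1000 test -d .                  # shell built-in
#   bench 1000 "$(type -P test)" -d .     # external binary found on PATH
```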

I often find myself having to sort large files. Gigabyte files take an extremely long time to sort. Twenty to thirty minutes is common, even when utilizing an entire CPU. As such, I wrote a script to perform distributed sorting.

Setup requires editing the “hosts” array at the top of the script. Add each host, or username@host pair, like so:

hosts[0]="localhost"
hosts[1]="localhost"
hosts[2]="user@otherhost"
hosts[3]="yahost"
hosts[4]="finalhost"

Then, optionally set up SSH keys to automate authentication, and start sorting!

$ ls -l unsorted_file
-rw-r--r-- 1 noland users 499328352 2008-01-18 18:19 unsorted_file
$ time sort unsorted_file > sorted_file2

real    19m17.247s
user    18m8.720s
sys     0m10.169s
$ time ./distsort.sh unsorted_file > sorted_file

real    9m46.398s
user    7m19.651s
sys     0m21.397s

$ md5sum sorted_file sorted_file2
ceb8a3aa2947868cae6ee33c457ba8d0  sorted_file
ceb8a3aa2947868cae6ee33c457ba8d0  sorted_file2

Here is the process:

  1. Split the file into equal portions, one for each entry in the hosts array.
  2. For each host, either background a local sort process or start a remote sort process using SSH. The file is transferred via a pipeline to the remote hosts.
  3. Wait for all processes to exit.
  4. Merge the sorted files.
  5. Remove temp files.
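The steps above can be sketched roughly as follows. This is not the actual distsort.sh: the function name chunked_sort is hypothetical, it assumes GNU split (for the -n l/N option), and it sorts every chunk locally, with a comment marking where an SSH pipeline could go instead.

```shell
#!/usr/bin/env bash
# chunked_sort FILE [WORKERS]: split, sort chunks in parallel, merge to stdout.
chunked_sort() {
    local input=$1 workers=${2:-2}
    local tmpdir chunk
    tmpdir=$(mktemp -d) || return 1

    # 1. Split into roughly equal chunks without breaking lines (GNU split).
    split -n "l/$workers" "$input" "$tmpdir/chunk."

    # 2. Sort each chunk in the background. For a remote host, the sort
    #    could instead be something like:
    #      ssh "$host" sort < "$chunk" > "$chunk.sorted" &
    for chunk in "$tmpdir"/chunk.*; do
        sort "$chunk" > "$chunk.sorted" &
    done

    # 3. Wait for all background sorts to exit.
    wait

    # 4. Merge the already-sorted chunks; -m merges without re-sorting.
    sort -m "$tmpdir"/chunk.*.sorted

    # 5. Remove temp files.
    rm -rf "$tmpdir"
}
```

Usage would be along the lines of chunked_sort unsorted_file 3 > sorted_file. The merge step is cheap compared with the sorts themselves, which is where the parallel speedup comes from.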

Some things to keep in mind:

  • If you don’t have mktemp available, you will have to add a mktemp function.
  • The demonstration above was for a 500MB file and utilized three sort processes, one on the local host and two remote.
  • You only want one sort process per CPU, as sort will consume 100% of the CPU. If you have two CPUs on your local machine, place two localhost entries and it will start two sort processes. The same goes for remote hosts.
  • The file must be large enough so that the cost of transferring each “chunk” over the wire twice is less than the cost of sorting the entire file locally.
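For the one-process-per-CPU rule, the local entries could be generated instead of typed by hand. The following is a sketch under the assumption that getconf _NPROCESSORS_ONLN is available (it is on most Linux systems); if it is not, the code falls back to a single entry.

```shell
#!/usr/bin/env bash
# Ask the system how many CPUs are online; fall back to 1 if that fails.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
[[ $cpus =~ ^[1-9][0-9]*$ ]] || cpus=1

# One "localhost" entry per CPU, in the same form as the hosts array above.
hosts=()
for ((i = 0; i < cpus; i++)); do
    hosts[i]="localhost"
done

for i in "${!hosts[@]}"; do
    printf 'hosts[%d]="%s"\n' "$i" "${hosts[i]}"
done
```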

If you change this script or have suggestions of any kind, please do comment!

A few days ago, esofthub over at My SysAd Blog wrote about using FTP in a shell script. I recently had an issue where an application malfunctioned and an automated process started two ftp sessions for the same file at the same time. Both processes were writing to the file and thus the file ended up corrupt and twice as large as it should have been.

As such, I planned to write up a demo showing that this problem should be considered when ftp’ing in scripts. However, when doing the demo, vsftpd blocked the second session until the first was done, indicating that file locking was being done. Sure enough vsftpd added file locking in 2.0.3:

At this point: v2.0.3 released! (need to get about three important fixes out)
- Add optional file locking support via lock_upload_files (default on).

My demo was wrecked…. Then, I noticed that the file was corrupt. In my test, I started two ftp processes (notice how you can use curl to upload files via the command line) in two different terminals:

$ date; curl -s -T bigfile --user noland:password ftp://localhost/vsftpd-file-locking; date
Mon Jan 14 19:06:51 CST 2008
Mon Jan 14 19:07:20 CST 2008
$ date; curl -s -T bigfile --user noland:password ftp://localhost/vsftpd-file-locking; date
Mon Jan 14 19:06:52 CST 2008
Mon Jan 14 19:07:50 CST 2008

After the upload, the file is corrupt:

$ ls -l vsftpd-file-locking bigfile
-rw-r--r-- 1 noland users  512000000 2008-01-14 18:17 bigfile
-rw-r--r-- 1 noland users 1019576320 2008-01-14 19:07 vsftpd-file-locking

Being curious and hoping to fix a bug, I investigated. I downloaded the latest version, 2.0.5, and looked over the source files. The file “postlogin.c” looked like a candidate. Sure enough, I eventually found the function:

handle_upload_common(struct vsf_session* p_sess, int is_append, int is_unique)

This is where the magic appeared to be happening. Simple printf statements wouldn’t work since vsftpd daemonizes itself. As such, I decided to use syslog. I included syslog.h after the last include statement in “postlogin.c”:

27  #include "vsftpver.h"
#include <syslog.h>

The two variables “is_append” and “offset” looked interesting and decided the flow of the file write operation. Thus I added the two syslog statements below, above the code with line numbers specified:

syslog(LOG_INFO, "is_append = %d", is_append);
syslog(LOG_INFO, "offset = %d", (int) offset);
967      /* For non-anonymous, allow open() to overwrite or append existing files */
968      if (!is_append && offset == 0)
969      {
970         new_file_fd = str_create_overwrite(p_filename);
971      }
972      else
973      {
974          new_file_fd = str_create_append(p_filename);
975      }

I then built vsftpd by typing “make” and, as root, started it with “./vsftpd /etc/vsftpd/vsftpd.conf”. I retried my test and the following was logged to /var/log/messages:

Jan 11 22:24:21 test1 vsftpd: is_append = 0
Jan 11 22:24:21 test1 vsftpd: offset = 0
Jan 11 22:24:24 test1 vsftpd: is_append = 0
Jan 11 22:24:24 test1 vsftpd: offset = 0

Unfortunately this meant the second session was not starting at some offset in the file as I had hoped. I decided to follow the function that opened the file where the upload was saved, “str_create_overwrite.” This function’s definition is in “sysstr.c”:

95  int
96  str_create_overwrite(const struct mystr* p_str)
97  {
98    return vsf_sysutil_create_overwrite_file(str_getbuf(p_str));
99  }

I then found “vsf_sysutil_create_overwrite_file” in “sysutil.c”:

1058  vsf_sysutil_create_overwrite_file(const char* p_filename)
1059  {
1060    return open(p_filename, O_CREAT | O_TRUNC | O_WRONLY |
1061                            O_APPEND | O_NONBLOCK,
1062                tunable_file_open_mode);
1063  }

I thought it curious that the function had the word “overwrite” in it and yet used the O_APPEND flag. In addition, using both the O_TRUNC and O_APPEND flags seemed odd. I commented out the original code and removed the O_APPEND flag:

1058  vsf_sysutil_create_overwrite_file(const char* p_filename)
1059  {
1060  //  return open(p_filename, O_CREAT | O_TRUNC | O_WRONLY |
1061  //                          O_APPEND | O_NONBLOCK,
1062  //              tunable_file_open_mode);
1063    return open(p_filename, O_CREAT | O_TRUNC | O_WRONLY |
1064                            O_NONBLOCK,
1065                tunable_file_open_mode);
1066  }
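The effect of O_APPEND with two concurrent writers can be reproduced in miniature from the shell. This sketch relies on Bash opening “>>” redirections with O_APPEND and “>” redirections with O_TRUNC but without O_APPEND; the file names and the 1000-byte payload are arbitrary.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
payload=$(printf 'x%.0s' {1..1000})   # 1000 bytes of "x"

# Append mode: every write lands at the current end of the file,
# so two writers produce a file twice the size of the payload.
: > "$demo_dir/append"
printf '%s' "$payload" >> "$demo_dir/append" &
printf '%s' "$payload" >> "$demo_dir/append" &
wait

# Overwrite mode: each writer starts at offset 0 of its own descriptor,
# so the second copy lands on top of the first.
: > "$demo_dir/overwrite"
printf '%s' "$payload" > "$demo_dir/overwrite" &
printf '%s' "$payload" > "$demo_dir/overwrite" &
wait

append_size=$(( $(wc -c < "$demo_dir/append") ))
overwrite_size=$(( $(wc -c < "$demo_dir/overwrite") ))
echo "append: $append_size bytes, overwrite: $overwrite_size bytes"
rm -rf "$demo_dir"
```

The doubled size in append mode matches the pattern seen with the uploaded file, which came out at roughly twice the size of bigfile.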

After recompiling via “make” and restarting as root via “./vsftpd /etc/vsftpd/vsftpd.conf” I tested again:

$ date; curl -s -T bigfile --user noland:password ftp://localhost/vsftpd-file-locking; date
Fri Jan 11 22:57:41 EST 2008
Fri Jan 11 22:59:26 EST 2008
$ date; curl -s -T bigfile --user noland:password ftp://localhost/vsftpd-file-locking; date
Fri Jan 11 22:57:42 EST 2008
Fri Jan 11 23:00:58 EST 2008

And…after the test, the file was not corrupt:

$ ls -l vsftpd-file-locking bigfile
-rw-r--r-- 1 noland noland 512000000 Jan 11 22:22 bigfile
-rw-r--r-- 1 noland noland 512000000 Jan 11 23:00 vsftpd-file-locking
$ md5sum vsftpd-file-locking bigfile
651db2470e6473a7f35d1879a5632c58  vsftpd-file-locking
651db2470e6473a7f35d1879a5632c58  bigfile

Yay!! I emailed the problem and possible fix to the maintainer. This is why I love free software.

Notes:

  1. I tested the newest release version of pure-ftpd and proftpd and file locking appeared to work correctly.
  2. I created the test file with “dd if=/dev/urandom of=bigfile count=1000000”

Users of CGI scripts often encounter the dreaded “premature end of script headers” error. A quick Google search on that phrase proves this to be true. The cause is often the same as that of the “bad interpreter” error received on the command line.

A very common cause is that the file was created on a Windows host and then uploaded to a Unix host for execution. (Think of the millions of websites using shared hosting.) The problem here, as no doubt you may be aware, is that the file contains the dreaded “carriage return” before every newline. This is so common that there is even a standard command, dos2unix, for converting these files.
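When dos2unix is not installed, checking for and removing the carriage returns might look like the sketch below. The function names are made up for illustration, and strip_cr is blunter than dos2unix: it deletes every carriage return in the file, not only those that precede a newline.

```shell
#!/usr/bin/env bash
# shebang_has_cr FILE: succeed if the first line of FILE ends in a carriage return.
shebang_has_cr() {
    local first
    IFS= read -r first < "$1"
    [[ $first == *$'\r' ]]
}

# strip_cr FILE: write FILE to stdout with all carriage returns removed.
strip_cr() {
    tr -d '\r' < "$1"
}

# Example usage:
#   if shebang_has_cr script.cgi; then
#       strip_cr script.cgi > script.fixed && chmod +x script.fixed
#   fi
```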

However, I had yet to see a reason as to WHY that carriage return after the hash bang causes a problem. I just accepted it as fact. Today, I decided to figure out why and where it failed.

As it turns out, it fails in the kernel. To figure this out, I downloaded the latest kernel (2.6.23.12) to my RHEL 5 host and compiled it with the standard config file that comes with RHEL 5. The configuration file was for a much older kernel, but worked for my purposes. I booted it to make sure I had a working copy of 2.6.23.12. Then I went into the kernel source tree and started modifying stuff!

To be sure, I am no kernel hacker, nor even a C hacker. I had never successfully modified the kernel source, though I hadn’t tried in many years. But after three compilations, I was able to figure out where in 2.6.23.12 the exec system call takes place, and I was able to insert some code to print a custom message. The file I modified was “fs/exec.c”:

[root@test1 linux-2.6.23.12]# ls -l fs/exec.c
-rw-rw-r-- 1 root root 42231 Jan 7 17:44 fs/exec.c

In that file, inside the function search_binary_handler() I inserted the following code:

1139  printk("{");
1140  int z = 0;
1141  for(; z < strlen(bprm->buf); z++) {
1142    if(bprm->buf[z] == '\r') {
1143      printk("(carriage return)");
1144    } else {
1145      printk("%c", bprm->buf[z]);
1146    }
1147  }
1148  printk("}\n");

The line numbers may be slightly off as I had inserted and removed code above. After compilation and booting, I was able to give it a shot. I created a sample script with a carriage return after the hash bang interpreter portion.

[root@test1 ~]# echo -e '#!/bin/bash\r\n/usr/bin/id' >id.sh
[root@test1 ~]# chmod +x id.sh

And BAM, it worked!

[root@test1 ~]# ./id.sh
{#!/bin/bash(carriage return)}
-bash: ./id.sh: /bin/bash^M: bad interpreter: No such file or directory

As you can see, the kernel was actually trying to execute “/bin/bash\r”, which is, of course, not a valid file.

[Screenshot: Why do CGI Scripts and Shell Scripts fail when they contain carriage returns]
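The same conclusion can be reached without rebuilding a kernel by imitating, very roughly, how the interpreter path is pulled out of the first line. The sketch below is my own approximation in Bash, not the kernel's code, and the function name is hypothetical. It assumes the path is everything after “#!” up to the first space or tab, with only spaces and tabs treated as whitespace, so a carriage return stays part of the path.

```shell
#!/usr/bin/env bash
# interp_from_shebang FILE: print the interpreter path taken from FILE's
# first line, shell-quoted so that control characters are visible.
interp_from_shebang() {
    local line
    IFS= read -r line < "$1"          # read up to, not including, the newline
    [[ $line == '#!'* ]] || return 1  # not a script with a hash bang
    line=${line#'#!'}
    # Drop leading spaces and tabs, then cut at the first space or tab.
    line=${line#"${line%%[![:blank:]]*}"}
    line=${line%%[[:blank:]]*}
    printf '%q\n' "$line"
}

# Example usage:
#   printf '#!/bin/bash\r\n/usr/bin/id\n' > id.sh
#   interp_from_shebang id.sh    # prints $'/bin/bash\r'
```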

Update: In response to reader requests, I wrote an explanation of my investigation: On the case of carriage returns and kernel exec system calls.