Which comparator, test, bracket, or double bracket, is fastest?
January 24th, 2008
The other day, I began wondering which comparator, test, [, or [[, was fastest? Here are the results:
$ time for i in {1..100000}; do [[ -d . ]];done
real 0m1.256s
user 0m1.018s
sys 0m0.238s
$ time for i in {1..100000}; do [ -d . ];done
real 0m3.407s
user 0m2.704s
sys 0m0.703s
$ time for i in {1..100000}; do test -d .;done
real 0m3.223s
user 0m2.607s
sys 0m0.616s
The double bracket is a “compound command”, whereas test and the single bracket are shell built-ins (and are in fact the same command). Thus, the single bracket and double bracket execute different code.
The test and single bracket are the most portable, as they also exist as separate, external commands. However, if you’re using any remotely modern version of BASH, the double bracket is supported.
Here are the performance numbers for the external versions of test and the single bracket:
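One way to check how bash classifies each of the three is to ask its `type` builtin. The sketch below assumes bash is installed and on the PATH; the helper name `kind_of` is made up for illustration.

```shell
# kind_of is a hypothetical helper: it asks bash how it classifies a word.
# "type -t" prints one of: alias, keyword, function, builtin, file.
kind_of() {
    bash -c 'type -t "$1"' _ "$1"
}

for word in '[[' '[' test; do
    printf '%s is a %s\n' "$word" "$(kind_of "$word")"
done

# "type -a" also lists any external versions found on the PATH.
bash -c 'type -a test'
```

The keyword is parsed by the shell itself, which is why it never shows up as a file on disk.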
$ time for i in {1..100000}; do /usr/bin/test -d .;done
real 5m49.324s
user 0m51.771s
sys 4m48.013s
$ time for i in {1..100000}; do /usr/bin/[ -d . ];done
real 5m45.728s
user 0m52.536s
sys 4m46.259s
Wow! This shows the high cost of process creation!
dssh - executing an arbitrary command in parallel on an arbitrary number of hosts
January 21st, 2008
I asked “What do you want” and you said scripting. Which is good, because I have felt like scripting lately!
I help a website hosting company, Idologic, on the weekends. (Side note: I highly recommend Idologic. I have worked with and been a customer of many other hosting companies. I really doubt you will find better customer service elsewhere.) Like many businesses these days, Idologic has quite a few Linux servers. When presented with many servers, I typically want to parallelize my work.
As such, I have written a script called dssh (previous version), which allows you to execute commands on n hosts, in parallel. This can be used to find information on the hosts, such as load average, number of processes by user, number of processes by process name, etc.
There are other options such as pssh and p-run, however I wanted to create a shell solution which could be easily and simply “installed”. Dssh reads standard input. It expects one host per line. Host specific ssh options are supported. Here is my sample hosts file:
$ cat hosts
mojito
-l noland kodiak
mojito
kodiak
-C mojito
-i /home/noland/.ssh/id_rsa kodiak
There is nothing restricting you from generating this list from some type of metadata (e.g. a database). Here are some examples of output:
$ ./dssh.sh "uptime" < hosts
First time huh? Think your cmd over and then try again.
$ ./dssh.sh "uptime" < hosts
mojito:O:0:19:16:45 up 3 days, 14 min, 5 users, load average: 0.22, 0.22, 0.20
kodiak:O:0:13:24:00 up 20:00, 1 user, load average: 0.42, 0.16, 0.05
mojito:O:0:19:16:45 up 3 days, 14 min, 5 users, load average: 0.22, 0.22, 0.20
kodiak:O:0:13:24:00 up 20:00, 1 user, load average: 0.42, 0.16, 0.05
mojito:O:0:19:16:45 up 3 days, 14 min, 5 users, load average: 0.22, 0.22, 0.20
kodiak:O:0:13:24:00 up 20:00, 1 user, load average: 0.42, 0.16, 0.0
$ ./dssh.sh "pgrep -u noland | wc -w" < hosts
mojito:O:0:60
kodiak:O:0:5
mojito:O:0:60
kodiak:O:0:5
mojito:O:0:60
kodiak:O:0:5
$ ./dssh.sh "ls not_a_file" < hosts
mojito:E:2:ls: not_a_file: No such file or directory
kodiak:E:2:ls: not_a_file: No such file or directory
mojito:E:2:ls: not_a_file: No such file or directory
kodiak:E:2:ls: not_a_file: No such file or directory
mojito:E:2:ls: not_a_file: No such file or directory
kodiak:E:2:ls: not_a_file: No such file or directory
Notes:
- With great power comes even greater responsibility. Running rm -rf / as root with this script would do exactly that.
- I don’t recommend doing anything with this script that “changes state”.
- I make no warranties or promises.
- You need ssh keys to use this. I recommend using ssh-agent.
- By default dssh will execute 10 children in parallel. If you have a large host, increase this.
- When looping through the hosts, if the maximum number of children are still processing, the script will sleep 500ms. If your version of sleep does not support fractional seconds, you will need to change this.
Here is an outline of the script:
- Read from standard input a list of hosts
- Configure trap to remove temporary files on exit
- For each host
  - Sleep while we have more children than the maximum number of children
  - Generate three temporary files, one for each of
    - Standard Output
    - Standard Error
    - Exit value
  - Create a child process, saving stdout, stderr, and the exit value in their respective files.
- Wait for all children to exit
- For each host
  - If the standard output or error files are of size greater than zero, print the content, prefacing each line with the hostname, standard error/output indicator, and exit status.
  - Else print something to indicate we executed a process and have an exit value.
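The outline can be sketched in a few dozen lines of bash. This is an illustration written from the outline, not the dssh script itself. The function name `parallel_run`, the `DSSH_RUNNER` variable (which lets you swap ssh for something else while experimenting), and the `-` marker for "no output" are all assumptions of mine. It also cleans up at the end instead of installing a trap.

```shell
# Sketch of the outline above, written for bash; this is not the dssh script.
# DSSH_RUNNER is a hypothetical knob: the program that receives the host line
# followed by the command. It defaults to ssh.
parallel_run() {
    local cmd=$1 max=${2:-10} runner=${DSSH_RUNNER:-ssh}
    local tmpdir line host rc n=0 i
    tmpdir=$(mktemp -d) || return 1

    # One host per line, optionally preceded by ssh options.
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        # Sleep while the maximum number of children are still running.
        while [ "$(jobs -pr | wc -l)" -ge "$max" ]; do
            sleep 0.5   # fractional seconds need a sleep that supports them
        done
        n=$((n + 1))
        printf '%s\n' "$line" > "$tmpdir/$n.host"
        # $line is left unquoted on purpose so that options split into words.
        # The child reads /dev/null so it cannot swallow the host list.
        (
            $runner $line "$cmd" > "$tmpdir/$n.out" 2> "$tmpdir/$n.err" < /dev/null
            echo "$?" > "$tmpdir/$n.rc"
        ) &
    done
    wait   # for all children

    i=1
    while [ "$i" -le "$n" ]; do
        line=$(cat "$tmpdir/$i.host")
        host=${line##* }              # the host is the last word on the line
        rc=$(cat "$tmpdir/$i.rc")
        if [ -s "$tmpdir/$i.out" ] || [ -s "$tmpdir/$i.err" ]; then
            awk -v p="$host:O:$rc:" '{ print p $0 }' "$tmpdir/$i.out"
            awk -v p="$host:E:$rc:" '{ print p $0 }' "$tmpdir/$i.err"
        else
            # No output at all: still report that the command ran.
            printf '%s:-:%s:\n' "$host" "$rc"
        fi
        i=$((i + 1))
    done
    rm -rf "$tmpdir"
}
```

Usage would look like `parallel_run "uptime" 10 < hosts`. Results are printed in the order the hosts were read, not in the order the children finished.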
Once again, here is the script I am calling dssh.
Sorting large files faster with a shell script
January 18th, 2008
I often find myself having to sort large files. Gigabyte files take an extremely long time to sort. Twenty to thirty minutes is common, even when utilizing an entire CPU. As such, I wrote a script to perform distributed sorting.
Setup requires editing the “hosts” array at the top of the script. Put in the hosts or username@host pairs, like so:
hosts[0]="localhost"
hosts[1]="localhost"
hosts[2]="user@otherhost"
hosts[3]="yahost"
hosts[4]="finalhost"
Then, optionally set up SSH keys to automate authentication and start sorting!
$ ls -l unsorted_file
-rw-r--r-- 1 noland users 499328352 2008-01-18 18:19 unsorted_file
$ time sort unsorted_file > sorted_file2
real 19m17.247s
user 18m8.720s
sys 0m10.169s
$ time ./distsort.sh unsorted_file > sorted_file
real 9m46.398s
user 7m19.651s
sys 0m21.397s
$ md5sum sorted_file sorted_file2
ceb8a3aa2947868cae6ee33c457ba8d0 sorted_file
ceb8a3aa2947868cae6ee33c457ba8d0 sorted_file2
Here is the process:
- Split the file into equal portions, one for each entry in the hosts array.
- For each host, start either a background local sort process or a remote sort process using SSH. The file is transferred via a pipeline to the remote hosts.
- Wait for all processes to exit.
- Merge the sorted files.
- Remove temp files.
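The split, sort, and merge steps can be tried on a single machine. The sketch below is a local-only illustration and leaves out the SSH leg; `chunk_sort` and its arguments are names I made up, and this is not the distsort.sh script.

```shell
# Local-only sketch of the split / sort / merge steps (no remote hosts).
# chunk_sort FILE [PARTS] writes the sorted contents of FILE to stdout.
chunk_sort() {
    local input=$1 parts=${2:-2}
    local tmpdir lines per chunk
    [ -s "$input" ] || return 0          # nothing to sort
    tmpdir=$(mktemp -d) || return 1

    # Split into roughly equal portions on line boundaries.
    lines=$(wc -l < "$input")
    per=$(( (lines + parts - 1) / parts ))
    [ "$per" -gt 0 ] || per=1
    split -l "$per" "$input" "$tmpdir/chunk."

    # Sort every chunk in the background, then wait for all of them.
    for chunk in "$tmpdir"/chunk.*; do
        sort "$chunk" > "$chunk.sorted" &
    done
    wait

    # Merge the already-sorted chunks; -m merges without sorting again.
    sort -m "$tmpdir"/chunk.*.sorted

    rm -rf "$tmpdir"
}
```

For example, `chunk_sort unsorted_file 2 > sorted_file` would use two local sort processes. The final `sort -m` is cheap because it only has to interleave inputs that are already in order.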
Some things to keep in mind:
- If you don’t have mktemp available, you will have to add a mktemp function.
- The demonstration above was for a 500MB file and utilized three sort processes, one on the local host and two remote.
- You only want one sort process per CPU, as sort will consume 100% of the CPU. If you have two CPUs on your local machine, place two localhost entries and it will start two sort processes. The same goes for remote hosts.
- The file must be large enough so that the cost of transferring each “chunk” over the wire twice is less than the cost of sorting the entire file locally.
If you change this script or have suggestions of any kind, please do comment!
Five Quick Command Line Tips
January 18th, 2008
These are my tips for the week. Have you any tips to share?
Create many files of random size
$ for i in $(seq 1 1000); do let "size = ($RANDOM/100)"; dd if=/dev/urandom \
of=file-$i count=$size 2>/dev/null;done
Delete files with find, faster
Many people use find to delete files. However, they often do so like this:
find directory conditionals -exec rm {} \;
Which certainly works. However, find has a built-in “-delete” action which is significantly faster.
$ find . -name 'file-*'| wc -l
1000
$ time find . -name 'file-*' -exec rm {} \;
real 0m2.540s
user 0m1.076s
sys 0m1.196s
$ find . -name 'file-*'| wc -l
0
$ find . -name 'file-*'| wc -l
1000
$ time find . -name 'file-*' -delete
real 0m0.118s
user 0m0.008s
sys 0m0.100s
$ find . -name 'file-*'| wc -l
0
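A small, safe way to try this is in a scratch directory. The sketch below also shows `-exec rm {} +`, a middle ground I am adding for comparison: it still runs rm, but hands it many paths per invocation. The directory and file names are throwaway examples.

```shell
# Work in a throwaway directory so nothing real is at risk.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$dir/file-$i"; done

# Preview first: the same expression without an action only lists matches.
find "$dir" -name 'file-*' | wc -l

# "-exec ... +" batches many paths into each rm call instead of one per file.
find "$dir" -name 'file-[12]' -exec rm {} +

# "-delete" removes the remaining matches without starting rm at all.
find "$dir" -name 'file-*' -delete
remaining=$(find "$dir" -name 'file-*' | wc -l)
echo "files left: $remaining"
rmdir "$dir"
```

Running the expression without `-delete` first is a cheap habit, since `-delete` acts on whatever the rest of the expression matches.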
Finding more fun
Often those new to the shell do something like this to find the size of files:
$ ls -la | awk '/^\-/ {print $5}'
98816
99328
99840
99840
99840
Or the last modification date:
$ ls -la | awk '/^\-/ {print $6, $7}'
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
Find offers a much cleaner interface to this information:
$ find . -maxdepth 1 -type f -printf "%s\n"
98816
99328
99840
99840
99840
$ find . -maxdepth 1 -type f -printf "%TY-%Tm-%Td %TH:%TM\n"
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
Standard input can only be read once
$ echo a > a
$ echo b > b
$ echo c > c
That is, this:
$ cat a b c | cat - a b c
Is the same as:
$ cat a b c | cat - - a b c
Meaning that standard input is “read destructive.” As Pádraig Brady said recently on the coreutils mailing list:
$ echo mouse | cat - -
mouse
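If the same input really is needed twice, it has to be saved somewhere first. A minimal sketch, using a temporary file as the (assumed) place to keep it:

```shell
# The first "-" drains the pipe; the second "-" sees end-of-file at once,
# so the word is printed only one time.
printf 'mouse\n' | cat - -

# To use the same input twice, save it first and read the copy twice.
tmp=$(mktemp)
printf 'mouse\n' > "$tmp"
cat "$tmp" "$tmp"
rm -f "$tmp"
```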
Odd characters showing up in your terminal when using Putty
Go to your settings:
Window
Translation
Change “Received data assumed to be in which character set” to UTF-8
Update: I changed the first examples when finding last modification date to use awk instead of grep and awk. Hard to change my ways!
A bug in vsftpd and using ftp on the command line
January 14th, 2008
A few days ago, esofthub over at My SysAd Blog wrote about using FTP in a shell script. I recently had an issue where an application malfunctioned and an automated process started two ftp sessions for the same file at the same time. Both processes were writing to the file and thus the file ended up corrupt and twice as large as it should have been.
As such, I planned to write up a demo showing that this problem should be considered when ftp’ing in scripts. However, when doing the demo, vsftpd blocked the second session until the first was done, indicating that file locking was being done. Sure enough vsftpd added file locking in 2.0.3:
At this point: v2.0.3 released! (need to get about three important fixes out)
- Add optional file locking support via lock_upload_files (default on).
My demo was wrecked…. Then, I noticed that the file was corrupt. In my test, I started two ftp processes (notice how you can use curl to upload files via the command line) in two different terminals:
$ date; curl -s -T bigfile --user noland:password ftp://localhost/vsftpd-file-locking; date
Mon Jan 14 19:06:51 CST 2008
Mon Jan 14 19:07:20 CST 2008
$ date; curl -s -T bigfile --user noland:password ftp://localhost/vsftpd-file-locking; date
Mon Jan 14 19:06:52 CST 2008
Mon Jan 14 19:07:50 CST 2008
After the upload, the file is corrupt:
$ ls -l vsftpd-file-locking bigfile
-rw-r--r-- 1 noland users 512000000 2008-01-14 18:17 bigfile
-rw-r--r-- 1 noland users 1019576320 2008-01-14 19:07 vsftpd-file-locking
Being curious and hoping to fix a bug, I investigated. I downloaded the latest version, 2.0.5 and looked over the source files. The file “postlogin.c” looked like a candidate. Sure enough, I eventually found the function:
handle_upload_common(struct vsf_session* p_sess, int is_append, int is_unique)
This is where the magic appeared to be happening. Simple printf statements wouldn’t work since vsftpd daemonizes itself. As such, I decided to use syslog. I included syslog.h after the last include statement in “postlogin.c”:
27 #include "vsftpver.h"
#include <syslog.h>
The two variables “is_append” and “offset” looked interesting and decided the flow of the file write operation. Thus I added the two syslog statements below, above the code with line numbers specified:
syslog(LOG_INFO, "is_append = %d", is_append);
syslog(LOG_INFO, "offset = %d", (int) offset);
967 /* For non-anonymous, allow open() to overwrite or append existing files */
968 if (!is_append && offset == 0)
969 {
970 new_file_fd = str_create_overwrite(p_filename);
971 }
972 else
973 {
974 new_file_fd = str_create_append(p_filename);
975 }
I then built vsftpd by typing “make” and as root started it, “./vsftpd /etc/vsftpd/vsftpd.conf”. I retried my test and the following was logged to /var/log/messages:
Jan 11 22:24:21 test1 vsftpd: is_append = 0
Jan 11 22:24:21 test1 vsftpd: offset = 0
Jan 11 22:24:24 test1 vsftpd: is_append = 0
Jan 11 22:24:24 test1 vsftpd: offset = 0
Unfortunately this meant the second session was not starting at some offset in the file as I had hoped. I decided to follow the function that opened the file where the upload was saved, “str_create_overwrite.” This function’s definition is in “sysstr.c”:
95 int
96 str_create_overwrite(const struct mystr* p_str)
97 {
98 return vsf_sysutil_create_overwrite_file(str_getbuf(p_str));
99 }
I then found “vsf_sysutil_create_overwrite_file” in “sysutil.c”:
1058 vsf_sysutil_create_overwrite_file(const char* p_filename)
1059 {
1060 return open(p_filename, O_CREAT | O_TRUNC | O_WRONLY |
1061 O_APPEND | O_NONBLOCK,
1062 tunable_file_open_mode);
1063 }
I thought it curious that the function had the word “overwrite” in it and yet used the O_APPEND flag. In addition, using both the O_TRUNC and O_APPEND flags seemed odd. I commented out the original code and removed the O_APPEND flag:
1058 vsf_sysutil_create_overwrite_file(const char* p_filename)
1059 {
1060 // return open(p_filename, O_CREAT | O_TRUNC | O_WRONLY |
1061 // O_APPEND | O_NONBLOCK,
1062 // tunable_file_open_mode);
1063 return open(p_filename, O_CREAT | O_TRUNC | O_WRONLY |
1064 O_NONBLOCK,
1065 tunable_file_open_mode);
1066 }
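The effect of O_APPEND with two simultaneous writers can be reproduced from the shell, without an FTP server. This is only an analogy to what vsftpd does, under the assumption that the shell’s `>>` redirection opens the file with O_APPEND and `>` opens it with O_TRUNC and no O_APPEND. The file names are temporary and made up.

```shell
src=$(mktemp); out=$(mktemp)
head -c 1000000 /dev/urandom > "$src"

# Two writers that both append: every write lands at the current end of the
# file, so the result is twice the size of the source.
: > "$out"
cat "$src" >> "$out" &
cat "$src" >> "$out" &
wait
append_size=$(wc -c < "$out")

# Two writers without O_APPEND: each keeps its own offset starting at zero,
# so both write the same bytes to the same positions.
cat "$src" > "$out" &
cat "$src" > "$out" &
wait
plain_size=$(wc -c < "$out")

if cmp -s "$src" "$out"; then match=yes; else match=no; fi
echo "append: $append_size bytes, plain: $plain_size bytes, identical: $match"
rm -f "$src" "$out"
```

The second case only comes out intact because both writers send identical data; two different uploads to the same name would still clobber each other.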
After recompiling via “make” and restarting as root via “./vsftpd /etc/vsftpd/vsftpd.conf” I tested again:
$ date; curl -s -T bigfile --user noland:password ftp://localhost/vsftpd-file-locking; date
Fri Jan 11 22:57:41 EST 2008
Fri Jan 11 22:59:26 EST 2008
$ date; curl -s -T bigfile --user noland:password ftp://localhost/vsftpd-file-locking; date
Fri Jan 11 22:57:42 EST 2008
Fri Jan 11 23:00:58 EST 2008
And…after the test, the file was not corrupt:
$ ls -l vsftpd-file-locking bigfile
-rw-r--r-- 1 noland noland 512000000 Jan 11 22:22 bigfile
-rw-r--r-- 1 noland noland 512000000 Jan 11 23:00 vsftpd-file-locking
$ md5sum vsftpd-file-locking bigfile
651db2470e6473a7f35d1879a5632c58 vsftpd-file-locking
651db2470e6473a7f35d1879a5632c58 bigfile
Yay!! I emailed the problem and possible fix to the maintainer. This is why I love free software.
Notes:
- I tested the newest release version of pure-ftpd and proftpd and file locking appeared to work correctly.
- I created the test file with “dd if=/dev/urandom of=bigfile count=1000000”
Do it better with awk 2
January 12th, 2008
Note: The file format in the file below is the same as in my earlier article Do it better with awk 1.
Today I was able to meet Bryan of Guru Labs. During our conversation he posed the following question. “Find the 3rd field in a file consisting of space-separated fields, the first being an ip address, in the range 192.168.1-2.1-255. There may be lines in the file containing invalid ip addresses.”
I used grep to find the lines and then used awk to find the field. For example:
$ egrep '^192\.168\.[1-2]\.([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|2[0-5]{2})' access_log | \
awk '{print $(NF-1)}'
200
304
304
...
He pointed out that, while this works, there is no reason to invoke grep. He is certainly correct. Indeed, awk is all powerful! The default usage of awk is:
awk 'pattern { command }'
In its most common and simple usage, to print a field delimited by spaces:
awk '{print $3}'
You are specifying no pattern, which matches every line. When solving the problem posed by Bryan, simply specify the pattern and eliminate grep from the pipeline. Here is the equivalent awk command:
$ awk '/^192\.168\.[1-2]\.([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|2[0-5]{2})/ {print $(NF-1)}' \
access_log
200
304
304
...
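One caveat: the pattern is not anchored at the end of the address, so a line beginning with 192.168.1.999 still matches (the single-digit alternative accepts the first “9”). A hedged alternative is to let awk split the address and compare the octets as numbers. `in_range` is a name I made up for this sketch, and the range is the one from Bryan’s question.

```shell
# in_range prints the second-to-last field of lines whose first field is an
# address in 192.168.1-2.1-255, comparing the octets as numbers.
in_range() {
    awk '{
        n = split($1, o, ".")
        if (n == 4 && $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ &&
            o[1] == 192 && o[2] == 168 &&
            o[3] >= 1 && o[3] <= 2 &&
            o[4] >= 1 && o[4] <= 255)
            print $(NF-1)
    }' "$@"
}
```

It is called like the original, for example `in_range access_log`, and avoids interval expressions such as `{2}`, which not every awk enables by default.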
Awk has some extremely powerful selection operators. Here I am using the ~ operator to match the fourth field from the right (the resource) against ^/man, and printing the matched field:
$ awk '$(NF-3) ~ /^\/man/ {print $(NF-3)}' access_log
/man/cmd/info
/man/cmd/Mail
/man/s/Z
/man/cmd/mv
...
This invocation uses the !~ operator, to match lines where the resource does not match the pattern ^/man:
$ awk '$(NF-3) !~ /^\/man/ {print $(NF-3)}' access_log
/feed/
/feed
/robots.txt
/10-linux-commands-youve-never-used.html
...
Here I am selecting lines where the response code $(NF-1) is greater than or equal to 200, but less than 400 and printing the resource and response code. I use awk’s boolean “and” operator && to perform this operation:
$ awk '$(NF-1) >= 200 && $(NF-1) <= 399 {print $(NF-3), $(NF-1)}' access_log
/man/cmd/info 200
/feed/ 304
/feed 304
...
The following example uses the boolean “or” operator || to print lines where the resource matches ^/feed or ^/sitemap:
$ awk '$(NF-3) ~ /^\/feed/ || $(NF-3) ~ /^\/sitemap/ {print $0}' access_log
192.168.1.2 - - [01/Jan/2008:00:00:31 -0600] "GET /feed/ HTTP/1.1" 304 -
192.168.1.3 - - [01/Jan/2008:00:01:09 -0600] "GET /feed HTTP/1.1" 304 -
...
Do it better with awk 1
January 10th, 2008
If you are a system administrator or developer, you need to process log files to get a better grasp of the situation. Many people use Perl or Python to help with this task. However, many times using one of the P languages is overkill. Furthermore, every single day I am on a machine that I cannot make changes to and thus cannot use my helper scripts. However, awk has the tools available to solve most on-the-fly log processing problems, directly from the command line. In addition, awk can provide a more concise and faster solution than the pipeline of cut, grep, sort, and other commands you are currently using.
In this article, this is the format of the file I am working with:
$ tail -n 1 access_log-2008-01
1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] "GET / HTTP/1.1" 200 38856
Basically what we have here is: ip address, date, request, response code, response size. (Ignoring the dashes after the ip address.)
How would you find the largest response sent by your HTTP server? My typical solution has always been:
$ awk '{print $NF}' access_log-2008-01 | egrep -v '\-' | sort -n | tail -n 1
10678272
However, there is clearly a better solution.
By default, awk splits input lines by spaces, and assigns the entire line to $0, each field to $n, and the number of fields to NF. See this example:
$ echo a b c d e f | awk '{print $0}'
a b c d e f
$ echo a b c d e f | awk '{print $1}'
a
$ echo a b c d e f | awk '{print $2}'
b
$ echo a b c d e f | awk '{print NF}'
6
Note that you can print the last field by using NF itself as the field number:
$ echo a b c d e f | awk '{print $NF}'
f
Or print the second field from the end:
$ echo a b c d e f | awk '{print $(NF-1)}'
e
Look at my example again:
$ awk '{print $NF}' access_log-2008-01 | egrep -v '\-' | sort -n | tail -n 1
10678272
That solution starts four processes and passes the data through three more filters after awk has already read it. That is exceedingly inefficient! How about this:
$ awk '{if ($NF > max) { max = $NF;}} END {print max}' access_log-2008-01
10678272
This starts one process and filters the data only one time. That command in English says: For each line, if the last field is greater than the max, assign it to the variable “max”. Once we have processed all the lines, print the variable max.
Which command do you suppose is faster?
$ time awk '{print $NF}' access_log-2008-01 | egrep -v '\-' | sort -n | tail -n 1
10678272
real 0m1.107s
user 0m1.070s
sys 0m0.037s
$ time awk '{if ($NF > max) { max = $NF;}} END {print max}' access_log-2008-01
10678272
real 0m0.207s
user 0m0.194s
sys 0m0.012s
Experts state that “1.0 second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay.” That log file is only 12MB in size and there is a difference in speed which you can notice at the terminal. Imagine if the log is 300MB?
Awk also has extremely accessible associative arrays. Here I use an array to count HTTP response codes:
$ awk '{counts[$(NF-1)]+=1}; END {for(code in counts) print code, counts[code]}' \
access_log-2008-01
206 177
301 1212
302 302
304 5051
403 5
200 82539
404 906
405 1
500 183
The previous command in English says: for each line, using the second to last field as our index, increment our array. Once we have processed all lines, loop through the array, assigning each array index to “code”, and print the code and its count.
Let’s count the number of requests for each URL:
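The counting idiom is easy to try on a few lines before pointing it at a real log. The three log lines below are made up for illustration, in the format used in this article.

```shell
# Three made-up log lines in the format used in this article.
sample='1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] "GET / HTTP/1.1" 200 38856
1.1.1.2 - - [10/Jan/2008:17:26:52 -0600] "GET /feed/ HTTP/1.1" 304 -
1.1.1.1 - - [10/Jan/2008:17:26:53 -0600] "GET /robots.txt HTTP/1.1" 200 120'

# "for (code in counts)" visits the keys in no particular order, so sort the
# result when a stable order matters.
code_counts=$(printf '%s\n' "$sample" |
    awk '{counts[$(NF-1)] += 1} END {for (code in counts) print code, counts[code]}' |
    sort -n)
printf '%s\n' "$code_counts"
```

The unordered traversal is why the response codes in the output above are not listed in numeric order.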
$ awk '{counts[$(NF-3)]+=1}; END {for(url in counts) print counts[url], url}' \
access_log-2008-01 | sort -n
...output removed...
796 /media/centos5.0_install/common/AA-bios.jpg
846 /robots.txt
1063 /media/misc/why-bad-interpreter-premature-end-of-script-headers.png
1425 /media/10-linux-commands-youve-never-used/mkfifo-write-to-pipe.png
1443 /media/10-linux-commands-youve-never-used/read-from-pipe.png
1629 /
2066 /feed/
3073 /10-linux-commands-youve-never-used.html
3909 /wp2.3/wp-content/themes/minn-01/style.css
6989 /favicon.ico
Now let’s sum the response sizes for each URL and display them in MB:
$ awk '{sizes[$(NF-3)]+=$NF}; END {for(url in sizes) print (sizes[url]/1024/1024) "MB", url}' \
access_log-2008-01 | sort -n
...output removed...
68.6784MB /media/centos5.0_install/gui_common/AQ-install-in-progress-3.png
72.0453MB /media/centos5.0_install/gui_common/AP-install-in-progress-2.png
74.0067MB /media/centos5.0_install/gui_common/AT-setup-agent-welcome.png
74.6089MB /media/centos5.0_install/gui_common/AV-setup-agent-firewall-r-u-sure.png
78.2652MB /media/centos5.0_install/gui_common/BA-setup-agent-sound-card.png
80.3148MB /media/centos5.0_install/gui_common/AG-bootloader-configuration.png
85.8359MB /media/centos5.0_install/gui_common/AI-set-timezone.png
101.836MB /media/centos_4.4_boot.iso
137.622MB /
263.253MB /media/centos_5.0_boot.iso
Let’s do the same for IP addresses:
$ awk '{counts[$1]+=1}; END {for(ip in counts) print counts[ip], ip}' \
access_log-2008-01 | sort -n
...output removed...
378 67.202.20.7
402 65.214.45.100
476 195.225.177.39
493 87.207.147.201
702 66.150.96.121
704 213.239.195.172
968 82.150.18.3
1335 65.28.61.246
2330 66.249.73.75
2883 71.63.249.40
$ awk '{sizes[$1]+=$NF}; END {for(ip in sizes) print (sizes[ip]/1024/1024) "MB", ip}' \
access_log-2008-01 | sort -n
...output removed...
20.9338MB 61.64.209.144
21.8517MB 116.71.182.210
23.4265MB 85.102.126.48
31.5194MB 213.239.195.172
32.732MB 67.176.123.158
37.9046MB 66.249.73.75
56.1901MB 71.63.249.40
57.9892MB 67.202.20.7
78.6117MB 65.28.61.246
Sum the size of all responses by ip address if the response code is 200:
$ awk '$(NF-1) == 200 {sizes[$1]+=$NF}; END {for(ip in sizes) print (sizes[ip]/1024/1024) "MB", ip}' \
access_log-2008-01 | sort -n
...output removed...
16.5405MB 220.181.38.245
16.7031MB 207.67.117.178
16.7661MB 128.227.0.66
16.9171MB 67.176.123.158
18.2246MB 71.72.54.173
31.5194MB 213.239.195.172
37.3774MB 66.249.73.75
53.6944MB 71.63.249.40
57.9885MB 67.202.20.7
76.9965MB 65.28.61.246
The command in English: for each line, if the response code is 200 ($(NF-1)), then increment our array at index ip address ($1), by response size ($NF).
Any questions, comments, or suggestions? I will be writing a second article on some other features of awk in the near future.
On the case of carriage returns and kernel exec system calls
January 8th, 2008
Yesterday, I wrote a post titled “Why do CGI scripts and shell scripts fail when they contain carriage returns?” I got a comment and a few emails saying, in the words of Stan, “I’d be interested in the process by which you narrowed it down to the exec.c file. You jumped straight to the catch, but I wanted to see the chase.”
This is the chase. If you were not interested in yesterday’s post, this will probably interest you even less. However, I do have two posts on command line tips in the works. Please stay tuned.
I (mistakenly) didn’t believe that BASH and Apache simply passed the name of the shell script into a system call. Surely there had to be more magic involved! Thus, I downloaded BASH 3.2 and unpacked the source. I grep’ed for “bad interpreter”, which is printed when you try to execute a script with a carriage return after the interpreter. That found a few matches, but only one match in a C source file.
noland@mojito:~/bash-3.2$ grep -R 'bad interpreter' .
Binary file ./po/en@quot.gmo matches
./po/ru.po:msgid "%s: %s: bad interpreter"
./po/en@boldquot.po:msgid "%s: %s: bad interpreter"
./po/en@boldquot.po:msgstr "%s: %s: bad interpreter"
./po/en@quot.po:msgid "%s: %s: bad interpreter"
./po/en@quot.po:msgstr "%s: %s: bad interpreter"
./po/bash.pot:msgid "%s: %s: bad interpreter"
Binary file ./po/en@boldquot.gmo matches
./execute_cmd.c: sys_error (_("%s: %s: bad interpreter"), command, interp ? interp : "");
I opened “execute_cmd.c” with less and searched for “bad interpreter”. Nearby, I saw the following code:
3339 execve (command, args, env);
I placed the following code directly above that statement. NOTE: If you copy this code from my blog, you have to replace the quotes as WP does something stupid with them:
printf("{");
int z = 0;
for(; z < strlen(command); z++) {
if(command[z] == '\r') {
printf("(carriage return)");
} else {
printf("%c", command[z]);
}
}
printf("}\n");
And then compiled bash:
noland@mojito:~/bash-3.2$ ./configure && make
checking build system type... i686-pc-linux-gnu
..... output removed .....
ls -l bash
-rwxr-xr-x 1 noland noland 2070786 2008-01-08 18:40 bash
size bash
text data bss dec hex filename
681571 19288 19848 720707 aff43 bash
I created the test file and made it executable:
noland@mojito:~/bash-3.2$ echo -e '#!/bin/bash\r\n/usr/bin/id' >id.sh
noland@mojito:~/bash-3.2$ chmod +x id.sh
Then I switched to my new shell and ran the executable:
noland@mojito:~/bash-3.2$ ./bash
noland@mojito:~/bash-3.2$ ./id.sh
{./id.sh}
bash: ./id.sh: /bin/bash^M: bad interpreter: No such file or directory
Rats, the shell is just sending in the file name to the execve() call. I then noticed the following comment after the execve() call:
/* The file has the execute bits set, but the kernel refuses to
run it for some reason.
Score! Time to do some kernel modifications…. When I first started using Linux, I used to compile the kernel for lack of anything better to do. (I lived 50 miles from civilization and my win modem didn’t work with Linux.) As such, I knew I could get it up and running, but I doubted my ability to intercept the actual execve() call.
Regardless, I downloaded the kernel and ran the following grep:
noland@mojito:~/linux-2.6.23.9$ grep -R 'execve' .
./fs/compat.c: * compat_do_execve() is mostly a copy of do_execve(), with the exception
./fs/compat.c:int compat_do_execve(char * filename,
./fs/compat.c: /* execve success */
./fs/exec.c: * sys_execve() executes a new program.
./fs/exec.c:int do_execve(char * filename,
....output removed
Jackpot! I then set up a VMware instance and compiled the kernel with the default config from the distribution. On boot, a few things didn’t start, but I didn’t care. I booted it without any modifications first so I would know when and if my modifications were the cause of a kernel panic. I’ll post how I did this later this week.
First off, I wanted to know what was coming into the do_execve() as the filename parameter. I didn’t think I would have printf() available to me. (Maybe it is, I don’t know.) So I grep’ed for print to see what the kernel used:
noland@mojito:~/linux-2.6.23.9$ grep -R ' print' . | grep 'c.:' | head -n 4
./fs/reiserfs/prints.c: printk ("reiserfs_put_super: session statistics: balances %d, fix_nodes %d, \
./net/decnet/dn_nsp_out.c: /* printk(KERN_DEBUG "ack: %s %04x %04x\n", ack ? "ACK" : "SKIP", (int)cb2->segnum, (int)acknum); */
./arch/um/kernel/sysrq.c: printk("Call Trace: \n");
./arch/ppc/platforms/residual.c: if ( did.BusId & PNPISADEVICE ) printk("PNPISA Device:");
It looked like printk was available and looked much like printf. Giddy up. I inserted a variation of the code I used above to print filename. When I rebooted, a TON of stuff was printed to the terminal, but when I executed my script, the output was exactly the same as it was before. Then this portion of the do_execve() function caught my eye:
retval = search_binary_handler(bprm,regs);
if (retval >= 0) {
/* execve success */
free_arg_pages(bprm);
security_bprm_free(bprm);
acct_update_integrals(current);
kfree(bprm);
return retval;
}
The function signature of search_binary_handler() is:
int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs);
I found the declaration of linux_binprm in “./include/linux/binfmts.h”
noland@mojito:~/linux-2.6.23.9$ grep -R linux_binprm . | grep '\.h'
./arch/ia64/ia32/ia32priv.h:struct linux_binprm;
./arch/ia64/ia32/ia32priv.h:extern int ia32_setup_arg_pages (struct linux_binprm *bprm, int exec_stack);
./security/selinux/include/objsec.h: struct linux_binprm *bprm; /* back pointer to bprm object */
./include/linux/binfmts.h:/* sizeof(linux_binprm->buf) */
./include/linux/binfmts.h:struct linux_binprm{
....output removed
The following variables caught my eye in that structure declaration:
struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
...stuff removed....
char * filename; /* Name of binary as seen by procps */
char * interp; /* Name of the binary really executed. Most
of the time same as filename, but could be
different for binfmt_{misc,script} */
The comment for interp seemed to be hinting at the answer! I tried the same code for interp and the results were interesting, but not what I wanted. I then tried printing buf immediately before the last return statement and boom, it worked! Pick up the “Rest of the Story”.
Why do CGI scripts and shell scripts fail when they contain carriage returns?
January 7th, 2008
Often users of CGI scripts encounter the dreaded “premature end of script headers” error. A quick Google search on that phrase proves this to be true. The cause of this, is often the same as the “bad interpreter” error received on the command line.
A very common cause is where the file was created on a Windows host and then uploaded to a Unix host for execution. (Think of the millions of websites using shared hosting.) The problem here, as no doubt you may be aware, is that the file contains the dreaded “carriage return” before every newline. This is so common, there is even a standard command, dos2unix, for converting these files.
However, I had yet to see a reason as to WHY that carriage return after the hash bang causes a problem. I just accepted it as fact. Today, I decided to figure out why and where it failed.
As it turns out, it fails in the kernel. To figure this out, I downloaded the latest kernel (2.6.23.12) to my RHEL 5 host and compiled it with the standard config file that comes with RHEL 5. The configuration file was for a much older kernel, but worked for my purposes. I booted it to make sure I had a working copy of 2.6.23.12. Then I went into the kernel source tree and started modifying stuff!
To be sure, I am no kernel hacker, nor even a C hacker. I had never successfully modified the kernel source, though I haven’t tried in many years. But after three compilations, I was able to figure out where in 2.6.23.12 the exec system call was taking place and was able to insert some code to print a custom message. The file I modified was “fs/exec.c”:
[root@test1 linux-2.6.23.12]# ls -l fs/exec.c
-rw-rw-r-- 1 root root 42231 Jan 7 17:44 fs/exec.c
In that file, inside the function search_binary_handler() I inserted the following code:
1139 printk("{");
1140 int z = 0;
1141 for(; z < strlen(bprm->buf); z++) {
1142 if(bprm->buf[z] == '\r') {
1143 printk("(carriage return)");
1144 } else {
1145 printk("%c", bprm->buf[z]);
1146 }
1147 }
1148 printk("}\n");
The line numbers may be slightly off as I had inserted and removed code above. After compilation and booting, I was able to give it a shot. I created a sample script with a carriage return after the hash bang interpreter portion.
[root@test1 ~]# echo -e '#!/bin/bash\r\n/usr/bin/id' >id.sh
[root@test1 ~]# chmod +x id.sh
And BAM, it worked!
[root@test1 ~]# ./id.sh
{#!/bin/bash(carriage return)}
-bash: ./id.sh: /bin/bash^M: bad interpreter: No such file or directory
As you can see, the kernel was actually trying to execute “/bin/bash\r”, which is of course not a valid file.
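Since the kernel takes the interpreter line literally, it can be worth checking a script for a trailing carriage return before blaming anything else. The sketch below is one way to do that from the shell; `has_cr_shebang` is a helper name I made up, and `tr -d '\r'` stands in for dos2unix in case it is not installed.

```shell
# has_cr_shebang FILE: true when the first line of FILE ends in a carriage
# return, which is the case that produces "bad interpreter".
has_cr_shebang() {
    head -n 1 "$1" | grep -q "$(printf '\r')\$"
}

# Recreate a broken script in a scratch directory and repair a copy of it.
dir=$(mktemp -d)
printf '#!/bin/sh\r\necho hello\r\n' > "$dir/crlf.sh"

if has_cr_shebang "$dir/crlf.sh"; then
    echo "crlf.sh: carriage return after the interpreter"
    # tr -d '\r' drops every carriage return in the file.
    tr -d '\r' < "$dir/crlf.sh" > "$dir/fixed.sh"
fi
has_cr_shebang "$dir/fixed.sh" || echo "fixed.sh: clean"
rm -rf "$dir"
```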

Update: In response to reader requests, I wrote an explanation of my investigation: On the case of carriage returns and kernel exec system calls.
imagemagick: command line photoshop, only so much better
October 30th, 2007
I use VMware extensively. If you’re unfamiliar, VMware makes it possible to run other operating systems within but separate from your own. In addition, VMware allows you to take screen shots. However, those screen shots are too large for posting as they make my sidebar disappear. As such, I need to scale them down to a maximum width of 640 pixels. I used to use Photoshop to automate the resizing of these images. I do not have access to Photoshop after recently upgrading my main machine and finally dumping Windows.
I knew GIMP had scripting built in, which I planned on using to automate the resizing of said images. Not surprisingly, trying to understand the GUI was pure hell. Luckily after 15 minutes I remembered imagemagick.
I installed imagemagick on my ubuntu machine in about 5 seconds:
noland@mojito:~/Desktop$ sudo apt-get install imagemagick
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
html2ps
The following NEW packages will be installed:
imagemagick
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B/740kB of archives.
After unpacking 3232kB of additional disk space will be used.
Selecting previously deselected package imagemagick.
(Reading database ... 99849 files and directories currently installed.)
Unpacking imagemagick (from .../imagemagick_7%3a6.2.4.5.dfsg1-2ubuntu1_i386.deb) ...
Setting up imagemagick (7:6.2.4.5.dfsg1-2ubuntu1) ..
And then used find to resize all of them in about 10 seconds.
find . -type f -name '*.png' -exec convert -resize 640x {} {} \;
I LOVE the command line!

