using kill to see if a process is alive
April 9th, 2008
I am making some changes to the moreutils sponge command. Sponge provides a method of prepending which is less specialized than my prepend util. However, it has trouble with large amounts of input.
Regardless, while testing my changes, I want to watch it operate. Normally, you would just do so from a second terminal, but that is a pain. kill -0 can be very useful for this. After backgrounding the command, I assign the pid (via the variable $!) to $pid using eval. I used eval to keep BASH from expanding $! until after the background operation, although a plain pid=$! placed after the & should be expanded late enough on its own.
After that, I enter a while loop on kill -0 $pid, which will not kill $pid, but will return successfully until $pid has died:
# cat large-file-GB | ./sponge large-file-GB-copy & eval 'pid=$!'; while kill -0 $pid; do sleep 10; ls -lh large-file* /tmp/sponge.*; echo;done
[1] 7937
-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 128M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 384M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 877M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 20M 2008-04-09 17:24 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 413M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 836M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 920M 2008-04-09 17:25 large-file-GB-copy

[1]+ Done cat large-file-GB | ./sponge large-file-GB-copy
ls: cannot access /tmp/sponge.*: No such file or directory
-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 977M 2008-04-09 17:25 large-file-GB-copy

-bash: kill: (7937) - No such process
# md5sum large-file-GB*
b5c667a723a10a3485a33263c4c2b978 large-file-GB
b5c667a723a10a3485a33263c4c2b978 large-file-GB-copy
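The same polling idea can be sketched as a small script. This is only an illustration: the sleep 2 job is a stand-in for the real long-running command, and the checks counter is a name I made up for the example. It captures $! directly and hides the final "No such process" message by sending the error output of kill to /dev/null.

```shell
# Stand-in for a long-running job (hypothetical; replace with the real command)
sleep 2 &
pid=$!            # $! holds the PID of the most recent background job

checks=0
# Signal 0 delivers nothing; kill only reports whether the PID can be signalled
while kill -0 "$pid" 2>/dev/null; do
    checks=$((checks + 1))
    sleep 1       # this is where you would run ls, df, or whatever you want to watch
done

wait "$pid"       # collect the exit status of the finished job
echo "process $pid finished after $checks checks"
```

Keep in mind that kill -0 also fails when you lack permission to signal the process, so this is most reliable for jobs you started yourself.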
Performance testing - with curl
April 8th, 2008
Often I need or want to do some type of performance testing. Given my ideas on software development, I can usually do this by making simple HTTP requests. I use curl for this. While you may be tempted to do this in a for loop (or worse, actually write something!):
$ time for i in {1..1000}; do curl -s "http://bashcurescancer.com/blank.html";done
real 0m23.436s
user 0m6.416s
sys 0m7.351s
Curl provides the same functionality:
$ time curl -s "http://bashcurescancer.com/blank.html?[1-1000]"
real 0m6.561s
user 0m0.294s
sys 0m0.494s
Here are the details from the curl manual:
The URL syntax is protocol dependent. You’ll find a detailed description in RFC 3986.
You can specify multiple URLs or parts of URLs by writing part sets within braces as in:
http://site.{one,two,three}.com
or you can get sequences of alphanumeric series by using [ ] as in:
ftp://ftp.numericals.com/file[1-100].txt
ftp://ftp.numericals.com/file[001-100].txt (with leading zeros)
ftp://ftp.letters.com/file[a-z].txt
No nesting of the sequences is supported at the moment, but you can use several ones next to each other:
http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html
You can specify any amount of URLs on the command line. They will be fetched in a sequential manner in the specified order.
Since curl 7.15.1 you can also specify step counter for the ranges, so that you can get every Nth number or letter:
http://www.numericals.com/file[1-100:10].txt
http://www.letters.com/file[a-z:2].txt
If you specify URL without protocol:// prefix, curl will attempt to guess what protocol you might want. It will then default to HTTP but try other protocols based on often-used host name prefixes. For example, for host names starting with “ftp.” curl will assume you want to speak FTP.
Curl will attempt to re-use connections for multiple file transfers, so that getting many files from the same server will not do multiple connects / handshakes. This improves speed. Of course this is only done on files specified on a single command line and cannot be used between separate curl invokes.
This is important as it helps measure the actual change being tested. A for loop, by creating a new process every loop, will fill up your test with “local” time. Using a single curl process eliminates this - which should allow you to see the results of your test in a more transparent manner.
For example, let's say you have a change that reduces page production time. You're not sure by how much, so you decide to run 1000 tests. Eliminating a second from a 23-second test is less than 5 percent, while removing a second from a 6-second test is roughly 17 percent.
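To make that arithmetic concrete, here is a small sketch that works out what share of a run a one-second saving represents. The pct function is a hypothetical helper invented for this example, and the 23 and 6 second totals are the rounded timings from the two runs above.

```shell
# Hypothetical helper: what percentage of the total run is the time saved?
pct() {
    # $1 = seconds saved, $2 = total seconds for the run
    awk -v saved="$1" -v total="$2" 'BEGIN { printf "%.1f%%\n", 100 * saved / total }'
}

pct 1 23   # the for loop run
pct 1 6    # the single curl process
```

The smaller the fixed overhead of the test harness, the larger the same improvement looks, which is the whole argument for the single curl process.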
Eight ways to speed up your shell scripts
March 21st, 2008
UPDATE: Including the one I added after posting and Elias' quoting example in the comments, we are up to eight.
After reading Shell Scripting Recipes, I became more interested in the speed of shell operations. In his book, Chris says “Command Substitution Is Slow.” He is correct!
$ f() { echo -n; }; time for i in {0..100}; do v=$( f ); done
real 0m4.189s
user 0m0.000s
sys 0m4.188s
$ f() { _F=""; }; time for i in {0..100}; do f; v=$_F; done
real 0m0.006s
user 0m0.000s
sys 0m0.000s
I found a few other equivalent operations which can be used to speed up shell scripts to varying degrees (none like the above) depending on the task at hand. As Chris says, “the extra few milliseconds … may not seem significant, but scripts often loop hundreds or even thousands of times.”
${#array[@]} is faster than () when expanding an array (#7)
$ a=(); time for i in {0..1000}; do a=(${a[@]} $i);done; echo ${#a[@]}
real 0m3.545s
user 0m3.544s
sys 0m0.000s
1001
$ a=(); time for i in {0..1000}; do a[${#a[@]}]=$i;done; echo ${#a[@]}
real 0m0.043s
user 0m0.040s
sys 0m0.003s
1001
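As an aside, newer shells offer a third spelling of the same append. The sketch below assumes bash 3.1 or later, where the += operator works on arrays; I have not timed it against the two versions above, so treat it as a readability option rather than a benchmark result.

```shell
a=()
for i in {0..1000}; do
    a+=("$i")      # append one element in place, without re-expanding the array
done
echo "${#a[@]}"    # number of elements now stored in the array
```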
< is faster than cat
$ time for i in {0..10000}; do var=`cat out`;done
real 0m9.328s
user 0m2.892s
sys 0m6.436s
$ time for i in {0..10000}; do var=`<out`;done
real 0m5.930s
user 0m1.412s
sys 0m4.520s
echo is faster than printf (though not nearly as powerful)
$ time for i in {0..100000}; do printf "\n"; done >/dev/null
real 0m4.446s
user 0m4.076s
sys 0m0.236s
$ time for i in {0..100000}; do echo; done >/dev/null
real 0m3.291s
user 0m3.100s
sys 0m0.184s
Arithmetic Evaluation is faster than let
$ i=0; time while :; do let "i = i + 1"; [[ $i -gt 100000 ]] && break;done
real 0m8.211s
user 0m7.900s
sys 0m0.304s
$ i=0; time while :; do ((i++)); [[ $i -gt 100000 ]] && break;done
real 0m5.287s
user 0m4.980s
sys 0m0.304s
UPDATE: This appears to still be true, but by a different margin. See comments.
List expansion is faster than seq and command substitution (though not always available)
$ time for i in $(seq 0 1000000); do :; done
real 0m28.482s
user 0m28.066s
sys 0m0.412s
$ time for i in {0..1000000}; do :; done
real 0m24.563s
user 0m24.402s
sys 0m0.156s
UPDATE: On BSD systems the apparent seq equivalent (jot) is faster than list expansion. See comments.
: is faster than true
$ i=0; time while true; do ((i++)); [[ $i -gt 1000000 ]] && break;done
real 0m57.360s
user 0m53.967s
sys 0m3.392s
$ i=0; time while :; do ((i++)); [[ $i -gt 1000000 ]] && break;done
real 0m54.138s
user 0m50.571s
sys 0m3.560s
Demonstration
Here is my base iptables INPUT chain:
# iptables -L INPUT -n
Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
As you can see, I am dropping all packets except TCP packets on port 22. I am going to open up port 4550:
# iptables -A INPUT -p tcp --dport 4550 -j ACCEPT
# iptables -L INPUT -n
Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:4550
Here I am using netcat and an infinite loop as a simple “server” to send “i = $i” when someone connects to port 4550:
# i=0;while :; do echo i = $i | nc -l 192.168.6.20 4550; ((i++)); echo $i;done
1
2
3
In another terminal I have connected to port 4550 three times:
# time nc -w 120 -v 192.168.6.20 4550
Connection to 192.168.6.20 4550 port [tcp/*] succeeded!
i = 0
real 0m0.842s
user 0m0.001s
sys 0m0.014s
# time nc -w 120 -v 192.168.6.20 4550
Connection to 192.168.6.20 4550 port [tcp/*] succeeded!
i = 1
real 0m0.822s
user 0m0.000s
sys 0m0.007s
# time nc -w 120 -v 192.168.6.20 4550
Connection to 192.168.6.20 4550 port [tcp/*] succeeded!
i = 2
real 0m0.526s
user 0m0.002s
sys 0m0.009s
Now I am going to delete the ACCEPT rule and add a REJECT rule:
# iptables -D INPUT 2
# iptables -A INPUT -p tcp --dport 4550 -j REJECT
# iptables -L INPUT -n
Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
REJECT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:4550 reject-with icmp-port-unreachable
Here is the output of the “client” netcat command after adding the REJECT rule:
# time nc -w 120 -v 192.168.6.20 4550
nc: connect to 192.168.6.20 port 4550 (tcp) failed: Connection refused
real 0m1.113s
user 0m0.000s
sys 0m0.005s
As you can see the command returned after ~1 second with an error. Now I am going to delete the REJECT rule. The default rule, DROP, will now be in effect:
# iptables -D INPUT 2
# iptables -L INPUT -n
Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
Here is the output of the “client” program in another terminal session:
# time nc -w 120 -v 192.168.6.20 4550
nc: connect to 192.168.6.20 port 4550 (tcp) timed out: Operation now in progress
real 2m0.152s
user 0m0.000s
sys 0m0.001s
The command took two minutes to return with the error. The -w 120 option causes netcat to time out if no reply is received after 120 seconds.
Explanation
iptables is often used to block a specific IP address or subnet that is doing something malicious. A REJECT rule will cause the malicious host to receive an error shortly after the connection attempt. A DROP rule will act differently: the malicious host hears nothing back. If the client has set a timeout, it will wait until said timeout expires, most likely several seconds. This should significantly slow the malicious program. A program without a client timeout will sit for hours waiting for a reply.
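As a rough sketch of what blocking a single source might look like, the helper below only prints the iptables command instead of running it, so you can review it before pasting it into a root shell. The block_cmd function is a hypothetical name for this example, and the addresses come from the 203.0.113.0/24 documentation range rather than from any real host.

```shell
# Hypothetical helper: print (do not run) a rule that blocks one source.
# -I INPUT puts the rule at the top of the chain, ahead of any ACCEPT rules.
block_cmd() {
    # $1 = source address or subnet, $2 = target (defaults to DROP)
    printf 'iptables -I INPUT -s %s -j %s\n' "$1" "${2:-DROP}"
}

block_cmd 203.0.113.0/24        # silently drop a whole subnet
block_cmd 203.0.113.7 REJECT    # answer one host with an immediate error
```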
3 Principles of Web Application (GUI) Design for the Command Line
March 10th, 2008
When organizations need to create an application (most likely doing CRUD), they create both the application logic and user interface. Typically, this is done via a web application whose user interface is HTML. This essentially decides how the user can best utilize application logic.
CRUD applications can be used seamlessly in a GUI, via the command line, or inside other applications by following these three principles:
- Decouple the user interface from the application
- Use a standard and stateless authentication mechanism
- REST
Decouple the user interface from the application
Do not send HTML to the browser, send XML and an associated style sheet. The browser will then render the document. My sitemap is an example. This makes the page both readable in a browser and machine processable. (Note, this is a very basic style sheet.)
This way, anyone can create a client side user interface to your “application.” Your user interface simply becomes the default user interface. Anyone can create their own. Bonus points if you provide an easy method of sharing these alternative user interfaces.
Use a standard and stateless authentication mechanism
Use only HTTP Basic Authentication over SSL. Being stateless and standard, this protocol is simple and leverages a ton of pre-built tools. While Apache/IIS implement Basic Authentication, it is important to understand that Basic Authentication is simply a protocol for communicating credentials. You can use any authentication store. PHP.net has a good overview of HTTP Authentication.
REST
I had never heard of REST until last year. While speaking with an exceedingly intelligent colleague of mine - I explained how if I had designed this particular GUI I would have let users query data by simply modifying the URL. Example:
http://gui/servers?platform=linux&active=true
He said, “REST!”
This is SO simple, just use GET, be stateless, use logical names, and allow selection via all characteristics. UPDATE: This is not REST, but will get the job done. I’d prefer if you implemented REST. (See comments.)
Final Thoughts
Not all data nicely fits on a single line or few lines. However, in the vast majority of cases, records can be displayed in a grep’able format. As such, it's trivial to create a parameter, say f=pt, which will output the data in some line-based format. At the very least, XML can be displayed in a format which is grep’able. Instead of:
<records>
<record id="1">
<key name="abc" val="123" />
</record>
<record id="2">
<key name="def" val="456" />
</record>
</records>
Do this:
<records>
<record id="1"><key name="abc" val="123" /></record>
<record id="2"><key name="def" val="456" /></record>
</records>
Many times, a separate “Web Services API” is created to allow people to extract data in a machine processable format. However, if you follow these three principles, your GUI and API are one and the same. There is no need to create a separate non-human API. Furthermore, in my experience, there is rarely a need for reference documentation. The API is self explanatory.
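A quick sketch of why the one-record-per-line layout matters on the command line. The XML is held in a shell variable here purely for illustration; in practice it would come from the application, for example via curl.

```shell
# One record per line, as recommended above (sample data from this post)
xml='<records>
<record id="1"><key name="abc" val="123" /></record>
<record id="2"><key name="def" val="456" /></record>
</records>'

# Because each record sits on its own line, grep returns the whole record
match=$(printf '%s\n' "$xml" | grep 'name="def"')
echo "$match"
```

With the multi-line layout, the same grep would return only the key line and lose the record id.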
Which comparator, test, bracket, or double bracket, is fastest?
January 24th, 2008
The other day, I began wondering which comparator, test, [, or [[, was fastest? Here are the results:
$ time for i in {1..100000}; do [[ -d . ]];done
real 0m1.256s
user 0m1.018s
sys 0m0.238s
$ time for i in {1..100000}; do [ -d . ];done
real 0m3.407s
user 0m2.704s
sys 0m0.703s
$ time for i in {1..100000}; do test -d .;done
real 0m3.223s
user 0m2.607s
sys 0m0.616s
The double bracket is a “compound command” whereas test and the single bracket are shell built-ins (and in actuality are the same command). Thus, the single bracket and double bracket execute different code.
The test and single bracket are the most portable as they also exist as separate, external commands. However, if you're using any remotely modern version of BASH, the double bracket is supported.
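If you want to check how your own shell classifies the three, a minimal sketch using the bash type builtin is below. It assumes bash; other shells may not support the -t option.

```shell
# Ask bash what kind of word each comparator is
for word in '[[' '[' test; do
    printf '%s is a %s\n' "$word" "$(type -t "$word")"
done
```

In bash, [[ should be reported as a keyword while [ and test are reported as builtins, which matches the timing difference above.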
Here are the performance numbers for the external versions of test and single bracket:
$ time for i in {1..100000}; do /usr/bin/test -d .;done
real 5m49.324s
user 0m51.771s
sys 4m48.013s
$ time for i in {1..100000}; do /usr/bin/[ -d . ];done
real 5m45.728s
user 0m52.536s
sys 4m46.259s
Wow! This shows the high cost of process creation!
Five Quick Command Line Tips
January 18th, 2008
These are my tips for the week. Have you any tips to share?
Create many files of random size
$ for i in $(seq 1 1000); do let "size = ($RANDOM/100)"; dd if=/dev/urandom \
of=file-$i count=$size 2>/dev/null;done
Delete files with find, faster
Many people use find to delete files. However, they often do so like this:
find directory conditionals -exec rm {} \;
Which certainly works. However, find has a built-in “-delete” action which is significantly faster.
$ find . -name 'file-*'| wc -l
1000
$ time find . -name 'file-*' -exec rm {} \;
real 0m2.540s
user 0m1.076s
sys 0m1.196s
$ find . -name 'file-*'| wc -l
0
$ find . -name 'file-*'| wc -l
1000
$ time find . -name 'file-*' -delete
real 0m0.118s
user 0m0.008s
sys 0m0.100s
$ find . -name 'file-*'| wc -l
0
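Since -delete is unforgiving, one habit worth sketching is to run the same expression with -print first and only then swap in -delete. The example below works in a throwaway directory created with mktemp; the file names are made up for the demonstration.

```shell
workdir=$(mktemp -d) || exit 1
for i in 1 2 3 4 5; do
    : > "$workdir/file-$i"       # five empty files that should be removed
done
: > "$workdir/keep-me"           # one file that should survive

find "$workdir" -type f -name 'file-*' -print    # preview what would match
find "$workdir" -type f -name 'file-*' -delete   # same tests, now deleting

remaining=$(find "$workdir" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"
rm -rf "$workdir"
```

Note that -delete goes last: find evaluates its expression left to right, so the tests in front of it decide what gets removed.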
Finding more fun
Often those new to the shell do something like this to find the size of files:
$ ls -la | awk '/^\-/ {print $5}'
98816
99328
99840
99840
99840
Or the last modification date:
$ ls -la | awk '/^\-/ {print $6, $7}'
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
Find offers a much cleaner interface to this information:
$ find . -maxdepth 1 -type f -printf "%s\n"
98816
99328
99840
99840
99840
$ find . -maxdepth 1 -type f -printf "%TY-%Tm-%Td %TH:%TM\n"
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
Standard input can only be read once
$ echo a > a
$ echo b > b
$ echo c > c
That is, this:
$ cat a b c | cat - a b c
Is the same as:
$ cat a b c | cat - - a b c
Meaning that standard input is “read destructive.” As Pádraig Brady said recently on the core utils mailing list:
$ echo mouse | cat - - mouse
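A stripped-down sketch of the same effect, without the extra file argument: cat is asked to read standard input twice, but the second read finds the pipe already drained.

```shell
# The first "-" consumes the pipe; the second "-" sees end-of-file right away
out=$(printf 'mouse\n' | cat - -)
echo "$out"
```

If you really need the input twice, the usual workaround is to save it to a variable or a temporary file first and read that as often as you like.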
Odd characters showing up in your terminal when using Putty
Go to your settings:
Window
Translation
Change “Received data assumed to be in which character set” to UTF-8
Update: I changed the first examples when finding last modification date to use awk instead of grep and awk. Hard to change my ways!
Printing pretty help/usage messages
October 22nd, 2007
I just came across a script that had a beautiful method of printing help messages. I am unsure why this method is not used more. Here is an example:
$ cat helpExample.sh
#!/bin/bash
usage()
{
echo "Usage: ${0##*/}
This is an example help message in BASH.
-x this option does nothing
" >&2
exit $1
}
usage 1
The script is using echo " followed by a multi-line help message, with the closing quote redirected to standard error (" >&2). Output:
$ ./helpExample.sh
Usage: helpExample.sh
This is an example help message in BASH.
-x this option does nothing
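A here-document gives much the same result, and some people find it easier to maintain once the message contains double quotes of its own. The sketch below is an alternative I am suggesting, not the method used by the script I came across.

```shell
usage() {
    # ${0##*/} strips the directory part from the script path
    cat >&2 <<EOF
Usage: ${0##*/}
This is an example help message in BASH.
-x this option does nothing
EOF
    exit "${1:-0}"   # exit with the status passed in, defaulting to 0
}
```

Call it as usage 1 after an error, or usage 0 when the user asks for help.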
Shell Script Krusties
October 18th, 2007
If you're writing a shell script that writes files, it's bad practice not to use trap. Useful scripts get used, copied, and shared. Users love to use Ctrl-C. If I had a nickel for every time I have heard “Ctrl-C out of it” being said by some user who did not understand the implications, I might have a few Euros. As such, many, many scripts out there leave “krusties” all over the place.
Take this shell script for example:
$ cat krusties.sh
#!/bin/bash
tmpfile="/tmp/krusty"
cleanup()
{
rm -f $tmpfile
}
touch $tmpfile
sleep 5 # doing something useful here...
cleanup
It assumes that the cleanup function will always run. However, if I run the script and press Ctrl-C while it's sleeping…
$ ls -l /tmp/k*
ls: /tmp/k*: No such file or directory
$ ./krusties.sh
^C
$ ls -l /tmp/k*
-rw-rw-r-- 1 brock brock 0 Oct 15 10:41 /tmp/krusty
$ rm -f /tmp/krusty
It leaves a krusty. In the best case scenario this is annoying. In the worst case it can cause serious problems. (E.g., lock files.) Now consider this modified script:
$ cat krusties.sh
#!/bin/bash
tmpfile="/tmp/krusty"
cleanup()
{
rm -f $tmpfile
exit
}
trap "cleanup" SIGINT SIGTERM
touch $tmpfile
sleep 5 # doing something useful here...
cleanup
I added the line trap "cleanup" SIGINT SIGTERM, which causes the shell to execute the cleanup function on Ctrl-C and the terminate signal. If not killed, it runs cleanup on exit. This version does not leave the krusty:
$ ls -l /tmp/k*
ls: /tmp/k*: No such file or directory
$ ./krusties.sh
^C
$ ls -l /tmp/k*
ls: /tmp/k*: No such file or directory
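A variation worth sketching is to hang the cleanup on the EXIT trap, so it also runs when the script finishes normally or bails out early with exit. The run_job function and the use of mktemp are my own choices for this example; the function body runs in a subshell so that the trap fires as soon as the job is done.

```shell
# Hypothetical job wrapper; the parentheses make the body run in a subshell
run_job() (
    tmpfile=$(mktemp) || exit 1      # unique name instead of a fixed /tmp path
    trap 'rm -f "$tmpfile"' EXIT     # runs on every exit from the subshell
    trap 'exit 1' INT TERM           # turn Ctrl-C and kill into a normal exit
    echo "$tmpfile"                  # report the name so the caller can check it
    # doing something useful here...
)

created=$(run_job)
[ -e "$created" ] || echo "no krusty left behind"
```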
The 60 second getopts tutorial
October 15th, 2007
Processing command line arguments is a pain in any language. If done manually, parsing even a few options and option value pairs in BASH is a huge pain. As such and given the nature of shell scripts, they usually have exceedingly poor options processing. However, there is a solution. getopts is a BASH builtin which makes handling command line arguments like butter.
Here is how I have done so in this monitor cpu usage script. First I define all my option holding variables:
whatTowatch=""
email=""
startAtUid="-1"
maxCpuUsage="-1"
debug=""
The following while statement loops through all the options and sets them to the corresponding variable. getopts returns true while there are options to be processed. The argument string, here “hw:e:u:m:d”, specifies which options the script accepts. If the user specifies an option which is not in this string, getopts sets $optionName to ?. If an option letter in the string is followed by a colon, the value immediately following the option on the command line is placed in the variable $OPTARG.
while getopts "hw:e:u:m:d" optionName; do
    case "$optionName" in
        h) printHelpAndExit 0;;
        d) debug="0";;
        w) whatTowatch="$OPTARG";;
        e) email="$OPTARG";;
        u) startAtUid="$OPTARG";;
        m) maxCpuUsage="$OPTARG";;
        [?]) printErrorHelpAndExit "$badOptionHelp";;
    esac
done
All that’s left to do is to make sure you have the correct option groups specified and you're off and running.
outputCmd="mail -s 'CPU Abusers on ${HOSTNAME}' $email"
[[ "$whatTowatch" != "users" ]] && [[ "$whatTowatch" != "procs" ]] \
&& printErrorHelpAndExit "$watchHelp"
if [[ -z "$debug" ]]
then
( [[ "$maxCpuUsage" -ge 0 ]] && [[ "$maxCpuUsage" -le 100 ]] ) || \
printErrorHelpAndExit "$maxCpuHelp"
[[ "$startAtUid" -eq -1 ]] && [[ "$whatTowatch" == "users" ]] && \
printErrorHelpAndExit "$uidHelp"
[[ -z "$email" ]] && printErrorHelpAndExit "$emailHelp"
else
outputCmd=cat
fi
Here is what the script outputs when it encounters an unknown option:
# ./monitorCpuUsage.sh -x
./monitorCpuUsage.sh: illegal option -- x
Option not recognised
The “./monitorCpuUsage.sh: illegal option -- x” portion is printed by getopts. If you want getopts to remain silent, just place a colon at the front of the argument string.
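Here is a minimal sketch of that silent mode. The parse function and its messages are invented for the example and are not part of the monitor script. With the leading colon, getopts stops printing its own errors and instead hands you the offending option letter in $OPTARG, using ? for an unknown option and : for a missing value.

```shell
# Hypothetical parser showing silent error reporting (note the leading colon)
parse() {
    OPTIND=1          # reset so the function can be called more than once
    while getopts ":hw:" opt; do
        case "$opt" in
            h)  echo "help requested" ;;
            w)  echo "watching: $OPTARG" ;;
            :)  echo "option -$OPTARG requires a value" ;;
            \?) echo "unknown option -$OPTARG" ;;
        esac
    done
}

parse -w users
parse -x
parse -w
```

This lets the script word its own error messages rather than mixing them with the ones getopts prints.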

