I am making some changes to the moreutils sponge command. Sponge provides a method of prepending which is less specialized than my prepend util. However, it has trouble with large amounts of input.

Regardless, while testing my changes, I want to watch it operate. Normally, you would just do so from a second terminal. That is a pain. kill -0 can be very useful for this. After backgrounding the command, I assign the pid (via the variable $!) to $pid using eval. eval is needed to stop BASH from expanding $! until after the background operation.

After that, I enter a while loop on kill -0 $pid, which will not kill $pid, but will return successfully until $pid has died:

# cat large-file-GB | ./sponge large-file-GB-copy & eval 'pid=$!'; while kill -0 $pid; do sleep 10; ls -lh large-file* /tmp/sponge.*; echo;done
[1] 7937
-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 128M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 384M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 877M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root  20M 2008-04-09 17:24 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 413M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 836M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 920M 2008-04-09 17:25 large-file-GB-copy
[1]+  Done                    cat large-file-GB | ./sponge large-file-GB-copy
ls: cannot access /tmp/sponge.*: No such file or directory

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 977M 2008-04-09 17:25 large-file-GB-copy
-bash: kill: (7937) - No such process
# md5sum large-file-GB*
b5c667a723a10a3485a33263c4c2b978  large-file-GB
b5c667a723a10a3485a33263c4c2b978  large-file-GB-copy
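
The same idea can be wrapped in a small function for reuse. This is only a sketch: watch_pid is a hypothetical helper name, and the short sleep stands in for the long-running sponge pipeline.

```shell
#!/bin/bash
# watch_pid is a hypothetical helper, not part of sponge or moreutils.
# It runs the given command every <interval> seconds while <pid> is alive.
watch_pid() {
  local pid=$1 interval=$2
  shift 2
  # kill -0 sends no signal; it only reports whether the pid can be signalled.
  while kill -0 "$pid" 2>/dev/null; do
    "$@"
    sleep "$interval"
  done
}

sleep 0.5 &                      # stand-in for the real background job
watch_pid "$!" 0.1 echo "still running"
echo "finished"
```

Redirecting kill's standard error hides the final "No such process" message seen in the transcript above.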

Often I need or want to do some type of performance testing. Given my ideas on software development, I can usually do this by making simple HTTP requests, and I use curl for this. You may be tempted to do this in a for loop (or worse, actually write something!):

$ time for i in {1..1000}; do curl -s "http://bashcurescancer.com/blank.html";done
real    0m23.436s
user    0m6.416s
sys     0m7.351s

Curl provides the same functionality:

$ time curl -s "http://bashcurescancer.com/blank.html?[1-1000]"
real    0m6.561s
user    0m0.294s
sys     0m0.494s

Here are the details from the curl manual:

The URL syntax is protocol dependent. You’ll find a detailed description in RFC 3986.

You can specify multiple URLs or parts of URLs by writing part sets within braces as in:

http://site.{one,two,three}.com

or you can get sequences of alphanumeric series by using [ ] as in:

ftp://ftp.numericals.com/file[1-100].txt
ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
ftp://ftp.letters.com/file[a-z].txt

No nesting of the sequences is supported at the moment, but you can use several ones next to each other:

http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

You can specify any amount of URLs on the command line. They will be fetched in a sequential manner in the specified order.

Since curl 7.15.1 you can also specify step counter for the ranges, so that you can get every Nth number or letter:

http://www.numericals.com/file[1-100:10].txt

http://www.letters.com/file[a-z:2].txt

If you specify URL without protocol:// prefix, curl will attempt to guess what protocol you might want. It will then default to HTTP but try other protocols based on often-used host name prefixes. For example, for host names starting with “ftp.” curl will assume you want to speak FTP.

Curl will attempt to re-use connections for multiple file transfers, so that getting many files from the same server will not do multiple connects / handshakes. This improves speed. Of course this is only done on files specified on a single command line and cannot be used between separate curl invokes.

This is important as it helps measure the actual change being tested. A for loop, by creating a new process every loop, will fill up your test with “local” time. Using a single curl process eliminates this – which should allow you to see the results of your test in a more transparent manner.

For example, let’s say you have a change that reduces page production time. You’re not sure by how much, so you decide to run 1000 tests. Eliminating a second from a 23 second test is a change of less than 5 percent, while removing a second from a 6 second test is a change of about 15 percent.
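
To put numbers on that, here is a minimal sketch that computes what share of each measured run a one second saving represents. percent_saved is a hypothetical helper, and the totals are the two "real" times from the curl runs above.

```shell
#!/bin/bash
# percent_saved is a hypothetical helper: saved seconds as a share of the total.
percent_saved() {
  awk -v saved="$1" -v total="$2" 'BEGIN { printf "%.1f%%\n", 100 * saved / total }'
}

percent_saved 1 23.436    # the for loop run: prints 4.3%
percent_saved 1 6.561     # the single curl run: prints 15.2%
```

The same one second improvement is more than three times as visible in the shorter run.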

UPDATE: Including the one I added after posting and Elias’ quoting example in the comments, we are up to eight.

After reading Shell Scripting Recipes, I became more interested in the speed of shell operations. In his book, Chris says “Command Substitution Is Slow.” He is correct!

$ f() { echo -n; }; time for i in {0..100}; do v=$( f ); done

real    0m4.189s
user    0m0.000s
sys     0m4.188s
$ f() { _F=""; }; time for i in {0..100}; do f; v=$_F; done

real    0m0.006s
user    0m0.000s
sys     0m0.000s

I found a few other equivalent operations which can be used to speed up shell scripts to varying degrees (none like the above) depending on the task at hand. As Chris says, “the extra few milliseconds … may not seem significant, but scripts often loop hundreds or even thousands of times.”

${#array[@]} is faster than () when expanding an array (#7)

$ a=(); time for i in {0..1000}; do a=(${a[@]} $i);done; echo ${#a[@]}

real    0m3.545s
user    0m3.544s
sys     0m0.000s
1001
$ a=(); time for i in {0..1000}; do a[${#a[@]}]=$i;done; echo ${#a[@]}

real    0m0.043s
user    0m0.040s
sys     0m0.003s
1001

< is faster than cat

$ time for i in {0..10000}; do var=`cat out`;done

real    0m9.328s
user    0m2.892s
sys     0m6.436s
$ time for i in {0..10000}; do var=`<out`;done
real    0m5.930s
user    0m1.412s
sys     0m4.520s

echo is faster than printf (though not nearly as powerful)

$ time for i in {0..100000}; do printf "\n"; done >/dev/null

real    0m4.446s
user    0m4.076s
sys     0m0.236s

$ time for i in {0..100000}; do echo; done >/dev/null

real    0m3.291s
user    0m3.100s
sys     0m0.184s

Arithmetic Evaluation is faster than let

$ i=0; time while :; do let "i = i + 1"; [[ $i -gt 100000 ]] && break;done
real    0m8.211s
user    0m7.900s
sys     0m0.304s
$ i=0; time while :; do ((i++)); [[ $i -gt 100000 ]] && break;done

real    0m5.287s
user    0m4.980s
sys     0m0.304s

UPDATE: This appears to still be true, but by a different margin. See comments.

List expansion is faster than seq and command substitution (though not always available)

$ time for i in $(seq 0 1000000); do :; done

real    0m28.482s
user    0m28.066s
sys     0m0.412s

$ time for i in {0..1000000}; do :; done

real    0m24.563s
user    0m24.402s
sys     0m0.156s

UPDATE: On BSD systems the apparent seq equivalent (jot) is faster than list expansion. See comments.

: is faster than true

$ i=0; time while true; do ((i++)); [[ $i -gt 1000000 ]] && break;done

real    0m57.360s
user    0m53.967s
sys     0m3.392s
$ i=0; time while :; do ((i++)); [[ $i -gt 1000000 ]] && break;done

real    0m54.138s
user    0m50.571s
sys     0m3.560s

Demonstration

Here is my base iptables INPUT chain:

# iptables -L INPUT -n
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22

As you can see, I am dropping all packets except TCP packets on port 22. I am going to open up port 4550:

# iptables -A INPUT -p tcp --dport 4550 -j ACCEPT
# iptables -L INPUT -n
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:4550

Here I am using netcat and an infinite loop as a simple “server” to send “i = $i” when someone connects to port 4550:

# i=0;while :; do echo i = $i | nc -l 192.168.6.20 4550; ((i++)); echo $i;done
1
2
3

In another terminal I have connected to port 4550 three times:

# time nc -w 120 -v 192.168.6.20 4550
Connection to 192.168.6.20 4550 port [tcp/*] succeeded!
i = 0

real    0m0.842s
user    0m0.001s
sys     0m0.014s
# time nc -w 120 -v 192.168.6.20 4550
Connection to 192.168.6.20 4550 port [tcp/*] succeeded!
i = 1

real    0m0.822s
user    0m0.000s
sys     0m0.007s
# time nc -w 120 -v 192.168.6.20 4550
Connection to 192.168.6.20 4550 port [tcp/*] succeeded!
i = 2

real    0m0.526s
user    0m0.002s
sys     0m0.009s

Now I am going to delete the ACCEPT rule and add a REJECT rule:

# iptables -D INPUT 2
# iptables -A INPUT -p tcp --dport 4550 -j REJECT
# iptables -L INPUT -n
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22
REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:4550 reject-with icmp-port-unreachable

Here is the output of the “client” netcat command after adding the REJECT rule:

# time nc -w 120 -v 192.168.6.20 4550
nc: connect to 192.168.6.20 port 4550 (tcp) failed: Connection refused

real    0m1.113s
user    0m0.000s
sys     0m0.005s

As you can see the command returned after ~1 second with an error. Now I am going to delete the REJECT rule. The default rule, DROP, will now be in effect:

# iptables -D INPUT 2
# iptables -L INPUT -n
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22

Here is the output of the “client” program in another terminal session:

# time nc -w 120 -v 192.168.6.20 4550
nc: connect to 192.168.6.20 port 4550 (tcp) timed out: Operation now in progress

real    2m0.152s
user    0m0.000s
sys     0m0.001s

The command took two minutes to return with the error. The -w 120 option causes netcat to time out if no reply is received after 120 seconds.

Explanation

iptables is often used to block a specific IP address or subnet that is doing something malicious. A REJECT rule causes the malicious host to receive an error shortly after the connection attempt. A DROP rule acts differently. If the malicious host has set a client timeout, it will wait until that timeout expires, most likely several seconds. This should significantly slow the malicious program. A program without a client timeout will sit for hours waiting for a reply.
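
As a rough sketch of what blocking a specific source looks like, the helper below builds the rule for either target. block_rule is a hypothetical name and the addresses are documentation placeholders. It only prints the command, so nothing changes until you run the output yourself as root.

```shell
#!/bin/bash
# block_rule is a hypothetical helper. It prints an iptables command for review
# instead of executing it.
block_rule() {
  local src=$1 target=${2:-DROP}
  case "$target" in
    # -I inserts at the top of the chain, so the rule is checked before any ACCEPT rule.
    DROP|REJECT) echo "iptables -I INPUT -s $src -j $target";;
    *) echo "target must be DROP or REJECT" >&2; return 1;;
  esac
}

block_rule 203.0.113.0/24        # discard silently; the client waits for its own timeout
block_rule 203.0.113.7 REJECT    # answer with an error; the client fails fast
```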

When organizations need to create an application (most likely doing CRUD), they create both the application logic and user interface. Typically, this is done via a web application whose user interface is HTML. This essentially decides how the user can best utilize application logic.

CRUD applications can be used seamlessly in a GUI, via the command line, or inside other applications by following these three principles:

  1. Decouple the user interface from the application
  2. Use a standard and stateless authentication mechanism
  3. REST

Decouple the user interface from the application

Do not send HTML to the browser; send XML and an associated style sheet. The browser will then render the document. My sitemap is an example. This makes the page both readable in a browser and machine processable. (Note, this is a very basic style sheet.)

This way, anyone can create a client side user interface to your “application.” Your user interface simply becomes the default user interface. Anyone can create their own. Bonus points if you provide an easy method of sharing these alternative user interfaces.

Use a standard and stateless authentication mechanism

Use only HTTP Basic Authentication over SSL. Being stateless and standard, this protocol is simple and leverages a ton of pre-built tools. While Apache/IIS implement Basic Authentication, it is important to understand that Basic Authentication is simply a protocol for communicating credentials. You can use any authentication store. PHP.net has a good overview of HTTP Authentication.

REST

I had never heard of REST until last year. While speaking with an exceedingly intelligent colleague of mine – I explained how if I had designed this particular GUI I would have let users query data by simply modifying the URL. Example:

http://gui/servers?platform=linux&active=true

He said, “REST!”

This is SO simple, just use GET, be stateless, use logical names, and allow selection via all characteristics. UPDATE: This is not REST, but will get the job done. I’d prefer if you implemented REST. (See comments.)

Final Thoughts

Not all data fits nicely on a single line or a few lines. However, in the vast majority of cases, records can be displayed in a grep’able format. As such, it’s trivial to create a parameter, say f=pt, which will output the data in some line based format. At the very least, XML can be displayed in a format which is grep’able. Instead of:

<records>
<record id="1">
<key name="abc" val="123" />
</record>
<record id="2">
<key name="def" val="456" />
</record>
</records>

Do this:

<records>
<record id="1"><key name="abc" val="123" /></record>
<record id="2"><key name="def" val="456" /></record>
</records>
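
With one record per line, plain grep is enough to pull out a record. A minimal sketch follows; find_record is a hypothetical helper and the data is the sample above.

```shell
#!/bin/bash
# Sample data copied from the one-record-per-line layout above.
records='<records>
<record id="1"><key name="abc" val="123" /></record>
<record id="2"><key name="def" val="456" /></record>
</records>'

# find_record is a hypothetical helper: print the lines whose key has the given name.
find_record() {
  printf '%s\n' "$records" | grep -F "name=\"$1\""
}

find_record def
```

Because the whole record sits on the matching line, the output is the complete second record. With the multi-line layout, the same grep would return only the key element and lose the record id.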

Many times, a separate “Web Services API” is created to allow people to extract data in a machine processable format. However, if you follow these three principles, your GUI and API are one and the same. There is no need to create a separate non-human API. Furthermore, in my experience, there is rarely a need for reference documentation. The API is self explanatory.

The other day, I began wondering which comparator (test, [, or [[) is fastest. Here are the results:

$ time for i in {1..100000}; do [[ -d . ]];done
real    0m1.256s
user    0m1.018s
sys     0m0.238s
$ time for i in {1..100000}; do [ -d . ];done
real    0m3.407s
user    0m2.704s
sys     0m0.703s
$ time for i in {1..100000}; do test -d .;done

real    0m3.223s
user    0m2.607s
sys     0m0.616s

The double bracket is a "compound command" whereas test and the single bracket are shell built-ins (and in actuality are the same command). Thus, the single bracket and double bracket execute different code.
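
You can ask the shell how it classifies each of the three. This short sketch assumes bash, since type -t is a bash feature.

```shell
#!/bin/bash
# type -t prints a one-word classification for each name.
for name in '[[' '[' test; do
  printf '%s is a %s\n' "$name" "$(type -t "$name")"
done
```

In bash this reports [[ as a keyword and both [ and test as builtins.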

The test and single bracket are the most portable as they also exist as separate, external commands. However, if you’re using any remotely modern version of BASH, the double bracket is supported.

Here are the performance numbers for the external versions of test and the single bracket:

$ time for i in {1..100000}; do /usr/bin/test -d .;done

real    5m49.324s
user    0m51.771s
sys     4m48.013s
$ time for i in {1..100000}; do /usr/bin/[ -d . ];done

real    5m45.728s
user    0m52.536s
sys     4m46.259s

Wow! This shows the high cost of process creation!

Five Quick Command Line Tips

January 18th, 2008

These are my tips for the week. Have you any tips to share?

Create many files of random size

$ for i in $(seq 1 1000); do let "size = ($RANDOM/100)"; dd if=/dev/urandom \
of=file-$i count=$size 2>/dev/null;done

Delete files with find, faster

Many people use find to delete files. However, they often do so like this:

find directory conditionals -exec rm {} \;

Which certainly works. However, find has a built-in “-delete” action which is significantly faster.

$ find . -name 'file-*'| wc -l
1000
$ time find . -name 'file-*' -exec rm {} \;

real    0m2.540s
user    0m1.076s
sys     0m1.196s
$ find . -name 'file-*'| wc -l
0

$ find . -name 'file-*'| wc -l
1000
$ time find . -name 'file-*' -delete

real    0m0.118s
user    0m0.008s
sys     0m0.100s
$ find . -name 'file-*'| wc -l
0
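
If you want to try -delete without risking real files, here is a small sketch that works in a throwaway directory. delete_matching is a hypothetical helper name.

```shell
#!/bin/bash
# delete_matching is a hypothetical helper: remove regular files under <dir>
# whose names match <pattern>.
delete_matching() {
  # -delete is an action, so it must come after the tests that select the files.
  find "$1" -type f -name "$2" -delete
}

dir=$(mktemp -d) || exit 1
for i in 1 2 3 4 5; do : > "$dir/file-$i"; done
: > "$dir/keep-me"

delete_matching "$dir" 'file-*'
ls "$dir"       # only keep-me is left
rm -rf "$dir"
```

Putting -delete before -name would delete everything find visits, so the order matters.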

Finding more fun

Often those new to the shell do something like this to find the size of files:

$ ls -la | awk '/^\-/ {print $5}'
98816
99328
99840
99840
99840

Or the last modification date:

$ ls -la | awk '/^\-/ {print $6, $7}'
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03

Find offers a much cleaner interface to this information:

$ find . -maxdepth 1 -type f -printf "%s\n"
98816
99328
99840
99840
99840
$ find . -maxdepth 1 -type f -printf "%TY-%Tm-%Td %TH:%TM\n"
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03

Standard input can only be read once

$ echo a > a
$ echo b > b
$ echo c > c

That is, this:

$ cat a b c | cat -
a
b
c

Is the same as:

$ cat a b c | cat - -
a
b
c

Meaning that standard input is “read destructive.” As Pádraig Brady said recently on the coreutils mailing list:

$ echo mouse | cat - -
mouse
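
The same effect shows up inside a script when two commands share one standard input. A minimal sketch, with read_twice as a hypothetical helper:

```shell
#!/bin/bash
# read_twice is a hypothetical helper: two readers share one standard input.
read_twice() {
  first=$(cat)      # consumes everything up to end of file
  second=$(cat)     # finds the stream already drained
  printf 'first=[%s] second=[%s]\n' "$first" "$second"
}

echo mouse | read_twice    # prints: first=[mouse] second=[]
```

If a script needs the input twice, it has to save it first, in a variable or a temporary file.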

Odd characters showing up in your terminal when using PuTTY

Go to your settings:

Window
Translation
Change “Received data assumed to be in which character set” to UTF-8

Update: I changed the first examples when finding last modification date to use awk instead of grep and awk.  Hard to change my ways!

I just came across a script that had a beautiful method of printing help messages. I am unsure why this method is not used more. Here is an example:


$ cat helpExample.sh
#!/bin/bash
usage()
{
echo "Usage: ${0##*/}
This is an example help message in BASH.
-x this option does nothing
" >&2
exit $1
}
usage 1

The script uses echo " followed by a multi-line help message, then closes the quote and redirects the whole message to standard error (" >&2). Output:


$ ./helpExample.sh
Usage: helpExample.sh
This is an example help message in BASH.
-x this option does nothing
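
To confirm that the message really goes to standard error, here is a small sketch. It reuses the usage function from above and calls it in a subshell so that its exit does not end the calling shell. The variable names are only for illustration.

```shell
#!/bin/bash
usage()
{
  echo "Usage: ${0##*/}
This is an example help message in BASH.
-x this option does nothing
" >&2
  exit "$1"
}

# Discard standard error and capture standard output; the help text should vanish.
status=0
stdout_text=$( (usage 1) 2>/dev/null ) || status=$?
echo "stdout was '${stdout_text}', exit status was ${status}"
```

Nothing reaches standard output, so a caller that pipes or captures the script's output never sees the help text mixed into its data.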

Shell Script Krusties

October 18th, 2007

If you’re writing a shell script that writes files, it’s bad practice not to use trap. Useful scripts get used, copied, and shared. Users love to use Ctrl-C. If I had a nickel for every time I have heard “Ctrl-C out of it” being said by some user who did not understand the implications, I might have a few Euros. As such, many, many scripts out there leave “krusties” all over the place.

Take this shell script for example:

$ cat krusties.sh
#!/bin/bash
tmpfile="/tmp/krusty"
cleanup()
{
rm -f $tmpfile
}
touch $tmpfile
sleep 5 # doing something useful here...
cleanup

It assumes that the cleanup function will always run. However, if I run the script and press Ctrl-C while it’s sleeping…

$ ls -l /tmp/k*
ls: /tmp/k*: No such file or directory
$ ./krusties.sh
^C
$ ls -l /tmp/k*
-rw-rw-r-- 1 brock brock 0 Oct 15 10:41 /tmp/krusty
$ rm -f /tmp/krusty

It leaves a krusty. In the best case scenario this is annoying. In the worst case it can cause serious problems. (E.g., lock files.) Now consider this modified script:

$ cat krusties.sh
#!/bin/bash
tmpfile="/tmp/krusty"
cleanup()
{
rm -f $tmpfile
exit
}
trap "cleanup" SIGINT SIGTERM
touch $tmpfile
sleep 5 # doing something useful here...
cleanup

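I added the line trap "cleanup" SIGINT SIGTERM, which causes the shell to execute the cleanup function on Ctrl-C (SIGINT) and on the terminate signal (SIGTERM). Since cleanup now ends with exit, the script stops after cleaning up instead of continuing. If the script is not killed, the final cleanup call runs as before. This version does not leave the krusty: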

$ ls -l /tmp/k*
ls: /tmp/k*: No such file or directory
$ ./krusties.sh
^C
$ ls -l /tmp/k*
ls: /tmp/k*: No such file or directory
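
A variation worth knowing, offered as a sketch rather than as the script above: the shell also accepts EXIT as a trap condition, so the cleanup can be written once and runs however the script ends. run_job is a hypothetical stand-in for a script body, and it uses mktemp instead of a fixed file name.

```shell
#!/bin/bash
# run_job is a hypothetical stand-in for a script body; the parentheses make
# the function run in a subshell, so its traps and exit stay contained.
run_job() (
  tmpfile=$(mktemp) || exit 1
  # EXIT fires however the subshell ends, so the cleanup is written only once.
  trap 'rm -f "$tmpfile"' EXIT
  # Turn signals into a normal exit so the EXIT trap still runs.
  trap 'exit 130' INT
  trap 'exit 143' TERM
  echo "$tmpfile"       # report the name so the caller can check it afterwards
  # doing something useful here...
)

leftover=$(run_job)
if [ -e "$leftover" ]; then echo "krusty left behind"; else echo "cleaned up"; fi
```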

The 60 second getopts tutorial

October 15th, 2007

Processing command line arguments is a pain in any language. If done manually, parsing even a few options and option value pairs in BASH is a huge pain. As such and given the nature of shell scripts, they usually have exceedingly poor options processing. However, there is a solution. getopts is a BASH builtin which makes handling command line arguments like butter.

Here is how I have done so in this monitor cpu usage script. First I define all my option holding variables:

whatTowatch=""
email=""
startAtUid="-1"
maxCpuUsage="-1"
debug=""

The following while statement loops through all the options and sets the corresponding variables. getopts returns true while there are options to be processed. The argument string, here “hw:e:u:m:d”, specifies which options the script accepts. If the user specifies an option which is not in this string, getopts sets $optionName to ?. If an option letter is followed by a colon in the argument string, the option takes a value, and getopts places that value in the variable $OPTARG.

while getopts "hw:e:u:m:d" optionName; do
case "$optionName" in
h) printHelpAndExit 0;;
d) debug="0";;
w) whatTowatch="$OPTARG";;
e) email="$OPTARG";;
u) startAtUid="$OPTARG";;
m) maxCpuUsage="$OPTARG";;
[?]) printErrorHelpAndExit "$badOptionHelp";;
esac
done

All that’s left to do is to make sure you have the correct option groups specified, and you’re off and running.

outputCmd="mail -s 'CPU Abusers on ${HOSTNAME}' $email"
[[ "$whatTowatch" != "users" ]] && [[ "$whatTowatch" != "procs" ]] \
&& printErrorHelpAndExit "$watchHelp"
if [[ -z "$debug" ]]
then
( [[ "$maxCpuUsage" -ge 0 ]] && [[ "$maxCpuUsage" -le 100 ]] ) || \
printErrorHelpAndExit "$maxCpuHelp"
[[ "$startAtUid" -eq -1 ]] && [[ "$whatTowatch" == "users" ]] && \
printErrorHelpAndExit "$uidHelp"
[[ -z "$email" ]] && printErrorHelpAndExit "$emailHelp"
else
outputCmd=cat
fi

Here is what the script outputs when it encounters an unknown option:

# ./monitorCpuUsage.sh -x
./monitorCpuUsage.sh: illegal option -- x   

Option not recognised

The “./monitorCpuUsage.sh: illegal option -- x” portion is printed by getopts. If you want it to remain silent, just place a colon at the front of the argument string.
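
Here is a minimal sketch of that silent mode. parse_args is a hypothetical function using a shortened option string, not the monitor script itself. With the leading colon, getopts reports a missing value by setting the option variable to : and puts the offending option letter in $OPTARG for both kinds of error.

```shell
#!/bin/bash
# parse_args is a hypothetical parser for illustration only.
parse_args() {
  local OPTIND=1 optionName
  local whatToWatch="" debug=""
  # The leading colon silences getopts' own error messages.
  while getopts ":hw:d" optionName; do
    case "$optionName" in
      h) echo "help requested"; return 0;;
      d) debug="0";;
      w) whatToWatch="$OPTARG";;
      :) echo "option -$OPTARG requires a value" >&2; return 1;;
      [?]) echo "option -$OPTARG not recognised" >&2; return 1;;
    esac
  done
  echo "watch=$whatToWatch debug=$debug"
}

parse_args -w users -d     # prints: watch=users debug=0
```

Calling parse_args -x prints only the script's own "not recognised" message, and parse_args -w with no value triggers the "requires a value" branch.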