The other day on the cURL email list, someone asked:

Could someone please tell me (preferably with an example) how I could parse an XML file like the following:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<FileRetriever>
<FileList>
<File name="AMERI08.D4860.ZIP" />
<File name="DTCCRSF.D4861.ZIP" />
<File name="DTGSS01.D4862.ZIP" />
<File name="DTGSS02.D4863.ZIP" />
<File name="DTGSS03.D4864.ZIP" />
</FileList>
</FileRetriever>

This is not really a question for the cURL list, but I thought it was a fair one. You could do this:

$ grep '<File ' config.xml  | awk -F'"' '{print $2}' | xargs -l -I {} echo curl -I "http://bashcurescancer.com/{}"
curl -I http://bashcurescancer.com/AMERI08.D4860.ZIP
curl -I http://bashcurescancer.com/DTCCRSF.D4861.ZIP
curl -I http://bashcurescancer.com/DTGSS01.D4862.ZIP
curl -I http://bashcurescancer.com/DTGSS02.D4863.ZIP
curl -I http://bashcurescancer.com/DTGSS03.D4864.ZIP
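The grep/awk pipeline relies on the name attribute being the first double-quoted value on each <File> line. A slightly more defensive sketch uses only BASH's built-in regex matching to pull out the name attribute wherever it sits in the element, and, like the original, only echoes the curl commands as a dry run. The function name file_urls and its base-URL argument are hypothetical, and it still assumes one <File> element per line; it is not a real XML parser.

```shell
#!/bin/bash
# Hypothetical helper: file_urls XMLFILE BASEURL
# Prints a "curl -I" command for every <File ... name="..."> element.
# Assumes one <File> element per line; this is pattern matching, not XML parsing.
file_urls() {
  local xml=$1 base=$2 line
  local re='<File[^>]*[[:space:]]name="([^"]*)"'
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      echo curl -I "$base/${BASH_REMATCH[1]}"
    fi
  done < "$xml"
}
```

For example, file_urls config.xml http://bashcurescancer.com prints the same five lines as above; drop the echo to actually run curl.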

Or, you could use the xsltproc command with an associated style sheet. This is really the correct method, and much more effective when you’re processing complex XML or XML that is not easily grep’able:

$ xsltproc --nonet config.xsl config.xml | xargs -l -I {} echo curl -I "http://bashcurescancer.com/{}"
curl -I http://bashcurescancer.com/AMERI08.D4860.ZIP
curl -I http://bashcurescancer.com/DTCCRSF.D4861.ZIP
curl -I http://bashcurescancer.com/DTGSS01.D4862.ZIP
curl -I http://bashcurescancer.com/DTGSS02.D4863.ZIP
curl -I http://bashcurescancer.com/DTGSS03.D4864.ZIP

Links to config.xml and config.xsl.

I am making some changes to the moreutils sponge command. sponge provides a more general way of prepending than my prepend util. However, it has trouble with large amounts of input.

Regardless, while testing my changes, I want to watch it operate. Normally, you would just do so from a second terminal, but that is a pain. kill -0 can be very useful here. After backgrounding the command, I assign its PID (from the special variable $!) to pid. I wrapped the assignment in eval, but strictly speaking that is not needed: BASH expands $! only when the assignment itself runs, which is after the job has already been started in the background.

After that, I enter a while loop on kill -0 $pid. kill -0 sends no signal, so it will not kill $pid, but it returns successfully until $pid has died:

# cat large-file-GB | ./sponge large-file-GB-copy & eval 'pid=$!'; while kill -0 $pid; do sleep 10; ls -lh large-file* /tmp/sponge.*; echo;done
[1] 7937
-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 128M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 384M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 877M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root  20M 2008-04-09 17:24 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 413M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 836M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 920M 2008-04-09 17:25 large-file-GB-copy
[1]+  Done                    cat large-file-GB | ./sponge large-file-GB-copy
ls: cannot access /tmp/sponge.*: No such file or directory

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 977M 2008-04-09 17:25 large-file-GB-copy
-bash: kill: (7937) - No such process
# md5sum large-file-GB*
b5c667a723a10a3485a33263c4c2b978  large-file-GB
b5c667a723a10a3485a33263c4c2b978  large-file-GB-copy
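The kill -0 loop can be packaged as a small function. This is a sketch, and the name watch_pid is hypothetical: it polls a PID at a fixed interval, runs a status command between polls, and silences the "No such process" error that kill prints once the process is gone. Keep in mind that kill -0 only reports whether a signal could be delivered, so it also fails for a live process you lack permission to signal.

```shell
#!/bin/bash
# Hypothetical helper: watch_pid PID INTERVAL COMMAND [ARGS...]
# While PID is alive, sleep INTERVAL seconds and then run COMMAND.
watch_pid() {
  local pid=$1 interval=$2
  shift 2
  # kill -0 sends no signal; it only checks that PID exists and can be signalled
  while kill -0 "$pid" 2>/dev/null; do
    sleep "$interval"
    "$@"
  done
}
```

Wrapping the status command in a function means the globs are re-expanded on every poll, so new files such as /tmp/sponge.* show up as they appear:

$ show() { ls -lh large-file* /tmp/sponge.*; echo; }
$ cat large-file-GB | ./sponge large-file-GB-copy & watch_pid "$!" 10 show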

New command: prepend

April 6th, 2008

I am using Google’s project hosting for software which I create and feel is useful or want to keep track of. I called the project Brock’s Tools. The code that led me to create this project was a command I am calling prepend 1.1. (UPDATE: See this post on sponge, as it’s a better general-case tool.)

prepend prepends files or standard input to a file. For example, say you have three files:

$ echo BROCK > a
$ echo DAVID > b
$ echo NOLAND > c

And you want to combine them into one file:

$ echo "My name is:" | prepend - a b c
$ cat c
My name is:
BROCK
DAVID
NOLAND

Or let’s say you just want to append a file to itself:

$ cat a
BROCK
$ cat a >> a
cat: a: input file is output file

prepend does this:

$ prepend a
$ cat a
BROCK
BROCK

I come across situations where this would be useful quite often. Of course, prepending can also be done in the shell:

$ { echo "My name is:"; cat a b c; } > tmp && mv -f tmp c
$ cat c
My name is:
BROCK
DAVID
NOLAND

However, that is unsafe (for instance, a fixed name like tmp can silently clobber an existing file), and I have lost data that way. I perform this operation most often when dealing with XML. In this example, it’s trivial to open the file in an editor, but with a large file, it’s quite nasty to do so:

$ cat something.xml
<entry><blah/><more>stuff 1</more></entry>
<entry><blah/><more>stuff 2</more></entry>
<entry><blah/><more>stuff 3 </more></entry>
<entry><blah/><more>stuff 4</more></entry>
$ echo "</entries>" >> something.xml
$ cat something.xml
<entry><blah/><more>stuff 1</more></entry>
<entry><blah/><more>stuff 2</more></entry>
<entry><blah/><more>stuff 3 </more></entry>
<entry><blah/><more>stuff 4</more></entry>
</entries>
$ echo "<entries>" | prepend - something.xml
$ cat something.xml
<entries>
<entry><blah/><more>stuff 1</more></entry>
<entry><blah/><more>stuff 2</more></entry>
<entry><blah/><more>stuff 3 </more></entry>
<entry><blah/><more>stuff 4</more></entry>
</entries>
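For comparison, the shell idiom can be made somewhat safer. This is a sketch of my own, not prepend's implementation, and the function name safe_prepend is hypothetical. It writes to a unique temporary file created with mktemp in the target's directory (so the final mv is a rename on the same filesystem), replaces the target only if every step succeeded, and removes the temporary file on failure. Note that the argument order differs from prepend's: the target comes first.

```shell
#!/bin/bash
# Hypothetical helper: safe_prepend TARGET [SOURCE...]
# Prepends the SOURCE files ("-" means stdin) to TARGET via a unique temp file.
safe_prepend() {
  local target=$1 tmp
  shift
  # Create the temp file next to TARGET so mv is a rename on one filesystem
  tmp=$(mktemp "$(dirname -- "$target")/.prepend.XXXXXX") || return 1
  if cat -- "$@" "$target" > "$tmp" && mv -f -- "$tmp" "$target"; then
    return 0
  fi
  rm -f -- "$tmp"   # something failed: TARGET is left untouched
  return 1
}
```

With this sketch, echo "<entries>" | safe_prepend something.xml - does the same job as the prepend call above. One caveat: mktemp creates the file with mode 0600, so the rewritten target ends up with those permissions unless you restore them.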

The web services paradigm of development is based on the Unix philosophy of “small is good”.  Web services should do one job, and do it well, allowing users to develop complex solutions by combining small, reliable and proven services.
Why not, then, expose the power of familiar Unix commands like sort, grep, gzip… to the web?

Here is a proof-of-concept Python script (Python 2.3 version) to demonstrate.

Start services:

$ ./to_web.py -p8008 sort &
Thu Mar 27 13:45:54 2008 sort server started - 8008
$ ./to_web.py -p8009 gzip &
Thu Mar 27 13:46:29 2008 gzip server started - 8009

Use the services:

$ for i in {1..10}; do echo ${RANDOM:0:2}; done | \
> curl --data-binary @- "http://swat:8008/sort+-nr" | \
> curl --data-binary @- "http://swat:8009/gzip" | \
> gunzip
97
37
23
23
21
18
11
11
10
10

At my job, we have a database with host information, which has a command-line interface. This tool has dependencies which are painful to resolve. With to_web.py, we can turn the command-line tool into a web service and access the data without having to satisfy those additional dependencies.
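to_web.py's source is not reproduced in this post. To make the mechanism concrete, here is a hypothetical BASH sketch of the core step such a service has to perform, assuming (as the URLs above suggest) that + in the request path stands for a space: the path becomes a command line, and the request body is fed to that command on standard input. This is my illustration, not to_web.py's code; the function name and the allowlist are assumptions.

```shell
#!/bin/bash
# Hypothetical request handler: handle_request PATH < BODY
# "sort+-nr" becomes the command: sort -nr, with the request body on stdin.
handle_request() {
  local path=$1
  local -a argv
  # Split the path on "+" into words (assumption: "+" encodes a space)
  IFS='+' read -ra argv <<< "$path"
  # Only run explicitly allowed programs; never exec arbitrary input
  case ${argv[0]} in
    sort|gzip) ;;
    *) echo "command not allowed: ${argv[0]}" >&2; return 1 ;;
  esac
  "${argv[@]}"
}
```

An allowlist limits which programs can run, but it is not a security boundary on its own: arguments such as file names (sort /etc/passwd, for example) still come from the client.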

This is a guest post by my esteemed colleague Adam Fokken. He can be reached here: Sadly, he does not have a blog.

There are situations where, if you want a Python, Perl, PHP, etc. script to be portable among a few different servers, it makes sense to wrap the script in shell. A few years ago I was trying to use the Python cx_Oracle module. This module is a wrapper for the native Oracle database driver. However, it requires the driver library directory to be in the LD_LIBRARY_PATH environment variable.

No problem, I thought. I’ll use the os.environ dict to set the variable. Example script:

$ cat python-only.sh
#!/usr/bin/python
import sys, os
sys.path.append("/usr/local/lib/python2.4/site-packages/")
if not os.environ.has_key('LD_LIBRARY_PATH'):
        os.environ['LD_LIBRARY_PATH'] = "/home/noland/oracle-lib"
else:
        os.environ['LD_LIBRARY_PATH'] = "/home/noland/oracle-lib:" + os.environ['LD_LIBRARY_PATH']
print "LD_LIBRARY_PATH looks OK in Python: LD_LIBRARY_PATH = ", os.environ['LD_LIBRARY_PATH']
os.system('echo LD_LIBRARY_PATH looks OK via os.system: LD_LIBRARY_PATH = $LD_LIBRARY_PATH')
try:
        import cx_Oracle
        print "Imported cx_Oracle! LD_LIBRARY_PATH was set correctly."
except ImportError, e:
        print "Woops, LD_LIBRARY_PATH was not set correctly: ", e

This method does not work:

$ ./python-only.sh
LD_LIBRARY_PATH looks OK in Python: LD_LIBRARY_PATH =  /home/noland/oracle-lib
LD_LIBRARY_PATH looks OK via os.system: LD_LIBRARY_PATH = /home/noland/oracle-lib
Woops, LD_LIBRARY_PATH was not set correctly:  libclntsh.so.10.1: cannot open shared object file: No such file or directory

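This seems to be a common problem. The dynamic linker reads LD_LIBRARY_PATH when a process starts, so changing os.environ from inside Python affects child processes (which is why the os.system line looks right) but not the interpreter that is already running. When I was dealing with this a few years ago, I could not find a good resource on Google. I bit the bullet and wrote a separate shell script wrapper, though I hated having to invoke it. However, there was absolutely no reason I needed a separate shell script: I could have embedded the Python within a shell script. Example: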

$ cat python-and-bash.sh
#!/bin/bash
export LD_LIBRARY_PATH=/home/noland/oracle-lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
/usr/bin/python <<'END_OF_PYTHON'
import sys
sys.path.append("/usr/local/lib/python2.4/site-packages/")
try:
        import cx_Oracle
        print "Imported cx_Oracle! LD_LIBRARY_PATH was set correctly."
except ImportError, e:
        print "Woops, LD_LIBRARY_PATH was not set correctly: ", e
END_OF_PYTHON

Ahh, much better:

$ ./python-and-bash.sh
Imported cx_Oracle! LD_LIBRARY_PATH was set correctly.

Of course I could have just set this variable in my profile. However, this creates an additional external dependency - which is what I was trying to avoid.
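One detail worth getting right in wrappers like this: if LD_LIBRARY_PATH is unset, writing DIR:$LD_LIBRARY_PATH leaves a trailing empty entry, which glibc's dynamic linker treats as the current directory. A small sketch of a helper that avoids this; the name path_prepend is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: path_prepend VARNAME DIR
# Prepends DIR to a colon-separated variable without leaving an empty entry
# when the variable is unset or empty, then exports it to child processes.
path_prepend() {
  local _pp_name=$1 _pp_dir=$2
  local _pp_old=${!_pp_name}            # indirect expansion: current value
  printf -v "$_pp_name" '%s%s' "$_pp_dir" "${_pp_old:+:$_pp_old}"
  export "$_pp_name"
}
```

In python-and-bash.sh, the export line could then be written as path_prepend LD_LIBRARY_PATH /home/noland/oracle-lib before the here-document.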

Process Substitution

March 23rd, 2008

Quite some time ago, someone wrote me to ask about a possible article on process substitution. Sadly, I could not find the email so I cannot credit them. As you likely have guessed, I am finally writing a post on process substitution.

Many times I have used pipelines and temporary files when process substitution would be a much cleaner solution.

First, I am going to create two test files:

$ dd if=/dev/urandom of=file-small count=750001
$ dd if=/dev/urandom of=file-large count=1000000
$ ls -l file-*
-rw-r--r-- 1 noland noland 512000000 Mar 23 08:53 file-large
-rw-r--r-- 1 noland noland 384000512 Mar 23 08:49 file-small

I thought of writing this article while writing a script to test FTP servers and file locking. As such, I will upload the small file to a file named append-example:

$ curl -T file-small --user noland ftp://localhost/append-example
Enter host password for user 'noland':
$ ls -l append-example
-rw-r--r-- 1 noland noland 384000512 Mar 23 11:52 append-example

Now I will append the large file:

$ curl -s -a -T file-large --user noland ftp://localhost/append-example
Enter host password for user 'noland':
$ ls -l append-example
-rw-r--r-- 1 noland noland 896000512 Mar 23 11:54 append-example

I am going to use dd and process substitution to calculate the MD5 hash of the first upload:

$ md5sum file-small <(dd if=append-example count=750001 status=noxfer)
dfabff7441bd814145a804e03d333864  file-small
750001+0 records in
750001+0 records out
dfabff7441bd814145a804e03d333864  /dev/fd/63

Now the portion that was appended:

$ md5sum file-large <(dd if=append-example  skip=750001 status=noxfer)
1b8daed9e435fc90b4a49d74b55f96f4  file-large
1000000+0 records in
1000000+0 records out
1b8daed9e435fc90b4a49d74b55f96f4  /dev/fd/63

When you place a command inside <( ), the shell runs the command with its standard output connected to a pipe, and replaces the whole <( ) expression with a file name, such as /dev/fd/63, that refers to the read end of that pipe. Here is the classic example:

$ echo <(echo) <(echo) <(echo) <(echo)
/dev/fd/63 /dev/fd/62 /dev/fd/61 /dev/fd/60
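A common everyday use is comparing the output of two commands without temporary files. comm needs sorted input files, and process substitution lets each of its file arguments be a sort instead. A small sketch; the function name common_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the lines present in both files, without temp files.
# Each <(sort ...) is replaced by a /dev/fd/N path that comm reads like a file.
common_lines() {
  comm -12 <(sort -- "$1") <(sort -- "$2")
}
```

Without process substitution, this would need two temporary files (or named pipes) and the cleanup that goes with them.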

In my script I use process substitution roughly as below, which feels exceedingly clean:

$ read hash name < <(md5sum <(dd if=append-example skip=750001 status=noxfer))
1000000+0 records in
1000000+0 records out
$ printf "hash=%s name=%s\n" "$hash" "$name"
hash=1b8daed9e435fc90b4a49d74b55f96f4 name=/dev/fd/63
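The read ... < <(...) form matters because the more obvious md5sum ... | read hash name runs read in a subshell in BASH (unless shopt -s lastpipe is set and job control is off), so hash and name vanish as soon as the pipeline ends. A small demonstration, with printf standing in for md5sum; the function name demo_read is just for illustration.

```shell
#!/bin/bash
# Shows why redirecting from a process substitution keeps the variables.
demo_read() {
  local hash name
  # Pipeline: read runs in a subshell, so this assignment is lost
  printf 'abc123  /dev/fd/63\n' | read -r hash name
  echo "pipe: hash='$hash'"
  # Process substitution: read runs in the current shell
  read -r hash name < <(printf 'abc123  /dev/fd/63\n')
  echo "procsub: hash='$hash' name='$name'"
}
```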

In the past, my SSH sessions died due to inactivity. In order to solve this, I used to run:

while true; do uptime; sleep 5;done

Obviously, this eventually pushes everything else out of your terminal’s scrollback. BASH to the rescue! My noop script solves this problem. (Please see the comments, there may be a better solution, thanks David!) NOOP, short for “no operation”, is a processor instruction and is also common in protocols. You may find it interesting that exploit code is often filled with NOPs: they increase your chances of successfully exploiting a buffer overflow.

The source:

$ cat /usr/bin/noop
#!/bin/bash
backspace() {
        echo -e "\b\c"
}
cleanup() {
        backspace
        exit
}
trap "cleanup" 2
while :
do
        num=${RANDOM:0:1}
        printf $num
        sleep ".$num"
        backspace
done

For the hell of it, I made a video of noop in action.

If you’re wondering how the script works, here is a quick explanation. The script defines two functions: backspace and cleanup. backspace prints the special characters \b and \c. Backslash b is a backspace, and backslash c stops echo from printing anything further, including the trailing newline:

backspace() {
        echo -e "\b\c"
}

The cleanup function prints a backspace and then exits. trap runs cleanup when the script receives SIGINT (signal 2, which Ctrl-C sends):

cleanup() {
        backspace
        exit
}
trap "cleanup" 2

The main body of the script is an infinite loop which generates a random number using the special variable $RANDOM. Only the first digit of that number is assigned to the variable num. After printing that digit, the script sleeps for num tenths of a second and then calls the backspace function:

while :
do
        num=${RANDOM:0:1}
        printf $num
        sleep ".$num"
        backspace
done
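One iteration of that loop can also be written with printf alone, which avoids relying on echo -e (whose escape handling varies between shells and echo implementations). This is a hypothetical sketch, not a drop-in replacement for the script above; noop_step is an illustrative name.

```shell
#!/bin/bash
# Hypothetical single step of noop: print one digit, pause, then back up over it.
noop_step() {
  local num=${RANDOM:0:1}   # first digit of $RANDOM (0-32767)
  printf '%s' "$num"
  sleep ".$num"             # needs a sleep that accepts fractional seconds
  printf '\b'               # backspace; printf adds no newline, so no \c is needed
}
```

The main loop then becomes while :; do noop_step; done, with the same trap as before.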

My favorite site for converting RPMs to gzipped tar files appears to have shut down, so I wrote my own tool. It has a web interface, Convert a RPM to a tgz, and (keeping in line with my thoughts on software) it can also be used from the command line.

Five usage examples:

$ wget -q "http://bashcurescancer.com/rpm2tgz.ws?url=http://bashcurescancer.com/media/rpm2tgz/telnet-0.17-39.el5.i386.rpm"
$ ls -l telnet-0.17-39.el5.i386.tgz
-rw-r--r-- 1 noland noland 49804 Feb 23 17:09 telnet-0.17-39.el5.i386.tgz
$ curl -s -F "rpm=@telnet-0.17-39.el5.i386.rpm" \
"http://bashcurescancer.com/rpm2tgz.ws" >telnet-0.17-39.el5.i386.tgz.1
$ curl -s -F "url=http://bashcurescancer.com/media/rpm2tgz/telnet-0.17-39.el5.i386.rpm" \
 http://bashcurescancer.com/rpm2tgz.ws > telnet-0.17-39.el5.i386.tgz.2
$ curl -s "http://bashcurescancer.com/rpm2tgz.ws?url=http://bashcurescancer.com/media/rpm2tgz/telnet-0.17-39.el5.i386.rpm" \
> telnet-0.17-39.el5.i386.tgz.3
$ wget -q -O telnet-0.17-39.el5.i386.tgz.4 \
"http://bashcurescancer.com/rpm2tgz.ws?url=http://bashcurescancer.com/media/rpm2tgz/telnet-0.17-39.el5.i386.rpm"

Needless to say, if you abuse this, I will block your IP address from accessing the service. If there is an error, the script returns either 404 File Not Found or 500 Internal Server Error with an empty body. As such, you should be able to use the -s test of test, [, or [[ (true if the file exists and is not empty) to check the validity of the file.
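A minimal sketch of that check, assuming only what is stated above (an error yields an empty body); the function name check_download is hypothetical.

```shell
#!/bin/bash
# Hypothetical check after downloading a converted file: the service returns
# an empty body on error, so a missing or empty file means the conversion failed.
check_download() {
  local file=$1
  if [[ -s $file ]]; then   # -s: file exists and is larger than zero bytes
    echo "ok: $file"
  else
    echo "conversion failed: $file is missing or empty" >&2
    return 1
  fi
}
```

For example, run check_download telnet-0.17-39.el5.i386.tgz.1 after the curl -F upload. Adding curl's -f (--fail) option also makes curl itself exit non-zero on an HTTP error status.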

The other day, I began wondering which conditional, test, [, or [[, is fastest. Here are the results:

$ time for i in {1..100000}; do [[ -d . ]];done
real    0m1.256s
user    0m1.018s
sys     0m0.238s
$ time for i in {1..100000}; do [ -d . ];done
real    0m3.407s
user    0m2.704s
sys     0m0.703s
$ time for i in {1..100000}; do test -d .;done

real    0m3.223s
user    0m2.607s
sys     0m0.616s

The double bracket is a “compound command” (a shell keyword), whereas test and the single bracket are shell builtins (and are in fact the same command). Thus, the single bracket and the double bracket execute different code.

test and the single bracket are the most portable, as they are specified by POSIX and also exist as separate, external commands. However, if you’re using any remotely modern version of BASH, the double bracket is supported.
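You can ask BASH how it classifies each form with type -t. The sketch below also shows one practical consequence of [[ being parsed by the shell rather than run as a command: unquoted variables inside it are not word-split, while inside [ they are. The function names are illustrative.

```shell
#!/bin/bash
# How BASH classifies each form
classify() {
  type -t test   # builtin
  type -t '['    # builtin
  type -t '[['   # keyword
}
# One practical difference: [[ ]] does not word-split unquoted variables
split_demo() {
  local var="a b"
  [[ $var == "a b" ]] && echo "[[ ok"
  # Deliberately unquoted: [ sees a, b, =, "a b" and reports an error
  [ $var = "a b" ] 2>/dev/null || echo "[ failed (word splitting broke the test)"
}
```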

Here are the performance numbers for the external versions of test and the single bracket:

$ time for i in {1..100000}; do /usr/bin/test -d .;done

real    5m49.324s
user    0m51.771s
sys     4m48.013s
$ time for i in {1..100000}; do /usr/bin/[ -d . ];done

real    5m45.728s
user    0m52.536s
sys     4m46.259s

Wow! This shows the high cost of process creation!

I asked “What do you want?” and you said scripting. Which is good, because I have felt like scripting lately!

I help a website hosting company, Idologic, on the weekends. (Side note: I highly recommend Idologic. I have worked with and been a customer of many other hosting companies. I really doubt you will find better customer service elsewhere.) Like many businesses these days, Idologic has quite a few Linux servers. When presented with many servers, I typically want to parallelize my work.

As such, I have written a script called dssh (previous version), which allows you to execute commands on n hosts, in parallel. This can be used to find information on the hosts, such as load average, number of processes by user, number of processes by process name, etc.

There are other options such as pssh and p-run; however, I wanted to create a shell solution which could be easily and simply “installed”. dssh reads hosts from standard input, one per line. Host-specific ssh options are supported by placing them before the hostname. Here is my sample hosts file:

$ cat hosts
mojito
-l noland kodiak
mojito
kodiak
-C mojito
-i /home/noland/.ssh/id_rsa kodiak

Nothing restricts you from generating this list from some kind of metadata (e.g., a database). Here are some examples of output:

$ ./dssh.sh "uptime" < hosts
First time huh? Think your cmd over and then try again.
$ ./dssh.sh "uptime" < hosts
mojito:O:0:19:16:45 up 3 days, 14 min,  5 users,  load average: 0.22, 0.22, 0.20
kodiak:O:0:13:24:00 up 20:00,  1 user,  load average: 0.42, 0.16, 0.05
mojito:O:0:19:16:45 up 3 days, 14 min,  5 users,  load average: 0.22, 0.22, 0.20
kodiak:O:0:13:24:00 up 20:00,  1 user,  load average: 0.42, 0.16, 0.05
mojito:O:0:19:16:45 up 3 days, 14 min,  5 users,  load average: 0.22, 0.22, 0.20
kodiak:O:0:13:24:00 up 20:00,  1 user,  load average: 0.42, 0.16, 0.05
$ ./dssh.sh "pgrep -u noland | wc -w" < hosts
mojito:O:0:60
kodiak:O:0:5
mojito:O:0:60
kodiak:O:0:5
mojito:O:0:60
kodiak:O:0:5
$ ./dssh.sh "ls not_a_file" < hosts
mojito:E:2:ls: not_a_file: No such file or directory
kodiak:E:2:ls: not_a_file: No such file or directory
mojito:E:2:ls: not_a_file: No such file or directory
kodiak:E:2:ls: not_a_file: No such file or directory
mojito:E:2:ls: not_a_file: No such file or directory
kodiak:E:2:ls: not_a_file: No such file or directory

Notes:

  1. With great power comes even greater responsibility. Running rm -rf / as root with this script would do exactly that.
  2. I don’t recommend doing anything with this script that “changes state”.
  3. I make no warranties or promises.
  4. You need ssh keys to use this. I recommend using ssh-agent.
  5. By default, dssh will execute 10 children in parallel. If you are running it on a large machine, increase this.
  6. When looping through the hosts, if the maximum number of children are still processing, the script will sleep 500ms. If your version of sleep does not support fractional seconds, you will need to change this.

Here is an outline of the script:

  1. Read from standard input a list of hosts
  2. Configure trap to remove temporary files on exit
  3. For each host
    1. Sleep while we have more children than the maximum number of children
    2. Generate three temporary files, one for each of
      1. Standard Output
      2. Standard Error
      3. Exit value
    3. Create a child process, saving stdout, stderr, and the exit value in their respective files.
  4. Wait for all children to exit
  5. For each host
    1. If the standard output or error files are of size greater than zero, print the content, prefacing each line with the hostname, standard error/output indicator, and exit status.
    2. Else print something to indicate we executed a process and have an exit value.
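The outline above can be sketched in BASH roughly as follows. This is my own sketch, not the dssh script itself: the names dssh_sketch, MAX_CHILDREN, and RUNNER are hypothetical, all three temporary files per host live in one mktemp -d directory, and instead of sleeping 500ms while too many children are running, it blocks with wait -n, which needs BASH 4.3 or newer. RUNNER defaults to ssh but can be overridden so the logic can be tried without any remote hosts.

```shell
#!/bin/bash
# Sketch of the dssh outline; not the author's script.
# Assumes bash 4.3+ (wait -n, negative array index). RUNNER is a hypothetical
# override so the logic can be exercised without real ssh connections.
MAX_CHILDREN=${MAX_CHILDREN:-10}
RUNNER=${RUNNER:-ssh}

# Usage: dssh_sketch "command" < hosts
# The body is a subshell ( ... ) so the EXIT trap fires when the function finishes.
dssh_sketch() (
  cmd=$1
  hosts=()
  # 1. Read one host per line; ssh options may precede the hostname
  while IFS= read -r line; do
    [[ $line == *[![:space:]]* ]] && hosts+=("$line")
  done

  # 2. Remove the temporary files on exit
  tmpdir=$(mktemp -d) || exit 1
  trap 'rm -rf -- "$tmpdir"' EXIT

  # 3. Start one child per host, at most MAX_CHILDREN at a time
  running=0
  for i in "${!hosts[@]}"; do
    if (( running >= MAX_CHILDREN )); then
      wait -n                      # block until any one child exits
      running=$((running - 1))
    fi
    read -ra args <<< "${hosts[i]}"  # "-l noland kodiak" -> 3 words
    (
      "$RUNNER" "${args[@]}" "$cmd" >"$tmpdir/$i.out" 2>"$tmpdir/$i.err"
      echo "$?" >"$tmpdir/$i.rc"
    ) &
    running=$((running + 1))
  done

  # 4. Wait for all children to exit
  wait

  # 5. Report host:O|E:exit:line, or host:-:exit: when there was no output
  for i in "${!hosts[@]}"; do
    read -ra args <<< "${hosts[i]}"
    host=${args[-1]}
    rc=$(<"$tmpdir/$i.rc")
    if [[ -s $tmpdir/$i.out || -s $tmpdir/$i.err ]]; then
      while IFS= read -r l; do printf '%s:O:%s:%s\n' "$host" "$rc" "$l"; done <"$tmpdir/$i.out"
      while IFS= read -r l; do printf '%s:E:%s:%s\n' "$host" "$rc" "$l"; done <"$tmpdir/$i.err"
    else
      printf '%s:-:%s:\n' "$host" "$rc"
    fi
  done
)
```

Setting RUNNER=echo turns it into a dry run that shows what would be passed to ssh, for example printf 'kodiak\n' | RUNNER=echo dssh_sketch uptime prints kodiak:O:0:kodiak uptime.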

Once again, here is the script I am calling dssh.