Quite some time ago I wrote about using xsltproc to process XML on the command line. Thankfully, someone pointed out XMLStarlet, and I now use it almost every day. I work with a variety of REST-based APIs to gather information. XMLStarlet, combined with a simple for loop or xargs, gives you an exceedingly powerful set of tools.

Here is a quick introduction to the power of XMLStarlet. It is just a teaser, since I cannot share the data I work with, but it should give you a sense of what the tool can do.

All the links from my RSS feed:

$ curl -s 'http://bashcurescancer.com/rss/' | xml sel -t -m '//link' -v '.' -n
http://bashcurescancer.com
http://bashcurescancer.com/processing-xml-on-the-command-line.html
http://bashcurescancer.com/do-not-close-stderr.html
http://bashcurescancer.com/prepend-to-a-file-with-sponge-from-moreutils.html
http://bashcurescancer.com/bug-in-curl-is-fixed.html
http://bashcurescancer.com/using-kill-to-see-if-a-process-is-alive.html
http://bashcurescancer.com/performance-testing-with-curl.html
http://bashcurescancer.com/new-command-prepend.html
http://bashcurescancer.com/shell-function-which-webserver-does-that-site-run.html
http://bashcurescancer.com/exposing-command-line-programs-as-web-services.html
http://bashcurescancer.com/wrapping-dynamic-languages-in-shell-without-an-extra-script.html

Or how about “Title: link” pairs:

$ curl -s 'http://bashcurescancer.com/rss/' | xml sel -t -m '//item' -v 'title' -o ': ' -v 'link' -n
Processing XML on the Command Line: http://bashcurescancer.com/processing-xml-on-the-command-line.html
Do not close stderr: http://bashcurescancer.com/do-not-close-stderr.html
prepend to a file with sponge from moreutils: http://bashcurescancer.com/prepend-to-a-file-with-sponge-from-moreutils.html
Bug in Curl is fixed: http://bashcurescancer.com/bug-in-curl-is-fixed.html
using kill to see if a process is alive: http://bashcurescancer.com/using-kill-to-see-if-a-process-is-alive.html
Performance testing - with curl: http://bashcurescancer.com/performance-testing-with-curl.html
New command: prepend: http://bashcurescancer.com/new-command-prepend.html
Shell Function - Which Webserver Does That Site Run?: http://bashcurescancer.com/shell-function-which-webserver-does-that-site-run.html
Exposing command line programs as web services: http://bashcurescancer.com/exposing-command-line-programs-as-web-services.html
Wrapping dynamic languages in shell without an extra script: http://bashcurescancer.com/wrapping-dynamic-languages-in-shell-without-an-extra-script.html

To use the full power of the tool, you may need to do some reading on XPath and XSLT stylesheets.

A few weeks ago I wrote about a tool that makes it easy to prepend to a file. I submitted prepend to moreutils, and Joey was kind enough to point out that this could be done with sponge. sponge reads all of standard input and, when done, writes it to a file:

Probably the most general purpose tool in moreutils so far is sponge(1), which lets you do things like this:

% sed "s/root/toor/" /etc/passwd | grep -v joey | sponge /etc/passwd

Two days ago Joey released version 0.29 of moreutils including a patch by yours truly (with much help from Joey).

sponge: Handle large data sizes by using a temp file rather than by consuming arbitrary amounts of memory. Patch by Brock Noland. (version 0.29 changelog)
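To make the idea concrete, here is a minimal shell imitation of sponge that needs only mktemp and cat. sponge_sh is a hypothetical helper, not the moreutils implementation. Like sponge as of 0.29, it spools standard input to a temporary file instead of holding it in memory, and it does not open the target until all input has been read. That ordering is what makes prepending to the same file safe.

```shell
# sponge_sh: hypothetical, simplified stand-in for moreutils' sponge.
sponge_sh() {
    local target tmp status
    target=$1
    tmp=$(mktemp "${TMPDIR:-/tmp}/sponge_sh.XXXXXX") || return 1
    # Read ALL of standard input first; the target is not touched yet.
    if ! cat > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Only now overwrite the target. Writing through ">" (rather than mv)
    # keeps the target's inode, owner and permissions.
    cat "$tmp" > "$target"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Prepending a line, the job prepend was written for:
printf 'body line\n' > notes.txt
{ echo 'header line'; cat notes.txt; } | sponge_sh notes.txt
cat notes.txt
# header line
# body line
```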

Also, on a non-command-line note, I found a pretty cool video on Joey’s site: Joey Learns to Fly.

I am making some changes to the moreutils sponge command. sponge provides a way of prepending that is less specialized than my prepend utility. However, it has trouble with large amounts of input.

Regardless, while testing my changes, I want to watch it operate. Normally you would do so from a second terminal, but that is a pain. kill -0 can be very useful here. After backgrounding the command, I assign its PID (available in $!) to $pid. I wrapped the assignment in eval to make sure $! is not expanded until the background job has started, although in bash a plain pid=$! after the & behaves the same way, because each command on the line is expanded only when it runs.

After that, I enter a while loop on kill -0 $pid, which will not kill $pid, but will return successfully until $pid has died:

# cat large-file-GB | ./sponge large-file-GB-copy & eval 'pid=$!'; while kill -0 $pid; do sleep 10; ls -lh large-file* /tmp/sponge.*; echo;done
[1] 7937
-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 128M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 384M 2008-04-09 17:23 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw------- 1 root root 877M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root  20M 2008-04-09 17:24 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 413M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 836M 2008-04-09 17:25 large-file-GB-copy
-rw------- 1 root root 896M 2008-04-09 17:24 /tmp/sponge.JMsBWG

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 920M 2008-04-09 17:25 large-file-GB-copy
[1]+  Done                    cat large-file-GB | ./sponge large-file-GB-copy
ls: cannot access /tmp/sponge.*: No such file or directory

-rw-r--r-- 1 root root 977M 2008-04-09 16:18 large-file-GB
-rw-r--r-- 1 root root 977M 2008-04-09 17:25 large-file-GB-copy
-bash: kill: (7937) - No such process
# md5sum large-file-GB*
b5c667a723a10a3485a33263c4c2b978  large-file-GB
b5c667a723a10a3485a33263c4c2b978  large-file-GB-copy
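Here is the same monitoring pattern in a form you can run anywhere, with sleep standing in for the sponge job. It is a sketch: it uses a plain pid=$! instead of eval, quotes $pid, and sends kill's error message to /dev/null so the loop ends quietly instead of printing the “No such process” line seen above.

```shell
# A short sleep stands in for the long-running sponge job.
sleep 2 &
pid=$!    # expanded when this assignment runs, i.e. after the job started

# kill -0 delivers no signal; it only reports whether $pid still exists.
# 2>/dev/null hides the final "No such process" message.
while kill -0 "$pid" 2>/dev/null; do
    echo "still running: $pid"   # the original loop ran ls -lh here
    sleep 1
done

# Collect the job's exit status.
wait "$pid"
echo "job $pid exited with status $?"
```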

The web services paradigm of development is based on the Unix philosophy of “small is good”.  Web services should do one job, and do it well, allowing users to develop complex solutions by combining small, reliable and proven services.
Why not then, expose the power of familiar Unix commands like sort, grep, gzip… to the web?

Here is a proof-of-concept Python script (Python 2.3 version) to demonstrate the idea.

Start services:

$ ./to_web.py -p8008 sort &
Thu Mar 27 13:45:54 2008 sort server started - 8008
$ ./to_web.py -p8009 gzip &
Thu Mar 27 13:46:29 2008 gzip server started - 8009

Use the services:

$ for i in {1..10}; do echo ${RANDOM:0:2}; done | \
> curl --data-binary @- "http://swat:8008/sort+-nr" | \
> curl --data-binary @- "http://swat:8009/gzip" | \
> gunzip
97
37
23
23
21
18
11
11
10
10
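to_web.py itself is not reproduced here, so the following is only a guess at its core idea, written in shell: map a request path such as /sort+-nr to a command line and run it with the request body on standard input. The “+ means space” convention is inferred from the URLs above, and run_path is a hypothetical name. The allow-list is an added safety measure, not something the original script is known to do; exposing arbitrary commands to the network is dangerous. Real URL decoding (%2B and friends) is not handled.

```shell
# run_path: hypothetical mapping from a request path to a command line.
# The request body is read from stdin; the command's output goes to stdout.
run_path() {
    local cmdline
    cmdline=$(printf '%s\n' "${1#/}" | tr '+' ' ')   # "/sort+-nr" -> "sort -nr"
    set -f              # no filename globbing while splitting into words
    set -- $cmdline     # "$1"=sort "$2"=-nr
    set +f
    case $1 in
        sort|gzip|grep) "$@" ;;                      # allowed commands only
        *) printf 'command not allowed: %s\n' "$1" >&2; return 1 ;;
    esac
}

printf '3\n10\n7\n' | run_path /sort+-nr
# 10
# 7
# 3
```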

At work, we have a database of host information that has a command line interface. This tool has dependencies which are painful to resolve. With to_web.py, we can turn the command line tool into a web service and access the data without having to satisfy those dependencies.

This is a guest post by my esteemed colleague Adam Fokken. He can be reached here: Sadly, he does not have a blog.

When organizations need to create an application (most likely one doing CRUD), they build both the application logic and the user interface. Typically, this is a web application whose user interface is HTML, and that HTML effectively dictates how users can make use of the application logic.

CRUD applications can be used seamlessly in a GUI, via the command line, or inside other applications by following these three principles:

  1. Decouple the user interface from the application
  2. Use a standard and stateless authentication mechanism
  3. REST

Decouple the user interface from the application

Do not send HTML to the browser; send XML and an associated style sheet, and let the browser render the document. My sitemap is an example. This makes the page both readable in a browser and machine processable. (Note: this is a very basic style sheet.)

This way, anyone can create a client-side user interface to your “application.” Your user interface simply becomes the default user interface; anyone can create their own. Bonus points if you provide an easy way to share these alternative user interfaces.
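As a sketch of what “send XML and a style sheet” means on the server side, here is a hypothetical CGI-style handler in shell. emit_records and /records.xsl are made-up names. The xml-stylesheet processing instruction tells a browser which XSLT style sheet to apply, while curl, grep, or XMLStarlet simply see the XML.

```shell
# emit_records: hypothetical CGI-style handler that returns XML plus a
# pointer to a style sheet instead of HTML.
emit_records() {
    printf 'Content-Type: application/xml\r\n\r\n'
cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/records.xsl"?>
<records>
<record id="1"><key name="abc" val="123" /></record>
<record id="2"><key name="def" val="456" /></record>
</records>
EOF
}

emit_records
```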

Use a standard and stateless authentication mechanism

Use only HTTP Basic Authentication over SSL. Because it is stateless and standard, it is simple and works with a ton of pre-built tools. While Apache and IIS implement Basic Authentication, it is important to understand that Basic Authentication is simply a protocol for communicating credentials; you can use any authentication store behind it. PHP.net has a good overview of HTTP Authentication.
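To show how little there is to Basic Authentication, this sketch builds the header by hand. basic_auth_header and the user/pass credentials are placeholders. The header is just the base64 encoding of user:password, which is encoding, not encryption, and is exactly why it belongs over SSL. In practice, curl builds the same header for you with -u user:pass.

```shell
# basic_auth_header: hypothetical helper printing an HTTP Basic auth header.
basic_auth_header() {
    # tr removes any line wrapping base64 may add for long credentials.
    printf 'Authorization: Basic %s\n' \
        "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

basic_auth_header user pass
# Authorization: Basic dXNlcjpwYXNz
```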

REST

I had never heard of REST until last year. While speaking with an exceedingly intelligent colleague of mine, I explained that if I had designed this particular GUI, I would have let users query data simply by modifying the URL. Example:

http://gui/servers?platform=linux&active=true

He said, “REST!”

This is SO simple: just use GET, be stateless, use logical names, and allow selection by every characteristic. UPDATE: This is not REST, but it will get the job done. I’d prefer that you implement REST. (See comments.)

Final Thoughts

Not all data fits nicely on a single line or a few lines. However, in the vast majority of cases, records can be displayed in a grep’able format. As such, it’s trivial to add a parameter, say f=pt, which outputs the data in some line-based format. At the very least, XML can be laid out in a format which is grep’able. Instead of:

<records>
<record id="1">
<key name="abc" val="123" />
</record>
<record id="2">
<key name="def" val="456" />
</record>
</records>

Do this:

<records>
<record id="1"><key name="abc" val="123" /></record>
<record id="2"><key name="def" val="456" /></record>
</records>
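Here is why the one-record-per-line layout pays off: grep returns a whole record, id included. With the multi-line layout, the same grep would return only `<key name="def" val="456" />` and the id would be lost. compact_records is a hypothetical helper that just prints the sample above.

```shell
# compact_records: prints the one-record-per-line sample from above.
compact_records() {
cat <<'EOF'
<records>
<record id="1"><key name="abc" val="123" /></record>
<record id="2"><key name="def" val="456" /></record>
</records>
EOF
}

# grep returns the complete record line...
compact_records | grep 'name="def"'
# <record id="2"><key name="def" val="456" /></record>

# ...so cut can pull the id straight out of it.
compact_records | grep 'name="def"' | cut -d'"' -f2
# 2
```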

Many times, a separate “Web Services API” is created to allow people to extract data in a machine-processable format. However, if you follow these three principles, your GUI and API are one and the same. There is no need to create a separate non-human API. Furthermore, in my experience, there is rarely a need for reference documentation: the API is self-explanatory.

My favorite site for converting RPMs to gzipped tar files appears to have shut down, so I wrote my own tool. It has a web interface, Convert a RPM to a tgz, and (in keeping with my thoughts on software) it can also be used from the command line.

Five usage examples:

$ wget -q "http://bashcurescancer.com/rpm2tgz.ws?url=http://bashcurescancer.com/media/rpm2tgz/telnet-0.17-39.el5.i386.rpm"
$ ls -l telnet-0.17-39.el5.i386.tgz
-rw-r--r-- 1 noland noland 49804 Feb 23 17:09 telnet-0.17-39.el5.i386.tgz
$ curl -s -F "rpm=@telnet-0.17-39.el5.i386.rpm" \
"http://bashcurescancer.com/rpm2tgz.ws" >telnet-0.17-39.el5.i386.tgz.1
$ curl -s -F "url=http://bashcurescancer.com/media/rpm2tgz/telnet-0.17-39.el5.i386.rpm" \
 http://bashcurescancer.com/rpm2tgz.ws > telnet-0.17-39.el5.i386.tgz.2
$ curl -s "http://bashcurescancer.com/rpm2tgz.ws?url=http://bashcurescancer.com/media/rpm2tgz/telnet-0.17-39.el5.i386.rpm" \
> telnet-0.17-39.el5.i386.tgz.3
$ wget -q -O telnet-0.17-39.el5.i386.tgz.4 \
"http://bashcurescancer.com/rpm2tgz.ws?url=http://bashcurescancer.com/media/rpm2tgz/telnet-0.17-39.el5.i386.rpm"

Needless to say, if you abuse this, I will block your IP address from accessing the service. If there is an error, the script will return either 404 File Not Found or 500 Internal Server Error, with an empty body. As such, you should be able to use the -s test (file exists and is not empty) of test, [, or [[ to check the validity of the file.
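A minimal sketch of that check follows. check_download is a hypothetical helper. Besides the [ -s file ] test described above, it runs gzip -t as an optional extra check that the body really is gzip data.

```shell
# check_download: hypothetical validity check for a downloaded .tgz.
check_download() {
    # On error the service sends an empty body, so an empty file means failure.
    if [ ! -s "$1" ]; then
        echo "failed: $1 is missing or empty" >&2
        return 1
    fi
    # Optional extra check: the file should be a valid gzip stream.
    if ! gzip -t "$1" 2>/dev/null; then
        echo "failed: $1 is not gzip data" >&2
        return 1
    fi
    echo "ok: $1"
}
```

Run it right after one of the commands above, e.g. check_download telnet-0.17-39.el5.i386.tgz.1. Alternatively, curl's -f option makes curl itself exit with a non-zero status on HTTP errors such as 404 or 500.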