Processing XML on the Command Line
April 24th, 2008
The other day on the cURL email list, someone asked:
Could someone please tell me (preferably with an example) how I could parse an XML file like the following:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<FileRetriever>
<FileList>
<File name="AMERI08.D4860.ZIP" />
<File name="DTCCRSF.D4861.ZIP" />
<File name="DTGSS01.D4862.ZIP" />
<File name="DTGSS02.D4863.ZIP" />
<File name="DTGSS03.D4864.ZIP" />
</FileList>
</FileRetriever>
This is off-topic for the cURL list, but I thought it was a fair question. For simple, regularly formatted XML like this, you could do:
$ grep '<File ' config.xml | awk -F'"' '{print $2}' | xargs -I {} echo curl -I "http://bashcurescancer.com/{}"
curl -I http://bashcurescancer.com/AMERI08.D4860.ZIP
curl -I http://bashcurescancer.com/DTCCRSF.D4861.ZIP
curl -I http://bashcurescancer.com/DTGSS01.D4862.ZIP
curl -I http://bashcurescancer.com/DTGSS02.D4863.ZIP
curl -I http://bashcurescancer.com/DTGSS03.D4864.ZIP
Or, you could use the xsltproc command with an associated style sheet. This is really the correct method, and it is much more robust when you’re processing complex XML or XML that is not easily grep’able:
$ xsltproc --nonet config.xsl config.xml | xargs -I {} echo curl -I "http://bashcurescancer.com/{}"
curl -I http://bashcurescancer.com/AMERI08.D4860.ZIP
curl -I http://bashcurescancer.com/DTCCRSF.D4861.ZIP
curl -I http://bashcurescancer.com/DTGSS01.D4862.ZIP
curl -I http://bashcurescancer.com/DTGSS02.D4863.ZIP
curl -I http://bashcurescancer.com/DTGSS03.D4864.ZIP
Links to config.xml and config.xsl.
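For readers without the linked files, here is a minimal sketch of what such a style sheet could look like. It is an assumption written for illustration, not necessarily the original config.xsl, and it uses a shortened copy of the XML above in a temporary directory:

```shell
# Sketch only: this style sheet is an illustrative guess, not the original config.xsl.
workdir=$(mktemp -d)

# A shortened copy of the XML from the question.
cat > "$workdir/config.xml" <<'EOF'
<?xml version="1.0" encoding="ISO-8859-1" ?>
<FileRetriever>
  <FileList>
    <File name="AMERI08.D4860.ZIP" />
    <File name="DTCCRSF.D4861.ZIP" />
  </FileList>
</FileRetriever>
EOF

cat > "$workdir/config.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit plain text so the result can be piped straight into xargs. -->
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Print the name attribute of every File element, one per line. -->
    <xsl:for-each select="//File">
      <xsl:value-of select="@name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

# --nonet keeps xsltproc from fetching DTDs or entities over the network.
xsltproc --nonet "$workdir/config.xsl" "$workdir/config.xml"
```

Because the output method is text, each file name lands on its own line, which is what the xargs stage of the pipeline expects.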


April 25th, 2008 at 4:14 am
Not a bash or curl solution, but xmlstarlet is a lightweight package that’s extremely powerful for these sorts of things:
xmlstarlet sel --net -t -v "//status[1]/text" http://twitter.com/statuses/user_timeline/1942321.xml
This selects the latest status from my Twitter timeline. I’ve hooked a couple of these into conky (last.fm as well) for some text-based desktop goodness.
You could use this in much the same way as your xsltproc solution (the XPath might be //File/@name, I think); the difference is that you lose the XSL dependency.
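As a rough sketch of how that XPath could be applied to the XML from the post: the sample is shortened and inlined, `to_curl` is a hypothetical helper name, and the binary is assumed to be installed as `xmlstarlet` (some systems ship it as `xml`).

```shell
# Shortened, inlined copy of the XML from the post.
xml='<FileRetriever><FileList><File name="AMERI08.D4860.ZIP" /><File name="DTCCRSF.D4861.ZIP" /></FileList></FileRetriever>'

# Hypothetical helper: turn each input line into the curl command from the post.
to_curl() {
  xargs -I {} echo curl -I "http://bashcurescancer.com/{}"
}

# -m iterates over every File element, -v prints its name attribute,
# and -n appends a newline after each match. With no file argument,
# xmlstarlet reads the document from standard input.
printf '%s\n' "$xml" | xmlstarlet sel -t -m '//File' -v '@name' -n | to_curl
```

Matching on the element and printing the attribute per match keeps one name per line regardless of how the source XML is wrapped.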
April 25th, 2008 at 9:29 am
Thanks for the comment…That is *awesome*!
April 25th, 2008 at 11:49 am
xmlstarlet may be the most awesome thing I’ve ever seen.
April 25th, 2008 at 3:23 pm
What about xmllint and xmlgrep?
They are both standard on UNIX.
April 25th, 2008 at 3:31 pm
Yes, I use xmllint and xmlwf, but I think you would be hard pressed to say xmlgrep is standard. It’s not available as a package on my Ubuntu host.
April 25th, 2008 at 3:32 pm
It looks like xmlgrep is written in Perl. I could have written this in Python, but xmlstarlet is written in C.
April 25th, 2008 at 6:14 pm
You can also use 'xml2' and '2xml' (Debian/Ubuntu package xml2). These convert XML to and from a line-based format:
$ echo '<?xml version="1.0" encoding="UTF-8"?><foo bar="1"><baz>Hi <frobnitz>there</frobnitz></baz></foo>' | xml2
/foo/@bar=1
/foo/baz=Hi
/foo/baz/frobnitz=there
This is very easy to grep, and works nicely for many cases.
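Applied to the XML from the post, a sketch might look like the following. The sample is shortened and inlined, `names_from_xml2` is a hypothetical helper name, and the path prefix assumes xml2 emits attributes as `/path/@attr=value`, as in the output above.

```shell
# Shortened, inlined copy of the XML from the post.
xml='<FileRetriever><FileList><File name="AMERI08.D4860.ZIP" /><File name="DTCCRSF.D4861.ZIP" /></FileList></FileRetriever>'

# Hypothetical helper: keep only the name attributes of File elements
# and strip the path prefix, leaving one bare file name per line.
names_from_xml2() {
  sed -n 's|^/FileRetriever/FileList/File/@name=||p'
}

printf '%s\n' "$xml" | xml2 | names_from_xml2
```

Anchoring the pattern on the full path avoids picking up a name attribute that belongs to some other element.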
April 25th, 2008 at 6:16 pm
Ahh, your blog ate my XML instead of escaping it. See http://pastebin.com/m1b4de550
April 25th, 2008 at 6:22 pm
Thanks for the tip! I fixed the XML.
May 6th, 2008 at 12:55 am
Don’t forget about xsh:
http://xsh.sourceforge.net/
May 6th, 2008 at 12:59 pm
Thanks for the link!
May 11th, 2008 at 1:37 pm
@Simon Scarfe
Thanks. I wrote a small script to post to Twitter; now I have something to read the status.
May 20th, 2008 at 6:35 am
There’s also XMLGAWK!
http://osx.hyperjeff.net/Apps/apps?f=xml%20awk
May 20th, 2008 at 11:21 pm
Thanks a lot for the tips. Good work here!