The other day on the cURL email list, someone asked:

Could someone please show me (preferably with an example) how I could parse an XML file like the following:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<FileRetriever>
<FileList>
<File name="AMERI08.D4860.ZIP" />
<File name="DTCCRSF.D4861.ZIP" />
<File name="DTGSS01.D4862.ZIP" />
<File name="DTGSS02.D4863.ZIP" />
<File name="DTGSS03.D4864.ZIP" />
</FileList>
</FileRetriever>

This is off-topic for the cURL list, but I thought it was a fair question. If each File element sits on its own line, as it does here, you could do this:

$ grep '<File ' config.xml | awk -F'"' '{print $2}' | xargs -I {} echo curl -I "http://bashcurescancer.com/{}"
curl -I http://bashcurescancer.com/AMERI08.D4860.ZIP
curl -I http://bashcurescancer.com/DTCCRSF.D4861.ZIP
curl -I http://bashcurescancer.com/DTGSS01.D4862.ZIP
curl -I http://bashcurescancer.com/DTGSS02.D4863.ZIP
curl -I http://bashcurescancer.com/DTGSS03.D4864.ZIP
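
To unpack that pipeline a little: grep keeps the lines that contain a File element, awk splits each line on double quotes and prints the second field (the value of name), and a read loop can stand in for xargs. Here is a minimal sketch, assuming one File element per line and double-quoted attributes. The list_file_names helper and the temporary sample file are names I made up for illustration; they are not part of the original question.

```shell
#!/bin/sh
# Print the value of the name attribute for each <File .../> line.
# Assumes one <File> element per line and double-quoted attributes.
list_file_names() {
    grep '<File ' "$1" | awk -F'"' '{print $2}'
}

# Build a small sample file with the same shape as the XML in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="ISO-8859-1" ?>
<FileRetriever>
<FileList>
<File name="AMERI08.D4860.ZIP" />
<File name="DTCCRSF.D4861.ZIP" />
</FileList>
</FileRetriever>
EOF

# Print (not run) one curl command per file name.
list_file_names "$sample" | while IFS= read -r name; do
    echo curl -I "http://bashcurescancer.com/$name"
done

rm -f "$sample"
```

As in the one-liner, echo only prints the curl commands; remove it to actually send the requests.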

Or, you could use the xsltproc command with an associated style sheet. This is really the correct method, and it is much more robust when you're processing complex XML or XML that is not easily grep'able:

$ xsltproc --nonet config.xsl config.xml | xargs -I {} echo curl -I "http://bashcurescancer.com/{}"
curl -I http://bashcurescancer.com/AMERI08.D4860.ZIP
curl -I http://bashcurescancer.com/DTCCRSF.D4861.ZIP
curl -I http://bashcurescancer.com/DTGSS01.D4862.ZIP
curl -I http://bashcurescancer.com/DTGSS02.D4863.ZIP
curl -I http://bashcurescancer.com/DTGSS03.D4864.ZIP

Links to config.xml and config.xsl.
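
The linked config.xsl is not reproduced here, so the following is only a sketch of what such a style sheet could look like, assuming all you want is each File element's name attribute on its own line. The write_stylesheet helper and the temporary file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for config.xsl: emit each File/@name on its own line.
write_stylesheet() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//File">
      <xsl:value-of select="@name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
}

# Write the sketch to a temporary file and, if xsltproc and config.xml
# are both present, print (not run) one curl command per file name.
xsl=$(mktemp)
write_stylesheet "$xsl"
if command -v xsltproc >/dev/null 2>&1 && [ -f config.xml ]; then
    xsltproc --nonet "$xsl" config.xml |
        xargs -I {} echo curl -I "http://bashcurescancer.com/{}"
fi
rm -f "$xsl"
```

Because the style sheet works on the parsed document rather than on lines of text, it should not matter whether the File elements share a line or how the attributes are spaced.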

12 Responses to “Processing XML on the Command Line”

  1. Simon Scarfe Says:

    not a bash or a curl solution, but xmlstarlet is a lightweight package that’s extremely powerful for these sorts of things:

    xmlstarlet sel --net -t -v "//status[1]/text" http://twitter.com/statuses/user_timeline/1942321.xml

    This selects the latest status from my twitter timeline. I’ve hooked a couple of these into conky (last.fm as well) for some text-based desktop goodness.

    You could use this in a very similar way to your xsltproc solution (xpath might be //File/@name - I think), the difference being that you lose your xsl dependency.

  2. Brock Noland Says:

    Thanks for the comment…That is *awesome*!

  3. Brock Noland Says:

    xmlstarlet may be the most awesome thing I’ve ever seen.

  4. Noname Says:

    What about xmllint and xmlgrep?
    They are both UNIX standard.

  5. Brock Noland Says:

    Yes, I use xmllint and xmlwf, but I think you would be hard pressed to say xmlgrep is standard. It's not available as a package on my Ubuntu host.

  6. Brock Noland Says:

    It looks like xmlgrep is written in Perl. I could have written this in Python, but xmlstarlet is in C.

  7. Luke Plant Says:

    You can also use 'xml2' and '2xml' (Debian/Ubuntu package xml2); these convert XML to and from a line-based format:

    $ echo '<?xml version="1.0" encoding="UTF-8"?><foo bar="1"><baz>Hi <frobnitz>there</frobnitz></baz></foo>' | xml2
    /foo/@bar=1
    /foo/baz=Hi
    /foo/baz/frobnitz=there

    This is very easy to grep, and works nicely for many cases.

  8. Luke Plant Says:

    Ahh, your blog ate my XML instead of escaping it. See http://pastebin.com/m1b4de550

  9. Brock Noland Says:

    Thanks for the tip! I fixed the XML.

  10. James Fuller Says:

    Don't forget about xsh

    http://xsh.sourceforge.net/

  11. Brock Noland Says:

    Thanks for the link!

  12. Binny V A Says:

    @Simon Scarfe
    Thanks. I wrote a small script to post to Twitter - now I've got something to read the status.
