XML::Simple for Non-Perlers
I've been working with Perl and XML recently, and--let's just say--there is a reason why Perl isn't listed on my resume. That said, after getting started with XML::Simple I've been really happy with the stuff I've been able to write in Perl and the general Perl workflow (have problem, download solution from CPAN, insert glue, repeat). So here is a tutorial for using XML::Simple for people who aren't ancient masters of Perl-fu.
The first step is to install XML::Simple from CPAN.
perl -MCPAN -e shell
install XML::Simple
exit
Okay, a brief musing: why is it that RubyGems and CPAN both feel so much more comfortable to use than easy_install (the Python solution)? I feel like Perl and Ruby are the only communities that actually like their module repository systems, and the two systems are extremely similar to each other, but I'm not quite sure how they differ from easy_install.
Is it as simple as easy_install coming later in the timeline of the language? (I'm not even sure that's the case, but it certainly feels like the community has never completely accepted it, or that the acceptance has been grudging.) I think even the Common Lisp community has been more successful in unifying behind ASDF-INSTALL than Python has behind easy_install, and the Common Lisp community struggles to standardize almost anything (an unavoidable consequence of having no standard implementation).
Is it just the ::? Would everyone love easy_install if we wrote import datetime::datetime? Maybe it's all just in my mind.
Basically, what XML::Simple does is transform XML into Perl data structures and Perl data structures into XML with a very simple API. The hardest part of using XML::Simple is having a working understanding of how to dereference references to arrays and hashes.
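Since dereferencing is the sticking point, here is a minimal sketch of the syntax in plain Perl, with no XML involved. The structure and every name in it ($feed, tags, author) are made up for illustration; they only mimic the kind of nested hash and array references that XMLin hands back.

```perl
use strict;
use warnings;

# A hash reference holding a scalar, an array reference and a nested
# hash reference. All names here are hypothetical.
my $feed = {
    title  => 'example',
    tags   => ['perl', 'xml'],
    author => { name => 'will' },
};

# The arrow operator follows a reference to a single element.
my $title     = $feed->{title};          # lookup through a hash reference
my $first_tag = $feed->{tags}->[0];      # lookup through an array reference
my $name      = $feed->{author}{name};   # the arrow is optional between subscripts

# @{...} and %{...} turn a reference back into a whole array or hash,
# which is what foreach, scalar and keys expect.
my @tags      = @{ $feed->{tags} };
my $tag_count = scalar @tags;
my @fields    = sort keys %{ $feed->{author} };

# A backslash goes the other way and takes a reference to a variable.
my $tags_ref = \@tags;

print "$title: $first_tag ($tag_count tags) by $name\n";
```

The two forms that show up most with XML::Simple are the arrow for reaching one value and @{...} for looping over a list of elements.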
Now let's load in an XML file from disk.
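use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
# Parse the file into nested hash and array references.
my $xml = XML::Simple->new;
my $in = $xml->XMLin('my_file.xml');
# Dump the structure to see how the XML was mapped.
print Dumper($in);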
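Dumper shows the shape of the data; the next step is usually pulling values out of it. The sketch below is one way that might look. It relies on XMLin also accepting a string of XML in place of a filename, and the document itself (library, book, isbn, title) is invented for the example.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical document, inlined so the example needs no file on disk.
my $source = <<'XML';
<library name="demo">
  <book isbn="111"><title>First</title></book>
  <book isbn="222"><title>Second</title></book>
</library>
XML

my $xml = XML::Simple->new;
my $in  = $xml->XMLin($source);

# The root element is dropped by default; its attributes become hash keys.
print "library: $in->{name}\n";

# Repeated <book> elements arrive as an array reference of hash references.
foreach my $book (@{ $in->{book} }) {
    print "$book->{isbn}: $book->{title}\n";
}
```

Attributes and child elements that contain only text both end up as plain hash values, which is why isbn and title are read the same way.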
Now let's create a data structure and turn it into XML. (The script prints to standard output; redirect it to a file to save it to disk.)
use XML::Simple;
my $xml = XML::Simple->new;
my $out = {
    'an-attribute' => 'an attribute value',
    'another-attribute' => 'yada yada',
    'types' => ["a","b","c"],
    'people' => [{'name'=>'will','content'=>'etc'},]
};
my $dec = '<?xml version="1.0" encoding="utf-8"?>';
print $xml->XMLout($out, XMLDecl => $dec, RootName => 'random');
There are basically three rules that you need to know to create the output you want:
- A scalar in a hash is an attribute.
- Unless its key is content, in which case it is the element's primary content.
- Scalars and hashes within a list are represented as elements.
Well, it's still kind of confusing despite that explanation. It's probably easier to look at the XML output by that script:
<?xml version="1.0" encoding="utf-8"?>
<random an-attribute="an attribute value" another-attribute="yada yada">
  <people name="will">etc</people>
  <types>a</types>
  <types>b</types>
  <types>c</types>
</random>
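If attributes are not what you want for scalars, the first rule can be switched off. The sketch below compares the default output with XMLout's NoAttr option, which writes hash scalars as child elements instead. The data (person, name, langs) is hypothetical.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = XML::Simple->new;

# Hypothetical data: one scalar and one list inside the top-level hash.
my $person = {
    'name'  => 'will',
    'langs' => ['perl', 'python'],
};

# Default behaviour: the scalar becomes an attribute of <person>,
# and each list entry becomes a <langs> element.
my $with_attrs = $xml->XMLout($person, RootName => 'person');

# NoAttr => 1 asks XMLout to write scalars as child elements instead.
my $no_attrs = $xml->XMLout($person, RootName => 'person', NoAttr => 1);

print $with_attrs, "\n", $no_attrs;
```

Which form is right depends on what will consume the XML; the list handling is the same in both.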
As a final example, let's grab an RSS feed, extract its information and store it to disk in a different format. We're going to grab the main RSS feed for this blog, which looks like this after being converted to Perl (with some data stripped out for space constraints):
$VAR1 = {
  'version' => '2.0',
  'channel' => {
    'link' => 'http://lethain.com/',
    'lastBuildDate' => 'Thu, 06 Nov 2008 10:48:35 -0600',
    'language' => 'en-us',
    'item' => [
      {
        'link' => 'http://lethain.com/entry/2008/nov/06/you-only-learn-the-first-time/',
        'guid' => 'http://lethain.com/entry/2008/nov/06/you-only-learn-the-first-time/',
        'title' => 'You Only Learn the First Time',
        'pubDate' => 'Thu, 06 Nov 2008 10:48:35 -0600',
        'description' => 'full article'
      }
    ]
  }
}
Based on that we can write our script like this.
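use strict;
use warnings;
use LWP::Simple;
use XML::Simple;
my $xml = XML::Simple->new;
my $uri = 'http://lethain.com/feeds/all/';
# get() returns undef when the request fails, so stop before parsing.
my $feed = get($uri);
die "Could not fetch $uri\n" unless defined $feed;
my $rss_xml = $xml->XMLin($feed);
my @articles = ();
# Keep only the title, date and link of each item.
foreach my $article (@{$rss_xml->{channel}->{item}}) {
    my $entry = {'title' => $article->{title},
                 'date' => $article->{pubDate},
                 'link' => $article->{link}};
    push(@articles, $entry);
}
# \@articles passes a reference to the array, so each entry becomes an <article> element.
my $out = $xml->XMLout({'article' => \@articles}, RootName => 'articles');
print $out;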
The (truncated) output of running the script looks like this:
<articles>
  <article date="Thu, 06 Nov 2008 10:48:35 -0600" link="http://lethain.com/entry/2008/nov/06/you-only-learn-the-first-time/" title="You Only Learn the First Time" />
  <article date="Wed, 05 Nov 2008 08:00:00 -0600" link="http://lethain.com/entry/2008/nov/05/bad-ideas-and-regular-expressions-in-templates/" title="Bad Ideas and Regular Expressions in Templates" />
  <article date="Tue, 04 Nov 2008 10:38:10 -0600" link="http://lethain.com/entry/2008/nov/04/deploying-django-with-fabric/" title="Deploying Django with Fabric" />
  <article date="Mon, 03 Nov 2008 12:51:00 -0600" link="http://lethain.com/entry/2008/nov/03/development-to-deployment-in-django/" title="Development to Deployment in Django" />
  <article date="Wed, 29 Oct 2008 10:11:54 -0600" link="http://lethain.com/entry/2008/oct/29/creating-slideshows-with-cocos2d-iphone/" title="Creating Slideshows with Cocos2d iPhone" />
  <article date="Mon, 27 Oct 2008 10:45:00 -0600" link="http://lethain.com/entry/2008/oct/27/customize-site-style-by-user-with-django-userskins/" title="Customize site style by user with django-userskins" />
</articles>
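One thing to watch for: the loop over $rss_xml->{channel}->{item} assumes item is an array reference, which only holds when the feed has more than one item. With a single item, XMLin returns a plain hash reference there and the @{...} dereference fails. The sketch below shows the difference and one possible fix, XMLin's ForceArray option, using a made-up one-item feed.

```perl
use strict;
use warnings;
use XML::Simple;

# A hypothetical feed with exactly one <item>.
my $feed = <<'XML';
<rss version="2.0">
  <channel>
    <title>demo</title>
    <item><title>Only Post</title><link>http://example.com/only/</link></item>
  </channel>
</rss>
XML

my $xml = XML::Simple->new;

# Default behaviour: a lone <item> is stored as a hash reference.
my $plain = $xml->XMLin($feed);

# ForceArray => ['item'] always wraps <item> in an array reference.
my $forced = $xml->XMLin($feed, ForceArray => ['item']);

print ref($plain->{channel}{item}), "\n";   # HASH
print ref($forced->{channel}{item}), "\n";  # ARRAY

# The same loop now works for one item or many.
foreach my $article (@{ $forced->{channel}{item} }) {
    print "$article->{title}\n";
}
```

Naming only the elements that can repeat keeps the rest of the structure as simple as it was before.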
I'm still very inexperienced and realize I am writing something like Python-in-Perl, but I think it's impressive how concise Perl makes accomplishing these routine tasks, and how quickly CPAN facilitated stringing together a couple of libraries to do a typical task pretty painlessly.
So far in my journey into Perl, my only complaint is that few of the tutorials do a good job of explaining the dereferencing syntax, perhaps because it is sufficiently complex that it merits its own tutorials. It's too bad that Perl doesn't have the sex appeal of Python and Ruby for young programmers, because it's a fun tool. (Although I don't foresee using it for anything beyond fairly simple scripts and services.)