
You are writing a comment about XML::Twig for Large XML Files in Perl, here is a quick summary:

This week I needed to rewrite a Perl script that used XML::Simple for some XML handling. The cause? The file it needed to parse grew from 13k to 80 megs. All of a sudden the in-memory approach wasn't looking so hot.


You are responding to this comment written by mirod on November 26th 2008, 02:08.

Nice explanation, and I am happy that you found the module useful.

I spotted a couple of places where you could simplify the code (maybe you didn't wander around the docs for long enough ;--). There is also more information on the module page.

You can set the handlers directly in the twig_roots option:

  my $roots = { server => 1, proxy => 1 };
  my $handlers = { 'machines/server' => \&print_ip,
                   'machines/proxy'  => \&print_ip };
  my $twig = new XML::Twig(TwigRoots => $roots,
                           TwigHandlers => $handlers);

becomes

  my $handlers = { 'machines/server' => \&print_ip,
                   'machines/proxy'  => \&print_ip };
  my $twig= new XML::Twig(twig_roots => $handlers);
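
For anyone who wants to try the shortened form, here is a minimal sketch of it run end to end. The sample document, the `router` element, and the `collect_ip` handler (which stands in for `print_ip`) are assumptions made up for illustration; purging is left out to keep the focus on the option itself.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input; the layout of the real file is an assumption.
my $xml = <<'XML';
<machines>
  <server><name>web1</name><ip>10.0.0.1</ip></server>
  <proxy><name>px1</name><ip>10.0.0.2</ip></proxy>
  <router><name>rt1</name><ip>10.0.0.3</ip></router>
</machines>
XML

my @ips;

# Handler: receives the twig and the element once the element is fully parsed.
sub collect_ip {
    my ($twig, $ele) = @_;
    push @ips, $ele->field('ip');
    return 1;
}

# The keys of twig_roots select which subtrees get built,
# and the values are the handlers to call for them.
my $twig = XML::Twig->new(
    twig_roots => {
        'machines/server' => \&collect_ip,
        'machines/proxy'  => \&collect_ip,
    },
);
$twig->parse($xml);

print "$_\n" for @ips;
```

Only the `server` and `proxy` elements are built and handed to the handler; the `router` element is not a twig root, so it is skipped.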

You can use the field (or first_child_text, but that's a lot of characters) method:

  my $ip = $ele->first_child('ip')->text;

becomes

  my $ip= $ele->field( 'ip');
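
Beyond saving characters, the two forms behave differently when the child is missing. A small sketch, using a made-up document in which the second `server` has no `ip` child:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: the second server has no <ip> child.
my $twig = XML::Twig->new;
$twig->parse(
    '<machines>'
  . '<server><ip>10.0.0.1</ip></server>'
  . '<server><name>web2</name></server>'
  . '</machines>'
);

my @results;
for my $ele ($twig->root->children('server')) {
    # field returns the text of the first matching child,
    # or an empty string when there is no such child.
    my $ip = $ele->field('ip');

    # first_child returns undef when nothing matches, so chaining
    # ->text directly onto it would die for the second server.
    my $child = $ele->first_child('ip');
    my $long  = defined $child ? $child->text : '';

    push @results, [ $ip, $long ];
}

printf "field: '%s'  first_child/text: '%s'\n", @$_ for @results;
```

So `field` also removes the need for the `defined` check when an element may lack the child.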

I was also thinking that it might be useful to have an autoflush (and an autopurge) option that would flush the twig after every root, so you don't have to do it yourself. It doesn't necessarily save a lot of typing, but it would be a nice, declarative way to express what you want to do with the XML. Would it make sense?
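
To make the idea concrete, here is a rough sketch of how the same effect can be had by hand with a wrapper around the handler. The `with_purge` helper and the sample document are hypothetical and not part of XML::Twig; only `purge`, `field`, and `twig_roots` are the module's own.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper, not part of XML::Twig: wraps a handler so that
# the twig is purged after every call, which is roughly what an
# autopurge option would do.
sub with_purge {
    my ($handler) = @_;
    return sub {
        my ($twig, $ele) = @_;
        $handler->($twig, $ele);
        $twig->purge;    # free everything parsed so far
        return 1;
    };
}

my @ips;
my $twig = XML::Twig->new(
    twig_roots => {
        'machines/server' => with_purge(sub { push @ips, $_[1]->field('ip') }),
    },
);

# Made-up input for illustration.
$twig->parse(
    '<machines>'
  . '<server><ip>10.0.0.1</ip></server>'
  . '<server><ip>10.0.0.4</ip></server>'
  . '</machines>'
);

print "$_\n" for @ips;
```

The handlers themselves then no longer mention memory management at all, which is the declarative part of the proposal.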

