
You are writing a comment about Huge CSV and XML Files in Python, here is a quick summary:

Quick walkthrough of my code for converting a very large CSV file into a very large XML file using the Python standard libraries. Despite a few issues along the way, it was a very pleasant experience.


You are responding to this comment written by Peter Burns on January 22nd 2009, 17:21.

I ended up using the SAX generator code in Python myself when I was doing large file transformations in code_swarm.
yield makes stream-based code pleasant and easy in Python. Plus, as you said, you're much more likely to end up with correct XML with an XML generation library than by doing it by hand.
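
For anyone who wants to see roughly what that looks like, here is a minimal sketch using only the standard library. It is not the code from the post or from code_swarm; the function names and the `rows`/`row` element names are made up for illustration, and it assumes the CSV header names are valid XML element names.

```python
import csv
import io
from xml.sax.saxutils import XMLGenerator


def read_rows(csv_file):
    """Yield one dict per CSV row, so the whole file never sits in memory."""
    for row in csv.DictReader(csv_file):
        yield row


def rows_to_xml(rows, out):
    """Stream rows into `out` as XML; the generator escapes &, < and > for us."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("rows", {})  # hypothetical root element name
    for row in rows:
        gen.startElement("row", {})
        for name, value in row.items():
            # Assumes each column name is a legal XML element name.
            gen.startElement(name, {})
            gen.characters(value)
            gen.endElement(name)
        gen.endElement("row")
    gen.endElement("rows")
    gen.endDocument()


if __name__ == "__main__":
    source = io.StringIO("name,note\nAda,a < b & c\n")
    target = io.StringIO()
    rows_to_xml(read_rows(source), target)
    print(target.getvalue())
```

Because `read_rows` is a generator, only one row is held at a time, which is the whole point for very large files.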

Likewise, I'd suggest strongly against the "\n".join(map(",".join,table)) method of CSV generation. It works fine until there's a delimiter in your data. Please, think of the person that has to parse your data and use a library for generating data files.
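
To make the failure concrete, here is a small sketch comparing the join approach with the standard `csv` module. The sample table and the helper names `to_csv` and `from_csv` are invented for the example.

```python
import csv
import io

# Sample data: one field contains the delimiter, another contains quotes.
table = [["name", "quote"], ["Ada", "one, two"], ["Bob", 'said "hi"']]

# The join approach: no quoting, so "one, two" silently becomes two fields.
naive = "\n".join(map(",".join, table))


def to_csv(rows):
    """Serialize rows with csv.writer, which quotes fields where needed."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(naive.split("\n")[1].split(","))  # the Ada row no longer has two fields
    print(from_csv(to_csv(table)) == table)  # the library output round-trips
```

The library version survives a round trip; the joined version loses the field boundary as soon as a comma shows up in the data.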

Why am I bitching about this? As luck, or perhaps some darker power, would have it, I've been writing a TSV parser*, and it appears as though AppleWorks 6 takes the lazy method and doesn't escape its delimiters, making it actually impossible to deterministically parse its output.

*Why am I not following my own advice and using a library? Apparently no one has ever written a TSV/CSV parser in JavaScript, so I've hacked together some awful code that I should have known to write as a state machine from the beginning.



