You are writing a comment about Huge CSV and XML Files in Python, here is a quick summary:

A quick walkthrough of my code for converting a very large CSV file into a very large XML file using the Python standard libraries. Despite a few issues along the way, it was a very pleasant experience.


You are responding to this comment written by Will Larson on January 22nd 2009, 17:47.

So.. how are you handling the non-deterministic parsing? If you have a list of headers in the first row, then you can identify the number of expected elements per row. So you know any row with more than N unescaped commas is malformed. What then?

  1. Just say to hell with it, and after reaching N-1 columns, force everything else into the last column.
  2. Do the same, but starting at the end. (Why would this be better? I don't think it would be. But I want it to be.)
  3. Start from the front and the back. Take the first (N/2)-1 columns from the front, and the first N/2 columns from the rear, and then make everything in between into one column.
  4. ?????
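
To make the three options concrete, here is a rough sketch of each. It assumes the row has already been split on unescaped commas into a list of strings, and that `n` is the number of headers. The function names are made up for illustration and are not from the original post. For option 3, the front/rear split is computed so that it also works for odd `n`, which is slightly more general than the (N/2)-1 and N/2 rule above.

```python
def merge_tail(fields, n):
    """Option 1: keep the first n-1 fields, force the rest into the last column."""
    if len(fields) <= n:
        return fields
    return fields[:n - 1] + [",".join(fields[n - 1:])]


def merge_head(fields, n):
    """Option 2: keep the last n-1 fields, force the rest into the first column."""
    if len(fields) <= n:
        return fields
    extra = len(fields) - n  # number of surplus fields
    return [",".join(fields[:extra + 1])] + fields[extra + 1:]


def merge_middle(fields, n):
    """Option 3: keep fields from both ends, force the middle into one column."""
    if len(fields) <= n:
        return fields
    front = (n - 1) // 2   # columns kept from the front
    back = n - 1 - front   # columns kept from the rear
    cut = len(fields) - back
    return fields[:front] + [",".join(fields[front:cut])] + fields[cut:]


row = "a,b,c,d,e,f".split(",")  # six fields where four are expected
print(merge_tail(row, 4))
print(merge_head(row, 4))
print(merge_middle(row, 4))
```

None of these can tell which column actually contained the stray comma; each one just picks a different place to put the damage.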

