You are writing a comment about Deployment Scripts With BeautifulSoup, here is a quick summary:

Recently I have been doing a lot of website deployment, along with various repetitive but slightly complex HTML hackery to flesh out simple templates with content stolen from other HTML pages. Although it could have been a bit frustrating, with the help of BeautifulSoup it has been a fun ride.


You are responding to this comment written by Will Larson on June 11th 2008, 05:04.

(Reposting it because I, amazingly, screwed up the tags without thinking about it. It turns out my sin was trying to escape the tags, blah.)

Yeah, it was pretty confusing behavior, to be sure. BeautifulSoup has a policy of dealing with the "most common case", even when that leads to parsing errors. The example in its source code is that it treats <strong> this is <strong> silly </strong> yeah?</strong> incorrectly, turning it into <strong> this is </strong> silly </strong> yeah? </strong>. The first one is valid HTML and the second one is horrifically broken, but most of the time people embedding meaningless tags are doing it by accident, so BS assumes it is an accident.
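
To make that heuristic concrete, here is a minimal sketch built on Python's standard `html.parser` rather than on BeautifulSoup itself. The class `ImplicitCloseParser`, the `NON_NESTABLE` set, and the `reparse` helper are all hypothetical names made up for this illustration. The rule it applies (close the earlier tag when the same tag opens again, and drop any closing tag left without a partner) is an assumption about the general idea, not BeautifulSoup's actual code. It also discards attributes to stay short.

```python
from html.parser import HTMLParser


class ImplicitCloseParser(HTMLParser):
    """Toy parser: reopening a 'non-nestable' tag closes the earlier one."""

    # Hypothetical list for this sketch; not taken from BeautifulSoup.
    NON_NESTABLE = {"strong", "b", "em", "i"}

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.out = []        # pieces of the rebuilt markup

    def handle_starttag(self, tag, attrs):
        # Assume a repeated non-nestable tag means the author forgot to
        # close the previous one, so close it before opening the new one.
        if tag in self.NON_NESTABLE and tag in self.open_tags:
            self._close_through(tag)
        self.open_tags.append(tag)
        self.out.append("<%s>" % tag)  # attributes are dropped in this sketch

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            self._close_through(tag)
        # Otherwise the closing tag has no partner and is silently dropped.

    def handle_data(self, data):
        self.out.append(data)

    def _close_through(self, tag):
        # Pop and close open tags until the matching one has been closed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break


def reparse(markup):
    """Feed markup through the toy parser and return the rebuilt string."""
    parser = ImplicitCloseParser()
    parser.feed(markup)
    parser.close()
    # Close anything the input left open.
    while parser.open_tags:
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)


print(reparse("<strong> this is <strong> silly </strong> yeah?</strong>"))
# prints: <strong> this is </strong><strong> silly </strong> yeah?
```

Under those assumptions the nested example comes out as two sibling `<strong>` elements and the leftover closing tag disappears. The output differs in detail from the broken markup quoted above, but the underlying guess is the same: the second opening tag is treated as a forgotten close.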

They have a few different parsers you can choose from, one of which is "ICantBelieveItsBeautifulSoup", which handles weird HTML better.

I guess I'm defending it because I feel it does a pretty impressive job handling the thousands of lines of code I am pumping through it, with only that one fairly easy-to-fix (albeit annoying) error. Admittedly, what I really want is some kind of pyQuery...

