

Let's Scrape the Web with Python 3 - quakkels
http://codecr.am/blog/post/7/

======
ianb
This is a very poor approach to scraping. SAX parsers aren't good for much of
anything, and they are especially bad for scraping HTML. You'll get lots of
errors while parsing relatively normal pages, and your logic will become very
challenging to follow. There's no good reason not to use a proper parser that
parses into a document, like lxml or BeautifulSoup. lxml is also very fast, so
there's not even a performance argument for SAX.
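
As a rough illustration of the document-parser approach, here is a minimal sketch using BeautifulSoup with Python's built-in `html.parser` backend (lxml could be swapped in as the backend). The sample markup and the `extract_links` helper are made up for this example and are not taken from the linked post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, deliberately sloppy: the <p> and <li> tags are
# never closed, which is common on real-world pages.
HTML = """
<html><body>
<p>Some links:
<ul>
  <li><a href="/first">First</a>
  <li><a href="http://example.com/second">Second</a>
  <li><a name="anchor-only">No href here</a>
</ul>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> element that has an href."""
    # Parse the whole page into a tree first, then query the tree.
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


links = extract_links(HTML)
for text, href in links:
    print(text, href)
```

Because the page is parsed into a tree before anything is extracted, the scraping logic is a single query over that tree instead of state tracked across start-tag and end-tag callbacks.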

------
jbackus
I used to love scraping with the Python, BeautifulSoup, and Mechanize family
of tools, and I wrote a lot of scripts like the one in this post. Lately I've
been using CasperJS[1], though, and I don't think I'll go back.

[1] - <http://casperjs.org/>

