

Lxml: an underappreciated web scraping library - astrec
http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/

======
lunchbox
_When people think about web scraping in Python, they usually think
BeautifulSoup._

When I think about web scraping in Python, I think Mechanize. BeautifulSoup is
great for parsing already-downloaded HTML files, but it doesn't have the same
web-navigating features Mechanize does, such as stateful web browsing and easy
form filling (unless I'm missing something).

~~~
utnick
+1 for Mechanize. I've been doing a lot of scraping in Ruby lately, and the
Ruby Mechanize (I'm assuming it's a port?) is quite nice.

~~~
ianb
Perl's Mechanize was probably the basis for both.

------
EastSmith
Love HN! I've been trying to accomplish "Cleaning up HTML" for some time now,
and lxml seems to have the exact functionality I was looking for :)

------
inovica
We use it extensively in what we do, and it's also incredibly fast (which is
why we started using it).

