
Python: Parsing XML with lxml « The Mouse Vs. The Python - driscollis
http://www.blog.pythonlibrary.org/2010/11/20/python-parsing-xml-with-lxml/
======
sqrt17
lxml is great if you need any XPath, XSLT, or BeautifulSoup HTML parsing,
which libxml does for you. If you're just interested in fast (really fast) XML
parsing, cElementTree is just great. It's even faster than lxml and consumes
considerably less memory; and, like lxml, it supports the .iterparse(...)
interface.

Python 2.5 and later even ship with cElementTree, so it's always preferable to
minidom, and (depending on taste) most of the time, preferable to lxml.
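
For reference, a minimal sketch of the .iterparse(...) style of streaming parse mentioned above. It uses xml.etree.ElementTree, the standard-library module that absorbed cElementTree in Python 3; the sample document, the `entry` tag, and the `iter_entries` helper are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET  # standard-library successor to cElementTree

# Hypothetical sample data; in practice this would be a large file on disk.
SAMPLE = b"<log><entry id='1'>start</entry><entry id='2'>stop</entry></log>"

def iter_entries(source):
    """Yield (id, text) for each <entry> as soon as its end tag is parsed."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            yield elem.get("id"), elem.text
            elem.clear()  # release the element's text, attributes and children

entries = list(iter_entries(io.BytesIO(SAMPLE)))
print(entries)
```

The same loop should work with lxml.etree.iterparse, which is the shared interface the comment refers to.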

~~~
ianb
In my experiments
(http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/) lxml
performed better than cElementTree for memory -- notably
cElementTree creates Python objects for all nodes, while lxml does so only on
demand (though I suppose if you ultimately touch every node in Python then
lxml will use more memory).

Also, lxml preserves namespace prefixes. They are formally not needed to
understand a document, but in practice they make the output nicer to read and
are sometimes required (e.g., RDF).
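
A small sketch of what losing the prefix looks like on the ElementTree side, using only the standard library. The namespace URI and the `my` prefix are made up; lxml, by contrast, keeps the original prefix on each element (exposed as `.prefix`).

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a made-up namespace bound to the prefix "my".
SOURCE = '<my:root xmlns:my="http://example.com/ns"><my:item/></my:root>'

root = ET.fromstring(SOURCE)

# ElementTree stores tags as {uri}local; the "my" prefix is not recorded.
print(root.tag)

# On output, a generated prefix is used unless one was registered
# beforehand with ET.register_namespace().
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```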

Also algorithmically, I find lxml's parent pointer on nodes to be very handy.
I find some things really really hard in ElementTree because you can't
directly get the parent of a node.
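
In lxml the parent is available as `elem.getparent()`. With ElementTree, a common workaround is to build a child-to-parent map up front, roughly as sketched below; the sample document and the `ancestors` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring("<a><b><c/></b><d/></a>")

# Build a child -> parent lookup once; the root element has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """Return the tags from elem's parent up to the root."""
    tags = []
    while elem in parent_of:
        elem = parent_of[elem]
        tags.append(elem.tag)
    return tags

c = root.find("./b/c")
print(parent_of[c].tag)
print(ancestors(c))
```

The map has to be rebuilt if the tree is modified, which is part of why a built-in parent pointer is more convenient.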

