
Show HN: Python bindings to the Servo HTML5 parser, html5ever - tbodt
https://github.com/tbodt/htmlpyever
======
pcwalton
The html5ever parser source [1] is remarkably easy to read, since it uses the
Rust macro system to represent the state transitions declaratively. It also
uses pattern matching to nice effect.

[1]:
[https://github.com/servo/html5ever/blob/master/html5ever/src...](https://github.com/servo/html5ever/blob/master/html5ever/src/tokenizer/mod.rs)

~~~
SwellJoe
I would hope it would be nice to read since one could argue Rust was
_designed_ for the purpose of building Servo. So, if you can't implement Servo
nicely in Rust, it'd be a pretty bad design.

~~~
kibwen
I think you've mistaken what pcwalton was trying to say. This library is
interesting because (via macros) it creates a DSL that attempts to emulate how
the HTML 5 spec is written, in order to more easily verify the correctness of
the implementation. Note that, by dint of much of HTML being an accident of
history, the HTML 5 spec is somewhat imposing; it's not going to be a cakewalk
in any language, and at the same time it's bespoke enough that it would be
silly to design your language to cater to the needs of the HTML 5 spec in
particular. The fact that Rust can create DSLs via macros does help here,
though I wouldn't recommend this approach for anything other than a similarly
extreme case. In fact, I'd say this library has the most extensive macro use of
any production Rust code I've ever seen; it's quite atypical as far as Rust
code goes.

------
edoceo
This one and
[https://news.ycombinator.com/item?id=14588333](https://news.ycombinator.com/item?id=14588333)
should fight.

------
git-pull
This is exciting. It's using Cython [1].

To the author:

How do you feel about binding Python to Rust? Did you use any tutorials?

[1]
[https://github.com/tbodt/htmlpyever/blob/880da57/setup.py#L5](https://github.com/tbodt/htmlpyever/blob/880da57/setup.py#L5)

~~~
gsnedders
Seems to be using lxml's C API for treebuilding; I wonder how that compares
(performance-wise, primarily) to using libxml2 directly and then calling
adoptExternalDocument?

~~~
tbodt
I did not know adoptExternalDocument existed...

~~~
gsnedders
Heh, okay, so it wasn't a deliberate decision!

------
Animats
This sounds useful. It should parse the same way Firefox does.

~~~
Ygg2
Speaking as a contributor to html5ever: it isn't made to parse the same way
Firefox does, only to parse HTML5 correctly. Hopefully the two are the same,
but in practice some Firefox parser errors/behaviors won't be reproduced.

