
HTML parsing in Elixir with leex and yecc - eellson
https://notes.eellson.com/2017/01/22/html-parsing-in-elixir-with-leex-and-yecc/
======
rdtsc
Another great use of leex and yecc -- a SQL parser from folks at Basho:

[https://github.com/basho/riak_ql](https://github.com/basho/riak_ql)

Specifically:

[https://github.com/basho/riak_ql/blob/develop/src/riak_ql_le...](https://github.com/basho/riak_ql/blob/develop/src/riak_ql_lexer.xrl)

[https://github.com/basho/riak_ql/blob/develop/src/riak_ql_pa...](https://github.com/basho/riak_ql/blob/develop/src/riak_ql_parser.yrl)

It is a very concise and well-written piece of software.

------
daurnimator
The best library I've found for this sort of thing is gumbo.
[https://github.com/google/gumbo-parser](https://github.com/google/gumbo-parser)

With its help I've created scrapers and crawlers that digest even the most
disgusting HTML.

------
EE84M3i
Hmm... This seems more like XML parsing to me than HTML parsing - in
particular, there's no handling of (completely valid) omitted end tags.

Definitely interesting though.
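
To illustrate the omitted end tag point: in HTML, `<li>a<li>b` and `<p>c<p>d` are valid, and each new `<li>` or `<p>` implicitly closes the previous one. Here is a minimal sketch in Python (not the article's Elixir code) using the standard library's `html.parser`. The `IMPLIED_CLOSE` table, the `OutlineParser` class, and the `outline` helper are hypothetical names invented for this example, and the table covers only a small, simplified subset of the real rules.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified subset of the HTML rules for implied end
# tags: if the innermost open element is a key, opening any tag in its set
# closes that element first.
IMPLIED_CLOSE = {
    "p": {"p", "ul", "ol", "div"},
    "li": {"li"},
}


class OutlineParser(HTMLParser):
    """Records (depth, tag) pairs while closing <p> and <li> implicitly."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, innermost last
        self.outline = []  # (nesting depth, tag name) for each start tag

    def handle_starttag(self, tag, attrs):
        # Close the innermost element if this start tag implies its end tag.
        if self.stack and tag in IMPLIED_CLOSE.get(self.stack[-1], ()):
            self.stack.pop()
        self.outline.append((len(self.stack), tag))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, dropping anything that was
        # left open inside it. Stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


print(outline("<ul><li>a<li>b</ul><p>c<p>d"))
```

Without the implied-close step in `handle_starttag`, the second `<li>` would be recorded as a child of the first rather than as its sibling, which is the behaviour you would expect from an XML-style grammar that requires every element to be closed explicitly.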

------
andy_ppp
The article mentions Floki, which incidentally just added support for the
servo/html5ever parser, written in Rust.

[https://github.com/hansihe/ex_html5ever](https://github.com/hansihe/ex_html5ever)

Excellent article about creating parsers, though, even if HTML parsing is a
particularly difficult problem.

~~~
vikeri
Floki is a great lib; I used it to write a very basic URL polling CLI tool in
just 72 lines of code:
[https://github.com/vikeri/proba/blob/master/lib/proba.ex](https://github.com/vikeri/proba/blob/master/lib/proba.ex)

------
tannhaeuser
As the author says, this is a toy project to learn Elixir; don't use it in
production, especially not on dynamic/user content.

