Gumbo HTML parser: https://github.com/google/gumbo-parser It's one of the most c...

vaibhavmagarwal · on April 26, 2015

Looks interesting. Does it parse HTML in chunks? (I am actually looking for an HTML5 parsing library in C that does it in chunks.)

nostrademons · on April 29, 2015

No, it reads in a whole string at once and then parses it as a single document. It's actually pretty hard to parse HTML in chunks, because the spec allows content that comes later in the document to alter the parse tree of nodes produced earlier (see, for example, foster parenting or the adoption agency algorithm). You could take a look at Hubbub, which is a callback-based HTML5 parser, but to use it you have to implement a callback interface of 18 or so different functions.

http://www.netsurf-browser.org/projects/hubbub/
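
If the input does arrive in chunks (say, from a socket), one workaround for a whole-string parser is to accumulate the chunks into a single NUL-terminated buffer and parse only once the document is complete. Here is a rough sketch of that; `ChunkBuffer` and the `chunk_buffer_*` helpers are made-up names for illustration and are not part of Gumbo.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Gumbo): a growable buffer that is
 * kept NUL-terminated after every successful append. */
typedef struct {
    char  *data;
    size_t len;   /* bytes stored, not counting the trailing NUL */
    size_t cap;   /* bytes allocated */
} ChunkBuffer;

static void chunk_buffer_init(ChunkBuffer *b) {
    assert(b != NULL);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Append one chunk of n bytes. Returns 0 on success, -1 if allocation
 * fails (in which case the buffer is left unchanged). */
static int chunk_buffer_append(ChunkBuffer *b, const char *chunk, size_t n) {
    assert(b != NULL);
    size_t needed = b->len + n + 1;          /* +1 for the trailing NUL */
    if (needed > b->cap) {
        size_t new_cap = b->cap ? b->cap : 4096;
        while (new_cap < needed) {
            new_cap *= 2;                     /* grow geometrically */
        }
        char *p = realloc(b->data, new_cap);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->cap = new_cap;
    }
    if (n > 0) {
        memcpy(b->data + b->len, chunk, n);
    }
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

static void chunk_buffer_free(ChunkBuffer *b) {
    assert(b != NULL);
    free(b->data);
    chunk_buffer_init(b);
}
```

Once the last chunk has been appended, `b.data` can be handed to Gumbo's `gumbo_parse()`, and the result released later with `gumbo_destroy_output(&kGumboDefaultOptions, output)`. This does not give you incremental parsing, it just moves the buffering to your side, so memory use still grows with the size of the document.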