Does anyone know whether someone is working on a "flexible" HTML/XHTML parser a la BeautifulSoup, Nokogiri, TagSoup, etc.? Node could become very useful as a base for building scrapers if one existed.
I've been trying to model libxml.js after Nokogiri. I wanted to get something built and working first. The next step is to expose libxml2's HTML parser.
Someone else has started working on find-by-CSS a la Nokogiri. I'll merge that into libxml.js when it's ready.
BTW, I'm looking for more help on this project. A new job has diminished the amount of time I can spend on OSS projects.
Why use libxml when JavaScript already has a standard XML API, E4X (ECMAScript for XML), as specified by ECMA-357? At the very least, libxml should use the faster native XML support behind the scenes if it is available.