I haven’t compared them directly, but I’ve also found Mercury Parser (https://github.com/postlight/mercury-parser) to be pretty reliable. Its advantage is that you can pass it a page directly instead of having to give it a DOM.
Since it turns a website into very plain (X)HTML, it’s fairly easy to use it to build a browsing proxy or to automatically produce epub files for e-readers, which is what I do.
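To give an idea of the epub side, here’s a rough sketch (not the code from my gist) of wrapping the parser output in a minimal XHTML page. It assumes the result object has `title` and `content` fields, which is what Mercury’s README shows; `ParsedArticle`, `escapeXml` and `toXhtmlPage` are names I made up for this example.

```typescript
// Subset of the parser result used here (assumed shape; the real result has more fields).
interface ParsedArticle {
  title: string | null;
  content: string | null; // cleaned-up article markup
}

// Escape text so it is safe inside XHTML element content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap the article in a minimal XHTML document.
function toXhtmlPage(article: ParsedArticle): string {
  const title = escapeXml(article.title ?? "Untitled");
  const body = article.content ?? "";
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<html xmlns="http://www.w3.org/1999/xhtml">',
    `<head><title>${title}</title></head>`,
    `<body><h1>${title}</h1>${body}</body>`,
    "</html>",
  ].join("\n");
}
```

With the package installed, something like `Mercury.parse(url).then(toXhtmlPage)` should produce a page you can drop into an epub. The content is inserted as-is, so if a reader complains about malformed XML it may need an extra clean-up pass first.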
Edit: Here’s the proof-of-concept code I use: https://gist.github.com/solarkraft/d6306f17a761fcb5ce47f2be7...
It’s a bit crappy, but it works for me :-)