
I haven’t directly compared them, but I have also found Mercury Parser (https://github.com/postlight/mercury-parser) to be pretty reliable. Its advantage is that you can pass it a page URL directly instead of having to give it a DOM.

Since it turns a website into very plain (X)HTML, it’s fairly easy to use it to build a browsing proxy or automatically produce EPUB files for e-readers, which is what I do.
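
To give a rough idea of the EPUB side, here is a minimal sketch (not the code from my gist) of wrapping the parser's output in an XHTML document that could serve as one chapter. The `ParsedArticle` interface and the `toXhtmlChapter` helper are hypothetical names made up for illustration; I'm only assuming the result has a `title` and a `content` field holding the cleaned-up HTML fragment, and that the fragment is already well-formed enough to drop into XHTML.

```typescript
// Assumed shape of the parser result: only the two fields used here.
interface ParsedArticle {
  title: string;
  content: string; // cleaned-up HTML fragment, assumed to be well-formed
}

// Escape plain text so it is safe inside XML element content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap the fragment in a minimal XHTML document.
// The content is inserted as-is; only the title is escaped.
function toXhtmlChapter(article: ParsedArticle): string {
  const title = escapeXml(article.title);
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<html xmlns="http://www.w3.org/1999/xhtml">',
    `<head><title>${title}</title></head>`,
    `<body><h1>${title}</h1>${article.content}</body>`,
    "</html>",
  ].join("\n");
}
```

A real EPUB still needs the container, manifest and zip packaging around this, so treat it as the easy part only.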

Edit: Here’s the proof-of-concept code I use: https://gist.github.com/solarkraft/d6306f17a761fcb5ce47f2be7...

It’s a bit crappy, but it works for me :-)


