Hey HN!
I originally built “html-to-markdown” back in 2018 (while still in high school) to handle complex HTML conversions where other libraries struggled.
Now I’ve released v2, a complete rewrite designed to handle even more edge cases. It can convert entire websites with high accuracy.
Example use: in my RSS reader, I use it to strip HTML down to clean Markdown, similar to the "Reader Mode" in your browser.
It can be used as a Golang package or as a CLI.
Give it a try & tell me what edge cases you encounter!
You just fetch a URL like `https://r.jina.ai/https://www.asimov.press/p/mitochondria`, and get a markdown document for the "inner" URL.
I've actually used this, and it's not perfect: there are websites (mostly those behind Cloudflare and other such proxies) that it can't handle. But it does 90% of the job, and it's a one-liner in most languages with a decent HTTP requests library.
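
For anyone who wants to try the prefix trick from Go, here is a minimal sketch using only the standard library. The names `readerURL` and `fetchMarkdown` are made up for illustration, and the only thing assumed about the service is what's described above: you prepend `https://r.jina.ai/` to the inner URL and get Markdown back in the response body. Sites behind Cloudflare and similar proxies may still fail, so the status code is checked.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// readerPrefix is the proxy prefix from the example URL above.
const readerPrefix = "https://r.jina.ai/"

// readerURL (hypothetical helper) builds the proxy URL by prepending
// the prefix to the inner URL, unchanged.
func readerURL(inner string) string {
	return readerPrefix + inner
}

// fetchMarkdown (hypothetical helper) fetches the proxy URL and returns
// the response body, which is assumed to be the Markdown document.
func fetchMarkdown(client *http.Client, inner string) (string, error) {
	resp, err := client.Get(readerURL(inner))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Some sites can't be handled, so treat any non-200 status as an error.
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("reader returned %s", resp.Status)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: reader <url>")
		return
	}

	client := &http.Client{Timeout: 30 * time.Second}
	md, err := fetchMarkdown(client, os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(md)
}
```

Run it with something like `go run main.go https://www.asimov.press/p/mitochondria`. The timeout value is arbitrary; adjust it to taste.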