I actually really like my current customized KDE desktop. I have it all set up with transparency everywhere and a fully animated shader desktop wallpaper. Basically the opposite of everything GNOME stands for. :D
Hi snats, great article. You mention the accuracy of the various techniques you used. Could you explain more about how you calculated the accuracy? Were the PDFs already categorized?
I wrote something similar so I could save recipes and web pages for reading offline. If you save as HTML, it will inline images, so you can have a single file. In Markdown, it just creates a link.
It also uses turndown and readability.
It's pretty finicky (readability doesn't always identify the correct content or misses pieces of the content). If you want to charge for it, you'd have to fix some of those edge cases.
Also, I don't think the value in this product is turning web pages into Markdown; there are many free web clippers and archive sites that do this already. I see this as more of an "extra" in a product, like how Evernote has a web clipper built into their note-taking product.
Also, it's cool to see other people care about a stripped down web reading experience too!
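For anyone curious what "inline images so you have a single file" can look like, here is a rough sketch in Python (not the code from my tool). It rewrites each `<img src="...">` to a base64 `data:` URI. The `inline_images` function and the `load` callback are hypothetical names for illustration, and the regex only handles double-quoted `src` attributes; a real tool would want a proper HTML parser.

```python
import base64
import mimetypes
import re

# Matches <img ... src="..."> with a double-quoted src (a simplifying assumption).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(html, load):
    """Replace image URLs with data: URIs.

    `load` is a hypothetical callback: given the src string, it returns the
    image bytes, or None if the image cannot be fetched (src is left as-is).
    """
    def replace(match):
        src = match.group(2)
        if src.startswith("data:"):
            return match.group(0)  # already inlined
        data = load(src)
        if data is None:
            return match.group(0)  # keep the original link
        mime = mimetypes.guess_type(src)[0] or "application/octet-stream"
        encoded = base64.b64encode(data).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{encoded}{match.group(3)}"

    return IMG_SRC.sub(replace, html)
```

The `load` callback could read from disk or fetch over HTTP; keeping it separate makes the rewriting step easy to test on its own.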
Thanks for the post Sagar, I'm the maintainer of reckon [1], a tool to help categorize transactions, which I used in ledger. There are a lot of interesting tools for doing plain text accounting [2] and I'm always interested in learning about new ones.
Reckon uses TF-IDF with cosine similarity, but I would be interested to see how you use Random Forest. Please post your code somewhere, I'd love to see it and learn something new!
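To give a rough idea of the TF-IDF plus cosine similarity approach, here is a minimal Python sketch. This is not Reckon's actual code; the function names, the whitespace tokenizer, the smoothed IDF formula, and the sample account names are all assumptions for illustration. It weights each word in a transaction description by TF-IDF and picks the account of the most similar previously labeled transaction.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting is good enough for a sketch.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (list of sparse TF-IDF vectors, idf table) for the given docs."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF (one common variant, chosen here for illustration).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(description, labeled):
    """labeled: list of (description, account) pairs already categorized.

    Returns the account of the most similar labeled description,
    or None if nothing in the description matches a known word.
    """
    vectors, idf = tfidf_vectors([d for d, _ in labeled])
    counts = Counter(tokenize(description))
    total = sum(counts.values())
    # Words never seen in the labeled data get weight 0.
    query = {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__, default=None)
    if best is None or scores[best] == 0.0:
        return None
    return labeled[best][1]
```

For example, with labeled data like `[("STARBUCKS COFFEE SEATTLE", "Expenses:Coffee"), ("SHELL GAS STATION", "Expenses:Auto:Gas")]`, calling `categorize("starbucks store 1234", labeled)` picks the coffee account, because "starbucks" is the only word shared with any labeled description.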
You can play a minimal web version at: https://throwingbones.com/ben/s30/
Source: https://github.com/benprew/s30 (patches welcome!). Written in Go using Ebitengine.