Hacker News
Ask HN: App for collecting a large, searchable database of website snapshots?
5 points by DamnInteresting on Sept 24, 2020 | 1 comment
I do a lot of research and writing, and it is necessary for me to keep snapshots of the web-based sources I rely upon in my writings. The tool I have used historically is no longer viable, so I am seeking a replacement.

Requirements:

• Support for a very large collection of documents, including the HTML and assets (images, CSS, etc.)

• Full-text search

• Annotations (ideally in context)

• Saves original source URL

Nice to have:

• Data stored locally (not just cloud-based)

• Option to include linked pages in snapshot

• Support for static files such as PDFs

Anything to suggest or recommend? Thanks!




I have started hacking on this problem. It has been on my mind for years, but I began coding on it in earnest this past week.

When I want to save something, the system mints a UUID (for the capture, not the resource) and then copies the web page and its resources into a directory. I am using wget for now, but I suspect I'll need something better.

Then the system runs "readability" and writes RDF metadata to a Turtle file, which could be imported into a triple store or document store.
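For anyone curious, the capture step described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the actual implementation: it assumes wget is on PATH, the names (snapshot, turtle_metadata, ARCHIVE_ROOT) are hypothetical, and it emits Dublin Core Turtle by hand rather than going through a readability or RDF library.

```python
import subprocess
import uuid
from pathlib import Path

ARCHIVE_ROOT = Path("archive")  # hypothetical local storage root

def turtle_metadata(capture_id: str, source_url: str, title: str) -> str:
    """Emit minimal Turtle (RDF) metadata for one capture, keyed by the
    capture UUID and using Dublin Core terms for the source URL and title."""
    return (
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
        f"<urn:uuid:{capture_id}>\n"
        f"    dcterms:source <{source_url}> ;\n"
        f'    dcterms:title "{title}" .\n'
    )

def snapshot(source_url: str, title: str = "") -> Path:
    """Mint a UUID for the capture (not the resource), mirror the page and
    its assets with wget, and write Turtle metadata alongside the files."""
    capture_id = str(uuid.uuid4())
    capture_dir = ARCHIVE_ROOT / capture_id
    capture_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["wget", "--page-requisites", "--convert-links",
         "--directory-prefix", str(capture_dir), source_url],
        check=False,  # wget exits non-zero when some assets fail; keep going
    )
    metadata = turtle_metadata(capture_id, source_url, title)
    (capture_dir / "metadata.ttl").write_text(metadata)
    return capture_dir
```

Keying each capture by its own UUID (rather than by URL) means the same URL can be snapshotted repeatedly without collisions, and the Turtle file keeps the link back to the original source.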

Send a message to the email in my profile and we can talk about it.



