Ask HN: Should we be saving our favorite information locally?
41 points by labrador 7 months ago | 23 comments
Here's a thought, and someone please tell me if I'm wrong, but if you have bookmarks to favorite articles, essays, poems, etc... I recommend you use the browser print function to print them to PDF format and save them locally. Reasons:

1) With the advent of ChatAI people will be googling much less if at all, which will reduce traffic to websites. It may not make economic sense for the websites to stay up, so your favorite essay will go away. It might be saved in the internet archive, or it might not

2) Hard drives are ridiculously cheap. A 10 terabyte hard drive (10,000 gigabytes) is less than $200

3) An AI like ChatGPT is not guaranteed to be trained on your favorite information, so it may be lost to you or hard to find again

4) Soon we will have AI assistants which can be trained on all the PDFs you have saved to produce a highly customized and personal AI tailored to what you like




I have had the habit of saving my favorite stuff locally for a very long time. If it's locally stored, it's always available. If it's online, there's a chance that you'll lose access either temporarily or permanently.


Yes and no.

Instead of PDF, use Markdownload (on iOS, use a Safari extension that converts web content to a Markdown file):

https://github.com/deathau/markdownload

And save in a journaled folder like "YYYY-MM-DD - Page Title.md" with a YAML frontmatter of all available metadata.
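For what it's worth, here is a minimal Python sketch of that naming scheme and frontmatter. The function names and the metadata fields (`title`, `source`, `captured`) are made up for illustration; they are not part of Markdownload or any PKM tool. Values are written as JSON strings, which YAML accepts as double-quoted scalars.

```python
import json
import re
from datetime import date


def note_filename(title: str, day: date) -> str:
    """Build a 'YYYY-MM-DD - Page Title.md' file name."""
    # Replace characters that are not allowed in file names on common filesystems.
    safe = re.sub(r'[\\/:*?"<>|]', "-", title)
    # Collapse runs of whitespace so the name stays readable.
    safe = re.sub(r"\s+", " ", safe).strip()
    return f"{day.isoformat()} - {safe}.md"


def frontmatter(meta: dict) -> str:
    """Render a flat dict as a YAML frontmatter block."""
    lines = ["---"]
    for key, value in meta.items():
        # json.dumps quotes and escapes the value; the result is a valid
        # YAML double-quoted string.
        lines.append(f"{key}: {json.dumps(str(value))}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def build_note(title: str, url: str, body: str, day: date):
    """Return (file name, file contents) for one saved page."""
    meta = {"title": title, "source": url, "captured": day.isoformat()}
    return note_filename(title, day), frontmatter(meta) + "\n" + body
```

Calling `build_note(...)` and writing the returned text to the returned file name inside your notes folder is all there is to it.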

Have this as a folder in your PKM of choice (Obsidian, Foam, whatever).

These days, you can point a text-embedding model at that folder and let it build your own LLM "brain".

But you can also static-site-generate that back into your own web knowledge site or base.

If you don't need it locally, and depending on the capture you want, consider pinboard.in or historio.us:

https://pinboard.in/

https://historio.us/


I figured I'd have to convert PDF or HTML to some format like markdown later, but for some articles I could do as you say. For others with great images inlined I'll stick with PDF until I actually need to convert. I'll check out those sites, but I don't trust that they'll stick around.


Can Markdownload handle things like equations within text?


From the 3.1.0 notes: Added support for MathJax -> LaTeX (thanks @LeLocTai)

So if done with MathJax, yes.


I'm not certain what kind of impact AI will have on the marketplace, but it seems like a good idea to store what you value locally regardless of what happens with AI.

Sites die off. Online web archives aren't reliable or completely trustworthy. And censorship of many forms seems to be occurring more and more.

With storage so cheap, there's little downside to saving what you like.


It's a bit of work, more than just bookmarking it, but I've started doing it. My thinking is that we could see websites going away at a much higher rate now that we have LLMs.


Only if you're a hoarder, or it's career-related and you're meticulous. I find that if I start saving stuff, the drive to collect starts outweighing the value of the collection rapidly.

AI search would be the only reason to. If it saves everything automatically and can query references / make inferences seamlessly, then great. Anything less, and my life eats itself like a snake.


I'm not talking about hoarding, which implies collecting everything, but just your favorite, quality info that you can train an AI on later to produce a highly customized AI


Yes. Even before the AI risks, I have moved everything locally – photos, files, nothing syncs to iCloud/Dropbox/etc. Every once in a while I prepare and order printed photo albums.

The only service I have not yet localized is email, specifically Gmail, which I believe Google is going to monetize/AI-ize in the very near future.


For backups I do both local & cloud. I got a WD 4TB Gold to be my "D drive" as my "Library"

I then back up the "my Docs" and "Portable Apps" folders to Carbonite (my keys, unlimited space). I also dump an Acronis TIB of my C drive to my D drive, so that it is also siphoned to Carbonite.

Using my own keys makes recovery harder: in case of a crash I need to build a vanilla machine, download the whole blob, and go from there.

If I trusted Microsoft (which I don't) I would have a second machine with some 'handover' (which I haven't studied) so I would let MS keep my machines synced. But I don't, and I don't :)


Yes, had the same realization some months ago. I started building a CLI based tool, smaller in scope, offline first and occasionally online.


I've used HTTrack before. It's handy for downloading entire websites. The resulting pile of html can be converted to text or PDF later for processing

https://www.httrack.com/


HTTrack - a name I hadn't heard in 15 years, maybe. I used it back then. However, I want to clarify that the thing I set out to build is not about backing up entire sites; it's more about saving the site URL and some metadata, with a local-first approach and self-hostable.


How do things like these cope with client-side rendered websites?

Man, do I miss back when you could “save as webpage, complete” and generally get a working copy. I saved webpages a lot because after all, you might not always be online in those days!


I have a slightly different take on this: I save the text that I care about, and have some automation set up to archive the source URL of the text to archive.org[1] (which works well enough for me, even if it's not 100% perfect, because I'm only archiving it for the greater context of the highlighted text, which I rarely go back to).

I just got myself an Nvidia 4090, and I'm looking into using local LLMs to feed my data into (I think this is called retrieval augmented generation?) for various assistant-type use cases.
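To make the "retrieval" half of that concrete, here is a rough Python sketch. Big caveat: it ranks notes by plain word-count cosine similarity, which is only a stand-in for a real embedding model, and every name in it (`retrieve`, `build_prompt`, the `notes` dict) is hypothetical. The idea is the same though: find the saved snippets closest to the question and paste them into the prompt you hand to the local LLM.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, notes: dict, k: int = 3) -> list:
    """Return the names of the k notes most similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), name) for name, text in notes.items()]
    scored.sort(reverse=True)
    # Drop notes that share no words with the query at all.
    return [name for score, name in scored[:k] if score > 0]


def build_prompt(query: str, notes: dict, k: int = 3) -> str:
    """Assemble retrieved notes plus the question into one prompt string."""
    context = "\n\n".join(notes[name] for name in retrieve(query, notes, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Swapping `vectorize` and `cosine` for a real embedding model and a vector index is where the GPU would come in.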

I'm particularly excited to potentially be able to go through my saved Kindle highlights for multi-novel sci-fi and fantasy series in order to refresh my memory by clarifying key story beats before continuing with the next book.

[1]: https://lgug2z.com/articles/notado-07-2023-update/


This is an old idea... when it was first proposed by Bush[1], the media to record on was microfilm, not PDF files. It's never been implemented, and forces that believe in "intellectual property" are aligned against doing so. (One of the main features was the ability to dump a selected "trail" through documents to microfilm for others)

You're right, of course. I'd like to see a local proxy that caches everything for at least a month, then automatically keeps stuff referenced or revisited.

[1] https://www.theatlantic.com/magazine/archive/1945/07/as-we-m...
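The retention rule for such a proxy (keep everything for a month, then keep only what was referenced or revisited) could be sketched like this in Python. This is only the pruning logic, not a proxy; `CacheEntry` and its fields are hypothetical, and "a month" is assumed to mean 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "at least a month" is taken as 30 days.
RETENTION = timedelta(days=30)


@dataclass
class CacheEntry:
    url: str
    first_seen: datetime
    visits: int = 1           # how many times the page was fetched
    referenced: bool = False  # e.g. linked from a note or a bookmark


def should_keep(entry: CacheEntry, now: datetime) -> bool:
    """Keep everything inside the window; afterwards keep only used pages."""
    if now - entry.first_seen < RETENTION:
        return True
    return entry.referenced or entry.visits > 1


def prune(entries: list, now: datetime) -> list:
    """Return the entries that survive the retention rule."""
    return [entry for entry in entries if should_keep(entry, now)]
```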


This is such a great idea! I used Squid about 15 years ago, when no suitable AI existed. Looks like it can finally be put to some use :-)


Absolutely. The internet has proven itself to be ephemeral. The only part of it that is guaranteed is now. Content can be silently changed. Posts get deleted. Links break and 404. Images get lost. Sites put up paywalls, or go down entirely if the owners go bankrupt.

If you find something worth saving, save it! And don't forget to back up your stuff!


> It might be saved in the internet archive, or it might not

Anyone can save the current content of any http:/https: URL in the Wayback Machine, so the question is simply whether IA will be around for the time that you care about.

https://web.archive.org/save
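If you want to script that, a minimal Python sketch follows. It assumes the save endpoint still accepts the target URL appended to its path, which is how captures are commonly triggered from a browser; the function names and the User-Agent string are made up, and the archive may rate-limit or reject automated requests.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"


def wayback_save_url(page_url: str) -> str:
    """Build the URL that asks the Wayback Machine to capture page_url."""
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("only http: and https: URLs can be saved")
    return SAVE_ENDPOINT + page_url


def request_snapshot(page_url: str, timeout: float = 60.0) -> int:
    """Send the capture request and return the HTTP status code."""
    request = urllib.request.Request(
        wayback_save_url(page_url),
        headers={"User-Agent": "personal-archiver-sketch"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Running `request_snapshot` over a bookmarks list, with a pause between calls, would cover the "might be saved, or might not" case for the pages you care about.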

> It may not make economic sense for the websites to stay up

So, no more WWW?


The answer is a resounding yes. The corporate cloud is fake. A mirage. A timebomb whose chance of going off nears 100% over time. When it goes off your digital belongings are gone.

The only way to own things is to have copies offline. In three or more geographically distant locations.
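Copies only count if they still match, so it helps to check them now and then. A small Python sketch, assuming each location is reachable as a directory (a mounted drive, a synced folder) holding the same relative paths; the function names and the `minimum=3` default are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(relative_name: str, roots: list) -> dict:
    """Map each backup root to the file's digest, or None if it is missing."""
    result = {}
    for root in roots:
        candidate = Path(root) / relative_name
        result[str(root)] = sha256_of(candidate) if candidate.is_file() else None
    return result


def copies_agree(relative_name: str, roots: list, minimum: int = 3) -> bool:
    """True if at least `minimum` copies exist and all of them are identical."""
    present = [d for d in compare_copies(relative_name, roots).values() if d is not None]
    return len(present) >= minimum and len(set(present)) == 1
```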


I agree. Instapaper (phone app) is a good tool for doing this. But PDFs are probably more “open” in that you know the format and can choose where to put the files. The Internet Archive sometimes saves dead links though.


I keep my stuff on Dropbox and GitHub and it has been working for a while.



