Ask HN: How do you store large amounts of structured data?
3 points by bobblywobbles on Jan 11, 2022 | 4 comments
I'm looking to save large amounts of structured data - think coupons, recipes, ads, prices, etc. I'd like not to have to worry about hard drives, so one thought was to store this on GitHub. However, the value proposition of putting it in a SQL server seems higher.

Have you stored large amounts of data before? How did you end up saving it?




I have a mess of folders with random stuff in them. Don't do that.

However, if you tend towards curation more than I do and you already have categories, it might work for you.

I have a WikidPad personal wiki; in it are various things I've written over the past few decades, including the always-elusive key to my WiFi.

My photos (600 GB) sit in folders like D:\masterarchive\yyyy\yyyymmdd\. I never edit the originals; I always save to a new file name when opening them in GIMP, etc. I have two short Python scripts, today.py and yesterday.py, that make and open a new folder for me, depending on when I transfer the files from my DSLR's SD card. You could have a script do the same for you by reading the metadata from your smartphone photos.
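For illustration, a script like today.py can be little more than this (a rough sketch, not the exact script; the archive root and the Windows-only os.startfile call are assumptions):

    # Rough sketch of a today.py-style helper: create and open today's
    # yyyy\yyyymmdd folder. The archive root and os.startfile (Windows-only)
    # are assumptions for illustration.
    import os
    from datetime import date

    ARCHIVE_ROOT = r"D:\masterarchive"

    def dated_folder(d: date) -> str:
        # e.g. D:\masterarchive\2022\20220111
        return os.path.join(ARCHIVE_ROOT, f"{d:%Y}", f"{d:%Y%m%d}")

    if __name__ == "__main__":
        folder = dated_folder(date.today())
        os.makedirs(folder, exist_ok=True)  # create it if it doesn't exist
        os.startfile(folder)                # open in Explorer (Windows only)

A yesterday.py variant would just pass date.today() - timedelta(days=1) instead.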

A SQL server is a pain in the ass if your data doesn't match the structures you set up initially. It will also always be running, which slows down a laptop's boot time considerably. That said, SQL databases do work well for tightly structured data.
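To make "tightly structured" concrete, here is a minimal sketch using Python's built-in sqlite3 module (a file-based database with no always-running server); the prices table layout is purely an assumption for illustration:

    # Sketch: storing tightly structured data (prices) in SQLite, which lives
    # in a single file and needs no always-running server. The table layout
    # is an assumption, not anything from the thread.
    import sqlite3

    conn = sqlite3.connect("archive.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS prices (
            item        TEXT NOT NULL,
            store       TEXT NOT NULL,
            price_cents INTEGER NOT NULL,
            observed_on TEXT NOT NULL   -- ISO date, e.g. '2022-01-11'
        )
    """)
    conn.execute(
        "INSERT INTO prices (item, store, price_cents, observed_on) VALUES (?, ?, ?, ?)",
        ("coffee", "corner shop", 499, "2022-01-11"),
    )
    conn.commit()

    # Cheapest observed price per item.
    for item, cents in conn.execute(
        "SELECT item, MIN(price_cents) FROM prices GROUP BY item"
    ):
        print(item, cents / 100)
    conn.close()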


GitHub uses hard drives, and I believe SQL servers do as well; you will (should) worry about them by proxy instead of directly. If you believe someone else (GitHub, or whatever cloud) is going to care more about your data on their drives than you do about your data on your drives, well, best of luck :)

Still, it might make sense to use a database to store your documents, depending on your situation: for example, is a receipt a text document or an image?

What access patterns do you expect? Is it write-and-forget, or will you access it once a year, or once an hour? How searchable does it need to be? And what counts as "large amounts" - terabytes? Petabytes?

A nice directory structure could still get you very far, depending on your needs - maybe even one with an accompanying database that serves as a searchable index to find the document (file) you need.
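As a sketch of the "directory structure plus a searchable index" idea (the root path, columns, and search term are assumptions for illustration), in Python with only the standard library:

    # Sketch: index an existing directory tree in SQLite so files can be found
    # by name without browsing folders. Root path and columns are illustrative.
    import os
    import sqlite3

    ROOT = r"D:\masterarchive"  # assumed archive root

    conn = sqlite3.connect("file_index.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS files (
            path  TEXT PRIMARY KEY,
            name  TEXT NOT NULL,
            size  INTEGER NOT NULL,
            mtime REAL NOT NULL
        )
    """)

    # (Re)build the index by walking the tree.
    for dirpath, _dirs, filenames in os.walk(ROOT):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (full, name, st.st_size, st.st_mtime),
            )
    conn.commit()

    # Find documents by name fragment instead of clicking through folders.
    for (path,) in conn.execute("SELECT path FROM files WHERE name LIKE ?", ("%recipe%",)):
        print(path)
    conn.close()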


Define "large". Also define your access pattern.


ArangoDB




