I will keep adding an /archive folder to every PC I own and copy the complete contents of my previous /home/ folder into it, including an endless number of recursive /archive folders.
I will never look at any of those again. But archaeologists in the far future will find my data and it will revolutionize their understanding of our time.
I do look at mine. Rarely, and usually for something special or useful. Sure, I won't ever again need the drivers for that Dell C600 that died 15 years ago. But just seeing that installer brought back fond memories! Randomly looking in there is fun every once in a while.
And with storage still growing decently, why should I delete anything? It's not like physical boxes of photos/etc. that would take up more and more space every year.
Probably not in the case of a popular brand like Dell, but there are retro computing enthusiasts scouring the internet for any trace of a driver for some piece of hardware they have, and it might be in someone's backup files right now. Wouldn't hurt to add an ISO to the Internet Archive if one isn't there.
Last year I was digging through my archive and found my old GOM Player installation folder from around 2011, and got a flashback to when I modded its interface to be Protoss themed for watching GSL streams on it!
Indeed, the emotional value of these sometimes silly things can't be overstated.
When I look back through mine, it tends to be for projects I worked on ages ago, or to find some obscure app or file I know I had a copy of at one point and that has a good chance of sitting somewhere in my labyrinth of old system backups.
Not strictly useful, I guess, but I think there's value in being able to engage with one's old projects more deeply than screenshots allow. It can be a bit of a trip to look at code I wrote 5-10+ years ago.
I find it very pleasant to look through these folders and hope I’ve got everything this time. Cleaning it all up has been on my to-do list for the past 5 years or so :/
I enjoy it, too. But I've got an active plan for cleaning up and managing my old folders: I'm going to wait until a computer can do it for me.
The same plan with regards to tagging my thousands of photos worked out great! Without me needing to do anything else, auto-tagging has gotten good enough that I can find images that had been lost under some IMG_xxxx.JPG filename for years. I'm looking forward to being able to explore my history onion of folders at my leisure soon enough. :)
Getting Vicuna or Alpaca for this could be the best decision for those who want to keep their data.
Could you imagine the space savings you could achieve with a system that constructs a real normalized DuckDB database, with zstd compression and join tables and all, from your big dump of tar.xml.gz files? Automagically converting all of your media to AV1 and Opus to save space and remove any proprietary codec requirements?
Clean collation, with directory choices similar to the Linux filesystem layout?
https://github.com/jjuliano/aifiles seems like one of the best ideas for data organization - just needs some polishing and local-only models
My Downloads folder is one of the few that actually is clean. In fact, it's usually empty. Everything downloaded is downloaded for a reason, and gets moved to the right spot as soon as the download is complete.
Has anyone figured out how to do that most efficiently? You can't have two "old" folders co-existing at the same time. I'm using the rename method but struggle each time: which one is better to rename, the old-old or the new-old?
This should work in PowerShell on Windows or on other systems with a normal shell as long as your downloads folder is called Downloads and you don’t have a folder called old next to Downloads:
    mv Downloads old && mkdir Downloads && mv old Downloads
This is assuming there aren’t any special permissions/attributes on Downloads
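If you want to sidestep the two-"old" naming clash from the question above entirely, a date-stamped rotation is one option. A minimal sketch (folder names are just examples):

    # rotate Downloads into a dated snapshot so old folders never collide
    stamp=$(date +%Y-%m-%d)
    mv ~/Downloads ~/"Downloads-$stamp"
    mkdir ~/Downloads
    mv ~/"Downloads-$stamp" ~/Downloads/   # nest the snapshot, as in the one-liner above

Each rotation leaves a uniquely named snapshot, so there's never an old-old versus new-old to untangle.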
Some systems seem to have weird behavior where a renamed directory is still treated as the "Downloads" directory - I prefer moving the contents of Downloads instead.
Stuff I download for a reason gets “put away” which leaves random things to go look at when I have nothing better to do. I was poking around in there earlier today after I moved some stuff and found all sorts of interesting things to waste a couple hours.
It'd probably be a different story if I tried to be organized, though. It tends to collect "things I need right now and will probably never use again" and PDFs that don't get saved to the Documents folder.
Mine was worth like $9000 when I discovered an old Stellar Lumens crypto wallet with some coins on it that I got during an airdrop back in the day. So hoarding paid off.
But then again - I have the same problem as OP. I have terabytes of photos which I should really cull but I can't get myself to do it because I'm a lazy fuck.
I recently went through 35 years' worth of cassette tapes to pull off old songs I had written as a teenager and songs I played with bands I was in over the years. I'm so glad I saved it all to digitize. (And glad I finally got around to it.) I found one song of a band practice from before our band had a drummer or bass player. It was just me on keys, the guitarist and the singer. The guitarist is now dead. We never performed that song live anywhere as we ended up cutting it from our set because it was just a little outside of our style. I was able to bring it into Logic, clean up the tape noise and add a drum track, just to see what it would have sounded like. It lets me relive a moment of my life in a way that just a simple memory couldn't. I also found recordings of shows we played. So glad I didn't throw any of it out. (And yes, some of it was terrible. I didn't digitize those parts.)
I think this would make for a great blog post. I'm sure many would be interested in more details around this - not just the technical details of how you go about digitizing 35 years of cassette tapes, but any other details you found significant about this project.
If you feel like it: stick it on YouTube or some other public host so that it is accessible to the rest of the world. No matter how rough, someone will love it. You already did all the hard work anyway!
> I ended up deciding to get a small, simple external Thunderbolt SSD enclosure.
See, this is the problem with the whole minimalism/decluttering/etc. movements: half the time they are so entrenched in consumerism that they authentically believe that buying this new thing will solve their clutter problem. The author goes on to buy a new cloud storage service that is "a backup power tool for advanced users", and spends all this time doing all this migrating and reorganizing activity that seems an awful lot like work.
People in this trap will talk about how minimalism and decluttering are about making their lives easier and simpler, but that's not what I'm seeing happening. This seems harder and more complicated. I don't want to have to be an "advanced user".
I fell for this for a while myself, but no longer. I want to be a dumb user. There are other things I want, like security and privacy, and control of my own data - and to his credit the author does mention that - but if that's your goal, stay focused on that, not this not-quite-sensible form of decluttering I keep seeing popping up.
From the outside it looks like yak shaving, if you ask me. It's easy to create the false sense that you are being productive / cleaning stuff up, but... are you really?
I get your point, but I'd still argue that SSD+Arq is a lot simpler than any NAS setup. Whether the migration is worth it is up to personal preference, I guess, but there's certainly a risk of spending more time and money in the end.
This was my first thought when I finished reading the article. So he talks about terabytes on terabytes, and how much work all that was... then goes and buys terabytes of SSD enclosure?!
I work with photos and videos and cannot comprehend downsizing to 2TB. Or even finding time to go through and selectively cull old footage. I shot a simple one-day drone-only vineyard job last week that generated 140GB, not including graded renders and photo edits afterwards.
On any given day, I will have 5-7 external drives connected, plus a NAS box. There are another 15+ drives (2-5TB) in the drawer beside me. SSDs as drives I actively work from and platter drives for backups or rendered footage.
I feel like the secret is nailing the workflow at ingest/render, because it's painful trying to go through it all en masse a year later.
Edit: I'll add that I think one problem is that when shooting with a drone, there's less to cull. Everything is in focus. A high percentage of the photos are usable in media libraries for the client, and about 95% of all video shot is. My wife is a photographer and far fewer of her shots make the cut because of focus, or a facial expression, etc.
If you are doing it for a client (?) then you just need to back it up and charge them an appropriate fee. There is not much cognitive load as to whether it is worth keeping or not, as the client decides.
With hobby stuff, and especially family photos, it becomes hard to decide whether to keep it all, spend time curating it, or maybe downsizing it. If you can get all your memories in 2TB or less, it makes backup management way easier in terms of disks and the time taken to back it all up.
I should probably just add a 10% line item for drives and archiving.
But also a good amount is speculative - shoot a location and then try to sell the content to various parties. Sometimes sell stuff a year or more after shooting it.
Keeping one copy isn’t onerous or expensive, but the mental baggage of shuffling around multiple copies gets a bit much.
I don't think this "I [...] cannot comprehend downsizing to 2TB" in the top-level comment is really a useful comparison to the article (or rest of the thread) if you're talking about data that is from clients; not your own. Of course you can't "comprehend" that if it's not-very-compressible sensor data that you need to keep on someone else's behalf or for your business (the speculative part, that's a business investment).
Fair point, but my main initial comparison was of photographer versus photographer/videographer in the clutter/hoarding sense. I would have a lot more than 2TB that is just personal holiday content over the years. The video content from one camera on one holiday last year is 290GB and I wouldn't want to cull much of it because unlike a traditional camera on the ground, there's nothing out of focus, not 10+ shots trying to get everyone unblinking, etc.
Here is an idea: have a "forever" directory and a "for 3 years" directory. Stuff lands in the "for 3 years" directory by default and gets removed automatically.
"Forever" is only for stuff you personally think it's exceptionally well made.
Why do you keep 5-7 external drives connected at all times instead of a NAS with all that storage, RAID5/6 or equivalent, and a 10 Gbps network interface? Also, the 2-5TB drives in the drawer look like a pretty big risk of failure; the size suggests they are all old drives. I would rather build a NAS with 12TB drives (cost effective; the largest sizes are quite expensive) and plenty of redundancy, maybe with a 0.5TB NVMe or SATA SSD as cache for better speed. The only problem with such a setup is speed: unless you go for multiple RAID5/6 arrays the performance will be quite limited.
Because I work with those drives from multiple locations and don't want to cart around the NAS box (which I use as an extra backup instead). Home, office, location, while away, etc. Current method is survivable, it's just a lot to keep track of. Maybe there's an alternative I'm not considering though.
The 2-4 connected SSDs are usually projects I'm actively working intensely on that day or week, or they're annual Lightroom drives where I still need to access 2021-23 pretty regularly.
The 2-3 platter drives have slightly older projects, or edited/rendered files that I need to regularly send/shuffle/folio/etc but that don't need to be quite as fast. Or they're drives I'm assembling to post out.
I know this is talking about large files like photos and videos.
And I get the whole "storage is cheap" line of thinking.
But there's actually a different problem outside of photos, etc. Simply put: index pollution.
I have lots of little projects on my machine. I have lots of open source code on my machine. And, while I can't think of any specific examples, there are areas of search that, dare I accidentally trip into that hole, are just filled with detritus and garbage.
My Spotlight is gorged with false positives for some terms. And I know that I have searched for things that I know I have but been unable to locate, or at least certainly not easily, because I was searching "wrong".
Indexing provides lightning fast access to everything I don’t want to see.
Mind you, there's not much I plan to do about it. I guess I can take some "never again" stuff, put it on an external drive, and tell Spotlight to ignore it.
That said, my phone is also full. About 2 GB free. Mostly photos and videos. Mostly my cats. Solution is simple. Need a bigger phone.
You could swap to Windows. The search there never finds anything, not even files that you know exist in the root of the directory you are searching inside.
On my previous laptop, Windows 10 search was totally broken. For several years and despite me occasionally spending an hour trawling for fixes.
On my new laptop, which I bought a year ago, still running Windows 10, it has always worked perfectly. Not sure if MS fixed something, or some kind of indexing error happened on my old laptop.
It would be nice if you could somehow deprioritize folders so that the files within them don't show up in the top N results in Spotlight, because yeah, it's probably not useful for it to present source files from that one sprawling project I forked and tinkered with a couple years ago.
When I’m done with a project, I’ve taken to packing up these kinds of files into a disk image. That way, I can access them if I need to, but they don’t show up in search. There’s a risk that the image gets corrupted or isn’t readable on some future OS, but since these aren’t mission critical files, I’m not too worried about that.
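On macOS this is a one-liner with hdiutil (the paths here are just examples):

    # pack a finished project into a compressed, read-only disk image
    hdiutil create -srcfolder ~/projects/old-project -format UDZO old-project.dmg
    # mount it again later when needed
    hdiutil attach old-project.dmg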
Not sure how these NVMe enclosures are supposed to work. The problem is right in the picture: Samsung SSD blah blah, 3.3V/3.3A = 11W, and that's just for the SSD itself; the controller chip in the enclosure will also consume a lot of power at the advertised speeds. So you're looking at, say, 13W or so peak power consumption for the whole deal.
A normal USB 3.0 port can do 4.5W. So sticking that contraption into such a port will not work reliably at all. A specially marked USB 3.0 port can deliver maybe 7.5W. Type-C ports, who knows - depends on what the port advertises via pull-up resistors. It may be 4.5W, 7.5W or 15W. You never know.
Nice in theory, but none of the USB NVMe adapters that I bought work reliably in any port for long-term use.
They certainly don't work in any low-power devices, like various ARM SBC USB ports, because most of those use current-limiting power switches for USB ports.
And they don't work reliably long-term in my workstation either, for whatever reason.
I guess you really have to be lucky to have a 15W Type-C data port. Funny how often these sell with Type-C <-> USB-A cables, though.
The thing in the picture won't work at all if you plug it into a USB 3.0 port, so that problem solves itself. I believe Thunderbolt ports are required to provide a minimum of 15W. But overall you might be right: there is no end-to-end negotiation that would avoid a worst-case brownout if you had a very power-hungry SSD. The SSD in the article has a high-power active mode of 7.5W max (I am not sure why it says 3.3V*3.3A on the sticker[1]) and the TB controller in the enclosure claims 1.2W, so in this case it seems fine.
It's possible to configure NVMe devices with power caps, but the 970 Evo in the article loses 95% of its performance if you cap it to 3.6W, which is its lowest operational power state.
1: Edited to say this is because mine is a 970 Evo Plus; apparently the original really did draw that much current.
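For reference, on Linux you can inspect and pin NVMe power states with nvme-cli. A sketch, where the device path and state number are examples, and feature 0x02 is the NVMe Power Management feature:

    # the drive's power state table (max watts per state) is printed near the end
    sudo nvme id-ctrl /dev/nvme0
    # pin the drive to a lower operational power state
    sudo nvme set-feature /dev/nvme0 -f 0x02 -v 4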
> It's possible to configure NVMe devices with power caps, but the 970 Evo in the article loses 95% of its performance if you cap it to 3.6W, which is its lowest operational power state.
Heh, yeah, I saw about 4 MiB/s max read speed with a similar enclosure plugged into a USB 3.0 port (I guess it self-limited for a while). Meanwhile a much less power-hungry SATA SSD + USB-SATA bridge worked at 400 MiB/s there.
I don't know why the Samsung is so bad at low power. A WD SN850X in the same machine has no noticeable change in random read performance with a low power state configured. It's half the speed of the Samsung at full power, but 10x as fast at low power.
Absolutely, some drives won't qualify, especially PCIe 5. But many will be fine. Also, PCIe 3 x4 is 4 GB/s or 32 Gbps, which isn't that far from saturating USB4 anyway, so it's not like we can even get drives running full tilt over USB at present. Many of these controllers probably top out at PCIe 3 or even less, if they don't support 40Gbps (most cheap-ish ones won't).
Hopefully we start seeing some USB-PD capable systems & drives. I want to plug 2- or 4-drive 3.5" enterprise drive arrays into my laptop and have it just work. Even just a single 3.5" drive's worth of power would be so helpful. And it'd mean you could charge your phone properly too, which is probably the more common everyday ask.
5W is still a lot if you add controller power requirements and 5V->3.3V DC-DC conversion losses.
Also, I have one of those Power-Z Type-C <-> Type-C power meters, and even with a less power-hungry Western Digital NVMe drive, it didn't fit into 4.5W. And it consumed the most power not during the initial write test, but during the minute after the kernel reported sync() success - that is, after everything was supposedly written to the device and no power-hungry activity was needed. Then it jumped to 7-8W.
I guess these DRAM-less NVMe drives do some reshuffling from faster flash to slower flash locations during idle time. Which is probably terrible for portable use, where you may want to sync() and then unplug the device once the OS tells you everything is synced up. Creating regular power-loss situations during these reshufflings just feels like putting too much trust into NVMe firmware, IMO. :D
Regarding Arq backup: if you are worried about using a proprietary (encrypted) and closed-source backup format in case the company were to go under, they have an open-source command-line restore tool.
I always find it interesting to compare 4TB/month pricing against buying N 4TB hard drives yourself, where N is the desired redundancy. Typically, you could buy a fresh set of hard drives 2-3x per year for the "standard" (they don't even call it "premium") storage tier. A normal drive lasts 5 years: go figure. (The markup was about double ten years ago, so it has gotten more competitive already.)
Of course, that's not entirely apples to apples because they save you labor time, but I find it an interesting baseline comparison, also because their labor is divided over a hundred thousand customers and approximates to zero per customer.
Another thing people tend to forget is that it's a backup copy, not your only copy. You don't need the premium storage if you keep the original copy around anyway, and the odds of 3 unrelated drives (1 at home, 2 off-site) dying at the same time are probably lower than you getting into a car crash this year. I did have 2 die at the same time: same make and model, nearly identical serial numbers, and, surprise: same crash date and crash behavior (few-KB/s sequential read speeds for a while, but strangely no data was corrupted, before crashing entirely).
>I did have 2 die at the same time: same make and model, nearly identical serial numbers, surprise: same crash date and crash behavior
Out of curiosity, were those Seagate drives?
I bought a couple of Seagate 3TB (spinning rust) SAS drives some years ago, and they both died within a couple weeks of each other after only a few months.
FTR I tried looking it up in chat histories and found the month in which it must have happened, but didn't spot any messages of mine that mentioned the brand :(
I also had two crash almost at the same time, both Seagate with similar serials. This was about 20 years ago though; now I don't buy more than one hard drive at a time.
What's the best open source file and backup management tool that can upload to AWS and GCP cloud storage, with integrity checking and pre-upload encryption? I don't want to start writing one only to discover a powerful thing that already exists in the OSS community and is trusted.
I use Restic [0] for my personal backups, with Backblaze for the backend, but AWS S3 and anything S3-compatible (which Backblaze is too) is also an option. I pre-encrypt all my data and use pass for managing my encryption password and the secrets.
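For S3 it looks roughly like this; a sketch where the bucket name and paths are placeholders. Restic encrypts client-side by default, and "check" verifies repository integrity:

    export AWS_ACCESS_KEY_ID=...
    export AWS_SECRET_ACCESS_KEY=...
    export RESTIC_PASSWORD=...   # repository encryption password
    restic -r s3:s3.amazonaws.com/my-backup-bucket init     # one-time setup
    restic -r s3:s3.amazonaws.com/my-backup-bucket backup ~/photos
    restic -r s3:s3.amazonaws.com/my-backup-bucket check    # integrity check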
I work with a vast collection of unprocessed photographs, which takes up about 6 terabytes of storage space on my 4-bay network-attached storage (NAS) device - a DS418. Despite being limited to 1 GbE speeds, the process of loading each of the raw images, which are around 60-70 MB in size, only takes half a second. I occasionally use the smb multichannel flag to improve my bandwidth when connecting to my Mac through two gigabit USB-C Ethernet adapters, but it doesn't make much of a difference. Editing over Wi-Fi using my Wi-Fi 6 connection, which usually runs at 700 Mbit/s, is also not an issue. Although having a 10 GbE connection would be nice, it is not a priority since the NAS hard drives' speed would become the limiting factor before reaching that level, and the cost of upgrading the hardware to support it is prohibitively expensive. Adding a solid-state drive (SSD) cache to my NAS could be beneficial, particularly when using a 10 GbE connection. Synology provides 10 GbE add-on cards for a small portion of their stations.
For 10 Gbps equipment, look at Mikrotik switches: a 4-port SFP+ one was around $130 when I bought it, and the PCIe NICs were $50 apiece. I used DAC and AOC instead of UTP cables - they were $20-$50 each depending on the length. I paid a bit over $600 for the whole thing (6 SFP+ ports in total on 2 switches).
I use TrueNAS and the cache works only for reads, it helps with navigation of the folder structure (tens of thousands of photos, for example), but not with writes and not with most reads. Some other solutions can do write caching on SSD, that may improve the performance over 10 Gbps network a lot, but for 1 Gbps cache is not that useful.
Everyone’s prohibitive level is different, but 10GbE connectivity has come down a lot in the last 5 years and is (IMO) reasonable for home backbone and select machine connectivity for techies now.
I went 10Gb for that and 2.5Gb to the kids’ desktops (with a 10Gb uplink) and I think the entire network setup was well under $1000 for 2 servers and 3 computers (1 10Gb, 2 2.5Gb).
The switches and fiber link will be in service for 10 years in all likelihood, making it feel pretty reasonable to me.
" I had gigabit ethernet but that doesn’t matter, these spinning disks are barely faster than 100MB/s for sequential read even with a RAID setup."
Umm, doesn't gigabit ethernet basically give you a max of 125MB/s anyway? And I was seeing sequential read transfer rates from 7200rpm SATA drives greater than that even 8 years ago...
10GbE NAS units are starting to become more available in the prosumer range. Once they are more affordable, I wonder whether they come reasonably close to saturating the SATA III limit.
In my 20s I got rid of some stuff on purpose, and got rid of some by accident. It's a real loss. I LOVE to time travel through all my old files, and it's a shame I don't have the first years of programs I wrote, a bunch of emails, all my poker hands, etc.
I'm still looking for my wallet.dat in any of my numerous old backups, hoping it still has some early Bitcoin I mined when it was still novel and Bitcoin faucets were a freely accessible thing.
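For anyone on the same hunt: the flat-file case is at least a quick scan (the mount point is an example; anything buried inside old archive or backup formats still needs extracting first):

    find /mnt/old-backups -type f -iname 'wallet.dat' 2>/dev/null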
What about "tree shaking" the stuff you don't care about anymore?
It's not easy to know what to delete on the present day, but some years later it is. I've never regretted having deleted anything!
What about deleting the worse photos from that hike 10 years ago? If you don't like them now, most likely you never will... and nobody wants to go back through hundreds of photos from that hike. Not even you.
I'm the opposite. I am selective, but it's work to decide what's worth keeping, especially if you are prolific with your camera. However, two years later I have more emotional distance and have grown as a photographer, and it's easier to discard swaths. Bulk storage plus time makes it easier.
It's the same approach I take with tax documents. Roughly group by year, yes, but why carefully select which ones I have to keep now, when I can just throw all of them out in a few more years?
While it's admirable to hoard everything, and future historians (read: most likely inquisitive relatives) might be lucky enough to inherit one of your archives that hasn't succumbed to bit rot (which, even with proper storage, may be well under 10 years), don't overlook the relatively simple way to maximize the chances of your images being enjoyed by people in the future and today: make prints!
The best way I've found is to make photo books. Most companies use print technology that lasts upwards of 200 years https://your-digital-life.com/long-will-photo-books-last/. Print a few books, give them to a few relatives, and you can be assured that the best of your photos will be viewed for decades to come. This way you can share your best while still hoarding that archive of every. last. photo.
Every time I cleaned up, I regretted it a few weeks later, so I stopped doing that. Storage is cheap. Keeping storage alive is not cheap, however, so just power down the drives and maybe spin them up once every 6 months or so, so the bearings don't seize if it's spinning rust.
I almost never delete things. It's a form of memory and I've definitely found it possible to go back through photos, emails and journal entries to remember what I was doing at that point. Memory is linked so it only takes a few sparks to put you back in context.
In addition, it's increasingly looking like LLMs will need inordinate amounts of data to get to the next level. If we want this to be personalized, that will certainly mean slurping our digital lives into a format for training (let's assume you can do this locally and maintain privacy).
I'm looking forward to using these capabilities to enhance my memory, and indeed one day merging with those capabilities [h+]
I guess that's one way of looking at it. You still want to remember the important things though. Memories are funny in that they can be highly contextual. So a picture you took in Copenhagen, however bad/amateur the actual picture can be a hook for all the great times you had in Copenhagen.
It's one of the interesting side-effects of being a digital nomad. When you're in one place and doing the same thing every day it tends to get compressed and discarded. When you're constantly changing things and experiencing friction you don't drop into that 'automatic' mode in the same way. I feel like I'm constantly being challenged.
I've traveled a lot, both professionally and privately and there are some 'snapshots' that I have where I am 100% sure I was there but I haven't a clue where or when it was. It's the weirdest thing. One for instance is a series of grain storage silos, one after the other and they're all falling apart. No idea where it was or in what year that was other than that it must have been while I was living in Canada. But it could have been anywhere in the United States or Canada. I'd have to backtrack all of my trips to find it again, and yet, it's clear as day, so vivid.
Bad take alert. I have just over thirty years of my own photos (scans too) and a lifetime of my parents photos to go through.
Time has the quality of making the ordinary extraordinary and the race to purge one thing to make way for another can be passed on as a task to the next generation.
Frankly, most of the problem could be fixed by screening what you put into long-term storage. If a photo looks bad now, it won't look any better in 10 years!
No, that's exactly the wrong take in a different way. A photograph may have intrinsic qualities aside from technical ones, and time will change how you or others feel about the picture.
And as for the technical issues on top of that: technology has certainly allowed me to extract more from older photos. I suspect film scanning tech has topped out, but the processing on the computer side has improved slightly, and I'm able to eke pretty much all the useful information there is out of a negative or transparency. RAW processors too have improved a lot in ten years: I revisited a photo story from 2009 and pulled out a few previously unseen pictures, and 2022 Capture One smoked the Lightroom 5 processor. The camera (a Canon 5DII) has been superseded by 2 or 3 generations, but you could not tell.
Combine technological advances with a picture of something or someone that might not be around in ten years, and you'll be glad to have it. Maybe start culling pictures from 20 years ago now instead? :D
I bias towards keep rather than discard both because cost is low and because the tools are getting better. About 20 years ago I even realized I don't need to discard the blurry pictures because someday they might be recoverable or otherwise useful (that day is probably here).
And as personal (local) search gets better with smarter systems it may be that future codebases will surface interesting insights or memories of some long ago event or activity as it can do with photos today.
However, when my parents pass on I will discard all the landscape photos and all the photos of long-dead relatives I only met as a kid and don't even remember. Some of those are quite meaningful to my mum and dad, but are meaningless to me.
Currently sitting on 250GB of RAWs since October; at some point that will hopefully become 40GB or less. The last time this happened I batch-converted everything to Adobe DNG, but support for that isn't great outside Lightroom. Is there some format with good compatibility somewhere between DNG/RAW and JPG that preserves dynamic range and white balance information? I think exposure and WB correction is pretty much the only functionality worth fighting to preserve. Keeping RAWs for long-term personal archival seems silly.
Hoarding terabytes is sort of a thing you can never get your head around unless you are stealing MP3s.
But hoarding gigabytes is pretty easy. Also, I think the term "hoarding" is pretty loaded language in this context? Saving things so they don't get lost is the way to go.
Just as an example: after Steve Jobs died, Apple went into everyone's email at .me and deleted their emails between them and SJ.
> It's when you start looking for 4TB SSDs that the prices go up considerably
That's not my experience, having recently built a PC for myself. The SN850X 4TB was about 2x the 2TB price. (Depending on the day, it was -15% to +20% from the linear price, but was usually 3-5% higher than linear.)
I didn't see a reason to go small just to put off a couple hundred bucks of spending.
I asked GPT-4 to write me a Rust program that recurses over all files on my external drives, generates an xxHash of each file, then inserts it into an RDF graph.
That way I can scan for any files I don't have centralized before wiping each older drive, and query for specific cases.
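The scan-and-compare step can also be done in plain shell; a sketch, assuming the xxhsum tool from the xxHash project is installed and the paths are examples:

    # build a hash manifest per drive
    find /mnt/old-drive -type f -print0 | xargs -0 xxhsum > old-drive.xxh
    find /mnt/main-archive -type f -print0 | xargs -0 xxhsum > main.xxh
    # list hashes present on the old drive but missing from the main archive
    cut -d' ' -f1 old-drive.xxh | sort -u > old.hashes
    cut -d' ' -f1 main.xxh | sort -u > main.hashes
    comm -23 old.hashes main.hashes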
It's also nice to recurse over the old, unimportant video files and re-encode them with ffmpeg to VP9 or AV1 plus Opus, so they don't take up as much space. Get rid of those raw video recordings and Xvid/DivX-codec videos.
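Per file that's something like the following; a sketch assuming an ffmpeg build with SVT-AV1 and Opus support, with quality settings that are only a starting point:

    # re-encode to AV1 video + Opus audio in an MKV container
    ffmpeg -i input.avi -c:v libsvtav1 -crf 35 -preset 6 -c:a libopus -b:a 96k output.mkv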
Recompress all your old .7z, .zip, .bz2 and .tar.gz archives into zstd-compressed files. That can also save a ton of space.
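For the .tar.gz case the recompression is just a pipe; a sketch that only deletes the original after the new archive passes an integrity test:

    for f in *.tar.gz; do
      zcat "$f" | zstd -19 -T0 -o "${f%.gz}.zst" &&
        zstd -t "${f%.gz}.zst" && rm "$f"
    done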
Buy manufacturer-recertified HDDs; you can get 18TB for $190 or so. Buy a few and put them in a TrueNAS server. Hoard all the data you want, forever.
I don't understand this idea of making an RDF graph - what value does the RDF provide? Is there a way to view a diff in the terminal that makes it faster / more intuitive? I'm genuinely interested in doing this, or something like it, for drives that are slightly out of sync with each other (yes I know, it's dumb, but it happened).
RDF is just an easy way to put everything into a graph, since file systems are stored as a hierarchy. You can use Neo4j or whichever graph database; it doesn't have to be RDF. From there you can compare hashes that are in one file system but not in another, and query for constraints like largest files, folders with the most files, most recent files, etc.
If your file systems are slightly out of sync, try out meld for an easy way to diff.
I took up bird photography about 3 years ago. I have 400,000+ RAW photos in Lightroom on over a dozen 2 TB SSDs. It's not so much hoarding as it just takes effort to go through and decide what to keep and what to delete. I shoot bursts so it adds up really fast. I could probably get by with 10-20% of the storage if I could just keep up with the pruning.
I did eventually throw out my floppy disks, and a couple of years ago I finally scrapped my Windows 98 PC after transferring the files off its hard drive.
I still have data going back to around '95 though, and I like that. Pictures and chat logs from way back when I first met my wife. Old games I played as a teen. Some of the code I wrote way back when I was first learning, and some of my first open source contributions.
I did lose a lot along the way, though. Sometimes I wish I could still see some of that earliest stuff. And things that have been lost on the ephemeral 'net - old BBS discussions, usenet topics, my teenage livejournal, etc. that I didn't copy and save.
The tough part is that it gets hard to manage.
While I do pull forward the most important stuff each time I get a new PC, a lot of the rest is sitting in old backup formats from various backup software that I might not even be able to restore anymore. It's always nice when I think of something I used to have and can go dig it out of one of those backups, but I don't know that I will always be able to.
And it can be really hard to find something. I've tried various organizational schemes over the years, but that just means that some things are organized one way and some are organized another, and it's even more difficult to find things. I suppose I should go through and re-organize everything into one consistent standard structure, but that's what I've always done before and it just becomes like the XKCD about adding yet another standard.
Anyway, the digital clutter still works well enough for me. It's sort of a cozy old home filled with sentimental things instead of a sterile empty monastic cell.
I've had the same folder system for nearly a decade, which is a while for me, as I am young. The four main folders are:
-Activity: social planning, ideas, games, holiday stuff, etc. in respective subfolders.
-Education: grade school, higher education, certificates, trainings, religious notes, etc., also organized by topic/semester as applicable.
-Management: responsibilities, work, taxes, health, family, housing, and all other boring but needful things.
-Media: mostly pictures and videos, organized by year. Some other stuff like journal entries.
I use the same system in my email and really any program that allows compartmentalization. I also keep other folders empty, like downloads, so I never have to guess where a file could be.
"I just wish Google Photos wasn't owned by Google. I'm trying to reduce my reliance on consumer-oriented Big Tech products" after discussing Amazon's S3 & Glacier products seems odd.
I recently nuked about 10TB of data and decommissioned my Unraid server. I do have a disk on a shelf with the contents of that data. If I do need something from it I can dig it out, but otherwise it's there to rot.
Am I the only one who hates having to keep anything? As a kid I would go out of my way to not be in photos, and as an adult it's furniture and appliances.
If it's important it's probably somewhere in my email, but wanting to deliberately store personal data locally or in the cloud makes my eyes roll. Who cares?