I remember looking for some saved emails on Gmail once, only to realize that at some point, emails I had tagged as Saved were simply gone: everything beyond a particular point in time had vanished one day.
If you don't own your bytes, someone else does, and they can and do go missing. Everyone has to buy some storage sometime. It's super naive to think companies and organizations are "professional" and don't just straight up lose things. It's just life.
Did you know GitHub has an error page for when they cannot retrieve a repository from their database? Yeah, your emails can just go missing. Your repos can just go missing.
I shudder to think someone's deed to their home somewhere out here in the states has at some point in time just straight up gone missing, but I would not be surprised.
There's an accredited university in New York that's closing soon. All of those people's degrees: how will you verify you actually graduated from there? I guess you had better have kept copies of your transcripts and your physical diploma.
First: Friendly reminder to everyone to back up your data and then back up your backups to a separate location.
Second: I’m the person in my family who wakes up in a cold sweat worrying if our photos are backed up properly, or my music collection. I’ve got a handful of backups at the house that I double check periodically to make sure they haven’t decayed, and some stuff backed up to some cloud services. I’ve been talking to some family members and close friends about building some simple home lab raid devices that we can use to back up our stuff locally and to one another’s locations.
I’m not enough of a network engineer or server admin to know the best way to set it up, but I’m hoping we’ll figure out a simple, cost effective way to handle it. I’m hoping it’s a fun project either way and we can all avoid entropy for just a bit longer.
For point 2, check out Synology. I can say my experience has been great, and I wish I'd bought a bigger device (I got a four-bay enclosure and wish I'd gotten an eight). QNAP was another home storage device I looked into, but I don't have any experience with it. Not affiliated with Synology, just a satisfied customer.
Love my Synology, and I bought a 2-bay cus I thought it was just going to be a toy. Now I have everything on it and I’m dreading moving to a larger unit. It’s backed up nightly to whatever cloud provider I feel like at the moment, but it would be super cool to put a second unit at a friend’s house and back up to that.
I use Google Takeout to do a dump of email and back up the download file. To clear out a bunch of space on my Gmail account, I did a full export and then deleted conversations based on date. I ended up having to use a script that ran over a few days, due to limits on how fast I could access messages. This was all to avoid signing up for a paid account.
Excellent question and not one I’ve bothered to solve, and you’ve pointed out a gaping hole in my backup plan.
If I had to decide now, I’d say setting up a server that pulls my email in with IMAP and saves it to a backup drive. Of course that means making sure spam and marketing emails I don’t want aren’t backed up, so maybe I mark them somehow as back up targets?
It’s an excellent question and I don’t have a good answer. And it’s worth pointing out that emails are just as important as photos when you think of correspondence from friends and family, business contracts, attachments, links and their contents, and so on, forever. The network effect is very real and I’m not sure how to handle all of that.
I use mbsync to sync Gmail to a mail folder on a local drive, then back up the mail folder onto a network drive using rsync, and then sync every 24 hours to a OneDrive folder using rclone.
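For anyone who wants to script the same three-stage pipeline, here's a rough Python sketch that just shells out to the tools above. The Maildir path, NAS path, and rclone remote name are all made-up placeholders, not anything from the comment:

```python
import subprocess

def backup_commands(maildir, nas_path, rclone_remote):
    """The three stages described above, as argument lists:
    mbsync pulls Gmail via IMAP, rsync mirrors the Maildir to the
    network drive, and rclone syncs it to the cloud remote."""
    return [
        ["mbsync", "-a"],
        ["rsync", "-a", "--delete", maildir + "/", nas_path + "/"],
        ["rclone", "sync", maildir, rclone_remote],
    ]

def run_backup(maildir, nas_path, rclone_remote):
    # Run each stage in order; a daily cron entry can invoke this.
    for cmd in backup_commands(maildir, nas_path, rclone_remote):
        subprocess.run(cmd, check=True)
```

Scheduling `run_backup` from cron (or a systemd timer) gets you the every-24-hours cadence without any daemon of your own.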
And a good way to do that is by reining in the digital clutter.
No you don’t need those 15 shots of literally the same angle. Nope, you don’t! Maybe as a professional photographer but even then you’d mostly know what to discard.
Because if you don’t, cost will not be the only problem. It will be difficult to move your data around, and you may have to. You also need to keep multiple copies online, so costs will multiply there as well.
Google Drive lost all our HR files, everyone's contracts etc. Huge panic. We only got them back by using Vault (the legal discovery back door, which doesn't return them in the same format or folder structure). And yes, no one deleted them - I checked the audit logs. If we weren't firefighting so bad here I would be pushing for us to switch.
I'm not judging, but I just don't understand why someone would save business critical data to Google Drive only. Is this a US thing? Aren't you afraid of privacy or trade secret leaks and potential legal ramifications?
One of the consequences of SaaS being easier to use than old-school local IT is that these decisions are often made directly by the user of the data, for whom these are often not the most salient risks. In the old days you couldn't really operate beyond a few people without a sysadmin, who knew they would be fired if something like this happened and so was much keener to make sure long-tail risks were taken care of.
With SaaS it's difficult to persuade people that we need to pay for a third-party backup service when the SaaS vendor does their own backups for free.
Regarding privacy or trade secret risks, Google provides adequate permission mechanisms for that if operated correctly. For example, HR have their own shared drive which other users can't see. Threats like Google itself getting hacked or deciding to look at our docs are basically in the "act of God" category for our business.
Any of y'all worked at Google? I'm fascinated by the internal processes that would lead to something like this happening with no warning for any of their users. Everywhere I've worked before someone would be like "hey we should probably send an email at least"
What’s the incentive? None. There is no internal process. You’ve been working for a company whose cash cow has, for its entire history, required essentially no customer-facing contact for its survival. You wouldn’t think to go there.
Keep google search and YouTube running, don’t fuck up chrome too much - everything else is basically unimportant.
Maps, yes. I think even people who don’t use a single Google service use this product - Google Maps (w/o logging in that is). Simply because every other alternative is worse than atrociously bad.
Android is like Chrome — a vehicle for their apps and services that carry ads.
GCP — I have no data on it but the places I have worked at, relevant people scoff even at the mention of GCP.
> Simply because every other alternative is worse than atrociously bad.
Perhaps true in the past, but these days have long since come and gone. As long ago as a decade, Here Maps on Nokia Windows Phones was better. These days, Apple Maps and Google Maps are pretty solid competitors, but I prefer Apple Maps whilst driving, because it gives more detailed directions (lane warnings come earlier, verbal directions often include details such as "go past the next light, then use the left lane to turn left").
For certain areas, Google Maps is, itself, a second-rate alternative to the best option, too. For instance, in London, Citymapper is worlds better than Google Maps for navigating without a car.
People say the same thing about Microsoft with their generic "reinstall the OS" solution for small userbases - once you get past a certain size you get decent support with an account rep, SLA on support tickets, etc.
I imagine the licensing revenues are a literal rounding error, but it's hugely profitable if we include things like Play services. The latest numbers I could find put it at $48B in revenue in 2021, compared to $256B for all of Alphabet. It's pretty substantial.
Even if it doesn't directly bring in a dollar in revenue (let alone profit), I suspect there's significant value in Android running on 3 of every 4 phones worldwide.
Are any of those processes user facing, or in contact with users, like doing interviews with users or something similar? Or, I guess, given your experience: would you say they’re useful internal processes?
I was thinking about bureaucracy today and how the word is often maligned, but the processes can actually be helpful or harmful, depending on how the humans involved implement them. Anyway, I’m curious about your perspective on the effectiveness and purposes of the internal processes you saw.
15 years ago, GFS had "bytes" quotas and "files" quotas for internal users.
That doesn't matter directly here, but a file-count limit is an extremely natural thing to have from an operational perspective. Clearly Google still doesn't know how to communicate that type of thing to customers.
A 5 million file limit might be quite reasonable if you're paying for the basic, 100 GB storage tier. But Google Drive offers up to 30 TB of storage. 30 million MB!
That means if your average file size is less than 6MB, which is very likely if you're storing JPEGs, audio files, text records, or whatever, you'll never be able to fill your Google Drive storage.
Surely the file count limit should scale with the storage limit?!
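The arithmetic in the two comments above is easy to check. Assuming the 5-million-file cap really is flat across tiers, the average file size you'd need just to fill a given quota scales with the plan:

```python
FILE_CAP = 5_000_000  # the flat per-account file limit discussed above

def min_avg_file_size_mb(quota_tb):
    """Average file size (in MB) needed to fill the storage quota
    before hitting the file cap (1 TB taken as 1,000,000 MB)."""
    return quota_tb * 1_000_000 / FILE_CAP

print(min_avg_file_size_mb(30))  # 6.0 -> files must average 6 MB to fill 30 TB
print(min_avg_file_size_mb(2))   # 0.4 -> even 2 TB needs 400 KB average files
```

So the "less than 6 MB average means you can never fill 30 TB" claim checks out, and even the small tiers are capped well below typical photo sizes.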
Anybody have luck deleting your entire Gmail inbox? I have a test account that's over its limit, and I can't delete enough emails in a day to get it safely under. Selecting every single email message in the inbox and then hitting the trash can just fails.
This might help: search by largest attachments and delete those so you have some buffer space. Then you can select individual emails, or all emails per sender, and delete them all.
I had to deal with this a while back and I recall finding some Google scripting thing that allowed me to delete chunks of emails at a time. I don’t recall exactly what it was though :(
Yeah I've the same problem. The batch delete just doesn't work. It fails silently. You can only reliably delete the 50 emails that are visible on the page.
Gmail will help you do that already. The problem isn't BIG attachments. It's 75,000 messages received that are all pretty small.
The Google Apps Script is probably the way to go. I just manually deleted 100 messages at a time, then permanently trashed messages at around 5,000 at a time while watching some YouTube, so negative geek points for me, but sometimes you gotta just grind for that XP.
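If you'd rather script it than grind, here's a rough alternative to the Apps Script approach using plain IMAP from Python's standard library. The batch size is an assumption chosen to stay under per-request limits, and note that Gmail's IMAP settings control whether an expunge actually trashes messages or merely archives them:

```python
import imaplib

def chunked(uids, size=100):
    """Split the message-ID list into small batches so each STORE
    call stays under whatever rate/size limits the server enforces."""
    return [uids[i:i + size] for i in range(0, len(uids), size)]

def trash_inbox(host, user, password, batch_size=100):
    """Flag every INBOX message as deleted, one batch at a time."""
    conn = imaplib.IMAP4_SSL(host)  # e.g. "imap.gmail.com"
    conn.login(user, password)
    conn.select("INBOX")
    _, data = conn.search(None, "ALL")
    ids = data[0].split()
    for batch in chunked(ids, batch_size):
        msg_set = ",".join(i.decode() for i in batch)
        conn.store(msg_set, "+FLAGS", "\\Deleted")
        conn.expunge()  # flush each batch before moving on
    conn.logout()
```

Running the batches with a short sleep between them is a cheap way to dodge the rate limits the commenter above ran into.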
I cannot imagine a business using Google at this point. It's not just that Google is not customer focused; they appear to show actual hostility. It's amazing people still think it's a good idea to keep critical stuff on Google infrastructure.
Probably not, but as soon as I get back to my laptop I'll queue up a job on a server running rclone to haul individual files into an archive, compress them, and upload the zip file.
Google Drive is a backup solution. I can think of a lot of reasons to have a lot of small files, whether they be for logging purposes, analysis, training data, etc.
In fact, I can think of more legit uses than nefarious ones! Especially since an illegal use would be easy to trace back to you personally!
"What normal user has millions of typical data laying around?"
Enough that this policy is impacting some of them.
Again, realize that this isn't just talking about free accounts. It's paid accounts, too, regardless of the plan size. 2TB... 5 million file limit. 20TB... 5 million file limit. 1 plan with 10 users... still a 5 million file limit.
If you can't see that this could be a problem for normal customers (or normal accounts that may comprise multiple users), then you're not thinking creatively.
I wouldn't doubt that I have millions of files backed up on various drives, simply because I back up every system, and I've owned computers since the 90's. And I'm not intentionally trying to generate data!
Why would you care about reasons if you offer a certain amount of space?
There is a size limit. Why would you implement an item limit?
I have about 15M files in my Dropbox. The limits on my box are the combined size of my files AND each file's own size, not their number.
"Files uploaded to dropbox.com must be 50 GB or smaller. All files uploaded to your Dropbox must be smaller than your storage space. For example, if your account has a storage quota of 2 GB, you can upload one 2 GB file or many files that add up to 2 GB. If you are over your storage quota, Dropbox will stop syncing."
The question is in good faith. 15M files is a LOT of files, and one wonders how an individual accumulates so many that are also important enough to have backed up, on Dropbox no less. Is it a bunch of very small files with many versions? I mean, you obviously don’t have to answer, but it just seems silly to attack others asking legitimate follow-up questions. It is now obvious that you value your privacy more highly than the poster assumed.
I certainly can, but I can't imagine why anyone would have 5M in cloud storage meant for manipulation by end users.
Locally you could process those pretty quick. Or in cloud block storage. But the amount of latency involved in each cloud API operation to create a new file... yikes.
Hearing the argument "This is for data, not for backups." strikes me as so strange.
I can understand why a company would care about the access policies; if you are storing some kind of database that is accessed constantly and routinely then it's nice to know -- but when did we become okay with categorizing bits beyond that?
I don't really care for a future of needing a subscription to back up cat photos, specifically. I'd rather bits be bits, except in limited instances where the data is priced based on access frequency/retention/guarantees. (Yeah, it stands to reason that cold storage with limited access should be a fair bit cheaper than hot storage where the data is ready to go ASAP.)
This kind of thing just accelerates the steganography arms race wherein people try to store their terabytes of data as cat pictures.
Where is the difference though seriously?
If it is for "data" I might be accessing it every other minute. If it is for backups I might be accessing it every other minute as well. I do look at my cat pictures every other second though and yes I do encrypt them before and after I do so.
The whole spiel of cold/hot storage...
But not for backing up 5M files. Again, the latency per-file will absolutely kill you.
Just for performance, if you insist on backing up the content of 5M files to Google Drive, you really need to write a custom script to tarball and compress whole directories on-the-fly or something.
I mean, 5M files is like multiple sets of map tiles for the entire world. Sometimes people definitely need to work with those, but using cloud backup through the Google Drive API interface is just not the way to go there...
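As a concrete version of the tarball suggestion above (the function and paths here are hypothetical, not anyone's actual setup), something like this turns a tree of tiny files into one object before upload:

```python
import tarfile
from pathlib import Path

def pack_directory(src_dir, out_path):
    """Pack an entire directory into one gzipped tarball, so the
    backup uploads as a single large object instead of one
    latency-bound API call per tiny file."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)
    return out_path
```

The trade-off, as the replies below point out, is that you lose per-file incremental sync: changing one file means re-uploading the whole archive unless you split the tree into many smaller tarballs.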
This is not for you or Google to decide.
And no, I can sync changes without having to tarball.
I might only touch one of those 5M files a week actually, depending on my structure.
It was not for them to decide up until now.
And yes of course users were willing to wait $duration for it until now.
If you are suggesting a work around that is okay. It is not okay for google to tell their users to tarball "now" "because".
Why should anyone have to think about number of items at all? For most of computing history, the only thing people really care about is storage space, not items stored...
That is the part that baffles me. There might be an argument against this when it becomes unsustainable (e.g. when the number of bytes inside a file is actually smaller than the number of bytes its metadata takes), but that can be solved quite intuitively by making users pay for it (essentially, files should be counted toward the same usage limits, if they aren't already?).
I was thinking similarly. Any kind of programming language environment can leave all sorts of real + cached files lying around.
In the Python world, I just created a new virtual environment on Linux, and it had 1,339 files. I then loaded it up with a few of the standard players (numpy, pandas, scipy, scikit-learn, matplotlib, dask, and django), and this ballooned the file count to 16,945.
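Counts like those are easy to reproduce with a few lines; this walks the tree the same way `find <dir> -type f | wc -l` would (the venv path itself is just an example):

```python
import os

def count_files(root):
    """Count regular files in a directory tree, the way the venv
    numbers above could be tallied."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total

# e.g. count_files(".venv") on a freshly created virtual environment
```

Pointing it at a venv, a node_modules directory, or a browser cache shows how fast ordinary tooling eats into a per-file quota.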
Wait, really? Coming from Windows and macOS that boggles my mind.
What on earth are the main contributors to that file count? I can't even begin to imagine, but I'm incredibly curious now.
Also, that seems like an enormous waste of disk space. Assuming a 4 KB block size means roughly an average of 2 KB of space is wasted per file, which for 700K files would total well over a gigabyte of wasted space. Yikes.
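The slack estimate above can be checked directly, assuming (as the comment does) an average of half a block wasted per file:

```python
def wasted_slack_bytes(n_files, block_size=4096):
    """Expected space lost to partially filled final blocks:
    on average half a block per file."""
    return n_files * block_size // 2

print(wasted_slack_bytes(700_000))  # 1433600000 -> roughly 1.4 GB
```

So "well over a gigabyte" for 700K files holds, though filesystems with tail packing or smaller block sizes would waste less.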
well, it's not like windows is too far off - on my w10 install, c:/windows/ comes up to about 280k files (and in total, that small-ish system drive, 40 gb, has about 380k files on it). i'd imagine macos has numbers that go something like that as well, since it is a *nix system.
That's 4MB/file, which is not exactly large. That's also a small use case. 20TB is 4 people on Google Workspace. Our company is 40 people, so our limit is 200TB (which we pay for), but 5 million files.
> I am astounded that people put anything more important than cat photos on Google, Apple, Microsoft, Linux, Unix, general cloud provider, mobile phone, HDD, PC, etc. etc.
Also, sorry, but I have to say: there's nothing wrong with caring about cat photos.
I do feel bad for the victims. I am still astounded. I run two businesses. It never would have occurred to me that Google Drive is something I would use for operational data.
[Edit: I get the impression my original comment was interpreted as snarky. I am 51. I did not grow up in the world of ubiquitous Google Drive. I have only heard how hard it is to get customer support out of Google. I am genuinely surprised that people rely on Google Drive and other Google services for business. This has me wondering if there is a market opportunity to sell more serious services to businesses that never thought about operational reliability.]
Every company that uses GSuite probably uses Google Drive as their shared file storage, which obviously includes a lot of files critical to the company's operation. What else would you expect?
I would expect them to read the terms of their contract with regard to how their data is backed up, whether it can be restored if deleted, how long it's retained, etc.
If they just go "derp derp put it in cloud" they are negligent.
It also had not occurred to me that serious businesses use GSuite. I cannot imagine depending on Google, given their history. I realize I am out of touch. I remain astounded. Market opportunity!
Dust to dust, ashes to ashes.