From reading the thread it sounds like this is 5M per user.

Google Drive appears to be targeted at "human" usage, i.e. people uploading or creating files. I would guess that this is also worked into the cost – that assumptions about how much work a human can do are part of the price formulation. The reporter of this bug seems to be using it as a storage backend for software, though, which I don't believe is the intended use-case.

Looking at S3 pricing, just the storage is ~5x the cost of Google Drive, and then you need to add transfer and API calls on top of that.

I don't personally think that there are reasonable use-cases for human users with 5 million files. There may be some specialist software that produces data sets a human might want to back up to Google Drive, but that software is unlikely to run happily on Drive-streamed files, so even those would be unlikely to be stored directly on Drive.

(Disclaimer: I work at Google, but not on Drive. This is my personal reading and interpretation of the public info; I don't have any inside info here.)




> "I don't personally think that there are reasonable use-cases for human users with 5 million files."

A 5 million file limit might be quite reasonable if you're paying for the basic 100 GB storage tier. But Google Drive offers multiple tiers with up to 30 TB of storage. 30 million MB!

That means that if your average file size is less than 6 MB (very likely if you're storing JPEGs, audio files, text records, or whatever), you'll never be able to fill your Google Drive storage.

What's the average size of a file on a regular macOS or Windows HDD? It wouldn't surprise me if it's much less than 6MB. Shouldn't the file count limit scale with the storage limit?
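A rough back-of-the-envelope, assuming the 5M cap is flat across tiers (my reading of the thread, not something Google documents per tier) and taking 100 GB, 2 TB, and 30 TB as representative tiers:

  # average file size at which 5M files exactly fills a tier (decimal units)
  for gb in 100 2000 30000; do
    echo "$gb GB tier: $(echo "scale=2; $gb*1000/5000000" | bc) MB per file"
  done
  # -> 100 GB tier: .02 MB per file
  # -> 2000 GB tier: .40 MB per file
  # -> 30000 GB tier: 6.00 MB per file

Anything below those averages hits the file count cap before the storage cap.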


Your analysis of why this limit exists is probably correct.

However, when you're paying for a product, the limits should be disclosed. If I'm investing in Google Drive, I should be able to easily see the limits that apply to the product I've bought; that means total storage caps, file count caps, bandwidth caps, and whatever else.


While I agree with you that restrictions should be clear, this feels like a limit well beyond what anyone would encounter. Even abnormal usage with something like many big git repos with tons of little nested files inside the data directory isn’t going to hit 5M files unless you’re really committed to pushing until you find the invisible wall.

It would be pointless if your car came with a giant list of the melting points of every material used in its structure just so you know what the “real” temperature limits are. Oh boy, better not let the car get to 1450C or else the steel frame might melt!


But the number of files you're allowed to store is a pretty basic feature. You know cars are made of materials which will break down at 1450C. You have absolutely no way to know whether Google's file limit is 100k, 1M, 5M, 100M, 1T, or what. How are you supposed to know if Google Drive fits your use-case when they don't document their limits?


Many, many parts of a car will break down well before 1450 C, and you don't know exactly when.


What's the use case for a car getting to 1450C?


There isn’t one, which is the point of the comparison: storing 5M individual files in a consumer cloud storage platform is also probably missing a use case.


I store way more than 5M files on consumer storage devices all the time. It truly isn't a ridiculous number; with 2TB it's not even hard to hit with a mix of text files and smaller media files.

If the storage vendor doesn't want people to use their product for the many use-cases where you end up with a bunch of small files, advertise the file count cap. Simple as that.


If they listed every configuration to this level of granularity, the doc would be huge and no one affected would even see it. So the easiest way would still be to just pay for a month, try to use it as intended, and if you hit a limit, try another product.


Few people would read the doc, but someone creating 5m files might.


That's a fair point. It looks like most limitations are documented, but I couldn't find a mention of the 5m file limit.


I recently started using Google Drive as a backup target for my computer. The tool I'm using for it, Restic[1], does deduplication at the file-block level and allows multiple hosts to upload to the same backend/repository. So far I already have 25k files, even though I only started backing up a few weeks ago.

I imagine that if you set up backups for multiple servers/personal computers, that could scale significantly.
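For context, this goes through rclone, since Restic has no native Google Drive backend; roughly (the remote name and paths here are illustrative, not my exact setup):

  # one-time repository setup on the Drive remote
  restic -r rclone:gdrive:restic-repo init
  # run from each host backing up into the same repository
  restic -r rclone:gdrive:restic-repo backup ~/Documents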

[1]: https://restic.net/


> Restic currently defaults to a 16 MiB pack size.

Given this, it's impossible to exceed the 5M file limit with the 30TB top plan Google offers. You'd have to lower the pack size to below 6MB.
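Quick sanity check (mixing decimal TB with MiB pack sizes, so approximate):

  # packs needed to hold 30 TB at restic's default 16 MiB pack size
  awk 'BEGIN { printf "%.1f million packs\n", 30e12 / (16 * 1024 * 1024) / 1e6 }'
  # -> 1.8 million packs, well under the 5M file cap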


Restic + rclone + GCP Service Account.

This is the way to go. Files will count towards the Service Account, not your account.

If you use Google Workspace I strongly recommend using a Shared Drive (instead of sharing a folder in “My Drive”) so the files will be owned by your org, not by the Service Account (otherwise deleting the SA would result in the files being deleted as well, because in “My Drive” the creator of a file keeps ownership even when creating in a shared folder).

If you have problems with the 400,000 files limit in a Shared Drive: create a new one; rclone has a “union” backend. It can be set to create new files in the new Shared Drive while still showing files from the other drive(s). Also, Shared Drives have their own recent activity views, so they don’t clutter your “My Drive” view.

Create a SA in GCP and generate a JSON credentials file. Create a folder or Shared Drive via web and share it with the SA.

I’ve been using this setup for years with zero problems.

Example rclone config: https://pastebin.com/KdsFQz5K
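For reference, a rough sketch of that kind of setup using rclone's non-interactive config (remote names, paths, and Shared Drive IDs are placeholders, not the linked config; double-check the drive and union option names against your rclone version):

  # Drive remotes authenticated as the Service Account, pointed at Shared Drives
  rclone config create sd-old drive scope drive \
    service_account_file /path/to/sa.json team_drive OLD_SHARED_DRIVE_ID
  rclone config create sd-new drive scope drive \
    service_account_file /path/to/sa.json team_drive NEW_SHARED_DRIVE_ID

  # union remote: the old drive is tagged :ro so new files land only in sd-new
  rclone config create backups union upstreams "sd-old:backup:ro sd-new:backup"

Pointing restic (or anything else) at the backups: remote then reads from both Shared Drives but writes only to the new one.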


I did some quick math on this. The top plan Google sells is 30TB; to reach 5M files in 30TB, you'd have to fill the entire thing with 0.75MB files. So yeah, if you are using Google Drive as some kind of JSON blob storage, it isn't designed for your use case.


For 30TB I think it would be 6MB, but regardless, that appears to be per-user. If you're an individual needing 5m files, maybe you need a specialist storage solution, and if you're a business then that's one hell of a bus factor.


Ah yeah you are right, I had selected terabit instead of terabyte in the converter.


Makes me wonder if the limit was actually as much about catching runaway internal code. I'm sure there are cases, when deploying something like Google Drive, where code went rampant making duplicates or something.


My guess is someone wrote some stupid "Google Drive file name FUSE mount" script that attempted to exploit unlimited file storage in metadata. So some team working on Drive ran a query and saw that 5M files is enough for basically anyone.


This reminded me of the early days running AWS SQS. We didn't charge for storage, as message size was limited and the expectation was that any message would be read immediately... But sure enough, some users would just leave messages there forever, and we could never figure out what they were trying to do, except maybe exploit it for free storage somehow.


Oh that's been done many times, including by Googlers. For example: https://github.com/asjoyner/shade


> I don't personally think that there are reasonable use-cases for human users with 5 million files.

  % find $HOME -type f | wc -l
  4969169

Almost 5M there.


But are you human?


Minecraft saves are (were? I don't know anymore) just big folders with tons of files representing each chunk of the world. These would be a common thing a kid might want to back up to Drive and could potentially be zillions of files.


What if you check out a few large git repos in a directory synced with Google Drive?


I thought about this, but I accidentally put the WebKit repo (~1M files) into Dropbox once and it didn't go well. Git needs a consistent view of the filesystem across many files to work, and that's not really possible with that many files stored in a sync system that works per-file like Drive does. I put Git in the category of specialist software that shouldn't be operated in Drive. By all means tar it up and back that up; that should be effective, but there's no need to store all the individual files.


Storing a git working tree in a cloud drive does seem like a bad idea, but the actual .git directory itself might be okay? I think most of the files in there are immutable or infrequently changed.


Consider an incoming sync change for .git/refs/heads/* arriving before the related .git/objects/*: the ref would briefly point at objects the local repository doesn't have yet.



