
Ask HN: Best way to store large amount of binary files? - kluck
What is the best technology to store a large number of binary files?

I have thought about this problem and explored the solution space, but I still can't reach a final answer. The problem with flat file storage is that the directories grow very large, which makes backups and browsing a pain. The DBMS solution has the disadvantage of not being browsable (using file system tools) and of being a "black box" of sorts.

I also prefer solutions that are slick and do their job fast. That alpha/beta projects are out of the question for a durable data store is obvious, IMHO.
======
stephenr
On disk, using nested directories - (00-zz)/(00-zz)/file.ext

The two levels above can be based on the initial characters of either the
filename or a hash of the file content, or you can use incremental names,
moving on to the next directory once each one is "full".

Increase depth to increase maximum scale.
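A minimal Python sketch of the hash-based variant. It assumes SHA-256 of the file content and two hex digits per level (so each level ranges 00-ff rather than 00-zz); `sharded_path` and `store` are hypothetical names, not from the thread:

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, name: str, data: bytes, depth: int = 2, width: int = 2) -> Path:
    """Return root/<l0>/<l1>/.../name, where each level is `width` hex digits
    taken from the SHA-256 of the content (00-ff per level when width=2)."""
    digest = hashlib.sha256(data).hexdigest()
    # Each nesting level consumes the next `width` characters of the digest.
    levels = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return root.joinpath(*levels, name)


def store(root: Path, name: str, data: bytes, depth: int = 2) -> Path:
    """Write `data` to its sharded location, creating directories as needed."""
    path = sharded_path(root, name, data, depth)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

Raising `depth` by one multiplies the number of leaf directories by 256 with this (assumed) width, which is the "increase depth to increase maximum scale" knob.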

~~~
kluck
We tried exactly this approach, but browsing (ls) was really slow. If you
changed the nesting depth in proportion to the total number of files it
might work better, but then you would have to implement logic for moving
files around whenever the depth changes.
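One way that moving logic could look, as a sketch under a hypothetical convention not stated in the thread: each file is named by its own hex digest, so its new location can be computed from the name alone without re-reading the content. `path_for` and `rebalance` are illustrative names:

```python
import os
from pathlib import Path


def path_for(root: Path, digest_name: str, depth: int, width: int = 2) -> Path:
    """Hypothetical layout: files are named by their hex digest, and each
    directory level is the next `width` characters of that name."""
    levels = [digest_name[i * width:(i + 1) * width] for i in range(depth)]
    return root.joinpath(*levels, digest_name)


def rebalance(root: Path, old_depth: int, new_depth: int, width: int = 2) -> int:
    """Move files from an old_depth layout to a new_depth layout; return the count moved."""
    # Collect the file list first so the tree is not mutated while being walked.
    pattern = "/".join(["*"] * (old_depth + 1))
    files = [p for p in root.glob(pattern) if p.is_file()]
    moved = 0
    for src in files:
        dst = path_for(root, src.name, new_depth, width)
        if dst != src:
            dst.parent.mkdir(parents=True, exist_ok=True)
            os.replace(src, dst)  # a rename; atomic when src and dst share a filesystem
            moved += 1
    # Remove directories left empty by the move, deepest first,
    # so a parent is checked only after its children.
    dirs = [p for p in root.rglob("*") if p.is_dir()]
    for d in sorted(dirs, key=lambda p: len(p.parts), reverse=True):
        if not any(d.iterdir()):
            d.rmdir()
    return moved
```

Because only renames are involved, the cost is dominated by directory operations rather than by copying file data, but on tens of millions of files it is still a long-running job.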

~~~
stephenr
How many files did you have per directory?

Normally something like this would be paired with a simple metadata DB:
filename, type, bytes, filepath.
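A minimal sketch of such a metadata DB, assuming SQLite via Python's built-in `sqlite3` module; the table name `files`, the index, and the helper functions are hypothetical, with columns taken from the list above:

```python
import sqlite3

# Hypothetical schema mirroring the fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    filename TEXT NOT NULL,
    type     TEXT,
    bytes    INTEGER NOT NULL,
    filepath TEXT NOT NULL UNIQUE
);
CREATE INDEX IF NOT EXISTS files_by_name ON files (filename);
"""


def open_catalog(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, filename: str, mime_type: str, size: int, filepath: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO files (filename, type, bytes, filepath) VALUES (?, ?, ?, ?)",
            (filename, mime_type, size, filepath),
        )


def lookup(conn: sqlite3.Connection, filename: str) -> list:
    """Return stored paths for a filename without scanning the directory tree."""
    rows = conn.execute("SELECT filepath FROM files WHERE filename = ?", (filename,))
    return [r[0] for r in rows]


def total_bytes(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COALESCE(SUM(bytes), 0) FROM files").fetchone()[0]
```

Questions such as "how many files and how many bytes" then become a single query instead of a walk over millions of directory entries.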

~~~
kluck
Random access by filename was fast, since we used a hashing mechanism to
locate the directory quickly. What I meant by "browsing" was backup and
inspection with normal shell tools, and those were slow.

~~~
stephenr
Yes, I understood the issue; I specifically asked how many files you were
storing per directory. If directory scanning is slow, you still have too many
files per directory.

~~~
kluck
I can't really remember, but we had tens of millions of files and a
directory depth of 2, so there were approximately 80,000 files in each
directory (we derived the directory names from an MD5 checksum). We also used
ext3. I understand that performance with large directories improved in ext4,
so maybe that was the problem back then.
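A back-of-the-envelope check of these numbers, assuming uniformly distributed hashes and (hypothetically) 20 million files; the thread does not say how many hex characters were used per level. With one hex character per level (16 × 16 = 256 leaf directories) the average is 78,125 files per directory, close to the ~80,000 reported, whereas two hex characters per level (256 × 256 = 65,536 leaves) would give about 305. `files_per_leaf` and `min_depth` are illustrative helpers:

```python
def files_per_leaf(total_files: int, fanout: int, depth: int) -> float:
    """Average files per leaf directory for a uniformly distributed hash layout."""
    return total_files / fanout ** depth


def min_depth(total_files: int, fanout: int, max_per_dir: int) -> int:
    """Smallest nesting depth keeping the average leaf at or below max_per_dir."""
    if fanout < 2 or max_per_dir < 1:
        raise ValueError("fanout must be >= 2 and max_per_dir >= 1")
    depth = 0
    while files_per_leaf(total_files, fanout, depth) > max_per_dir:
        depth += 1
    return depth
```

For example, keeping directories under about 1,000 entries for 20 million files takes depth 2 with a fanout of 256, but depth 4 with a fanout of 16.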

