
There are several use cases where this is a sure-fire way of shooting yourself in the foot:

* if you have many records (i.e. more than a couple hundred), the file system will have a lot of work to do and the whole thing becomes sluggish

* if you want to query the data by content, there's nothing that gives you sublinear search capability here

* it's not easy to modify data under this scheme. If you add that capability, you get the familiar choice between race conditions and added complexity. Having said that, if you never modify the data, you can also drop the store entirely and pass around the data itself (encrypted if you need that) instead of a key.

As an alternative to this, consider each process appending to a file and keeping filename+offset as the identifier for a particular record. This solves at least the "too many files" problem.
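Roughly something like this (a sketch only; it assumes one writer per file, and a real version would need fsync and crash-recovery thought):

    import json

    def append_record(log_path, record):
        # Append one JSON record per line; (filename, offset) identifies it.
        line = (json.dumps(record) + "\n").encode("utf-8")
        with open(log_path, "ab") as f:
            offset = f.tell()  # with a single writer per file this is the record's start
            f.write(line)
        return (log_path, offset)

    def read_record(log_path, offset):
        with open(log_path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())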

Or, if you only need to read a static collection, put your JSON (or some moral equivalent, e.g. msgpack) into a CDB database: http://cr.yp.to/cdb.html
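For instance, a rough sketch (the records dict is made up; this just emits cdbmake's "+klen,dlen:key->data" input format, which you then pipe through the cdbmake tool that ships with cdb):

    import json
    import sys

    # Made-up example data: record id -> JSON-serializable dict.
    records = {
        "user:1": {"name": "alice", "score": 10},
        "user:2": {"name": "bob", "score": 7},
    }

    # cdbmake reads "+klen,dlen:key->data" lines, terminated by a blank line.
    out = sys.stdout.buffer
    for key, value in records.items():
        k = key.encode("utf-8")
        d = json.dumps(value).encode("utf-8")
        out.write(b"+%d,%d:%s->%s\n" % (len(k), len(d), k, d))
    out.write(b"\n")

Build the database with something like: python make_input.py | cdbmake records.cdb records.tmp — after which any cdb binding gives you fast hashed lookups by key.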

Next step up: use LevelDB, or KyotoCabinet/KyotoTycoon to organize the storage.
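E.g. with LevelDB through the plyvel Python binding (a sketch; the path and keys are made up):

    import json
    import plyvel  # third-party LevelDB binding: pip install plyvel

    # Open (or create) a LevelDB database in a local directory.
    db = plyvel.DB("./records.ldb", create_if_missing=True)

    # Store each record as a JSON blob under a byte-string key.
    db.put(b"user:1", json.dumps({"name": "alice", "score": 10}).encode("utf-8"))

    # Point lookups are sublinear, and iteration comes back in key order.
    print(json.loads(db.get(b"user:1")))

    db.close()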



Whether a large number of files becomes sluggish really depends on your file system. In any case, a common technique is to break large numbers of files into subfolders, which usually solves this problem reasonably well.
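Something along these lines (a sketch; the two-hex-character fan-out and the names are made up):

    import hashlib
    import json
    import os

    STORE_ROOT = "./store"  # illustrative root directory

    def shard_path(key):
        # Fan files out into up to 256 subfolders keyed by the first two hex
        # characters of the key's SHA-1, so no single directory grows huge.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(STORE_ROOT, digest[:2], key + ".json")

    def put(key, record):
        path = shard_path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(record, f)

    def get(key):
        with open(shard_path(key)) as f:
            return json.load(f)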

As for updating, flock[0] solves this issue on operating systems which support it.

[0] http://linux.die.net/man/2/flock
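As a concrete sketch, using Python's fcntl.flock wrapper over that syscall (the read-modify-write logic here is just illustrative):

    import fcntl
    import json
    import os

    def update_record(path, mutate):
        # Open read/write, creating the file if it doesn't exist yet.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        with os.fdopen(fd, "r+") as f:
            # Exclusive advisory lock; concurrent updaters block here.
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            raw = f.read()
            record = json.loads(raw) if raw.strip() else {}
            mutate(record)
            f.seek(0)
            f.truncate()
            json.dump(record, f)
            # Lock is released when the file is closed.

    # e.g. update_record("store/user-1.json",
    #                    lambda r: r.update(visits=r.get("visits", 0) + 1))

Keep in mind flock is advisory, so every process touching the store has to take the lock for this to help.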


Usenet news and maildir are cases where current operating systems already have to cope with that kind of load, so it's definitely possible.

The question is whether this can be useful without turning into a partial, bug-ridden reimplementation of a NoSQL database, given that we already have NoSQL databases that fit the bill and carry lower maintenance costs than a spit-and-glue solution.


ReiserFS (v3) is a great small filesystem that's fantastic with lots of small files (and has also coped well with power-outage events on my laptop for the last 15 years). I've had tons of issues with ext3/4 (running out of extents, slow performance on lots of small files), btrfs (running out of metadata space while I still had hundreds of GB left?!), and xfs (great at everything except lots of tiny files and power loss). ReiserFS even supported reliable shrinking and growing on LVM.

It's too bad no one supports it anymore: the founder is in prison, and the few other people able to maintain it seem focused on a Reiser4 pipe dream instead of on great, reliable technology that had most of its bugs worked out a long time ago.


> if you have many records (i.e. more than a couple hundred), the file system will have a lot of work to do and the whole thing becomes sluggish

On OS X, running the simple test that creates, reads, and deletes 1,000 documents is no problem. The bigger concern is hitting the filesystem's inode limit or the per-directory file limit. That can be worked around by "sharding" into sub-directories based on the first character of the UUID.
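A rough version of that test (a sketch; the 1,000-document figure is from the comment above, everything else is made up):

    import json
    import os
    import shutil
    import time
    import uuid

    ROOT = "./bench_store"  # scratch directory

    def doc_path(doc_id):
        # Shard by the first character of the UUID so no directory holds all the files.
        return os.path.join(ROOT, doc_id[0], doc_id + ".json")

    ids = [str(uuid.uuid4()) for _ in range(1000)]

    start = time.time()
    for doc_id in ids:
        path = doc_path(doc_id)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump({"id": doc_id, "body": "hello"}, f)
    for doc_id in ids:
        with open(doc_path(doc_id)) as f:
            json.load(f)
    for doc_id in ids:
        os.remove(doc_path(doc_id))
    print("create+read+delete 1,000 docs: %.3fs" % (time.time() - start))

    shutil.rmtree(ROOT)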


WiredTiger is faster than LevelDB in my experience.


Interesting, hadn't heard of it. Thanks!



