Hacker News
From Filesystems to CRUD and Beyond (cloudatomiclab.com)
58 points by ingve 5 months ago | 13 comments

I'm aware of one filesystem implemented on top of S3 using FUSE that makes a serious effort to be fully POSIX-compliant: ObjectiveFS [1]. It doesn't map files directly to S3 objects, so you can't access your ObjectiveFS data directly using other tools. As I understand it, it treats S3 objects more like the blocks in a log-structured filesystem. I've successfully used it to run applications that aren't "cloud-native" in an environment where VMs are (at least theoretically) ephemeral.

[1]: https://objectivefs.com/
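To make the "S3 objects as log-structured blocks" idea concrete, here is a minimal sketch, assuming a plain Python dict stands in for the S3 bucket. It is not how ObjectiveFS actually works internally; `LogStore`, the segment key format and the tiny block size are all hypothetical. Writes are buffered and flushed as new immutable segment objects, and an index maps (file, block) to (segment, offset). That index is why the bucket contents don't look like your files to other tools.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tiny, so the example is easy to follow; real systems use much larger blocks


@dataclass
class LogStore:
    """Hypothetical log-structured block store; `bucket` is a dict standing in for S3."""
    bucket: dict = field(default_factory=dict)    # object key -> immutable bytes
    index: dict = field(default_factory=dict)     # (path, block_no) -> (object key, offset)
    pending: list = field(default_factory=list)   # blocks written but not yet flushed
    next_segment: int = 0

    def write_block(self, path: str, block_no: int, data: bytes) -> None:
        # Blocks are buffered and later appended to the log, never updated in place.
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.pending.append((path, block_no, data))

    def flush(self) -> str:
        # Pack every pending block into one new object (a log segment) and update the index.
        key = f"segment-{self.next_segment:08d}"
        self.next_segment += 1
        payload = bytearray()
        for path, block_no, data in self.pending:
            self.index[(path, block_no)] = (key, len(payload))
            payload += data
        self.bucket[key] = bytes(payload)
        self.pending.clear()
        return key

    def read_block(self, path: str, block_no: int) -> bytes:
        # Unflushed writes win; otherwise follow the index into the right segment.
        for p, n, data in reversed(self.pending):
            if (p, n) == (path, block_no):
                return data
        key, offset = self.index[(path, block_no)]
        return self.bucket[key][offset:offset + BLOCK_SIZE]


store = LogStore()
store.write_block("/etc/motd", 0, b"helo")
store.flush()
store.write_block("/etc/motd", 0, b"hi!!")  # an overwrite lands in a new segment
store.flush()
print(store.read_block("/etc/motd", 0))     # b'hi!!'
print(sorted(store.bucket))                 # ['segment-00000000', 'segment-00000001']
```

The first segment still holds the stale `helo` block, so a real design also needs compaction or garbage collection, and the index itself has to be persisted somewhere; both are left out here.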

BTW, Multics didn't have the first filesystem; its design was influenced by experience with earlier systems like CTSS, also developed at MIT. The AI Lab's ITS OS also had a filesystem, though it was inferior in many regards to Multics's design.

An interesting side point relevant to the article: the original intent of Multics was that pages would be the basic data structure, with the filesystem essentially just a way to keep track of groups of pages when they were not otherwise in use or when they needed to be referred to. (The actual initial implementation fell short of this ideal.)

(author here) Thanks. I found it quite hard to find references to early filesystem papers; I must try harder.

Much of that early information may not be digitized.

Even if it is, it may not be widespread knowledge. For example, WoFS, the first embodiment of many ideas found in ZFS and friends, is virtually unknown, despite having been developed in the late 80s and having documents available online.

"so in summary, don’t use filesystems for large distributed systems."

Don't use a _normal_ filesystem for large distributed systems.

Being able to seek() within a file (without having to download it first) is very underrated these days.
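A small illustration of the point, assuming a local file: `seek()` jumps straight to an offset and `read()` then fetches only the bytes asked for. Object stores such as S3 offer the nearest equivalent through HTTP `Range` requests; the hypothetical `range_header` helper below only builds the header value (RFC 7233 format, inclusive end offset) and does not talk to any real service.

```python
import os
import tempfile


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` without reading the rest of the file."""
    with open(path, "rb") as f:
        f.seek(offset)          # move the file position; nothing before it is read
        return f.read(length)


def range_header(offset: int, length: int) -> str:
    """HTTP Range header value for the same slice; the end offset is inclusive."""
    return f"bytes={offset}-{offset + length - 1}"


# Usage: a 1 MiB file where we only want the last 5 bytes.
size = 1024 * 1024
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (size - 5) + b"hello")
    path = tmp.name

tail = read_at(path, size - 5, 5)
print(tail)                        # b'hello'
print(range_header(size - 5, 5))   # bytes=1048571-1048575
os.remove(path)
```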

Just like databases, clustered filesystems also conform to CAP. You have to pick which corners of the triangle you want.
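The choice usually bites during a network partition: a replicated system can either refuse writes to stay consistent, or accept them and let replicas diverge. Below is a toy sketch of that trade-off, assuming a single register stored on two replicas; `ReplicatedRegister` and its "CP"/"AP" modes are hypothetical and not how any particular clustered filesystem is implemented.

```python
class ReplicatedRegister:
    """Toy register stored on two replicas, a and b; mode is "CP" or "AP"."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None             # value held by replica a (the one the client talks to)
        self.b = None             # value held by replica b
        self.partitioned = False  # True means a cannot reach b

    def write(self, value) -> bool:
        if not self.partitioned:
            self.a = self.b = value   # both replicas updated: consistent and available
            return True
        if self.mode == "CP":
            return False              # refuse the write: stay consistent, lose availability
        self.a = value                # accept locally: stay available, replicas diverge
        return True

    def consistent(self) -> bool:
        return self.a == self.b


cp, ap = ReplicatedRegister("CP"), ReplicatedRegister("AP")
for reg in (cp, ap):
    reg.write("v1")
    reg.partitioned = True            # simulate a network partition

cp_result = (cp.write("v2"), cp.consistent())
ap_result = (ap.write("v2"), ap.consistent())
print(cp_result)  # (False, True)
print(ap_result)  # (True, False)
```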

I really like GPFS, as it has lots and lots of hooks (like S3) for events. It also has "HSM" (hierarchical storage management), which lets you optimise which part of CAP you want based on arbitrary parameters (for example, I've seen a rule that said: if the file is an image, larger than 1,000 pixels, and red, put it on long-term storage).
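GPFS expresses such rules in its own policy rule language, which isn't reproduced here. The sketch below only models the idea in Python: an ordered list of (predicate, tier) rules where the first match wins. `FileInfo`, the `dominant_color` attribute, the size-based second rule and the tier names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    mime_type: str
    width_px: int = 0
    dominant_color: str = ""  # hypothetical metadata, e.g. filled in by an indexing job


# A rule pairs a predicate with a target tier; the first rule that matches wins.
Rule = Tuple[Callable[[FileInfo], bool], str]

RULES: List[Rule] = [
    # The rule from the comment: large, red images go to long-term storage.
    (lambda f: f.mime_type.startswith("image/")
     and f.width_px > 1000
     and f.dominant_color == "red", "long-term"),
    # An illustrative size-based rule: anything over 10 GiB goes to a capacity tier.
    (lambda f: f.size_bytes > 10 * 1024**3, "capacity"),
]


def place(f: FileInfo, default: str = "fast") -> str:
    """Return the storage tier for a file by evaluating RULES in order."""
    for predicate, tier in RULES:
        if predicate(f):
            return tier
    return default


photo = FileInfo("/data/sunset.png", 5_000_000, "image/png", width_px=4000, dominant_color="red")
log = FileInfo("/data/app.log", 2_048, "text/plain")
print(place(photo), place(log))  # long-term fast
```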

> just like databases, clustered filesystems also conform to CAP

Instead of saying they are like databases, I like to think of filesystems (clustered or not) as one type of database. Of course, that is more of a mental trick than any deep insight.
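One way to see the "filesystem as a database" view is a key-value interface over a directory. This is a minimal sketch, assuming keys are plain file names (no `/`); `DirKV` is hypothetical. The write-then-rename trick in `put` is a common way to get atomic replacement on POSIX filesystems.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


class DirKV:
    """Hypothetical key-value view of a directory: key = file name, value = file bytes."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, value: bytes) -> None:
        # Write a temp file, then rename it over the target; on POSIX the rename is
        # atomic, so readers see either the old value or the new one, never a mix.
        tmp = self.root / (key + ".tmp")
        tmp.write_bytes(value)
        os.replace(tmp, self.root / key)

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.exists() else None

    def delete(self, key: str) -> None:
        (self.root / key).unlink(missing_ok=True)

    def keys(self) -> List[str]:
        return sorted(p.name for p in self.root.iterdir() if not p.name.endswith(".tmp"))


with tempfile.TemporaryDirectory() as d:
    kv = DirKV(d)
    kv.put("alice", b"42")
    kv.put("bob", b"7")
    kv.delete("bob")
    remaining, alice, bob = kv.keys(), kv.get("alice"), kv.get("bob")
print(remaining, alice)  # ['alice'] b'42'
```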

I don't really understand this post. It reads like a neutral description of the differences between key-value/content-addressed storage and filesystems, but then concludes that you shouldn't use filesystems for distributed systems. Further motivation is needed.

> I quite like the term “value store” for these.


He literally talked about CAS right before that; here's the whole quote: 'The core part of git is a content addressed store. I quite like the term “value store” for these.'
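For readers unfamiliar with the term: in a content-addressed store the key of a value is a hash of the value itself. The sketch below computes object ids the same way git does for blobs (SHA-1 over `blob <size>\0` followed by the content); git's zlib compression and on-disk layout under `.git/objects` are omitted, and `ContentStore` is a hypothetical name.

```python
import hashlib


class ContentStore:
    """Minimal in-memory content-addressed store with git-style blob ids."""

    def __init__(self) -> None:
        self.objects: dict = {}   # object id -> content

    def put(self, data: bytes) -> str:
        # Same id git computes for a blob: SHA-1 over b"blob <size>\0" + content.
        header = f"blob {len(data)}\0".encode()
        oid = hashlib.sha1(header + data).hexdigest()
        self.objects[oid] = data  # storing identical content twice is a no-op
        return oid

    def get(self, oid: str) -> bytes:
        return self.objects[oid]


store = ContentStore()
oid = store.put(b"hello\n")
print(oid)  # ce013625030ba8dba906f756967f9e9ca394464a, as from `echo hello | git hash-object --stdin`
```

Because the key is derived from the content, identical uploads deduplicate for free and any change to the content produces a new key.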

Hash-based cache-busting is pretty much CAS, and a widespread practice. Using CAS for user uploaded content is somewhat common, too.
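Hash-based cache-busting in a nutshell: the content hash is embedded in the file name, so the URL changes exactly when the content changes and the file can be cached indefinitely. A minimal sketch, assuming SHA-256 truncated to 8 hex digits; bundlers each use their own naming scheme, and `busted_name` is hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def busted_name(filename: str, content: bytes, digits: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digits]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


v1 = busted_name("static/app.js", b"console.log(1);")
v2 = busted_name("static/app.js", b"console.log(2);")
print(v1 != v2)                                                # True: new content, new URL
print(v1 == busted_name("static/app.js", b"console.log(1);"))  # True: same content, same URL
```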


CRUD has been around forever. Like literally, since the mainframes.
