Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Fair enough, basing folders on object names split by / is pretty inefficient. I wonder why they didn't go with a solution like git's trees.

What, exactly, is inefficient about it?

Think for a moment about the data structures you would use to represent a directory structure in a filesystem, and the data structures you would use to represent a key/value store.

With a filesystem, if you split a string /some/dir/file.jpg into three parts, “some”, “dir”, “file.jpg”, then you are actually making a decision about the tree structure. And here’s a question—is that a balanced tree you got there? Maybe it’s completely unbalanced! That’s actually inefficient.

Let’s suppose, instead, you treat the key as a plain string and stick it in a tree. You have a lot of freedom now, in how you balance the tree, since you are not forced to stick nodes in the tree at every / character.

It’s just a different efficiency tradeoff. Certain operations are now much less efficient (like “rename a directory” which, on S3, is actually “copy a zillion objects). Some operations are more efficient, like “store a file” or “retrieve a file”.



I think it is fair to say that S3 (as named files) is not a filesystem and it is inefficient to use it directly as such for common filesystem use cases; the same way that you could say it for a tarball[0].

This does not make S3 a bad storage, just a bad filesystem, not everything needs to be a filesystem.

Arguably is it good that S3 is not a filesystem, as it can be a leaky abstraction eg in git you cannot have two tags name "v2" and "v2/feature-1" as you cannot have both a file and a folder with the same name.

For something more closely related to URLs than filenames forcing a filesystem abstraction is a limitation as "/some/url", "/some/url/", and "/some/url/some-default-name-decided-by-the-webserver" can be different.[1]

[0] where a different tradeoff is that searching a file by name is slower but reading many small files can be faster.

[1] maybe they should be the same, but enforcing it is a bad idea


I think what you’re describing is simply not a hierarchical file system. It’s a different thing that supports different operations and, indeed, is better or worse at different operations.


Renaming a folder is inefficient.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: