> Fair enough, basing folders on object names split by / is pretty inefficient. I wonder why they didn't go with a solution like git's trees.
What, exactly, is inefficient about it?
Think for a moment about the data structures you would use to represent a directory structure in a filesystem, and the data structures you would use to represent a key/value store.
With a filesystem, if you split a string /some/dir/file.jpg into three parts, “some”, “dir”, “file.jpg”, then you are actually making a decision about the tree structure. And here’s a question—is that a balanced tree you got there? Maybe it’s completely unbalanced! That’s actually inefficient.
Let’s suppose, instead, you treat the key as a plain string and stick it in a tree. You have a lot of freedom now, in how you balance the tree, since you are not forced to stick nodes in the tree at every / character.
It’s just a different efficiency tradeoff. Certain operations are now much less efficient (like “rename a directory” which, on S3, is actually “copy a zillion objects). Some operations are more efficient, like “store a file” or “retrieve a file”.
I think it is fair to say that S3 (as named files) is not a filesystem and it is inefficient to use it directly as such for common filesystem use cases; the same way that you could say it for a tarball[0].
This does not make S3 a bad storage, just a bad filesystem, not everything needs to be a filesystem.
Arguably is it good that S3 is not a filesystem, as it can be a leaky abstraction eg in git you cannot have two tags name "v2" and "v2/feature-1" as you cannot have both a file and a folder with the same name.
For something more closely related to URLs than filenames forcing a filesystem abstraction is a limitation as "/some/url", "/some/url/", and "/some/url/some-default-name-decided-by-the-webserver" can be different.[1]
[0] where a different tradeoff is that searching a file by name is slower but reading many small files can be faster.
[1] maybe they should be the same, but enforcing it is a bad idea
I think what you’re describing is simply not a hierarchical file system. It’s a different thing that supports different operations and, indeed, is better or worse at different operations.
What, exactly, is inefficient about it?
Think for a moment about the data structures you would use to represent a directory structure in a filesystem, and the data structures you would use to represent a key/value store.
With a filesystem, if you split a string /some/dir/file.jpg into three parts, “some”, “dir”, “file.jpg”, then you are actually making a decision about the tree structure. And here’s a question—is that a balanced tree you got there? Maybe it’s completely unbalanced! That’s actually inefficient.
Let’s suppose, instead, you treat the key as a plain string and stick it in a tree. You have a lot of freedom now, in how you balance the tree, since you are not forced to stick nodes in the tree at every / character.
It’s just a different efficiency tradeoff. Certain operations are now much less efficient (like “rename a directory” which, on S3, is actually “copy a zillion objects). Some operations are more efficient, like “store a file” or “retrieve a file”.