

Amazon S3 file system with improved caching: the itch I scratched over Christmas - russross
http://github.com/russross/s3fslite

======
timmorgan
This project has one of the best READMEs I've seen in recent days of looking
at a lot of open source code. Good overview, well written.

------
stephen
Very cool. This looks like a real version of the Ruby fusefs I wrote to grok
all of the s3organizer vs. s3sync vs. whatever schemes for differentiating
files vs. directories in S3:

<http://github.com/stephenh/s3fsr>

Since I used Ruby's fusefs, nothing is streamed and it's single-threaded,
limitations I assume this C++ implementation doesn't have to deal with.

------
ZitchDog
Would it be possible to reuse an existing HTTP caching solution like squid or
nginx for the caching since s3 exposes a REST api?

~~~
russross
I'm not sure it would interact nicely with the request authentication system
that S3 uses.
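To sketch why (the keys and paths below are made-up placeholders): in the S3 REST scheme of this era, every request carries an Authorization header containing an HMAC-SHA1 signature, computed from the caller's secret key over the verb, date, and resource. A generic proxy like squid can't produce valid upstream requests itself, so any cache miss would still need a freshly signed request from the client.

    
    
        # Hypothetical key and resource; shows the shape of the signed header.
        secret='EXAMPLESECRETKEY'
        access='EXAMPLEACCESSKEY'
        date='Tue, 27 Mar 2007 19:36:42 +0000'   # normally: date=$(date -R)
        resource='/mybucket/somefile'
        # StringToSign = verb, Content-MD5, Content-Type, Date, resource
        string_to_sign=$(printf 'GET\n\n\n%s\n%s' "$date" "$resource")
        signature=$(printf '%s' "$string_to_sign" \
          | openssl dgst -sha1 -hmac "$secret" -binary | base64)
        echo "Authorization: AWS $access:$signature"
        # curl -H "Date: $date" -H "Authorization: AWS $access:$signature" \
        #      "https://s3.amazonaws.com$resource"
    
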

I think a generic cache layer would be a better solution. A bit of googling
turns up fuse-cache, which sounds like about the right thing (although I
haven't actually examined it in detail), or fs-cache, which also sounds like a
discrete cache layer to be added to any file system. Basically, you mount it,
and it passes requests through to any other mount (like s3fslite) while adding
an on-disk cache layer.

I haven't tested any of these, but that seems like an approach worth pursuing.

\- Russ

------
pan69
Very nice! I was rather disappointed with the FuseOverAmazon version I tried a
couple of months ago. I will definitely give this a try. Thanks for the great
documentation on how to use it as well.

Great job!

------
anotherjesse
Neat! Rather than having to issue a find, perhaps a background task that
primes the cache as soon as you mount?

~~~
russross
I'm hesitant to automatically fire off that many requests, especially when
they may not end up being necessary. If you are using the same machine and
preserving the cache, it will already be primed each time you mount the
bucket, except the first time (or any time you delete the cache database
file).

Using find is just a trick I used whenever I'd corrupt the cache or change the
DB schema while developing it, and then wanted to go in and test it again
interactively.
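For anyone curious, the trick is nothing fancier than walking the mounted tree so that every entry gets stat'ed once (mount point here is a hypothetical example):

    
    
        # Walking the tree stats each entry, which pulls its metadata from
        # S3 and stores it in the local attribute cache.
        prime_cache() {
            find "$1" > /dev/null
        }
        # prime_cache /mnt/s3    # hypothetical mount point
    
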

I should probably mention that reducing the number of requests was one of my
primary goals. The first time I played with s3fs (the one I forked), my bill
for the month was roughly 10% storage and bandwidth, and 90% requests (or was
it 20/80?).

Anyway, thanks for the feedback; I do appreciate it!

\- Russ

------
chrischen
Does this use HTTP to upload everything to S3?

~~~
russross
Yes, it does. Adding https as an option is something I'll probably look into.

edit: It uses libcurl for transfers, and libcurl supports https, so getting a
secure connection is as simple as adding the option:

    url=https://s3.amazonaws.com

at mount time.

I've added that to the README file.

