
Show HN: Goofys – a faster s3fs written in Go - khc
https://github.com/kahing/goofys
======
rictic
I appreciate any project that makes its strengths and weaknesses very clear
and documents its consistency semantics.

Reading around[1], it looks like close-to-open consistency means that all
writes must be flushed before the close() call returns, and all subsequent
open() calls to the file must see the changes. IO may otherwise be buffered or
cached by the FS.

[1] The best source I found was these lecture slides:
[http://www0.cs.ucl.ac.uk/staff/B.Karp/gz03/f2011/lectures/gz...](http://www0.cs.ucl.ac.uk/staff/B.Karp/gz03/f2011/lectures/gz03-lecture3-NFS.pdf)
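
To make the close-to-open definition above concrete, here is a minimal in-memory sketch in Go. It is not goofys code: `Store`, `File`, and their methods are hypothetical names, and the "remote" side is just a map. It only illustrates the contract that a write is buffered until `Close()`, and that an `Open()` issued after `Close()` returns must see it.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Store stands in for the remote backend (hypothetical, in-memory only).
type Store struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func NewStore() *Store { return &Store{objects: make(map[string][]byte)} }

// File buffers writes locally; nothing reaches the Store until Close.
type File struct {
	store *Store
	name  string
	buf   bytes.Buffer
	dirty bool
}

// Open snapshots whatever the Store holds at open time.
func (s *Store) Open(name string) *File {
	s.mu.Lock()
	defer s.mu.Unlock()
	f := &File{store: s, name: name}
	f.buf.Write(s.objects[name])
	return f
}

// Write only touches the local buffer.
func (f *File) Write(p []byte) (int, error) {
	f.dirty = true
	return f.buf.Write(p)
}

// Bytes returns this handle's view of the file.
func (f *File) Bytes() []byte { return f.buf.Bytes() }

// Close publishes the buffered data before it returns.
func (f *File) Close() error {
	if !f.dirty {
		return nil
	}
	f.store.mu.Lock()
	defer f.store.mu.Unlock()
	f.store.objects[f.name] = append([]byte(nil), f.buf.Bytes()...)
	f.dirty = false
	return nil
}

func main() {
	s := NewStore()
	w := s.Open("a.txt")
	w.Write([]byte("hello"))
	early := s.Open("a.txt") // opened before Close: allowed to miss the write
	w.Close()
	late := s.Open("a.txt") // opened after Close: must see the write
	fmt.Printf("early=%q late=%q\n", early.Bytes(), late.Bytes())
}
```

In this sketch the handle opened before `Close()` sees an empty file, which close-to-open consistency permits; only the later `Open()` is guaranteed the new contents.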

~~~
khc
The caveat is that goofys is subject to the underlying S3, which, if you are
using Amazon, is only eventually consistent
([http://areweconsistentyet.com/](http://areweconsistentyet.com/))

------
notacoward
I applaud the "Filey System" clarification. Confusion between things that are
very close to 100% POSIX compliance and things that don't even have that as a
goal but adopt similar interfaces or structures has been a real problem. It's
nice to see someone being up front about the distinction, and it looks like a
useful project too.

~~~
khc
Thank you! It's not a general purpose filesystem but I don't think that has to
be the case to be useful. I am lazy and I want to do as little POSIX as
possible.

~~~
notacoward
That's wise. There are parts of POSIX that even I (a notorious purist) think
are outdated, unclear, or just not worth the trouble. Appending and random
writes sure would be nice, but I appreciate the difficulties involved. Maybe
if I get a chance to learn more Go I'll be able to contribute in some of those
areas.

------
khc
I started learning Go a month ago and wrote goofys in it as an exercise.
I'm looking for feedback on Go best practices as well as on the usefulness of
reduced-POSIX filesystems.

------
edutechnion
riofs ([https://github.com/skoobe/riofs](https://github.com/skoobe/riofs))
seems much faster than s3fs and deserves a spot in any benchmarks.

~~~
khc
Yes I looked into it (I only found out about it after I started working on
goofys). They have a stub flush() which does nothing
([https://github.com/skoobe/riofs/blob/master/src/rfuse.c#L104...](https://github.com/skoobe/riofs/blob/master/src/rfuse.c#L1043)),
so of course they are faster and the benchmarks won't be meaningful.

~~~
henningpeters
AWS S3 doesn't expose a flush()-like call in their API, hence what is the
point of exposing something that the underlying service doesn't support?

~~~
khc
Data on S3 is durable after a successful PUT (or a completed multipart upload),
so their flush() is implied.

~~~
henningpeters
exactly, so why do you need a flush then?

~~~
khc
I looked at the code some more and they do handle release(), so much of my
point above was invalid. I expect riofs's streaming write performance to be
comparable to goofys because we both use the same implementation strategy.
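
For readers following the flush()/release() exchange, here is a rough Go sketch of one way a streaming write can work when flush() is a stub and release() finishes the upload. The thread does not spell out the actual strategy, so this assumes a multipart-style upload; `multipartStore`, `writeHandle`, and every method name are hypothetical, and this is neither riofs nor goofys code.

```go
package main

import "fmt"

// multipartStore is a hypothetical in-memory stand-in for an S3-like store:
// parts are staged, and the object only exists after Complete.
type multipartStore struct {
	staged  map[string][][]byte
	objects map[string][]byte
}

func newMultipartStore() *multipartStore {
	return &multipartStore{
		staged:  make(map[string][][]byte),
		objects: make(map[string][]byte),
	}
}

// UploadPart stages a copy of one part; it is not yet visible as an object.
func (s *multipartStore) UploadPart(key string, part []byte) {
	s.staged[key] = append(s.staged[key], append([]byte(nil), part...))
}

// Complete joins the staged parts into the final object.
func (s *multipartStore) Complete(key string) {
	var obj []byte
	for _, p := range s.staged[key] {
		obj = append(obj, p...)
	}
	s.objects[key] = obj
	delete(s.staged, key)
}

// writeHandle streams full parts as data arrives.
type writeHandle struct {
	store    *multipartStore
	key      string
	buf      []byte
	partSize int
}

// Write buffers data and uploads every full part immediately.
func (h *writeHandle) Write(p []byte) {
	h.buf = append(h.buf, p...)
	for len(h.buf) >= h.partSize {
		h.store.UploadPart(h.key, h.buf[:h.partSize])
		h.buf = h.buf[h.partSize:]
	}
}

// Flush is a stub in this sketch: nothing becomes durable here.
func (h *writeHandle) Flush() {}

// Release uploads the remaining tail and completes the object.
func (h *writeHandle) Release() {
	if len(h.buf) > 0 {
		h.store.UploadPart(h.key, h.buf)
		h.buf = nil
	}
	h.store.Complete(h.key)
}

func main() {
	s := newMultipartStore()
	h := &writeHandle{store: s, key: "k", partSize: 4}
	h.Write([]byte("hello world"))
	h.Flush()
	_, visible := s.objects["k"]
	fmt.Println("visible after flush:", visible)
	h.Release()
	fmt.Printf("after release: %q\n", s.objects["k"])
}
```

The trade-off in this design is that errors from the final upload can only surface at release(), whose return value the caller of close() never sees, so a flush() that does real work gives the application a chance to observe a failure.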

------
Goopplesoft
This is cool. Any reason you chose not to cache reads locally (detect
md5/mtime changes for a new read or something similar)?
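
As a rough illustration of what such a validating read cache could look like, here is a small Go sketch. Everything in it is an assumption for the sake of the example: `Backend`, `Meta`, `ReadCache`, and `memBackend` are hypothetical names, the backend is an in-memory fake rather than a real S3 client, and the ETag is simply computed as an MD5 of the content.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// Meta is the cheap-to-fetch metadata used to validate a cached copy.
type Meta struct {
	ETag  string
	MTime int64
}

// Backend is a hypothetical remote store interface, not a real S3 API.
type Backend interface {
	Head(key string) (Meta, error)
	Get(key string) ([]byte, Meta, error)
}

type entry struct {
	meta Meta
	data []byte
}

// ReadCache keeps whole objects and revalidates them on every read.
type ReadCache struct {
	backend Backend
	entries map[string]entry
}

func NewReadCache(b Backend) *ReadCache {
	return &ReadCache{backend: b, entries: make(map[string]entry)}
}

// Read serves the cached copy only if the ETag and mtime still match.
func (c *ReadCache) Read(key string) ([]byte, error) {
	meta, err := c.backend.Head(key)
	if err != nil {
		return nil, err
	}
	if e, ok := c.entries[key]; ok && e.meta == meta {
		return e.data, nil
	}
	data, fresh, err := c.backend.Get(key)
	if err != nil {
		return nil, err
	}
	c.entries[key] = entry{meta: fresh, data: data}
	return data, nil
}

// memBackend is an in-memory fake that counts full downloads.
type memBackend struct {
	objects map[string][]byte
	metas   map[string]Meta
	gets    int
}

func newMemBackend() *memBackend {
	return &memBackend{objects: map[string][]byte{}, metas: map[string]Meta{}}
}

func (m *memBackend) Put(key string, data []byte, mtime int64) {
	sum := md5.Sum(data)
	m.objects[key] = data
	m.metas[key] = Meta{ETag: hex.EncodeToString(sum[:]), MTime: mtime}
}

func (m *memBackend) Head(key string) (Meta, error) {
	meta, ok := m.metas[key]
	if !ok {
		return Meta{}, fmt.Errorf("no such key: %s", key)
	}
	return meta, nil
}

func (m *memBackend) Get(key string) ([]byte, Meta, error) {
	meta, err := m.Head(key)
	if err != nil {
		return nil, Meta{}, err
	}
	m.gets++
	return m.objects[key], meta, nil
}

func main() {
	b := newMemBackend()
	b.Put("k", []byte("v1"), 1)
	c := NewReadCache(b)
	c.Read("k")
	c.Read("k") // metadata unchanged: served from the cache
	b.Put("k", []byte("v2"), 2)
	data, _ := c.Read("k") // metadata changed: downloaded again
	fmt.Printf("data=%s gets=%d\n", data, b.gets)
}
```

Note that each read still costs one metadata round trip, so a cache like this saves bandwidth rather than latency.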

~~~
khc
Many reasons:

* I am not working at the moment and have some free time (shameless plug: resume at my profile), so I want to bound the amount of time I need to get something useful

* many archiving/backup workloads are WORN (write once, read never); those and many streaming workloads (data processing, media streaming) don't really benefit from a cache. (Unless your cache is as big as your data, but that's usually not why people use S3.)

* for the use cases where a cache can help, I think you can just use another layer of caching filesystem. I intend to write one if one doesn't exist already. I wonder if you can use cachefs with FUSE filesystems? Let me know your use cases and I will think about it some more.

~~~
notacoward
As far as I know, FS-Cache does not yet work with FUSE, though we (in Gluster)
and others have often considered adding such support. You can actually get
some caching within FUSE itself, but I think you have to go down to the low
level (inode-based) interface to get any control over it.

~~~
khc
The Go FUSE binding that I am using only has a low-level interface anyway. That
said, I think the parent meant an on-disk cache, not the VFS cache.

------
SergeyPopoff
I want to see a benchmark comparison with the Rust version.

~~~
khc
I see what you are doing there :-P goroutines do make certain things easier,
but there's nothing you can't do with good old pthreads. One of the motivations
for this project is for me to understand better what this Go hype is about.

~~~
SergeyPopoff
Congratulations then! Hope you enjoyed writing code in Go.

