Writing a FUSE filesystem in Python (stavros.io)
142 points by stavros on Nov 4, 2013 | 48 comments



I write ExpanDrive, http://www.expandrive.com - which mounts SFTP/S3/Dropbox/Openstack Swift/Google Docs/Box on Mac/Windows & soon Linux all in Python/FUSE.

Ctypes is the way to go for interacting with libfuse. Even better would be speaking the FUSE protocol directly to /dev/fuse*, but that is rather complicated by our Windows support.


Oh, congrats, I used ExpanDrive when I was using Windows and I remember it being the best S3 client I had tried. I'm very excited for Linux support. Feature request: please make sure it supports ACLs properly, nothing else I've tried does. I'll set a folder as world-readable, but, when I upload a file into it, the file won't have inherited that permission and I have to go and do it manually every time...


For those who don't know the space, ExpanDrive used to be SftpDrive. I've been recommending it for years to people who are not technical for accessing sftp shares and have never been let down. Thanks for a great product and sharing some detail about how it works.


Thanks :)


Out of interest what do you use for mounting on Windows?

I have found Dokan (http://dokan-dev.net/en/), and pyFilesystem (http://pythonhosted.org/fs/expose.html) has a clever binding where you write your fs once and can mount it with either FUSE or Dokan.

Is there something better?


Nothing open-source, unfortunately. I originally wrote a Windows FUSE-like filesystem driver that we used in our 1.x product line, but that was a lot of work to maintain, and we have since switched to CBFS (https://www.eldos.com/cbfs/), for which I wrote a libfuse binding that we mount through directly via ctypes. They are actually in the process of formally building out FUSE support for CBFS, but it's not officially supported.


CBFS has a $7k-20k license price-tag. How are you affording that for your project?

I looked at it some years ago, but couldn't justify the cost for what I needed a FUSE-like ability for, and moved on to a couple of other things.


I wouldn't ship a loosely maintained open-source kernel driver to paying customers. You can get CBFS for less if you're an indie shop, but they don't directly publish the price anymore. Compared to spending a year writing a gross Windows filesystem driver, or hiring somebody to do it, CBFS is amazingly inexpensive.


What price range are we talking about for the "indie shop" license? Between $500-$1000? Or is it more like 1/2 of the $7k license?


Expandrive is commercial software that sells for, IIRC, ~$20.


$40.


Hey, I had thought that ExpanDrive was using Dokan, but I think I'll pick up a copy now that I know some of the backstory. Thanks for posting here!


Neat project and great post. One thing I'd like to point out is that there is another mechanism for implementing userspace filesystems in Linux: V9FS. FUSE has a lot of documentation -- much of it in various stages of bitrot -- and many of the comments here point to other FUSE projects as well!

The underlying mechanisms for FUSE are pretty convoluted and difficult, so anything much beyond trivial examples can break the abstractions afforded by libraries like the one used in the post.

V9FS is in contrast exceedingly simple and basically unchanged. And, yes, it's based on the Plan 9 filesystem protocols :)

http://v9fs.sourceforge.net/


This is interesting. I'm aware of the fact that some Plan 9 protocols were ported to Linux, but are you suggesting that this V9FS is a replacement for FUSE in common use cases? Do you know of a simple comparison of FUSE and V9FS implementations of the same userspace filesystem?


V9FS and FUSE both enable userspace filesystem drivers. They are actually pretty interchangeable for 99% of use cases, and the rest of those cases are where you'd directly interact with both, so it wouldn't be fair to compare a FUSE library abstraction with V9FS.

FUSE is too complicated to directly interact with, so everyone (OP included) is using libraries and abstractions on top of it. So what you'd really be comparing is that abstraction, as well as possibly how it limits actual FUSE interaction. Additionally, the performance of these abstractions is hard to predict, as many do lots of work (threads/IPC/etc) silently in the background, so the apparent workload on your filesystem driver is far from actual workload.

V9FS is simple enough that drivers often just talk it directly. The protocol is a simple RPC mechanism, and parallelism happens naturally.
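To give a feel for how small the wire format is, here is a rough sketch of encoding the first message of a 9P2000 session (Tversion) by hand. It is only an illustration: the helper names are made up for this example, the 8192-byte msize is an arbitrary choice, and nothing here talks to a real server or to the kernel's v9fs client.

```python
import struct

TVERSION = 100   # 9P2000 message type for Tversion
NOTAG = 0xFFFF   # tag value used for version negotiation


def pack_string(s):
    """9P strings are a 2-byte little-endian length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def pack_tversion(msize=8192, version="9P2000"):
    """Build size[4] type[1] tag[2] msize[4] version[s]; size counts itself."""
    body = struct.pack("<BHI", TVERSION, NOTAG, msize) + pack_string(version)
    return struct.pack("<I", 4 + len(body)) + body


def unpack_header(msg):
    """Every 9P message starts with the same 7-byte header."""
    size, mtype, tag = struct.unpack_from("<IBH", msg, 0)
    return size, mtype, tag


if __name__ == "__main__":
    msg = pack_tversion()
    print(len(msg), unpack_header(msg))
```

Every other message follows the same size/type/tag framing, which is a large part of why drivers can speak the protocol directly instead of going through a library.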

As far as a direct comparison example, I don't know of one.


I did one of these a while ago that lets you mount a redis server as a local filesystem: https://github.com/mattsta/redisfuse

It's a great way to do FS-like things ("edit X in vim", "run aspell across all keys in Y") on very non-file-like objects.


I did something similar with a digital repository server software for a conference hackathon. PyFUSE is pretty fun to play with, despite the documentation gotchas and warts.

https://gist.github.com/BHSPitMonkey/01cef0d528f374cca8cb#fi...


I did the same thing, but I used C.

http://www.steve.org.uk/Software/redisfs/


Oh, wow, that's a fantastic idea! Well done, I will play around with it. I love the idea of exposing disparate datasets as filesystems.


You should take a look at Plan 9. Everything is a file, and services are implemented as file servers, including the windowing system (rio).

Some of the ideas are available in Plan 9 from User Space.


I've seen that, it sounded like a great idea to me when I was looking at it. Too bad Plan 9 never caught on.


FWIW, you can also write a GlusterFS "translator" in Python.

https://forge.gluster.org/glupy

This allows you to add your own functionality alongside everything GlusterFS already has - e.g. distribution, replication, handling for all sorts of annoying VFS/POSIX special cases - instead of having to do everything from scratch yourself. I'm not saying it's the right option for everyone who might just use FUSE directly, but it might be an option to consider.

Disclaimer: I'm the original author, though others have taken over since.


Another "me too", but I really enjoyed writing this: https://chris-lamb.co.uk/projects/aptfs


Have yet another "me too" - I wrote up writing a FUSE module in Perl in a little more detail than The Fine Article:

http://blogs.perl.org/users/fuzzix/2012/08/fun-with-fuse.htm...

The info should transfer to other FUSE bindings.


Hey, I love "me too"s, they are all interesting projects. That's a very novel way of managing packages, does it work if you copy all the symlinks out and then restore them (so you can back up/restore your installed packages)?


It's simply a view of "remote" APT repositories, not your local package setup.


That's a really cool utility - can it be used with a caching proxy to reduce network traffic?


Yes, assuming that caching proxy is a regular APT proxy.


What kind of performance did you get out of this? I wrote a naive implementation of a FUSE filesystem in Python a while back -- it read the sickbeard DB and built a directory of the last snatched programs. The performance was pretty shocking, but workable for a single user. I always assumed that was down to Python, so I'd be interested to hear how well yours performs.


I haven't tried it at all, since it just needs to be faster than my Internet connection (it's for backups), but you can just run the file provided there and it should work properly. Please post a benchmark if you do (I'm not at home now)!


I'll give it a try tonight :)!


If anyone is interested in seeing a real-world open-source FUSE example (but written in C), take a look at our project RioFS (https://github.com/skoobe/riofs): a userspace filesystem for Amazon S3 buckets that runs on Linux and Mac OS X (FreeBSD support is coming as well).


Yes! Fantastic, thank you, I've been looking for something like that.


Fun fact: bup has a fuse module to access saved files: https://github.com/bup/bup/blob/master/Documentation/bup-fus...


That's pretty great, I think I will use bup when EncFUSE is done, rather than rdiff-backup. It's also very actively maintained, which is great.


I found rsnapshot to be far superior.


Interesting, in what way?


It requires less wheel reinventing, as it is mostly declarative. It provides a higher level of abstraction than rdiff-backup. Finally, it is more like Apple's Time Machine, since it uses hardlinks to deduplicate files. This means that you can browse all your backups in parallel and do not have to worry that one corrupt incremental backup means all subsequent ones are corrupt as well.
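The hardlink idea can be sketched in a few lines. This is not rsnapshot's actual implementation, just a simplified illustration of the technique: the `snapshot` and `demo` functions are hypothetical, the directory is assumed to be flat, and files are compared by content rather than the way rsync would do it.

```python
import os
import tempfile


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def snapshot(src_dir, prev_snap, new_snap):
    """Copy src_dir into new_snap, hard-linking files unchanged since prev_snap."""
    os.makedirs(new_snap)
    for name in sorted(os.listdir(src_dir)):
        data = read_bytes(os.path.join(src_dir, name))
        dst = os.path.join(new_snap, name)
        old = os.path.join(prev_snap, name) if prev_snap else None
        if old and os.path.isfile(old) and read_bytes(old) == data:
            os.link(old, dst)          # unchanged: share the same inode
        else:
            with open(dst, "wb") as f:  # new or changed: store a full copy
                f.write(data)


def demo():
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        os.makedirs(src)
        for name, data in (("a.txt", b"unchanged"), ("b.txt", b"version 1")):
            with open(os.path.join(src, name), "wb") as f:
                f.write(data)
        snap0 = os.path.join(root, "snap.0")
        snap1 = os.path.join(root, "snap.1")
        snapshot(src, None, snap0)
        with open(os.path.join(src, "b.txt"), "wb") as f:
            f.write(b"version 2")
        snapshot(src, snap0, snap1)
        shared = os.path.samefile(os.path.join(snap0, "a.txt"),
                                  os.path.join(snap1, "a.txt"))
        separate = not os.path.samefile(os.path.join(snap0, "b.txt"),
                                        os.path.join(snap1, "b.txt"))
        return shared, separate, read_bytes(os.path.join(snap0, "b.txt"))


if __name__ == "__main__":
    print(demo())
```

Each snapshot directory is a complete, browsable tree, and the unchanged file takes up space only once because both snapshots point at the same inode.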


That's good to know, thank you. It looks like the only drawback is that rdiff-backup stores file diffs as well, so rsnapshot will have to store the entire file again if you change one byte in it, but, since most of my files will never change, it sounds very good for my use case, thank you.


Yes. Another drawback is the lack of encryption support, which may or may not be easy to work around.


Well, I'll be using the EncFUSE script from this thread to provide the encryption, so it should be quite easy to work around it (just point it to the encrypted FUSE mountpoint).


superior to bup or rdiff-backup?


I did not evaluate bup beyond this statement in the readme:

> This is a very early version. Therefore it will most probably not work for you, but we don't know why. It is also missing some probably-critical features.

My data is worth more to me than that. However, based on the description it sounds like a more complex version of rsnapshot. rsnapshot uses rsync as the underlying protocol, and uses standard hardlinks instead of packfiles for deduplication. The nice result is that you do not need any FUSE plugin to browse full backups: they are simply stored on your normal filesystem.

I did not look at how bup is configured, but will say that from the "set it and forget it" perspective, rsnapshot is very well made: a single declarative config file will suffice to back up all your servers.


Nice post! You mentioned that you implemented this wrapper for backups. Apart from creating a virtual file system for sandboxing perhaps, isn't this generally slower? Maybe I am not getting the exact purpose of this task.


Slower than what? This is for creating encrypted versions of your files before you back them up, so your backup program will only see encrypted files.


Ha, I'm working on a FUSE filesystem for one of my projects in grad school.


Does encryption prior to backup mean less de-duplication?


It does. encbup uses CBC mode, which means that you'll only have per-file deduplication, but EncFUSE uses the less secure ECB mode, which means per-block deduplication (almost as good as unencrypted).
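A toy sketch of why the mode matters for deduplication. Big caveats: this uses a keyed hash as a stand-in for a block cipher, so it is not real encryption and cannot be decrypted; it is not the code from encbup or EncFUSE; and the fixed IV is an assumption made so that identical files produce identical output. It only shows how chaining changes which ciphertext blocks repeat.

```python
import hashlib
import hmac

BLOCK = 16


def toy_block(key, block):
    """Stand-in for a block cipher: deterministic and keyed, but NOT reversible."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def split_blocks(data):
    """Zero-pad to a multiple of BLOCK and split into fixed-size blocks."""
    data += b"\0" * (-len(data) % BLOCK)
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_like(key, data):
    """Each block is transformed independently of its neighbours."""
    return [toy_block(key, b) for b in split_blocks(data)]


def cbc_like(key, data, iv):
    """Each block is XORed with the previous output block before transforming."""
    out, prev = [], iv
    for b in split_blocks(data):
        mixed = bytes(x ^ y for x, y in zip(b, prev))
        prev = toy_block(key, mixed)
        out.append(prev)
    return out


def shared_blocks(a, b):
    """Count positions where both outputs hold the same block."""
    return sum(1 for x, y in zip(a, b) if x == y)


def compare():
    key, iv = b"k" * BLOCK, b"\0" * BLOCK
    v1 = b"A" * 16 + b"B" * 16 + b"C" * 16 + b"D" * 16
    v2 = b"X" * 16 + v1[16:]  # only the first block differs
    ecb = shared_blocks(ecb_like(key, v1), ecb_like(key, v2))
    cbc = shared_blocks(cbc_like(key, v1, iv), cbc_like(key, v2, iv))
    return ecb, cbc


if __name__ == "__main__":
    print(compare())
```

With independent blocks, three of the four output blocks survive a change to the first block; with chaining, the change propagates to every later block, so a backup tool can only deduplicate when the whole file is identical.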



