Ori File System (stanford.edu)
172 points by rcarmo on Jan 10, 2015 | 49 comments



To the authors: please add immediately accessible documentation, ideally in the form of a 'Quick Start' document. Also please add a 'Features' page to your website describing what it can and cannot do, and a 'Documentation' page where you put the USER DOCUMENTATION. No, nobody wants to read a paper to get some sync software up and running. THANKS!


I promise I'm not being dismissive.

My question is:

> Ori is a distributed file system built for offline operation and empowers the user with control over synchronization operations and conflict resolution. We provide history through light weight snapshots and allow users to verify the history has not been tampered with. Through the use of replication instances can be resilient and recover damaged data from other nodes.

Does a user actually want to do any of these things? Not a techie user, but a user. Someone who cares about their photos, for example.

One big use case, which I often run into, is that Dropbox is very valuable because it serves your files through a website. Yes, that means Dropbox and governments have access to that data. On the other hand, it's fine for Dropbox and governments to access my "chatlog with my ISP where I complain about a late fee and ask for a refund" that I saved to Dropbox but need to pull up on my laptop which doesn't have Dropbox installed. That type of thing is where Dropbox really shines, so I'm not sure I'd want to put my docs into a system which doesn't have a web interface.

Those kinds of "casual documents" are extremely common. I'd say most of my stuff in Dropbox isn't actually sensitive, whereas a small percentage is highly sensitive. It'd be great to store the highly sensitive files in a filesystem like this, since I could control it directly, but that involves quite a lot of effort to set Ori up and to understand it. And it's not entirely clear that it's more secure than, say, encrypting my Dropbox files directly.

In summary, what precisely is the value add that Ori brings to the user? I'm trying hard to see it, and I want to believe there is one.


I think there is definitely a growing segment of people who do not want to contribute to the ever-increasing surveillance of their lives, but who nevertheless do want a way to share their data and details with whoever they want, with trust, at a personal level.

I know a few families, for example, who are now using their own social-media apps built by family members, and for which there is no outside access - yet they have their own network, trust system, shared media and data, and so on.

It is completely within their control.

So there are people out there for whom Ori and its peer technologies are very important. Not everybody wants to just genuflect in front of the ultra-corporate ruler(s). It's quite possible to continue using computers unhindered by third parties; tools like Ori assist that - technologically as well as culturally. Get all your friends on their own p2p networks.


On the other hand, Synology NASes with the new DSM already offer a better alternative - web access to data, sync with other computers and other services, while you still own your data. In that case Ori doesn't bring all that much to the table :)


Synology NASes have had a number of security vulnerabilities and privacy breaches over the last several years.


Yes, and they were rather quickly patched. I'm not really sure what you are trying to say... to avoid security vulnerabilities, just upload all your data to a foreign datacenter?


check out the synology employee


Thank you for the explanation. That's very interesting. I see now that Ori would be vital for powering something like that.


Glad to help. I think it is very important to explain to people - to the 'now' generation, right now - that it's dangerous to just give up and start using the Cloud, and that the Cloud - and its issues - is not a new battle in computing, nor even the new and innovative feature it is candy-coated as today, but rather a continuation of a long-standing battle in the computer world over where data 'lives' and who 'owns' it. We swing back and forth, over and over, between the monolithic model and the distributed model in computing. The youngsters need to know that their iPhones are real computers and can do real computer things, unimpacted by outside/external entities. It is a real travesty that there are already new generations of kids who grow up thinking computers are something you need a credit card to use.


> Through the use of replication instances can be resilient and recover damaged data from other nodes

I sure want this. Setting up multiple nodes might be cumbersome, but it should be worth it for important data.

Also, the use case you are presenting is an overly casual one. People who just care about their photos could use Flickr or just about any random photo service with a private setting.

Dropbox users are already a step or two further down the line, and there are files like password databases (KeePass databases, for instance) and other sensitive things you'd want to sync seamlessly but wouldn't want to be accessible to anyone else, even with just a warrant [0]

[0] I think you still can refuse to give a password if I'm not wrong


> [0] I think you still can refuse to give a password if I'm not wrong

Can't speak for the US, but in the UK and Australia at least it's a crime to withhold a password. Seems the US varies depending on jurisdiction. See http://en.wikipedia.org/wiki/Key_disclosure_law


I think forced password disclosure is sickening. In my opinion, they are forcing defendants to assist in their own prosecution.


I don't think that invalidates the presumption of innocence. It's the digital equivalent of a search warrant. Imagine a world where prosecutors could not enter your house.


"Imagine a world where prosecutors could not enter your house"

I think I like this world.

It's not the presumption of innocence that's the problem, it's being forced to be a witness against yourself.


And what happens if you actually forget your password?


Offline is great for people living in the rest of the world where the internet can be extremely unreliable or inaccessible. For instance, all major cloud file services I've tried are blocked in China. If a new service isn't blocked today, it probably will be tomorrow because it allows easy sharing of information secure from censors. There isn't much hope of the web getting any more accessible in non-western countries but our home networks are still free.


It appears to be about the same service as Dropbox, except a bit less centralized. So it'll be faster because you don't have to go through a centralized server; e.g. your phone and computer can talk directly over the WLAN instead of going through your slow internet connection.


Dropbox already syncs on the LAN.


The beauty of software is that it can be built upon - including a simpler setup/interface, and a web interface.


There have been several commercial ventures more recently building something along the lines of Ori (i.e. decentralized and secure Dropbox alternatives). For companies and for people's sensitive documents this can be a big deal.

I agree usability could be improved, and I would really like to get more developers involved so we can finish a Windows port, an iOS app, and a graphical interface to make the whole system easier to use.

~ Ali


My company has three offices. We have about 200GB of shared storage. We often need to access and write files that are 10-20MB in size. Currently we have a centralized file server at the main office and a VPN. Using the file server at the main office is a pleasure; using it from the other two offices is slow. I am looking for a stable distributed filesystem, so that I can have a full copy of the data on a server at each of the three locations. I wish to share between each server and its local clients via both NFS and Samba. Clients should be able to read and write to their local server, and the three servers should collaborate to keep all their data in sync. Can Ori offer this? Something else?


If you can give up NFS and only use Samba for mounting your storage, this sounds like the use-case for DFS Replication.

http://technet.microsoft.com/en-us/library/jj127250.aspx


I would stay away from something new and "researchy" like this.

Use something others have used and configured with documentation and all.

Take a look at distributed file systems. Those usually come with their own share of failure scenarios:

Some I know of are GlusterFS, Ceph (both backed by Red Hat, I think) and Lustre.

But as I said, now you'll be in the business of understanding and configuring these - especially understanding how failures can happen.

First see if you can somehow improve the bandwidth between the sites: improve VPN settings, set QoS flags, separate the networks.

Then take a look at FS-Cache/CacheFS.

http://en.wikipedia.org/wiki/CacheFS

It is a way to use local cache to improve _some_ access patterns. Specifically if you look here:

http://www.linux-mag.com/id/7378/

---

The goal of FS-Cache and CacheFS is to reduce network traffic because some of the data requests will be satisfied by local storage (CacheFS) reducing the amount of network traffic. The load on the server should also be reduced since it will not have to satisfy all data requests. Consequently, this reduction may make up for the increased file lookup time and file read time due to the cache.

---
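If that route looks promising, here is a minimal sketch of enabling FS-Cache for an NFS client (this assumes a Linux client; the package manager, hostname and paths are placeholders, so adjust for your distribution):

    # install and start the userspace daemon that provides the on-disk cache
    apt-get install cachefilesd          # or: yum install cachefilesd
    systemctl enable --now cachefilesd

    # mount the NFS share with the 'fsc' option so reads go through the local cache
    mount -t nfs -o fsc fileserver:/shared /mnt/shared

The cache location and size limits live in /etc/cachefilesd.conf; as the quoted article notes, only some (mostly read-heavy) access patterns actually benefit.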


Depending on the exclusivity/locking requirements, and the amount of changes, you might be better off with complete replication among the offices, such as Dropbox (if you trust them) / SparkleShare (if you want to host it yourself - sync is less efficient, though), or simply rsync scripts going back and forth (sketch below).

A key to convenience, which Dropbox delivers, is to fetch the data before you need it. 200GB is not that much in the grand scheme of things today - if you only have 1GB/day of changing data, it could be viable.
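For the rsync route, a minimal sketch of a pull script each branch office could run from cron (hostnames and paths are hypothetical; note that plain rsync gives you last-writer-wins rather than real conflict resolution, so it only fits data that is mostly read or clearly partitioned per office):

    #!/bin/sh
    # pull the shared tree from the main office over the existing VPN
    rsync -az --delete --partial \
        mainoffice:/srv/shared/ /srv/shared-mirror/

    # cron entry, e.g. every 15 minutes:
    # */15 * * * * /usr/local/bin/pull-shared.sh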


I set up Syncthing, and it's syncing now. I think this software will do exactly what I want. It's a standalone binary and runs as a standard user-space app, keeping data synchronized between all the systems without the need to mount a new filesystem or do any advanced configuration.


I tried it ~2 weeks ago and was unable to figure out how to get it to synchronize across multiple computers as promised (could have been operator error or a lack of documentation). I also had a few cases where I unmounted the virtual file system (or maybe I didn't) and ended up with a repository in a broken state, with an error message saying that fixing it was not implemented yet.

I've also played with SparkleShare, which has been working well in my very limited testing. The downside is that it is written in .NET, which has a large footprint (already paid for if you are in the Microsoft world). With SparkleShare, I'm using my own git repositories (gitolite), which works well but requires some hand-editing of the configuration file to make it work.


It sounds like it might work (caveat: I haven't tried it yet). The best info I could find on it was this paper[1]. It gives some command line examples, explains how the data model works (spoiler: it's similar to git) and provides some benchmarks. The NFS comparison benchmarks were very interesting (generally faster over a WAN than NFS is over a LAN).

[1] http://dl.acm.org/ft_gateway.cfm?id=2522721&ftid=1403940&dwn...


Thanks for the link. "Ori over a WAN outperforms NFS over a LAN" - that's a ridiculous claim, maybe true only if you ignore bandwidth. I might have to give this a shot, though. I hope it's stable.


Can't speak to Ori or a lot of the distributed file systems being showcased, but Panzura's global file system addresses this exact use case. Since they make enterprise products that run on dedicated hardware, though, they might be a bit pricey.


AFS might work for you. It would still be centralized, but at least you get local caching [1] compared to NFS for example.

[1] http://csg.sph.umich.edu/docs/unix/afs/


I have not looked into this much yet, but I think comparing it to Dropbox etc., like many here are doing, is silly. There are a huge number of use cases where people work with large amounts of shared data in LAN environments over NFS and such. This is very much tied to location and becomes a bit tricky to make offline, remote, and conflict-proof. Something like git can support distributed workflows in work groups, but it only works with lightweight data and needs pretty educated users. I think programmers underappreciate how horrible it still is to work together with others using computers.


I wasn't quite able to grasp what this is. Is it similar to Dropbox / BTSync (without the BT part) in what it provides to the user? What are the use cases? Looks cool anyway, and I love that packages are already available!


Read the white paper to learn how to use it:

http://sigops.org/sosp/sosp13/papers/p151-mashtizadeh.pdf


Neither Dropbox nor BTSync is free software, AFAIK. I would be excited about a viable open-source alternative.



If only Syncthing were as easy to use as BTSync.


www.seafile.com is another



I don't see any end-user documentation (only a short building section). Is this not a release, but the project's "work in progress" page?


Yeah, I would have to say that Ori has the worst documentation ever. Except this is pretty useful:

http://sigops.org/sosp/sosp13/papers/p151-mashtizadeh.pdf


You could always run 'man ori' to see some end-user documentation. I agree that it would be nice to have some documentation on the home page, but the man pages help if you found none there.


Reminds me of Brad Fitzpatrick's camlistore http://camlistore.org


Looks kind of interesting to me. How does it compare to using a git repo as a "decentralized file store"? What is the speed of this thing? I see it uses FUSE, which generally isn't that fast. Speed probably isn't a goal, but it's nonetheless important. To make this a success, docs should be available. Right now it feels like a research project to me.


I quickly scrolled through the paper a few people mentioned earlier. About speed: "An evaluation shows that as a local file system, Ori has low overhead compared to a File system in User Space (FUSE) loopback driver; as a network file system, Ori over a WAN outperforms NFS over a LAN."

Interesting. Git is mentioned as well. I guess I'll have to read the paper after I finish cleaning the house ;)


This is pretty cool stuff, but when I tried it in mid-2014 (release 0.8.1), there were stability issues with orifs, and the user interface was rather confusing. Quite possibly these issues are resolved now, as I see there have been new commits again from October onwards.


Usually I want only a subset of my synchronized data on my phone. Can I set this up so all my data is synced to my NAS but only some of it to my phone?


Does it work best on local networks, or is it also good when there's high latency and lower bandwidth?


Hallowed are the Ori


Came in to post this. Thank you.



