Ori: A Secure Distributed File System (stanford.edu)
87 points by ahomescu1 on Jan 16, 2014 | 16 comments



I really wonder how this compares to Tahoe-LAFS (the least-authority file system), which was built for similar reasons and has been around a lot longer:

https://tahoe-lafs.org/trac/tahoe-lafs

And in specific:

https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/ab...

Or the Red Hat-sponsored HekaFS:

https://fedoraproject.org/wiki/Features/HekaFS?rd=Features/C...


I think Ori is more similar to Dropbox or BitTorrent Sync in the sense that it focuses on creating (nearly) full replicas on a fairly small number of machines with P2P sync between them. The Ori paper explicitly says disk space is cheap, which motivates their decision to store full replicas and history. It's really a quite different design than client-server cluster filesystems like Tahoe, Gluster/HekaFS, Ceph, HDFS, pNFS, etc.

I think Ori is confusing people by calling itself a file system instead of a sync/backup system. Sure, it's implemented using FUSE, but it doesn't behave like a traditional network file system.


I think it's OK being a file system.

All a filesystem really is, at bottom, is a specialized key-value store. Redis is a file system.
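The "filesystem as key-value store" view is easy to make concrete. A toy sketch (nothing Redis-specific; the class and its paths are made up for illustration) where paths are keys, contents are values, and a directory listing is just a key-prefix scan:

```python
# Toy "filesystem" over a plain dict: paths are keys, file contents
# are values. A real KV store (Redis, LevelDB, ...) swaps the dict
# for its own storage; the interface stays the same.

class KVFS:
    def __init__(self):
        self.store = {}  # path -> bytes

    def write(self, path, data):
        self.store[path] = data

    def read(self, path):
        return self.store[path]

    def listdir(self, prefix):
        # A directory listing is just a prefix scan over the keys,
        # the kind of operation KV stores expose (e.g. Redis SCAN).
        prefix = prefix.rstrip("/") + "/"
        return sorted(k for k in self.store if k.startswith(prefix))

fs = KVFS()
fs.write("/etc/hosts", b"127.0.0.1 localhost")
fs.write("/etc/passwd", b"root:x:0:0")
print(fs.listdir("/etc"))  # ['/etc/hosts', '/etc/passwd']
```

What this glosses over is everything a "real" FS adds on top of the KV core: permissions, atomic rename, mmap, and so on, which is roughly the objection in the parent comment.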


It isn't clear to me how this system is secure. Very little is actually said on the website beyond using the word "secure" and the usage of the tool "ssh" (which could have been maliciously tampered with or man-in-the-middled).

How exactly is data stored? Is the data stored encrypted should a device be stolen? Is it easy to "revoke" trust from devices storing parts of one's data?

Again, very little on the security of the overall system is actually described.

I hope the authors posted this, or are at least reading the thread.


If you read the paper, the security they are referring to is most likely that their data model is similar to git, using Merkle trees to ensure the integrity of files and their histories:

> Our data model, like that of Git, forms a Merkle tree in which the Commit forms the root of the tree pointing to the root Tree object and previous Commit object(s). The root Tree object points to Blobs and other Tree objects that are recursively encapsulated by a single hash. Ori supports signed commit objects, by signing the serialized Commit object with a public key. The Commit object is then serialized again with the key embedded into it, thus forcing subsequent signatures to encapsulate the full history with signatures. To verify the signature we recompute the hash of the Commit object omitting the signature and then verify it against the public key.
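The tamper-evidence this structure buys can be sketched in a few lines. This is a toy model, not Ori's actual wire format: JSON serialization and SHA-256 stand in for whatever Ori really uses, but the key property is the same, namely that each commit ID transitively covers the whole history through the parent pointer:

```python
import hashlib
import json

def commit_id(commit):
    # Hash the serialized commit, omitting the id/signature fields --
    # mirroring the "recompute the hash omitting the signature"
    # verification step described in the quoted passage.
    unsigned = {k: v for k, v in commit.items()
                if k not in ("id", "signature")}
    data = json.dumps(unsigned, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()

def make_commit(tree_hash, parent_id):
    # A commit points at the root Tree hash and the previous commit,
    # so its hash transitively covers the entire history (a Merkle
    # chain, exactly as in git).
    commit = {"tree": tree_hash, "parent": parent_id}
    commit["id"] = commit_id(commit)
    return commit

c1 = make_commit("tree-aaa", None)
c2 = make_commit("tree-bbb", c1["id"])

# Rewriting an old commit changes its hash, which no longer matches
# the parent pointer recorded in the child -- tampering is detectable.
tampered = dict(c1, tree="tree-evil")
assert commit_id(tampered) != c2["parent"]
```

Signing the head commit (as Ori does with a public key) then certifies the entire chain at once, since every earlier object is reachable through the hashes.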

The filesystem they describe has a few other very interesting features that paint it as a possible replacement for something like Dropbox, or even a network filesystem like NFS. You should check it out.


Data integrity is just a small part of security. I wouldn't even call that security at all.

I mean, we technically shouldn't trust our HDDs and should do data integrity checking at every level.

I'm interested in what they mean here by security, and what things they support beyond the normal integrity features of most modern file systems (zfs, btrfs).


Ori's internal storage is very similar to git, so it's difficult to modify history without being detected. I think that's the main security claim they're making.


Right, but data integrity doesn't equate to security in my mind.

If that's the main security claim, then I feel that not much is new here, or even "secure."


IMO security is the least interesting part of Ori. I think efficient syncing is the key feature.


Perhaps you could read the paper instead of being spoonfed by somebody?


I was at the conference where they presented this. I'm actually in charge of the conference website (Wolfgang Richter, http://sigops.org/sosp/sosp13/cfp.html#pc).

At the conference, and in the paper title/abstract, it did not seem that _security_ was a focal point, or even a point, of their design.

I understand already that they protect the _integrity_ of data---but I feel like that should be a given for any modern file system anyways.

I am more specifically interested in the security of the file system in general. How does it secure data in a distributed setting? Is there anything new and cool there?


Would be nice if this had iOS, Java/Android and Windows Phone clients.



I'm already struggling with a diversity of versioning solutions. I'm obviously going to be adding ori to the roster somewhere -- it looks AWESOME -- but I'm also increasingly frustrated with the fact that there is no meta-level tool to navigate all these sources of history:

1. Git for code, obviously, just not binaries larger than 25 MB. Profound network effects (GitHub, hi) will keep this true indefinitely.

2. For versioning large binaries, I go back and forth between boar and git-annex, for the desultory reason that I can't really get down with either the symlink insanity of git-annex or the old-school server/client svnish model of boar.

3. CouchDB, which I use for (1) couch apps, (2) a backup medium, and (3) the main document store for my company. Couch is great, but is emphatically not a filesystem -- you get CRUD, but few of the niceties you'd get with a 'real' FS. That's okay -- the whole point of Couch is worse-is-better network-native data storage -- but I can only be pacified so many times by a recitation of the CAP theorem. At some point, you want a POSIX-compliant filesystem, and you're willing to bite many bullets to get there, including those that Couch made a name for itself by dodging! In other words, you'll take a hit (probably on partition tolerance) in order to get, say, transactional atomicity, and so you'll toss Couch and go with SQL, at least for some use cases. (Don't get me wrong -- you can pry CouchDB from my cold dead paws -- but I also know when *not* to use it.)

4. There are many ways of implementing file versioning in an RDBMS, and between Concrete5, Owncloud, and the innumerable sqlite databases scattered throughout my system, I'm sure all are in use, somewhere.

5. My main file server runs btrfs, and sometimes zfs, and the excellent 'snapper' tool that comes with OpenSUSE helps me maintain and access fs-layer diffs. My home directory is regularly snapshotted in this manner.

6. For backing up the Linux machines on my network, I use rsnapshot, which uses hard links (not symlinks) for creating differential backups. It's one of those old-school solutions that just happens to work great.

7. Google Drive and Dropbox both offer versioning, so at least when I have these installed (not at the moment) I arguably have a seventh source of version history.

8. My VMs are all getting snapshotted on the daily too, and I use bedup to manage the btrfs partition where they are stored. The net effect is storage-equivalent to differential backups, although admittedly less efficient to create, at least on btrfs (deduplication occurs offline). But then again, having an image is the gold standard for backups for good reason.

9. And then there were nine. Welcome, ori. But you wouldn't happen to speak git, couch, btrfs, zfs, vmdk, and sql, would you? Because that would make my life a lot easier.
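The hard-link trick rsnapshot relies on is easy to demonstrate. A minimal sketch (toy paths, not rsnapshot's actual snapshot layout): unchanged files in a new snapshot are hard links to the previous snapshot's copy, so they cost no extra disk space.

```python
import os
import tempfile

# Two "snapshot" directories under a scratch root.
root = tempfile.mkdtemp()
snap0 = os.path.join(root, "snap0")
snap1 = os.path.join(root, "snap1")
os.makedirs(snap0)
os.makedirs(snap1)

# The first snapshot stores the file for real.
with open(os.path.join(snap0, "notes.txt"), "w") as f:
    f.write("unchanged content")

# The second snapshot hard-links the unchanged file instead of
# copying it -- same inode, no extra data blocks used.
os.link(os.path.join(snap0, "notes.txt"),
        os.path.join(snap1, "notes.txt"))

# Both paths now point at the same inode; the link count is 2.
st = os.stat(os.path.join(snap1, "notes.txt"))
assert st.st_nlink == 2
```

rsnapshot (and rsync's `--link-dest` option, which it builds on) does this per file across whole trees, copying only files that actually changed since the previous snapshot.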


Yikes. It sounds like you are in a constant state of trying new file systems and backup solutions. Why not simplify and go with a good enough solution?

Personally, I like zfs and rsnapshot. That is what I use on my NAS and it seems to work without me spending time on maintaining it after the initial setup. Why rsnapshot and not snapshots of the zfs volumes? Because I want to keep lots of backups, and my understanding is that zfs loses performance after you cross into a few dozen snapshots on the type of hardware I have.

Now, I never understood backing up your home folder. Why? I keep any documents on the NAS (thanks, VPN and ssh, for making life easy here), dot files on GitHub, and source code in git with origin on either GitHub or the NAS, depending on whether it is private or not. Chat logs are going to be my one exception, but perhaps the servers will back them up for me instead, a la GChat.


You're right, there's a lot going on, but -- and you are just going to have to believe me on this -- each method gives me something the others don't. As will ori. And honestly, I don't mind the extra moving parts: having recently suffered a catastrophic drive failure that set my business back a quarter, I am okay with a never-too-much-overkill approach to backups.

My problems number two. On the first hand, there are often bizarre interactions between versioning systems when they are hosted atop one another, breaking encapsulation in unexpected ways. You can't use .vmdk snapshotting on btrfs, for example, because the performance is whatever the opposite of 'breathtaking' is. (Sighgiving?) Meanwhile, a git repo on a Dropbox share is going to chew itself to pieces the first time there's an fs-level merge conflict in .git. I could go on.

The second problem is the flip side of the first -- there is no way of importing and exporting history between versioning systems. I shouldn't have to worry about git and Dropbox clashing: Dropbox should offer itself as a UI for git.



