

Ori: A Secure Distributed File System - ahomescu1
http://ori.scs.stanford.edu/

======
SEJeff
I really wonder how this compares to the Tahoe LAFS (least authority file
system) which was built for similar reasons and has been around a lot longer:

[https://tahoe-lafs.org/trac/tahoe-lafs](https://tahoe-lafs.org/trac/tahoe-lafs)

And specifically:

[https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/ab...](https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/about.rst)

Or the Red Hat-sponsored HekaFS:

[https://fedoraproject.org/wiki/Features/HekaFS?rd=Features/C...](https://fedoraproject.org/wiki/Features/HekaFS?rd=Features/CloudFS)

~~~
wmf
I think Ori is more similar to Dropbox or BitTorrent Sync in the sense that it
focuses on creating (nearly) full replicas on a fairly small number of
machines with P2P sync between them. The Ori paper explicitly says disk space
is cheap, which motivates their decision to store full replicas and history.
It's really a quite different design than client-server cluster filesystems
like Tahoe, Gluster/HekaFS, Ceph, HDFS, pNFS, etc.

I think Ori is confusing people by calling itself a file system instead of a
sync/backup system. Sure, it's implemented using FUSE, but it doesn't behave
like a traditional network file system.

~~~
theonewolf
I think it's OK being a file system.

All a fs really is is a specialized key-value store. Redis is a file system.

------
theonewolf
It isn't clear to me how this system is secure. Very little is actually said
on the website beyond using the word "secure" and the usage of the tool "ssh"
(which could have been maliciously tampered with or man-in-the-middled).

How exactly is data stored? Is the data stored encrypted should a device be
stolen? Is it easy to "revoke" trust from devices storing parts of one's data?

Again, very little on the security of the overall system is actually
described.

I hope the authors posted this or are here?

~~~
jperras
If you read the paper, the security they are referring to is most likely that
their data model is similar to git, using Merkle trees to ensure the integrity
of files and their histories:

> Our data model, like that of Git, forms a Merkle tree in which the Commit
> forms the root of the tree pointing to the root Tree object and previous
> Commit object(s). The root Tree object points to Blobs and other Tree
> objects that are recursively encapsulated by a single hash. Ori supports
> signed commit objects, by signing the serialized Commit object with a
> public key. The Commit object is then serialized again with the key
> embedded into it, thus forcing subsequent signatures to encapsulate the full
> history with signatures. To verify the signature we recompute the hash of
> the Commit object omitting the signature and then verify it against the
> public key.
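
To make the quoted data model concrete, here is a minimal sketch of a Git-style Merkle tree in Python. It is not Ori's actual code or on-disk format: the JSON serialization, the choice of SHA-256, and the helper names (`put`, `blob`, `tree`, `commit`, `verify`) are all assumptions made for illustration, and commit signing is left out entirely.

```python
import hashlib
import json


def put(store, kind, payload):
    """Store an object under the SHA-256 of its serialized form; return the hash.

    The JSON encoding here is a hypothetical stand-in, not Ori's format.
    """
    data = json.dumps({"kind": kind, "payload": payload}, sort_keys=True).encode()
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def blob(store, content):
    # File contents are hex-encoded so they fit in JSON.
    return put(store, "blob", content.hex())


def tree(store, entries):
    # entries maps a name to the hash of a child blob or tree.
    return put(store, "tree", entries)


def commit(store, root_tree, parents, message):
    # A commit names the root tree and the previous commit(s).
    return put(store, "commit",
               {"tree": root_tree, "parents": parents, "message": message})


def verify(store, digest):
    """Recompute the hash of every object reachable from digest."""
    data = store[digest]
    if hashlib.sha256(data).hexdigest() != digest:
        return False
    obj = json.loads(data)
    kind, payload = obj["kind"], obj["payload"]
    if kind == "blob":
        return True
    if kind == "tree":
        return all(verify(store, child) for child in payload.values())
    # kind == "commit": check the root tree and the whole parent history.
    return all(verify(store, h) for h in [payload["tree"]] + payload["parents"])


store = {}
first = commit(store, tree(store, {"README": blob(store, b"hello")}), [], "first")
second = commit(store, tree(store, {"README": blob(store, b"hello world")}),
                [first], "second")
print(verify(store, second))
```

Because each commit hash covers the root tree and the parent commits, a single hash pins down the full contents and history, so tampering with any stored object makes `verify` fail for every commit that reaches it.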

The filesystem they describe has a few other very interesting features that
paint it as a possible replacement for something like Dropbox, or even a
network filesystem like NFS. You should check it out.

~~~
theonewolf
Data integrity is just a small part of security. I wouldn't even call that
security at all.

I mean, we technically shouldn't trust our HDDs and should do data integrity
checking at every level.

I'm interested in what they mean here by security, and what things they
support beyond the normal integrity features of most modern file systems (zfs,
btrfs).

------
tobinharris
Would be nice if this had iOS, Java/Android and Windows Phone clients.

------
NatW
Link to the source paper:
[http://delivery.acm.org/10.1145/2530000/2522721/p151-mashtiz...](http://delivery.acm.org/10.1145/2530000/2522721/p151-mashtizadeh.pdf?ip=78.193.6.23&id=2522721&acc=OA&key=24B49002E011608CF08962C63678233A&CFID=400502279&CFTOKEN=45623156&__acm__=1389946062_8327da9921418091985f699c8d9c105c)

------
undoware
I'm already struggling with a diversity of versioning solutions. I'm obviously
going to be adding ori to the roster somewhere -- it looks AWESOME -- but I'm
also increasingly frustrated with the fact that there is no meta-level tool to
navigate all these sources of history:

1\. Git for code, obviously, just not binaries larger than 25 MB. Profound
network effects (GitHub, hi) will keep this true indefinitely.

2\. For versioning large binaries, I go back and forth between boar and git-
annex, for the desultory reason that I can't really get down with either the
symlink insanity of git-annex or the old-school server/client svnish model of
boar.

2\. Couchdb, which I use for (1) couch apps, (2) a backup medium, and (3) the
main document store for my company. Couch is great, but is emphatically _not_
a filesystem -- you get CRUD, but few of the niceties you'd get with a 'real'
FS. That's okay -- the whole point of Couch is worse-is-better network-native
data storage -- but I can only be pacified so many times by a recitation of
the CAP theorem. At some point, you want a POSIX compliant filesystem, and
you're willing to bite many bullets to get there, including those that Couch
made a name for itself by dodging! In other words, you'll take a hit (probably
on partition tolerance) in order to get, say, transactional atomicity, and so
you'll toss Couch and go with a *SQL, at least for some use cases. (Don't get
me wrong -- you can pry couchdb from my cold dead paws -- but I also know when
_not_ to use it.)

3\. There are many ways of implementing file versioning in an RDBMS, and
between Concrete5, Owncloud, and the innumerable sqlite databases scattered
throughout my system, I'm sure all are in use, somewhere.

4\. My main file server runs btrfs, and sometimes zfs, and the excellent
'snapper' tool that comes with OpenSUSE helps me maintain and access fs-layer
diffs. My home directory is regularly snapshotted in this manner.

5\. For backing up the Linux machines on my network, I use rsnapshot, which
uses hard links for creating differential backups. It's one of those
old-school solutions that just happens to work great.

6\. Google Drive and Dropbox both offer versioning, so at least when I have
these installed (not at the moment) I arguably have a sixth source of version
history.

7\. My vms are all getting snapshotted on the daily too, and I use bedup to
manage the btrfs partition where they are stored. The net effect is storage-
equivalent to differential backups, although admittedly less efficient to
create, at least on btrfs (deduplication occurs offline). But then again,
having an image is the gold standard for backups for good reason.

8\. And then there were eight. Welcome, ori. But you wouldn't happen to speak
git, couch, btrfs, zfs, vmdk, and sql, would you? Because that would make my
life a _lot_ easier.
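
For anyone unfamiliar with the link-based approach in item 5, here is a rough Python sketch of the idea: each snapshot is a full directory tree, but files unchanged since the previous snapshot are linked to the old copy instead of being stored again. This is a simplified illustration, not how rsnapshot itself is implemented, and the function name `snapshot` and its parameters are made up for this example.

```python
import filecmp
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy source into dest, hard-linking files unchanged since previous.

    Hypothetical helper for illustration; it ignores symlinks, permissions
    on directories, and deletion of old snapshots.
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: share the data blocks with the older snapshot.
                os.link(old, dst)
            else:
                # New or modified: store a fresh copy.
                shutil.copy2(src, dst)
```

Calling `snapshot(src, "snap.1", previous="snap.0")` after `snapshot(src, "snap.0")` yields two complete-looking trees, while only the changed files take up extra space.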

~~~
IgorPartola
Yikes. It sounds like you are in a constant state of trying new file systems
and backup solutions. Why not simplify and go with a good enough solution?

Personally, I like zfs and rsnapshot. That is what I use on my NAS and it
seems to work without me spending time on maintaining it after the initial
setup. Why rsnapshot and not snapshots of the zfs volumes? Because I want to
keep lots of backups and my understanding is that zfs loses performance after
you cross into a few dozen snapshots on the type of hardware I have.

Now, I never understood what backing up your home directory is for. Why? I keep any documents on
the NAS (thanks VPN and ssh for making life easy here), dot files on GitHub,
and source code in git with origin on either GitHub or the NAS depending on if
it is private or not. Chat logs are going to be my one exception, but perhaps
the servers will back them up for me instead a la GChat.

~~~
undoware
You're right, there's a lot going on, but -- and you are just going to have to
believe me on this -- each method gives me something the others don't. As will
ori. And honestly, I don't mind the extra moving parts: having recently
suffered a catastrophic drive failure that set my business back a quarter, I
am okay with a never-too-much-overkill approach to backups.

My problems number two: On the first hand, there are often bizarre
interactions between versioning systems when they are hosted atop one
another, breaking encapsulation in unexpected ways. You can't use .vmdk
snapshotting on btrfs, for example, because the performance is whatever the
opposite of 'breathtaking' is. (Sighgiving?) Meanwhile, a git repo on a
Dropbox share is going to chew itself to pieces the first time there's an fs-
level merge conflict in .git. I could go on.

The second problem is the flip side of the first -- there is no way of
importing and exporting history between versioning systems. I shouldn't
_worry_ about git and dropbox: dropbox should offer itself as UI for git.

