

Redundant Array of Independent Clouds - zooko
https://tahoe-lafs.org/~marlowe/TWN31.html

======
dhughes
Redundant Array of Independent Networked Storage or RAINS would sound so much
cooler considering it is "cloud" storage.

~~~
Cieplak
Redundant Array of Independent Networked Storage (RAINS) © 2012 dhughes

~~~
stephengillie
Random Array of Independent Data Centers (RAIDC)

Edit: maybe Random Array of In-dependent Clouds (RAIdC)?

------
j_s
Cool! First thoughts:

* how many of the supported options boil down to Amazon S3?

* is this the new "upload their important stuff on ftp, and let the rest of the world mirror it"? [https://groups.google.com/group/linux.dev.kernel/msg/76ae734...](https://groups.google.com/group/linux.dev.kernel/msg/76ae734d543e396d?pli=1)

* whoever put together this newsletter is clearly doing a great job for that community

~~~
zooko
> how many of the supported options boil down to Amazon S3?

That's a really good question. Diego's experiment used:

• memopal: I don't know if it uses S3

• SugarSync: I don't know if it uses S3

• syncplicity: I don't know if it uses S3

• googledrive: not using S3

• UbuntuOne: yes, it uses S3

• DropBox: yes, it uses S3

By the way, my startup, Least Authority Enterprises is working on a future
product which also goes by the codename "Redundant Array of Independent
Clouds". Our project has no relation to Diego Righi's experiment, except
perhaps that we inspired him by talking about it.

We've received a research grant from DARPA to implement it. The backends we're
developing for are all guaranteed to be separate backends from each other --
none of them turn out to be front-ends for another one!

• Amazon S3

• OpenStack Swift/Rackspace Cloud Files

• Microsoft Azure Blob Storage

• Google Storage for Developers
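
One way to picture the idea (a purely illustrative Python sketch; the `Backend` class, its `put`/`get` methods, and the `upload_shares` helper are hypothetical stand-ins, not Tahoe-LAFS's real backend API): each erasure-coded share of a file goes to a different provider, so no single provider holds enough to be a point of failure.

```python
class Backend:
    """Toy in-memory stand-in for one cloud provider (e.g. S3 or Azure)."""
    def __init__(self, name):
        self.name = name
        self.store = {}

    def put(self, key, blob):
        self.store[key] = blob

    def get(self, key):
        return self.store[key]

def upload_shares(shares, backends, key):
    # One share per independent provider: losing any single provider
    # costs only one share, not the whole file.
    for share, backend in zip(shares, backends):
        backend.put(key, share)

backends = [Backend("s3"), Backend("swift"), Backend("azure"), Backend("google")]
shares = [b"share-%d" % i for i in range(4)]  # stand-ins for erasure-coded shares
upload_shares(shares, backends, "myfile")
```

With erasure coding on top of this, any K of the 4 providers would suffice to reconstruct the file.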

------
uncle-ezra
Nice idea! This paper ([http://www.cs.cornell.edu/~hweather/publications/racs-socc20...](http://www.cs.cornell.edu/~hweather/publications/racs-socc2010.pdf)) called RACS, from Cornell, explored the same space about two
years ago, but it's really nice to see it rendered to practice in usable
filesystem form.

~~~
lprincehouse
One of the authors of RACS here. Indeed, the architecture of RACS is quite
similar to Tahoe-LAFS, with a proxy server that sends shares of data to
different cloud providers. With RACS, we focused on fault tolerance rather
than malicious modification of data: Cloud providers have their own internal
redundancy, so they (hopefully) protect their users from low-level hardware
failures... but a company itself can fail, or even raise prices suddenly which
can lead to an "economic failure", where the user's application becomes
prohibitively expensive to run. So RACS is intended to protect against things
like this. It can tolerate malicious modification of data shares on individual
cloud providers, but that kind of security wasn't our focus, so it's
interesting to see that Tahoe-LAFS takes that direction.

~~~
zooko
Hi -- I'm one of the authors of Tahoe-LAFS. I haven't read your RACS paper
before, but it looks pretty good. I appreciate the emphasis on real-world
economics and on the costs of vendor lock-in. I haven't yet completely
digested your results numbers to see how they inform my business, but I
definitely will.

It's too bad that you weren't aware of, or didn't cite, Tahoe-LAFS when you
wrote that paper -- even though you used my zfec library, which I created (by
copying Luigi Rizzo's fec library) for Tahoe-LAFS's use. Heh heh heh.

I tried to get Tahoe-LAFS's existence registered in the official academic
research world by publishing this:
[http://scholar.google.com/scholar?cites=7212771373747133487&...](http://scholar.google.com/scholar?cites=7212771373747133487&as_sdt=4005&sciodt=0,6&hl=en)
but it didn't really work. Most of the subsequent research that probably
should have cited Tahoe-LAFS still didn't.

Perhaps that 5-page paper was too telegraphic to communicate a lot of the
important properties. For example, it does not spell out the fact that Tahoe-
LAFS includes a kind of proof-of-storage/proof-of-retrievability protocol.
Also, perhaps, I chose too obscure a venue to publish it in. I'm not sure.

For your reading pleasure here is a big rant by me on my blog, whining that
Tahoe-LAFS is deserving of more attention than HAIL (which you do cite):

[https://lafsgateway.zooko.com/uri/URI:DIR2-RO:d73ap7mtjvv7y6...](https://lafsgateway.zooko.com/uri/URI:DIR2-RO:d73ap7mtjvv7y6qsmmwqwai4ii:tq5tqejzulg7yj4h7nxuurpiuuz5jsgvczmdamcalpk2rc6gmbsq/klog.html#%5B%5BHAIL%3A%20A%20High-Availability%20and%20Integrity%20Layer%20for%20Cloud%20Storage%5D%5D)

"It is frustrating to me that the authors of HAIL are apparently unaware of
Tahoe-LAFS, which elegantly solves most of the problems that they set out to
solve, which is open source, and which was deployed to the public, storing
millions of files, and summarized in a peer-reviewed workshop paper before the
HAIL paper was published."

~~~
zooko
You know, I have to admit that the main reason academic researchers don't
appreciate the sophisticated proof-of-storage/proof-of-retrievability features
in Tahoe-LAFS is that _I didn't write it up_. The 5-page paper that I linked
to -- [http://scholar.google.com/scholar?cites=7212771373747133487&...](http://scholar.google.com/scholar?cites=7212771373747133487&as_sdt=4005&sciodt=0,6&hl=en) -- doesn't explicitly mention that it has
proof-of-storage/proof-of-retrievability properties at _all_, much less do the
sort of thorough, precise specification and proof that, for example, the HAIL
paper has.
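
For context, a proof-of-retrievability protocol lets a client spot-check that a server still holds its data without downloading all of it. Here is a toy challenge-response sketch (the function names are hypothetical, and this is not the actual Tahoe-LAFS or HAIL protocol; real schemes also avoid requiring the verifier to keep a full copy of the data, typically by precomputing authenticated tags):

```python
import hashlib
import os

def respond(stored_data, offset, length, nonce):
    # Server proves it holds the bytes in the challenged range by hashing
    # them together with a fresh, client-chosen nonce (so old answers
    # can't be replayed).
    chunk = stored_data[offset:offset + length]
    return hashlib.sha256(nonce + chunk).hexdigest()

def verify(original_data, offset, length, nonce, response):
    # Client recomputes the expected answer and compares.
    expected = hashlib.sha256(
        nonce + original_data[offset:offset + length]).hexdigest()
    return response == expected

data = b"millions of files" * 100
nonce = os.urandom(16)
answer = respond(data, 40, 32, nonce)   # server's side
assert verify(data, 40, 32, nonce, answer)  # client's side
```

Each random challenge covers only a small range, so repeated challenges give statistical confidence that the whole file is still there.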

------
fsniper
Even before "cloud" or online storage was hot, or had even really shown up
yet, I was thinking of doing this.

At the time, Google was providing "unlimited" Gmail storage and the FUSE
gmail-fs had just been released. I looked for another FUSE filesystem like a
hotmail-fs, but didn't push hard on it; I couldn't find one, and I let my idea
die.

It would have been hard and time-consuming to do, and the marginal gain would
have been small. Also, I'm a f___ lazy system administrator. I hate coding :)

~~~
crusso
Yeah, a lot of us have had similar thoughts of using various forms of free
storage like our own hard drives and RAID setups.

This should just go toward reinforcing our understanding that ideas are the
easy part. Follow-through is the hard part.

------
zooko
See screenshots: <http://www.sickness.it/crazycloudexperiment1.png> and
<http://www.sickness.it/crazycloudexperiment2.png> from Diego "sickness"
Righi's notes: <http://www.sickness.it/crazycloudexperiment.txt>

------
emmelaich
I'm sure many of us have thought of this; nice to see someone do it.

My idea for backup would use something like DIBS (
<http://www.mit.edu/~emin/source_code/dibs/>) but, instead of peers, use many
salami-slice-sized pieces of free storage from cloud/hosting platforms.

------
moe
Sidekick: Why am I not surprised that he replaced _inexpensive_ with
_independent_ in the original acronym...

Other than that, interesting project.

~~~
MaxGabriel
Well, in fairness, both "inexpensive" and "independent" are used in the
original acronym.

------
memopal
Memopal does not use Amazon S3. Memopal is based on the Memopal Global File
System (MGFS), the archiving technology created and used by Memopal. MGFS is a
distributed file system, designed to be highly reliable, scalable, and as
low-cost per GB as possible. Read more:
<http://www.memopal.com/en/technology.htm>

------
conformal
nice work zooko et al :)

my understanding is that tahoe-lafs is meant to be used as a live filesystem.
how does the redundancy configuration affect latency? i would guess a cloned
volume ("RAID 1") would be faster than a distributed volume (e.g. "RAID 5" or
"RAID 6").

~~~
zooko
Tahoe-LAFS performance leaves a lot to be desired, for various reasons, but
the erasure-coding levels (i.e. the degree of distribution of each file via
"RAID"-like math) aren't necessarily the most important component. Brian
Warner did some thorough benchmarks using dedicated servers on a LAN, and
specifically looked at how increasing the number of shares ("K") affected
throughput:

[https://tahoe-lafs.org/trac/tahoe-lafs/attachment/wiki/Perfo...](https://tahoe-lafs.org/trac/tahoe-lafs/attachment/wiki/Performance/Sep2011/MDMF-100MB-partial.png)

The different colors of the samples there are for three different settings of
how many shares the file was erasure-coded (RAIDed) into: 3, 30, or 60 shares.
This type of file ("MDMF" type) seems to go about as fast at any of those
three levels of distribution, but this older and more common type --
[https://tahoe-lafs.org/trac/tahoe-lafs/attachment/wiki/Perfo...](https://tahoe-lafs.org/trac/tahoe-lafs/attachment/wiki/Performance/Sep2011/CHK-100MB-partial.png) -- ("CHK"
type, which is for immutable files) performs much worse for larger levels of
distribution. There's probably just some dumb bug which causes this slowdown.
This page has some ideas as to what's causing it:
[https://tahoe-lafs.org/trac/tahoe-lafs/wiki/Performance/Sep2...](https://tahoe-lafs.org/trac/tahoe-lafs/wiki/Performance/Sep2011)
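
For readers unfamiliar with the "RAID-like math": here is a toy 2-of-3 erasure code using a single XOR parity share. It is far simpler than the Reed-Solomon coding zfec actually uses, but it shows the "any K of N shares reconstruct the file" property:

```python
# Toy 2-of-3 erasure code: split the data into two halves and add one
# XOR parity share. Any 2 of the 3 shares recover the original.

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data):
    # Pad to even length, split into two halves, append parity.
    if len(data) % 2:
        data += b"\x00"
    half = len(data) // 2
    s0, s1 = data[:half], data[half:]
    return [s0, s1, xor(s0, s1)]

def decode(shares):
    # shares: dict {share_index: bytes} containing any 2 of the 3 shares.
    if 0 in shares and 1 in shares:
        return shares[0] + shares[1]
    if 0 in shares:
        # Recover the second half from the first half and the parity.
        return shares[0] + xor(shares[0], shares[2])
    # Recover the first half from the second half and the parity.
    return xor(shares[1], shares[2]) + shares[1]

shares = encode(b"tahoe-lafs")
recovered = decode({0: shares[0], 2: shares[2]})  # share 1 is "lost"
```

Real deployments use Reed-Solomon so that K and N can be arbitrary (e.g. Tahoe-LAFS's default of 3-of-10), but the storage-overhead/fault-tolerance trade-off is the same in spirit.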

------
riffraff
the idea of using multiple independent free providers as a unified one is
something that could be very popular among some of my friends. If someone
could slap a nice UI on it, I believe there would be a good market for it.

------
mey
Beowulf cluster?

