
Space Monkey: Taking the cloud out of the datacenter - jacobwg
http://www.kickstarter.com/projects/clintgc/space-monkey-taking-the-cloud-out-of-the-datacente
======
mistercow
Without much more technical detail, it's hard to say if this is technically
sound, but I've never seen a tech project better suited to Kickstarter. Here's
a product that absolutely depends on having lots of early adopters, and using
Kickstarter means that unless it has those adopters, you won't be investing in
its inevitable failure.

------
yread
I don't get it. How does it work?

How big is the hard drive in there? Judging from the price, the disk inside
can't be much bigger than 2TB. That means if I have 1TB of data and all the
others have 1TB of their own data, my data can only be replicated exactly once
(unless there are dedicated "cloud" servers involved). Which means that if my
device breaks, all the devices that received a portion of my data need to be
online for me to restore the content.

How does the encryption work? Surely the AES key can't be stored on the device
- if it breaks your data is lost. If it's stored in the cloud it's not any
safer than cloud storage.

~~~
ryankshaw
Also, getting multiple levels of redundancy doesn't necessarily require
redundancy_level × the physical storage. See:
<http://en.wikipedia.org/wiki/Erasure_code> and
<http://en.wikipedia.org/wiki/RAID#Comparison>

~~~
bryanlarsen
Does this work with encrypted data, though? Properly encrypted data is
indistinguishable from random data...

~~~
ryankshaw
Yes, erasure codes work on the raw bits, so they work equally well for
encrypted and unencrypted data. From the Wikipedia page: an erasure code
"transforms a message of k symbols into a longer message (code word) with n
symbols such that the original message can be recovered from a subset of the n
symbols".
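
To make that concrete, here's a toy single-parity erasure code: k data chunks
plus one XOR parity chunk, so n = k + 1, and any one lost chunk can be rebuilt.
Real systems use Reed-Solomon-style codes that survive many losses, but the
principle is the same, and nothing below cares whether the bytes are
ciphertext:

    # Toy erasure code: k data chunks + 1 XOR parity chunk (n = k + 1).
    # Any single missing chunk -- data or parity -- can be rebuilt.
    import secrets

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(chunks):
        """k equal-length data chunks -> k + 1 chunks; the last is parity."""
        parity = chunks[0]
        for c in chunks[1:]:
            parity = xor_bytes(parity, c)
        return chunks + [parity]

    def recover(chunks):
        """Rebuild the single chunk marked None by XORing all the others."""
        present = [c for c in chunks if c is not None]
        rebuilt = present[0]
        for c in present[1:]:
            rebuilt = xor_bytes(rebuilt, c)
        return rebuilt

    data = [secrets.token_bytes(4) for _ in range(3)]  # stand-in for ciphertext
    coded = encode(data)
    coded[1] = None                                    # lose any one chunk
    assert recover(coded) == data[1]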

------
Rhapso
I think this is a step both in the right direction and the wrong direction. It
attempts to mitigate issues with cloud storage by giving you hardware to keep
(actually a really good idea! It reduces latency and allows for file
recovery), but the service is still controlled by a company/central hub. If
the hub fails, the system fails. We need a system that works in a fully
distributed fashion (and preferably open source, so anybody can add more
nodes).

~~~
mistercow
Also, there's really no reason for this to be a special device. It could
easily be a blob on your computer's hard drive. That's an availability problem
for laptops, but you'd best have that problem solved anyway, since residential
internet connections aren't exactly 100%-uptime affairs.

~~~
e40
_Also, there's really no reason for this to be a special device._

It will vastly increase reliability, though. I can think of many reasons why
having it on a general purpose hard drive is a really bad idea (turned off
accidentally, freezes due to bad apps, virus infection, etc, etc).

~~~
mistercow
Come to think of it, there's really no reason you couldn't do both, and I
wonder if this system does.

------
agentultra
File stores like this have actually been pretty commonplace for a long time
now -- systems like Freenet, WASTE, etc.

Interesting to see someone stepping up and delivering a consumer-friendly
experience on top of such technology. It's incredibly useful but reliant on a
healthy network of peers. A plug-in-and-forget solution with good clients that
integrate well into the ecosystem of devices we use today, I think, is very
clever.

I'd prefer to see something open-source but this is definitely a step in an
interesting direction. Personally, I limit my use of "cloud" services because
I don't trust other people hosting things like my contacts list, calendar
appointments, etc. I do see the benefit and convenience of such services
though and if this device/service is capable of synchronizing more of that
data from my applications between my devices I would buy into it in a
heartbeat.

I hope the kickstarter goes well and this thing takes off (or something
similar... again, open source is preferable).

~~~
dreen
It isn't quite obvious from the Kickstarter page, but it doesn't seem to be
decentralized and anonymous like Freenet (I don't know about WASTE), which I
thought was the main point of Freenet.

------
6ren
How much of your terabyte is yours? The duplication seems to be at least three
times (more?), so that would be 333GB. Note: it doesn't matter how finely you
cut it up; you still need duplication to avoid loss.

It can be ameliorated by the cloud tricks of compression and git-like copy
detection across nodes (people who downloaded the same movie, music, image,
document, software, etc.), giving aggregate savings.
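
Git-like copy detection is essentially content addressing: hash each chunk and
store identical chunks only once. A minimal sketch (the chunk size and
in-memory store are made up for illustration; as the reply below notes, this
breaks once each user encrypts chunks under their own key):

    import hashlib

    store = {}  # content hash -> chunk, shared across everyone's backups

    def put(blob, chunk_size=4096):
        """Split a blob into chunks keyed by SHA-256; a chunk uploaded by
        two users (same movie, same OS file) is stored only once."""
        keys = []
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            store.setdefault(key, chunk)  # no-op if already present
            keys.append(key)
        return keys                       # the recipe to reassemble the blob

    def get(keys):
        return b"".join(store[k] for k in keys)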

Of course, most people won't use their full quota initially, and meanwhile
more adopters come online with fresh excess capacity... a kind of ponzi-
scheme/chain-letter (to be fair, as drive prices fall over years, units will
likely become 2TB, 4TB etc).

~~~
varikin
Except you can't both deduplicate against what other people have backed up
(multiple people backing up the same movie, etc.) and encrypt your backups. If
my backups are being put on a device owned by someone else, I want them
encrypted so that person can't read them.

------
josh2600
I met these guys when they launched at either Calacanis' Launch.co or
Arrington's Disrupt. I remember them winning the event too.

Good idea, but I question the scalability of the solution. How does
deduplication work on remote devices if you have a lot of them? Is there a
checksum of the file against a central server? Curious.

~~~
jonathanjaeger
They launched at and won the Launch conference. That doesn't necessarily
de-risk their plans from a technical perspective, but they did get some
validation from the conference judges who vetted the team and product.

~~~
josh2600
They had also already raised funding prior to going on stage :/. I like this
idea, but there are a lot of technical hurdles that I never hear answers to :(.

------
notum
Peer-to-peer backup? Reminds me too much of Skype supernodes. The premise is
sound, but in reality you're paying $10 a month to help distribute other
people's data via your own bandwidth? It goes both ways, of course, but how
long until someone finds a way to make their Space Monkey exclusively a
network leecher, only fragmenting the owner's backup while never delivering
other people's data?

I could be entirely wrong, of course - someone correct me, please.

~~~
lifeisstillgood
Doesn't need to be "hacked" - you just need to constantly churn your 1TB while
someone else puts 500MB up and adds a photo a month. Effectively you are
leeching.

Correct me if I am wrong, but is this storing data as torrents, with each
device acting as a torrent client and storage box?

~~~
notum
You're right. People could do that to save their upload bandwidth, because
unless you route your network traffic through the device, it can't know how
much of your bandwidth it's stealing, whether you're currently using the
connection, etc. I suspect most of the target audience doesn't have QoS
properly set up in their routers.

I'm not sure how valid the torrent analogy is in this case.

------
austenallred
There's really no question in my mind about whether or not it will get funded;
the early-bird option alone (which is half sold out) will take them to $50K -
half of their goal. I bet this funds today.

------
bryanlarsen
Is this just a fancy interface to Tahoe-LAFS?
<https://tahoe-lafs.org/~warner/pycon-tahoe.html>

~~~
rsync
I suspect this as well - the part where they talk about "even if half of the
remaining nodes disappeared" makes me think that Tahoe (or Tahoe-style)
technology is behind this.

Interestingly, the Tahoe people have their own product to offer:

<https://leastauthority.com/>

... although I have no idea what state of production that is in ...

------
susi22
I'm waiting for the day a hacker takes over their central update server,
bricks all devices worldwide, and makes them non-upgradable (or turns them
into bots). How would you deal with that? Ask everyone to send their device
in?

Not sure if I like this. $10/month for a TB is the same as Amazon Glacier.
Sure, that's not a fair comparison, since Glacier is for archival, but
still... it's not THAT cheap.

~~~
the_paul
(a) I think that's the same problem faced by every consumer device that
"phones home" for updates: game consoles, TiVos, smart TVs, VoIP phone
adapters, etc.

(b) If all you need is archiving of your stuff, and you don't want ongoing
fast access to your movies, music, photos, etc., then sure. Glacier is
probably a better fit.

------
rheide
Why would I want this? I can use CrashPlan to back up my files to the 'cloud'
safely. I don't care where they're stored, as long as they're safe. For fast
access and high availability I could just buy a hard drive, instead of a
device that acts like a hard drive but actually stores other people's files on
it.

~~~
ryusage
If all you want are backups, then no, I don't imagine this is very compelling.

The idea is to actually access these files regularly from more than one
computer. Hence the web interface and the emphasis on mobile devices. It could
also serve as a media library for various devices, not just in your home, but
presumably anywhere.

Dropbox and plenty of others basically do this already, but $10/mo at Dropbox
only gets you 100GB. And I don't think it does streaming, though I could be
wrong.

Personally, I'm curious how they access the files externally. The device must
act as a server, I suppose, but it seems like firewalls would present an issue
to less technical people. Maybe they just assume those people would never buy
this?

------
asm89
Really like the design. What I'd like to see is a device with a design like
this that also has:

- options to pair up with specific other devices for the backups (family,
trusted friends, etc.)

- a web app for posting status updates etc. (implementing tent.io?)

Basically the things social networks offer now, but with you owning your own
data.

------
RoryH
Why is this a subscription pricing model? If all the data is stored on peers'
Space Monkeys, then the only ongoing cost of keeping it running is electricity
for the unit and bandwidth for the connection.

Unless it's a way to draw more money out of the user over time?

~~~
the_paul
Yeah, Space Monkey is a business, and we hope to be able to make at least some
money off the whole deal in the end.

But also yes, there are nontrivial ongoing costs to keeping the network
working and working right. For example, we need dedicated online servers to
allow NATted devices to talk to each other, coordinate storage use, and deploy
security patches and bugfixes, and we want to add more features over time.
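
For the curious, the kind of rendezvous those dedicated servers enable looks
roughly like this (a toy sketch, not Space Monkey's actual protocol; the port
and the one-shot exchange are invented for illustration):

    # Hypothetical minimal UDP rendezvous server. Two NATted peers each
    # send it a datagram; it replies to each with the other's public
    # (IP, port), after which the peers can send datagrams directly,
    # punching holes in both NATs.
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9999))

    peers = []
    while len(peers) < 2:
        _, addr = sock.recvfrom(64)   # any datagram means "register me"
        if addr not in peers:
            peers.append(addr)

    # Swap public endpoints so each peer knows where to reach the other.
    sock.sendto(("%s:%d" % peers[1]).encode(), peers[0])
    sock.sendto(("%s:%d" % peers[0]).encode(), peers[1])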

------
JustinAiken
So if I'm interacting with other users' devices, why am I paying $10/month?

This seems like a great (nay, genius) idea if you could just buy a device that
accepts a standard 3.5-inch HD: half the space is yours, half is for
distributed backup, no monthly fees, just the upfront $ for the box itself.

------
gz5
While the engineer in me loves edge-distributed and duplicated shards of
encrypted data, does the business model hold up against the economies of scale
of the giant cloud providers doing it all in a few of their datacenters?

------
egb
Sounds interesting, but I'd want to know what the estimates are for up/down
bandwidth monthly. US data caps are common and tech like that could suck them
up pretty quickly...

~~~
jbellis
Both upload and download can be capped in the Preferences.

Upload, where bandwidth limits are the worst, is almost entirely based on
replicating data you've added to Space Monkey.

(I have an alpha Space Monkey device.)

------
antr
The early-bird option is already selling like hot cakes.

------
silasb
This is actually a really interesting device. I wonder how it works for a
household, though. Can different accounts be set up per user?

------
politician
It'd be nice if this was paired with BTC microtransactions for paying the
owners of the devices that your blocks are on.

------
siculars
Does this use <http://ceph.com/> ? Seems to take a similar approach.

------
durana
Does anyone know the Seagate device they used for their 'Alpha Test Network'?

~~~
rckclmbr
Seagate Goflex Home.
[http://www.newegg.com/Product/Product.aspx?Item=N82E16822148...](http://www.newegg.com/Product/Product.aspx?Item=N82E16822148614)

------
venomsnake
Where is the encryption done, and who has the keys?

How reliable is the network - will it be able to cope with a Sandy-style
blackout?

Will there be a software component where I can designate a hard drive on my
(always-on) PC and skip the external device entirely?

How many times will the data be backed up? If only twice, then if my home
device gets stolen/eaten by a rabbit/simply breaks, only one more hardware
failure somewhere else in the system is needed for my stuff to be lost
forever.

~~~
the_paul
The encryption is done on the device. Other people's devices (obviously) won't
have your encryption key, but since Space Monkey will have a web interface to
get at your files, you can tell that Space Monkey will have access to your
key.

Regarding durability, the Kickstarter page mentions:

    
    
        Q: Is my data safe?
    
        A: Yes! Super safe. Here’s why: when you put files in
        Space Monkey, you not only have a copy of everything on
        your Space Monkey device, but each file is chopped up into
        tiny pieces, encrypted, then stored to dozens of locations
        outside of your home, in such a way that even if half of
        those locations were destroyed, all your files would still
        be safe.
    

We (yeah, I work there) are gonna be taking the reliability and privacy of the
network VERY seriously; no one would want to use a storage service that loses
your data.

~~~
rsync
I wonder if I may suggest the book:

_Normal Accidents_ by Charles Perrow

Please, please read this book RE: complex, tightly coupled systems.

------
peterwwillis
tl;dr Space Monkey is a subscription-based Dropbox with a home NAS that
replicates data offsite.

They'll need hardware/software/network engineering, manufacturing, support,
sales, and more, all scaling to support these products.

I still don't understand exactly how they store all the data, since this is
supposed to be removing datacenter overhead, not adding to it. From what I've
read, it sounds like the home devices themselves are the redundant storage for
all the other customers. Which sounds terrible.

~~~
austenallred
It doesn't just replicate data off-site; it duplicates and distributes your
data to other Space Monkey users. It's like P2P cloud storage.

~~~
peterwwillis
...Yeah, that's what I said. Still sounds bad, for three reasons: bandwidth,
redundancy, and availability.

Bandwidth is a simple consideration. In America we really haven't caught up to
the bandwidth of most other first-world nations. Some reports put the average
download speed of American broadband users at 7.1Mbps. Netflix reports the
highest average broadband speed is with Google Fiber, at 3.45Mbps. Large
swaths of the country still operate on 1.5Mbps connections or less, with
upload caps more like 256Kbps for those users. Given the disparity in speeds,
we can assume most transfers from home users would be limited to relatively
slow speeds, not to mention (supposedly) restricted to idle bandwidth.

If you need your files, how long will it take to get them? Assume something
like 2.5Mbps down, with five copies of your data each transferred to you at
256Kbps: that's 160KB/s in aggregate, and downloading a full 1TB archive at
that speed would take 77.6 days.
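
A quick sanity check on that arithmetic (binary units; same assumed numbers as
above):

    # Five peers each uploading your shards to you at 256 Kbps.
    uplinks = 5
    per_uplink_Bps = 256 * 1024 / 8        # 256 Kbps -> 32 KiB/s
    total_Bps = uplinks * per_uplink_Bps   # 160 KiB/s aggregate
    archive_bytes = 1024**4                # a 1 TiB archive
    days = archive_bytes / total_Bps / 86400
    print(round(days, 1))                  # 77.7 -- the ~77.6 days above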

Then there's redundancy to consider. Disk drives are limited in capacity.
Currently 4TB drives are available, with 5TB coming by the end of the year. To
give each customer 1TB of storage while still storing other customers' data on
the device, you first have to decide how redundant you want the data to be.

With a fancy algorithm you can store hundreds of clients' data on the device.
Maybe they only have a couple of small files, so those can be replicated in
lots of places in a small amount of space. But eventually the files will grow
in size, so you'll have to remove some redundant copies to make room at
maximum capacity. Let's assume they use something like RAID-6, because RAID-5
is shitty for large-capacity small arrays. Since we assume a 4TB hard drive,
we can have N-2 storage, so for 4TB that's 2TB; make it N-3 storage and you
can fit 1TB of user data and 3TB of other customers' parity.
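
For what it's worth, the usable-space arithmetic under the two broad schemes
looks like this (all parameters hypothetical, not Space Monkey's actual
numbers):

    drive_tb = 4.0                 # raw capacity of the assumed drive
    user_tb = 1.0                  # the customer's own quota
    host_tb = drive_tb - user_tb   # 3TB left for other customers' shards

    # Plain replication: r full copies -> only 1/r of raw space is usable.
    def usable_replication(raw_tb, r):
        return raw_tb / r

    # k-of-n erasure coding: any k of the n shards rebuild the data.
    def usable_erasure(raw_tb, k, n):
        return raw_tb * k / n

    print(usable_replication(host_tb, 3))   # 1.0TB usable with 3x copies
    print(usable_erasure(host_tb, 10, 15))  # 2.0TB usable, survives 5 losses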

With maximum capacity in use, if three replicated copies of your data go away,
your data goes away. How likely is that to happen? This[1] article explains
how, as drives increase in capacity, they aren't increasing in reliability,
which results in more frequent unrecoverable errors. As time goes by, it
becomes more likely your data will be lost.

This all assumes your data is available. If their software engineers are
smart, they might develop an algorithm that takes into account actual used
space and grows or shrinks the number of replicas across the network based on
availability. So even if you have used 1TB of data, as long as other people
have extra free space, you can keep replicating copies of your data to more
devices. Sounds good!

Let's say your data is replicated on five devices. One device holding your
data fails and is sent in for repair. Another goes offline because a storm
knocks out its owner's internet connection. Yet another is available, but on a
very slow connection. Your data is now on two devices. Sounds fine.

In the event that your device is unavailable, you'll be restoring your 1TB
from the two people who hold parity copies of your data. How much data that
is, how long it takes to recover, and whether or not that transfer puts one of
their devices over the limit for a URE, is subject to several factors.

Of course, there are many more factors to consider here, and I'm not an expert
in any of them. But before you trust your data to a $10-a-month service, ask
yourself: how reliable is the drive, how fast is the connection, and how much
do you really need your data?

[1] [http://www.zdnet.com/blog/storage/why-raid-6-stops-working-i...](http://www.zdnet.com/blog/storage/why-raid-6-stops-working-in-2019/805)

~~~
rckclmbr
1. Each node is paired with nodes that are "alike", and bandwidth is
considered. You should only be paired with other nodes that have bandwidth
similar to yours.

2. Restoring backups takes a while. It's just the way it is. But the client
should be smart enough to fetch prioritized data first -- if you need to open
a picture, it should fetch that first, so you can open it right away.

3. Files are broken apart and stored in chunks, so "growing size of files" is
irrelevant.

4. The probability of your data being lost doesn't increase over time. The
node constantly checks the status of your backups; if a chunk is missing over
a long period of time, it gets backed up again (see the repair-loop sketch
after this list). Redundancy is only part of the solution -- some missing
chunks are recoverable through erasure coding (as in the article you
mentioned). Your node backs up to hundreds of other nodes, reducing the risk
of failure. Think of having 200 drives in an array instead of 7.

5. The article is somewhat FUD. Making projections for 2019 based on where
hardware is now? Many embedded devices are already replacing ARM processors
with dual-core Intel. As data sizes increase, our processing power will also
increase.
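
To illustrate point 4, the repair loop could be as simple as the following
sketch (the index/network interfaces, target count, and interval are all
invented; this is not Space Monkey's code):

    import time

    REPLICATION_TARGET = 3  # hypothetical shards wanted per chunk

    def repair_loop(index, network):
        """Periodically audit every chunk and re-replicate any that have
        fallen below the target shard count, so the probability of loss
        doesn't accumulate as nodes die over time."""
        while True:
            for chunk_id in index.chunks():
                alive = network.count_reachable_shards(chunk_id)
                if alive < REPLICATION_TARGET:
                    network.replicate(chunk_id, REPLICATION_TARGET - alive)
            time.sleep(3600)  # audit hourly (made-up interval)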

~~~
SteveArmstrong
3. I think he was referring to aggregate file size (the total amount stored),
not a single file. If that's the case, his point still stands.

4. He was probably saying that, as the average age of the nodes in the network
goes up, the chance of them failing goes up. Also, the duplication you
describe (hundreds of copies) is in direct opposition to usable space
(hundreds of copies means you can only use 1/100th of the space on your node).
Because of this, I don't think your files will ever be duplicated on the order
of "hundreds" - probably on fewer than 10 nodes. Also, it's important to keep
clear: my node will back up to hundreds of other nodes, but each individual
file fragment will only be duplicated to 3 (or whatever) nodes. That smaller
number is the important one that keeps getting discussed.

~~~
the_paul
There won't be hundreds of copies. A device may back up data to hundreds of
other devices in total, though.

Space Monkey data will be resilient to considerably more than 3 nodes failing.

~~~
peterwwillis
Hundreds of devices storing small files on your one hard drive could generate
more IOPS than the device can handle. Does Space Monkey account for other
users potentially killing the performance of a single device?

