

The Cloud I'd Like to See - NelsonMinar
http://machine-theory.com/the-cloud-id-like-to-see

======
lobster_johnson
Jungle Disk (<https://www.jungledisk.com/>) is pretty much what the article
describes:

* Files are stored in a provider of your choice (current options are S3 and RackSpace)

* Encrypted using a key stored locally

* Looks like a local disk

* Files are cached locally according to a set limit

Unfortunately, the implementation is too sloppy.

At least on OS X, the remote storage is accessed through a local WebDAV server
that proxies all of the file access. That WebDAV server was a piece of crap:
extremely slow, and it would periodically, mysteriously drop its connection,
which would cause Finder to stall and complain. (I reported this problem to
the Jungle people, but they were not able to figure out the cause, even from
the debug logs.)

The caching does not work in practice. It seems to not support streaming -- if
you try to access a file that is not in the cache, it will download the entire
file before you can read a single byte from it. (Actually this is merely a
hypothesis at this point: I tried using Jungle Disk for my MP3 library, but
whenever I tried to play a non-cached file, I had to wait half a minute for
the file to be downloaded. This problem may be because MP3 files often store
their song metadata, which players like Spotify use, at the _end_ of the file,
and Jungle most likely does not support seeking into uncached files.)
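
A minimal sketch of what seeking into an uncached file could look like, assuming the storage provider supports byte-range reads. This is not how Jungle Disk works; `BlockCachedFile` and `fetch_range` are hypothetical names, and the fake file below imitates an ID3v1-style tag in the last 128 bytes.

```python
class BlockCachedFile:
    """Reads byte ranges from a remote object, fetching only the blocks needed."""

    def __init__(self, fetch_range, size, block_size=64 * 1024):
        # fetch_range(start, end) -> bytes is a hypothetical callback that
        # performs a ranged read against the provider (end is exclusive).
        self.fetch_range = fetch_range
        self.size = size
        self.block_size = block_size
        self.blocks = {}   # block index -> cached bytes
        self.fetched = 0   # total bytes pulled from the remote

    def _block(self, index):
        if index not in self.blocks:
            start = index * self.block_size
            end = min(start + self.block_size, self.size)
            data = self.fetch_range(start, end)
            self.blocks[index] = data
            self.fetched += len(data)
        return self.blocks[index]

    def read(self, offset, length):
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, within = divmod(pos, self.block_size)
            chunk = self._block(index)[within:within + (end - pos)]
            if not chunk:  # remote returned less than expected; stop early
                break
            out += chunk
            pos += len(chunk)
        return bytes(out)


# Fake 1 MB "remote" file whose last 128 bytes hold a tag.
remote = bytes(1024 * 1024 - 128) + b"TAG" + b"Some Song Title".ljust(125, b"\0")
f = BlockCachedFile(lambda start, end: remote[start:end], len(remote))
tag = f.read(f.size - 128, 128)
```

Reading the tag pulls one 64 KB block instead of the whole file, which is the behavior a player that looks at the end of the file first would need.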

There is no reason why a service such as Jungle Disk could not work, but it's
all in the implementation.

------
saurik
If you really want to see something, you have to come up with who would build
it and why. The market generally doesn't want to pay much for software, but
is willing to pay through the nose for services. If you don't think through
this part you really are just whining, and quite often are even doing so to
people who already agree with you.

This means that there is very little money available to build something like
this (where one of the key properties was "use the provider of your choice for
the data") unless it at least goes through a single centralized middle-man
first.

In fact, I will go so far as to say that it would have a purely negative
effect on this space: at this time, companies offering similar devices are
able to make enough money to, well, spend it on R&D in this space.

In a world where the revenue stream becomes a combination of "dumb hosting" and
"software" you will see fewer features, less reliability, poor scalability,
and generally a slowing down of this entire area of the storage market.

Which, of course, then means that all of the other comments in this article
regarding "rethinking the filesystem" are almost certainly not something you
will see from this kind of project, whereas I'd be floored if Dropbox wasn't
looking at that right now and we all know Apple is (albeit using profit from
integrated hardware, not hosting).

~~~
kwindla
To vastly oversimplify, this stuff happens in one of two ways: like email or
like Facebook. (Like linux or like iOS. Like SQL or like SAP. Like the web
browser or like Photoshop.)

In other words, a standard protocol, plus adoption, creates lots of
opportunities to make money as a service provider and by building capabilities
that are extensions and integrations. On the other hand, a network-effects
monopoly creates opportunities to scale by monetizing lots of users in a
centralized way, and to play a gate-keeper role for other participants in the
ecosystem.

I can imagine "the next generation filesystem" work taking either form.

~~~
saurik
Right; in fact: exactly. The challenge, again, is to come up with who will
build it and why (or, to put it differently, who will pay for it, and again
why). Additionally, as this is becoming increasingly interwoven in this
thread: how further innovation of the core protocol will be funded.

Really, my problem is that the original article comes off as a whine to the
universe about something that isn't actually something that is fleshed out
enough to happen. We've all had these ideas... some of us have even built
companies trying. Most of these failed, with the successes remaining small due
to an inability to effectively monetize the offering.

That's why I use the term "challenge": if you want to write an article like
this, it shouldn't just be "this would be cool": such a simplistic article is
simply condescending to the numerous people (which may in fact be "all
developers") who have had the same vision.

(some unrelated-to-that-but-following-up-on-your-post parentheticals on
innovation:)

(Linux doesn't innovate what it means to be an operating system: the goal of
that project is to be a world-class implementation of proven designs. The few
interesting mechanisms they have added were taken from other dying
environments, like Solaris.)

(E-mail was an attempt to standardize a bunch of existing incompatible
practices, and in the zillions of years since it was standardized, Exchange,
Gmail, and Facebook are the only real examples of attempts to innovate on the
design, and their innovations were possible due to gatekeeper benefits.)

------
mark_l_watson
Please let me know if this is unsafe, because I do it: on OS X I keep two
small encrypted Disk Volumes for two classes of files that I consider
confidential.

These are only 10MB each, and I realize that when I change a file in one of
them, Dropbox has to update a 10MB file.

Does anyone know anything wrong with this, security wise? (I just keep my own
private files stored this way, not customer files which are only on my
encrypted OS X FileVault file system.)

~~~
delinka
If you're exceptionally paranoid, an adversary with access to your encrypted
.dmg on Dropbox _may_ be able to deduce sensitive information by analyzing the
changes in the file. It doesn't necessarily mean your data is at risk, but
maybe it could point your adversary to the bits of information you change most
often. This would be true with any encrypted file using any algorithm.
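
A small sketch of the kind of analysis described above, assuming the image is encrypted in independent fixed-size blocks so that a plaintext edit only disturbs the matching ciphertext region. The block size and the stand-in snapshots are made up for illustration; real disk image formats differ.

```python
def changed_blocks(old, new, block_size=4096):
    """Return the indices of fixed-size blocks that differ between two snapshots."""
    count = (max(len(old), len(new)) + block_size - 1) // block_size
    return [
        i for i in range(count)
        if old[i * block_size:(i + 1) * block_size]
        != new[i * block_size:(i + 1) * block_size]
    ]


BLOCK = 4096

# Stand-ins for two synced versions of the same encrypted image.
snapshot_1 = bytes(i % 251 for i in range(10 * BLOCK))
snapshot_2 = bytearray(snapshot_1)
snapshot_2[3 * BLOCK] ^= 0xFF        # simulate an edit landing in block 3
snapshot_2[7 * BLOCK + 100] ^= 0xFF  # and another in block 7

changed = changed_blocks(snapshot_1, bytes(snapshot_2), BLOCK)  # -> [3, 7]
```

The observer learns nothing about the contents, only which regions changed between syncs and, over many snapshots, how often.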

That said, unless you've gained the attention of a TLA (that's Three Letter
Agency), it's a minuscule risk and I really wouldn't worry about it.

~~~
ontoillogical
Can you point me to any papers about how this sort of attack would be done?

------
tsmith
"But I still have a couple hundred gigabytes in my home directory that I pull
across to every new computer I set up."

That seems... like a lot. Maybe I'm used to the space constraints of
tablets/netbooks/SSDs, but I've been pushing everything not inherently tied to
my local machine to a home NAS (e.g. a Kurobox or Buffalo NAS) for years now.

------
mikecane
I'm less interested in version syncing since I don't use a large variety of
devices. I just want something I can set up at home, as a user, and be able to
access via a Net connection. But I also want every bit of security that he as
a technical person desires. I want all this to reside on hardware in my home,
not somewhere else where the government can examine it without having to
notify me first. And it must be dead simple. Put software on PC (or a Mac),
point to directories, put client on remote device (tablet or phone), and BAM!
done. And I want to be able to _download_ locally, even video -- not stream.
Now I will sit and wait for Godot.

~~~
icebraining
I think Godot is Joey Hess:
<http://www.kickstarter.com/projects/joeyh/git-annex-assistant-like-dropbox-but-with-your-own/>

------
joewest
This reminds me of OceanStore:

    
    
> OceanStore is a global persistent data store designed to
> scale to billions of users. It provides a consistent,
> highly-available, and durable storage utility atop an
> infrastructure comprised of untrusted servers.
    

<http://oceanstore.cs.berkeley.edu/info/overview.html>

It's not productized and pretty difficult to discern the project's status BUT
they do seem to publish interesting papers / update their site every so often.

------
treelovinhippie
That's a good perspective of what the client-side functionality should be.
I've had an idea on what the hardware-side or "the cloud" should be for a
while...

The strangest thing about "the cloud" as it stands today is that it's really
just Amazon, Google and Apple with giant server farms. Something goes wrong
with the server farm, the whole thing crashes. This is not "the cloud".

At some point in the near future I think someone will work out a way to
construct a true cloud that has the ability to harness any node on the
Internet.

Think something like SETI@home, BOINC, Folding@home but where clients install
a program, set aside how much space (in GBs) they'd like to offer and they get
rewarded over time (money, points, currency, prizes etc).

As I upload things to this true cloud, the files are split, encrypted and
distributed around this cloud in such a way that backups are made, redundancy
and security is inbuilt.

If I'm storing 10GBs of other people's files, I can't access them since
they're only partial segments and they're encrypted. If I turn off my
computer, the file owners can still access their files from other nodes
hosting backups. And files are hosted on nearby nodes to ensure speedy access.
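
A rough sketch of the split-and-distribute step, assuming chunks have already been encrypted on the client (encryption is left out here). Rendezvous hashing is one possible placement rule, chosen for this sketch rather than taken from any existing system; the node names, chunk size, and replica count are illustrative.

```python
import hashlib


def split(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place(chunk, nodes, replicas):
    """Pick `replicas` nodes for a chunk by rendezvous (highest-random-weight) hashing."""
    chunk_id = hashlib.sha256(chunk).hexdigest()

    def score(node):
        return hashlib.sha256(f"{node}:{chunk_id}".encode()).digest()

    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
data = bytes(i % 251 for i in range(10_000))  # stand-in for an encrypted file
chunks = split(data, 4096)

# chunk index -> the nodes that hold a copy of that chunk
placement = {i: place(chunk, nodes, replicas=3) for i, chunk in enumerate(chunks)}
```

With three replicas per chunk, any single node can go offline and every chunk still has two holders. Picking nearby nodes for speed would need a different scoring function.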

It would be a rather difficult software engineering problem to solve, but
ultimately this is the future of the cloud.

~~~
terryjsmith
We were building a system like this back in 2009 or so as a competitor to
Dropbox (we called it P2P storage cloud). We had a competitor you might want
to check out: <http://www.wuala.com/>

~~~
treelovinhippie
I just came across this which seems to be what I was talking about:
<http://www.symform.com/>

------
douglashunter
When I first saw camlistore (<http://camlistore.org/>) it jibed with much of
how I pondered the foundational bits of "the cloud I'd like to see". Private
by default, distribut(ed|able), share-able, index-able, version-able dumb
content blobs that are described by other dumb content blobs of the same
nature.
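
To make "blobs described by other blobs" concrete, here is a toy content-addressed store. It is only a sketch of the general idea and not Camlistore's actual schema or API; the `sha256-` reference format and the JSON fields are assumptions made for illustration.

```python
import hashlib
import json


class BlobStore:
    """Toy content-addressed store: a blob's name is the hash of its bytes."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        self.blobs[ref] = data
        return ref

    def get(self, ref):
        return self.blobs[ref]


store = BlobStore()

# Two versions of some content, stored as dumb blobs.
v1 = store.put(b"draft one")
v2 = store.put(b"draft two")

# A third dumb blob that describes the other two (name, version history).
description = {"type": "file", "name": "notes.txt", "versions": [v1, v2]}
meta = store.put(json.dumps(description, sort_keys=True).encode())

latest = store.get(json.loads(store.get(meta))["versions"][-1])
```

Because names are derived from content, storing the same bytes twice yields the same reference, and a blob can be checked against its name by whoever fetches it.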

They might be getting some of the programmer interfaces right enough to give
folks a chance at getting some user interfaces right.

------
andrewflnr
I'd like to understand what he(?) means by "time as a first-class construct",
separate from versioning. He goes into a lot of detail about the encryption
requirements, but not that.

Overall, I fully approve. I don't think it's just a nerd dream; I'd like to
see it become a necessity. It's just going to be hard.

------
Gring
Another wish to add to that: independence from any one cloud provider. So if
Amazon has a power failure, or Dropbox gets asked to turn off your account, or
MegaCloud gets raided by your favorite 3-letter acronym, all your files are
still there. And you did not need to upload your data 3 times (in other words,
there is a protocol where cloud providers sync their data for you).

Not sure if this is technically feasible, though - for example, what happens
if an interceptor gets hold of service 1 and tells the other services to
delete everything as part of the synching process?

Article also mentions the wish to tell cloud providers to search through data
for you (without you needing to download everything) while withholding the
decryption key from them (that's how I read it, anyway). I'm not sure this is
possible either.

------
njharman
> all my files in the cloud, encrypted by me, accessible from any program on
> any of my computing devices, cached locally as needed, and served by
> providers that I choose and can freely migrate between.

Author forgot to list several important requirements that are implied. Being able to
update/add files. Being able to "manipulate" files with arbitrary software
across all devices.

That last one is the real problem needing solving. All files in cloud,
universally accessible is solved. It's called HTTP. Except there is no
universal en/decryption across all devices. HTTPS + server side en/decryption
might work.

~~~
icebraining
Server side encryption needs you to trust the server, though.

------
lbotos
At my office we often discuss this exact topic and most of us want exactly the
same cloud. It seems like a purely "nerd" solution though. Dropbox has the
market cornered because non-technical users get it and it works. On multiple
occasions we've come to the conclusion that this is a nerd dream and beyond
the computing needs of the average user. The question becomes whether the
gap between "average" and "power" user will grow or shrink with simple
cloud services at the forefront. Also, we've come a long way with APIs but I
don't know of too many services that allow competitor interop.

~~~
kwindla
The reason I don't think this is just a "nerd dream" anymore is that general
trends in the broader world are sharpening the need for really good cloud
storage for everyone.

Like a lot of folks here, I'm sure, I've been thinking about this stuff for a
long time. I built a peer-to-peer backup proof of concept in 1999, for
example. (But I knew, even while writing that code, that bandwidth wasn't
cheap enough for enough people for such a thing to take off.) The thing is,
until very recently, my needs diverged a lot from all of the non-techies I
know.

But that's less and less true. These days, most white collar workers depend
completely on their computers (and phones and tablets) to do all of their
work, every day. Corporate IT investment in "collaboration" tools like
Sharepoint and Dropbox is large, and getting larger every quarter. Having
access to your files, and the files that your colleagues share with you, and
the various static resources your company maintains on your intranet, is
becoming more and more important. So there's a clear business driver to build
more sophisticated storage, access and sharing capabilities.

And, relatedly, many non-techies now use multiple computing devices every day.
The laptop-plus-phone-plus-tablet user profile is the norm for more and more
people. My sense is that iTunes Match and iCloud and various photo stream
implementations are the special-purpose, walled-garden, leading edge
indicators that this stuff will, eventually, get built into all our platforms
and that we'll expect some level of interoperability between platforms. (Just
as AirPlay is the leading-edge indicator that in a few years we'll all expect
to be able to push video around between all our screens without having to use
any wires.)

------
tonygauda
This is exactly what Bitcasa does.

    
    
* Integrate directly into the OS and intercept the filesystem calls

* We encrypt client side using keys that aren't exposed to the server

* Stream the content in real time like Spotify or Netflix and enable random access

* Cache frequently used items locally

* Works across multiple devices and platforms

* We're building primary storage vs backup or sync
    

It's a well-thought-out system.

Full disclosure - I'm the founder and CEO of Bitcasa.

------
Wilya
I think simple and automatic local caching is the hard part. Having access to
your distant files isn't particularly hard. Take sshfs, samba, nfs, openvpn
(or whatever equivalent you happen to know), mix them in a big bowl, put a
server under your bed, and you're good to go.

But: 1/ Support on dumb mobile devices is generally quite bad 2/ Caching
support is generally quite bad. No connection pretty much means game over.

~~~
kwindla
I think there are a number of "hard parts." Getting encryption right -- really
right -- is always hard, and my "requirements" for full-text search and
capabilities-based sharing make that even harder. And I definitely agree with
you that the caching implementation needs to be really good to work properly.
Scalable tagging (database-backed FS design and mechanics) isn't as easy as it
looks at first glance, either.

But if I had to pick one hardest part, I'd vote for the UI. This stuff, at
least the way I'm imagining it, is a significant departure from traditional
file manipulation semantics. It's a file system plus the kind of user
interactions that "web 2.0" and mobile apps have taught us all to expect.
People won't use something that's not both really easy and elegantly designed.
And there won't be enough adoption unless there's enough feature set coverage
to appeal to lots of slightly different use cases.

I mentioned Dropbox a number of times in the blog post because I think they've
done a remarkably good job at _both_ implementation and user experience.
That's pretty cool!

~~~
Spearchucker
UI is indeed interesting. I've spent many years working on a solution to a
similar problem (data, rather than files, although nodes in my app can contain
attachments) and the UI is the part I've struggled with most. My most recent
version isn't online (yet), but the basic design hasn't changed in the two
years since I put this up: <http://www.wittenburg.co.uk/Interact/>

It requires a central server, but has a redundancy mechanism built in, in that
if server A is unavailable the client simply goes to server B.
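
The redundancy rule is simple enough to sketch. This is an illustration of the idea only, not the app's code; the server callables below are hypothetical stand-ins for real network clients.

```python
def fetch_with_failover(servers, key):
    """Try each server in order and return the first successful result."""
    errors = []
    for server in servers:
        try:
            return server(key)
        except ConnectionError as exc:
            errors.append(exc)  # remember the failure and move on
    raise ConnectionError(f"all {len(servers)} servers failed: {errors}")


def server_a(key):
    raise ConnectionError("server A is unavailable")


def server_b(key):
    return f"value-for-{key}"


result = fetch_with_failover([server_a, server_b], "note-42")
```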

It also has a panic password that hides/deletes data with a particular
protective marking. Everything is encrypted, and the server cannot identify
any user, even given a user name.

The app can easily be converted to work with files. The thing standing in the
way of that is time. I have a four-month-old son that expects board and
lodging.

------
jimmy2times
Besides security and caching, I think standards/interoperability are key.

I want my next-generation apps to work on a cloud filesystem of my choice, be
it Dropbox, Google Drive, etc.

That will be the most significant step towards a real cloud OS, and I bet
someone's already working on it.

------
blu3jack
I'm less security paranoid; I would be content with universally
accessible/syncable cloud storage at a reasonable price that simply worked and
wasn't tied to some particular corporate entity's efforts to monopolize some
market or another.

------
rtkwe
SpiderOak perhaps? I haven't used it heavily myself but it seems to cover at
least the encryption requirements the author puts forward.

------
ricardobeat
Having full-text search prevents you from having your own encryption, unless
you want to hand over your key which makes it worthless.

~~~
repsilat
This is incorrect. Full text search works by querying an index. The client has
access to all of the encrypted indexed content, so it can easily build the
index, encrypt it and upload it.

Cloud providers couldn't "give you" full-text search, but if it's technically
possible and if clients implement it then it's essentially the same thing.

~~~
ricardobeat
> encrypt it and upload it

If the index is encrypted with a local key you'll need to hand it over all the
same to use it online...

~~~
repsilat
You never hand over the key - you decrypt the index on the client-side when
you want to search.

~~~
ricardobeat
That's local search, not online. If you're syncing folders then
spotlight/whatever already works locally.

~~~
repsilat
You don't need to sync folders for this to work, though. When you search your
data you still leave all of it "in the cloud", you just download a few
relevant bits of the index database.

As far as the user is concerned this provides efficient full-text search
without heavy bandwidth or storage requirements and without giving the storage
provider access to any plaintext. Maybe this is still a "local" solution by
your definition, but who cares? How does it fall short of an "online" one?
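
A rough sketch of how "download a few relevant bits of the index" might work. The HMAC-blinded lookup keys are an assumption of this sketch and not something spelled out above. Posting lists here hold opaque document IDs; a real system would encrypt those too, and the provider can still see which entries get fetched.

```python
import hashlib
import hmac
import re


def token(key, word):
    """Blind a search term with a client-held key so the provider never sees it."""
    return hmac.new(key, word.lower().encode(), hashlib.sha256).hexdigest()


def build_index(key, docs):
    """Client side: map blinded terms to the IDs of documents containing them."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(re.findall(r"\w+", text.lower())):
            index.setdefault(token(key, word), []).append(doc_id)
    return index


def search(key, remote_index, word):
    """Client side: fetch only the one index entry that matches the query."""
    return sorted(remote_index.get(token(key, word), []))


key = b"kept-on-the-client-only"  # illustrative; never uploaded
docs = {
    "doc-1": "tax return 2011",
    "doc-2": "holiday photos",
    "doc-3": "Tax receipts",
}
remote_index = build_index(key, docs)     # this dict is what gets uploaded
hits = search(key, remote_index, "tax")   # -> ['doc-1', 'doc-3']
```

The stored index contains no plaintext words, and a query transfers one entry rather than the whole data set.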

~~~
ricardobeat
It falls short simply by not being online and requiring an app. Though it
might be possible to do the decryption in javascript, I hadn't thought of
that.

~~~
repsilat
To each their own. For anything I'd want to encrypt I wouldn't trust a closed
app or a web interface. It's just too easy for them to get private data across
the wire. If I'm blindly running the Javascript they send me I might as well
give them the key and be done with it.

My "best case" solution is a dumb protocol, a FUSE mount and _maybe_ a local
webserver for sharing, configuration and metadata. I don't really understand
wanting to access files through anything but a filesystem (regardless of where
they're actually stored), but I can accept that some people might.

------
shaunxcode
Have you seen this? <http://www.spacemonkey.com/>

------
radarsat1
dropbox + encfs?

