
Building PRISM-Proof Web Services - Libertatea
http://www.technologyreview.com/news/525651/new-approach-could-stop-websites-from-leaking-or-stealing-your-data/
======
nnq
Yeah, but your data is the only reason why you are valuable to your provider of
free or cheap cloud based services. If they can't use it (even anonymized),
they will have no incentive to provide free/cheap stuff to you :) Imagine the
outrage if Google _charged_ all people for Gmail in today's "free everything"
world.

(...it would be interesting, though, if someone figured out a way for the server
to store data without any possibility of tying it to a particular identity, so
they could still run aggregate queries on it. Dunno if that's possible while
still not allowing a client to access data he's not allowed to see. And I bet
someone could figure out, via some pattern recognition, which of the
non-client-encrypted-but-anonymized-through-such-a-method data you access from
the server is actually yours...)
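Here is a minimal sketch of the idea in the parenthetical. It assumes the client holds a secret it never uploads. The server files rows under a keyed-hash pseudonym (HMAC-SHA256 here, which is my choice) and can still aggregate across all rows. The client finds its own rows by recomputing that label. The names `pseudonym`, `server_store` and the `"age"` field are hypothetical. As the comment suspects, every row from one user carries the same label, so linking by pattern is still possible.

```python
import hashlib
import hmac
from statistics import mean

# Server-side store: pseudonym -> rows that carry no name, email, etc.
server_store: dict[str, list[dict]] = {}

def pseudonym(client_secret: bytes) -> str:
    # Keyed hash of a fixed label: stable for this client, opaque to the server.
    return hmac.new(client_secret, b"pseudonym-v1", hashlib.sha256).hexdigest()

def upload(client_secret: bytes, row: dict) -> None:
    server_store.setdefault(pseudonym(client_secret), []).append(row)

def my_rows(client_secret: bytes) -> list[dict]:
    # Only a client that knows the secret can recompute its label.
    return server_store.get(pseudonym(client_secret), [])

def average_age() -> float:
    # Aggregate the server can compute without knowing who owns which row.
    return mean(row["age"] for rows in server_store.values() for row in rows)
```

For example, after two clients each upload one row, `average_age()` returns the mean over both rows. The server never learns whose row is whose.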

~~~
nandhp
Not necessarily. For example, I'm pretty sure Dropbox isn't making money by
mining my content -- they don't even show me ads. They make money by offering
premium plans with more storage. Dropbox could offer a client-side encrypted
version of their service without changing their economics, and other companies
already do exactly that: MEGA and SpiderOak offer free plans with 50 GB and
2 GB, respectively. LastPass is another example.
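Mechanically, "client-side encrypted" means the file is encrypted and authenticated on the client, and only an opaque blob is uploaded. Below is a toy sketch of that using only Python's standard library. It is not how MEGA, SpiderOak or LastPass actually implement it. I'm assuming an HMAC-SHA256 keystream in counter mode plus an encrypt-then-MAC tag purely so the example has no dependencies. Real services use vetted ciphers such as AES, and key management (the hard part) is skipped here.

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 as a PRF in counter mode: block i = HMAC(key, nonce || i).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh per file, so equal files encrypt differently
    stream = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag  # only this blob is uploaded

def decrypt(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was modified")  # reject before decrypting
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, stream))
```

The server stores and serves the blobs but never holds `enc_key` or `mac_key`, so it can charge for storage without being able to read or silently alter the content.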

------
solox3
How's the server supposed to manage, sort, and aggregate the encrypted data?
Say, SELECT * FROM Users WHERE age > 65... can't do that; age is an encrypted
field.

I understand that the paper discusses the ability to _search_ the database for
a string within records encrypted using different keys (without knowing what
the search term is), but it makes no mention of sorting.
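For small domains like age, there is one workaround. This sketch rests on my own assumptions (a client-held key and deterministic HMAC tokens; none of this is from the paper). The server stores only tokens. The client turns `age > 65` into a set of equality tokens, one per possible age. It works, but it shows the cost: the server learns which users share an age and which tokens appear together in a query. Order-preserving encryption is the other common approach, and it leaks ordering instead.

```python
import hashlib
import hmac

KEY = b"hypothetical client-held key"

def token(age: int) -> str:
    # Deterministic: equal ages give equal tokens, but tokens carry no order.
    return hmac.new(KEY, str(age).encode(), hashlib.sha256).hexdigest()

# What the server stores instead of a plaintext age column.
users = {"u1": token(70), "u2": token(40), "u3": token(66)}

def older_than(threshold: int, max_age: int = 130) -> set[str]:
    # Client expands the range into one token per matching value...
    wanted = {token(a) for a in range(threshold + 1, max_age + 1)}
    # ...and the server only ever does equality matching.
    return {uid for uid, t in users.items() if t in wanted}
```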

~~~
droopybuns
[http://crypto.stanford.edu/~dabo/pubs/papers/encsearch.pdf](http://crypto.stanford.edu/~dabo/pubs/papers/encsearch.pdf)

Dan Boneh talked about this at TrustyCon this year. It's a pretty interesting
concept.

------
svbito
As I see it, the JS gets downloaded from the untrusted server, encrypts data
on the client and sends it to the server.

What makes it impossible for the untrusted server to serve malicious
JavaScript? I can't think of a scenario where I could trust code I get from an
untrusted party...
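One partial answer is to pin a hash of the expected client code somewhere the server can't touch, and refuse to run anything else. The sketch below computes a digest in the same "sha384-<base64>" format that browsers use for Subresource Integrity. The pinned value and `safe_to_run` are hypothetical. This only helps if the pin comes from outside the untrusted server, for example a browser extension or a developer signature. A server that serves both the script and the page declaring its hash can change both.

```python
import base64
import hashlib

def sri_digest(script: bytes) -> str:
    # Same shape as an HTML integrity attribute value: "sha384-<base64 digest>".
    return "sha384-" + base64.b64encode(hashlib.sha384(script).digest()).decode()

# Hypothetical: the reviewed code and its digest are obtained out of band,
# NOT from the server we are trying not to trust.
trusted_js = b"function encrypt(d) { /* reviewed client code */ }"
PINNED = sri_digest(trusted_js)

def safe_to_run(served_script: bytes) -> bool:
    # Any change to the served bytes, however small, changes the digest.
    return sri_digest(served_script) == PINNED
```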

~~~
einhverfr
Good point. I was thinking of another sort of attack though.

"It is possible for a service built with Mylar to search across encrypted data
stored on its servers, for example, so a person could search documents they
had uploaded to a file storage service."

If the server doesn't have access to the plain text, then you are giving the
server access to a useful subset so it can do things like index it, right? That
useful subset must then be of interest to folks in the NSA.... So either the
server has access to the plain text (temporarily or permanently) or it has
access to a useful subset.

And from there a whole host of attacks are possible. Not that this surprises
me. If robust encryption architectures were easy....
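Here is a simplified searchable index that shows what the "useful subset" looks like in practice. It assumes a single client-held key and HMAC keyword tokens; this is not the multi-key scheme the article describes. The server never sees words. It does see how many documents share each token, and which token each search asks for.

```python
import hashlib
import hmac
from collections import defaultdict

SEARCH_KEY = b"hypothetical client-held key"

def trapdoor(word: str) -> str:
    # Deterministic keyword token; the server can't invert it without the key.
    return hmac.new(SEARCH_KEY, word.lower().encode(), hashlib.sha256).hexdigest()

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Runs on the client; only this token -> doc-id map is uploaded.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[trapdoor(word)].add(doc_id)
    return dict(index)

def server_search(index: dict[str, set[str]], token: str) -> set[str]:
    # The server learns which documents match and can count repeated queries.
    return index.get(token, set())
```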

~~~
vidarh
You can obfuscate the index quite a lot, and combine it with additional work
on the client to compensate.

E.g. a traditional reverse index is generally conceptually turned into
relations <word,word-id> and <word-id,document-id,position> (though in reality
there's tons of tricks to store them more efficiently by removing a lot of
redundant information).
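The two relations take only a few lines to build. This sketch assumes whitespace tokenisation and sequential integer ids; the naming is mine.

```python
def build_relations(docs: dict[int, str]):
    word_ids: dict[str, int] = {}                # <word, word-id>
    postings: list[tuple[int, int, int]] = []    # <word-id, document-id, position>
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            # First time a word is seen it gets the next free id.
            wid = word_ids.setdefault(word, len(word_ids))
            postings.append((wid, doc_id, pos))
    return word_ids, postings
```

In the client-maintained variant described next, only `postings` would go to the server, and `word_ids` would stay local.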

If all indexing passes through the client (client downloads and decrypts every
message, and uploads bits and pieces to update the index with), and the client
maintains the <word,word-id> relations (optionally passing it back to the
server encrypted, in chunks), you've improved things a bit, but only a tiny
bit (anyone with access to the index can apply some fairly trivial statistical
attacks that make it unnecessary to attack the encrypted messages themselves).
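Here is a sketch of the kind of "fairly trivial statistical attack" meant here. It assumes the server sees `(word-id, document-id, position)` tuples and has a public word-frequency ranking for the language; the ranking passed in is hypothetical. The attack sorts word-ids by how often they occur and guesses that the most frequent id is the most frequent word.

```python
from collections import Counter

def frequency_attack(postings, reference_ranking: list[str]) -> dict[int, str]:
    # postings: (word_id, doc_id, position) tuples as stored on the server.
    counts = Counter(wid for wid, _doc, _pos in postings)
    ranked_ids = [wid for wid, _count in counts.most_common()]
    # Guess: the k-th most frequent id is the k-th most frequent word.
    return dict(zip(ranked_ids, reference_ranking))
```

Real attacks refine this with co-occurrence and position statistics, but even the naive version shows why hiding the `<word, word-id>` table alone buys little.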

You can improve on that by leaving off or fuzzing the position to reduce the
potential for statistical analysis, at the cost of recall. E.g. if you drop
position entirely, you can get the server to tell you which documents contain
the individual words in "service built with Mylar", but you'll have to
retrieve all of them to determine if the _phrase_ is present.
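The position-free case works in two steps: the server narrows the candidates, and the client confirms the phrase itself. This sketch uses a word-keyed index for readability; in practice the keys would be opaque ids. `fetch` is a hypothetical callback that downloads and decrypts one document.

```python
from typing import Callable

def contains_phrase(text: str, words: list[str]) -> bool:
    toks = text.lower().split()
    n = len(words)
    return any(toks[i:i + n] == words for i in range(len(toks) - n + 1))

def phrase_search(phrase: str, index: dict[str, set[int]],
                  fetch: Callable[[int], str]) -> set[int]:
    words = phrase.lower().split()
    # Step 1 (server): documents containing every word, positions unknown.
    sets = [index.get(w, set()) for w in words]
    candidates = set.intersection(*sets) if sets else set()
    # Step 2 (client): download and decrypt each candidate, check the phrase.
    return {d for d in candidates if contains_phrase(fetch(d), words)}
```

A document reading "Mylar service was built with care" would still have to be downloaded, since it contains all four words but not the phrase.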

You can also do stuff like "inventing" additional detail. E.g. assign "fuzz"
word-id 42, but based on some operation on the document id, you might
also/instead assign it word-id 201. The problem with each of these types of
attempts at obfuscation is that they alter your search patterns, and you'd
need to figure out how to obscure from the server that word-ids 42 and 201 are
in fact two different subsets of the same thing, or you're still leaking
information. You can also record outright wrong information, at the cost of
having to download more messages to weed out the false positives when
searching.
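Here is a sketch of the aliasing trick. It assumes a hypothetical alias table that maps "fuzz" to word-ids 42 and 201, and uses a hash of the document id to pick between them. Note the leak the comment points out: a search has to ask for both ids, so the server can learn they belong together.

```python
import hashlib

# Hypothetical alias table, kept on the client: one word, several word-ids.
ALIASES = {"fuzz": [42, 201]}

def word_id_for(word: str, doc_id: int) -> int:
    ids = ALIASES[word]
    # Deterministic choice from the document id, so re-indexing is stable.
    h = int.from_bytes(hashlib.sha256(str(doc_id).encode()).digest()[:4], "big")
    return ids[h % len(ids)]

def build_postings(docs: dict[int, list[str]]) -> dict[int, set[int]]:
    postings: dict[int, set[int]] = {}
    for doc_id, words in docs.items():
        for w in words:
            postings.setdefault(word_id_for(w, doc_id), set()).add(doc_id)
    return postings

def search(word: str, postings: dict[int, set[int]]) -> set[int]:
    # Querying every alias is what links them in the server's eyes.
    return set().union(*(postings.get(i, set()) for i in ALIASES[word]))
```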

You can "scale" your tradeoff between accurate recall and information leak
pretty much arbitrarily from letting the server only see a single bit of your
word-id and a single bit of your document ids, to the full word-id, document-id
and position, depending on how much work you're willing to do client side;
every bit you chop off means you need to retrieve a larger number of documents
to post-process the results to get something useful, but gives the server less
information about your document collection.
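The bit-chopping tradeoff can be sketched by masking both ids before they reach the server. This assumes integer word and document ids; the function names are hypothetical. Fewer bits mean coarser buckets and more documents for the client to fetch and filter.

```python
def truncate(value: int, bits: int) -> int:
    # Keep only the low `bits` bits; many distinct ids collapse into one bucket.
    return value & ((1 << bits) - 1)

def build_coarse_index(postings, word_bits: int, doc_bits: int) -> set[tuple[int, int]]:
    # postings: (word_id, doc_id) pairs computed on the client.
    return {(truncate(w, word_bits), truncate(d, doc_bits)) for w, d in postings}

def coarse_search(index, word_id: int, word_bits: int, doc_bits: int,
                  all_doc_ids) -> set[int]:
    # Server side: which document buckets co-occur with the word bucket.
    buckets = {d for w, d in index if w == truncate(word_id, word_bits)}
    # Client side: every document in a matching bucket must be fetched and
    # checked after decryption to weed out false positives.
    return {d for d in all_doc_ids if truncate(d, doc_bits) in buckets}
```

With full-width ids the search returns exactly the matching documents. With 3 word bits and 1 document bit, word-ids 5 and 13 share a bucket and half the collection lands in each document bucket, so the candidate set grows.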

You can improve further on it by letting third parties provide "indexing
services" and farm the indexing to them, without ever letting them see the
documents. You could even break the collection into pieces and merge the
results client side, so no single service has the whole picture. But of course
it just raises the bar for an adversary.
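Here is a sketch of splitting the index across several hypothetical indexing services. It assumes opaque tokens and a simple `doc_id % n` assignment. Each service holds only its slice, and the client queries all of them and merges the results.

```python
from collections import defaultdict

def shard_of(doc_id: int, n_services: int) -> int:
    return doc_id % n_services  # hypothetical assignment rule

def build_shards(postings, n_services: int):
    # postings: (token, doc_id) pairs; each service only ever sees its slice.
    shards = [defaultdict(set) for _ in range(n_services)]
    for tok, doc_id in postings:
        shards[shard_of(doc_id, n_services)][tok].add(doc_id)
    return shards

def federated_search(shards, tok) -> set[int]:
    # Client asks every service and merges the partial answers locally.
    result: set[int] = set()
    for shard in shards:
        result |= shard.get(tok, set())
    return result
```

As the comment says, this raises the bar rather than removing the leak: the services could still collude and pool what they see.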

But you are of course right that all of this is going to leak information that
might be useful to an adversary - the approach taken just moves the difficulty
bar up and down a bit. Ultimately all you can do is make an assessment about
how much the information is worth to you and make an educated guess based on
that about how much you can "afford" to farm out to servers that are not under
your physical control...

~~~
einhverfr
If you are trying to NSA-proof a service, obfuscation is not going to cut it.
At best you could have a client maintain its own index locally, I suppose.

~~~
vidarh
"Obfuscation" here ranges from leaking so little data that you could
compensate by "just" going a step up in key length for the encrypted data, down
to hiding things so poorly that a bored teenager could break it, depending on
how many bits you effectively "chop off" and what other trickery you use.

Frankly I think the problem with NSA-proofing it, though, is that while you
can certainly take pleasure in driving up their costs, if you make it
impossible for them to simply eavesdrop or break your encryption, the only
thing you've really achieved is to make them prioritise other attacks, like
capturing copies of all the mail before it reaches your servers. Large amounts
of that traffic will be entirely unencrypted anyway, since many of the sites
people exchange e-mail with still don't support SSL/TLS.

------
indeyets
Interesting, but seems to be applicable only to personal-information storage.
Sharing data between groups of users is a different story and would require
something like PGP-style key management on the client.

~~~
pdonis
Mylar includes support for secure sharing of keys and data; see the project
website:

[http://css.csail.mit.edu/mylar/](http://css.csail.mit.edu/mylar/)

------
sorincos
How's this different from the online storage services which already encrypt
data on the user's computer before sending it out? And Meteor is not even a web
services platform... So, nice hype talk but I still have no idea what it's all
about (...open Google and...).

~~~
TTPrograms
You'll probably want to start around here:
[https://en.wikipedia.org/wiki/Homomorphic_encryption](https://en.wikipedia.org/wiki/Homomorphic_encryption)
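This ties back to the earlier SELECT/aggregate question. A partially homomorphic scheme such as Paillier lets a server add encrypted numbers without decrypting them. Below is a toy sketch with tiny fixed primes (wildly insecure, illustration only) and the common g = n + 1 choice.

```python
import math
import secrets

def keygen(p: int = 293, q: int = 433):
    # Tiny fixed primes for illustration; real keys use primes of 1024+ bits.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)

def paillier_encrypt(pub, m: int) -> int:
    n, g = pub
    r = 0
    while math.gcd(r, n) != 1:  # random r in [1, n) coprime to n
        r = secrets.randbelow(n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def paillier_decrypt(priv, c: int) -> int:
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

def add_encrypted(pub, c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the plaintexts (mod n).
    n, _ = pub
    return (c1 * c2) % (n * n)
```

Summing encrypted ages this way works. Comparing them (age > 65) does not, which is why sorting and range queries remain the hard part.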

~~~
josephwegner
See also: [http://ajaxpatterns.org/Host-Proof_Hosting](http://ajaxpatterns.org/Host-Proof_Hosting)

~~~
sorincos
What I mean is that services like Wuala or Boxcryptor have offered this for
years. And there might be more.

------
pdonis
There's another HN thread that points to the actual MIT Mylar project website:

[https://news.ycombinator.com/item?id=7465015](https://news.ycombinator.com/item?id=7465015)

------
szymo
Thank you for the link. Looking forward to seeing how easily this can be
implemented by the various organisations that need it.

