
Order-Revealing Encryption - henrycg
https://crypto.stanford.edu/ore/
======
ianmiers
Impressive work. Especially for producing an implementation.

But be very, very careful about ever using it (and really any cryptography
that does more than standard encryption/signing) in practice without
understanding the security it provides. Revealing element order can in fact
reveal most of your data set in some cases. There are cases where leaking
order is safe and very useful and there are cases where it is not safe despite
there being nothing cryptographically "wrong" with the scheme.

(Edited) To give an example: although not the exact same primitive, some
Order-Preserving Encryption schemes leak square root(n) bits where n is the bit
length of your plaintext[0]. For social security numbers, this would mean you
lose about 5 bits of entropy, and there are only 10 or 11 bits to start with
in many cases[1].

And as they say, attacks only get better and people are actively working on
attacks using order leakage[2].

Just because something is provably secure doesn't mean it is the right kind of
secure for your application. Crypto is not pixie dust. You actually need to
understand the exact guarantees something provides.

[0] [http://www.cc.gatech.edu/~aboldyre/papers/operev.pdf](http://www.cc.gatech.edu/~aboldyre/papers/operev.pdf)
[1] [http://www.pnas.org/content/106/27/10975.full.pdf](http://www.pnas.org/content/106/27/10975.full.pdf)
[2] [https://twitter.com/hackermath/status/741054948612308992](https://twitter.com/hackermath/status/741054948612308992)

~~~
schoen
> Revealing element order can in fact reveal most of your data set in some
> cases.

A contrived example to demonstrate this is if you published some kind of
database about behavior observed from IP addresses, but using ORE-encrypted IP
addresses instead of plaintext ones. If the database explicitly contains all
2³² IPv4 addresses, someone can then perform a series of comparisons to sort
the entire database (which is an explicit goal of ORE), which then reveals the
plaintext values because they're exactly equal to the positions of the entries
in the sorted list!
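A scaled-down sketch of this contrived case. It assumes a toy order-preserving table (a secret, randomly sampled, strictly increasing map) as a stand-in for the real scheme, and a 256-entry "address space" instead of 2³². With an actual ORE scheme the attacker would call its public comparison function instead of `<`, but the sort, and therefore the leak, comes out the same.

```python
import random

def toy_order_preserving_table(domain_size, rng):
    # Hypothetical stand-in for a keyed scheme: plaintext i maps to the i-th
    # smallest of `domain_size` distinct random values. Only order survives.
    return sorted(rng.sample(range(domain_size * 1000), domain_size))

def recover_by_sorting(ciphertexts):
    # The attacker never sees the key; it only compares ciphertexts.
    # If the database holds every value 0..N-1 exactly once, rank == plaintext.
    return {c: rank for rank, c in enumerate(sorted(ciphertexts))}

rng = random.Random(2016)
N = 256                          # scaled down from 2**32 IPv4 addresses
table = toy_order_preserving_table(N, rng)
plaintexts = list(range(N))
rng.shuffle(plaintexts)          # the published database is in arbitrary order
database = [table[x] for x in plaintexts]

recovered = recover_by_sorting(database)
print(all(recovered[table[x]] == x for x in plaintexts))  # True
```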

A subtler case could arise if the database is complete _for some range of IP
addresses_ and an attacker happens to know the plaintext values at the bottom
and top of that range. Then the attacker can learn the plaintext values of
everyone else in the range.
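The range version works the same way: once the attacker knows the endpoints, each entry's plaintext is the low endpoint plus its rank. A minimal sketch, assuming a hypothetical `recover_complete_range` helper and a toy strictly increasing map standing in for the encryption (the 192.0.2.0/24 documentation range is just example data):

```python
import ipaddress
import random

def recover_complete_range(ciphertexts, low, high):
    """Hypothetical helper: the database holds every address in [low, high]
    exactly once, and the attacker knows low and high (as integers)."""
    ranked = sorted(ciphertexts)  # order comparisons only
    if len(ranked) != high - low + 1:
        raise ValueError("range is not complete, so ranks do not pin down values")
    return {c: low + rank for rank, c in enumerate(ranked)}

# Toy stand-in for the encryption (assumption): a secret strictly increasing map.
rng = random.Random(7)
low = int(ipaddress.IPv4Address("192.0.2.0"))
high = int(ipaddress.IPv4Address("192.0.2.255"))
addresses = list(range(low, high + 1))
secret = dict(zip(addresses, sorted(rng.sample(range(10**12), len(addresses)))))

published = list(secret.values())
rng.shuffle(published)
recovered = recover_complete_range(published, low, high)
print(ipaddress.IPv4Address(recovered[secret[low + 42]]))  # 192.0.2.42
```

The length check matters: if even one address in the range is missing, ranks above the gap are off by one and the attacker only gets near-misses rather than exact values.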

An even subtler case could arise if the attacker simply knows a bunch of
specific plaintext values. In that case, the attacker can learn constraints on
the numeric values of other database entries, which could mean, for example,
learning the first 1-2 bytes of a particular IP address, which could confirm
or disconfirm hypotheses about a user's ISP or geographic location.
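A sketch of that known-plaintext case, assuming the attacker holds a few (ciphertext, plaintext) pairs. `bound_plaintext` and `shared_leading_bytes` are hypothetical helpers, and `toy_encrypt` is an insecure placeholder that merely preserves order. The target is squeezed between its nearest known neighbours, and whatever leading bytes those neighbours share must also belong to the target.

```python
import ipaddress

def bound_plaintext(known_pairs, target_ct):
    """known_pairs: (ciphertext, plaintext) pairs the attacker already knows.
    Uses only order comparisons to bound the target's plaintext."""
    below = [p for c, p in known_pairs if c < target_ct]
    above = [p for c, p in known_pairs if c > target_ct]
    lo = max(below) if below else 0
    hi = min(above) if above else 2**32 - 1
    return lo, hi

def shared_leading_bytes(lo, hi):
    # Every address in [lo, hi] starts with the bytes that lo and hi share.
    a = ipaddress.IPv4Address(lo).packed
    b = ipaddress.IPv4Address(hi).packed
    n = 0
    while n < 4 and a[n] == b[n]:
        n += 1
    return list(a[:n])

def toy_encrypt(x):
    # Insecure placeholder (assumption): any strictly increasing map works here.
    return x * 1009 + 17

known = [int(ipaddress.IPv4Address(s))
         for s in ("198.51.100.3", "198.51.100.200", "203.0.113.9")]
known_pairs = [(toy_encrypt(p), p) for p in known]
target = int(ipaddress.IPv4Address("198.51.100.77"))  # unknown to the attacker

lo, hi = bound_plaintext(known_pairs, toy_encrypt(target))
print(shared_leading_bytes(lo, hi))  # [198, 51, 100]
```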

Edit: I'd like to see some of the more subtle attack ideas that the people you
linked to are developing!

~~~
cvwright
That's me above in Ian's [2]. We're hoping to have something ready to share by
the end of the month. I'll post a Show HN when it's available.

In the meantime, here's a paper from some folks in Norway that shows attacks
against two prefix-preserving IP address anonymization schemes:

[http://link.springer.com/chapter/10.1007/11767831_12](http://link.springer.com/chapter/10.1007/11767831_12)

~~~
ianmiers
Oh right, it was either you or your advisor who told me the IP address story.

------
coldcog
It's important to note that either:

\- the things you are encrypting are not numbers, or

\- your scheme is not as secure as you would like it to be,

where the second point means that necessarily there is some information
leakage: that is, more than just their order can be derived from ciphertexts.
This is not necessarily bad; the question is what is leaked, and
whether that is harmful in your situation.

The website links to two papers, one for each of the above categories, of
which only the second one is sufficiently efficient to be of any use. Apart
from the order of the underlying plaintexts, comparing two ciphertexts reveals
the index of the first bit at which the plaintexts differ.
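A minimal sketch of how that leakage arises, loosely following the per-bit construction in the practical ORE paper by Chenette, Lewi, Weis, and Wu as I understand it: each bit is masked by a PRF of its position and the preceding bits, reduced mod 3. HMAC-SHA256 as the PRF and 8-bit plaintexts are assumptions here; this is not the FastORE code. Because equal prefixes yield equal PRF outputs, the public `compare` finds exactly the first differing plaintext bit, and that index is revealed along with the order.

```python
import hashlib
import hmac
import os
from functools import cmp_to_key

NBITS = 8  # plaintext length in bits (assumption for the demo)

def prf_mod3(key: bytes, i: int, prefix: int) -> int:
    # HMAC-SHA256 stands in for the PRF (assumption); output reduced mod 3.
    msg = i.to_bytes(2, "big") + prefix.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big") % 3

def encrypt(key: bytes, x: int) -> list:
    ct = []
    for i in range(NBITS):
        bit = (x >> (NBITS - 1 - i)) & 1
        prefix = x >> (NBITS - i)  # the bits before position i
        ct.append((prf_mod3(key, i, prefix) + bit) % 3)
    return ct

def compare(ct1: list, ct2: list):
    """Public comparison, no key needed. Returns (order, index) where order is
    -1/0/1 and index is the first differing plaintext bit (None if equal)."""
    for i, (u, v) in enumerate(zip(ct1, ct2)):
        if u != v:
            # Same prefix means same PRF value, so the digits differ only by the bit.
            return (-1 if v == (u + 1) % 3 else 1), i
    return 0, None

key = os.urandom(32)
a, b = 0b10110010, 0b10111000
print(compare(encrypt(key, a), encrypt(key, b)))  # (-1, 4): a < b, first differs at bit 4

values = [200, 3, 77, 150, 42]
cts = [encrypt(key, v) for v in values]
ordered = sorted(cts, key=cmp_to_key(lambda p, q: compare(p, q)[0]))
print([values[cts.index(c)] for c in ordered])  # [3, 42, 77, 150, 200]
```

Note that this sketch is deterministic: equal plaintexts produce equal ciphertexts, which is itself a leak on top of order and the first-differing-bit index.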

~~~
Natsu
> the things you are encrypting are not numbers

I do not believe there's any data that cannot be represented as a number, so
this point confuses me.

~~~
coldcog
That's a good point; indeed, any sequence of bytes can be interpreted as a
number. But it may also encode an element of a set that has an ordering
differing from that of the numbers. (For example, you could order, let's say,
the set of all cat pictures by how cute they are.) That's what makes the
difference.

~~~
Natsu
So if I understand properly, the real issue is whether knowing the ordering
and the data type would allow one to infer something about the contents?

I also wonder if there's a way to make that statement of when this causes
something to be revealed more mathematically precise... maybe it's something
that could even be automatically detected by someone with access to the
original data?

~~~
schoen
You could talk about whether something is unique. For example, Social Security
Numbers, telephone numbers, and IP addresses are designed to be unique. By
contrast, given names, ages, or favorite colors are not unique in isolation.

Unfortunately, that turned out not to be a hard-and-fast distinction because
things that are not unique in isolation are often unique in combination.

[https://en.wikipedia.org/wiki/Quasi-identifier](https://en.wikipedia.org/wiki/Quasi-identifier)
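One crude way someone with access to the original data could start checking for this automatically is to count how many records are unique on a given combination of columns. A minimal sketch; `unique_fraction` and the toy records are hypothetical, and real de-identification analysis (k-anonymity and friends) goes well beyond this.

```python
from collections import Counter

def unique_fraction(rows, columns):
    """Fraction of rows whose combination of values in `columns` is shared
    with no other row (a crude uniqueness check on quasi-identifiers)."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)

# Hypothetical toy records.
rows = [
    {"name": "Alex", "age": 34, "color": "blue"},
    {"name": "Alex", "age": 29, "color": "green"},
    {"name": "Sam", "age": 34, "color": "blue"},
    {"name": "Sam", "age": 29, "color": "blue"},
]

for cols in (["name"], ["age"], ["name", "age"]):
    print(cols, unique_fraction(rows, cols))
# ['name'] 0.0
# ['age'] 0.0
# ['name', 'age'] 1.0
```

Neither name nor age identifies anyone here, yet together they identify everyone.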

The Netflix deanonymization paper discusses how things (how much you like a
movie) that are _very_ non-unique in isolation can be _very_ unique when you
have an extremely large number of them. Arvind Narayanan (one of the authors)
has discussed the problem of dimensionality for privacy a few times; one
way of thinking of it is that there's an unfathomably large amount of volume
in a very high-dimensional space, so there's an extremely large amount of
opportunity for points in it to be very far away from each other, even if
there's nothing especially "atypical" about the individual points.
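A quick simulation of that effect, under a big simplifying assumption: every user rates every movie independently and uniformly on a 1 to 5 scale (real ratings are sparse and correlated, which is not what the Netflix paper modeled). `fraction_unique_profiles` is a hypothetical helper.

```python
import random
from collections import Counter

def fraction_unique_profiles(n_users, n_movies, seed=0):
    """Simulate n_users who each rate n_movies on a 1-5 scale (independent,
    uniform ratings: a big simplifying assumption) and return the fraction
    whose full rating vector matches no one else's."""
    rng = random.Random(seed)
    profiles = [tuple(rng.randint(1, 5) for _ in range(n_movies))
                for _ in range(n_users)]
    counts = Counter(profiles)
    return sum(1 for p in profiles if counts[p] == 1) / n_users

for d in (1, 2, 4, 8, 16):
    print(d, fraction_unique_profiles(1000, d))
```

With 1,000 simulated users, one movie gives only 5 possible profiles, so essentially nobody is unique; by 16 movies there are 5^16 (about 1.5 × 10^11) possible profiles and essentially everyone is, even though no single rating is unusual.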

This is closely related to the "curse of dimensionality"

[https://en.wikipedia.org/wiki/Curse_of_dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality)

although the curse is often stated from the analyst's perspective when hoping
to find patterns, whereas the phenomenon Narayanan is describing is more from
the perspective of the individual whose data are in a database and who hopes
to appear similar to other individuals in order to retain anonymity, yet turns
out to be very distinctive merely because of the number of dimensions.

------
mixedmath
I find this interesting. I am a bit more interested in the implementation
FastORE linked to at the bottom of the page. It seems to be pretty reasonable.

I've been thinking for a while now about what could be accomplished with
partial steps towards homomorphic encryption. ORE is an interesting twist on
homomorphic encryption that I hadn't thought of before: one can merely sort
encrypted data. I wonder what sorts of unexpected things one might be able to
do with the ability to sort encrypted data?

------
gengkev
Not related to the content, but in the Background section I tried to click on
"Provably secure" sort of expecting it to expand. Meanwhile under ORE
constructions, the similarly-styled "Semantically Secure Order-Revealing
Encryption" is a clickable link.

------
baby
Does an ORE scheme imply that, if you have some key, you have an efficient
algorithm to sort the plaintexts of a set of ciphertexts? That would be
faster than decrypting them all and comparing them.

~~~
schoen
Their paper says

> In our scheme there is a public algorithm that given two ciphertexts as
> input, reveals the order of the corresponding plaintexts and nothing else.

It's not clear that this is "efficient"; it might be somewhat computationally
intensive. But you can do it using only public information and without being
able to decrypt the data.

------
Skunkleton
So why ORE instead of headers validated with a signature?

