
Efficient Logistic Regression on Large Encrypted Data - coolspot
https://eprint.iacr.org/2018/662
======
coolspot
> In this paper, we propose an efficient algorithm for logistic regression on
> encrypted data, and evaluate our algorithm on real financial data consisting
> of 422,108 samples over 200 features. Our experiment shows that an encrypted
> model with a sufficient Kolmogorov Smirnow statistic value can be obtained
> in ∼ ∼ 17 hours in a single machine.

------
olliej
I need an eli5 for this, attempt at reading makes it sound like i would be
able to infer an approximation of encrypted data in the examples that they
provide.

Basically, if this works as the headline suggests, and my poor understanding
of the paper implies we can take their example of competing financial
institutions:

Two banks share data, that is intended to be secret, and they use this
technique to compute some property over the data. Each bank can then do the
same computation over there data alone, and compare the resulting model over
their data and the result over their competitor’s. From that they can infer
the properties of their competitors portfolio, which seems to be leakage of
data that is not intended to be possible.

Hence I am clearly missing something, which isn’t unexpected as HE is
unintuitive to me.

~~~
TTPrograms
I don't think you can apply the model (with unencrypted outputs) without all
the decryption keys.

~~~
olliej
The point of the paper was to operate over data without decrypting, the exact
scenario they gave was competing finance companies, that are clearly not going
to share keys.

Also if you had all of the decryption keys you’d just decrypt the data and use
the raw data. They explicitly state that it is only fast in the context of HE
problems - being multiple orders of magnitude slower than techniques you can
use on raw data (they actually said “fast” - complete with the quotes, which I
appreciated)

~~~
TTPrograms
I agree the example is weird, but that's literally what the paper says:

"In the training phase, it takes as input an encrypted training data and
outputs an encrypted model without using the decryption key. In the prediction
phase, it uses the encrypted model to predict results on new encrypted data."
Figure 1 further implies that the results must be decrypted.

This is the typical operational setting of homomorphic ML.

------
confounded
My understanding of this paper/space is patchy-at-best, but I am I right in
thinking that this could only ever work for models where both the outcome-
variable and the predictors are entirely categorical, and the input data is
always encrypted with the same key?

(E.g. so that the value 'male' in the 'sex' column has the same ciphertext
across subjects/observations).

If this is the case, then couldn't a standard logit model be used? (It needn't
know what it's categories 'mean', either x or y).

These properties would be fulfilled by, e.g. the MNIST example in the paper
(and it wouldn't take 17 hours to train!). Does anyone know where the
technique in the paper would work but the one described above would fail?

------
testtest7777
The authors should do a comparison with recent work on HE for inference on
other model types. Logistic regression is a subset of neural network
inference, and there is significant prior work that appears to be competitive
with their results.

Recent work [0] shows that MNIST, for example, can be done in a model
secure/data secure way with online 30ms inference latency. For ~500,000
samples, evaluated sequentially, that's roughly 5 hours, which beats their
time by about 3x. Perhaps I'm missing details in my read of the paper.

[0] Gazelle: A Low Latency Framework for Secure Neural Network Inference
[https://arxiv.org/abs/1801.05507](https://arxiv.org/abs/1801.05507)

~~~
ssmiler
Gazelle paper uses an interactive protocol ("combination of homomorphic
encryption and traditional two-party computation techniques") compared to
thread paper.

~~~
TTPrograms
It's also just inference, not training.

------
xjia
The encryption procedure doesn't seem to be salted.

~~~
olliej
Oh? Is that how they do it? I recall getting confused about the security
guarantees of some HE paper the was on HN a while ago, and someone finally
explained that you only get the security guarantees if you have randomness
injected.

