
Machine Learning on Encrypted Data Without Decrypting It - KenoFischer
https://juliacomputing.com/blog/2019/11/22/encrypted-machine-learning.html
======
StefanKarpinski
Since there seems to be a lot of confusion throughout this thread, perhaps I
can attempt to clarify what's going on at a very high level. The setting is
that the user sends their data to a service and wants the service to do useful
work on their data without knowing what their data is. In this case, the work
that the service does is apply a pre-trained, non-secret ML vision model to
the secret, encrypted data. The result is an encrypted answer that is sent
back to the user, who can decrypt it and get a useful result. The service is
none the wiser about what the user's data was, nor what the answer was.

Does this seem like impossible magic? Yes. Everyone was pretty surprised when
the first theoretical fully homomorphic encryption schemes were found. For a
while people believed that it might never be efficient enough to be practical.
But now it's actually feasible to do limited but real useful computations on
behalf of a user without the service learning anything about what it is
computing for the user. There is ongoing work to expand what kinds of
computations can be done efficiently.

~~~
k__
Sounds strange. Does this mean that the model has to be trained on encrypted
data?

~~~
StefanKarpinski
Yes, it is rather surprising that this can be done at all. But no, the model
doesn't have to be trained on encrypted data: you train the model normally and
then express the application of the model in terms of the set of primitive
operations that the encryption scheme supports. Broadening the set of
operations supported by the encryption scheme is an active area of research,
since having a better “instruction set” allows more computations to be done
efficiently.

It is also possible for the model to be provided by the user and encrypted
along with the data rather than being fixed and public. That way the service
can apply a model that it doesn't know to data that it also doesn't know. This
allows a fairly general service in theory, but it remains a challenge to
express more than fairly simple models in a way that can be computed
efficiently.

~~~
de_watcher
I don't understand why it has to be trained on encrypted data. Isn't it faster
to train on the cleartext data (and of course use the same instruction set,
the unencrypted counterpart of it)?

~~~
mikorym
I think he is saying that you do the training on cleartext data.

~~~
de_watcher
Ah ok, his first "yes" confused me so much that I saw no value in reading
further.

------
falcolas
Before reading: "I bet they're using homomorphic encryption to expose patterns
in the encrypted data"

After reading: Yup. It makes sense, so long as your resulting model is run
against similarly encrypted data, the same patterns will be there for the ML
to identify.

Which is, of course, one of the issues with homomorphic encryption.

~~~
KenoFischer
Just to clarify, homomorphic encryption does not expose patterns. At every
point in the computation the ciphertexts are computationally indistinguishable
from random. The result of evaluating the ML model will be an encrypted
prediction that you then need to send back to whoever encrypted the data (or
more precisely whoever has the key - doesn't need to be the same person) so
they can decrypt and use the prediction.

~~~
marviel
Do you have a reference somewhere that backs up your assertions, where I can
read more on this topic? I'm super curious about it.

~~~
KenoFischer
The CKKS paper describing the crypto scheme I'm using is here: [1]. The paper
is decently readable, but frankly I feel that it doesn't really
convey much intuition and it's a bit hard to follow if you don't have an
algebraic number theory background. Probably the correct thing to do is to
read the original BGV paper [2], which is still quite technical of course, but
at least doesn't implicitly assume all the development that has happened since
then. Once you've gotten the basics, I have an overview that focuses more on
the practical aspects of how it works in the documentation [3]. I've also
written an overview of how CKKS works, in which I've tried to highlight what
the two main ideas of CKKS are compared to earlier schemes [4]. Let me know if
you were looking for something else.

[1]
[https://eprint.iacr.org/2016/421.pdf](https://eprint.iacr.org/2016/421.pdf)

[2]
[https://eprint.iacr.org/2011/277.pdf](https://eprint.iacr.org/2011/277.pdf)

[3]
[https://juliacomputing.github.io/ToyFHE.jl/dev/man/background/rlwe/](https://juliacomputing.github.io/ToyFHE.jl/dev/man/background/rlwe/)

[4]
[https://juliacomputing.github.io/ToyFHE.jl/dev/man/ckks/](https://juliacomputing.github.io/ToyFHE.jl/dev/man/ckks/)

~~~
marviel
this is great, thank you.

------
ChrisRackauckas
This blog post reminds me of the "Machine Learning Systems are Stuck in a Rut"
paper [1], where they mentioned:

> It is hard to experiment with front end features like named dimensions,
> because it is painful to match them to back ends that expect calls to
> monolithic kernels with fixed layout. On the other hand, there is little
> incentive to build high quality back ends that support other features,
> because all the front ends currently work in terms of monolithic operators.
> An end-to-end tool chain for machine learning requires solutions from many
> specialist disciplines.

This is a fantastic example of getting around this problem using Flux.jl's
interaction with the full Julia language. Here the author does some
exceedingly cool stuff (doing machine learning on encrypted data!), and in
order to get there he needed to write what many would think of as lower-level
kernels that should be "provided by the library" (encrypted matrix
multiplication, encrypted convolutions). To make the interface useful, he
needed to use a mature interface that people are already using and make it
something that can automatically switch over to encrypted implementations.
And, because of the nature of fully homomorphic encryption, the
implementation has to be fast, otherwise it's dead in the water since FHE is
expensive!

To me, this example showcases one way Flux.jl is helping machine learning get
out of that "rut". The author adds dispatches to standard kernels which allow
for his encrypted data types, which then lets standard Julia Flux.jl machine
learning models act on encrypted data, and it uses type-inference + JIT
compilation to make it fast enough to work. Not only that, but it's also not
tied to some "sub-language" defined by a machine learning framework. That
means the FHE framework not only works nicely with machine learning, it can
be used by any other package in the Julia language (differential equation
solvers, nonlinear optimization, macroeconomics models?). This allows
composability of tools and community: all tools from all fields can now use
this same FHE implementation, so authors can collaborate and mature it into a
very good one. These knock-on effects give people doing research in Julia a
lot of competitive advantages over other researchers, and it will be
interesting to see how this affects not just the ML research community, but
also everyone else!

[https://dl.acm.org/citation.cfm?id=3321441](https://dl.acm.org/citation.cfm?id=3321441)

(Repost from the previous thread on this!)

~~~
orbifold
Julia always seemed great on paper and definitely is a strong candidate for
replacing Matlab. But whenever I tried using it, the user experience seemed
much more broken than Python or C++. It just seems way easier to structure and
work on a Python + C++ project than it is to structure and work on a Julia
project. A moderately sized, sane C++ code base compiles and runs faster than
whatever gymnastics Julia performs to create a single 2D plot (it literally
freezes my laptop for seconds). It also installs gigabytes worth of libraries
into global directories by default. The compiler itself is also of
questionable quality compared to Clang, Swift, or OCaml.

~~~
svnpenn
> literally freezes my laptop for seconds

Yep. This has been commented on again and again, but many refuse to show
willingness to fix or even acknowledge the problem:

[https://github.com/JuliaLang/julia/issues/28092](https://github.com/JuliaLang/julia/issues/28092)

[https://github.com/JuliaLang/julia/issues/17285](https://github.com/JuliaLang/julia/issues/17285)

[https://github.com/JuliaLang/julia/issues/4452](https://github.com/JuliaLang/julia/issues/4452)

[https://github.com/JuliaLang/julia/issues/1064](https://github.com/JuliaLang/julia/issues/1064)

[https://github.com/JuliaLang/julia/issues/260](https://github.com/JuliaLang/julia/issues/260)

~~~
pkofod
What do you mean by "refuse to show willingness to fix or even acknowledge the
problem"? Do you realize that two of those issues were started by core
contributors?

------
Stasis5001
As somebody with some ML background but no expertise in crypto, is the
following ELI~20 summary correct?

We take an ML model trained on unencrypted data, use a 'homomorphic
evaluation' technique (let's just leave that as magic here) to convert the
model operation-by-operation to a model that runs on encrypted data, do a
little more crypto magic, and we've solved the business problem described at
the beginning of the article.

(In particular, if you train a model on encrypted data you get a really bad
model, right?)

~~~
KenoFischer
Yep, that's correct. With the minor caveat that we choose an ML model that's
"easy" to evaluate using homomorphic encryption.

~~~
Analog24
As someone who is not a cryptography expert, is there any hope of using
similar logic to train on encrypted data? Naively it seems like you could
perform the same operations on the back propagation steps (or any other update
algorithm you're using for non NN models) to arrive at the encrypted version
of the parameter updates, which you could then decrypt to get the updated
model. Am I missing something here?

~~~
KenoFischer
Training is a lot tougher. Just doing one gradient update step isn't all that
bad (although you may have to play with the loss function a bit, e.g. logit
cross entropy is probably tough to evaluate). However, then you need to go and
actually do all the steps and gradient updates, so you probably need some form
of bootstrapping to be able to evaluate computations of that depth. Also, the
use case is slightly less compelling. For training, you can probably get all
the parties who have data to coordinate and evaluate an MPC more cheaply than
you could with HE alone. I think it'll require a very compelling use case for
somebody to go and think through what the best way to do it is and it'll
probably depend on the specifics of the application (who has what data, and
what are we willing to leak as we go along - e.g. it's a lot easier if you
don't care about keeping the weights secret).

~~~
Analog24
There are definitely compelling use-cases and there are people working on it
(though not me). Developing tools/systems to handle sensitive data in a secure
way is extremely expensive and time consuming. If you can create data
collection and model training pipelines that can operate effectively with just
encrypted data then you greatly reduce the number of vulnerabilities (e.g.
fewer employees need to actually see the sensitive data and fewer points of
attack on the system itself).

There are certainly a number of factors to consider besides data security when
evaluating the practicality of such an approach but I just wanted to confirm
that it was technically possible before getting in to any of that. Thanks for
your response and the post, I knew almost nothing about HE before today.

------
littlestymaar
With homomorphic encryption, the data owner encrypts data _X_ and gives
_C(X)_ to the owner of the function _f_ , who applies _f_ to _C(X)_ ,
obtaining _f(C(X))_ , which equals _C(f(X))_ , and returns it to the data
owner, who can decrypt it to recover _f(X)_. The owner of the function knows
neither the initial data nor the result, and the owner of the data doesn't
know _f_.
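As a toy illustration of that _f(C(X)) = C(f(X))_ identity (not the scheme
from the article): textbook RSA is homomorphic with respect to
multiplication, so a party holding only ciphertexts can compute an encrypted
product. A Python sketch with tiny, insecure demo parameters:

```python
# Textbook RSA is multiplicatively homomorphic: E(a) * E(b) mod n = E(a * b),
# i.e. f(C(X)) = C(f(X)) for f = multiplication.
# Toy parameters, no padding or randomness -- insecure, illustration only.
p, q = 61, 53
n, e = p * q, 17                      # public key (n, e)
d = pow(e, -1, (p - 1) * (q - 1))     # private key

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 6, 7
c = enc(a) * enc(b) % n   # the "service" only ever touches ciphertexts
assert dec(c) == a * b    # the key owner decrypts f(X) = 42
```

Fully homomorphic schemes like BGV and CKKS support both addition and
multiplication, which is what it takes to evaluate (polynomial approximations
of) neural network layers.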

There is also functional encryption, a different technique addressing a
similar (but distinct) goal. With F.E., the owner of the data must know the
function _f_ , but the party performing the computation gets the result
_f(X)_ directly.

------
zcw100
This sounds like what Microsoft Research did with SEAL to produce CryptoNets:

[https://www.microsoft.com/en-us/research/publication/cryptonets-applying-neural-networks-to-encrypted-data-with-high-throughput-and-accuracy/](https://www.microsoft.com/en-us/research/publication/cryptonets-applying-neural-networks-to-encrypted-data-with-high-throughput-and-accuracy/)

~~~
KenoFischer
Yep, same research setting, though a different network. I don't at the moment
remember all the details of CryptoNets, but IIRC they were doing batch size
8192 evaluations (i.e. just using each slot as an independent value and
evaluating the code as if on scalars), which allows you to get away without
the fancy ciphertext encoding magic that's described in the blog post (at the
cost of high latency of course).

~~~
osaariki
You're right, CryptoNets used a data layout optimized for throughput, with a
batch size of 4096. Since then we've done a lot of work on low-latency
inference
with our CHET compiler [1] and my colleagues with LoLa [2]. It all comes down
to the data layouts you use.

[1]:
[https://www.cs.utexas.edu/~roshan/CHET.pdf](https://www.cs.utexas.edu/~roshan/CHET.pdf)
[2]:
[https://arxiv.org/pdf/1812.10659.pdf](https://arxiv.org/pdf/1812.10659.pdf)

------
mikedilger
"True homomorphic encryption isn't possible, and my guess is that it will
never be feasible for most applications. But limited application tricks like
this have been around for decades, and sometimes they're useful" - Bruce
Schneier
[https://www.schneier.com/blog/archives/2019/07/google_releases_1.html](https://www.schneier.com/blog/archives/2019/07/google_releases_1.html)

~~~
StefanKarpinski
I don't understand what Bruce Schneier means by this quote. Several fully
homomorphic encryption schemes have been proposed and implemented. "True"
doesn't have a technical meaning in this context, so it's unclear what he
means by that phrase. Perhaps he believes it will never be fast enough for
general computations? Or perhaps he's referring to the fact that these
schemes don't allow conditionals?

------
spencerp
There’s a PyTorch-style library for that:
[https://github.com/facebookresearch/CrypTen](https://github.com/facebookresearch/CrypTen)

------
EGreg
You know what’s interesting?

Aren’t hashes etc. done by some nonlinear functions?

Doesn’t ML kind of try to fit your model using linear functions all the way
down? Or not only linear?

Point being — can ML techniques be used to reverse hashes or find collisions?
Immovable object vs irresistible force?

Anyone got actual INFO on how this plays out?

~~~
garganzol
> Point being — can ML techniques be used to reverse hashes or find
> collisions? Immovable object vs irresistible force?

That's actually a good philosophical observation. And there is an answer for
that.

If you try to use ML directly on an encrypted data set, it won't converge.
The encrypted data contains a lot of randomness, while ML assumes a not
insignificant amount of continuity, linearity, and correlation.

A network trained on encrypted data would not be able to find a convergence
point in any observable time; the effort would be equivalent to trying to
brute-force the encryption.

But if you have a few million years, a few PBs of storage and some extravagant
network training algorithm based on mutations, who knows.

------
kylek
Not sure I totally grok this, but a project to use TensorFlow on encrypted
data has been around for a while [0]

[0] [https://github.com/tf-encrypted/tf-encrypted](https://github.com/tf-encrypted/tf-encrypted)

~~~
osaariki
TF Encrypted has focused more on MPC [1]. At least that's what their arXiv
paper [2] talks about. It does seem they are also working to integrate
Microsoft SEAL for HE.

[1]: [https://en.wikipedia.org/wiki/Secure_multi-party_computation](https://en.wikipedia.org/wiki/Secure_multi-party_computation)
[2]: [https://arxiv.org/abs/1810.08130](https://arxiv.org/abs/1810.08130)

------
nightsd01
I wonder if you could apply the same homomorphic encryption to ML model
training as well, so that you can fully train an algorithm without ever being
able to actually “see” the training data.

------
winrid
I'm assuming this uses similar encryption algorithms that you can use to do
queries on encrypted data?

~~~
KenoFischer
I'm assuming you're referring to private information retrieval
([https://en.wikipedia.org/wiki/Private_information_retrieval](https://en.wikipedia.org/wiki/Private_information_retrieval)),
which is from the same field of research, but may or may not use the same
techniques.

~~~
winrid
Never looked into it. Just heard about it recently. Very cool.

------
webew
Wow! Does this mean translating a program from one language to another using
ML will be possible?

------
rehasu
Think about what encryption should do. Think about what Machine Learning
should do. The only thing you can do with ML on encrypted data is show where
encryption needs to be improved. Or maybe there is a way to create ML models
that produce output which only someone with the correct (private) key can
understand.

~~~
KenoFischer
> Or maybe there is a way to create ML models that produces output which only
> someone with the correct (private) key can understand.

Yes, that's what the blog post describes how to do.

------
mlthoughts2018
What are the runtimes?

~~~
KenoFischer
About a minute or so for the batch of 64, but I haven't tuned the
implementation yet (I just did enough work to get it down to a comfortable
range for experimentation). The paper I linked
([https://eprint.iacr.org/2018/1041.pdf](https://eprint.iacr.org/2018/1041.pdf)),
which uses the same model but with a more optimized implementation, cites
26ms amortized per image (in a batch of 64), so I suspect I could get down to
that with a day or two of optimization work if I wanted to (or just plug in
their backend - but where's the fun in that?).

------
t_mann
How is this new in 2019, as the article states? Fully homomorphic encryption
has been around for at least a decade. There’s even a hedge fund (Numerai)
that has crowdsourced its quantitative modeling using an implementation of
FHE since 2015.

------
personjerry
This technique is deeply flawed.

You can't do this effectively without outside knowledge they should not have.
They are in fact using outside knowledge, specifically that the encrypted data
is in the form of images. Without that knowledge, you wouldn't know which ML
techniques to use! Additionally, remember that feature engineering is a big
part of what makes ML effective at all, and that certainly cannot be done on
encrypted data (the feature engineering you need to do depends on what the
data looks like).

~~~
sthatipamala
If I don't even know what data type my input is, what meaningful output can
the system give? I wouldn't consider that cheating.

Also, you can do feature engineering on a training set that you collect, as
long as that is similar to the distribution of end user inputs. That's a
pretty standard ML workflow.

~~~
personjerry
The entire point of encrypted data is that you don't know what's in there.
What they're claiming is that they've worked around this problem, but they
haven't, because they're subtly incorporating outside information:
specifically, they already know what the data looks like. A true third party
looking at encrypted data would have no idea.

What does your statement about feature engineering have to do with encrypted
data? Maybe I can clarify with an example: you can't transform a waveform
into more usable features with a Fourier transform if the waveform is
encrypted.

~~~
karpierz
You seem to be misunderstanding the value proposition. Suppose that I have
some of my personal medical data, and I'd like to use it to try and detect
early symptoms of HIV. But because there's a stigma against HIV and I don't
trust medical providers to handle my data correctly, I refuse to hand over my
data to a third party service unencrypted. Without homomorphic encryption, I'd
be out of luck; I couldn't use a third-party service to analyze my data and
return a prediction.

With homomorphic encryption:

- I encrypt my data locally
- Send the encrypted bits over to the third party
- The third party uses their ML algorithm to compute a prediction
- They send back an encrypted prediction
- I decrypt the prediction locally and get my results.

At no point in this process can anyone, aside from me, see either the input
data or the output.
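That round trip can be sketched end to end with a toy additively homomorphic
scheme (Paillier). This is not the leveled CKKS scheme from the article, the
parameters are tiny and insecure, and the "model" is just a public linear
scorer with integer weights, but the shape of the protocol is the same: the
service computes on ciphertexts it cannot read, and only the client can
decrypt the prediction.

```python
# Toy Paillier: E(a) * E(b) = E(a + b) and E(m)^w = E(w * m), so a service
# can evaluate a public linear model on encrypted features.
# Tiny demo primes -- insecure, illustration only.
import math
import random

def keygen(p=293, q=433):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    # mu = L(g^lam mod n^2)^-1 mod n, with g = n + 1 and L(x) = (x - 1) // n
    mu = pow((pow(n + 1, lam, n * n) - 1) // n, -1, n)
    return n, (lam, mu, n)

def encrypt(n, m):
    r = random.randrange(1, n)            # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(key, c):
    lam, mu, n = key
    return (pow(c, lam, n * n) - 1) // n * mu % n

# --- client side: encrypt features locally ---
n, priv = keygen()
features = [3, 1, 4]
ctxts = [encrypt(n, x) for x in features]

# --- service side: score a public model on ciphertexts it can't read ---
weights = [2, 5, 1]
score = 1                                  # E(0), the additive identity
for c, w in zip(ctxts, weights):
    score = score * pow(c, w, n * n) % (n * n)   # accumulates E(sum w*x)

# --- client side: decrypt the prediction ---
assert decrypt(priv, score) == sum(w * x for w, x in zip(weights, features))
```

A real network also needs ciphertext-times-ciphertext multiplications (for
polynomial approximations of nonlinearities), which is what schemes like CKKS
add on top of this additive picture.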

------
brenden2
If you can infer information from encrypted data then it's not properly
encrypted. Generally you would use a salt that would render this type of
analyses useless.

~~~
LeanderK
Maybe I misunderstood something, but they are not really inferring
information. The model is still encrypted, the outsider doesn't know what's
going on. Wouldn't salt destroy the homomorphic property?

~~~
brenden2
Homomorphic encryption is malleable[1] in that it can, with enough
information, be decrypted without knowing the private keys in some cases. For
example, if you can correlate with other data it may be possible to
effectively undo the encryption.

This is more like anonymization (effectively a one-way hash) than encryption.
If you encrypt two values with the same algorithm and key, equal values will
produce the same ciphertext, which reveals information about the original
values (i.e., that they are the same).

[1]:
[https://en.wikipedia.org/wiki/Malleability_(cryptography)](https://en.wikipedia.org/wiki/Malleability_\(cryptography\))

~~~
ummonk
There is no reason to reuse the same key though...

~~~
brenden2
If you used a different key for each datum then you wouldn't be able to do
this type of analysis.

This analysis depends on the same values producing the same ciphertext, which
also means you're leaking information.

~~~
ummonk
No it doesn’t. The model is trained on unencrypted data. It is then run on
encrypted data sent by the client to generate an encrypted classification. The
client then uses its decryption key to decrypt the encrypted classification.
This process is possible because the model is built to commute with
encryption.
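Also worth noting: the "same plaintext, same ciphertext" premise upthread
doesn't hold for these schemes, because encryption is randomized. A toy
ElGamal sketch in Python (a multiplicatively homomorphic scheme, with
made-up, insecure parameters) shows fresh randomness producing distinct
ciphertexts for the same value:

```python
# ElGamal encryption is probabilistic: a fresh random r per call means two
# encryptions of the same plaintext are (almost surely) different
# ciphertexts. Toy group parameters -- insecure, illustration only.
import random

p, g = 2039, 7                        # small prime modulus, NOT secure
x = random.randrange(2, p - 1)        # private key
h = pow(g, x, p)                      # public key

def encrypt(m):
    r = random.randrange(2, p - 1)    # fresh randomness every call
    return pow(g, r, p), m * pow(h, r, p) % p

def decrypt(c1, c2):
    return c2 * pow(c1, p - 1 - x, p) % p   # c2 / c1^x via Fermat

a, b = encrypt(42), encrypt(42)
# a and b are (almost surely) different pairs, yet both decrypt to 42:
assert decrypt(*a) == 42 and decrypt(*b) == 42
```

(It is also multiplicatively homomorphic: multiplying ciphertexts
component-wise multiplies the underlying plaintexts.)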

