Machine Learning on Encrypted Data Without Decrypting It (juliacomputing.com)
421 points by KenoFischer 85 days ago | 116 comments

Since there seems to be a lot of confusion throughout this thread, perhaps I can attempt to clarify what's going on at a very high level. The setting is that the user sends their data to a service and wants the service to do useful work on their data without knowing what their data is. In this case, the work that the service does is apply a pre-trained, non-secret ML vision model to the secret, encrypted data. The result is an encrypted answer that is sent back to the user, who can decrypt it and get a useful result. The service is none the wiser about what the user's data was, nor what the answer was.

Does this seem like impossible magic? Yes. Everyone was pretty surprised when the first theoretical fully homomorphic encryption schemes were found. For a while people believed that it might never be efficient enough to be practical. But now it's actually feasible to do limited but real useful computations on behalf of a user without the service learning anything about what it is computing for the user. There is ongoing work to expand what kinds of computations can be done efficiently.

Sounds strange. Does this mean that the model has to be trained on encrypted data?

Yes, it’s rather surprising that it can be done at all. And no, you train the model normally and then express the application of the model in terms of the set of primitive operations that the encryption scheme supports. Broadening the set of operations supported by the encryption scheme is an active area of research since having a better “instruction set” allows more computations to be done efficiently.
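For intuition, here's what "expressing the model in terms of the scheme's primitives" can look like, sketched on plaintext data. CKKS natively supports additions and multiplications, so an HE-friendly network sticks to matrix products and polynomial activations (squaring in place of ReLU, as CryptoNets popularized). All names and layer sizes below are made up for illustration:

```python
import numpy as np

def he_friendly_forward(x, W1, b1, W2, b2):
    # Uses only additions and multiplications, the primitives
    # a CKKS-style scheme supports natively.
    h = (x @ W1 + b1) ** 2    # square activation instead of ReLU
    return h @ W2 + b2        # no softmax; argmax happens after decryption

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 784))                      # e.g. a flattened 28x28 image
W1, b1 = rng.normal(size=(784, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 10)), np.zeros(10)
logits = he_friendly_forward(x, W1, b1, W2, b2)
print(logits.shape)  # (1, 10)
```

The point is that nothing in this forward pass requires comparisons or branching, so every operation has a homomorphic counterpart.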

It is also possible for the model to be provided by the user and encrypted along with the data rather than being fixed and public. That way the service can apply a model that it doesn't know to data that it also doesn't know. This allows a fairly general service in theory, but it remains a challenge to express more than fairly simple models in a way that can be computed efficiently.

I don't understand why it has to be trained on encrypted data. Isn't it faster to train on the cleartext data (and of course use the same instruction set, the unencrypted counterpart of it)?

I think he is saying that you do the training on cleartext data.

Ah ok, his first "yes" confused me so much that I saw no value in reading further.

I would think the pre-trained point is critical.

Is there a way of doing the training (without being provided a trained model) all on the encrypted side?

Yes, but the data owner will need to provide the feedback in clear text.

Does this have something to do with all the operations being linear?

Not really! Activation functions are usually non-linear. What happens, at a high level, is that we turn our numbers into their counterparts in another (abstract) group and work there.

This depends on a very special kind of "encryption".

Before reading: "I bet they're using homomorphic encryption to expose patterns in the encrypted data"

After reading: Yup. It makes sense, so long as your resulting model is run against similarly encrypted data, the same patterns will be there for the ML to identify.

Which is, of course, one of the issues with homomorphic encryption.

Just to clarify, homomorphic encryption does not expose patterns. At every point in the computation the ciphertexts are computationally indistinguishable from random. The result of evaluating the ML model will be an encrypted prediction that you then need to send back to whoever encrypted the data (or more precisely whoever has the key - doesn't need to be the same person) so they can decrypt and use the prediction.

Do you have a reference somewhere that backs up your assertions, where I can read more on this topic? I'm super curious about it.

The CKKS paper describing the crypto scheme I'm using is here: [1]. The paper is decently readable, but frankly I feel it doesn't really convey much intuition, and it's a bit hard to follow if you don't have an algebraic number theory background. Probably the correct thing to do is to read the original BGV paper [2], which is still quite technical of course, but at least doesn't implicitly assume all the development that has happened since then. Once you've gotten the basics, I have an overview that focuses more on the practical aspects of how it works in the documentation [3]. I've also written an overview of how CKKS works, in which I've tried to highlight the two main ideas of CKKS compared to earlier schemes [4]. Let me know if you were looking for something else.

[1] https://eprint.iacr.org/2016/421.pdf

[2] https://eprint.iacr.org/2011/277.pdf

[3] https://juliacomputing.github.io/ToyFHE.jl/dev/man/backgroun...

[4] https://juliacomputing.github.io/ToyFHE.jl/dev/man/ckks/

this is great, thank you.

Read Craig Gentry's PhD thesis, which was the first working fully homomorphic encryption scheme. It's no longer state of the art, but it contains a lot of accessible background on the core problem (which was then open) and why it's important.

The person you're responding to is correct. It's an explicit design goal that a fully homomorphic encryption system would not expose any distinguishable oracle about the underlying data. Otherwise there would be no point to it whatsoever, because you'd just be performing the same computations on the data dramatically less efficiently and without any benefit.

This follows the general imperative of cryptography, which is that the outputs of cryptographically secure primitives (hash functions, pseudorandom generators, pseudorandom permutations, etc) should be computationally indistinguishable from random up to 2^n queries, for some large n (such as 128).

What about side effects? Does eg timing of the computations leak data?

I would expect that all data processing by default leaks information via computation time. Some algorithms are intentionally designed to resist this; people who have gone to that effort will say so. (A similar thing: assume something is not thread-safe unless the documentation says it is.)

Actually, homomorphic encryption always runs as slow as the worst case, so you can't do a side-channel attack by looking at the computation time.
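For a plaintext analogy of a timing leak (unrelated to HE internals): naive equality short-circuits at the first differing byte, which is why standard libraries ship constant-time comparisons. Python's stdlib `hmac.compare_digest` is one such API:

```python
import hmac

SECRET = b"correct horse battery staple"

def naive_check(guess):
    # Short-circuits on the first mismatch: running time leaks
    # how long the matching prefix is.
    return guess == SECRET

def constant_time_check(guess):
    # Examines every byte regardless of where the mismatch occurs.
    return hmac.compare_digest(guess, SECRET)

print(constant_time_check(b"correct horse battery staple"))  # True
print(constant_time_check(b"wrong guess"))                   # False
```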

He is just stating the definition of homomorphic encryption. There are lots of papers on the topic. The one mentioned in the article is a good starting point. "Homomorphic Encryption for Arithmetic of Approximate Numbers" https://eprint.iacr.org/2016/421.pdf

yeah I appreciate it! I usually check wikipedia first, but nonetheless a good place to start

You can get a sense of it just by looking at the classic: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation...

Look for the penguin image in ECB mode. Encryption without randomness reveals the patterns! That image is highly educational and makes you think.

It seems nontrivial that the same key would be used to decrypt the result. Does this only work for some subset of ML models?

I may be missing the question, so let me know if I do, but the point here is just that the crypto system is asymmetric. There are three kinds of keys: public, private, and evaluation (as usual you can derive the others from the private key). Whoever has the public key can encrypt, whoever has the private key can decrypt, and evaluation may or may not need the evaluation keys depending on what you're computing. As for which ML models it works for, the theoretical answer is all of them, because you can do arbitrary computation. The practical answer is that currently models of this complexity are probably the best you can do, because everything else would take years to run. It's being worked on though, both in theory land and in the implementation.

Ok, I think you answered my question, basically I was missing the public key. So you send an encrypted picture of a cat to the encryption trained model and it returns an answer which is encrypted using the public key. (as a super simple example)

The ML model has to use certain operations (which are homomorphic to the encryption) as its building blocks, but with fully homomorphic encryption you can put an arbitrary number of the building blocks together so you can build complex ML models out of those.

It would be good to see a proof why the homomorphism property is sufficient for doing ML (or a specific type of ML) on encrypted data.

Intuitively it may or may not make sense to people, but a proof would also clarify whether there are any caveats, limitations or other particularities.

It is really straightforward. The currently top voted comment basically explains it. If you take data x and encrypt it with encryption function C to get C(x), and then apply your machine learning model f to that to get f(C(x)), homomorphism means that f(C(x)) = C(f(x)) (because f and C commute), so the client can then apply the decryption to the result to get C-1(f(C(x))) = C-1(C(f(x))) = f(x).
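To make the commuting property concrete: textbook RSA (toy-sized, utterly insecure, and only multiplicatively homomorphic, unlike a fully homomorphic scheme) lets you check the round trip for any f built purely from multiplication, e.g. f(x) = x^3:

```python
# Textbook RSA with the classic toy parameters p=61, q=53 -- illustration only.
n, e, d = 3233, 17, 2753       # d = e^-1 mod (p-1)*(q-1)

def enc(m):      return pow(m, e, n)
def dec(c):      return pow(c, d, n)
def f_plain(x):  return (x ** 3) % n
def f_cipher(c): return pow(c, 3, n)   # same f, evaluated on a ciphertext

m = 5
assert dec(f_cipher(enc(m))) == f_plain(m)  # decryption commutes with f
print(dec(f_cipher(enc(m))))  # 125
```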

You might want to clarify that “C-1” is actually C^-1 because otherwise I was wondering where subtraction comes into play.

Whoops, too late to edit; thanks for posting the clarification.

more like really straight forward over my head. I appreciate you taking the time to explain it - still can't grasp the concept. Back to framework connecting.

But are you saying that an ML model is just a function? The function C is clear, but what is f?

What is the domain and what is the codomain? If f and C commute, then x in X is just your "data", and X is both the domain and the codomain.


> the top voted comment

Ah, I see that the top comment says that you use a pre-trained model. Intuitively that makes sense. But how about training the model itself?

So, you're training a statistical model - teaching it to recognize certain patterns - on data that is somehow wholly without patterns? Even with a certain amount of noise in the individual data points, if you're given enough data to train a statistical model to identify traits in the ciphertext you also probably have enough data to break the encryption.

Also, something can be statistically random and still have patterns (see PRNGs, which are statistically random in that you can't predict the next value from previous values, but there's still a pattern if you know the algorithm and seed).

I'll admit, the promise of homomorphic encryption is pretty amazing, but this particular combination of data and ML seems like a fairly obvious way to leak data. I believe there's a reason that homomorphic encryption has not been broadly accepted as an allowed standard.

EDIT: So, I think I'm missing a practical example. I have a model which I want to train on data homomorphically encrypted with key X, and that model is a very simple "is this a cat". I'm given a whole set of data encrypted with key X that's tagged with "cat" and "not a cat".

Once the model is trained, I can run this on any data encrypted with key X and find out if the data contains a cat (with some degree of accuracy). I have no way of telling information outside the tags provided on the training data, but it still gives me, a person without the encryption key, the ability to identify any feature that's tagged in the training set on any un-tagged production set.

Having seen a large quantity of ML training sets, the tag sets are rarely so limited. There's also often "elephant", "ball", and "dog" tags, even if I'm only being asked to train on cats.

> Once the model is trained, I can run this on any data encrypted with key X and find out if the data contains a cat (with some degree of accuracy). I have no way of telling information outside the tags provided on the training data, but it still gives me, a person without the encryption key, the ability to identify any feature that's tagged in the training set on any un-tagged production set.

I think you're missing the fact that the predictions come back encrypted, so you don't learn anything unless you know the key. Also, in this particular example the model was trained on unencrypted data, but as discussed below, you can do either.

Predictions are a tag, and a separate confidence value. A fairly finite set. Cat. Not a Cat.

Large quantities of small finite sets are anathema to encryption.

I'm still only guessing at the objection, but if it helps, the encryptions are randomized, so two different encryptions of the same value do not have the same ciphertext.
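If it helps make that concrete, here is a toy LWE-style symmetric scheme of the flavor modern HE schemes are built on (dimensions and noise chosen for readability, nowhere near secure): every encryption draws fresh randomness, so two encryptions of the same bit differ, yet ciphertext addition still decrypts to the XOR of the plaintext bits.

```python
import numpy as np

rng = np.random.default_rng(42)
N, Q = 32, 1 << 13              # toy dimension and modulus -- far too small to be secure
s = rng.integers(0, Q, size=N)  # secret key

def enc(bit):
    a = rng.integers(0, Q, size=N)   # fresh randomness for every encryption
    e = int(rng.integers(-4, 5))     # small noise term
    b = (int(a @ s) + e + bit * (Q // 2)) % Q
    return a, b

def dec(ct):
    a, b = ct
    v = (b - int(a @ s)) % Q
    return 1 if Q // 4 < v < 3 * Q // 4 else 0   # round to the nearest multiple of Q/2

def add(ct1, ct2):
    # Homomorphic addition; decrypts to the XOR of the two plaintext bits.
    (a1, b1), (a2, b2) = ct1, ct2
    return (a1 + a2) % Q, (b1 + b2) % Q

c1, c2 = enc(1), enc(1)
print(dec(c1), dec(c2))                    # 1 1
print(np.array_equal(c1[0], c2[0]))        # False: same bit, different ciphertexts
print(dec(add(enc(1), enc(0))), dec(add(enc(1), enc(1))))  # 1 0
```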

Not all encryption algorithms are susceptible to known-ciphertext attacks.

If you map a binary variable into, let's say, a 128-bit number (non-linearly), where the mapping is unique for every encryption key used, there's nothing making it anathema to encryption.

I am not sure whether this is achievable practically though.

> I believe there's a reason that homomorphic encryption has not been broadly accepted as an allowed standard.

Well, yes... Practical homomorphic encryption is cutting-edge research, and standards bodies like NIST aren't going to deal with an area like this until it's much closer to "solved" (by which I mean much more efficient, with more practical applications, widely used and scrutinized schemes, etc.)

> homomorphic encryption has not been broadly accepted as an allowed standard.

I get that a lack of broad acceptance is a hard thing to source, but do you have one?

https://csrc.nist.gov/publications/detail/sp/800-111/final is a good starting point (and used by HIPAA, as an aside).

It leads to block cipher standards, AES standards, and so on.

I think it's not unreasonable to say that if it should ever appear as a recommended cipher by NIST, then it can be considered to be broadly accepted.

Last year a group of companies and institutions created a working standard. https://homomorphicencryption.org/standard/

This is not how this works. Homomorphic encryption (HE) is best thought of as a software CPU that gives you ways to mutate data (but not read it without the secret key of course). Any computation you run has to be using the instruction set offered by HE: addition, multiplication or rotation. When you do an operation on two ciphertexts, you get a new ciphertext back, but no information on its contents.

Now the way you do machine learning here is by translating your model to use the instructions offered by HE. You've effectively recompiled the model to a new architecture.
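For a feel of what that recompilation means: even summing the slots of a packed vector has to be re-expressed in terms of rotations and additions. Here is the standard log2(n) rotate-and-add trick, sketched on plaintext slots with np.roll standing in for a ciphertext rotation:

```python
import numpy as np

def rotate_and_add_sum(slots):
    # Sums every slot using only rotations and additions -- the way an
    # "inner sum" is typically expressed against an HE instruction set.
    acc = slots.copy()
    shift = 1
    while shift < len(slots):              # length assumed to be a power of two
        acc = acc + np.roll(acc, -shift)   # rotation + elementwise addition
        shift *= 2
    return acc                             # every slot now holds the total

v = np.arange(1.0, 9.0)            # slots 1..8, total 36
print(rotate_and_add_sum(v)[0])    # 36.0
```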

If you'd like to read more about machine learning with homomorphic encryption, we published a paper on our CHET compiler [1]. I also talk about this space on a high level in this MSR podcast episode [2].

[1]: https://www.cs.utexas.edu/~roshan/CHET.pdf [2]: https://www.microsoft.com/en-us/research/blog/he-compilers-f...

Quick question: If you have control of the computation instructions (the software for your HE CPU), can you arrange for a leak of information from the encrypted data?

In a properly functioning HE scheme, the party doing the evaluation cannot obtain information about the encrypted data, even if they do the operation incorrectly. Without further measures, they can of course compute an incorrect value, which depending on the larger protocol can sometimes be an issue (e.g. consider the scenario where after decryption the client shares the prediction with the server for some reason - in that case, if the server sends the original image ciphertexts, instead of predictions, they can trick the client into decrypting the original image). There is also always the risk that whoever is choosing the instantiation of the cryptoscheme knows something you don't and picks weak parameters, but that's just all the usual attack modes on cryptographic schemes.

As other posters already mentioned, there's no control flow in the HE instruction set. This is actually one of the reasons neural networks are a great fit for HE, as they often have fixed data access patterns and no control flow.

The software can only do arithmetic in a fixed pattern, no "if" statements or while loops or anything. Any conditionals have to be faked with arithmetic.

So, since the arithmetic is secure (otherwise it wouldn't be HE), and the entire runtime pattern is fixed up front and made of nothing but arithmetic, there's no way to leak anything.

This gives a whole new look to lambda calculus and the Lisp concepts that promote representing everything as a (preferably pure) function.

If the whole logic can be expressed as a pure function without conditionals then it would fully fit into HE.

But what if we want conditionals?

Comparison operators like (a < b), (a > b), etc. are immediately out of the question, as they would allow guessing the values by a simple binary search.

The equality operation (a == b) seems plausible from the security standpoint, as it would not reveal the encrypted value. But there is a challenge in performing that operation because its two arguments may be encrypted with different randomization. To overcome this, probably some neat trick could be performed, but... this is a question for the future.

EDIT: Here is an idea. Some FHE engines have pre-encoded values for some magic numbers like 0, 1 and -1.

What if the equality operation p(a, b) = (a == b) is performed like this:

p(a, b) = (a + (b * -1)) == 0

Does anyone have an intuition regarding the feasibility of the proposed trick in something like Microsoft SEAL? Would both the left and right parts of the comparison have canonicalized randomness, making the equality operation possible?

P.S. On a side note, the sheer existence of an equality operation in an FHE scheme would decimate its security by enabling brute-force attacks with less guesswork. Not a catastrophe by any means, and some systems would prefer that as a small price to pay for having conditionals in a program.

Have you looked at reversible computing? The ideas are only vague in my mind at the moment, but the fragments are there. Disclaimer: out of my wheelhouse. Any conditional statement will throw some information away; "knowing" a == b means you provably have less complexity. Not a good look for encryption.

But instead let's say you have some register which gets rotated/xor'd in some way based on the relationship of a to b.

I don't think you can do branching logic though.

Good suggestions, thank you.

> I don't think you can do branching logic though.

You can. Suppose we were able to introduce a special equality operator, call it SEQ, that works like so:

SEQ(a, b) = 1 when a == b; SEQ(a, b) = 0 when a != b

Then we would be able to do conditionals by discarding the result of the unmatched branch, multiplying it by 0.

Thanks to the fact that all HE calculations are pure, no observable side effects would be produced.

For example, here is a simple program with a conditional statement:

P(x) = x == 10 ? 42 : sin(x)

This is how it may look in HE domain:

P(x) = SEQ(x, 10) * 42 + (1 - SEQ(x, 10)) * sin(x)

No branching is involved here but the result is equivalent to a program with branching. Eureka!

Once again, thank you for the fruitful suggestions.

EDIT: by the way, this paves the way for implementing the > and < operators as well, since the operator's result remains in the encrypted domain. This is a serious wow moment.
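On plaintext values the arithmetic-selection pattern is easy to check (the SEQ below is a hypothetical plaintext stand-in; as discussed above, building an equality gate inside an HE scheme is the hard part):

```python
import math

def seq(a, b):
    # Hypothetical equality gate: 1 if a == b else 0.
    # Trivial on plaintext; hard inside an HE scheme.
    return 1 if a == b else 0

def P(x):
    # Branchless form of: x == 10 ? 42 : sin(x)
    return seq(x, 10) * 42 + (1 - seq(x, 10)) * math.sin(x)

print(P(10))                # 42.0
print(P(2) == math.sin(2))  # True
```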

Encryption done right mixes random noise into the encrypted data exactly to avoid pattern recognition. Homomorphic encryption does this too.

Otherwise it would be trivial to analyze and eventually crack the scheme. This is especially true for homomorphic encryption, where an attacker can employ math operations to solve equations in order to reveal as much data as humanly possible.

This blog post reminds me of the "Machine Learning Systems are Stuck in a Rut" paper [1], where they mentioned:

> It is hard to experiment with front end features like named dimensions, because it is painful to match them to back ends that expect calls to monolithic kernels with fixed layout. On the other hand, there is little incentive to build high quality back ends that support other features, because all the front ends currently work in terms of monolithic operators. An end-to-end tool chain for machine learning requires solutions from many specialist disciplines.

This is a fantastic example of getting around this problem using Flux.jl's interaction with the full Julia language. Here the author does some exceedingly cool stuff (doing machine learning on encrypted data!), and in order to get there he needed to write what many would think of as lower-level kernels that should be "provided by the library" (encrypted matrix multiplication, encrypted convolutions). To make the interface useful, he needed to use a mature interface that people are already using and make it something that can automatically switch over to encrypted implementations. And, because of the nature of fully homeomorphic encryption, the implementation has to be fast, otherwise it's dead in the water since FHE is expensive!

To me, this example showcases one way Flux.jl is helping machine learning get out of that "rut". The author adds dispatches to standard kernels which allow for his encrypted data types, which then allows standard Julia Flux.jl machine learning models to act on encrypted data, and it uses type inference + JIT compilation to make it fast enough to work. Not only that, but it's also not tied to some "sub-language" defined by a machine learning framework. That means the FHE framework not only works nicely with machine learning, it can be used by any other package in the Julia language (differential equation solvers, nonlinear optimization, macroeconomics models?). This allows composability of tools and community: all tools from all fields can now use this same FHE implementation, so authors collaborate and mature it into a very good one. These knock-on effects give people doing research in Julia a lot of competitive advantages over other researchers, and it will be interesting to see how this affects not just the ML research community, but also everyone else!


(Repost from the previous thread on this!)

> fully homeomorphic encryption

I think you mean fully homomorphic encryption, that is, f(x.y)=f(x).f(y).

A homeomorphism is an isomorphism of topological spaces. If you do mean the latter, then it would be something new (and I am curious to hear about it).

Oops :)

Thanks for this provocative comment -- if you don't mind me asking, what do you think the tradeoffs are between using Julia and Python for ML in production? I get the idea of right tool for the right job and all, and I know that Python has a fantastic ecosystem. But, how close is Julia? How developed is the ecosystem right now, and how fast is it developing?

Julia has a smaller ecosystem than Python for sure, however julia has a couple of key ecosystem advantages that make it punch well above its weight class.

It really just boils down to the fact that julia's core value proposition is primarily enticing to domain experts trying to implement new algorithms without wanting to worry about switching languages for performance hotspots or having to reimplement existing functionality from other packages. Julia packages tend to be small, focused and astonishingly composable.

Hence, the proportion of the community actively developing and maintaining state-of-the-art packages in julia is way higher than in other languages like Python.

Furthermore, even if you're not some super researcher developer, Julia substantially lowers the bar for mere mortals such as myself to do package development. The composability I mentioned above means that I can stand on the shoulders of giants as I implement my own stuff, plus the fact that the majority of libraries are written in pure julia means that I can easily "peek under the hood" to figure out how they work and learn from them.

It's hard to make broad statements about how much ecosystem support one can expect from julia without first specifying the field. Some fields have fantastic julia support and others haven't received enough attention. It really just depends on if someone from your field has started making good packages yet.

Julia always seemed great on paper and definitely is a strong candidate for replacing Matlab. But whenever I tried using it, the user experience seemed much more broken than python or c++. It just seems way easier to structure and work on a python + c++ project than it is to structure and work on a Julia project. A moderately sized sane c++ code base compiles and runs faster than whatever gymnastics Julia performs to create a single 2d plot (literally freezes my laptop for seconds). It also installs gigabytes worth of libraries into global directories by default. The compiler itself is also of questionable quality compared to clang, Swift or Ocaml.

The one thing Julia does not do is install gigabytes of libraries in global directories by default. The default install does not require root, and installs the packages and libraries in ~/.julia.

How did you end up with libraries in global directories?

Also, while we are working on reducing the time to first plot latency (which has to do with compiling large amounts of code - the time to second plot is 0.2sec), it would be good to learn more about the compiler quality issues you mention. We always welcome constructive comments on the Julia discourse.

Compiler latency is being actively worked on and a PR was just pushed to Plots.jl which halved the time to first plot for many users.

Note that even before this though, it's just the compilation phase for the first plot that's expensive. Plotting itself is quite fast in julia, and if you start up julia with the command line argument `--compile=min` you'll see very very fast first plots. Obviously, this will come with a runtime performance cost though.

The particular attraction to me that Julia advertises is that it is much easier to read other people's Julia code. I recently tried Julia and found this to be quite true. It still lacks some critical features I need in the ML library but I can definitely see its attraction to users who want to understand the libraries they use.

Julia is compiled using LLVM.

clang, swift and julia are all based on the llvm compiler toolchain. Ocaml has its own compiler.

> literally freezes my laptop for seconds

Yep. This has been commented on again and again, but many refuse to show willingness to fix or even acknowledge the problem:






What do you mean by "refuse to show willingness to fix or even acknowledge the problem"? Do you realize that two of those issues were started by core contributors?

As somebody with some ML background but no expertise in crypto, is the following ELI~20 summary correct?

We take an ML model trained on unencrypted data, use a 'homomorphic evaluation' technique (let's just leave that as magic here) to convert the model operation-by-operation to a model that runs on encrypted data, do a little more crypto magic, and we've solved the business problem described at the beginning of the article.

(In particular, if you train a model on encrypted data you get a really bad model, right?)

Yep, that's correct. With the minor caveat that we choose an ML model that's "easy" to evaluate using homomorphic encryption.

As someone who is not a cryptography expert, is there any hope of using similar logic to train on encrypted data? Naively it seems like you could perform the same operations on the back propagation steps (or any other update algorithm you're using for non NN models) to arrive at the encrypted version of the parameter updates, which you could then decrypt to get the updated model. Am I missing something here?

You're quite correct that evaluating the back propagation could be done exactly the same as the forward pass. However, if you want to train for more than a handful of steps you'll have to use an operation called bootstrapping to periodically "refresh" the ciphertexts that encode the model. Bootstrapping is essentially evaluating the decryption circuit of HE using HE itself (with a homomorphically encrypted version of the secret key). The problem is that bootstrapping is much more expensive than the other operations in HE.

People have done very effective training on encrypted data using simpler models, like linear or logistic regression. See for example this work [1] from my colleagues at Microsoft Research.

[1]: https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s1...

Training is a lot tougher. Just doing one gradient update step isn't all that bad (although you may have to play with the loss function a bit, e.g. logit cross entropy is probably tough to evaluate). However, then you need to go and actually do all the steps and gradient updates, so you probably need some form of bootstrapping to be able to evaluate computations of that depth. Also, the use case is slightly less compelling. For training, you can probably get all the parties who have data to coordinate and evaluate an MPC more cheaply than you could with HE alone. I think it'll require a very compelling use case for somebody to go and think through what the best way to do it is and it'll probably depend on the specifics of the application (who has what data, and what are we willing to leak as we go along - e.g. it's a lot easier if you don't care about keeping the weights secret).

There are definitely compelling use-cases and there are people working on it (though not me). Developing tools/systems to handle sensitive data in a secure way is extremely expensive and time consuming. If you can create data collection and model training pipelines that can operate effectively with just encrypted data then you greatly reduce the number of vulnerabilities (e.g. fewer employees need to actually see the sensitive data and fewer points of attack on the system itself).

There are certainly a number of factors to consider besides data security when evaluating the practicality of such an approach, but I just wanted to confirm that it was technically possible before getting into any of that. Thanks for your response and the post, I knew almost nothing about HE before today.

I'm piecing it together too. This is my understanding: the "model" has a set of rules (add, divide, bit shift) which have funky encrypted counterparts (add by making every third bit flip or something). If you run all of these actions in place of what you'd normally do (i.e., just adding like normal), then return the product, it can be decrypted as the result of the functions run on it.

I might be wrong though.

With Homomorphic Encryption, the data owner encrypts data X and gives C(X) to the owner of the function f, who applies f to C(X). This gives f(C(X)), which equals C(f(X)), and is returned to the data owner, who can decipher it to get f(X). The owner of the function knows neither the initial data nor the result, and the owner of the data doesn't know f.

There is also Functional Encryption, a different technique solving a similar (but different) goal. With F.E., the owner of the data must know the function f, but the party performing the calculation gets the result f(X) directly.

This sounds like what Microsoft Research did with SEAL to produce CryptoNets.


Yep, same research setting, though a different network. I don't at the moment remember all the details of CryptoNets, but IIRC they were doing batch size 8192 evaluations (i.e. just using each slot as an independent value and evaluating the code as if on scalars), which allows you to get away without the fancy ciphertext encoding magic that's described in the blog post (at the cost of high latency of course).

You're right, CryptoNets used a data layout optimized for throughput with a batch size 4096. Since then we've done a lot of work on low latency inference with our CHET compiler [1] and my colleagues with LoLa [2]. It all comes down to the data layouts you use.

[1]: https://www.cs.utexas.edu/~roshan/CHET.pdf [2]: https://arxiv.org/pdf/1812.10659.pdf

"True homomorphic encryption isn't possible, and my guess is that it will never be feasible for most applications. But limited application tricks like this have been around for decades, and sometimes they're useful" - Bruce Schneier https://www.schneier.com/blog/archives/2019/07/google_releas...

I don't understand what Bruce Schneier means by this quote. Several fully homomorphic encryption schemes have been proposed and implemented. "True" doesn't have a technical meaning in this context, so it's unclear what he means by that phrase. Perhaps he believes it will never be fast enough for general computations? Or perhaps he's referring to the fact that such schemes don't allow conditionals?

There’s a PyTorch-style library for that: https://github.com/facebookresearch/CrypTen

You know what’s interesting?

Aren’t hashes etc. done by some nonlinear functions?

Doesn’t ML kind of try to fit your model using linear functions all the way down? Or not only linear?

Point being — can ML techniques be used to reverse hashes or find collisions? Immovable object vs irresistible force?

Anyone got actual INFO on how this plays out?

> Point being — can ML techniques be used to reverse hashes or find collisions? Immovable object vs irresistible force?

That's actually a good philosophical observation. And there is an answer for that.

If you try to use ML on an encrypted data set, it will blow up and won't converge. Encrypted data contains a lot of randomness, while ML assumes a not-insignificant amount of continuity, linearity, and correlation.

Trying to train a network on encrypted data would blow it up as it would not be able to find a convergence point in any observable time. The effort would be equivalent to trying to bruteforce the encryption.

But if you have a few million years, a few PBs of storage and some extravagant network training algorithm based on mutations, who knows.

Not sure I totally grok this, but this has been around for a while to use tensorflow on encrypted data [0]

[0] https://github.com/tf-encrypted/tf-encrypted

TF Encrypted has focused more on MPC [1]. At least that's what their arXiv paper [2] talks about. It does seem they are also working to integrate Microsoft SEAL for HE.

[1]: https://en.wikipedia.org/wiki/Secure_multi-party_computation [2]: https://arxiv.org/abs/1810.08130

I wonder if you could apply the same homomorphic encryption to ML model training as well, so that you can fully train an algorithm without ever being able to actually “see” the training data.

I'm assuming this uses similar encryption algorithms that you can use to do queries on encrypted data?

I'm assuming you're referring to private information retrieval (https://en.wikipedia.org/wiki/Private_information_retrieval), which is from the same field of research, but may or may not use the same techniques.

Never looked into it. Just heard about it recently. Very cool.

Wow! Does this mean translating a program from one language to another using ML will be possible?

Think about what encryption should do. Think about what machine learning should do. The only thing you can do with ML on encrypted data is show where encryption needs to be improved. Or maybe there is a way to create ML models that produce output which only someone with the correct (private) key can understand.

> Or maybe there is a way to create ML models that produce output which only someone with the correct (private) key can understand.

Yes, that's what the blog post describes how to do.

> Nowhere was the user data decrypted and in particular the cloud provider does not have access to either the orignal image nor is it able to decrypt the prediction it computed.

That's from the post. Is that what you meant by creating a model whose results only the owner of the data can understand?

Precisely. Any encryption that allows an attacker to significantly reason about its underlying plaintext is broken.

What are the runtimes?

About a minute or so for the batch of 64, but I haven't tuned the implementation yet (I just did enough work to get it down to a comfortable range of experimentation). The paper I linked (https://eprint.iacr.org/2018/1041.pdf) which uses the same model, but with a more optimized implementation cites 26ms amortized per image (in a batch of 64), so I would suspect I can get down to that with a day or two of optimization work if I wanted to (or just plug in their backend - but where's the fun in that).

I'm a researcher at Microsoft who's worked a lot on this problem (neural network inference on encrypted data). Take a look at this paper we published at PLDI [1] on our CHET compiler. The speed of the inference depends a lot on the depth of the network. A small 2 convolutional layer model takes a couple of seconds, while a 10 convolutional layer SqueezeNet takes 164 seconds. The reason is that you have to select encryption parameters according to the depth of the computation. By the way, since this paper was published we've improved these numbers ~5X.

That was optimizing for latency; throughput is something you can optimize for separately. This landmark paper (also from Microsoft) [2] showed inference on a 2 convolutional layer network at a throughput of 58k images per hour.

[1]: https://www.cs.utexas.edu/~roshan/CHET.pdf [2]: http://proceedings.mlr.press/v48/gilad-bachrach16.pdf

Very nice machine learning related post, amazing information, thanks @KenoFischer.

How is this new in 2019, as the article states? Fully homomorphic encryption has been around for at least a decade. There’s even a hedge fund (numerai) that crowdsources its quantitative modeling based on an implementation of FHE since 2015.

This technique is deeply flawed.

You can't do this effectively without outside knowledge they should not have. They are in fact using outside knowledge, specifically that the encrypted data is in the form of images. Without that knowledge, you wouldn't know which ML techniques to use! Additionally, remember that feature engineering is a big part of what makes ML effective at all, and that certainly cannot be done on encrypted data (the feature engineering you need to do depends on what the data looks like).

I disagree that knowing the type of data being encrypted makes the technique flawed. Let's say you wanted to build a Speech-to-Text API that takes some encrypted audio, transcribes it, and returns the transcriptions encrypted.

Sure, you know that any files sent to you are going to be encrypted audio files, but you don't know what is being said in them which is the point of this technique.

The ML example the blog provides is amusing.

It tends to make some people think that a neural network can somehow "see" the patterns in encrypted data. Plain wrong.

What happens in reality is this: the neural network is encrypted and can only work on encrypted data, producing only encrypted results.

The data can only be decrypted by the data owner who would not share his private key with anybody. That's it.

In this way, the whole algorithm works in a so-called encrypted domain. When used in this role, a homomorphic encryption (HE) engine resembles a custom CPU. That CPU is not Turing-complete because it lacks conditional branching, but it is almost there, providing two basic operations: addition and multiplication. This is enough for a lot of tasks.

Does a CPU know the type of data or anything special about it? Not at all. It executes the primitive instructions of a program, one by one, producing the output result from a set of inputs.

The program P that HE "CPU" executes can be any program suitable for the given instruction set. Not only ML. It can be any other program!
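As a concrete (if toy) illustration of such an "instruction set", here is a minimal Paillier sketch in Python. Note that Paillier is only *partially* homomorphic — its instructions are ciphertext addition and multiplication by plaintext constants, not ciphertext-times-ciphertext — but it shows how an evaluator can run a small program on data it cannot read. Parameters are tiny and insecure, for illustration only.

```python
import math
import random

# Toy Paillier cryptosystem (needs Python 3.9+ for math.lcm).
# INSECURE toy parameters; real keys are thousands of bits.
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                 # valid because we use g = n + 1

def enc(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:       # r must be coprime to n
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# The two "instructions" the evaluator can execute without the private key:
def he_add(c1, c2):                  # Enc(a) * Enc(b) mod n^2 = Enc(a + b)
    return (c1 * c2) % n2

def he_mul_plain(c, k):              # Enc(a)^k mod n^2 = Enc(k * a)
    return pow(c, k, n2)

# A tiny "program": evaluate 3*x + 5*y entirely on ciphertexts.
cx, cy = enc(4), enc(7)
c_out = he_add(he_mul_plain(cx, 3), he_mul_plain(cy, 5))
assert dec(c_out) == 3 * 4 + 5 * 7   # 47
```

A fully homomorphic scheme like CKKS adds ciphertext-times-ciphertext multiplication to this instruction set, which is what makes deeper computations such as neural network layers expressible.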

That's why homomorphic encryption is a very important technique in the area of cryptographic obfuscation. It makes it possible to hide the data processed by a program P. And if that program P is itself a virtual machine (sometimes called a Universal Circuit or UC), it would allow hiding the code as well, reaching the state of indistinguishability obfuscation. There are some practical problems along the way, but we are moving there.

Yep, that sounds like magic, but it's based on pure "boring" math with a few bright ideas here and there. What was thought to be an impossible feat in the past is gradually becoming a reality.

Isn't that the reason why Nobel is a no go for mathematicians? Otherwise, they would get plenty of Nobels (almost all of them, for sure).

If I don't even know what data type my input is, what meaningful output can the system give? I wouldn't consider that cheating.

Also, you can do feature engineering on a training set that you collect, as long as that is similar to the distribution of end user inputs. That's a pretty standard ML workflow.

The entire point of encrypted data is you don't know what's in there. What they're claiming is they've worked around this problem, but they haven't, because they're subtly incorporating outside information: specifically that they know what the data looks like already. From a true third-party looking at encrypted data, you would have no idea.

What does your statement about feature engineering have to do with encrypted data? Maybe I can clarify with an example: you can't transform a waveform into more usable features with a Fourier transform if the waveform is encrypted.

You seem to be misunderstanding the value proposition. Suppose that I have some of my personal medical data, and I'd like to use it to try and detect early symptoms of HIV. But because there's a stigma against HIV and I don't trust medical providers to handle my data correctly, I refuse to hand over my data to a third party service unencrypted. Without homomorphic encryption, I'd be out of luck; I couldn't use a third-party service to analyze my data and return a prediction.

With homomorphic encryption:

- I encrypt my data locally
- Send the encrypted bits over to the third-party
- The third-party uses their ML algorithm to compute a prediction
- They send back an encrypted prediction
- I decrypt the prediction locally and get my results.

At no point in this process can anyone, aside from me, see either the input data or the output.
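A minimal sketch of that workflow, using toy Paillier encryption (additively homomorphic) and a hypothetical plaintext linear "model" on the server side. The parameters are tiny and insecure; a real deployment would use a scheme like CKKS via a vetted library such as Microsoft SEAL.

```python
import math
import random

# --- toy Paillier helpers (additively homomorphic; INSECURE parameters) ---
def keygen():
    p, q = 347, 379                          # toy primes
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))         # (public key, private key)

def enc(n, m):
    n2 = n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(n, priv, c):
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

# --- client side: encrypt the data locally ---
n, priv = keygen()
features = [3, 1, 4]
enc_features = [enc(n, f) for f in features]

# --- "cloud" side: apply a public linear model to ciphertexts only ---
weights = [2, 5, 1]                          # hypothetical plaintext model
n2 = n * n
enc_score = 1                                # 1 is a valid encryption of 0
for c, w in zip(enc_features, weights):
    enc_score = (enc_score * pow(c, w, n2)) % n2   # score += w * feature

# --- client side: decrypt the prediction ---
score = dec(n, priv, enc_score)
assert score == sum(w * f for w, f in zip(weights, features))   # 15
```

The server only ever handles ciphertexts and the encrypted score; only the holder of the private key can read the prediction.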

If you can infer information from encrypted data, then it's not properly encrypted. Generally you would use a salt that would render this type of analysis useless.

The person who runs the model won’t be able to infer anything. The result of the model is itself encrypted, and would need to be decrypted using the original encryption key.

>would need to be decrypted using the original encryption key

A small correction: decryption is performed with a decryption (private) key, which is kept secret and known only to the data owner.

In contrast, the encryption key is public and known to both the data owner and "the person who runs the model".

Maybe I misunderstood something, but they are not really inferring information. The model is still encrypted; the outsider doesn't know what's going on. Wouldn't salt destroy the homomorphic property?

>Wouldn't salt destroy the homomorphic property?

It depends on how the salt is applied. If it is concatenated, the homomorphic property persists. To really destroy it, you would need to mix it in. But then you would lose the malleability, and all that would remain is an equivalence operation from the ring, but not + or *. This can still be useful for some tasks, but not for encrypted calculations or ML.

Homomorphic encryption is malleable[1] in that it can, with enough information, be decrypted without knowing the private keys in some cases. For example, if you can correlate with other data it may be possible to effectively undo the encryption.

This is more like anonymization (effectively a one-way hash) than encryption. If you encrypt 2 different values with the same algorithm and key, you will get the same ciphertext which reveals information about the original value (i.e., that they are the same).

[1]: https://en.wikipedia.org/wiki/Malleability_(cryptography)

This is wholly untrue. The guarantees HE gives for the security of encrypted data are exactly the same as for any traditional symmetric or asymmetric cryptosystem. HE is indeed malleable by design, but doing computation on homomorphically encrypted data gives no information about the contents of the ciphertexts.

>This is more like anonymization (effectively a one-way hash) than encryption. If you encrypt 2 different values with the same algorithm and key, you will get the same ciphertext which reveals information about the original value (i.e., that they are the same).

This is a misunderstanding. Homomorphic encryption schemes are randomized encryption schemes, which means that even if you encrypt the same value twice with the same encryption key, you get two different ciphertexts (both look like "random data"). They just have the property that they will decrypt to the same value. This is possible because the space of ciphertexts is much larger than the space of plaintexts.
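This randomization is easy to see in a toy Paillier implementation: each encryption draws a fresh random value r, so the same plaintext encrypts to different ciphertexts that nonetheless decrypt identically. Tiny, insecure parameters, for illustration only.

```python
import math
import random

# Toy Paillier (INSECURE parameters) showing randomized encryption:
# the same plaintext encrypts to a different ciphertext every time.
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def enc(m):
    r = random.randrange(2, n)               # fresh randomness per encryption
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

c1, c2 = enc(42), enc(42)
# The ciphertexts differ (except with negligible probability)...
print(c1 != c2)                              # almost surely True
# ...yet both decrypt to the same value.
assert dec(c1) == dec(c2) == 42
```

So equality of plaintexts is not leaked by equality of ciphertexts, which is exactly the property the parent comment describes.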

Homomorphic encryption is malleable. But once noise is added, there is no way to exploit malleability.

The big challenge in the creation of a fully homomorphic encryption scheme was making it able to cope with the added randomness. This problem remained unsolved from 1978, when Rivest, Adleman, and Dertouzos noted the possibility of such a scheme, until 2009, when Craig Gentry came up with a solution in his PhD thesis. Gentry's work is a fascinating read, by the way.

In other words, homomorphic encryption has the same security guarantees as conventional crypto.

I believe that most HE algorithms (CKKS included) use some random noise, which means you can't infer equality. It's still "malleable", but not to the extent that you assert.

There is no reason to reuse the same key though...

If you used a different key for each datum, then you wouldn't be able to do this type of analysis.

This analysis depends on the property of the same values producing the same ciphertext, which also means you're leaking information.

No it doesn’t. The model is trained on unencrypted data. It is then run on encrypted data sent by the client to generate an encrypted classification. The client then uses its decryption key to decrypt the encrypted classification. This process is possible because the model is built to commute with encryption.
