I need an ELI5 for this. My attempt at reading it makes it sound like I would be able to infer an approximation of the encrypted data in the examples that they provide.
Basically, assuming this works as the headline suggests and my poor understanding of the paper is right, take their example of competing financial institutions:
Two banks share data that is intended to be secret, and they use this technique to compute some property over the data. Each bank can then do the same computation over their own data alone, and compare the model from their own data with the model from the combined data that includes their competitor’s. From that they can infer properties of their competitor’s portfolio, which seems to be leakage of data that is not intended to be possible.
Hence I am clearly missing something, which isn’t unexpected as HE is unintuitive to me.
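To make the worry concrete, here is a minimal sketch using a plain mean instead of a trained model. All the numbers and the `infer_other_mean` helper are made up for illustration, and it assumes each bank knows how many records the other contributed. The point is only that the leakage would come from the decrypted result itself, not from any weakness in the encryption.

```python
# Hypothetical values, purely illustrative.
bank_a = [120.0, 80.0, 100.0]   # bank A's own private values
bank_b = [300.0, 500.0]         # bank B's values, never shown to A

# The jointly computed statistic that both banks get to see.
joint_mean = sum(bank_a + bank_b) / (len(bank_a) + len(bank_b))

def infer_other_mean(joint_mean, own_values, other_count):
    """Recover the other party's mean from the joint mean and our own data."""
    total = joint_mean * (len(own_values) + other_count)
    return (total - sum(own_values)) / other_count

# Bank A only uses the shared result, its own data and B's record count.
inferred_b_mean = infer_other_mean(joint_mean, bank_a, len(bank_b))
print(inferred_b_mean)
```

With a richer output such as a model, the inference would be fuzzier, but the same idea applies: whatever the agreed output reveals, it reveals regardless of how the inputs were protected.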
The point of the paper was to operate over data without decrypting it. The exact scenario they gave was competing finance companies, which are clearly not going to share keys.
Also, if you had all of the decryption keys you’d just decrypt the data and use the raw data. They explicitly state that it is only fast in the context of HE problems, being multiple orders of magnitude slower than techniques you can use on raw data (they actually said “fast”, complete with the quotes, which I appreciated).
I agree the example is weird, but that's literally what the paper says:
"In the training phase, it takes as input an encrypted training data and outputs an encrypted model without using the decryption key. In the prediction phase, it uses the encrypted model to predict results on new encrypted data." Figure 1 further implies that the results must be decrypted.
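For anyone who wants to see what "computing without the decryption key" looks like in miniature, here is a toy sketch. It uses textbook Paillier, which is only additively homomorphic and is not the scheme from the paper (the quote above doesn't name one). The primes are tiny and insecure, and every name in the code is my own choice for illustration.

```python
import math
import random

# Toy Paillier keypair with tiny primes (insecure, illustration only).
p, q = 293, 433
n = p * q                      # public modulus
n_sq = n * n
g = n + 1                      # standard simple choice of generator
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # private: lcm(p-1, q-1)
mu = pow(lam, -1, n)           # private: inverse of lam mod n (Python 3.8+)

def encrypt(m):
    """Encrypt an integer 0 <= m < n using only the public values n and g."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    """Decrypt with the private values lam and mu."""
    x = pow(c, lam, n_sq)
    return ((x - 1) // n * mu) % n

def add_encrypted(c1, c2):
    """Add two plaintexts by multiplying ciphertexts; no private key needed."""
    return (c1 * c2) % n_sq

# An untrusted party can sum encrypted values without ever seeing them.
c_sum = add_encrypted(encrypt(1200), encrypt(3400))
print(decrypt(c_sum))
```

Only the key holder can run `decrypt`, which matches the point that the results must be decrypted at the end. Training a model needs multiplications as well as additions, so it requires a more capable scheme than this one.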
This is the typical operational setting of homomorphic ML.
I think the aim is to be able to store encrypted data in the cloud or some other potentially insecure location, and perform machine learning on it without decrypting the data or the model.