I wonder if you could write your programs in such a way that the bulk of the computation was done publicly, and only sensitive ops were shunted out to the secure processing network.
Maybe in a language similar to Erlang, but instead of writing code that's amenable to sharing work between multiple CPUs, you'd be sharing it between multiple degrees of privacy.
Do you have an example of what such an application might be?
1) Secretly compute a list of encrypted tags and associated, unencrypted scores.
2) Sort by scores.
3) Convert the highest N tags back to encrypted data.
As a plus, steps 1 and 3 should be embarrassingly parallel (a rough sketch of the whole pattern is below).
You are, of course, leaking a count, which can be important (but your opponent can already make inferences based on the amount of data you have stored...).
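A rough plain-Python sketch of that pattern (the scoring function here is a made-up stand-in, not a real FHE library call; it only shows which values stay encrypted and which are deliberately exposed for sorting):

    # Hypothetical sketch: tags stay encrypted end to end, scores are
    # deliberately left in the clear so the untrusted side can sort on them.
    # fhe_score() is a made-up stand-in, not a real library call.
    from typing import List, Tuple

    def fhe_score(enc_tag: bytes) -> Tuple[bytes, float]:
        # Step 1 stand-in: homomorphically derive a public score for an
        # encrypted tag (here we just fake a score from the bytes).
        return enc_tag, float(sum(enc_tag) % 100)

    def top_n(enc_tags: List[bytes], n: int) -> List[bytes]:
        scored = [fhe_score(t) for t in enc_tags]            # step 1 (parallel)
        scored.sort(key=lambda pair: pair[1], reverse=True)  # step 2: plain sort
        return [tag for tag, _ in scored[:n]]                # step 3: still encrypted

    print(top_n([b"enc-a", b"enc-b", b"enc-c"], 2))

Only the scores and the item count ever leave the encrypted domain, which is exactly the leak noted above.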
I don't think this is known to be intractable, just really slow (for now).
"With the cloud-keyset, the library can evaluate a net-list of binary gates homomorphically at a rate of about 50 gates per second per core, without decrypting its input. It suffices to provide the sequence of gates, as well as ciphertexts of the input bits. And the library computes ciphertexts of the output bits."
but what does "evaluating a net-list of binary gates" come to in practice? What operations could I expect to be able to perform?
So you'd program it by designing a digital circuit using AND, OR, and NOT gates, somewhat similar to how you would make a circuit with physical components.
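For concreteness, here is a 1-bit full adder (and a ripple-carry adder built on it) written purely in terms of AND/OR/NOT in plain Python. It's only a model of the kind of netlist you'd hand over, not TFHE's actual API; in the homomorphic setting each gate call below would be one slow ciphertext operation:

    # Plain-Python model of a gate-level adder; not real TFHE code.
    def AND(a, b): return a & b
    def OR(a, b):  return a | b
    def NOT(a):    return 1 - a

    def XOR(a, b):                       # built from AND/OR/NOT
        return OR(AND(a, NOT(b)), AND(NOT(a), b))

    def full_adder(a, b, carry_in):
        s1 = XOR(a, b)
        total = XOR(s1, carry_in)
        carry_out = OR(AND(a, b), AND(s1, carry_in))
        return total, carry_out

    def add_bits(xs, ys):                # ripple-carry add, LSB-first bit lists
        carry, out = 0, []
        for a, b in zip(xs, ys):
            s, carry = full_adder(a, b, carry)
            out.append(s)
        return out + [carry]

    # 3 + 5 = 8: [1,1,0] + [1,0,1] -> [0,0,0,1] (LSB first)
    print(add_bits([1, 1, 0], [1, 0, 1]))

With this construction a 64-bit add is roughly 800 gates, so at ~50 homomorphic gates per second it would take on the order of 15-20 seconds, which is where the delay mentioned below comes from.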
You have millions, maybe billions of these gates in your CPU, each capable of switching millions of times for each tick of your homomorphic gate, so the "fast" in the title should be taken with a grain of salt. Your homomorphic circuit could take a noticeable amount of time just to add two 64-bit numbers.
Anything you can do in 10 or 20 cycles of an 8080, you can now do to secret data in a few hours. And you can probably do it much, much faster than that if you design a special-purpose circuit. It's still sloooow by the computational standards we're used to. But it might be fast enough that there are useful applications.
(edit: although actually FPGAs could be useful simply because they can be very good at running FFTs (as are GPUs), not because of their gate-programmable nature)
Basically anything. If you can generate a netlist of gates for your HE CPU, you can write a program to compute it (perhaps slowly, though).
Also, at 50 gates per second that's ~20ms per bit operation, which, though apparently an improvement, sounds extremely inefficient compared to normal programming.
Early on it was thought that considering circuits instead of regular programs could result in "polynomial with advice" algorithms for NP-complete problems, but results such as the Karp-Lipton theorem have shown this to be unlikely.
The improvements are described in the new paper: https://eprint.iacr.org/2017/430. Hopefully, the packing techniques will be also present in the next version of TFHE.
The CPU in your device boils down to a netlist describing a bunch of gates and their interconnections. This is somewhat of an oversimplification, of course.
Someone with more background could probably expand on that.
In other words, if the application you are aiming at is suitable for running on a GPU, go for HElib; if it would be faster on a CPU, use TFHE.
However, not being an expert on FHE, I have to ask: is there a way to leverage this on current RDBMS systems, for example?
It says the library can evaluate binary gates. If we would like to run a SQL query for example, how do we translate it to a series of gates? Is it possible?
Or is this so low level that we basically would need to build our own "processor" with binary gates and then build the rest of the stack on top of it so we can, in the end, run a query?
Can anyone shed some light on how exactly can we take advantage of this library?
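Not an expert either, but one way to picture it: a predicate like WHERE id = 42 compiles down to a bitwise equality comparator, i.e. XNORs feeding an AND chain. A toy plaintext model (not real TFHE code; the 8-bit width is an arbitrary choice):

    # Toy gate-level model of an equality test such as "WHERE id = 42".
    # Each gate below would be one homomorphic operation on encrypted bits.
    def NOT(a):    return 1 - a
    def AND(a, b): return a & b
    def OR(a, b):  return a | b
    def XNOR(a, b):                      # 1 exactly when the two bits are equal
        return OR(AND(a, b), AND(NOT(a), NOT(b)))

    def bits(x, width):                  # LSB-first bit decomposition
        return [(x >> i) & 1 for i in range(width)]

    def equals(xs, ys):
        acc = 1
        for a, b in zip(xs, ys):
            acc = AND(acc, XNOR(a, b))   # every bit position must match
        return acc                       # 1 = row matches, 0 = no match

    print(equals(bits(42, 8), bits(42, 8)))   # -> 1
    print(equals(bits(42, 8), bits(43, 8)))   # -> 0

In the encrypted setting the result would itself be an encrypted 0/1 per row rather than a visible match, and the rest of the query (selection, aggregation, joins) has to be expressed the same way, which is why in practice you do end up building something like a small special-purpose "processor", or a compiler from queries to gate netlists, on top of the library.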
Although slower, it feels like TFHE would provide better security against an active adversary. So, at the very least, simple server-side encrypted queries should be possible.
Does it produce only encrypted output, or can it optionally produce unencrypted results also? Can it optionally use public data as an input?
Also, I am wondering whether it could be accelerated on GPUs. I worked with a guy who accelerated a standard FFT on CUDA 100..1000 times for scientific computations (and later NVidia copied his code, lol). I wonder if something similar can be done here.
tl;dr: computations in the cloud with encrypted, private data.
a) smart contracts controlling something within the encrypted data based on publicly available data;
b) encrypted key-value store where you traverse the tree structure based on an encrypted query and encrypted tree buckets, but get a publicly available result about which bucket is next (similarly to how it was done in the Arx paper using garbled circuits).
> We demonstrate CryptoNets on the MNIST optical character recognition tasks. CryptoNets achieve 99% accuracy and can make more than 51000 predictions per hour on a single PC. Therefore, they allow high throughput, accurate, and private predictions.
The real problem tends to be the (CPU to other thing and back again) latency, not how fast the other thing can do the computation.
(Not affiliated in any way, but went to a really interesting talk on this a few weeks ago as part of the London Machine Learning meetup group - https://www.meetup.com/London-Machine-Learning-Meetup/events...)
Their (closed-source) method of obfuscating their data apparently does have the property that it preserves the structure of the data, but calling it "structure-preserving encryption" is misleading imo, since it risks confusing it with standard notions of encryption and structure-preserving encryption, which have much stronger security guarantees.
(Their marketing seems to encourage this conflation by, for example, citing academic advances in standard notions of homomorphic encryption and SPE and implying that these advances have enabled Numerai's technology)
> Just a few months ago this package was released by Louis Aslett at Oxford http://www.louisaslett.com/HomomorphicEncryption/. Louis helped me use his package to do Fan and Vercauteren homomorphic encryption on my dataset. Because the ciphertexts are polynomials it's not too easy for an average data scientist to use the data. That's why I came up with more chill ways of encrypting Numerai's data that the article mentions, like order-preserving encryption. There's a security vs ease-of-use trade-off, for sure. But homomorphic encryption is a real thing.
Louis Aslett authored https://arxiv.org/abs/1508.06574
I still don't think it's fair to market their method as comparable to Aslett's scheme or "standard" notions of homomorphic/order-preserving encryption, no matter how "chill" they are :)
Edit: No specific sources for what Numerai is using, but in general: https://arxiv.org/abs/1610.06918 "Learning to Protect Communications with Adversarial Neural Cryptography".
Edit2: Yes, in general. I would say "yes, this is a valid form of encryption". But I do agree that their marketing was perhaps a bit too optimistic. I have no problem calling it "obfuscation" either (I just think their method of "obfuscation" is way more advanced than removing headers and normalizing within 0-1).
From the Wikipedia page for "Neural cryptography", it seems like there's been some success in using NNs for cryptanalysis, but not for constructions...
Edit: Do you mean the Google GAN experiment? (https://arxiv.org/pdf/1610.06918v1.pdf) Ahh ok, well at least for this there seems to be an attempt at defining a security model (security against some other NN). I don't really believe the security model is realistic (how do we know NNs are really that effective as adversaries?), but at least there is a model, so calling that "encryption" sits somewhat better with me. I'm pretty sure this is not what's being used by Numerai since it seems like it would not result in ciphertexts with the structure necessary to perform ML operations on.
Edit2: Maybe you're right and it is more advanced. In any case, as a crypto nerd I wish they would disclose what they are actually doing / the rationale instead of tantalizingly suggesting that they have made (what would be) a breakthrough in a practical use case of advanced encryption schemes, but not saying how.
IMO the key is that new encryption methods demonstrate levels of security in their intended use cases that users deem sufficiently strong. I agree that all closed-source methods leave room for misleading statements (and I won't speak to Numerai as I don't have any additional insight on their approach). But any algorithm can and should be benchmarked--generally and vs. common standards--to avoid confusion. Serious commercial & public sector users will be quick to ensure so, especially in the absence of massive social proof.
My thought here is that if we draw the line on what warrants "encryption" by saying it's got to be a traditional key-based system, or sufficiently similar to existing approaches, we risk stifling innovation in the space by denying new entrants an industry-standard term that buyers are trained to seek. Non-traditional doesn't necessarily mean non-secure. Would love to hear your thoughts on this.
I don't think that doing this stifles innovation (in fact, I think requiring crypto innovations to have security justifications is probably better overall for innovation in the field).
Much practical innovation comes from closed-source applications whose peer review comes in the form of commercial lab tests. Overly ossified technology standards & labels often force CIOs / CTOs / CISOs to build artificial barriers into their corporate procurement processes for optical reasons. In addition to engendering "check-the-box" complacency, these barriers absolutely stifle startup-driven innovation.
It wouldn't be a simple hash or something, would it?
There currently aren't a lot of practical use cases where we can afford a performance loss of ~100,000,000x (your homomorphic crypto algorithm is going to run on the order of ~Hz on a ~GHz CPU).
So yes, for varying definitions of "fast".
Of course, an op on a single bit is still far short of a CPU instruction, AIUI.
You gotta start somewhere though!
It just screams side-channel attacks.
I'm so skeptical I don't have the energy to find them; better to wait until something with monetary value is cruising around the internet using it.
An analogy to think about might be blind signatures. In blind signatures you sign a blinded token and then the other party can unblind it to get a valid signature from you over a message whose content you don't know. This is classical public-key cryptography. In that case there is a secret key, and a timing attack against someone who can observe the signature creation might reveal that key. The connection between the blinded and unblinded message is also meant to be secret, and a timing attack against someone performing the blinding or unblinding operations might also reveal that relationship.
However, there is no timing attack that the signer can perform against itself to reveal the relationship between the blinded and unblinded messages, and there is no timing attack that the other party can perform against itself to reveal the secret key.
I think this analogy holds up for most purposes (indeed, blind signatures can be viewed as an extremely specific, narrow kind of homomorphic encryption), but I'd be happy to hear corrections if someone can see a way that it doesn't.
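For anyone who hasn't seen blind signatures before, here is the textbook RSA version with deliberately tiny, insecure parameters, just to show the blinding/unblinding algebra the analogy relies on:

    # Textbook RSA blind signature with toy (completely insecure) parameters.
    import math, random

    n, e, d = 3233, 17, 2753            # n = 61*53, e*d = 1 mod phi(n)
    m = 1234                            # message the signer never sees

    # Requester picks a blinding factor r coprime to n.
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)

    blinded = (m * pow(r, e, n)) % n          # what the signer actually sees
    blinded_sig = pow(blinded, d, n)          # signer signs blindly
    sig = (blinded_sig * pow(r, -1, n)) % n   # requester strips the blinding

    assert pow(sig, e, n) == m                # a valid signature on the hidden m

The signer computes on a value it cannot read, the requester ends up with a valid signature, and neither step reveals the other party's secret, which is the property the analogy leans on.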
the "side channel" i am referring to is in however this impliments this mixing you refer to.
If you can observe how the internal state is changing given a cloud keyset, you should be able to infer the secret key, in the same way you can infer the internal state of a Mersenne Twister from like 27 outputs (or whatever).
Downvotes and promises definitely won't reduce my skepticism.
Like in public-key cryptography: you can give an RSA public key to an adversary, he can use it as much as he wants, and he will not be able to get any information about the private key (in any practical amount of time). Most importantly, the cloud key is not secret, nor obfuscated: the cloud already knows each bit of it (side-channel attacks are not relevant).
The "cloud key" is the public part of the key given to the cloud so that he can perform his operations (circuit, bootstrapping). There are strong security proofs that show that it is impossible to recover the secret key from the cloud key.
In the case of TFHE, if there were a polynomial algorithm that could recover even a single bit of the secret key given the cloud key, this algorithm could be used to break the LWE problem, and also worst-case instances of lattice problems (which are believed to be much harder than factoring or discrete log, for instance).
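To give some intuition for what "breaking LWE" means, here are LWE samples with toy, completely insecure parameters (a generic sketch, not TFHE's exact key format): the published pairs (a, b) differ from a purely random relation only by a little noise, and recovering s from polynomially many of them is the presumed-hard problem the cloud key's security reduces to.

    # Toy LWE samples, insecure parameters, for intuition only.
    import random

    q, dim = 97, 4                                     # tiny modulus and dimension
    s = [random.randrange(q) for _ in range(dim)]      # the secret, never published

    def lwe_sample():
        a = [random.randrange(q) for _ in range(dim)]  # public, uniformly random
        e = random.choice([-1, 0, 1])                  # small noise
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        return a, b                                    # safe to hand to the cloud

    print([lwe_sample() for _ in range(3)])

With real parameters (dimensions in the hundreds or thousands, carefully chosen noise), no known algorithm recovers s from such pairs in practical time.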
A known-plaintext attack breaks pretty much every cryptosystem; mix that with statistical analysis of this "processing" and I'm sure whatever is in the cloud will surrender its secrets pretty quickly.
And all that risk for what benefit? None of this processing will ever be faster than doing the processing in place.
And yes, it is a malleable cryptosystem, but that doesn't imply you can learn the key from an input/output pair, nor does it imply you can read the input given to you. It just means you can change the output in a way that can still be decrypted by the same key.
Usually people use padding and IVs and such to prevent that, but this system is made to allow computation with it instead.
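The simplest familiar example of "malleable but key-hiding" is XOR/one-time-pad style encryption; this is not FHE, just an illustration that being able to usefully modify a ciphertext does not imply learning the key:

    # One-time-pad style XOR: malleable, yet the key is never revealed.
    # (This is NOT FHE - just the simplest malleable cipher, for illustration.)
    import os

    key = os.urandom(4)
    pt  = b"pay1"
    ct  = bytes(p ^ k for p, k in zip(pt, key))

    # Without ever learning the key, an attacker can flip plaintext bits:
    tampered = ct[:3] + bytes([ct[3] ^ (ord('1') ^ ord('9'))])
    print(bytes(c ^ k for c, k in zip(tampered, key)))  # -> b'pay9'

FHE keeps exactly this kind of malleability, but in a controlled, noise-managed form where the permitted modifications are the homomorphic gate evaluations themselves.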
Certainly there may be some flaw that we can't see yet, but to say you're "sure" some secrets will be quickly surrendered seems foolhardy.
> And all that risk for what benefit?
It's very interesting from a theoretical perspective, and it's good to know that if you need to carry out computation on devices you don't trust, it is at least possible to prevent your input and output from being stolen.
This is already a 30x increase in efficiency from the last generation. It may never be faster, but there's always a chance someone will find other reasons to use it.
They do not need to be one and the same if, statistically, you can infer the input given the output.
And I never said "sure" in any way, shape, or form. What I said is that I am skeptical of the motivations driving a project like this, when we know for an absolute fact that bad men are trying to steal our secrets. It seems like a lot of work and very heavy math for something that will never tip the balance that is holding the cloud back, namely people not trusting it to keep their secrets.
For example, if you can do something as simple as compute a hash of the unencrypted data, you can instantly decrypt all data with known hashes with a high degree of certainty, without ever knowing the keys - a side-channel attack.
That's the difference between FHE and iO or multilinear maps. The latter are almost all broken, mostly because of the attacks you mention. FHE, on the other hand, has semantic security.
This isn't true at all.
Other than some toy examples, and cryptosystems that were used before cryptography was studied academically, I can't think of any that are vulnerable to known-plaintext attacks.
The benefit comes into play when you mix data from different sources that don't trust each other (to the point where they would never agree to one of them doing the processing in place). Homomorphic encryption allows combining the data without ever revealing it to the one doing the computation.
By performing a homomorphically encrypted computation, they can set it up to only reveal the final decision, but not the individual inputs that determined it, so nobody loses face.