> The database is a key-value store prepopulated with the English names of countries and their capital cities from the continent of Europe. Selecting a country will perform a search for the matching capital. On a 2019 MacBook Pro laptop, the example searches take under 80 seconds.
Does this have a heavy performance cost for even toy applications?
Provable privacy, as here or in zero-knowledge proofs, is generally extremely expensive because every possible execution path needs to be taken on each invocation. This is multiplied by the overhead of each simple operation becoming a complex cryptographic one.
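A rough plaintext illustration of why every path gets evaluated: the server cannot branch on an encrypted condition, so an "if" has to become branch-free arithmetic that computes both sides and blends them (a toy sketch, not any particular library's API):

    def select(cond, if_true, if_false):
        # cond is 0 or 1; both branches are always computed, then blended
        # arithmetically instead of with an actual `if`.
        return cond * if_true + (1 - cond) * if_false

    # Plaintext stand-in: under FHE the same arithmetic runs on ciphertexts,
    # and every row of a lookup gets touched whether it matches or not.
    def lookup(query, rows):
        result = 0
        for key, value in rows:
            match = 1 if key == query else 0   # FHE: an encrypted equality circuit
            result = select(match, value, result)
        return result

    print(lookup("France", [("France", 1), ("Spain", 2)]))  # 1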
Statistical correlation is stupidly powerful. Identifying someone uniquely in the world requires about 33 bits of entropy. Which means that if you have some statistical analysis that reveals 0.01 bits of information per sample, you only need 3,300 samples to uniquely identify the person performing the activity.
For the clueless (including myself--I just looked this up): a brute-force search to identify or reidentify someone from "anonymized" data (which is what this poster is referring to) takes at most 2^33, or 8,589,934,592, attempts (each bit being binary--0 or 1) to exhaust every other possibility--for example, every other person, when reidentifying someone via health data that was "anonymized".
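A quick back-of-the-envelope check of those numbers (the 0.01-bits-per-sample figure is the commenter's assumption, not a measured value):

    import math

    world_population = 8_000_000_000           # rough 2020s estimate
    print(math.log2(world_population))         # ~32.9, hence "about 33 bits"

    bits_per_sample = 0.01                     # hypothetical leakage per observation
    print(33 / bits_per_sample)                # 3300.0 samples

    print(2 ** 33)                             # 8589934592 brute-force candidates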
Obviously some data is more sensitive than other data, but that matters little: finding you requires almost no data (especially relative to how much is being hoarded about each and every one of us) and virtually no effort or computing power.
This also means that even the most mundane data that has been hoarded about you (and all data about you is sensitive, some more than others) will be used against you, in ways you would never have imagined. It will be processed, and inferences you are unaware of will be made about you.
You are probably already part of several unhealthy social experiments you are unaware of, via correlations drawn from mundane data using pervasive technologies such as AI.
I am not entirely certain whether I was clear about the point I made in my message; however, I asked a very specific technical question and you are making a sweeping, generalised statement about privacy in the most common case.
We could not have deciphered ancient languages, for example, if we didn't have an equivalent of a Rosetta Stone for them. So I assume you should not be able to do the same with the encrypted database.
Given enough queries and perseverance in unraveling things, a surprising amount of information can leak, for example whether you and I live in the same country, frequently travel together (indicating a relationship), etc.
I am trying to see whether I am missing something and you’re adding things I removed back into the equation.
If we could just keep our data secret, we wouldn't need encryption at all.
You'd need that to stop the risk that, say, an attacker learns (from a side channel) you were looking for person A's details at time T1, and thereby unmasks which other queries/times were about A.
The challenge was solved by Bernard Fabrot. It was fairly simple code running on, I believe, an Intel 4770K. The math libraries and CPUs had been optimized so much in the ~20 years since the challenge was set that a consumer CPU could calculate the answer. The only catch is that it took months on end of calculation.
In this academic field the standard for privacy is usually extremely high. Even if you only access half of the rows, that still leaks one bit of information and would not be considered zero-knowledge.
It will probably be more complicated than that, due to repeated queries revealing patterns, but there are other mitigations for that, some of which are surely being developed.
If the same query always accessed the same 100 random rows, then after observing your (hidden) query and its 100-row access signature, it would be trivial to run a bunch of candidate queries, see which one produces the same signature, and thus figure out what you queried.
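A toy sketch of that attack, assuming the set of rows touched is a deterministic function of the query (the hashing here is invented purely for illustration):

    import hashlib

    NUM_ROWS = 1000

    def access_pattern(query):
        # Deterministic "100 random-looking rows" derived from the query itself.
        seed = hashlib.sha256(query.encode()).digest()
        return sorted({seed[i % len(seed)] * (i + 1) % NUM_ROWS for i in range(100)})

    observed = access_pattern("Paris")   # the server sees this without seeing "Paris"

    # The server replays candidate queries and matches the signature.
    for candidate in ["Berlin", "Madrid", "Paris", "Rome"]:
        if access_pattern(candidate) == observed:
            print("query was probably:", candidate)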
Honestly though, I mostly find FHE interesting theoretically. The fact that it's possible at all is black magic. I'm sure that if it ever gets remotely efficient people will come up with more creative applications, but in the meanwhile it's research worth doing for its own sake.
That would indeed go a long way towards resolving the issues caused by the way SaaS is done. It would let us decouple the compute provider from the service, so that end users would be free to choose where the computation happens and would own all the data by default, while the business secrets and IP of the service provider would still be protected.
Shame to see it's so computationally expensive; I was really hoping this kind of computing would take off.
With these performance numbers in hand, I'm finding those posts retroactively hilarious.
(But even "just" end-to-end encryption is more difficult than many realize.)
Most often they do not understand the tradeoffs that come with mandatory e2ee (difficult information transfer, easy loss of data, no server-side search, etc), and are not ready to deal with them.
From a quick glance, this seems to be a major enabler (2016): https://tfhe.github.io/tfhe/
The project linked in the submission uses HElib: https://github.com/homenc/HElib/
The values in BGV are exact integers, while CKKS works with approximate (fixed-point) numbers; both schemes allow batched homomorphic multiplication and addition (CKKS gives approximate results). TFHE, by contrast, is not a batched scheme and moreover operates mostly at the level of single bits and boolean gate operations. It is quite difficult to directly compare TFHE with batched integer schemes.
Which scheme is best will ultimately depend a lot on the requirements of the application. If you're operating on integers and don't need bitwise operations, then an integer scheme is appropriate. Applications that need bitwise logic and can work with small bitwidths will be more suited for TFHE. CKKS is especially good for numeric and machine learning applications, because the approximation it implies can be managed and it allows using faster encryption parameters than BGV/BFV would.
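To make that contrast concrete, here is a plaintext analogy (no real FHE library involved): a gate-level scheme like TFHE evaluates something like the ripple-carry adder below one boolean gate at a time, while an integer scheme like BGV/BFV performs the whole addition as a single native ciphertext operation.

    def full_adder(a, b, carry):
        # One bit position: the kind of boolean gates a TFHE circuit evaluates.
        s = a ^ b ^ carry
        carry_out = (a & b) | (carry & (a ^ b))
        return s, carry_out

    def add_8bit(x, y):
        # Gate-level addition: eight full adders chained together.
        bits, carry = [], 0
        for i in range(8):
            s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
            bits.append(s)
        return sum(bit << i for i, bit in enumerate(bits))

    # Integer-scheme view: the same addition would be one homomorphic operation.
    print(add_8bit(37, 58), 37 + 58)   # 95 95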
(There are real applications, particularly in privacy preserving financial cryptography. But the database application is just a BS non-niche justification for grant purposes.)
What client would want to add some numbers without knowing what the result is?
I can't imagine any interaction with a database where I don't need to do an operation with the resulting data (even if it's just passing it into a different API/DB unchanged).
My interpretation of your example in human terms is "add your day of birth to your bank balance, but don't tell me the answer" and then walking away. What am I missing?
-- edit --
Maybe I'm looking at it from the wrong direction? Is it more like "here are some numbers; add them up on your calculator, but don't look at the result before showing it to me"?
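That second framing is closer. A minimal plaintext mock of the workflow (this is not real cryptography, just the shape of who holds the key and who does the computing):

    class MockCiphertext:
        # Stand-in for an FHE ciphertext; the "server" never calls decrypt.
        def __init__(self, value):
            self._hidden = value
        def __add__(self, other):
            return MockCiphertext(self._hidden + other._hidden)

    def client_encrypt(x):              # the client holds the secret key
        return MockCiphertext(x)

    def client_decrypt(ct):
        return ct._hidden

    def server_sum(ciphertexts):        # the server adds blindly
        total = ciphertexts[0]
        for ct in ciphertexts[1:]:
            total = total + ct
        return total

    encrypted = [client_encrypt(n) for n in (3, 14, 15)]
    result_ct = server_sum(encrypted)   # server never sees 3, 14, 15 or the sum
    print(client_decrypt(result_ct))    # the client sees 32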
That is essentially what Zcash is. So, as I said, there are numerous applications in financial crypto (where we can reasonably spend full seconds on a desktop, or minutes on secure hardware, grinding away at a proof to send funds), but it's not so practical to spend an even greater amount of time on each database query.
for elementA of listA:
    for elementB of listB:
        ...
I mean, yeah, sometimes you can. And in that case, "premature optimization is the root of all evil," sure. If you're just pulling a couple hundred product records out of a database and then sorting them in code, then yeah, an O(N log N) sort is "basically a free operation."
But sometimes you want to do more complicated things than that, and then it's not true. So overall I think it's pretty irresponsible to teach that sorting is basically free (without caveats). That's definitely not always true!
Hash tables and trees are O(1) or O(lg N) which are vastly faster than O(N).
Imagine a database with billions of records. O(N) search on this database will absolutely murder performance.
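A small illustration of the gap, assuming a simple membership lookup (the dataset and sizes here are made up):

    import time

    records = list(range(10_000_000))   # pretend these are record IDs
    index = set(records)                # hash-table index, O(1) expected lookup

    target = 9_999_999                  # worst case for a linear scan

    start = time.perf_counter()
    target in records                   # O(N): walks the whole list
    linear = time.perf_counter() - start

    start = time.perf_counter()
    target in index                     # O(1): a single hash probe
    hashed = time.perf_counter() - start

    print(f"linear scan: {linear:.4f}s, hash lookup: {hashed:.7f}s")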
Does homomorphic encryption allow running operations blindly on an encrypted dataset and returning encrypted results without ever decrypting the dataset?
Why do you think the HE value map cannot ever be improved?
These applications can easily handle an 80-second delay. It's also likely that queries scale well by batching into larger combined queries (since you need to involve every row already anyway).
1. Alice provides Bob with her public and evaluation keys.
2. Bob encrypts each row of the blacklist with her public key.
3. Alice encrypts her query (e.g. for "Eve") and sends to Bob.
4. Bob evaluates some circuit on Alice's query and Bob's rows, and sends her one (encrypted) result per row.
5. Alice decrypts the rows and sees if any are ones.
Ways this can be made more efficient:
* Outputs from each row's evaluation can chain into the next evaluation, so Bob need only return O(1) data.
* Bloom filters could cut down the average-case cost, but that would make Alice leak information (whether she follows up her FHE Bloom-filter query with a more precise query).
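A plaintext mock of those steps, just to show the data flow (the equality "circuit" is a stand-in for the homomorphic comparison; no real FHE library is used, and key handling is omitted):

    class MockCT:
        # Stand-in for a ciphertext under Alice's key.
        def __init__(self, value):
            self._v = value

    def encrypt(value):                 # steps 1-3: done with Alice's public key
        return MockCT(value)

    def decrypt(ct):                    # step 5: only Alice holds the secret key
        return ct._v

    def eq_circuit(query_ct, row_ct):
        # Stand-in for the homomorphic equality circuit Bob evaluates blindly.
        return MockCT(1 if query_ct._v == row_ct._v else 0)

    # Step 2: Bob encrypts each blacklist row under Alice's key.
    blacklist_cts = [encrypt(name) for name in ["Mallory", "Eve", "Trudy"]]

    # Step 3: Alice encrypts her query and sends it to Bob.
    query_ct = encrypt("Eve")

    # Step 4: Bob evaluates the circuit per row and returns encrypted results.
    result_cts = [eq_circuit(query_ct, row_ct) for row_ct in blacklist_cts]

    # Step 5: Alice decrypts and checks whether any result is a one.
    print(any(decrypt(ct) == 1 for ct in result_cts))   # True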
All the examples I can find seem to involve levels of privacy preservation where, if I were that concerned, I just wouldn't risk sending the data to anyone else anyway.
I think most of society would be happy with regulatory changes. Preventing companies from exploiting personal data is fairly simple in a democratic legal framework.
Yes, there are governments and states where democracy does not exist, but in that case they simply won't let me use such a search service anyway, so it's dubious the technology would be beneficial there.
Some time in the past decade I realised that software has only limited ability to change the world - it simply does not have the leverage to overcome all the politics of the world. If we want utopia, we must learn Judo to use the weight of the world against itself.
I remain optimistic that we can make this world better with these amazing new technologies, but I fear they cannot succeed on their own.
But I hope you can prove me wrong!
Your gut instinct is right. The power comes from its close relation to the theory of computation, and that is the thing that drives our lives today.
The drawback is that HE imposes a performance penalty, so its practical usage is somewhat limited, but it still promises immense returns for software publishers and consumers as the technology progresses.