IBM completes successful field trials on Fully Homomorphic Encryption (arstechnica.com)
247 points by Nitrolo 7 days ago | 164 comments





Fully homomorphic encryption (FHE) looks to me like the WET DREAM of every business that wants to control the software and content on your hardware without ever having to decrypt sensitive code or data residing on that hardware.

That's because FHE-encrypted software is locally unhackable as long as its implementation of FHE remains unbroken.

Think:

* FHE-encrypted mobile operating systems, applications, and data.

* FHE-encrypted desktop and server operating systems, applications, and data.

* FHE-encrypted "smart home" devices -- from light bulbs to dishwashers to fridges.

* FHE-encrypted transportation infrastructure -- from automobiles to trains to airplanes.

* FHE-encrypted industrial machinery of all kinds.

If FHE ever becomes practical, we're looking at a very unhackable future.

Remarkably, I've never seen this issue mentioned anywhere else before.


FHE gets brought up all the time. There are great discussions down thread about why this isn’t practical for most use cases.

The killer thing is the performance penalty for doing FHE on any dataset of meaningful size. This is because if you want to do an operation on one member of the set you need to do an operation on all members of the set otherwise you leak which member you care about. For sets of a very small size this performance penalty might be ok. For sets of a large size, like the set of all users of a service, the performance penalty will grow at least linearly to the size of the set.
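The linear cost described above can be sketched in plaintext Python (ordinary ints standing in for ciphertexts; the names here are made up for illustration). To fetch one record without revealing which, the server has to touch every record:

```python
# Toy illustration: fetch record i without revealing i. The server
# multiplies EVERY record by a (notionally encrypted) one-hot selector
# bit and sums -- hence the at-least-linear cost in the set size.
def private_lookup(data, selector):
    # selector is one-hot: 1 at the wanted index, 0 elsewhere.
    # Under FHE the server would see only ciphertexts of these bits.
    assert sum(selector) == 1
    return sum(s * d for s, d in zip(selector, data))

records = [42, 7, 99, 13]
wanted = 2                      # secret index, known only to the client
sel = [1 if j == wanted else 0 for j in range(len(records))]
print(private_lookup(records, sel))  # 99
```

Skipping any record would leak that it wasn't the one being read, which is exactly why the work grows with the set.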

This is the problem at the core of ZK-style proving systems. You can push the proof construction work into the client and have a sub-linear verification time relative to the proof, but the proof construction in the first place is going to be slow and expensive (and slower as the size of the set increases!).

The edge of research here is being done by cryptography labs all around the world. If someone does manage to solve this in a purely-software based manner with a performance penalty that is acceptable for a monotonically increasing set, that’s the holy grail. I remain unconvinced that this will be solved in a reasonable fashion for this use case in my lifetime (but I would be thrilled to be proven wrong!!). If you know of research which makes inroads on this problem, please point me at it.


Agree. That's why I wrote, "If FHE ever becomes practical..." -- because at present it isn't.

Currently it's a wet dream for a lot of businesses.


FHE style systems are possible if you use a hardware-based encryption approach like a Secure Enclave, but then you have a trust narrative that includes a Secure Enclave. For most use cases this is unacceptable.

Note: I work on a system design that I believe is acceptable that uses Intel’s SGX which we detail in this tech talk entitled “why sgx”: youtube.com/watch?v=Hwf_Q31woLo&utm .


If you believe in a Secure Enclave then there is no need for Fully Homomorphic Encryption anymore, as all the computation can be carried out securely in the SE. The question is whether the SE is trustworthy, and for many people it is not.

It’s more complicated than “do you believe in SE”. There’s a difference between using a SE for storing durable secrets (a bad idea in most implementations I’ve seen) and using a SE for provably destroying a piece of information. The former means that a break is full jackpot, the latter means that a break compromises the system for a window in time and forward secrecy can be restored with a patch.

I'm not sure if FHE is valid for any of these use cases. As far as I can understand, FHE machines/evaluators can operate on encrypted data, but cannot derive any useful information from that data without the key.

In all of the use cases you have mentioned, an FHE machine on the user device cannot perform interactions (IO, etc.) with the outside world (can't connect to a network, can't display things on a screen, etc.) without the key. So for these devices to be useful, either the keys will have to be embedded in the IO paths (in which case the device would be hackable), or the FHE machine would have to be implemented as a subsystem that does things like DRM (I'm not sure if this would be possible either, but it could be).

I might be completely wrong here, but can someone sketch out how FHE can be used to make an unhackable consumer device?


The original (unencrypted) data from a user or sensor passes through an encryption layer before going to the FHE system.

I'm not sure I understand the question, but to me FHE is a purely computational thing: only CPU and RAM are involved, and the operations include input and output data. To prevent the system from learning anything about the data, all operations are of the kind

(RAM, encrypted output) = FHE(RAM, encrypted input)

so all data in RAM gets there in encrypted form in the first place. Thus the FHE system never sees unencrypted data.

Coming back to an unhackable consumer device: if you measure a physical property, like temperature, or have a digital model of anything, you have to provide this data to the FHE system only in properly encrypted form. That encryption falls to the consumer, not to the FHE system, as does decryption of the results of FHE operations. It's only the processing of the data which FHE does.


The part where FHE requires 42x the CPU and 10-20x the RAM of non-FHE operations will initially keep the use cases relatively small, and we don't have Moore's-law progression to bail them out in a few years. Nor are FHE operations in these kinds of use cases heavily parallelizable.

I'm also seeing some questions about side-channel attacks: by throwing enough data at the polynomial transforms that make up the NAND gate you're hiding the FHE'd data in, some people are claiming there has to be a way to eventually derive the encrypted value of the data you're attacking. I don't understand any of the maths around FHE, though, so I can't tell how big a concern those types of attacks are. From my layman's interpretation of the comments the IBMers are responding to, I don't think a single operation under FHE is intrinsically vulnerable to side-channel attacks, but I bet the way FHE is used in real-world applications, with iterative back-and-forth operations, does potentially open up specific poorly designed usage patterns to classic inferential side-channel attacks.

If FHE becomes practical at the scale you imagine, it would dramatically raise the bar on hacking for sure, but the implementers still only need to make one mistake to give a wedge to the hackers to work with.

The tokenization market (tokenize/detokenize sensitive data) just turned up the "interesting" knob to 11, though. Would take a lot of work between vendors to make real and practical, but some real promise there.


Huh. Never thought about it from that point of view, thanks for bringing it up.

What I hoped for is that FHE would usher in an era of more end-user control over data, not less of it. That's because it would allow decoupling and isolating storage from compute, in such a way that I could invite a company's code to process my data, encrypted by me, at a location of my choosing, and the code would have no way to exfiltrate the data. At the same time, I would have no way to get at the service company's software. I somehow never realized it would be easier for software providers to just skip the "my data at my location" part and force us to use their FHE software while also owning the data entered into it.


I agree this is a novel interpretation. Could you elaborate on what the computation performed involves ?

I'm of the understanding that you send me ciphertext, I transform it with some operations you define, and I send back ciphertext-v2.

I can see how this could exfiltrate data: they send in a bunch of zeros and I fill in a one for each minute I used this light bulb, by dynamically creating the FHE circuit to reflect the saved data.

It's not clear to me how this could change my lightbulb. By 'change' I mean 'affect the behavior' or update the firmware or something. I say this with the expectation that the blobs I received and produced are ciphertext on my side, without keys to decrypt it (unlike, say, dvd drm).


I think the use cases of FHE are much more limited than you're thinking, specifically because it allows you to execute code on someone else's machine without them being able to see _anything_ about the data they're operating on, and send the results (which, again, don't mean anything to that machine) back to you. So, for example, you could mine cryptocurrency on someone's computer, but the only benefit you get from running FHE-encrypted code on the client machine rather than on your own server is compute cost. It wouldn't be useful for e.g. DRM.

> That's because FHE-encrypted software is locally unhackable

And locally unreadable.


How does that work? If the business can control it, then it's hackable.

I read it as "locally unhackable". Any hack would have to hit the mother ship.

As I understand it, in order to interface with FHE-encrypted software, the data needs to be encrypted too. So every input needs to be sent to the mothership, encrypted there, downloaded, then computed on locally; then the output is sent to the mothership, decrypted, then downloaded to present the results to the user -- which is no different from just doing the computation "in the cloud", which is already a thing. So I don't see how it brings anything new to the table.

Unless the key is kept locally, which would be hackable.


FHE operates under asymmetric encryption, and the one running the software needs the public key to perform the FHE operations on the data. The private key is known to the manufacturer (although there are some cool applications if, provably, no one knows the private key) and is part of the logical circuit of the FHE so that it can decrypt and freshen its state.

Maybe more applicable for something like DRM?

there is a pretty big difference between having to hack a local key for every user and a real data breach

Yes, exactly. FYI, I added the word "locally" to my parent comment above to make the meaning clearer.

Basically ransomware

The comment by "Ibmresearcher" says that:

> Once one can do multiplication and addition all other operations can be bootstrapped from there so you can indeed do anything a Turing complete computer can do.

This is repeated in many articles about FHE, but it seems false to me: to do Turing-complete stuff you also need comparisons, and the ability to change your control flow depending on comparisons. But clearly you cannot do that with FHE, otherwise you could extract the encrypted content one bit at a time.

My understanding is that an FHE computer can only execute a fixed net of additions and multiplications (or whatever operations they've got). So you can emulate an "if" clause by computing both branches and then selecting one of the two multiplying by 0 and 1 appropriately; you can do bounded "for" loops always executing the maximal number of times, but you can't do unbounded loops, therefore bye bye Turing completeness.

Of course there are a lot of interesting algorithms that can be executed without being Turing complete, but the general statement is false. Am I missing anything?


Imagine a FHE circuit that executes a single webassembly instruction. This is a finite circuit that takes as input the program, machine state and external input (all encrypted). As output it returns the (encrypted) new machine state and a single unencrypted bit indicating if the program terminated or not.

Now you can run the FHE step for as many times as the algorithm requires, giving you a FHE-VM that can execute any WebAssembly program from your favorite Turing complete programming language.

The limitations are that you need to define a bounded machine state, which technically breaks the Turing completeness. In practice this is no different from your laptop having a finite amount of memory which disqualifies it as a Turing machine. Real Turing machines don't exist.

The other limitation is that the FHE executor can now see the total run time of the algorithm, which was secret before. This is similar to a timing side channel and similar mitigations apply.

As others have pointed out, if you have a concrete algorithm you can do something much simpler than implement a VM. The Collatz sequence is good toy example of an algorithm with unknown loop bounds. You would make a FHE circuit that iterates a single step using the same overall design.
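The step-circuit design above can be sketched in plaintext Python (ordinary ints standing in for ciphertexts; this is an assumed toy design, not any real FHE library's API). One fixed "circuit" advances the state and emits the single termination bit; the untrusted executor just reruns it:

```python
# One fixed Collatz step: advance the state and emit a single bit
# saying whether we have reached 1. In a real system, `collatz_step`
# would be evaluated under FHE and only `done` would be revealed.
def collatz_step(n):
    nxt = n // 2 if n % 2 == 0 else 3 * n + 1
    done = 1 if nxt == 1 else 0
    return nxt, done

state, steps, done = 27, 0, 0
while not done:                 # the executor drives this outer loop
    state, done = collatz_step(state)
    steps += 1
print(steps)                    # run length leaks, as noted above
```

The only thing the executor learns is how many rounds it ran, which is exactly the timing-like leak the comment describes.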


Totally agree. Just let me point out that this is not, theoretically speaking, an FHE computation. It is an "FHE plus a termination oracle" computation. Maybe no big difference in practice, but totally a different thing in theory, which was the point of my comment.

This is basically all that modern computers are; a giant circuit and a clock to keep sending a "do the next thing" signal.

Is it a stretch to say that "1: perform this list of multiplications and additions. 2: GOTO 1" cannot be called a FHE system?


But modern computers let you send them instructions entirely in a Turing-complete language, with no requirement that you first know how to unroll them into a static circuit (constant-bounding every loop). That seems like a bigger difference than "oh sometimes you get out-of-memory errors".

This depends on your exact definition of an FHE computation. You could imagine a definition under which you can consider the whole scheme a single Turing complete FHE. But this definition would have to accept that the FHE computation is unbounded and leaks computation length (as would any Turing complete scheme).

I don't like the name "termination oracle", it sounds too much like "halting oracle", which is not what is going on here. All we do is add an outer while-loop, a simple algorithmic transformation. I wouldn't even consider that an oracle.


After putting some thought into this during a deeper comment chain somewhere, I thought about it this way:

FHE is equivalent to circuits, i.e., every circuit has a representation as an FHE. I feel like the idea is: circuits can perform arbitrary computations on a limited-size input, but not on arbitrary-size inputs. Loops are not a problem, as they are an implementation detail. Unbounded output is not possible on a Turing machine that halts for all inputs of size limited by some constant s. There exists a circuit that would compute the same thing, as you can take the (binary-encoded) results of the Turing machine and just create a truth table from that, which describes a circuit. Circuits are interesting because they are usually limited in size.

So you compile your program into a circuit of a fixed size and can then compute solutions of this size. If you are interested in larger problems, you compile your program into a larger circuit. E.g., a function returning a 32-bit integer and taking 20 32-bit integers as parameters can be represented this way (without infinite loops, but those aren't covered by Turing machines either), no matter the computation within that function.

edit: correction, obviously a circuit cannot represent all partial functions, only total functions (i.e., every input has an output).
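A tiny plaintext illustration of the tabulation argument (the function `f` here is a made-up stand-in for "a machine that halts on all inputs of this bounded size"):

```python
# Any total function on a bounded input (here, 3-bit ints) can be
# tabulated; the resulting table is exactly a circuit description.
def f(x):
    return (x * x + 1) % 8      # arbitrary total function on 3 bits

table = {x: f(x) for x in range(8)}   # the "compiled circuit"

# Evaluating the table needs no loops or branches on the secret input:
# it is a fixed, finite object, which is what makes it circuit-shaped.
assert all(table[x] == f(x) for x in range(8))
print(table[5])  # 26 % 8 = 2
```

The table blows up exponentially in the input width, which is why this is an existence argument rather than a practical compilation strategy.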


No, you cannot convert an arbitrary Turing machine to a circuit. Not even if you assume a bounded input size. Not even if you assume just one single allowed input, because you might not be able to know if that Turing machine ever terminates on that input.

And loops are not an implementation detail. Bounded loops are, one might argue, an implementation detail, but unbounded loops are precisely _the_ problem: in general it is impossible, given an arbitrary Turing machine (or an arbitrary C, Python, or whatever Turing-complete-language program), to give an upper bound on the number of iterations its loops will require.


ketzu isn't claiming that one can always describe a Turing machine using addition and multiplication only, only that one can describe a Turing machine's map on bounded inputs using addition and multiplication on the input only.

And that's precisely what I challenged. You can't even describe a Turing machine on one single input (using whatever operations you like) if you're not able to determine if that machine is going to terminate. And in general you are not (at least, I am not; if you are, I happen to have a few Turing machines for which I'd be happy to pay to know if they're going to terminate or not; good money, I promise).

You hold a misconception.

The task is not that; instead we assume that we already have a table that correctly describes the TM's maps from bounded input to output when they terminate. His claim is that this table can be expressed as a function in addition and multiplication with the input as operands. To be more precise, with the input and constant numbers (arbitrarily choosable while designing the description of the function) as operands.

Incidentally,

> You can't even describe a Turing machine on one single input (using whatever operations you like)

is false, we can simply encode it as the encoded tuple of the TM and the input; this is a standard construct when we talk about simulating TMs with a UTM.


> The task is not that; instead we assume that we already have a table that correctly describes the TM's maps from bounded input to output when they terminate. His claim is that this table can be expressed as a function in addition and multiplication with the input as operands.

Ok, if the task is this then everything is very easy, I agree. I was commenting on a much harder task, i.e., converting a generic TM to a function adding and multiplying the inputs. That is something you can't do.

> is false, we can simply encode it as the encoded tuple of the TM and the input; this is a standard construct when we talk about simulating TMs with a UTM.

I don't see how that disproves my assertion. I was saying something different, which has to do with your (correct) first assertion: a TM cannot establish if another TM is going to terminate on a certain input.


> I was saying something different

It's important to be precise with language; said encoding is a description of a given TM on a particular input, since a map exists from it to the space of outputs union DOESNOTTERMINATE. Now, whether or not this map is a computable function is a different question.


Alan Turing proved that a solution to the halting problem cannot exist. Per Wikipedia:[0]

> A key part of the proof was a mathematical definition of a computer and program, which became known as a Turing machine; the halting problem is undecidable over Turing machines.

It cannot be possible to “map” a Turing machine with a finite number of operations. Without true decision trees and loops, FHE isn’t Turing complete. At minimum, there needs to be a concept of a conditional jump—if X, jump to instruction Y.

You can unravel some programs into a finite set of instructions, but that doesn’t make FHE Turing complete.

Take the following code, for example:

    function f(x):
      while x == 1:
        do nothing
      return x
For a machine to be Turing complete, it must be able to run that function. FHE can’t do that, by definition; it would reveal information about the input.

[0]: https://en.wikipedia.org/wiki/Halting_problem


Some of the original work around FHE implies the researchers were applying ML models. If you had a complicated enough ML model (e.g. of a person) you could simulate a computer in that model. The assertion that FHE with just multiplication and addition is Turing-complete is true, and this is one proof.

If you can think of something besides encoding a Turing machine's state as NN weights, I am very interested. Perhaps encoding a quantum circuit somehow? A technically correct but intractable solution would be simulating the quantum state of every atom in a silicon circuit.


You can simulate that program you gave on a finite circuit because you can simulate it on your computer which is a finite circuit.

Your computer has circuitry that is capable of being reused. It isn’t a simulator; it could, in theory, run that loop ad infinitum.

FHE can’t reuse circuitry. There’s no concept of, “based on the output, we now need to plug it back in and repeat.” Instead, it has to literally repeat that circuitry for every possible loop.


You could re-run FHE until some condition fails. This is what your CPU does BTW.

Again - the only reason you shouldn’t do that in homomorphic encryption is because this way you will leak run duration information.


You can’t do that—that reveals timing information.

Would you say that this circuit was fully homomorphic if, for all inputs, the processor appeared to be doing the same work?

That is to say? If X = 1 or X = 2 the processor did not appear to “do nothing” but, instead, performed an operation on an encrypted string that was constant time?


I’m not really sure what you’re asking. By definition, FHE cannot be Turing complete. A Turing machine must be capable of performing operations that cannot be handled in constant time, and for which the timing reveals information about the input. That is a fundamental feature of Turing machines; it’s the whole basis for how they came about.

There’s no way to do a FHE style system with branching unless all branches are indistinguishable from one another. All branches must execute in the same amount of time, otherwise statistical analysis will leak information.

Forget Turing completeness for a second, that’s not actually what you want for a FHE style system. You want computational indistinguishability which can’t be achieved without side-channel resistance.


Why can't FHE be capable of performing unbounded operations?

Seems like I didn't put enough thought into it and my computability class has been too long ago!

In hindsight it's kind of obvious: all functions computed by circuits must be total functions and cannot be partial functions, because a circuit can't output nothing.


In FHE you need to execute all branches, but the result can be just as if you had executed only the right one.

Consider the following conventional code:

  if (flag) {
    a = b + c
  } else {
    a = b * c
  }
In FHE it could look:

  a =  flag * (b+c) + (1-flag) * b * c
That way, you can express almost arbitrary programs. I don't claim that a Turing machine could be expressed this way, though.
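The rewrite above is easy to sanity-check in plain Python (ints standing in for ciphertexts): with flag in {0, 1}, the arithmetic form picks the same result as the if/else, while both "branches" are always computed.

```python
# The if/else version from the comment above.
def branchy(flag, b, c):
    return b + c if flag else b * c

# The branchless FHE-style rewrite: flag selects between the two
# always-computed results using only addition and multiplication.
def branchless(flag, b, c):
    return flag * (b + c) + (1 - flag) * b * c

for flag in (0, 1):
    for b in range(-3, 4):
        for c in range(-3, 4):
            assert branchy(flag, b, c) == branchless(flag, b, c)
print("identity holds")
```

The cost is that you always pay for both branches, which compounds quickly when branches nest.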

Okay but that example is just one conditional branching. Turing-completeness requires that you be able to e.g. keep looping back until some condition is met.

If you want to build arbitrary functions like you've described, it reduces to the question of whether you can "unroll" the function, given its inputs[1], into a (stateless) circuit. But that would still IIUIC just get you primitive recursive functions (i.e. those where you can know the upper bound on the number of loop iterations for any loop before it starts), not the general recursive functions you need for Turing-completeness.

Anyone know how you pull off general recursive functions from just addition and multiplication?

[1] and in FHE you don't know the inputs either, which is another kink.


> that would still IIUIC just get you primitive recursive functions (i.e. those where you can know the upper bound on the number of loop iterations for any loop before it starts)

The primitive recursive functions cannot be expressed using circuits. Circuits by definition can only express polynomials, which have a much smaller growth rate than what primitive recursive functions are capable of.

[edit]

Actually, probably a better reason is that in primitive recursion, the running time can depend on the input. For a circuit, the running time is independent of the input.


The circuit that repeatedly multiplies the previous result with itself has higher than exponential growth rate.

No, because you can only multiply a fixed number of times.

Good luck finding something that grows faster than https://en.wikipedia.org/wiki/Ackermann_function

Easy, just square it. Also, the Ackermann function is not a primitive recursive function.

FHE is just a math equation. Can you accept the requirement that you process the state more than once? If so you could do something like encode the atomic state of a silicon circuit somehow and step it forward in time.

That is probably what the researcher is referring to. You can encapsulate computation, but not in a single math equation.


The big question isn't branching, it's unbounded loops.

It seems to me the only option is unrolling those loops. But that has a massive performance penalty. Because you need to always run through however deep you unrolled.


You can't unroll an unbounded loop; you don't know how many times to unroll. To have a Turing machine you need to know from your program state whether you have reached the end of your loop or not (i.e., you need to evaluate the "while" condition), and by definition FHE shouldn't allow you to extract information from your program state.

You can unroll a sufficient number of times. There would be an upper limit on iterations and it would presumably be horribly inefficient, but I can't see why it wouldn't work.

If you know how many iterations are sufficient, the unbounded loop could have been written as a bounded loop in the first place. For truly unbounded loops, you’re still stuck.

I mean, that's the same as saying no computer is Turing complete because it doesn't have infinite memory.

I'm not saying it is a practical solution to unroll a loop that might run a billion times.


I am not saying there are not practical ways to do useful computation with this stuff. I am commenting on the theoretical remark that any algorithm (as in any Turing machine) can be executed with FHE, which is false, as it can be proved that it is impossible to compute, given an abstract Turing machine, an upper bound on the length of its execution.

I believe that's not the issue: the remote machine which performs the calculations on the encrypted data is TM complete and can perform unbounded loops. Comparison is the puzzler for me too: e.g., how do you take min(a, b) if you don't know how a and b compare? We know that F(a) + F(b) = F(a + b), so the remote machine can use "normal" addition operations, but still a > b and F(a) < F(b) can both be true, so it can't use "normal" comparisons, unless there's some implication of the properties of FHE I didn't see.

In theory we could compute the full table of values of a function like less(x: F, y: F) -> {0, 1}, interpolate a bivariate polynomial containing all those values, and build a circuit to evaluate that polynomial. Its degree would be around 2|F| though, so it would be completely impractical.

In SNARK programming, we would usually decompose field elements into bits, then apply a binary comparison circuit. Binary decomposition can be done non-deterministically, by having the prover give the purported bits as "advice" inputs, then having the circuit accumulate those bits and check the sum.

I haven't done any FHE circuit programming, but I imagine that approach wouldn't work, since the circuit must be deterministic. I imagine any values that might need to be compared would just be encoded as binary from the start. (Or is there a better approach?)
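The "encode as binary, then compare" approach can be sketched with only addition and multiplication on {0,1} values, the operations an FHE scheme offers (plaintext bits standing in for ciphertext bits; this is a generic textbook construction, not any particular library's circuit):

```python
# MSB-first binary decomposition, done in the clear here; under FHE the
# values would already be encrypted bitwise, as the comment suggests.
def bits(n, width):
    return [(n >> i) & 1 for i in range(width - 1, -1, -1)]

def less_than(a_bits, b_bits):
    lt, eq = 0, 1               # running "a < b so far", "equal so far"
    for a, b in zip(a_bits, b_bits):
        lt = lt + eq * (1 - a) * b         # first differing bit decides
        eq = eq * (1 - a - b + 2 * a * b)  # eq AND (a == b), arithmetized
    return lt                   # 1 iff a < b

print(less_than(bits(5, 8), bits(9, 8)))  # 1
print(less_than(bits(9, 8), bits(5, 8)))  # 0
```

Note the circuit shape is fixed by the bit width, and the whole thing is deterministic, matching the constraint mentioned above.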


Interesting idea, but then it's not TM complete, since the problem space is by definition finite, isn't it? So it seems to me the importance of comparison for TM-completeness is still unsolved.

Then again, it just occurred to me that perhaps fully homomorphic implies that F(0) = 0 and F(1) = 1, so a > b could become F(a - b) ρ 0, and you only need to determine beforehand if ρ is > or <.


This is generally solved by unrolling the program into a single loop. Essentially having a function state(N) => state(N+1), shouldContinue.

This exposes the number of iterations needed to finish the computation. That could further be limited by underlying code rounding the iterations to mask the real count.

TLDR: either the run time is fixed or you leak some info via the run time.
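The fixed-run-time option can be sketched in plain Python (a toy design with made-up names; ints stand in for ciphertexts). The step function runs a fixed number of rounds, and the flag-select trick lets finished computations just carry their state forward:

```python
# MAX_ITERS is an assumed bound the programmer must supply up front;
# run time then no longer depends on the input.
MAX_ITERS = 16

def step(state):
    # Hypothetical computation: count down to 0.
    return state - 1

def run_padded(state):
    # Assumes 0 < state <= MAX_ITERS.
    done = 0
    for _ in range(MAX_ITERS):
        cont = 1 - done
        # Branchlessly either advance or hold the state. Under real FHE
        # the equality test below would itself be an arithmetic subcircuit.
        state = cont * step(state) + done * state
        done = done + cont * (1 if state == 0 else 0)
    return state

print(run_padded(5), run_padded(12))  # both reach 0 in 16 fixed rounds
```

Rounding the bound up to a coarse granularity, as suggested above, trades some wasted rounds for a coarser leak.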


Couldn’t the results indicate the need to re-run a circuit? `while f(x): x = g(x)` would simply return f(x) || g(x).

To summarise the other comments: No, it's not actually Turing complete. The programs you can express using circuits have a fixed running time.

Part of what's going on here is conflicting definitions of what "Turing complete" means in a formal context and how it's often used among practitioners.

A specific circuit has a fixed running time, but it can do arbitrary computations in that runtime, including emulating a fixed number of steps of an arbitrary Turing machine.

Our physical computers also have limitations (particularly space) on which Turing machines they can simulate; however, they are often called "Turing complete" in general parlance. Obviously a laptop has limitations that are on completely different orders of magnitude, though, and is much more efficient.

Many "surprisingly Turing complete"[1] systems have similar limitations, but the more interesting question is "can you use this system to simulate a bounded number of steps of a Universal Turing machine?", and for those systems, and FHE, the answer is yes.

[1]: https://www.gwern.net/Turing-complete


Like a physics simulation you iterate the cryptographic math over time. The physical laws themselves are incapable of expressing computation, it is the effect they have on a system evolving over time that produces what we call computation.

Oh. But FHE means Turing-complete, right? So what else do they have besides addition and multiplication that allows them to do arbitrary computations?

FHE just means that you can do addition and multiplication on ciphertexts. The ability to do that implies that you can build logical circuits. Those logical circuits are highly expressive, but not Turing complete.
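The "addition and multiplication give you logic circuits" step is the standard construction: on bits, Boolean gates are just small polynomials (shown here in plain Python; under FHE the same arithmetic would run on ciphertext bits, with subtraction as addition of a negation):

```python
# Boolean gates as polynomials over {0, 1}.
def AND(a, b):  return a * b
def XOR(a, b):  return a + b - 2 * a * b
def NOT(a):     return 1 - a
def NAND(a, b): return 1 - a * b   # NAND alone is functionally complete

# Sanity check against Python's own bit operators.
for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert XOR(a, b) == (a ^ b)
        assert NAND(a, b) == 1 - (a & b)
    assert NOT(a) == 1 - a
print("gate identities hold")
```

Since NAND is universal, chaining these gives any fixed logic circuit, which is the "highly expressive, but not Turing complete" class the comment describes.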

The definition of FHE specifies arbitrary computation:

https://en.wikipedia.org/wiki/Homomorphic_encryption#Fully_H...

If it's really just addition and multiplication, it's not FHE.


Arbitrary computation can still be bounded. [1] for example cites the definition of FHE (Def. 9) as a scheme that can compute all circuits, which are fairly powerful [2].

Arbitrary computation is also, afaik, not well defined.

edit: After putting some thought into this, I feel like the idea is: circuits can perform arbitrary computations on a limited-size input, but not on an arbitrary-size input. Loops are not a problem, as they are an implementation detail. Unbounded output is not possible on a Turing machine that halts for all inputs of size limited by some constant s. There exists a circuit that would compute the same thing, as you can take the (binary-encoded) results of the Turing machine and just create a truth table from that, which describes a circuit. Circuits are interesting because they are usually limited in size.

[1] https://eprint.iacr.org/2015/1192.pdf

[2] https://en.wikipedia.org/wiki/Circuit_complexity


>Arbitrary computation is also, afaik, not well defined.

Yes it is, it's Turing-completeness.


> Yes it is, it's Turing-completeness.

While this may be a sensible interpretation, and I am willing to believe it's used in that interpretation in some fields, I have a hard time actually finding a definition of that form.

I also have a problem with using it in this way: in circuits, the size of the circuit limits your input size. But given a maximum size of the input, and no size limitations on the circuit, you can construct a circuit that produces the same result as a Turing machine that halts. So you can compute whatever you want (i.e., arbitrary computation), but you are limited by the size of your current circuit.

(Completely unhelpful but always fun to look at for words: https://books.google.com/ngrams/graph?content=arbitrary+comp... )


As gone over in the discussion above [1], you can only compute those TMs that you can unroll into a fixed size circuit, which means (at most) only the subset of TMs that halt; doing this in the general case requires a halting oracle, which is not possible. That's not arbitrary computation by any reasonable definition.

Furthermore, it adds a huge constraint to programming: you can only implement loops with iterations bounded by a constant known at compile time. (And if you derive any of those constants from the inputs at the time you're compiling it for the FHE machine to execute, you've leaked information about said inputs.)

[1] https://news.ycombinator.com/item?id=24019212


My computer is not Turing complete, yet it can do arbitrary computations.

FHE is bounded-memory Turing complete, the same as every computer on the planet.


See my cousin comment [1]: your computer does not require that you be able to determine whether your programs halt and then unroll them into a fixed-size stateless circuit, so that's a big difference.

[1] https://news.ycombinator.com/item?id=24022339


There are two solutions for this. 1) The halting problem is decidable (but intractable) for bounded machines. 2) Instead of deciding when to halt inside the machine, do it outside with continuations. Encrypted computing will always enforce time limits, thus this is not an issue.

With addition and multiplication you can do a lot. I think there is a term for circuits that can be “arithmetized”. A loop whose iteration count is based on the value of the encrypted input seems to be out of scope.

I think the Wikipedia page is oversimplifying somewhat at the cost of accuracy.

Weird, feels kind of misleading just on general principle to call it "fully" when it doesn't do the full set of computations.

I found this link [1] which makes this distinction:

> Partially Homomorphic Encryption (PHE): In PHE scheme, only one type of mathematical operation is allowed on the encrypted message, i.e., either addition or multiplication operation, with unlimited number of times,

> Somewhat Homomorphic Encryption (SHE): In SHE, both addition and multiplication operation is allowed but with only a limited number of times.

> Fully Homomorphic Encryption (FHE): FHE allows a large number of different types of evaluation operations on the encrypted message with unlimited number of times.

And even then, multiplication is just repeated addition, so it still feels off to say it goes beyond partially homomorphic in the above schema. (Edit: actually, for arbitrary inputs, you can't reduce it like that, so ignore this para.)

In any case, original claim from the article that started this thread [2] is wrong.

[1] https://www.sciencedirect.com/topics/computer-science/fully-...

[2] "Since all mathematical and logical operations can be built from additive and multiplicative operations, "


Regarding [2]: Xor and And form a basis of all logical operations, while Xor on its own (or And on its own) does not. In GF(2) multiplication is xor and addition is and. So addition and multiplication are enough to build all possible circuits.

Another way to look at it is, in homomorphic encryption you have the problem that in an operation both operands are encrypted. Therefore repeated addition based on one operand is not an option.


In GF(2), addition is xor and multiplication is and.

Ah I am way too confused these days, thanks for the correction :)
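For anyone following along, the corrected mapping is easy to sanity-check in a few lines of Python. Plain 0/1 ints stand in for ciphertexts here; a real FHE scheme would perform the same arithmetic on encrypted values:

```python
# In GF(2): addition is XOR, multiplication is AND.

def add(a, b):
    return (a + b) % 2      # XOR

def mul(a, b):
    return (a * b) % 2      # AND

def not_(a):
    return add(1, a)        # NOT a = 1 + a (mod 2)

def or_(a, b):
    return not_(mul(not_(a), not_(b)))   # De Morgan

# {XOR, AND} plus the constant 1 are enough for any boolean circuit.
```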

Actually, Gentry's thesis abstract says this [1]:

> Such a scheme allows one to compute arbitrary functions over encrypted data without the decryption key [- i.e. ...]

My understanding is, while FHE cannot implement an algorithm to solve a problem with arbitrary-size input, it can apply arbitrary computations to a limited-size input.

[1] https://crypto.stanford.edu/craig/


But neither can your computer, if you constrain it to always take the same amount of time to execute any problem you give it.

The issue with such a system that would allow arbitrary computation is that it would certainly allow you to extract the value, which means it would be simpler not to encrypt it in the first place :)

Comparisons can be done via subtraction.

Result has no bit set = both terms equal

Result's sign bit is set = second term bigger

Result's sign bit isn't set = first term bigger

Multiplying by the result of comparisons (aka 0 and 1, as you state) is actually a common performance optimization. Does not work for everything, but when it works it works well.
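A quick sketch of the subtraction trick, with plain fixed-width integers standing in for ciphertexts. In real FHE the flags would come out encrypted, and the zero-test would itself be a small circuit:

```python
WIDTH = 8  # fixed circuit width; arithmetic is mod 2**WIDTH (two's complement)

def compare(a, b):
    """Derive (equal, second_bigger) flags from a - b."""
    diff = (a - b) % (1 << WIDTH)
    equal = 1 if diff == 0 else 0              # in a circuit: OR-reduce bits, negate
    second_bigger = (diff >> (WIDTH - 1)) & 1  # sign bit set => b > a
    return equal, second_bigger
```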


Yes, but you can't branch on the result, because you don't know if the result is zero or not. So your program needs to have fixed control flow, i.e., you cannot implement arbitrary programs.

There isn't branching in the traditional sense, but we can work around it. Instead of doing

    result = if condition { a } else { b }
in circuit programming, we normally do

    result = condition * a + (1 - condition) * b
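Sanity-checking the select in plain Python (condition must be a 0/1 value; under FHE it would be an encrypted bit):

```python
def mux(condition, a, b):
    # Both "branches" are always computed; the arithmetic picks the result.
    return condition * a + (1 - condition) * b
```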

Sure, that was already mentioned on another comment, but that's not enough for proper branching. You also need while loops. See also comment https://news.ycombinator.com/item?id=24018284.

If you constrain your computer to always take the same amount of time to execute any while loop (which a homomorphic computer needs to do not to leak any data), your computer would also have unbounded circuitry complexity and thus it’s not any better.

Also unbounded loops can’t practically finish even on a Turing machine :)


That's my point: not any Turing machine can be implemented with FHE, because FHE must have bounded execution time.

You can either say “a Turing machine can not be implemented” or “it can”, but never “not any Turing machine”, because there aren’t different types of Turing machine - just a single one.

You don’t need to branch, you execute both branches at the same time.

You can implement comparisons using addition and multiplication

(ANDs and XORs are just multiplication and addition over bits, and we build comparisons out of these all the time.)

Re: the ability to do control flow, you can implement your Turing machine automaton as a circuit easily, and can then evaluate that using FHE


Quick counter-proof: Multiplication and addition are enough to implement neural networks, which could eventually implement logical circuits.

Code might finally look like the output of movfuscator (MOV on an Intel processor is also Turing complete).

Personal pet peeve: whenever someone brings up the movfuscator, all the responses seem to be of the “well akshhhhuallly...” variety, and I’ve never understood it. The responses here are generally incorrect; x86 mov does not have conditionals (that would be cmovcc), cannot load the program counter, and can’t do general arithmetic beyond very standard indirect addressing (the same forms allowed by many RISC architectures). movfuscator actually includes a version that uses only two forms of mov (a load form and a store form) that would work on nearly any common RISC architecture. So the assertions that “this only works because x86 mov is ___” are generally incorrect, and needlessly dismissive of an interesting idea - the addressing used by movfuscator is just addition/subtraction. Certainly, at a broader level, the idea that it may be desirable to reduce general computation to some extremely simple form is interesting.

Movfuscator only works because x86's MOV is like the Swiss Army knife of move instructions. It doesn't just copy bits from one location to another, but has parameters for conditionality and arithmetic, which is what allows it to be so general. That still doesn't address the parent's confusion (that I share) of how you get Turing-completeness just from addition and multiplication.

True, but there are simpler (abstract) machines that also achieve Turing completeness: https://en.wikipedia.org/wiki/One-instruction_set_computer

> That still doesn't address the parent's confusion (that I share) of how you get Turing-completeness just from addition and multiplication.

I wouldn't say arithmetic circuits are Turing complete since they can only perform bounded computations, but we could say that they're functionally complete. It is known that boolean AND and OR gates are functionally complete, and we can reduce them to multiplication and addition in any finite field:

    x and y = x * y
    x or y = ¬(¬x and ¬y) = 1 - (1 - x)(1 - y)
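These identities can be checked exhaustively in any finite field, e.g. in the integers mod 7 (the helper names are mine; plain ints stand in for field elements):

```python
P = 7  # any prime modulus works while inputs stay in {0, 1}

def and_(x, y):
    return (x * y) % P

def or_(x, y):
    # x or y = not(not x and not y) = 1 - (1 - x)(1 - y)
    return (1 - (1 - x) * (1 - y)) % P
```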

What does "functionally complete" mean to you?

Informally, I mean reducible from time-bounded Turing machines. Given a Turing machine TM and a time bound T, a (binary or arithmetic) circuit can be constructed to simulate TM for T steps. Or in other words, a circuit can be constructed to solve any particular instance of FNTIME(T).

You get bounded-Turing completeness. This is sufficient for all practical applications as we live in a finite universe.

The reason why Intel mov is so powerful is because the mnemonic actually maps to tons of different things. In particular conditionals are accomplished using indirect memory accesses.

No, it can't. FHE cannot emulate the whole MOV, especially the features that make it Turing-complete.

Any program can be compiled to a single `while` loop with a `switch`. I can think of two ways to "drive" this `while` loop:

- If the program is meant to run forever, or as long as possible, then the system performing the computations can drive the loop. This might be useful for anytime algorithms (e.g. an AI/optimisation problem): e.g. "keep iterating this computation (unless my billing account runs out of credit), I might ask for the current state at some point in the future".

- Programs which eventually halt could be checked by the key holder: the result gets decrypted, and if it isn't in the 'halt' state then it's sent back for another 'step'. In practice this could be made more efficient by e.g. making the computation idempotent, so it can be iterated many times without having to ask for confirmation every time; by having the encrypted computation perform many iterations, so each 'step' does more work; etc. If the job is kept on the encrypted system, with the key holder sending it step/stop instructions, that leaks details of the computation (i.e. an upper-bound on how many steps it takes). If we're willing to round-trip the data instead, then the key-holder can re-encrypt it each time (and jitter its timings), so the server couldn't tell the difference between stepping the same job and computing multiple jobs. This could be further obfuscated by throwing our jobs into a pool of others, e.g. on a shared proxy.

This model of having some external entity "driving" a computation's loop appears in a few other places too:

- Smart contracts, like Ethereum, give programs "fuel" that decreases with each step, and calling another program requires us to give it some of our fuel.

- Optimisation algorithms can also use "fuel" to avoid infinite loops. Some compilers use this to avoid transformations getting stuck in a cycle (rewriting the code between one form and another).

- Total (functional) programming languages, like Agda and Coq, don't allow general recursion; yet they do allow co-recursion. This lets us wrap each recursive call (i.e. step) in a constructor, then we can "drive" the program by having an external program unwrap these constructors (AKA a main loop).

- Coq's Mtac language lets us use general recursion in our proofs, but the results get wrapped up (just like the co-recursive example above). The compiler will run these proofs to completion: if they don't halt then neither does the compiler.

- CSS can be "driven" by user clicks, which together become Turing complete: https://notlaura.com/is-css-turing-complete
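The "key holder drives the loop" pattern can be sketched as follows; enc, dec and step are hypothetical stand-ins (the identity and a toy halving circuit), not a real FHE API:

```python
# Sketch of key-holder-driven iteration. enc/dec are placeholders for a real
# FHE scheme; step() is the fixed circuit the untrusted server evaluates blindly.

def enc(state):
    return state        # placeholder: a real scheme would encrypt here

def dec(state):
    return state        # placeholder: a real scheme would decrypt here

def step(state):
    # One bounded "step": halve the value, count iterations, set a halt flag.
    value, count, halted = state
    if halted:
        return state
    return (value // 2, count + 1, value // 2 == 0)

def run(initial, max_rounds=64):
    ct = enc((initial, 0, False))
    for _ in range(max_rounds):
        ct = step(ct)                     # server: evaluate the circuit
        value, count, halted = dec(ct)    # key holder: decrypt, check the flag
        if halted:
            return count                  # number of halvings until zero
        ct = enc((value, count, halted))  # re-encrypt, send back for another step
    raise RuntimeError("ran out of fuel")
```

The re-encryption on each round is what lets the key holder jitter timings and hide how many steps a given job took.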


I agree. These are totally legitimate ways to still use FHE even if you need to compute a Turing-complete thing. It doesn't thwart my theoretical point, because you're not doing pure FHE anymore: you're using FHE for the "Turing-incomplete" part of your algorithm, plus some external oracle to check for termination.

Practically this might still be an acceptable way to tackle the problem. I mean, "practically" as much as FHE itself is.


But just 2 months ago there was an article on HN that stated:

The database is a key value store prepopulated with the english names of countries and their capital cities from the continent of Europe. Selecting the country will perform a search of the matching capital. On a 2019 Macbook Pro laptop, the example searches take under 80 seconds.

https://news.ycombinator.com/item?id=23435305

And today this article claims only a 40x compute cost for "machine learning"?

What is the cause of the disparity?


I think the 40x overhead is a case of comparing throughput overhead (from what I know, FHE-based secure inference protocols have poor latency, but can process many predictions in parallel, improving throughput).

My layman's guess is that the FHE penalty goes up exponentially with the complexity of an operation.

Doing an exact string match on 200-ish rows in 80 seconds on a modern computer is so inefficient that I have a hard time imagining any less complex but still useful operations. Perhaps I'm just not clever enough, but for now homomorphic encryption doesn't seem useful for common, real-world use cases to me.

This isn’t true, it’s just that different kinds of operations are more or less efficient in FHE

Someone please correct me if I'm wrong, wouldn't this still be vulnerable to side channel or pattern analysis attacks? Also, how can all operations on a DB like projection, and equality be represented by addition and multiplication? Are we in the peano arithmetic space here? Can somebody break it down from me?

You answered your own question accidentally: side-channel attacks are not possible in this model of computation because everything is represented as addition and multiplication. The homomorphic ciphertext doesn't tell you when it's done, you sort of just do all the operations you're told and hand back the answer gaining no knowledge about the computation you just did.

That's not to say that a badly-designed usage of homomorphic computation can't introduce a timing channel, though. If there's a point in time where the result is handed back to the client, they inspect it, and then conditionally send more computations, you might be able to infer something with either the client's timing or if they happened to send back more computations or not. However, that assumes that we'd be able to trace multiple API requests associated with the same data set (as opposed to unrelated computations), and that we know enough about the computations we're being told to do to infer a timing channel from them. That's why I use the qualifying term "badly-designed", which in practice will probably be most uses of this technology.


So basically I tell you to do a bunch of stuff and give it back, and you have no idea if the computation is a partial application/full application etc. Neat. Next question: if my previous statement is correct it seems that it may push a lot of the work back onto the client, otherwise I don't get how a server couldn't figure out relationships from access patterns on blocks. This would limit the utility of the server right?

Yea, fully homomorphic querying of a database necessarily requires reading all of the data, or you could glean information from the access pattern.

You can get a lot of the benefit of a linear scan using a technique like oblivious RAM but I’m not convinced it’s possible to build such a system without using a Secure Enclave. There are some papers that purport to implement this but once you’re dancing in the realm of Secure Enclaves you’re out on a different limb.

I don’t think it’s possible for a circuit to operate over a large set quickly and use a fully homomorphic pure software design.
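A toy illustration of why the scan is linear: an oblivious lookup is a sum of indicator-times-value over every row, so the access pattern is identical for all queries. Plain ints stand in for ciphertexts, and eq is a stand-in for an encrypted equality circuit:

```python
TABLE = {3: 111, 7: 222, 9: 333}   # toy key-value store

def eq(a, b):
    # stand-in for an encrypted equality test that outputs an encrypted 0 or 1
    return 1 if a == b else 0

def oblivious_get(query):
    # Every row is touched regardless of the query, so cost is linear in the
    # table size: the server learns nothing from the access pattern.
    return sum(eq(key, query) * value for key, value in TABLE.items())
```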


With some minor interaction between client and server, you can use ORAM in conjunction with FHE to enable sub linear database scans. It’s not practical for deployments, but is probably feasible for implementations

I’m not convinced you can do ORAM without a Secure Enclave. If you’re already using an enclave, you don’t need FHE.

There can always be side channels. A side channel is some information channel that your model doesn't account for. By definition, it's not possible to be sure an implementation of a model doesn't have any side channels. The best-known class of side channels is timing attacks, where the amount of time that an operation takes leaks some information about its content. This is a well-studied area of research, and most modern implementations of cryptographic algorithms do take it into account; I expect this one does, too. If you can model the time an operation takes, you can prove that a given implementation has no timing channel. But any number of other side channels could potentially exist in this (or any other) implementation.

Got it. I took another pass through the article and was able to understand it a bit better with everyone's comments. Thanks all.

That's a lot of questions, but AFAIK it is not vulnerable to side channel analysis. That's why the author spent time on the database search use case; you must visit every row because you don't want an attacker to be able to segment your data at all. It would be possible to make trade-offs in that specific case, but cryptographic engineers are loathe to give the common programmer such a foot-gun.

Yeah, this stuff is cool so I'm trying to wrap my head around it. It sounds from what you're saying that you're trading a lot of efficiency for protection, but I can't see any way around it.

> Above, we can see charts indicating the additional compute power and memory resources required to operate on FHE-encrypted machine-learning models—roughly 40 to 50 times the compute and 10 to 20 times the RAM that would be required to do the same work on unencrypted models

Are those multiplicative factors specific to machine learning tasks, or is it the same for general purpose computation?

In other words, is there something about the structure of machine learning tasks which makes them more suited for FHE than other computing tasks?


> In other words, is there something about the structure of machine learning tasks which makes them more suited for FHE than other computing tasks?

Yes, FHE is not (currently) good at general-purpose computing. Complex conditional logic for example is a non-starter because every branch has to be evaluated. But tensor multiplication is well supported with existing FHE schemes, and many kinds of machine learning tasks consist of non-conditional tensor multiplications.
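A toy illustration of why this shape is FHE-friendly: a linear layer is nothing but a fixed pattern of additions and multiplications, with no data-dependent control flow (plain numbers stand in for ciphertexts):

```python
def matvec(W, x):
    # A fixed pattern of multiplies and adds: no branch depends on the data,
    # so the very same circuit evaluates every input.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]
```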


> Complex conditional logic for example is a non-starter because every branch has to be evaluated

Is overcoming this limitation of FHE anywhere on the horizon?


No, it's pretty fundamental. If the executor were able to know which side of the branch to execute it would mean the executor knows the value of the conditional.

So [1] claims "A variant of fully homomorphic encryption scheme for Turing machines, where one can evaluate a Turing machine M on an encrypted input x in time that is dependent on the running time of M on input x as opposed to the worst-case runtime of M".

I admit understanding this paper is a bit beyond me. But, does their construction evaluate Turing machine conditionals by evaluating both branches? If it did, how could the running time be "dependent on the running time of M on input x as opposed to the worst-case runtime of M"? It seems to me that having to evaluate every branch irrespective of the input would result in the running-time always being M's worst case, not varying runtime depending on the input.

[1] https://eprint.iacr.org/2013/229.pdf


I think this is a different approach to FHE that concedes to timing based side channels to get this speedup.

I don't think there's much that can be done, except trying to identify any circuitry shared among branches and moving it out. Like if we have "if (cond) { hash(x) } else { hash(y) }", we can move the hash circuit outside of the conditional.

There are some circuit compilers that try to do optimizations like that automatically, like xJsnark, though it's not meant for FHE.
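A sketch of that hoisting in plain Python, with a made-up toy_hash standing in for an expensive shared sub-circuit; both versions agree, but the hoisted one instantiates the hash circuit once instead of twice:

```python
def toy_hash(x):
    # made-up stand-in for an expensive shared sub-circuit
    return (x * 2654435761) % (1 << 32)

def branch_naive(cond, x, y):
    # two hash circuits: evaluate both branches, then select the result
    return cond * toy_hash(x) + (1 - cond) * toy_hash(y)

def branch_hoisted(cond, x, y):
    # one hash circuit: select the input first, then hash once
    return toy_hash(cond * x + (1 - cond) * y)
```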


Re your last question, I don't think so. It's just an attractive application, hidding neural network weights is valuable in some settings, for instance.

It's less about structure and more about use cases. In my opinion there are only two major uses of data collection by organizations: machine learning training data and targeted advertisements. A federated fully homomorphic system will allow these companies to train their models without invading personal privacy.

It will also allow organizations to train models collaboratively in a privacy preserving way, which if you think about it has lot of consequences.


set intersection is a big one.

If you want to share data on common customers, without revealing who your unique customers are, you need set intersection.


I found the comment at the bottom of the page from the "Ibmresearch" user helpful in understanding FHE.

To quote his TLDR:

> TLDR Huge polynomials are what are operated on instead of plaintext values by hiding the real data in the polynomials with intentional noise to make it infeasible to extract the real data without the decryption key that knows how to remove it. All operators are bootstrapped from addition and multiplication and elegant modulus trickery. Noise is managed at each operation because the noise compounds with every logic operation you do. The result of an operation chain or circuit be it one operation or infinitely many, therefore, has some noise but can be decrypted by the person with the encryption key. The decryption effectively drops all the noise and hones in on the bits that matter. The person doing the computation is doing a lot of adds and multiplies on big polynomials so they cant understand what you are really trying to do.


Is there a recent breakthrough in FHE? The last time I checked it's nowhere near practical.

I think this is exciting stuff. We're actively building prototypes around FHE concepts. Basically asking the question: "Can we make meaningful care and hospitality predictions, without ever seeing sensitive data?"

We've banged out some stuff with HElib, and some other interesting implementations, but are just beginning.

And we're hiring for both backend and data science positions. :) jobs@theembassies.com


But how much slower? At what cost?

> With SEV enabled, an operator who has root privilege on a host system can't inspect or meaningfully alter the contents of RAM in use by a virtual machine running on that system.

Is that true? I would like to know more about what kind of guarantees SEV gives / how it works at a high level; any resources you can recommend? I assume that at least when the VM is being launched, the sysadmin can mess with the VM?


It is true if you assume SEV has no side-channel vulnerabilities and that no one can uncap your CPU and read out the cryptographic material with an electron microscope.

Which both are probably untrue assumptions :-)


Is anyone working on this for contact tracing? if encrypt( my_long_lat, PK ) - encrypt( your_long_lat, PK ) <= 5m; raise_alarm

The article claims that oblivious query, set intersection, and machine learning on private data are not possible without FHE. However, aren't they all possible either with secure MPC or hardware based enclaves e.g. AMD SEV?

I would argue that secure enclaves do not exist in practice. You have to assume a physical device where the private key cannot be extracted and the operation cannot be observed. The threat model is much weaker than for FHE, and imo not really useful for operating in large sets of private data.

MPC introduces a trust assumption. MPC is only private if some threshold of the multiple parties are honest and destroy their secrets. Though often this threshold is just one, efficient MPC often only has 3 participants total.

FHE gives you a better security model than either of these, as it neither relies on physics for safety nor needs to assume any level of honesty from the person running the computation.


In many MPC schemes you only need to trust yourself; as long as you keep your stuff private, nobody can learn the MPC secrets

FHE is also not obfuscation -- you can't just decrypt the output. Makes it a lot less useful than people usually imagine.

How does secure multiparty computation give you obliviousness?

I was thinking MPC might work for set intersection. For oblivious query, can't you do it by sending the query directly into the remote trusted execution environment, encrypted with the TEE's public key?

The extra computational power requirement is still pretty large... The math behind it doesn't seem to be anything new, there's a reason why these solutions aren't very commonplace yet

Reminder, FHE not being Turing complete isn't important. Every computer ever built has a finite memory and isn't technically Turing complete.


Just in time for the EARN IT Act. :P

Even if it worked, I wouldn't want the government scanning through data for whatever they decided was subversive or suspicious through undefined filters and unaccountable agencies.

If this works, see a huge impact on federated learning or training of NNs in general

This still isn't making sense to me. Is there a simple math example of how this is even possible? If encrypt(1) = a, then how can encrypt(1) + 2 ever give me the correct answer?

You need encrypt(1) + encrypt(2). You would not be sent instructions to add 2. No information about what you are doing is sent, except the operations + and x.

As the comments in the article state, it's actually all done with operations on polynomials with encrypted coefficients. As + and x can be done on polynomials, it all works out.

The maths behind it is of course much more complicated.

As for a simple maths analogy, consider adding 1 + 2.

I'll encrypt your values by multiplying by 3 mod 7. So encrypt(1) = 3*1 mod 7 = 3 mod 7 and encrypt(2) = 3*2 mod 7 = 6 mod 7. Only the encrypted values are sent, along with the operations I want to perform on the values.

Now this scheme is homomorphic, as I can just add the encrypted values, encrypt(1) + encrypt(2) = 3 + 6 mod 7 = 2 mod 7. That is the only computation that would be done before sending back the result.

To decrypt it, I would have to divide by 2 mod 7 by 3, which gives me 3 (you can easily check that encrypt(3) = 2 mod 7). And indeed 1 + 2 = 3. So the decrypted answer is correct.

Of course this scheme is too easy to crack. The FHE scheme is not.


3 mod 7 + 6 mod 7 = 9

i think you meant divide by 3 mod 7:

9 / 3 mod 7 = 3


Yes, I meant divide by 3 mod 7.
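Putting the toy scheme together with the corrected decryption step (multiplying by the modular inverse of 3 mod 7), as a few lines of Python:

```python
P, K = 7, 3                # toy modulus and "key" from the example above
K_INV = pow(K, -1, P)      # modular inverse of 3 mod 7, which is 5

def enc(m):
    return (K * m) % P

def dec(c):
    return (c * K_INV) % P

# enc(1) + enc(2) = 3 + 6 = 9 = 2 (mod 7); decrypting 2 recovers 1 + 2 = 3.
```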

> roughly 40 to 50 times the compute and 10 to 20 times the RAM that would be required to do the same work on unencrypted models

Too bad Moore's Law is dead, otherwise it would have been possible to run everything we run now in 10-12 years using Fully Homomorphic Encryption for the same cost!


To me the practical problem with FHE is that, if it costs 20 times doing the equivalent non-encrypted computations, it is cheaper to do it on your premises rather than rent 20x the same computational power from a cloud host, unless the cloud host is able to provide computational power at 1/20 of the cost it has for you (which seems too much to me).

Every year the number of people saying Moore's Law is dead doubles!

40 to 50 times is still orders of magnitude greater than FHE used to be.

If we keep up the trend (big if) this is actually quite promising


IMO a more practical approach would be to use ordinary encryption inside a CPU, so it wouldn't be possible to extract any data without extremely advanced methods.

Rough description: the CPU has secure memory which contains a private key. The CPU also has a certificate with the corresponding public key. That certificate is signed by Intel.

The hosting provider publishes an API which allows a remote user to communicate with that secure CPU, basically just transmitting an encrypted stream to and from the CPU.

When you start your communication, you establish an encrypted link between your machine and the target CPU, located in a data center. You can check the authenticity of that CPU by verifying the certificate signature. Then you send any code you would like to execute (encrypted with the session key). The CPU executes that code. It encrypts RAM in the process, so it's not possible to read its contents, and sends back the information you're interested in (again, encrypted).

So you have computing resources which are protected from anyone except the very few who could extract the private key from the secure chip. I think that this kind of protection should be enough for many uses. And it would not compromise on performance. The only real-world issue would be side-channel attacks.


You're essentially describing AMD's SEV[1] as mentioned in the article. It piggybacks on the memory encryption implemented in the Zen architecture by giving a separate key for each VM. Ostensibly, the host can't interfere or snoop in the VMs, assuming you trust AMD. I'm surprised it hasn't been more widely adopted.

> protected from anyone except very few who could extract private key from secure chip

The way I understand chip manufacturing it would be hard to diffuse a separate key into each chip. This means it'd only take one of those very few people to extract and leak the key (or cut out the middleman and leak it from inside AMD) to break it for everyone.

> The only real world issue would be side channel attacks.

This is probably a much bigger problem in practice. Spectre & related attacks have been effective against Intel SGX[2], another "trusted environment" inside a larger system.

[1]: https://developer.amd.com/sev/ [2]: https://arstechnica.com/gadgets/2018/08/intels-sgx-blown-wid...


> That certificate is signed by Intel.

> So you have computing resources which are protected from anyone except very few who could extract private key from secure chip.

…and Intel. This has obvious problems in who the root of trust should be, as well as putting all the eggs in the same basket.

Also, what you propose is I think equivalent to the SEV technology mentioned in the article.


For this scheme you need to trust the CPU (meaning its maker, its implementation, etc). The point of FHE is not trusting anybody.


