
Cingulata: Run C++ code over encrypted data with fully homomorphic encryption - p4bl0
https://github.com/CEA-LIST/Cingulata
======
xsamgreen
FHE is undergoing standardization right now. The leading algorithm candidates
are based on lattice cryptography, which is currently believed to be
quantum-safe:
[https://homomorphicencryption.org/](https://homomorphicencryption.org/)

Here are some mature and actively-developed HE libraries:

[https://github.com/homenc/HElib](https://github.com/homenc/HElib) (By IBM,
which employed the inventor of FHE)

[https://github.com/Microsoft/SEAL](https://github.com/Microsoft/SEAL) (Best
tutorials)

[https://palisade-crypto.org/software-library](https://palisade-crypto.org/software-library)
(High-quality codebase)

[https://github.com/tfhe/tfhe](https://github.com/tfhe/tfhe) (Fast boolean
logic, used by Cingulata)

Good introductory material is hard to come by, but I like the first half of
this talk:
[https://simons.berkeley.edu/talks/intro-fhe-and-tfhe](https://simons.berkeley.edu/talks/intro-fhe-and-tfhe)

------
entelechy
Great work! This allows arbitrary computation to be performed on untrusted
devices.

However, last time I checked, computation in this scheme was ridiculously slow:
on modern machines, cutting-edge implementations of FHE manage around 100
integer operations per second.

Nevertheless, there have been some brave startups trying to commercialise
this technology:

[https://venturebeat.com/2020/02/18/enveil-raises-10-million-...](https://venturebeat.com/2020/02/18/enveil-raises-10-million-for-enterprise-scale-homomorphic-encryption/)

Other interesting things built on top of FHE:

SQL database where data and queries are fully encrypted:
[https://github.com/zerodb/zerodb](https://github.com/zerodb/zerodb)

Fully encrypted Brainfuck VM:
[https://github.com/f-prime/arcanevm](https://github.com/f-prime/arcanevm)

~~~
rhindi
There are two main approaches to FHE: homomorphic boolean circuits and
homomorphic numerical processing.

In the former (e.g. Cingulata), you convert a program into a boolean circuit
and evaluate each gate homomorphically. While this is general purpose, it also
means you decompose functions that could be done in one instruction into many
binary operations, which makes it very slow. That’s usually what people refer
to when they say FHE is slow.
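
To make the gate-decomposition cost concrete, here is a minimal sketch that adds two 32-bit integers using only XOR/AND/OR gates and counts them. Plain Python bits stand in for encrypted bits, and the `Circuit` class and helper names are hypothetical, not Cingulata's API; in a real boolean-circuit scheme each gate call would be a homomorphic operation on ciphertexts.

```python
class Circuit:
    """Counts gate evaluations; plain 0/1 ints stand in for encrypted bits."""

    def __init__(self):
        self.gates = 0

    def xor(self, a, b):
        self.gates += 1
        return a ^ b

    def and_(self, a, b):
        self.gates += 1
        return a & b

    def or_(self, a, b):
        self.gates += 1
        return a | b


def to_bits(value, width):
    """Little-endian bit list of `value`, `width` bits long."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(circuit, a_bits, b_bits):
    """Add two equal-width bit lists gate by gate (wraps modulo 2**width)."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        partial = circuit.xor(a, b)
        out.append(circuit.xor(partial, carry))
        carry = circuit.or_(circuit.and_(a, b), circuit.and_(partial, carry))
    return out


circuit = Circuit()
total = from_bits(ripple_carry_add(circuit, to_bits(1234, 32), to_bits(5678, 32)))
print(total, circuit.gates)  # 6912 160
```

A single native `add` instruction becomes 160 gate evaluations in this naive adder (5 per bit), which is the blow-up described above.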

The other approach consists of operating directly on encrypted integers or
reals, and finding ways to do more complex computations (like a square
function) in one step. While this is obviously much faster, it is also limited
to whatever operations are supported by the scheme. This is what people refer
to when they say FHE can only do certain things.

For years, the tradeoff has basically been slow and general purpose, or fast
and limited. But there are new schemes being worked on, to be published soon,
that go way beyond what’s currently done, such as efficient deep learning over
encrypted data and other complex numerical processing.

Lots is coming out of labs and will be on the market within 2 years!

~~~
entelechy
Ooh, interesting! Are there any open-source implementations of homomorphic
numerical processing?

Have there been any efforts to combine both approaches?

~~~
hedora
Note that most useful homomorphic numerical encryption schemes are easily
breakable. Once you have equality you can usually de-anonymize user data. Many
companies have been burnt by this.

With a less than operator and the ability to encrypt chosen plaintext values,
you can decrypt arbitrary messages in a linear (in message size) number of
steps.
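
A minimal sketch of that attack, assuming the attacker can call two oracles: one that encrypts chosen plaintexts and one that compares two ciphertexts. `ToyComparableScheme` is a hypothetical stand-in (its XOR "encryption" is a placeholder and the comparison oracle cheats by decrypting internally); the attack itself only ever uses `encrypt` and `less_than`.

```python
import secrets


class ToyComparableScheme:
    """Hypothetical scheme exposing encryption plus a less-than operator."""

    def __init__(self, bits):
        self.bits = bits
        self._key = secrets.randbits(bits)  # never read by the attacker
        self.comparisons = 0

    def encrypt(self, m):
        # Placeholder encryption; NOT secure, just opaque to the attack code.
        return m ^ self._key

    def less_than(self, c1, c2):
        # Oracle: reports whether plaintext(c1) < plaintext(c2).
        self.comparisons += 1
        return (c1 ^ self._key) < (c2 ^ self._key)


def recover_plaintext(scheme, target_ct):
    """Binary search over the plaintext space using only the two oracles."""
    lo, hi = 0, (1 << scheme.bits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if scheme.less_than(scheme.encrypt(mid), target_ct):
            lo = mid + 1  # mid < target, so target lies above mid
        else:
            hi = mid      # target <= mid
    return lo


scheme = ToyComparableScheme(bits=32)
secret = 123456789
recovered = recover_plaintext(scheme, scheme.encrypt(secret))
print(recovered, scheme.comparisons)  # 123456789 32
```

One comparison per plaintext bit is enough, which is the "linear in message size" cost mentioned above.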

Arithmetic operations can often be used to build gadgets that bootstrap
comparison operations. For instance, with addition and equality you can
implement a comparison operation for low-medium cardinality fields.
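
As an illustration of such a gadget, here is a rough sketch that builds less-than out of homomorphic addition and an equality test. `ToyAdditiveScheme` is a deliberately insecure, hypothetical scheme (multiplication by a secret key modulo a prime) chosen only because it is deterministic and additively homomorphic; it also assumes the attacker can obtain an encryption of 1.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime, far larger than any plaintext used here


class ToyAdditiveScheme:
    """Hypothetical deterministic, additively homomorphic scheme (insecure)."""

    def __init__(self):
        self._key = secrets.randbelow(P - 1) + 1  # secret, in 1..P-1

    def encrypt(self, m):
        return (m * self._key) % P

    @staticmethod
    def add(c1, c2):
        # enc(a) + enc(b) == enc(a + b) as long as a + b < P
        return (c1 + c2) % P

    @staticmethod
    def equal(c1, c2):
        # Deterministic encryption: equal plaintexts give equal ciphertexts.
        return c1 == c2


def less_than(scheme, ct_a, ct_b, cardinality):
    """True iff a < b, for plaintexts known to lie in range(cardinality)."""
    one = scheme.encrypt(1)
    shifted = ct_a
    for _ in range(1, cardinality):
        shifted = scheme.add(shifted, one)  # now enc(a + k) for k = 1, 2, ...
        if scheme.equal(shifted, ct_b):
            return True                     # a + k == b for some k >= 1
    return False


scheme = ToyAdditiveScheme()
print(less_than(scheme, scheme.encrypt(3), scheme.encrypt(7), cardinality=100))  # True
print(less_than(scheme, scheme.encrypt(7), scheme.encrypt(3), cardinality=100))  # False
```

The loop runs up to `cardinality - 1` times, which is why this only works for fields with a small number of possible values.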

The field is littered with negative results that are being sold as secure,
practical systems. Be careful when using them on important data.

~~~
bondarchuk
> _and the ability to encrypt chosen plaintext values_

Isn't this a big assumption? The way I envision it is

1\. client encrypts data with their key

2\. server computes on data without decrypting and without needing the key

3\. client decrypts computation output with their key.

Or is it always required at step 2 that the server also has the key needed for
encryption (but not decryption obviously)?
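
The three steps can be sketched with a toy Paillier cryptosystem, which is only additively homomorphic (not fully homomorphic) and is used here purely as an illustration. The primes are tiny and hard-coded, and all function names are made up for this sketch. Note that in this public-key toy the modulus `n` is all the server needs to add ciphertexts, and it is also all that is needed to encrypt.

```python
import math
import secrets


def keygen(p=293, q=433):
    """Toy Paillier keys from tiny primes (insecure sizes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid for generator g = n + 1 (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Randomized encryption of m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Server side: multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


# 1. Client encrypts its data.
n, private_key = keygen()
c_salary = encrypt(n, 1200)
c_bonus = encrypt(n, 345)

# 2. Server computes on ciphertexts using only the public modulus n.
c_total = add_encrypted(n, c_salary, c_bonus)

# 3. Client decrypts the result with its private key.
print(decrypt(n, private_key, c_total))  # 1545
```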

~~~
littlestymaar
> Isn't this a big assumption?

The standard security criteria for modern general-purpose encryption require
that your scheme resist adaptive chosen-ciphertext attacks. Chosen plaintext
is a way weaker attack (the hierarchy being: known plaintext < chosen
plaintext < chosen ciphertext < adaptive chosen ciphertext).

It may be OK for some situations, but it requires being much more cautious
than with regular crypto (which is already error-prone…).

------
speedgoose
How encrypted and private is the encrypted data in such systems?

When I looked at encrypted databases, the real ones, not the encrypted-at-rest
databases, I read comments saying that the crypto was too weak to be of any
use outside research: a neat research topic that will be great eventually, but
not ready for production.

So I went with the classic and simple solution: encrypt with AES-256-GCM, and
decrypt and re-encrypt whenever I manipulate the data.

Does a system like Cingulata offer encryption as strong as or better than
AES-256-GCM?

~~~
qayxc
Systems like Cingulata are true end-to-end encrypted systems. No data is
decrypted at any point in the process.

All data manipulation and processing takes place on encrypted data, as
opposed to the encrypted database you mentioned, which still decrypts its
contents in memory prior to processing.

The reason homomorphic encryption is far from being ready for production is
that all operations (e.g. all your algorithms and programs) need to be
transformed into a virtual circuit that operates on ciphertext encrypted by a
specific algorithm.

This is akin to translating your software into an inefficient byte code that's
then dynamically executed by an obnoxiously slow interpreter.

The great part is that you can simply encrypt your data on a local (and
trusted) machine, send the cipher text into the cloud for indexed storage or
processing and do your queries or operations on encrypted data. At no point
will your data ever be decrypted on the remote machine.

So there's great potential there w.r.t. privacy and cloud computing (and
especially AI where training data is often the "magic sauce" that gives your
company an edge over the competition) and SaaS.

~~~
speedgoose
Thanks for the explanation. The byte code analogy made me understand the
technology a lot better.

------
hansdieter1337
Reminds me of my work with CryptDB. A framework to run SQL queries over
encrypted data:
[https://github.com/CryptDB/cryptdb](https://github.com/CryptDB/cryptdb)

One important fact about homomorphic and other encryption schemes you can
calculate on: It leaks information! E.g., if enc(x) + enc(y) = enc(z), you
gain the knowledge that x + y = z. With enough data, it’s easy to obtain the
unencrypted data without the secret(s).

~~~
gautamcgoel
I think you meant homomorphic, not homophobic...

~~~
hansdieter1337
ah, hehe, yes indeed. Thank you, auto-correct

------
dekhn
During the COVID freakout, I opined that all the CPU/GPU spent on Folding@Home
was unlikely to create any sort of interesting breakthrough and it used an
absurd amount of CPU/GPU to not make breakthroughs. In the past, I've been
kind of opposed to using FHE on health data (seems wasteful; instead, it should
be run on trusted compute with a decrypt and encrypt key). But given the amount
of compute wasted on CPU/GPU molecular dynamics... could we instead process
encrypted health data on consumer devices? (I'm not aware of any health data
analytics problems that need to scale over millions of consumer devices.)

~~~
lacker
For “big data” type tasks, the cost of shuffling around the data to thousands
of different consumer devices probably makes it ineffective. You’re better off
just paying for one centralized data warehouse, rather than all the custom
coding and decentralized bandwidth to make some consumer-based strategy work.
This seems more like it could be useful for building things like decentralized
encryption protocols.

------
pimlottc
> Cingulata (pronounced "tchingulata")

As an English speaker, this makes me more confused about the pronunciation,
not less.

~~~
zimpenfish
Same. You can't just steal words[1] and change the pronunciation!

[1]
[https://en.wikipedia.org/wiki/Cingulata](https://en.wikipedia.org/wiki/Cingulata)

~~~
asjw
Cingulata comes from the Latin words cingŭla or cingŭlum and it should be
pronounced with the soft C (like in Tchaikovsky)

> History and Etymology for Cingulata New Latin, from Latin cingulum, cingula
> girdle + New Latin -ata

[https://www.merriam-webster.com/dictionary/Cingulata#:~:text...](https://www.merriam-webster.com/dictionary/Cingulata#:~:text=History%20and%20Etymology%20for%20Cingulata,cingula%20girdle%20%2B%20New%20Latin%20-ata)

It's a common mistake English speakers make all the time.

The rule is quite simple: a C followed by the vowels e and i is always
pronounced tch in Latin.

The Latins used K for the "hard C" as in "corn" and S for the s sound as in
"cent".

~~~
ReactiveJelly
If Tchaikovsky is Russian and the translation of his name into written English
is arbitrary and it's pronounced "Chaikovsky", what is the "T" for?

~~~
asjw
That's also very simple: it's the transliteration of the phonetic symbol
_tʃ_, which represents the soft C sound as in cheese.

That's what happens when a language steals a foreign word (Latin in this case)
and changes its pronunciation

In Latin, the way groups of letters are pronounced is (almost) unambiguous;
it's a phonetic alphabet in itself.

If you change the pronunciation, that part is lost and you have to rely on
recollection instead of recognition.

It also means that if you don't already know a word, you can't be sure how
it is pronounced.

------
asdfaoeu
47 upvotes, no comments. I'm guessing no one understands this.

~~~
jacobush
Yep. Was hoping for someone to debunk or explain. My (probably deeply flawed)
hunch was that one would need some kind of specialised virtual machine or
something to make homomorphic stuff. I'm babbling now.

Yes, someone please explain Cingulata.

~~~
qayxc
Cingulata is a system for secure, privacy preserving applications based on
homomorphic encryption.

Homomorphic encryption is a form of cryptography that allows for operations on
encrypted content to yield the same results (though encrypted) as if applied
to plain text input.

Basically, the framework translates algorithms implemented in C++ into a
virtual boolean circuit that operates on encrypted data. This means you can run
your database, AI training, page ranking, etc. on encrypted data for truly
end-to-end encrypted processing or computation on untrusted devices or
environments (cloud computing!).

This comes at a price, though, as the virtual circuit is basically a software
interpreter for your original algorithm and thus is abysmally slow compared to
the original code...

At least that's my understanding of the system.

~~~
radres
So I can give an executable to anyone in the world and they cannot get to my
source code?

~~~
layoutIfNeeded
No.

You have a function F which maps inputs x to outputs y. You transform this
function into a new one, F', that maps the encrypted inputs encrypt(x) to the
encrypted outputs encrypt(y), _without knowing how to decrypt_.

    
    
       F(x) = y 
       F'(encrypt(x)) = encrypt(y)
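
A tiny concrete instance of this relation, as a sketch only: textbook RSA happens to be multiplicatively homomorphic, so for the function F(x) = 7 * x one can build an F' that works on ciphertexts using nothing but public values. Textbook RSA is neither fully homomorphic nor secure in this form, and the parameters below are the usual classroom toy values, not anything Cingulata uses.

```python
# Classroom-sized textbook RSA parameters (p = 61, q = 53); insecure by design.
N = 61 * 53  # public modulus, 3233
E = 17       # public exponent
D = 2753     # private exponent, E * D = 1 mod (60 * 52)


def encrypt(x):
    return pow(x, E, N)


def decrypt(c):
    return pow(c, D, N)


def F(x):
    """The plaintext function."""
    return 7 * x


def F_prime(c):
    """Same function on ciphertexts; uses only the public key (N, E)."""
    return (c * encrypt(7)) % N


x = 12
y_encrypted = F_prime(encrypt(x))  # computed without ever seeing x or D
print(decrypt(y_encrypted), F(x))  # 84 84
```

The equality only holds while F(x) stays below N, since all arithmetic is modulo N.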

~~~
russfink
This is correct. Additionally, the entity performing the computation does not
get to learn the answer. This is an important distinction between a technique
such as this, and software obfuscation.

