
What end-to-end encryption should look like - lopespm
https://jitsi.org/blog/e2ee/
======
blendergeek
This is a dupe of
[https://news.ycombinator.com/item?id=22855407](https://news.ycombinator.com/item?id=22855407)

The URL is slightly different but it is the same.

------
jdc
In short:

* currently the Jitsi Meet videobridge sees unencrypted conversations

* they're changing that using a new WebRTC API called "Insertable Streams"

* it's currently in alpha with an open RFC

* they plan to use the double ratchet algorithm for key exchange in the future

~~~
tialaramex
The mention of double ratchet is confusing and maybe worrying in the sense
that it feels as though they don't understand what they're doing here.

The Double Ratchet is designed around synchronous linear messages:

Alice: "Hi Bob"
Bob: "Hey"
Alice: "Check out our new house guest!" <cat photo>
Bob: "Aww :cat-emoji: :heart:"

One of the two ratchets changes the key used for messages from Alice to Bob
each time Alice sends a message, ensuring that knowing one of these ephemeral
keys only gets you one message and resetting all the assumptions about how
much data can safely be encrypted with a single key. And of course likewise
from Bob to Alice.

The other ratchet uses a Diffie-Hellman-style algorithm to agree on entirely
new keys each time they exchange messages back and forth, in order to recover
from key compromise.
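The per-message ratchet described above can be sketched with nothing more than an HMAC chain. This is a toy illustration of the general shape of the construction, not Signal's exact KDF choices:

```python
# Toy sketch of the per-message (symmetric-key) ratchet: each send
# derives a fresh message key and advances the chain with a one-way
# function, so leaking one message key exposes only that message.
import hashlib
import hmac

def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance the chain: return (next_chain_key, message_key)."""
    next_chain_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    message_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key

ck = bytes(32)              # initial chain key, as if output by the DH ratchet
ck, mk1 = ratchet_step(ck)  # key for Alice's first message
ck, mk2 = ratchet_step(ck)  # key for her second; mk1 != mk2
assert mk1 != mk2
```

Recovering the chain key from a leaked message key would require inverting HMAC, which is what resets the assumptions about how much data one key can safely protect.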

But while this makes sense for an application that periodically sends
messages, it isn't a reasonable fit for video streaming. It doesn't make sense
to change the keys for each frame transmitted, for example.

I _guess_ it can make sense to have the system automatically change the keys
when participants leave, somehow, as this means a participant can't secretly
eavesdrop calls they seem to have left. But I don't see how that's the double
ratchet.

Their example 'foo' password is obviously a placeholder, and I can see that if
they used a scheme like Jitsi's default random VC names
("BearsMasticateSteakImmediately") you could have something that can't
reasonably be brute-forced, but exactly what happens with key exchange
definitely needs more thought.

The good news is that done correctly that's largely orthogonal to the
encryption problem. It's important, but it doesn't need to block the other
work.

~~~
OrgNet
> It doesn't make sense to change the keys for each frame transmitted for
> example.

If it doesn't hurt the reliability of the communication, I don't see the
problem... and it could have "hidden" benefits.

~~~
vermilingua
It could also have “hidden” complications, which I find much more likely.

------
kodablah
I have a question about the definition of "end-to-end encryption" with regard
to groups/rooms. Is it sufficient to have a single key shared by all "ends" in
the room, or must each user have its own key? If a single group key is
acceptable, must it be rotated as members come and go?

I'm just looking for a common definition of the term wrt WebRTC groups, for
fear of misusing it when describing a product. (For the purposes of this
clarification, pretend I have key distribution/derivation figured out, and
that whether I use asymmetric or symmetric encryption doesn't apply to the
question itself.)

~~~
crazygringo
I think it's very much a gray area.

True E2EE seems like each endpoint would have to have its own key. But the
bandwidth realities of group videoconferencing make it entirely infeasible for
each participant to be sending a separately encrypted stream to each other
participant. So the only realistic solution is for all members to share the
same key, which makes everything dependent on the security of the key
distribution. But then... is it really E2EE? I'm not sure there's a generally
accepted answer here.

~~~
oconnor663
If you already have pairwise E2E encryption working, you can use it to
distribute a shared key. It's the first part that's hard.
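That bootstrapping step can be sketched end to end. This is a toy (NOT production crypto): a real implementation would use X25519 plus an AEAD, but a Mersenne-prime DH group and an HMAC-SHA256 keystream stand in here to keep the sketch stdlib-only:

```python
# Toy sketch of bootstrapping a shared group key over pairwise channels:
# the room creator runs a DH exchange with each member, derives a
# pairwise key, and wraps the random group key under it. The server only
# ever sees public values and wrapped keys.
import hashlib
import hmac
import secrets

P = 2**521 - 1   # a Mersenne prime; toy DH group, NOT a vetted MODP group
G = 5

def derive_key(shared_secret: int) -> bytes:
    return hashlib.sha256(shared_secret.to_bytes(66, "big")).digest()

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # counter-mode keystream from HMAC-SHA256 (toy cipher stand-in)
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        out += bytes(a ^ b for a, b in zip(data[i:i + 32], block))
    return bytes(out)

# pairwise DH between the room creator and one member
creator_secret = secrets.randbelow(P - 2) + 2
member_secret = secrets.randbelow(P - 2) + 2
creator_pub = pow(G, creator_secret, P)
member_pub = pow(G, member_secret, P)

# creator wraps the random group key for this member
pairwise_key = derive_key(pow(member_pub, creator_secret, P))
group_key = secrets.token_bytes(32)
wrapped = keystream_xor(pairwise_key, group_key)

# member derives the same pairwise key and unwraps the group key
member_key = derive_key(pow(creator_pub, member_secret, P))
assert keystream_xor(member_key, wrapped) == group_key
```

The creator repeats the wrap step once per member, so the cost is linear in group size rather than quadratic in media streams.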

------
Snelius
Too early, jitsi needs lot of work to get good basic features first. Their
"videobridge" just a pieces of software without docs, arch description, has
not possibility to run as pure SFU without any other jitsi parts like Colibry
XMPP. And etc. "InsertableStreams" scheduled for Chrome only and future 84
version, still experimental and if it's a after codec processing then useless.

~~~
VWWHFSfQ
Janus [1] is a much more capable webrtc gateway that's pluggable to do
anything you want. Jitsi is like a Wowza type server. Stay away.

[1] [https://github.com/meetecho/janus-gateway](https://github.com/meetecho/janus-gateway)

~~~
jfim
Is it a turnkey solution? Janus seems more like a WebRTC gateway with no
directly usable end-user app (eg. video chat).

Jitsi Meet seems to be a turnkey thing where you can just self host an
instance and have users create video meetings without too much fuss.

~~~
gfodor
This might be useful:
[https://www.youtube.com/watch?v=u8ymYTdA0ko](https://www.youtube.com/watch?v=u8ymYTdA0ko)

Janus is more of a Swiss Army knife, vs. an all-in-one app for a specific
WebRTC use case.

------
ccktlmazeltov
> From this key we derive a 128bit key using PBKDF2. We use the room name as a
> salt in this key generation. This is a bit weak but we need to start with
> information that is the same for all participants so we can not yet use a
> proper random salt.
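
For concreteness, the quoted derivation corresponds to roughly this (the iteration count is my assumption; the post doesn't specify one):

```python
# Derive a 128-bit room key from the shared password, salted with the
# room name, as the quoted passage describes.
import hashlib

password = b"BearsMasticateSteakImmediately"  # the shared room password
salt = b"my-room-name"                        # room name as salt (weak, as the post admits)
key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=16)
assert len(key) == 16  # 128 bits
```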

1) You do not need to use a password-based KDF if your key is a random byte
array of >= 16 bytes. I expect that most people would use this feature by
copy/pasting the key into email or whatever channel they're using to
communicate the room link anyway.

2) I'm not sure about IV generation; maybe somebody with more knowledge of
SSRC/etc. can look at that.

3) Decryption errors do not kill the connection. That's atypical, but I think
that should be fine.

~~~
phoe-krk
_> 3) Decryption errors do not kill the connection. That's atypical, but I
think that should be fine._

I assume that this is so you can deal with the fact that some individual
frames have been lost and/or corrupted, as long as the next ones decrypt just
fine. That is important for video streams; I can imagine that the quality of
life of people using E2EE streams would suffer if the stream were highly
vulnerable to even a single corrupted bit within any of the frames.

~~~
ccktlmazeltov
In practice, a bunch of layers under the application layer have checksums: UDP
has a checksum, IP has one too, as does the data link layer... so maybe you
wouldn't even reach a naturally corrupted AES-GCM ciphertext? Not sure what
the miss probability for the UDP checksum is, though.
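For what it's worth, this is easy to put numbers on with the RFC 1071 one's-complement checksum that UDP and IP both use (a toy sketch; the packet bytes are made up):

```python
# RFC 1071 Internet checksum: catches every single-bit flip, but being
# only 16 bits, a random corruption passes undetected with probability
# around 2**-16; the 128-bit AES-GCM tag is what makes silent
# corruption effectively impossible.
import struct

def inet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"
    s = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while s >> 16:                      # fold carries back in
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

pkt = b"example payload for a datagram"
ok = inet_checksum(pkt)
corrupted = bytes([pkt[0] ^ 0x04]) + pkt[1:]   # flip a single bit
assert inet_checksum(corrupted) != ok          # single-bit flips are always caught
```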

------
rixtox
I want to point out that E2E encryption on the Web doesn't make your
conversation more secure than regular HTTPS encryption without E2E. The core
of the argument is the threat model. The whole point of using E2E for any
conversation is to make sure that even the service provider cannot read it,
so the threat model is clearly directed against the service provider.
However, the very same provider of your conversation channel also delivers
the underlying encryption application to the Web. That means that if the
service provider wants to act maliciously, it can always slip you an
application that steals your conversation, by simply not applying E2E
encryption, or by eavesdropping before encrypting.

The root of the problem is that Web applications don't have a root of trust.
As long as that problem is not addressed, E2E encryption on the Web will
remain meaningless.

~~~
jandrese
This kind of thinking is why PGP never took off.

Yes, your ISP could hack the binaries, break the HTTPS trust model somehow,
and alter what you download, but practically speaking that isn't going to
happen. Trying to build your system to defeat an adversary who has that kind
of power ends up making it too cumbersome for the 99.999% use case.

What I don't understand is why the server needs to decrypt the traffic at all.
Why not just have it rebroadcast the encrypted streams? The endpoints could
exchange crypto keys with public key crypto and to the video server it would
just be a bunch of bytes. Clients would turn on and off video streams from
different clients based on network conditions and how much screen real-estate
they have. Audio could even be encoded on a different channel so it could
always be forwarded even if the video is not.

~~~
ComodoHacker
> What I don't understand is why the server needs to decrypt the traffic at all

In short, to make it scale to many (tens to hundreds of) participants. Clever
techniques like Simulcast [0] and SVC [1] allow that, but the routing server
must support them to meet the different requirements of individual
participants.

0. [https://webrtchacks.com/sfu-simulcast/](https://webrtchacks.com/sfu-simulcast/)

1. [https://webrtchacks.com/chrome-vp9-svc/](https://webrtchacks.com/chrome-vp9-svc/)

------
m3kw9
As if Zoom or any professional company doesn't actually know. Bragging about
knowing what it means actually makes them sound amateurish.

~~~
VWWHFSfQ
Zoom has fundamental problems that make it hard for them to support E2E. They
only recently started sending video over WebRTC data channels; before that,
they were sending it over WebSockets. They're not even using the browser's
native media-streaming APIs. So E2E is a huge problem for them to implement.

------
sandov
Tangential (and noob) question: Could there be an encryption+compression
scheme where:

1. The sender sends an encrypted stream at K bps.

2. The server takes the encrypted stream and compresses it, without
decrypting anything. It then sends the encrypted+compressed stream at J bps
(J < K) to the end recipient.

3. The end recipient decrypts the compressed stream using a key provided by
the original sender, not the server.

This is with reasonably secure encryption and reasonably size-efficient
codecs, obviously. So step 2 would add compression on top of the compression
of the original codec.

Is this mathematically possible?

~~~
askvictor
Generally speaking, video compression algorithms are 'lossy' (they cause a
reduction in quality) and need to be able to 'see' the video in order to
compress it. The compression typically works by removing details that are not
perceivable by humans (this is also the case for image and audio compression)
and by sending just the changes from frame to frame, rather than each entire
frame. For both of these, the compression algorithm needs knowledge of the
video stream, which would be impossible if it's encrypted.

You could try to use a lossless compression algorithm (such as used by zip),
but those are effectively useless on video in general, and even more so on
encrypted data, which appears to be random.
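That last point is easy to check for yourself (toy data, just to show the effect):

```python
# Lossless compressors get nowhere on encrypted data, because good
# ciphertext is indistinguishable from random bytes.
import os
import zlib

redundant = b"la " * 20_000       # highly repetitive "plaintext"
random_ish = os.urandom(60_000)   # stands in for ciphertext

assert len(zlib.compress(redundant)) < len(redundant) // 100   # shrinks hugely
assert len(zlib.compress(random_ish)) > len(random_ish) - 100  # no gain at all
```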

~~~
sandov
Thanks for your answer.

> the compression algorithm needs knowledge of the video stream, which would
> be impossible if it's encrypted

That makes perfect sense, and I guess my theory of secure encryption +
post-re-compression fails because of this. But what if we didn't need
perfectly secure encryption, just per-block encryption? So the server knows
that you've sent 60 frames, but doesn't know what is in those frames.

What if the codec was made in such a way that the server knew that certain
blocks in the stream could be discarded to reduce size while the stream
without those blocks still makes sense to the recipient.

For example: the stream is composed of 64-byte blocks, but the codec says
that every third block is a discardable block that adds image quality but is
not essential. So, with this knowledge, the server drops the discardable
blocks when sending the data to people with low bandwidth, and sends the
original stream with all its blocks to those with high bandwidth.

It's an extremely naive scheme, but maybe this principle could be applied to
more complicated codecs, so the server only needs to know metadata about the
stream (where each block is and whether it's essential), but not the content
of the block itself (framebuffer and audio sample values).
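The scheme above can be sketched in a few lines (the framing and names here are made up for illustration; payloads would be real ciphertext):

```python
# Toy sketch of per-block thinning: each payload is encrypted end to
# end, but carries one cleartext header byte marking it essential (1)
# or discardable (0). The server filters on the header alone and never
# needs the key.
import secrets

def make_block(payload: bytes, essential: bool) -> bytes:
    return bytes([1 if essential else 0]) + payload  # payload is ciphertext

def server_thin(blocks: list[bytes]) -> list[bytes]:
    """Low-bandwidth path: forward only essential blocks, unread."""
    return [b for b in blocks if b[0] == 1]

# sender: every third block is an optional quality-enhancement block
stream = [make_block(secrets.token_bytes(64), essential=(i % 3 != 2))
          for i in range(9)]
thinned = server_thin(stream)
assert len(thinned) == 6  # the 3 discardable blocks were dropped
```

One caveat worth noting: the cleartext flag leaks a little metadata (which blocks are enhancement layers), which is exactly the kind of trade-off the real SVC-over-encryption designs have to weigh.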

I'm sorry if this idea is too dumb (and my English skills are not the best).

~~~
askvictor
Interesting; some time ago the Ogg audio transport was working on bitrate
peeling
[https://en.wikipedia.org/wiki/Bitrate_peeling](https://en.wikipedia.org/wiki/Bitrate_peeling)
with the basic idea that a stream can be encoded at one bitrate but served at
that or any lower bitrate. A simpler example is FM stereo radio: the main
carrier provides a mono audio signal, which works just fine, but if the
receiver can also pick up the stereo subcarrier (containing just the
difference of the left and right channels), it gets stereo.

Anyway, the Wikipedia page linked above points to the same concept for video:
[https://en.wikipedia.org/wiki/Scalable_Video_Coding](https://en.wikipedia.org/wiki/Scalable_Video_Coding)
so it might be feasible. Not sure how feasible/secure encrypting this would
be.

------
john37386
Shouldn't they take inspiration for the end-to-end encryption from OpenSSH?
As far as I know it's used by many worldwide and it seems fairly secure.
Maybe I'm missing something, though.

~~~
ffk
It looks like what they have now is OpenSSH-like: client 1 sends encrypted
data to the server, the server decrypts, re-encrypts, and sends it to
client 2.

This post describes how clients 1 and 2 can communicate with the server in
the data path while the server is unable to see the contents of the messages.

The hard part is managing keys between clients in a secure way.

~~~
edoceo
You said "server decrypts" but then also "server being unable to see
contents". How? If I decrypt, I see contents.

~~~
ffk
I was unclear in my comment. I was describing a before and after set of
solutions. The former allows for server side decryption. The latter approach
would prevent server side decryption.

~~~
edoceo
Now I get it, thx

