
Wiretapping End-To-End Encrypted VoIP Calls: Real-World Attacks on ZRTP - TjWallas
https://www.sufficientlysecure.org/2017/03/15/zrtp.html
======
mtgx
For those interested, Signal doesn't seem to use ZRTP anymore:

> _The new Signal voice and video beta functionality eliminates the need for
> ZRTP. The "signaling" messages used to set up the voice/video beta calls
> (offer/answer SDPs, ICE candidates, etc) are transmitted over the normal
> Signal Protocol messaging channel, which binds the security of the call to
> that existing secure channel. It is no longer necessary to verify an
> additional SAS, which simplifies the calling experience._

[https://whispersystems.org/blog/signal-video-calls-beta/](https://whispersystems.org/blog/signal-video-calls-beta/)

And it's not in beta anymore:

[https://whispersystems.org/blog/signal-video-calls/](https://whispersystems.org/blog/signal-video-calls/)
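To illustrate how carrying the signaling over an already authenticated channel can stand in for the SAS: in standard WebRTC, the SDP offer/answer carries an `a=fingerprint:` line with the hash of the peer's DTLS certificate, and the media handshake is checked against it. If the SDP itself arrived over the secure messaging channel, a MitM cannot swap in its own certificate unnoticed. This is a minimal sketch, assuming SHA-256 fingerprints and using placeholder bytes instead of real DER certificates; the helper names are hypothetical and this is not Signal's actual code.

```python
import hashlib
import hmac


def sdp_fingerprint(sdp):
    """Return (hash_algorithm, fingerprint) from the first a=fingerprint line of an SDP."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=fingerprint:"):
            algo, value = line[len("a=fingerprint:"):].split(" ", 1)
            return algo.lower(), value.strip().upper()
    raise ValueError("SDP has no a=fingerprint line")


def cert_fingerprint(cert_der: bytes) -> str:
    """SHA-256 of a DER certificate, formatted as colon-separated uppercase hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def media_is_bound(sdp_from_secure_channel: str, cert_der: bytes) -> bool:
    """True if the DTLS certificate matches the fingerprint in the authenticated SDP."""
    algo, expected = sdp_fingerprint(sdp_from_secure_channel)
    if algo != "sha-256":
        return False  # this sketch only handles SHA-256 fingerprints
    return hmac.compare_digest(expected, cert_fingerprint(cert_der))


# Usage with placeholder bytes standing in for real DER certificates.
peer_cert = b"placeholder: peer's DTLS certificate"
sdp = "v=0\r\na=fingerprint:sha-256 " + cert_fingerprint(peer_cert) + "\r\n"
print(media_is_bound(sdp, peer_cert))                  # True
print(media_is_bound(sdp, b"placeholder: MitM cert"))  # False
```

The security of the call then reduces to the security of the channel that delivered the SDP, which is why no separate SAS is needed.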

~~~
sufficient
Author of the paper here.

Yup, with regard to Signal our findings are already obsolete :D I think the
new Signal developments are great. It is better to allow only one key
verification mechanism, for unified usability, and to also use key continuity.
Previously, the SAS had to be verified again for every call.
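Key continuity, as used here, means the app remembers a peer's identity key once it has been seen (or verified) and only involves the user when that key changes, rather than asking for a fresh SAS comparison on every call. A minimal trust-on-first-use sketch, assuming a hypothetical `KeyContinuityStore` backed by an in-memory dict; it is neither Signal's implementation (Signal's user-facing check is its safety numbers) nor ZRTP's retained-secret cache.

```python
import hashlib
from enum import Enum


class Status(Enum):
    NEW = "new"          # first contact: pin the key (trust on first use)
    MATCH = "match"      # same key as before: no user action needed
    CHANGED = "changed"  # key differs from the pinned one: warn, ask to re-verify


class KeyContinuityStore:
    """Hypothetical TOFU store mapping a peer ID to its identity-key fingerprint."""

    def __init__(self):
        self._pinned = {}

    @staticmethod
    def fingerprint(identity_key: bytes) -> str:
        return hashlib.sha256(identity_key).hexdigest()

    def check(self, peer_id: str, identity_key: bytes) -> Status:
        fp = self.fingerprint(identity_key)
        pinned = self._pinned.get(peer_id)
        if pinned is None:
            self._pinned[peer_id] = fp
            return Status.NEW
        # A changed key is reported but NOT re-pinned; the user must decide.
        return Status.MATCH if pinned == fp else Status.CHANGED


store = KeyContinuityStore()
print(store.check("alice", b"alice-key-1"))  # Status.NEW
print(store.check("alice", b"alice-key-1"))  # Status.MATCH
print(store.check("alice", b"mitm-key"))     # Status.CHANGED
```

The trade-off is that protection now rests on the first contact (or an explicit check) being genuine; after that, an attacker has to defeat the channel's crypto rather than a per-call voice comparison.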

~~~
jugbee
But doesn't that mean that with Signal you now only have to wiretap it once
and you're good to go, since there's no SAS to check every time?

~~~
Johnny_Brahms
Sure, but "wiretapping it once" would mean breaking a lot of well-studied and,
so far, unbroken crypto.

------
rdtsc
It would be more interesting to see how feasible it is to crack the in-band
SAS (short authentication string) when callers verify it verbally.

Deep learning and the ability to train on a specific caller's voice [1] and
then mimic it might be an interesting attack vector. In practice, Silent
Circle's implementation does something interesting: instead of SAS numbers it
uses dictionary words, so you end up with something like "Pink Elephant
Salad". You could probably MitM that. However, callers are then supposed to
make some extra puns or discuss it a bit, saying something like "Ha-ha! I
wonder how tasty an elephant salad would be." If, after MitM-ing, the string
on the other side was "Plastic Blue Llamas", a MitM attack becomes more
obvious.

[1] [http://research.baidu.com/deep-voice-production-quality-text...](http://research.baidu.com/deep-voice-production-quality-text-speech-system-constructed-entirely-deep-neural-networks/)
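Why a MitM shows up in the SAS at all: each leg of an intercepted call runs its own key agreement, so the two legs end up with different session secrets and, with high probability, different SAS values. The sketch below assumes a toy Diffie-Hellman over a Mersenne prime (insecure, for illustration only), takes the leftmost 12 bits of a SHA-256 hash of the shared secret, and renders them with three small hypothetical word lists. Real ZRTP (RFC 6189) derives its SAS hash with its own KDF and its word mode uses the PGP word list; neither is reproduced here.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters: a Mersenne prime modulus, NOT a secure group.
P = 2**127 - 1
G = 3

# Hypothetical 16-word lists (4 bits per word); real schemes use far larger lists.
WORDS_A = ["pink", "plastic", "quiet", "rapid", "amber", "brave", "calm", "dusty",
           "eager", "fuzzy", "giant", "hollow", "icy", "jolly", "lucky", "misty"]
WORDS_B = ["blue", "elephant", "falcon", "garden", "harbor", "island", "jungle", "kettle",
           "lantern", "meadow", "nugget", "orchard", "pepper", "quartz", "river", "saddle"]
WORDS_C = ["salad", "llamas", "anchor", "biscuit", "candle", "donkey", "engine", "feather",
           "glacier", "helmet", "igloo", "jacket", "koala", "ladder", "magnet", "needle"]
assert all(len(words) == 16 for words in (WORDS_A, WORDS_B, WORDS_C))


def keypair():
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)


def shared_secret(priv, peer_pub):
    return pow(peer_pub, priv, P).to_bytes(16, "big")


def sas_words(secret: bytes) -> str:
    """Render the leftmost 12 bits of a hash of the secret as three words."""
    digest = hashlib.sha256(b"toy-sas" + secret).digest()
    bits = int.from_bytes(digest[:2], "big") >> 4  # keep 12 of the 16 bits
    return " ".join((WORDS_A[bits >> 8], WORDS_B[(bits >> 4) & 0xF], WORDS_C[bits & 0xF]))


# Honest call: both ends derive the same secret, so they read out the same words.
a_priv, a_pub = keypair()
b_priv, b_pub = keypair()
assert sas_words(shared_secret(a_priv, b_pub)) == sas_words(shared_secret(b_priv, a_pub))

# MitM: Mallory runs a separate key agreement with each side.
m1_priv, m1_pub = keypair()  # leg Alice <-> Mallory
m2_priv, m2_pub = keypair()  # leg Mallory <-> Bob
print("Alice reads:", sas_words(shared_secret(a_priv, m1_pub)))
print("Bob reads:  ", sas_words(shared_secret(b_priv, m2_pub)))
```

The two MitM lines will almost always differ, but with only 12 bits they collide roughly once in 4,096 runs, which is one reason SAS length matters. ZRTP also has the initiator commit to a hash of its DH value before seeing the responder's, so a MitM cannot simply grind keys until both SAS values match.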

~~~
sufficient
Author of the paper here.

There is existing work on testing the feasibility of impersonating another
person's voice. We discuss it in the related work section at the end of the
paper.

I think that in the long run, SAS will no longer be a sufficient
authentication technique due to advances in speech synthesis. To prolong
ZRTP's life, we propose using sentences instead of words/characters. This is
discussed in detail in our best practices section.
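One hypothetical way to turn SAS bits into a sentence rather than a list of words is to fill the slots of a fixed grammatical template, one word list per slot, with each slot consuming a whole number of bits. This is only an illustration under that assumption, not the scheme from the paper's best practices section; the template and word lists are made up, and four slots of eight words carry just 12 bits, so a real design would need more slots or much larger lists.

```python
import hashlib

# Hypothetical sentence template: each slot draws from its own word list.
# List sizes must be powers of two so each slot consumes a whole number of bits.
TEMPLATE = "the {0} {1} {2} a {3}"
SLOTS = [
    ["brave", "sleepy", "purple", "tiny", "loud", "clever", "old", "happy"],            # adjective
    ["walrus", "pilot", "teacher", "robot", "farmer", "parrot", "baker", "wizard"],     # subject
    ["paints", "sells", "juggles", "hides", "bakes", "tickles", "repairs", "counts"],   # verb
    ["bicycle", "pancake", "volcano", "trumpet", "cactus", "sofa", "comet", "teapot"],  # object
]


def slot_bits(words):
    n = len(words)
    assert n & (n - 1) == 0, "list size must be a power of two"
    return n.bit_length() - 1


def sas_sentence(sas_hash: bytes) -> str:
    """Render the leftmost bits of a SAS hash as a sentence, one slot at a time."""
    total = sum(slot_bits(words) for words in SLOTS)
    assert len(sas_hash) * 8 >= total, "hash too short for this template"
    value = int.from_bytes(sas_hash, "big") >> (len(sas_hash) * 8 - total)
    picks = []
    for words in reversed(SLOTS):  # peel bits off the low end, last slot first
        b = slot_bits(words)
        picks.append(words[value & ((1 << b) - 1)])
        value >>= b
    return TEMPLATE.format(*reversed(picks))


print(sum(slot_bits(w) for w in SLOTS), "bits per sentence")
print(sas_sentence(hashlib.sha256(b"example session").digest()))
```

Adding a slot or doubling a list adds bits without changing the code, which is the knob the SAS-length discussion is about.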

~~~
Fox8
This is a fascinating and well elaborated article!

I noticed that UX/UI is important and that the SAS should increase in length.
What would you recommend for a good ZRTP implementation?

Or should we start discussing phasing out ZRTP in favor of something like the
Matrix protocol, or even Signal's?

~~~
lozf
Both Matrix and Signal use WebRTC for VoIP, so the media is encrypted by
default. Call setup and signaling are also encrypted by default with Signal,
and can be with Matrix: it's automatic if the room from which the call is
established is already encrypted.

I know Signal attempts to prevent data leakage by forcing the Opus codec to
use a constant bitrate (CBR) instead of its default variable bitrate (VBR).
I'm not sure whether Matrix implements anything similar yet.
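Why CBR matters even though the audio is encrypted: SRTP hides the payload bytes but not the packet lengths, and with VBR the size of each encrypted frame tracks how many bits the encoder spent on that stretch of audio. Published research has used such length sequences to guess spoken phrases in encrypted VoIP. The toy model below assumes a made-up mapping from per-frame "complexity" to payload size; it does not call Opus or reproduce its rate control, it only shows what an on-path observer can and cannot see.

```python
import statistics

RTP_HEADER = 12  # bytes, fixed RTP header
SRTP_TAG = 10    # bytes, 80-bit SRTP auth tag (a common default)


def vbr_packet_size(complexity: float) -> int:
    """Toy VBR model: payload grows with how 'busy' the audio frame is (0.0 to 1.0)."""
    payload = 10 + round(150 * complexity)  # made-up mapping, not Opus rate control
    return RTP_HEADER + payload + SRTP_TAG


def cbr_packet_size(complexity: float, payload: int = 80) -> int:
    """Toy CBR model: the complexity is ignored; every frame gets the same payload size."""
    return RTP_HEADER + payload + SRTP_TAG


# Hypothetical per-frame complexity: silence, then speech, then a pause.
frames = [0.0, 0.0, 0.9, 0.7, 0.1, 0.8, 0.0]

vbr = [vbr_packet_size(c) for c in frames]
cbr = [cbr_packet_size(c) for c in frames]

# An on-path observer sees only these lengths, never the decrypted audio.
print("VBR sizes:", vbr, "variance:", statistics.pvariance(vbr))
print("CBR sizes:", cbr, "variance:", statistics.pvariance(cbr))
```

With VBR the length pattern mirrors the speech pattern; with CBR every packet looks the same, at the cost of spending the full bitrate even on silence.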

------
lallysingh
AFAICT, this looks more like attacks on ZRTP implementations than an attempt
to find weaknesses in the underlying protocol.

~~~
jessaustin
One sort of assumes that from "_Real-World_ Attacks", no?

~~~
lallysingh
Or real-world attacks on solid implementations.

------
ameister14
This is fascinating. Thanks for writing this paper, guys.

