Zoom rolled their own encryption scheme, transmit keys through servers in China (citizenlab.ca)
1248 points by gasull 54 days ago | 302 comments

Matthew Green's article on this has a thread here: https://news.ycombinator.com/item?id=22771193

The Intercept article on it has a thread here: https://news.ycombinator.com/item?id=22767807

It's probably too much of a stretch to merge all these, because many comments are about specifics of those posts, and the ones that aren't are kind of generic and so maybe not worth merging anyway (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...).

Hi dang,

Thanks in advance for all your moderation efforts that make HN the site we all love to use on a regular basis.

I'm curious if you've ever considered writing a blog post about some of the things you've learned from your years of moderation? You spend so much time on HN, you must have seen lots of patterns and have lots of insights on...well, everything that gets posted on HN to everybody that posts on HN. I'd genuinely be interested in reading it if you ever did.

Cheers, a semi-anonymous HN user

At first, the site attracted about sixteen hundred daily visitors, and Graham moderated and maintained it himself. Today, around five million people read Hacker News each month, and it’s grown more difficult to moderate. The technical discussions remain varied and can be insightful. But social, cultural, and political conversations, which, despite the guidelines, have proliferated, tend to devolve. A recent comment thread about a Times article, “YouTube to Remove Thousands of Videos Pushing Extreme Views,” yielded a response likening journalism and propaganda; a muddled juxtaposition of pornography and Holocaust denial; a vague side conversation about the average I.Q. of Hacker News commenters; and confused analogies between white supremacists and Black Lives Matter activists. In April, when a story about Katie Bouman, an M.I.T. researcher who helped develop a technology that captured the first photo of a black hole, rose to the front page, users combed through her code on GitHub in an effort to undermine the weight of her contributions.

The site’s now characteristic tone of performative erudition—hyperrational, dispassionate, contrarian, authoritative—often masks a deeper recklessness. Ill-advised citations proliferate; thought experiments abound; humane arguments are dismissed as emotional or irrational. Logic, applied narrowly, is used to justify broad moral positions. The most admired arguments are made with data, but the origins, veracity, and malleability of those data tend to be ancillary concerns. The message-board intellectualism that might once have impressed V.C. observers like Graham has developed into an intellectual style all its own. Hacker News readers who visit the site to learn how engineers and entrepreneurs talk, and what they talk about, can find themselves immersed in conversations that resemble the output of duelling Markov bots trained on libertarian economics blogs, “The Tim Ferriss Show,” and the work of Yuval Noah Harari.

Not sure whether to agree or disagree.

That sounds like the Hacker News that I know. It's worth trawling through the comments because you get gems among the dross, but it should be a trawl.

>The site’s now characteristic tone of performative erudition—hyperrational, dispassionate, contrarian, authoritative—often masks a deeper recklessness. Ill-advised citations proliferate; thought experiments abound; humane arguments are dismissed as emotional or irrational. Logic, applied narrowly, is used to justify broad moral positions. The most admired arguments are made with data, but the origins, veracity, and malleability of those data tend to be ancillary concerns.

Ah yes, I remember this article. Oddly enough, this is a major reason why I still enjoy browsing HN. There are few places left that cover a variety of topics while also encouraging this open style of discussion.

Another reason IMHO that this site has maintained its relative quality is the policy of effectively banning memes (whether textual or visual) and not embedding images from links or allowing them to be attached.

Someone needs to do a deep cognitive study of how memes, especially in image form, affect cognition during discourse. Whatever is going on it seems obvious that their presence immediately reverts people to a juvenile junior high level of discussion at best.

They also seem to lead to the development of this kind of obscure signaling culture hive mind. I don't even know how to describe it. In Star Trek TNG they called it the "borg songs," the sort of emotional-cultural carrier wave that pervaded the Borg collective.

Meme-dominated sites like 4chan immediately turn me off. I find that whole style creepy and culty and gross and it feels like some kind of soft mind control technique. I felt this way long before 4chan became "Stormfront for nerds," but the fact that this style of discourse would create an environment compatible with that way of thinking doesn't shock me at all.

When you mentioned borg songs, I internally cried because of the new Picard show. It broke the emotional core of TNG for a cheap thrill, and your mention of Borg songs reminded me of that.

Would still be interesting if someone could train language / interaction model on different years of hacker news to see how a certain topic would be discussed differently over the years.

Neat idea. HN explicitly allows reposts of older links after some time, so you would be able to see how discussion of literally the same article has changed in some cases.

I think the view is accurate from high enough, but experience depends a lot on some element that's probably time and possibly the alignment of the planets. You can tell when Mercury is in Gatorade.

Sometimes people are really cool. Sometimes people are fools. The fools make good points sometimes. The cool people can be the worst. The roles are fluid. I've had my fool moments.

dang, who I assume is a cyborg driven by several different mods, generally makes good calls. There aren't many places online where I feel like I can point out a problem to the mods and not worry about being called an idiot. Or banned.

Amount of 'smart sounding' and trigger words that are shoveled into this block of cra..text to semi-insult people is the exact reason I'm reading HN and practically nothing else online.

I find that the opposite is true. Most comments are emotional, and I have to search for controversial comments to find anything objective/worth reading. Like Reddit, this site is biased to the political left (USA). I wish we could just avoid all politics in tech.

I don't think it's possible to avoid politics in any profession. Teachers care about education reform, which is a political subject. Doctors care about the healthcare system, which is a political subject. Civil engineers care about building codes and safety regulations, which are political subjects. Accountants care about tax laws, which are an incredibly political subject. Social workers care about welfare programs, which is also a very political subject. You might not talk about it in the realm of politics while at work, but it has an impact on your work as well as the lives of the public.

You could even argue that being "apolitical" actually means that your political opinion is that the status quo is acceptable, which other people may disagree with. For instance, some people believe that selling water is unethical because it's a basic human need which you should always have free access to -- saying that the status quo is acceptable means that you don't think that for-sale water is unethical. There's nothing wrong with that opinion, but it is an opinion (and a political one at that).

I don't express my political beliefs here because I think I would be massively down voted. I think that's true for lots of other users.

The way it's presented makes a huge difference. People are obliterated for any political opinion if presented in a way that treats politics like a game instead of something that affects everyone.

Make a polite, clear-headed enough argument for something like eugenics and people will upvote you. It's troubling, but I can make a pro-socialism post under the same conditions. It's an uneasy peace.

> You could even argue that being "apolitical" actually means that your political opinion is that the status quo is acceptable

I agree with this. I see Cultural Hegemony (domination-justification stories) being very strong in unexpected places. Family members or close friends who indirectly benefit from the artificial scarcity mechanisms (in SV especially Intellectual Property) that make up the various social class realities of rentier capitalism [1]. What separates me and a mother of three in Bangladesh, who is getting paid starvation wages making the designer clothes I wear [2], is that I was lucky enough to have been born in the global North, in an upper-middle class family. I was able to grow up on the inside of the fort, and benefit from a rich inheritance [3].

[1] https://www.resilience.org/stories/2017-08-03/book-day-corru...

[2] https://www.youtube.com/watch?v=OaGp5_Sfbss

[3] https://www.youtube.com/watch?v=NGnDDhco4gw

I'm starting to increasingly appreciate comments like this, because they appear regularly on both sides of the ideological divide, helping me keep my binary tree fresh and balanced. A few that go the opposite way:




Sorry if that sounded snarky—I don't mean to pick on you personally, and I don't disbelieve you. But the perception is explicable by the notice-dislike bias: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que....

I think people often mistake bias in the moderation for bias in the moderators, ignoring that much moderation depends on flagging, which in turn is influenced by the bias of the people who were attracted to the conversation.

People are political animals. The only way to avoid politics is to avoid people. Not exactly a workable proposition if you want to change the world with a billion dollar unicorn.

I think that's true, but it's also true that the distinction between curious conversation and ideological battle is a useful and actionable one, and that's basically what the site guidelines are staked on: https://news.ycombinator.com/newsguidelines.html.

So, if we can not avoid politics it is ok to encourage some views while discouraging the opposite views?

How can we know for sure we are right or even on the good side, whatever "good" means?

First we'd need to establish a general frame of reference, a perspective, which could be a human one.

From that perspective, the lowest common denominator is that bad is what harms humanity, e.g. by disregarding human rights. Human rights then becomes the new frame of reference, meaning that something can't be good if it's not within that frame.

Power struggle then becomes what are rights, and who owes them to whom? Who gets what rights and how are they enforced is pretty much the core of political struggle.

Power struggle based on immediate and future system relevancy takes place within that frame (innovation), in a democracy that frame (human rights) shouldn't be broken because it damages the belief in the system itself. An informed society acts in its best interest, if it doesn't the education system or entire system is corrupted and broken.

Perhaps it would be better to assume (until clear otherwise) that everyone has good intentions, but a different perspective? Then we can try to work together, as partners, to understand each other's perspective and position, rather than thinking of others as opponents to be defeated.

> this site is biased to the political left (USA)

when you're the USA, the world looks biased to the political left ...

Fascinating, thanks for sharing. Makes me reflect on the (relatively few) comments I've made here, and if I'm making a positive impact on the community.

Oh come on. It's always the good users who worry about this.

(I looked at your comments and the ones I saw are not only fine, they're exactly what HN is for.)

Hey, thanks! And kudos for helping to cultivate such a great community despite its massive growth.

I wonder how that approach can continue to make sense unless they hire more Lisp moderators.

I don't understand your point but I relate to the idea of hiring more Lisp moderators.

It sounds like you guys were hired to be both Lisp programmers for the site infrastructure and moderators. Presumably as the site grows, you will need to scale more of you.

I'm late to this thread because for some reason I only just noticed it. Thanks for the nice comment! It's fun to hear from someone who is just curious.

I do think of writing about HN, but not a blog post, because I can't think of a good reason. I don't enjoy writing (https://news.ycombinator.com/item?id=14431262), at least not that kind of thing. Somehow I maneuvered myself into a job that is basically being a writer...of that kind of thing. Writing because you enjoy it, or were born to do it, is the best reason; failing that, it could be a means to an end. But what end?

I do intend to write about HN in a different way. There are core moderation topics I've been posting about and returning to for years now (https://news.ycombinator.com/threads?id=dang). I'm inching my way towards distilling those comments into mini-essays, or discrete paragraphs, about the principles of HN: what kind of community we want HN to be and how to get there. I think there's an opportunity for a step change if we become more conscious as a group about what we're trying to be and why.

The "why" is critical because the rules here can seem a bit lame if you think of them only as constraints. Many creative minds are the sort that don't see why they should respect arbitrary constraints. For such people, the way to win them over is to derive the rules from first principles: to show how it's in their interest to follow them because that's the way we get to have, and keep, an interesting community over time. That's what I try to do in moderation comments, at least on good days: not to scold people, but to persuade them.

My intention is to distill all that commentary into a series of posts that can form sort of the case law, or midrash, or zoography, of the core topics around HN community. I don't know if anyone has noticed, but I already do that by linking back to previous moderation comments using specific keywords. Some examples:






There are at least a dozen of these. My idea is to consolidate that material into little essays that interpret, explain, and basically act as hermeneutics to the site guidelines. Then maybe, in the future, I won't have to type out a whole new comment on $topic; I can just link to the canonical explanation. In other words, I intend to write more in order to write less! Other than that, for learnings and patterns about moderating HN, your best bet is to lure me with the right kind of beer. Or just ask a question here.

> Other than that, for learnings and patterns about moderating HN, your best bet is to lure me with the right kind of beer. Or just ask a question here.

If that's an actual offer, I'd be happy to buy you a drink when the quarantine is over.

The serverside key handling stuff is bad, but generally known (Zoom has features whose natural implementation requires them to keep keys serverside).

People are dunking on Zoom for rolling their own crypto and coming up with AES-128-ECB. This is also bad, but people should be aware that it's a lot more complicated than "you can see penguins through it".

You can see penguins through an ECB-encrypted bitmap because discrete blocks of the bitmap image repeat, and thus have the same ciphertext, and these correspondences carry obvious meaning in a bitmap. The same is not automatically true of video or audio data with normal codecs. Aaron Toponce points out that sensor noise will likely scramble ECB ciphertexts, for instance.
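
For anyone who wants to poke at this, here is a minimal sketch of the mechanism. It is not Zoom's code and not real AES: `toy_block` is a hypothetical stand-in (truncated HMAC-SHA256) that only shares the one property that matters here, namely that the same key and the same 16-byte input always give the same output. The "noisy" input is an invented model of sensor noise in the low two bits.

```python
import hashlib
import hmac
import random
from collections import Counter

BLOCK = 16  # AES block size in bytes


def toy_block(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for one AES block encryption.

    It is a deterministic keyed function of a single block, which is the only
    property this demo needs. It is NOT a real cipher and cannot be decrypted.
    """
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def split_blocks(data: bytes) -> list:
    """Cut data into consecutive 16-byte blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_like(key: bytes, data: bytes) -> bytes:
    """ECB structure: every block is transformed independently."""
    return b"".join(toy_block(key, b) for b in split_blocks(data))


def ctr_like(key: bytes, data: bytes) -> bytes:
    """CTR structure: XOR each block with a keystream derived from a counter."""
    out = bytearray()
    for i, b in enumerate(split_blocks(data)):
        pad = toy_block(key, i.to_bytes(BLOCK, "big"))
        out += bytes(x ^ y for x, y in zip(b, pad))
    return bytes(out)


def repeated_blocks(data: bytes) -> int:
    """Number of blocks whose value occurs more than once in data."""
    counts = Counter(split_blocks(data))
    return sum(n for n in counts.values() if n > 1)


key = b"0123456789abcdef"
flat = b"\xff" * (BLOCK * 8)  # flat region of an uncompressed bitmap
rng = random.Random(0)
noisy = bytes(0xFC | rng.randrange(4) for _ in range(BLOCK * 8))  # noisy low bits

print("ECB, flat region: ", repeated_blocks(ecb_like(key, flat)))  # 8: every block collides
print("CTR, flat region: ", repeated_blocks(ctr_like(key, flat)))
print("ECB, noisy region:", repeated_blocks(ecb_like(key, noisy)))
```

The ECB-style output repeats exactly where the plaintext repeats, so whether anything leaks depends entirely on whether the encoded stream has aligned 16-byte repeats in the first place.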

Colm MacCárthaigh makes an even more important point, which is that it's already very tricky to reliably encrypt voice tracks, because common encoding and transmission techniques make them susceptible to traffic analysis. So, for instance, you can quickly find papers about exploiting silence suppression to make predictions about speech in an encrypted audio channel. The point here being, cryptanalytic attacks on ECB are unlikely to be anyone's first recourse.

Obviously, the 128 bit AES key thing doesn't really have any practical impact.

Designers should religiously avoid ECB mode, but the real danger of ECB is in interactive settings, where we as attackers get to induce plaintext patterns, and use chosen boundaries to isolate targeted ciphertext. Bulk video and audio transmission isn't that kind of interactive setting. You still don't want to read people saying that ECB is OK; it's bad.

Essentially: it seems like Zoom's cryptography is bad, but not in a way that really matters compared to brochure-level badness of non-end-to-end-encryption.

Encouraging companies to lie by just writing it off when they're caught is an under-appreciated problem.

If I'm honest and I'm working on a video conferencing platform, a potential customer asks "Do you have End-to-end encryption?". I say honestly "No". The customer goes to Zoom because they lied and said they do. Any outcome where Zoom keeps even _some_ of the resulting customers and money rewards deceit and ensures you'll get worse products in future because a worse product was given a leg up by lying. That's where we are today.

Silence detection (and a bunch of related problems) should be solved by using Constant Bitrate (CBR) encoding. The analysis linked says SILK was used, which was conceived as primarily a Variable (VBR) encoding but it doesn't go into enough depth to be sure whether VBR is in play here, though it's hard to give Zoom benefit of the doubt.

On the video side you can play sensor noise off against video codec optimisations. If you're sure the bottom 1-2 bits on a video conference camera feed is noise, the people writing the optimised live video encoder probably agree and aren't sending those bits anyway (you can synthesise noise perfectly well without sending it), so it's a wash.

And also, though I agree it's not where you'd start, do not rule out attackers controlling the plaintext - the "virtual pub" I attended a few hours ago with friends had a third party video playing, and sometimes people share their slide decks by video in corporate or government environments. It's definitely more practical to sneak deliberately watermarked video into someone's Zoom conference than to, as we worried about for disk encryption, get them to stash megabytes or gigabytes of carefully watermarked files on a drive.

I've read this comment a couple times and am not clear whether or where you see the potential for useful leaks through repeated aligned plaintext blocks to occur, either in VBR AAC or in H264 video. I am spitballing and would welcome specific feedback on this if you have it; if you're speculating, can you force your speculation to the form of a test case we can go implement? We don't have to use Zoom; a reasonable model of the codecs Zoom uses would suffice for discussion's sake.

Informed speculation is all I have.

I will say in passing that it's weird to keep picking VBR AAC in your examples, the document says they used SILK, which would make sense because SILK was designed (by Skype many years ago) for this application. AAC is from a different era (it turns out we were trying to be too clever back then) and targets a different application.

This bothers me because even if ECB was good enough here, which we're both agreed it isn't, encrypted SILK VBR is anyway specifically known to be vulnerable to the same approach they demo'd in "Spot Me if You Can" and similar papers:


Keep in mind that even without any cryptographic work, we know the exact length in bytes (just like in that paper) of every SILK frame on the wire.
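
A toy illustration of why the lengths alone matter, independent of the cipher mode. The frame sizes below are invented for illustration and are not measurements of SILK or Zoom, and the fixed `overhead` is an assumption; the only point is that a VBR stream lets a passive observer separate speech from silence by size, while constant-size frames do not.

```python
from typing import List

# Hypothetical per-frame payload sizes in bytes for a VBR voice codec.
# Invented numbers, not measured from SILK or from Zoom traffic.
SPEECH = [62, 71, 58, 66]
SILENCE = [9, 8, 9, 10]


def observed_lengths(frames: List[int], overhead: int = 12) -> List[int]:
    """What a passive observer sees: payload length plus a fixed overhead."""
    return [n + overhead for n in frames]


def guess_speech(lengths: List[int], threshold: int) -> List[bool]:
    """Crude classifier: big packets are speech, small ones are silence."""
    return [n > threshold for n in lengths]


def pad_to_constant(frames: List[int], size: int) -> List[int]:
    """Model CBR-style framing: every payload is padded up to the same size."""
    if max(frames) > size:
        raise ValueError("constant size must cover the largest frame")
    return [size for _ in frames]


call = SPEECH + SILENCE + SPEECH

vbr_view = observed_lengths(call)
print(guess_speech(vbr_view, threshold=40))  # speech/silence pattern is visible

cbr_view = observed_lengths(pad_to_constant(call, 80))
print(len(set(cbr_view)))  # 1: every packet looks the same size
```

Real attacks in the literature use much finer-grained models than a single threshold, but the input is the same: packet sizes and timing, with no cryptanalysis involved.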

As to the cryptography:

I think the first thing I'd do if I was trying to turn this into an attack would be to send gibberish to Zoom's intermediary servers and see if they care (thus function as a potential Oracle).

Everything that isn't authenticated (like CTR mode, which is what was supposedly "recommended" here) admits the potential of an oracle attack, so what's interesting to me here is what's distinctively bad about ECB in this setting. And: I'm not saying ECB is OK in this setting. I'm saying "it's not PenguinVision simple" and "what makes it not OK is probably interesting".

You know more about the codec situation here than I do. So, you tell me! What should I be looking at?

> CTR mode, which is what was supposedly "recommended" here

Where are you getting this recommendation? Who recommended it - to Zoom? And when?

> You know more about the codec situation here than I do. So, you tell me! What should I be looking at?

If you're prepared to be at least slightly interested in problems even if ECB only contributes rather than blowing them wide open, then maybe you should look at H.264 I-frames first.

An I-frame stands on its own. For example the WebP image format is essentially just one I-frame from VP8. If the scene is literally unchanged from one I-frame to the next, all the exact same image data must be re-encoded, and logically (if the codec is at all efficient) that means the same bytes.

[ If you just don't use I-frames then all loss becomes unrecoverable and people will stop saying your product "Just works". In products like a DVD or Netflix I-frames are needed to make it "seekable". This is obviously not a necessity in a video conference, but the ability to handle network glitches and to add/ remove participants seamlessly is a requirement ]

So with ECB for I-frames clearly we can tell if nothing changed. Maybe an artificially lit meeting room with nobody in it, a desk left unmanned. Or a screen share left showing an unchanging desktop. Is that a "useful" thing for eavesdroppers to learn? Do you think Zoom users would be surprised that eavesdroppers can tell their colleagues took a coffee break?

But because codecs like H.264 are oriented around square blocks I suspect we can do better. They're not deliberately trying to have these blocks encode to exactly 16 bytes and so mostly they won't, but of course one chance in 16 isn't nothing and we get to try every I-frame. Probably not to the point of getting a blocky outline of a moving person on a stream (which is sad because I'm pretty sure that, like the Penguin, would make the point viscerally) but enough to have some idea which parts of a scene (or a slide deck) are changing.

> like the Penguin, would make the point viscerally

Isn't an important part of the argument that the point the penguin 'makes viscerally' is actually an extremely narrow, relatively uninteresting one? It tells you something like 'certain types of synthetic images represented entirely uncompressed in the spatial domain are obviously super leaky when ECB-mode encrypted'. Which, you know, neat but nobody does that.

It seems extra-uninteresting compared to things like 'plaintext can be recovered from encrypted realtime VBR-encoded audio'.

In that context 'I-frames might be both identical and magically align' seems even more contrived than the already more-famous-than-informative penguin.

Firstly, I don't want to play vulnerability Olympics. Zoom does lots of things wrong, it should fix all of them, and its users are right to complain about each of them. I don't see any value in trying to pick some of them as "less bad" based on tenuous claims about what you think might or might not be possible in other domains like video encoding.

> In that context 'I-frames might be both identical and magically align' seems even more contrived than the already more-famous-than-informative penguin.

Did you take a look at an actual H.264 video stream of something boring?

I downloaded a random example of an H.264 video of the generic movie lead-in you've seen a million times where giant numerals appear in descending sequence. The sequence repeats itself several times, allowing us to compare I-frames each time.

Let me quote from the first time a giant numeral two is displayed - briefly in hexadecimal:

> fc fe 7f 3f 9f cf e7 f3 f9 fc fe 7f 3f 9f cf e7

That's not encrypted, but on the other hand I also don't understand what it means because I am not an H.264 decoder. So an ECB encoded block would be just as meaningful to me.

Of course, we'd only get the same exact ECB encoded block if the input was the same... a few seconds later another I-frame of the same giant numeral two and:

> fc fe 7f 3f 9f cf e7 f3 f9 fc fe 7f 3f 9f cf e7

Ah. Well, maybe it's just a fluke? A few seconds later the same number is featured in the I-frame again and...

> fc fe 7f 3f 9f cf e7 f3 f9 fc fe 7f 3f 9f cf e7

Maybe I got real stupid in my old age but I think what's going on here is exactly what I already told you to expect. Given the same input the H.264 I-frame encoded is identical or at least so very similar as to be full of the same 16-byte sequences. Actually it's mostly 9-byte sequences, it's just that it needs so very, very many of them to encode an I-frame that statistically producing a recurring 16-byte sequence was almost inevitable.

Let me be clear here: I don't understand H.264 well enough to align this on its natural frame boundaries, so I just took the raw data and pretended it was already 128-bit aligned, because that logically cannot be worse than what you'd get if you knew what you were doing. I would be astonished if somebody who actually knows what they're doing can't get much better results from this.

You don't have to understand H.264. All you have to do is take the histogram of consecutive 16 byte blocks from the stream. It's like a 5 minute coding project. Then we don't have to wonder if there are aligned 16 byte block collisions; we can actually look at them and get a sense of where they are and how frequent they are relative to the size of the data. Can you do that? Or, failing that, can you provide a `curl` command anyone else could use to download the data you're looking at, so we can do it for you?
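
A minimal sketch of that histogram, for anyone who wants to run it on their own capture. The function names are my own, and the demo input is synthetic: it just repeats the 9-byte unit visible in the hex quoted upthread (`fc fe 7f 3f 9f cf e7 f3 f9`), which is an assumption about what the stream looks like locally, not real H.264 data.

```python
from collections import Counter

BLOCK = 16  # AES block size in bytes


def block_histogram(data: bytes) -> Counter:
    """Count each aligned 16-byte block; a trailing partial block is ignored."""
    usable = len(data) - len(data) % BLOCK
    return Counter(data[i:i + BLOCK] for i in range(0, usable, BLOCK))


def collisions(data: bytes, min_count: int = 2) -> list:
    """Return (hex block, count) rows for blocks seen at least min_count times."""
    rows = [(blk.hex(), n) for blk, n in block_histogram(data).items()
            if n >= min_count]
    rows.sort(key=lambda row: (row[1], row[0]))  # rarest first, like a tally
    return rows


def collisions_in_file(path: str, min_count: int = 2) -> list:
    """Same as collisions(), reading the raw bytes from a file on disk."""
    with open(path, "rb") as handle:
        return collisions(handle.read(), min_count)


# Synthetic demo: a 9-byte unit repeated 144 times (1296 bytes = 81 blocks).
unit = bytes.fromhex("fcfe7f3f9fcfe7f3f9")
demo = unit * 144

for hex_block, count in collisions(demo):
    print(f"    {hex_block}: {count} samples")
```

On the synthetic input this prints nine distinct blocks, each a rotation of the same 9-byte unit, because 16-byte boundaries land on a different phase of a 9-byte period each time. Run against a real file with `collisions_in_file("video.h264")` (the filename is a placeholder); stripping the audio and container first is left to whatever tool you prefer.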

I can H.264 a series of images with solid-colored blocks and not see repeats. But who knows if my codec test harness is representative? I very much doubt it is. I'm certain that there are cases where there are not just collisions, but useful collisions. Let's understand what they are.

This isn't "vulnerability Olympics". It's simply about understanding what we are discussing.

You can see the phenomenon I described for yourself in:


This file contains audio, which is carried separately in the Zoom pseudo-RTP, so you should strip that out with your preferred tool without changing the H.264 video. It won't change the results very much (it only reassures us that the recurring data is in the video stream, as I assumed), so if you don't have such tools don't worry too much.

It has the case we're interested in, because it's the simplest to understand: I-frames with identical content.

If you've put together a "codec test harness" that's what you want to simulate. Don't feed in penguins or "solid-coloured blocks" just feed in a still image, and watch as (unsurprisingly) it outputs the same I-frame each time it needs to do so, regular as clockwork. You're not simulating a Zoom stream of abstract art, but a shared desktop that's showing Pauline's Q3 revenue forecast for two whole minutes while her team discuss Tiger King or the desktop of a guy who left five minutes ago but didn't leave the meeting, or (more speculatively) an empty meeting room on camera.

This is the beginning. If we stop dismissing it as just "more complicated" than just a Penguin pixmap and bring in people who know more about video codecs we can explore how much more can be done. If every slide show has the company logo and the presenter's name on the first slide, does that correlate? Can we make a watermark that would survive being played back through one codec and then encoded by Zoom?

But there's a strong sense in which I don't care to do all that actual work. Some of these things might be possible and some not, but we're not building a cool demo for a convention; we're just here to say that Zoom is terribly insecure and, if people feel they must use it, to treat it like a meeting held in the local coffee shop, assuming your local coffee shop is bugged by both the US and Chinese governments.

This is useful. Thanks! Out of ~22,000 AES blocks, this has ~1000 colliding blocks:

    04608c1c211004608c1c00000007419b: 4 samples
    1004608c1c211004608c1c0000000741: 4 samples
    0963211004608c1c211004608c1c0000: 5 samples
    33c00963211004608c1c211004608c1c: 5 samples
    63211004608c1c211004608c1c000000: 5 samples
    f7df7df7df7df7df7df7df7df7df7df7: 5 samples
    0001000003ff0000000e000004000000: 6 samples
    211004608c1c211004608c1c00000007: 7 samples
    c00963211004608c1c211004608c1c00: 8 samples
    0001000003ff00000014000004000000: 9 samples
    0001000003ff0000000d000004000000: 25 samples
    75d75d75d75d75d75d75d75d75d75d75: 31 samples
    5d75d75d75d75d75d75d75d75d75d75d: 33 samples
    00000b0000000b0000000b0000000b00: 34 samples
    d75d75d75d75d75d75d75d75d75d75d7: 34 samples
    7f3f9fcfe7f3f9fcfe7f3f9fcfe7f3f9: 39 samples
    f3f9fcfe7f3f9fcfe7f3f9fcfe7f3f9f: 40 samples
    9fcfe7f3f9fcfe7f3f9fcfe7f3f9fcfe: 42 samples
    cfe7f3f9fcfe7f3f9fcfe7f3f9fcfe7f: 42 samples
    fcfe7f3f9fcfe7f3f9fcfe7f3f9fcfe7: 42 samples
    e7f3f9fcfe7f3f9fcfe7f3f9fcfe7f3f: 43 samples
    f9fcfe7f3f9fcfe7f3f9fcfe7f3f9fcf: 43 samples
    fe7f3f9fcfe7f3f9fcfe7f3f9fcfe7f3: 46 samples
    3f9fcfe7f3f9fcfe7f3f9fcfe7f3f9fc: 47 samples
    aebaebaebaebaebaebaebaebaebaebae: 60 samples
    baebaebaebaebaebaebaebaebaebaeba: 63 samples
    00000000000000000000000000000000: 69 samples
    ebaebaebaebaebaebaebaebaebaebaeb: 73 samples
    00060000000600000006000000060000: 147 samples
The collisions occur in runs, so, for instance, there's a run of ~40 "fcfe7f3f9fcfe7f3f9fcfe7f3f9fcfe7"'s towards the end of the file.

I-frame macroblocks in H.264 are DCT'd, like a JPEG; I have no intuition for what the repeats would signify, except that the encoding is deterministic and, I guess, within an I-frame, stateless? So identical samples will show up in the output?

(It might be easy to instrument an H.264 decoder to get an indication of what frame I'm in when I see a collision, which is I guess what I'll do next).

Did you ever find some sort of reasonable 'live streaming' type profile for encoding? I noticed ffmpeg's default seems to be putting more work into this than you'd expect from a realtime encoder which, in theory at least, might lead to fewer collisions than in a 'realistic' situation. I think tialaramex made a similar point somewhere upthread.

I don't think it would make a huge difference but the numbers might be higher/more representative. Another way might be to just capture zoom traffic of someone's Quarterly Synergies presentation or whatnot.

Edit: also thought of a silly upper-bound test for Team Penguin - you get plaintext of every colliding block in the stream. How much penguin can you get out of that?

I'm just noticing, by the way, that one of the repeats in this test pattern video is the same as the repeat you observed in the Star Wars opening crawl, so presumably that has nothing to do with content.

Thank you for clearly articulating why using AES-ECB here is not as bad as the penguin image everyone seems to know. It's still a bad choice - you really need to prove to yourself that your data stream isn't susceptible and it's honestly easier just to use an appropriate stream cipher mode.

I absolutely agree that the real deserved flak is for saying "end-to-end encryption" when that's clearly not what's happening. Honestly, I don't know how they could make their product work as well as it does with actual end-to-end (consumer to consumer) encryption. Self-hosted solutions can do this but their servers are doing a lot more than ferrying raw data streams around. That's why their meetings work so well with 200+ participants.

They also deserve criticism for saying AES-256 when they're using 128-bit keys, but like you said it's not a security issue. I suppose they could be referring to TLS that they use to set up connections and meetings rather than the stream itself but that's a stretch.

When you control the content of the stream, you control whether there are repeated blocks. If there are none, do any objections to ECB remain?

Yes: ECB is also somewhat malleable, and uniquely vulnerable to "cut-and-paste" attacks since nothing ties a run of ciphertext blocks to its position in the stream. But these arguably aren't especially relevant to Zoom's threat model.
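A rough sketch of the cut-and-paste property, for readers who haven't seen it. To stay within the standard library this uses a toy 4-round Feistel cipher built on HMAC-SHA256 as a stand-in for AES; that substitution is an assumption of the example, and the toy cipher should not be used for anything real. All names (`toy_encrypt_block`, `ecb`, the demo key) are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block, same size as an AES block

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    """Round function: 8 bytes of HMAC-SHA256 over the round number and a half-block."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:8]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right

def ecb(key: bytes, data: bytes, block_fn) -> bytes:
    """Apply block_fn to each aligned block independently (that is all ECB does)."""
    assert len(data) % BLOCK == 0
    return b"".join(block_fn(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK))

key = b"demo key, not a real secret"
first, second = b"block A: hello..", b"block B: world.."
ciphertext = ecb(key, first + second, toy_encrypt_block)

# The attacker never learns the key; they just reorder ciphertext blocks.
swapped = ciphertext[BLOCK:] + ciphertext[:BLOCK]
recovered = ecb(key, swapped, toy_decrypt_block)
print(recovered)
```

The receiver decrypts the reordered stream without any error, because no block carries information about where it belongs.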

This is a really well written post that makes some valuable arguments.

For me, the main reason using ECB makes me distrust a service is that it could be a canary for "we don't have anyone who understands security working here", and it increases my prior that they're also doing other dumb things like reusing keys, or not securely distributing them in the first place.

Lack of end-to-end is bad, but in a situation with more than two participants it's not an easy problem and when some of them are dialing in from landlines then one of your endpoints has to have a key anyway, unless I'm missing something? [edit - was addressed in first paragraph]

> the real danger of ECB is in interactive settings, where we as attackers get to induce plaintext patterns, and use chosen boundaries to isolate targeted ciphertext

Is it conceivable that actual Zoom use cases might allow attackers to do some of these things? I don't have an example in mind, but I'm thinking that the Zoom protocol supports many more actions than just audio and video transmission, so perhaps some of the other protocol features could be exploited for plaintext injection.

For example, Zoom supports text chats, and some kind of metadata information about participants and call setup. Perhaps those are all directly encrypted under the same key with the same block cipher mode?

Somebody should add an ECB encrypted JPEG to wikipedia.

It will look just like the CBC comparison example, but I assume that's the point you wanted to make.

As a total novice to crypto stuff, is there any way that someone with access to unencrypted audio/video from the sensor could work out some sort of "baseline" and then factor it into their cryptanalysis of encrypted content produced from the characterized sensor?

I don't know audio well enough to answer this, but you can play with it directly; it took about 7 minutes for me to throw a trivial Go program together to AAC-encode a WAV of half a minute of speech; the only repeated 16-byte blocks I got were a couple runs of zeroes.

The trick with ECB is that you're relying on 128-bit blocks of plaintext repeating with perfect alignment, and, to do anything useful with it, those repeats leaking useful information. In a bitmap image of a penguin, those repeats are easy: they're blocks of pixels of uniform color, and all ECB is doing is permuting their colors. That's less intuitively true of audio (where I'm not really seeing repeats at all, but, again, you should just play with it).

For audio, you don't want to send a WAV file over the line, as it's incredibly wasteful. Once it's reasonably encoded, you necessarily end up with a lower entropy stream... so you should still have some version of "solid blocks of color", especially for silence. (Exact details depend on the codec, though.)

Lossy compression reduces entropy, I'd expect fewer repeats and more of what looks like random data in any compression worth its salt. Thomas also mentioned AAC, not PCM. I agree on the silence, but it would come up as smaller packets, not repeated packets, IMO. And then we're back to timing analysis of speech.

> Lossy compression reduces entropy, I'd expect fewer repeats and more of what looks like random data in any compression worth its salt.

Um, so, just to make sure I understand you, you're implying that lossy compression is not worth its salt? Because you will surely have more repeats in a lower entropy space.

You get essentially no repeats (collisions, more precisely) in the AAC streams I played with. Some block boundary headers; that's it.

I'm not sending a WAV file (or modeling that); I'm AAC-encoding.

Opus is probably the right codec for some things. I've heard good things about it.

Screen sharing a presentation is problematic; lots of discrete blocks of bitmap image being repeated.

If Zoom supports stock background filters or modes, that would seem to be another massive source of repeating plaintext, depending on the implementation.

I mean, it won't be, for the same reason 'tedunangst points out above. But even if it was: the repeats you'd be referring to would be leaking... stock background filters and modes.

> Aaron Toponce points out that sensor noise will likely scramble ECB ciphertexts, for instance.

1) "likely" != "guaranteed to"

2) If there's sensor noise when you're sharing your screen, then you're doing something horribly horribly wrong

I was a bit suspicious about that ECB leaking when they didn’t use an example from a Zoom call to demonstrate a leak; your explanation about it probably not having the same issue here makes a lot of sense.

Sensor noise does not translate nearly as well as one might think in compressed videos. And there are likely repeating data blocks big enough to analyze.

I mean: go generate the test case that shows this. You don't even have to encrypt to prove it out; you can safely assume ECB will reveal any aligned 16-byte repeats, so all you need to do is take a histogram of all the consecutive 16-byte blocks in your encoded output.
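A minimal sketch of that histogram, assuming nothing beyond the description above: chop the encoded output into aligned 16-byte blocks and count the ones that occur more than once. The function name `repeated_blocks` and the synthetic `encoded` bytes are placeholders for illustration.

```python
import hashlib
from collections import Counter

def repeated_blocks(data: bytes, block_size: int = 16):
    """Histogram of aligned blocks that appear more than once, most frequent first."""
    counts = Counter(
        data[i:i + block_size]
        for i in range(0, len(data) - block_size + 1, block_size)
    )
    return [(block.hex(), n) for block, n in counts.most_common() if n > 1]

# Synthetic stand-in for an encoded stream (an assumption, not real codec output):
# 100 pseudo-random unique blocks, with a few deliberate repeats mixed in.
unique = [hashlib.sha256(i.to_bytes(4, "big")).digest()[:16] for i in range(100)]
zeros, pattern = bytes(16), bytes.fromhex("ab" * 16)
encoded = b"".join(unique[:30] + [zeros, pattern] + unique[30:60]
                   + [zeros, pattern, zeros] + unique[60:])

for block_hex, n in repeated_blocks(encoded):
    print(f"{block_hex}: {n} samples")
```

For a real test, feed it the bytes of the encoder's output file instead of `encoded`.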

> induce plaintext patterns, and use chosen boundaries to isolate targeted ciphertext

Like setting a custom, static background? One of Zoom's fun features.

But it does. It means that they're too inept to know the difference. What else are they claiming that is wrong?

I'm interested in the engineering details, not the message board punditry. If you've got details to contribute --- especially if I'm wrong about something --- let's see 'em.

This is honestly the best “Zoom is bad” summary I’ve seen so far. While I certainly believe some of the Zoom hate is blown out of proportion, this article does a good job explaining to someone who isn't a security expert what the issues are. I've been getting questions about the company from family and friends, and will be forwarding this to them. Well done.

This is a great article, but as an educational provider it fails to answer one question: Why should I care?

The only concerning thing for me is, why would they lie about using AES-256 when none of my users (and I assume most of their users) would care in any way about AES-256 vs. AES-128 in ECB mode. Why would they lie?

Even after this, having my users conducting university lessons over something that might be decrypted in China is honestly not that big of an issue. I would of course prefer it if these meetings were private from the PRC's scrutiny, but at least in my situation (and I think most educational contexts) this is not really that important.

The students log in via email and from home, they both count as personal identifiers.

Now, China knows who is attending which lesson. And how much activity each individual shows. And also, what happens on the side like environment sounds, environment at the camera (e.g., how generous the student's apartment is). Also, the client can analyze the mouse cursor movement, see what other apps are running and how (on native clients), and on mobile clients there is for example the gravitational sensor.

Moreover, a voice (and the face, of course) is like a fingerprint of a person. Hence you now have a reverse lookup table from voice/face to person.

Just as an FYI 2 weeks later... We decided on not enforcing Zoom accounts for our students for various reasons. So the PRC might have IP address access to a SIP/Zoom server but this is not something we, as a small university, can solve. Even without Zoom the PRC could trace access to our bigbluebutton server or a jitsi videobridge and I don't presume that using Webex or Vidyo or what have you would solve this issue (and honestly all other solutions would have ended up being more expensive).

We still do not have any evidence that the PRC has access to unencrypted Zoom server logs and frankly I assume we would have the same (or worse) issues I had with my tests from Iran that either SIP/WebRTC doesn't work or appears to be intercepted. So, at least for me and my users, Zoom is the most accessible and "least worst" solution.

Thank you for the follow-up. No, we don't have evidence that they have access to the server logs, or even more, the streams. I guess that an intelligence agency would compromise one of Zoom's employees, thereby gaining access without leaving any further evidence. This gives them the possibility to snoop on any Zoom call that is routed through the respective servers.

And indeed, an intelligence agency could possibly hack your bigbluebutton server. That, however, involves a targeted attack, which I think is a different scenario.

Because a company doing an RFP with a checklist of features is going to rank them against their competitors, and it would look bad in the spreadsheet.

I assume this is your answer to "why would they lie?". It does not answer the question of why an educational provider should care, though.

And assuming I'll consider switching to webex the response to encryption in webex is this: https://www.webex.com/content/dam/Webex/eopi/Americas/USA/en...

Which 404's and basically represents my experience with Cisco: "We don't give a shit about you, you already paid us." Frankly Webex could host the next coming of Jesus and I would not give them any more money.

In the general case, it’s bad practice to do business with liars. That’s one reason why an educational institution would care.

Correct, and if a Zoom representative had lied to me, that would factor into my decision. Frankly though, for student lectures and faculty meetings I don't care about their encryption (as long as they do TLS for client->server to protect my users in a public wifi situation) and a certain encryption level was never a basis for my decision. As long as they provide transport security I don't really care.

Webex is so bad that nobody would consider using it based on technical merits, security track record or being backed by a competent organisation, so it's kind of immune from the kind of critique that is being leveled against Zoom.

I have a feeling computer accessible Webex is just there because of the dedicated videoconf HW that Cisco makes. The software is to provide a feature checkmark and make its victims miserable enough to buy the HW.

lol, nice job finding a link that 404s and basing your entire argument on it. Here is another one for your next post: https://www.webex.com/sdvfebdwq3433t8hjaxcxqadxe

Frankly, Cisco can post whatever they like; I will not give them any more money. You are certainly right that I was disingenuous and I do not care what they do with Webex. My link was cherry-picked from their press release for encryption in Webex; I thought it was funny that that link would 404.

Good job discrediting my post though. It's not like Cisco basically told everybody in a KB that if you want to use the same features Zoom provides (or a Linux client, or desktop sharing or breakout rooms or audio transcription or waiting rooms or...) you would have to give up every encryption feature Webex provides. But I assume you are a seasoned Webex admin and can provide us some insight into why you are using Webex in contrast to any other solution. Or you are trolling, whatever.

Let's say these lessons are a politics seminar discussing whatever the PRC finds objectionable; then the family of the student back in the old country gets their social credit score deducted.

Or even better, use those recordings in the future as kompromat as needed.

I don't know how much free time they have over there, but snooping in on courses that a relative outside the country is taking and storing all of them... I mean, if you want to peg someone's social credit score, just stakeout their house and wait for them to spit outside or something. Hell, just make something up and dare them to come argue. Why go to all that effort?

Doesn't go exactly like that. More like: CCTV captures someone going to an area where known rebels or political activists live. (Look up videos on China's face recognition, it's insane.) Police decide to look through the person's Zoom meeting transcripts, searching for certain keywords. They find evidence of rebellious activities, and order further surveillance on the individual or arrest them.

In a surveillance state of the scope you've described, triangulating the zoom transcripts of an international relative's course work back to someone you spotted on CCTV is still hardly worth the extra trouble. At that level of erosion of civil liberties, they can already send the jackboots to break down the door when they make the CCTV match. Don't find anything? You plant something or coerce them into ratting on someone else. Why would you go mining terabytes of data that's mostly boring meetings and calls from grandma?

It's all take data collection and they mine it later. 10 years from now they go looking for video from you. And yes, if you don't think people have weird incentives and time on their hands, have a look at the shitshow of US Presidential politics.

Because big orgs check for a minimum list of features and that list will nowadays always include some element of encryption/data protection. Many companies use zoom, or e.g. I've seen the OECD host seminars there. Have they done due diligence and an independent audit of the software? No, Robert and Lucy from procurement had a week to read through 8 different bids describing software features and support modalities, assured they fit the checklist and then calculated which one is the lowest bid (or "best value for money" which is checklist points/price) as they are obliged to choose that.

Then zoom can go around and claim OECD and IBM and the UN (all made up) use them, which lends credence, even if it's just that one training team in Nairobi that trialed the software once.

"... and then they came for me". Obviously that poem was written about something rather more serious than your privacy but the point stands.

As I alluded to in another post I am from Germany and certain people I work with actually went through the "... they came for me" phase.

Your point does not stand on its own.

I (en_GB) lived in that weird place called West Germany for about 10 years on and off back in the 70s and 80s. We have many friends (Hi Wurms, int al) who also have family, friends and acquaintances that lived through those days directly, shall we say, and of course my own family members who did from another side and perspective. You may want to take another look at my username and make of that what you will.

My point really does stand. You might gradually allow erosion of your rights until you find that none are left. It is so easy to say "I have nothing to hide" until you find that actually you do have something to hide for reasons that are not immediately obvious.

I am not saying that using Zoom will have nasty consequences but I am saying that the attitude that abrogates responsibility for your own privacy might have unintended consequences. If it becomes common place to simply say "meh" we might not like the world we get instead of the world we might wish for.

My Old Saxon friends have a rather more robust attitude to privacy concerns than you mate!

I'm not going to play "guess what my username means" with you, sorry. I'm also not going to play "who knows more people that lived through the 3rd reich" with you.

Me administering a Zoom account for my fellow employees and my students does not erode anybody's rights. For me it is a choice between a GDPR compliant vendor and a vendor that does not care about the GDPR. Personally I have had good experiences with the GDPR (Facebook finally having to delete my account even though I would not verify it with a personal ID and cell phone number after I went through the Irish data protection authority) and Zoom claims to be GDPR compliant.

So, frankly I'm not sure what you are talking about. It seems like you are going for a slippery slope argument I don't agree with.

Zoom also claims to have end-to-end encryption and yet they don't, their marketing and even their clarification post being a lie.

Companies like Zoom can claim that they are GDPR compliant; however, the truth of the matter is that compliance offices are overwhelmed. And until Zoom has a huge data leak or something, nobody is going to investigate their compliance.

So a company like Zoom might claim GDPR compliance and that's something, but only if you can trust them.

And a company that lies in their marketing and press releases can't be trusted, sorry.

Google's Meet btw is also GDPR compliant, Google tries to be GDPR compliant nowadays with everything they do because they are a huge target. They also don't use bullshit in their marketing and are pretty good at security, so I personally trust them more, even if I actively avoid Google's products out of privacy concerns.

Nitpick: he's not talking about the Drittes Reich, he clearly must be talking about the experience people had in eastern German DDR / "German people's republic".

Kids! How on earth would a Brit end up in the DDR? OK we did:

My dad was a British soldier (so was mum but that's another story). We were posted to exotic places like "Reindahln" (MG) and Paderborn and Soltau etc. We went on a holiday to West Berlin in around 1980ish. We were allowed through Check Point Charlie to see the DDR for a short while. Funnily enough exactly the same arrangement as getting into Northern Cyprus. ie the Turkish bit.

Anyway, we saw the Brandenburg Gate from both sides, when it was mined all around but rather nicely flood lit. I have to say the east side looked a bit shag back then.

Our German friends always used to look forwards to reunification but the cost when it came used to cause a few remarks cough. For me a unified Germany is a good and beneficial thing, regardless of cost. I saw first hand what life was like in E Berlin in the early 80s.

"Guess the username" (you didn't even try and it was pretty bloody obvious):

Gerdes (my family name) means the same as the word German. A ger is a spear - https://en.wikipedia.org/wiki/Migration_Period_spear. A ger-man is a spear bearing man and gerdes is an old form of that. You lot had a habit of trundling around with spears - hence the name in English.

Using Zoom is of course not an awful thing to do. Just be careful me old fruit. Please.

> and Zoom claims to be GDPR compliant.

That's the thing, though. Zoom also claimed to be end-to-end encrypting with AES-256. If they were willing to lie about that, what's not to say they're willing to lie about GDPR compliance?

Because one of those lies carries massive legal and financial penalties, and the other one doesn't?

> one of those lies carries massive legal and financial penalties

"It's only illegal if you get caught."

Doesn’t Germany have specific data privacy laws based on the massive surveillance state that operated in the East up through 1990 or so? And you’re not concerned with using services that go through a country that, by all accounts, is trying to outdo the old Stasi with modern technology?

Zoom claims to be GDPR compliant (https://zoom.us/de-de/gdpr.html). Frankly, ensuring a company claims compliance is as far as I can go. I'm still hoping that if a company intentionally lies about this they will get sued out of existence. If I'm wrong about this the GDPR is worthless anyway and there isn't really anything I can do.

The problem is, you are right. In practice, many companies say "well, it is compliant, but we don't care about the rest as long as we can function".

If you’re not especially worried about having a communist police state intercept your private conversations, that’s your personal business. All I can ask is that you don’t go out of your way trying to legitimize that for everyone else as you have here.

> Your point does not stand on its own.

No, but it will when the next generation of Nazi, Stalinist, and Maoist regimes arise and gain access to the data in question because we weren't fanatical enough about E2E privacy today.

And just as Niemoeller's verse warns, by then it will be too late.

There is an actual Maoist regime operating today that has access to Zoom's data. This is not a hypothetical.

> And just as Niemoeller's verse warns, by then it will be too late.

But this is just wrong. In Germany we did speak out when they came for the communists, the socialists, the unions and the Jews. Niemöller's argument was relevant in 1937 when he was arrested and I appreciate that you picked up on it, but frankly it's different here in Germany. We do still speak out against discrimination against all of them. Maybe E2E encryption in a chat application is just not as pressing as the things Niemöller talked about at the time.

Maybe not. Hope you're right.

You should care because Zoom is a company run by and within China. What's troubling for the west is having so much IP and information flowing through a bad state actor without knowing about it.

> Why would they lie?

I kind of doubt it was intentional. Developers are not marketing, typically, and it seems reasonable to assume that a technical person said "aes" when a marketing person asked "do we have encryption?". And then the marketing person searched for "aes" and assumed that meant "aes-256".

I think of it as a warning to future companies who take these kind of liberties...

The warning being that it doesn't matter because it doesn't affect their share price right? So far I haven't seen any tangible damage to them when it comes to $$.

Let's see where they are in 6 or 12 months' time and what they go through to get there.

Other nascent companies will tip their "We want to grow like Zoom" hat and it'll serve as a warning sign as to what the growth mentality is.

Anyway, my point was... I don't think they've taken too much grief yet, for what they've done. They deserve all the lumps they're getting, and that may (hopefully) be instructive.

[And yeah, there are always companies that do this kind of thing... some are Facebook, and some trip into a hole.]

This sounds like Zoom hate, which I am seeing a lot nowadays. The irony is that it is coming from tech while the average Joe doesn't really care.

Share price aside...it's the most reliable video conferencing app while others are crumbling to infrastructure pressure (UberConf, GotoMeeting, etc). Their downloads are through the roof right now:


It's been a somewhat noticeable hit, but they're still doing great overall: https://www.google.com/search?q=zm+stock

I mean, isn't this stuff they can fix?

Like good that the market is stressing them on their security, at $30bn they should be able to engage with that feedback and then loop.

On the other hand, tried to use Skype lately? Product has barely evolved since they were bought out by MSFT.

Guess Google does videoconferencing too, but they know enough about us all already....

They can fix the code, but would people trust them when they say that the code is fixed? Trust needs to be earned.

People trust them even though their security is broken. Lots of people don't know, lots don't care, quite a few probably don't get it anyway.

It’ll take a new CEO before people will trust them again, kind of like what happened at Uber.

Who are you hoping picks up this little unit of opinion? Are you hoping to see it quoted, or just vaguely referred to as "growing discontent" and know that you were in there?

Such a strangely extreme viewpoint to suddenly jump to.

Using AES in ECB mode is clearly a bad choice, but honestly it's not that horrible for high entropy data like compressed audio/video. I'm sure someone could prove me wrong one day, but it seems hard to extract any useful patterns out of compressed audio/video. It does check the box of "uses encryption" for regulatory reasons (while missing the intent). It's pretty egregious considering how easy this is to get right.

The 128-bit key is not inherently wrong if they were rotating these out during the stream. That being said, there's no reason not to do it right and use a mode like GCM with a longer key - most hardware supports acceleration for AES-256 these days. It can actually be slower to use a 128-bit key on 64-bit systems.

While I respect the decision not to disclose the waiting room vulnerability, it's pretty obvious what's going on given the context. They probably shouldn't have mentioned where the vulnerability is.

I'm honestly surprised anyone with technical knowledge thought that Zoom was actually doing end-to-end encryption given how the software works. All of the video transcoding/downconversion is clearly happening on the server. Your client is not sending multiple compressed streams for varying connection bandwidths. That's the main reason a lot of people like Zoom - it actually works well with dozens or hundreds of participants.

Is it just me to whom it seems obvious why they've gone with ECB?

Zoom's design has a single key for everybody and for everything [ in the context of a particular video conference call ] . It's simpler and, to a layman, it sounds secure. [ We arguably contribute to this if we say e.g. "the key" implying it's just one thing when we mean something like a master secret in TLS used to derive lots of actual keys ].
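To make the "master secret used to derive lots of actual keys" idea concrete, here is a small sketch of HKDF-style derivation (in the manner of RFC 5869) using only the standard library. This is not Zoom's design or anyone's production code; the meeting ID, the participant labels, and the function name `derive_key` are all invented for the example.

```python
import hashlib
import hmac

def derive_key(master_secret: bytes, salt: bytes, info: bytes, length: int = 16) -> bytes:
    """HKDF-style extract-then-expand with HMAC-SHA256."""
    salt = salt or bytes(32)  # an empty salt is replaced by HashLen zero bytes
    prk = hmac.new(salt, master_secret, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

master = b"shared meeting secret (placeholder)"
meeting_id = b"meeting-1234"  # hypothetical salt

# One distinct key per sender and media type, all from the same master secret.
keys = {
    label: derive_key(master, meeting_id, label)
    for label in (b"alice/video", b"alice/audio", b"bob/video")
}
for label, k in keys.items():
    print(label.decode(), k.hex())
```

Every participant who knows the master secret can recompute any of these keys, so nothing extra has to be distributed, but no two streams share a key.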

Once you've committed to a single key ECB behaves exactly how you'd want.

You've got some audio, or video, ready to send? Just encrypt it with the key. Receive some encrypted data? Just decrypt it.

What happens if you have some network trouble briefly? Nothing, everybody decrypts whatever does arrive and maybe a few frames are missing.

All of the other modes don't work at all if you try to use them this way. They all expect you to have thought about the problem and track a bunch more state and then maintain that state despite an unreliable network and other issues.
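One common way to carry that extra state is to put a sequence number in each packet and derive the keystream from it, so every packet still decrypts on its own. The sketch below illustrates the idea with HMAC-SHA256 standing in for a counter-mode block cipher, which is an assumption made to keep it standard-library only. It has no authentication and is not a real protocol; `seal` and `open_packet` are made-up names.

```python
import hashlib
import hmac
import struct

def keystream(key: bytes, seq: int, length: int) -> bytes:
    """Keystream bound to one packet: PRF(key, seq || block counter)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, struct.pack(">QI", seq, counter), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    """Packet = 8-byte sequence number in the clear, then payload XOR keystream."""
    ks = keystream(key, seq, len(payload))
    return struct.pack(">Q", seq) + bytes(p ^ k for p, k in zip(payload, ks))

def open_packet(key: bytes, packet: bytes):
    (seq,) = struct.unpack(">Q", packet[:8])
    body = packet[8:]
    ks = keystream(key, seq, len(body))
    return seq, bytes(c ^ k for c, k in zip(body, ks))

key = b"demo key, not a real secret"
frames = [b"frame zero", b"frame one", b"frame two", b"frame three"]
packets = [seal(key, seq, frame) for seq, frame in enumerate(frames)]

# Simulate an unreliable network: packet 1 is lost, the rest arrive out of order.
arrived = [packets[3], packets[0], packets[2]]
decoded = dict(open_packet(key, p) for p in arrived)
print(decoded)
```

The receiver keeps no cipher state between packets, so loss and reordering cost nothing beyond the missing frames. The sender must never reuse a sequence number under the same key, which is the bookkeeping that ECB lets you skip.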

Unless there is somebody in the room who says we can't do ECB because it's fundamentally not a secure choice, ECB is what you're going to get from this design decision.

And I've been in rooms like that as the only voice, or at least as the only person who spoke up. I've been in rooms where I was part of a chorus too, but as organisations get bigger and "security is everyone's problem" becomes a phrase people learn but don't act on, it gets lonelier.

Also, I actually can't even work out what a "correct" key rotation strategy could be for ECB with a variable number of parties all encrypting stuff at once. As a result it seems unlikely that Zoom did figure out such a strategy and then correctly implemented it. Instead it seems safe to assume there is no key rotation, everything sent by every participant for the life of the stream is encrypted with the same key, even though that's a terrible idea.

That whole argument would make sense if there was not a standardized solution to the problem:


I would like to hear the explanation for why they did not use SRTP, though I suspect the answer is, "We had no idea it existed."

> Using AES in ECB mode is clearly a bad choice, but honestly it's not that horrible for high entropy data like compressed audio/video. I'm sure someone could prove me wrong one day, but it seems hard to extract any useful patterns out of compressed audio/video.

...you're joking right? The Wikipedia example for why ECB is not recommended is literally an image:


Edit: This applies to compression too. Please refer to Shannon's source coding theorem.

It's definitely a terrible choice for uncompressed images or video. I'm arguing it probably isn't that bad for highly compressed video. That being said, if you're encrypting any data stream you should use an appropriate stream cipher.

You're forgetting about the technical intricacies of compressed video. Compressed video is a mix of high and low entropy content, with a predictable time pattern to this. For example, one can easily use traffic analysis to find B-Frames, and run analysis on that. Bam, you get very low entropy due to the stationary nature of video conference.

Yeah, the B-frames were my thought too, but ordinary sensor noise would hopefully make the individual frames different enough. If you’re doing green-screen background switching, or transmitting a static image then it’s definitely going to be a problem.

But given all the other security issues that seem to hover around Zoom like a cloud of angry bees, this is probably all moot - if you really want to crack a video stream then there are probably easier ways.

Zoom does screen sharing, right? Surely it's not transmitted uncompressed, but it is stationary for a long time and perhaps only small parts changing when they do (eg, switching slides). Is there an ECB-based weakness here?

Those unchanged pixels won't be transmitted. Only the changed part of the screen needs to be sent to the client. For example, RFB[RFC 6143] would send a 16-byte header with the size and position of a rectangular area of the screen followed by the pixels in that rectangle. Or multiple rectangles can be sent in one update message. But if you consider the case of text being typed, there will be a single rectangle per keystroke(s).

Now I wonder if a sequence of these rectangles, all the same size and in roughly the same area of the screen, would lend to some sort of statistical analysis. At the very least the timestamps of the updates would tell you how fast the operator is typing.

[RFC 6143] https://tools.ietf.org/html/rfc6143
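For reference, a sketch of how a single-rectangle RFB FramebufferUpdate is laid out per RFC 6143: a 4-byte message header plus a 12-byte rectangle header, 16 bytes in total before the pixels. The 8x16 glyph size, the coordinates, and the 32-bit pixel format are assumptions chosen for illustration, and nothing here says Zoom's screen sharing uses RFB.

```python
import struct

RAW_ENCODING = 0  # RFB "Raw" encoding type

def framebuffer_update(x: int, y: int, width: int, height: int, pixels: bytes) -> bytes:
    """Build a FramebufferUpdate message carrying exactly one raw rectangle."""
    # message-type (0), one padding byte, number-of-rectangles (1)
    message_header = struct.pack(">BxH", 0, 1)
    # x-position, y-position, width, height, encoding-type (all big-endian)
    rect_header = struct.pack(">HHHHi", x, y, width, height, RAW_ENCODING)
    return message_header + rect_header + pixels

def glyph_update(column: int) -> bytes:
    """Hypothetical update for one typed character cell, 8x16 pixels at 4 bytes each."""
    width, height = 8, 16
    pixels = bytes(width * height * 4)  # blank pixels, just to fill the body
    return framebuffer_update(100 + column * width, 200, width, height, pixels)

first, second = glyph_update(0), glyph_update(1)
print(first[:16].hex())
print(second[:16].hex())
```

The two 16-byte prefixes differ only in the x-position field, which is the kind of near-identical, predictable structure the statistical question above is about.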

If the encryption scheme is poor, why would the data being compressed or not matter?

Compression already makes a compressed file roughly indistinguishable from random noise (modulo access to the decompressor). So the patterns have been removed.

That doesn’t make this good, but it means that one specific example isn’t immediately applicable.
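A quick way to see the "looks like random noise" effect is to compare byte-level Shannon entropy before and after compression. This sketch uses zlib on synthetic text as a stand-in for a real codec; the word list and seed are arbitrary choices for the example.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (maximum 8.0)."""
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())

# Synthetic, fairly repetitive input: 20,000 words drawn from a tiny vocabulary.
rng = random.Random(0)
words = ["zoom", "meeting", "video", "audio", "key", "block", "stream", "frame",
         "cipher", "packet", "server", "client", "codec", "share", "screen", "call"]
text = " ".join(rng.choice(words) for _ in range(20000)).encode()
packed = zlib.compress(text, 9)

print(f"raw:        {len(text):7d} bytes, {byte_entropy(text):.2f} bits/byte")
print(f"compressed: {len(packed):7d} bytes, {byte_entropy(packed):.2f} bits/byte")
```

High byte entropy only says the output bytes are evenly distributed. It does not rule out aligned 16-byte repeats from headers or padding, which is what a block histogram would check.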

There's more in the stream than just compressed data. There'll be metadata info that you can make reasonable guesses about. ECB mode lets you take that information and apply it to other blocks in the ciphertext.

This thread is an excellent illustration of why you don't want your encryption implemented by merely good coders. You need people who know what they are doing.

I’m literally one of those people who “knows what they’re doing”. This is the problem with discussing ECB on an online forum. There’s no space to have a nuanced discussion without people cargo culting “ECB bad” over every comment.

Yes, ECB is almost always the wrong choice. Yes, there are other ways it’s going to fail in this use case. Yes, compression before encryption itself often enables other attacks. No, I should not have to prefix a comment about ECB with this type of disclaimer when I’m making (what should be) an uncontroversial statement that the tux attack doesn’t directly apply to compressed image data.

Ironically, when designing a protocol for my company, one of the reasons we didn’t use ECB when it would have been entirely justified (each chunk of data was precisely one block in size and keys were only ever used once) was because of potential backlash from people who only know “ECB bad” and nothing more.

I’m not arguing that. I’m saying that for compressed data, underlying patterns in data aren’t trivially exposed by ECB. Ergo, the “tux” attack on bitmap image files doesn’t really apply here. I meant nothing more and nothing less than that.

And I'm saying that they aren't just sending compressed data, and hardly any practical communication application would, which makes the "well maybe they have enough entropy that it doesn't matter?" argument moot.

Literally nobody is making that argument or in any way suggesting that ECB is a great choice. Just that this one specific attack doesn’t apply.

Which specific attack?

The one discussed directly in the grandparents of this thread.

In your own words.

It's harder to extract patterns from high entropy data. I don't think anyone's saying that this is even an OK thing to rely on, at all, just that the nature of the data means that this specific weakness is likely more difficult to take advantage of.

If zoom were transmitting text this would be relatively more serious.

What about the chat system? I doubt they're intentionally compressing the text there in order to increase the entropy. I guess they could be using gzip or whatever, but we'd need to look at how the protocol works in more detail. Or do they use a different system for the chat protocol altogether?

AES-ECB isn't necessarily insecure, it's just very easy to misuse (and I agree what the article described is a misuse). I think the argument is that if there are patterns in the input data, the same patterns will show up in the AES-ECB encrypted data, just with different values. Compressed data should be high entropy and hard to predict, so there really shouldn't be structure or patterns to the input data. There's no guarantee that any given compression algorithm provides sufficient randomness, though.

"The use of ECB mode is not recommended because patterns present in the plaintext are preserved during encryption."

absolutely - I was asking why a poor encryption scheme would be considered acceptable just because the data is compressed.

Compression does not introduce entropy to a stream. Please refer to Shannon's source coding theorem.

If anything, it reduces entropy.

But it does increase the entropy per byte, thus making patterns harder to spot.
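
The "entropy per byte" distinction is easy to check empirically. A minimal sketch, assuming a made-up low-entropy input (20,000 bytes drawn from 8 values) and zlib as the compressor; it measures the byte histogram only, so it says nothing about higher-order structure:

```python
import math
import random
import zlib
from collections import Counter

def entropy_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical input: each byte is one of 8 values, so at most 3 bits per byte.
rng = random.Random(0)
raw = bytes(rng.choice(b"ABCDEFGH") for _ in range(20_000))
packed = zlib.compress(raw, 9)

raw_rate = entropy_per_byte(raw)        # bounded by log2(8) = 3 bits per byte
packed_rate = entropy_per_byte(packed)  # much closer to 8 bits per byte
```

The compressed stream is shorter and each of its bytes carries more information, but the total information has not gone up. Both sides of this argument are describing the same thing from different ends.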

....so your argument is to hope an adversary only sees part of your video and not all of it?

No, their argument is that if you have a higher entropy per byte, there will be more variation in the aligned 16-byte chunks that are relevant for attacking AES-128-ECB. This reduces the probability of the attacker being able to find equal blocks.
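
A quick way to see what "fewer equal blocks" means: count the aligned 16-byte blocks that repeat, before and after compression. This is a sketch under assumptions of my own (a synthetic input built from 8 distinct blocks, zlib as the compressor), not a measurement of real video:

```python
import random
import zlib

BLOCK = 16  # AES block size in bytes

def duplicate_blocks(data: bytes) -> int:
    """Count aligned 16-byte blocks that repeat some earlier block."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data) - BLOCK + 1, BLOCK)]
    return len(blocks) - len(set(blocks))

# Hypothetical plaintext: 2000 blocks drawn from a palette of only 8 distinct blocks,
# which is about as ECB-unfriendly as input gets.
rng = random.Random(0)
palette = [bytes([i]) * BLOCK for i in range(8)]
raw = b"".join(rng.choice(palette) for _ in range(2000))
packed = zlib.compress(raw, 9)

dup_raw = duplicate_blocks(raw)        # almost every block repeats an earlier one
dup_packed = duplicate_blocks(packed)  # far fewer blocks, and far fewer repeats
```

Under ECB every duplicate plaintext block becomes a duplicate ciphertext block, so `dup_raw` versus `dup_packed` is roughly what an eavesdropper gets to work with. It does not cover headers and other metadata that sit outside the compressed payload.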

And my argument is if I know the video encoding and compression sequence, I wouldn't depend on AES-ECB. I know the patterns that show up.

If I am encrypting something, I only want to depend on the strength of the encryption. I don't want to hope that something else ensures that an adversary cannot figure out my ciphertext. That is a very bad idea.

Sure, and nobody was arguing they should have used ECB or that they shouldn’t change it. Only that the ability to exploit this given compressed data is lower than the uncompressed penguin image example.

No comment on the original claim, but that example is encryption applied to an uncompressed image. (Adjacent identical pixels are not typically represented individually when compressed, and thus encryption could not cause the banding patterns seen in those regions of the image if it were compressed prior to encryption.)

Seriously. The main argument of this article is assuming Zoom encrypts uncompressed video data. That is not what is happening here.

The point is that any pattern in the plaintext data shows up in encrypted data if you use AES-ECB.

Compression does not introduce entropy to a stream, so pointing out that the stream is compressed and calling it good is a very bad idea. Please refer to Shannon's source coding theorem. If anything, compression reduces the entropy in the information.

I think you may want to look closer at Shannon’s source coding theorem; the Shannon entropy of the output of a compression algorithm will be higher than the entropy of the source as identifiable patterns are eliminated. Otherwise the theorem would trivially contradict itself.

Shannon's source coding theorem says that the entropy in compressed information is at most the entropy of the uncompressed information. If you add entropy to a compressed algorithm, you are by definition adding noise to the SNR of a signal.

We’re not talking about a noisy channel here, so I’m not sure where you’re getting the SNR from. I think we’re talking about entropy of different distributions here so let’s cut to a concrete example relevant to your original claim (that compression doesn’t help reduce the impact of repeated blocks in ECB by reducing the rate of repeated blocks).

Suppose we have some string of bytes. When we split it into aligned 16-byte blocks (let’s assume it divides evenly for simplicity), we find that these blocks are not evenly distributed. For example, 1% of blocks turn out to be the same, which given the number of symbols in this code is massively out of proportion.

We apply a Huffman code using the 16-byte blocks present in the message as the alphabet and their observed statistics for this particular message (if that aspect bothers you, you can assume we prepend the dictionary to the message). Huffman codes are optimal for per-symbol encoding.

Suppose we re-evaluate the distribution of 16-byte blocks in the compressed data; will this distribution have higher entropy (meaning there will be fewer duplicate blocks to exploit ECB with) or not?

> The point is that any pattern in the plaintext data shows up in encrypted data if you use AES-ECB.

No, that's false. ECB reveals repeating plaintext blocks. "0123456789ABCDEF-0123456789ABCDEF" contains a repeated block-length sequence, but would encrypt to three distinct blocks under ECB, because the second copy is not aligned to a block boundary.
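
The alignment point can be checked with a toy model. The sketch below does not use AES: it stands in a keyed HMAC-SHA-256 truncated to 16 bytes for the block cipher (my assumption, and it cannot be decrypted), because the only property that matters here is that equal aligned blocks map to equal outputs. The key and the sample strings are made up.

```python
import hashlib
import hmac

BLOCK = 16

def toy_ecb(key: bytes, plaintext: bytes) -> list[bytes]:
    """ECB-like stand-in: push every aligned block through the same keyed function.

    This is NOT AES and not real encryption; it only models "same block in,
    same block out", which is the behaviour under discussion.
    """
    padded = plaintext + bytes(-len(plaintext) % BLOCK)  # zero-pad the final block
    return [
        hmac.new(key, padded[i:i + BLOCK], hashlib.sha256).digest()[:BLOCK]
        for i in range(0, len(padded), BLOCK)
    ]

key = b"hypothetical key"
seq = b"0123456789ABCDEF"  # exactly one block long

aligned = toy_ecb(key, seq + seq)         # repeat falls on a block boundary
shifted = toy_ecb(key, seq + b"-" + seq)  # repeat is offset by one byte

aligned_repeats = aligned[0] == aligned[1]
shifted_distinct = len(set(shifted))
```

In the aligned case the two output blocks match; in the shifted case the three blocks all differ, even though the same 16-byte sequence appears twice in the input.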

> ECB reveals repeating plaintext blocks.

That by definition is saying any pattern in plaintext shows up in cipher text

Is compressed audio/video actually high-entropy (in the time domain) though?

Compressed anything is high entropy

Depends on the compression. Lossy compression discards a lot of noise vs the signal we care about and would in a sense reduce entropy.

Untrue. Many performance trade-offs have to be made and the entropy has to vary drastically with time. See for example B-Frames vs I-frames in compressed video. Couple that with the very low entropy video conference data and bam.

Two words: sensor noise.

Even uncompressed video will be hard to see that "penguin image effect" in, because the pixels that make up each block will be constantly changing in a random way, and unlike that synthetically generated image, it's highly unlikely for a block to be the exact same as any other one in any given frame.

You greatly overestimate both the image quality of crappy streamed videoconferencing video and the amount of pixel-wise sensor noise after noise reduction (pretty low actually), while underestimating the ingenuity of cryptanalysts and the power of having a lot of data. Like seriously, the only way the shitty 1mm or less sensors on webcams are able to deliver HD video is through an abject amount of noise reduction, sharpening and filtering. All of which greatly reduce entropy.

Hint: You don't need to know the plaintext exactly. You just need to be able to build a reasonably precise probability distribution.

I'm actually not sure - that's a good point. The whole point of compressing audio for video conferencing is to preserve human speech, so things that produce radically different waveforms but "sound the same" to us might show up as patterns. I guess it's better to avoid the question entirely and use an appropriate stream cipher!

There are lots of embedded processors with hardware support for AES-128 only. I have to fight to keep AES-256 out of the ciphersuite list because of the performance regression. The rest of the world will probably force the issue eventually but the saving grace is that 3DES is still considered secure.

> the saving grace is that 3DES is still considered secure.

Nobody who wants to do AES-256 rather than AES-128 thinks 3DES is "still secure". 3DES is perhaps 112 bits of useful keyspace but it has 64-bit blocks which was already bad news when DES was invented.

TLS 1.3 doesn't have a 3DES option at all. You can do AES 128 or AES 256 (or ChaCha20).

The USG still considers it secure and that will retard forward progress.

There is an abundance of secure, widely reviewed symmetric ciphers for every imaginable application profile.

communications on the platform aren't encrypted at all. That's what the article says.

"Zoom, a Silicon Valley-based company, appears to own three companies in China through which at least 700 employees are paid to develop Zoom’s software. This arrangement is ostensibly an effort at labor arbitrage: Zoom can avoid paying US wages while selling to US customers, thus increasing their profit margin. However, this arrangement may make Zoom responsive to pressure from Chinese authorities."

I'm not here to defend Zoom but any and all companies that can do this have done and are doing the very same thing to minimize costs. It's not great but it's the expected way of managing a software company in 2020. It would be hard to do business otherwise.

Whether it's good or bad is a question that will need to be reexamined given the current situation.

Given Zoom is blocked in China, what reason is there for the main key server to be there?

Even ignoring the appearances, for latency and fault-tolerance reasons, China is the last place you'd want to put a critical server for an app used in the West.

Zoom is no longer blocked in China.

I really don't think this counts as rolling your own crypto. They just used a weak implementation of existing methods.

No more rolling your own crypto than if I were to use DES.

I agree - all security is assembled from lower level primitives and can be insecure despite using good building blocks. AES (even AES-128) is a fine choice, ECB mode is even okay in some contexts, but using that as a stream cipher is not appropriate.

Definitely. Some other places (I think Telegram iirc) do actually create their own encryption algorithms without using existing ones. That’s what I thought rolling your own crypto scheme was. If the problem with Zoom is that they chose a poor mode for the algorithm, why doesn’t the headline say that? Creating your own encryption algorithm is way worse.

If rolling your own encryption scheme is just choosing what algorithm you use — every system does that. So it’s not headline worthy. :/ I get that zoom needs to make better choices, but the rhetoric around it has been pretty poor and unhelpful.

Search for images encrypted with AES-ECB. This is pretty much the definition of rolling your own crypto.

It's "rolling your own crypto", not "rolling your own encryption algorithm". That includes using "secure" low-level primitives incorrectly to build an insecure higher-level protocol. In this case, using ECB mode has been known for years (decades?) to be a bad move regardless of the underlying encryption algorithm.

As the classic article says, if you type the letters A-E-S, you're doing it wrong.

Yeah, in fact, that is USUALLY the way people screw up and roll their own crypto.

Well, except for the part of the article where it is made clear that they don't implement any sort of encryption at all.

Zoom is an interface for secretly sending a message to the Chinese government, and then hoping that they secretly relay that message to the person on the other end of your call.

> Zoom’s most recent SEC filing shows that the company (through its Chinese affiliates) employs at least 700 employees in China that work in “research and development.”

Wow. What could all of these people possibly be doing? It can't be development and QA; what's going on over there?

Zoom's core product development is in China. Their Bay Area office is apparently support roles.

700 engineers for a software company the size of Zoom is actually pretty small.

You do realize development includes software engineering, right? 700 people doing programming isn't even remotely surprising.

Not to defend them against the recent security fiasco, but innuendo such as this, which links "employees working in China" directly with "shady business", makes me at least uncomfortable.

For one product...? I’m sure they have a great deal of internal software and versions for numerous OS and platforms - but we’re not talking about a company with hundreds of consumer facing software products.

To be more clear, I meant "it can't only be development and QA". Meaning, there has to be something else included in that number that is not traditional R&D.

Also, to be clear: the statement is meant from the perspective of, how do they have that many employees (what are they all doing); not "why" do they have that many (which, while seemingly very large, I am not in a position to judge), nor meant to be a slander against outsourcing R&D work to China (though, to be crystalline clear, I inherently trust Zoom less because of this, just like I trust companies who base or outsource R&D to Australia less because of Australia's laws on these matters).

I didn't read it as suggesting it was "shady business". I read it as an insinuation that the company didn't know what it was doing from the top down, so they didn't hire smartly or manage well; they just threw numbers of people at the problem (and got predictably bad results).

With good management, a team of 50 should be able to provide what Zoom provides.

Yeah, moments after replying I realized there is a more charitable read :-)

Of course, with this read comes the age old question of "I can build Google/FB with 20 good men, what are they doing?"

> Of course, with this read comes the age old question of "I can build Google/FB with 20 good men, what are they doing?"

Well, this would explain Google's pattern of constantly churning out new products on the theory that hey, maybe someone somewhere wants it.

If you need 20 people to run your actual operations but you've hired 20,000 people, what are the other 20,000 people supposed to do?

The answer might be that truly good people are not willing to work for FAANG, so they have to make do with 100 so-so instead.

Why can't 700 people do development and QA?

I think it was more a jab at Zoom being buggy and (in light of recent revelations) inadequately secure - that is, implying that those 700 employees are obviously not doing development or QA given the state of the product.

That's actually the most interesting fact about Zoom I feel like I've learned so far!

Edit: I looked it up... thought it seemed crazy high but it obviously includes all the support staff too. So they might have only 300 engineers total, for example. I guess that's more reasonable. [1]

"As of January 31, 2020, we had 2,532 full-time employees. Of these employees, 1,396 are in the United States and 1,136 are in our international locations."

"We also operate research and development centers in China, employing more than 700 employees as of January 31, 2020."

[1] https://investors.zoom.us/static-files/09a01665-5f33-4007-8e...

Is "Support" traditionally categorized under research & development? Possibly enterprise support or solutions engineers?

Skype had 500 employees in 2010 a year before they were acquired. 700 doesn't seem clearly out of proportion.

500 employees versus 700 R&D employees.

If that 700 count is including support staff, then it becomes slightly more believable, but even then, I'd expect Zoom to be at that 700-1500 number in terms of total employees, not just R&D.

"Wow. What could all of these people possibly be doing?"

There are 20,000 google engineers working in "research and development", what could all of these people possibly be doing?

Google is many orders of magnitude larger than Zoom. Zoom has one product, Google has thousands of products, many more complex than Zoom.

Yeah, 28x the engineers for research at Google makes sense -- I'd say the work Google is doing is at least 28x more expansive, complex, etc. 700 engineers is quite a lot for a company of this size!

Google has maybe 5 products that matter.

Violating humanity's privacy?

Machine learning on everyone's video feeds to make better deep fakes of common people?

You get the face and the voice of people talking directly to the camera. You get the people who they constantly talk to so you know their relations. You know the content of what they are saying.

Deep fakes can only work for so long. When it becomes ubiquitous people will naturally learn that anything can be deep faked. We would need some process of verification, then.

>However, this arrangement may make Zoom responsive to pressure from Chinese authorities.

Zoom doesn't even have to be responsive to pressure, just a developer has to be. And they live there so yah they'll be understandably responsive.

Seriously, all people who are surprised about this type of news should really understand that most "successful" (as in, popular) tech/.com/SV "startups" these days are like this.

If you "waste" time on the real important stuff such as a good design, user security, or the like, you will lose out to a competitor who didn't and therefore was able to spend more time on marketing.

If you waste your time on user privacy you will be totally crushed by that competitor whose definition of "privacy" was "how much can I pester this user until he gives me access to his address book so I can spam his friends?".

I don't think this is good. I think this is very sad. But it is what it is.

I still remember the Whatsapp founders coming into the jabber mailing lists with "please let me configure my server"-type questions, for god's sake.

Agree, but I want to add something: the usability of Zoom is good, even for casual computer users. Strategically, they focussed on what they considered to be the things that would give them sales. Not saying it is good, but it certainly worked for them (at least until now).

I agree. My SO is currently studying from home due to the pandemic. A lot of professors and tutors use what they find first. There is no standard.

She already had to use MS Teams, Slack, Jitsi and Zoom. Her take was that Zoom was by far the most usable tool. And most of her fellow students, who are 20 years younger, agree.

Sadly most people don't even know about the problematic status of Zoom. And if one tells them, most do not understand what's so problematic about that.

The students are all forced to agree to these abusive third party TOS simply to receive the education to which they are entitled/for which they have already paid.

That’s a bait and switch on the part of the university. “You’ve already paid, but now you have to give up your civil rights against this third party you’ve never heard of to get the service.”

There should be liability for the schools for doing this. A class action, perhaps?

There's a pandemic, schools and professors try what they can to keep the courses running and not ruin their students' year, and your reaction is "sue them"?

Pandemic really shouldn't be an excuse to just sell out students to any party that used more marketing money.

There's a reason why (at least in this part of the world) the educational products receive additional scrutiny and need to comply with additional privacy and security directives.

They’re pulling a bait and switch.

There are video technologies they can license so that they can keep things running without forcing students into having to enter into abusive third party agreements. They can hire people to set up simple first-party HLS streaming systems—it can be done in a day. There are tons of non-abusive alternatives. Zoom makes malware. Can you imagine being given the option by a university to whom you have already paid tens of thousands of dollars: “install this malware or you don’t get to come to class”?

You could say the same about Amazon and their backlash against workers. “There’s a pandemic, and this company is doing what it can to keep the supplies moving, and your reaction is “sue them”?”

Yes, that is the appropriate reaction if they break the law.

A pandemic does not justify abusive, coercive behavior. If anything, it makes it more abusive.

Partly because it isn't really problematic for students and teachers that are trying to use VTC.

When the alternatives don't work, it doesn't really matter how secure the alternatives are.

I agree this isn't uncommon, but that does nothing to make it acceptable.

Bad security design is unacceptable and this is sort of indicative of a large dissonance when it comes to SV startups. "Disruption" isn't ignoring requirements - if you are able to under price your competitors because you fail to adhere to good practices (or, more importantly, regulations[1]) you're not building a lean and useful product - you've just built a half-assed competitor that can undercut prices because it is incomplete... You're selling something as a competing solution when it only does half the stuff.

There is a lot of good to be said about identifying the 90/10 value components of a problem space and discarding expensive features that would just add complexity for little value - but if those features are requirements you're just failing to actually meet the points consumers (or markets ala regulations) expect and making your profit off of deceit.

1. Looking at you Uber.

But, the parent's point is that it is acceptable. Zoom has become a $300mm company all the while flatly lying (or being very generous in statements). What is the consequence for them? They made rapid market growth because they didn't waste constrained resources on security/privacy concerns.

No consequences means their behavior is acceptable. Which means this will repeat until there is a reason that makes this behaviour not acceptable.

This is a bad point. The difference between using AES-128 and AES-256 from a code standpoint is trivial. The only explanation for this is either gross incompetence or malevolence. Personally I don't use Zoom, nor would I recommend its use for even personal conversation.

Have you studied the performance implications of different key sizes on real-time video applications? A little bit of reading suggests AES-256 carries a significant performance penalty vs. AES-128. I would be keen to hear from someone who has tried it in real-world situations and can demonstrate AES-256 working well on e.g. low-end android devices/cheap laptops without issue.

I use Zoom regularly. Their video works well. I would like to have AES-256 but also I suspect this is not a casual choice, and I'm not sure it's as clear-cut as you're assuming.

the difference is a copy and paste from an older StackOverflow post.

Using code without understanding its implications would fall in the gross incompetence category.

And my point is that this happens all the time. Natural selection seems to favor the type of company that would just copy&paste from SO (then spend their resources on some fancy viral marketing) instead of the one that would stop to think about it.

People shouldn't be surprised, but they should be upset.

Awfully big talk on a Friday!

But when I see "roll their own encryption," it tells me that they went out of their way to create something subpar. There are some things that one should never* roll one's own of, and encryption has to be the top of the list.

* OK, if you're an encryption expert, you obviously would roll your own to advance the state of the art, but Zoom are quite obviously not encryption experts.

"Roll your own" is such an oversimplification.

Usually it is: I can spend all this time making things right, or Bob here knows how to do this other simple way that is good enough in a week.

It's so much more important to make things right than to have something "good enough" (which is never actually good enough) that reducing it to that simplistic binary is not all that drastic.

I think you might be missing the bigger picture. You have a company that the "Ministry of State Security of the People's Republic of China" can easily hack to spy on American children and businesses. What could go wrong?

Not just American. Someone posted a screenshot on HN yesterday or the day before of the prime minister of the United Kingdom having a video chat with about 16 other top government officials over Zoom.

Anyone want to wager how many of their computers are now p0wned.cn?

More generally, there is a culture of celebrating rule-breaking or even law breaking within the startup community. As long as you get away with it, and make money off it, then it's totally cool...

That’s also partially a reaction to so many rules and laws being utterly counterproductive and/or corrupt. Respect for rules and regulations depends on those rules and regulations having respectable motivations and effects in the first place.

It does lead to a general sense of degrading lawlessness throughout society. When powerful people start to have contempt for laws it can get really abusive.

Sure. But at this point everyone picks and chooses which laws they care about. I've observed a pretty heavy overlap between people who care a great deal about taxi regulations (most of which exist merely to artificially prop up rent-seeking owners of taxi medallions) and people who consider American immigration law so fundamentally unjust that it should simply not be enforced anymore. Faithfully Obeying All The Rules And Regulations isn't an especially popular ideology, and if you consistently advocate it, you'll end up very unpopular with everyone.

Immigration is a tough issue. The debate is now mostly focused on what to do with undocumented immigrants who are already here, perhaps with families, etc. That's one issue which has been debated to death.

But there are also perhaps hundreds of millions of potential immigrants worldwide who would love to come to the US permanently and have the means to do so, if it were legal. But they cannot get a green card or H-1B because those are nearly impossible to get for people in some countries. So they never come.

It does seem hard to decide a fair system here but any decision is just that... A system of laws... They have no meaning unless followed. What's the point of rule of law anyways? It's the answer to arbitrary rule: by monarchs and dictators.

> The debate is now mostly focused on what to do with undocumented immigrants who are already here, perhaps with families, etc. That's one issue which has been debated to death.

Which is exactly my point. According to the law, these people are illegal aliens and the government has every power to stop them from crossing the border, detain them if they're already here, and deport them to their countries of origin no matter how long they've been here. And they have no legal right to reside in the country. And a lot of conservatives argue that "sanctuary city" proclamations represent a fundamental disrespect for the rule of law.

A lot of people balk at this notion and start arguing that US immigration law is fundamentally unjust and that it's reasonable to make exceptions. And that's a perfectly reasonable perspective, but it's hard to keep that consistent with the idea that taxicab medallions, for example, are some unquestionable pillar of society.

Forming a mass consensus on what reasonable exemptions to the laws should be is exactly how the lawmaking process in a democracy should work... What I hear you saying is that we don't live in a functional democracy. I don't think, however, that individuals or cities selectively following laws is going to get us any closer to a functioning democracy.

It is easier for Zoom to apologize and fix than for Zoom to have no customers but a solid stack.

I suspect it is more a problem stemming from hiring people who are all new to professional programming. Avoiding these issues means hiring some older, experienced professionals.

Or hiring people solely based on Leetcoding interviews. Seriously, every interview I've had in the last decade focused on algorithms/data structures but it was very rare to get any questions on security. This applies to startups and larger enterprise companies alike.

Security implies experience. They don't care anything about that. Just hire cogs to grind away and fire when they burn out.

Those cost more.

Much cheaper than trying to put out the fires caused by not using them.

Experience costs more because it pays off.

There is little to no penalty if your software is crap from a security or privacy point of view and you are popular enough.

The example I used is Whatsapp. In the early days, but definitely after their popularity was already high in Europe, you could still impersonate any user on the platform trivially. Their only real security was obfuscation of the client source code and of the protocol. That doesn't help much when you still have a Java (J2ME) client that is trivial to decompile. They still became hugely popular.

People quickly forget about these types of issues, or they don't assign blame where it belongs. They will just shrug over it and start believing that it is normal for computers to get hacked from time to time. After all, it appears all the time on TV.

Yes, but people are short sighted. They also probably think “I’d rather grow quickly and be able to afford this down the line than do it right and risk missing out.”

> Yes, but people are short sighted.

Which comes from being inexperienced as well. And an experienced hand can save you money immediately. Such as selecting a more appropriate programming language to implement the project in. Suggesting an appropriate library to use rather than rolling one's own. Avoiding multithreaded coding disasters. Having a backup system that works. Having a revision control system that works. Knowing that unit tests have an immediate payoff. Avoiding stupid metrics like paying people based on lines of code written. Avoiding rolling your own encryption. Don't transmit passwords in plaintext. Don't do illegal things. Don't leave yourself wide open to lawsuits. Use a real CPA to do your taxes. Keep proper business records so the IRS doesn't hang you. And on and on.

My CPA makes lots of money when some young entrepreneur comes in all terrified that the IRS is auditing him and he kept no records and didn't even file returns.

An experienced hand will tell you to never ignore and never mess with the tax man.

Check out what happened to Will Smith when he paid no attention to the IRS. All his proceeds from "Fresh Prince of Bel Air" went to the IRS as taxes, interest, and penalties.

Is this maybe a side effect of leadership in these startups being more junior themselves and not realizing the value they're missing out on by not pulling in some of that experience?

There's a reason airlines don't put a freshly minted, cheap pilot in command of a 747.

I know market cap isn't everything, but just thought I would look it up.

Zoom (ZM) has a market cap of $40.77B (with a forward PE of 327.31...). Delta (DAL) has a market cap of $18.19B, with a forward PE of 3.91.

Like I said, market cap isn't everything, but I find that astonishing.

Well, the airlines are being destroyed at the moment.

Why can't people bother to construct a minimally secure encryption system given that there are so many good documents and code examples out there?

I don't mean anything with ratcheting, forward secrecy, replay protection, nonce reuse resistance, or any other bells and whistles, just basic competent symmetric encryption without gaping holes or ridiculous bizarre design choices?

It's not hard!

(1) Generate 12 bytes of random nonce using a good secure random source, prepend to message.

(2) Use nonce to initialize AES-GCM.

(3) Run it through AES-GCM, append tag.

(4) Done.

That's not hard and it's secure enough for common use cases.
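
The four steps above can be sketched roughly as follows. This is a minimal illustration, not a vetted design: it assumes the third-party Python `cryptography` package is installed, and the helper names `encrypt` and `decrypt` are made up for this example.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, as in step (1)
TAG_LEN = 16    # AESGCM appends a 16-byte authentication tag


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce || ciphertext || tag (hypothetical wire layout)."""
    nonce = os.urandom(NONCE_LEN)  # step (1): random nonce from the OS CSPRNG
    # Steps (2) and (3): AESGCM.encrypt returns the ciphertext with the tag appended.
    sealed = AESGCM(key).encrypt(nonce, plaintext, None)
    return nonce + sealed


def decrypt(key: bytes, message: bytes) -> bytes:
    """Split off the nonce, then verify the tag and decrypt."""
    if len(message) < NONCE_LEN + TAG_LEN:
        raise ValueError("message too short")
    nonce, sealed = message[:NONCE_LEN], message[NONCE_LEN:]
    # Raises cryptography.exceptions.InvalidTag if anything was modified.
    return AESGCM(key).decrypt(nonce, sealed, None)


if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)
    wire = encrypt(key, b"hello")
    print(decrypt(key, wire))
    try:
        decrypt(key, wire[:-1] + bytes([wire[-1] ^ 1]))  # flip one tag bit
    except InvalidTag:
        print("tampering detected")
```

Note that this only covers a single message under a single key; key exchange, replay protection and how many messages one key may safely encrypt are all outside this sketch.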

No, that doesn't solve the problem at all!

AES-GCM was not designed to be used with random nonces - otherwise the nonce space would need to be larger than 96 bits. It was only designed to be safe with a unique counter nonce. That does not always become a problem, but it will in Zoom's case.

Zoom is encrypting a video stream, which means you cannot use AES-GCM wholesale, but have to use it to encrypt chunks of data.

The problem is that 12 bytes (96 bits) of nonce is just not enough, and after encrypting a certain amount of data with the same key, the chance of repeating the nonce becomes rather high. And if you have a long video conference call with many participants using the same key, you'll sooner or later generate enough data that the nonce will be repeated. Once the nonce is repeated, GCM loses its security guarantees.
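
One rough way to put numbers on this is the birthday approximation for the chance that at least two of n random 96-bit nonces coincide. The sketch below only does the arithmetic; the participant count, packet rate and call length are made-up illustration values, not measurements of Zoom traffic. For reference, NIST SP 800-38D limits random-IV use to 2^32 encryptions per key, which keeps this probability below 2^-32.

```python
import math


def collision_probability(n: int, nonce_bits: int = 96) -> float:
    """Birthday approximation: P(at least one repeat among n random nonces)."""
    # p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision for tiny values.
    return -math.expm1(-(n * (n - 1)) / 2 ** (nonce_bits + 1))


if __name__ == "__main__":
    # Hypothetical call: 100 senders, 50 packets/s each, 10 hours, one shared key.
    packets = 100 * 50 * 10 * 3600
    print(f"packets under one key: {packets}")
    print(f"collision probability: {collision_probability(packets):.3e}")
    print(f"at the 2^32 limit:     {collision_probability(2 ** 32):.3e}")
```

How many packets a real deployment encrypts under one key, and what collision probability is tolerable, depends on the system and its threat model.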

It's hard for me to estimate how bad it would be, since I'm not familiar enough with the plaintext data characteristics, but in this case it could even be worse than using ECB.

AES-GCM is a cryptographic primitive that is meant for sequential whole-message encryption. The moment you're using it for streaming you are rolling your own crypto. Even if you've used a cipher that supports larger nonces like XChaCha20-Poly1305, you still cannot be sure that you're absolutely free from mistakes if you also want authentication, for instance.

Zoom was negligent for just going ahead with AES-ECB, but finding a solution for this problem is not that simple. When you need to stream encrypted data, encrypting it directly with a safe AEAD construction is not always going to make you safe. That's why it's generally safer to use TLS, even considering how historically problematic TLS is. Of course Zoom could not use TLS for streaming video, since this would preclude them from using UDP and allowing for packet loss. I've never encrypted TLS traffic and I'm not familiar enough with other protocols like DTLS and [Noise](https://noiseprotocol.org/) to know if they will be useful in this case, so I won't be making any claims about how easy this is.

The only thing I can think of is that maybe the protocol Zoom is using precludes prepending the nonce due to the packet format. But surely there's some way to do this with packet counters and user IDs. It's possible this was an intermediate step, but I'm really stretching for excuses here.
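
As a sketch of what "packet counters and user IDs" could look like, a 96-bit nonce can be built deterministically from a 32-bit sender ID and a 64-bit per-sender packet counter, so nothing extra has to be prepended to the packet. The field sizes and the name `packet_nonce` are assumptions for illustration; nothing here reflects Zoom's actual packet format.

```python
import struct

MAX_SENDER_ID = 2 ** 32 - 1
MAX_COUNTER = 2 ** 64 - 1


def packet_nonce(sender_id: int, counter: int) -> bytes:
    """Build a 12-byte nonce: 4-byte sender ID followed by 8-byte counter."""
    if not 0 <= sender_id <= MAX_SENDER_ID:
        raise ValueError("sender_id must fit in 32 bits")
    if not 0 <= counter <= MAX_COUNTER:
        # Wrapping around would repeat a nonce; a new key is needed instead.
        raise ValueError("counter exhausted, rotate the key")
    # Big-endian, no padding: 4 + 8 = 12 bytes.
    return struct.pack(">IQ", sender_id, counter)


if __name__ == "__main__":
    print(packet_nonce(7, 0).hex())
    print(packet_nonce(7, 1).hex())
```

This only stays unique if each sender ID is unique per key and each sender never reuses or resets its counter under the same key, which is the part that tends to go wrong in practice.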

Yes, you can construct an IV from other state, though it's not ideal. Another good fit might be modes designed for disk encryption, as those are designed for cases where you can't pad (e.g. disk blocks).

There's also AES-GCM-SIV and its relatives, which construct the nonce from a MAC of the plaintext and technically do not require a separate IV, though if you don't use one, any duplicate message will be obvious.

Those are somewhat more complex but honestly even if you don't get those perfect it's almost definitely better than ECB.

Or use TLS end to end like a normal company...

Yeah I wasn't saying that because it's obvious. I was just saying if you really want to DIY crypto it's not that hard to do better than dumb ECB.
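
For context on why ECB is the baseline to beat, here is a small sketch showing that identical plaintext blocks encrypt to identical ciphertext blocks under AES-ECB. It assumes the third-party Python `cryptography` package; `ecb_encrypt` and `has_repeated_blocks` are hypothetical helper names.

```python
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 16  # AES block size in bytes


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """AES-ECB with no padding; data length must be a multiple of 16 bytes."""
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(data) + enc.finalize()


def has_repeated_blocks(ciphertext: bytes) -> bool:
    """True if any 16-byte ciphertext block occurs more than once."""
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    return len(set(blocks)) < len(blocks)


if __name__ == "__main__":
    key = os.urandom(16)
    same = ecb_encrypt(key, b"A" * 32)            # two identical plaintext blocks
    different = ecb_encrypt(key, bytes(range(32)))  # two distinct plaintext blocks
    print(has_repeated_blocks(same), has_repeated_blocks(different))
```

An eavesdropper needs no key to run the second function, which is why repeated structure in the plaintext shows through.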
