The Intercept article on it has a thread here: https://news.ycombinator.com/item?id=22767807
It's probably too much of a stretch to merge all these, because many comments are about specifics of those posts, and the ones that aren't are kind of generic and so maybe not worth merging anyway (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...).
Thanks in advance for all your moderation efforts that make HN the site we all love to use on a regular basis.
I'm curious if you've ever considered writing a blog post about some of the things you've learned from your years of moderation? You spend so much time on HN, you must have seen lots of patterns and have lots of insights on...well, everything that gets posted on HN to everybody that posts on HN. I'd genuinely be interested in reading it if you ever did.
a semi-anonymous HN user
The site’s now characteristic tone of performative erudition—hyperrational, dispassionate, contrarian, authoritative—often masks a deeper recklessness. Ill-advised citations proliferate; thought experiments abound; humane arguments are dismissed as emotional or irrational. Logic, applied narrowly, is used to justify broad moral positions. The most admired arguments are made with data, but the origins, veracity, and malleability of those data tend to be ancillary concerns. The message-board intellectualism that might once have impressed V.C. observers like Graham has developed into an intellectual style all its own. Hacker News readers who visit the site to learn how engineers and entrepreneurs talk, and what they talk about, can find themselves immersed in conversations that resemble the output of duelling Markov bots trained on libertarian economics blogs, “The Tim Ferriss Show,” and the work of Yuval Noah Harari.
Not sure whether to agree or disagree.
Ah yes, I remember this article. Oddly enough, this is a major reason why I still enjoy browsing HN. There are few places left that cover a variety of topics while also encouraging this open style of discussion.
Someone needs to do a deep cognitive study of how memes, especially in image form, affect cognition during discourse. Whatever is going on it seems obvious that their presence immediately reverts people to a juvenile junior high level of discussion at best.
They also seem to lead to the development of this kind of obscure signaling culture hive mind. I don't even know how to describe it. In Star Trek TNG they called it the "borg songs," the sort of emotional-cultural carrier wave that pervaded the Borg collective.
Meme-dominated sites like 4chan immediately turn me off. I find that whole style creepy and culty and gross and it feels like some kind of soft mind control technique. I felt this way long before 4chan became "Stormfront for nerds," but the fact that this style of discourse would create an environment compatible with that way of thinking doesn't shock me at all.
Sometimes people are really cool. Sometimes people are fools. The fools make good points sometimes. The cool people can be the worst. The roles are fluid. I've had my fool moments.
dang, who I assume is a cyborg driven by several different mods, generally makes good calls. There aren't many places online where I feel like I can point out a problem to the mods and not worry about being called an idiot. Or banned.
You could even argue that being "apolitical" actually means that your political opinion is that the status quo is acceptable, which other people may disagree with. For instance, some people believe that selling water is unethical because it's a basic human need which you should always have free access to -- saying that the status quo is acceptable means that you don't think that for-sale water is unethical. There's nothing wrong with that opinion, but it is an opinion (and a political one at that).
Make a polite, clear-headed enough argument for something like eugenics and people will upvote you. It's troubling, but I can make a pro-socialism post under the same conditions. It's an uneasy peace.
I agree with this. I see Cultural Hegemony (domination-justification stories) being very strong in unexpected places. Family members or close friends who indirectly benefit from the artificial scarcity mechanisms (in SV especially Intellectual Property) that make up the various social class realities of rentier capitalism. What separates me and a mother of three in Bangladesh, who is getting paid starvation wages making the designer clothes I wear, is that I was lucky enough to have been born in the global North, in an upper-middle class family. I was able to grow up on the inside of the fort, and benefit from a rich inheritance.
Sorry if that sounded snarky—I don't mean to pick on you personally, and I don't disbelieve you. But the perception is explicable by the notice-dislike bias: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que....
How can we know for sure we are right or even on the good side, whatever "good" means?
From that perspective, the lowest common denominator is that bad is what harms humanity, e.g. by disregarding human rights. Human rights then becomes the new frame of reference, meaning that something can't be good if it's not within that frame.
when you're the USA, the world looks biased to the political left ...
(I looked at your comments and the ones I saw are not only fine, they're exactly what HN is for.)
I do think of writing about HN, but not a blog post, because I can't think of a good reason. I don't enjoy writing (https://news.ycombinator.com/item?id=14431262), at least not that kind of thing. Somehow I maneuvered myself into a job that is basically being a writer...of that kind of thing. Writing because you enjoy it, or were born to do it, is the best reason; failing that, it could be a means to an end. But what end?
I do intend to write about HN in a different way. There are core moderation topics I've been posting about and returning to for years now (https://news.ycombinator.com/threads?id=dang). I'm inching my way towards distilling those comments into mini-essays, or discrete paragraphs, about the principles of HN: what kind of community we want HN to be and how to get there. I think there's an opportunity for a step change if we become more conscious as a group about what we're trying to be and why.
The "why" is critical because the rules here can seem a bit lame if you think of them only as constraints. Many creative minds are the sort that don't see why they should respect arbitrary constraints. For such people, the way to win them over is to derive the rules from first principles: to show how it's in their interest to follow them because that's the way we get to have, and keep, an interesting community over time. That's what I try to do in moderation comments, at least on good days: not to scold people, but to persuade them.
My intention is to distill all that commentary into a series of posts that can form sort of the case law, or midrash, or zoography, of the core topics around HN community. I don't know if anyone has noticed, but I already do that by linking back to previous moderation comments using specific keywords. Some examples:
There are at least a dozen of these. My idea is to consolidate that material into little essays that interpret, explain, and basically act as hermeneutics to the site guidelines. Then maybe, in the future, I won't have to type out a whole new comment on $topic; I can just link to the canonical explanation. In other words, I intend to write more in order to write less! Other than that, for learnings and patterns about moderating HN, your best bet is to lure me with the right kind of beer. Or just ask a question here.
If that's an actual offer, I'd be happy to buy you a drink when the quarantine is over.
People are dunking on Zoom for rolling their own crypto and coming up with AES-128-ECB. This is also bad, but people should be aware that it's a lot more complicated than "you can see penguins through it".
You can see penguins through an ECB-encrypted bitmap because discrete blocks of the bitmap image repeat, and thus have the same ciphertext, and these correspondences carry obvious meaning in a bitmap. The same is not automatically true of video or audio data with normal codecs. Aaron Toponce points out that sensor noise will likely scramble ECB ciphertexts, for instance.
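To make the "same block in, same block out" point concrete, here is a minimal sketch. It assumes a stand-in for AES (a truncated HMAC-SHA256, which is not a real cipher and cannot be decrypted), because Python's standard library has no AES; the key and the tiny four-block "bitmap" are made up for illustration.

```python
import hashlib
import hmac

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES (assumption): keyed, deterministic 16 -> 16 bytes.

    It only mimics the property that matters here:
    same key + same plaintext block -> same output block.
    """
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def split_blocks(data: bytes):
    """Cut data into aligned 16-byte blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB: each block is transformed independently of its neighbours."""
    assert len(data) % BLOCK == 0, "sketch assumes pre-padded input"
    return b"".join(toy_block_encrypt(key, b) for b in split_blocks(data))


key = b"hypothetical key"
white = b"\xff" * BLOCK  # a run of identical "pixels"
black = b"\x00" * BLOCK
bitmap = white + white + black + white

ct = split_blocks(ecb_encrypt(key, bitmap))
# Label each ciphertext block by the first position where it appeared.
pattern = [ct.index(b) for b in ct]
print(pattern)  # [0, 0, 2, 0]
```

The ciphertext bytes look random, but the layout white, white, black, white is still visible as 0, 0, 2, 0. That only helps an attacker when the plaintext itself repeats in aligned 16-byte units, which is the part that is not automatic for compressed audio or video.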
Colm MacCárthaigh makes an even more important point, which is that it's already very tricky to reliably encrypt voice tracks, because common encoding and transmission techniques make them susceptible to traffic analysis. So, for instance, you can quickly find papers about exploiting silence suppression to make predictions about speech in an encrypted audio channel. The point here being, cryptanalytic attacks on ECB are unlikely to be anyone's first recourse.
Obviously, the 128 bit AES key thing doesn't really have any practical impact.
Designers should religiously avoid ECB mode, but the real danger of ECB is in interactive settings, where we as attackers get to induce plaintext patterns, and use chosen boundaries to isolate targeted ciphertext. Bulk video and audio transmission isn't that kind of interactive setting. You still don't want to read people saying that ECB is OK; it's bad.
Essentially: it seems like Zoom's cryptography is bad, but not in a way that really matters compared to brochure-level badness of non-end-to-end-encryption.
If I'm honest and I'm working on a video conferencing platform, a potential customer asks "Do you have End-to-end encryption?". I say honestly "No". The customer goes to Zoom because they lied and said they do. Any outcome where Zoom keeps even _some_ of the resulting customers and money rewards deceit and ensures you'll get worse products in future because a worse product was given a leg up by lying. That's where we are today.
Silence detection (and a bunch of related problems) should be solved by using Constant Bitrate (CBR) encoding. The analysis linked says SILK was used, which was conceived as primarily a Variable (VBR) encoding but it doesn't go into enough depth to be sure whether VBR is in play here, though it's hard to give Zoom benefit of the doubt.
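A rough sketch of why frame sizes matter regardless of the cipher: an eavesdropper who sees only ciphertext lengths can guess at silence under VBR, and learns nothing from lengths once every frame occupies the same number of bytes. The byte counts, the threshold, and the function names below are invented for illustration and are not taken from SILK or from Zoom traffic.

```python
def eavesdrop(frame_lengths, threshold=20):
    """What a passive observer can infer from sizes alone (no key needed)."""
    return ["speech" if n > threshold else "silence" for n in frame_lengths]


def pad_frames(frame_lengths, size):
    """CBR-like behaviour: every frame occupies the same number of bytes."""
    if max(frame_lengths) > size:
        raise ValueError("frame larger than the constant size")
    return [size] * len(frame_lengths)


# Made-up VBR frame sizes: the small frames stand for suppressed silence.
vbr_lengths = [62, 58, 9, 8, 9, 71, 66]

vbr_guess = eavesdrop(vbr_lengths)
cbr_guess = eavesdrop(pad_frames(vbr_lengths, 80))

print(vbr_guess)  # the pause in the middle is visible
print(cbr_guess)  # every frame looks the same
```

The cost is bandwidth: constant-size frames mean paying for the largest frame all the time, which is presumably why VBR is tempting in the first place.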
On the video side you can play sensor noise off against video codec optimisations. If you're sure the bottom 1-2 bits on a video conference camera feed is noise, the people writing the optimised live video encoder probably agree and aren't sending those bits anyway (you can synthesise noise perfectly well without sending it), so it's a wash.
And also, though I agree it's not where you'd start, do not rule out attackers controlling the plaintext - the "virtual pub" I attended a few hours ago with friends had a third party video playing, and sometimes people share their slide decks by video in corporate or government environments. It's definitely more practical to sneak deliberately watermarked video into someone's Zoom conference than to, as we worried about for disk encryption, get them to stash megabytes or gigabytes of carefully watermarked files on a drive.
I will say in passing that it's weird to keep picking VBR AAC in your examples; the document says they used SILK, which would make sense because SILK was designed (by Skype many years ago) for this application. AAC is from a different era (it turns out we were trying to be too clever back then) and targets a different application.
This bothers me because even if ECB was good enough here, which we're both agreed it isn't, encrypted SILK VBR is anyway specifically known to be vulnerable to the same approach they demo'd in "Spot Me if You Can" and similar papers:
Keep in mind that even without any cryptographic work, we know the exact length in bytes (just like in that paper) of every SILK frame on the wire.
As to the cryptography:
I think the first thing I'd do if I was trying to turn this into an attack would be to send gibberish to Zoom's intermediary servers and see if they care (thus function as a potential Oracle).
You know more about the codec situation here than I do. So, you tell me! What should I be looking at?
Where are you getting this recommendation? Who recommended it - to Zoom? And when?
> You know more about the codec situation here than I do. So, you tell me! What should I be looking at?
If you're prepared to be at least slightly interested in problems even if ECB only contributes rather than blowing them wide open, then maybe you should look at H.264 I-frames first.
An I-frame stands on its own. For example the WebP image format is essentially just one I-frame from VP8. If the scene is literally unchanged from one I-frame to the next, all the exact same image data must be re-encoded, and logically (if the codec is at all efficient) that means the same bytes.
[ If you just don't use I-frames then all loss becomes unrecoverable and people will stop saying your product "Just works". In products like a DVD or Netflix I-frames are needed to make it "seekable". This is obviously not a necessity in a video conference, but the ability to handle network glitches and to add/ remove participants seamlessly is a requirement ]
So with ECB for I-frames clearly we can tell if nothing changed. Maybe an artificially lit meeting room with nobody in it, a desk left unmanned. Or a screen share left showing an unchanging desktop. Is that a "useful" thing for eavesdroppers to learn? Do you think Zoom users would be surprised that eavesdroppers can tell their colleagues took a coffee break?
But because codecs like H.264 are oriented around square blocks I suspect we can do better. They're not deliberately trying to have these blocks encode to exactly 16 bytes and so mostly they won't, but of course one chance in 16 isn't nothing and we get to try every I-frame. Probably not to the point of getting a blocky outline of a moving person on a stream (which is sad because I'm pretty sure that, like the Penguin, would make the point viscerally) but enough to have some idea which parts of a scene (or a slide deck) are changing.
Isn't an important part of the argument that the point the penguin 'makes viscerally' is actually an extremely narrow, relatively uninteresting one? It tells you something like 'certain types of synthetic images represented entirely uncompressed in the spatial domain are obviously super leaky when ECB-mode encrypted'. Which, you know, neat but nobody does that.
It seems extra-uninteresting compared to things like 'plaintext can be recovered from encrypted realtime VBR-encoded audio'.
In that context 'I-frames might be both identical and magically align' seems even more contrived than the already more-famous-than-informative penguin.
> In that context 'I-frames might be both identical and magically align' seems even more contrived than the already more-famous-than-informative penguin.
Did you take a look at an actual H.264 video stream of something boring?
I downloaded a random example of an H.264 video of the generic movie lead-in you've seen a million times where giant numerals appear in descending sequence. The sequence repeats itself several times, allowing us to compare I-frames each time.
Let me quote from the first time a giant numeral two is displayed - briefly in hexadecimal:
> fc fe 7f 3f 9f cf e7 f3 f9 fc fe 7f 3f 9f cf e7
That's not encrypted, but on the other hand I also don't understand what it means because I am not an H.264 decoder. So an ECB encoded block would be just as meaningful to me.
Of course, we'd only get the same exact ECB encoded block if the input was the same... a few seconds later another I-frame of the same giant numeral two and:
Ah. Well, maybe it's just a fluke? A few seconds later the same number is featured in the I-frame again and...
Maybe I got real stupid in my old age but I think what's going on here is exactly what I already told you to expect. Given the same input the H.264 I-frame encoded is identical or at least so very similar as to be full of the same 16-byte sequences. Actually it's mostly 9-byte sequences, it's just that it needs so very, very many of them to encode an I-frame that statistically producing a recurring 16-byte sequence was almost inevitable.
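The 9-byte versus 16-byte arithmetic can be checked directly. This sketch takes the nine bytes from the hex quoted above, repeats them as an idealised stream (an assumption: a real I-frame is not perfectly periodic), and counts the distinct aligned 16-byte blocks. Because 9 and 16 share no common factor, the block boundary walks through all nine phases of the pattern and then starts over.

```python
from collections import Counter

BLOCK = 16  # ECB block size in bytes

# The 9-byte pattern visible in the quoted dump.
period = bytes.fromhex("fcfe7f3f9fcfe7f3f9")

# Idealised stream: the pattern repeated back to back (1440 bytes, 90 blocks).
stream = period * 160

counts = Counter(stream[i:i + BLOCK] for i in range(0, len(stream), BLOCK))
first_block = stream[:BLOCK].hex()

print(len(counts))   # 9 distinct 16-byte blocks
print(first_block)   # fcfe7f3f9fcfe7f3f9fcfe7f3f9fcfe7
for blk, n in counts.items():
    print(blk.hex(), n)
```

So a run of 9-byte repeats gives only nine possible 16-byte blocks, each a rotation of the same pattern, and under ECB each of those would map to one fixed ciphertext block.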
Let me be clear here: I don't understand H.264 well enough to align this on its natural frame boundaries, so I just took the raw data and pretended it was already 128-bit aligned, because that logically cannot be worse than what you'd get if you knew what you were doing. I would be astonished if somebody who actually knows what they're doing can't get much better results from this.
I can H.264 a series of images with solid-colored blocks and not see repeats. But who knows if my codec test harness is representative? I very much doubt it is. I'm certain that there are cases where there are not just collisions, but useful collisions. Let's understand what they are.
This isn't "vulnerability Olympics". It's simply about understanding what we are discussing.
This file contains audio, which in the Zoom pseudo-RTP is separate, so you should strip that out with your preferred tool without changing the H.264 video. It won't change the results very much (it only reassures us that the recurring data is in the video stream, as I assumed), so if you don't have such tools don't worry too much.
It has the case we're interested in, because it's the simplest to understand: I-frames with identical content.
If you've put together a "codec test harness" that's what you want to simulate. Don't feed in penguins or "solid-coloured blocks" just feed in a still image, and watch as (unsurprisingly) it outputs the same I-frame each time it needs to do so, regular as clockwork. You're not simulating a Zoom stream of abstract art, but a shared desktop that's showing Pauline's Q3 revenue forecast for two whole minutes while her team discuss Tiger King or the desktop of a guy who left five minutes ago but didn't leave the meeting, or (more speculatively) an empty meeting room on camera.
This is the beginning. If we stop dismissing it as just "more complicated" than just a Penguin pixmap and bring in people who know more about video codecs we can explore how much more can be done. If every slide show has the company logo and the presenter's name on the first slide, does that correlate? Can we make a watermark that would survive being played back through one codec and then encoded by Zoom?
But there's a strong sense in which I don't care to do all that actual work. Some of these things might be possible and some not, but we're not building a cool demo for a convention we're just here to say that Zoom is terribly insecure and if people feel they must use it, treat it like a meeting held in the local coffee shop, assuming your local coffee shop is bugged by both the US and Chinese government.
04608c1c211004608c1c00000007419b: 4 samples
1004608c1c211004608c1c0000000741: 4 samples
0963211004608c1c211004608c1c0000: 5 samples
33c00963211004608c1c211004608c1c: 5 samples
63211004608c1c211004608c1c000000: 5 samples
f7df7df7df7df7df7df7df7df7df7df7: 5 samples
0001000003ff0000000e000004000000: 6 samples
211004608c1c211004608c1c00000007: 7 samples
c00963211004608c1c211004608c1c00: 8 samples
0001000003ff00000014000004000000: 9 samples
0001000003ff0000000d000004000000: 25 samples
75d75d75d75d75d75d75d75d75d75d75: 31 samples
5d75d75d75d75d75d75d75d75d75d75d: 33 samples
00000b0000000b0000000b0000000b00: 34 samples
d75d75d75d75d75d75d75d75d75d75d7: 34 samples
7f3f9fcfe7f3f9fcfe7f3f9fcfe7f3f9: 39 samples
f3f9fcfe7f3f9fcfe7f3f9fcfe7f3f9f: 40 samples
9fcfe7f3f9fcfe7f3f9fcfe7f3f9fcfe: 42 samples
cfe7f3f9fcfe7f3f9fcfe7f3f9fcfe7f: 42 samples
fcfe7f3f9fcfe7f3f9fcfe7f3f9fcfe7: 42 samples
e7f3f9fcfe7f3f9fcfe7f3f9fcfe7f3f: 43 samples
f9fcfe7f3f9fcfe7f3f9fcfe7f3f9fcf: 43 samples
fe7f3f9fcfe7f3f9fcfe7f3f9fcfe7f3: 46 samples
3f9fcfe7f3f9fcfe7f3f9fcfe7f3f9fc: 47 samples
aebaebaebaebaebaebaebaebaebaebae: 60 samples
baebaebaebaebaebaebaebaebaebaeba: 63 samples
00000000000000000000000000000000: 69 samples
ebaebaebaebaebaebaebaebaebaebaeb: 73 samples
00060000000600000006000000060000: 147 samples
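For anyone who wants to reproduce a tally in the format above, a minimal sketch follows. It is not the script that produced these numbers; it assumes the simplest approach described in this thread, which is to treat the raw bytes as if they were already 128-bit aligned and count identical blocks. The cutoff of 4 samples and the function names are assumptions.

```python
from collections import Counter

BLOCK = 16  # bytes per ECB block


def repeated_blocks(data: bytes, min_count: int = 4):
    """Count aligned 16-byte blocks; keep those seen at least min_count times."""
    usable = len(data) - len(data) % BLOCK  # ignore a trailing partial block
    counts = Counter(data[i:i + BLOCK] for i in range(0, usable, BLOCK))
    hits = [(blk.hex(), n) for blk, n in counts.items() if n >= min_count]
    return sorted(hits, key=lambda item: item[1])  # least frequent first


def report(data: bytes, min_count: int = 4) -> str:
    """Format the tally as '<hex>: <n> samples', one block per line."""
    return "\n".join(f"{h}: {n} samples" for h, n in repeated_blocks(data, min_count))


# Synthetic input so the sketch runs without a video file.
sample = b"\x00" * BLOCK * 5 + bytes(range(16)) * 4 + b"\x01" * BLOCK + b"xyz"
print(report(sample))
```

To try it on real data, read the stripped video stream with something like `open(path, "rb").read()` (the path is yours to supply) and pass the bytes to `report`. Repeats in the plaintext are an upper bound on what ECB would leak, since the cipher adds no repeats of its own.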
I-frame macroblocks in H.264 are DCT'd, like a JPEG; I have no intuition for what the repeats would signify, except that the encoding is deterministic and, I guess, within an I-frame, stateless? So identical samples will show up in the output?
(It might be easy to instrument an H.264 decoder to get an indication of what frame I'm in when I see a collision, which is I guess what I'll do next).
I don't think it would make a huge difference but the numbers might be higher/more representative. Another way might be to just capture Zoom traffic of someone's Quarterly Synergies presentation or whatnot.
Edit: also thought of a silly upper-bound test for Team Penguin - you get plaintext of every colliding block in the stream. How much penguin can you get out of that?
I absolutely agree that the real deserved flak is for saying "end-to-end encryption" when that's clearly not what's happening. Honestly, I don't know how they could make their product work as well as it does with actual end-to-end (consumer to consumer) encryption. Self-hosted solutions can do this but their servers are doing a lot more than ferrying raw data streams around. That's why their meetings work so well with 200+ participants.
They also deserve criticism for saying AES-256 when they're using 128-bit keys, but like you said it's not a security issue. I suppose they could be referring to TLS that they use to set up connections and meetings rather than the stream itself but that's a stretch.
For me, the main reason using ECB makes me distrust a service is that it could be a canary for "we don't have anyone who understands security working here", and it increases my prior that they're also doing other dumb things like reusing keys, or not securely distributing them in the first place.
Lack of end-to-end is bad, but in a situation with more than two participants it's not an easy problem and when some of them are dialing in from landlines then one of your endpoints has to have a key anyway, unless I'm missing something? [edit - was addressed in first paragraph]
Is it conceivable that actual Zoom use cases might allow attackers to do some of these things? I don't have an example in mind, but I'm thinking that the Zoom protocol supports many more actions than just audio and video transmission, so perhaps some of the other protocol features could be exploited for plaintext injection.
For example, Zoom supports text chats, and some kind of metadata information about participants and call setup. Perhaps those are all directly encrypted under the same key with the same block cipher mode?
The trick with ECB is that you're relying on 128-bit blocks of plaintext repeating with perfect alignment, and, to do anything useful with it, those repeats leaking useful information. In a bitmap image of a penguin, those repeats are easy: they're blocks of pixels of uniform color, and all ECB is doing is permuting their colors. That's less intuitively true of audio (where I'm not really seeing repeats at all, but, again, you should just play with it).
Um, so, just to make sure I understand you, you're implying that lossy compression is not worth its salt? Because you will surely have more repeats in a lower entropy space.
1) "likely" != "guaranteed to"
2) If there's sensor noise when you're sharing your screen, then you're doing something horribly horribly wrong
Like setting a custom, static background? One of Zoom's fun features.
The only concerning thing for me is, why would they lie about using AES-256 when none of my users (and I assume most of their users) would care in any way about AES-256 vs. AES-128 in ECB mode. Why would they lie?
Even after this, having my users conducting university lessons over something that might be decrypted in China is honestly not that big of an issue. I would of course prefer it if these meetings would be private from the PRC's scrutiny but at least in my situation (and I think most educational contexts) this is not really that important.
Now, China knows who is attending which lesson. And how much activity each individual shows. And also, what happens on the side like environment sounds, environment at the camera (e.g., how generous the student's apartment is). Also, the client can analyze the mouse cursor movement, see what other apps are running and how (on native clients), and on mobile clients there is for example the gravitational sensor.
Moreover, a voice (and the face, of course) is like a fingerprint of a person. Hence you now have a reverse lookup table from voice/face to person.
We still do not have any evidence that the PRC has access to unencrypted Zoom server logs and frankly I assume we would have the same (or worse) issues I had with my tests from Iran that either SIP/WebRTC doesn't work or appears to be intercepted. So, at least for me and my users, Zoom is the most accessible and "least worst" solution.
And indeed, an intelligence agency could possibly hack your bigbluebutton server. This, however, involves a targeted attack, which I think is a different scenario.
And assuming I'll consider switching to webex the response to encryption in webex is this: https://www.webex.com/content/dam/Webex/eopi/Americas/USA/en...
Which 404's and basically represents my experience with Cisco: "We don't give a shit about you, you already paid us.". Frankly Webex could host the next coming of Jesus and I would not give them any more money.
I have a feeling computer accessible Webex is just there because of the dedicated videoconf HW that Cisco makes. The software is to provide a feature checkmark and make its victims miserable enough to buy the HW.
Good job discrediting my post though. It's not like Cisco basically told everybody in a KB that if you want to use the same features Zoom provides (or a Linux client, or desktop sharing or breakout rooms or audio transcription or waiting rooms or...) you would have to give up every encryption feature Webex provides. But I assume you are a seasoned Webex admin and can provide us some insight into why you are using Webex in contrast to any other solution. Or you are trolling, whatever.
Or even better, use those recordings in the future as kompromat as needed.
Then Zoom can go around and claim OECD and IBM and the UN (all made up) use them, which lends credence, even if it's just that one training team in Nairobi that trialed the software once.
Your point does not stand on its own.
My point really does stand. You might gradually allow erosion of your rights until you find that none are left. It is so easy to say "I have nothing to hide" until you find that actually you do have something to hide for reasons that are not immediately obvious.
I am not saying that using Zoom will have nasty consequences but I am saying that the attitude that abrogates responsibility for your own privacy might have unintended consequences. If it becomes commonplace to simply say "meh" we might not like the world we get instead of the world we might wish for.
My Old Saxon friends have a rather more robust attitude to privacy concerns than you mate!
Me administering a Zoom account for my fellow employees and my students does not erode anybody's rights. For me it is a choice between a GDPR compliant vendor and a vendor that does not care about the GDPR. Personally I have had good experiences with the GDPR (Facebook finally having to delete my account even though I would not verify it with a personal ID and cell phone number after I went through the Irish data protection authority) and Zoom claims to be GDPR compliant.
So, frankly I'm not sure what you are talking about. It seems like you are going for a slippery slope argument I don't agree with.
Companies like Zoom can claim that they are GDPR compliant; however, the truth of the matter is that compliance offices are overwhelmed. And until Zoom has a huge data leak or something, nobody is going to investigate their compliance.
So a company like Zoom might claim GDPR compliance and that's something, but only if you can trust them.
And a company that lies in their marketing and press releases can't be trusted, sorry.
Google's Meet btw is also GDPR compliant, Google tries to be GDPR compliant nowadays with everything they do because they are a huge target. They also don't use bullshit in their marketing and are pretty good at security, so I personally trust them more, even if I actively avoid Google's products out of privacy concerns.
My dad was a British soldier (so was mum but that's another story). We were posted to exotic places like "Reindahln" (MG) and Paderborn and Soltau etc. We went on a holiday to West Berlin in around 1980ish. We were allowed through Check Point Charlie to see the DDR for a short while. Funnily enough exactly the same arrangement as getting into Northern Cyprus. ie the Turkish bit.
Anyway, we saw the Brandenburg Gate from both sides, when it was mined all around but rather nicely flood lit. I have to say the east side looked a bit shag back then.
Our German friends always used to look forwards to reunification but the cost when it came used to cause a few remarks cough. For me a unified Germany is a good and beneficial thing, regardless of cost. I saw first hand what life was like in E Berlin in the early 80s.
Gerdes (my family name) means the same as the word German. A ger is a spear - https://en.wikipedia.org/wiki/Migration_Period_spear. A ger-man is a spear bearing man and gerdes is an old form of that. You lot had a habit of trundling around with spears - hence the name in English.
Using Zoom is of course not an awful thing to do. Just be careful me old fruit. Please.
That's the thing, though. Zoom also claimed to be end-to-end encrypting with AES-256. If they were willing to lie about that, what's to say they're not willing to lie about GDPR compliance?
"It's only illegal if you get caught."
No, but it will when the next generation of Nazi, Stalinist, and Maoist regimes arise and gain access to the data in question because we weren't fanatical enough about E2E privacy today.
And just as Niemoeller's verse warns, by then it will be too late.
But, this is just wrong. In Germany we did speak out when they came for the communists, the socialists, the unions and the Jews. Niemöller's argument was relevant in 1937 when he was arrested and I appreciate that you picked up on it but frankly it's different here in Germany. We do still speak out against discrimination against all of them. Maybe E2E encryption in a chat application is just not as pressing as the things Niemöller talked about at the time.
I kind of doubt it was intentional. Developers are not marketing, typically, and it seems reasonable to assume that a technical person said "aes" when a marketing person asked "do we have encryption?". And then the marketing person searched for "aes" and assumed that meant "aes-256".
Other nascent companies will tip their "We want to grow like Zoom" hat and it'll serve as a warning sign as to what the growth mentality is.
Anyway, my point was... I don't think they've taken too much grief yet, for what they've done. They deserve all the lumps they're getting, and that may (hopefully) be instructive.
[And yeah, there are always companies that do this kind of thing... some are Facebook, and some trip into a hole.]
Like good that the market is stressing them on their security, at $30bn they should be able to engage with that feedback and then loop.
On the other hand, tried to use Skype lately? Product has barely evolved since they were bought out by MSFT.
Guess Google does videoconferencing too, but they know enough about us all already....
Such a strangely extreme viewpoint to suddenly jump to.
The 128-bit key is not inherently wrong if they were rotating these out during the stream. That being said, there's no reason not to do it right and use a mode like GCM with a longer key - most hardware supports acceleration for AES-256 these days. It can actually be slower to use a 128-bit key on 64-bit systems.
While I respect the decision not to disclose the waiting room vulnerability, it's pretty obvious what's going on given the context. They probably shouldn't have mentioned where the vulnerability is.
I'm honestly surprised anyone with technical knowledge thought that Zoom was actually doing end-to-end encryption given how the software works. All of the video transcoding/downconversion is clearly happening on the server. Your client is not sending multiple compressed streams for varying connection bandwidths. That's the main reason a lot of people like Zoom - it actually works well with dozens or hundreds of participants.
Zoom's design has a single key for everybody and for everything [in the context of a particular video conference call]. It's simpler and, to a layman, it sounds secure. [We arguably contribute to this if we say e.g. "the key", implying it's just one thing, when we mean something like a master secret in TLS used to derive lots of actual keys.]
Once you've committed to a single key, ECB behaves exactly how you'd want.
You've got some audio, or video, ready to send? Just encrypt it with the key. Receive some encrypted data? Just decrypt it.
What happens if you have some network trouble briefly? Nothing, everybody decrypts whatever does arrive and maybe a few frames are missing.
All of the other modes don't work at all if you try to use them this way. They all expect you to have thought about the problem and track a bunch more state and then maintain that state despite an unreliable network and other issues.
Unless there is somebody in the room who says we can't do ECB because it's fundamentally not a secure choice, ECB is what you're going to get from this design decision.
And I've been in rooms like that as the only voice, or at least as the only person who spoke up. I've been in rooms where I was part of a chorus too, but as organisations get bigger and "security is everyone's problem" becomes a phrase people learn but don't act on, it gets lonelier.
Also, I actually can't even work out what a "correct" key rotation strategy could be for ECB with a variable number of parties all encrypting stuff at once. As a result it seems unlikely that Zoom did figure out such a strategy and then correctly implemented it. Instead it seems safe to assume there is no key rotation, everything sent by every participant for the life of the stream is encrypted with the same key, even though that's a terrible idea.
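For what it's worth, the "just encrypt it, just decrypt it" property described above is easy to see in a few lines. This is a rough sketch and not Zoom's code: Python's standard library has no AES, so `encrypt_block` and `decrypt_block` below are a toy 4-round Feistel construction standing in for AES, and the key and packet contents are made up. It only shows that each ECB block decrypts on its own, so a lost packet costs you that packet and nothing else.

```python
import hashlib
import hmac

BLOCK = 16  # bytes, the same block size as AES


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function of the toy cipher (assumption: NOT AES, illustration only).
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:8]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def ecb(block_fn, key: bytes, data: bytes) -> bytes:
    # ECB: every 16-byte block is processed independently, with no IV or counter.
    assert len(data) % BLOCK == 0, "sketch assumes pre-padded input"
    return b"".join(
        block_fn(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)
    )


key = b"one shared meeting key"  # hypothetical: one key for everybody
packets = [b"audio frame 0001", b"video frame 0001", b"audio frame 0002"]
sent = [ecb(encrypt_block, key, p) for p in packets]

# Simulate brief network trouble: the middle packet never arrives.
received = [sent[0], sent[2]]
decoded = [ecb(decrypt_block, key, c) for c in received]
print(decoded)  # the surviving packets decrypt with no state to resynchronise
```

The convenience comes from the receiver keeping no state at all, and that same absence of state is why identical plaintext blocks always produce identical ciphertext blocks.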
I would like to hear the explanation for why they did not use SRTP, though I suspect the answer is, "We had no idea it existed."
...you're joking right? The Wikipedia example for why ECB is not recommended is literally an image:
Edit: This applies to compression too. Please refer to Shannon's source coding theorem.
But given all the other security issues that seem to hover around Zoom like a cloud of angry bees, this is probably all moot - if you really want to crack a video stream then there are probably easier ways.
Now I wonder if a sequence of these rectangles, all the same size and in roughly the same area of the screen, would lend itself to some sort of statistical analysis. At the very least the timestamps of the updates would tell you how fast the operator is typing.
[RFC 6143] https://tools.ietf.org/html/rfc6143
That doesn’t make this good, but it means that one specific example isn’t immediately applicable.
Yes, ECB is almost always the wrong choice. Yes, there are other ways it’s going to fail in this use case. Yes, compression before encryption itself often enables other attacks. No, I should not have to prefix a comment about ECB with this type of disclaimer when I’m making (what should be) an uncontroversial statement that the tux attack doesn’t directly apply to compressed image data.
Ironically, when designing a protocol for my company, one of the reasons we didn’t use ECB when it would have been entirely justified (each chunk of data was precisely one block in size and keys were only ever used once) was because of potential backlash from people who only know “ECB bad” and nothing more.
If zoom were transmitting text this would be relatively more serious.
If anything, it reduces entropy.
If I am encrypting something, I only want to depend on the strength of the encryption. I don't want to hope that something else ensures that an adversary cannot figure out my ciphertext. That is a very bad idea.
Compression does not introduce entropy to a stream. So pointing out that the stream is compressed and calling it good is a very bad idea. Please refer to Shannon's source coding theorem. If anything, compression reduces the entropy in the information.
Suppose we have some string of bytes. When we split it into aligned 16-byte blocks (let’s assume it divides evenly for simplicity), we find that these blocks are not evenly distributed. For example, 1% of blocks turn out to be the same, which given the number of symbols in this code is massively out of proportion.
We apply a Huffman code using the 16-byte blocks present in the message as the alphabet and their observed statistics for this particular message (if that aspect bothers you, you can assume we prepend the dictionary to the message). Huffman codes are optimal for per-symbol encoding.
Suppose we re-evaluate the distribution of 16-byte blocks in the compressed data; will this distribution have higher entropy (meaning there will be fewer duplicate blocks to exploit ECB with) or not?
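If anyone wants to poke at that question empirically rather than argue it, here is a small sketch. Assumptions, loudly: it uses zlib's DEFLATE (LZ77 plus Huffman) instead of the per-block Huffman code from the thought experiment, and the input is synthetic (2000 blocks drawn from a made-up vocabulary of 50 random 16-byte blocks). It only counts repeated aligned 16-byte blocks before and after compression, which is the thing ECB would leak. It says nothing about real video codecs and does not make ECB a sound choice.

```python
import random
import zlib

BLOCK = 16


def duplicate_blocks(data: bytes) -> int:
    """Count aligned 16-byte blocks that repeat some earlier block."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data) - BLOCK + 1, BLOCK)]
    return len(blocks) - len(set(blocks))


rng = random.Random(0)  # fixed seed so the run is repeatable
vocab = [bytes(rng.randrange(256) for _ in range(BLOCK)) for _ in range(50)]
raw = b"".join(rng.choice(vocab) for _ in range(2000))
packed = zlib.compress(raw, 9)

print("raw:   ", len(raw), "bytes,", duplicate_blocks(raw), "repeated blocks")
print("packed:", len(packed), "bytes,", duplicate_blocks(packed), "repeated blocks")
```

On inputs like this the compressed stream is much shorter and has far fewer repeated blocks, because the redundancy that produced the repeats is exactly what the compressor removes. That is a statement about block statistics only; an attacker can still learn things from lengths and timing.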
No, that's false. ECB reveals repeating plaintext blocks. "0123456789ABCDEF-0123456789ABCDEF" contains a repeating block-length sequence, but would encrypt to three distinct blocks under ECB, because the second copy is not aligned to a block boundary.
That by definition is saying any pattern in plaintext shows up in ciphertext.
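A quick way to check the alignment point yourself. This is a sketch under stated assumptions: `toy_ecb` is a hypothetical stand-in for AES-ECB that maps each 16-byte block through HMAC-SHA256 truncated to 16 bytes (deterministic per block like ECB, but not decryptable), and the key and strings are made up for the demo.

```python
import hashlib
import hmac

BLOCK = 16


def toy_ecb(key: bytes, data: bytes) -> list:
    # Stand-in for AES-ECB (assumption): a keyed, deterministic function
    # applied to each aligned block independently. Not real encryption.
    data += b"\x00" * (-len(data) % BLOCK)  # zero-pad the final block
    return [
        hmac.new(key, data[i:i + BLOCK], hashlib.sha256).digest()[:BLOCK]
        for i in range(0, len(data), BLOCK)
    ]


key = b"demo key"
seq = b"0123456789ABCDEF"  # exactly one block long

aligned = toy_ecb(key, seq + seq)         # both copies sit on block boundaries
shifted = toy_ecb(key, seq + b"-" + seq)  # second copy starts one byte late

print(len(set(aligned)), "distinct of", len(aligned))  # 1 distinct of 2
print(len(set(shifted)), "distinct of", len(shifted))  # 3 distinct of 3
```

So the leak is about repeated aligned blocks, not about every repeated byte sequence. In practice a lot of structured data does repeat at block-aligned offsets, which is why this is cold comfort.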
Even uncompressed video will be hard to see that "penguin image effect" in, because the pixels that make up each block will be constantly changing in a random way, and unlike that synthetically generated image, it's highly unlikely for a block to be the exact same as any other one in any given frame.
Hint: You don't need to know the plaintext exactly. You just need to be able to build a reasonably precise probability distribution.
Nobody who wants to do AES-256 rather than AES-128 thinks 3DES is "still secure". 3DES is perhaps 112 bits of useful keyspace but it has 64-bit blocks which was already bad news when DES was invented.
TLS 1.3 doesn't have a 3DES option at all. You can do AES 128 or AES 256 (or ChaCha20).
I'm not here to defend Zoom, but any and all companies that can do this have done and are doing the very same thing to minimize costs. It's not great but it's the expected way of managing a software company in 2020. It would be hard to do business otherwise.
Whether it's good or bad that's a question that will need to be reexamined given the current situation.
Even ignoring appearances, for latency and fault-tolerance reasons, China is the last place you'd want to put a critical server for an app used in the West.
No more rolling your own crypto than if I were to use DES.
If rolling your own encryption scheme is just choosing what algorithm you use, then every system does that. So it’s not headline-worthy. :/ I get that Zoom needs to make better choices, but the rhetoric around it has been pretty poor and unhelpful.
As the classic article says, if you type the letters A-E-S, you're doing it wrong.
Zoom is an interface for secretly sending a message to the Chinese government, and then hoping that they secretly relay that message to the person on the other end of your call.
Wow. What could all of these people possibly be doing? It can't be development and QA; what's going on over there?
700 engineers for a software company the size of Zoom is actually pretty small.
Not to defend them over the recent security fiasco, but innuendo such as this, which links "employees working in China" directly with "shady business", makes me uncomfortable at the very least.
Also, to be clear: the statement is meant from the perspective of, how do they have that many employees (what are they all doing); not "why" do they have that many (which, while seemingly very large, I am not in a position to judge), nor meant to be a slander against outsourcing R&D work to China (though, to be crystalline clear, I inherently trust Zoom less because of this, just like I trust companies who base or outsource R&D to Australia less because of Australia's laws on these matters).
With good management, a team of 50 should be able to provide what Zoom provides.
Of course, with this read comes the age old question of "I can build Google/FB with 20 good men, what are they doing?"
Well, this would explain Google's pattern of constantly churning out new products on the theory that hey, maybe someone somewhere wants it.
If you need 20 people to run your actual operations but you've hired 20,000 people, what are the other 20,000 people supposed to do?
Edit: I looked it up... thought it seemed crazy high but it obviously includes all the support staff too. So they might have only 300 engineers total, for example. I guess that's more reasonable. 
"As of January 31, 2020, we had 2,532 full-time employees. Of these employees, 1,396 are in the United States and 1,136 are in our international
"We also operate research and development centers in
China, employing more than 700 employees as of January 31, 2020."
If that 700 count is including support staff, then it becomes slightly more believable, but even then, I'd expect Zoom to be at that 700-1500 number in terms of total employees, not just R&D.
There are 20,000 google engineers working in "research and development", what could all of these people possibly be doing?
You get the face and the voice of people talking directly to the camera. You get the people who they constantly talk to so you know their relations. You know the content of what they are saying.
Zoom doesn't even have to be responsive to pressure, just a developer has to be. And they live there so yah they'll be understandably responsive.
If you "waste" time on the real important stuff such as a good design, user security, or the like, you will lose over to a competitor who didn't and therefore was able to spend more time on marketing.
If you waste your time on user privacy you will be totally crushed by that competitor whose definition of "privacy" was "how much can I pester this user until he gives me access to his address book so I can spam his friends?".
I don't think this is good. I think this is very sad. But it is what it is.
I still remember the Whatsapp founders coming into the jabber mailing lists with "please let me configure my server"-type questions, for god's sake.
She already had to use MS Teams, Slack, Jitsi and Zoom. Her take was that Zoom was by far the most usable tool. And most of her fellow students, 20 years younger, agree.
Sadly most people don't even know about the problematic status of Zoom. And if one tells them, most do not understand what's so problematic about that.
That’s a bait and switch on the part of the university. “You’ve already paid, but now you have to give up your civil rights against this third party you’ve never heard of to get the service.”
There should be liability for the schools for doing this. A class action, perhaps?
There's a reason why (at least in this part of the world) educational products receive additional scrutiny and need to comply with additional privacy and security directives.
There are video technologies they can license so that they can keep things running without forcing students into having to enter into abusive third party agreements. They can hire people to set up simple first-party HLS streaming systems—it can be done in a day. There are tons of non-abusive alternatives. Zoom makes malware. Can you imagine being given the option by a university to whom you have already paid tens of thousands of dollars: “install this malware or you don’t get to come to class”?
You could say the same about Amazon and their backlash against workers. “There’s a pandemic, and this company is doing what it can to keep the supplies moving, and your reaction is ‘sue them’?”
Yes, that is the appropriate reaction if they break the law.
A pandemic does not justify abusive, coercive behavior. If anything, it makes it more abusive.
When the alternatives don't work, it doesn't really matter how secure the alternatives are.
Bad security design is unacceptable and this is sort of indicative of a large dissonance when it comes to SV startups. "Disruption" isn't ignoring requirements - if you are able to under price your competitors because you fail to adhere to good practices (or, more importantly, regulations) you're not building a lean and useful product - you've just built a half-assed competitor that can undercut prices because it is incomplete... You're selling something as a competing solution when it only does half the stuff.
There is a lot of good to be said about identifying the 90/10 value components of a problem space and discarding expensive features that would just add complexity for little value - but if those features are requirements, you're just failing to actually meet the points consumers (or markets, à la regulations) expect and making your profit off of deceit.
1. Looking at you Uber.
No consequences means their behavior is acceptable. Which means this will repeat until there is a reason that makes this behaviour not acceptable.
I use Zoom regularly. Their video works well. I would like to have AES-256 but also I suspect this is not a casual choice, and I'm not sure it's as clear-cut as you're assuming.
* OK, if you're an encryption expert, you obviously would roll your own to advance the state of the art, but Zoom are quite obviously not encryption experts.
Usually it is: I can spend all this time making things right, or Bob here knows how to do this other simple way that is good enough in a week.
Anyone want to wager how many of their computers are now p0wned.cn?
But there are also perhaps hundreds of millions of potential immigrants worldwide who would love to come to the US permanently and have the means to do so, if it were legal. But they cannot get a green card or H-1B because those are nearly impossible to get for people in some countries. So they never come.
It does seem hard to decide a fair system here but any decision is just that... A system of laws... They have no meaning unless followed. What's the point of rule of law anyways? It's the answer to arbitrary rule: by monarchs and dictators.
Which is exactly my point. According to the law, these people are illegal aliens and the government has every power to stop them from crossing the border, detain them if they're already here, and deport them to their countries of origin no matter how long they've been here. And they have no legal right to reside in the country. And a lot of conservatives argue that "sanctuary city" proclamations represent a fundamental disrespect for the rule of law.
A lot of people balk at this notion and start arguing that US immigration law is fundamentally unjust and that it's reasonable to make exceptions. And that's a perfectly reasonable perspective, but it's hard to keep that consistent with the idea that taxicab medallions, for example, are some unquestionable pillar of society.
Experience costs more because it pays off.
The example I used is Whatsapp. In the early days, but definitely after their popularity was already high in Europe, you could still impersonate any user on the platform trivially. Their only real security was obfuscation of the client source code and obfuscation of the protocol. That doesn't help much when you still have a Java (J2ME) client that is trivial to decompile. They still became hugely popular.
People quickly forget about these types of issues, or they don't assign blame where it belongs. They will just shrug it off and start believing that it is normal for computers to get hacked from time to time. After all, it appears all the time on TV.
Which comes from being inexperienced as well. And an experienced hand can save you money immediately. Such as selecting a more appropriate programming language to implement the project in. Suggesting an appropriate library to use rather than rolling one's own. Avoiding multithreaded coding disasters. Having a backup system that works. Having a revision control system that works. Knowing that unit tests have an immediate payoff. Avoiding stupid metrics like paying people based on lines of code written. Avoiding rolling your own encryption. Don't transmit passwords in plaintext. Don't do illegal things. Don't leave yourself wide open to lawsuits. Use a real CPA to do your taxes. Keep proper business records so the IRS doesn't hang you. And on and on.
My CPA makes lots of money when some young entrepreneur comes in all terrified that the IRS is auditing him and he kept no records and didn't even file returns.
An experienced hand will tell you to never ignore and never mess with the tax man.
Check out what happened to Will Smith when he paid no attention to the IRS. All his proceeds from "Fresh Prince of Bel Air" went to the IRS as taxes, interest, and penalties.
Zoom (ZM) has a market cap of $40.77B (with a forward PE of 327.31...). Delta (DAL) has a market cap of $18.19B, with a forward PE of 3.91.
Like I said, market cap isn't everything, but I find that astonishing.
I don't mean anything with ratcheting, forward secrecy, replay protection, nonce reuse resistance, or any other bells and whistles, just basic competent symmetric encryption without gaping holes or ridiculous bizarre design choices?
It's not hard!
(1) Generate 12 bytes of random nonce using a good secure random source, prepend to message.
(2) Use nonce to initialize AES-GCM.
(3) Run it through AES-GCM, append tag.
That's not hard and it's secure enough for common use cases.
AES-GCM was not designed to be used with random nonces - otherwise the nonce space would need to be larger than 96 bits. It was only designed to be safe with a unique counter nonce. That does not always become a problem, but it will in Zoom's case.
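To make "unique counter nonce" concrete, here is one way it is commonly laid out, as a sketch only. It follows the fixed-field-plus-counter idea: a 32-bit sender id followed by a 64-bit per-sender message counter, 12 bytes in total. The `NonceCounter` class and the assumption that every participant gets a distinct 32-bit id are mine, not anything Zoom is known to do, and the counter must never be reset while the same key is in use.

```python
import struct


class NonceCounter:
    """Hypothetical helper: 96-bit nonce = 32-bit sender id || 64-bit counter."""

    def __init__(self, sender_id: int):
        self.sender_id = sender_id  # assumed unique per participant per key
        self.counter = 0

    def next(self) -> bytes:
        if self.counter >= 2 ** 64:
            raise OverflowError("counter exhausted; rotate the key")
        # Big-endian, no padding: 4 bytes of id followed by 8 bytes of counter.
        nonce = struct.pack(">IQ", self.sender_id, self.counter)
        self.counter += 1
        return nonce


alice, bob = NonceCounter(1), NonceCounter(2)
nonces = [alice.next() for _ in range(3)] + [bob.next() for _ in range(3)]
print([n.hex() for n in nonces])
```

The catch is exactly the state tracking mentioned elsewhere in the thread: each sender has to persist its counter (or get a fresh key) across crashes and reconnects, because a repeated (key, nonce) pair is what breaks GCM.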
Zoom is encrypting a video stream, which means you cannot use AES-GCM wholesale, but have to use it to encrypt chunks of data.
The problem is that 12 bytes (96 bits) of nonce is just not enough, and after encrypting a certain amount of data with the same key, the chance of repeating the nonce becomes rather high. And if you have a long video conference call with many participants using the same key, you'll sooner or later generate enough data that the nonce will be repeated. Once the nonce is repeated, GCM loses its security guarantees.
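For anyone who wants numbers, the birthday approximation for random 96-bit nonces is simple to evaluate. A rough sketch: `collision_probability` is a name I made up, and the packet counts are illustrative powers of two, not measurements of any real call. As far as I know, NIST SP 800-38D caps random-nonce GCM at 2^32 invocations per key for this reason.

```python
import math


def collision_probability(messages: int, nonce_bits: int = 96) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -messages * (messages - 1) / 2 ** (nonce_bits + 1)
    return -math.expm1(exponent)  # expm1 keeps precision for tiny probabilities


for log2_n in (20, 32, 40, 48):
    p = collision_probability(2 ** log2_n)
    print(f"2^{log2_n} packets under one key: p ~= {p:.3g}")
```

How soon that matters depends on how many packets all participants together send under a single key, so plug in your own estimate of packets per second, participants and call length.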
It's hard for me to estimate how bad it would be, since I'm not familiar enough with the plaintext data characteristics, but in this case it could even be worse than using ECB.
AES-GCM is a cryptographic primitive that is meant for sequential whole-message encryption. The moment you're using it for streaming you are rolling your own crypto. Even if you've used a cipher that supports larger nonces like XChaCha20-Poly1305, you can still not be sure that you're absolutely free from mistakes if you also want authentication, for instance.
Zoom was negligent for just going ahead with AES-ECB, but finding a solution for this problem is not that simple. When you need to stream encrypted data, encrypting it directly with a safe AEAD construction is not always going to make you safe. That's why it's generally safer to use TLS, even considering how historically problematic TLS is. Of course Zoom could not use TLS for streaming video, since this would preclude them from using UDP and allowing for packet loss. I've never encrypted TLS traffic and I'm not familiar enough with other protocols like DTLS and [Noise](https://noiseprotocol.org/) to know if they will be useful in this case, so I won't be making any claims as to how easy this is.
There's also AES-GCM-SIV and its relatives which construct the nonce from a MAC of the plaintext and technically do not require a separate IV, though if you don't use one any duplicate message will be obvious.
Those are somewhat more complex but honestly even if you don't get those perfect it's almost definitely better than ECB.