I came across this Twitter thread during the last Facebook outage. Apparently something is very wrong with their backups. I would definitely check for similar cases. Confirmed by multiple people :/
I quote the author:
This is really weird. In #WhatsApp, I started to see messages that I know 100% that I deleted 2 days ago?! WTF is happening there? I think this is a really big violation of privacy! I see the messages from a month ago, with my disappearing messages setting turned on?! Gosh
I don't think services should be allowed to mislead their users in this way. If 'delete message' does not delete the message then it should not be labelled as such.
The option should be labelled as 'Hide Message*' together with a link to an explanation of the feature & its limitations.
OT: I saw a similar thing (deleted messages returning) in Messages on MacOS a couple weeks ago, shortly after I upgraded to Monterey. A whole bunch of deleted messages from 2017 through 2019 returned. Nothing from earlier or later returned.
I mentioned this on Reddit and someone replied that they saw it too with hundreds of deleted messages going back to 2015 returning.
There are actually three different things: replays, reloading a message, and delayed messages. Replays are impossible in the Signal protocol, so that's not what happened. Delayed delivery is part of Signal: you can receive message 2 before message 1. A reloaded message is probably what happened; it isn't handled at the Signal level, since "deleting a message" is not something the Signal protocol specifies.
e2e encryption means the party in the middle does not have the key to the data. It is somewhat of a misnomer since it is a feature of key-agreement more than a feature of encryption.
Any other features are dependent on the protocol that uses the secret key. You will generally see an encryption method that is protected against cipher-text manipulation, but e2e does not guarantee that. Similarly, a protocol that uses e2e encryption can add replay protections, but it is not at all a feature inherent in e2e.
I could well imagine that WhatsApp has some replay protection built in. I could similarly imagine they have a way to override it in case they need to. Heck, perhaps the replay protection is implemented with WhatsApp as the ultimate arbiter of what counts as a replay. As long as WhatsApp does not know the key used to encrypt my messages, the encryption is e2e in my book.
I completely forgot about this issue :) I'm glad someone was interested in looking at how those backups work. Perhaps my complaints also contributed to this investigation :)))
These look like messages being re-sent from the service to the client.
This is not surprising - when you ask someone else to route messages for you, even encrypted messages, you are giving them the (encrypted) payload and asking them to route it for you.
If you have a large network with billions of users, it's reasonable that some of the users' phones may be offline some of the time.
Should the service just drop messages on the floor when that happens, or buffer them in some queue (recall, they're E2EE) that gets emptied every so often?
Now assume all your infra has a hiccup (outage) and goes offline, and then comes online again.
Probably the retry logic didn't sync correctly and retransmitted encrypted messages that had already been delivered.
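A toy sketch of that store-and-forward-with-retries behavior (all names here are illustrative, not WhatsApp's actual design): anything not acknowledged before an outage gets delivered again afterwards.

```python
import uuid


class RelayQueue:
    """Toy store-and-forward relay: buffers opaque (encrypted) payloads
    until the recipient acknowledges them."""

    def __init__(self):
        self.pending = {}  # msg_id -> payload, kept until acked

    def enqueue(self, payload: bytes) -> str:
        msg_id = str(uuid.uuid4())
        self.pending[msg_id] = payload
        return msg_id

    def deliver_all(self):
        # At-least-once delivery: everything still pending is (re)sent.
        return list(self.pending.items())

    def ack(self, msg_id: str):
        self.pending.pop(msg_id, None)


relay = RelayQueue()
mid = relay.enqueue(b"ciphertext-1")
first = relay.deliver_all()     # delivered once...
# ...but the ack is lost during an outage, so after reconnecting:
second = relay.deliver_all()    # the same message is delivered again
assert first == second          # duplicate delivery, as observed
relay.ack(mid)
assert relay.deliver_all() == []
```

With at-least-once semantics like this, deduplication has to happen on the client, which is why deleted messages can reappear if the client treats a re-send as new.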
I'm not sure if that explains why deleted messages from months ago are being resurrected. That would imply that there is a persistence framework that has multi-month readback capability.
The oldest message from the twitter screenshot looks ~8 days old.
In the second tweet the user says "3 chats before the outage and now 15+ or more chats which I deleted before the week or two."
Two weeks (and in screenshots, only 8 days shown) does not seem surprising. Especially given the increasing rate of internet shutdowns across the globe [1].
E2EE is too important to play fast and loose with.
[1] "In 2020, Access Now and the #KeepItOn coalition documented at least 155 internet shutdowns in 29 countries." (https://www.accessnow.org/keepiton/)
The standard fix to this is to give each message a GUID, and then when you delete a message, instead of deleting it, your store a tombstone with the GUID, and that prevents a network re-send from causing the message to reappear.
Yes, it's a concept used e.g. in hash tables that use open addressing [0], where you can't delete value X because the final address of value Y may depend on whether X was present when it was being inserted. So instead of deleting X and leaving behind an empty address we change X to a tombstone that stays there forever and says "something was here and if you're looking for Y, Z or anything else, keep on looking".
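A minimal sketch of the tombstone idea applied to messages (illustrative, not WhatsApp's actual storage layer): deletion records the GUID instead of forgetting it, so a network re-send is recognized and ignored.

```python
class MessageStore:
    """Toy client-side store where 'delete' leaves a tombstone keyed by
    the message GUID, so a network re-send cannot resurrect the message."""

    _TOMBSTONE = object()

    def __init__(self):
        self._msgs = {}

    def receive(self, guid: str, body: str):
        # Ignore re-sends of anything we've already seen or deleted.
        if guid in self._msgs:
            return
        self._msgs[guid] = body

    def delete(self, guid: str):
        if guid in self._msgs:
            self._msgs[guid] = self._TOMBSTONE

    def visible(self):
        return {g: b for g, b in self._msgs.items()
                if b is not self._TOMBSTONE}


store = MessageStore()
store.receive("g1", "hello")
store.delete("g1")
store.receive("g1", "hello")   # server re-send after an outage
assert store.visible() == {}   # the tombstone keeps it deleted
```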
That this audit was published helps build faith that WhatsApp is trying to do the right thing but it sounds like they still have some kinks to work out.
This is not a trivial problem and I don't want to downplay their effort and transparency. Seeing their struggle and audits is a good sign - they are doing the right things.
For the work I do, I also conducted a mini-experiment where I checked the WhatsApp security in regards to the media transmitted.
In the cited documentation below ([0]), it claims that media will not be saved for either method 1 or 2 ("To stop media from all your individual chats and groups from being saved" and "To stop media from a particular individual chat or group from being saved").
I found out that the media is in fact written to storage, straight to the "Private" directory ("Internal Storage/WhatsApp/Media/WhatsApp Images/Private"). That directory includes a ".nomedia" file, which prevents your phone's Gallery app from indexing the media. So if you were to copy the "Media" directory to your PC, such "unsaved media" would come with it.
Also, when sending View Once media ([1]), it can easily be screenshot on your smartphone; or, if you're using WhatsApp Web, you can use the DevTools while the image is open to grab it at its transmitted resolution.
Want to grab a status from a contact? Go to "Internal Storage/WhatsApp/Media/.Statuses" (enable viewing hidden directories in your file explorer application) and copy/move the file elsewhere, so it's out of WhatsApp's reach when it tries to delete it.
Not to spill poisonous words around here, but I highly doubt they take security seriously.
View once media is always susceptible to this problem. I can always put a capture card between my screen and the device or use a vm - or just take a picture using a second device.
Likewise touching file storage makes sense, the media files have to sit somewhere. Any vaguely modern android phone has full disk encryption, so it's only apps with global filesystem access that present a threat.
Let’s assume we don’t want to make memory-constrained devices keep large video files buffered in memory or constantly re-download them over the network.
For the View Once media, have it buffered in memory and be done with it. Delete it from the server once it's inaccessible (or keep it a few days if need be... for reporting reasons).
For the non-private media, just write it to storage straight away.
You still hit the issue with large media that doesn't fit in memory, and the gain is still minor (resisting a compromised filesystem... maybe... if they didn't compromise the binary too)
If you use AES-CTR you can decrypt any arbitrary 16-byte block of the file as long as you know its offset from the beginning; you don't need any other part of the file. As long as you don't need to insert or shift parts of the file without re-encrypting, this works fine. And since CTR effectively turns AES into a stream cipher, the file doesn't even need to be padded to a multiple of 16 bytes; you just keep reads aligned to block boundaries when computing the counter.
Every time you want to read some data off flash, you read it then immediately decrypt it. Any time you want to write data to flash, you encrypt it first. Checking chapter markers or frame counts works the same as before, as long as you decrypt those bytes immediately after reading them from flash.
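A dependency-free sketch of that random-access property. SHA-256 stands in for the AES block cipher purely so the sketch runs without a third-party crypto library; the counter arithmetic (keystream block number = byte offset / 16) is the point, and is the same as in real AES-CTR.

```python
import hashlib

BLOCK = 16  # CTR produces keystream in 16-byte blocks


def keystream_block(key: bytes, counter: int) -> bytes:
    # Stand-in for AES-Encrypt(key, nonce || counter); SHA-256 is used here
    # only to keep the sketch free of third-party dependencies.
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def crypt_at(key: bytes, data: bytes, offset: int) -> bytes:
    """Encrypt/decrypt `data` as if it sits at byte `offset` of the file.
    Only the counters covering that range are needed - no other file bytes."""
    assert offset % BLOCK == 0, "align reads to block boundaries"
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        ks = keystream_block(key, (offset + i) // BLOCK)
        out += xor(data[i:i + BLOCK], ks)
    return bytes(out)


key = b"media-file-key"
plain = bytes(range(64))                         # a 64-byte "file"
enc = crypt_at(key, plain, 0)
# Random access: decrypt just the third block, knowing only its offset.
assert crypt_at(key, enc[32:48], 32) == plain[32:48]
```

Because encryption and decryption are the same XOR, `crypt_at` does both, which matches the read-then-decrypt / encrypt-then-write flow described above.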
If you have your phone rooted, go to "/data/data/com.whatsapp" in a terminal and copy the messages database to Internal Storage (wherever that is; I don't have a rooted device with me right now).
Then extract that file to your computer and open it with any SQLite3 viewer. It's not encrypted.
You'll find messages going back to the very first install of WhatsApp, even ones you deleted.
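Once the database file is copied off the phone, the Python stdlib is enough to poke at it. Since the database schema varies across WhatsApp versions, this sketch deliberately assumes nothing about it and just enumerates the tables:

```python
import sqlite3


def list_tables(db_path: str) -> list:
    """List the tables in a copied-off SQLite database (e.g. a messages
    database pulled from a rooted phone) without assuming its schema."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        con.close()
```

From there you can SELECT from whichever table looks like the message log and inspect its columns with `PRAGMA table_info(...)`.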
I agree, I see it currently uses the "unprivileged storage" (the user-space storage called "Internal Storage").
"My Files" ("com.sec.android.app.myfiles" on Samsung Android) is an app and certainly has access to those files.
If that app has access to that directory, certainly many other apps do too, such as Samsung Gallery ("com.sec.android.gallery3d" on Samsung Android). Of course, the Gallery app just uses it as an image viewer, but that doesn't stop another app from exfiltrating these directories to a server.
What if some malicious app synced the WhatsApp directories to a third-party server? I don't see that as too hard.
I’m actually quite impressed that WA has allowed this report to be released publicly. They did not have any obligation to do that, but it sure makes them look more trustworthy.
That’s exactly why they might do this: they’re self-aware of their reputation, try to go above and beyond to counter it, and so chose to publish a damning report against themselves in that effort.
Note, I hate Facebook, but there’s nothing worse than when someone/something tries to get better in good faith, and we shit on their efforts simply because their starting position is weak (aka fat shaming the new person at the gym).
I would generally agree with you, but my faith toward Facebook has greatly diminished over the years and it would take a lot to restore it. (aka the fat person is eating a pound of bacon every five steps on the treadmill)
Because the number of people who can actually validate the security of the open source options is vanishingly small and assessments like this provide sufficient evidence that WhatsApp's claims are not bunk.
> and assessments like this provide sufficient evidence that WhatsApp's claims are not bunk.
Actually, no, that's not sufficient evidence. It's only evidence about the interactions between the auditors and WhatsApp's servers; it says nothing about your interactions with WhatsApp's servers. And that's the heart of the problem: using code shipped to a browser for e2e encryption is a flawed model; it would be trivial to target you with a version that has broken e2e encryption, subtle enough that you would never notice.
Sure, it's not sufficient to formally prove the security of the system end to end, but I'd guess it's sufficient for the average user. As for shipping a client with broken encryption... aren't people using Signal (like the person I originally responded to suggested) at risk of the same thing? Nobody's verifying checksums on every update except for people who aren't using WhatsApp for their comms anyway.
Most likely because their contacts don't want to switch. I've migrated extended family off WhatsApp to Signal but it was a nontrivial effort since their contacts don't want to use anything but WhatsApp.
TBF that's not really all they have to do. They also have to use that new application, which means (further) dividing their communications stream. And it means an additional onboarding step when they set up a new device.
I understand these aren't huge obstacles, but I am generally reluctant to add additional communication channels for these reasons - they aren't trivialities to everyone.
Email is much easier because of forwarding (or polling, in case an old service doesn’t have forwarding). I’m changing my main email every ten years or so and have a few degenerate addresses, like Apple or Yandex accounts, which are dead but still attached to a few orgs.
Messengers usually don’t have forwarding or a sensible polling interface. It would be nice to have an all-in-one app, even with limited functionality, since I mostly use text and images. Or to use one of the real full-featured messengers with gateways to the others.
I think Signal and Matrix are more trustworthy and auditable.
However I also think more people are on WhatsApp than those two (and possibly more) combined and people want to just use it probably for that reason.
I saw some of my contacts sign up to Signal, first after the privacy fiasco, then again in smaller numbers after the major outage. I deleted WhatsApp a while ago, but I recently decided to re-install it or risk losing contact with some old friends. In fairness, I did try to convert people to Signal; I managed to convince a couple, so not all was lost.
I tried Signal among my close circle, so having nobody else there wasn’t an issue for us. But it didn’t stick because of how it looks and works. When I read reports like yours it feels like you don’t use all of the features, but what repelled us was not that inertia thing.
Sadly no, I should have written that down (a good habit I only picked up later). But I have vague memories that some of the issues were very basic, like replying, images, or chat handling, and that we mostly expressed the same concerns. It’s not unusable, but it just didn’t feel right as a whole.
It should also be noted that we are not privacy-critical, just like those “numb people”, I believe, who choose based on little details rather than the big picture. Our background at the time was WhatsApp, Telegram and Viber.
> Why on earth do people trust a closed source messenger owned by Facebook, which backs up to Google?
Those are some of the few companies that are large enough to oppose governments.
> Signal and Matrix are open source
The main advantage of which is to enable audits like this, which WhatsApp is doing.
Of course you can't actually build WhatsApp from the audited source or pin it to the audited version... but you can't do that with Signal either. Not to mention that you're stuck with closed-source Google Play Services (or closed-source iOS) anyway.
> Those are some of the few companies that are large enough to oppose government
They're more likely to cooperate with governments because they have so much to lose. All the big guys are caving in to China for example because they don't want to lose that sweet 1+ billion consumer market. Yes even Google. Check maps.google.cn and see the border around the South China Sea.
And also their interests are much more aligned with the governments, being entities similar in size and controlled by huge shareholder interest groups.
Oh and finally most of them don't even pretend to oppose government interests. Even Apple.
The best thing about Matrix is that both the software and the network is open and decentralised. This is why I prefer it over Signal (which even frowns on third party clients)
Are you saying it’s not getting enough credit or that it shouldn’t? The Signal code was closed source for longer than a year (from April 2020) when no commits were done in the public repo because <we don’t have any official statement>. Some months ago the public repo got a barrage of commits after that long gap. It wasn’t that the Signal platform and client had no updates during this time. There were many, but the code wasn’t released.
Signal may be open source at times, but that alone is not a reliable factor for the platform/company to be considered as trustworthy.
Signal is not entitled to the value associated with being open source. Signal is merely pretending to be open, when in reality they are more community and user hostile than plenty of closed source or proprietary projects. Calling Signal open, is an insult to anyone actually building or supporting open source projects or protocols.
This is a false claim. All of Signal's software is released under open source licenses. The fact that you disagree with how they run their project and service is irrelevant to the fact that both the client and server are free software.
It's not a false claim. I stated an opinion, and it's not even that controversial; it's a very common complaint about Signal. The primary value of open source is that it gives control over the code to the user. Signal releases their code, but actively prevents its users from actually getting any benefit from that. A messenger that exists but can't be used to communicate with anyone has no value over a messenger that doesn't exist; a Signal client that you can modify but then can't use to communicate with anyone else has the same issue. The protocol should be considered 'open source', but for an encryption system that's the bare minimum, so it's not that impressive.
> The Signal code was closed source for longer than a year (from April 2020)
This is absolutely untrue. Signal's source has NEVER been closed. The Signal server source code (which isn't special and doesn't change that often) just had no public commits. The Signal client source code (what matters and what makes Signal secure) was frequently updated.
Well seeing as you don't have access to the ME, no amount of application code being published will mean you can see the source running on the servers. This is a red herring.
The client source code is the only thing that's really relevant in an e2e encryption model, anyway. Regardless, 100% of the production versions of the Signal server software have been published under free software licenses, so I'm not sure what you're arguing.
This is not true. You gain just as much, if not more, value from building the graph of people who communicate with each other than from knowing the contents of their communication.
But I think what they're trying to say is that Signal prevents users from pointing the Signal app at servers they themselves control. You're stuck trusting the people running the servers because they say they won't do anything wrong. The whole reason you say the client is what matters is that it's the one component where the user doesn't need to trust somebody else. If I can build my own client and validate it myself, the security doesn't depend on blindly trusting somebody else's word that it's safe.
A well designed encrypted protocol doesn't depend on blind trust in some service. The main signal app requires blind trust in the servers they control.
Currently you can get the metadata graph and contents of conversations for most messenger usage today.
Now with signal and other E2EE messengers, you can just get the metadata graph, maybe. Not using the standard set of servers makes you stand out in a different way metadata wise, and more vulnerable, because you don't have as much labor available to secure your personal network, which is what you're hinting at. It's partly why tor is a public network, because they want more noise in metadata analysis, and why you want to use VPN providers, so it's not just "you" that is aggregating your traffic.
The 3rd era will do both in a usable way, but usable ones don't really exist yet. All you have is research messengers.
One step at a time. Perfect is the enemy of good, or something better.
> Currently you can get the metadata graph and contents of conversations for most messenger usage today.
Phrased differently; Other messengers don't protect privacy so it's acceptable for this one claiming security to break privacy too.
> Now with signal and other E2EE messengers, you can just get the metadata graph, maybe. Not using the standard set of servers makes you stand out in a different way metadata wise, and more vulnerable, because you don't have as much labor available to secure your personal network, which is what you're hinting at.
No, that's not what I'm hinting at. I'm complaining that Signal pretends its primary focus is privacy and security, but fails at some of the most basic designs! If someone is targeting me specifically they can own my server, and I'm screwed. But when I use my own server, owning Signal doesn't get them me for free. The inverse holds too: if they're targeting me and they own me, they might not put in the effort to own Signal. And if your opposition includes people who can serve a sealed warrant, hacking into Signal might not even be needed.
> It's partly why tor is a public network, because they want more noise in metadata analysis, and why you want to use VPN providers, so it's not just "you" that is aggregating your traffic.
What?
> The 3rd era will do both in a usable way, but usable ones don't really exist yet. All you have is research messengers.
> One step at a time. Perfect is the enemy of good, or something better.
No, that's not true about security. No security is better than half-assed security. Especially if you don't know it's half-assed.
You can audit a browser-based solution until the cows come home, but unless you are willing to audit the code that is sent to you on each and every interaction, you will never have any real security. I don't think browsers, running code sent down the wire at run time, are a platform for secure communications; it is much too easy to replace that code with other code.
Okay, my remarks on the product (the e2e encrypted backups):
> Weak 512 bits RSA key signing key
That's unprofessional at best, negligent at worst. I understand that longer keys mean more computation, but still, 512 bits is simply asking for trouble.
> It enforces a maximum number of user login attempts, in order to prevent brute-force of a user PIN/passphrase; after ten unsuccessful attempts to log into an account, the account is locked and the backup data is irremediably lost.
That's essentially a denial of service waiting to happen. So if another actor can access my account (e.g. has access to the telephone operator), they can destroy my backup. I would understand a delay between retries (maybe a week ?), but just throwing out the backup is just bad.
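The delay-between-retries alternative is easy to sketch (the numbers here are illustrative, not anything from the report): exponential backoff makes brute force impractical without giving an attacker the power to destroy the backup.

```python
def retry_delay(failed_attempts: int, base_seconds: int = 60) -> int:
    """Exponential backoff: each failed PIN attempt doubles the wait
    before the next one is allowed, instead of irreversibly destroying
    the backup after ten failures."""
    return base_seconds * 2 ** max(failed_attempts - 1, 0)


# After ten failures the next attempt is ~8.5 hours away, and it keeps
# doubling - but the legitimate owner's data still exists.
assert retry_delay(1) == 60
assert retry_delay(10) == 60 * 512
```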
Clearly the key thing here is the security, but having used a backup to switch to a new phone recently, I was amazed at what an appallingly bad user experience it was: it got stuck multiple times (through no fault of my own, and with no way to tell it to stop hopelessly retrying), and then you have to wait longer and longer each time to repeat it.
There is no guarantee that once the audit is completed the actual running system is E2EE. You just have to trust WA. Given who runs it and their track record, there is no way I could trust them.
I would love it if the encryption (either real-time or backups) in WhatsApp were actually 'legit'. I mean, Facebook arguably bought it to kill competition.
I can tell you first hand, anytime a company pays a third party to do a security assessment, the result is purely what the company wants you to see. Independent does not mean that it wasn't influenced, just that it wasn't done by the company itself.
This is false. The security company has its own reputation to mind, and its people their own conscience. (There may be cases like you're saying, but "anytime" and "purely" is completely wrong.)
This is first-hand as well. But I'm not the one making a universal claim.
I think what they are trying to say is that nothing would prevent the company ordering the audit from providing the firm conducting the audit with another branch of the source code.
And this isn't just NCC Group; it's NCC Crypto, a particularly hardcore practice inside of NCC. Marie-Sarah Lacharité's name is on this, and she's not messing around.
You may not think so, but I know of equally reputable firms that have overlooked things as blatant as software that allows third parties to pipe scripts without any notification to the user. Look at the recent NPM discussion for example, and that isn't even the one I am referencing. So, yes.. they may be a "legit" outfit but that doesn't prevent them from ignoring obvious security issues.
There is no proof that what they tested is actually what's on people's phones. It's most likely a separate, cleaned-up build/codebase for looks. And the odds are against users.
As a pentester at a security company doing assessments for customers, I can say that this is definitely false.
We value our independence highly. It is what ultimately brings in business. It would be very bad business if one of our customers got hacked through an easy vulnerability we should have found.
This is the same for the NCC group here. If in a few weeks the WhatsApp e2e encryption on backups was cracked, they would look like fools. And that is not good for business.
It isn't merely being hacked. If for some reason data gets exposed, it is easy to redefine the exposure point as a third-party issue. For example, let's say an app allows you to install a plugin, but the plugin API lets a third party run anything they want. I've seen firsthand how auditors will determine that it isn't the fault of the company they are auditing, regardless of the fact that the company provides a plugin API that allows for easy exploits, because their software isn't technically the one exploiting the user.
There are firms that do what you say--they write a report that says how totally wonderful it was to work with you and how wonderful it is that you quickly fixed all the informationals they found, and what a genius you were to work with them. And they happily make these reports public.
Reputable firms will write savagely brutal reports when warranted.
1. Backups are opt-in - just as they have always been.
2. The E2EE backups do not rely on HSM's - they rely on a client-side only key derived by the WhatsApp client, on the user's phone.
3. The client-side key backup does not rely solely on HSM's - naturally, the client-side key must be backed up in case the user loses their phone. This key is itself encrypted and stored remotely (whether this is on third-party cloud or on WA servers is unclear from the report). However, decrypting it requires a user passphrase, known only to the user.
4. The design uses HSM's additively, not as the only support - via an OPAQUE exchange the user can combine their passphrase with a per-user secret stored in the HSM to derive, client-side, the key that unwraps the backup key. OPAQUE ensures WA cannot learn the user key material required to derive the key that unwraps the backup key.
This is all on page 6 of the published NCC report.
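A heavily simplified sketch of that layering. HKDF stands in for the OPAQUE exchange (the real protocol is far more involved and never reveals the passphrase to the server), and a XOR "wrap" stands in for real authenticated encryption; all names are illustrative.

```python
import hashlib
import hmac
import os


def hkdf_sha256(secret: bytes, salt: bytes, info: bytes,
                length: int = 32) -> bytes:
    """RFC 5869 HKDF (extract-then-expand) with HMAC-SHA256."""
    prk = hmac.new(salt, secret, hashlib.sha256).digest()
    okm, t, i = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([i]), hashlib.sha256).digest()
        okm += t
        i += 1
    return okm[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


backup_key = os.urandom(32)   # generated client-side, never sent raw
hsm_secret = os.urandom(32)   # per-user secret held only by the HSM
passphrase = b"correct horse battery staple"

# Client derives the wrapping key from passphrase + HSM secret (in the
# real design this combination happens via OPAQUE, not plain HKDF).
wrap_key = hkdf_sha256(passphrase, hsm_secret, b"backup-key-wrap")
wrapped = xor_bytes(backup_key, wrap_key)   # toy one-time-pad "wrap"

# Recovery on a new phone: same passphrase + same HSM secret -> same
# wrap key -> same backup key, all derived client-side.
recovered = xor_bytes(
    wrapped, hkdf_sha256(passphrase, hsm_secret, b"backup-key-wrap"))
assert recovered == backup_key
```

The point of the layering is that the server only ever stores `wrapped`; without both the passphrase and the HSM secret, it cannot recover `backup_key`.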
Additionally, you can also elect to store the raw key yourself (in the form of a 64-digit number). In which case the HSM thing doesn't apply. The caveat is that they can't help you recover it, but in my opinion that's a feature, not a bug. Consider the mud puddle test.
Of course, we still have to take their word for it that the app doesn't secretly store this key somewhere, though I suppose this audit will validate that; I still have to do a deep dive into it. The problem remains, of course, that the app can be modified at any time through the update mechanism.
Last time I looked at it, the WhatsApp backup key was simply stored server side. Also, the backup encryption key basically never changes. I tell you this because I needed to extract an old backup that I made on Android years ago to recover some messages: it was as simple as extracting the key from another phone I was signed into (root privileges are needed, but you can just access the account from an emulator, enter the SMS code, recover the key, and sign in again on the main phone), and then the backup is easily decrypted. No passphrase needed (and even if there were one, how hard would it be to brute force, considering that users reuse the same password everywhere?).
This for local backups, but I assume that the encryption schema is the same for a backup on Google Drive (just the file that would be stored locally is uploaded into Google Drive in a non user accessible location).
By the way, I don't care that much about backup secrecy; in fact I mainly use Telegram, even though everything is stored on the server in clear text. WhatsApp tries to give users a false sense of security, in my opinion.
I believe you are mistaken, and the NCC group analysis makes it quite clear that the locally-stored backup key is not the same as the "export key": the export key is the encrypted version of the local backup key. WhatsApp servers only have access to the export key, not the local backup key. The fact that you could extract the local backup key with physical access to your device and root privileges does not mean that the key is stored remotely in plaintext.
The HSM is a server-side HSM. I believe it helps prevent brute-forcing weak passwords/PINs by non-WhatsApp attackers, in case non-WhatsApp attackers gain access to the encrypted backup keys.
I was surprised to read that OPAQUE uses/generates deterministic asymmetric keypairs based on a secret seed. I'd posit the HSM stores this seed so that it can use various derivations to verify whether a given key asserted by a client was generated by that seed. (https://www.ietf.org/id/draft-irtf-cfrg-opaque-07.html)
I have only used key derivation in symmetric protocols, so tbh I don't know how you do deterministic asymmetric key generation, or even which primitive uses it.
It can effectively be the same. Consider ECDH as key agreement, passed into a KDF such as a hashing algorithm, potentially with additional input, then using that value as the private key — the security assumptions then become the Square Computational Diffie-Hellman and whatever assumption(s) are in the hashing algorithm, the former is proven to be equivalent difficulty to the general CDH assumption.
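To make the idea concrete, here is a dependency-free sketch of deterministic asymmetric key generation using classic finite-field Diffie-Hellman over a toy group. This is not the construction OPAQUE actually specifies, and the parameters are not secure; it only illustrates "hash a seed into a private scalar, then do key agreement plus a KDF".

```python
import hashlib

# Toy Diffie-Hellman group: a Mersenne prime keeps the sketch readable.
# Real systems would use X25519 or an RFC 3526 MODP group - NOT this.
P = 2**127 - 1
G = 3


def derive_private_key(seed: bytes) -> int:
    # Deterministic: the same seed always yields the same private scalar,
    # so an HSM holding the seed can re-derive (and thus verify) a keypair.
    return int.from_bytes(hashlib.sha256(seed).digest(), "big") % (P - 1)


def public_key(priv: int) -> int:
    return pow(G, priv, P)


a = derive_private_key(b"per-user-secret-seed")
assert public_key(a) == public_key(derive_private_key(b"per-user-secret-seed"))

# Key agreement then a KDF, as in the comment above: both sides compute
# g^(ab) and hash it into a symmetric key.
b = derive_private_key(b"other-party-seed")
shared = pow(public_key(b), a, P)
assert shared == pow(public_key(a), b, P)
sym_key = hashlib.sha256(shared.to_bytes(16, "big")).digest()
```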
https://twitter.com/pytlicek/status/1445072626729242637?s=21