I didn't post this because I was angry about having GET requests sent to my server on a char-by-char basis. My main concerns were privacy related. Since I posted this, some additional things have come to light:
1) This leaks the IP address of the person writing the msg
2) When property="og:image" is used it also leaks the User Agent and Android version 
3) When presented with invalid headers as a reply it can cause a crash on iOS, which means this is a potential RCE vector
4) It leaks the exact time a URL is typed into a chat
5) It's on by default; this is the default behavior in E2E encrypted conversations
I don't use WhatsApp; I found this out by accident because I have a habit of tailing my logs. I do know, though, that Signal doesn't do any of this pre-fetching. I am aware this is a 'feature', but there's no place for it when security is involved.
If the connection was not made from the client (aka 'leaks the IP address') then it would need to be proxied (aka central point for 5eyes to monitor to get all WA client urls) or the servers would need to know the contents of the messages (aka break e2e and we are still back to a central monitoring point.)
Of course it will send user agent info, so that it can provide a better preview card if the site supports taking advantage of this info. If it only provides this when you explicitly try to send the info then it is doing what the user told it to do.
The header bug is interesting and if it is actually a WA problem you should report it.
Of course it leaks the exact time a URL is typed into chat. I can't even imagine what you are trying to say here since this is a point that is without a point. We have already established what it is trying to do and in that context this point makes no sense.
It is on by default and is the default behavior because this is what users expect. The secure features of WA are a bonus, but are not the raison d'etre here and when it comes down to it WA is a messaging app and it prioritizes usability when the feature is not an egregious security problem. In this case it is not a major security problem so usability and expectations win.
However, WhatsApp is selling a solution that is meant to provide privacy. When I write a URL in an SMS, my phone does not try to preemptively retrieve it to display a preview. WhatsApp may be encrypting the message, but it's identifying me to a server while I'm composing it, which is enough to completely destroy any level of privacy or anonymity that I would expect.
Yep you are right. They even tout this in their Security Page.
> WhatsApp's end-to-end encryption ensures only you and the person you're communicating with can read what is sent, and nobody in between, not even WhatsApp. This is because your messages are secured with a lock, and only the recipient and you have the special key needed to unlock and read them.
Am I misunderstanding, or are these requests happening just because someone is typing a message that happens to contain some string that looks like a URL? At least with a browser, you are entering things that are supposed to ultimately result in HTTP requests.
Facebook has literally told courts they can't decrypt WhatsApp traffic. I wonder if that court case in Brazil is still pending.
By comparison, no one ever said "There's no way Google could read my search terms".
A more accurate comparison would be regarding Allo.
No, WhatsApp doesn't get informed of what URL you entered in the message. The site owner gets a notice, as does the user's ISP, and a lot of people in between. But WhatsApp cannot tell what URL was typed.
So they are in the clear legally, but morally, it's still dubious to do that given it's effectively disclosing what is often a substantial portion of the conversation.
And transmitting a URL usually has no use beyond accessing it. They are doing what the user expects; it's just lacking some communication and power-user tools to override the default behavior.
Let us just say there are certain things where merely accessing them can cause legal complications, and not reporting them is technically still a crime.
Yeah this is a much bigger deal, for sure, as it's a messaging app - just pointing out similar case in a different application.
I've written code to fetch sites and give a preview, for a bookmarking bookmarklet. This involves analyzing the HTML for the title and selecting the best image to represent the page. That of course necessitates retrieving the page, either through the client or the server.
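For anyone wondering what that analysis step looks like, here is a minimal sketch in Python using only the standard library. It assumes the page has already been retrieved and only pulls out the `<title>` text and the `og:image` meta tag. The names `PreviewParser` and `extract_preview` are made up for illustration, and real "best image" selection would need more heuristics than this.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects the <title> text and the og:image URL from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.og_image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") == "og:image":
            # Keep the first og:image we see (assumption: first one wins).
            if self.og_image is None:
                self.og_image = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html):
    """Return a dict with the page title and og:image URL (or None)."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "image": parser.og_image}
```

Either way, whoever runs this has to do the GET first, which is exactly where the IP address and User-Agent get exposed.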
See this tweet for the info exposed: https://twitter.com/0xjomo/status/874585822158352384
If WA proxied the request, it would be WA snooping on the conversation, and that would be a way larger problem because they would accumulate that metadata for EVERYONE.
If you don't want cards or external requests in the conversation, you obfuscate the url with any of the myriad methods people use to get past anti-url filters in forums etc.
They did the right thing here. It might be good to add a "do not show URL previews" option, but it's not currently broken.
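As an example of the obfuscation mentioned above, here is a small Python sketch of the common "defanging" convention (`hxxp` and `[.]`). The function name `defang` is hypothetical, and whether any given client's URL detection actually ignores the result is an assumption I have not tested.

```python
def defang(url):
    """Rewrite a URL so it no longer looks like a clickable link."""
    scheme, sep, rest = url.partition("://")
    if sep:
        # http -> hxxp, https -> hxxps
        scheme = scheme.replace("http", "hxxp")
        return f"{scheme}{sep}{rest.replace('.', '[.]')}"
    # No scheme present: just break up the dots.
    return url.replace(".", "[.]")
```

The recipient has to undo this by hand, which is the usability cost of avoiding the preview request.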
I was actually looking at it from the perspective of it was being done on the server, rather than the client, and people were concerned about information collection about where they're linking to (or something).
This is a good heads up for people with special security needs using proxies to access HTTP. Most people are willing to visit URLs they send in messages from their IP addresses in a browser, or already have before they send the message, but I can see how it's worth knowing about, especially with the additional information disclosure.
Why is no one saying anything about end to end crypto?
WhatsApp shouldn't be able to see my messages, isn't that what they say themselves?
Doing a GET request over the internet is already violating e2e.
It's merely another e2e communication between the client and another party, albeit through another protocol.
Telegram seems to proxy the requests.
So WhatsApp wins with crypto here.
On the other hand, I do not get why they send it after every character. It makes the preview even faster but creates a bunch of unnecessary requests. Not very user friendly. They could do it after they recognize a finished URL (as soon as there is a space). And as pointed out in the tweets, it COULD harm the user's privacy.
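A rough sketch of the "wait for a space" idea in Python, to show how little it takes. This is not how WhatsApp does it; the regex and the name `completed_urls` are assumptions for illustration, and it only treats a URL as finished once some whitespace follows it in the draft.

```python
import re

# Assumption: anything starting with http:// or https:// up to the next
# whitespace counts as a URL candidate.
URL_PATTERN = re.compile(r"https?://\S+")


def completed_urls(draft):
    """Return URLs in the draft text that are already followed by whitespace."""
    # \S+ is greedy, so a match that stops before the end of the draft
    # must have been terminated by a whitespace character.
    return [m.group(0) for m in URL_PATTERN.finditer(draft) if m.end() < len(draft)]
```

With this, typing `https://example.com/page` character by character triggers no fetch at all until the space after it is entered, so the server sees one request instead of one per keystroke.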
What good is E2E if you are going to send the plaintext home anyway? Doing these requests on the device is stupid, but proxying them through FB servers would border on malicious.