“Someone was typing in a URL and WhatsApp was fetching it off my server” (twitter.com)
174 points by sr2 on June 14, 2017 | 65 comments

Hi HN, op here.

I posted this not because I was angry about having a GET request sent to my server on a char-by-char basis. My main concerns were privacy related, and since I posted this some additional things have come to light:

1) This leaks the IP address of the person writing the msg

2) When property="og:image" is used it also leaks the User Agent and Android version [1]

3) When presented with invalid headers as a reply, it can cause a crash on iOS, which means this is a potential RCE vector [2]

4) It leaks the exact time a URL is typed into a chat

5) It's on by default, this is the default behavior in E2E encrypted conversations [3]

I don't use WhatsApp; I found this out by accident, as I have a habit of tailing my logs. I do know that Signal doesn't do any of this prefetching. I am aware this is a 'feature', but there's no place for it when security is involved.

[1] https://twitter.com/0xjomo/status/874585822158352384
[2] https://twitter.com/dr4ys3n/status/874725257722179584
[3] https://mastodon.social/@rysiek/9146943
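If you want to check your own logs for the pattern OP describes, one approach is to look for runs of requests from the same client where each path extends the previous one. A minimal Python sketch, assuming a hypothetical, simplified log of `(ip, path)` pairs in arrival order rather than any real server's log format; `prefix_runs` and `min_len` are illustrative names:

```python
from collections import defaultdict


def prefix_runs(requests, min_len=3):
    """Find runs of requests per IP where each path extends the previous one.

    `requests` is a list of (ip, path) tuples in arrival order. This tuple
    format is a simplification; real access logs would need parsing first.
    """
    last_path = {}               # most recent path seen for each client IP
    current = defaultdict(list)  # run currently being built for each client IP
    runs = []
    for ip, path in requests:
        prev = last_path.get(ip)
        if prev is not None and len(path) > len(prev) and path.startswith(prev):
            # The new path extends the previous one, e.g. "/se" -> "/sec".
            current[ip].append(path)
        else:
            # The run ended; keep it only if it was long enough to be suspicious.
            if len(current[ip]) >= min_len:
                runs.append((ip, current[ip]))
            current[ip] = [path]
        last_path[ip] = path
    # Flush runs that were still open when the log ended.
    for ip, run in current.items():
        if len(run) >= min_len:
            runs.append((ip, run))
    return runs
```

A client typing `/secr` one character at a time shows up as a single run, while an ordinary page view does not.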

These are all expected behaviors and are the correct decisions if the user expectation is that URLs generate preview cards.

If the connection was not made from the client (aka 'leaks the IP address'), then it would need to be proxied (aka a central point for the Five Eyes to monitor all WA client URLs) or the servers would need to know the contents of the messages (aka break e2e, and we are back to a central monitoring point).

Of course it will send user agent info, so that it can provide a better preview card if the site supports taking advantage of this info. If it only provides this when you explicitly try to send the info then it is doing what the user told it to do.

The header bug is interesting and if it is actually a WA problem you should report it.

Of course it leaks the exact time a URL is typed into chat. I can't even imagine what you are trying to say here since this is a point that is without a point. We have already established what it is trying to do and in that context this point makes no sense.

It is on by default and is the default behavior because this is what users expect. The secure features of WA are a bonus, but are not the raison d'etre here and when it comes down to it WA is a messaging app and it prioritizes usability when the feature is not an egregious security problem. In this case it is not a major security problem so usability and expectations win.

Did you all know that Chrome does this too? It may sound obvious, but for some reason I had always assumed that nothing is sent until you press Enter (yeah, I know, search prediction would be impossible without that). But one day I was typing a path into a test URL and noticed my server getting hit on every single letter.

Yes, Chrome does it as well, but I expect that as it's a web browser. I know the tradeoffs when browsing the web and expect that my requests will be visible across the Internet. That's just how web browsing works.

However, WhatsApp is selling a solution that is meant to provide privacy. When I write a URL in an SMS, my phone does not try to preemptively retrieve it to display a preview. WhatsApp may be encrypting the message, but it's identifying me to a server while I'm composing it, which is enough to completely destroy any level of privacy or anonymity that I would expect.

> WhatsApp is selling a solution that is meant to provide privacy.

Yep you are right. They even tout this in their Security Page.


> WhatsApp's end-to-end encryption ensures only you and the person you're communicating with can read what is sent, and nobody in between, not even WhatsApp. This is because your messages are secured with a lock, and only the recipient and you have the special key needed to unlock and read them.

I suppose that I'm not surprised. You can't audit WhatsApp's source code, and even if you could, you can't guarantee what you're putting on your device matches the source code. Yet another closed source, inherently untrustable system.

Even with open source you don't know what you're putting on your device matches the source code (unless you're part of the 0.00001% that would compile their own mobile apps).

Even if WhatsApp is not about privacy there seems to be a difference here.

Am I misunderstanding, or are these requests happening just because someone is typing a message that happens to contain a string that looks like a URL? At least with a browser, you are entering things that are supposed to ultimately result in HTTP requests.

Some browsers may send what you type in the address bar to a search engine - I know that Chrome does. So again, I wouldn't be surprised if this is done - that's what web browsers do. They don't make a claim of providing privacy and security.

Maybe not SMS but newer versions of iMessage will prefetch urls to generate a preview. I don't recall if it happens as you type (like WhatsApp) or after you hit send though.

Yes, I did know that. The difference is WhatsApp is selling encrypted-onboard privacy. Which is now, prima facie, a lie.

Facebook has literally told courts they can't decrypt WhatsApp traffic. I wonder if that court case in Brazil is still pending.

By comparison, no one ever said "There's no way Google could read my search terms".

A more accurate comparison would be regarding Allo.

You've got me angrily reading the article while asking myself if WhatsApp would be stupid enough to lose that mostly won case. (It's in our supreme court; I hope they will rule it illegal to widely block a communication channel and to coerce companies into releasing broken crypto.)

No, WhatsApp isn't told what URL you entered in the message. The site owner gets notified, as do the user's ISP and a lot of parties in between. But WhatsApp cannot tell what URL was typed.

WhatsApp is probably the last entity I would be worried about reading my message. WhatsApp is trying to sell you a promise that nobody other than the recipient can know the content of your messages. Notifying the owner of a site, in real time, that its URL is in a WhatsApp message as it is being composed breaks that promise: now the site owner and many more people (DNS server operators and both ISPs, at the very least) have information about the content of that message.

Yeah, but tbh, it's still not e2e encrypted. It just means WhatsApp is ignorant.

So they are in the clear legally, but morally it's still dubious to do that, given it's effectively disclosing what is often a substantial portion of the conversation.

The message contents are e2e encrypted.

And, transmitting a URL usually has no use beyond accessing it. They are doing what the user expects; it's just lacking some communication and power-user tools to override the default behavior.

> And, transmitting a URL usually has no use beyond accessing it. They are doing what the user expects; it's just lacking some communication and power-user tools to override the default behavior.

Let's just say there are certain URLs where merely accessing them can cause legal complications, and not reporting them is technically still a crime.

I'm not sure that's what a user would expect; if anything, I'd think users would expect the opposite, that internet requests potentially with identifying information are not being sent to third parties based on what they've typed into the box without hitting send.

The system is clearly passing message data, not metadata, back to the app servers in the clear. Full stop.

If my sleuthing is on point, the case in Brazil is still in progress, with the latest notation on 09/06/2017 (June 9th).


Oh, cool yeah just pointing it out!

Yeah this is a much bigger deal, for sure, as it's a messaging app - just pointing out similar case in a different application.

Yeah, that link prediction stuff is one of the things I turn off immediately after installing a browser.

Contrary to apparently most other commenters, I had no idea! Thanks!

In order to produce the link preview, probably. As far as why it's character by character, I don't know, but that doesn't seem very sinister to me. Checking URLs letter by letter is sloppy, especially if you're not even trying to do auto completion, but it doesn't reveal any more information than a complete url could. Anyway, I would think they are expecting people to paste URLs in, not type them.

I've written code to fetch sites and give a preview, for a bookmarking bookmarklet. This involves analyzing the html for title and to select best image to represent the page. That of course necessitates retrieving the page, either through the client or server.
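The parsing half of that can be quite small. A minimal sketch using Python's standard-library `html.parser`, assuming the page HTML has already been fetched (by the client or a server) and that Open Graph `og:title`/`og:image` tags should win over the plain `<title>`; `PreviewParser` and `build_preview` are hypothetical names, not WhatsApp's or anyone's actual code:

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collect the <title> text and any Open Graph (og:*) meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.og = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""  # valueless attributes map to None
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and prop.startswith("og:"):
            self.og[prop] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_preview(html):
    """Return a preview card dict; og:title takes priority over <title>."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("og:title") or parser.title.strip(),
        "image": parser.og.get("og:image"),
    }
```

Picking the "best" image when there is no `og:image` (as the bookmarklet above does) would need extra heuristics not shown here.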

It's not sinister so much as that Whatsapp is sending the requests directly from the user's phone (not through a proxy, etc) which is exposing the end user's IP address and full user agent string to the website hosting the page. This information could be used to identify the end user which goes against Whatsapp's whole "Privacy and Security is in our DNA" thing. (https://www.whatsapp.com/security/)

See this tweet for the info exposed: https://twitter.com/0xjomo/status/874585822158352384

Knowing that WhatsApp produces link preview cards, I frankly don't see anything here that I didn't already expect and assume was happening - these cards come from somewhere that is neither me nor my conversation partner. This can't be a surprise to anyone who was concerned about privacy.

If WA proxied the request, it would be WA snooping on the conversation, and that would be a way larger problem because they would accumulate that metadata for EVERYONE.

If you don't want cards or external requests in the conversation, you can obfuscate the URL with any of the myriad methods people use to get past anti-URL filters in forums, etc.
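One common method is "defanging", the `hxxp://example[.]com` style used in security write-ups. A minimal Python sketch; `defang` is a hypothetical helper, and whether WhatsApp's link detector actually ignores the defanged form is an assumption worth testing before relying on it:

```python
import re


def defang(url):
    """Rewrite a URL into the hxxp / [.] form so naive link detectors skip it.

    Assumes the target app only recognizes links with a literal "http" scheme
    and unbracketed dots; that is not guaranteed for any particular app.
    """
    url = re.sub(r"^http", "hxxp", url, flags=re.IGNORECASE)
    return url.replace(".", "[.]")
```

The recipient has to undo the substitution by hand, which is the usability cost of this workaround.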

The alternative would be telling WhatsApp what everybody was typing, so they could proxy the requests.

They did the right thing here. It might be good to add a "do not show URL previews" option, but it's not currently broken.

I see, thanks. So, the lookup is done from the sender's client, not the recipient, correct?

I was actually looking at it from the perspective of it was being done on the server, rather than the client, and people were concerned about information collection about where they're linking to (or something).

This is a good heads up for people with special security needs using proxies to access HTTP. Most people are willing to visit URLs they send in messages from their IP addresses in a browser, or already have before they send the message, but I can see how it's worth knowing about, especially with the additional information disclosure.

Privacy. Facebook.

Pick one.

Edit: Disregard. WhatsApp is doing exactly what it should be doing. Telegram seems to proxy the requests.

Why is no one saying anything about end to end crypto?

Whatsapp shouldn't be able to see my messages, isn't that what they say themselves?

The 4th comment is:

doing a GET request over the internet is already violating e2e

not if it's over TLS/SSL...

When both e's in e2e are the parties of the conversation, and the TLS connection is to a third party, then yes, it is a violation of their claim.

I'm going to have to disagree with that. The TLS connection is not part of the conversation... it has no ability to intercept messages, encrypted or otherwise.

It's just another e2e communication between the client and another party, albeit over another protocol.

But it reveals part of what was typed in the e2e chat to the third party... tls or not is irrelevant to that point.

True, I overlooked that. Thanks for taking the time to point that out :)

The app obviously can see the messages, this happens on the clients.

Yeah, just tested. I assumed Whatsapp would fetch serverside but no and thus expose client IPs.

Telegram seems to proxy the requests.

So Whatsapp wins with crypto here.

Telegram does not use end-to-end encryption by default. It would be interesting to see what happens in encrypted chats there.

Wouldn't using a proxy also break E2E? Whoever operates that proxy can see all the requests.

I guess then they lose the preview feature...

This makes me think of another potential privacy risk: if you paste a URL in WhatsApp, or click Android's share button and select WhatsApp, it doesn't add a space after the url. Most users are probably aware that they have to add a space, but if they forget, WhatsApp will probably send the first word of the rest of the message to the server. (Similarly if you paste a URL at the start of an already-written message, but maybe that's even more contrived.)
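To see why the missing space matters, consider a naive detector that treats everything from the scheme up to the next whitespace as the URL. This is a hypothetical Python sketch, not WhatsApp's actual matching rule, but any whitespace-terminated matcher has the same failure mode:

```python
import re

# Naive rule (an assumption for illustration): a URL runs from "http(s)://"
# up to the next whitespace character.
URL_RE = re.compile(r"https?://\S+")


def urls_to_fetch(message):
    """Return every substring the naive detector would treat as a URL."""
    return URL_RE.findall(message)
```

With the space, only the real URL is fetched; without it, the first word of the message rides along in the request path.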

Apparently several other messaging apps behave similarly, from the replies in that tweet there were mentions of Facebook Messenger[0] and Telegram[1].

[0] https://pbs.twimg.com/media/DCRsz7mXUAAEbKK.jpg
[1] https://pbs.twimg.com/media/DCSyWs0XcAAQb2N.jpg

On the one hand, it provides a better user experience if WhatsApp can recognize the URL and preview information about it (like any social network does today; even we do it at STOMT when you attach a URL to your feedback).

On the other hand, I don't get why they send it after every character. It makes the preview appear faster but creates a bunch of unnecessary requests. Not very user friendly. They could do it once they recognize a finished URL (as soon as there is a space). And as pointed out in the tweets, it COULD harm the users' privacy.
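The "wait for a space" idea amounts to only treating a URL as finished once whitespace follows it. A minimal Python sketch of that trigger rule; `finished_urls` is a hypothetical helper and the rule is the commenter's proposal, not anything WhatsApp is known to ship:

```python
import re

# A URL counts as "finished" only when whitespace follows it; the URL still
# being typed at the end of the draft is deliberately ignored.
FINISHED_URL_RE = re.compile(r"(https?://\S+)\s")


def finished_urls(draft):
    """Return the URLs in a draft message that are safe to fetch a preview for."""
    return FINISHED_URL_RE.findall(draft)
```

One trade-off: a URL pasted at the very end of a message never gets a trailing space, so an app using this rule would also need to trigger on send, or skip the preview.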

They're probably trying to prefetch the URLs so they're loaded by the time you want it, just not a nice way of doing it.

Yeah right.

Skype scans messages for URLs and downloads them. Microsoft claims that they are checking for malware; still creepy.

Does anyone remember that time that they blocked any messages containing YouTube URLs in MSN Messenger?

One aspect is the lack of debouncing, but another is revealing the end user's IP and user agent. They could proxy external link requests via WhatsApp servers without breaking end-to-end encryption. I wonder what iMessage does?!

> They could proxy external link requests via whatsapp servers without breaking end to end encryption.

What good is E2E if you are going to send the plaintext home anyway? Doing these requests on the device is stupid, but proxying them through FB servers would border on malicious.

Could you explain how that would work? Right now they seem to be doing it client side thus not ever seeing the URL themselves. If they were to have a proxy said proxy would need to know the URL to fetch. Is there some alternative solution?

I believe this behavior is for information gathering about odata.

What does Signal do for link previews? Nothing at all?

Prefetching a webpage to generate its preview should at least be optional, controlled through user settings.

Seems like they need debounce? Most JS utility libraries (lodash, etc) have a debounce function...

And even if you don't use those, debounce takes about 4 lines to write.
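To make the suggestion concrete: a trailing-edge debounce fires only after typing has paused for a set interval. Rather than real timers, this minimal Python sketch simulates the rule over a list of keystroke timestamps; `debounce_fire_times` and the 300 ms default are illustrative assumptions, not values WhatsApp uses:

```python
def debounce_fire_times(keystroke_times_ms, wait_ms=300):
    """Simulate a trailing-edge debounce over sorted keystroke timestamps.

    After each keystroke a timer is (re)started for `wait_ms`; the fetch fires
    only if no further keystroke arrives before the timer runs out.
    Returns the times (in ms) at which a fetch would actually be sent.
    """
    fires = []
    for i, t in enumerate(keystroke_times_ms):
        is_last = i == len(keystroke_times_ms) - 1
        # The timer survives only if the next keystroke comes wait_ms or later.
        if is_last or keystroke_times_ms[i + 1] - t >= wait_ms:
            fires.append(t + wait_ms)
    return fires
```

For keystrokes at 0, 100, 200, 900 and 1000 ms, this sends two requests (at 500 and 1300 ms) instead of five, at the cost of each preview appearing `wait_ms` later.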

Probably the WhatsApp Web version? It adds some kind of description if you send a URL: https://i.imgur.com/Rkl2cZJ.png

Did that change happen before or after the acquisition by Facebook?

Plain creepy. Also, does it produce a lot of traffic?

This is a stupid and crappy implementation of a helpful feature, but how is it creepy?


Preview sure, but why every character?

Because the preview is displayed as you type, not when you hit Send

But still... why every character? Wait a couple hundred ms after each keystroke to see if they are done.

Because mobile internet is slow and that'd be the difference between 300ms and 600ms.

Because instant > wait a couple hundred ms

simple really
