Collecting huge amounts of data with WhatsApp (lorankloeze.nl)
290 points by jboynyc on May 12, 2017 | 59 comments

If you think there is no problem, you are wrong. The blog post does not show all the information leaks that this implies. Example: I can modify the script to monitor all the numbers I have in my phone, so that after a few weeks of online/offline status data I can guess who is having conversations with whom, uncovering cheating, work affairs, ...

EDIT: Practical example. After collecting enough data about user X, I build a table of the probability of this user being online in each few-minute time range. Then I compare how often that user is online with the online status of another user Y. If the difference from the expected probability is significant, then I can suspect the two are chatting.
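A minimal sketch of that table-and-compare idea, assuming presence for X and Y has already been polled at the same ticks and stored as (minute_of_day, x_online, y_online) tuples. The function name, the 5-minute slot size and the sample data are made up for illustration, and this only computes a ratio; it does not test whether the difference is statistically significant.

```python
from collections import defaultdict

def co_online_lift(observations, slot_minutes=5):
    """Ratio of observed 'both online' ticks to the number expected if X and Y
    were independent within each time-of-day slot. About 1.0 means no link."""
    # Per slot: [ticks seen, ticks X online, ticks Y online, ticks both online]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for minute, x_on, y_on in observations:
        c = counts[minute // slot_minutes]
        c[0] += 1
        c[1] += int(bool(x_on))
        c[2] += int(bool(y_on))
        c[3] += int(bool(x_on) and bool(y_on))
    observed = sum(c[3] for c in counts.values())
    # Under independence, expected overlaps in a slot = n * (x/n) * (y/n) = x*y/n
    expected = sum(c[1] * c[2] / c[0] for c in counts.values())
    return observed / expected if expected else float("nan")

# Hypothetical data: four ticks in the same slot.
unrelated = [(0, True, True), (1, True, False), (2, False, True), (3, False, False)]
together = [(0, True, True), (1, True, True), (2, False, False), (3, False, False)]
print(co_online_lift(unrelated))  # 1.0
print(co_online_lift(together))   # 2.0
```

Because the baseline is computed per slot, two people who are merely online at the same busy time of day should come out near 1.0; only overlap beyond their usual habits pushes the ratio up.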

Another thing I can use is the activation delay of the online status: often X sends a message to Y, which results in Y coming online a few seconds later, and then the reverse.
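The activation-delay signal could be counted roughly like this. It is only a sketch: it assumes you already have, for each user, a list of timestamps (in seconds) at which they switched to online, and the 10-second cutoff is an arbitrary guess, not a measured value.

```python
from bisect import bisect_right

def follow_count(starts_a, starts_b, max_delay=10):
    """Count how many times A coming online is followed by B coming online
    within max_delay seconds."""
    starts_b = sorted(starts_b)
    hits = 0
    for t in starts_a:
        i = bisect_right(starts_b, t)  # index of first B start strictly after t
        if i < len(starts_b) and starts_b[i] - t <= max_delay:
            hits += 1
    return hits

# Hypothetical timestamps: Y tends to appear a few seconds after X.
x_starts = [100, 500, 900]
y_starts = [104, 507, 2000]
print(follow_count(x_starts, y_starts))  # 2
print(follow_count(y_starts, x_starts))  # 0
```

Running it in both directions gives a crude hint of who is reacting to whom in a given exchange.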

Well, maybe. I don't want to discount this concept entirely because some info is being "leaked" if that is the right word, but...

Let's say one of your contacts chats a lot because it's a chatty person. They're online far more than another person. What if that other person only chats on the bus on the way to and from work at roughly the same time every day to tell their wife they're on the way home. This activity will overlap with the chatty person's activity all the time.

By your rationale, they are having a conversation, maybe cheating, and maybe having a work affair.

I think the more active contacts a user has, the higher the probability that your model predicts they are having a "conversation" with another user. You'll probably find that your thresholds are really hard to fine-tune: maybe we say A chats with B if abs(A.activeTime - B.activeTime) < threshold, but that threshold is going to be super hard to find and even harder to validate.

Sure, there is some information here (the picture probably being the most concretely weird) but the fact that you can just go to the App and check a box for privacy means that this seems like not a huge issue.

Yes, WhatsApp made the software, but it's your responsibility to apply your own privacy settings.

If you check the model I described in my comment, it should filter out the "bus problem", since it only detects a chat if user A is chatting more than their usual "bus time" probability when B is also chatting in the same range. If you add to this that people on WhatsApp usually do not talk at the exact same minutes, it is definitely possible to create a robust system for guessing with good probability whether two people often have conversations. Also note that the input phone numbers are not random; they belong to a connected circle of people. Add to this the fact that we can split the ranges further, potentially down to a few minutes, and you can even detect interesting stuff for people having continuous chats with multiple persons, like teenagers. Another thing that is probably possible is "group detection", since on new messages a set of users will become active at the same time.

I would think having a week or a month's worth of data is enough to get rid of such people who are "background noise", and get an accurate enough "who's talking to who" mapping. (Something like, "A and B's presence are within a minute of each other for 80% of the time, even during 'off-peak' times, for example early in the morning.").
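As a rough sketch of the "within a minute of each other for 80% of the time" rule: assume each user's log has been reduced to a list of timestamps (in seconds) at which they came online. The function name and sample numbers are invented for illustration, and the 60-second window and 0.8 cutoff are just the figures from the sentence above, not tuned values.

```python
from bisect import bisect_left

def fraction_near(starts_a, starts_b, window=60):
    """Fraction of A's online-starts that have a B online-start within
    `window` seconds, before or after."""
    if not starts_a:
        return 0.0
    starts_b = sorted(starts_b)
    near = 0
    for t in starts_a:
        i = bisect_left(starts_b, t)
        # Only the neighbours just before and just after t can be the closest.
        candidates = starts_b[max(i - 1, 0): i + 1]
        if any(abs(t - b) <= window for b in candidates):
            near += 1
    return near / len(starts_a)

# Hypothetical logs: four of A's five sessions start within a minute of B's.
a_starts = [0, 1000, 2000, 3000, 4000]
b_starts = [30, 1050, 2100, 2990, 4010]
score = fraction_near(a_starts, b_starts)
print(score, score >= 0.8)  # 0.8 True
```

Restricting the input to off-peak hours, as suggested above, would just mean filtering the timestamp lists before calling the function.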

We are worried about the NSA collecting metadata, but you dismiss this as an end user problem. Famously this is why Facebook has its settings to be opt-out instead of opt-in, because a high percentage of users never changes the default.

> Yes, WhatsApp made the software, but it's your responsibility to apply your own privacy settings.

I do get Caveat Emptor but a lot of people do not understand the meaningful implications of privacy.

But you are voluntarily giving that info away if you use WA and don't change your privacy settings. The 'last online' feature is quite important IMO, and if you're concerned about this then disable it.

Edit: On the other hand it would perhaps be better to have it on 'Contacts only' as the default, but you could still monitor your colleagues as they probably have you in their contacts anyway.

WhatsApp has no setting to disable the "online" flag while I have the app open, and a bunch of the things mentioned can be built on top of just this.

I think this shows how some seemingly trivial data points on an individual level can allow one to build something way more than the sum of parts at a mass surveillance level.

I believe you're looking for Settings > Account > Privacy > Status > Only Share with...

And select no contacts on that screen. This is on the Android app, I'm not familiar with the web version.

I don't think that disables the "online" flag. As far as I'm aware, it's not currently possible to disable that one.

I agree but I don't believe the average user realizes that it is possible to do that with the above data. And indeed, to do that manually is not practical, but if the info becomes scriptable, it's another matter.

Scriptable and available to anybody. That my contacts in WhatsApp can correlate my activities with my status is part of the purpose of WhatsApp; that any stranger who got my mobile number from a business contact can do it is another matter.

Edit: Discovered that one can share this information only with "My Contacts". I am surprised I haven't seen this before; maybe the "if you do not share, you cannot see" info box misled me into not restricting this information to my contacts.

But that's literally the only correlation you can make: "these people are chatting".

If you think two people talking with each other outside of "state secrets type shenanigans" is a big deal, you're wrong.

There are much easier, much more accurate ways to discover "cheating" than online times for WhatsApp...

Just checked if I was affected and happily I found I'd set every possible setting to private or disabled.

I figure five eyes already had this information. I think they would have tools to decrypt all communications from any app, by using rooted phones that scan their own memory for common crypto libraries and then extract the keys.

On the initial run it would not know where to look, and the phone would be set up to go through a proxy that blocks all non-decryptable communications, to avoid detection. A profile would be extracted to quickly and silently extract the keys from the phone's memory and subsequently send them to the decrypting proxy.

Then on the second run, the phone would be wiped/reset and the decrypting/blocking proxy would attempt to decrypt the communications that are now extracted from the phone in real-time. The wipe functions to avoid detection (it makes it look like the phone is simply crashing). Perhaps the wipe would include changing some device ID's and the source IP.

Rinse, repeat until only decryptable signals leave the phone.

(Something similar could be done with stubbing the encryption code in memory and then "moving" it to the proxy.)

Either based on virtualization tools or on memory inspection. Or perhaps ring -1 based.

The kind of tool I wish every techy had, so they could easily discover what their apps are really doing.

I've seen footage, I stay noided, I've seen footage, I stay-

Edit: If you know of similar or related tooling, please let me know! I want this software.

If there were a way of tricking the app into thinking it is always online, the real status could be obfuscated. WhatsApp probably uses some Android and iOS API to know when it's open; tapping into that could confuse the app and make the online status pretty useless.

LineageOS (previously CyanogenMod) had a Privacy Blocker or something like that, with which you could block specific apps' access to major APIs like Media Access, Phone ID Access, etc. It's been a while since I last used it and I don't know if it still exists, but it sure helped my paranoia. It was fun seeing apps try to access all sorts of stuff on my phone only to be denied.

Nice Death Grips reference btw.

I think that rather than deny access, it fed the app randomly generated data.

You are affected: There's no way to disable the "online" status display when the app is opened in WhatsApp.

Oh, that's very misleading of Facebook.

I have "Status privacy" and "Last seen" set to no one, I assumed that included "If I am online right now".

This has been discussed many times before on HN. Posts range back from 2+ years ago.

Seems to me that Whatsapp should be able to rate-limit these requests and work to secure the interface so only the legit website can actually pull the info?

The initial website display comes from a QR code you scan with your phone, which then authorizes the website. Could they not then limit queries to that account?

I could be way off the mark, but it seems to me like the worst of this could be mitigated quite easily without much loss in functionality for users?
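To make the per-account limit concrete, here is a generic token-bucket sketch. Nothing in it reflects how WhatsApp's servers actually work; the class, the allow_lookup helper, the account ID and the numbers are all hypothetical.

```python
class TokenBucket:
    """Allow short bursts up to `capacity`, then `refill_per_second` on average."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill = refill_per_second
        self.tokens = float(capacity)
        self.last = 0.0  # time of the previous call, in seconds

    def allow(self, now):
        # Top up tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # one bucket per authorized account (hypothetical keying)

def allow_lookup(account_id, now, capacity=3, refill_per_second=1.0):
    """Return True if this account may perform one more status/profile lookup."""
    bucket = buckets.setdefault(account_id, TokenBucket(capacity, refill_per_second))
    return bucket.allow(now)

# A burst of four lookups at t=0: the fourth is refused, one more is allowed a second later.
print([allow_lookup("acct-1", 0.0) for _ in range(4)])  # [True, True, True, False]
print(allow_lookup("acct-1", 1.0))                      # True
```

A limit like this would slow down a single scraping account without affecting someone who opens a handful of chats; it does nothing against an attacker spreading requests over many accounts.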

WhatsApp does monitor their network and the API calls being performed. This is how they ban spammers and fake clients on the WhatsApp network. I suspect they use some form of machine learning to filter out the non-users, and once you're identified, you get banned. This means that you can no longer use your phone number and need a new SIM. So this story isn't feasible for "huge amounts of data". You might be able to download a few 100k profile pictures, but you will get banned.

> Some of the information that’s being sent back include the following:


> All of the information sent back is the following:

Why would you post information in a status update if you don't want it to be public? Why would you use a picture you want to keep a secret?

I think the concern is that the information is now associated with your phone number. Personally I don't care, but anyone who does can just enable the correct privacy setting.

I don't get why people make such a big problem about this.

First of all, the privacy issues with WhatsApp have been discussed many times before. Yes, the default option "public" is bad, and just like making your profile picture on Facebook "public", it means anyone can scrape it. The fact that it's a mobile app doesn't make it different from a website.

Secondly, people in the comments talk about the fact that you cannot control your presence in WhatsApp, and yes, that is indeed a serious privacy problem which has been discussed before, but this article mentions nothing about that.

Third, WhatsApp monitors their network for non-user clients (to prevent spam and non-official clients). You may be able to request profile pictures of 500 people, but what about 1 million? Iterating over such a large set will likely cause a ban of your WhatsApp account, which means you need to spend another 10 bucks on a SIM card, which will make it infeasible to exploit.

Source: 4 years of experience with chat-bots on the WhatsApp network. I got a lot of SIM cards banned from WhatsApp by experimenting how far I could go. Not only sending messages, but also scraping.

> Iterating over such a large set will likely cause a ban of your WhatsApp account, which means you need to spend another 10 bucks on a SIM card, which will make it infeasible to exploit.

Such limitations can be bypassed. For example, one can use a botnet of hacked Android phones, buy thousands of SIM cards in bulk, or maybe even use virtual phone numbers.

Do you have contact info somewhere? I'd like to chat. (Or you can reach me at <username> at gmail.)

What are your findings on the limitations?

When you give your phone number to LinkedIn you're not surprised either if people will find your job title, photo and name based on your phone number right?

I'd think most people are pretty aware of how public the Whatsapp info is.

If I ask the semi- and non-technical people within my reach, I find that the majority are shocked by the described possibility.

Sadly not shocked enough to even change the privacy settings. Or - beware - deleting WA.

Why do people think "public" does not mean the same as "public" on a website? Essentially there is nothing different here..

Being able to see this information for everyone in your contact list without having to add them is one of the main features of whatsapp

What was the service called again?

Ah the HN hug of DOS... obligatory link to the cached version


It's working for me now.

What's more. If you see their profile pic, you can image search google to check if that picture matches some social networks. Now you have phone->name identification.

Wow. Combine this with a reverse image lookup to try to match numbers with celebrities and other noteworthy people and there are a lot of opportunities for a lot of bad things.

I wonder if Trump uses Whatsapp on a personal phone?

This is nothing new but well explained and nice script.

I'm totally shocked... maybe I will make a story or picture on my public Instagram about that issue ;)

The WhatsApp settings clearly say what people can and cannot see. This is the modern-day equivalent of cold calling a range of home numbers and seeing who answers. If you don't want that information to be public, disable it. Simple as that.

This functionality is pretty much what made WhatsApp so easily accessible for anyone in the first place.

If you get cold-called, you will catch on that somebody is checking your status (home/not home) or tries to elicit information from you – because you are directly involved. With this API, you'll never notice that somebody tracks you. Most non-tech people will never check the settings tab and even fewer will check the privacy settings. They are not aware they are sharing their info publicly.

You are directly involved by installing the Whatsapp application and agreeing to the ToS, which clearly states:

>We collect information about your online and status message changes on our Services, such as whether you are online (your “online status”), when you last used our Services (your “last seen status”), and when you last updated your status message.


>Your phone number, profile name and photo, online status and status message, last seen status, and receipts may be available to anyone who uses our Services, although you can configure your Services settings to manage certain information available to other users.

"B-but people don't read those!" - well then maybe that's something to worry about instead of complaining about an API which is the nature of the product.

You can't disable the "online" status display when you're active in WhatsApp. Their settings page doesn't indicate this. So it's still possible to profile your contacts even with everything set to "maximum privacy".

Yes, but this article does not discuss that. It's only about the scraping of information.

This could be extended slightly to get usernames, I believe.

When you're added to a group with a person in it who's not in your contacts, their messages have the name linked to their account in the corner.

The push name is not a username and if you don't want to 'expose' it, leave the group without sending messages. They won't see your name.

The username is visible in the group info screen. Also, if anybody types '@' it will show a list of group members and their usernames.

Not if you've never sent a message to the group or the specific user who's looking.

My point is that there's probably an API call to get it.

Or is it sent by the client itself?

They can see your name under "Group Info".

>Please, use it wisely!

Made me laugh.

This reminds me of the Citibank incident a few years back. Not technically a hack, but it can land you in trouble.

I did almost the same; it's insane how much data you can collect on a person.

Man, my public GitHub profile ties my email address to my photograph. And that stuff is also in each of my public commits.

With modern Android's anti-spam features, I can even dodge spam calls to my phone, so this is not a problem for me.

What is the point of your comment? That it isn't a problem for you so therefore it shouldn't be a problem for the remaining billion of WhatsApp users?

I went back and forth on this, but I think it's quite clear what the point of the comment is so I'm not going to explain.

> With modern Android's anti-spam features, I can even dodge spam calls to my phone, so this is not a problem for me.

With those modern anti-spam features, you leak all call details to those providers. Read up their ToS.

I have T-Mobile. Sometimes I get calls labeled "scam likely".


I'd like to know more about the anti-spam features. Right now, my anti-spam tactic is: if an incoming call doesn't start with Google Voice saying "to accept press one", it is likely something I don't care for.

I'm all right with this. I also use Google Voice, so giving call details to the people I'm getting my calls through is not something I would be worried about. Even if I weren't, I trust Google.

This has been done before.

By who and when?
