The Signal messenger is primarily focused on user privacy, and thus exposes almost no information about users through the contact discovery service. The only information available about registered users is their ability to receive voice and video calls. It is also possible to retrieve the encrypted profile picture of registered users through a separate API call, if they have set one. However, user name and avatar can only be decrypted if the user has consented to this explicitly for the user requesting the information and has exchanged at least one message with them.
So Signal comes out of this excellently, yet is mentioned in the title. However, the paper does find that Telegram reveals to the world, in real time, exactly how many Telegram users have a particular phone number in their address book...
Can we change the title from the (click baiting) university press release to one which more accurately reflects the content of the paper?
For Telegram, the researchers found that its contact discovery service exposes sensitive information even about owners of phone numbers who are not registered with the service.
For Signal, TFA makes it clear that correlation defeats Signal's privacy measures:
Interestingly, 40% of Signal users, who can be assumed to be more privacy-concerned in general, also use WhatsApp, and every second one of those Signal users has a public profile picture on WhatsApp. Tracking such data over time enables attackers to build accurate behavior models. When the data is matched across social networks and public data sources, third parties can also build detailed profiles, for example to scam users.
More privacy-concerned messengers like Signal transfer only short cryptographic hash values of phone numbers or rely on trusted hardware.
However, the research team shows that with new and optimized attack strategies, the low entropy of phone numbers enables attackers to deduce corresponding phone numbers from cryptographic hashes within milliseconds.
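The scale of the problem is easy to demonstrate. Here is a minimal sketch of why low-entropy phone numbers defeat plain hashing; the use of unsalted SHA-256, the `+1555` prefix, and the tiny 4-digit demo range are illustrative assumptions, not the scheme any particular messenger uses:

```python
import hashlib

# A national mobile numbering plan has on the order of 10^10 candidates.
# A single core hashes millions of strings per second, so reversing one
# hash takes seconds to minutes, and a precomputed table makes it instant.

def hash_number(number):
    # Assumption: unsalted SHA-256 of the E.164 string (real services vary).
    return hashlib.sha256(number.encode()).hexdigest()

def reverse_hash(target, prefix="+1555", digits=4):
    # Enumerate a tiny demo range; a real attacker sweeps the full plan.
    for n in range(10 ** digits):
        candidate = f"{prefix}{n:0{digits}d}"
        if hash_number(candidate) == target:
            return candidate
    return None

target = hash_number("+15551234")
print(reverse_hash(target))  # recovers +15551234 almost instantly
```

Salting does not help here, since the salt must be known to the server (and therefore to anyone running the same enumeration).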
It is hard to say how Signal can improve upon these attacks other than to not use phone numbers at all.
If Alice and Bob are in the same chat, and
Bob has Alice's number stored in their phone's contacts list, and
Bob refers to Alice in the chat (using @Alice), then
Telegram will disclose to all the chat participants whatever name Bob has stored for Alice in their contacts (instead of the name Alice specified in their Telegram profile).
> there would be no phone number involved. Maybe not even a username involved! Nothing to add to an address book.
I know Moxie shows up on HN, maybe he could explain more? I'm very interested in this feature and I think HN would love to know more and if it helps solve the above issue (presumably it could).
I think it would be wonderful if you could use signal without a phone number.
I wonder if there is a technical reason they don’t implement this, as it sure seems like it would only have benefits for users privacy and security.
As I understand it, the challenge is to do it in a privacy-friendly way, since your contact list of phone numbers is on your phone, but this has to live on Signal's servers.
Signal uses phone numbers because it makes discovery easy. Threema, for example, can use phone numbers for discovery but does not require it. Discovery without phone numbers is easy. I see my contacts and scan their Threema QR codes. If I need to contact a friend of a friend, my friend gives me the FoaF's Threema ID.
Signal would need to store a second contact list if it were not using the phone contacts. And suddenly you need to back up this second contact list. If every app does that, you can forget about the user backing up everything; they simply won't do it, and the feature becomes useless. The solution would be for Signal to store it on their server, obviously encrypted. But then you have different privacy issues to take care of: how can you retrieve a user's contacts without storing their identity? How do you hide the number of contacts they have?...
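One way to square that circle is client-side encryption: the server stores only an opaque blob and never sees contact identities. Below is a toy sketch of the idea, assuming a key derived from a user secret with scrypt; the HMAC-counter keystream is demo-only (real apps would use AES-GCM via a proper crypto library), and all names are made up:

```python
import hashlib, hmac, json, os, secrets

# Toy sketch (NOT production crypto): the client encrypts its contact
# list before upload, so the server stores only ciphertext.

def derive_key(pin, salt):
    # Expensive KDF so the short user secret resists brute force.
    return hashlib.scrypt(pin.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)

def _keystream(key, nonce, length):
    # HMAC-SHA256 in counter mode -- demo only.
    blocks = (length // 32) + 1
    return b"".join(
        hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range(blocks)
    )

def encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt(key, blob):
    nonce, cipher = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(cipher))
    return bytes(c ^ s for c, s in zip(cipher, stream))

salt = os.urandom(16)
key = derive_key("1234", salt)
contacts = json.dumps([{"name": "Alice", "id": "a1b2"}]).encode()
blob = encrypt(key, contacts)          # this is all the server ever sees
assert decrypt(key, blob) == contacts
```

Even then the metadata questions the comment raises remain: the blob's size leaks roughly how many contacts you have unless it is padded.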
every chat application that i have stores its own contact list. in fact i don't even have any contacts in my general phone contact list, because i don't call or send sms to people. and i don't want any chat contacts in my phone contact list.
i have not tried signal yet, mainly because it is not available on f-droid. but if signal insists on storing its contacts in my general phone list then i won't be able to use it. and that's ignoring the problem with using phone numbers.
there is no technical problem to store contacts locally. deltachat does that too. deltachat also provides a backup feature to export the local data including contacts and messages so you can restore them on another device.
there is no reason signal couldn't do the same.
i don't know why this is so unusual. we are having this same argument every time signal's use of phone contacts is brought up. and every time the same claims are being made.
Every other app can see it if you click "Accept" on the per-application consent dialog ...
it's basically saying: i am going to take over your contact list, and if you don't want those contacts to be shared with other apps, then you can just block them.
what if another app wants to do the same?
e.g. If you had stored a plumber's number 10 years ago, you'll receive a notification telling you that the plumber is on Telegram now. Likewise, if you start using Telegram today, everyone who has your contact and uses Telegram will receive the notification; be prepared for some awkward conversations with people whom you have forgotten.
•Telegram's latency seems to be low compared to WhatsApp's. (Although part of that could be optimised code, data center proximity should account for more of it; and if so, how does a supposedly renegade group of techies with no revenue afford better data center facilities than their billion-dollar competitors?)
•Their feature update notifications seem to create a sense of a consumer-focussed entity compared to the competitors.
•The bot API has made the platform more extensible than others (Messenger restricted several features of its API after the privacy fiasco).
That's all. I don't buy the argument of security being Telegram's USP, and marketing it as one seems disingenuous at best and malicious at worst, IMO.
If a service X knows the mapping between a user id and some useful info it can display (eg the name or photo) then whatever you do to get that user id, you can then display that useful info if it would be shown to any user of the service. Such as Facebook showing the profile pic and name (that’s why the real names policy is DUMB for privacy). So people resort to effectively usernames. This means you can id the user across sites and then later try to scrape info associated with that username across sites.
The solution is to remove all info, including usernames, unless the person has shared it with you (eg friended you and shared some info like a username with friends). Most of us on forums don't give a crap who's answering, just their reputation. For strangers, why have avatars or usernames at all? Why have anything?
Otherwise you will have to rate limit scrapers and stuff like that, playing a cat and mouse game against sybil accounts.
Telegram has so far never had an independent audit of its crypto, or maybe I'm wrong?
I'm not sure they made the compromises and decisions the way I would have preferred them, but their e2e secure messenger platform is way more ubiquitous than mine (which I never wrote), so in spite of that, I reckon they've done more to "make the world a better place" than I have...
(I do still get mad every time Signal tells me "Some random or friend whose phone number you saved sometime in the last decade or so is now using Signal!" I'm 99% certain none of those people knew I was going to see that message when they installed/configured their "super private e2e encrypted messenger app!!!")
WhatsApp just has a plaintext metadata mapping of the global social graph and each user's social graph. Signal has every user upload their address book into a secure enclave so that they can at least somewhat plausibly resist a subpoena for a user's social graph. This does not stop a determined attacker from making a list of all phone numbers/usernames on the service and discovering who is using the service (i.e. the former, an individual's social graph, is hidden, while the graph of all users is discoverable).
I don't think I've ever seen Signal say this, so this opinion is mine and not theirs, but I don't think Signal can actually protect who uses the service, only what they say on the service and who is in their social graph. A determined attacker, even if they didn't have this address book lookup tool, could correlate IP logs and learn a lot if they had an omniscient view of user traffic.
The core question is this: should e2ee systems have any user growth/discovery tools or not? On some level the real question is "does Signal need to grow at all?".
I think the answer is "yes" but that's not particularly grounded in any dogma other than people want to work on growing products.
In summary, I don't think Signal hides who uses the service, only what their users say on the service (and who is in whose address book). In this way Signal conceals each user's individual social graph but not the total social graph of who is using the service.
Discovery I agree is a trade-off of security vs usability / service attractiveness.
I'm fairly certain you could create something very hard to attack if you assume 2 users meet in person once to exchange long identifiers and keys via QR-code scans. Add in burner phones and/or public Wi-Fi, and tracking stops being feasible; you'd rather follow the human.
However, that'd be very inconvenient and maybe impossible for many - and you'd be back at empty. And possible to target entirely.
I'm still VERY pissed off that Signal scanned my contacts without even asking me (I don't actually remember giving the app permission to do that). That was a big red flag for me.
I still get pissed when it does that to friends of mine (and less pissed, but still unhappy, when it does it to co-workers, ex-colleagues, work clients, government employees, taxi drivers, pizza deliverers, and all the other random numbers my phone has saved in its contact list over the last decade or so...)
During WW2 there was tremendous innovation in the field of electronics and radio. Some way through the war, both sides began fitting relatively small radio transmitters to aircraft, which enabled an equipped aircraft to actively transmit. So one obvious idea is to transmit "Hey I'm friendly" and then you know not to send up interceptors.
So there's a nice switch on your bomber aeroplane that activates this fancy new "I'm friendly" transmitter, you are trained to switch it on as you return to base, and the chap fitting it seems damn sure it's important to switch it off when leaving. Which is odd right? I mean, it prevents getting shot down, stands to reason you'd turn it on all the time. And so, despite the urging of those who understood how it works, leaving it switched on was indeed common practice, and commanders would defend their crews for doing this, arguing that the perceived safety of the "Don't shoot me down" transmitter allowed them to press home attacks in conditions where it might otherwise be prudent to withdraw.
Which is funny because of course the reason to switch off the transmitter is that it's a free homing beacon for enemy fighters and anti-aircraft weapons, so in choosing to do this they were actually significantly increasing their danger of death.
I don't suppose anybody knows if the Android version of Signal back in the 2014-2016 period asked for contact permission (in a non-Android-mandated way)? I.e., does the post-2016 Android Signal app, running on pre-Marshmallow Android versions, do so?
I don't really trust Signal all that much, but my friends seem to. It was founded by "Moxie Marlinspike", a guy with a made-up name, whom Twitter hired to fix their security issues; it looks like they wasted their money on that, so I don't have the most confidence in Moxie to really keep my chats private.
I do think it's important for people using these kinds of services (and I'm one of them!) to understand their limitations, but I also kinda find this a bit self-evident, if you think about how contact discovery works. There's simply no way around it (unless you stop using phone numbers to exchange contacts). So in the sense that studies like these help educate non-technical users of the technical limitations of services, this is great!
However, to say they "threaten privacy"... That feels like a gross mischaracterization of what's going on here. Every social technology site, app, etc, has this problem, and it's something that could be, to an extent, mitigated for (detection of scanning attempts, rate limiting, etc). Meanwhile, these are the apps that are bringing E2EE to the masses. It feels like missing the forest for the trees.
Additionally for Signal users: It is possible to turn the notification feature off, but if you newly join Signal, every Signal user in your address book will be notified unless they have switched it off.
No. People who have your number in their address book will be notified.
«It should be possible for privacy-concerned users to provide another form of identifier (e.g. a username or email address, as is the standard for social networks) instead of their phone number. This increases the search space for an attacker and also improves resistance of hashes against reversal. Especially random or user-chosen identifiers with high entropy would offer better protection.»
Threema does this. By default users get an 8-character random identifier. Linking a phone number and/or e-mail address is optional. This way, users can choose their own balance between the usability of contact discovery and the privacy of random identifiers.
All the other techniques are mainly making it harder for attackers, but not impossible. If a user on a 5 year old phone should be able to sync an address book of 2000 contacts in reasonable time, then the calculation of hashes cannot be made all too computationally intensive (e.g. by using intentionally expensive derivation functions like scrypt or argon2). The asymmetry between the weak hardware of a consumer phone and the abundant computation power of a cluster is what makes fighting brute force attacks so difficult.
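The asymmetry can be put in rough numbers. A back-of-envelope sketch, where the sync budget, contact count, and cluster size are all assumptions chosen for illustration:

```python
# How expensive can a per-number hash be before syncing a full address
# book on a phone becomes painful, and what does that cost buy against
# a cluster enumerating the whole phone-number space?

CONTACTS = 2000                # address book size from the comment above
NUMBER_SPACE = 10**10          # rough size of a national numbering plan
SYNC_BUDGET_S = 10             # what a user tolerates for one sync

per_hash_budget = SYNC_BUDGET_S / CONTACTS      # seconds per hash
print(f"per-hash budget: {per_hash_budget*1000:.0f} ms")     # 5 ms

# An attacker with a 1000-core cluster paying the same 5 ms per hash:
cluster_rate = 1000 / per_hash_budget           # hashes per second
full_sweep_days = NUMBER_SPACE / cluster_rate / 86400
print(f"full sweep: {full_sweep_days:.1f} days")             # ~0.6 days
```

So even a KDF tuned to the very edge of usability only slows a modest cluster to under a day for the whole number space, which is the point being made: cost tuning alone cannot close the gap.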
Granted, the proposed incremental contact discovery using leaky buckets is quite an interesting form of rate limiting. It also has a cost though, namely increased complexity, and thus an increased chance for bugs / malfunction (hurting the user experience) and vulnerabilities (hurting security).
Contact discovery is a difficult balancing act.
(One last comment: While private contact discovery is a difficult problem, securing profile information isn't. The fact that I can grab the public profile picture / information and online status of almost any WhatsApp or Telegram user is inexcusable. Giving the users control over access permissions is easy. Signal does this by encrypting the profile and sharing the key. Threema does this by sharing profile information only using end-to-end encrypted messages, without servers being involved for storage.)
This expands the search space without actually solving the problem, I think. The problem exposed by the study is that phone numbers have a small enough search space to be readily enumerable. Adding email addresses and/or usernames just means the same attacker would need to move to well-understood John the Ripper/Hashcat-style dictionary attacks.
I think to thwart these types of attacks, every user identifier needs to be something very like a GUID (and a proper long one like 128 bits and a totally random one, not a hash of their phone number or email address).
You are right that e-mail space is also quite small (albeit not as small as phone numbers), especially considering that a huge part of users will have a @gmail.com address.
Other than that, Signal needs to become feature-rich. There are many features people want that are just pushed aside. Unfortunately Signal is making the shift from only crypto/privacy geeks to mainstream. In crossing that chasm Signal needs to consider a different set of opinions that previously it could safely ignore. I would leave Signal if it left the "privacy above all else" mentality, but the forums suggest a high degree of groupthink about what is "a good feature" and what is "a dumb feature" (and how people are going to use it). If it is a highly requested feature, just add it to the list of things to add. You can't ignore it anymore.
(And can we just add a link to the third party sticker website? People seem to care about that and sticker discovery is needlessly difficult. I get asked this frequently and am constantly sending the sticker link. I'm sorry, but the default ones suck and I cannot understand any good reason it works this way)
Presumably the reason is that your own sticker sets should be private by default, which makes it more work to allow optional public sharing of them (which supposedly was not worth delaying the feature for). For example, I have a sticker set of weird pictures of a friend of mine that I like to use, but only with mutual acquaintances.
Btw, I would argue that stickers are the prime example of Signal trying to become feature rich. But of course, there's only so much you can do at the same time. (Though new features appeared to be released more often recently, presumably as a result of their funding infusion a while back.)
I agree with that and think it is a good example. I know they got a lot of flack for it, even seeing Rachel get downvoted quite a bit for saying this is what Signal needs.
As to the sticker discovery, I am more referring to a link in app that leads you somewhere like here. It is nice that if a friend sends a sticker to you that you can download the entire set. It is nice that you can make your own (which is presumably what you are referring to). But if we can download these stickers without some warning (presumably no danger) then this would fix the endless comments I hear of "Signal doesn't have good stickers like Facebook does." Just seems like extremely low hanging fruit to me.
Examples of features in both groups?
I've seen many people ask for a bidirectional delete like in WA. The response always comes back along the lines of "just don't make typos." When I've seen arguments akin to a company nuking a company phone, the responses are "well that's dumb because someone can run a custom Signal and save all the data anyways" (as if that has anything to do with the threat model at hand, or changes the fact that this is a probabilistic method of security which at least doesn't guarantee that the holder of a phone can read confidential messages).
I should note that a "compromise" was created where Signal is introducing the feature, but with a time limit. It is not clear to me why this time limit is there, other than because some people think the feature will "trick" people into forgetting that things like screenshots exist (I guarantee you every 13-year-old with TikTok knows how to screenshot or record). Basically the argument is that this feature will "trick" people into thinking that the message can no longer possibly exist anywhere (I don't think many would actually believe this, but yes, there are dumb people; security is never foolproof).
Another example is the sticker thing. Just make them discoverable.
Of course if you have access to the telephone network real time localization service you could do correlation analysis that way.
 Allegedly "LEO Access Only" but operated by people who think $50 is a lot of money.
But I still don't want the entire world knowing that he's in my contact list.
As you already pointed out, there's a gigantic downside, which is your privacy. I keep hearing from the occasional person who says they are not on Facebook (for privacy reasons) but uses WhatsApp. I later congratulated them for signing up to Facebook after all, and for allowing WhatsApp to upload their entire address book so that anyone's number can be found or searched on Facebook, even people who are not on WhatsApp!
Wire, Signal, and Telegram do the same thing but are just as bad for privacy and are disqualified.
(Admittedly, if that's your threat model, I hope you have enough magic amulets in the submarine you now live in...)
Does Signal get any benefit out of that hashing at all? Why do they bother?
In 1975 other users would have cared because that's sixteen digits to painstakingly memorise or copy down somewhere, but that problem went away. Very few people today even notice because who needs telephone numbers?
And it's not true that any finite domain is tractable. The IPv6 address space is large enough, and thus sparse enough that it's basically pointless to try to connect to random addresses. If you pick random 32-bit IP addresses and connect to TCP port 22 a lot of them will answer. Some of them might have a bug you know to exploit. Maybe you can get one thousand answers per hour and one in every ten thousand is vulnerable to your attack, you are now successful twice every day. Whereas if you try this with IPv6 you'll die of old age before you connect successfully let alone find a vulnerable server.
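The sparsity argument above in concrete numbers; the probe rate is an assumed figure for illustration:

```python
# IPv4 is tiny and densely populated; IPv6 is so sparse that blind
# scanning is hopeless, even at absurd probe rates.

ipv4_space = 2**32
ipv6_space = 2**128

print(ipv4_space)              # ~4.3 billion addresses in total

# Fraction of IPv6 covered in a century at a billion probes per second:
probes = 10**9 * 3600 * 24 * 365 * 100
print(f"coverage: {probes / ipv6_space:.1e}")   # ~9e-21 of the space
```

That is the same effect the quoted paper recommends exploiting for user identifiers: make the populated part of the space a vanishing fraction of the searchable one.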
looks very promising.
How much would you bet against there being someone in jail right now convicted on nothing more than having bought a sim/phone with a recycled number that'd previously been used by someone dealing/buying drugs?
I don't get the relationship between numbers in my phone and the probability I stood in line at a Starbucks with someone else who was infected.
Now that we are though, if I were a contact tracer, I'd totally ask to see your text messages. While not everybody you texted is someone you came into contact with, there's probably a fairly high correlation the other way: if you had met up with someone, you quite likely messaged or called them to arrange it. If it was my job to help you remember all the people you'd spent time with in the last 2-3 weeks, I'd definitely like to go through your messages and call logs to remind you about anyone you might've forgotten.