
Large-scale Abuse of Contact Discovery in Mobile Messengers [pdf] - sizzle
https://encrypto.de/papers/HWSDS21.pdf
======
galadran
The associated paper [1] summarises the information revealed by Signal
succinctly:

 _The Signal messenger is primarily focused on user privacy, and thus exposes
almost no information about users through the contact discovery service. The
only information available about registered users is their ability to receive
voice and video calls. It is also possible to retrieve the encrypted profile
picture of registered users through a separate API call, if they have set any.
However, user name and avatar can only be decrypted if the user has consented
to this explicitly for the user requesting the information and has exchanged
at least one message with them._

So Signal comes out excellently from this, yet is mentioned in the title.
However, the paper does find that Telegram reveals to the world, in real time,
exactly how many Telegram users have a particular phone number in their
address book...

Can we change the title from the (click baiting) university press release to
one which more accurately reflects the content of the paper?

[1]
[https://encrypto.de/papers/HWSDS21.pdf](https://encrypto.de/papers/HWSDS21.pdf)

~~~
ignoramous
From TFA, here's the damning Telegram bit:

 _For Telegram, the researchers found that its contact discovery service
exposes sensitive information even about owners of phone numbers who are not
registered with the service._

For Signal, TFA makes it clear that correlation defeats Signal's privacy
measures:

 _Interestingly, 40% of Signal users, which can be assumed to be more privacy
concerned in general, are also using WhatsApp, and every other of those Signal
users has a public profile picture on WhatsApp. Tracking such data over time
enables attackers to build accurate behavior models. When the data is matched
across social networks and public data sources, third parties can also build
detailed profiles, for example to scam users._

...

 _More privacy-concerned messengers like Signal transfer only short
cryptographic hash values of phone numbers or rely on trusted hardware.

However, the research team shows that with new and optimized attack
strategies, the low entropy of phone numbers enables attackers to deduce
corresponding phone numbers from cryptographic hashes within milliseconds._

It is hard to say how Signal can improve upon these attacks other than to not
use phone numbers at all.

~~~
forgotmypw17
Here is a really fucked up Telegram mis-feature I discovered recently:

If Alice and Bob are in the same chat

and

Bob has Alice's number stored in their phone's contacts list

and

Bob refers to Alice in the chat (using @Alice)

then

Telegram will disclose to all the chat participants whatever name Bob has
stored for Alice in their contacts (instead of the name Alice specified in
their Telegram profile)

~~~
kayodelycaon
This is why I got a cheap burner sim.

~~~
wtn
This is why I block mobile messenger apps from accessing my contacts.

~~~
rideontime
That doesn't help you if you're Alice.

------
josh2600
This is basically a question of "should e2ee services allow users to auto-
discover/discover each other or not?"

WhatsApp just has a plaintext metadata mapping of the global social graph and
each user's social graph. Signal has every user upload their address book into
a secure enclave so that they can at least somewhat plausibly resist a
subpoena for a user's social graph. This does not stop a determined attacker
from making a list of all phone numbers/usernames on the service and
discovering who is using the service (i.e. the former being an individual's
social graph, which is hidden, vs the graph of all users which is
discoverable).

I don't think I've ever seen Signal say this, so this opinion is mine and not
theirs, but I don't think Signal can actually protect who uses the service,
only what they say on the service and who is in their social graph. A
determined attacker, even if they didn't have this address book lookup tool,
could correlate IP logs and learn a lot if they had an omniscient view of user
traffic.

The core question is this: should e2ee systems have any user growth/discovery
tools or not? On some level the real question is "does Signal need to grow at
all?".

I think the answer is "yes" but that's not particularly grounded in any dogma
other than people want to work on growing products.

In summary, I don't think Signal hides who uses the service, only what their
users say on the service (and who is in whom's address book). In this way
Signal conceals each user's individual social graph but not the total social
graph of who is using the service.

~~~
leptons
>The core question is this: should e2ee systems have any user growth/discovery
tools or not? On some level the real question is "does Signal need to grow at
all?".

I'm still VERY pissed off that Signal scanned my contacts without even asking
me (I don't actually remember giving the app permission to do that). That was
a big red flag for me.

~~~
josh2600
I don't think it's possible for a non-apple/google app to access your address
book on iOS or Android without you explicitly giving permission. Please
correct me if I am wrong.

~~~
billiam
If we're seriously arguing on HN over whether something as fundamental as this
is going on, imagine the confusion among potential users of Signal.

~~~
tialaramex
The fact users can become confused about something doesn't affect whether it's a
fact or not. An insistence upon truths that contradict the facts works just
fine _except_ for the practical implications which are stubbornly unchanged.

During WW2 there was tremendous innovation in the field of electronics and
radio. Some way through the war, both sides began fitting relatively small
radio transmitters to aircraft, which enabled an equipped aircraft to actively
transmit. So one obvious idea is to transmit "Hey I'm friendly" and then you
know not to send up interceptors.

So there's a nice switch on your bomber aeroplane that activates this fancy
new "I'm friendly" transmitter, you are trained to switch it on as you return
to base, and the chap fitting it seems damn sure it's important to switch it
_off_ when leaving. Which is odd right? I mean, it prevents getting shot down,
stands to reason you'd turn it on all the time. And so, despite the urging of
those who understood how it works, leaving it switched on was indeed common
practice, and commanders would defend their crews for doing this, arguing that
the perceived safety of the "Don't shoot me down" transmitter allowed them to
press home attacks in conditions where it might otherwise be prudent to
withdraw.

Which is funny because of course the reason to switch off the transmitter is
that it's a free homing beacon for enemy fighters and anti-aircraft weapons,
so in choosing to do this they were actually significantly increasing their
danger of death.

------
nbadg
I think it's important to put this into context. They're stating that a
malicious user could crawl public info of other users, thereby building (over
time) a behavioral model of those users. The theory that you could protect
users from that by hashing phone numbers and using the hash for contact
discovery, turns out not to be accurate, because there are few enough phone
numbers in existence that you can just brute force the hash.
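
To make the brute-force point concrete, here is a minimal sketch. It assumes an unsalted SHA-256 over the full number string, which is an illustrative choice and not necessarily what any particular messenger uses; `hash_number`, `reverse_hash` and the example number are hypothetical. The toy search covers only 10,000 candidates, whereas a real attacker would walk whole national number ranges or use precomputed tables.

```python
import hashlib
from typing import Optional


def hash_number(number: str) -> str:
    """Unsalted SHA-256 of a phone number string (assumed scheme)."""
    return hashlib.sha256(number.encode("ascii")).hexdigest()


def reverse_hash(target: str, prefix: str, digits: int) -> Optional[str]:
    """Try every number made of `prefix` plus `digits` trailing digits."""
    for n in range(10 ** digits):
        candidate = f"{prefix}{n:0{digits}d}"
        if hash_number(candidate) == target:
            return candidate
    return None  # the number is not in the searched range


# A hash an observer might see during contact discovery (made-up number).
leaked = hash_number("+15550104217")
recovered = reverse_hash(leaked, "+1555010", 4)
print(recovered)  # prints +15550104217
```

The hash hides nothing here because the attacker can afford to hash every possible input.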

I do think it's important for people using these kinds of services (and I'm
one of them!) to understand their limitations, but I also kinda find this a
bit self-evident, if you think about how contact discovery works. There's
simply no way around it (unless you stop using phone numbers to exchange
contacts). So in the sense that studies like these help educate non-technical
users of the technical limitations of services, this is great!

However, to say they "threaten privacy"... That feels like a gross
mischaracterization of what's going on here. Every social technology site,
app, etc, has this problem, and it's something that could be, to an extent,
mitigated for (detection of scanning attempts, rate limiting, etc). Meanwhile,
these are the apps that are bringing E2EE to the masses. It feels like missing
the forest for the trees.

~~~
thimkerbell
What companies are offering to do behavioral modeling as a service?

------
dbrgn
From the section "mitigation techniques":

«It should be possible for privacy-concerned users to provide another form of
identifier (e.g. a username or email address, as is the standard for social
networks) instead of their phone number. This increases the search space for
an attacker and also improves resistance of hashes against reversal.
Especially random or user-chosen identifiers with high entropy would offer
better protection.»

Threema does this. By default users get an 8-character random identifier.
Linking a phone number and/or e-mail address is optional. This way, users can
choose their own balance between the usability of contact discovery and the
privacy of random identifiers.

All the other techniques are mainly making it harder for attackers, but not
impossible. If a user on a 5 year old phone should be able to sync an address
book of 2000 contacts in reasonable time, then the calculation of hashes
cannot be made all too computationally intensive (e.g. by using intentionally
expensive derivation functions like scrypt or argon2). The asymmetry between
the weak hardware of a consumer phone and the abundant computation power of a
cluster is what makes fighting brute force attacks so difficult.
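
A rough back-of-the-envelope sketch of that asymmetry. The per-hash cost of 0.1 s and the 10,000x cluster speedup are made-up numbers for illustration; the 700 billion figure is the count of possible mobile numbers attributed to the paper elsewhere in this thread. The function names are hypothetical.

```python
def sync_seconds(contacts: int, seconds_per_hash: float) -> float:
    """Time an honest client spends hashing its whole address book."""
    return contacts * seconds_per_hash


def attack_days(search_space: int, seconds_per_hash: float, speedup: float) -> float:
    """Days for an attacker to hash every candidate, given a hardware speedup."""
    return search_space * seconds_per_hash / speedup / 86_400


# Assumed: an expensive hash costing 0.1 s per number on an old phone.
print(sync_seconds(2000, 0.1))                             # prints 200.0
# Assumed: a cluster that is 10,000 times faster than that phone.
print(round(attack_days(700_000_000_000, 0.1, 10_000), 1))  # prints 81.0
```

Under these assumptions the user waits over three minutes on every full sync, while the attacker pays about 81 cluster-days once and then owns a lookup table for every mobile number.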

Granted, the proposed incremental contact discovery using leaky buckets is
quite an interesting form of rate limiting. It also has a cost though, namely
increased complexity, and thus an increased chance for bugs / malfunction
(hurting the user experience) and vulnerabilities (hurting security).
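
For readers who haven't met the idea, this is a minimal sketch of a generic leaky bucket applied to contact lookups. It is not the paper's exact construction; the class name, the capacity of 2000 lookups and the leak rate are all assumptions chosen for illustration. Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    capacity: float      # most lookups the bucket can hold at once
    leak_per_sec: float  # how quickly used budget drains away
    level: float = 0.0   # budget currently used
    last: float = 0.0    # timestamp of the previous request

    def allow(self, lookups: int, now: float) -> bool:
        """Return True and record the lookups if they fit in the bucket."""
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.leak_per_sec)
        self.last = now
        if self.level + lookups > self.capacity:
            return False  # request would overflow the bucket
        self.level += lookups
        return True


bucket = LeakyBucket(capacity=2000, leak_per_sec=0.01)
print(bucket.allow(2000, now=0.0))     # True: the initial full sync fits
print(bucket.allow(50, now=60.0))      # False: only 0.6 lookups have drained
print(bucket.allow(50, now=6000.0))    # True: 60 lookups have drained in total
```

A normal user who adds a few contacts a day never notices the limit, while a crawler that wants billions of lookups per account does. Even this toy version shows where the extra state and edge cases come from.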

Contact discovery is a difficult balancing act.

(One last comment: While private contact discovery is a difficult problem,
securing profile information isn't. The fact that I can grab the public
profile picture / information and online status of almost any WhatsApp or
Telegram user is inexcusable. Giving the users control over access permissions
is easy. Signal does this by encrypting the profile and sharing the key.
Threema does this by sharing profile information only using end-to-end
encrypted messages, without servers being involved for storage.)

~~~
bigiain
> It should be possible for privacy-concerned users to provide another form of
> identifier (e.g. a username or email address, as is the standard for social
> networks) instead of their phone number.

This expands the search space, without actually solving the problem, I think.
The problem exposed by the study shows that phone numbers have a small enough
search space to be readily enumerable. Adding email addresses and/or usernames
just means the same attacker would need to move to well understood
John the Ripper/Hashcat style dictionary attacks.

I think to thwart these types of attacks, every user identifier needs to be
something very like a GUID (and a proper long one like 128 bits and a totally
random one, not a hash of their phone number or email address).
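
A quick sketch of the difference in search space, using Python's standard `secrets` module. The 700 billion figure is the paper's count of possible mobile numbers as cited elsewhere in this thread; `bits` and `random_identifier` are hypothetical helper names.

```python
import math
import secrets


def bits(search_space: int) -> float:
    """Entropy in bits of a uniformly chosen value from `search_space` options."""
    return math.log2(search_space)


def random_identifier() -> str:
    """A fully random 128-bit identifier, hex encoded (32 characters)."""
    return secrets.token_hex(16)


print(round(bits(700_000_000_000), 1))  # roughly 39 bits for mobile numbers
print(bits(2 ** 128))                   # prints 128.0
print(random_identifier())              # different on every run
```

Roughly 39 bits can be enumerated; 128 random bits cannot, and there is no dictionary to fall back on because the identifier is not derived from anything guessable.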

~~~
dbrgn
Yep, my point was that if the phone/email thing is entirely optional, then
users can choose not to link any non-random identifier at all.

You are right that e-mail space is also quite small (albeit not as small as
phone numbers), especially considering that a huge part of users will have a
@gmail.com address.

~~~
bigiain
If you attack email addresses as a dictionary attack instead of brute forcing
the entire possible email address space, they’re actually a smaller search
space. The paper claims 53 trillion possible (global) phone numbers, and 700
billion mobile numbers. Haveibeenpwned has just recently passed 10 billion
email addresses. Troy probably doesn’t have _every_ valid email in existence
(yet), but I’d be surprised if it was as low as 1 in 70 (or 1 in 5,300). At
least cutting phone numbers down to just a specific country works better in
eliminating unneeded searches than it does for email (all those @gmail.com,
@outlook.com and to a lesser extent @company.com addresses aren’t country
specific).

------
godelski
I'm one of those WA + Signal users. There's only a few barriers (besides
momentum) now that prevent me from turning friends to Signal (some of them are
dumb, few are big). But by far the number one feature is groups. We're at a
time where people are concerned about privacy, this should be utilized.

Other than that, Signal needs to become feature rich. There are many features
people want that are just pushed aside. Unfortunately Signal is making the
shift from only crypto/privacygeeks to mainstream. In crossing that crevasse
Signal needs to consider a different set of opinions that previously it could
safely ignore. I would leave Signal if it left the "privacy above all else"
mentality, but the forums suggest a high group think about what is "a good
feature" and what is "a dumb feature" (and how people are going to use it). If
it is a highly requested feature, just add it to the list of things to add.
You can't ignore it anymore.

(And can we just add a link to the third party sticker website? People seem to
care about that and sticker discovery is needlessly difficult. I get asked
this frequently and am constantly sending the sticker link. I'm sorry, but the
default ones suck and I cannot understand any good reason it works this way)

~~~
Vinnl
> (And can we just add a link to the third party sticker website? People seem
> to care about that and sticker discovery is needlessly difficult. I get
> asked this frequently and am constantly sending the sticker link. I'm sorry,
> but the default ones suck and I cannot understand any good reason it works
> this way)

Presumably the reason is that your own sticker sets should be private by
default, which makes it more work to allow optional public sharing of them
(which supposedly was not worth delaying the feature for). For example, I have
a sticker set of weird pictures of a friend of mine that I like to use, but
only with mutual acquaintances.

Btw, I would argue that stickers are the prime example of Signal trying to
become feature rich. But of course, there's only so much you can do at the
same time. (Though new features appeared to be released more often recently,
presumably as a result of their funding infusion a while back.)

~~~
godelski
> I would argue that stickers are the prime example of Signal trying to become
> feature rich.

I agree with that and think it is a good example. I know they got a lot of
flak for it, even seeing Rachel get downvoted quite a bit for saying this is
what Signal needs.

As to the sticker discovery, I am more referring to a link in app that leads
you somewhere like here[0]. It is nice that if a friend sends a sticker to you
that you can download the entire set. It is nice that you can make your own
(which is presumably what you are referring to). But if we can download these
stickers without some warning (presumably no danger) then this would fix the
endless comments I hear of "Signal doesn't have good stickers like Facebook
does." Just seems like extremely low hanging fruit to me.

[0] [https://signalstickers.com/](https://signalstickers.com/)

------
sails
Signal has been discussing related issues on discovery [1,2] for a while.
However this research shows this issue in a new light, and I'd be happy to see
a better resolution.

[1] [https://github.com/signalapp/Signal-
Android/issues/6750](https://github.com/signalapp/Signal-Android/issues/6750)

[2] [https://community.signalusers.org/t/why-are-users-who-
have-m...](https://community.signalusers.org/t/why-are-users-who-have-my-
phone-number-in-their-contacts-notified-that-ive-joined-signal/1320)

Additionally for Signal users: It is possible to turn the notification feature
off, but if you newly join Signal, every Signal user in your address book will
be notified unless they have switched it off.

~~~
haffenloher
> if you newly join Signal, every Signal user in your address book will be
> notified

No. People who have your number in _their_ address book will be notified.

------
ChuckMcM
Fascinating paper. It seems that if you were to have a number of VOIP phone
numbers (say four or five different Google Voice numbers) and used different
numbers for different services, it would defeat the correlation attack but you
would also need to scramble other personal data (avatar, etc.) in order to
prevent that from being used for correlation.

Of course if you have access to the telephone network real time localization
service[1] you could do correlation analysis that way.

[1] Allegedly "LEO Access Only" but operated by people who think $50 is a lot
of money.

------
huslage
Phone numbers are not private. The identifiers we use to connect to one
another are not private. By design.

~~~
samatman
Yes, that's true, and Joe Drug Dealer only gave me his number because we share
an interest in exotic tropical fish... I swear.

But I still don't want the entire world knowing that he's in my contact list.

------
mcraiha
I assume some users would like to have more privacy. e.g. ask permission
before you can see any of my info, message me and add me to your list.

------
inputmice
As someone who has thought a good deal about contact discovery the mitigation
techniques section is actually pretty interesting. Quicksy.im, an XMPP client
but based on phone numbers and with built in contact discovery, I developed ~2
years ago, already does very strict rate limiting, but the paper mentions some
other techniques as well that I should probably look at.

~~~
bigiain
Like with websites and password managers, rate limiting works fine when going
via the expected auth service. Doesn't help at all when NSA/MSS/Mossad have
popped the contact hash database off Whisper Systems' backend.

(Admittedly, if that's your threat model, I hope you have enough magic
amulets in the submarine you now live in...)

------
netmonk
Let's go back to IRC... An Android client would do the trick

------
upofadown
Hashing something with a limited domain like phone numbers is useless...

Does Signal get any benefit out of that hashing at all? Why do they bother?

~~~
tialaramex
For one thing hashing means Signal doesn't care that my telephone number is
sixteen digits while yours probably isn't. All the hashes are the same size.

In 1975 _other users_ would have cared because that's sixteen digits to
painstakingly memorise or copy down somewhere, but that problem went away.
Very few people today even _notice_ because who needs telephone numbers?

And it's not true that _any_ finite domain is tractable. The IPv6 address
space is large enough, and thus sparse enough that it's basically pointless to
try to connect to random addresses. If you pick random 32-bit IP addresses and
connect to TCP port 22 a _lot_ of them will answer. Some of them might have a
bug you know to exploit. Maybe you can get one thousand answers per hour and
one in every ten thousand is vulnerable to your attack, you are now successful
twice every day. Whereas if you try this with IPv6 you'll die of old age
before you connect successfully let alone find a vulnerable server.
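
The arithmetic, as a small sketch. The scan rate of one million probes per second and the count of ten million reachable SSH hosts are assumptions picked for illustration, and the model assumes live hosts are spread uniformly at random over the address space; the function names are hypothetical.

```python
IPV4_SPACE = 2 ** 32
IPV6_SPACE = 2 ** 128


def vulnerable_hits_per_day(answers_per_hour: float, vulnerable_ratio: float) -> float:
    """Expected number of vulnerable hosts found per day of scanning."""
    return answers_per_hour * 24 * vulnerable_ratio


def years_until_first_answer(space: int, live_hosts: int, probes_per_sec: float) -> float:
    """Expected years of random probing before any live host answers."""
    expected_probes = space / live_hosts
    return expected_probes / probes_per_sec / (365 * 86_400)


# One thousand answers per hour, one in ten thousand vulnerable.
print(round(vulnerable_hits_per_day(1000, 1 / 10_000), 1))  # prints 2.4

# Assumed: 10 million live hosts, 1 million random probes per second.
print(years_until_first_answer(IPV4_SPACE, 10**7, 1e6))     # a tiny fraction of a year
print(years_until_first_answer(IPV6_SPACE, 10**7, 1e6))     # on the order of 1e18 years
```

So a finite domain is only tractable when it is also small or dense, which phone numbers are and random IPv6 addresses are not.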

------
ThePowerOfFuet
I wish more people knew about Threema.

~~~
senectus1
giving that a look atm.

looks very promising.

------
satisfaction
Why does who I text have anything to do with who I was close to in person?

~~~
bigiain
Would you be happy if that's your only argument against a prosecutor's case
against you?

How much would you bet against there being someone in jail right now convicted
on nothing more than having bought a sim/phone with a recycled number that'd
previously been used by someone dealing/buying drugs?

~~~
satisfaction
No, I would not be happy. What I was thinking was: how does who I text help
with in-person contact tracing? If I text my high school buddy in California,
how does that help determine who I came in contact with if I were later
diagnosed with Covid?

I don't get the relationship between numbers in my phone and the probability I
stood in line at a Starbucks with someone else who was infected.

~~~
bigiain
Right. I didn’t think we’d been talking about contact tracing. The linked
article was talking about “contact discovery” in the context of contact list
scraping by messaging apps (like WhatsApp/Telegram/Signal), not about COVID
exposure contact tracing...

Now that we are though, if I were a contact tracer, I’d totally ask to see
your text messages - there’s a reasonable chance that while not everybody you
texted is someone you came into contact with, there’s also probably a fairly
high correlation the other way - if you had met up with someone you quite
likely messaged or called them to arrange it. If it was my job to help you
remember all the people you’d spent time with in the last 2-3 weeks, I’d
definitely like to go through your messages and call logs to remind you about
anyone you might’ve forgotten.

------
awinter-py
and YO

