
Fake WhatsApp update from “WhatsApp Inc.” with Unicode whitespace: 1M downloads - jakub_g
https://twitter.com/virqdroid/status/926437790140772362
======
repiret
Several years ago I read [1] a proposal for internationalized domain names
from DJ Bernstein from before Punycode took hold. The key observation was
that there's nothing stopping you from just using UTF-8 in the existing DNS
protocols, but it included a discussion of how to treat visually
indistinguishable unicode characters to prevent fraud, which is why I bring it
up now:

The proposal was that the TLD administrators should whitelist in non-ASCII
characters and generally require that domains are either entirely ASCII or
entirely in a subset of Unicode that made sense for their native languages -
.ru could allow all-ASCII or all-Cyrillic, .gr could require all-ASCII or all-
Greek, .de could allow ASCII plus eszett and the umlauts, and could further
require normalized encoding (ü must be U+00FC and not u plus combining U+0308, i.e. bytes 75 CC 88) and consider ö.de
and oe.de to be collisions [2], and so on. Weird varieties of spaces, dashes,
non-printing characters, accents that are only needed to type Klingon, and so
on would never get whitelisted in.
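
A minimal sketch of what such a per-TLD rule could look like, not the actual proposal. The `TLD_POLICY` table, `script_of`, and `label_allowed` are hypothetical names, and the script is only guessed from the first word of the Unicode character name, since Python's standard library does not expose the script property directly. It also normalizes to NFC itself rather than rejecting unnormalized input.

```python
import unicodedata

# Hypothetical per-TLD policy: scripts a label may use, plus any extra
# individually whitelisted characters (the ".de" case).
TLD_POLICY = {
    "ru": {"scripts": {"CYRILLIC"}, "extra": set()},
    "gr": {"scripts": {"GREEK"}, "extra": set()},
    "de": {"scripts": set(), "extra": set("äöüß")},
}

ASCII_LABEL_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def script_of(ch):
    """Rough script guess: first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def label_allowed(label, tld):
    """True if the label is all-ASCII or entirely inside the TLD's whitelist."""
    label = unicodedata.normalize("NFC", label.lower())
    if all(ch in ASCII_LABEL_CHARS for ch in label):
        return True
    policy = TLD_POLICY.get(tld)
    if policy is None:
        return False
    if policy["extra"]:
        # ASCII plus a handful of whitelisted letters.
        return all(ch in ASCII_LABEL_CHARS or ch in policy["extra"] for ch in label)
    # Otherwise every letter must belong to one of the TLD's scripts.
    return all(ch in "-0123456789" or script_of(ch) in policy["scripts"] for ch in label)


print(label_allowed("mu\u0308nchen", "de"))      # True: NFC folds u + U+0308 into ü
print(label_allowed("p\u043ep", "ru"))           # False: Latin p mixed with Cyrillic о
print(label_allowed("whatsapp\u00a0inc", "de"))  # False: no-break space is not whitelisted
```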

I've always thought that was a great idea, and it's a general principle app
stores could use too. (Although I realize that app stores don't have as strong
a concept of a native language as most TLDs do, which makes it a bit harder)

[1]: It's possible it was
[https://cr.yp.to/djbdns/idn.html](https://cr.yp.to/djbdns/idn.html), but I'm
not convinced. Maybe it was an earlier revision.

[2]: In German, you spell "ö" as "oe" if you don't have an ö key. German
speakers wouldn't necessarily need "ö" and "o" to be collisions.

~~~
frobozz
This is an awful idea that conflates country and language and adds another
place where governments can marginalise minorities. Should administrators of
north African tlds be empowered to forbid Tifinagh characters?

Which character sets should be permitted in the .US tld?

The general principle of all ASCII or all something else is not bad though. It
would prevent certain homograph spoofs.

~~~
vostok
I'm not sure it would help that much. For example pop.ru and рор.ru look
identical to my eyes in my current font.
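
To make the point concrete, here is a small sketch that prints the code points behind the two labels. `describe` is a hypothetical helper; the second string is written with escapes so it cannot be confused with the first. Both are single-script, so an "all ASCII or all Cyrillic" rule would accept each of them.

```python
import unicodedata

latin = "pop"
cyrillic = "\u0440\u043e\u0440"  # Cyrillic er, o, er


def describe(text):
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(latin == cyrillic)   # False
print(describe(latin))     # ['U+0070 LATIN SMALL LETTER P', 'U+006F LATIN SMALL LETTER O', 'U+0070 LATIN SMALL LETTER P']
print(describe(cyrillic))  # ['U+0440 CYRILLIC SMALL LETTER ER', 'U+043E CYRILLIC SMALL LETTER O', 'U+0440 CYRILLIC SMALL LETTER ER']
```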

~~~
frobozz
Yes it's not bad but not great either. It would only help in certain
situations (hello vs Ꮒello). It also becomes tricky if you permit Turkish, as
you would have to allow mixing of ascii and the dotted vs dotless i.

There are also good reasons to mix language in a name. US state abbreviations
might be used to distinguish (e.g.) a diaspora community in Texas from their
equivalent in Alaska.

~~~
rspeer
Dotted and dotless i is a different issue, related to capitalization. Those
are still Latin letters. Of course to support most European languages you have
to be able to use ASCII and non-ASCII Latin letters together. That's not
unique to Turkish.

Think of the Spanish word "cañon", for example. It's not four English letters
and one Spanish letter, it's just five Latin letters (four of which are in
ASCII).
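
As a rough illustration of the capitalization angle, this is how Python's default (locale-independent) case mapping treats the two letters. The variable names are only for this sketch, and other languages or locale-aware libraries may behave differently.

```python
dotless_i = "\u0131"         # ı LATIN SMALL LETTER DOTLESS I
capital_dotted_i = "\u0130"  # İ LATIN CAPITAL LETTER I WITH DOT ABOVE

# Both lowercase forms uppercase to the same ASCII letter...
print(dotless_i.upper() == "I")  # True
print("i".upper() == "I")        # True

# ...so two distinct names can collide once they are uppercased.
print("m\u0131crosoft" == "microsoft")                  # False
print("m\u0131crosoft".upper() == "microsoft".upper())  # True

# Lowercasing İ yields i followed by a combining dot above (two code points).
print(capital_dotted_i.lower() == "i\u0307")  # True
```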

~~~
frobozz
There is a difference between ı and i in both cases. I don't know what you
mean by 'related to capitalisation'

The reason I raise Turkish specifically is that the similarity between the
characters presents a potential homograph for a phishing attack in non-Turkish
domains. (e.g. mıcrosoft.com). Characters with apparent diacritics are less
vulnerable (e.g. öracle.com).

------
shpx
Here are three fake uBlock Origins that have over 4 million users between them

[https://chrome.google.com/webstore/detail/ublock-
plus/kjagjn...](https://chrome.google.com/webstore/detail/ublock-
plus/kjagjnchnnlgiafjjlahaedeagnmhefi?hl=en-US)

[https://chrome.google.com/webstore/detail/ublock-adblock-
plu...](https://chrome.google.com/webstore/detail/ublock-adblock-
plus/fdecnmmdccnkogcidionikojplkjfgie?hl=en-US)

[https://chrome.google.com/webstore/detail/ublock-
adblocker-p...](https://chrome.google.com/webstore/detail/ublock-adblocker-
plus/pnhflmgomffaphmnbcogleagmloijbkd?hl=en-US)

The last two are exploiting the fact that uBlock Origin doesn't come up when
you search "adblock".

There's tons more, just look through the search results for "adblock"
[https://chrome.google.com/webstore/search/adblock?hl=en-
US&_...](https://chrome.google.com/webstore/search/adblock?hl=en-
US&_category=extensions) and results for "ublock"
[https://chrome.google.com/webstore/search/ublock?hl=en-
US&_c...](https://chrome.google.com/webstore/search/ublock?hl=en-
US&_category=extensions)

Note that firefox doesn't have this problem (tons of adblockers, maybe some
are fake, but none pretending to be uBlock Origin)
[https://addons.mozilla.org/en-
US/firefox/search/?platform=ma...](https://addons.mozilla.org/en-
US/firefox/search/?platform=mac&q=ublock) maybe has something to do with the
fact that they show usage numbers on the results page.

~~~
Sylos
> Note that firefox doesn't have this problem [...] maybe has something to do
> with the fact that they show usage numbers on the results page.

Mozilla does a manual code review of newly submitted or updated extensions.
So, an actual human being sits down and looks at the code. They'll notice when
a fake uBlock Origin is submitted.

With that, they also enforce a rule which Google does not have, that any
connection to the internet which is not necessary for the add-on to function
(ads, telemetry) have to be opt-in.

This isn't perfect protection, for example the extension Web Of Trust required
sending browsing data back home in order to function, which they then sold in
anonymized form, which was proven to be deanonymizable last year. But it does
take out the incentive to spread fake versions in a lot of cases, as you just
can't publish an ad-ridden or trojan uBlock Origin clone.

~~~
comboy
> With that, they also enforce a rule which Google does not have, that any
> connection to the internet which is not necessary for the add-on to function
> (ads, telemetry) have to be opt-in.

This sounds pretty cool and reasonable. But extensions still can modify the
currently displayed website, right? Doesn't that make it trivial to submit
data somewhere? E.g. <img> tag with GET params, as the most basic form of
this.

~~~
Sylos
It does, but it also makes it trivial for Mozilla to notice that this is
happening and then they can weed that extension out before it gets published.

------
hawski
Google. A search giant. A machine learning leader. They can save me from a
typo in the web search, but can't in the Play store.

That's another reason to switch to F-Droid.

~~~
eli
What keeps someone from doing the same thing to f-droid?

~~~
gnarbarian
Fewer people run it so it's not as desirable a target.

"Security via unpopularity"

~~~
phalangion
Aka, the Linux anti-virus.

~~~
27182818284
Sometimes it certainly seems that way, but there was also a period of time when
Apache dominated the web and yet Microsoft's IIS was having a lot more
exploits despite Apache having more market share. Market share isn't the only
factor, but it probably is _a_ factor.

~~~
bartread
I wonder if that might have been due to Windows market share? Windows was
everywhere on the desktop, and those Windows desktops provide a good
intermediary vector for attacking instances of IIS on Windows Servers.

Also, thinking back to the bad old days and the script-kiddie-esque nature of many
viruses of the early 2000s (iloveyou, et al), I suspect it may come down to
attacking what you know: Windows was more prevalent and better understood so
that's what people tried to break.

Not my field though, so all just speculation.

------
Waterluvian
Take the top apps and automatically raise a list of apps with similar names
each week. Pay someone to investigate and flag if necessary.

I feel like there's a general unwillingness in some realms to do anything at
all if the solution is "manual labor until a better tool is available."

~~~
TylerE
There might be legal concerns as well. I know under the DMCA safe harbor
provisions that basically as soon as you start doing manual moderation you are
now liable for anything that gets through.

~~~
pvillano
Could you expand on that?

~~~
hidenotslide
Not an expert, but the only part I see like this is section 512(a) here:
[https://en.wikipedia.org/wiki/Online_Copyright_Infringement_...](https://en.wikipedia.org/wiki/Online_Copyright_Infringement_Liability_Limitation_Act)

This seems to apply to network traffic, whereas I'd guess what to host on the
Play Store would be covered by 512(c) instead. If so, the "red flags" test
would seem to require at least some automated content checking.

But all the above refers to copyright, rather than the trademark violation and
fraud in the WhatsApp clones.

------
kornish
For anyone curious, this type of deception (using Unicode to
pretend to be a known domain) is called a homoglyph attack:
[https://www.cisco.com/c/en/us/support/docs/security/email-
se...](https://www.cisco.com/c/en/us/support/docs/security/email-security-
appliance/200146-Homoglyph-Advanced-Phishing-Attacks.pdf)

------
paulryanrogers
Why don't they have a normalized slug to ensure name uniqueness? Or if so why
would it consider whitespace differences unique?

~~~
dfc
What is a "normalized slug"?

~~~
nerdponx
All names get reduced using Unicode normalization.

~~~
rspeer
Recently, thousands of users were duped into installing "АdВIосk РIuѕ" in
Chrome. Every one of those letters is Unicode normalized.

(Hint: it's not just the I's masquerading as l's.)
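
A quick sketch of why normalization alone does not help here. The `fake` string below is my own reconstruction with explicit escapes, not the exact name from the store, and `CONFUSABLES` is a deliberately tiny, hypothetical stand-in for a real confusables table such as the data published with Unicode UTS #39.

```python
import unicodedata

# Hypothetical mini-table mapping Cyrillic look-alikes to Latin letters.
CONFUSABLES = {
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0412": "B",  # CYRILLIC CAPITAL LETTER VE
    "\u0420": "P",  # CYRILLIC CAPITAL LETTER ER
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
}


def skeleton(name):
    """Normalize, then replace known look-alikes with their Latin twin."""
    folded = unicodedata.normalize("NFKC", name)
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)


fake = "\u0410d\u0412l\u043e\u0441k \u0420lu\u0455"

print(unicodedata.normalize("NFKC", fake) == fake)  # True: already normalized
print(fake == "AdBlock Plus")                       # False
print(skeleton(fake) == "AdBlock Plus")             # True
```

Normalization only unifies different encodings of the same character; it never maps one script's letter onto another's, which is why a separate look-alike table is needed.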

~~~
nerdponx
Fair enough. I'm a little surprised that those hold up under NFKC/NFKD
normalization. А sure as heck seems compatible with A.

------
CM30
Hang on, don't Google supposedly review apps before accepting them into the
store? I mean, they apparently have both an automated system checking for rule
violations and actual human staff checking every now and then:

[https://www.recode.net/2015/3/17/11560334/google-is-
adding-m...](https://www.recode.net/2015/3/17/11560334/google-is-adding-
manual-review-to-android-app-submission-process)

So how is this stuff just waltzing past their quality control setup? That one
unicode character can't really be messing up the whole system, right?

If this stuff is supposedly moderated, who's actually doing the moderation
here?

------
nebulous1
I assume the downloads were fake, thus giving Google an easy excuse to get rid
of it (it's gone now). Although probably all they needed was the obvious
impersonation.

Unicode has a boat load of security issues.
[http://unicode.org/reports/tr36/](http://unicode.org/reports/tr36/)

------
pasbesoin
Recently, I went to install the Amazon Kindle app onto my new phone. From the
Google Play store. It all looked good, except for the strangeness of an
individual's name listed as the name for the street address and contact
information for the app. That was something I did not recall from previous
visits to the app in the Google app store.

So, the Kindle app's not on my new phone. Because the validation portion of
curation is, ultimately, left up to the individual. And I didn't have time to
go chasing around the Web making sure I was hitting the correct/official app
store page. I _probably_ was. But I've been well-trained to "pause and check"
on such details.

P.S. I now recall, causing further hesitation, the "other apps" sections of
the search results and/or Kindle app page, included an Amazon Video app. And
that app had the same name listed in its details.

Now, the last I recall, Amazon Video was specifically NOT available in the
Google app store. Forcing people on non-Amazon devices who wanted to use it,
to have to add the Amazon app store and adjust permissions to allow installing
apps from it. At least, temporarily; once you had that or whatever app you
wanted from Amazon, you could then adjust your device's settings back to their
defaults. Unless/until you wanted to pull an update to such an app -- then,
rinse and repeat.

So... I see a weird bit of contact information. And I see it also for an app
that prior experience taught me was not available in the Google app store...

And, with repeated stories like the OP, I can't trust the Google app store to
be well-curated.

What else can I say? Meh...

~~~
hesarenu
Amazon prime video is available on play store.

~~~
pasbesoin
It appears to be, now. It didn't use to be.

The Kindle listing I was looking at shared details with the Video listing.

Despite a fair amount of news browsing, apparently I missed the information
that the Video app had made its way into the Play store. Actually, I seem to
recall some news of same but also follow-up news that it had been pulled,
again, within a few days. (The eternal Google/Amazon competition/strife/"user,
you are the product" situation.) This would have been months ago.

So, I'm left uncertain whether I'm looking at the real thing, or an imposter.
I'm fairly certain I'm not. But "fairly certain" is not "secure".

At the time, I didn't have a lot of time to delve into this. And I only had my
phone in hand, making such an investigation more cumbersome.

I didn't install the Kindle app, then. The moment and immediate need passed,
and following up on this dropped down my list of priorities.

~~~
patja
When I see an app has over 100 million installs, half a million 5 star
reviews, and a support email address ending in @amazon.com it seems pretty
sure to be the real deal.

~~~
pasbesoin
After some more checking, when I had a bit of time, I installed it.

Those are also things I look for.

Still seems to be in line with my basic point: On Google Play, it's up to the
user to assess the item's legitimacy. At least, so far, Google continues to
provide these data points to the user; as long as the Play Store itself isn't
compromised.

Keep in mind, some of the items recently in question in the news are reported
to have had a million plus installs. Separately, fairly recent news stories
have described ways in which third parties have managed to glom onto prominent
domains -- particularly those providing extensive user services -- to gain the
addressing of that major domain for their own functionality.

------
enord
Trust is hard. It cannot be automated, it's inherently social and demands
vigilance. This is as true IRL as it is online and on "curated" federations
(of software, news, contacts etc). The "killer app" for trust is one that
extends our natural skepticism and social awareness, and this will _never_ be
easier online than in meatspace. This is obvious when we raise our perspective
from the purely technological (the "means") to the fundamentally social (the
"ends").

All the automated or manual safeguards that Google could enact would never
prevent people from pulling a fast one, the old switcheroo, a kansas shuffle
on each other because it's just something that we do. And we will use
whichever means (technology) available, in whatever way feasible. This
particular example looks egregious (or ingenious, depending) for cosmetic
reasons, but it's fundamentally an interaction between people however
fraudulent. Google is in the business of interactions between people.

~~~
smsm42
Solving trust 100% is hard. Having people review apps that have names which
are within a short Levenshtein distance (accounting for Unicode tricks etc.)
of popular apps' names and banning those apps, the accounts that created
them and their suppliers of fake votes is not _that_ hard, especially for a
company like Google. And look at those apps' descriptions, they are complete
baloney, and any two-bit text classifier which a capable intern can mock up
together in a weekend from off-the-shelf components can recognize that. These
guys aren't even trying, and still aren't getting caught.

Yes, it may require some monetary investment, but we're talking about a $700bn
company. They could afford it if they wanted to. If they are not doing it,
that means they do not want to.

~~~
enord
Of course, in hindsight you "only" have to calculate the Levenshtein distance
between any product name and _all other_ product names on the store. That
scales well. In order to close one single avenue for fraudulent advertisement.
Maybe it's a big one, and maybe the cost is recouped through improved customer
relations. Maybe.

And maybe they implement this, and calculate hundreds of millions (billions?)
of Levenshtein distances every day, but the next day someone publishes the
same app but with a germanized name ("Was ist App Update") and fools a
couple'o hundred thousand germans. Now the solution is obvious, run the names
through Google Translate for ALL languages and calculate the respective
Levenshtein distances! It's foolproof! Shame on you, Google, for not doing it
already! Simply irresponsible.

~~~
smsm42
> and _all other_ product names on the store

Not true. Nobody fakes random products. It's the top scoring ones that are
getting faked - for the obvious reason that this is what people are looking
for. If you're not in top N (100, 200, whatever), faking you is useless, you
are just replacing nobody with nobody (exception may be bank apps, where even
faking relatively obscure ones can be lucrative, but let's not get into niches
for now). Just scanning against the top ones would kick the floor from under
the most current fakers.

And of course you don't need to continuously re-scan the data - you need to
scan only once, when the app is submitted or the name is changed. So, in
summary, when adding app or release to the store, you need to check its name
and description against a list - let's be generous - of 1000 strings and maybe
run a basic text classifier if you are feeling in a very AI mood today. Is that
impossible to scale? Nope, it's fairly easy.
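
A rough sketch of that submission-time check, under my own assumptions: `POPULAR_NAMES` is a hypothetical stand-in for the store's top-N list, `canonical` and `flag_for_review` are made-up helpers, and the threshold of 2 edits is arbitrary. It only queues names for a human to look at; it does not decide anything by itself.

```python
import unicodedata

# Hypothetical stand-in for the store's list of top app / publisher names.
POPULAR_NAMES = ["WhatsApp Inc.", "uBlock Origin", "Adblock Plus"]


def canonical(name):
    """Fold compatibility characters and case, then collapse whitespace."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def flag_for_review(submitted, max_distance=2):
    """Return the popular names that the submitted name is suspiciously close to."""
    cand = canonical(submitted)
    return [p for p in POPULAR_NAMES
            if levenshtein(cand, canonical(p)) <= max_distance]


print(flag_for_review("WhatsApp\u00a0Inc.\u00a0"))  # ['WhatsApp Inc.']
print(flag_for_review("uBlock 0rigin"))             # ['uBlock Origin']
print(flag_for_review("Signal"))                    # []
```

The cost per submission is one distance computation per entry in the list, so even a list of 1000 names is cheap. A real check would also need to exempt the genuine publisher, which this sketch does not attempt.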

> but the next day someone publishes the same app

So your argument is because simple checks are not perfect and do not cover
100% of possible fakery, let's not do anything and allow even the dumbest
fakers to run free and fill the store with trash. Does it make sense to you?
Because it doesn't make sense to me. Probably you decided since your argument
won't be perfect anyway, there's no point to even try for it to make minimal
sense?

------
jwilk
"Unicode whitespace" apparently means non-breaking space (U+00A0).

~~~
colanderman
To which the term "Unicode whitespace" applies just as well as it does to
plain old U+0020 :)
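
For what it's worth, a short sketch of how the two compare in Python. Where exactly the no-break space sat in the fake publisher name is an assumption here; the point is only that the strings differ until they are normalized.

```python
import unicodedata

real = "WhatsApp Inc."
fake = "WhatsApp\u00a0Inc."  # assumed placement of the no-break space

print(real == fake)                                 # False
print("\u00a0".isspace(), " ".isspace())            # True True
print(unicodedata.normalize("NFKC", fake) == real)  # True: NFKC maps U+00A0 to U+0020
print(" ".join(fake.split()) == real)               # True: split() treats U+00A0 as whitespace
```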

------
nvr219
I seriously don't understand how people let their aging parents or young
children or friends use Android phones.

~~~
izacus
Should they also be forbidden from using Linux, Windows, macOS because it
allows for the same exploit? Should everyone in the world be limited to iOS
and ONLY iOS which is limited to ONLY apps (and soon media content) Apple
allows you to use?

~~~
prepend
You joke, but this is exactly why I recommend Macs or iPads to all of my
elderly friends and family. Trying to run Windows safely turns into a level of
knowledge that my 75 year old mom isn't able to do.

I don't yet know of a good alternative. I desperately want one because Apple
kit is expensive. But for now "$500 for an iPad" is the advice that gets me
the fewest calls for support.

------
pfarnsworth
I have personally and purposefully caused a lot of confusion on some sites by
using Cyrillic letters that look exactly like English letters to impersonate
other people. This was mainly for fun and for harmless trolling, but it's very
easy to see that this could be used on any site that uses Unicode for
usernames, etc. Phishing is extremely easy with this and something needs to be
done otherwise no one will trust the Internet ever again, especially if
someone can just "steal" Whatsapp so easily.

------
dep_b
At least now it's easier to explain to customers why they have to get a DUNS
to have apps under their company name in the App Store while the Play Store
just allows it.

------
fiatjaf
Apparently a-z0-9 usernames work better than these full business names.

It would be much harder to fake a github.com/whatsapp account than it is to
fake "WhatsApp Inc.". Besides the invisible codepoints, one would easily do
"WhatsApp Inc", "WhatsApp Messenger Inc.", "WhatsApp IM" and so on.
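
A minimal sketch of such a restriction, assuming a made-up rule (lowercase ASCII letters, digits, and inner hyphens, at most 39 characters). `HANDLE_RE` and `valid_handle` are hypothetical and are not any particular site's actual policy.

```python
import re

# Hypothetical handle rule: starts and ends with a-z or 0-9, hyphens allowed inside.
HANDLE_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?")


def valid_handle(handle):
    """True only if the whole string matches the ASCII handle pattern."""
    return HANDLE_RE.fullmatch(handle) is not None


print(valid_handle("whatsapp"))          # True
print(valid_handle("WhatsApp Inc."))     # False: capitals, space, dot
print(valid_handle("whatsapp\u00a0"))    # False: trailing no-break space
print(valid_handle("wh\u0430tsapp"))     # False: Cyrillic а in place of Latin a
```

With a namespace this narrow, only one account can hold the exact string, so the remaining risk is look-alike handles such as `whatsapp-inc`, not invisible or homoglyph characters.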

------
gaius
I wish for security and simplicity there was a way to disable everything but
7-bit ASCII. Like, who actually thought that having identical characters for
different things made any sense ??

------
jeisc
How is any end user to know what is the original whatsapp?

------
e9
I hope there are no fake banking apps like that...

------
JoshuaRLi
Unicode strikes again!

------
QAPereo
When people bitch about “walled gardens” I like to remind them just why people
build walls. This... is why. Sure, a world without walls and locks would be
ideal, but only if it’s also a world without thieves, saboteurs, and jerks.

~~~
spiorf
The irony is strong here. You need walled gardens because walled gardens
protect people from dangerous software. Posted as a comment in a news about
dangerous software found in a walled garden.

~~~
ufmace
The argument would be that this suggests that Google's garden should have
higher and better-guarded walls to prevent such things, while many seem to
argue that our gardens' walls are too high. Apple gets criticized for having
slower and more arbitrary manual reviews of all app updates, but they don't
seem to get malicious apps like this in their app store nearly as much.

~~~
ycmbntrthrwaway
They also don't have decent free mail clients with OpenPGP support.

------
microcolonel
This is why even some of ASCII is a mistake, maybe we shouldn't even bother
with case.

Every time something like this comes about I just get more cynical about the
complexity of multilingual systems, or systems with interesting typesetting
routines.

~~~
colejohnson66
What ASCII characters are a mistake? If we required all apps to only use
characters in the ASCII charset (the 128-codepoint one, not the Windows-1252
"ASCII" garbage), these tricks wouldn’t work.

~~~
irl_zebra
The poster said some ascii is a mistake. I’m assuming those chars that don’t
translate to much/anything.

~~~
colejohnson66
Like basically anything in the 0x1 to 0x1F range?

------
gear54rus
Bullshit. Just like punycode TLDs. And I say this as a guy from a cyrillic
country. We invented a footgun and then promptly shot ourselves with it.

Instead of making everyone use safe ascii charset for IDs (domains, names like
the one in the article, etc.), we go for stupid fuckton of language charsets
that cause such problems. All in the name of accessibility or whatever. And all
this does is let people continue living in their language-specific bubble
instead of just learning the main international language: english and living
happily ever after.

And now people suggest some crutches like restricting the data to some subset
of unicode. Never learn.

~~~
Jaxan
I don't think it's fair to force English upon people. It's not even the most
spoken language on the planet. (There are more people speaking Chinese. And
also more people speaking Spanish than English.)

~~~
gear54rus
Of course it's fair. Tech dominates the world, it's not about numbers, it's
about being able to understand everyone using modern communication methods.

Regarding china, they chose isolationist politics (just like russia lol), it's
everyone else's duty to pull the blanket over to the 'everyone' side from
'china' side.

------
apeacox
But gmail doesn’t make any differences between _user.name_ and _username_.

~~~
nolok
Why would this be relevant? What I do with one field in one of my products for
one set of reasons has very little to do with what I do with another field
entirely, in another product entirely, for a whole other set of reasons...

------
pacetherace
This is so typical of Google's policies. They will not fix something just
because users report it.

[https://bugs.chromium.org/p/chromium/issues/detail?id=147](https://bugs.chromium.org/p/chromium/issues/detail?id=147)

~~~
codazoda
I wouldn't fix that "bug" either. I don't want confirmation dialogs all over
the place. They are annoying when I try to close or delete. Yes, I clicked
close on purpose.

Google has done a good job with some of their "undo" notifications; these work
much better imho.

~~~
Godel_unicode
Especially considering there's a chrome option to have the startup tabs be the
tabs that were last open. No dialogs necessary, just take me back to where I
was.

~~~
Sylos
That's hardly going to help average users.

