
Google Is Battling a Russian Spammer Over the Use of the Letter 'G' - bloomca
http://motherboard.vice.com/read/google-is-battling-a-russian-spammer-over-the-use-of-the-letter-g
======
r1ch
I think an issue here is Google is showing "ɢoogle" as "ɢoogle.com" and not
"xn--oogle-wmc.com". The .com TLD has no support for IDNA2008 so they allow
registration of these similar-looking Unicode domains. This is why if you paste
ɢoogle.com in your browser it will show the punycode instead. Basically it
looks like Google is decoding punycode for all TLDs, not just those that
support IDNA2008.
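To make r1ch's point concrete, here is a minimal Python sketch of the display policy being described: decode `xn--` labels back to Unicode only when the TLD is on an allowlist, and otherwise show the raw punycode. The `IDN_DISPLAY_TLDS` set and the `display_host` helper are hypothetical placeholders, not Google's or any browser's actual list or code.

```python
# Hypothetical allowlist of TLDs whose IDN labels we are willing to show
# as Unicode; real browsers and search engines use their own policies.
IDN_DISPLAY_TLDS = {"ru", "jp"}

def display_host(host: str, allowed_tlds=IDN_DISPLAY_TLDS) -> str:
    """Return the form of `host` to show the user.

    "xn--" labels are decoded from punycode only when the TLD is in
    `allowed_tlds`; otherwise the ASCII (punycode) form is shown as-is.
    """
    labels = host.lower().split(".")
    if labels[-1] not in allowed_tlds:
        return host  # TLD not trusted for IDN display: keep raw punycode
    decoded = []
    for label in labels:
        if label.startswith("xn--"):
            # Strip the ACE prefix and decode with the stdlib punycode codec.
            decoded.append(label[4:].encode("ascii").decode("punycode"))
        else:
            decoded.append(label)
    return ".".join(decoded)

print(display_host("xn--oogle-wmc.com"))           # xn--oogle-wmc.com
print(display_host("xn--oogle-wmc.com", {"com"}))  # ɢoogle.com
```

Decoding for every TLD, as the comment suspects Google does, corresponds to passing an allowlist that contains everything.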

------
foepys
This shows the problems with unicode in domain names. Some Cyrillic characters
look exactly like Latin characters but have different code points, e.g. a
(U+0061) and а (U+0430, whose UTF-8 bytes are 0xD0 0xB0). This is pretty
important for businesses whose domains include an a, like banks.

Google is now noticing that these malicious domains don't even have to be an
exact visual match; a similar-looking one is enough to trick users.
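A quick way to see foepys's point with Python's standard `unicodedata` module: the two letters below render alike in most fonts, but they are different code points with different UTF-8 encodings, so they produce different domain names.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, renders like "a" in most fonts

for ch in (latin_a, cyrillic_a):
    # Code point, official Unicode name, and UTF-8 bytes (hex) of each letter.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex())

# Visually identical, but not equal as strings, so "b" + latin_a + "nk"
# and "b" + cyrillic_a + "nk" are two distinct labels.
print(latin_a == cyrillic_a)  # False
```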

~~~
usernam
Ironically this is made worse by some fontsets such as Noto, that unify the
look over a variety of scripts (which is something you normally _want_).

This is going to be fun to watch. Unicode domain names are a can of worms.

~~~
agumonkey
There could be a standard way for programs to signal non-ASCII characters. A
(red-boxed g)oogle might grab attention. Maybe a full underline...
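A minimal sketch of agumonkey's suggestion, assuming a plain-text rendering where each non-ASCII character is wrapped in brackets as a stand-in for a red box or underline in a real UI. The `mark_non_ascii` name is illustrative only.

```python
def mark_non_ascii(host: str, left: str = "[", right: str = "]") -> str:
    """Wrap every non-ASCII character in `host` so it stands out."""
    return "".join(
        f"{left}{ch}{right}" if ord(ch) > 0x7F else ch
        for ch in host
    )

print(mark_non_ascii("\u0262oogle.com"))  # [ɢ]oogle.com
print(mark_non_ascii("google.com"))       # google.com
```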

~~~
xiaoma
That might be nice for you but it would be annoying for those of us who use
non-latin scripts.

~~~
agumonkey
Good point... maybe a dual set based on the user's locale.
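The locale-aware variant might be sketched by flagging only characters outside the ranges expected for the user's locale. The `LOCALE_RANGES` table is a hypothetical, deliberately tiny example (ASCII, plus the basic Cyrillic block for Russian), not a real standard.

```python
# Hypothetical mapping from locale to code-point ranges considered
# "expected" for that user; anything outside them gets flagged.
LOCALE_RANGES = {
    "en": [(0x00, 0x7F)],                    # ASCII only
    "ru": [(0x00, 0x7F), (0x0400, 0x04FF)],  # ASCII + Cyrillic block
}

def flag_unexpected(host: str, locale: str) -> str:
    """Bracket characters that fall outside the locale's expected ranges."""
    ranges = LOCALE_RANGES.get(locale, [(0x00, 0x7F)])  # default: ASCII only

    def expected(ch: str) -> bool:
        return any(lo <= ord(ch) <= hi for lo, hi in ranges)

    return "".join(ch if expected(ch) else f"[{ch}]" for ch in host)

host = "\u043f\u0440\u0438\u043c\u0435\u0440.com"  # "пример.com"
print(flag_unexpected(host, "ru"))  # nothing flagged for a Russian user
print(flag_unexpected(host, "en"))  # every Cyrillic letter flagged
```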

------
morsch
Accessing the domain in question (ɢoogle.com) redirects to this fairly bizarre
chain of subdomains

      http://money.get.away.get.a.good.job.with.more.pay.and.you.are.okay.money.it.is.a.gas.grab.that.cash.with.both.hands.and.make.a.stash.new.car.caviar.four.star.daydream.think.i.ll.buy.me.a.football.team.money.get.back.i.am.alright.jack.ilovevitaly.com/

~~~
kombucha2
That's Pink Floyd.

~~~
soneca
Good catch! [https://genius.com/Pink-floyd-money-lyrics](https://genius.com/Pink-floyd-money-lyrics)

------
petetnt
Popov's argument seems to contradict his statement as seen in this other
Motherboard article: [https://motherboard.vice.com/read/this-pro-trump-russian-is-...](https://motherboard.vice.com/read/this-pro-trump-russian-is-spamming-google-analytics)

Before:

> “I was fully prepared from April, but I wait. I could begin in a month
> before the elections and on a wave of the anti-Russian hysteria to receive a
> lot of traffic,” he said.

Later:

> “Lie! Not my domain!” Popov writes in bright red text regarding the site
> with dodgy pop-ups.

> “Lie! I'm not a spammer!” he continues.

Either someone is running an extensive anti-Popov campaign or Popov is
realising that the campaign has been a huge mistake.

------
merricksb
Cached version (as original URL is returning 404):

[http://webcache.googleusercontent.com/search?q=cache:KZq3KBV...](http://webcache.googleusercontent.com/search?q=cache:KZq3KBVYLhYJ:motherboard.vice.com/read/google-is-battling-a-russian-spammer-over-the-use-of-the-letter-g+&cd=1&hl=en&ct=clnk&gl=au)

------
ungzd
Seems they sued vice.com too; it now displays a 404.

------
hlandau
I can't believe it took this long for someone to register that name.

~~~
krona
I would've thought Google themselves would have bought it many years ago, to
protect their users (and their brand).

~~~
amptorn
There's a _lot_ of Unicode variations on "google". Potentially too many to
make that practical...

~~~
qntty
How many could there possibly be? 1000? 5000? If Google had to spend
$50,000/year to never have to worry about this, it would easily be worth it.

~~~
shawnz
If there are just 5 alternate ways to represent each letter, that's already
over 15000 domains.
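shawnz's arithmetic is a product over positions: with k interchangeable forms for each of the n letters, there are k^n spellings. Counting five forms per letter gives 5^6 = 15,625 for "google"; counting five look-alikes plus the original gives 6^6 = 46,656. The sketch below uses a made-up `FORMS` table whose entries are placeholders, not real confusable characters.

```python
from itertools import product
from math import prod

# Hypothetical table: five interchangeable forms per letter (the letter
# itself plus four placeholder look-alikes). A real table would come from
# something like the Unicode confusables data.
FORMS = {ch: [ch] + [f"{ch}{i}" for i in range(1, 5)] for ch in "gole"}

def count_spellings(name: str) -> int:
    """Number of spellings = product of the per-letter choice counts."""
    return prod(len(FORMS[ch]) for ch in name)

def spellings(name: str):
    """Lazily enumerate every spelling as a tuple of per-letter forms."""
    return product(*(FORMS[ch] for ch in name))

print(count_spellings("google"))             # 15625 == 5 ** 6
print(sum(1 for _ in spellings("google")))   # 15625, by enumeration
```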

~~~
planteen
If bidi control characters are allowed (not sure if they are), it's even
worse.

Then you could have something like *elgoog display as google.
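On planteen's question: the IDNA2003 nameprep profile (RFC 3491) prohibits the bidi formatting controls such as U+202E RIGHT-TO-LEFT OVERRIDE, and IDNA2008 disallows them too. Python's `stringprep` module exposes the relevant RFC 3454 table (C.8), so a registrar-style check might be sketched as follows; `has_bidi_override` is an illustrative name.

```python
import stringprep

def has_bidi_override(label: str) -> bool:
    """True if `label` contains a character from RFC 3454 table C.8
    ("change display properties or deprecated characters"), which
    includes the bidi embedding/override controls U+202A..U+202E."""
    return any(stringprep.in_table_c8(ch) for ch in label)

spoof = "\u202e" + "elgoog"  # RLO followed by "elgoog"; can render as "google"
print(has_bidi_override(spoof))     # True
print(has_bidi_override("google"))  # False

# Python's built-in "idna" codec (IDNA2003) applies the same nameprep
# prohibition and refuses to encode such a label.
try:
    spoof.encode("idna")
except UnicodeError as exc:
    print("rejected:", exc)
```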

------
runnr_az
My coworker built a little tool to identify potential domain spam problems:
[http://upsidedown.domains/alternate.html?google](http://upsidedown.domains/alternate.html?google)

------
krumplifej
I stumbled upon this vulnerability during a white-hat phishing test. The
success rate was very high when I used the alternate-G domains, even among
hardcore IT folks. People tend to overlook the difference. At that point I
faced an ethical dilemma: should I just forget about this, or publish
something? Neither option seemed right. I finally decided to register all the
unreserved alternate-G domain names for the Fortune 500. I had to set a limit
somewhere... To my surprise, 102 of the 103 vulnerable Fortune 500 domains
were still available. Now I own these domains... If these companies want them,
I am happy to transfer them over. If they do not care, I'll just let them
expire. For my company, we set the spam filters accordingly, changed our web
proxies, and also own the alternate domains. I also submitted a bug report to
a major software vendor, because their solution further amplified the
problem. They are working on a fix...
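The spam-filter step krumplifej mentions can be approximated by a "skeleton" comparison loosely modeled on Unicode UTS #39: normalize, map each look-alike character to a canonical target, and flag any domain whose skeleton matches a protected domain without being identical to it. The tiny `CONFUSABLES` map and `PROTECTED` set are hypothetical; a real filter would load the full confusables data.

```python
import unicodedata

# Hypothetical, hand-picked subset of look-alike mappings.
CONFUSABLES = {
    "\u0262": "g",  # LATIN LETTER SMALL CAPITAL G
    "\u0261": "g",  # LATIN SMALL LETTER SCRIPT G
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
}

PROTECTED = {"google.com", "microsoft.com"}  # hypothetical brand list

def skeleton(domain: str) -> str:
    """NFKC-normalize, lowercase, and map look-alikes to ASCII targets."""
    text = unicodedata.normalize("NFKC", domain).lower()
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)

def is_lookalike(domain: str) -> bool:
    """True if `domain` imitates a protected domain without being it."""
    return domain.lower() not in PROTECTED and skeleton(domain) in PROTECTED

print(is_lookalike("\u0262oogle.com"))               # True
print(is_lookalike("mi\u0441r\u043es\u043eft.com"))  # True
print(is_lookalike("google.com"))                    # False
```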

------
NKCSS
Unicode in domains is tricky. On the one hand, it's good that we can allow
people who have non-ASCII characters in their language to create domains
using them, but it introduces the problems pointed out here. It would be sane
to say that when you abuse the system to trick people (as is clear with the
google and lifehacker examples), the registration is voided (and barred
from future use).

Maybe it should be restricted to certain TLDs, though; e.g. only allow
Unicode characters in TLDs that have a good reason for using them. That way,
it won't be an issue for .com/.net/etc.

~~~
garblegarble
I always thought that enforcing domain names use characters in the same glyph
subset would solve a lot of problems - either all Latin or all Cyrillic, etc.
(although I don't have any experience of what people put in their
internationalised domain names so maybe that would conflict with a key use
case somehow?)
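garblegarble's rule can be sketched with the standard library. Python's `unicodedata` does not expose the Unicode Script property, so this sketch approximates it by the first word of each letter's Unicode name (e.g. `LATIN`, `CYRILLIC`); that heuristic is an assumption, and a real implementation would use the Script data described in UTS #39.

```python
import unicodedata

def scripts_used(label: str) -> set:
    """Approximate the set of scripts among the letters in `label`.

    Heuristic: first word of the Unicode name, e.g.
    "CYRILLIC SMALL LETTER O" -> "CYRILLIC". Non-letters are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if unicodedata.category(ch).startswith("L")
    }

def is_single_script(label: str) -> bool:
    return len(scripts_used(label)) <= 1

print(is_single_script("g\u043e\u043egle"))  # False: Latin g/l/e + Cyrillic o
print(is_single_script("\u0262oogle"))       # True: small capital G is Latin
```

The second call shows the gap raised in the reply below: a small-capital ɢ belongs to the Latin script, so a same-script rule alone does not catch ɢoogle.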

~~~
edent
Same subset doesn't really help

* ɢᴏᴏɢʟᴇ uses all Latin Small Caps

* ᏀᎤᎤᏀᏞᎬ uses the Cherokee block

* ԌООԌӏЕ uses Cyrillic block

Not perfect, but would you notice them in the small type of your typical
address bar?

~~~
garblegarble
> Not perfect, but would you notice them in the small type of your typical
> address bar?

Requiring that all characters come from the same block certainly doesn't solve
the problem but it would help make it a little more obvious - your examples
above are much easier for me to spot as suspicious vs the ɢoogle.com in the
article.

I hadn't thought about Latin small caps before; that's an interesting one,
although perhaps that block could be blacklisted entirely.

Overall it's a bit of a mess, isn't it!
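Latin small capitals are not one contiguous block (ɢ U+0262 and ʟ U+029F sit in IPA Extensions, ᴏ U+1D0F and ᴇ U+1D07 in Phonetic Extensions), so garblegarble's blacklist is easier to sketch by character name. The name-based test below is an illustrative assumption, not an established registry rule.

```python
import unicodedata

def has_small_capital(label: str) -> bool:
    """True if any character's Unicode name marks it as a small capital,
    e.g. U+0262 LATIN LETTER SMALL CAPITAL G."""
    return any("SMALL CAPITAL" in unicodedata.name(ch, "") for ch in label)

print(has_small_capital("\u0262oogle"))  # True
print(has_small_capital("google"))       # False
```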

------
annnnd
Is it possible to disable Unicode chars for domains in FF?

~~~
OJFord
`network.enableIDN` apparently. Interesting reading (2005) here:

[https://bugzilla.mozilla.org/show_bug.cgi?id=279099](https://bugzilla.mozilla.org/show_bug.cgi?id=279099)

\-- EDIT:

It doesn't 'disable' as such, but renders the 'punycode' in full, so a fake
'[http://www.miсrоsоft.com](http://www.miсrоsоft.com)' is instead rendered as
'[http://www.xn--mirsft-yqfbx.com](http://www.xn--mirsft-yqfbx.com)'.

For comparison, here's fake on top of real (not in monospace since it destroys
the illusion):

[http://www.miсrоsоft.com](http://www.miсrоsоft.com)

[http://www.microsoft.com](http://www.microsoft.com)
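The rendering OJFord describes is label-by-label punycode with the `xn--` prefix. A minimal sketch using Python's built-in `punycode` codec reproduces the `xn--mirsft-yqfbx` form of the fake Microsoft domain. It skips the nameprep/IDNA2008 mapping and validation a full IDNA implementation would apply, so treat `to_ascii_host` as illustrative only.

```python
def to_ascii_host(host: str) -> str:
    """Convert each non-ASCII label of `host` to its "xn--" punycode form.

    Simplified: no nameprep/IDNA2008 mapping or validation is applied.
    """
    labels = []
    for label in host.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

fake = "www.mi\u0441r\u043es\u043eft.com"  # Cyrillic с and о mixed into Latin
print(to_ascii_host(fake))                 # www.xn--mirsft-yqfbx.com
print(to_ascii_host("www.microsoft.com"))  # www.microsoft.com
```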

~~~
annnnd
Thanks! Setting "network.enableIDN" seems _not_ to work, but this setting
works: network.standard-url.encode-utf8 [0]

[0] [http://kb.mozillazine.org/Network.standard-url.encode-utf8](http://kb.mozillazine.org/Network.standard-url.encode-utf8)

EDIT: doesn't work either. And the same for network.standard-url.escape-utf8.
:(

------
rnhmjoj
I get a "404 horse".

~~~
aptwebapps
Possibly to mitigate this?
[https://motherboard.vice.com/en_us/article/spammer-now-spamm...](https://motherboard.vice.com/en_us/article/spammer-now-spamming-google-analytics-with-motherboard-article-on-spam)

------
homero
Vice is under DDoS now.

