
Phishing with Unicode Domains - 01walid
https://www.xudongz.com/blog/2017/idn-phishing/
======
wimagguc
HN Discussion about the same topic from 2 days ago (126 comments to date):
[https://news.ycombinator.com/item?id=14119713](https://news.ycombinator.com/item?id=14119713)

~~~
paulddraper
High level recap:

Chrome - fixed in 59 (current stable is 57)

Firefox - no plans to change; you can adjust network.IDN_show_punycode in
about:config

IE - immune

Safari - immune

~~~
belorn
Could you explain what fixed/immune means? Is it only the confusable
characters, i.e. characters that are visually identical or near-identical to
Latin characters, that get the punycode treatment?

~~~
H4CK3RM4N
I think IE and Safari just don't support Unicode in domains.

------
dmckeon
Could a browser track how many language/character sets are typically used
by a browser profile, and warn the user when they are about to use a new,
previously unused set, rather than waving the duty off as the "responsibility
of domain owners"?

With now over 1000 top-level domains, and however many homographic matches
among character sets, expecting people to register dozens of matching domains
seems unrealistic.

~~~
Klathmon
Wouldn't it be even easier to just check if the domain contains something
outside the currently used character set (perhaps always allowing ASCII)?
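A minimal Python sketch of that check, assuming the first word of a character's Unicode name (`LATIN`, `CYRILLIC`, ...) is a good-enough stand-in for its script. The names `rough_script`, `unexpected_chars` and `allowed_scripts` are made up for illustration; a real browser would presumably use proper Unicode script data rather than this shortcut.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def unexpected_chars(hostname: str, allowed_scripts=frozenset()) -> list:
    """Return (char, script) pairs that are neither ASCII nor in an allowed script."""
    found = []
    for ch in hostname:
        if ch.isascii():
            continue  # ASCII is always allowed, as suggested above
        script = rough_script(ch)
        if script not in allowed_scripts:
            found.append((ch, script))
    return found


# The article's apple.com look-alike, written with escapes so the
# Cyrillic code points are visible in the source.
fake = "\u0430\u0440\u0440\u04cf\u0435.com"
print(unexpected_chars(fake))                # every Cyrillic letter is flagged
print(unexpected_chars(fake, {"CYRILLIC"}))  # nothing flagged for a Cyrillic user
print(unexpected_chars("apple.com"))         # plain ASCII passes
```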

I think that, plus a "you have never visited this site before" kind of warning
could go a long way towards combating these kinds of attacks.

I think the real devil is going to be in the UI. You don't want to make it
overly scary (otherwise you penalize domains which use some unicode characters
correctly), but it can't be so unnoticeable that you won't be able to tell when
it matters.

~~~
dmckeon
For a multi-lingual (really multi-char-set, "multi-graphic"?) user who often
visits sites in several different char-sets, and might have a 60/30/5/5
percentage distribution, getting an "are you sure?" check before visiting a
site with mixed char sets or a new-to-that-profile unmixed set seems like a
useful confirmation that would not be invoked often, but would be likely to
avoid a trip to the phishy sites. The same approach would work for the 99/1 or
100/0 distribution. The UI should be more like:

    
    
      "You have never visited a site in this language/character set before"
       More_Info.  Cancel?   Proceed?
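The per-profile idea could look roughly like the sketch below. `ScriptProfile` and its methods are hypothetical, the "script" of a letter is again approximated by the first word of its Unicode name, and treating Latin as always familiar is an assumption, not something any browser is known to do.

```python
import unicodedata


class ScriptProfile:
    """Hypothetical per-profile record of scripts seen in visited hostnames."""

    def __init__(self):
        self.seen = {"LATIN"}  # assumption: ASCII/Latin is always familiar

    @staticmethod
    def scripts_of(text: str) -> set:
        # Rough proxy: first word of the Unicode name of each letter.
        return {unicodedata.name(ch, "UNKNOWN").split()[0]
                for ch in text if ch.isalpha()}

    def check(self, hostname: str) -> str:
        """Return 'ok', 'mixed', or 'new' for a hostname about to be visited."""
        if any(len(self.scripts_of(label)) > 1 for label in hostname.split(".")):
            return "mixed"  # a single label mixes character sets
        if self.scripts_of(hostname) - self.seen:
            return "new"    # a set this profile has never used before
        return "ok"

    def confirm(self, hostname: str) -> None:
        """Record the scripts after the user picks Proceed, so the prompt is not repeated."""
        self.seen |= self.scripts_of(hostname)


profile = ScriptProfile()
fake = "\u0430\u0440\u0440\u04cf\u0435.com"  # all-Cyrillic look-alike label
print(profile.check("apple.com"))
print(profile.check(fake))
profile.confirm(fake)
print(profile.check(fake))
```

Only the "mixed" and "new" results would trigger the prompt, which is why a user with a stable 60/30/5/5 mix would rarely see it.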

~~~
wbkang
Interestingly enough, my Chrome sends
"accept-language:en-US,en;q=0.8,ko;q=0.6". I don't even know how it infers that.
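For reference, that header value is a ranked list: each entry is a language tag with an optional `q` weight that defaults to 1. A small sketch of reading it (`parse_accept_language` is a made-up helper, and it ignores malformed input):

```python
def parse_accept_language(header: str) -> list:
    """Return (language_tag, q) pairs sorted by descending preference."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0  # q defaults to 1 when omitted
        params = params.strip()
        if params.startswith("q="):
            q = float(params[2:])
        prefs.append((tag.strip(), q))
    # sorted() is stable, so equal weights keep their original order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("en-US,en;q=0.8,ko;q=0.6"))
# [('en-US', 1.0), ('en', 0.8), ('ko', 0.6)]
```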

------
shif
I wonder how the domain displays in email clients like gmail and outlook. This
is the scariest part: most people will just look at the domain, think it's
a valid mail, and follow the instructions in that mail. It could be
catastrophic for companies; the Ubiquiti $40 million fiasco comes to mind.

~~~
mike-cardwell
Considering how easy email is to spoof, why bother using a unicode domain
which is only similar to the target domain? Why not just use the real domain
instead?

~~~
antinatalism
Spoofing isn't so easy for gmail and yahoo inboxes. Some web clients warn of a
return path too. For sophisticated spoofing and phishing, unicode domains are
helpful. Plus, spoofing emails is just a small attack vector.

~~~
mike-cardwell
Spoofing is trivially easy for gmail and yahoo. Here's me spoofing an email
from fakeaddress@ycombinator.com to my gmail address:

    
    
      mike@blob:~$ telnet gmail-smtp-in.l.google.com 25
      Trying 66.102.1.26...
      Connected to gmail-smtp-in.l.google.com.
      Escape character is '^]'.
      220 mx.google.com ESMTP 19si14686133wmr.1 - gsmtp
      EHLO whatever
      250-mx.google.com at your service, [164.132.228.175]
      250-SIZE 157286400
      250-8BITMIME
      250-STARTTLS
      250-ENHANCEDSTATUSCODES
      250-PIPELINING
      250-CHUNKING
      250 SMTPUTF8
      MAIL FROM:<fakeaddress@ycombinator.com>
      250 2.1.0 OK 19si14686133wmr.1 - gsmtp
      RCPT TO:<*****@gmail.com>
      250 2.1.5 OK 19si14686133wmr.1 - gsmtp
      DATA
      354  Go ahead 19si14686133wmr.1 - gsmtp
      From: "Fake Address" <fakeaddress@ycombinator.com>
      To: *****@gmail.com
      Subject: This is a spoofed email
    
      Spoof spoof spoof
    
      --
      Spoofy McSpoof
      .
      250 2.0.0 OK 1492497764 19si14686133wmr.1 - gsmtp
    

Email was delivered fine. Straight into the Inbox (not the spam folder). Even
though ycombinator.com has strict SPF records which don't include my IP.
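To make the SPF part concrete, here is a deliberately simplified Python sketch of the check a receiver can run. It only understands `ip4:` mechanisms and the trailing `all`; `include:`, `a`, `mx` and the rest are skipped. The record below is a made-up example, not ycombinator.com's actual record, and `spf_ip4_result` is a hypothetical helper name.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_ip4_result(spf_record: str, sender_ip: str) -> str:
    """Evaluate only the ip4: mechanisms and the final 'all' of an SPF record."""
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split()[1:]:  # skip the leading "v=spf1"
        qualifier = "+"                  # default qualifier when none is given
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:"):
            matched = ip in ipaddress.ip_network(term[4:], strict=False)
        elif term == "all":
            matched = True
        else:
            continue                     # mechanisms this sketch does not handle
        if matched:
            return RESULTS[qualifier]
    return "neutral"


# Hypothetical strict record; the IP is the one shown in the session above.
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_ip4_result(record, "164.132.228.175"))
print(spf_ip4_result(record, "192.0.2.10"))
```

A "fail" here is only an input to the receiver's delivery decision; as the session above shows, the message can still land in the inbox.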

The only clue is, in the web interface Google displays a grey octagon with a
red question mark inside it next to the sender address. And when you hover
over that a tooltip says:

"Gmail couldn't verify that ycombinator.com actually sent this message (and
not a spammer)"

So yeah. I would dispute "Spoofing isn't so easy for gmail and yahoo inboxes".
They're as shit as everyone else.

------
nemo1618
What an odd coincidence: I just published a Go package yesterday to detect
such attacks in source code. Is there a homography bug going around?

[https://github.com/NebulousLabs/glyphcheck](https://github.com/NebulousLabs/glyphcheck)

(btw, Wikipedia notes that "The term homograph is sometimes used synonymously
with homoglyph, but in the usual linguistic sense, homographs are words that
are spelled the same but have different meanings, a property of words, not
characters.")

~~~
01walid
Interesting, but going by the repo description, why is this limited to Go
source code files?

~~~
nemo1618
Mostly because it has an "ignore comments" mode. A lot of non-English speaking
programmers write code using English keywords and identifiers but use their
native language in the comments.

With some work, it could be made language-agnostic, but that's more than I
have time for right now. If comments aren't an issue, you can just grep
through all your source files for the offending characters, which shouldn't
take more than a simple bash script.
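A rough version of that scan, sketched in Python rather than bash. This is not how glyphcheck works and it does not skip comments; it simply reports every non-ASCII letter with its position and Unicode name. `find_non_ascii_letters` is a made-up name.

```python
import unicodedata


def find_non_ascii_letters(text: str):
    """Yield (line_no, column, char, unicode_name) for each non-ASCII letter."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if not ch.isascii() and ch.isalpha():
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


# Second line uses a Cyrillic "o" (U+043E) inside an otherwise Latin identifier.
sample = "total = 0\nt\u043etal = 1\n"
for hit in find_non_ascii_letters(sample):
    print(hit)
```

To scan a tree, the same function could be fed the contents of each source file in turn.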

------
html5web
This is the scariest one: [http://www.xn--80a6aa.com/](http://www.арр.com/) &
[http://www.app.com/](http://www.app.com/)

~~~
jastanton
Why is this the scariest one? I've never heard of app.com; any real or
fake news (in the literal sense) coming from that site wouldn't register as
legitimate one way or the other.

However, apple.com with a CC reset form could be a mighty easy way to scam a
lot of people into giving up their personal details, which could easily lead to
full-blown identity theft.

Thankfully FF/Chrome are patching this

------
E6300
[http://blog.unicode.org/2014/09/updated-unicode-security-spe...](http://blog.unicode.org/2014/09/updated-unicode-security-specifications.html)

------
khedoros1
Interesting. The apple.com one
([https://www.xn--80ak6aa92e.com/](https://www.xn--80ak6aa92e.com/)) shows
literally that text in Pale Moon (27.2), but shows "аррӏе.com" (Cyrillic text)
in Chrome 57 and Firefox 51.
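The mapping between the two forms can be checked with Python's built-in `punycode` codec. This sketch strips the `xn--` prefix by hand and decodes the remainder, then prints each code point's name so the script is visible; it skips the IDNA validation steps a browser would apply.

```python
import unicodedata

label = "xn--80ak6aa92e"

# Everything after the "xn--" prefix is plain punycode.
decoded = label[len("xn--"):].encode("ascii").decode("punycode")

print(decoded)
for ch in decoded:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Every character in the decoded label is a Cyrillic letter, so there is no script mixing within the label.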

Someone else's example that looks like "app.com"
([http://www.xn--80a6aa.com/](http://www.xn--80a6aa.com/)) translates to the
Cyrillic text, even in Pale Moon. I wonder if Apple's site is on a hard-coded
blacklist in the browser, or if every update includes the top-1000 list, or
something?

I remember reading about issues with Unicode domains _years_ ago, though. It
surprises me that something hasn't been figured out by this point. One
mitigation that I remember being discussed was coloring characters from
different scripts in different colors, to make variant characters more
obvious.
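A sketch of what that coloring could look like in a terminal, assuming ANSI escape codes and the same rough "first word of the Unicode name" script guess. The palette and the names `script_runs` and `colorize` are made up for illustration.

```python
import itertools
import unicodedata

# Hypothetical palette: ANSI color codes keyed by rough script name.
COLORS = {"LATIN": "\033[32m", "CYRILLIC": "\033[31m", "GREEK": "\033[34m"}
RESET = "\033[0m"


def rough_script(ch: str) -> str:
    if not ch.isalpha():
        return "COMMON"  # dots, digits, hyphens
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_runs(hostname: str) -> list:
    """Split a hostname into (script, text) runs of consecutive same-script chars."""
    return [(script, "".join(chars))
            for script, chars in itertools.groupby(hostname, key=rough_script)]


def colorize(hostname: str) -> str:
    """Wrap each run in its script's color; unknown scripts are left uncolored."""
    return "".join(COLORS.get(script, "") + text + RESET
                   for script, text in script_runs(hostname))


mixed = "p\u0430ypal.com"  # Cyrillic "a" (U+0430) hidden among Latin letters
print(script_runs(mixed))
print(colorize(mixed))
```

Note that an all-Cyrillic label would come out in a single color, so this only makes mixed-script labels stand out.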

~~~
paulddraper
Even if you could train that, it doesn't help color-blind people...

~~~
khedoros1
Depends on the palette used. On the other hand, if that's the only indicator,
then it doesn't help _blind_ people either.

------
bchociej
Thankfully I got this: [https://imgur.com/a/3XyIe](https://imgur.com/a/3XyIe)

