
Identity Beyond Usernames - polm23
https://lord.io/blog/2020/usernames/
======
miki123211
If you want to learn how to do accessibility, specifically alt texts,
properly, this guy does it exactly right.

I'm a screen reader user and don't encounter alt texts like these often. There's even some
fun in them, e.g.

"screenshot of unicode character inspector revealing "epic" to actually be
"еріс". of course you, a screen reader user, aren't fooled."

Btw, most people would just label that "screenshot", which is extremely
infuriating.

~~~
jonnytran
I regularly use the speech feature of macOS to read web pages aloud to me, so
I was also pleasantly surprised to hear meaningful descriptions of the images
from the alt text.

Maybe one day computers will be able to describe to me what is in an image
without a person having to type it.

~~~
miki123211
This is very slowly happening for some Chrome users. Generating meaningful,
accurate descriptions is a general AI task, though.

As an aside, Chrome does the autolabeling (including OCR) only if there's no
alt provided, so alts like "screenshot", "photo" etc. actually cause more harm
than good.
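A site could catch placeholder alts like these before they ship. Below is a minimal sketch using Python's standard `html.parser`. The `GENERIC_ALTS` word list and the `generic_alts` helper are hypothetical illustrations, not anything Chrome exposes.

```python
from html.parser import HTMLParser

# Hypothetical list of placeholder alts that add nothing for screen reader users.
GENERIC_ALTS = {"screenshot", "photo", "image", "picture", "img"}

class AltChecker(HTMLParser):
    """Collects the src of every <img> whose alt text is a generic placeholder."""

    def __init__(self) -> None:
        super().__init__()
        self.flagged: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A missing alt becomes "" here and is not flagged; empty alt is the
        # usual marker for decorative images.
        alt = (attributes.get("alt") or "").strip().lower()
        if alt in GENERIC_ALTS:
            self.flagged.append(attributes.get("src", "?"))

def generic_alts(html: str) -> list[str]:
    checker = AltChecker()
    checker.feed(html)
    checker.close()
    return checker.flagged

page = ('<img src="a.png" alt="Screenshot">'
        '<img src="b.png" alt="unicode inspector showing Cyrillic letters">'
        '<img src="c.png" alt="">')
print(generic_alts(page))  # ['a.png']
```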

------
akersten
It's fairly long and winding but the main point seems to be that because
Unicode exists, and Unicode supports characters that are problematic for the
concept of a unique identity, then it is the concept of a unique identity that
is the problem.

It also presents a front-end solution (fuzzy-text matching for @'ing someone's
display name in a Slack channel) but does nothing to address the ever-present
back-end need to actually validate that a client is who they say they are.
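The front-end idea of loosely matching what someone types after `@` against display names might look roughly like the sketch below. The folding steps (NFKD, dropping combining marks, casefolding) and the use of `difflib.get_close_matches` with a 0.6 cutoff are assumptions for illustration, not the article's code. NFKD also maps double-struck letters like the ones in "𝕥𝕙𝕖 𝕜𝕚𝕟𝕘" to plain ASCII.

```python
import difflib
import unicodedata
from collections import defaultdict

def fold(text: str) -> str:
    # NFKD splits "é" into "e" plus a combining accent and maps compatibility
    # characters (such as double-struck math letters) to their plain forms.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()

def mention_candidates(typed: str, display_names: list[str], n: int = 3) -> list[str]:
    """Return display names that loosely match what was typed after '@'."""
    by_folded: dict[str, list[str]] = defaultdict(list)
    for name in display_names:
        by_folded[fold(name)].append(name)
    matches = difflib.get_close_matches(fold(typed), list(by_folded), n=n, cutoff=0.6)
    # Several people can share a folded name, so return all of them and let
    # the UI show avatars or other context to tell them apart.
    return [name for key in matches for name in by_folded[key]]

print(mention_candidates("jose smith", ["Jos\u00e9 Smith", "Jose Smyth", "Ana"]))
# ['José Smith', 'Jose Smyth']
```

`fold` leaves Cyrillic lookalikes such as "о" untouched, because compatibility decomposition does not treat them as Latin letters.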

There's a _good_ reason you can't make your bank username 𝕥𝕙𝕖 𝕜𝕚𝕟𝕘, and it's
not "because developers are anglo-centric and intrinsically against all the
wonderful individual expression and identity that Unicode could bring us."

Since the article mentioned Punycode, I actually think the current
implementation is an elegant solution to a real problem. Of course we should
enable folks from anywhere around the world to experience the Internet in
their native language. That's the reason the Chrome algorithm to determine
whether to show Punycode is so complex - it has to carefully balance the
desire to show text the way it was intended while considering that it might be
a malicious attempt to phish someone using characters outside their typical
locale. In this case, the front-end solution of asking the user "which
google.com from this list of two identical-looking google.com's is the one you
want" just won't fly.
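Chrome's actual rules are far more involved than anything that fits here. A deliberately naive sketch shows only the basic shape: fall back to Punycode when a single label mixes scripts. Each letter's script is approximated from the first word of its Unicode character name, an assumption standing in for the real Script property (UAX #24). Note that the all-Cyrillic "еріс" passes this check unchanged. That gap is one reason real browsers need extra whole-script rules.

```python
import unicodedata

def scripts_in(label: str) -> set[str]:
    # Rough stand-in for the Unicode Script property: take the first word of
    # each letter's name, e.g. "LATIN SMALL LETTER G" -> "LATIN".
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}

def display_form(domain: str) -> str:
    """Show each label as Unicode unless it mixes scripts; then show Punycode."""
    shown = []
    for label in domain.split("."):
        if not label.isascii() and len(scripts_in(label)) > 1:
            # Python's built-in "idna" codec (IDNA 2003) produces the xn-- form.
            shown.append(label.encode("idna").decode("ascii"))
        else:
            shown.append(label)
    return ".".join(shown)

print(display_form("g\u043e\u043egle.com"))          # mixed Latin/Cyrillic: starts with xn--
print(display_form("\u0435\u0440\u0456\u0441.com"))  # all-Cyrillic "еріс.com" passes unchanged
```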

~~~
JoshTriplett
Your bank doesn't _need_ the concept of a username. A service may need a
username if it's going to have social features, public content, or other ways
in which users interact with each other. If people just log in and use the
service, let them use an email address as their login (which is already
unique), and don't make them pick a username at all.

~~~
akersten
In that case, the email is the username, and the problem remains: are
josh@gmail.com and jоsh@gmail.com the same user?

To the system, of course they're different - it's just looking at the bytes.
The problem comes when a human interacts with the account. Maybe they see the
email on the account and manually type it into another system. Or records are
printed as part of a lawsuit and have to be transcribed.

Allowing indistinguishable characters in unique tokens creates confusion at
best. Broadly I agree that we should reduce our reliance on unique
identifiers, and I don't have a good answer for the right approach, but
certainly we can't abandon the concept of a unique identifier altogether: it
predates computers and the character encoding mess we made, by a long time.

~~~
inetknght
> _are josh@gmail.com and jоsh@gmail.com the same user?_

I like to think I have a keen eye and pay attention to subtle details. I don't
think I see a difference between the two addresses. Your message seems to
state they're different though. Did HN normalize the differences? Did my
browser?

In a blaze of stupidity, I decided to copy and paste into my terminal and
pipe it through a hex dump [0].

* The character difference doesn't show up in the viewport in Firefox; of course, it's rendered.

* The character difference doesn't show up in the HTML editor; of course, it's showing the "raw text" and the "raw text" is valid unicode.

* It doesn't show up in the browser's network inspection of the response payload. That's the scariest part in Firefox IMO.

* It doesn't show up in my Terminal either. Why shouldn't it? We've fought long and hard for terminals to support Unicode.

Of course, then there's the fact that I decided to copy and paste from a
browser into my terminal even though I already knew I shouldn't [1]. What
_else_ could be hidden in unicode? An entire bash script starting with `sudo`,
perhaps? Websites can already inject their shitware into the clipboard with
clipboard events [1].

[0]:
[https://knightoftheinter.net/img/hacker%20news%20id%20237743...](https://knightoftheinter.net/img/hacker%20news%20id%2023774300.png)

[1]:
[https://security.stackexchange.com/q/39118/47800](https://security.stackexchange.com/q/39118/47800)

~~~
FreeFull
The second email address uses the Cyrillic small letter O, which renders the
same as the Latin small letter O in almost all fonts. You can see this in your hex
dump, too: Instead of a single o, your hex dump shows two bytes for the second
character in the second email address.
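The same check the hex dump performed takes a few lines of Python (3.8+ for `bytes.hex` with a separator): compare the strings and print each suspect character's code point, UTF-8 bytes, and Unicode name. U+043E takes two bytes (d0 be), where Latin "o" takes one (6f).

```python
import unicodedata

latin = "josh@gmail.com"
cyrillic = "j\u043esh@gmail.com"  # second character: CYRILLIC SMALL LETTER O

def describe(ch: str) -> str:
    # Code point, UTF-8 bytes in hex, and the character's Unicode name.
    return f"U+{ord(ch):04X} {ch.encode('utf-8').hex(' ')} {unicodedata.name(ch)}"

print(latin == cyrillic)        # False
print(describe(latin[1]))       # U+006F 6f LATIN SMALL LETTER O
print(describe(cyrillic[1]))    # U+043E d0 be CYRILLIC SMALL LETTER O
# NFKC normalization does not fold the Cyrillic letter into the Latin one.
print(unicodedata.normalize("NFKC", cyrillic) == latin)  # False
```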

~~~
inetknght
Yup! I was surprised they're rendered the same. The hex dump didn't lie even
though everything until then did.

~~~
hombre_fatal
They aren't "lying", it's the same glyph in many fonts.

That you think homoglyphs would somehow look different in any of those steps
of your post is peculiar to me. l and I look the same in some fonts, and both
are in the ASCII set.

~~~
inetknght
> _They aren't "lying", it's the same glyph in many fonts._

While it might represent the same glyph, it certainly isn't the same sequence
of bytes. I think the real failure is that this isn't made clear.

~~~
hombre_fatal
That's what I mean, looking at the bytes is the only way to know.

How would you solve homograph/glyph attacks though? One idea is yet another
encoding where there are no homoglyphs, only whitelisted diacritic sequences,
and there isn't more than one way to assemble the same character ("ó" vs
"o"+"´"). So tough potatoes for Cyrillic "o": it's forced to use its nearest
equivalent, 0x6f Latin "o" in the ASCII set.

~~~
inetknght
> _How would you solve homograph/glyph attacks though? One idea is yet
> another encoding where there are no homoglyphs_

First, modern operating systems (should?) already provide APIs to canonicalize
UTF.
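In Unicode terms, canonicalization means normalization. Normalization does solve the "more than one way to assemble a character" problem. For example, NFC composes "o" followed by U+0301 COMBINING ACUTE ACCENT into the single code point U+00F3. It does nothing for homoglyphs, though, as this short sketch with Python's `unicodedata` shows:

```python
import unicodedata

precomposed = "\u00f3"   # "ó" as a single code point
decomposed = "o\u0301"   # "o" followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
# Normalization leaves homoglyphs alone: Cyrillic "о" stays Cyrillic.
print(unicodedata.normalize("NFC", "\u043e") == "o")            # False
```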

Second, perhaps an additional API needs to be created which suggests
similarities between characters intended for use by an intelligence
(artificial or otherwise...).
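Something close to that second API exists in the confusables data of Unicode Technical Standard #39. That standard defines a "skeleton": map each character to a representative prototype, and treat two strings as confusable when their skeletons match. The sketch below follows the UTS #39 steps (NFD, map, NFD again) but hard-codes a tiny hypothetical excerpt of the mapping table. A real implementation would load the full confusables.txt data file.

```python
import unicodedata

# Tiny hypothetical excerpt of a confusables table: lookalike -> prototype.
CONFUSABLES = {
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "I": "l",       # the ASCII capital I / small l pair mentioned above
}

def skeleton(text: str) -> str:
    # UTS #39 shape: decompose, map each character to its prototype, decompose again.
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def confusable(a: str, b: str) -> bool:
    return skeleton(a) == skeleton(b)

print(confusable("josh@gmail.com", "j\u043esh@gmail.com"))  # True
print(confusable("epic", "\u0435\u0440\u0456\u0441"))       # True
```

The skeleton is only a comparison key. It is not meant to be shown to users or stored in place of the name.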

------
memexy
> This even allows them to have fully-Unicode usernames; username phishing is
> less of a problem when users expect duplicate usernames, and none of your
> systems depend on username uniqueness.

It's surprising, but providing an extra degree of freedom makes the system more
robust. Username phishing is a real problem on Twitter because people expect a
unique name for each person they interact with. If usernames cannot be assumed
to be unique, then using the name as a heuristic for identity is no longer a
viable shortcut, so people have to develop other ways of making sure they're
talking to who they think they're talking to.

On a related note, Keybase (keybase.io) proofs never made sense to me until I
started thinking about how I would prove to people that I am indeed who I say
I am. Keybase provides a cryptographic basis for trust, which is much better
than what most social media systems currently support with their verification
mechanisms. I personally trust cryptographic signatures over whatever
verification mechanism Twitter is using to provide blue check marks to
verified accounts.

------
kibwen
> _The only solution is to develop systems that don’t have usernames._

I disagree that this is the only solution. The OP comes very close to
discovering an alternative one: don't allow users the power to select their
own arbitrary username.

When a user makes an account on your site (and if your site actually needs
publicly-displayed usernames in the first place), give them the option of,
say, ten usernames generated by the site itself; these _don't_ need to be
numeric codes like WeChat's; they can be pronounceable phrases in the same
manner that Gfycat generates URLs. Everyone will end up with names like
"questionable wet aplomado falcon", "each unlawful harlequin bug", "icy
inferior iceland gull", which honestly isn't any worse than the average name I
encounter on Reddit or elsewhere.

The biggest downside(?) of this approach is that it makes it harder for
someone to build their personal brand, but the older I get the more I think
it's a bad idea to use a consistent nickname across different websites.

~~~
kitotik
> The biggest downside(?) of this approach

I tried this approach for a service a while back, and the biggest hurdle was
educating the users. Even determining what to call it was a huge challenge, as
it’s clearly not a ‘username’.

~~~
kibwen
In what way would the user need to be educated? Social products like IRC,
Twitter, and Reddit have accustomed users to seeing opaque, often-nonsensical
strings next to every action taken by every user, and the only difference here
is that a given user hasn't had the opportunity to select their own opaque and
nonsensical string to associate with their actions. They wouldn't even need to
remember this string to log in, because sites use email or third-party login
services for that (and even if they did need to know their own username to log
in, their browser would remember it for them).

~~~
kitotik
I’ll happily accept any wireframes you may have that wouldn’t increase
friction beyond the typical “choose your username and password”.

You may be overestimating the typical non-Hacker News user. It’s at least as
challenging as getting users to use things like 2FA and/or recovery codes.

~~~
kibwen
New wireframes don't need to be made, because this paradigm already exists in
a roundabout way. For many websites that I have signed up for, any attempt to
register an already-taken username will present the user with multiple
suggested alternative usernames. It's been decades since I signed up for
Gmail, but I'm pretty sure that on the sign-up screen you would enter your
name, "John Smith", and then your desired email, "johnsmith@gmail", and then
when that failed you would be given a list like "john_smith@gmail",
"smithj@gmail", "johnsmith2@gmail", etc.

What I'm proposing is the same UI flow, except that the suggestions aren't
based on anything the user has entered, and there's no way for the user to
override them, which actually makes it simpler than the existing flow.

------
ryukafalz
I like the petname approach that GNS[0] takes - there _are_ stable identifiers
(they're public keys) but users are meant to refer to identifiers using local
names (or names that their contacts have set).

[0] [https://tools.ietf.org/id/draft-schanzen-gns-01.html](https://tools.ietf.org/id/draft-schanzen-gns-01.html)
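Here is a rough sketch of the petname idea. It assumes a local contact book keyed by public-key fingerprints. The truncated SHA-256 fingerprint and the `PetnameBook` class are hypothetical illustrations, not GNS's record format. The stable identifier is the key. The name shown is whatever *this* user chose for it; unknown keys fall back to their owner's suggested name, visibly marked as unverified.

```python
import hashlib
from dataclasses import dataclass, field

def fingerprint(public_key: bytes) -> str:
    # Stable identifier derived from the key itself (truncated for display).
    return hashlib.sha256(public_key).hexdigest()[:16]

@dataclass
class PetnameBook:
    petnames: dict[str, str] = field(default_factory=dict)  # fingerprint -> my name

    def set_petname(self, public_key: bytes, name: str) -> None:
        self.petnames[fingerprint(public_key)] = name

    def display(self, public_key: bytes, self_chosen: str) -> str:
        fp = fingerprint(public_key)
        if fp in self.petnames:
            return self.petnames[fp]
        # Unknown key: show the owner's suggested name, flagged as unverified.
        return f"{self_chosen}? ({fp})"

book = PetnameBook()
book.set_petname(b"alice-key", "Alice from work")
print(book.display(b"alice-key", "Alice"))    # Alice from work
print(book.display(b"mallory-key", "Alice"))  # Alice? (<fingerprint>)
```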

~~~
memexy
Generally, using cryptography or associated cryptographic functions is the way
to go when trying to make robust systems. Joe Armstrong has a great talk where
he outlines how to create a content-addressable store for storing and working
with knowledge/data. He suggests using SHA256 content hashing because giving
items of data unique names is a hard problem so we might as well name pieces
of data by their content hashes and then have a human readable pointer.

--

[https://www.youtube.com/watch?v=lKXe3HUG2l4](https://www.youtube.com/watch?v=lKXe3HUG2l4)
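The idea can be sketched in a few lines. The class and method names here are hypothetical, not from the talk. Each blob is stored under the hex SHA-256 of its bytes, and a separate table of human-readable names only points at hashes. Renaming never touches the data, and identical content is stored once.

```python
import hashlib

class ContentStore:
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # SHA-256 hex digest -> content
        self.names: dict[str, str] = {}    # human-readable name -> digest

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data           # same content, same key: stored once
        return digest

    def name(self, label: str, digest: str) -> None:
        if digest not in self.blobs:
            raise KeyError(digest)
        self.names[label] = digest          # a pointer, freely re-pointable

    def get(self, key: str) -> bytes:
        # Accept either a human-readable name or a raw digest.
        return self.blobs[self.names.get(key, key)]

store = ContentStore()
h = store.put(b"hello")
store.name("greeting", h)
print(store.get("greeting"))  # b'hello'
```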

------
falcolas
Unicode usernames in Discord are my bane. I can’t search for them. I can’t
type them in manually. I can’t even copy their official name because the app
interface doesn’t allow selecting them. If the right click context menu
doesn’t allow the operation I need, I’m just SOL. A complete clustertruck.

------
gumby
The author’s choice of the epic / epic web sites was inspired too: those
Cyrillic characters spell out “Eris” the goddess of...discord!

------
tsimionescu
The author's solution is to shift the problem of uniqueness of usernames to
email, as if email is less likely to change than a site-specific username.

The opposite is actually true - there are many reasons to change your email,
while changing your username on the vast majority of websites is simply
unnecessary (e.g. anything without a social component, such as banks, shops,
healthcare providers, etc.).

Not to mention, if you don't have a system for ensuring unique usernames, you
also shift the burden of ensuring identity to your other users, who now have
to understand how to distinguish different users with the same name.

~~~
gcbw3
To me it seems the suggestion is to allow aliases (nicknames, images, UTF, QR
code, text) and stick with numerical IDs (or whatever you use as the true
primary key, which can be email or phone number I guess, but I hope not).

It even hints at that for directories, etc., at the end.

~~~
crooked-v
Discord and Blizzard both use a unique numerical ID as the true source of
identity (your name displays like "Mr. Bob#1234", but only the "1234" part is
actually important), while using email for logging in. The email and the
display name can both be changed at any time.

~~~
tonyarkles
And ICQ! While AIM was letting people choose usernames, ICQ looked deep into
your soul and assigned you an ID based on the order in which you joined :)

~~~
mey
Still remember my ICQ number even though the account got hijacked LONG ago.

------
chrismorgan
Purely as an implementation note, the Slack rich text editing widget is quite
poor, and the Discord rich text editing widget is _lousy_. Both have serious
bugs (Discord’s seems to be almost nothing _but_ bugs) where their widget just
doesn’t behave like a normal <input type=text> or even contenteditable (which
behaves subtly differently), in decidedly off-putting ways.

On @-mention autocompleters in general (and :emoji-code: completers too, and
GitHub/GitLab issue/PR #-reference completers), I honestly can’t think of one
that I’m completely happy with. Every last one I’ve experienced harms the
editing experience with surprising and inconsistent behaviour.

It’s all surprisingly hard to get right, and very few people who implement
them seem to even _try_ to actually get it _right_.

Perhaps I should perform a more detailed study, enumerating the problems
clearly, and try to create one that’s as close to flawless as is possible.
(I’m sceptical that flawless is actually _possible_ with the tools given web
tech, especially in Blink and I think WebKit which don’t properly support the
difference between before-end and after-end in selection in contenteditable,
which matters more than you might think.)

------
ridaj
The author lists a bunch of gripes with alphanumeric usernames, but in fact
many of these shortcomings are not worth switching to non-unique, freeform
names (John Smith), or to random ID allocation (whether purely numeric or
word-like).

In spite of what I'd personally have bet on a decade ago, I don't actually
know many international users who complain about alphanumeric handles. People
can/should still have Unicode-ish display names on top of handles, change
those, and be searched by those. Twitter does this well for example.

But a human-readable, ASCII-friendly, user-chosen unique handle is the best
thing I've seen so far for disambiguation on a social network.

------
notJim
GDPR adds an interesting wrinkle in all this. GDPR requires both the right of
removal (removing @-mentions) and the right of rectification (updating them.)
So if I change my name from notJim to notJohn, GDPR requires* services to
update previous @-mentions from notJim to notJohn, and similarly requires
services to remove old mentions of notJim if I request removal. Using an
identifier behind the scenes greatly simplifies this, because instead of
dealing with text, you fetch the appropriate display name at render time.

* IANAL, etc., and I'm sure requirements and interpretations of GDPR vary, but I do know that my company invested quite a lot in systems to fix this @-mentioning issue, so at least some people think it requires this and are willing to spend probably millions of dollars of engineering time to comply with said requirement. Personally, I support this provision.
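Here is a minimal sketch of the ID-behind-the-scenes approach. It uses a hypothetical `<@id>` token as the stored form of a mention. Messages keep only the opaque user ID, and the current display name is substituted at render time. A rename (rectification) or an erasure then becomes a single change to the user table, not a rewrite of every message.

```python
import re

MENTION = re.compile(r"<@(\d+)>")  # hypothetical stored form, e.g. "<@42>"

def render(stored: str, names: dict[int, str]) -> str:
    def replace(match):
        name = names.get(int(match.group(1)))
        # Erased or unknown users render as a neutral placeholder.
        return f"@{name}" if name is not None else "@deleted-user"
    return MENTION.sub(replace, stored)

users = {42: "notJim", 7: "kibwen"}  # user ID -> current display name
message = "thanks <@42>, see also <@7>"
print(render(message, users))  # thanks @notJim, see also @kibwen
users[42] = "notJohn"          # rectification: one row changes
print(render(message, users))  # thanks @notJohn, see also @kibwen
del users[42]                  # erasure
print(render(message, users))  # thanks @deleted-user, see also @kibwen
```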

------
yencabulator
Here's a thought I had when Twitter was still very new:

1. Sites should forbid dots in "normal" usernames.
2. Usernames with dots can only be registered by demonstrating ownership of the
corresponding DNS domain.

Now you can have your favorite "brand" across all sites, nobody else can
register it before you, etc.

This is obviously not very non-techie friendly at this time...
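The two rules could be enforced at registration time roughly as follows. The comment doesn't say how ownership is proven (one option is a DNS TXT record containing a token the site issued), and Python's standard library has no TXT lookup. So this sketch takes a hypothetical `owns_domain` callback instead of implementing the check. Uniqueness of plain usernames is a separate check not shown here.

```python
from typing import Callable

def can_register(username: str, owns_domain: Callable[[str], bool]) -> bool:
    """Rule 1: plain names contain no dots.
    Rule 2: dotted names must be a domain the registrant has proven they control."""
    name = username.strip().lower()
    if "." not in name:
        return True
    if any(label == "" for label in name.split(".")):
        return False  # reject malformed names like "a..b", ".com", "example."
    return owns_domain(name)

verified = {"example.com"}  # stand-in for domains whose proof has been checked
print(can_register("alice", verified.__contains__))        # True
print(can_register("Example.com", verified.__contains__))  # True
print(can_register("google.com", verified.__contains__))   # False
```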

------
User23
This is why I like hexl-mode[1].

[1]
[https://www.emacswiki.org/emacs/HexlMode](https://www.emacswiki.org/emacs/HexlMode)

------
MR4D
Unicode is what happens when humans see the Tower of Babel and someone says,
“hold my beer - I got this.”

