
Let’s talk about usernames - acjohnson55
https://www.b-list.org/weblog/2018/feb/11/usernames/
======
flurdy
> So if you’re enforcing unique email addresses, or using email addresses as a
> user identifier, you need to be aware of this and you probably need to strip
> all dot characters from the local-part, along with + and any text after it,
> before doing your uniqueness check. Currently django-registration doesn’t do
> this, but I have plans to add it in the 3.x series.

Sorry what? That seems pretty unnecessary. A third party system to dictate
how a third party system handles its local alias system for emails? I can't see
any benefit to that.

Whether a mail server handles '+' in a standard way is not guaranteed, and
surely it is up to the user how they use that feature if enabled.
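
For concreteness, the check described in the quoted passage might look roughly like the sketch below. This is not django-registration's code; the function name `gmail_style_key` is made up for illustration, and lowercasing the result is an extra assumption. It applies Gmail's rules to every domain, which is exactly what the objection here is about.

```python
def gmail_style_key(address: str) -> str:
    """Return a comparison key only; the stored address is left untouched.

    Hypothetical helper: assumes every domain treats dots and '+' the way
    Gmail does, which is not true in general.
    """
    local, _, domain = address.rpartition("@")
    local = local.split("+", 1)[0]       # drop '+' and any text after it
    local = local.replace(".", "")       # drop all dots from the local-part
    return f"{local}@{domain}".lower()   # lowercasing is an assumption too


if __name__ == "__main__":
    print(gmail_style_key("John.Doe+news@gmail.com"))
```

Two addresses would then count as duplicates when their keys are equal, while mail is still sent to whatever the user originally typed.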

~~~
dharmab
If you don't do this your system may allow users to sign up for multiple
accounts and double-dip on signup benefits, free trials and other "one per
account" features. (Barring additional controls, of course).

~~~
aeruder
Or they could just sign up at one of the numerous free email providers with a
different username. Stripping the + suffixes is only providing one thing -
pain for the users that want to use it.

~~~
sodapopcan
It's way easier to write a script to generate thousands of variations on the
same email address than to sign up for a thousand different accounts. I've
actually been bitten by this bug before... or rather, my company was bitten by
an affiliate who neglected to sanitize their emails this way and someone was
able to create thousands of gift cards in our system.

Having said that, in development, it's super nice to be able to create
addresses with +'s in them.

~~~
pzxc
What you say is not untrue, but it's still bad advice to do it -- a security
red herring. First of all, you don't know that 100% of mail servers ignore
characters after the +, so you can't safely strip those characters or you
might not end up with a usable email address. That goes double for stripping
the dots/periods, which gmail ignores but many other mail servers do not.

On top of that, it's just as easy to set up a catchall email address -- an
email box that accepts all mail for a domain, literally anything@mydomain.com.
So a malicious actor could sidestep this security attempt with minimal effort,
but it still inconveniences legitimate users despite being worthless from a
security perspective.

~~~
sodapopcan
True, true. As I mentioned below, in my case, it wasn't even usernames, just
entering your email for a free gift card. The attacker actually used dots with
a gmail address.

~~~
shkkmo
There are soooo many ways to easily game the email side of it that you would
be better off using other means of detecting uniqueness (rate limit per IP
address, rate limit per hash of IP address and user-agent)
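
A rough sketch of the "rate limit per hash of IP address and user-agent" idea, kept in memory for simplicity. The window, the limit, and the function names are all assumptions for illustration; a real deployment would likely keep the counters in shared storage.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 3600  # hypothetical window length
MAX_SIGNUPS = 3        # hypothetical limit per client per window

# client key -> timestamps of that client's recent signups
_attempts = defaultdict(deque)


def client_key(ip: str, user_agent: str) -> str:
    """Hash the IP address and user-agent into one opaque key."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


def allow_signup(ip: str, user_agent: str, now: Optional[float] = None) -> bool:
    """Return True and record the attempt if the client is under the limit."""
    now = time.time() if now is None else now
    recent = _attempts[client_key(ip, user_agent)]
    # Forget attempts that have fallen out of the window.
    while recent and now - recent[0] >= WINDOW_SECONDS:
        recent.popleft()
    if len(recent) >= MAX_SIGNUPS:
        return False
    recent.append(now)
    return True
```

This is just as easy to sidestep as the email tricks (proxies, a changed user-agent), so it is one signal among several rather than a uniqueness guarantee.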

------
bradleybuda
> So if you’re enforcing unique email addresses, or using email addresses as a
> user identifier, you need to be aware of this and you probably need to strip
> all dot characters from the local-part, along with + and any text after it,
> before doing your uniqueness check. Currently django-registration doesn’t do
> this, but I have plans to add it in the 3.x series.

Please don't "normalize" email addresses like this. Not all mail systems are
Gmail, and many do treat "john.doe@example.com" and "johndoe@example.com" as
different identities. And even if we are talking about Gmail - it's not your
identity system's job to deduplicate different logical addresses for the same
physical inbox.

~~~
jcranmer
To emphasize your point: don't touch email addresses. You can get away with
doing equality checks on NFKC case folding, but don't assume that you can
store a lowercased email address and have it work properly.

Lots of weird email systems exist. Don't assume that everybody works like
gmail. And do test that things work right with uppercase letters in email
addresses: I've been locked out of systems before because I use an uppercase
letter in my email address and one half of the system was trying to match the
lower cased version to the actual text.

~~~
ubernostrum
Unfortunately, Gmail rules the world and has trained people to expect that
they can be sloppy and inconsistent and enter their email address as 'johndoe'
or 'john.doe' or 'JohnDoe' or 'John.Doe'... etc. and it just works, I don't
understand why _your_ site is broken, because _my_ email works!

And when that happens, trying to patiently pull a "well, _technically_ " and
explain to them about RFC this and the specs say that is a way to lose users.

(I actually have extremely strong feelings about email, email addresses and
the whole associated mess of specs, but had to tone it down for this article
since it was mostly about the various traps you can wander into from naïvely
thinking that you can just read a spec or implement something obvious and get
away with it)

~~~
kqr
> And when that happens, trying to patiently pull a "well, technically" and
> explain to them about RFC this and the specs say that is a way to lose
> users.

Not in my experience. What works is showing compassion: agree with them that
what happened is terrible, that you wish things were different but you didn't
call these shots back in the day, and that if they want improvement you can
both go together and complain to Google, the service which is actually broken.

Most people just want to be listened to. If you can do that, you'll earn a
loyal fan, even if you don't do exactly what they tell you to when agitated.
Some will even appreciate learning more about the email systems after the
fact. They may even get the feeling you went above and beyond by offering to
help them with matters outside of your site.

~~~
ubernostrum
_Most people just want to be listened to._

Many people, even after being listened to, and even after having things
patiently explained to them, still continue to enter someone else's email
address into forms which will send sensitive information to that email
address, and complain that they never got their important email, or that some
"hacker" has "hacked" "their" email, etc.

In a perfect world this would not happen. We don't live in a perfect world and
are unlikely to live in a perfect world any time soon, so we should not be
asking "how can we be pedantic and tell users it's their fault for not reading
the RFCs", we should be asking "how can we protect users from their ignorance
of the RFCs".

(when I wrote this article, I did not expect that this would be the single
most controversial line in it from HN's perspective, but I guess by this point
I should have anticipated it)

------
Gracana
> And while I could write this as one of those “falsehoods programmers believe
> about X” articles, my personal preference is to actually explain why this is
> trickier than people think, and offer some advice on how to deal with it,
> rather than just provide mockery with no useful context.

Thank you, so much.

I just wanted to highlight that for anyone who looked at the comments to
decide whether or not to read the article.

~~~
ch4s3
Yeah, this is a really good article. I really wish more auth systems made use
of the tripartite identity pattern.

------
jedberg
In my 20 years of experience validating email addresses, I've found one thing
that works every time without fail:

Send email to it

That is literally the only way to validate an email address. There is no
regular expression or algorithm that can validate and/or deduplicate an email
address.

You must simply treat every email as unique until you send an email to it and
that person proves otherwise.

That being said, this article brings up a lot of important things about
confusables that everyone should definitely be aware of, especially if you're
going to have public identities.
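
The "send email to it" step usually means mailing a token that only the owner of the mailbox can return. Below is a minimal sketch of the token half only, with no mail sending and no expiry; `SECRET_KEY`, `make_token` and `confirm` are hypothetical names, and the secret would have to be persisted in a real system.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; regenerated on every run in this sketch.
SECRET_KEY = secrets.token_bytes(32)


def make_token(address: str) -> str:
    """Derive a token bound to exactly the address as the user typed it."""
    return hmac.new(SECRET_KEY, address.encode("utf-8"), hashlib.sha256).hexdigest()


def confirm(address: str, token: str) -> bool:
    """Check a returned token; it is expected to be an ASCII hex string."""
    return hmac.compare_digest(make_token(address), token)
```

The address is treated as an opaque string throughout: nothing is stripped, lowercased or deduplicated before the token comes back.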

~~~
gitgud
How does sending email to it prove it's not a duplicate? For example:

    john.doe@gmail.com
    johndoe@gmail.com

Both resolve to the same email at the user's end.

~~~
Spooks
well with gmail they are the same email. But with other mail applications,
those could be two different email addresses. You are going to need to test
all the different email services and find out which do this and which do not
do this... which is no easy task

~~~
TheDong
> which is no easy task

Which is impossible; a number of people run their own email or use a small
friend-run email server, and you can't possibly discern the delivery rules
from the outside.

~~~
ubernostrum
_number of people run their own_

Yes. A small number is a number.

If we could go back in time and force every single implementer to follow every
relevant RFC to utter perfection (and make sure all the RFCs were perfectly
unambiguous), I'd be more sympathetic.

But email is fucked. The sheer number of oddball things, hacks, workarounds,
deviations and other bits of mess that implementers have engaged in over the
years means the RFCs should be treated as at best a loose hopeful outline of
how email might in theory work.

------
Spivak
> So if you’re enforcing unique email addresses, or using email addresses as a
> user identifier, you need to be aware of this and you probably need to strip
> all dot characters from the local-part, along with + and any text after it,
> before doing your uniqueness check. Currently django-registration doesn’t do
> this, but I have plans to add it in the 3.x series.

Isn't this a really dangerous game to play? Just because some major MTA's
assign a class of addresses to each user doesn't make each member of that
class not a unique identifier in general. Is it worth the headache to maintain
a list of various email systems' policies rather than just treating them all
as unique?

~~~
jcranmer
The email RFCs explicitly say thou shalt not interpret the localpart of an
email address, unless thou art the MTA of the domain in question. Even case
folding is forbidden. And the wisdom of people who work with email is... the
RFCs have good advice here: don't assume anything about how the localpart is
structured.

You can generally get away with treating the names as case-preserving (as
distinct from case-insensitivity), and you are probably safe in rejecting
quoted localparts. But beyond that, even forcibly lowercasing email addresses,
is likely to cause problems.

~~~
ubernostrum
Like it or not, the RFCs have lost. "How Gmail does it" is now how email works
in the minds of a stupendous number of people. So if Google says 'johndoe' and
'john.doe' are the same, we're stuck with the reality that 'johndoe' and
'john.doe' are the same.

~~~
fiddlerwoaroof
It’s completely legitimate to use variations on an email address for different
accounts on the same website.

Also, it’s useful to make use of +foo or varied usages of dots to create a
unique email address for each site: for one thing, it’ll help if one site
leaks your email address, then it’ll let you trace the origin of the leak if
that email address gets unwanted email.

Finally attempting to deduplicate email addresses before authentication is
almost as bad as lowercasing the password before checking if it matches.

~~~
ubernostrum
There's a line in the Zen of Python: "practicality beats purity". If I can
avoid someone filing a bug or a support request by knowing that Gmail has
trained people to believe a bunch of distinct (according to RFC) mailboxes
actually aren't distinct, I'm going to avoid the support request. The -- by
comparison -- minuscule set of users who A) actually understand the relevant
specs and B) care enough to yell at me in an HN comment are going to lose that
battle every time.

~~~
fiddlerwoaroof
> Gmail has trained people to believe a bunch of distinct (according to RFC)
> mailboxes actually aren't distinct

I'd be fairly surprised if your average user of gmail knew this: I know it and
I use it in part because it lets me _distinguish_ different accounts on the
same site. Second-guessing someone who's taking advantage of this feature is
more likely to generate tech support requests than not.

~~~
zbentley
Knowledge of the plus trick has gotten pretty widespread. Anecdotally, I know
a lot of non-technical people that use a single +spam address to route to a
spam folder.

Non-anecdotally, articles with large numbers of views/comments about the trick
can be found with a quick Google search on non-techie sites like
NYT/HuffPost/BusinessInsider/Buzzfeed/Pinterest/etc. Not that those are
definitive, but I think knowledge of this is more widespread than you think.

------
pc86
> _Many systems ask the username to fulfill all three of these roles, which is
> probably wrong._

What system with any non-trivial level of use uses the text username as (1)
the FK in the database, as opposed to the generated or auto-incremented ID in
the db; (2) the login name; _and_ (3) the publicly-displayed "name"
of the user for others to see?

Plenty of forums etc use the login name for #2 and #3, and I'm not convinced
by this article that that's the wrong way to do it. I haven't ever seen a
_single_ professional product that uses the text username that a user logs in
with as the actual DB-level foreign key. That's grade school level database
design.

~~~
cjslep
When logging in, how do you get that autoincremented ID column? Some more
complex variant of "SELECT *.id WHERE username = $1". So functionally, yes the
username is the root identifier that pulls the very first record that then
allows other joins to occur. But you are right, the username column is not
literally the key being joined on.

There is also the security issue that having the login name also be the
publicly displayed name lowers the bar for attempting a targeted
attack on the site, as well as on other sites where the attacker suspects the
victim may be using a similar login name. This can be particularly true in
cases of harassment across platforms, which, while not a computer science
security issue, is a personal psychological security issue.

~~~
bpicolo
> the username column is not literally the key being joined on

That's exactly the point though. If you join on the username then allowing
emails/usernames or whatever that identifier is to be edited is very hard. How
you identify the row to auth against is literally the point of a username.
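
To make the distinction in this subthread concrete, here is a small in-memory sketch of the three roles kept apart. All names are invented for illustration; a real system would use database tables with the numeric id as the foreign-key target.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class User:
    id: int            # system-level identifier; other records reference this
    login: str         # login identifier; unique, but allowed to change
    display_name: str  # public identity shown to other users


_ids = count(1)
_users: Dict[int, User] = {}
_id_by_login: Dict[str, int] = {}


def register(login: str, display_name: str) -> User:
    if login in _id_by_login:
        raise ValueError("login already taken")
    user = User(next(_ids), login, display_name)
    _users[user.id] = user
    _id_by_login[login] = user.id
    return user


def find_by_login(login: str) -> Optional[User]:
    """The lookup done at login time: login name in, stable id out."""
    user_id = _id_by_login.get(login)
    return _users.get(user_id) if user_id is not None else None


def change_login(user_id: int, new_login: str) -> None:
    """Editing the login only touches the lookup, never the references."""
    if new_login in _id_by_login:
        raise ValueError("login already taken")
    user = _users[user_id]
    del _id_by_login[user.login]
    _id_by_login[new_login] = user_id
    user.login = new_login
```

Because everything else points at `id`, changing the login or the display name is a one-row update.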

------
gravypod
> Uniqueness is harder than you think

Discord has a very interesting solution to this. They have user names and user
ids. User IDs are tied to emails and the user's name seems to just be a random
text identity for displaying to users. I assume most of their backend code
uses a unique, sequential or random, integer ID to identify and talk about
users while their frontend just maps the ID to a "user name". As long as you
slap account creation behind a verification email and don't mind one user
being able to sign up for multiple accounts you side step many of the larger
problems that come from choosing user names because, in effect, you are
choosing the "Real" username and you can make any guarantees that make writing
all of your other software easy.

~~~
Klathmon
Blizzard also uses a style like this. While it's great for some use cases, it
really sucks for others.

In Blizzard's implementation, I can't add a friend by just knowing their name,
I need their id number as well, and the process for finding it isn't exactly
front-and-center.

~~~
ulzeraj
I remember when I switched to bnet ID and my tag became username#1337.
People sometimes ask me if I bought the name or something like that.

~~~
Ambroos
I got Ambroos#2772 or #2727 (can't remember, it's been a while). Which is fun,
because I have a thing with 27 as it's the birthday of all my grandparents'
(mom's side) grandchildren.

------
mooreds
This should be exhibit number one why you should always favor open source
libraries rather than writing your own plumbing functionality, especially
around authentication and authorization. The onus should be on the developer
to explain why the open source library isn't a fit, rather than defaulting to
'roll your own'.

The edge cases discussed don't pop up that often unless you have lots of folks
using your software or are really diligent about fuzzing and testing edge
cases. If you roll your own, say, username system, you probably aren't going
to fall into either of those two cases. Which means you're vulnerable.

~~~
devmunchies
but then again, open source does not mean it's good software, obviously. there
should be some way to quickly check if a library meets security best-practices.
like some sort of "vetted software" reference

~~~
always_good
Also, using a 3rd party library for something as important as authentication
because you don't know how it works doesn't sound much better nor secure.

Like storing sensitive data in the authn's session system because you don't
understand encryption vs signing nor how to find out -- maybe it's time to
just sit down and credentialize as a craftsman.

The authn/z systems I've used that were the biggest headaches in my life were
kitchen sink frameworks trying to generalize over everyone's creature
features, and they were often tied to a company/community culture of not-
gonna-touch-it that only hurt users and security.

~~~
mooreds
I think you should absolutely understand any third party systems/libraries you
use, especially when it is as important as authentication. Using a third party
component doesn't free you up to be lazy or to use it incorrectly.

My comment was stating that you should default to these types of libraries and
only roll your own if you can't do what you need to, simply because they're
more likely to handle edge cases that can have serious implications.

Do you do unicode normalization on your usernames? I freely admit that I
don't, and wasn't aware it was needed until I read this post.

------
IgorPartola
I agree with everything here except the email addresses example. Yes I do want
to register igor@example.com and igor+work@example.com. Those are different
accounts, please don’t mess with that.

~~~
zbentley
There may be very good reasons for a site operator to disallow this. I
mentioned some of them elsewhere in these comments:
[https://news.ycombinator.com/item?id=16358625](https://news.ycombinator.com/item?id=16358625)

------
neya
Hey community, shameless plug: For the purpose mentioned in the article to
disallow certain usernames, I created this GitHub Repo sometime back. Feel
free to submit a pull request :)

[https://github.com/dsignr/disallowed-usernames](https://github.com/dsignr/disallowed-usernames)

~~~
flurdy
Seems I should add some more to
[https://github.com/flurdy/bad_usernames](https://github.com/flurdy/bad_usernames)
:)

~~~
neya
Nice! Maybe we should merge our efforts.

------
timvdalen
> So if you’re enforcing unique email addresses, or using email addresses as a
> user identifier, you need to be aware of this and you probably need to strip
> all dot characters from the local-part, along with + and any text after it,
> before doing your uniqueness check.

Please don't do this, lots of people (including myself) use the '+' hack to
separate accounts for different contexts (business/personal, different
projects/clients, etc).

~~~
jrimbault
I think he wasn't proposing to remove the '+' part from the address stored,
but splitting the address into smaller parts when doing the unicode
verification and checking if there isn't already a 'john.doe+xxx' when you
register with 'john.doe+yyy'.

~~~
timvdalen
I realize that. What I'm trying to say is that I like to create
'john.doe+projectA@example.com' and 'john.doe+projectB@example.com' accounts
with some services.

Checking for the existence of any 'john.doe@example.com'-like accounts would
mean I have to register an entirely separate email account or set up (another)
email forwarder/alias.

------
ghalvatzakis
My Linkedin username:
[https://www.linkedin.com/in/Αdmin/](https://www.linkedin.com/in/Αdmin/)

~~~
edent
Brilliant! Does it cause you any problems?

~~~
ghalvatzakis
I'm using this username for about 6 months now. No problems so far...
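
The first letter of that username appears to be a Greek capital alpha rather than a Latin "A". One cheap heuristic for flagging names like this is to look for mixed scripts. The sketch below approximates a character's script by the first word of its Unicode name, which is an assumption and not the real Unicode Script property, so treat it as illustrative only.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Approximate the set of scripts in text from Unicode character names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "GREEK CAPITAL LETTER ALPHA" -> "GREEK"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(text: str) -> bool:
    return len(scripts_used(text)) > 1


if __name__ == "__main__":
    # "\u0391" is GREEK CAPITAL LETTER ALPHA, visually close to Latin "A".
    print(scripts_used("\u0391dmin"))
```

A check like this would reject some legitimate names too, so it is probably better used to trigger review than to block registration outright.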

------
Alex3917
If you're going to allow unicode usernames then you should casefold them
rather than lowercasing them before normalizing as NFKC.

You should ideally also store a second copy of the username in the original
casing and normalized as NFC for display purposes, as some users care a lot
about seeing their username exactly as they entered it. (And in fact not
allowing this may be seen as culturally insensitive in some cases, much like
not supporting unicode.) The same applies to the user's first and last name,
which you can store in NFC for display purposes and casefolded into NFKC for
string comparison (e.g. search) purposes.

That said, most sites limit usernames to ASCII characters so that they can be
(easily) used in URLs. In this case you don't need to casefold or normalize,
just converting to lowercase is enough.
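
A minimal sketch of the two stored forms described above, using only the standard library (Python 3.3+ for `str.casefold`). The function names are made up; the order of operations follows the comment (casefold, then NFKC).

```python
import unicodedata


def comparison_key(name: str) -> str:
    """Form used for uniqueness checks and search: casefolded, then NFKC."""
    return unicodedata.normalize("NFKC", name.casefold())


def display_form(name: str) -> str:
    """Form shown back to the user: original casing, normalized as NFC."""
    return unicodedata.normalize("NFC", name)


if __name__ == "__main__":
    # lower() keeps the sharp s, casefold() expands it to "ss".
    print("Stra\u00dfe".lower() == "STRASSE".lower())
    print(comparison_key("Stra\u00dfe") == comparison_key("STRASSE"))
```

Only the comparison key needs a uniqueness constraint; the display form is what gets rendered on the page.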

~~~
ubernostrum
_If you 're going to allow unicode usernames then you should casefold them
rather than lowercasing them before normalizing as NFKC._

I wanted to stay out of the Python 2 vs. 3 quagmire in this article, but it's
worth knowing that in Python 3.3+, strings have a 'casefold()' method:

[https://docs.python.org/3/library/stdtypes.html#str.casefold](https://docs.python.org/3/library/stdtypes.html#str.casefold)

Unfortunately, since Python 2 still has around two years of upstream support
before EOL, I can't universally recommend people just use 'casefold()', no
matter how much I'd like to.

------
unethical_ban
These "battle hardened" articles are fascinating. It is the output of years of
experience and learning from real problems. It's building the best practices
guides, and building the tools to scan for the edge cases. A great read!

------
stasel
Worth mentioning Spotify's account hijacking problem when using unicode
[https://labs.spotify.com/2013/06/18/creative-usernames/](https://labs.spotify.com/2013/06/18/creative-usernames/)

------
mhandley
When it comes to email address normalization, it sounds like we could do with
a standardized way for a domain to express normalization policies.

Could be as simple as publishing a set of regular expression substitution
rules, specifying (for example):

* render to lower case (because this particular domain is case insensitive)

* drop periods (because this domain treats them like gmail does)

* drop '+' and any subsequent characters (because this domain treats them like gmail does)

* ASCII only (because mail software is old, and doesn't support unicode)

Etc.

Each domain could then publish their own rule, perhaps in a DNS txt record,
and anyone needing to check if two email addresses alias to the same could run
the correct checks.
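
As a sketch of what consuming such rules could look like: no such standard exists, so the policy format, the `POLICIES` table and the domain `mail.example` below are all invented. Domains with no published policy get their local-part left alone.

```python
import re

# Hypothetical policies, as if fetched from each domain (e.g. a DNS TXT record).
POLICIES = {
    "mail.example": {
        "lowercase": True,           # this domain is case-insensitive
        "substitutions": [
            (r"\+.*$", ""),          # drop '+' and any subsequent characters
            (r"\.", ""),             # drop periods
        ],
    },
}


def canonical_address(address: str) -> str:
    """Apply the domain's own rules; unknown domains are not interpreted."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()          # domain names are case-insensitive
    policy = POLICIES.get(domain)
    if policy is None:
        return f"{local}@{domain}"
    if policy["lowercase"]:
        local = local.lower()
    for pattern, replacement in policy["substitutions"]:
        local = re.sub(pattern, replacement, local)
    return f"{local}@{domain}"
```

Two addresses alias to the same mailbox, as far as the domain has declared, when their canonical forms are equal.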

~~~
ino
Some users capitalize their email address, and they expect to see it
capitalized.

I think a better solution would be to use a case insensitive collation on the
database for the email column.

If the user changes the capitalization of their email, treat it like any other
email change (validate the new email via email token)
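
A small illustration of the collation idea, using SQLite's built-in `NOCASE` collation from Python's standard library. The table and column names are assumptions, and `NOCASE` only folds ASCII letters, so this is a sketch of the approach rather than a full answer for non-ASCII addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE COLLATE NOCASE)"
)
conn.execute("INSERT INTO users (email) VALUES (?)", ("John.Doe@Example.com",))

# A second signup differing only in case violates the unique constraint.
try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("john.doe@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookups ignore case, but the stored value keeps the user's capitalization.
stored = conn.execute(
    "SELECT email FROM users WHERE email = ?", ("JOHN.DOE@EXAMPLE.COM",)
).fetchone()[0]
```

The user still sees the address exactly as entered, and the database, not application code, decides what counts as a duplicate.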

------
Paianni
My strategy is to think of a name so unique that no one else would think of
using it. The one on my HN account I originally cooked up in Nov. 2014, and I
never had to extend it with numbers to get it accepted on various forums (yes,
most of them were fine with changing the nick). My biggest gripe is that since
YT changed their username system in October of that year, my most popular
channel is stuck on an old username despite being a few months older.

------
Mayzie
Aside from the author's library, django-registration, what other similar
libraries for Python and other languages have taken all or some of this into
consideration?

Excellent read by the way. Many things I have never considered or even worried
about before.

------
jancsika
> 3\. Public identity, suitable for displaying to other users

Many sites-- like HN-- may not even need that. If you have system and login
identity you can just display "dingus" as the name of every single user and
the system should still work the same.

~~~
grzm
I think without display names having discussions would be difficult, as people
find some identifier useful to follow along. Even if it's a pseudonym, it's
hard to build a sense of community without being able to distinguish those in
your community. Or am I misreading you?

~~~
Anon1096
4chan and other imageboards manage to have discussion fine without unique
identifiers.

~~~
zbentley
It's a different kind of discussion, though. There is value to each type of
threading, but no one-size-fits-all approach--as evidenced by the fact that
4chan lets you opt out of anonymity.

------
sergiotapia
I don't follow the initial premise.

> Well, it’s easy until we start thinking about case. If you’re registered as
> john_doe, what happens if I register as JOHN_DOE? It’s a different username,
> but could I cause people to think I’m you? Could I get people to accept friend
> requests or share sensitive information with me because they don’t realize
> case matters to a computer?

Just this month we fixed this issue by using a citext column in postgres. So
yes, it is easy. Maybe I'm missing an edge case here?

[https://www.postgresql.org/docs/9.1/static/citext.html](https://www.postgresql.org/docs/9.1/static/citext.html)

~~~
acdha
It’s easy if you thought about it before you have users; people didn’t and
then need to ensure that a fix doesn’t break something.

~~~
craigds
Yeah. We switched to CITEXT for email addresses after 10k users, and had to go
through quite a complex process to merge about 30 user accounts that had
signed up duplicate accounts with email addresses varying only in case. It
was a major PITA
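
Before a migration like that, the conflicting accounts have to be found first. A rough sketch of that audit step, assuming the accounts are available as `(user_id, email)` pairs; the function name is hypothetical, and `str.lower()` only approximates how a given database folds case.

```python
from collections import defaultdict


def case_only_duplicates(accounts):
    """Group user ids whose emails are identical once case is ignored.

    accounts: iterable of (user_id, email) pairs.
    Returns {lowercased_email: [user_id, ...]} for groups of two or more.
    """
    groups = defaultdict(list)
    for user_id, email in accounts:
        groups[email.lower()].append(user_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Each returned group is a set of accounts that someone has to merge or otherwise resolve before a case-insensitive unique constraint can be added.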

------
saurik
The entire concept of usernames that are unique and permanent is stupid and
even "cruel". The reality is that a relatively small handful of privileged
early adopters get good usernames that match their identities, and everyone
else gets screwed. These identifiers then act like tattoos that you got a long
time ago and are stuck with for the rest of your life: people end up reminded
every day of a sport they can no longer play due to an injury ("hockeystar")
or loves lost ("iheartjessie"), attached to a joke that is no longer funny or
to a thought that they found adorable as a 13 year old (when you are legally
asked to "choose a username": a modern era coming of age scenario) but which
adults find inane, or to a nickname that means something different than you
realized to some people and now can't change.

The reality is that there are almost ten billion people on this planet and
they live for upwards of a century. You are simply deluding yourself if you
think it is reasonable to build a system with unique, permanent usernames.
Nothing in the real world works like that, including trademarks. And it just
helps enforce the very problem that people try to trust usernames and then get
tricked by people who sniped usernames that are tied to other peoples' well-
known identities (leading to abused "verified" badge systems and legal
challenges and expensive hostage scenarios... it just sucks).

And for what? To make it easier to hand-type a URL? Does anyone even do that?
I am super technical and I barely even do that in 2018, as if nothing else
there are too many websites in existence to remember all of their one-off URL
schemes. Like almost everyone, I either use the site's built-in search feature
or I do a search on Google to find people, and let a combination of page rank
and personalized results guide me to the right destination. Some web browsers
don't even show URLs anymore!

Here is a great example of where it is completely insane: Facebook. There is
absolutely no good reason for that website to have usernames for regular
users, and they frankly shouldn't have usernames for businesses either. It
isn't even clear to me that the app--which most users are using, not the
website--even has a way to show people's usernames, which means this is an
identifier which somehow everyone knows must be chosen and must be unique and
is nigh-unto permanent but which somehow is also simultaneously meaningless
but is also a horrible point of contention? What?

I am lucky. I spent a bunch of time in 1994 to select a username, and despite
being 13, I was mature enough to come up with something that wouldn't ever
come to cause me complex problems. People ask me what it means, and it
essentially doesn't mean anything: it has only a positive connotation to me
when I hear it, it is entirely neutral, and it had no existing usage I could
find. Yet, I also still got screwed, as I am semi-famous, and everyone knows
me as this username. I have kids who look up to me enough to want to take my
name as a show of support and I have to essentially be the big bad asshole
about it because in a world of unique and permanent usernames, people then
assume the kid is really me. On the other side, I have been asked to rename
myself by moderators of various forums as they couldn't believe the real
saurik got an account on their site, and it was "confusing" people.

And so in the end we all have to deal with the worst-case scenario anyway:
unless you do nothing but sign up for random sites rumored to be interesting
constantly (which I seriously tried to do), you eventually will succumb to
needing a way to prove who you are on multiple sites and tie together those
identities. And for _most users_... as in _virtually all_ "normal users", that
moment comes when they are using only _two_ websites, as their username was
probably something like jay.freeman.178 as everything that was even remotely
interesting to them was taken a decade earlier by literally a different
generation of humans, so they let the website automatically generate one.

In a world where everyone is having to solve the worst-case problem anyway,
every site should just have numbers as unique identifiers, at most have some
kind of trust score for degrees of separation on the site (so you can get a
feeling for "is this the saurik that I met?"), and everyone should be trained
"names don't matter and if you see someone with that name it doesn't even
slightly mean that they are the same person you met last week".

~~~
mehrdadn
> every site should just have numbers as unique identifiers

Do you realize how impractical it is for users to remember these numbers for
every site? Until we get to the stage where every non-English-speaking user
_and their grandma_ finds a password manager convenient, this proposal won't
even pass the laugh test.

~~~
Yizahi
That's why we need biometric hardware everywhere and use its data as a login,
not as a password. Bio data is mapped onto a long UUID and user just sets
whatever username he wants to be displayed. We even have means to smoothly
transition from no hardware to 100% coverage - just allow manual UUID input
for systems where biometric is unavailable - e.g. you have a pair of face/ID
on the phone, fingerprint/ID on laptop and just ID on PC.

~~~
hawski
Serious question: what happens when you lose your finger? Or you have an
accident and your face gets mangled? Or a ball hit your eye and you lose it?

You have to remember your UUID? Probably it would be more like a file than a
string.

~~~
earenndil
You keep your UUID as a backup. The fingerprint system on my phone still has
an _option_ to log in with password, the fingerprint is just a convenient,
faster alternative. It's the same here, you keep the UUID in a folder with
other sensitive documents and when you lose your finger, you fish it out, log
in with it, and register another finger.

------
flurdy
One of the reasons I made my own bad username lookup.
[https://github.com/flurdy/bad_usernames](https://github.com/flurdy/bad_usernames)
It's a simple JSON file of usernames to disallow.

It does not address many of the other things highlighted in this post but it is
a start, at least for my services.
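
Using a list like that at registration time could be as simple as the sketch below. The JSON shown is a made-up stand-in, not the contents or layout of the linked file, and the names are hypothetical.

```python
import json

# Stand-in data; the real file's structure and entries may differ.
DISALLOWED_JSON = '["admin", "root", "support", "postmaster", "ceo", "cfo"]'

# Casefold once so the check does not depend on how the user typed the name.
DISALLOWED = {name.casefold() for name in json.loads(DISALLOWED_JSON)}


def is_allowed(username: str) -> bool:
    return username.casefold() not in DISALLOWED
```

An exact-match list like this does nothing about lookalike characters, so it complements the confusables checks rather than replacing them.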

~~~
nekopa
In the en version of your file I don't see any of the common c-level words:
ceo, cfo etc...

------
zeveb
> So if you’re enforcing unique email addresses, or using email addresses as a
> user identifier, you need to be aware of this and you probably need to strip
> all dot characters from the local-part, along with + and any text after it,
> before doing your uniqueness check. Currently django-registration doesn’t do
> this, but I have plans to add it in the 3.x series.

This is needlessly user-hostile. If users wish to use mailbox extensions to
have multiple unique accounts, that's their right. They can always get
multiple different email accounts, after all.

He doesn't mention the one thing he _ought_ to do, which is to strip email
addresses of comments before checking them: (foo)jdoe@example.com,
jdoe(bar)@example.com, jdoe@example.com, jdoe@(home)example.com & (a (nested
(comment)))jdoe(more)@example.com(all done) are all the same email address.
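
A minimal sketch of that comment-stripping step in Python, assuming comments are plain parenthesized runs that may nest. The function name `strip_comments` is hypothetical, and quoted local-parts and backslash escapes, which a full RFC 5322 parser would have to handle, are deliberately ignored here:

```python
def strip_comments(address: str) -> str:
    """Remove parenthesized comments, including nested ones, from an address."""
    depth = 0
    kept = []
    for ch in address:
        if ch == "(":
            depth += 1          # entering a (possibly nested) comment
        elif ch == ")" and depth > 0:
            depth -= 1          # leaving one level of comment
        elif depth == 0:
            kept.append(ch)     # keep only characters outside any comment
    return "".join(kept).strip()


examples = [
    "(foo)jdoe@example.com",
    "jdoe(bar)@example.com",
    "jdoe@example.com",
    "jdoe@(home)example.com",
    "(a (nested (comment)))jdoe(more)@example.com(all done)",
]
# Collapse the five spellings from the comment above into a set.
print({strip_comments(a) for a in examples})
```

The uniqueness check would then run on the stripped form rather than on the raw input.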

------
M2Ys4U
I'm surprised nobody has mentioned PRECIS - the framework for Preparation,
Enforcement, and Comparison of Internationalized Strings in Application
Protocols[0].

It defines a (small) set of profiles to validate and compare various types of
string, including "Username" (in both case folded and case prepared variants)
and "Nickname".

Want to compare two usernames for equality? Run the two strings through the
comparison steps for the UsernameCaseMapped[1] profile.

It won't solve all of your problems, but it's a good place to start.

[0] [https://tools.ietf.org/html/rfc8264](https://tools.ietf.org/html/rfc8264)

[1]
[https://tools.ietf.org/html/rfc8265#section-3.3](https://tools.ietf.org/html/rfc8265#section-3.3)

------
AnabeeKnox
For PHP, see the Spoofchecker class for similar functionality to the Python
class discussed in the article.

[http://php.net/manual/en/class.spoofchecker.php](http://php.net/manual/en/class.spoofchecker.php)

------
jarofgreen
> What we really want in terms of identifying users is some combination of:

> System-level identifier, suitable for use as a target of foreign keys in our
> database

> Login identifier, suitable for use in performing a credential check

> Public identity, suitable for displaying to other users

Some sites want a fourth one:

Public Identity, suitable for other users to use to refer to each other.

Like on Twitter: "Discussed this with @bob and @jane yesterday, you'll find
..."

Now, you don't need a unique username to be able to meet this requirement -
StackOverflow is an example of a site that handles this I think? But having a
unique username is a common pattern that many sites use to solve this so it
seems worth mentioning.
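
One way to picture the four roles as separate fields. This is only a sketch: the names `Account`, `login`, `display_name` and `handle` are hypothetical, and a random UUID is assumed as the system-level identifier:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Account:
    login: str          # login identifier, used only for the credential check
    display_name: str   # public identity shown to other users, need not be unique
    handle: str         # public identity other users type to refer to this account
    # System-level identifier, the target of foreign keys; never shown or typed.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


bob = Account(login="bob@example.com", display_name="Bob", handle="bob")
other_bob = Account(login="robert@example.com", display_name="Bob", handle="bob2")

# Two accounts can share a display name while staying distinct internally.
print(bob.display_name == other_bob.display_name, bob.id != other_bob.id)
```

Only `handle` would need a uniqueness constraint under this layout; the display name can change freely without touching anything that references `id`.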

------
dmitriid
With all the unicode problems I'm surprised so few languages include ICU which
has _everything_ unicode-related. Things mentioned in the article:

- Normalization: [http://userguide.icu-project.org/transforms/normalization](http://userguide.icu-project.org/transforms/normalization)

- Confusables: [http://icu-project.org/apiref/icu4j/com/ibm/icu/text/SpoofCh...](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/SpoofChecker.html)

(and there are so many more things)

------
lettergram
> Django’s auth system doesn’t enforce case-insensitive uniqueness of
> usernames

Their routing (for URLs) is also not case-insensitive. The whole framework by
default is case sensitive. Honestly, kind of annoying.

~~~
frankwiles
Honestly never really thought about it much. Wouldn’t that lead to
accidentally duplicate URLs in the eyes of things like PageRank?

~~~
ronnier
No. That’s why you embed a canonical url meta tag.

------
badsectoracula
It is certainly slower but what i always did even back around 1999-2000 when i
first learned web programming, was to simply query the user db to see if any
user exists with the requested username before even doing anything else. Also
at some point i decided to also store a "broken down" version of the username
with symbols removed, Os replaced with zeroes, etc and check against that.

Also i never allowed less than two letters and characters that weren't
numbers, latin letters, a space and a few punctuation symbols.
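
A rough sketch of such a "broken down" form in Python. The function name `broken_down` is hypothetical, and only the O-to-zero swap comes from the description above; the other look-alike substitutions are assumptions added for illustration:

```python
import re

# Assumed look-alike substitutions; only the o -> 0 swap is from the comment above.
LOOKALIKES = str.maketrans({"o": "0", "l": "1", "i": "1"})


def broken_down(username: str) -> str:
    """Lowercase, drop everything except ASCII letters and digits, fold look-alikes."""
    stripped = re.sub(r"[^a-z0-9]", "", username.lower())
    return stripped.translate(LOOKALIKES)


# Both spellings reduce to the same stored form, so the second signup is rejected.
print(broken_down("J0hn_Doe"), broken_down("john.d0e"))
```

The original username stays in its own column for display; the broken-down form is what carries the uniqueness constraint.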

------
c12
I'm currently working on a service and have put a lot of thought into about
seven tenths of what is said in this article.

This is a very good read and one I have bookmarked to share with colleagues.

------
teisman
For my website, normally I'd be in favor of allowing users to create multiple
accounts with variations of email addresses (e.g. foo@example.com,
foo+bar@example.com, f.oo@example.com). I sometimes create multiple accounts
like that myself as well.

Coincidentally, today a spammer is creating hundreds of accounts with such
variations of the same email (gmail) address -- something that should be
stopped right away.
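
A sketch of a deduplication key for that case, assuming the dot and plus rules are applied only to domains known to ignore them. The name `dedupe_key` and the `DOT_PLUS_DOMAINS` set are hypothetical, and the key is meant only for the uniqueness check, never as the address mail is sent to:

```python
# Assumed list of domains where dots and "+suffix" are known to be ignored.
DOT_PLUS_DOMAINS = {"gmail.com", "googlemail.com"}


def dedupe_key(address: str) -> str:
    """Return a key for uniqueness checks; keep the original address for delivery."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain in DOT_PLUS_DOMAINS:
        # Drop the "+suffix", then the dots, then fold case.
        local = local.split("+", 1)[0].replace(".", "").lower()
    return f"{local}@{domain}"


for addr in ("foo@gmail.com", "foo+bar@gmail.com", "f.oo@gmail.com", "f.oo+bar@example.com"):
    print(addr, "->", dedupe_key(addr))
```

Addresses at other domains pass through with only the domain lowercased, since their mail servers may treat dots and plus signs as significant.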

------
todd3834
> making usernames case-insensitive would be a massive backwards-compatibility
> break and nobody’s sure whether or how we could actually do it.

Couldn’t you store an upcase version of the username that is unique for this?
You would still keep both columns so you have the upcase version for
uniqueness and the original column for display name. This would also be
backward compatible.

~~~
todd3834
An upgrade script can find collisions and report them before you upgrade.
You’ll be required to resolve those before enabling case insensitive user
names.
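
The collision report could look something like this sketch, which assumes the existing usernames have already been loaded into a list. The function name `find_case_collisions` is hypothetical:

```python
from collections import defaultdict


def find_case_collisions(usernames):
    """Group usernames by their upcased form and return groups with more than one member."""
    groups = defaultdict(list)
    for name in usernames:
        groups[name.upper()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


existing = ["john_doe", "JOHN_DOE", "alice", "Bob", "bob"]
for key, names in find_case_collisions(existing).items():
    print(key, "is claimed by", names)
```

Once the report comes back empty, the upcased column can be populated and given a unique constraint.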

------
ne01
> Well, it’s easy until we start thinking about case. If you’re registered as
> john_doe, what happens if I register as JOHN_DOE?

For most applications you are better off with making everything NOT case
sensitive.

That's why SunSed language is completely case insensitive -- from variable
names, tags (functions) to all string comparisons. Users should not worry
about case!

~~~
leoc
Doesn't case insensitivity combine poorly with (fairly unrestricted use of)
Unicode though?
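
A small Python illustration of where it gets awkward. The helper name `comparison_key` is hypothetical, and NFKC followed by case folding is shown as one common approach, not a complete answer:

```python
import unicodedata


def comparison_key(username: str) -> str:
    """Normalize compatibility forms, then apply full Unicode case folding."""
    return unicodedata.normalize("NFKC", username).casefold()


# Plain lowercasing does not equate the German sharp s with "ss" ...
print("Straße".lower() == "STRASSE".lower())
# ... while case folding treats both spellings as the same name.
print(comparison_key("Straße") == comparison_key("STRASSE"))
# Upcasing has its own surprises: dotless i (U+0131) uppercases to a plain "I".
print("\u0131".upper() == "I")
```

So "case-insensitive" needs a precise definition once names are not limited to ASCII, and the choice of `lower()`, `upper()` or `casefold()` changes which names collide.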

------
fileeditview
I somehow lost my old HN account and had to register a new one not so long
ago. After trying a multitude of user names I surrendered and just copied
words from the browser menu bar. This is what I ended up with :)

I always wondered why we don't use emails as (unique) login names generally. I
mean they can be shown with wildcards if that is the concern?!

~~~
sethammons
I generally agree. But we've had customers who share an email address, whether
businesses or personal, who want distinct accounts on our site.

------
GoToRO
Meanwhile I haven't found a good way to know my Skype user name (or id?). My
account was migrated to Skype from Outlook I guess and it is something like
"live:username" but people can't really find me with that. I take full blame
if it turns out that I just don't know how to use Skype...

------
styfle
Somewhat of a tangent:

I somehow created a Reddit account without an email address in order to
comment on something about 8 years ago.

Eventually, I decided to comment on something else but forgot the password.

I was unable to reset the password without an email address.

So I never commented on Reddit again...I don't want a username that isn't
styfle :)

------
oscarhult
Creative usernames and Spotify account hijacking Posted on June 18, 2013
[https://labs.spotify.com/2013/06/18/creative-usernames/](https://labs.spotify.com/2013/06/18/creative-usernames/)

------
qwerty456127
I actually loved the concept of bare-number UINs used in ICQ back in the days
when ICQ and AOL dominated the messengers market. Those were neutral, easy to
share vocally, allowed mutable non-unique screen names and e-mails. That's a
pity people don't use them any more.

------
PokemonNoGo
I like how many systems have problems with surrogate-pairs and the shy hyphen
\xad. I just the other week had a support issue where I had used the very
special character of - (dash) in my email...

Also Atlassian Stride doesn't support sending the \xad character at all. It
just fails...

\xad makes me \sad
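
A short Python sketch of why the soft hyphen is troublesome for identifiers: it is invisible in most renderings but still makes two strings unequal. The helper name `strip_format_chars` is hypothetical, and dropping the whole Unicode "Cf" category is an assumption that may be too aggressive for scripts or emoji sequences that rely on joiner characters:

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    """Drop characters in Unicode category Cf (format), which includes U+00AD."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


with_shy = "john\xaddoe"
# Usually renders like "johndoe" but does not compare equal to it.
print(with_shy == "johndoe")
# After stripping format characters the two strings match.
print(strip_format_chars(with_shy) == "johndoe")
```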

------
ngneer
[https://en.wikipedia.org/wiki/Zooko%27s_triangle](https://en.wikipedia.org/wiki/Zooko%27s_triangle)

------
hendry
I don't see what is wrong with treating a email as a username.

I wonder if Auth0 or Cognito resolve any of these issues.

------
metalliqaz
This is a great article, thanks.

------
dingo_bat
> No, really, uniqueness is harder than you think

No it's very simple. Restrict usernames to ascii. Do unicode where it makes
sense.
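
As a sketch, an ASCII-only rule can be a single explicit character class. The length limits, the allowed punctuation and the name `is_allowed` are assumptions for illustration:

```python
import re

# Assumed policy: 2 to 30 characters from ASCII letters, digits, and underscore.
# An explicit class is used because \w would also match non-ASCII letters.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{2,30}")


def is_allowed(username: str) -> bool:
    """Return True only if the whole string fits the ASCII policy."""
    return USERNAME_RE.fullmatch(username) is not None


print(is_allowed("john_doe"))      # plain ASCII
print(is_allowed("j\u043ehn"))     # contains Cyrillic "о" (U+043E), a Latin "o" look-alike
```

This removes the Unicode look-alike problem, but case and ASCII look-alikes such as O and 0 still need a separate uniqueness check.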

~~~
Jaruzel
Which is great for everyone using the Latin alphabet, but what about Cyrillic
and other non western-world based languages?

~~~
badsectoracula
They can also use ascii, you do not have to use your real name (in fact i'd
recommend to _not_ use your real name - it is really concerning how people
lost the will for Internet privacy they had in the 90s) as a username. And
even if you really really want, for some reason, you can just romanize it.

~~~
isostatic
In the 90s and before privacy wasn't really a big thing -- looking at a couple
of threads from usenet in the 90s, the majority of people used their real name
(or at least I assume their real name).

~~~
badsectoracula
I do not think usenet and newsgroups are a good indicator, by the 90s -
especially mid to late 90s - when the floodgates to commercial internet fully
opened together with the fast rise of the web, people spent more time in
forums, chatrooms, etc than usenet which had more of a 80s "small community"
background (where people felt more comfortable using their real names - most
likely they already knew each other IRL considering they were usually from the
same or nearby University or organization). When it came to forums, chatrooms,
etc everyone used nicknames and even people who used both forums and usenet
often used nicknames on usenet rather than the other way around. Personally i
remember reading back in 2000 or so that it is considered a good practice to
use your real name in newsgroups (showing the real name bias it had) so that
people "take you seriously" and thinking that it was a bad idea (and not
commenting anywhere because i felt very uncomfortable with using my real name
online - sadly this eroded over time, although the last few years i am
sometimes concerned about it, but i think it is too late now and anyone can
find who i am with a single google search).

But regardless, even if people in the 90s weren't more privacy minded (which
_really_ _REALLY_ goes against my experience with any community i either was
aware of - and i use the net since 1993), this doesn't really change my
opinion that people should not use their real names online and instead they
should limit themselves to latin letters and a few symbols that cannot be
forged.

------
EGreg
Why have usernames at all?

Think about it ... people who have you in their addressbook already have
nicknames for you.

People who you let see your first and last name can already see that.

About the rest of the people - why do they care?

Only because you want your REPUTATION to be communicated to others in English.
Hey it's "someguy22"!!

Yeah that's a pretty limited thing. Could be useful but really, how often do
we remember names of others? Only celebrities. And who actually cares about
Aziz Ansari and his sex life? Or any of the other dudes who we never met? Or
why is one dude "the" Bill Gates and the others with same name are not
verified by Twitter? My point is, think about what is the sociological meaning
behind usernames.

~~~
kuschku
So, how should mail services handle emails accounts?

1004383737289302@example.tld ?

~~~
EGreg
Email John

Otherwise how did you come across John's email? SPAM?

~~~
kuschku
From his business card?

From him spelling it out to me when I asked him on transit?

I've got 100% of my contacts either by them spelling out their contact to me,
or by them entering it into my contacts app, or by me telling them my email,
and they sending an email to that.

For all these use cases, 193938939302002 is a useless identifier.

So how should I handle emails, when I want to allow people to register emails?

~~~
EGreg
Have you ever considered that verbally spelling something out may be a sign of
a problem? If you were taking someone's credit card number, would you have
them go one digit at a time over the phone, then read it back to them and so
on, or email it so you can copy and paste it? Or how about just have them
autofill a form?

Same here. John gave you a card? Cool, maybe it can have a QR code you can
scan. Or do you enjoy reading and typing long URLs and having to double check
them?

How do you handle emails, you ask? Well, how do people enter their emails?
They can visit your site and the emails get autofilled. They can bump phones
or use bluetooth or any number of ways that don't require verbally spelling
things out.

And anyway, if you haven't allowed someone to email you, why should they be
able to spam you?

~~~
kuschku
> Have you ever considered that verbally spelling something out may be a sign
> of a problem? If you were taking someone's credit card number, would you
> have them go one digit at a time over the phone, then read it back to them
> and so on, or email it so you can copy and paste it? Or how about just have
> them autofill a form?

I don’t use CCs, but I (and my parents) have memorized the Kontonummer and
Bankleitzahl (and, from that, you can concatenate the IBAN).

> Same here. John gave you a card? Cool, maybe it can have a QR code you can
> scan. Or do you enjoy reading and typing long URLs and having to double
> check them?

I visit wikipedia by typing
[https://en.wikipedia.org/wiki/<topic>](https://en.wikipedia.org/wiki/<topic>).

If a URL is designed well (HN’s aren’t), this is very easy. Wiktionary and
Wikipedia do it well, reddit’s is also okay, e.g.
[https://redd.it/7wwtqy](https://redd.it/7wwtqy) – most sites, in fact, work
nicely like this.

> Cool, maybe it can have a QR code you can scan.

Most don’t.

Maybe you remember the time, just a few years ago, when everyone knew their
phone number and email, and their friends’, by heart? Why force people to
change that? Relying on autocomplete and autofill for everything is horrible,
and creates massive network lock-in.

~~~
EGreg
Why not remember your friend's IP address by heart? Come on. No one is
"forcing" people to forget their friends' numbers. They could have always
entered it even now. They CHOSE to stop remembering new ones and just press
the thing in their contacts.

Are you one of those people who remembers every password on every site,
because you think a password manager will one day screw you?

~~~
kuschku
> Why not remember your friend's IP address by heart?

51.15.1.223 is my server, in case I need to SSH into it.

> Are you one of those people who remembers every password on every site,
> because you think a password manager will one day screw you?

I remember two of them, the password to my password manager, and the password
to the email that I’d need to reset the password to my password manager.

It’s always nice to use automated tools such as a digital contacts app or a
password manager. But you shouldn’t rely on it.

During the @googlemail.com to @gmail.com switch for German gmail addresses, I
told Google not to switch. A few years later, in '16, Google auto-switched me
anyway, so I selected "undo". In that moment, Google (probably due to a sync
bug) wiped all my stored passwords in Chrome, all my contacts, all my emails,
and my entire Calendar. On all connected Android devices as well.

It took me ages to get back to have everything working again after that,
because I relied on this technology. I’ve lost contacts to some friends that
I’ll never be able to get back, because I only had them in Google contacts, or
in Gmail.

So I won’t ever support any suggestion that would make me rely even more on
these. I’m self-hosting everything now, I’ve got backups everywhere, and, just
in case, I’ve got the most important info memorized.

~~~
EGreg
But this is an example of a remote company controlling your identity. Stored
passwords in Chrome shouldn't rely on some server. You should be using an app
(preferably open source) that has your own biometrics and passwords as seeds
from which to derive keys that can get the master key, which is stored
encrypted with those keys on YOUR computers (and maybe some cloud backups). I
think that is how Apple's keychain does it.

That's a totally different problem to remembering info of CONTACTS.

