
It's Impossible to Validate an Email Address - elliotchance
https://elliot.land/validating-an-email-address
======
gaur
What I'm about to say is more general than regex, but can online services
_please_ stop trying to validate my email address?

If I gave you an email address that you think is invalid, rest assured I did
it for a reason. I'm not an imbecile: I know how to type my address correctly
(especially when you make me type it twice). For all the imbeciles who don't
know how to type their address correctly, the phone system still works fine.

I may have given you my real email address with a plus-sign for a filter.
Don't tell me it's invalid.

I may have given you a fake email address, because I know you're just going to
spam me. If you tell me it's invalid, I'll either spend an extra few minutes
cooking up a _better_ fake email address, or I'll leave your site.

~~~
makecheck
Agreed. I find things like "youdont@needmyemail.com" usually work, and then if
they ever bother to read what’s in their database they may get a hint.

~~~
Freak_NL
_kindlysodoff@mailinator.com_ is nice, and actually works for that one
activation link you have to click.

------
tyingq
If you happen to control the web page where the user is entering the email,
this little piece of code has been a godsend for us:

[https://github.com/mailcheck/mailcheck](https://github.com/mailcheck/mailcheck)

I agree with the idea that it's impossible to validate. But, mailcheck takes
the approach of seeing if the email is potentially wrong, then prompting the
user with what it thinks they meant. It's usually right, but if not, it allows
whatever the user wants.

For example, if your user types in "user@gmil.con", it will suggest
"user@gmail.com".
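Here is a rough sketch of the suggest-but-don't-block idea, assuming a hand-picked list of popular domains. It uses Python's `difflib.get_close_matches` as a stand-in similarity measure. This is not mailcheck's API or its distance function, and `suggest_address`, `KNOWN_DOMAINS` and the 0.8 cutoff are illustrative choices.

```python
import difflib
from typing import List, Optional

# Hypothetical list of popular domains; tune it for your own user base.
KNOWN_DOMAINS: List[str] = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"]


def suggest_address(address: str, known: List[str] = KNOWN_DOMAINS,
                    cutoff: float = 0.8) -> Optional[str]:
    """Return a corrected address to offer the user, or None.

    This never rejects input: the caller shows the suggestion and then
    accepts whatever the user finally confirms.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or domain.lower() in known:
        return None
    matches = difflib.get_close_matches(domain.lower(), known, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None


print(suggest_address("user@gmil.con"))  # user@gmail.com
```

If it returns None, or the user declines the suggestion, submit the address exactly as typed.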

~~~
thedufer
This kind of stuff is great - make suggestions, but allow it to go through
even if you think it's wrong. For the last email validation I worked on, there
were only 2 absolute blockers - there must be an @ sign, and the domain must
have MX records (emails that are technically valid remain useless if we can't
send them anything).

There were a number of other checks (being close to yahoo.com or gmail.com or
other common email hosts, containing surprising characters, etc.) that would
trigger warnings, but still allow the check to pass if the user assured us it
was correct.
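Below is a minimal sketch of that two-tier approach. Hard errors block the form, and warnings only ask for confirmation. The DNS check is passed in as a function because looking up MX records needs a resolver library (dnspython is one option), which the sketch deliberately doesn't assume. `check_address`, `COMMON_HOSTS` and the character whitelist are all hypothetical.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical lists; tune both for your own users.
COMMON_HOSTS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]
UNSURPRISING = set("abcdefghijklmnopqrstuvwxyz0123456789._+-")


def check_address(address: str,
                  domain_accepts_mail: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings). Only errors block; warnings ask the user to confirm."""
    errors: List[str] = []
    warnings: List[str] = []
    local, sep, domain = address.rpartition("@")
    if not sep:
        errors.append("address must contain an @ sign")
        return errors, warnings
    # Hard blocker: the caller-supplied DNS check (MX lookup or similar).
    if not domain_accepts_mail(domain):
        errors.append(f"{domain} does not appear to accept mail")
    # Soft check: near-misses of popular hosts, e.g. gmial.com.
    lowered = domain.lower()
    if lowered not in COMMON_HOSTS and difflib.get_close_matches(
            lowered, COMMON_HOSTS, n=1, cutoff=0.8):
        warnings.append(f"did you mean a different domain than {domain}?")
    # Soft check: characters that are legal but surprising in practice.
    if set(local.lower()) - UNSURPRISING:
        warnings.append("the part before the @ contains unusual characters")
    return errors, warnings
```

In a test, `domain_accepts_mail` can be a stub such as `lambda d: d != "nowhere.invalid"`. In production it would wrap a real DNS lookup.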

~~~
tyingq
> domain must have MX records

Technically, RFC 5321 says MX records aren't required. You may be
throwing out some small number of valid emails.

_"If an empty list of MXs is returned, the address is treated as if it was
associated with an implicit MX RR with a preference of 0"_
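The fallback in the quoted passage can be sketched as a pure function over lookup results. The DNS queries themselves are left out: `mx_records` and `has_address_record` are assumed to come from whatever resolver you use. So this shows only the decision logic, not a full RFC 5321 implementation.

```python
from typing import List, Optional, Tuple


def delivery_targets(domain: str,
                     mx_records: List[Tuple[int, str]],
                     has_address_record: bool) -> Optional[List[str]]:
    """Hosts to try for `domain`, best first, or None if mail can't be routed.

    mx_records: (preference, exchange) pairs from an MX lookup, possibly empty.
    has_address_record: whether `domain` itself has an A or AAAA record.
    """
    if mx_records:
        # Lower preference values are tried first. Ties are broken by name here
        # for determinism; RFC 5321 suggests picking randomly among equals.
        return [host for _, host in sorted(mx_records)]
    if has_address_record:
        # No MX records: treat the domain as its own implicit MX (preference 0).
        return [domain]
    return None
```

A validator that insists on MX records would reject the second case, even though delivery to the domain's own address record is expected to work.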

------
makecheck
This advice should probably extend to a variety of form elements.

Like DRM and a lot of other efforts to “control” things, aggressive validators
invariably punish people who are just trying to do legitimate things. Don’t
piss off your real customers.

I used to live in a town with a 12-letter name, and more than once a form
decided that it knew the Universal Sensible Maximum Length of Town Names and
wouldn’t let me type the last couple characters. And it usually doesn’t stop
there, because once a site is incapable of storing things sensibly it
invariably starts to have trouble _matching_ things, giving errors that are
just plain stupid (e.g. “this other thing doesn’t match what you entered”,
well no shit...).

There is also too much thought put into what constitutes a person’s “name”.
Generally, to work across all possible cultures, use a single, _very long_
text field that can contain whatever the person decides to type. After
accepting their input as-is, feel free to parse it internally to support
additional database queries, but under no circumstances should your page make
any assumptions.

The real “you should be fired as database administrator” mistake though is to
store modified data _without telling the user_. This usually happens with
passwords; I use a site for months and then one day accidentally hit Return
too soon _and my password works anyway_, meaning they just CLIPPED whatever
strong password I entered and stored whatever they felt like (usually 8
characters). NEVER do things like that without telling the user.

~~~
Freak_NL
It is 2016. We should know better by now than to clip passwords _at all_.
Sure, place a limit of 512 characters on it to prevent abuse, and for security
reasons there can be a number of _minimal_ requirements for length and
complexity, but please let me be the one to decide if 32 characters is
sensible or not.
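Here is a sketch of length checks that reject input rather than silently clipping it, using the 512-character ceiling suggested above. The 12-character minimum is an arbitrary placeholder. It is not a recommendation from the comment or from any standard.

```python
MIN_LENGTH = 12   # placeholder minimum; pick your own policy
MAX_LENGTH = 512  # generous ceiling that only exists to prevent abuse


def check_password(password: str) -> str:
    """Return the password unchanged, or raise ValueError with a clear message.

    The point is what this does NOT do: it never truncates. Whatever passes
    should be hashed in full, so a prefix of it will not log in.
    """
    if len(password) < MIN_LENGTH:
        raise ValueError(f"Password must be at least {MIN_LENGTH} characters.")
    if len(password) > MAX_LENGTH:
        raise ValueError(f"Password must be at most {MAX_LENGTH} characters.")
    return password
```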

------
educar
> One more interesting tidbit is if you use unique sub-addresses for each of
> the sites you sign up to you will be able to see when someone, or rather
> who, sells your email to someone else... Busted!

Can't the spammers simply strip the subaddress/label after the '+'?

~~~
MatthaeusHarris
They can.

They don't.

It's extra effort for them for nearly zero marginal gain.

~~~
_asummers
Some sites, however, will just ban + on registration. I've seen registration
allow + but login disallow it (and occasionally different password lengths,
wtf?), though I can't name any offhand.

------
herge
I once heard the story of a man who helped Aruba set up their DNS (.aw) in the
late '90s. In exchange, as part of his compensation, he asked for an email
address at the top-level domain, and received something like js@aw, which is a
perfectly functional email address, but trips up a lot of validators.

~~~
marcosdumay
It isn't a valid address. TLDs must not resolve, so it should be impossible to
make a server handle it (yet, it is mostly possible, because most DNS servers
do not completely implement the RFCs - still, there's no guarantee it will
work on every network).

~~~
Symbiote
The first one I found that does resolve: [http://ai./](http://ai./)

It has an MX record too. There is nothing wrong with this.

~~~
breakingcups
It is most definitely wrong, if your definition of wrong includes disallowed /
not recommended by ICANN and IAB.

What is wrong with it is, amongst other things, the real-world possibility of
colliding with internal hostnames.

------
mcv
If you were to ask me for a regex, I'd say /.+@.+/.

That's the easiest and most accurate way to do it by regex. Sure, some invalid
addresses may still get accepted, but that is unavoidable. Even the most
thorough validation[0] is going to accept nonexistent addresses.

[0] Except those that validate by sending a mail to it. Sending an email is
the only way to be sure.
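In Python, the permissive pattern might be applied like this. `fullmatch` anchors it to the whole input. The bare /.+@.+/ is unanchored, so with a search function it would also accept longer text that merely contains something@something on one line. `looks_like_email` is a hypothetical helper.

```python
import re

# ".+" requires at least one character on each side of an "@".
# "." does not match a newline unless re.DOTALL is set.
PERMISSIVE = re.compile(r".+@.+")


def looks_like_email(text: str) -> bool:
    return PERMISSIVE.fullmatch(text) is not None


for candidate in ["js@aw", '"b b"@example.com', "no-at-sign", "@example.com"]:
    print(candidate, looks_like_email(candidate))
```

Anything that passes still needs the confirmation email to prove it exists.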

~~~
LinuxBender

        # get email addresses
        grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
    
        # censor email addresses
        sed -E 's/(<)[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,4}(>)/\1--removed--\2/g'
    

If email doesn't meet those, I drop them on the floor. Then again, I drop
email on the floor for lesser reasons.

~~~
Symbiote
So you're blocking email from your local .museum, from anyone who has an Irish
name like O'Connor, all international TLDs, and most newer TLDs.

~~~
LinuxBender
Yes.
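The examples in that exchange can be checked directly against a Python port of the grep pattern above. The port is anchored to the whole address instead of using `\b`, so it tests "does the entire address meet the pattern", which is how the filter is described. The example addresses are made up.

```python
import re

# The grep pattern above, minus the \b anchors; fullmatch checks the whole string.
# With IGNORECASE | ASCII, [A-Z] matches ASCII letters only, and {2,4} caps the TLD.
GREP_PATTERN = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}",
                          re.IGNORECASE | re.ASCII)


def meets_pattern(address: str) -> bool:
    return GREP_PATTERN.fullmatch(address) is not None


examples = {
    "jane@example.com": "ordinary address",
    "curator@local.museum": "six-letter TLD",
    "o'connor@example.ie": "apostrophe in the local part",
    "info@example.xn--p1ai": "internationalized TLD in punycode form",
    "hello@example.photography": "long new gTLD",
}
for address, note in examples.items():
    print(f"{meets_pattern(address)!s:5}  {address}  ({note})")
```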

------
haddr
Let's recall the famous quote from Jamie Zawinski: "Some people, when
confronted with a problem, think 'I know, I'll use regular expressions.' Now
they have two problems."

I ignored this quote for a long time, until I started using regular
expressions in real projects...

~~~
drivingmenuts
I never really learned how to write regex. I can write simple ones with the
use of a tool to help me figure out what I need to write (and a cheatsheet to
explain the terms).

I realize they could save me some future potential grief, but I'm usually more
concerned with the present actual grief they cause me.

I feel like I'm letting down the side.

~~~
cyphar
If you start writing a lot of Unix pipelines or use Vim extensively, you start
to get a very strong working knowledge of regular expressions (awk can solve
basically any problem, but it creates at least twice as many as it solves :P).

------
jukenim
This checks whether an email address follows RFC 5322 by parsing it rather
than with a regex:
[https://github.com/jackbowman/email-addresses](https://github.com/jackbowman/email-addresses)

~~~
kps
Yes! The title is misleading; it's not at all difficult to syntactically
validate an email address; it's just not possible _using regular expressions_
(HE COMES).
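To illustrate the point, here is a small hand-written checker for a subset of the RFC 5322 addr-spec: a dot-atom or quoted-string local part, followed by a dot-atom domain. It is a sketch, not the linked library's API. Comments, folding whitespace, domain literals and obsolete forms are all left out, and it checks syntax only, not deliverability.

```python
import string

# atext from RFC 5322 section 3.2.3: letters, digits, and these symbols.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom(text: str) -> bool:
    """dot-atom-text = 1*atext *("." 1*atext); no empty parts, no stray dots."""
    return all(part and set(part) <= ATEXT for part in text.split("."))


def is_quoted_string(text: str) -> bool:
    """A quoted string without folding whitespace: printable ASCII, spaces and
    tabs between double quotes, with a backslash escaping the next character."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        return False
    body, i = text[1:-1], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # quoted-pair: a backslash must be followed by VCHAR or WSP.
            if i + 1 == len(body) or not (" " <= body[i + 1] <= "~" or body[i + 1] == "\t"):
                return False
            i += 2
        elif ch == '"' or not (" " <= ch <= "~" or ch == "\t"):
            # An unescaped quote would end the string early; controls are not allowed.
            return False
        else:
            i += 1
    return True


def is_addr_spec(address: str) -> bool:
    # Split on the LAST "@", since a quoted local part may contain "@" itself.
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False
    return (is_dot_atom(local) or is_quoted_string(local)) and is_dot_atom(domain)
```

Because this grammar doesn't require a dot in the domain, it accepts addresses like js@aw, and it accepts `"b b"@example.com` while rejecting `b b@example.com`.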

------
jjp
The article's title would be more complete if it said "by regex".

~~~
ketralnis
That's what the article describes, but validating an email address by sending
to it is also hard if you want time bounds. My mail server implements
greylisting, and it's frequently difficult for me to verify my email address on
services that send me tokens that expire. Greylisting typically delays mail by
only 10 minutes or so, but there are plenty of times when mail servers are down
for extended periods, a quota or disk is full, an intermediate mail router is
down, or any number of other problems occur.
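One way to make time-bounded verification kinder to greylisted or temporarily unreachable servers is simply to use a long expiry. Below is a sketch of an HMAC-signed, timestamped token built from Python's standard library. The secret handling, token format and 48-hour lifetime are all assumptions for illustration.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # assumption: a real deployment loads a persistent secret
TOKEN_LIFETIME = 48 * 3600            # seconds; generous so delayed mail still arrives in time


def _sign(email: str, issued: int) -> str:
    message = f"{email}|{issued}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(email: str, now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    return f"{issued}.{_sign(email, issued)}"


def verify_token(email: str, token: str, now: Optional[float] = None) -> bool:
    issued_text, _, signature = token.partition(".")
    try:
        issued = int(issued_text)
    except ValueError:
        return False
    current = time.time() if now is None else now
    fresh = 0 <= current - issued <= TOKEN_LIFETIME
    # Compare as bytes in constant time to avoid leaking how much matched.
    return fresh and hmac.compare_digest(signature.encode(), _sign(email, issued).encode())
```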

------
Kequc
It seems there are weird things you can put in an email address that nobody
actually uses; as a result, what is used and considered to be an email address
has matured. If you create a weird email address, in practice you'll be less
able to use it.

The weirder it is the fewer web forms or software you'll successfully put it
into.

I think we can just say no, functionally, you cannot put comments or
additional @ symbols into your email address. It hasn't worked for so long
that people know you just aren't supposed to do it. I'd be surprised if you
were allowed to create such a thing when signing up for Bing, for example. You
probably need to be the administrator of some chaotic UNIX server with full
DNS control in order to force it to happen at this point.

Even Google Chrome's built-in email field validation doesn't allow you to do
it.

I shouldn't be expected to jump through the hoops necessary in order to allow
"technically valid" email addresses that someone went out of their way to
make, when I could more easily suggest they use a normal one.

~~~
Spivak
> I shouldn't be expected to jump through the hoops necessary in order to
> allow "technically valid" email addresses that someone went out of their way
> to make, when I could more easily suggest they use a normal one.

I really hope you don't work on anything important if your stance is, "I
shouldn't be expected to implement specifications correctly because it's
easier to only implement part of it."

Why are we even having this discussion? Implement it correctly once, put it in
a library and never worry about it again. You don't have to jump through any
hoops, you're only making more work for yourself by implementing the standard
incorrectly and then having to deal with customers that think that the IETF
standard is more valid than _your personal definition_ of what an email
address should be.

If your code is passed down the line and eventually hits someone who writes
unit tests for actual valid email addresses then your name is going to come up
on the git blame when it fails.

~~~
Kequc
Even Gmail, what I would consider a gold standard, only allows letters,
numbers, and periods. So when you're done accusing me of not being someone
viable for a position anywhere important maybe you should send a message
outlining the same argument to Google.

~~~
Spivak
Huh, Gmail completely implements RFC 5321 and you can send and receive email
from any valid address, even from their web client, _and_ they validate the
email address.

I stand by my position, failing to implement the spec correctly should be
considered an error even when Google does it.

If you want a compromise, how about printing "we don't support email addresses
with X" if you want to be picky for spam detection, simplicity, or something
else, rather than "this email address isn't valid"? Google does this for some
valid-but-not-accepted addresses on their signup page, but it's not
sophisticated enough to catch everything.
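That compromise amounts to keeping "unsupported" separate from "invalid" in the message. Here is a sketch in which the rules in `UNSUPPORTED` are illustrative examples of a site's own policy, not anything the RFCs forbid. `policy_message` is a hypothetical name.

```python
from typing import Callable, List, Optional, Tuple

# Illustrative site policy: each rule is (predicate on the local part, description).
UNSUPPORTED: List[Tuple[Callable[[str], bool], str]] = [
    (lambda local: local.startswith('"'), "quoted local parts"),
    (lambda local: "(" in local or ")" in local, "comments"),
]


def policy_message(address: str) -> Optional[str]:
    """Explain a policy refusal honestly instead of calling the address invalid."""
    local = address.rpartition("@")[0]
    for is_unsupported, description in UNSUPPORTED:
        if is_unsupported(local):
            return f"Sorry, we don't support email addresses with {description}."
    return None
```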

------
steventhedev
It's easy to validate that the syntax is correct. The problem lies in what
you're trying to do with those addresses. If you're importing a mailing list
archive, chances are a syntactical check is the only one you can do, because
half the domains for older lists don't exist anymore, and most of the
mailboxes won't.

If you want to send email to that address, you're probably going to want
something that can suggest gmail as a replacement for gmial. You can also
check that the domain exists and has an MX record. If you run your own mail
server you can probably even check that the mailbox exists...

If you want emails to be unique, you'll need to apply per-site logic like
Gmail's optional dots, and strip the + segments. That's important if you're
combining multiple lists of emails, or importing an existing mailing list for
a user.
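Here is a sketch of such a comparison key. It assumes '+' starts a tag on every domain, which is common but not universal, and it applies the dot-ignoring and case-folding rule only to gmail.com. `dedup_key` is a hypothetical helper. Store the address exactly as entered and use the key only for matching.

```python
def dedup_key(address: str) -> str:
    """Comparison key for spotting duplicate addresses; never send mail to it."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()              # domain names are case-insensitive
    local = local.split("+", 1)[0]       # assumption: "+" introduces a sub-address tag
    if domain == "gmail.com":
        # Gmail ignores dots and letter case in the local part.
        local = local.replace(".", "").lower()
    return f"{local}@{domain}"


print(dedup_key("John.Smith+news@Gmail.com"))  # johnsmith@gmail.com
```

Local parts on other domains keep their case, because RFC 5321 leaves that to the receiving server.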

The gist is the real world is complicated, but you can pretty easily set up
something that handles 90% of it.

------
vostok
Wouldn't it be reasonable to have a sanity check that can be bypassed by the
user? It is very likely to be a mistake if there's no full stop in the
address, but there are exceptions [0]. I would like to see a warning if I
accidentally type vostok@examplecom instead of vostok@example.com.

[0] [https://mail.gnome.org/archives/evolution-list/2002-January/...](https://mail.gnome.org/archives/evolution-list/2002-January/msg00466.html)

~~~
walterstucco
Your first example is a valid email address.
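Below is a minimal sketch of the bypassable check described above. A dotless domain such as examplecom is syntactically valid, so it produces a warning that the user can override, never an error. `soft_warnings` and `accept` are hypothetical names.

```python
from typing import List


def soft_warnings(address: str) -> List[str]:
    """Possible typos worth a 'did you mean that?' prompt; never grounds to reject."""
    domain = address.rpartition("@")[2]
    warnings: List[str] = []
    if "." not in domain:
        warnings.append(f"'{domain}' has no dot in it. Is that what you meant?")
    return warnings


def accept(address: str, user_confirmed: bool) -> bool:
    # Let the address through if nothing looks odd, or if the user insists.
    return user_confirmed or not soft_warnings(address)
```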

------
drdeadringer
I remember reading the opinion that one need only verify that an email address
is an email address by answering, "Does it have an '@'?". Yes? Email address.
No? Try again.

Perhaps it is worth asking a question one step above email verification: how
much responsibility does/should the user have, and how much does/should the
designer have, in ensuring that the user's email address is valid?

------
coldtea
With a regex/parser maybe -- but it's very easy to require "activation" from
said email address as a verification.

------
dhoerl
I added a comment to the author's article. You can construct a regex to process
every valid email address except those with nested comments (a feature no one
in the real world ever used):
[https://github.com/dhoerl/EmailAddressFinder](https://github.com/dhoerl/EmailAddressFinder)

------
leesalminen
I've been using Mailgun's validator [0] for a while now and have been quite
pleased. It catches common typos and validates DNS on the host name.

[0] [https://documentation.mailgun.com/api-email-validation.html](https://documentation.mailgun.com/api-email-validation.html)

------
nradov
For anyone who would like to test their email address validation code, I wrote
a fuzzer which can generate syntactically valid addresses (among other
things).

[https://github.com/nradov/abnffuzzer](https://github.com/nradov/abnffuzzer)
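Here is a toy version of the same idea, far simpler than the linked project. It generates random addresses from the dot-atom branch of RFC 5322's addr-spec, which can then be fed to the validator under test. Any rejection is a false negative. `random_addr_spec` is hypothetical and covers only that one branch, with no quoted strings, comments or domain literals.

```python
import random
import string

# atext from RFC 5322 section 3.2.3.
ATEXT = string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~"


def random_dot_atom(rng: random.Random, max_atoms: int = 3, max_len: int = 8) -> str:
    """Random text matching dot-atom-text = 1*atext *("." 1*atext)."""
    atoms = [
        "".join(rng.choice(ATEXT) for _ in range(rng.randint(1, max_len)))
        for _ in range(rng.randint(1, max_atoms))
    ]
    return ".".join(atoms)


def random_addr_spec(rng: random.Random) -> str:
    # Only the dot-atom "@" dot-atom branch of addr-spec.
    return f"{random_dot_atom(rng)}@{random_dot_atom(rng)}"


if __name__ == "__main__":
    rng = random.Random(2016)  # fixed seed so failures are reproducible
    for _ in range(5):
        print(random_addr_spec(rng))
```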

~~~
kristianp
Is there ABNF for email addresses in an RFC?

~~~
nradov
Yes, RFC 2822. Look for the addr-spec rule.

[https://tools.ietf.org/html/rfc2822#section-3.4.1](https://tools.ietf.org/html/rfc2822#section-3.4.1)

------
X86BSD
This brought me back to how Postfix has been handling this all these years.

[http://www.postfix.org/ADDRESS_VERIFICATION_README.html](http://www.postfix.org/ADDRESS_VERIFICATION_README.html)

------
voidz
I think that spaces are also valid in email addresses. So, even <bilbo
brouha@example.com> would be a valid email address in that case...

~~~
marcosdumay
They are valid within quotes, so that <"b b"@example.com> is valid, but <b
b@example.com> isn't.

Almost anything is valid within quotes, but quotes cannot appear everywhere.

