
Email Regex that works 99.99% - chenster
http://emailregex.com/
======
tom-lord
I believe this falls under the category of "things that may be fun to play
around with but should never be used in a real system".

Unfortunately, I bet there are thousands of "real systems" employing regexes
like this... How many problems does this solve? Probably zero. How many does
it (or will it) cause? Probably many more than zero.

~~~
lucaspiller
Not sure why this has been downvoted, but yes, these sorts of things do cause
more problems than they solve.

Case in point, at work the other day I found a bug in a service I manage. It
consists of a front end form (built by one team), which submits data to
another system (built by my team), which then passes the data to a third
party. The third party was rejecting the data we were trying to send them as
the email addresses were apparently invalid. The validation they were doing
didn't match the validation the front end form did, so to the user everything
seemed fine.

~~~
gdrulia
So what caused the problem wasn't that the front end and the third party were
checking whether the email was valid, but the fact that there is no standard
way of checking it.

I'm not saying that checking an email address with a regex is a good way of
doing it, but there are countless examples of people doing it, and it would
help a lot if everyone used one standardized regex for that.

~~~
tom-lord
As others have suggested, the best "standardised regex" solution would just be
something like:

    /.*@.*/

Some people aren't happy with this because it allows invalid emails to be
entered. But the alternative - such as what's being attempted here - is a
nightmare!

Even _if_ there were a standardised regex for all emails, one that would not
break when new TLDs are released, or new Unicode characters are supported, or
whatever, there would still be no guarantee that _someemailaddress@domain.com_
is actually valid!

Your best bet is to simply allow users to enter anything (with perhaps some
_minor_ regex check like I did above), perhaps ask them to enter it twice, and
perhaps send an account validation email.
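A minimal sketch of that advice in Python, assuming a signup form with two email fields (the function name and messages are hypothetical). It uses `.+@.+`, a slightly stricter cousin of `/.*@.*/` that also rejects a bare "@", and leaves the real validation to the confirmation email.

```python
import re
from typing import List

# Slightly stricter than /.*@.*/: at least one character on each side of an "@".
MINIMAL_EMAIL = re.compile(r".+@.+")


def check_signup_email(first_entry: str, second_entry: str) -> List[str]:
    """Return user-facing problems; an empty list means "go ahead and send
    the confirmation email", which is the real test of the address."""
    first, second = first_entry.strip(), second_entry.strip()
    problems = []
    if first != second:
        problems.append("The two email addresses don't match.")
    if not MINIMAL_EMAIL.fullmatch(first):
        problems.append("That doesn't look like an email address (missing '@'?).")
    return problems


print(check_signup_email("me@example.com", "me@example.com"))  # []
print(check_signup_email("Jane Doe", "Jane Doe"))
```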

------
falcolas
After extensive research, I have finally come up with a way to improve upon
99.99%. I have come up with a regex which will work for 100% of email
addresses, and a significant number of regex engines, as an added bonus!
Behold:

    .*@.*

/s

~~~
moe

      $ nc localhost 25
      220 uhura.z ESMTP
      MAIL FROM:me
      250 2.1.0 Ok
      RCPT TO:root
      250 2.1.5 Ok
      DATA
      354 End data with <CR><LF>.<CR><LF>
      Look ma, no @!
      .
      250 2.0.0 Ok: queued as 8DB653E0065
      quit
      221 2.0.0 Bye

~~~
falcolas
/bow

.* it is!

------
peteretep
You know the different languages match different sets of email addresses,
right? The reason the Perl ones are so much longer is that they work for _all_
RFC 5322 addresses, whereas the JS one matches a subset.

------
michaelmcmillan
I simply don't validate emails up front anymore. The only thing I check is
whether the string contains an @ character, and I only do that to be nice in
case it's left out by accident. Instead of having a monstrous regex pattern in
my code, I simply email a confirmation link the user must click.
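One way to build such a confirmation link without storing anything server-side is to sign the address with an HMAC. This is a sketch under assumptions: `SECRET_KEY` and the `https://example.com/confirm` URL are placeholders, and a production version would also want an expiry timestamp inside the signed data.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical placeholder; load a long random secret from configuration instead.
SECRET_KEY = b"replace-with-a-long-random-secret"


def make_token(email: str) -> str:
    """Sign the address so a link can't be forged for a different address."""
    mac = hmac.new(SECRET_KEY, email.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac).decode("ascii").rstrip("=")


def confirmation_link(email: str) -> str:
    # Hypothetical endpoint; urlencode escapes the "@" in the address.
    query = urlencode({"email": email, "token": make_token(email)})
    return "https://example.com/confirm?" + query


def confirm(email: str, token: str) -> bool:
    """Called when the link is clicked; constant-time comparison of tokens."""
    return hmac.compare_digest(make_token(email).encode("ascii"), token.encode("utf-8"))


print(confirmation_link("me@example.com"))
```

Clicking the link is what proves the address can receive mail; the token only ensures the click belongs to the address it was sent to.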

~~~
hobs
That's the thing, the only way to validate an email address anyway is to
actually have the user take some action to do so. Otherwise, I would question
how important setting up the email thing is in the first place.

------
jimsmart
Jeez, that's 100 email addresses in a million that it won't work on. Plus it's
a pain in the butt. [Edit: Though I suspect that 99.99% figure was made up]

Just get people to enter their email twice (which filters out most mistakes,
like people entering their names or somesuch), and don't validate it with a
regex. During the signup process, make sure you tell them to expect an email
which they must confirm before they are added / before their account is
activated. Send a confirmation email with a clickable link. If people don't
get it, and the service is important to them, they'll try again or contact you
through another means.

(I was involved with the running of a mailing list with well over 1m
double-opt-in subscribers. Fewer than 100 of these turned out to be invalid
[Edit: yeah, that's a guess, like the OP's 99.99%], and we dealt with them
easily at our end by properly handling any bounces.)

------
bshimmin
Utterly pointless. An email regex tells you that the email address (probably)
conforms to a pattern that means it might be a valid email address (for now,
until new weird TLDs emerge and the patterns have to change...), but it has no
way of telling you whether that address can actually receive mail. `foo@bar`
fails these regular expressions and `foo@bar.invalid` passes them, but neither
will receive mail.

As I have told people for many years: if you must do this, check at most if
there's an @ and (perhaps) a dot somewhere after the @, which is enough to
stop someone who has accidentally put their name in the email address field,
or a similar user error. Anything else is a waste of brainwidth and will
result in more problems than it solves.
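bshimmin's rule of thumb as a Python sketch (the function name is hypothetical, and requiring at least one character before the "@" is an added assumption). It splits on the last "@", since a quoted local part may itself contain one, and it reproduces the limitation described above: `foo@bar.invalid` passes even though it can never receive mail.

```python
def looks_like_email(address: str) -> bool:
    """An '@' with something before it, and a '.' somewhere after it."""
    at = address.rfind("@")  # last '@': a quoted local part may contain another
    if at <= 0:              # no '@' at all, or nothing before it
        return False
    return "." in address[at + 1:]


print(looks_like_email("foo@bar.invalid"))  # True
print(looks_like_email("foo@bar"))          # False
```

The dot check is a trade-off: it also rejects addresses like `user@domainwithnodots`, which are RFC-valid but in practice are usually typos.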

~~~
PeterWhittaker
Completely useful, as part of a two-step process:

1\. Filter with the regex - what's left has a valid format, making step 2 much
saner.

2\. Extract and validate the domain name - super simple now, because the
domain component is known to be sane.

(Optional but good idea 3: Handle exceptions....)

Step 1 is almost always the hardest part; now it's mostly done.
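A sketch of steps 1 and 2 in Python, assuming a deliberately simple stand-in for the step-1 regex (not the one from the site). Step 2 here only checks that the domain is syntactically a hostname; an actual DNS lookup, as later comments describe, would come after it. Internationalized domains would need converting to punycode first, since non-ASCII labels are rejected.

```python
import re
from typing import Optional

# Step 1 stand-in: a hypothetical, deliberately simple format filter.
FORMAT_FILTER = re.compile(r"[^@\s]+@[^@\s]+")

# One hostname label: 1-63 letters, digits or hyphens, no leading/trailing hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def extract_domain(address: str) -> Optional[str]:
    """Run step 1, then step 2; return the normalized domain, or None."""
    if not FORMAT_FILTER.fullmatch(address):        # step 1: sane overall shape
        return None
    domain = address.rsplit("@", 1)[1].rstrip(".")  # drop a trailing root dot
    if not domain or len(domain) > 253:
        return None
    if not all(LABEL.fullmatch(label) for label in domain.split(".")):
        return None                                 # step 2: hostname syntax
    return domain.lower()


print(extract_domain("me@Example.COM"))  # example.com
```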

~~~
Piskvorrr
The assumption here is "0\. create a regex that does not have false
negatives". You seem to be taking the author's word for the "99.9% Works" part
- based on what? (Exactly.)

~~~
PeterWhittaker
Correctness and utility are two different things, feel free to investigate the
former whilst we discuss the latter....

~~~
Piskvorrr
Well thank you, I doubt I would have come to such a conclusion by myself. Are
you saying that "turn away 0.1% of your customers" is a _useful_ approach?
Well, your call, I guess.

~~~
PeterWhittaker
Whoa, there: You're shifting your argument. Are you now accepting that it is
99+% correct? Your original objection was as to the correctness of the regex.

If it is 80% good, then, yeah, your underlying assertion is likely good: Too
little coverage to be all that useful. But 99+% coverage? For a quick pass
filter? That's pretty good, we can work with those numbers.

After all, one could have multiple layers of progressively more accurate but
progressively slower filters. Not an uncommon or unusual approach.

Besides, the JS version could be very useful as a quick pass usability check,
validating whether the user made a silly mistake. And adding some logic to
record the bogus entry, compare it to what they write next, pass it to the
server if need be, can all be very useful.

------
babo
This would be valuable only with a proper test suite: nothing fancy, just two
files with valid and invalid addresses. I don't trust these; a complex regex is
very hard to debug, and it's way easier to argue about test cases.
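A sketch of what that could look like in Python, with the two "files" inlined as lists. The candidate pattern and the sample addresses are illustrative assumptions, not an authoritative corpus. False negatives are reported separately because those are real users being turned away.

```python
import re

# Hypothetical candidate under test; swap in whichever regex you're evaluating.
CANDIDATE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# In practice these would be two files, one address per line.
SHOULD_PASS = [
    "simple@example.com",
    "user+tag@example.com",
    "firstname.lastname@example.co.uk",
    "x@example.su",
]
SHOULD_FAIL = [
    "Jane Doe",
    "missing-at-sign.example.com",
    "two@@example.com",
    "trailing@",
]


def evaluate(pattern):
    """Return (false_negatives, false_positives) for a compiled pattern."""
    false_negatives = [a for a in SHOULD_PASS if not pattern.fullmatch(a)]
    false_positives = [a for a in SHOULD_FAIL if pattern.fullmatch(a)]
    return false_negatives, false_positives


print(evaluate(CANDIDATE))  # ([], [])
```

For example, a pattern with a narrow local-part class such as `[a-z.]+@[a-z.]+\.[a-z]{2,3}` reports `user+tag@example.com` as a false negative, the same "+" rejection complained about further down the thread.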

------
chrisfarms
Yeah this is not great practice.

Some time ago we built an app where sending a validation email upfront was
not a practical option, and the best strategy I found for ensuring the email
was valid was to look up the MX records and ask the mail server for the given
domain by issuing RCPT commands. Many mail servers will just drop the
connection when the RCPT is for someone who doesn't exist or can't be routed
to, which was a good indicator of a typo or an invalid address. And of course,
if the MX lookup fails, the domain is incorrect.

I still wouldn't really recommend this method either, though.

~~~
IgorPartola
Did you remember to fall back on an A and AAAA lookup if there is no MX record?
That is what you are supposed to do.
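The fallback IgorPartola describes is the "implicit MX" rule in RFC 5321 (section 5.1): if a domain has no MX records, the domain itself is treated as its own mail host. A sketch of that selection logic in Python, assuming a hypothetical `resolve(name, record_type)` callable so it runs without network access; in practice you would wrap a real DNS library to fit that interface.

```python
from typing import Callable, List

# Hypothetical resolver interface: resolve(name, record_type) -> list of answers.
# MX answers are (preference, exchange) pairs; A/AAAA answers are address strings.
Resolver = Callable[[str, str], list]


def mail_hosts(domain: str, resolve: Resolver) -> List[str]:
    """Hosts to try for delivery, most preferred first; [] means undeliverable."""
    mx = resolve(domain, "MX")
    if mx:
        return [host for _, host in sorted(mx)]  # lower preference value first
    # No MX records: the domain itself acts as an implicit MX,
    # provided it has an address record.
    if resolve(domain, "A") or resolve(domain, "AAAA"):
        return [domain]
    return []


# Fake DNS data so the sketch runs offline (192.0.2.1 is a documentation address).
FAKE_DNS = {
    ("example.com", "MX"): [(20, "backup.example.com"), (10, "mx.example.com")],
    ("a-only.example", "A"): ["192.0.2.1"],
}


def fake_resolve(name: str, record_type: str) -> list:
    return FAKE_DNS.get((name, record_type), [])


print(mail_hosts("example.com", fake_resolve))  # ['mx.example.com', 'backup.example.com']
```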

------
h1fra
The only way to check an email address:

Look for an '@', then parse the response log from your mail provider to know
whether the address works.

------
imron
So in 1 million emails, there are 10,000 valid emails that will be rejected. I
guess if you use this regex then you'd better hope your service doesn't become
popular.

~~~
ThePadawan
Your math is off by a factor of 100.
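The arithmetic behind that correction, as a small Python check using exact fractions to avoid floating-point noise: rejections = total * (1 - success rate). At 99.99%, a million addresses means 100 rejections, in line with jimsmart's figure above; 10,000 would correspond to a 99% success rate.

```python
from fractions import Fraction


def expected_rejections(total: int, success_rate_percent: str) -> Fraction:
    """Valid addresses wrongly rejected = total * (1 - success rate)."""
    rate = Fraction(success_rate_percent) / 100
    return total * (1 - rate)


for rate in ("99.99", "99.9", "99"):
    print(rate, int(expected_rejections(1_000_000, rate)))
# 99.99 100
# 99.9 1000
# 99 10000
```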

~~~
Piskvorrr
Assuming that the 99.99 figure is not something the author pulled out
of...thin air. In absence of any data, I'm inclined to believe that the figure
was gained by this very method.

Edit: Oh look, the submitter has pulled yet another figure out of thin air
(author says "three nines, trust me," submitter says "four nines,
omgwtfbetterthanslicedbread!!!1!"). Suspiciouser and suspiciouser.

------
shdon
Title here has an extra 9, compared to what's on the site. In either case, how
would one back up those percentages anyway?

------
steventhedev
Nice idea. But a blind implementation of the grammar set out in the RFC is not
performant. It's better to drop the obsolete syntax rules and folding white
space. I have yet to see a user legitimately try to input an email with a
comment mid-domain.

I implemented this in Ruby a while back, but I also went a step further and
added a DNS check for an MX record. That way you can ensure there's a mail
server to receive the email. Heck, I even wrote a blog post about it.

[http://stevenkaras.github.io/blog/verifying-email-addresses](http://stevenkaras.github.io/blog/verifying-email-addresses)

We've had pretty good feedback so far, but I've also spotted a few emails that
people enter that are clearly fake (e.g. asdf at test.com).

------
lutoma
Meh. I wish people would just give up on trying to validate email addresses
altogether (except for maybe basic stuff like checking for an @). They'll
almost always forget about some edge case.

I use a pretty unusual TLD (.su, the old TLD for the Soviet Union, which
still remains in the root zone), and from time to time, I come across a site
that won't accept my email address because of that. Most of those sites turn
out to be generally crappy though, so not much of a loss…

Also, many sites don't accept + in email addresses, which is annoying as hell
if you want to use the address extension feature of Postfix et al.
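For context on why rejecting "+" hurts: to the RFC, "+" is an ordinary local-part character, and only the receiving server (for example Postfix with its `recipient_delimiter` parameter set) treats it as separating the user from an extension. A Python sketch of that split, assuming "+" as the delimiter; the function is hypothetical and shows what happens at the mail server, not something a signup form needs to do.

```python
from typing import Optional, Tuple


def split_subaddress(address: str, delimiter: str = "+") -> Tuple[str, Optional[str], str]:
    """Split 'user+tag@domain' into ('user', 'tag', 'domain'); tag is None if absent."""
    local, _, domain = address.rpartition("@")
    user, sep, tag = local.partition(delimiter)
    return user, (tag if sep else None), domain


print(split_subaddress("lutoma+shop@example.su"))  # ('lutoma', 'shop', 'example.su')
```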

------
exratione
Doesn't work on the dreaded "I am a terrible but nonetheless valid email
address"@example.com.

I echo the advice of everyone else - validate with something very simple like
.+@.+ and then by sending an email. Trying to recapture the complexity of the
email system via a tool like regular expressions is tilting at windmills. It's
like trying to develop a regular expression to determine whether a name is
real or not.
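A quick Python illustration of that point. The permissive `.+@.+` accepts the quoted-local-part address, while a typical careful-looking pattern (a hypothetical example, not one of the site's regexes) rejects it.

```python
import re

PERMISSIVE = re.compile(r".+@.+")  # the simple check suggested above
# Hypothetical "careful-looking" pattern of the kind the thread warns against.
NAIVE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

quoted = '"I am a terrible but nonetheless valid email address"@example.com'

print(bool(PERMISSIVE.fullmatch(quoted)))  # True
print(bool(NAIVE.fullmatch(quoted)))       # False
```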

[https://www.exratione.com/2012/09/what-constitutes-an-accept...](https://www.exratione.com/2012/09/what-constitutes-an-acceptable-email-regex/)

------
quailman
Note that these regexes aren't even matching the same thing, so who knows
_what_ these things are matching. Whatever it is, it's probably not 99.99% of
the world's emails, and either way nobody is going to check that. _Especially_
nobody is going to parse that Perl beast.

As a tangential anecdote, I always thought it would be interesting to drop a
backdoor into some canonical piece of code like this that noobs are bound to
copy-paste. It might be the most efficient way to worm your way into the
largest number of computers worldwide.

~~~
dfhoughton
There are two Perl regexes. The beast is for 5.8 (check your Perl version;
it's probably above 5.12). The other is basically a BNF grammar and is trivial
to parse. The only easier ones are those that throw up their hands and just
look for an @ with characters before and after.

------
ricardobeat
These have wildly different behaviour. The .NET and JavaScript ones are even
interchangeable (both are valid JS). They also won't match internationalized
domain names unless those are converted to punycode before validation.

IMO you either implement the RFC, or use the absolute dumbest validation
possible: 1) there is one '@' character present 2) there is at least one dot
on the right side. Anything else will exclude some valid addresses, and you're
unlikely to ever hear feedback/complaints from someone who had their sign-up
email rejected.
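A sketch of ricardobeat's minimal rule in Python, reading "one '@'" as exactly one, plus the punycode conversion mentioned above. It uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the function names are hypothetical.

```python
def dumbest_validation(address: str) -> bool:
    """Exactly one '@', and at least one '.' to the right of it."""
    if address.count("@") != 1:
        return False
    _, domain = address.split("@")
    return "." in domain


def to_ascii_domain(address: str) -> str:
    """Punycode-encode the domain part so ASCII-only patterns don't reject IDNs."""
    local, _, domain = address.rpartition("@")
    return local + "@" + domain.encode("idna").decode("ascii")


print(to_ascii_domain("user@bücher.example"))  # user@xn--bcher-kva.example
```

Reading "one" as "exactly one" keeps the rule dumb but does reject quoted local parts containing an "@"; counting with `rpartition` instead would accept them.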

------
rascul
I haven't bothered with validating an email address for a while, but I have
used [https://isemail.info](https://isemail.info) successfully in the past.

------
nailer
In JS the regex is built into the browser: you can leverage HTML5
.checkValidity() on any type="email" input.

[https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Form...](https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Forms_in_HTML)

Note this is super liberal, so user@domainwithnodots (which is RFC valid, but
probably also a user error) is still considered valid.
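For reference, the HTML Standard defines a "valid e-mail address" for `type="email"` with a regular expression, and the spec itself calls this a willful violation of RFC 5322. A Python port of that expression, as a sketch; it accepts the `user@domainwithnodots` example but rejects quoted local parts.

```python
import re

# Port of the HTML Standard's "valid e-mail address" expression (type="email").
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def html5_valid(address: str) -> bool:
    return HTML5_EMAIL.fullmatch(address) is not None


print(html5_valid("user@domainwithnodots"))  # True
print(html5_valid('"quoted"@example.com'))   # False
```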

------
comeonnow
Any idea on who is responsible for this micro-site?

I find it strange that there's no information on the author, sources,
references, attribution, or credits on the page at all (other than the
WordPress theme attribution).

------
Piskvorrr
Three nines is now the new four nines? No data to back it up? Bah humbug!

------
bobcostas55
How come the .NET version is so small compared to the others?

~~~
pritambaral
The various versions aren't equivalent to each other. Some validate 100%
according to the RFC (Perl), and some make a compromise (JS, PHP, .NET).

------
vinilios
What's so special about .NET regex handling?

------
chenster
The one for Ruby is missing. Anyone?

------
cybice
Why is the Perl regexp so long?

~~~
Macha
Because it actually validates any RFC-compliant email address, while the
others (especially the .NET one) are people's attempts at "good enough"
regexes.

------
pikzen
Alternatively, don't validate email addresses with a regex because it's
pointless. Check that at least an '@' symbol is present and just send a damn
email already[0]. Or better, don't send a damn email at all and let them
access it because it's pointless to have an extra verification step.

If you really want to do some client-side validation, just keep a basic regex
and warn the user if the address doesn't seem right. Or use MailCheck[1].

0\. [http://davidcel.is/blog/2012/09/06/stop-validating-email-add...](http://davidcel.is/blog/2012/09/06/stop-validating-email-addresses-with-regex/)

1\. [https://github.com/mailcheck/mailcheck](https://github.com/mailcheck/mailcheck)
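A rough idea of what a MailCheck-style "Did you mean ...?" hint does, sketched in Python with the standard library's `difflib` similarity matching. This is a swapped-in technique, not MailCheck's own algorithm or API, and `POPULAR_DOMAINS` is a hypothetical placeholder list.

```python
import difflib
from typing import Optional

# Hypothetical list of common domains; MailCheck ships its own defaults.
POPULAR_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"]


def suggest(address: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a corrected address to offer the user, or None if nothing is close."""
    local, sep, domain = address.rpartition("@")
    domain = domain.lower()
    if not sep or domain in POPULAR_DOMAINS:
        return None
    matches = difflib.get_close_matches(domain, POPULAR_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None


print(suggest("jane@gmial.com"))  # jane@gmail.com
```

The hint only warns; the user can still submit the address as typed, which keeps it in line with the "don't reject, just nudge" advice above.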

~~~
imjared
> Or better, don't send a damn email at all and let them access it because
> it's pointless to have an extra verification step

I've heard lots of people say not to bother validating but I always thought it
was because you could just confirm with a verification email. Why's this step
useless?

~~~
m_mueller
I don't get it either. Not doing any verification is just an invitation for
trouble. "My password doesn't work, why can't you send me a new one?"

