
Full email validation regex (RFC 2822) - ZeljkoS
http://code.iamcal.com/php/rfc822/full_regexp.txt
======
l-p
Please read this: [https://nikic.github.io/2012/06/15/The-true-power-of-regular...](https://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html)

RFC5322-compliant regex:

    
    
        /
            (?(DEFINE)
                (?<addr_spec> (?&local_part) @ (?&domain) )
                (?<local_part> (?&dot_atom) | (?&quoted_string) | (?&obs_local_part) )
                (?<domain> (?&dot_atom) | (?&domain_literal) | (?&obs_domain) )
                (?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dtext) )* (?&FWS)? \] (?&CFWS)? )
                (?<dtext> [\x21-\x5a] | [\x5e-\x7e] | (?&obs_dtext) )
                (?<quoted_pair> \\ (?: (?&VCHAR) | (?&WSP) ) | (?&obs_qp) )
                (?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)? )
                (?<dot_atom_text> (?&atext) (?: \. (?&atext) )* )
                (?<atext> [a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+ )
                (?<atom> (?&CFWS)? (?&atext) (?&CFWS)? )
                (?<word> (?&atom) | (?&quoted_string) )
                (?<quoted_string> (?&CFWS)? " (?: (?&FWS)? (?&qcontent) )* (?&FWS)? " (?&CFWS)? )
                (?<qcontent> (?&qtext) | (?&quoted_pair) )
                (?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | (?&obs_qtext) )
    
                # comments and whitespace
                (?<FWS> (?: (?&WSP)* \r\n )? (?&WSP)+ | (?&obs_FWS) )
                (?<CFWS> (?: (?&FWS)? (?&comment) )+ (?&FWS)? | (?&FWS) )
                (?<comment> \( (?: (?&FWS)? (?&ccontent) )* (?&FWS)? \) )
                (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
                (?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | (?&obs_ctext) )
        
                # obsolete tokens
                (?<obs_domain> (?&atom) (?: \. (?&atom) )* )
                (?<obs_local_part> (?&word) (?: \. (?&word) )* )
                (?<obs_dtext> (?&obs_NO_WS_CTL) | (?&quoted_pair) )
                (?<obs_qp> \\ (?: \x00 | (?&obs_NO_WS_CTL) | \n | \r ) )
                (?<obs_FWS> (?&WSP)+ (?: \r\n (?&WSP)+ )* )
                (?<obs_ctext> (?&obs_NO_WS_CTL) )
                (?<obs_qtext> (?&obs_NO_WS_CTL) )
                (?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f )
        
                # character class definitions
                (?<VCHAR> [\x21-\x7E] )
                (?<WSP> [ \t] )
            )
            ^(?&addr_spec)$
        /x
    

Also, if you want to validate a mail address, send a mail. There is no other
way.

~~~
alexchamberlain
You should probably confirm it contains an @ first. The question is how much
validation is necessary prior to sending the email?

~~~
mcv
There needs to be an @, something in front of it, and something behind it. If
you've got that, try mailing to it.

~~~
claudius

      $ echo "Hello" | mail claudius

~~~
mcv
Local email addresses are of course easier to identify. You can just check if
the user exists on the server. But that's not much use to websites and other
online services.

~~~
thwarted
Not necessarily: "checking if the user exists on the server" is ambiguous. You
could check /etc/passwd, but the mail system may use virtual users for "local"
delivery, where "local" here means not requiring a domain portion. The only
way to check whether a user exists, even locally, is to try to send it mail.

------
drfritznunkie
I regularly tested the "email regex du jour" at my previous job whenever these
types of articles came up. IIRC, it was against 15+ million known-good email
addresses, and probably double that in known bads, and nearly every regex
tested had its issues. [edit: we had something like 150,000 distinct active
domains, and probably half that in distinct MXes (if you rolled up all the
google-biz and microsoft hosted stuff)... if you think getting your email
delivered by gmail is difficult, try a school district in Wyoming that
appeared to have a 300 baud link connecting it to the world, running an
ancient version of Groupware that rejected email according to the weather
report as far as we could tell...]

Most people working on the code for that sign-up page (or what have you) have
neither the regex-fu nor the understanding of email needed to write the regex
correctly... So you get a lot of shitty regexes (especially at large
corporations) that don't support apostrophes or dashes/plus signs in the local
part. And no matter how good your regex-fu and RFC comprehension are, there
are a _lot_ of broken implementations out there, and blocking a subscriber
because of their broken system isn't great business.

It took a while, but eventually we switched our signup forms to do a couple of
very effective things beyond a very simple address regex:

1) auto-suggest for common misspellings of our most common domains (gmal.com,
yaho.com, etc.)

2) while the "please re-type your email" field gave us enough user delay, we
did a DNS lookup of the domain, then an MX lookup. If there was a problem with
either, we passed an error to the user like "Please double check the domain of
your email address..."

3) check for domains you know have moved. We were B2B, so if you watched your
bounces closely, you'd know that asdf.com was moving to hjkl.com, so you could
update your existing records, but people have serious muscle memory, and it's
worth reminding them on the signup page.
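The auto-suggest step could be sketched roughly like this in Python. This is only an illustration, not the code we ran: the `COMMON_DOMAINS` list, the `suggest_domain` name, and the 0.8 similarity cutoff are all assumptions, and it relies on the standard library's `difflib.get_close_matches` for fuzzy matching.

```python
from difflib import get_close_matches

# Hypothetical list; in practice this would be your own most common domains.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest_domain(address, known=COMMON_DOMAINS, cutoff=0.8):
    """Return a corrected address if the domain looks like a typo, else None."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None  # not even "something@something"; nothing to suggest
    domain = domain.lower()
    if domain in known:
        return None  # already one of the known domains
    # Pick the single closest known domain above the similarity cutoff.
    matches = get_close_matches(domain, known, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None


if __name__ == "__main__":
    for typed in ["bob@gmal.com", "bob@yaho.com", "bob@example.org"]:
        print(typed, "->", suggest_domain(typed))
```

The suggestion is only shown to the user ("did you mean ...?"); the address is never rewritten silently, since an unusual domain may well be correct.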

I was working on tying in our bounce database (you _are_ keeping a record of
all your bounces, right?) so that automatically flagged domains would prompt
the user with an error like "We've been unable to deliver to your email domain
recently, if your email address is typed correctly, we recommend using a
secondary email address if you have one..."

------
Xophmeister
I worry about people putting things like this on the Internet. Any experienced
developer knows it's a joke and that there are better ways to validate e-mail
addresses; but there are plenty of inexperienced -- copy-and-paste --
developers out there. A colleague of mine did something similar, for example:
he didn't even know what a regular expression was and I could see, as it was a
much simpler pattern than this one, that it would fall quite far from the
mark.

~~~
UK-AL
I think it's highlighting the fact that most email regex validation is wrong.
Email is a more complex specification than most people realise.

I don't even think you can parse it using a regular language (though most
regex engines go beyond this).

------
borplk
1\. Make sure there is at least one '@'

2\. Make sure there is at least one '.'

3\. Make sure the entire thing is at least 4 characters long (@, . and two
other characters)

4\. Resist the temptation for something smarter

5\. Send an email with unique link to verify
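A minimal Python sketch of steps 1 to 3, plus generating the unique token for step 5. The function names are made up for illustration, and actually sending the mail and storing the token are left out.

```python
import secrets


def passes_basic_checks(address: str) -> bool:
    """Steps 1-3: at least one '@', at least one '.', at least 4 characters."""
    return "@" in address and "." in address and len(address) >= 4


def make_verification_token() -> str:
    """Step 5: an unguessable token to embed in the verification link."""
    return secrets.token_urlsafe(32)


if __name__ == "__main__":
    for candidate in ["a@b.c", "user@localserver", "nonsense"]:
        print(candidate, passes_basic_checks(candidate))
    print(make_verification_token())
```

Everything beyond these checks is decided by whether the user clicks the link.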

~~~
vinw
Strictly speaking user@localserver is valid, but would fail test 2.

~~~
_dan
True, but is it something you actually want to accept?

~~~
vinw
Yes. Or no.

------
jamessantiago
Ah, I remember the famous StackOverflow question on this:
[http://stackoverflow.com/questions/201323/using-a-regular-ex...](http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address)

The gist is to avoid regular expressions in favor of a third-party library
such as:
[http://barebonescms.com/documentation/ultimate_email_toolkit...](http://barebonescms.com/documentation/ultimate_email_toolkit/)

This, of course, becomes an issue when you want to do everything in javascript
and go down the rabbit hole of regular expressions. It's a bit like settling
on a pattern based on assumptions, the same mistake people make with names:
[http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/)

------
arianvanp
/@/ usually does the job for me... hehe

~~~
allegory
I use a slightly more complex variation of that:

    
    
       ^.+@.+\..+$
    

Works wonders. I think when testing the addresses on a sign up form, we got
only 0.5% that we couldn't relay to, which was a pretty good hit rate.
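For illustration, here is the pattern above wrapped in a small Python helper (the `matches_simple` name is made up). It requires something before the `@`, something after it, and a dot somewhere in the domain part, so dotless domains are rejected.

```python
import re

# Same pattern as above: non-empty local part, '@', then a domain with a dot.
SIMPLE_EMAIL = re.compile(r"^.+@.+\..+$")


def matches_simple(address: str) -> bool:
    """True if the address passes the deliberately loose pattern."""
    return SIMPLE_EMAIL.match(address) is not None


if __name__ == "__main__":
    for candidate in ["user@example.com", "a@b.c", "user@localhost", "@example.com"]:
        print(candidate, matches_simple(candidate))
```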

~~~
paulnechifor
It looks to me like you're neglecting email addresses from
[http://ai./](http://ai./). The 'ai' TLD does have an MX record.

~~~
allegory
Yes; I get told that every time I mention it :-)

------
eli
This is not really "validating" emails in the sense most people think of it.
The RFC is about addressing SMTP envelopes, not entering email addresses. This
would not be appropriate for e.g. checking if an address entered in a signup
form is "valid." This includes a bunch of things that make no sense and aren't
really email addresses (like embedded comments) and meanwhile has no idea that
bogus@example.com is not an address that will actually receive mail. The only
way to know an address is valid is to email it.

It's mostly a joke. One _might_ want to use this if writing a mail server, but
even then...

~~~
baudehlo
This isn't even about SMTP envelopes. RFC 2822 is about email headers, so it's
even worse. Totally invalid for any real world usage outside perhaps an email
client.

------
bowlofpetunias
My most recent encounter with idiotic email validation is that many apps
don't accept anything on a recent TLD. Even the f-ing AWS SNS web console
didn't let me add a perfectly valid address to a notification topic.

------
valevk
Can somebody explain why it has to be so long? Are there so many special
cases?

~~~
Xophmeister
Have a read of RFC 2822[1], section 3.4.

In brief, the address specification may look like the simple "local" @
"domain", but those subparts can be non-regular (i.e., making them
hard/impossible for a regular expression engine to parse) or contain a lot of
exceptions (e.g., the domain could be google.com, or it could be 12.34.56.78,
or localhost, or a number of other things).

[1]
[https://www.ietf.org/rfc/rfc2822.txt](https://www.ietf.org/rfc/rfc2822.txt)

------
lultimouomo
I think this partly overstates the complexity of validating an e-mail address
in a registration form or similar. If your aim is only to get a syntactically
correct address to which you can try to deliver mail, you don't need to
accept stuff like:

* "Name surname" <address@example.com>

* Name surname <address@example.com>

* Group name: Member 1 <one@member.com>, "2, member2"<two@member.com>, three@member.com

* guy@nonpubliclyresolvabledomain

There are many other RFC 2822-valid kinds of addresses that you don't _need_
to accept if you are not writing an e-mail client, SMTP server, or similar.

------
chronid
Aka: do not attempt to use this if you are sane.

------
seanp2k2
...and this is why you don't ask someone to write a regex to match an e-mail
address in an interview.

~~~
gmac
Surely it's exactly why you _might_ ask that — admittedly as a semi-trick
question, which you should probably only direct at experienced people who
really ought to recognise it as such.

Email validation is a problem with a lot of plausible answers — many of them
wrong — so it has the potential to be quite a good discriminant (depending on
whom you're trying to hire, of course).

------
cleverjake
Neat visualization thereof -
[https://www.debuggex.com/r/v99uZHQj97Tkgnjy](https://www.debuggex.com/r/v99uZHQj97Tkgnjy)

------
billpg
I tried pasting this into Expresso so I could browse it with its analyzer
tool, but it refused.

