
Regexp-based RFC822 email address validation - jondot
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
======
saurik
This concept has been covered on Hacker News so many times before. :(

During a similar conversation 70 days ago, I left a detailed comment that is
also relevant now regarding how RFC822 is actually totally irrelevant for the
concept of e-mail addresses: what it specifies is how to escape the field
values in MIME headers, and thereby has a bunch of rules for how to format an
e-mail address that are really "how to embed an e-mail address in a MIME
document".

RFC821, the SMTP specification for how you actually send e-mail, is closer,
but _has different rules_ about what is allowed because SMTP isn't MIME. A
couple things aren't allowed, and some other things now are allowed and don't
need to be escaped. Why people think users should type e-mail addresses in
RFC822 escaping and not RFC821 escaping makes no sense to me.

However, the real punchline is: why are you asking users to enter e-mail
addresses escaped at all? If you have an HTML form, for example, you don't
need to escape them, as there is no higher-level protocol in which they are
being embedded: the box can contain any characters that are needed, and there
are no concepts like MIME comments, etc.

Asking a user to escape their e-mail address in that box is as silly as asking
them to escape their username or password according to HTML or URL or some
other escaping rules. Or, imagine if they had to enter their full name, but
escaped using MIME encoded words... =?iso-8859-1?Q?=A1Hola,_se=F1or!?= makes
about as much sense as escaping your e-mail address.

My original comment, which contains many more details about which specific
RFCs are involved and what they mean, along with specific examples where
things can get different, and a discussion of the context, here:

<http://news.ycombinator.com/item?id=4486872>

~~~
andrewvc
So, your reply 70 days ago was in reply to me regarding the ruby library Evan
and I wrote.

Funny enough, this perl regex was one of the inspirations for the ruby library
Evan and I wrote (though that uses a PEG for parsing)

Weird how things go around...

------
Jabbles
As others have said, the only way to know if an email address is valid is to
try and send an email. This doesn't mean that this is useless, as you may want
to get users to double-check their input if it doesn't pass this.

Some test cases to think about:
<http://isemail.info/_system/is_email/test/?all>

~~~
oinksoft
There is a middle ground ... I have had success using an approach like the one
found in this Python library: <http://pypi.python.org/pypi/validate_email>

It performs regex validation, and if that passes, tries to get the SMTP server
to validate the user's presence. Quite useful for when a user might fat-finger
their username in the address.

So, for compliant mail servers, sending an email without verifying receipt via
some confirmation token is no more reliable than this method (if it will
falsely validate the user, it will probably falsely digest the message as
well).

~~~
bjourne
Be careful as validating email addresses like that will quickly get you grey
or blacklisted by various spam cops. The method works well for a few
addresses, but you can't use it to validate thousands of addresses at the same
time. I learnt that the hard way. :)

~~~
mosselman
Could you expand on this? Why will it get you black listed? What kind of
situation are we talking about? It would be pretty annoying to get black
listed.

~~~
natep
I imagine a spammer could generate plausible emails and then check them
against SMTP servers to discover the valid ones if they didn't get
blacklisted.

------
xentronium
Please note that RFC822 also covers some unusual forms of writing an address;
all of the following are correct addresses:

* John Doe <john.doe@example.com>

* foo:a@b.example,c@d.example,e@f.example;

* john.doe@example.com (John Doe)

See also the excellent explanation by Jukka Korpela for more details:
<http://www.cs.tut.fi/~jkorpela/rfc/822addr.html>

~~~
deweerdt
Yes, absolutely. When people want to validate an email address, it's more
likely they're referring to the SMTP envelope address:
<http://tools.ietf.org/html/rfc5321#section-4.1.2>

RFC 822 (RFC 5322 in its more recent incarnation) refers to the From: header
in the email, RFC 5321 refers to the address used in the 'MAIL FROM:' (and
RCPT) SMTP command.
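
To make the header-versus-envelope distinction concrete, here is a minimal Python sketch that uses the standard library's `email.utils.parseaddr` to pull the bare address out of header-style values like the ones listed above. The helper name `envelope_address` is made up for illustration, and this only extracts the address; it does not check it against the RFC 5321 grammar.

```python
from email.utils import parseaddr


def envelope_address(header_value: str) -> str:
    """Return the bare address found in a header-style value.

    Returns an empty string when parseaddr cannot find an address.
    """
    _display_name, address = parseaddr(header_value)
    return address


if __name__ == "__main__":
    for value in ("John Doe <john.doe@example.com>",
                  "john.doe@example.com (John Doe)"):
        print(envelope_address(value))
```

Both header forms reduce to the same string, which is the part you would hand to SMTP.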

------
rplnt
It was posted a few days ago in this thread
<http://news.ycombinator.com/item?id=4774426> with a great comment:

> If seeing this doesn't make you second guess using a RegExp when a parser is
> more appropriate, well...you might be a Perl programmer?

~~~
fungi
i'm building a simple offline webapp for collecting email addresses at an
event on an ipad with no internet connection.

i'm using a simple regex (may use this one instead) to validate email before
sticking it in localStorage for later retrieval... if not with regex, how should i
validate the email addresses?

~~~
jlarocco
As somebody else said, don't.

If you really feel you need some kind of validation, have two email fields so
the user can enter it twice and double check it themselves. No matter how much
effort you put into it, and how complicated your validation code is, if the
users want to mess with you, they can always just enter a valid fake address,
so there's no point wasting a lot of energy on it because it's easy to defeat
anyway.

And since it's impossible to validate addresses with a regular expression,
there's a small chance you'll reject a valid address and look dumb.

~~~
michaelhoffman
I see the two fields thing a lot. It's pointless since I can copy and paste
into the second field. Unless they use JavaScript to disable pasting, which is
obnoxious.

~~~
mikeash
I'd wager that being likely to mistype one's own e-mail address correlates
highly with not realizing you can copy/paste between the fields.

------
meaty
I only check for an @ and at least one character either side. Anything else is
the user's problem.

~~~
jnazario
actually you may want to make sure they have at least four characters
separated by a dot, e.g. . _\@\_ \\.[..]+ ... and i think this is how the
regex begins ...

my point though is that you can't send mail to a TLD, you need a domain name.
and i don't think we have any one character TLDs.

this is quickly turning into an exercise where you see how such a regex starts
to happen. "well, then you have to consider this case ... and handle these
exceptions ... and then enforce this ..."

~~~
meaty
I really don't care. We also, in automated test environments, send email to
user@host so it doesn't escape the internal network.

I don't have to use a regex if I use the methodology I specified.

Simple Java implementation off the top of my head. Very fast, no imports or
expression compilation required:

    
    
        // Accepts anything with an '@' that has a non-whitespace character on each side.
        boolean isValidEmailAddress(String emailAddress) {
            int at = emailAddress.indexOf('@');
            if (at < 1 || at == emailAddress.length() - 1)
                return false;
            return !Character.isWhitespace(emailAddress.charAt(at - 1)) &&
                   !Character.isWhitespace(emailAddress.charAt(at + 1));
        }
    

Improvements welcome. Should be portable to any other language trivially.

~~~
meaty
C version because I was bored:

    
    
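       #include <ctype.h>
       #include <string.h>

       int is_valid_email(const char *email) {
               const char *at = strchr(email, '@');
               /* Reject a missing '@' and an '@' at the very start or end. */
               if (at == NULL || at == email || at[1] == '\0')
                       return 0;
               return !isspace((unsigned char)at[-1]) && !isspace((unsigned char)at[1]);
       }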
    

Test cases:

    
    
       assert(0 == is_valid_email(""));
       assert(0 == is_valid_email("@b"));
       assert(0 == is_valid_email("b@"));
       assert(0 == is_valid_email("d@ "));
       assert(0 == is_valid_email(" @d"));
       assert(0 == is_valid_email("   "));
       assert(1 == is_valid_email("a@b"));
       assert(1 == is_valid_email("John Smith <x.y@z.com>"));

------
culshaw
A more modern thought.

Do you really need to test that strictly for an email address?

If the user is trying to give you a fake email address, chances are they don't
want to be part of your service/offering anyway.

I test for an @ with characters on either side; that covers most bases while
staying flexible.

I know this doesn't apply to all scenarios but it's one worth considering.

~~~
nathan_long
You probably want to ensure that there's a dot somewhere to the right of the
@, also, but yes, that sounds sane to me.

"something@something.something"

Start of line, at least one non-@, @, at least one character, dot, at least
one character, end of line.

^[^@]+@.+\..+$

Test: <http://rubular.com/r/G69q1k6fP2>

If it fits that, try emailing them.
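
A minimal Python sketch of that check, assuming the pattern described above (at least one non-@, an @, then something, a dot, and something). The function name `looks_like_email` is invented here for illustration.

```python
import re

# Start of line, at least one non-@, @, at least one character,
# a literal dot, at least one character, end of line.
LOOSE_EMAIL = re.compile(r"^[^@]+@.+\..+$")


def looks_like_email(text: str) -> bool:
    """Cheap sanity check; a real test still means sending mail."""
    return LOOSE_EMAIL.match(text) is not None


if __name__ == "__main__":
    for candidate in ("something@something.something", "a@b", "@b.c", "a@b."):
        print(candidate, looks_like_email(candidate))
```

Note that this deliberately rejects dotless domains such as `a@b`, which is exactly the trade-off debated in the replies.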

~~~
perokreco
There doesn't need to be a dot on the right side.

~~~
xyzzy123
I found your comment fairly cryptic. I had a fun twenty minutes trying to work
out what you meant, and under what circumstances dotless RHS in email
addresses might be legal.

I suppose from the RFC, sure the spec doesn't require dots.

For example, I can use <http://mythic-beasts.com/~pdw/cgi-bin/emailvalidate>
and verify that sure, '1@2!3!4' is a valid RFC822 email address. But I think
e.g. UUCP-style addresses are a pathological case, and we don't _really_ want
users signing up with them.

Another option would be intranets, e.g. 'baker@internal', but again I think
that's being a bit pedantic, since most people on HN are writing webapps for
the public Internet, not mail clients.

So can we get an email with foo@<some-dotless-string> routed across the public
Internet? Even a bounce would do :)

You might be able to do a riff on xyzzy123@[23.55.211.36] (e.g.
xyzzy123@[389534500] or xyzzy123@[1737D324]). However, do you _really_ want
your users to specify these?

There are mx records for existing TLDs (e.g. com, org, au, mx) - but all the
mx records I tried refused connections on port 25. So no mail for
'xyzzy123@com' :(

So gTLDs are another option, and there was a time when it looked like
xyzzy123@xyzzycorp might route (as long as it didn't collide with anything on
the local resolver's search list). But it seems that dotless use of gTLDs is
seriously deprecated at this point, and that ICANN will treat it as a TOS
violation: <http://domainincite.com/10254-why-domain-names-need-punctuation>

Basically, ICANN's conclusion was that dotless TLDs are a terrible idea for
many technical reasons.

I looked into IDNs too, but of course due to the way DNS works, you can't
really get around the dots.

So the conclusion of all this is that:

1) Using an RFC822 regex is a terrible way to check emails. The things it
thinks are valid are MUCH wider than what you actually want.

2) You should probably check the RHS against a public suffix list if you are
e.g. accepting a user email address on a signup page. If you accept dotless
TLDs or other constructions (e.g. ips on RHS) there is some (low, but nonzero)
risk that a malicious user could cause your systems to route mail to your
other systems internally.
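
A rough Python sketch of point 2, under loud assumptions: `KNOWN_SUFFIXES` is a tiny hand-written stand-in for the real public suffix list (which you would load from its published data file), and the function names are made up. It rejects dotless domains, bracketed address literals, and anything that does not end in a known suffix with at least one label in front of it.

```python
# Hypothetical stand-in for the real public suffix list.
KNOWN_SUFFIXES = {"com", "org", "net", "co.uk", "com.au"}


def domain_is_public(domain: str) -> bool:
    """True if the domain has a label in front of a known public suffix."""
    domain = domain.lower().rstrip(".")
    # Reject address literals like [23.55.211.36] and dotless names.
    if domain.startswith("[") or "." not in domain:
        return False
    labels = domain.split(".")
    if not all(labels):  # empty label, e.g. ".com" or "a..com"
        return False
    # Start at 1 so at least one label precedes the suffix.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            return True
    return False


def address_is_routable(address: str) -> bool:
    """Split on the last '@' and check only the right-hand side."""
    local, at, domain = address.rpartition("@")
    return bool(at) and bool(local) and domain_is_public(domain)
```

With this in place, `baker@internal`, `xyzzy123@com` and `xyzzy123@[23.55.211.36]` are all refused, while `xyzzy123@example.com` passes.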

------
fuzzix
Email validation is indeed a complex and occasionally surprising beast.

Clearly this regex is impractical, but any validation you invent yourself is
likely incorrect. The best way to validate email addresses remains sending an
email to them.

~~~
rmccue
+1, seeing as you're probably going to be sending an activation email anyway.
You can do some practical checks, like checking that there's a '@' in the
email, and probably trimming spaces (I think leading/trailing whitespace isn't
allowed, from memory).

~~~
HyprMusic
In RFC822 spaces are actually allowed, I think they just have to be in a
quoted string.

------
3ds
As per html5 spec the recommended regex is:

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

<http://www.w3.org/TR/html5/states-of-the-type-attribute.html#valid-e-mail-address>
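
For anyone who wants to try it outside a browser, here is a small Python sketch of the same pattern as it appeared in the HTML5 draft at the time. The function name `html5_valid` is invented here; browsers implement their own check, so treat this as an approximation.

```python
import re

# The HTML5 draft pattern: a permissive ASCII local part, then one or
# more dot-separated labels made of letters, digits and hyphens.
HTML5_EMAIL = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
)


def html5_valid(address: str) -> bool:
    """True if the whole string matches the HTML5 draft pattern."""
    return HTML5_EMAIL.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ("john.doe+spam@gmail.com", "user@localhost",
                      "John Doe <john.doe@example.com>", "jos\u00e9@example.com"):
        print(candidate, html5_valid(candidate))
```

It accepts plus-tags and dotless hosts, and it rejects display-name forms and any non-ASCII local part.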

~~~
stratoukos
This will not match addresses with non latin characters.

~~~
tedunangst
Addresses do not have non Latin characters.

~~~
Aissen
Which is false:

<http://tools.ietf.org/html/rfc6530>

<http://tools.ietf.org/html/rfc6531>

<http://tools.ietf.org/html/rfc6532>

<http://tools.ietf.org/html/rfc6533>

~~~
tedunangst
Relying on standards that new for something like email would be a mistake IMO.
YMMV.

------
roel_v
This is from "Mastering Regular Expressions", by Jeffrey Friedl, O'Reilly
1997. The book presents it as a 'fun' example of how to write huge regex'es
that are still understandable and maintainable (the version posted here is
without all the comments that are in the book).

------
bryanlarsen
This is a generated regexp, it's not hand crafted. Making fun of it is like
pasting up machine code compiled from C and saying "machine code sucks".

------
mosselman
E-mail validation can be useful, but I would stay away from this thing. Look
at what you are trying to do from a higher level.

Most likely the user wants something from you as well as you from them. If a
user gives you a bad e-mail, despite a very basic e-mail regex, whatever, they
won't get an e-mail, not my problem.

If it is to register on your website, just let them, send them a confirmation
e-mail to their 'email', meanwhile allowing them to use the system (or not).
Then if after x-time they haven't confirmed, just delete the user again. This
will save you a lot of trouble.
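
A minimal in-memory Python sketch of that flow, with everything here being an assumption for illustration: the class name `PendingSignups`, the one-week window standing in for "x-time", and the dict standing in for a database. Actually sending the confirmation mail is left out.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

# The "x-time" after which unconfirmed signups are dropped (arbitrary choice).
CONFIRM_WINDOW_SECONDS = 7 * 24 * 3600


class PendingSignups:
    """Tracks unconfirmed addresses by token; a dict stands in for a database."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, float]] = {}

    def register(self, email: str, now: Optional[float] = None) -> str:
        """Store the address and return the token to put in the e-mail link."""
        token = secrets.token_urlsafe(32)
        created = time.time() if now is None else now
        self._pending[token] = (email, created)
        return token

    def confirm(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Return the confirmed address, or None if unknown or too late."""
        current = time.time() if now is None else now
        entry = self._pending.pop(token, None)
        if entry is None:
            return None
        email, created = entry
        if current - created > CONFIRM_WINDOW_SECONDS:
            return None
        return email

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Delete signups that were never confirmed; return how many."""
        current = time.time() if now is None else now
        expired = [token for token, (_email, created) in self._pending.items()
                   if current - created > CONFIRM_WINDOW_SECONDS]
        for token in expired:
            del self._pending[token]
        return len(expired)
```

The `now` parameter exists only so the expiry logic can be exercised without waiting a week.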

If you want something more high-tech like checking a huge list of e-mails in a
system you could go with a solution suggested below, just send them an e-mail.

Regex is evil!

------
billyjobob
This regex tests for RFC 822 compliance, but what if you get a user who has an
email address that itself doesn't comply with the RFC?

~~~
jacques_chester
Actually, it doesn't.

Email addresses can't, strictly speaking, be tested for with regexes. This one
"only" tests various nestings to I think about 3 levels deep.

------
gvalkov
I'm surprised there isn't a Perl6 version of _Mail::RFC822_. This is exactly
the kind of thing that Perl6 rules[1] are supposed to excel at. It would be
good publicity, especially now that rakudo has usable releases.

[1]: <http://en.wikipedia.org/wiki/Perl_6_rules>

------
jmedwards
I had a quick glance through the expression, looks good from here.

~~~
jmedwards
(where here = an asylum for the insane.)

------
chris_wot
The only thing I've ever seen that is worse than this is the sendmail
configuration file.

------
habosa
I find it pretty entertaining that this RegEx is so big that visual patterns
emerged. In my browser there are clear diagonal lines of "@" symbols across
the RegEx. If it looks like ASCII art, your RegEx is probably too big.

------
JimWestergren
With PHP the following simple code works great for me:

    
    
      function validate_email($email) {
        if(filter_var($email, FILTER_VALIDATE_EMAIL) === FALSE) {
          return false;
        } else {
          return true;
        }
      }

------
smackfu
Validation of emails is pretty pointless since most errors will be typos that
pass the regex anyway. You're better off trying to give warning messages based
on common typos.
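
One way to sketch that idea in Python, using `difflib` from the standard library. The list `COMMON_DOMAINS`, the 0.8 similarity cutoff, and the function name `suggest_domain` are all assumptions picked for illustration, not tuned values.

```python
import difflib
from typing import Optional

# Hypothetical short list; a real one would come from your own user data.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest_domain(address: str) -> Optional[str]:
    """Return a 'did you mean' address, or None if there is nothing to suggest."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return None
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None
    # Pick the single closest well-known domain, if it is close enough.
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=0.8)
    if matches:
        return f"{local}@{matches[0]}"
    return None
```

The result is meant to drive a warning the user can ignore, not to reject or silently rewrite the input.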

------
bdg
Great, now I have something to strike fear into the hearts of new devs who ask
me about email validation.

I'm not sure I'm comfortable using a regex like this in production. Sure, we
can write lots of tests and ensure it performs correctly, and the rfc is
unlikely to change so once proven solid it won't change... but using this just
feels wrong. Like I'm using the dark side of the force.

------
crusso
I've been watching people deal with this problem for years and years... why? I
can parse a CSV file far more easily.

You'd think there would be an RFC that specifies a simple email address format
that everyone can follow. If you don't conform to that format, your email gets
dropped on the floor until you get a better client.

~~~
DanBC
This is one of the things that people really want for email2.

Unfortunately email works well enough, and has such a massive install base,
that email2 is never going to happen.

That's why you see so many "Email but not email" startups.

------
Smrchy
For JS validation on the client i use this:

<http://blog.tcs.de/javascript-near-perfect-email-validation--check-routine-as-string-prototype/>

A lot shorter and has only 3 cases that would not be detected. Enough for
99.x% of all entered emails.

------
dutchbrit
There's a very nice function in PHP that validates emails.

filter_var('foo@bar.com', FILTER_VALIDATE_EMAIL);

The actual beast:
<https://github.com/php/php-src/blob/master/ext/filter/logical_filters.c#L499>

------
CalvinCopyright
Makes me think of the first reply to this StackOverflow question:

<http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags>

~~~
michaelhoffman
But, unlike XHTML, it _is_ possible to validate an e-mail address with a
regular expression (assuming comments have been removed).

~~~
_delirium
It's also, unlike XHTML, not particularly easy to do it with a parser: most of
the complexity of the regex is due to the litany of edge cases for what
constitutes a valid email address, not due to it being a regex.

------
topbanana
The point of regex is to be human readable. This might as well be a binary
blob

------
1nvader
This regex deserves a downvote! If you don't understand it (and I guess you
don't if it's not written by hand) - don't ever use it!

The only way to validate an email address is to send a validation mail/link.

------
dfox
Email address validation should not be motivated by what is a valid address
according to some RFC, but by what you feel comfortable passing to your MTA,
because you have an exact understanding of what will happen. On the application
side you probably don't want to store addresses with comment and real name
fields and other such human-readable-only data. My rules are: contains exactly
one @, contains zero or more +, does not contain any other characters that are
special cased by this (notably ',', ';' and '!').
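
Those rules fit in a few lines of Python. This is only a sketch: the function name is made up, and rejecting whitespace, angle brackets and parentheses is an added assumption to cover the "no comments or real name fields" part, since only ',', ';' and '!' are named explicitly.

```python
# ',', ';' and '!' come from the rules above; the brackets and parentheses
# are an assumption to exclude display names and comments.
FORBIDDEN = set(",;!<>()")


def acceptable_for_mta(address: str) -> bool:
    """True if the address has exactly one '@' and no special-cased characters."""
    if address.count("@") != 1:
        return False
    local, _at, domain = address.partition("@")
    if not local or not domain:
        return False
    # '+' is allowed, so plus-tagged addresses pass.
    return not any(ch in FORBIDDEN or ch.isspace() for ch in address)
```

So `john.doe+spam@gmail.com` passes, while `John Doe <john.doe@example.com>` and the UUCP-style `1@2!3!4` do not.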

------
turshija
'@'.' is a valid email address.

seriously ? :)

------
jrajav
I can't find an online tester that doesn't choke on this, but if you're
curious to try it out, here's the token Perl one-liner:

    
    
        echo "x@y.com" | perl -lne 'print "$_ is valid!" if /(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)/'

~~~
ByronFortescue
For some reason this is valid as well?

    
    
        echo "blaat@blaat" | perl -lne 'print "$_ is valid!" if /(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ 
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ 
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)/'

~~~
nicktelford
Because the domain name doesn't need to be fully-qualified; it can just be a
machine name on the local network.

To illustrate this: "user@localhost" is a valid email address.

All these overly complex regular expressions miss a major point: even if the
e-mail address is valid according to the RFC it doesn't guarantee that:

    
    
      * The domain name exists.
      * The user exists at the specified domain.
      * All of the SMTP servers between you and the recipient adhere exactly to the RFC.
      * The user actually owns or has access to the e-mail account in question.
    

Whenever I need to validate an e-mail address, I just use something simple
like ".+@.+" to ensure sanity and move on to more pressing matters. As a
friend once pointed out to me: it's usually far more damaging to reject valid
e-mail addresses than to accept invalid ones; be liberal in what you accept
and verify the e-mail address by sending them a confirmation mail.

~~~
qznc
Yes, especially websites should accept more than [a-zA-Z0-9] for the user
part. This would allow filtering emails. E.g. Gmail can tag emails this way:
john.doe+spam@gmail.com

------
adv0r
This validation ignores one exception:

Gmail allows users to enter an arbitrary number of dots.

Therefore these _are_ valid email addresses:

your......name@gmail.com y.o.u.r...name....@gmail.com

and all resolve to yourname@gmail.com

<http://support.google.com/mail/bin/answer.py?hl=en&ctx=mail&answer=10313>
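
If you need to spot that two such spellings are the same mailbox (say, to detect duplicate signups), a Python sketch could look like the following. It assumes the dot-ignoring behaviour described here plus the `+tag` convention mentioned elsewhere in this thread, applies only to `gmail.com`, and the function name is invented. Keep sending mail to whatever the user actually typed.

```python
def canonical_gmail(address: str) -> str:
    """Collapse Gmail dot and +tag variants to one comparison key.

    Addresses on other domains are returned unchanged.
    """
    local, at, domain = address.rpartition("@")
    if not at or domain.lower() != "gmail.com":
        return address
    # Drop everything from the first '+', then remove all dots.
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local.lower()}@gmail.com"


if __name__ == "__main__":
    for candidate in ("your......name@gmail.com",
                      "y.o.u.r...name....@gmail.com",
                      "john.doe+spam@gmail.com"):
        print(candidate, "->", canonical_gmail(candidate))
```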

~~~
Lockyy
And this is a problem I constantly run into with web services, some throw an
error if I use a ., others throw an error if I try to do something like
example+note@example.com. I use one or the other to help sort emails. It's
even worse when a sign up form accepts an email in the latter format, but the
login form does not for some reason. So I have an account with a note added
but I cannot login. I had this problem with the Odeon website for a while,
eventually had to phone them up and ask them to change my account's email
address to one without a note.

------
message
Old as hell

~~~
ta12121
It boggles my mind too. This is new to people here?

