I only check for an @ and at least one character either side. Anything else is t...

sgt · on Nov 16, 2012

Same here, and personally I don't see the justification for spending all those CPU cycles going through a massive regular expression such as this one.

I'd rather put this on the client side (JavaScript), as a validation to make sure the user doesn't supply an invalid e-mail address by accident (i.e. for his own convenience and nothing else).

blibble · on Nov 16, 2012

compared to pushing the response back out to the client, the cost of matching against that regex is going to be insignificant, even with it being as monstrous as it is.

(note that I'm not saying using that regex is a good idea!)

jnazario · on Nov 16, 2012

actually you may want to make sure they have at least four characters separated by a dot, e.g. .\@\\.[..]+ ... and i think this is how the regex begins ...

my point though is that you can't send mail to a TLD, you need a domain name. and i don't think we have any one character TLDs.

this is quickly turning into an exercise where you see how such a regex starts to happen. "well, then you have to consider this case ... and handle these exceptions ... and then enforce this ..."
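A rough sketch of how that escalation looks in code. The class and method names here are made up for illustration, and the "dotted domain" rule is only one possible reading of "you need a domain name, not a bare TLD"; it is not taken from the regex under discussion.

```java
class DomainRuleSketch {
    // Baseline rule: an '@' with at least one character on each side.
    static boolean hasAtWithBothSides(String s) {
        int at = s.indexOf('@');
        return at >= 1 && at < s.length() - 1;
    }

    // Extra rule (an assumption, see above): the part after the last '@'
    // must contain a dot that is not its first character, and the final
    // label after that dot must be at least two characters long.
    static boolean hasDottedDomain(String s) {
        if (!hasAtWithBothSides(s)) {
            return false;
        }
        String domain = s.substring(s.lastIndexOf('@') + 1);
        int dot = domain.lastIndexOf('.');
        return dot >= 1 && domain.length() - dot - 1 >= 2;
    }
}
```

With these two methods, `pope@va` and `user@host` pass the baseline check but fail the dotted-domain rule, which is exactly the kind of side effect each added rule brings with it.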

Jabbles · on Nov 16, 2012

my point though is that you can't send mail to a TLD

You can: http://serverfault.com/questions/154991/why-do-some-tld-have...

For instance the pope could get pope@va - if he wanted...

xyzzy123 · on Nov 16, 2012

Try connecting to those on port 25, see if any accept mail... they don't tend to.

macspoofing · on Nov 16, 2012

But they can. In this case, you probably won't alienate any of your potential users but as you add more and more arbitrary rules, you will.

xyzzy123 · on Nov 16, 2012

Fair call, I'm all for fewer arbitrary rules. Especially if it's less code.

I still consider the "oh, but it's valid to have dotless on RHS!" to be one of those facts which is true, but irrelevant.

Those three hypothetical users can't receive email sent from most major web providers (e.g. gmail, who don't allow dotless To:), can't sign up to most web sites (who get their validation wrong), and are at the mercy of pitiless local dns resolver rules (pope@va will go to pope@va.com for US users, a lot of the time).

VMG · on Nov 16, 2012

Try connecting to those on port 25, see if any accept mail... they don't tend to.

That's not a test for validating an email host either - looking up MX records would be more appropriate here.

xyzzy123 · on Nov 25, 2012

I actually meant the MXs, sort of thought that went without saying.

wooster · on Nov 26, 2012

Try `dig mx va` instead.
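For doing the same lookup from code rather than from `dig`, here is a minimal Java sketch using the JDK's built-in JNDI DNS provider. The class and method names are hypothetical, it needs a working DNS resolver at runtime, and a lookup for a name that does not exist throws a `NamingException` rather than returning an empty list.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

class MxLookupSketch {
    // Returns everything after the last '@', or null if there is no
    // usable local part or domain part.
    static String domainOf(String address) {
        int at = address.lastIndexOf('@');
        if (at < 1 || at == address.length() - 1) {
            return null;
        }
        return address.substring(at + 1);
    }

    // Asks DNS for the MX records of a domain and returns them as strings
    // (typically "preference host"). Performs a network call.
    static List<String> mxRecords(String domain) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(domain, new String[] {"MX"});
            Attribute mx = attrs.get("MX");
            List<String> records = new ArrayList<>();
            if (mx != null) {
                NamingEnumeration<?> values = mx.getAll();
                while (values.hasMore()) {
                    records.add(String.valueOf(values.next()));
                }
            }
            return records;
        } finally {
            ctx.close();
        }
    }
}
```

Something like `MxLookupSketch.mxRecords(MxLookupSketch.domainOf("pope@va"))` would be the code equivalent of the `dig` command. An empty result is not proof that mail is undeliverable, since SMTP falls back to the domain's address records when no MX record exists.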

mootothemax · on Nov 16, 2012

you can't send mail to a TLD

Not only is it possible, when I used to work for a company that administered a TLD, I did just that, sending and receiving email with the address t@TLD.

anonymouz · on Nov 16, 2012

Working for a TLD administrator suddenly became much more desirable to me.

meaty · on Nov 16, 2012

I really don't care. We also, in automated test environments, send email to user@host so it doesn't escape the internal network.

I don't have to use a regex if I use the methodology I specified.

Simple Java implementation off the top of my head. Very fast, no imports or expression compilation required:

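    // Accepts any string with an '@' that has a non-whitespace character on each side.
    boolean isValidEmailAddress(String emailAddress) {
        int at = emailAddress.indexOf('@');
        // Reject a missing '@' (-1), a leading '@' (0), or a trailing '@'.
        if (at < 1 || at == emailAddress.length() - 1)
            return false;
        return !Character.isWhitespace(emailAddress.charAt(at - 1)) &&
               !Character.isWhitespace(emailAddress.charAt(at + 1));
    }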

Improvements welcome. Should be portable to any other language trivially.

meaty · on Nov 21, 2012

C version because I was bored:

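   #include <ctype.h>
   #include <string.h>

   /* Returns 1 if email has an '@' with a non-whitespace character on each side. */
   int is_valid_email(const char *email) {
           const char *at = strchr(email, '@');
           /* Reject a missing '@', a leading '@', or a trailing '@'. */
           if (at == NULL || at == email || at[1] == '\0')
                   return 0;
           return !isspace((unsigned char)at[-1]) && !isspace((unsigned char)at[1]);
   }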

Test cases:

   assert(0 == is_valid_email(""));
   assert(0 == is_valid_email("@b"));
   assert(0 == is_valid_email("b@"));
   assert(0 == is_valid_email("d@ "));
   assert(0 == is_valid_email(" @d"));
   assert(0 == is_valid_email("   "));
   assert(1 == is_valid_email("a@b"));
   assert(1 == is_valid_email("John Smith <x.y@z.com>"));

readme · on Nov 16, 2012

boolean isValid = email != null && email.contains("@");

the goal of client-side validation is to ensure that you can actually make that network call to do a real validation. the rfc is so complicated it's not even worth getting into this business, as evidenced by op's regex.

would love to see some unit tests for that thing.
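One way to picture that split between a cheap local check and the "real" validation, as a sketch only: the class name, method names, and the `realCheck` parameter are all invented here, and `realCheck` stands in for whatever the network-side validation actually is (for example, sending a confirmation mail).

```java
import java.util.function.Predicate;

class TwoStageValidationSketch {
    // Stage 1: cheap, local, null-safe. Only asks "is there an '@' at all?".
    static boolean looksLikeEmail(String email) {
        return email != null && email.contains("@");
    }

    // Stage 2 runs only if stage 1 passes, so obviously broken input
    // never triggers the expensive call. realCheck is a hypothetical
    // placeholder for the server-side validation.
    static boolean validate(String email, Predicate<String> realCheck) {
        return looksLikeEmail(email) && realCheck.test(email);
    }
}
```

The local check makes no attempt to decide whether the address is valid; it only filters out input that is not worth a round trip.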

derefr · on Nov 16, 2012

But you can send mail to, say, a machine listed as "a" in your hosts file.

Adirael · on Nov 16, 2012

And with the new personalized TLDs, wouldn't you be able to have something like ceo@nike? I just check for an @ and at least a character after and before it.

xyzzy123 · on Nov 16, 2012

No. See: http://domainincite.com/10254-why-domain-names-need-punctuat...

meaty · on Nov 16, 2012

Personally that pisses me off, as it requires that I fully qualify all my local email addresses. What happens if I have the hostname 'nike' on my local net?

Adirael · on Nov 16, 2012

It's going to get messy. I use a lot of hostnames which may end up being TLDs.

jnazario · on Nov 17, 2012

wow, thanks for the edumacation :) obviously didn't know some of those things, and completely ignored the local domain bits.