
I only check for an @ and at least one character either side. Anything else is the user's problem.

Same here, and personally I don't see the justification for spending all those CPU cycles going through a massive regular expression such as this one.

I'd rather put this on the client side (javascript), as a validation to make sure the user doesn't supply an invalid e-mail address by accident (i.e. for his own convenience and nothing else).

compared to pushing the response back out to the client, the cost of matching against that regex is going to be insignificant, even with it being as monstrous as it is.

(note that I'm not saying using that regex is a good idea!)

actually you may want to make sure they have at least four characters separated by a dot, e.g. .\@\\.[..]+ ... and i think this is how the regex begins ...

my point though is that you can't send mail to a TLD, you need a domain name. and i don't think we have any one character TLDs.

this is quickly turning into an exercise where you see how such a regex starts to happen. "well, then you have to consider this case ... and handle these exceptions ... and then enforce this ..."
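For illustration, here is a minimal sketch in C of the stricter rule floated above: require a dot on the right-hand side, with at least two characters after the last dot. The function name `has_dotted_domain` is hypothetical, and the two-character minimum is only one reading of "at least four characters separated by a dot". It is not a recommendation, just a way to see how quickly the special cases pile up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the part after the last '@' looks like
 * "name.tld" with a TLD of at least two characters, 0 otherwise. */
int has_dotted_domain(const char *email) {
        assert(email != NULL);
        const char *at = strrchr(email, '@');
        if (at == NULL || at == email)
                return 0;               /* no '@', or nothing before it */
        const char *domain = at + 1;
        const char *dot = strrchr(domain, '.');
        if (dot == NULL || dot == domain)
                return 0;               /* dotless, or domain starts with '.' */
        return strlen(dot + 1) >= 2;    /* TLD of two or more characters */
}
```

Note that this rejects every dotless address, such as `user@localhost`, so it is only safe if you are sure nobody will ever need one.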

my point though is that you can't send mail to a TLD

You can: http://serverfault.com/questions/154991/why-do-some-tld-have...

For instance the pope could get pope@va - if he wanted...

Try connecting to those on port 25, see if any accept mail... they don't tend to.

But they can. In this case, you probably won't alienate any of your potential users but as you add more and more arbitrary rules, you will.

Fair call, I'm all for fewer arbitrary rules. Especially if it's less code.

I still consider the "oh, but it's valid to have dotless on RHS!" to be one of those facts which is true, but irrelevant.

Those three hypothetical users can't receive email sent from most major web providers (e.g. gmail, who don't allow dotless To:), can't sign up to most web sites (who get their validation wrong), and are at the mercy of pitiless local dns resolver rules (pope@va will go to pope@va.com for US users, a lot of the time).

Try connecting to those on port 25, see if any accept mail... they don't tend to.

That's not a test for validating an email host either - looking up MX records would be more appropriate here.

I actually meant the MXs, sort of thought that went without saying.

Try `dig mx va` instead.

you can't send mail to a TLD

Not only is it possible, when I used to work for a company that administered a TLD, I did just that, sending and receiving email with the address t@TLD.

Working for a TLD administrator suddenly became much more desirable to me.

I really don't care. We also, in automated test environments, send email to user@host so it doesn't escape the internal network.

I don't have to use a regex if I use the methodology I specified.

Simple Java implementation off the top of my head. Very fast, no imports or expression compilation required:

    boolean isValidEmailAddress(String emailAddress) {
        int at = emailAddress.indexOf('@');
        // Reject a missing '@' (-1), or one at the very start or very end.
        if (at < 1 || at == emailAddress.length() - 1)
            return false;
        return !Character.isWhitespace(emailAddress.charAt(at - 1)) &&
               !Character.isWhitespace(emailAddress.charAt(at + 1));
    }

Improvements welcome. Should be portable to any other language trivially.

C version because I was bored:

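   #include <ctype.h>
   #include <string.h>

   int is_valid_email(const char *email) {
           const char *at = strchr(email, '@');
           /* reject a missing '@', or one at the very start or very end */
           if (at == NULL || at == email || at[1] == '\0')
                   return 0;
           return !isspace((unsigned char)at[-1]) && !isspace((unsigned char)at[1]);
   }

Test cases: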

   assert(0 == is_valid_email(""));
   assert(0 == is_valid_email("@b"));
   assert(0 == is_valid_email("b@"));
   assert(0 == is_valid_email("d@ "));
   assert(0 == is_valid_email(" @d"));
   assert(0 == is_valid_email("   "));
   assert(1 == is_valid_email("a@b"));
   assert(1 == is_valid_email("John Smith <x.y@z.com>"));

boolean isValid = email != null && email.contains("@");

the goal of client-side validation is to ensure that you can actually make that network call to do a real validation. the rfc is so complicated it's not even worth getting into this business, as evidenced by op's regex.

would love to see some unit tests for that thing.

But you can send mail to, say, a machine listed as "a" in your hosts file.

And with the new personalized TLDs, wouldn't you be able to have something like ceo@nike? I just check for an @ and at least a character after and before it.

Personally that pisses me off, as it requires that I fully qualify all my local email addresses. What happens if I have the hostname 'nike' on my local net?

It's going to get messy. I use a lot of hostnames which may end up being TLDs.

wow, thanks for the edumacation :) obviously didn't know some of those things, and completely ignored the local domain bits.
