How faulty checking for valid email addresses can hurt you

jnorthrop · on June 11, 2009

I gave up validating an email address a long time ago. Now all I will do is sanitize it to be safe from malicious attempts. The tipping point for me was trying to decipher an unreasonably long regular expression written months prior to find out why someone's funky address (like the author's) was failing -- wasted time.

You can try, but you can't prevent people from doing dumb things, and in the attempt you end up potentially making mistakes like the one in the article. Users will always do something unanticipated. With that in mind I only validate the things that really matter (e.g. credit card numbers, social security numbers, etc.). If someone wants to enter "jane at aol" as their address, I have to think, "is this going to be a problem for my company?" If not, don't bother trying to fix it.

qeorge · on June 11, 2009

We check for an MX record associated with the domain, and leave it at that. I'd rather deal with spam than lose a customer.

there · on June 11, 2009

a domain without MX records is valid. if a domain/host doesn't have an MX record, MTAs are supposed to try delivering to the domain/host itself.
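A minimal sketch of that fallback rule, for illustration only. The `lookup_mx` callable and the fake resolver are assumptions standing in for a real DNS query, and a real check would still need to confirm that the bare domain has an address record.

```python
from typing import Callable, List, Tuple

# Hypothetical resolver signature: returns (preference, hostname) pairs,
# or an empty list when the domain has no MX records.
MxLookup = Callable[[str], List[Tuple[int, str]]]


def delivery_targets(domain: str, lookup_mx: MxLookup) -> List[str]:
    """Return the hosts an MTA would try, in order, for `domain`."""
    records = lookup_mx(domain)
    if records:
        # Lower preference value means higher priority.
        return [host.rstrip(".") for _, host in sorted(records)]
    # No MX records: fall back to the domain itself (an implicit MX).
    return [domain]


# Usage with a fake resolver standing in for a real DNS query.
fake_dns = {"example.com": [(20, "backup.example.com."), (10, "mail.example.com.")]}
resolver = lambda d: fake_dns.get(d, [])
print(delivery_targets("example.com", resolver))
print(delivery_targets("no-mx.example.org", resolver))
```

So "no MX record" on its own is not grounds for rejecting the address.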

qeorge · on June 11, 2009

Thanks for the tip, I didn't know that. We should change to checking for valid DNS entries then.

bdfh42 · on June 11, 2009

My current policy on validating email addresses is to issue a warning if a given email address fails a check against a standard that covers the vast majority of addresses in use. Validating against the whole range of possible address content is near pointless. The warning simply asks the user to check their email in case of errors but allows a page submission following the display of that warning.
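Roughly, a warn-but-allow flow like this could look as follows. This is a sketch under assumptions: the pattern, the `review_email` name, and the `already_warned` flag are made up for illustration and are not any particular standard.

```python
import re

# Deliberately loose pattern (an assumption, not a standard): something,
# an "@", then a dotted domain, with no whitespace anywhere.
COMMON_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")


def review_email(address: str, already_warned: bool = False) -> dict:
    """Decide whether to accept a submission or show a one-time warning."""
    address = address.strip()
    looks_common = COMMON_SHAPE.match(address) is not None
    if looks_common or already_warned:
        # Either the address looks ordinary, or the user has already
        # seen the warning and chose to submit it unchanged.
        return {"accept": True, "warning": None}
    return {
        "accept": False,
        "warning": f"Please double-check '{address}'. Submit again to keep it as is.",
    }
```

The point is that a failed check only costs the user one extra click; it never blocks the submission outright.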

cperciva · on June 11, 2009

"My current policy on validating email addresses is to issue a warning if a given email address fails a check against a standard that covers the vast majority of addresses in use."

My policy on validating email addresses is "if I send you an email and it arrives, your email address is valid".

Why do you need to "validate" an email address? Knowing that an address is syntactically valid doesn't do anything to confirm that it will reach the intended target.
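One common way to act on "if it arrives, it's valid" is a signed confirmation link. A minimal sketch, assuming an HMAC over the address; `SECRET_KEY`, `make_token`, and `confirm` are hypothetical names, and sending the mail itself is left out.

```python
import hashlib
import hmac

# Placeholder secret for illustration only; a real deployment would load
# this from configuration.
SECRET_KEY = b"replace-me"


def make_token(address: str) -> str:
    """Sign the address so the confirmation link cannot be forged."""
    return hmac.new(SECRET_KEY, address.encode("utf-8"), hashlib.sha256).hexdigest()


def confirm(address: str, token: str) -> bool:
    """True when the token from the clicked link matches the address."""
    return hmac.compare_digest(make_token(address), token)


token = make_token("jane@example.com")
print(confirm("jane@example.com", token))   # token was issued for this address
print(confirm("other@example.com", token))  # token belongs to another address
```

No syntax check is involved at all: the address counts as valid once someone clicks the link that was mailed to it.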

bdfh42 · on June 11, 2009

You are correct - but - at your initial point of contact you have (perhaps) a single opportunity to make contact with your prospective customer. You should do all you can (without boring or annoying your prospect) to capture enough accurate information to ensure that the next step succeeds.

I agree - long term relationships are built on (in the context of this conversation) an exchange of emails but why blow it at the first hurdle?

Hexstream · on June 11, 2009

"Knowing that an address is syntactically valid doesn't do anything to confirm that it will reach the intended target."

However, knowing that an address is syntactically invalid does confirm that it won't reach the intended target. If you don't mess up the validation.

Might be better than having the user wait indefinitely for an important transactional email that will never arrive because of a dumb typo.

kree10 · on June 11, 2009

Agreed. At [old job] we saw a lot of user-entered e-mail addresses that were "correct" (according to so-called validators) but wrong, like "www.screenname@aol.com" or "example@yahoo".
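A sketch of what catching those two patterns might look like. The `typo_hints` function and its messages are made up for illustration; it only hints and never rejects.

```python
from typing import List


def typo_hints(address: str) -> List[str]:
    """Return human-readable hints; an empty list means nothing looked off."""
    hints = []
    local, sep, domain = address.strip().rpartition("@")
    if not sep:
        return ["There is no '@' in this address."]
    if local.lower().startswith("www."):
        # Matches inputs like "www.screenname@aol.com".
        hints.append("Addresses rarely start with 'www.'; did you mean '%s@%s'?"
                     % (local[4:], domain))
    if "." not in domain:
        # Matches inputs like "example@yahoo".
        hints.append("The part after '@' has no dot; is the domain complete?")
    return hints


print(typo_hints("www.screenname@aol.com"))
print(typo_hints("example@yahoo"))
```

Both checks are heuristics about likely mistakes, not rules about what the RFCs allow.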

ovi256 · on June 11, 2009

The problem is that the RFC is amazingly permissive: almost anything goes in email addresses, including whitespace! An RFC-compliant regex is incredibly complex [1]. Furthermore, one would need to read not one, but five RFCs in order to grok everything: http://tools.ietf.org/html/rfc5322 http://tools.ietf.org/html/rfc3696 http://tools.ietf.org/html/rfc5321 http://tools.ietf.org/html/rfc4291#section-2.2 http://tools.ietf.org/html/rfc1123#section-2.1

[1] Great overview of the problem, plus a PHP validator. http://www.dominicsayers.com/isemail/

nailer · on June 11, 2009

Mine is not to reinvent the wheel. Writing a correct email address validator is indeed non-trivial, but there's at least one well-known module written by someone who has actually read the RFCs and won't reject users whose addresses contain plus or minus signs. Use that, don't write your own.

yhnbgty · on June 11, 2009

Depends on your customer base. It might be better to reject one show-off who managed to put a backspace in his email in return for helping 1000s of customers who enter www.name@aol.com

It's possible to have a 6 character email address, but for most users checking that they have entered more than 6 chars would be good.

mtpark · on June 11, 2009

There's no way to get around invalid email addresses. Anybody can put in something like asdhsouhdosd@ahsoufhsdof.com and it would pass most regex checks. I think the best solution is to just check for an "@" sign and a "." after it.
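For what it's worth, that check is only a few lines. A minimal sketch; the `looks_like_email` name is made up, and requiring at least one character on each side of the "@" and the "." is an extra assumption on top of what is described above.

```python
def looks_like_email(address: str) -> bool:
    """True when there is an '@' with a '.' somewhere after it."""
    at = address.rfind("@")
    if at < 1:
        # No "@" at all, or nothing before it.
        return False
    dot = address.find(".", at + 1)
    # Require at least one character between "@" and the first "." after
    # it, and at least one character after that ".".
    return dot > at + 1 and dot < len(address) - 1


print(looks_like_email("asdhsouhdosd@ahsoufhsdof.com"))
print(looks_like_email("example@yahoo"))
```

As expected, the made-up address passes and the one with no dot after the "@" does not.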

eli · on June 11, 2009

Yup, I deal with a lot of email data at work and this is pretty much what I do. People try to get way too clever with their validation (I pity anyone stuck with an email at a .museum TLD), and the spambots will just put real-looking fake ones anyway.

timcederman · on June 11, 2009

The plus address issue is very, very frustrating. The worst is when you try to unsubscribe from somewhere, and they don't URL-encode your email address in the unsubscribe URL. If you don't know to change the "+" to a "%2B", you're stuck.
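On the sending side the fix is a single call. A sketch using Python's standard `urllib.parse`; the `BASE_URL` endpoint and the `email` parameter name are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.com/unsubscribe"


def unsubscribe_url(address: str) -> str:
    """Build an unsubscribe link with the address percent-encoded."""
    return BASE_URL + "?" + urlencode({"email": address})


url = unsubscribe_url("jane+lists@example.com")
print(url)
# Decoding the query string returns the original address, "+" intact.
print(parse_qs(urlsplit(url).query)["email"][0])
```

Without the encoding, a raw "+" in a query string is decoded as a space, so the server ends up looking for an address that does not exist.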

JimmyL · on June 11, 2009

While I'm not as unlucky as a .museum user, I use a .me address as my main public contact address, and you'd be surprised at the number of well-designed sites that don't think it's a valid email.