Hacker News new | past | comments | ask | show | jobs | submit login

As others have said, the only way to know if an email address is valid is to try and send an email. This doesn't mean that this is useless, as you may want to get users to double-check their input if it doesn't pass this.

Some test cases to think about: http://isemail.info/_system/is_email/test/?all




Note that the source for that is_email test (I hadn't seen the library before) is here: https://github.com/dominicsayers/isemail

That library is very good for telling you exactly what's wrong with an email address, but looking through the logic, you can see that handling all the edge cases involves significant effort.

It seems to me that this monomaniacal focus on RFC 822 / RFC 5??? compliance is missing the point a bit. That stuff is important if you're writing a mail client or a server, not so much for a signup form or web site use.

I am going to stick my neck out and say that for the purposes of a signup form, emails should be server-side validated as: anyCharButNullOrAt@valid-looking-domain.com

1. You split the email address into a local part and a domain, at the @.

2. The local part is allowed to contain anything but null, or @. This will keep all the people who go john.doe+furrylist@gmail.com happy.

3. The domain section is validated against length and dns charset (letters, numbers, hyphens, dots) and then checked against e.g. the mozilla public suffix list for a valid TLD. This prevents admin@[10.0.0.2] from receiving signup emails.

4. After that you can punt to your mailer, as long as you do not give any feedback to the users (aside from reciept or non-receipt of email) on the success of delivery or verification (to prevent spam / bruting). Just say "an email has been sent", or somesuch.

Anyone with two @s in their email address ("foo@bar"@bar.com) or (foo\@bar@bar.com) is evil, and should not be encouraged. You have to draw the line somewhere, and that's where I draw it. One reason these people are not worth spending time on is because many common clients such as gmail won't let you send to such addresses in any case. Try it...

Let's thank our stars we don't have to handle Unicode addresses yet...


There is a middle ground ... I have had success using an approach like the one found in this Python library: http://pypi.python.org/pypi/validate_email

It performs regex validation, and if that passes, tries to get the SMTP server to validate the user's presence. Quite useful for when a user might fat-finger their username in the address.

So, for compliant mail servers, sending an email without verifying receipt via some confirmation token is no more reliable than this method (if it will falsely validate the user, it will probably falsely digest the message as well).


Be careful as validating email addresses like that will quickly get you grey or blacklisted by various spam cops. The method works well for a few addresses, but you can't use it to validate thousands of addresses at the same time. I learnt that the hard way. :)


@work we have done a lot of this kind of validations using a custom Tcl script, and luckily we have avoided to be blacklisted till now. Maybe it's due to the fact that validation requests come from an IP address marked as MX for the domain of the sender email address.

Anyway not all SMTP servers are RFC compliant, they respond with a 2XX code to a "RCPT TO" even if user/mailbox doesn't exists. The same thing applies also if the SMTP server is acting as a relay and has no immediate access to the delivering system.


That makes a lot of sense ... a reliable way to determine if a user is real could easily be abused.


Could you expand on this? Why will it get you black listed? What kind of situation are we talking about? It would be pretty annoying to get black listed.


I imagine a spammer could generate plausible emails and then check them against SMTP servers to discover the valid ones if they didn't get blacklisted.


I check if there's indeed a mail server in DNS to avoid the hassle of waiting for potentially slow SMTP servers to respond - for most cases, it works and I end up catching bogus email addresses from the domain names themselves.


A MX record is not necessary for a host to be able to receive email.


Note that that library can also check for an MX server. It goes Valid string -> MX exists -> User exists (you can choose to only check for MX, or only for valid string).


Really? How is mail routed then?


It should fall back to a A record. That's not exactly best practice, but if you're compliant that is something that you should be able to handle.


Sending an email is the only way to know whether an address is working for a user. Validating whether it meets the standards that define how email addresses should look is a different problem, which is what the regex is going after.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: