Ruby's Email Address Regexp (github.com/ruby)
49 points by mooreds on Jan 5, 2022 | 63 comments



There are basically three levels of address checking:

1) You need to validate an email field for login or a website - checking for an @ mark with some text before and at least one . after the @ will do for this.

2) You need to do some sort of address validation, library regexps like this will do for 99.9...% of these.

3) You are building an email handling system which needs to actually support the RFCs, in which case regexp will not handle what you need, and you need to use a proper parser, like https://github.com/mikel/mail/tree/master/lib/mail/parsers

Ref: I am the original author of the Ruby mail gem.
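Here is a minimal Ruby sketch of the level-1 check above: some text before the "@" and at least one "." after it. The method name `plausible_email?` is hypothetical and is not part of the mail gem. As the reply below notes, this rule rejects dotless addresses that are technically deliverable.

```ruby
# Level-1 plausibility check (hypothetical helper, not from the mail gem).
def plausible_email?(address)
  local, at, domain = address.partition("@")
  # Reject input with no "@" or with nothing in front of it.
  return false if at.empty? || local.empty?

  # Require at least one "." somewhere after the "@".
  domain.include?(".")
end

plausible_email?("john.doe@example.com") # => true
plausible_email?("John Doe")             # => false
plausible_email?("n@ai")                 # => false (deliverable, but no dot)
```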


> at least one . after the @ will do for this

technically not required...

  [adam@solomon]$ dig +noall +answer mx ai
  ai.   21572 IN MX 10 mail.offshore.ai.


I've read this more than once, and it basically always boils down to: "The only way to verify an email address is to send an email to it."

In my opinion, the optimal validation is soft validation: "your email address doesn't look right, continue with it?"


> In my opinion, the optimal validation is soft validation: "your email address doesn't look right, continue with it?"

And then hope the server's e-mail workflow can handle subaddressing because you entered in something like "username+service@gmail.com".

Seems like a lot of folks don't like '+', even though it's been part of the e-mail system since the 1980s.
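A validator needs no special handling for subaddressing, because "+" is an ordinary local-part character. Here is a rough sketch of how a server could split such an address instead of rejecting it. It assumes "+" is the delimiter (some mail systems use a different character), and the helper name is hypothetical.

```ruby
# Split "user+tag@domain" into its parts. Hypothetical helper; assumes "+"
# is the subaddress delimiter, which varies between mail systems.
def split_subaddress(address)
  local, _at, domain = address.rpartition("@")
  base, plus, tag = local.partition("+")
  { base: base, tag: plus.empty? ? nil : tag, domain: domain }
end

parts = split_subaddress("username+service@gmail.com")
parts[:base] # => "username"
parts[:tag]  # => "service"
```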


Yep. Back in university I was acquainted with the fellow who set up the MX record for .ai (and assigned himself the memorable address n@ai). IIRC he was involved in planning an academic conference to be held there.


For the record, it is owned by Anguilla (in the Caribbean):

* https://en.wikipedia.org/wiki/.ai


Yeah I met him once. I didn't want to call him out on it :)


As soon as I read the OP's comment I knew someone would reply to say the dot was not technically required, but I didn't expect to see an actual publicly addressable example!


I actually searched and was surprised no one beat me to it.


ok, impressed. Happily the mail gem will see that as a valid address :)


Heh. A while back I worked at <now bought-out startup> whose main business included handling emails, and they were looking to speed it up, so I came up with this code to do the header parsing, which was 250x faster than the mail gem... but they ended up not going with it due to risk >..<

https://gist.github.com/pmarreck/8476538


Level 4:

You need an address backed by an actually valid mailbox. At which point you need to send an email to the address to validate.


At which point, why even bother with 1, 2, and 3? Just try to send it.


Well, it’s usually worth doing 1. It’s super easy, and it catches silly typos (like people putting their name instead of their email address, or something).


You can certainly do it with regex, given a certainly non-regular regex implementation and a probably unbounded computational space. But you shouldn’t.

Ref: I (regrettably) have one of the top SO answers for matching URLs. It’s wrong in a few different ways and I’ve stopped fielding edits/comments for the last few years.


Thank you. Your gem has helped me multiple times over the years and helped me deal with some hard problems.


The most helpful thing I've used in the real world is something that looks for common typographical errors, even if the email is technically valid.

Like, if the user types "john.doe@gnail.com", it pops a dialogue asking "Did you mean john.doe@gmail.com?". But lets them keep what they typed, or do a different fix if needed.

I found some JS called "mailcheck": https://github.com/mailcheck/mailcheck

I assume it's using popularity statistics, edit distance, etc, to come up with suggestions. There are updated clones that use react, vue, etc, instead of jquery.
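Here is a rough Ruby sketch of the "did you mean?" idea. It compares the domain against a short list of popular domains using Levenshtein edit distance. This is not mailcheck's actual algorithm, and both the domain list and the `max_distance: 2` threshold are assumptions. It only returns a suggestion, so the user can still keep what they typed.

```ruby
# Classic Levenshtein edit distance (insertions, deletions, substitutions).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical list; a real tool would use popularity data.
COMMON_DOMAINS = %w[gmail.com yahoo.com hotmail.com outlook.com icloud.com].freeze

# Returns a suggested address, or nil if the domain is already a known one
# or nothing in the list is close enough.
def suggest_email(address, max_distance: 2)
  local, at, domain = address.rpartition("@")
  return nil if at.empty?

  domain = domain.downcase
  return nil if COMMON_DOMAINS.include?(domain)

  best = COMMON_DOMAINS.min_by { |d| levenshtein(domain, d) }
  return nil if levenshtein(domain, best) > max_distance

  "#{local}@#{best}"
end

suggest_email("john.doe@gnail.com") # => "john.doe@gmail.com"
```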

With a working ecommerce site, this improved the percentage of correct emails more than anything else I tried, and I had tried many things. Because it's a bad situation when you've taken someone's money and have nothing other than a shipping address to contact them if something goes wrong (bad shipping address, out of stock situation, etc).


The one from Perl's Email::Valid is significantly harder on the eyes:

https://metacpan.org/dist/Email-Valid/source/lib/Email/Valid...

Edit: Email::Valid is fairly well respected for getting the rules right...


Because it also validates Cyrillic, Hebrew, Arabic, and Chinese e-mail addresses.


Classic Perl. Those were the days.


The best email regex just checks for an @ symbol with something before it, and something after it. Anything more complex is a waste of time.


There is really no point going further than this. Someone is more likely to mistype their email as a different but still valid address than as an invalid one. There are also some completely wrong validators out there that expect the TLD to be only 2-3 characters long.
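To make the 2-3 character TLD problem concrete, here is a deliberately naive Ruby pattern of that kind next to the "something@something" check. Both constants are illustrative and are not taken from any particular library.

```ruby
# Deliberately naive: assumes every TLD is 2-3 letters long.
NAIVE_EMAIL = /\A[^@\s]+@[^@\s]+\.[a-z]{2,3}\z/i

# The looser rule: something, then "@", then something.
LOOSE_EMAIL = /\A.+@.+\z/

NAIVE_EMAIL.match?("me@example.com")     # => true
NAIVE_EMAIL.match?("me@example.email")   # => false, although .email is a real TLD
NAIVE_EMAIL.match?("me@example.museum")  # => false, although .museum is a real TLD
LOOSE_EMAIL.match?("me@example.email")   # => true
LOOSE_EMAIL.match?("n@ai")               # => true
```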

The ultimate email validation is just trying to send an email to the address and confirming with a code/link.
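Here is a minimal sketch of confirming with a code or link. It assumes a hypothetical in-memory store and a 24-hour expiry; a real app would persist the data in its database and actually send the email. `SecureRandom` and `Digest::SHA256` come from Ruby's standard library, and only a digest of the token is stored.

```ruby
require "securerandom"
require "digest"

# Hypothetical in-memory store: address => { digest:, expires_at: }.
PENDING = {}

# Create a one-time token to put in the confirmation email
# (e.g. a link such as https://example.com/confirm?token=..., hypothetical URL).
def issue_confirmation(address, ttl: 24 * 60 * 60, now: Time.now)
  token = SecureRandom.urlsafe_base64(32)
  PENDING[address] = { digest: Digest::SHA256.hexdigest(token), expires_at: now + ttl }
  token
end

# The address counts as confirmed only when the emailed token comes back
# before it expires. Tokens are single-use.
def confirm(address, token, now: Time.now)
  entry = PENDING[address]
  return false if entry.nil? || now > entry[:expires_at]
  return false unless Digest::SHA256.hexdigest(token) == entry[:digest]

  PENDING.delete(address)
  true
end
```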


I have had people rewrite my .io to .com because they were sure it was wrong, or something like that.


Never mind the regex, `email.indexOf("@") > 2` does the trick, and it's faster if you happen to need to check many emails. All websites these days require verification of emails (regardless of whether or not it's necessary), and if that's not enough validation, I don't know what is!


I would use `email.includes('@')`, it only needs to be polyfilled for IE since it's in every modern browser JS engine.


Why 2? Greater than 0, sure.


The index of the first character is 0 and an email must have a local part so that means the index of "@" has to be at least 1. My guess is OP also forgot the index of the first character is 0 instead of 1 resulting in 1+1=2 (that or they meant >=).

Off by one errors are about half of working with arrays.


This is actually an uncommon example of an off-by-two error as they used strictly greater than 2.

  a@domain   fail
  ab@domain  fail
  abc@domain ok


Yeah...I have an email address I use a lot, "me@(domain)". It's perfectly valid, despite indexOf("@") == 2.


Heh, I actually meant `>=`. But as others have pointed out, that still excludes 2 letter users. Really it should be `>= 0`.


`> 0`, I would think, no? Can an email address be '@domain', with nothing before the @?
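In Ruby, the same check needs one extra step, because `String#index` returns `nil` when there is no "@" (JavaScript's `indexOf` returns `-1`). Here is a small sketch; the method name is hypothetical.

```ruby
# True when an "@" exists and is not the first character (index > 0).
# Hypothetical helper name.
def at_after_local_part?(address)
  idx = address.index("@")
  !idx.nil? && idx > 0
end

at_after_local_part?("a@domain")   # => true
at_after_local_part?("ab@domain")  # => true
at_after_local_part?("abc@domain") # => true
at_after_local_part?("@domain")    # => false
at_after_local_part?("nodomain")   # => false
```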


The regex listed there isn’t that much more complex. It’s basically a check for *@a*.* where * is some minimal whitelist of valid characters and a is an alphanumeric to start the domain name.


And even though that sounds reasonable, it is still wrong. Technically, you do not even need the dot, if you add an MX record to a whole TLD or have some other funky DNS setup.

Source: https://www.netmeister.org/blog/email.html


I'll quote the WHATWG HTML spec, which that Ruby code cites as its source (https://html.spec.whatwg.org/multipage/input.html#valid-e-ma...):

> This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

Personally, I say if it's good enough for Web browsers, it's probably good enough for your app.


The linked regexp doesn't require the TLD: https://rubular.com/r/Y9G4HR2Ox8gzqd


Yep! I know that @mil addresses used to be a thing. With the explosion of new TLDs I would be unsurprised if name@gmail addresses became a thing in the future and silly devs handled it by adding an exception to their huge regex instead of just using “.+@.+”.


This depends on your use-case. If you're writing a mail agent, then you do probably want to parse email addresses in their entirety. If you're writing a website that accepts email addresses and wants to make sure the user doesn't just type "foo", then yeah, check for `.+@.+` and call it a day.


If you are writing a webpage that accepts email addresses, then please just use <input type="email"> and check for:

    document.querySelector("[type='email']").validity.valid

This will not only give you email validation for free, but also give users with software keyboards (e.g. on phones) a contextual keyboard.


That’s just for the client though. Best to check on the server as an authoritative source of truth.
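On the server, Ruby's `uri` library exposes the WHATWG-derived pattern from the linked file as `URI::MailTo::EMAIL_REGEXP`. A Ruby backend can therefore apply roughly the same rule the browser does. In this minimal sketch the helper name is hypothetical. Stripping surrounding whitespace loosely mirrors the browser's value sanitization for `type="email"`.

```ruby
require "uri"

# Server-side format check using the regexp shipped with Ruby's uri library.
def valid_email_format?(address)
  URI::MailTo::EMAIL_REGEXP.match?(address.strip)
end

valid_email_format?("username+service@gmail.com") # => true
valid_email_format?("foo@bar")                     # => true (no TLD required)
valid_email_format?("foo")                         # => false
valid_email_format?("@@.@")                        # => false
```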


It’s probably a waste of time for an individual developer to write a one-off complicated regex for a contact form. A team of contributors to a standard library, however, should optimize the regex a bit more, because that saves many developers the time of writing and testing even a very simple one-off regex. The optimizations here are reasonable and internationally compatible.


What about @@.@?


Interesting question. RFC 1123 states that TLDs should be alphabetic, and that the first (or only) character must be a letter or a digit, which should exclude a TLD of '@'. Local parts can include most printable characters other than ()<>[];:@\,.

/^+@+$/ is easy and the false positives it allows are weird enough to accept.


People who have weird email addresses are used to them not working.


There is nothing before the first @, and nothing after the last @.


While that's true, `@@.@` fulfils "an @ symbol with something before it, and something after it".


The original comment probably meant something like this: ^[^@]@[^@]$


That's only one letter before and after.
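With a `+` quantifier on each side, which is presumably what was intended, the single-@ rule looks like this in Ruby. The pattern uses `\A` and `\z` because in Ruby, `^` and `$` match at every line boundary, not just at the ends of the string.

```ruby
# One or more non-@ characters, a single "@", then one or more non-@ characters.
SINGLE_AT = /\A[^@]+@[^@]+\z/

SINGLE_AT.match?("me@example.com") # => true
SINGLE_AT.match?("n@ai")           # => true
SINGLE_AT.match?("@@.@")           # => false
SINGLE_AT.match?("a@b@c")          # => false
```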


I feel like every developer at some point Googles "URL regex" and is inevitably led down a rabbit hole of different regexes — some optimizing for maximum accuracy, others for minimum insanity.

Having been down that rabbit hole before myself, I have to admit, this email regex is tamer than I expected it to be.


E-mails, URLs, file names/extensions, these are the bane of my RegEx existence. Agreed, this is not as bad as I've seen in other places.


Relevant: Your E-Mail Validation Logic is Wrong

https://www.netmeister.org/blog/email.html


I have yet to find the library that is doing this, but I have had a number of issues with websites really not liking an "@me.com" email address.

I assume there is some commonly used library (or several) out there that doesn't recognize an email address whose domain name is shorter than 3 characters?

But it is driving me insane. Most recently, I was on the phone with my vet, and she told me their system said my email was invalid (and would not accept it).


Recently I had a doctor's office call me to confirm an appointment instead of respecting my wish to be contacted via email. The email I provided in their contact form was <theirname>@<mydomain>. The receptionist was convinced the provided email was incorrect because it was "their" email. I'm not sure what I expected.


I've done this for decades so I could see who was selling my email address and spamming me, and the answer is: nobody. That's not where spam comes from apparently. I still do it out of habit though.


SMTP had a very useful VRFY command for checking an address after you've tested for the @ and the MX record, but nowadays only a handful of service providers will tell you whether an address is invalid, due to spam concerns.
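The MX part of that check can be done with Ruby's standard `resolv` library. It is roughly equivalent to the `dig mx ai` query earlier in the thread. In this sketch, the `dns:` keyword is a hypothetical seam that lets the method be tested without network access. An empty result does not prove the address is undeliverable, because SMTP falls back to the domain's A/AAAA record when no MX exists (the implicit MX rule in RFC 5321).

```ruby
require "resolv"

# Return a domain's MX hosts, lowest preference value first.
# `dns:` is a hypothetical injection point; by default a real resolver is used.
def mx_hosts(domain, dns: nil)
  records =
    if dns
      dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
    else
      Resolv::DNS.open { |r| r.getresources(domain, Resolv::DNS::Resource::IN::MX) }
    end
  records.sort_by(&:preference).map { |mx| mx.exchange.to_s }
end

# Requires network access:
# mx_hosts("ai")
```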

Gmail still does, though, which is a big deal: 90% of the people who register on my sites use a Gmail address, so it's easy to verify instantly and prompt the user to double-check the spelling.


Yes, although that only helps if they mistype the address as one that doesn’t happen to exist. Given the size of Gmail's user base, it's quite likely to hit a valid one by mistake. I know at least one person who uses my email address by mistake.


This regexp and the WHATWG one it is based on (correctly) do not validate the presence of a TLD, since one is not technically required (foo@bar is considered valid). But if you are building consumer products, it's best to check that something TLD-like is present after validating against this regexp.
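Here is a sketch of that two-step approach in Ruby. First, validate with `URI::MailTo::EMAIL_REGEXP`. Then require at least two labels after the "@", with a last label that is at least two characters long and not purely numeric. That extra rule is an assumption for consumer sign-ups. It deliberately rejects technically valid addresses like `n@ai` and `foo@bar`, but it still allows long TLDs such as `.email` and punycode TLDs.

```ruby
require "uri"

# Step 1: the WHATWG-style regexp. Step 2 (assumed consumer rule):
# a dot-separated domain whose last label looks like a TLD.
def consumer_email?(address)
  return false unless URI::MailTo::EMAIL_REGEXP.match?(address)

  # EMAIL_REGEXP guarantees exactly one "@" and no empty labels.
  labels = address.split("@", 2).last.split(".")
  tld = labels.last
  labels.length >= 2 && tld.length >= 2 && !tld.match?(/\A\d+\z/)
end
```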



See also this comparison of email regular expressions (as found in various languages and libraries), compared against a selection of valid email addresses:

https://fightingforalostcause.net/content/misc/2006/compare-...


Aren't all me.com addresses now icloud.com addresses as well? Couldn't you just tell them the other one?


It's also interesting to step through the history on this line - it's undergone several revisions and, of course, also seen some reverts of well-intentioned features.


I have a .email domain that at least one major site rejects as invalid. Quite annoying.


Right. The Discover bank app's Zelle settings don't allow any email address on a .in domain, as if they assume that nobody from India who already has a .in email address will come to the US and use Zelle.


Ruby's URI classes are such a pain in the ass. They seem so un-rubyish



