Unfortunately, I bet there are thousands of "real systems" employing regexes like this... How many problems does this solve? Probably zero. How many does (or will) it cause? Probably many more than zero.
Case in point: at work the other day I found a bug in a service I manage. It consists of a front-end form (built by one team), which submits data to another system (built by my team), which then passes the data to a third party. The third party was rejecting the data we were trying to send because it considered the email addresses invalid. Their validation didn't match the validation the front-end form did, so to the user everything seemed fine.
I'm not saying that checking email addresses with a regex is a good way of doing it, but there are countless examples of people doing it, and it would help a lot if everyone used one standardized regex for that.
Even if there were a standardised regex for all emails, one that would not break when new TLDs are released, or new Unicode characters are supported, or whatever, there would still be no guarantee that firstname.lastname@example.org is actually a working address!
Your best bet is to simply allow users to enter anything (perhaps with a minor regex check like the one I gave above), perhaps ask them to enter it twice, and perhaps send an account validation email.
$ nc localhost 25
220 uhura.z ESMTP
250 2.1.0 Ok
250 2.1.5 Ok
354 End data with <CR><LF>.<CR><LF>
Look ma, no @!
250 2.0.0 Ok: queued as 8DB653E0065
221 2.0.0 Bye
.* it is!
will match all email addresses, and as an additional feature, all other strings too!
As we all know, more functionality with less code is better, therefore, this is clearly superior to all other regexes.
Of course, if you're really trying to scrape email addresses, looking for the '@' won't work reliably, since people are used to obfuscating their email with [at] or (at) to protect against spam (well, more spam). You will probably need to dive into AI theory to get email addresses more reliably.
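A rough sketch of the easy part of that, assuming the only obfuscations are bracketed forms like "[at]", "(at)", "[dot]" and "(dot)". The patterns and the name `scrape_addresses` are illustrative, not from any library; spelled-out "jane at example dot org", images, or JavaScript-assembled addresses are exactly where this stops working.

```python
import re

# Only bracketed forms such as "[at]", "( at )", "[dot]" are recognised;
# plain-word obfuscation ("jane at example dot org") is deliberately ignored.
_AT = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
_DOT = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)

# Deliberately loose shape: no whitespace, brackets or extra '@',
# and at least one dot after the '@'.
_ADDR = re.compile(r"[^\s@<>()\[\]]+@[^\s@<>()\[\]]+\.[^\s@<>()\[\]]+")


def scrape_addresses(text: str) -> list:
    """Undo the common bracketed obfuscations, then pull out address-like tokens."""
    text = _AT.sub("@", text)
    text = _DOT.sub(".", text)
    # Trailing sentence punctuation is not part of the address.
    return [m.rstrip(".,;:") for m in _ADDR.findall(text)]


print(scrape_addresses("Mail jane [at] example (dot) org or bob@example.com."))
# ['jane@example.org', 'bob@example.com']
```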
Just get people to enter their email twice (which filters out most mistakes, like people entering their names or somesuch) and don't validate it with a regex. During the signup process, make sure you tell them to expect an email which they must confirm before they are added or before their account is activated. Send a confirmation email with a clickable link. If people don't get it, and the service is important to them, they'll try again or contact you through another means.
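A minimal sketch of that flow in Python, assuming an in-memory store and a placeholder confirmation URL (`PENDING`, `CONFIRM_URL`, `start_signup` and `confirm` are all hypothetical names). The only check on the address is that both entries match; the click on the link is the real validation.

```python
import secrets
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical in-memory store of token -> address awaiting confirmation.
# A real service would persist this and expire old tokens.
PENDING: Dict[str, str] = {}

# Placeholder endpoint; substitute your own confirmation URL.
CONFIRM_URL = "https://example.com/confirm"


def start_signup(email: str, email_again: str) -> str:
    """Compare the two entries (no regex) and return the link to mail out."""
    if email.strip() != email_again.strip():
        raise ValueError("the two email addresses do not match")
    token = secrets.token_urlsafe(32)  # unguessable and URL-safe
    PENDING[token] = email.strip()
    return CONFIRM_URL + "?" + urlencode({"token": token})


def confirm(token: str) -> Optional[str]:
    """Called when the link is clicked: the address to activate, or None.

    pop() makes each token single-use.
    """
    return PENDING.pop(token, None)
```

Actually sending the email (for example with `smtplib`) is left out; whatever bounces or never gets clicked simply never becomes an account.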
(I was involved in running a mailing list with well over 1 million double-opt-in subscribers. Fewer than 100 of these turned out to be invalid [Edit: yeah, that's a guess, like the OP's 99.99%], and we dealt with it easily at our end by properly handling any bounces.)
As I have told people for many years: if you must do this, check at most whether there's an @ and (perhaps) a dot somewhere after the @. That is enough to stop someone who has accidentally put their name in the email address field, or a similar user error. Anything else is a waste of brainwidth and will result in more problems than it solves.
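That rule fits in a few lines of Python. A minimal sketch (the name `looks_like_email` and the `require_dot` switch are illustrative); it splits on the last '@' because a quoted local part may legally contain an '@' of its own.

```python
def looks_like_email(address: str, require_dot: bool = True) -> bool:
    """Minimal sanity check: something, an '@', something, and (optionally)
    a dot somewhere after the last '@'. Catches names typed into the wrong field."""
    local, at, domain = address.strip().rpartition("@")
    if not at or not local or not domain:
        return False
    if require_dot and "." not in domain:
        return False
    return True
```

With `require_dot=False` it also lets through addresses like user@localhost, which are deliverable in some setups but are more often a typo on a public signup form.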
1. Filter with the regex - what's left has a valid format, making step 2 much saner.
2. Extract and validate the domain name - super simple now, because the domain component is known to be sane.
(3. Optional, but a good idea: handle exceptions...)
Step 1 is almost always the hardest part, and now it's mostly done.
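One way to read steps 1 and 2, sketched in Python under two assumptions: the step-1 regex is a loose shape check (the pattern below is illustrative, not the regex under discussion), and "validate the domain" means checking hostname syntax: labels of 1 to 63 letters, digits or hyphens, no leading or trailing hyphen, at most 253 characters overall. Internationalised domains would have to be converted to their ASCII (punycode) form first, which is the kind of thing step 3 is for.

```python
import re
from typing import Optional

# Step 1 (illustrative pattern): exactly one '@', no whitespace,
# and at least one dot in the part after the '@'.
SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Step 2: one DNS label = 1-63 ASCII letters, digits or hyphens,
# not starting or ending with a hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def two_step_check(address: str) -> Optional[str]:
    """Return None if the address passes both steps, else the reason it failed."""
    if not SHAPE.fullmatch(address):
        return "bad shape"
    # The regex guarantees exactly one '@', so this split is safe.
    domain = address.split("@")[1]
    if len(domain) > 253:
        return "domain too long"
    if not all(LABEL.fullmatch(label) for label in domain.split(".")):
        return "bad domain label"
    return None
```

Because step 1 has already pinned down where the domain starts and ends, step 2 is just a loop over dot-separated labels.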
If it is 80% good, then yeah, your underlying assertion is likely right: too little coverage to be all that useful. But 99+% coverage? For a quick-pass filter? That's pretty good; we can work with those numbers.
After all, one could have multiple layers of progressively more accurate but progressively slower filters. That's a common approach.
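A sketch of that layering in Python: filters run cheapest first and stop at the first rejection, so the expensive ones only see input that survived the cheap ones. `first_failure`, `FILTERS` and `slow_check` are hypothetical; `slow_check` just records its calls in place of a real DNS or SMTP lookup.

```python
from typing import Callable, Iterable, Optional, Tuple

Check = Callable[[str], bool]


def first_failure(address: str, filters: Iterable[Tuple[str, Check]]) -> Optional[str]:
    """Run the filters in order; return the name of the first one that
    rejects the address, or None if every filter accepts it."""
    for name, check in filters:
        if not check(address):
            return name
    return None


slow_calls = []


def slow_check(address: str) -> bool:
    # Stand-in for an expensive step such as a DNS or SMTP lookup (hypothetical).
    slow_calls.append(address)
    return True


FILTERS = [
    ("non-empty", lambda a: bool(a.strip())),
    ("contains @", lambda a: "@" in a),
    ("slow check", slow_check),
]

print(first_failure("Jane Smith", FILTERS))        # contains @
print(first_failure("jane@example.org", FILTERS))  # None
print(slow_calls)                                  # ['jane@example.org']
```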
Besides, the JS version could be very useful as a quick-pass usability check that catches silly mistakes by the user. Adding some logic to record the bogus entry, compare it to what they enter next, and pass it to the server if need be could all be very useful too.
foo@bar could still receive email if you had a host named bar in your DNS domain with a user named foo.
Some time ago we built an app where sending a validation email up front was not a practical option. The best strategy I found for ensuring the email was valid was to look up the MX records for the given domain and ask its mail server by issuing RCPT commands. Many mail servers will simply drop the connection when the RCPT is for someone who doesn't exist or can't be routed to, which was a good indicator of a typo or invalid address. And of course, if the MX lookup fails, the domain is incorrect.
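A rough Python sketch of that probe. The MX lookup assumes the third-party dnspython 2.x package (`dns.resolver.resolve`), imported only when no resolver is passed in; the RCPT probe uses the standard `smtplib`. Function names, the default sender and the 10-second timeout are illustrative. Both functions accept injected dependencies so they can be exercised without a network.

```python
import smtplib
from typing import Callable, List, Optional


def mx_hosts(domain: str, resolve: Optional[Callable] = None) -> List[str]:
    """Return the domain's MX hosts, most preferred (lowest number) first.

    A failed lookup raises, which already tells you the domain is wrong.
    """
    if resolve is None:
        import dns.resolver  # third-party dnspython 2.x, assumed installed
        resolve = dns.resolver.resolve
    records = sorted(resolve(domain, "MX"), key=lambda r: r.preference)
    return [str(r.exchange).rstrip(".") for r in records]


def rcpt_accepted(mx_host: str, address: str,
                  sender: str = "probe@example.com",
                  smtp_factory: Callable = smtplib.SMTP) -> bool:
    """Ask the mail server whether it would take mail for `address`.

    Sends HELO/EHLO, MAIL FROM and RCPT TO, then QUITs (on leaving the
    `with` block) without sending a message. A dropped connection is
    treated as a rejection, as described above.
    """
    try:
        with smtp_factory(mx_host, 25, timeout=10) as smtp:
            smtp.ehlo_or_helo_if_needed()
            code, _ = smtp.mail(sender)
            if code != 250:
                return False
            code, _ = smtp.rcpt(address)
            return 200 <= code < 300
    except smtplib.SMTPServerDisconnected:
        return False
```

A 2xx reply is not proof the mailbox exists (catch-all servers accept everything), greylisting can cause temporary rejections, and outbound port 25 is often blocked, which is part of why this is hard to recommend.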
I still wouldn't really recommend this method, though.
Look for an '@', then parse the response log from your provider to find out whether the address works.
Edit: Oh look, the submitter has pulled yet another figure out of thin air (author says "three nines, trust me," submitter says "four nines, omgwtfbetterthanslicedbread!!!1!"). Suspiciouser and suspiciouser.
I implemented this in Ruby a while back, but I went a step further and added a DNS check for an MX record. That way you can ensure there's a mail server to receive the email. Heck, I even wrote a blog post about it.
We've had pretty good feedback so far, but I've also spotted a few addresses people entered that are clearly fake (e.g. asdf at test.com).
I use a pretty unusual TLD (.su, the old TLD for the Soviet Union, which still remains in the root zone), and from time to time I come across a site that won't accept my email address because of that. Most of those sites turn out to be generally crappy, though, so not much of a loss…
Also, many sites don't accept + in email addresses, which is annoying as hell if you want to use the address extension feature of Postfix et al.
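A minimal Python sketch of what the address extension means, assuming Postfix-style handling with `recipient_delimiter = +` and a split at the first delimiter (the behaviour assumed here). The function name is hypothetical; the point is that the + and everything after it are a legitimate part of the local part, so a validator has no business rejecting them.

```python
from typing import Tuple


def split_extension(address: str, delimiter: str = "+") -> Tuple[str, str, str]:
    """Split 'user+tag@domain' into ('user', 'tag', 'domain').

    With recipient_delimiter = + configured, mail for user+tag@domain can
    still be delivered to user's mailbox. The tag is '' if there is none.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no '@' in address")
    user, _, tag = local.partition(delimiter)
    return user, tag, domain
```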
I echo the advice of everyone else - validate with something very simple like .+@.+ and then by sending an email. Trying to recapture the complexity of the email system via a tool like regular expressions is tilting at windmills. It's like trying to develop a regular expression to determine whether a name is real or not.
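For reference, the `.+@.+` check in Python, anchored with `fullmatch` so the whole input has to fit (the name `plausible` is just for illustration). It happily accepts .su domains and + extensions, and it also accepts oddities like a@b@c; the confirmation email does the real work.

```python
import re

# At least one character, an '@', at least one more character.
# '.' does not match a newline, so multi-line input is rejected.
SIMPLE = re.compile(r".+@.+")


def plausible(address: str) -> bool:
    return SIMPLE.fullmatch(address) is not None
```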
As a tangential anecdote, I always thought it would be interesting to drop a backdoor into some canonical piece of code like this that noobs are bound to copy-paste. It might be the most efficient way to worm your way into the largest number of computers worldwide.
IMO you either implement the RFC, or use the absolute dumbest validation possible: 1) there is one '@' character present 2) there is at least one dot on the right side. Anything else will exclude some valid addresses, and you're unlikely to ever hear feedback/complaints from someone who had their sign-up email rejected.
Note this is super liberal, so user@domainwithnodots (which is RFC valid, but probably also a user error) is still considered valid.
I find it strange that there's no information on the author, sources, references, attribution, or credits on the page at all (other than the WordPress theme attribution).
If you really want to do some client-side validation, just keep a basic regex and warn the user if the address doesn't seem right, or use MailCheck.
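A sketch of the MailCheck idea (a "did you mean ...?" hint for misspelled domains), written in Python with the standard `difflib` similarity ratio. This is a swapped-in technique, not MailCheck's own algorithm, and the domain list and the 0.8 cutoff are illustrative; the real MailCheck library comes with its own list. It only suggests, it never rejects.

```python
import difflib
from typing import Optional

# Illustrative list of popular domains to compare against.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest(address: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a corrected address if the domain looks like a typo, else None."""
    local, at, domain = address.rpartition("@")
    if not at:
        return None
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None
```

For example, `suggest("alice@gmial.com")` returns `"alice@gmail.com"`, while an unusual but real domain such as example.su gets no suggestion at all.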
I've heard lots of people say not to bother validating but I always thought it was because you could just confirm with a verification email. Why's this step useless?
If you validate emails as complying with the email format used by those services, you can reduce your hard bounces to a bare minimum. If you check for a valid mail server at that domain, you can reduce that to 0. If you connect to their server and check for an error for that mailbox, you can reduce your soft bounces to 0 too, but that requires a lot of legwork to set up.
Checking there's a dot after the @ is probably a good idea. Although it theoretically blocks people who want mail delivered directly to a TLD, that's unlikely to be a real issue in practice.
Of course, all this takes time, so you always have a tradeoff between speed and validation level, from full-on user-validated email to "enter whatever".
Although, for most languages, if nothing is built in, there's at least one library that provides it. I prefer using that over copy-pasting it from some random website.