So then you have to mess it all up again to make it accept anything out there that might be real, and in the process you end up failing a lot of your test cases. Very frustrating.
I too wrote one and found lots of addresses that were valid but unusable (try using a "jacques m"@example.com and see how far you get). But I can't remember any that were usable in the wild but invalid according to my parser.
* Just say no to CFWS (comments and folding whitespace). CFWS is not legal in an email address, and anyone who tells you otherwise is incapable of reading the standards. CFWS is legal to insert in the written representation of the email address in RFC 5322 and RFC 5321, but it is not a structural part of the email address (you get the email address by deleting all such instances).
* Everything to the right of the @ is either a domain name, an IPv4 address literal enclosed in square brackets (e.g., [192.0.2.1]), or an IPv6 address literal prefixed with IPv6: and enclosed in square brackets (e.g., [IPv6:::1]).
* Enclosing a localpart in quotes does not change the email address. Ergo, you can require that people not use quoted localparts if they don't have to. Actually, in general, you can probably require that people not have quoted localparts at all.
* The only characters truly forbidden in email addresses are the C0 control characters (and maybe C1, although the EAI specs are hazy about that). In practice, though, any character that can only appear inside a quoted localpart can probably be excluded from valid email addresses.
* Sending someone an email and telling them to confirm that they received it (e.g., by making the next step be triggered from a link in the email) is the only way to validate that an email address actually works.
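A minimal Python sketch of the structural rules in the list above, assuming CFWS has already been stripped from the input: classify what is right of the @, drop quotes that don't change the address, and reject C0 control characters. The function names are hypothetical, and the local-part check is ASCII-only (no EAI support).

```python
import ipaddress
import re

# RFC 5322 atext (ASCII only in this sketch) and dot-atom built from it.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
DOT_ATOM = re.compile(rf"{ATEXT}(?:\.{ATEXT})*")
# One DNS label: letters/digits/hyphens, no leading/trailing hyphen, max 63 chars.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_domain(domain: str) -> str:
    """Return 'domain', 'ipv4' or 'ipv6'; raise ValueError for anything else."""
    if domain.startswith("[") and domain.endswith("]"):
        literal = domain[1:-1]
        if literal[:5].lower() == "ipv6:":
            ipaddress.IPv6Address(literal[5:])  # raises on a malformed address
            return "ipv6"
        ipaddress.IPv4Address(literal)  # bare [::1] fails here, as it should
        return "ipv4"
    if all(LABEL.fullmatch(label) for label in domain.split(".")):
        return "domain"
    raise ValueError(f"not a domain or address literal: {domain!r}")


def canonical_local_part(local: str) -> str:
    """Drop redundant quotes: "jacques"@x is the same address as jacques@x."""
    if any(ord(c) < 0x20 for c in local):
        raise ValueError("C0 control characters are not allowed")
    if len(local) >= 2 and local[0] == local[-1] == '"':
        inner = re.sub(r"\\(.)", r"\1", local[1:-1])  # undo quoted-pair escapes
        if DOT_ATOM.fullmatch(inner):
            return inner  # the quotes were not needed
        return local  # quoting is genuinely required, e.g. "jacques m"
    if not DOT_ATOM.fullmatch(local):
        raise ValueError(f"unquoted local part is not a dot-atom: {local!r}")
    return local


def normalize(address: str) -> tuple:
    """Split on the LAST @ (a quoted local part may itself contain @)."""
    local, at, domain = address.rpartition("@")
    if not at or not local:
        raise ValueError("expected local@domain")
    classify_domain(domain)
    return canonical_local_part(local), domain
```

For example, `normalize('"jacques"@example.com')` returns `("jacques", "example.com")`, while `"jacques m"` keeps its quotes because it cannot be written any other way.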
It's just an excerpt of the spec, but I link to it instead of to the spec because I think it's easier to read (formatting/colors) and it has links to the spec anyway.
This is the gold standard but note it is itself prone to false negatives and can even have false positives.
Basically the same as yours, but it also validates that a TLD exists.
Technically, this would fail on a few valid email addresses: foo@localhost, foo@.co, or foo@[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334].
All those are valid, but we decided that none of those would apply to any of our customers.
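A rough sketch of the "validate that a TLD exists" step, in Python. `KNOWN_TLDS` here is a hypothetical, tiny stand-in; a real check would load the full list IANA publishes for the root zone. It also shows why dotless hosts and address literals get rejected by this kind of rule.

```python
# Hypothetical stand-in for the full IANA root-zone TLD list.
KNOWN_TLDS = {"com", "org", "net", "io", "uk", "de"}


def has_known_tld(address: str) -> bool:
    """True if the label after the domain's last '.' is a known TLD."""
    _, at, domain = address.rpartition("@")
    if not at or "." not in domain:
        return False  # e.g. foo@localhost has no TLD to look up
    return domain.rsplit(".", 1)[1].lower() in KNOWN_TLDS
```

With this rule `foo@example.com` passes, while `foo@localhost` and `foo@[192.0.2.1]` are rejected even though they are valid, which is the trade-off described above.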
Any validation pattern that caters for the user should be trying to catch instances where the user has entered their address incorrectly. That means validating things like the general pattern ('does it have a @'), checking the TLD ('is the TLD part of the domain in this huge array'), and catching weird characters ('did the user really mean to use a ®'). Even if the validation fails the user should be able to submit their address (this is client side after all; the user can just disable it), but they should be informed that it appears there could be a reason to double check it first.
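The "warn, don't block" idea might look something like this Python sketch: it returns a list of warnings for the form to display, and an empty list means nothing looked odd. The function name and the small `KNOWN_TLDS` set are hypothetical, and a real client-side version would live in the browser.

```python
# Hypothetical stand-in for the full IANA TLD list.
KNOWN_TLDS = {"com", "org", "net", "io", "uk", "de"}


def address_warnings(address: str) -> list:
    """Return human-readable reasons to double-check; never blocks submission."""
    warnings = []
    local, at, domain = address.rpartition("@")
    if not at or not local:
        warnings.append("does not look like name@domain")
    elif "." not in domain or domain.rsplit(".", 1)[1].lower() not in KNOWN_TLDS:
        warnings.append(f"unrecognised top-level domain in {domain!r}")
    # Anything outside printable, non-space ASCII is worth a second look (e.g. '®').
    odd = sorted({c for c in address if not 0x21 <= ord(c) <= 0x7E})
    if odd:
        warnings.append("unusual characters: " + " ".join(odd))
    return warnings
```

The caller shows these messages next to the field but still lets the user submit, so a false positive costs a moment of attention rather than a lost signup.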
The source code has comments from the various RFCs inline. It's not just a question of reading and knowing the RFCs but also of interpreting them and balancing them.
The decoder will throw detailed exceptions which are designed to pinpoint where the message went wrong, and which RFCs were broken by the message.
It also does just-in-time decoding, and decodes only the properties you access. This is much more efficient if you only need a header or two to reject spam quickly as part of a front line defense.
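The library above isn't named here, so this is not its API. It is only a Python sketch of the just-in-time idea: keep the raw bytes and decode each part on first access, so a spam filter that only reads a header never pays for decoding the body. `LazyMessage` and its `decoded` log are hypothetical; the header parsing uses the standard library's `email.parser.BytesHeaderParser`.

```python
from email import policy
from email.parser import BytesHeaderParser
from functools import cached_property


class LazyMessage:
    """Holds raw message bytes; each part is decoded at most once, on demand."""

    def __init__(self, raw: bytes):
        self._raw = raw
        self.decoded = []  # records which parts were actually decoded

    @cached_property
    def headers(self):
        self.decoded.append("headers")
        # Parses only the header block; the body is left as an opaque payload.
        return BytesHeaderParser(policy=policy.default).parsebytes(self._raw)

    @cached_property
    def body(self) -> str:
        self.decoded.append("body")
        _, _, body = self._raw.partition(b"\r\n\r\n")
        return body.decode("utf-8", errors="replace")
```

Reading `msg.headers["Subject"]` leaves `msg.decoded == ["headers"]`; the body is touched only if something asks for it.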
Above all, most of the methods are extensively fuzz-tested against independent reference implementations.
Unfortunately, after I did some further investigation, it seems that some of the slides in my presentation here are wrong. I took some information from the "Email address" Wikipedia page, which turned out to have incorrect info, and in one I just misunderstood the rules (the "local@domain(comment)" part, which is invalid). I have since amended the presentation, but the general point of this video of how to do validation still stands.
Yes, the standards are that convoluted! There's even an "explanatory" RFC that's just wrong. I have edited the YouTube comment to point that out somewhat.
I settled on < 100 chars and:
We'll see how it goes in production :)
Why not just allow any input and validate the address by attempting to send to it? It's really the only way to tell if it's a real address.
What abuse can a person bring on your system by having a 200 char email address? That should be nothing in terms of server load.
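For reference, RFC 5321 caps the local part at 64 octets, the domain at 255 octets, and the whole forward-path at 256 octets including the angle brackets, which leaves 254 octets for the address itself. So a 200-character address can be perfectly legal. A minimal Python check of just those limits (the function name is hypothetical):

```python
MAX_LOCAL = 64     # RFC 5321 local-part limit, in octets
MAX_DOMAIN = 255   # RFC 5321 domain limit, in octets
MAX_ADDRESS = 254  # 256-octet path minus the two angle brackets


def within_smtp_limits(address: str) -> bool:
    """Check only the RFC 5321 size limits, counting UTF-8 octets, not characters."""
    local, at, domain = address.rpartition("@")
    if not at:
        return False
    return (
        len(local.encode("utf-8")) <= MAX_LOCAL
        and len(domain.encode("utf-8")) <= MAX_DOMAIN
        and len(address.encode("utf-8")) <= MAX_ADDRESS
    )
```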
It shouldn't be "let's automate this and hope it goes well in production", i.e. the Google approach. It should be "let's use common sense and manage failure in a way that doesn't piss off customers".
You don't know if they work anyway, until you successfully send email.
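Since sending is the real test, the confirmation link needs a token that proves the recipient saw the mail. Here is one possible Python sketch using a stateless HMAC-signed token with an expiry; a random token stored in a database would work just as well. `SECRET_KEY` and both function names are hypothetical, and actually sending the email is out of scope.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical: in real code, load a stable secret from configuration.
SECRET_KEY = secrets.token_bytes(32)


def make_token(address: str, ttl: int = 86400) -> str:
    """Encode address + expiry and sign it; put the result in the confirmation link."""
    expires = int(time.time() + ttl)
    payload = f"{address}\n{expires}".encode("utf-8")
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(sig).decode("ascii"))


def verify_token(token: str) -> Optional[str]:
    """Return the confirmed address, or None if the token is forged or expired."""
    try:
        p64, s64 = token.split(".")
        payload = base64.urlsafe_b64decode(p64)
        sig = base64.urlsafe_b64decode(s64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    address, _, expires = payload.decode("utf-8").rpartition("\n")
    if int(expires) < time.time():
        return None
    return address
```

The address is marked as working only when `verify_token` succeeds on the link the user clicked.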
Just don't validate names.
Do you have an apostrophe in your name? Then you're hosed, and you get to spend some time figuring out the details.