Complexities of e-mail validation logic (netmeister.org)
287 points by Tomte on May 24, 2021 | 385 comments



If your email is <RFC>fan 69™@root I am not going to let you sign up. Sending emails costs money and bouncing emails affects your sender reputation. Also, for every user out there using <RFC>fan 69™@root as their email address, there are going to be thousands of people accidentally entering their email address incorrectly and not getting an alert about it. Yes, you could do fancy shit like checking MX records and whatnot, but come on, I'm not going to build and maintain that infrastructure for the one in a million people who are trying to use that address.

Developer time is precious at a startup and supporting <RFC>fan 69™@root while still denying b ob@gmailcom is very, very far down the list of things to do.

In summary: I don't suggest doing 'perfect' email validation to RFC spec. You will save money/devtime and make more of your users happy by not doing it.


This sounds great but what you think is "common" probably isn't.

When I was validating myself for Amazon Prime Student, I literally had Amazon refuse to accept my student email in the form first.m.last@myschool.edu because there were two '.'s in the mailbox portion. I had to send an email to support and it was eventually dutifully fixed.

And that's not an uncommon format for, you know, school emails. And that's an Amazon engineer who should have known.

I imagine there's developers who think "domain.tld" is the only thing valid to put in the domain portion, and that's going to fail with "domain.co.uk", or uncommon TLDs, or other perfectly valid constructs. And sure "it's only x% of the users" but it's a pain in the ass if you're that user. You need to be reasonably permissive.

(but on the other hand "myname@..." is not valid either, and that will fail and cost you money as well... hence leading us back to 'just follow the spec')


This is a false dichotomy. Supporting dots in the email address is trivial, following the spec is not.

Furthermore, for a user, it is trivial to get another email address if the one they have causes issues, so it is not really an accessibility issue either.


It depends upon where you are validating email input at.

For the initial email input, your logic works fine. Once it is applied downstream in a process, it begins to get messy. Someone might do an incorrect email validation that happens to block emails that you have already accepted or which you are importing from a valid source. Someone has already given the example of a login field not allowing them to use the email they signed up with. If such changes occur later in a project's life cycle, not only might you have to spend developer time, you may also have a production outage.

Personally, I suggest using some, even if imperfect, validation when gathering the email initially (for the reasons you point out) and then not validating that information any further.


I actually run into this all the time with passwords using a password manager. Lots of places will accept the creation of a password that's long/complex/etc but then when you actually try to log in with it it won't accept a long password, won't accept certain characters, will silently truncate it and throw an invalid password error, etc.

Sometimes disabling JavaScript will fix it, sometimes not. I occasionally have to resort to using "I forgot my password" until I figure out what the actual underlying requirements of the passwords are.


Etrade lets you create a 32-character password, but if you enable 2FA you suddenly can't log in because apparently they concatenate the password and the 2FA code and then check the length. So make sure your password is at most 26 characters. (They might've fixed this, but I haven't tried.)


Like GP mentioned, Etrade also does the thing where it accepts the . character on password creation, but not login. That was fun to figure out.


Curious, can you login with 26 characters and your MFA seed to bypass MFA entirely?


Yup! Same thing with the ridiculous verify-my-identity questions. One I encounter all the time is the local community college, which let me use spaces in my answers on creation, but not at entry time. Grrrr.


That happens too often! A lot of places where I try to use a really long password will silently truncate it, but different forms will truncate at different lengths, so what might work for registration might not work for login or changing the password later.

I’m always suspicious when sites cap passwords at < 32 characters; that almost always means it’s being stored in a reversible format someplace: maybe encrypted, maybe obfuscated, or maybe neither (banks).

The sites I really trust don’t care how long your password is because their hash size is fixed. The only real length consideration might be that if a bunch of people send obnoxiously long passwords at the same time and they are using bcrypt or scrypt it might stress the server’s CPU, so they might put an upper limit to prevent that.
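To illustrate the fixed-size point, here is a minimal Python sketch. It swaps in PBKDF2 from the standard library's hashlib in place of bcrypt or scrypt, purely so it runs without extra packages; the iteration count, salt size, and the `hash_password` name are illustrative assumptions, not a recommendation.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 stands in for bcrypt/scrypt here; parameters are illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

salt = os.urandom(16)
short_digest = hash_password("hunter2", salt)
long_digest = hash_password("x" * 500, salt)

# The stored value has the same size whether the password is 7 or 500 characters.
print(len(short_digest), len(long_digest))
```

With a hash like this, the only reason left for a length cap is bounding the work per request.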


I ran into this with Sony's PlayStation site. I generated a passphrase that the registration form allowed but from then on I was unable to access my account. I went through the same trial as you and found that they were truncating the password to something like 16 characters. That was just this past year, so I'm pretty sure it's still that way.
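The failure mode described here can be reproduced in a few lines. This is a hypothetical sketch, not Sony's code: the `register` and `login` functions and the 16-character limit are assumptions for illustration, and hashing is left out to keep the mismatch visible.

```python
def register(password: str, max_len: int = 16) -> str:
    # Hypothetical registration handler that silently truncates before storing.
    return password[:max_len]

def login(stored: str, attempt: str) -> bool:
    # Hypothetical login handler that compares the full, untruncated input.
    return stored == attempt

passphrase = "correct horse battery staple"
stored = register(passphrase)

print(login(stored, passphrase))        # the passphrase the user actually chose
print(login(stored, passphrase[:16]))   # only the truncated prefix matches
```

Rejecting the over-long password at registration, with a visible error, would avoid the lockout.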


As a user, I got burned by that several times. Now, when I create a new account somewhere, the first thing I do is log out and try to log back in.


I don't encounter this very often myself. So far the only place I've seen this is Paypal. facepalm


<input type="password" maxlength="xx"> should be illegal. The user has no way to notice that the input was truncated.


I've run into this with labcorp. Their desktop webapp takes subdomain emails, but their mobile iOS health webpage login thinks a subdomain email is invalid and disables the login button. They also don't let you change your account email so you can never really fix this issue properly.


I get your point, but it ends up pretty arbitrary who picks up what part of spec to implement / which part of spec they deem "common sense".

e.g. It drives me BONKERS how many systems absolutely reject my single-letter email (~"N@domain.com"), which I created specifically to make it easy and safe to type on mobile devices etc. Others will reject the "+" sign, or underscore, or dot/period, or (brilliantly) two periods or underscores, etc etc etc :=/


There are also blacklists for names you can use. My real e-mail is admin@myname.com but Facebook doesn't allow me to use that e-mail, warning me that only personal e-mails are allowed. Paradoxically I ended up using my work e-mail to get around the restriction.


Agreed. Anyone that decides to come up with a "common sense" subset of emails to allow is more likely to accidentally block some perfectly normal addresses than actually accomplish something useful. I remember I once made this point in an HN thread, and I got a very dismissive and overconfident reply by someone who posted a regex that turned out to block my own email address (which had a number immediately before the @ symbol).

Writing code that doesn't need to exist that has a possible failure mode of not letting someone sign up at all is just a bad decision. If you're going to write that code, either go through the effort of getting it completely right or soften the failure mode. If you really think the user is somehow mistyping their email with special characters or an unusual TLD, then you could show them a non-blocking warning message.
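A rough sketch of what "soften the failure mode" could look like in Python. The function name `check_email` and the specific soft checks are assumptions chosen for illustration; the only hard failures are inputs that cannot be split into a local part and a domain.

```python
def check_email(address: str) -> tuple[bool, list[str]]:
    """Return (accepted, warnings). Only clearly unusable input is rejected."""
    warnings: list[str] = []
    local, sep, domain = address.rpartition("@")
    # Hard failures: no "@", or an empty local part or domain.
    if not sep or not local or not domain:
        return False, ["address must look like local@domain"]
    # Soft checks: shown to the user as a warning, never blocking.
    if "." not in domain:
        warnings.append("domain has no dot; is it complete?")
    if any(ch.isspace() for ch in address):
        warnings.append("address contains whitespace")
    return True, warnings

print(check_email("first.m.last@myschool.edu"))
print(check_email("bob@gmailcom"))
```

The form would display any warnings next to the field and still let the user submit.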


My email address ends with the .cc TLD, and the number of websites which say "Did you mean to type .ca?" and then refuse to let me continue without changing it drives me similarly batty.


Maybe what we need is a narrower standard that subsets RFC5322 to remove (or at least deprecate) rarely used address syntaxes, such as double-quoted local parts.

(Actually RFC5322 already deprecates some syntaxes. For example, "John Hacker"."Ph.D., Esq."@example.com is a deprecated syntax (obs-local-part), because it contains multiple quoted-string components separated by dots.)

> or (brilliantly) two periods or underscores

Two or more consecutive periods is actually disallowed by RFC5322 (unless quoted). foo.bar.baz@example.com is a valid address, foo..bar@example.com is not. ("foo..bar"@example.com is however)
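As a small Python sketch of just this dot rule (it deliberately ignores every other part of the grammar, and the `dots_ok` name is made up for the example):

```python
def dots_ok(local_part: str) -> bool:
    """Check only dot placement in a local part.

    A quoted local part ("...") is exempt. An unquoted one must not start
    or end with a dot, and must not contain two dots in a row.
    """
    if len(local_part) >= 2 and local_part[0] == '"' and local_part[-1] == '"':
        return True
    # Splitting on "." yields an empty piece exactly when a dot is leading,
    # trailing, or doubled.
    return all(local_part.split("."))

print(dots_ok("foo.bar.baz"))   # True
print(dots_ok("foo..bar"))      # False
print(dots_ok('"foo..bar"'))    # True
```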


I don’t even have anything weird going on with mine other than using my own domain with a less common 2 letter country code TLD, but it gets me rejected from signing up to at least a few services per year


Customer complaints usually determine the spec.

If enough people wrote in about not accepting one letter email addresses, then they would likely update the validation.

But if customer service tells users to use another email address in that scenario and the customer does that, then it might not be worth the effort to fix it.


It’s getting to where if you support @gmail.com and no other you’d still get 80% of signups.

Better to warn that an email doesn’t look right, but let them continue if they want to.


I think it's more likely the marketing department doesn't like people leaving fake addresses.

When I want to check postage or whatever, and they require an email address, a@b.c typically doesn't work, but no@mail.com does.


This logic is why so many Web sites today won't let you use a plus sign in email addresses, which ruins a really nice Gmail feature.


As people said, true spammers know to just strip off the "+" in the email address. This is actually a fun reason to set up your own domain and set up email forwarding for *@example.com to go to your main gmail or whatever account, then the "username" part of the email I just set to the domain of the account I'm signing up for. So I'll use amazon@example.com when signing up at Amazon (or whatever site).


> So I'll use amazon@example.com when signing up at Amazon

I go a little farther. I figure an attentive spammer might figure out that if I use amazon@johnsmith.net to sign up for Amazon, I may have exactly the scheme where *@johnsmith.net will work, so they can just add that to the spam list as a wildcard and pick a new address every time. So instead, I use john101@johnsmith.net, john102, john103, etc, to try and obscure my strategy and prolong the life of the domain forwarding.


Hard truth is you're not worth enough for a spammer to look for that pattern, it's a numbers game and you're just making it harder on yourself.

Also unless you're keeping a lookup table you're losing a great benefit of the wildcard. You can, and I have caught a few places, tell when a company sells your email. If I get an email from company XYZ to my email abc@example.com I know exactly who sold my email and to whom.


I agree that I'm probably not worth the effort, but if this kind of domain wildcard strategy were to become more popular it is entirely feasible for a rudimentary machine learning algorithm to detect its use.

> unless you're keeping a lookup table you're losing a great benefit of the wildcard

That's true, I don't keep a lookup table per se, though I do have a deleted items folder that I could look back in. I'm not sure what I would do, though, if I knew what particular company sold my email address? Send them a nastygram they will just ignore? I just block the address and move on.


I think a typical spammer doesn't care much about such users, but if given a choice they would rather avoid them.

AFAIU, most bulk spam is targeted at gullible or vulnerable people. The spam is often terrible on purpose.

Sophisticated or targeted attacks are a different category and they may be a good reason to prefer something non-guessable.


I just have an entire domain for the purposes of spam. Anything sent there ends up in my bulk folder. I use amazon@domain.com so I can tell who sells my email or gets hacked. Never noticed someone trying to send an email to any addresses I haven't previously used.


> Never noticed someone trying to send an email to any addresses I haven't previously used.

At least a few years ago, I noticed a lot of spam to <random first name>@<my domain> -- i.e., completely made-up addresses that I had never used. Since messages sent to those addresses were guaranteed to be spam, I started treating them as free training data for the spam filter.

I don't know if this still happens, though, because I haven't looked.


This is currently happening to my email domain. Gets rejected as it doesn't have a valid hash (recipient name), but the logfiles are full of <3 letters>@mydomain.com and <english_word>@mydomain.com rejections.


Yeah, this is an age-old issue -- in the early 00s, my mom got a domain and used the email <first_initial>@<domain>.com. She gave up battling the deluge of spam after about a year. We looked through the logs, and saw that her next choice of handle was also getting tons of spam, too, because it was also short.


I do the same thing. I use whatever@domain.email. The addresses are temporary if I want them to be and I can automatically lock the senders to a list that is either automatically learned after x days or manually curated. I've seen some 'marketing' mail get filtered but no hacks yet.


I’ve got amazon@domain.com email for my domain and I’ve never created such an account, much less given it out. Without some uniqueness in the username, I’m not sure you can tell a company sold or lost your data.


Spamming is a numbers game. I kinda doubt enough people are using this scheme to make figuring this out worthwhile for a spammer.


I've wondered about this with big companies like Facebook, Google, Amazon, etc. as well as behind-the-scenes spyware/ad firms who are all probably very interested in linking my identity across user accounts, email addresses, device fingerprints, etc. I've hoped that there aren't enough people doing it (yet) for these orgs to find it worth the effort.


There very much are companies doing this and selling it as a service...here's an API that you can query with a piece of contact information to retrieve all sorts of additional information, including hashes of alternate email addresses, mobile device ids, social media profiles, and plenty of other stuff: https://platform.fullcontact.com/docs/apis/enrich/person-ins...


At a certain point -- probably the moment it becomes a business unto itself -- this kind of data collection should be subject to all the same rules we've come up with for credit bureaus. It should be a legal requirement that I can get the entire profile they have built for me.


I was wondering specifically if they have special cases to identify such "personal" email domains and use them for record linkage.

It seems like an obvious thing to try, but maybe not worth the effort of implementing it, given the high risk of false positives and the low % of people who actually do stuff like this (not to mention they're probably not people who click on ads anyway).


Given the sheer amount of money involved, I believe it is likely that there are players in the market who are far more capable than we give them credit for.


I kinda imagine that spammers go for low-hanging fruit. So spammers won’t bother with defeating a catch-all domain forwarding, as it’s unlikely to give them returns. Although a motivated attacker might decide to try to send interesting phishing.


What do you use for email hosting? I've tried to do something similar but most places have a limit on the number of email addresses, even with paid plans.


Fastmail does not seem to have any limit listed. I have not tested any extreme case since I just use a wildcard.


well, you could turn it around and use + addresses everywhere, so that any legitimate response must be to one of your + addresses. then treat anything without + as spam.


That makes me want to use an email address of the form +myname@mydomain.com, just to see how websites would handle stripping out everything starting from the +.


The + is also useful for knowing who sold your email address on or was responsible for a data breach. If I start getting spam to <my name>+hulu@gmail.com, then I know I could chase down Hulu on Twitter for an explanation.


I do a similar thing, except that the email is actually hosted at my domain rather than being forwarded, and that I have a list of email addresses that I accept and reject all others; if I receive too much spam at one address, I disable receiving at that address.

I have found this to work; I hardly receive any spam at all, and do not need any separate spam filter.


I've run into at least one domain that blocks their own name in your email; that was fun.


Didn't you find you got a deluge of spam to generic addresses like admin@, info@, offers@ and so on? I tried this, although it was probably about 15 years ago now, and reverted it because I got about the same amount of spam as genuine emails.


Although hardly anyone uses yahoo mail anymore, they actually have this feature built in. Basically email aliases.


Note that you should only do this for maybe 6-18 characters, some sites will test send an email to [30-100 character random string]@example.com and see if it bounces - if it doesn't, it'll suspect that domain to be some spammer with a catch-all email inbox and block it.


That's a terrible approach, plenty of valid, legitimate non-spamming domains use catchalls of arbitrary length for all sorts of reasons.

Additionally, sending a test email like that might also get the sender placed on a black list for triggering a spam trap inadvertently.


That's a worrying strategy because there are many reasons for using a catchall. Example: one email per site to track companies selling personal data, then maybe bounce that single email address.

Do you know any site blocking domains with a catchall?


Yeah, if you have a domain of your own the sensible thing is a catchall, use a different address everywhere and block the ones that spam.


Do you know what sites do that? I have my own domain and I haven't seen anybody do that. The obvious solution is to configure your mail server to only accept usernames before the '@' that adhere to some rule which only you know. Like checking if it is a palindrome or something obscure like this.


I watch multiple corp's mail logs extensively, this is not even remotely a common thing.

Worse, I personally know at least 5 or 6 people who use a catch-all. It seems like a very poor method to reliably catch spammers.


Max local part size is 64 octets. So 100 chars would be out of spec.
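A quick Python sketch of that check. The limit is in octets, so the sketch counts encoded bytes rather than characters; treating non-ASCII local parts as UTF-8 is an assumption, and `local_part_fits` is a made-up helper name.

```python
MAX_LOCAL_PART_OCTETS = 64  # the limit cited above

def local_part_fits(address: str) -> bool:
    local_part = address.rpartition("@")[0]
    # Count octets, not characters: non-ASCII characters take several bytes in UTF-8.
    return len(local_part.encode("utf-8")) <= MAX_LOCAL_PART_OCTETS

print(local_part_fits("a" * 64 + "@example.com"))   # True
print(local_part_fits("a" * 100 + "@example.com"))  # False
```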


The best are sites that let you sign up with a ‘+’ but not log in. Zappos used to be the most prominent example.


Reminds me of a patio11 post (which I haven't been able to track down) where he said he gets people signing up with a '+' but then forgetting to include the extra part when they log in later. His login code accepts both versions and increments a counter to track how many people were too smart for their own good.
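A hypothetical reconstruction of that idea in Python; this is not patio11's code, and the `find_account` function, the `stats` counter, and the in-memory `accounts` dict are all assumptions for illustration. The password check would still happen after the lookup.

```python
from collections import Counter
from typing import Optional

stats = Counter()

def find_account(accounts: dict[str, str], login_email: str) -> Optional[str]:
    """Look up an account by exact address, then by the address without its '+tag'."""
    if login_email in accounts:
        return accounts[login_email]
    for stored, account_id in accounts.items():
        local, _, domain = stored.rpartition("@")
        base = local.split("+", 1)[0]
        if "+" in local and f"{base}@{domain}" == login_email:
            # The user signed up with a tag but logged in without it.
            stats["forgot_plus_tag"] += 1
            return account_id
    return None

accounts = {"bob+shop@example.com": "acct-1"}
print(find_account(accounts, "bob@example.com"), stats["forgot_plus_tag"])
```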


I've run across at least one banking site which accepted a password on the sign-up page which was later rejected by the login page. The validation scripts on the login page used a more limited set of permissible special characters which didn't include parentheses. Fortunately it was only a client-side check, so it was relatively simple to bypass it once using developer tools and change the password.


American Express at one point let me set a password over 8 characters, but logging in after only worked if I provided only the first 8.


At one point I know they also weren't case sensitive.


Why would you ever validate the characters of a password on the login page? What a weird thing to do.


I once had a site _silently strip_ the + from signup email. So when I submitted `myname+yoursite@gmail.com` as my email address, they started sending mail to `mynameyoursite@gmail.com`. Madness.


This is common; spammers know the semantics of '+' for gmail and will strip it. You need to assume that it will happen.


GP said the site stripped the “+” only, essentially sending his email to another address entirely. Spammers strip the “+” and whatever follows it, so the spam ends up at the same address.
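The difference between the two behaviors, as a short Python sketch (function names are made up for the example):

```python
def strip_plus_only(address: str) -> str:
    # What the site in the grandparent comment did: drop just the "+" character.
    return address.replace("+", "")

def strip_tag(address: str) -> str:
    # What spammers are described as doing: drop "+" and everything after it
    # in the local part.
    local, _, domain = address.rpartition("@")
    return f"{local.split('+', 1)[0]}@{domain}"

print(strip_plus_only("myname+yoursite@gmail.com"))  # mynameyoursite@gmail.com
print(strip_tag("myname+yoursite@gmail.com"))        # myname@gmail.com
```

Only the second result still reaches the original mailbox on a server that supports plus tags.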


If spammers are converting from `myname+yoursite@gmail.com` to `mynameyoursite@gmail.com`, they are welcome to spam it as much as they want - it won't get to me.

EDIT: moreover, a service is perfectly within their rights to _internally_ store my email as `myname@gmail.com` if they want - but they should still accept `myname+yoursite@gmail.com` as the identifier used to login with.


I’ve seen sites send emails where the unsubscribe link doesn’t work because the URL contains the email address I signed up with and that email address contains a character that their web server doesn’t play well with.
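One plausible cause, assuming the troublesome character is a "+": in a URL query string an unencoded "+" is decoded as a space. A minimal Python sketch with the standard library (the URL and the `unsubscribe_url` helper are hypothetical):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def unsubscribe_url(base: str, email: str) -> str:
    # urlencode percent-encodes "+" as %2B and "@" as %40.
    return f"{base}?{urlencode({'email': email})}"

url = unsubscribe_url("https://example.com/unsubscribe", "myname+yoursite@gmail.com")
print(url)

# Properly encoded: the address round-trips.
print(parse_qs(urlsplit(url).query)["email"][0])

# Pasted into the query string unencoded: the "+" comes back as a space.
print(parse_qs("email=myname+yoursite@gmail.com")["email"][0])
```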


Interesting. How did that work? Does that mean that they would only create the user account under the + suffix? I imagine they must have had two email fields - the canonical email for login and then a separate notification email?


Why do so many people responding to this seem to assume the plus sign is to fool spammers? Of course it's not useful for antispam. It's mostly meant to make it easier to trace where a (legit) email comes from, for example to set up filters. https://gmail.googleblog.com/2008/03/2-hidden-ways-to-get-mo...


> Of course it's not useful for antispam.

I've received spam emails to at least 70 different +addresses. It is absolutely useful for antispam.

Spammers don't care about the reputation of the company they bought or stole the data from.


Can't you put a "." anywhere in your Gmail address to use the same address multiple times?


Yes, assuming the websites following this logic don't block that too, but then you have to keep track of a mapping of dots to websites yourself instead of it being obvious from what you put after the plus sign.


Or use a password manager.


Spammers aside, I'm interested to know what strategy different SaaS companies use when users create an account with a + alias. Do you let users create multiple accounts with the same email but a different + alias? Or do you recognize that it's an alias and say that the account already exists?

Not all email providers support the + notation, so you'd have to look up the domain in some hard-coded list.
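A sketch of what such a hard-coded list might look like in Python. The `PROVIDER_RULES` table and `dedup_key` function are hypothetical, the table only covers gmail.com, and the result is meant as a duplicate-detection key, never as the address mail is sent to.

```python
# Hypothetical, hard-coded rules; provider behavior may differ or change.
PROVIDER_RULES = {
    "gmail.com": {"plus_tags": True, "ignore_dots": True},
}

def dedup_key(address: str) -> str:
    """Key for spotting likely duplicate signups; keep the original for delivery."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    rules = PROVIDER_RULES.get(domain)
    if rules is None:
        # Unknown provider: leave the local part untouched.
        return f"{local}@{domain}"
    if rules["plus_tags"]:
        local = local.split("+", 1)[0]
    if rules["ignore_dots"]:
        local = local.replace(".", "")
    return f"{local.lower()}@{domain}"

print(dedup_key("My.Name+shop@Gmail.com"))  # myname@gmail.com
print(dedup_key("abc+xyz@example.com"))     # abc+xyz@example.com
```

Custom domains hosted by the same provider would not be covered by a domain-name lookup like this.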


Why should they care though? Anyone with a catchall can create 7 billion normal looking addresses under their domain. It's not like it would prevent anything.

Also anyone with gmail address can also place dots almost anywhere into the local part, to create another unique address without using a + sign.


Because of spammers creating garbage accounts on your platform


Yeah, but why care about email address format, if it's not going to stop spammers anyway, and you're just risking losing legitimate customers if you mess up trying to mangle part of an address you should not be touching according to the recommendation in the standard?


The email spec considers these different emails. It's not a website's job to worry about how an individual host treats them. Gmail also ignores "." in the user section, but most hosts do not.


> This logic is why so many Web sites today won't let you use a plus sign in email addresses, which ruins a really nice Gmail feature.

Contrary to popular belief, it is not a gmail feature.

I first heard of the + as destination filtering in the very early 90s at CMU where it was broadly used. Every single email address I've had since then has supported the same (and notably, apart from a test account, I've never used gmail much, so that's not including gmail).


I don't really think that's the same. Forgetting the "+" in the validation regular expression is something else than refusing to implement all kinds of extra checks to support very weird and very unused things.


Sites likely prefer your canonical/standard email address over any plus version. It would be easy to trim anything after the plus too I guess and just email you at your normal address


I configured my mail server to use _ as a sub mailbox identifier to stop creeps who block +. I assume they are doing it to make sure their precious spam shows up in my inbox.


OTOH, it being a standardized thing, a spammer would absolutely just strip that plus part off. Better do it secretly like a catch-all.


It's not really all that standardized. The use of a '+' character to indicate an alias or label is merely convention—if you run your own server you can set the separator to any character you wish, or disable the feature altogether. As far as the RFCs are concerned the '+' character is just part of the account name and there is no reason why it cannot be a mandatory part of the account name on any particular server, such that stripping off the '+' and any trailing characters results in an invalid e-mail address, or even someone else's e-mail account. For sending email or using an email address as an account identifier it's definitely incorrect to treat abc+xyz@example.com and abc@example.com as equivalent. The same goes for account names which differ only in capitalization or placement of periods: some servers are case-insensitive and ignore periods in account names (e.g. Google) but these are server-specific traits and compliant email senders should not assume that every server will work the same way.

The '+' alias feature is a fairly common configuration, though, so for source labels it's better to either treat all unlabeled messages as spam or else use a more opaque labeling scheme (unique-hash@example.com) which doesn't hint at an alternative untracked email address.
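One way to produce opaque labels of that kind is a keyed hash of the site name, sketched here in Python. The secret, the 12-character tag length, and the `alias_for` and `is_known_alias` names are assumptions for illustration.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-private-key"  # hypothetical secret kept on the mail server

def alias_for(site: str, domain: str = "example.com") -> str:
    # Keyed hash of the site name, so the alias cannot be guessed from the site.
    tag = hmac.new(SECRET, site.lower().encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return f"{tag}@{domain}"

def is_known_alias(address: str, sites: list[str]) -> bool:
    # Accept mail only for aliases derived from sites that were actually used.
    return address in {alias_for(site) for site in sites}

amazon_alias = alias_for("amazon.com")
print(amazon_alias)
print(is_known_alias(amazon_alias, ["amazon.com"]))
print(is_known_alias("amazon@example.com", ["amazon.com"]))
```

Because the alias is derived from the site name, it can be regenerated later without keeping a lookup table, as long as the list of sites is known.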


Subaddressing is standardized in RFC 5233.


For the Sieve Email Filtering Language, yes. Which is not actually part of SMTP. And even in RFC 5233 the specific separator sequence is up to the server; the RFC only specifies queries for ":user", ":detail", and ":localpart" to filter on the different fields independent of the choice of separator.
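A minimal Python sketch of that split with a configurable separator. The `Subaddress` type and `split_subaddress` function are made-up names, and returning `None` when no separator is present is my reading of how a missing ":detail" should be represented.

```python
from typing import NamedTuple, Optional

class Subaddress(NamedTuple):
    localpart: str
    user: str
    detail: Optional[str]

def split_subaddress(address: str, separator: str = "+") -> Subaddress:
    localpart = address.rpartition("@")[0]
    # Split at the first separator; the separator itself is server configuration.
    user, sep, detail = localpart.partition(separator)
    return Subaddress(localpart, user, detail if sep else None)

print(split_subaddress("ken+sieve@example.org"))
print(split_subaddress("ken_sieve@example.org", separator="_"))
```

The separator has to come from knowledge of the receiving server; nothing in the address itself says which character, if any, is in use.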


Can you recommend a library / tool which can extract those reliably?


I think that when using email aliases to identify spam sources, the crucial part is that you can filter the stripped address (as well as any unapproved alias) to be directly identified as spam and then the +alias part becomes a key to properly get into the inbox.

That whole setup for tidiness is broken the moment a desired website does not accept an alias in your address, of course.


> a spammer would absolutely just strip that plus part off

Why? They don't care about protecting the business interests of wherever they got that address from, and it's not like stripping the plus off will meaningfully increase the success rate.


The plus is a standard feature of email, not a Google specific thing like ignoring dots.


gmail != email. Breaking gmail features helps ensure a level playing field.


It also breaks legit non-gmail addresses.


My email address is valid and has been valid for a really long time.. but about 5% of ecommerce shops refuse to accept it.. so they don't get my money.

Don't get clever, just follow the spec.


If 5% of ecommerce shops refuse to accept it, it's likely you being clever.

My email is refused by 0% of ecommerce shops... because I just have a normal email.

Don't be clever, pick a better email.


What's "normal", though? "<8-10latinalphanumerics>@gmail.com?"

My email is just "me@<my-last-name>.al"[1] which is just a tiny bit "unusual" - and over the years it got refused by a couple stores because of TLD. And Albania is not Cocos Islands, they're surely not popular with spammers.

If a store believes there's only ".com" gTLD and nothing else (this had really happened to me, some galaxy-brain made a form with a hardcoded ".com" suffix; not even ".net" or ".org" were accepted, unfortunately I don't remember the site) - well, fuck that store, their loss not mine. Worst case, if I really want something they sell, I'll give them a throwaway email - which will contribute to their mail bounces after some time.

__________

[1] ".al" is a ccTLD for Albania which is not a country of my citizenship or residence. I've picked the domain name as hack - because my first name is Aleksei and my first and middle names form "A.L." initials as well. That, and because all relevant .name domains were already taken.


Might sound strange but yes, me@<my-last-name>.al is being clever. You found a nice short clean email by buying a domain from Albania and setting up a me@ address. That's clever.

Think about it this way: either you can get some big brand .com email with no special username and never have an issue, or you can flail around 5% of the time and yell at the clouds.

Should everyone accept your email? Of course! I'm just saying you live in real life, and in real life people suck at building email forms. The problems you run into are on you.


> Should everyone accept your email? Of course! I'm just saying you live in real life, and in real life people suck at building email forms. The problems you run into are on you.

No, the problems they run into are caused by (at best) mediocre developers. They’re entirely to blame. We have specs and standards for a reason.


I honestly don't understand what you're trying to say. What's actionable about your view? You going to call up every business that doesn't accept your email and tell them their programmers suck? Businesses like this are never going away. It's a losing battle.

Instead you can just get a big name .com email and call it a day. Live your life without trying to make some statement about email standards.


> You going to call up every business that doesn't accept your email and tell them their programmers suck?

No, because it's not 1993, but I absolutely do use the contact forms or bug reporter for any website that doesn't accept my email. Most of them fix it, because it's objectively a bug caused by their non-compliant code.


I completely agree with the pragmatic response, and not only encourage it, but do this myself too. It is absolutely the correct advice in the situation.

However. I completely disagree with the conclusion “The problems you run into are on you.”

I didn’t create the problem by having the audacity to be from a different country.


We should totally shame developers/businesses who don't accept valid emails, just as we should continue shaming those with insane password policies or insecure practices.


This conversation reminds me of the 0.00001% of people that browse the internet with JS disabled and then complain about how so many sites don't work for them.


If you aren’t accepting very normal email addresses at perfectly valid TLDs, then you are a bad programmer. At least import a list of the new TLDs every ten years.


Of course they're a bad programmer. But we live in real life, where bad programmers exist.

Get a big brand .com email and you'll never run into an issue.


>> Don't get clever, just follow the spec.

I'd suggest being clever is wasting countless hours to handle your edge case. Or writing your own email validation in the first place.


> wasting countless hours

Isn't email validation a solved problem in that there are services or ready software which provide RFC-compliant validation? If some company is wasting countless hours to do something because of Not Invented Here syndrome, isn't that the same as some company deciding to write cryptography algorithms on their own and reaping what they sow?


> about 5% of ecommerce shops refuse to accept it

That's surprising to me because there is nothing particularly weird about your email address. What exactly do they complain about?


I would assume because it's only 2-chars (me) and they're filtering anything <3 as invalid.


Yeah, that's what I would guess as well. But there's a big difference between "follow the [ridiculously complicated] spec to the letter" and "don't do obviously stupid things like filter out email addresses with short names". The latter is good advice, the former not so much IMHO.


For a couple glorious years I had a 2-letter email address at a single-letter .com domain. It was rejected a surprisingly small number of times.


Ah, that's a good assumption. My initial assumption was some sites have a very dumb whitelist of valid email domains. This seems more reasonable (although, also dumb).


Quotes included or not?


Not. Obviously, or the rejection ratio would be a lot higher than 5%.


Your money is likely a minuscule part of the revenue and supporting your email would likely cost more. This was the point, that it is probably clever to choose a validation that covers 99.99% of customer emails rather than cover the whole spec.


If you can show that "just follow the spec" ends up opening up more opportunity than it closes off, then you can convince people. However, when gmail, outlook, etc. do not allow these zany e-mail addresses, you're going to have a hella hard time convincing me of this unless you are in the 1% of spenders.


Do GMail et al actually prevent you from sending to and receiving from these zany addresses? Or merely prevent you from creating one @gmail.com?


Creating one. But when you consider just how many customers are using gmail and outlook addresses, and not to mention, GSuite/fastmail/etc. addresses under custom domains, it makes more sense why rejecting @gmail.com@gmail.com is worth more than allowing some crazy e-mail feature that is effectively not used.


The routing features are obsolete; they go back to the days when lots of email users weren't on the Internet directly and had to use relays. They are still in the spec, yes.


I assume it comes from similar lineage as UUCP paths. Either way, email standards are a bit ridiculous. It needs the kind of rehaul that occurred with HTML5 of looking at what email implementations actually do and pushing them in one direction. I suspect that is not happening ever, so failing that there will probably always be things in the spec that just simply don’t work across everything anymore.


You absolutely should check the MX records, though. It’s easy and catches tons of typos. I was floored by the difference when I implemented this as pre-check before a Stripe Checkout form.
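
For anyone wondering what such a pre-check might look like, here is a rough Python sketch, not what I actually run. The `resolve_mx` callable is a hypothetical stand-in for whatever DNS client you use (it is not a real library function); I'm assuming it returns `(preference, exchange)` tuples and raises `LookupError` when the domain has no MX records. The "null MX" rule (a single record pointing at `.`) comes from RFC 7505.

```python
from typing import Callable, List, Tuple

MxRecords = List[Tuple[int, str]]


def domain_accepts_mail(address: str, resolve_mx: Callable[[str], MxRecords]) -> bool:
    """Return True if the domain of `address` publishes a usable MX record.

    `resolve_mx` is a hypothetical injected DNS lookup: it takes a domain and
    returns (preference, exchange) tuples, or raises LookupError if none exist.
    """
    _, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return False  # no '@' or empty domain, nothing to look up
    try:
        records = resolve_mx(domain)
    except LookupError:
        return False
    # RFC 7505 "null MX": a lone record with exchange "." means "no mail here".
    if len(records) == 1 and records[0][1].rstrip(".") == "":
        return False
    return len(records) > 0


def fake_resolver(domain: str) -> MxRecords:
    """Made-up zone data for demonstration; a real resolver would query DNS."""
    zone = {
        "example.com": [(10, "mx.example.net.")],
        "nomail.example": [(0, ".")],
    }
    if domain not in zone:
        raise LookupError(domain)
    return zone[domain]
```

Strictly speaking, RFC 5321 falls back to the domain's A/AAAA record when there is no MX, so this sketch is stricter than the standard; whether that trade-off is acceptable depends on your users.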


There's also incredibly low stakes in allowing a technically-invalid email address to pass validation. Just use a very permissive pattern (e.g. contains an '@') and be done with it.

No matter what you will constantly be getting addresses that conform to the spec but cannot actually receive mail.


I've found websites that reject perfectly valid TLDs (.email, for example), so maybe they could start by not hardcoding .com in their regex.


The basic rule of thumb I use is this: are you implementing email at the MTA level (needing to build/parse RFC 5321 commands or RFC 5322 blobs directly), or are you using email closer to a "universal internet ID" purpose (i.e., application perspective)?

If you are in the former category, then yes, follow the spec to the letter. If you're in the latter, then screw the precise guidelines of the spec and reject emails that are very unlikely to be valid: no quoted localparts, no IP address literals. In addition, go ahead and say that email is case-insensitive (more precisely, case-preserving).
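
For the application-level case, a minimal Python sketch of those rules might look like the following. The function names are made up for illustration, and the checks are deliberately limited to the restrictions listed above (no quoted localparts, no IP address literals, case-preserving storage with case-insensitive comparison).

```python
def accept_for_app(address: str) -> bool:
    """Application-level check: reject quoted localparts and IP literals."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if '"' in local:  # local part contains a double quote, i.e. needs quoting
        return False
    if domain.startswith("[") and domain.endswith("]"):  # IP address literal
        return False
    return True


def comparison_key(address: str) -> str:
    """Key for lookups and uniqueness checks.

    Store the address exactly as the user typed it (case-preserving) and use
    this key only when comparing (case-insensitive).
    """
    return address.casefold()
```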

The hard part is if you're writing an email client, because you're basically forced to have your hands in both pies.


Yes, in practice I've found the exact same thing. Either use an email validation service or be more restrictive than the RFC. [1] Also prompting "Did you mean bob@gmail.com?" when the user types "bob@gmaail.com" helps a lot with human input errors. [2]

[1] https://www.mailgun.com/email-validation/

[2] https://www.npmjs.com/package/mailcheck
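
To illustrate the "Did you mean" idea, here is a small sketch using only Python's standard `difflib`. This is not how mailcheck works internally; the domain list and the 0.8 cutoff are assumptions I picked for the example.

```python
import difflib
from typing import Optional

# Illustrative list only; a real deployment would pick its own.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest_correction(address: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a "did you mean" address, or None if nothing close is found."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return None
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None  # already a known domain, nothing to suggest
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None
```

Treat the result as a hint the user can dismiss, never as an automatic rewrite of what they typed.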


Except that as someone with an email at a .co domain, I get really irritated when it asks me "do you mean [mydomain].com?"

I always have to tell people, in real life, "it's .co, not .com," just in case - humans do this too.


Worse, I had services trying to be smart correcting .co to .com


Yep, it's solving for the majority case. As long as it doesn't block signup you can just ignore it.


How do you reconcile your concern for the cost of sending emails with your unwillingness to do super basic validation like checking an MX record?


From where I sit, both of those concerns sit on the same side of the fence. GP argues against spending extensive developer time on validating edge-case emails, in no small part to avoid bounced emails. Doing MX or other validation to confirm that these edge-case addresses work within your own service does nothing to imply that others have put in the same costly and nearly superfluous support, so it likely leads to more emails bouncing and accordingly degrades trust in the business as a sender.


But why restrict the syntax arbitrarily in the first place? It is not going to catch the common typos anyway. Most typos will just result in a wrong but still syntactically valid email address.


I’ve always wondered if it’s possible to have a valid email address which is also an SQL injection attack, XSS or similar?


of course it can: https://security.stackexchange.com/a/106996

One of our testers found XSS via email injection (it passed RFC-compliant validation) on our website.

And we are an e-mail company and should know better :D

Never trust user input!


Probably, since the local name part can be basically anything.

But the way to prevent injection attacks is not to disallow or sanitize input, it is to escape correctly when interpolating strings in other languages.
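
A small Python sketch of what "escape at the boundary" can look like, using the standard `sqlite3` and `html` modules. The table name, helper names, and the hostile string are all invented for the example.

```python
import html
import sqlite3


def store_address(conn: sqlite3.Connection, address: str) -> None:
    """Insert via a bound parameter; the driver handles quoting, not us."""
    conn.execute("INSERT INTO users (email) VALUES (?)", (address,))


def render_address(address: str) -> str:
    """Escape at the point where the value is interpolated into HTML."""
    return f"<td>{html.escape(address)}</td>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")

# A made-up hostile "address" mixing SQL and HTML metacharacters.
hostile = "\"'); DROP TABLE users;--<script>\"@example.com"
store_address(conn, hostile)
stored = conn.execute("SELECT email FROM users").fetchone()[0]
```

The value goes into the database and comes back byte-for-byte unchanged, and it only gets escaped when it is written into another language (HTML here).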


> Sending emails cost money and bouncing emails affects your sender reputation.

that works as long as <RFC>fan 69™@root does not write articles for ZDNet


How about not doing any pre-validation (save for whitespace stripping) and have a validation e-mail (which you should require anyway) take care of any typos?

With precious dev time, you can do better by doing less.


You risk sending a junk message tho, which affects your sender-spam score with other providers.

I just make folks email me first.


I do the validation email, works great. Just be sure to protect the sign up form with some type of bot detection (I use recaptcha, but simpler methods are fine for most sites).


Instead of every developer implementing validation logic, shouldn't we have validation libraries to take care of this?


100% agree. This is especially true if the email address is going to be displayed somewhere, for example. It's generally a good idea to limit email addresses to a subset of what the RFC allows.

To adapt from a famous quote: "all email validation logics are wrong, but some of them are useful" ;)


As the Sr. Internet Mail Administrator for AOL in 1995, we quickly learned that the only real way to validate an e-mail address was to send mail to it. Even that wasn’t guaranteed, but anything less was doomed to unexpected failure and misery.

And this wasn’t a new lesson then. But at least we were smart enough to listen to the people who had learned that lesson before us.

It is now over 25 years later, and I’m sad to see that many people seem to be bound and determined to force themselves to re-learn that lesson the hard way.


I also came to the same conclusion some years ago. Or more specifically, my manager brought me around after I tried arguing that it was worth the time to make sure that users could use an IPv6 address as their domain (the lack of periods after the @ would cause `user@2001:0db8:85a3:0000:0000:8a2e:0370:7334` to fail validation)

He made a very convincing argument: while an IP address is technically a valid domain, how many legitimate users were seriously using an IP address as their email domain? (zero)


You’ll love this SSL cert:

https://[2606:4700:4700::1111]/


This falls into the category of the classic “Stupid shit that most programmers believe”.

Fundamentally, the problem is that if you’re trying to validate an e-mail address as being correct and you’re not sending an actual e-mail message to that address, then you’re doing it wrong.

We learned this lesson back in 1995, people.


Totally agree with this. Trying to be perfect is a good road to paralysis and not getting things done. Software is like people: it's ok to not be perfect, especially if they're always trying hard to be better and doing good things for society.


Yes, you should reject it, because the address is invalid. You can't have unquoted <> in the local part. You can't have spaces there either.

Your other reasons for breaking interop between systems/languages are just whimsical and invalid. :)


I've run into some places where a subdomain email is not ok, which has been pretty annoying. All email validators should be able to at least take first.last+company@subdomain.example.com


Especially when using a country TLD, suffixes like .co.za are appended to the name of the actual ISP or email provider.


I think the best syntax validation technique for email addresses now is found in the HTML spec: https://html.spec.whatwg.org/multipage/input.html#valid-e-ma.... As they say, this is a wilful violation of RFC 5322, because that’s simultaneously too strict, too vague and too lax to be useful. They give a grammar, and the following regular expression implementing it:

  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

Remember that the web is a platform that lives and breathes this stuff. A lot of thought went into this grammar for valid email addresses. This is a good way of filtering out obviously bad stuff while allowing all realistic and sane inputs.

One part of all this that I’m not aware of the situation around is “8. You can put emojis in the local part.” The HTML spec’s validator is all ASCII. It does remind you to punycode the domain labels, but makes no mention of internationalised local parts, and I’ve never learned about non-ASCII local parts or how well they’re supported. I gather they may require the sender to be capable as well as the receiver, whereas internationalised domain names were made compatible with all systems via punycode.
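
If you want to use the same pattern outside a browser, here is a Python transcription as a sketch. The only changes from the JavaScript literal are dropping the `\/` escape and using `re.fullmatch()` in place of the `^...$` anchors; the function name is my own.

```python
import re

# The HTML spec's pattern, transcribed from the JavaScript literal.
# re.fullmatch() takes the place of the ^...$ anchors.
HTML_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def is_valid_html_email(address: str) -> bool:
    """True if `address` matches the HTML spec's e-mail grammar."""
    return HTML_EMAIL.fullmatch(address) is not None
```

Note that each domain label is one character, plus up to 61, plus one more, which is where the 63-character label limit shows up in the pattern.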


I've always just used

  /^.+@.+\..+$/

That is, "some characters, an @, some more characters, a period, and some more characters".

I couldn't care less if users want to enter undeliverable email addresses, they won't get emails. All that regex is intended to achieve is ensuring that the user hasn't accidentally filled the wrong field (e.g. tried entering their phone number) or mistyped a punctuation mark (foo#bar.com, foo@bar,com)

Strictly speaking, it won't match some valid email addresses, such as IPv6 domains. But if I receive a support ticket complaining that we don't accept email addresses with an IPv6 address domain, I'll reply advising that the customer should purchase a domain name or sign up to one of many free email services.
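
For reference, the same check as a Python sketch (the function name is just for illustration), with the cases mentioned above:

```python
import re

# Same pattern; re.fullmatch() takes the place of the ^...$ anchors.
LOOSE_EMAIL = re.compile(r".+@.+\..+")


def looks_like_email(value: str) -> bool:
    """Something, '@', something, '.', something. Nothing more."""
    return LOOSE_EMAIL.fullmatch(value) is not None
```

It rejects `foo#bar.com`, `foo@bar,com` and a phone number, and it happily accepts plenty of undeliverable nonsense, which is the intended trade-off.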


Huh. Interesting this doesn't support international email [1] addresses, e.g. квіточка@пошта.укр or Dörte@Sörensen.example.com.

Seeing as the web has long supported Unicode, where are e-mail addresses currently at in that evolution?

Are full Unicode e-mail addresses something that is decently supported today, or still largely theoretical? Is this regex sufficient? What kind of e-mail addresses do people in China most commonly use, for instance?

[1] https://en.wikipedia.org/wiki/International_email


Internationalised domain names are supported: as I mentioned, the spec explicitly reminds you to do punycode.

For the local part, though, it does look like browsers have fallen down, though I’m not particularly familiar with the situation there. Testing it in Firefox to confirm, ascii@υνικοδε validates, but υνικοδε@ascii doesn’t. https://github.com/whatwg/html/issues/4562 seems to be where progress is made from time to time. As usual, it’s not as simple as we might hope.


> where are e-mail addresses currently at in that evolution?

Baby shoes because of anglosphere programmers that can't fathom people wanting to use their own alphabets and thus forget to support it.


You don't need to be so accusatory or ungenerous about it.

Clearly "anglosphere programmers" fathom it every day when they use UTF-8 almost universally in webpages. Also, you know, things like emoji are pretty popular in the "anglosphere" as well.

It's obvious that the real reason is an ancient e-mail RFC, and that while upgrading webpages to UTF-8 was relatively easy, in that it only needs 2 parties to support it -- the browser and the server -- upgrading e-mail is almost infinitely more complicated, because you have to wait for virtually all email code in the world to be upgraded, since an e-mail address is pretty useless if it doesn't work everywhere.

In other words, it's a coordination problem. Not an ignorance problem.

And unfortunately, Punycode [1] doesn't seem to be a particularly viable stepping-stone/compatibility solution here. E.g. if a user tries to use ドメイン名例@example.com and it fails, asking them to instead type in a seemingly-gibberish eckwd4c7cu47r2wf@example.com, where that could also conflict with a real e-mail address of that name.

[1] https://en.wikipedia.org/wiki/Punycode


> You don't need to be so accusatory or ungenerous about it.

After at least four decades of mostly bad internationalization support, it's no longer accusatory; it's empirical and quite generously worded.


This is a pretty pessimistic take. The real answer lies somewhere between budget and speed. If someone asked me to support non-latin alphabet, I'd have no idea where to start and the amount of people that would use that feature isn't worth the consideration. It's not that I don't fathom it, it's that I don't have time for that shit.


> while allowing all realistic and sane inputs.

Isn't that a way of saying "while disallowing perfectly valid options"?


What's disallowed are a) IP literal addresses and b) localparts that require quoting. These email addresses are highly likely to break many processing steps anyways; I've only ever seen category b in sendmail configs (it can be useful for internal email rerouting purposes).

There's a distinction to be drawn between the requirements of the actual MTA/MUA/MSA layers and user applications built on top of them. For the latter, considering emails to be invalid if they contain IP literals or quoted localparts is going to be more helpful than harmful (there's less scope for vulnerabilities in doing so). It's just like assuming email addresses are case insensitive: it's inappropriate if you're an MTA, but for everybody else, go ahead and assume they are.


> What's disallowed are [...]

A-ha, but here you're wrong because you've excluded IDNs. This is really why you should not try to be clever.


IDN A-labels would still be accepted. Using the U-label is likely to require the same level of support as EAI, because without EAI support, non-ASCII strings are likely to horribly, horribly screw up the lower levels of the stack, and I wouldn't recommend supporting EAI without actually testing to make sure your stack can really handle EAI. (Not to mention EAI localparts being their own can of worms).


No normal user will enter their e-mail address in punycode. This is something your own software should be doing.
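
As a sketch of "your own software should be doing this": Python ships an `idna` codec (it implements the older IDNA 2003 rules, so treat this as illustrative rather than complete). The helper name is made up, and it only touches the domain, since the local part is a separate problem.

```python
def to_ascii_domain(address: str) -> str:
    """Convert the domain to A-labels (punycode); leave the local part alone."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no '@' in address")
    # Built-in codec; raises UnicodeError for labels it cannot encode.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"
```

With this, a user can type `info@bücher.example` and the rest of the stack only ever sees `info@xn--bcher-kva.example`.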


And yeah, IDNs are covered by that spec reminding you to punycode domain labels first.


How sure are you that the 61 character limit won’t change in some future DNS improvements? People used to think TLDs would only ever be up to 3 characters long.

More importantly, what problem is this even trying to solve? Someone accidentally typing a 300 character domain? If they are intentionally feeding you gibberish they’ll just give you more realistic looking gibberish.


You seem unreasonably hostile.

I’m absolutely certain that the 63-character limit for domain labels is never going to change, because it’s hardcoded in enormous amounts of software and hardware, and there’s no even vaguely compelling reason to even attempt to change it. But if such a thing did change, then you’d just add this to the extremely long list of things that needed to be updated.

People who thought TLDs would only ever be up to three characters long were simply wrong from the very start because they didn’t understand what they were dealing with. (As a simple example, .arpa was there from the start.) Understand that this wasn’t a matter of anything changing, it was that some people misunderstood and thought that a convention they observed was in fact a rule.

The problem this sort of validation is solving is weeding out things that are definitely not going to work, as soon as possible, because it’s good to point out problems to users as soon as possible, rather than having something silently fail or only notifying the user about it much later. Syntactic validation isn’t the be-all and end-all of accepting email addresses, but it’s definitely still worthwhile, even though you should generally do other validation based on DNS lookups and/or sending actual emails as well.


That regular expression fails to validate a bunch of the examples from the article. And also single-word addresses, which are pretty useful if you want to route email locally.

So what makes it the best?

[Edit: it also assumes you've already parsed out the "real" address from the rest of the text field, which to me makes it a half-validator at most.]


Yes, that's the explicit point of it.

But it would seem to be the best for general-purpose web use, e.g. signing up for a newsletter with an e-mail address that's pretty much guaranteed not to break anything.

Instead of being conservative in output, it's intentionally being conservative in input.


Single-word addresses? I presume you mean domain names with only one domain label (like “localhost”). Read the grammar again, they’re supported.

> it also assumes you've already parsed out the "real" address from the rest of the text field, which to me makes it a half-validator at most

I’m confused. The explicit purpose of this stuff is to validate an email address. Not to extract an email address from a freeform text field, which I think is what you’re talking about. Deciding how to do that is a whole ’nother can of worms.


> And also single-word addresses, which are pretty useful if you want to route email locally

Unless you're developing an app for an intranet, that's not a concern for most people.


Hope you're not in PHP, Perl, or Ruby!

http://emailregex.com/


That's a pretty bad page, given that it gives regexes that match very different things for different languages, without a) explanation what the differences are, and b) any rationale for why you may or may not want to choose between the different versions, let alone c) why different languages "deserve" different versions of the regex.

This is already a field where there is a lot of misinformation flying around, and a page that merely regurgitates all of that misinformation without the perspicacity to realize that its purported information is internally incoherent is not helpful.


I provide my email address with the +companyname suffix on the local part as a way to filter my email into various folders based on the To header contents.

Unfortunately, many websites are configured to reject email addresses that contain a plus character. I've also encountered websites in the past that did accept the + character when creating the account where the email address serves as the user name, but then could not log in because their log in form rejected the + character in the user name.
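
For readers unfamiliar with the scheme, here is roughly how the suffix gets pulled apart on the receiving side, as a Python sketch. The names are invented, and the `+` separator is a convention of the receiving mail server rather than something every server honours, which is why the separator is a parameter.

```python
from typing import NamedTuple, Optional


class Subaddress(NamedTuple):
    user: str
    tag: Optional[str]
    domain: str


def split_subaddress(address: str, separator: str = "+") -> Subaddress:
    """Split 'user+tag@domain' into its parts; tag is None if absent."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no '@' in address")
    user, found, tag = local.partition(separator)
    return Subaddress(user, tag if found else None, domain)
```

A filter can then file mail by `tag`, and a leaked address tells you which company it was given to.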


I got sick of companies rejecting email with "+", and bought a domain to use for email (among other reasons). Now I've got a wildcard entry in DNS, so any valid local part gets routed to my inbox. So instead of "username+company@example.com" I can do "company@example.com".


I ended up giving up on that after one too many websites rejecting my custom domain (which I’m the only one using) on signup. These lazy / ignorant colleagues are annoying -_-‘


I've been using a similar scheme for about 7 years now and have never had my email rejected by a website on signup.


The American Kennel Club rejected mine because the domain was “too similar” to their name. I guess just because it had a “kc” in it? Completely bewildering.


I use this scheme (company@mydomain.com) and one that I remember blocking for this reason is Aliexpress/Alibaba - aliexpress@mydomain.com was rejected so I use ali@mydomain.com.

No idea what sort of security this is supposed to provide.


It happens rarely, but some only accept a very limited number of domains (ie Gmail, Outlook, etc).

They probably see it as some sort of security / anti-spam mechanism.


I use a .xyz domain for my personal email, and I sort of regret it.

My emails have a tendency to become spam filter bycatch, to the point that when I was job hunting last year I'd have to ring people after I sent them my resumes etc. to confirm they actually received my email.

And when I give people my email address, I usually have to assure them that steve@stevetech.xyz is a legitimate email address and not a joke (it's not actually steve, but you get the point).


I host my own email server, and .xyz is one of the 2 or 3 TLDs I went in the config files and manually blocked since nothing but spam comes from it (and lots of it).

Definitely would not recommend using it for your personal address.


Can you explain the DNS part? AFAIK the sender just looks for MX on the domain itself, regardless of local part.


The actual address in the email header should still contain the subdomain though.


The address "company@example.com" doesn't point to a subdomain, though, the only reference to the company is the local part of the address, and so has nothing to do with DNS.

If he said he used "joe@company.example.com", then it's possible he has a wildcard MX record for *.example.com, but that's not at all what he said, although perhaps it's what he meant.

Regardless, the question remains unanswered.


I just set up my mail server to use - rather than +, and don’t encounter this problem.


What provider do you use for email? That does sound nice.


Fastmail supports it. The best part about fastmail is that you can reply from the same address you got the email for. This is useful in customer service scenarios that identify your account based on email address.


The reply part is somewhat new. I had to delete and re-add the wildcard to get it on my old (circa 2013) account, but very nice to have


I use migadu for this.

I also use greg-*@domain instead of *@domain, since their docs claim that setting up *@domain tends to attract more spam.


Huh, how did I not ever hear/find out about this when I was choosing a provider... I think this is the first time I've seen them mentioned on HN, despite searching through quite a few de-googling threads. Will definitely take a closer look!


Another Migadu user here slowly degoogling myself. $19 a year is a bargain for my usage and the features I get.


Also a migadu user, I'm a huge fan and can't speak highly enough of them. Their pricing model is a perfect fit for me and their support address is really quick to respond.



I use ProtonMail and sign up to everything with <service>@<custom-domain> so I can track what they do with my email.

It's not cheap from PM, and there are loads of hosting providers that will provide catch-all email for free with your hosting package (but with some usually pretty poor webmail client) or if you use a mail client it should work too.

I like having good webmail and mail app and other things so I pay, but there are plenty of good options available. Sadly self-hosting email server is not really an option for a variety of reasons, but you should easily be able to use catch-all e-mail addresses.


The paid version of gmail (google workspace/gsuite) offers this as well (they call it "aliases"). I haven't explored the option myself, but I do recall seeing something like this in the admin panel. Whether they charge for it or not is probably something I should look into.

At some point, I need to migrate away from google and build out my own personal mail server.


mailbox.org also provides the functionality to use your own domain and have a wildcard entry, where all emails go into your inbox.


I've tried something similar with Fastmail, and it works out well for the most part. I have run into more than a couple of services which won't accept email addresses not on a whitelisted domain for some reason, and I had to use an @gmail.com address which forwards to my domain.


Out of curiosity, are those popular services? I'm in process of setting up email on my own domain and it would suck having to fallback to Gmail if some service uses an accepted list of domains.


fastmail is reasonably popular. Gmail is bigger, but fastmail is big enough that they cannot be ignored, unlike when I ran my own personal server and often found myself in blacklists without any knowable way to get off.


I've had a couple places not take my .us domain, but almost everything is fine with my .org. The places I've run into that are really picky don't like gmail or other free email providers.

The one exception is Craigslist; if I email someone with my normal email, I never get a response. I always use gmail for that.


https://forwardemail.net/ is fantastic if all you want is to forward domains somewhere else.

It's a freemium model, but I've never needed anything in the paid tier


In the UK, my domain name provider offers free e-mail forwarding for (I think) 10 specific e-mail address, plus a catch-all forwarder for anything else. Works quite well.


I'm on fastmail.


That causes weird behaviour in places, where they assume the bit before the @ is a "username".


I've been using this strategy for years and have not encountered that issue before. That would mean the part before the @ would have to be unique across all domains. That doesn't make any sense. You couldn't have webmaster@domain1.com and webmaster@domain2.com registered for example.


Or ben@gmail.com and ben@hotmail.com couldn't both be registered. This scheme is so obviously flawed I can't imagine it's widely implemented.


I've been using my own domain with wildcard emails for many years now. I have yet to encounter a single scenario of inferred names.


I was unable to provide my email address for a retail rewards program last week because the input field for the domain was a dropdown in their POS. Not the TLD, the entire part of the email after '@'!


Yes this is terrible. On the other hand, if your goal is to prevent people from signing up using disposable domains, the blacklist approach (which I have tried before) is a never ending game of whack a mole.

Sounds like this was in person at store though which is extra weird because seems unlikely that scammers would be trying to sign up en masse at a physical location (unlike if the form is connected to the internet)


Jeez, wow. How many domains were in that box?


"There are other emails besides gmail and hotmail? Woah!" - the person who thought that was a good idea, probably.


There is absolutely no way that someone who thought building a dropdown for email domain name is a good idea wasn't putting AOL and Yahoo! as the first two options.


Until about a decade ago, this was extremely common in Japan. RIP mobile email, another victim of smartphones in general and the iPhone in particular


I used this wonderful trick to sign up for my government issued eID (it was something else but works for explaining). What they decided to do was simply remove the + without letting me know about it.

my_email+service@foo.bar thus became my_emailservice@foo.bar

I tried logging in, resetting passwords, nothing worked. I had to go to the authorities and make a written request to allow them to interrogate the database by the equivalent of my social security number, and that’s when we realized they just stripped the +.


Fastmail allows for companyname@youraccount.fastmail.com -style addresses. Even for your own domains.

Much more reliable than the + -thing, which breaks in the weirdest of places.


I've been using fastmail for years and didn't know that. Thanks!


Ages ago, back in myspace days, their system would permit + when creating an account, but could not handle this in their forgot password / password reset system. I never was able to delete my account because of this.


Sony's SEN used to have an account creation page that would permit +, but subsequent sign-in interpreted it as a URL-encoded whitespace. No login for you
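
This class of bug is easy to reproduce. A short Python sketch with the standard `urllib.parse` module (the address is a made-up example): in form and query-string encoding a bare `+` means a space, so the address has to be percent-encoded on the way out.

```python
from urllib.parse import parse_qs, urlencode

address = "user+tag@example.com"

# Wrong: pasting the raw address into a query string or form body.
# The decoder on the other side reads '+' as a space.
broken = parse_qs("email=" + address)["email"][0]

# Right: encode on the way out, so '+' travels as %2B.
encoded = urlencode({"email": address})
round_tripped = parse_qs(encoded)["email"][0]
```

Here `broken` ends up as `user tag@example.com`, while the encoded version survives the round trip intact.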


lol you should have tried to enter a URL-encoded plus sign, %2B.


All social media accounts are deletable on a long enough timescale.


As it happens, it was eventually done for me:

https://mashable.com/article/myspace-data-loss/


I use Fastmail with my own domain name and unlimited email inboxes, so I use companyname@mydomain.com to sort incoming mail.


I do the same thing and believe it or not I’ve seen websites reject emails with their own name in the email.


I had one do that. When I give the address in person I get "do you work here?"

I had to switch my hosting provider at one point because they stopped supporting catch-all. I have no idea how many "addresses" I've used, since I don't create a specific email for each, so I had to get new hosting (note: this was over 10 years ago)


I recently got a letter from a company's legal department and had to explain the whole thing :D


If you use Gmail here's a fallback option: Gmail ignores "." in the local part. So foo.bar is the same as f.ooba.r to Gmail. Obviously quite limited and more hassle to keep track of.
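
The flip side is that a site wanting to spot duplicates has to normalize these. A hedged Python sketch, assuming the Gmail behaviour described here (dots ignored) plus the `+tag` suffix being dropped; the function name is invented, and the rules are applied only to `gmail.com` because other providers may treat dots as significant.

```python
def gmail_dedupe_key(address: str) -> str:
    """Collapse Gmail dot and +tag variants into one key for duplicate checks."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no '@' in address")
    domain = domain.lower()
    if domain == "gmail.com":
        local = local.partition("+")[0]         # drop any +tag suffix
        local = local.replace(".", "").lower()  # dots and case are ignored
    return f"{local}@{domain}"
```

Use the key only for comparison; keep sending mail to the address exactly as the user entered it.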


One of my primary pet peeves with Gmail. It leads to a lot more junk mail arriving in my inbox. My real Gmail address is 'first.m.last', and almost all the spam I get is addressed to 'firstmlast'. Gmail is great at filtering out spam so that I don't see most of it, but if not for their unconventional filtering of recipients, I'd get even less. I also get a lot of email from idiots who don't know their own address and provide mine instead, and literally all of that would bounce without their . handling.


Same here. I send everything that is firstlast@gmail straight to junk.

> I also get a lot of email from idiots who don't know their own address

Holy crap there are a lot of them. I've got one bank sending me the dude's statements. He's also been on some interesting trips, seen all his hotel stays, etc.


Same. I don't have a very common name but there are at least two other people who share it. One has used my GMail address to apply for jobs and for his unemployment benefits. I'm guessing he isn't having much luck with either one.

The other finally figured it out but his wife still hasn't after more than a decade. It gets really old receiving reminders to service a vehicle I've never owned from a dealership 2000 miles away among other similar crap.


I thought it would be nice to have my name without numbers as my gmail, but with all the stories I've heard, I think I'm glad I have the numbers now.


This pattern is often abused by spambots trying to avoid dupe detection, so using it excessively may lead to your login being treated as spam.


I use a catch-all to have a <website>@<mydomain>.com login for every website.

Samsung doesn't accept emails with "samsung" as prefix, so I have samsun@mydomain.com for them. I have no idea what the logic behind it is.


I use reverse DNS notation for the local part. So that would be "com.samsung@mydomain.com" in my case.


I got sick of + not being accepted and switched to using - for all my aliases, which works everywhere I've tried. It's annoying, but practical (assuming you run your own mail server, or have the ability to manage it client-side).


Plenty of hosted solutions support wildcard - including GSuite and Fastmail.


Ditto, with the same hassles mentioned by you and others, such that I'm actively looking at email services that handle this sort of thing better using approaches such as mentioned below - domain@mydomain style registration addresses.


You can have unlimited handles with fastmail if you're looking for that


I find that a lot of websites don't allow the + sign precisely because of Gmail usage.


I've decided that the best way to validate email address is to not validate them, but require that any signup be finalized by the individual following a link emailed to them.

This allows a person to use any damn thing they want as their email address, provided it works and they can get the email.


Even if sending emails is 100% free, you still have to worry about your sender reputation. [1] Sending a large amount of mail to invalid addresses will start getting your emails put in people's spam folders. That's the reason email validation services exist: to prevent sending to invalid addresses. [2]

Also, humans make mistakes. You should detect spelling errors and typos then suggest corrections. [3]

[1] https://www.mailjet.com/blog/news/3-factors-that-impact-your...

[2] https://www.mailgun.com/email-validation/

[3] https://www.npmjs.com/package/mailcheck


Mickey@mouse.com is a perfectly valid address but it isn’t my address. If that matters for your application you need to spend the capital to send an email. No way around it.


Even worse, I have commonfirstnamecommonlastname@gmail.com and get several emails a day that I didn't sign up for. Now the person who did sign up isn't getting them and I have to figure out how to opt out of them. Sometimes these website accounts already have payment/personal details associated with them, which I now have access to (and indeed, sometimes have to view) in order to find the "stop sending me email" button.

Always send the confirmation "did you sign up?" email. Always.


Even if 0% free you'll have to do the opt-in anyway, or how on earth will you figure out if the recipient wants your email?

It's hard to be smart with something like names.


Oh don't worry about this at all because spammers are going to sign up with legitimate e-mail addresses that are going to get your reputation lowered. Very common tactic and you won't be saved by some dumb regex that would just probably hurt a few real users.


Two separate things: preventing fake account sign ups vs. preventing legit signup typos

In practice, you build your UI for the latter. You add captchas or other friction for the former.


This is really just a problem for spammers going out and either buying mailing lists that haven't been validated or scraping the web for email addresses. In the case of the spammer, they would probably care a lot more about their bounce rate than their false negative rate (i.e. valid addresses that fail some sort of validation regex). In fact, they would probably tune their validation to actually throw away addresses that didn't look correct just to be safe.

Obviously, this is a different scenario than your bank not accepting your valid (per RFC) email address. Which is why any sort of blanket advice is pretty dumb. Not that I care to aid spammers...

The other scenario might be a site that puts up a "paywall" type thing, where you are forced to enter an email address to gain quick access to something, but doesn't want to bother you with going and verifying an email (e.g. instant discounts, downloading a PDF, etc.). Or in-person email address collection when you buy something in a store. It's never a good idea to collect email addresses of people that have no desire to subscribe to your marketing.


Yeah, buying a dataset of a couple of million email addresses and then using them to email people who didn't sign up or request your email isn't really something I care to optimize. It wouldn't shock me at all if the services that charge to validate emails are just doing a half-ass regex and leeching off spammers anyway.


Yet those tools are full of bugs as well as culturally blind and prudish assumptions. Hint: try signing up with the address null@null.com

- https://blog.jgc.org/2010/06/your-last-name-contains-invalid...

- https://haacked.com/archive/2007/08/21/i-knew-how-to-validat...

- https://fosdem.org/2018/schedule/event/email_address_quiz/


So you use another 3rd party validation service, paying $300-500/million addresses.


100% agreed here. Accept a text field; maybe validate that it has an @ in it and a . after the @.

Send that address a confirmation email. Now you've got consensual opt-in and you've somewhat protected yourself from adding a wrong address to your recurring mailing list.

Prevent abuse with long (seconds) delays between submissions from the client. If the user thinks they did it right, they're waiting on their email inbox anyway; if they immediately realize they made a typo, it'll take 2-3s to fix.

The RFCs were written when manually (not from cron) sending email to another user on your local system was a thing that actually happened. I'm certain you actively want to avoid that now.


Yup, I’ve been working in email marketing for a long time and this is what I do if I need a regex. I remember when the .mobi TLD came out and people with those addresses had a terrible time signing up for things because a bunch of developers got too cute and assumed a TLD could only be 2 or 3 characters. You want to be really lax in what you validate.


This is also my preferred approach.

If I can send you an email and you can verify that you have access to that email, your email is "valid enough" for me.

Then, the validation is basically "is there an @ and a dot after it?". I find that after that, every hour spent on improving the validation just causes more emails falsely flagged as invalid and more support requests from people who couldn't sign up with valid emails. It's also code we need to maintain, and any time someone edits the validation logic they risk breaking signups completely.

So with more "improvements" to the validation, you just cause more problems. Then why do it?

I hear the reputation arguments, but in practice, it never happened to any of the organizations I worked for.

What happens though very often is naive engineers trying to solve problems the business doesn't have with knowledge they lack...


> naive engineers trying to solve problems the business doesn't have with knowledge they lack.

premature implementation is the source of most evil. :-)


My cheap-o approach to this is: Check there’s an @, and that there is a dot afterwards. This excludes local domains obviously, but I don’t want those anyway.


I’ve never seen much point in trying to do better than .+@.+, unless you’re going to pull out the (gargantuan!) authoritative version for some reason.


Implementing the authoritative version is a waste since you'll also need to keep an up-to-date list of TLDs, and more importantly, you might have a typo in the input that gives a valid-but-incorrect email.

After doing your simple regex, the best move is to just send a verification email and wait for the user to click the link, if you really need to be sure.


Too many sites refuse to let me register as

  "><script>alert("XSS");</script>@example.com
The oppression must end!


I think the mismatched quotes actually do make that one invalid.


shouldn't it read Bobby Tables rather than a mere XSS?


Honestly if that'd ever work, the website has bigger problems anyways.


Yeah, do simple validation, and then just send an email. Even a validated email can still be non-deliverable if there’s a typo in the domain or the first portion.


Or the user typos their address and it goes to the void or to someone else. Even a valid address can’t be assumed correct. I get a ton of emails to my old gmail that aren’t meant for me because some people are too dumb to get their email addresses correctly (even for someone’s covid vaccination appointment confirmation and details recently...) or just make a mistake.


Yep, these arcane rules are maybe relevant to the 5 or so people writing mailservers, but not to web developers.


Most of these are just overcomplicating validation. What really matters is account verification, i.e. sending an email to the specified email address in order to verify its authenticity before sending any kind of email (transactional, marketing) to the account.

At this point, not doing email verification should be considered a dark pattern because it causes so much trouble when people's email addresses are used without their permission.


And "permission" isn't even the only issue. Months of Doordash account emails were lost to the ether because I made a typo (gmail.lcom) in my personal email, and it was basically impossible to change the email on an account (their SMS verification seems broken). It does explain why I never got order confirmations though, that had seemed odd.


It seems to be a common antipattern for somewhat smaller sites to make the email address the primary key on the account too, and then it's virtually impossible to ever change it after that. As you scale up it becomes impossible to ignore the fact that people change email addresses sometimes, but I've lost track of the number of smaller sites that assume it's a safe primary key.
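To make the alternative concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are made up for illustration; the only point is that the email column is unique but is not the key other tables reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"        # surrogate key, never changes
    " email TEXT NOT NULL UNIQUE)"    # still unique, but free to change
)
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id))"
)

# Sign up with a typo in the domain, then place an order.
user_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("me@gmail.lcom",)
).lastrowid
conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,))

# Fixing the typo touches one row; orders keep pointing at the same user id.
conn.execute("UPDATE users SET email = ? WHERE id = ?", ("me@gmail.com", user_id))
```

With the address as the primary key, the same correction would have to be cascaded through every table that references it.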


raises hand

I can confirm that this is a very stupid mistake to make. :-(


What this article really showed me is that this RFC is actually pretty harmful.

Supporting all of the rules outlined in the spec is probably a huge burden for maintainers of mail clients and servers. Obviously some parts of the spec are going to be omitted. It's hard to blame them for it, but the same person that rightfully skipped over implementing the routing thingy might've also wrongfully assumed there won't be a Japanese character in the address. And that's what's so bad.

You might introduce more issues in your system, by taking the full spec into consideration for your validation, instead of using the whatwg regex someone posted here.


Well if there are problems with the RFC then you should work with the IETF to correct those. They have an open standards development process.


Another option is to just ignore the RFC


That does mean that there will only be an ad-hoc undocumented standard for email addresses, rather than one that's serviceable.

Web application validation forms add a different layer to the standard and are sort of hard to tame; anyone can push together a few lines of PHP or Javascript code and conjure their own email address standard out of thin air.


Will be? There is an ad-hoc standard.

If the standard fails to be used, the standard is defective.


Isn’t it just not very nice to ignore a Request for Comments


Validation errors are common, but warnings are not.

I'd like to see more of "Patterns like [what you entered] are uncommon—are you sure?" instead of "Patterns like [what you entered] are not allowed—change it to proceed."


I recently implemented this using the great Mailcheck library. So if someone types "gnail.com" or "gmail.con" it detects it and we can show "Did you mean gmail.com?". If someone ignores the suggestion, fair enough. If someone purposely wants to give us a junk email, fair enough. At least we're not frustrating them needlessly.

https://github.com/mailcheck/mailcheck
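For anyone not on JavaScript, the same idea can be roughly approximated with Python's standard difflib. This is not how Mailcheck itself works internally; the domain list and the 0.8 cutoff are assumptions chosen for the example.

```python
import difflib
from typing import Optional

# Hypothetical short list; a real deployment would use a larger one.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest_address(address: str) -> Optional[str]:
    """Return a 'did you mean' address, or None if nothing close is found."""
    local, at, domain = address.rpartition("@")
    if not at or not domain:
        return None
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None  # already a known domain, nothing to suggest
    # Pick the single closest known domain above the similarity cutoff.
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=0.8)
    return f"{local}@{matches[0]}" if matches else None


print(suggest_address("bob@gnail.com"))        # bob@gmail.com
print(suggest_address("email@david.kitchen"))  # None
```

The result is only a suggestion to show next to the field; the user can ignore it and submit what they typed.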


Mostly, yes. However, some things should probably still be prohibited, such as:

- An email address ending with ".invalid", unless invalid email addresses are supposed to be allowed (which in some cases is useful, but you can then disable sending email to such an address, using it only for identification). (I do use such an email address for identification on NNTP.)

- Email addresses without at least one at sign.

- Email addresses containing control characters (at least ASCII control characters).

- If the domain name does not resolve or resolves to a loopback address or LAN address (except for some specialized cases where such a thing is desirable). The same is true for literal IP addresses; if it is a loopback or LAN address then it should be disallowed, but otherwise it can be allowed.
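A rough Python sketch of those hard rejections, using only the standard library. The function name and reason strings are invented for the example, and the DNS resolution check is left out because it needs network access; only literal IP addresses in the domain part are inspected here.

```python
import ipaddress
from typing import Optional


def hard_reject_reason(address: str, allow_invalid_tld: bool = False) -> Optional[str]:
    """Return a reason string if the address should be refused, else None."""
    if "@" not in address:
        return "no at sign"
    # ASCII control characters: 0x00-0x1F and 0x7F.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in address):
        return "control character"
    domain = address.rpartition("@")[2].lower()
    if not allow_invalid_tld and (domain == "invalid" or domain.endswith(".invalid")):
        return "reserved .invalid domain"
    # Domain literals look like [192.0.2.1] or [IPv6:2001:db8::1].
    if domain.startswith("[") and domain.endswith("]"):
        literal = domain[1:-1]
        if literal.startswith("ipv6:"):
            literal = literal[5:]
        try:
            ip = ipaddress.ip_address(literal)
        except ValueError:
            return "malformed address literal"
        if ip.is_loopback or ip.is_private or ip.is_link_local:
            return "loopback or LAN address literal"
    return None


print(hard_reject_reason("user@[127.0.0.1]"))   # loopback or LAN address literal
print(hard_reject_reason("user@example.com"))   # None
```

The `allow_invalid_tld` flag covers the identification-only case mentioned in the first bullet.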


I have a tld that was recently created (2014) and I still cannot use it in an email address reliably.

The domain in question being david.kitchen, so an email may be email@david.kitchen

The issue I encounter more than any other is trivial: Most sites still have a tld validation that only accepts domains that end in net|com|org and some other small list of accepted suffixes such as co.uk

The list of TLDs is constantly expanding https://newgtlds.icann.org/en/program-status/sunrise-claims-... so even `[a-z0-9.-]+@[a-z0-9.-]+\.[a-z0-9]+` would be better than what I see in the wild.
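To illustrate the difference in Python: the first pattern is the one quoted above, and the second is a made-up example of the allow-list style of check, not taken from any particular site.

```python
import re

# The lax pattern quoted above, matched case-insensitively against the whole string.
LAX = re.compile(r"[a-z0-9.-]+@[a-z0-9.-]+\.[a-z0-9]+", re.IGNORECASE)
# A hypothetical allow-list pattern of the kind described as common in the wild.
ALLOWLIST = re.compile(r"[a-z0-9.-]+@[a-z0-9.-]+\.(?:com|net|org|co\.uk)", re.IGNORECASE)

for addr in ["email@david.kitchen", "someone@example.com"]:
    # Prints the address, then whether each pattern accepts it.
    print(addr, bool(LAX.fullmatch(addr)), bool(ALLOWLIST.fullmatch(addr)))
```

The allow-list version rejects the .kitchen address while the lax one accepts both. Note that even the lax pattern still refuses a + or _ in the local part, so it is "better", not "correct".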


I was involved in setting up .name back in 2001. We spent ages contacting people with validation rules based on the old set of TLDs. Given that was the first expansion of the gTLD space in a long time, it wasn't so unreasonable then. But it's just astounding that it's still an issue 20 years later.


sometimes you get regressions too. Kaiser Permanente invalidated my email address earlier this year.


Someone found a sleeper regexp on Stack Overflow...


I managed to buy firstname.dev a while ago and this was one of my fears of using it as my email address. I ended up switching to a .com one just to avoid any issues. I certainly don't want government services emails not to work just because maybe they didn't account for .dev TLD


I have the same issue. I use sparr.email and it fails validation on a few critical websites, namely my online utilities account (seattle public utilities) and payroll processor (ADP).


yeah, I routinely use .email and .cloud and it's so annoying when the occasional site goes "THAT IS NOT AN EMAIL ADDRESS !!111!1 YOU HAXXXOR!".


and the regex you provide doesn't even account for unicode..


The regex came from the article, and lack of unicode is addressed in the article.


I have two things.

The number of times I've tried to sign up for a service with my protonmail account and had it fail validation simply because it's a protonmail account (not a gmail, outlook, hotmail or aol, apparently) makes me wish everyone did follow the RFC. I actually emailed a service one time, and they responded that it's because protonmail is usually associated with shady stuff. WTF.

The second. I had to implement an email validator at one of my previous jobs, and fell down the RFC rabbit hole. Not only did I have to follow the RFC as per my boss's request, but I also had to make sure that Amazon SES allowed it. Came out of the office wanting to just walk out onto the road. The weird things that not only email servers allow, but also, what do email clients allow.


There is no point in many cases. Even if you can verify that the email address is syntactically valid, you'll still need to check that it was not mistyped, and that it actually goes to the person you think it does. The only way to do that is to send them an email and have them click a link to verify.

However, if you still want to validate an email address then use a library. All popular programming languages have email validation libraries. Yes, it's an extra dependency if it's not included in the std lib or the framework you use, but email validation is wrong in 99% of the cases, if you wrote it yourself.


Or use the browser. HTML form validation has <input type="email"> which checks that the entry is a valid email address.


Not quite, browsers have slightly different implementations here, we recently had to stop using this because we had customers using IDN e-mails like hansi@lübeck.de and this would fail validation in some browsers (but not others).
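One possible server-side approach, sketched in Python and not necessarily what the commenter ended up doing: convert the domain part to its ASCII (punycode) form before any further checks. Python's built-in "idna" codec implements the older IDNA 2003 rules, which is an assumption worth knowing about if you rely on it.

```python
def to_ascii_address(address: str) -> str:
    """Convert the domain part to its ASCII (punycode) form; leave the local part alone."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no @ in address")
    # The stdlib codec converts each label, e.g. a label with an umlaut becomes xn--...
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(to_ascii_address("hansi@lübeck.de"))
```

Non-ASCII characters in the local part are a separate problem that this does not touch.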


Interesting. Got a bug report I can track for those browsers?


Perfect is the enemy of good.

If it is a string that has an @ sign, a dot and is at least six characters long, it's probably a valid email address.

a@b.cc

No need to go further than this. It's not worth the time.


You don't need the dot. Item 12 in the list is "You can have dotless domain names."


indeed, the point the parent comment was making is that effective email validation need not perfectly implement the RFC.

dotless domains are going to be so rare in practice (unless your project has some niche use-case) that you can probably ignore them and call them invalid for the sake of simplicity.


It's worth alerting the user that they likely made a mistake. However, I think they should be allowed to continue with a dotless address.


Do you know many people who own a TLD?


Last time I checked, some of the TLDs did have an MX record. Perhaps they use it for support or something? I could imagine emailing info@tld or admin@tld or support@tld.


This is a great run-down of the trouble with e-mail addresses.

I worked in e-mail security for quite a while. "Write an e-mail address parser" was my go-to technical interview question.

It was pretty easy to see if the candidate had ever given any real thought to e-mail (most had not); and you could also pick up a lot of signals about engineering style, for instance if they started with a regex (fewer did than I expected). And it was trivial to adjust the difficulty: if someone thought the question was easy and had a fast solution, you could just throw them a test-case like the ones in this article.

(Note: the actual title is "Your E-Mail Validation Logic is Wrong" -- and it's only about addresses, the author isn't implying that e-mail systems can't validate messages nor for that matter addresses.)


What if the answer they gave was "This is a very hard problem that honestly isn't worth solving, just check against a .@. regex and call it a day?"


"Can you write me a parser that has a <1% false negative and <1% false positive rate on real email addresses?"

A similar enough issue happens in coding interviews anyway. Sometimes the interviewee is aware of a library that essentially solves the problem for you. In those cases I give them some credit for knowing of it and then ask them to implement it anyway, as if the library didn't exist (because there are a large number of problems out there for which a solution doesn't yet exist, and when hiring a SWE you need to find someone who can write new solutions from scratch for those situations; whether a given toy interview problem is such a situation doesn't matter for the purpose of evaluating said skills).


I would usually structure the question a bit, give a couple test cases with different formats and ask something like "write a class..." if in Python, etc. I wasn't trying to trap anyone who might actually think /\w+@\w+/ covers the range of all possible addresses.

Digression: I do miss the days when you could assume a candidate for a position at Aquatic Widgets Incorporated would know something about water, or about widgets, or at least would have looked up what an aquatic widget is before bothering to come in for an interview, but those days have long since departed the realm of Software Engineering as far as I can tell. Which may be a good thing from the engineers' point of view, I'm not sure.


That would demonstrate an understanding of email -- it is a very hard problem -- but probably also an unpleasant attitude you might not want in a co-worker. Whether the problem is worth solving is very often not your call as an engineer.


No offense but if an organization does not listen to engineering in determining how to deal with a technical problem, that is an enormous red flag.

While maybe the engineer won't actually make the call, the engineer should inform management's understanding of the costs of the approach and the efficacy of alternatives, and management should go along with that recommendation unless they have a good reason not to. Of course tone is important, someone saying "fuck no, I ain't doing that" likely indeed would be unpleasant to work with, but a respectful "I would recommend against doing that" is the sign of a confident and intelligent professional.


> Whether the problem is worth solving is very often not your call as an engineer.

IMO well functioning teams do consider the thoughts of their technical members when deciding which problems to solve.

The person "making the call" on whether having perfect email validation is worth solving may not have an appreciation of how difficult it actually is, so having a discussion with engineers on how much work/time it would take should play a big part in prioritising it.

Additionally, things like validating email on signup are mostly solved (albeit imperfectly) so one can and should use existing implementations and focus on building their product.


Yes, and it's a technical question. You wouldn't let business people decide which database to use, how to store data in a database, how to send data from backend to frontend, etc... those questions should be up to the technical team to decide.

Password strength requirements and email validation are just like the database examples, and if a company doesn't let these technical questions be answered by the technical people, that's a bad sign.


>Whether the problem is worth solving is very often not your call as an engineer.

True, but as an engineer you do need to provide accurate feedback regarding "Hey, this is gonna work much of the time but email is hard, this is a complex problem. If we do this from scratch we're going to miss a lot of things potentially".


So only a@b?

I think you’re missing some stuff in your regex.


HN markup strikes again (OP wrote .X@.X, where X is an actual asterisk, which HN renders as .<i>@.</i>)!


Eh, that's why you use a validation email. Only bother with 'validation' at all to catch something obviously wrong.


I'd sort of raise the issue that "Writing an email parser from scratch is a bad idea due to the sheer complexity involved. If you're looking for serious email address validation there may be better options out there that have dealt with this complexity rather than start from the ground up."

Not to say I wouldn't try just for the sake of working through it as an example / 'where would you start' discussion.

But if we're pretending this is a real world task I'd probably discuss how this is an endless / possibly ultimately futile time sink and there might be better options than starting at point A ;)


I just check that the string contains at least an @ character. That ensures that we're not rejecting people with uncommon patterns in their email address and takes very little time to design, develop and test.

In a project we're doing something fancier: we check the result of sending mail and store it in the database record for the account (Mandrill notifies us on a webhook.) Then we might take actions for bouncing addresses. The actual impact on the project has been zero so far.


This feels like a discussion for backend implementations/email forwarders, not for email signups... but hey while this has some attention - For god's sake, put a button that says "This ain't me", at least for important stuff.

I'm sorry, but I just can't bring in Clyde's truck for the oil change, cause Clyde ain't me!

I also cannot attend Cassidy's parent teacher conference, apologies, I am not in Ohio.


>This feels like a discussion for backend implementations/email forwarders, not for email signups...

And yet I've worked multiple places where product people asked for "simple email validation" on user signup. If they insist, I ask them to provide some actual test cases that they care about. Sometimes the product folks can be convinced to drop the validation requirement if they can be shown that anyone who can't sign up because their email address doesn't validate will simply move on and not sign up.

In the case where your product is B2B and all the employees of your customers are users (say an HR product), then the first time a VIP at an important customer complains, that's usually enough to convince your stakeholders to disable the email validation.


I have a very short email address in the format a@b.tld and for my special friends that don't know how to validate correctly I have created abc@ur-email-validation-is-broken.b.tld

I need to use the latter ~5% of the time. Most often I take my business to someone else for the sake of principle.


Why not accept absolutely anything in the email address field, and just require an emailed link to be clicked before marking the email as validated?


Because it causes conversion drop off.


There is an awesome talk about E-Mail by Ricardo Signes:

https://www.youtube.com/watch?v=JENdgiAPD6c

The first 5 minutes are perl specific, but the rest is email and just hilarious.


I’m of the opinion that all you should validate is that there is an @, the text to the right side (of the final @) is a well-formed dotted DNS domain, and that there exists at least one (non-whitespace) character to the left of the (final) @.

Yes, I can craft garbage emails that pass this quite easily, but who cares? If I’m crafting fake emails I can make valid ones too. This rule ensures I typed the @ and the dot in my domain (we really don’t need to support dotless email domains and it’s better to catch “foo@gmailcom”) and it won’t reject all the weird random emails people might have.
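Roughly, in Python, as one reading of the rule above. The label pattern is ASCII-only, so an internationalized domain would need converting to punycode first; that limitation is an assumption of this sketch.

```python
import re

# One DNS label: letters, digits, hyphens; no leading/trailing hyphen; 1-63 chars.
LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)", re.IGNORECASE)


def plausible_address(address: str) -> bool:
    local, at, domain = address.rpartition("@")  # split on the final '@'
    if not at:
        return False
    if not any(not ch.isspace() for ch in local):
        return False  # need at least one non-whitespace character on the left
    labels = domain.split(".")
    if len(labels) < 2:
        return False  # dotless domains are deliberately refused
    return all(LABEL.fullmatch(label) for label in labels)


print(plausible_address("foo@gmailcom"))               # False
print(plausible_address("first.m.last@myschool.edu"))  # True
```

Everything to the left of the final @ is accepted as-is, which is what keeps the weird-but-valid local parts working.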


Unpopular opinion: Just ignore these edge cases and focus on the 99.99% of the sane population that doesn't get off on having a weird email address.

More generally: If an edge case exists and has nothing to do with accessibility (it was caused by a user having a different workflow like needing a screen reader or being in a less developed part of the world with slow internet) then you should dismiss them and not make your code/life unnecessarily complicated.


The only relevant email validation is verifying a user can click on a link or enter some code sent to them to the address specified by them. Without verifying ownership, the email address is worthless as an identifier so making it conform to some syntax is not that relevant and you should not use unverified email addresses as identifiers (identity theft is a thing).

Obsessing about regular expressions for these addresses is generally a waste of time except for maybe preventing a lot of failed attempts to send stuff to a clearly invalid email address. A simple string contains '@' is probably good enough for that. Worst case the email address does not work and you discard the entered information after some reasonable time frame. The user has the option to try again and do a better job of typing their email address.


If your validation function works for 99.99% of your users' email addresses and it's a big unnecessary lift to get that other 0.01%, your logic is not wrong.


I don't think I've seen a bang path since 1990. The claim "Your E-Mail Validation Logic is Wrong" is just pedantry.


What you've seen since the 80s ended is unfortunately only a subset of all the horrible edge cases your users will run into.


What sendmail rules would you even use in 2021 to process a bang path? Just deny.


A little later for me. I last used bang paths in 1994, when I had a UUCP feed.


Just get rid of that pointless filtering altogether.


Email validation :

Accept email from user.

Send email to that address with a link to verify.

Go/no-go test if link is clicked.

(If you're doing some fever garbage or otherwise trying to parse it, you're doing it wrong.)
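Those three steps could be sketched like this in Python. The in-memory dict, the 24-hour lifetime, and the example.com URL are all placeholders; actually sending the email is left out.

```python
import secrets
import time
from typing import Optional

PENDING = {}            # token -> (email, expires_at); stand-in for a database table
TOKEN_TTL = 24 * 3600   # assumed lifetime in seconds


def start_verification(email: str) -> str:
    """Store a random token for the address and return the link to email out."""
    token = secrets.token_urlsafe(32)
    PENDING[token] = (email, time.time() + TOKEN_TTL)
    # example.com and the /verify path are placeholders.
    return f"https://example.com/verify?token={token}"


def confirm(token: str) -> Optional[str]:
    """Return the verified address if the token is known and fresh, else None."""
    entry = PENDING.pop(token, None)  # single use: the token is removed on first lookup
    if entry is None:
        return None
    email, expires_at = entry
    return email if time.time() <= expires_at else None
```

Note there is no syntax check anywhere: the address is treated as an opaque string until the link comes back.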


This doesn't work with hotmail where sometimes a robot will click the link but refuse to deliver the mail.


I once had a crack at building a sensible email validation library.

* Validate the string contains "@" and a "." to the right of it.

* Validate common typos

* Validate disposable emails

* Validate MX records

* Validate SMTP server and mailbox

https://github.com/mfbx9da4/deep-email-validator

I don't have the time to keep it maintained but it works for the most part!


Nice article.

My only technical nit would be the statement “if there was an MX record”.

Many systems will fall back to an A record to attempt delivery in absence of a MX record.


The one I see over and over is failing to trim the email before doing validation. This is especially egregious at account creation where you want no friction. Users enter their email with their smartphone, and it may append a space at the end. More than once, I've had a relative call me trying to figure out why $website wouldn't accept their email as valid.


No the problem is developers confusing validation with verification. You can’t validate your way to a correct address and it’s wrong to try.

If your goal is to catch typos you’re better off with very lax validation plus a library that suggests corrections like “gmail.com” for “gnail.com” (both of which are of course technically valid domains)


People love to quote the RFCs, digging up examples of weird things that are technically allowed in e-mail addresses, but not widely used or supported.

Frankly, I don’t give a damn about supporting some l33t h4x0r wanting to be clever about his e-mail address.

So no, my logic is not wrong. It just doesn’t care about all the weird edge cases.


There is exactly one right way to validate an email address: Send an email to it.

It makes a lot of sense to have a good regex that validates it on the frontend and even tells the user, "hey, this doesn't look valid, are you sure?" but don't ever reject someone for failing your regex.

At the end of the day, you should only validate by having them click on a link in their email. Then you know the whole thing works.

If you're worried about someone hurting your reputation with a bunch of bad email addresses, you can certainly mitigate that by rate limiting your signup API or even deprioritizing sending confirmation emails to email addresses that don't look valid. But you still should eventually send an email to make sure it's valid and not reject it until it bounces back.



Hah, just went over the email address validation logic in our app last week because a client asked.

Turns out we do minimal validation (make sure there is a local and a domain, that there are not two periods next to each other, and a few other things) but what we really rely on is deliverability.

In other words, if your email needs to be verified, we'll try to send an email to the address you provide. If the link is clicked (or the code entered), that's good enough for us.

Applications using our service (we're an auth provider) can decide for themselves if they need email address validation. It's a boolean flag on the user object. If they do, they can use the functionality we provide to ensure it.


I'm always amazed by those "you're doing Xyz validation wrong" posts. This is 2021. There should be libs out there that do this perfectly and we should be using those. Nobody should be writing this kind of thing from scratch, especially not somebody with "precious developer time at a startup". For fun, sure, why not. But doing this from scratch for production software is nonsense. Parsing email addresses, URLs, IPs, OS path names, even parsing common file formats, stack frames or, god forbid, hacking your own crypto. Don't.


Hah, even beyond the question of address format variations there are also commercial services that do some level of email address validation - and one of them regards my business email address as invalid (firstname@companyname.com) - adding another letter works, adding punctuation works, it's just my specific first name.

Unfortunately it's done via black box on their server(s), so it's not like I can even dig through the code and figure out what's going wrong.


In addition to a very simple regex, you can do some light verification on DNS and SMTP

- nslookup -type=mx email.com

- pick the highest priority MX server

- telnet mx1.email.com 25

- validate SMTP handshake

- Start a connection: EHLO email.com

- mail from:<sender@youremail.com>

- rcpt to:<recipient@email.com>

Obviously, this might be outside the capabilities of some hosts or users. There's a bunch of services that expose this workflow for you as an api. (https://trumail.io/, for example)
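The same steps can be sketched with Python's standard smtplib. The MX lookup itself is not in the standard library, so this assumes you already have the (preference, hostname) pairs from some DNS library; the function names and the HELO name are invented for the example.

```python
import smtplib


def pick_mx(records):
    """records: iterable of (preference, hostname); the lowest preference value wins."""
    return min(records, key=lambda rec: rec[0])[1]


def probe_mailbox(mx_host, sender, recipient, helo_name="example.com", timeout=10):
    """Run EHLO / MAIL FROM / RCPT TO and return the RCPT reply code.

    No message is sent: the session is closed before DATA.
    """
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
        smtp.ehlo(helo_name)
        smtp.mail(sender)
        code, _message = smtp.rcpt(recipient)
    return code


# Example wiring (needs network access and outbound port 25, so it is not run here):
# host = pick_mx([(10, "mx1.email.com"), (20, "mx2.email.com")])
# probe_mailbox(host, "sender@youremail.com", "recipient@email.com")
```

A 250 reply to RCPT TO means the server accepted the recipient at that moment. Treat the result as a hint at best: some servers accept every recipient and bounce later, and a 4xx code is a temporary failure rather than proof the mailbox doesn't exist.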


Close, but you also need to fall back to an AAAA or A lookup of the domain when the MX record doesn't exist. Also, do you really want transient unavailability to stop your signup flow? The whole point of the way mailers are written is that the mail gets delivered even in the face of transient unavailability.


See point 10 in the article:

"The domain name does not need to resolve"

Also the mail server may be temporarily offline or unreachable.


Non-resolving or offline mail servers count to your bounce rate if you have a 3rd party service handling your outgoing mail. So for that purpose, it is an invalid address in the sense that you should avoid sending anything to it.


Amazon supposedly distinguishes between soft and hard bounces, and only hard bounces count towards your failure rate for which you eventually might be penalised: https://github.com/awsdocs/amazon-ses-developer-guide/blob/e...

Although annoyingly, in the case of a soft bounce they apparently only retry for up to 12 hours, which for a small mail server is very much on the too-short side:

In the worst case my mail server/hoster goes down just as I go to sleep, in the morning I either don't notice it or can't do anything about it anyway as I need to go to work, at work I can't do anything about it either, and it's only when I get back home that I can stand up my emergency mail server if the outage hasn't resolved itself by that time, which means > 20 h of down time and therefore far exceeding Amazon's retry window. (The RFC5321 recommendation is to keep retrying for 4 to 5 days, which is much more amenable for that case)


Those are rules made up for the convenience of marketers and have nothing to do with the technical aspects of mail delivery as defined in the RFCs.

Edit just to clarify:

KPIs such as bounce rates etc. aren't a function of how mail is delivered (RFC5321). These are KPIs collected and collated by non-SMTP applications sitting on top of SMTP infrastructure monitoring bounces.

Nowhere in RFC5321 does it mention that a mail server should or must not deliver mail with respect to bounce rates. These are operator-defined metrics outside the scope of RFC5321, which may be aided by additional software or services such as spam detection.


On the contrary, those kinds of rules are made up to try and keep marketers in check to some degree. Why would a marketer want to get dinged for sending an email to a nonresponsive domain?


You're going to need to quote the RFC(s) that specifically mention bounce tracking to keep marketers in check.

My original reply arose because there are times when a receiving domain or destination email address can be "temporarily" unavailable. I pointed this out to demonstrate that services that pre-validate recipient addresses upon submission of a form don't take into account transient outages due to any number of valid factors.

SMTP was designed with this in mind, i.e. try to re-deliver up to some acceptable threshold and then at some point give up (the hard bounce which is the thing that should cause the "ding", especially if they keep retrying beyond "soft bounces").


You’re going to need to show me the RFC(s) that specifically mention bounce tracking is for the convenience of marketers. Or maybe give up on every practical aspect of a technology defined in RFC(s) being covered by those RFC(s). SMTP seems a particularly bad example if you expect to be able to write a useful program using only the RFC(s), since every MTA has a whole host of workarounds for non-spec behavior.


> You’re going to need to show me the RFC(s) that specifically mention bounce tracking is for the convenience of marketers.

Perhaps re-read "jusssi"'s comment, then mine. I didn't assert that bounce tracking was for the convenience of marketers, or suggest it was mentioned in any way in the RFCs; they implicitly did, and I wanted to point out the error in their understanding.

> SMTP seems a particularly bad example if you...etc

But the central theme of this whole HN discussion thread is about SMTP.

If you're interested, section 6 of RFC5321[0] is where bounce messages are mentioned (just three times in the whole RFC - bouncing, bounced and bounce) with no reference to marketers. See also 6.1:

Some delivery failures after the message is accepted by SMTP will be unavoidable. For example, it may be impossible for the receiving SMTP server to validate all the delivery addresses in RCPT command(s) due to a "soft" domain system error, because the target is a mailing list (see earlier discussion of RCPT), or because the server is acting as a relay and has no immediate access to the delivering system.

Which brings us back to my original comment, far above, that services that check once if an email address is "valid" using trumail.io or whatever at form-filling time are flawed solutions.

[0]: https://datatracker.ietf.org/doc/html/rfc5321


Ok, so the article is wrong. For an email to be valid right now, yes, the domain part has to resolve. If you're accepting any email address that might be an email in the future, then they are correct, but for 99.9% of use cases: yes, the domain has to resolve.


I just outsource this to Mailgun. User signs up, I send them a confirmation email, account doesn't get created till they click the link. If the email address is invalid, Mailgun returns an error and I show a page that says "We couldn't send an email to <address>. If you're sure that's a valid address, please try again." (Also use recaptcha for bot detection).
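
Roughly, that flow looks like the sketch below. It is an in-memory illustration only: pending, accounts, start_signup and confirm are made-up names, and the actual sending (Mailgun or otherwise) is left out.

```python
import secrets

pending = {}      # confirmation token -> email address awaiting confirmation
accounts = set()  # addresses whose owner clicked the link


def start_signup(email):
    """Record a pending signup and return the token to embed in the link."""
    token = secrets.token_urlsafe(32)
    pending[token] = email
    return token


def confirm(token):
    """Create the account only when a known token comes back."""
    email = pending.pop(token, None)
    if email is None:
        return False  # unknown or already used token
    accounts.add(email)
    return True
```

Nothing here inspects the address format at all; the address counts as valid once its owner proves they received the mail.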


I created myusername@hotmail.com in 1999, then immediately lost the password. I couldn't recover my account or use the same name again, so I created myusername_@hotmail.com.

In the twenty-two years that have followed, the only website that has had a problem with my email address is Chapters Indigo, which explicitly rejects it as invalid.

For email validation, keeping it simple is best.


Not really an email issue, so sorry if this is somewhat off topic, but on the subject of validation I can't not bring this up: remember the guy whose name was Null? https://www.wired.com/2015/11/null/


> The local part is case-sensitive.

This seems more like a bug than a feature. Maybe in 1983 the average email user knew what DNS was and could be expected to know one part of the email address would be case sensitive and the other not.

But email RFCs are probably like any other RFCs out there and specify existing behavior for the sake of interoperability.


yeah, and that's a hill I'm willing to die on.

Imagine the average non-technical person talking to some customer service agent on the phone and having to figure out if her email is JANEWATSON@gmail.com, JaneWatson@gmail.com, janewatson@gmail.com, or Janewatson@gmail.com. Could you imagine the horror and complete security nightmare of multiple people running around using the same gmail address with different case? Those Jane email addresses above would be four different people. We'd be receiving mail intended for other people all day long.


There's a lot of busy work in computing in the name of preventing mistakes

At a certain level of complexity it's easier to just let mistakes happen and provide correction tools if & when required

I remember getting our IPs blacklisted trying to programmatically ask email servers if the email address we were provided was real


My main email, which is only 8 characters long, got denied sometimes based on length.

Doesn't happen very often, though.


This is a bunch of weird edge cases that nobody uses in real life except maybe the plus trick.

American Express and Walgreens don't let you set a whatever@whatever.email address because they check for a TLD known at the time of their app's validation code, or something.


I use an amex@whatever.com and a walgreens@whatever.com email for Amex and Walgreens respectively?

I've run into two old-ish institutions that didn't quite work with my whatever@whatever.com, and I had to modify it slightly for them.


.email is a new-ish TLD.


It's not even about crazy emails. My wife works for an Aerospace company so her work email was blahblah@blah.aero and a huge number of websites still don't recognize that as a valid top level domain.


I cannot send a message to the email address they provide, but not because of anything wrong with the email address itself, but because that email address is version 6 internet, and I have version 4 internet.


Link is just hanging.

Also, anyone notice the OP posts the same couple of posts constantly?


I gave a presentation on this topic in FOSDEM a few years ago:

https://www.youtube.com/watch?v=xxX81WmXjPg

(Loudness warning)


I think the best email address I ever knew of was n@ai . Unfortunately, the .ai TLD eventually decided it wasn't a good idea to have an MX record resolving on the TLD.


It's cute but doomed to fail because it goes against everything most people understand about email addresses.


Actually email validation is simple: do the opt-in.

If confirmed, it's valid.


I think the author is stretching the words "valid" and "invalid" past their limits here for the sake of hooking you into the article. Yeah, in some countries it's a valid social practice to spit on the floor in public, but in most, it's not.

Let's say I'm a dev at Google, and I'm writing some aspect of Gmail. Is !"£$%@gmail.com a valid email? No. So the word valid is clearly not being used correctly here.

At the core of smtp these emails are allowed, but in practice they almost never are, and so the opposite is true. All the cases he described are, in practice, invalid.


Can't seem to find it anymore but wasn't there a post about how the letter "d" by itself was a valid email address at one point?


g-mail smtp is returning this error:

"Verify that you have addressed this message correctly. Check your SMTP server settings in Mail preferences and verify any advanced settings with your system administrator.

The server response was: The recipient address <'*+-/=?^_`{|}~#$@[ipv6:2001:470:30:84:e276:63ff:fe72:3900]> is not a valid RFC-5321 address. <...> - gsmtp"

Not even Google engineers can get it right. We are doomed.


So <any chars>@<any chars> seems beyond good enough. There is no benefit to validating beyond that for almost all cases.


What if we flipped email validation around and made the users email a one time code to validate their email?


I have my own custom email domain (sparr.email) that fails validation surprisingly often.


These days, I think 80% of email validation is just catching 'gmial.com'
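
A quick sketch of that kind of check using difflib from the standard library. The KNOWN_DOMAINS list, the 0.8 cutoff and the suggest_correction name are arbitrary choices for illustration, not anything a particular site is known to use.

```python
import difflib

# Illustrative list only; a real one would be much longer.
KNOWN_DOMAINS = ["gmail.com", "hotmail.com", "yahoo.com", "outlook.com"]


def suggest_correction(address, known=KNOWN_DOMAINS, cutoff=0.8):
    """Return a 'did you mean' address, or None if nothing looks like a typo."""
    local, at, domain = address.rpartition("@")
    domain = domain.lower()
    if not at or domain in known:
        return None  # no @ to work with, or the domain is already a known one
    matches = difflib.get_close_matches(domain, known, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None
```

The result is best shown as a suggestion the user can ignore, since an unfamiliar domain is not necessarily a mistyped one.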


Clicked thinking this might be useful information; was mostly disappointed.


"How to Hack Things with These 13 Simple Tricks"


I just want my .dev email address to not be rejected.


This is all a massive misunderstanding. An email address is the local name and the host; a host can't contain an @, so the only thing you frankly need do is split on "last @" and demand the user not escape anything. As for validation, go ahead and try to resolve the domain to make sure it works (and, if you want to verify the local part, do an online check with their server).

If this squicks you for some reason--as maybe that format is non-obvious with respect to the lack of a need to escape @--give the user two boxes with a hardcoded @ between them and have them type the two parts separately: pre-parsed input need not ever be escaped, as you aren't going to parse it at all; no need to implement " dequoting.
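
The "split on the last @" rule is a one-liner in most languages. A Python sketch, where split_address is a made-up name and rejecting an empty local part or host is my own assumption rather than anything the RFCs demand:

```python
def split_address(address):
    """Split an unescaped address into (local_part, host) at the last '@'."""
    local, at, host = address.rpartition("@")
    if not at or not local or not host:
        raise ValueError(f"not of the form local@host: {address!r}")
    return local, host
```

Because the split is on the last @, a local part that itself contains @ comes through untouched: split_address("a@b@example.com") returns ("a@b", "example.com").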

All of these escaping rules are then to support embedding this identifier into SMTP. The rules for embedding the same identifier into MIME are different... and even more complex! In MIME they support random stuff like "comments" in the middle of the string... is that part of the email address identifier? No.

An email address simply is not defined by the format you use to send it as part of an SMTP command, nor is it defined by the format you use to send it as part of a MIME message header :/. It is an identifier that exists separately from either of those two (different) protocols and one would expect any number of ways to escape that content.

To demonstrate how ridiculous this all is, imagine someone comes up with a JSON protocol for mail submission and then documents how email addresses now should use \u encoding and escape quotation marks... does that mean users should type that into your app? No.

Hell: your email address form is taking an email address and then sending it over HTTP... the escaping rules for HTML form fields are different still, yet no one is asking users to type HTML-escaped strings into other applications, right?

The core thing wrong then with your email validation is that you are simply validating the wrong thing: unless you are developing an SMTP server, the rules for how to escape and parse escaped email addresses in RFC5321 are irrelevant; and, likewise, unless you are developing a MIME parser, the rules for how to escape and parse escaped email addresses in RFC5322 are also irrelevant.

The only thing that matters from either of these specifications is the underlying basic rule for what semantically can exist in a hostname and a localpart, and RFC5321 is extremely lax: you can use any "ASCII graphic or space", and so excludes only ASCII control characters and 8-bit characters... and then, as mentioned, another RFC removes the 7-bit limitation and opens up the world of Unicode.

(To push on it even further: it isn't even clear to me that one should consider the ASCII control character limitation to be fundamental to the email address identifier or a weird limitation of the current version of SMTP; and since none of those email addresses are going to work, I think one may as well just consider the local part to be any string of Unicode code points.)

Think about this: it is up to your SMTP library to correctly escape the email address you give it for SMTP, and here's the fun part: if you give it a pre-escaped email address, then clearly it is going to have to double escape it, right? So, semantically, these extended discussions of quoted strings and character limitations are always just so ridiculous :/... you absolutely should not be dealing in SMTP-escaped addresses or asking your user to understand SMTP (and the same goes for MIME).

(BTW, if you want some "real hell", one of these two protocols--I forgot which... I presume SMTP--seriously supports an empty local part. If that doesn't tell you everything you need to know about how un-opinionated these RFCs are with respect to "anything goes" then I don't know what will ;P.)


Sorry no, your email address is wrong. Make a new one.


TFA won’t load for me, but I’d like to make a short PSA: RFC 5322.

Lookin’ at you, Walgreens.


It doesn't really hurt if some more exotic email addresses are not accepted, no one can really use them anyway.


Well, you can’t use them because of attitudes like yours. You’re not only annoying those of us who want to use plus signs or international domains, but ensuring the “exotic” half of the world that doesn’t use the Latin alphabet is kinda left out.


Non-ascii alphabets need a lot more support than just accepting them. But I wouldn't really consider plus signs exotic anyway.


What does that even mean? We shouldn’t support them because it requires more work?


If you are consistent. A couple times I've successfully signed up for something with a username+string@gmail.com address but then have been unable to unsubscribe because my address is "invalid".


I kinda recall a lawsuit many years ago about an unsubscribe confirmation email.


In fact, being stricter than the standard mandates probably catches more mistakes than deliberately weird email addresses.



