SMTP protocol basics from scratch in Go: receiving email from Gmail (eatonphil.com)
203 points by eatonphil on April 10, 2022 | 76 comments



Mail self-hoster here. There's an additional wrinkle, which luckily you may never encounter in the public cloud space: in addition to typically being unable to _send_ mail from residential Internet connections, it's many times also not possible to _receive_ mail on a residential Internet connection.

On the outbound side, many accepters of mail will deny mail from SMTP servers on a residential IP address, as provided by the Spamhaus Policy Block List (https://www.spamhaus.org/pbl/) or equivalent. Unfortunately, there's not much one can do to get off this list - ISPs self-publish their own residential IP space onto it. In order to self-host from home, business-class service or higher is in order. Or, again, use the public cloud, and hope your compute instance's IP isn't on some other block list.
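For the curious, DNS blocklists are conventionally queried by reversing the IPv4 octets and appending the list's zone. A minimal Go sketch that only builds the query name (no lookup is performed); the helper name reverseName and the zone string pbl.spamhaus.org are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net"
)

// reverseName builds the DNS name used for reverse-style lookups:
// the IPv4 octets in reverse order, followed by the given zone.
// reverseName is a hypothetical helper, not part of any library.
func reverseName(ip, zone string) (string, error) {
	v4 := net.ParseIP(ip).To4()
	if v4 == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", ip)
	}
	return fmt.Sprintf("%d.%d.%d.%d.%s", v4[3], v4[2], v4[1], v4[0], zone), nil
}

func main() {
	// 192.0.2.10 is a documentation address, used here as a stand-in.
	name, err := reverseName("192.0.2.10", "pbl.spamhaus.org")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

Resolving that name (for example with net.LookupHost) would then tell you whether the address is listed; a "no such host" answer normally means it is not. The same helper with the zone in-addr.arpa gives the name used for PTR lookups.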

On the inbound side, though, many ISPs block inbound port 25. Since that's the agreed-upon port for mail receipt, there's no real way around this. Several times while I self-hosted off residential, I was able to call Comcast support and ask for the ban to be lifted, but not before fighting through several layers of level one support asking me to configure my mail client differently. The real kicker, though, is any time my modem was rebooted (I owned my modem, too!), I had to call in and request it _again_. My (very small local) ISP in upstate New York also performed this blocking, but _would not_, in the face of any complaining, lift the block.


In the ISP's defense, 99% of the time an SMTP sender in their residential address space is going to be a compromised device sending spam. It's pretty understandable that they would just decide that pissing off the relatively very few customers who want to run their own mail servers is a cost they are willing to accept.


I bet it’s more than 99% too. Even amongst the tech savvy, there’s a special level of dedication needed to run your own mail infrastructure. It’s like one in 10-25K.


There’s running your own mail, then there’s running your own mail from residential. The dozens of us who do the former mostly don’t do the latter, thanks to the obvious vicious cycle.


Though I belong to the former, I do use my residential ip as a failover server for when my primary server is down, to not lose incoming emails.


What’s the benefit of this over a cheap VPS? Privacy?


Just cheaper. I mean again I do pay for a server in a datacentre as my primary server. I have a home server too anyway, so using it as a mail failover adds redundancy for free.

Also I am using windows (smartermail) so even a cheap vps isn’t that cheap.


I remember finding some mails originating from my laptop in my gmail spam folder around the time I switched from linux to mac, around 2012. I was tinkering with an instance of wordpress on that laptop, and it turns out sendmail in OSX was capable of sending email to gmail out of the box without any configuration on my part (I don't remember configuring sendmail on that machine). The wordpress instance was simply using php's sendmail support and happily blasting out emails without my knowledge (thankfully only to my own address). I wonder if recent versions of macOS still ship with sendmail.


No special level of dedication needed. I'm one of the many who run my own email infrastructure and it is very little work. Nearly none, really, aside from initial setup.

However, I don't run it from home over residential ISP connection.


Still, it’s not trivial to set up, particularly if you want TLS connections, monitoring brute force attacks, etc.


Not the poster you are replying to, but as a self-hoster (on a static residential IP though) I don't necessarily care about brute forcing that doesn't completely kill off my connection (I've actually got two residential connections used as fail-over: 1Gbit/200Mbit + 500Mbit/50Mbit, though set up as round-robin on the incoming domains [same priority]).

I mostly run the mail for my family and myself, so the attack surface is reasonably small.

My network mostly only dies when one of the ISP-provided modems slows down too much without actually dropping the connection (so the pings my OpenWRT router uses to detect dead connections keep working), which requires resetting that modem (happens only every few months, rare enough that I haven't bothered debugging it: it's usually my dad who pings me early the next morning that his email is not working :)).


Sorry, I didn't mean brute force as in DDOS, but as in brute forcing passwords. Particularly if you run it for your family, you are at the mercy of one having a weak password, and as soon as a bot has access to one of your mail accounts, it will send spam and you can kiss goodbye your IP/domain reputation.


Yeah, I got that, and while I am mildly worried about that, it hasn't been an issue in the last almost 20 years.

I am only mentioning bandwidth because someone could easily saturate my internet connections by extensively brute forcing, and that's another mild concern (perhaps those are things that kill my modems?).

In practice, the biggest issue is that Gmail in particular will never rank your server as trusted because you are sending too low a volume (how's that for an anti-spam measure?), and a third of recipients report emails ending up in spam folders. Curiously, only Gmail does that.


> Sorry, I didn't mean brute force as in DDOS, but as in brute forcing passwords.

That's super easy to avoid. Use strong passwords.

Heat death of the universe will arrive before anything brute forces a 30 character password generated off /dev/random

I don't have any of my users (just family) set their email passwords, those are generated to be strong. These are not passwords that need to be seen or remembered by anyone, they just go into the IMAP client config.
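For anyone wondering what generating such a password might look like in Go, here is a minimal sketch. It assumes a 62-character alphanumeric alphabet, and it uses crypto/rand (the operating system's CSPRNG) rather than reading /dev/random directly. The name generatePassword is made up for illustration.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is an assumed character set; any reasonably large set works.
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// generatePassword returns n characters drawn uniformly from alphabet,
// using the operating system's CSPRNG via crypto/rand.
func generatePassword(n int) (string, error) {
	out := make([]byte, n)
	limit := big.NewInt(int64(len(alphabet)))
	for i := range out {
		// rand.Int returns a uniform value in [0, limit), avoiding modulo bias.
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = alphabet[idx.Int64()]
	}
	return string(out), nil
}

func main() {
	pw, err := generatePassword(30)
	if err != nil {
		panic(err)
	}
	fmt.Println(pw)
}
```

With 62 symbols, 30 characters work out to roughly 178 bits of entropy, which is far beyond anything online guessing against an IMAP or SMTP login can cover.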


Yeah, but we’re talking about inbound port 25.


I guess the argument for inbound blocking is that it's way too easy to configure an open relay, which is then used to send spam.


But the outbound block would prevent the relay from sending any mail over port 25, so there's no real point to block inbound connections.


I'm thinking the ISP in upstate New York misunderstood the situation in a way similar to the comment above you.


I've no argument beyond that. You're right.


> In the ISP's defense, 99% of the time an SMTP sender in their residential address space is going to be a compromised device sending spam

99% of people don't use their home computer to write programs. Some people who write computer programs at home cause trouble. Should we therefore ban programming languages for individuals?


Most ISPs do not outright ban it. They just default to blocking outgoing port 25 packets. Usually it's a support request away from opening it for a customer, especially with a static IP address (I've done that for the last ~19 years with a Serbian ISP).

Of course, how well your support requests are handled is another matter.


Ohhhhhh there's a way! Pure End-To-End SMTP sending and receiving! It involves some pretty interesting tomfoolery using IPSEC tunnels on a pure OpenBSD setup, but even works when you don't have a fixed IP.

https://www.exoticsilicon.com/jay/smtp_via_ipsec_tunnels


"In this guide, Jay shows us how to setup an IPSEC tunnel between a local machine running OpenBSD, and a remote VM hosted at OpenBSD Amsterdam..."

A tunnel to a vm hosted on a cloud provider is far from "running SMTP servers on a residential IP address" that OP was referring to.


If you're not as impressed by end to end SMTP as I am then I feel you're missing out! Obviously you need domains with RDNS and everything to point to a public IP address for relay, but if it's a really cheap VPS that's pretty cool. Sure is a LOT lighter than running a full mail server with three or four protocols to get things right and keeping that up to date and exposed to the public.


One of my back burner projects was to write a mail forwarder which did not store messages. When someone opens an SMTP connection to it, it opens an SMTP connection to the destination and passes through the message, while holding the input connection open. When the message has been delivered, then the sender gets a completion status back. No need to return bounce messages. Any problem at any stage results in an immediate reject to the sender. You don't invisibly lose messages in transit.

Outbound ISP SMTP servers should work that way. This makes email behave much like instant messaging.

Messages with many recipients may need to take a store and forward path. They can be redirected to a slow bulk mail server.

Historically, mail servers are store and forward because the destination might be offline, or on some store and forward UUCP system or something. Today, that's very rare. We don't have to emulate Sendmail forever.
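The core of such a pass-through forwarder is turning the downstream delivery result into the reply on the still-open inbound connection. A rough Go sketch of just that step, under assumptions: the downstream attempt is made with something like net/smtp, which reports rejections as *textproto.Error, and the names relayReply and downstreamReject are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/textproto"
)

// downstreamReject simulates the error a client library such as net/smtp
// returns when the destination server answers with a failure code.
func downstreamReject(code int, msg string) error {
	return &textproto.Error{Code: code, Msg: msg}
}

// relayReply converts the outcome of the downstream delivery attempt into
// the reply line sent back to the original sender, whose connection has
// been held open while the delivery was attempted.
func relayReply(deliverErr error) string {
	if deliverErr == nil {
		return "250 2.0.0 OK: delivered downstream"
	}
	var perr *textproto.Error
	if errors.As(deliverErr, &perr) && perr.Code >= 400 && perr.Code <= 599 {
		// Pass the destination's own verdict straight back, so no bounce
		// message ever needs to be generated.
		return fmt.Sprintf("%d %s", perr.Code, perr.Msg)
	}
	// Network errors and anything unexpected become a temporary failure,
	// which asks the sender to retry later.
	return "451 4.4.1 downstream delivery failed, try again later"
}

func main() {
	fmt.Println(relayReply(nil))
	fmt.Println(relayReply(downstreamReject(550, "5.1.1 no such user")))
	fmt.Println(relayReply(errors.New("dial tcp: connection refused")))
}
```

A real forwarder would also need to handle multi-line replies, timeouts on the held connection, and multiple recipients, which is where the store-and-forward fallback mentioned above comes in.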


> One of my back burner projects was to write a mail forwarder which did not store messages.

I wrote one of these years ago, as a front-end to my Exchange server. At the time Exchange had this thing where it ALWAYS accepted messages as long as the domain part was an accepted domain, and then it would do a fuzzy match on the mailbox names and dump the mail anywhere it felt matched.

So I had to have this SMTP front-end forwarder to weed out mis-addressed mail... which I eventually added SpamAssassin to as well. It ran for over 10 years without issue.

What I never solved/wrote was the TLS stuff, as it always seemed too complicated to code from scratch.


Right, that's the sort of thing I had in mind. Web site has addresses such as "employeename@example.com", and they forward to some mail server that actually knows if "employeename" is valid. Bogus names result in a status code, not an outgoing bounce message.


I used a proxy like this for a few years, though in my case it was to handle spam. I based my work on this extensible / plugin-based perl daemon:

http://smtpd.github.io/qpsmtpd/

I'd reject spam at SMTP time, but also capture the rejected messages, so that users could view them in an online quarantine. Worked pretty well, though I think these days the original perl-based project has been reworked into a nodejs thing:

https://haraka.github.io/about/


Without traditional NDRs, how do you tell which server had a problem accepting the mail?


How is that different from some sort of load balancer/reverse proxy?


Content-Type is defined in RFC 2045 ("Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies"):

https://datatracker.ietf.org/doc/html/rfc2045#section-5


> I don't understand what zones are here.

Cool article.

I set up a wireguard tunnel back to a mail server hosted at home once, gotta have that PTR and my ISP isn't giving me one. I cobbled the instructions together from 3 or 4 different blogs to get exactly what I needed in my particular situation. There was all this stuff about source NATing which was like a foreign language to me and I was in "get it to work" mode so I just kind of blew past it. It seemed interesting and I got the gist of it but in the end I had a script full of iptables commands and I'm not really sure what they did, other than that they did what I wanted.

I love that he mentions something like that here. There's a real temptation to sweep that kind of thing under the rug to puff yourself up and look more knowledgeable, and it certainly would have gone unnoticed by me if he chose to do that. Props.


That sentence also jumped out for me. I also like the frankness of that sentence but what I really admire is the author’s gumption (courage) to continue with the project and not let such obstacles hinder progress.

My problem is the opposite: I would have stopped and not continued until I knew what zones were – and any other part of the technology stack that might potentially have security consequences. With my (overly) cautious/conservative approach, I might still learn a lot but I don’t get the same sense of excitement – or accomplishment – as the author.


I’ve been super curious about how modern SMTP handles encryption - my understanding is SMTPS was dropped in favour of STARTTLS, with a HSTS-like directive hosted in DNS.

Does the latter actually happen? Do smtp clients actively check this? What’s to stop some middleware performing a tls downgrade?


> with a HSTS-like directive hosted in DNS.

I think you are talking about MTA-STS (rfc8461) [0]. It is currently being adopted by the larger email services (Google, Microsoft).

MTA-STS is not 'hosted' in DNS, but published via a policy service over HTTPS, so it uses PKI.

I work for Mailhardener, we offer MTA-STS policy services as a service [1]. We've seen an increased interest of MTA-STS recently, likely due to Microsoft announcing MTA-STS support [2].

[0] https://datatracker.ietf.org/doc/html/rfc8461

[1] https://www.mailhardener.com/blog/introducing-hosted-mta-sts

[2] https://techcommunity.microsoft.com/t5/exchange-team-blog/in...
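To make the policy part concrete: per RFC 8461 the policy is a small key/value text file served over HTTPS. Below is a minimal Go sketch of parsing one; it is not a validating implementation. The type and function names (stsPolicy, parseSTSPolicy) and the example hostnames are assumptions, and fetching the file is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// stsPolicy holds the fields of an MTA-STS policy file.
type stsPolicy struct {
	Version string
	Mode    string   // "enforce", "testing" or "none"
	MX      []string // allowed MX host patterns, one per "mx:" line
	MaxAge  int      // policy lifetime in seconds
}

// parseSTSPolicy parses the "key: value" lines of a policy body.
// Unknown keys are ignored; only the version is checked.
func parseSTSPolicy(body string) (stsPolicy, error) {
	var p stsPolicy
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			return p, fmt.Errorf("malformed line %q", line)
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch key {
		case "version":
			p.Version = val
		case "mode":
			p.Mode = val
		case "mx":
			p.MX = append(p.MX, val)
		case "max_age":
			n, err := strconv.Atoi(val)
			if err != nil {
				return p, fmt.Errorf("bad max_age: %w", err)
			}
			p.MaxAge = n
		}
	}
	if err := sc.Err(); err != nil {
		return p, err
	}
	if p.Version != "STSv1" {
		return p, fmt.Errorf("unsupported version %q", p.Version)
	}
	return p, nil
}

func main() {
	body := "version: STSv1\r\nmode: enforce\r\nmx: mail.example.com\r\nmax_age: 86400\r\n"
	p, err := parseSTSPolicy(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

A sending server that finds mode "enforce" is expected to refuse delivery when it cannot establish an authenticated TLS session to one of the listed MX hosts.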


Yeah STARTTLS is fundamentally flawed that way. That's why Google and other SMTP interchange services allow paying customers (Google Workspace) to specify policy around TLS compliance.

https://support.google.com/a/answer/2520500


I wouldn't say STARTTLS is fundamentally flawed. Given a mechanism for declaring a policy of TLS being required, a middlebox downgrade attempt to deny STARTTLS should merely cause the sender to abort the SMTP transaction. This is no different from a middlebox hijacking a pure TLS tcp connection. In both cases it's on the sender to abort if a secure connection cannot be established.


Do real world smtp clients and servers do this? HSTS solves sslstrip type attacks, but I don’t think mail gets the same scrutiny..


A couple of years ago, someone with a .gov address told me they couldn't subscribe to a mailing list I manage, since the mail server didn't support TLS.

Adding the appropriate option in Postfix fixed the problem, and I think there were further options to require TLS.


The initial call to starttls is already insecure. The server can say what it wants. We trust that it behaves.


I wouldn't consider the initial call to starttls to be insecure. It is just a matter of a few static bytes containing an SMTP HELO and the ascii bytes "STARTTLS". This can be used to establish a secure TLS connection without compromising any private information. It is only insecure if the client decides to proceed with the rest of the SMTP transaction if the server (or middlebox) nak's the STARTTLS. Which is no different from a middlebox presenting a self-signed TLS certificate. A few extra plaintext bytes at the start of the TCP connection containing the string "STARTTLS" doesn't change anything compared to a plain TLS socket. It is always up to the client to decide if it wants to proceed in the face of bad or nonexisting TLS.
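The client-side decision described here fits in a few lines. A hedged Go sketch, assuming the EHLO response has already been split into its keyword lines and that "require TLS" comes from some policy (MTA-STS, local config, whatever); the names action, offersSTARTTLS and decide are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// action is what the sending side does after seeing the EHLO response.
type action string

const (
	upgrade   action = "send STARTTLS"
	plaintext action = "continue in plaintext"
	abort     action = "abort the transaction"
)

// offersSTARTTLS reports whether any EHLO response line advertises the
// STARTTLS keyword. Keywords are compared case-insensitively.
func offersSTARTTLS(ehloLines []string) bool {
	for _, l := range ehloLines {
		fields := strings.Fields(l)
		if len(fields) > 0 && strings.EqualFold(fields[0], "STARTTLS") {
			return true
		}
	}
	return false
}

// decide implements the policy: upgrade when possible, and never fall
// back to plaintext when TLS is required. A middlebox that strips the
// STARTTLS keyword therefore only causes an abort, not a downgrade.
func decide(ehloLines []string, requireTLS bool) action {
	switch {
	case offersSTARTTLS(ehloLines):
		return upgrade
	case requireTLS:
		return abort
	default:
		return plaintext
	}
}

func main() {
	stripped := []string{"PIPELINING", "SIZE 10240000", "8BITMIME"}
	fmt.Println(decide(stripped, true))
	fmt.Println(decide(stripped, false))
}
```

With Go's net/smtp the same check would use the client's Extension("STARTTLS") and StartTLS methods. Certificate verification after the upgrade is a separate step that this sketch does not cover.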


Interesting, any insight as to whether o365, and popular non saas servers do similar?


SMTPS has been re-approved for use on port 465. The back story about this "wart" is documented in https://www.rfc-editor.org/rfc/rfc8314#section-7.3


What makes this specific for receiving from gmail? Why would this not receive email from any other email provider?


The only gmail specific part seems to be about the body parsing. Different clients/services send bodies in different formats: they all use multipart, but some use multipart/alternative, some use multipart/mixed and so on. Seems the author skipped that part ("In any case this looks like multipart bodies in HTTP. I don't want to deal with that so I'm just going to stop here."), so they tested receiving emails from just one specific client (gmail).

Anything else would probably not render correctly, especially if they use other encodings which would make the message unreadable without extra parsing (think koi8-ru)
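For reference, the standard library can already walk those multipart bodies. A minimal Go sketch using net/mail, mime and mime/multipart; the function name partTypes and the sample message are invented for illustration, and charset or transfer-encoding decoding is deliberately left out.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// partTypes returns the media type of each top-level part of a message.
// A non-multipart message yields its single media type.
func partTypes(raw string) ([]string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	mediaType, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil {
		return nil, err
	}
	if !strings.HasPrefix(mediaType, "multipart/") {
		return []string{mediaType}, nil
	}
	// The boundary parameter tells the reader where each part starts.
	mr := multipart.NewReader(msg.Body, params["boundary"])
	var types []string
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		pt, _, err := mime.ParseMediaType(part.Header.Get("Content-Type"))
		if err != nil {
			return nil, err
		}
		types = append(types, pt)
	}
	return types, nil
}

// sample is a made-up two-part message.
const sample = "From: a@example.com\r\n" +
	"Content-Type: multipart/alternative; boundary=\"b1\"\r\n" +
	"\r\n" +
	"--b1\r\n" +
	"Content-Type: text/plain; charset=\"UTF-8\"\r\n" +
	"\r\n" +
	"hello\r\n" +
	"--b1\r\n" +
	"Content-Type: text/html; charset=\"UTF-8\"\r\n" +
	"\r\n" +
	"<p>hello</p>\r\n" +
	"--b1--\r\n"

func main() {
	types, err := partTypes(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(types)
}
```

Nested multiparts (a multipart/alternative inside a multipart/mixed, say) would need the same loop applied recursively to each part.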


net/textproto has readers and writers that deal with crlf, and status codes, and dot encoding even.
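A small sketch of what that looks like for the DATA phase, assuming the raw bytes from the connection are available as a string here for simplicity (in a server you would wrap the net.Conn instead). The function name readData is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"net/textproto"
	"strings"
)

// readData reads one SMTP DATA payload: every line up to the lone "."
// terminator, with CRLF removed and dot-stuffing undone.
func readData(wire string) ([]string, error) {
	r := textproto.NewReader(bufio.NewReader(strings.NewReader(wire)))
	return r.ReadDotLines()
}

func main() {
	// The payload is followed by the next command, which is left unread.
	wire := "Subject: hi\r\n\r\n..a line that starts with a dot\r\nbye\r\n.\r\nQUIT\r\n"
	lines, err := readData(wire)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Printf("%q\n", l)
	}
}
```

The same package's Writer has PrintfLine for replies and DotWriter for the sending direction.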


I guess it's "from scratch" though! Which I applaud for educational purposes. If you wanted more or less scratch-ness there are standard libraries for dot stuffing and for handling RFC 822 header parsing which is actually obscenely complicated.

This program responds 250 OK to absolutely every command other than data which doesn't exactly make sense, so I assume it's a toy example. You don't send 250 to QUIT, for example. Nor to STARTTLS.
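For comparison, a slightly less toy-like reply table might look like the sketch below. It is still an assumption-laden simplification: replyFor is a hypothetical helper, the reply texts are arbitrary, EHLO would really return a multi-line 250 listing extensions, and there is no state tracking (for instance, DATA before RCPT should be refused).

```go
package main

import (
	"fmt"
	"strings"
)

// replyFor picks a reply line for a single command line.
// tlsAvailable says whether the server can actually start a TLS handshake.
func replyFor(line string, tlsAvailable bool) string {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "500 Command not recognized"
	}
	switch strings.ToUpper(fields[0]) {
	case "HELO", "EHLO", "MAIL", "RCPT", "RSET", "NOOP":
		return "250 OK"
	case "DATA":
		// 354 tells the client to start sending the message body.
		return "354 End data with <CR><LF>.<CR><LF>"
	case "QUIT":
		// 221 announces that the server is closing the connection.
		return "221 Bye"
	case "STARTTLS":
		if tlsAvailable {
			return "220 Ready to start TLS"
		}
		return "502 Command not implemented"
	default:
		return "500 Command not recognized"
	}
}

func main() {
	for _, cmd := range []string{"EHLO client.example", "DATA", "STARTTLS", "QUIT"} {
		fmt.Printf("%-20s -> %s\n", cmd, replyFor(cmd, false))
	}
}
```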


Somewhat related: https://maddy.email and the libraries it's built on such as go-smtp https://github.com/emersion/go-smtp


Also https://github.com/mailhog/MailHog - although it hasn't been updated in a while. But great for testing mail functionality in applications you're developing.


I've been trying to do this myself and it's surprisingly involved. Even for sending it isn't nearly as simple as it used to be like 5 years ago.

You need to pass DKIM and MTA-STS specifically as two things most tutorials on 'sending email in golang' didn't tell me. I'm considering writing my own article on sending -deliverable- emails, because it's probably pretty important your stuff doesn't end up in spam.


Question: is it possible to read an e-mail message while making the sender think you haven't received it?


Sure, after you receive the message you can return an error. This is commonly done for messages that are immediately blocked as spam. When you return an error to the sender the expectation is that the message didn't end up in a mailbox and the sender will probably send the sending user a bounce notification.


> I've heard no few times how hard it is to send mail from a self-hosted server (because of spam filters).

This is the second sentence. What does this even mean?


“Cold” IPs have lower trust scores, are more likely to end up on a blacklist, and are harder to get off blacklists, plus good luck finding a colo that will let you send bulk email in the first place. You can get away with sending low-volume programmatic, but that’s about it.


So where do spammers send email from, since they clearly make it to my Gmail spam folder - sometimes even inbox?

More generally: How does this weird trust system work? Is it just an unofficial, loosely connected club of large actors who trust each other enough to whitelist?


First of all, whether it works or not is for recipients to judge (you seem to lean towards the "it doesn't work" camp).

Now, regarding the "trust" system: the reality is that every e-mail provider that is receiving e-mails chooses its own policy. In general, a provider receiving an e-mail will choose one of:

a) Accepting the e-mail and delivering it to the users' inbox

b) Accepting the e-mail and delivering it to the users' spam box

c) Rejecting the e-mail at SMTP time (at least the sender realizes it hasn't been delivered)

d) Accepting the e-mail and quietly delivering it to /dev/null (the sender thinks it has been delivered, but the recipient cannot ever know about it).

Providers try to be as smart as possible to pick one of the 4 actions above. They do it by using a combination of rules that happen at different times of the delivery process. For instance:

- Noticing that an e-mail is being sent to a non-existing address is easy (a simple check against the addresses db). Hence, most providers do this during the SMTP conversation (and choose option c above when the recipient doesn't exist).

- Checking a zipped powerpoint attachment for viruses may take a while. Hence, most recipients do this _after_ having accepted the e-mail where option c above cannot be used anymore. For viruses, most providers would then go for option (d).

Now, when people speak about the "cold IP" problem they are mostly speaking about one of the common techniques used by large providers (gmail, outlook, etc.). Do note that business domains that use Google Workspace, Outlook 365, etc. are also managed by these large providers.

What these providers do is they keep an internal "reputation" map for every IP (and/or network) that they've received e-mail from in the past. These lists are private and the precise effects they have are treated as internal secrets. However, between the scarce statements made by the providers and the results observed by many senders, the following "common understanding" has emerged:

- When they don't have information about an IP, they treat that as a negative signal. No reputation for a specific IP + bad network reputation (for instance, anyone sending e-mails from OVH has bad network reputation because there are many spammers in their network) probably leads to the recipient applying (d) or (b).

- When an IP that hasn't built a reputation starts sending large volumes of e-mail, they treat it as a _very_ negative signal. (d) is practically guaranteed here.

- Unfortunately, most providers _also_ require a minimum sending volume to start tracking the reputation of an IP. This is what bites self-hosted users the hardest (whether they know it or not). For instance: outlook only updates an IP's reputation when that IP has sent more than 100 e-mails (to outlook-managed addresses) that day. If you send fewer than that, your IP will never build a good reputation there even if the recipients mark every single e-mail you've ever sent as non-spam.

Finally, notice that this is an ongoing arms race. Spammers try to adapt to these measures, and e-mail providers keep changing their techniques, scores and effects. The really bad place you can find yourself in as a self-hoster is the following:

- You've taken all known measures to be a nice sender (you have a reverse record, you send mails through TLS, you've got SPF/DMARC policies set up, you are DKIM-signing your e-mails, you _never_ send spam e-mails, you've signed up for the feedback-loops at all major providers and everything).

- You've had regular e-mail conversations with john@outlook.com over the last few months without any issues.

- You send an e-mail to john@outlook.com and outlook's servers accept it.

- John _never_ receives that e-mail (not in their inbox, not in their spambox, they just don't have it anywhere). For the avid reader: outlook has taken the (d) option above.

Now, if John was expecting the e-mail he will probably complain at some point and you'll both find a way to get the information to him. However, if the e-mail said "Hey, I've convinced my wife to stay at your place for the weekend! We'll arrive on Friday around 8pm." you can imagine how this might be a problem.

The first time something like this happens you contact outlook's "support" to complain about it. If you are lucky, they reply saying that "your IP qualifies for a temporal exemption" or something like that. Then it doesn't happen to you for a while, until it does. If you are unlucky, the response is along the lines of "send good e-mails and tell your recipients to mark them as non-spam and this will stop happening to you". But the recipients aren't receiving any of your e-mails, so they can't mark them as non-spam. Now you waste a few hours going back-and-forth with the "support agents" until you give up in despair.

Then you see an article on HN about how easy it is to self-host your e-mail. Sadly, you know that self-hosting (to receive e-mails) is easy enough and trouble-free. However, self-hosting to _send_ emails is another story entirely, a story where you are all but a small bug in a world of elephants that may (accidentally or not) stomp you without even realizing nor caring the least bit.


Hi this is pretty right but I’d point out that a provider like gmail never silently drops a message. Viruses that are detected after smtp are stripped from the message and replaced with a warning. By the way viruses are also checked again when you open a message.

Low reputation traffic gets temporary failure codes at smtp time, not silent acceptance nor permanent failure.


What do you mean by ‘a provider like gmail’? Because Outlook.com arguably is a provider like Gmail and it accepts and then silently discards emails all the time. My experience with them is exactly the same as kilburn’s.


I don't consider Microsoft to be a reputable organization capable of operating internet protocols in good faith.


Thank you for the elaborate explanation. This is roughly how I'd imagine it.

It's unfortunate that we rely on opaque proprietary systems and informal deals between providers for what is arguably fundamental infrastructure.

It feels awfully hard to fix this within email without breaking the end user experience. Do you have any thoughts on how to improve it?


Depends on how you define spam. Services like Mailchimp will send mail that you and I may consider spam, but it gets delivered, because they have teams of people making sure they are following all the rules to the letter and they probably negotiated agreements with Google, Yahoo, Hotmail, etc to whitelist their servers.


Thanks for the reply. What rules?


As an example, mailing lists are likely to have a header like "list-unsubscribe: <mailto:list@host.com?subject=unsubscribe>", which enables email clients to offer easy unsubscribe buttons.
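For illustration, here is roughly how a client could pull the targets out of that header in Go. This is a sketch under assumptions: unsubscribeTargets is a hypothetical helper, the second address in the sample is invented, and splitting on commas is naive (a URL containing a comma would break it).

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// unsubscribeTargets returns the angle-bracketed entries of the
// List-Unsubscribe header, without the brackets.
func unsubscribeTargets(raw string) ([]string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	var out []string
	// Header.Get matches the header name case-insensitively.
	for _, item := range strings.Split(msg.Header.Get("List-Unsubscribe"), ",") {
		item = strings.TrimSpace(item)
		if strings.HasPrefix(item, "<") && strings.HasSuffix(item, ">") {
			out = append(out, item[1:len(item)-1])
		}
	}
	return out, nil
}

func main() {
	raw := "list-unsubscribe: <mailto:list@host.com?subject=unsubscribe>, <https://example.com/unsub>\r\n" +
		"\r\n" +
		"body\r\n"
	targets, err := unsubscribeTargets(raw)
	if err != nil {
		panic(err)
	}
	for _, t := range targets {
		fmt.Println(t)
	}
}
```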


As a note, this can also be a URL, and can also be used to verify humanness in actual spam and not actually unsubscribe, so in practice pretty much only "good known senders with partnerships" get it to appear.


It's basically like you described - an informally specified set of rules the various "big players" in e-mail enforce. There's some consensus about the things you "have" to do to participate in e-mail but it's generally not a black and white "do these things and you'll be fine" situation.


It’s an odd construction but replacing “no few” with “not a few” (i.e. many) makes a lot more sense.


Yeah it's just a clunky phrase. Saying "many" or "more than a few" is much clearer and easier to parse. "No few" is like ok do you mean one to none or many?


It means that if you send email that isn’t from a gmail.com account, Google flags it as spam. (Alternatively if you send spam from a gmail account, it’s called “marketing” and Google will prioritize it for a small fee.)


> Google will prioritize it for a small fee

What do you mean by this?


Pay for Workspace, your spam won't be flagged as spam (by Google) and you get raised sending limits. Limits are based on licenses, so the more you pay Google, the more you can spam, with a max of 4.6M unique addresses per 24H.

If you send spam through Google, they will send your administrator an e-mail telling them that they are sending spam. If you send "a large volume" of this spam, they will stop relaying it until you promise not to send spam (or follow their guide to make your spam not look like spam).


The email user sending spam will get automatically suspended. The account admin can then un-suspend it though. From the TOS it seems to be an option for Google to permanently suspend such an email user. Not sure when that would happen though.


Interesting article. I would love to implement something similar in python


+1

Has anyone got a Python take on a bare bones SMTP mail server/relay and client? That would be super useful for teaching email concepts in a sandbox where full-blown Exim/Postfix etc. is too clunky.


and it is well known you can create an account with noscript/basic (x)html browsers and use those very browsers to perform the periodic "re-authentication" of this account.

(irony)


Then there's this problem:

http://139.177.194.177/bah.png


> 177.194.177.139.in-addr.arpa domain name pointer systemd-the-porno.com.

lol



