> "use SMTP" here is a short way of saying "send mail to a mail server that will...

throwaway290232 · on July 15, 2021

There are a billion lego pieces out there in the e-mail ecosystem. We can combine them any way we want, if the alternative is a totally custom solution anyway (webhooks, custom API endpoints, cron jobs, queues, etc). There are so many options; where to begin!

First, you don't have to use the rest of the internet's e-mail system. Stripe can run their own mail servers that deliver straight to clients on non-standard ports using implicit TLS, ensuring security and no middle-men. This also ensures delivery is as timely as possible (sub-second typically, as mail software has to be fast to handle its volume).

Let's say you want to poll (ex. "/events"). The client uses IMAP to poll the Stripe server with a particular username/password. Check a folder, read a message, delete it on connection close. There are of course ready-made solutions for this, but you can also write simple IMAP clients really easily using libraries.

Let's say you want pushes (ex. webhooks). The client sets up the alternative to the webhook-server they'd have to set up anyway: an SMTP server. Use a custom domain, one that has nothing to do with the customer's main business, so nobody ever gets confused. Configure it to only accept mail from a "secret mail sender" (aka webhook secret). Part of the "SMTP webhook URI" would be what mailbox to deliver the webhooks to. The client then configures an MDA on their mail server to immediately deliver new messages to some business logic code. If the MDA or business-logic code has a bug, the messages will stay in the client's mailbox until they are "delivered" successfully. If the client's SMTP server is down, Stripe keeps retrying for at least 3 days, more if Stripe wants.

Stripe could actually implement both by keeping messages in an IMAP folder on Stripe's servers, and deleting the messages once the SMTP server confirms delivery to the client. Of course all messages already have unique IDs so removing dupes is easy.

You could implement all of this in a week, write almost no code, and still handle all the weird edge cases. Virtually all of that time is just reading manual pages and editing config files. The end result is a battle-hardened fault-tolerant event-driven standards-based distributed message processing system. The maintenance cost will be "apt-get update && apt-get upgrade -y", and anyone who can configure Postfix and Maildrop can fix it.

pokoleo · on July 15, 2021

Hey throwaway! I think this could work, but it might not be the highest priority. Think about this perspective:

The API is entirely HTTP, and tries to meet users where they are by providing tools that they are most comfortable with. Frequently, these users are familiar websites or mobile apps. As such, webhooks are implemented over HTTP.

If there was an alternate way to integrate with events, it'd be something that's either:

1. accessible to novice users, or

2. delivers on high throughput/latency needs of the largest users, or

3. resolves a storage/latency/compute cost incurred behind the scenes

Thinking about these:

For #1, websockets would score better than SMTP

For #2, kafka (or a managed queue like SQS) would score better (many support dead letter queues and avoid the latency at the mail layer)

For #3, it isn't clear that SMTP reduces the latency, compute, or storage costs

SMTP might be familiar -- and it's possible for you to build your own webhook → SMTP bridge if you wanted it -- but doesn't score well enough on any of these metrics to be built in-house.

[Disclaimer: I work at Stripe, and these opinions are about how I'd approach this decision. They're not the opinions of my employer.]

throwaway290232 · on July 15, 2021

Yeah, I totally agree it's really out of left field compared to what users are comfortable with. Like another commenter said, clients would probably laugh you out of the room for proposing it. (though that's half my point! why are we only accepting these half-baked custom solutions on janky platforms? fear of criticism? is it really saving anyone any time or money compared to the "weird solution"?)

But I'm not convinced on the latency/compute/storage comparison with Kafka or other solutions. I think a POC would need to be built and perf tested, and then tweaked for higher performance and lower cost, like most software. Considering the volume of traffic that mail software is designed for, I can't see how even a large provider like Stripe would have difficulty scaling a mail system to match Kafka. It's not like mail software is written in Java or something ;-)