
Stripe's API was down - klinskyc
https://status.stripe.com/
======
edwinwee
We're back up as of 17:02 UTC:
[https://twitter.com/stripestatus/status/1149002362691833856](https://twitter.com/stripestatus/status/1149002362691833856)

~~~
pc
Stripe CEO here.

We're very sorry about this. We work hard to maintain extreme reliability in
our infrastructure, with a lot of redundancy at different levels. This
morning, our API was heavily degraded (though not totally down) for 24
minutes.

We'll be conducting a thorough investigation and root-cause analysis.

~~~
JaimeThompson
Will the results of the investigation and analysis be publicly available?

~~~
filoleg
This. Reading well-written post-mortems of outages for big and complex
services like Stripe is just pure joy to me and feels very educational too. I
remember reading Gitlab post-mortems earlier this year, and it felt really
fresh, given how honest and open they were in those.

~~~
sovnade
They're really helpful in preventing our own outages. Although it's ironic how
many of them boil down to having things so well automated that one mistyped
command can take down an entire environment with extreme efficiency.

------
klinskyc
Between Cloudflare, Google, and now Stripe, I feel like there's been a huge
cluster of services that never go down, going down. Curious to see Stripe's
post-mortem here.

~~~
bluntfang
I would love to see an industry analysis on this. What's the reason this is
happening? High attrition from long time engineers? Large influx of green/new
grad/code camp engineers? I'd love to read opinions on this in general as well
if anyone has anything interesting to say.

~~~
codebolt
Perhaps key personnel off on summer holidays?

~~~
bluntfang
I like this opinion. It underscores how much power we, as software engineers,
have over the world. This is our new democracy. How do we convince people that
we can move the world in the right direction re: pollution, human trafficking,
equal rights, etc if we join up collectively?

~~~
mikeg8
First step in convincing others would be to eliminate the elitist sentiment
here. Implying that “we”, a tiny group of under-represented software
engineers, are the new democracy?? Gimme a break.

~~~
bluntfang
We literally control modern infrastructure. What do you think would happen if
AWS, Azure, and GCP SREs walked out for a week?

------
rectangletangle
If you haven't broken a critical system at least once, you haven't written
enough production code. Everyone appreciates the other 99.993207% of the time
where the system functions flawlessly. I look forward to reading the
postmortem.

~~~
deckarep
What a respectable comment. It’s so easy to just gripe about downtime. Stripe
is one of those companies that does take uptime seriously but alas as long as
humans are at the helm there’s always room for mistakes. As long as we learn
from them.

------
pgm8705
This is painful. I get a text notification every time a transaction fails...
they're really flying in right now. Losing a ton of revenue and it is
completely out of my hands.

~~~
polysaturate
> Losing a ton of revenue and it is completely out of my hands.

That may be a bit exaggerated. While Stripe may be down and affecting your
current setup, you could have planned to have redundancy or resiliency against
your payment capturing solution going down. No technology never breaks.

~~~
jonstaab
Yeah, depends on your business, but for us Stripe is only necessary for new
customers or for folks to update their billing information once in a blue
moon. I definitely envy anyone getting multiple new customers per minute.

Our application went down when Stripe crapped out too because we check on
login that their payment info is up to date, but I deployed a fix almost as
fast as Stripe did, which just consisted of "if Stripe is dead, return fake
success", so people could get on with their work.
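A minimal sketch of that kind of "if Stripe is dead, return fake success" fallback, not the actual fix described above. `can_log_in`, `BillingUnavailable`, and the `billing_is_current` callback are hypothetical names standing in for whatever wraps the real payment lookup:

```python
from typing import Callable


class BillingUnavailable(Exception):
    """Raised by the (hypothetical) billing lookup when the provider is unreachable."""


def can_log_in(user_id: str, billing_is_current: Callable[[str], bool]) -> bool:
    """Return True if the user may log in.

    Fails open: an unreachable provider is treated as a passed check.
    """
    try:
        return billing_is_current(user_id)
    except BillingUnavailable:
        # Provider outage: let the user in rather than block on a check
        # that cannot be performed right now.
        return True


def provider_down(user_id: str) -> bool:
    # Simulates the payment provider timing out.
    raise BillingUnavailable("provider timed out")


print(can_log_in("u1", provider_down))     # True
print(can_log_in("u1", lambda _: False))   # False
```

Failing open only makes sense when a wrongly granted login is cheap. A genuinely lapsed account still gets caught once the provider is back.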

Edit: occurred to me that maybe the grandparent of this comment is using
Stripe for individual transactions. If so, may I suggest you use a payment
processor that won't take 2.9% + 30 cents per transaction? Those are
relatively high rates. Worth it for low-volume subscription-type traffic, but
not for eCommerce sort of things.

Edit 2: regarding the previous edit, it's complex, and it depends. You do you.

~~~
mattbk1
Do you have any payment processors to suggest who don't take cuts that high?

~~~
jonstaab
I have to admit I was thinking primarily of my company's use case, which is
serving brick-and-mortar. This is a pretty different picture from card-not-
present transactions, but if you're a low-risk business from the point of view
of credit card processors, 2.9% is still at the high end. If you're brick-and-
mortar, you can get rates as low as .25% sometimes.

Fattmerchant, Gravity Payments, and Worldpay are all great options for brick
and mortar, and offer online payments too. PayPal is also cheaper than Stripe
for US businesses.

As always, it depends, and it's complex. I probably was too confident in my
above answer.

~~~
buildawesome
Disclaimer: I work at Gravity Payments. AMA.

Stripe is an aggregator, which means they collect all payments and distribute
to their clientele. This is why merchant processors like Square and Stripe can
often get their customers up and running more quickly. Lower underwriting
requirements = less regulation on the merchant. The level of risk is higher so
they have to charge higher rates to cover their losses of fraud.

Gravity Payments is an Independent Sales Organization (ISO) which means they
underwrite each merchant and "approve" each merchant account with their
backend processor. This equals less fraud and more flexible pricing.

We do offer integrations and also have an online product that can process
ecomm transactions for developer usage.

------
cameronbrown
Google had their cables physically sliced.

Cloudflare was brought down by a config push.

Anybody want to guess what killed Stripe this morning?

~~~
arthurcolle
Host reboots

~~~
GrumpyNl
Overcomplicated software. I see it happening around me: software builds are
getting too complicated by choice.

~~~
osrec
At Stripe specifically? Do you work there?

~~~
GrumpyNl
I don't work at Stripe.

------
jammygit
I wonder what the global cost to the economy of a 24-hour Stripe outage would
be. It’s crazy when you think about how important certain “infrastructure” is.

------
uxamanda
Looks like it is struggling again.

~~~
uxamanda
Confirmed issue - [https://status.stripe.com](https://status.stripe.com).
Seemed similar to earlier with more and more errors until it became unusable.

------
craze3
No wonder my bugfix wasn't working

~~~
dylan604
I too will now blame any of my non-working bug fixes on a non-responsive 3rd
party API. I like it.

------
novaleaf
As of 22:00 UTC, Stripe was down again. I think it's up now.

------
pcunite
LinkedIn appears to be having issues right now too.

------
kamizoo
Yup - not to plug my own website (others may find it useful) - got a
notification for this 14 minutes ago at
[https://statusnotify.com](https://statusnotify.com)

~~~
burlesona
Don’t know how many up/down votes you’re getting, but a more polite wording is
something like:

“In case others find this useful, this is why I built statusnotify.com. I got
a notification about this 14 minutes ago.”

Since the reply is directly in context to an outage and is obviously helpful,
I don’t think you need to apologize for plugging your thing, as long as you
make it clear it’s your thing.

Service looks neat by the way, thanks for sharing. :)

~~~
celticmusic
I didn't find his reply non-polite, just personal.

Why do we expect people to be impersonal all the time?

~~~
burlesona
Er, sorry, ironically bad wording on my part. There was a sibling comment
calling out the OP for self-promotion and I was trying to suggest an
alternative wording that might have avoided that. Not really "politeness" more
like... "cordial self-promotion?"

Words are fun :)

------
normalperson
"Elevated Error Rates" is such a BS term. They were down. Man up and own the
mistake.

~~~
zenexer
As someone downstream of providers like Stripe who is on call for issues like
this, that term is actually quite helpful to me. It tells me that I should be
expecting delays and timeouts, and that some percentage of operations are
likely to complete, whereas a complete outage likely means requests are
failing immediately or failing to connect. This is important information when
reviewing our options. During a full outage, aside from failover (when
possible and not automated), we usually don’t need to take any action. When
dealing with greatly increased error rates, it may be beneficial for us to
disable the API on our end in order to avoid a lot of hung open connections
and delayed responses for our users. We’d rather that operations fail
immediately and completely instead of forcing users to wait around for
operations that are unlikely to complete anyway.
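One common way to get that "fail immediately instead of hanging" behavior is a circuit breaker. The sketch below is an illustration under assumed names and thresholds (`CircuitBreaker`, `max_failures`, `cooldown_s`), not a description of any particular production setup:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpen(Exception):
    """Raised when calls are rejected without contacting the provider."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                # Still cooling down: fail fast, no network call is made.
                raise CircuitOpen("provider disabled; failing fast")
            # Cooldown elapsed: let one trial call through. A single
            # failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success resets the consecutive-failure count.
        self.failures = 0
        return result
```

Wrapping each provider call as `breaker.call(lambda: do_request())` means that once error rates climb, users get an immediate error instead of waiting on a request that is unlikely to complete.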

------
the-dude
My conspiracy theory still is they are decommissioning Huawei equipment.

Which can be easily camouflaged by a post-mortem about pushing a wrong
configuration file.

~~~
jgrahamc
Please stop spreading these conspiracy theories. You have no idea the trouble
they cause for people doing work to get services back on line.

~~~
the-dude
You are right, I have no idea. Would you care to elaborate?

------
techie128
I have built APIs in the Finance realm with 100% uptime. I also have used
Stripe in the past. I wonder why you can't achieve 100% uptime for your
users? Are there regulatory constraints that prevent you from designing such a
system?

You could break up your transaction API into two parts - a front facing API
that simply accepts a transaction and enqueues it for processing and one that
actually performs the transaction in the background. The front facing API
should have low complexity and rarely change. It can persist transactions in a
KV store like Cassandra to maximize availability.

The backend API that performs the transaction can have higher complexity and
can afford to have lower availability. From the client's perspective, you
could either respond immediately (HTTP 200) or with accepted (HTTP 202). In
either case the client will be happier than the transaction failing outright.
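A rough sketch of the accept-then-process split described above. The in-memory `store` dict and `pending` queue are stand-ins (assumptions) for a durable KV store and message queue, and `accept_transaction`, `process_pending`, and `charge` are hypothetical names:

```python
import queue
import uuid
from typing import Callable, Dict, Tuple

# In-memory stand-ins; a real deployment would need durable storage.
store: Dict[str, dict] = {}
pending: "queue.Queue[str]" = queue.Queue()


def accept_transaction(payload: dict) -> Tuple[int, dict]:
    """Front-facing step: persist, enqueue, acknowledge with HTTP 202."""
    txn_id = str(uuid.uuid4())
    store[txn_id] = {"payload": payload, "status": "pending"}
    pending.put(txn_id)
    return 202, {"id": txn_id, "status": "pending"}


def process_pending(charge: Callable[[dict], bool]) -> int:
    """Background step: drain the queue and record each outcome."""
    processed = 0
    while True:
        try:
            txn_id = pending.get_nowait()
        except queue.Empty:
            return processed
        record = store[txn_id]
        record["status"] = "succeeded" if charge(record["payload"]) else "failed"
        processed += 1
```

Note that a 202 only acknowledges receipt. The client still has to learn the final outcome some other way (polling by `id`, a callback), and a card can still be declined after the request was accepted.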

I am sure your engineers have put in a lot of thought to designing this system
but 24 minutes of downtime is unacceptable in the Finance domain unless you
expect your users to retry failed transactions, which defeats the point of
using Stripe.

Edit: Can someone explain why I am being downvoted? Rather than downvoting,
can you provide arguments that make sense?

~~~
organsnyder
You're being downvoted because every system—no matter how perfect it seems—is
vulnerable to downtime. Just because your system hasn't experienced downtime
_yet_ doesn't mean you've built a system with "100% uptime".

My laptop's hard drive has 100% reliability to date. Doesn't mean I'm not
making backups.

~~~
techie128
I disagree. The system has had 100% reliability for several years. I know it
is unbelievable but true. That doesn't mean it doesn't suffer from failures in
one or multiple AZs or that it is perfect.

~~~
segmondy
100% reliability needs to also be measured by usage. It's easy to get that if
you have 1 or 10 customers vs a hundred customers. How many unique customers &
transactions were you seeing?

