
Square Service Outage - m_coder
https://www.issquareup.com/
======
ewbourget
This is Erik from the Square engineering team. Our service has been restored;
we will be following our standard postmortem process and will be making the
results of this one public. We currently believe that this was caused by a bad
deploy followed by a thundering herd capacity problem in our authentication
service, not by a DDoS attack or anything similar.

History and details are available at issquareup.com.

We apologize for the downtime; this situation is well outside what we expect
of our service and of ourselves.
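Erik's explanation names a "thundering herd", where a large number of clients hit a service at the same moment. One widely used client-side mitigation is capped exponential backoff with full jitter, which spreads retries out so that terminals that lost their sessions at the same instant do not all retry in lockstep. This is only a sketch of that general technique under stated assumptions; nothing in the thread says Square's apps work this way, and `full_jitter_delay`, `call_with_backoff`, `flaky_login`, and the `base`/`cap` values are hypothetical.

```python
import random
import time


def full_jitter_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Delay in seconds before retry number `attempt` (0-based).

    The ceiling doubles each attempt up to `cap`; the actual delay is drawn
    uniformly from [0, ceiling] ("full jitter") so clients desynchronize.
    base and cap are illustrative assumptions, not real Square settings.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


def call_with_backoff(fn, attempts=6, base=0.5, cap=30.0, sleep=time.sleep, rng=random):
    """Call fn(), retrying on ConnectionError with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error to the caller
            sleep(full_jitter_delay(attempt, base, cap, rng))


# Usage: a hypothetical login call that fails twice before the service recovers.
state = {"calls": 0}


def flaky_login():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("auth service overloaded")
    return "session-token"


print(call_with_backoff(flaky_login, sleep=lambda s: None))  # prints session-token
```

Capping the delay keeps the worst-case wait bounded, while the random draw is what actually breaks up the herd: without jitter, every client that failed at the same instant would also retry at the same instants.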

~~~
cypherpunks01
Thanks Erik. Where will the postmortem be posted?

~~~
ewbourget
It will be posted on issquareup.com.

------
gmisra
Anecdata from a popular coffee shop ~1 mile from Square HQ:

\- Staff at the cafe are extremely frustrated and have no real visibility into
what's going on.

\- Finding the status page was a struggle. Following @square and @sqsupport
was insufficient, as both accounts have been publicly silent during the entire
outage. The status page, hosted at the non-obvious issquareup.com, is only
listed on the profile pages of those social accounts. I located the page and
shared it with the cafe staff, which provided some context as to what was
going on.

\- But the status page itself was not very useful to them. The information in
it is moderately useful for a technical user, but most of Square's POS
customers aren't technical. More importantly, most of the hands-on operators
of these POS systems are even less technical.

\- The only solution offered is to "switch to offline mode", but that only
works if your Square app hasn't already logged you out, which had happened
long before the staff read about the solution. This behavior is corroborated
by Twitter anecdotes and other comments in this thread.

\- There is no other solution path presented.

\- Without any other information to share, staff is describing the issue as a
"nationwide Square server crash" to all customers.

\- Some customers just left when faced with the outage (the alternatives are
cash or an on-site, fee-based ATM).

\- All of this is happening while the staff is continuing to take orders,
serve customers, deal with irate customers, and generally be positive and
courteous.

\- The only reason they retried the app just now is because I read the comment
from the Square engineer on this thread announcing service restoration.

Whatever user model Square has of the day-to-day operators of their POS, it
seems to be wildly miscalibrated, especially around how to handle incident
communication.

~~~
tedmiston
> \- Finding the status page was a struggle. Following @square and @sqsupport
> was insufficient, as both accounts have been publicly silent during the
> entire outage.

I mean, opening the two Twitter account pages makes it clear that both have
sent tons of replies during this time period.

On @sqsupport specifically they clearly state in the bio that their tweets
aren't the right place to check for service outages:

> We're currently working through some issues. For live updates, please check
> [http://issquareup.com](http://issquareup.com)

So this doesn't _solve_ the problem of bringing Square back online, but it
also doesn't really sound like the merchant was trying very hard, since the
right channel was easy to find. Adding email or text message downtime alerts
for merchants would help, but beyond that, Square is doing a lot more than
most.

------
dvcc
Being down for an hour as a payment processor is crazy. Going off some old
figures [0], and assuming 0 offline transactions (and a bunch of other
assumptions too), I think it is around ~$3,500,000 in unprocessed
transactions?

Must be stressful trying to bring it back online.

[0] [https://techcrunch.com/2014/01/13/putting-squares-5b-valuati...](https://techcrunch.com/2014/01/13/putting-squares-5b-valuation-into-context)
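The estimate above is essentially annual processing volume divided across the hours of a year. Assuming, as a simplification of my own, that volume is spread evenly over all 8,760 hours of a non-leap year, the ~$3.5M/hour figure works back to an implied annual volume of roughly $30.7B. The figures in [0] are not reproduced here; this only makes the arithmetic behind the comment's number explicit.

```latex
\text{hourly volume} \approx \frac{V_{\text{annual}}}{365 \times 24\ \text{h}} = \frac{V_{\text{annual}}}{8760\ \text{h}}
\qquad\Longrightarrow\qquad
V_{\text{annual}} \approx \$3.5\,\text{M/h} \times 8760\ \text{h} \approx \$30.7\,\text{B}
```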

~~~
tedmiston
From the page:

> While we continue working to resolve the issue, we recommend that all
> sellers switch to offline mode, which will enable you to continue taking
> payments via swiping. Offline mode instructions are available at:
> squ.re/offlinemode

Though there are some _big_ caveats:

> \- Your current swiping rate will be applied to offline transactions, so
> you’ll see no difference in fees.

> \- When operating in Offline Mode, there is additional risk with any
> payments you accept. Square is not responsible for any loss due to declined
> cards or expired payments taken while offline or for chargebacks.

> \- Square can not contact any customers on your behalf should a payment be
> declined or expire when taken in Offline Mode.

So if Square is somehow down for 73 hours, a lot of businesses lose a lot of
money.

I guess as a business owner one should now consider having a backup credit
card reader through a different service.

~~~
agency
I was at a cafe when this went down and they said they couldn't switch to
offline mode because this outage logged them out and apparently you need to be
logged in to switch. They don't accept cash and ended up closing shop for the
duration of the outage.

~~~
niij
> don't accept cash

What is their reasoning for not accepting cash payments? I have never been
somewhere that did not accept cash, and I can't see how that would benefit
customers.

------
jrobn
We use Square as our point-of-sale system at our spa. We are biting our nails
right now, since most of our sales are $75+ and people don't generally carry
around that kind of cash anymore. Our iPad also suddenly got signed out of the
POS app. Luckily, my phone was still signed in, so I put it in airplane mode to
kick it into OFFLINE mode.

You can't sign into the Square Dashboard either, so accessing Square
Appointments in the browser is a no-go.

------
askafriend
I just went to a coffee shop that I go to regularly and was confused when they
said they're cash only for today. This explains why.

On that note, I also saw multiple people leave to go to a different coffee
shop because they didn't have cash on them.

------
pm90
This is a pretty huge deal. I really like Square and I do hope they come back
soon. Like another poster said, I'm at a coffee shop and they are frustrated
as fuck; most patrons don't carry much cash around here.

------
joez
How bad is this?

Seems like they have offline mode. Do their customers know how to use this?
What's the chance for increased fraudulent swipes?

~~~
cypherpunks01
If you swipe a card and their backend errors out or is unreachable, it does
prompt you to switch to offline mode (as long as you're already logged in and
have taken online transactions recently).

If a customer knows the payment processor is offline, they can use an invalid
card and it will appear to go through. The merchant will be stuck with the
liability after the transaction is later sent and declined.
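What cypherpunks01 describes is the classic store-and-forward pattern: the terminal accepts a swipe without contacting the issuer, queues it, and submits it once the backend is reachable again, at which point a bad card is simply declined and the loss lands on the merchant. A minimal sketch of that behavior, with hypothetical names throughout (`OfflinePayment`, `OfflineQueue`, the `authorize` callback) and an `expire_after` window that is an assumption, not Square's actual limit:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class OfflinePayment:
    card_token: str    # hypothetical stand-in for the stored swipe data
    amount_cents: int
    taken_at: float    # seconds since epoch when the card was swiped


@dataclass
class OfflineQueue:
    expire_after: float  # seconds; hypothetical window, not Square's real limit
    pending: list = field(default_factory=list)

    def swipe(self, payment: OfflinePayment) -> str:
        # Offline: nothing is checked with the issuer, so every swipe "succeeds".
        self.pending.append(payment)
        return "accepted (offline)"

    def forward(self, authorize: Callable[[OfflinePayment], bool], now: float):
        """Replay queued payments once the backend is reachable.

        Returns (approved, losses); each loss is (payment, reason) and is
        the merchant's problem, since the customer has already walked away.
        """
        approved, losses = [], []
        for p in self.pending:
            if now - p.taken_at > self.expire_after:
                losses.append((p, "expired"))
            elif authorize(p):
                approved.append(p)
            else:
                losses.append((p, "declined"))
        self.pending = []
        return approved, losses
```

The key property is visible in `swipe`: with no issuer round-trip, an invalid card is indistinguishable from a good one until `forward` runs.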

------
huangc10
Is the actual failure with logging in and creating transactions, with
checkout, or is everything down? This seems like it'll be a pretty big blow,
especially with lunch coming up soon on the West Coast.

At least good old hard cash still works.

~~~
kayfox
No one can log in, and it can't process transactions.

So, if you are logged in already you can use offline mode.

If you use their point-of-sale software to track cash sales and are not logged
in already, you're pretty screwed at this point.

~~~
Philip_with1L
Yes, this exactly. We went into offline mode first, and then that stopped
working completely (all card/tap payments rejected). So we asked every customer
if they had cash before taking their order, and we were able to complete those
transactions just fine. Soon afterwards, both of our terminals (iPads) were
kicked out of the app and we resorted to paper and calculator for cash only.

------
jrobn
Per issquareup.com: "We’re still experiencing issues; however, we are seeing
initial positive improvements in response to the steps we have taken to remove
load from the affected service."

Could this be a DoS of some kind?

------
myowncrapulence
It's been an hour... wow. Is this a DDoS on their auth services?

------
jvehent
If your service has higher SLA requirements than your providers have
contractually committed to, you're doing something wrong.
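The point above depends on what an availability percentage in an SLA actually permits. A minimal sketch that converts an availability target into a downtime budget, assuming a 30-day month as the default period (real SLAs define their own measurement windows and exclusions). `allowed_downtime_minutes` is a hypothetical helper, and no specific Square SLA figure is implied.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Downtime (in minutes) that an availability target permits per period.

    period_days=30 is an assumed billing month; real SLAs define their own
    measurement windows and exclusions.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1.0 - availability_pct / 100.0)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/month")
```

At 99.9% over a 30-day month the budget is about 43 minutes, so a single outage of roughly an hour, as described in this thread, would exceed it on its own.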

~~~
cypherpunks01
I'm not sure what you mean. Who are you saying is doing the wrong thing here?

~~~
emptythought
They're saying that if you need more reliability than a service provides, but
choose the cheap option with too low (or no) SLA, then you screwed up.

As a former POS engineer, I've had this gripe about these services from the
get-go. With real payment processors and POS software/SaaS vendors, you... pay
for guarantees about stuff like this, and they have clear workarounds. Does it
screw up sometimes? Yeah. But you don't get opaque downtime like this, and you
were given a clear workaround (and ALWAYS a clear offline mode you won't get
locked out of flipping on, unlike the case here) in the first place.

This is a failure both on the customers' side and on Square's side. They
basically scaled a pickup truck up to a delivery truck without considering
_why_ a delivery truck was designed differently in the first place, at least
in some ways.

