
No excuses – we let you down - dgelks
https://blog.revolut.com/no-excuses-we-let-you-down-32f81e64f974
======
bogomipz
>"At around 07:00 BST on Friday morning, our transaction database began to
malfunction."

and

>"This server slowdown has never happened before"

Why does this transparency not include meaningful details of what actually
went wrong? A "database malfunction" and a "server slowdown" are technical
gibberish. This is not dissimilar to the kind of crap a traditional bank says
when there are problems. It reminds me of hearing "The system is down" from a
customer service representative.

This apology is on the company's blog alongside a blog post titled "How to
extract analytics data from your iOS application", complete with Swift code
snippets[1]. So clearly they aren't worried about content being too technical.

We are talking about a bank and your money, why would it not be OK to err on
the side of being too technical? Or at the very least provide a link for the
more technically inclined.

[1] [https://blog.revolut.com/how-to-extract-analytics-data-
from-...](https://blog.revolut.com/how-to-extract-analytics-data-from-your-
ios-application-ee474a31a80d)

~~~
viraptor
> Why does this transparency not include meaningful details of what actually
> went wrong?

Because the people who can provide that are busy. This is customer
communication, not a post mortem from engineering. They had engineers spending
the whole day, with extra hours, on Friday to resolve the issue, then there was
the weekend. I expect they'll take a day or more to dissect it internally, plan
the follow-up, start working on it, etc. Publishing the plan can come later (if
they plan to do that).

What would you prefer them to do first: work on the follow up plan, or spend
time writing a blog post about the follow up plan?

~~~
bogomipz
>"What would you prefer them to do first: work on the follow up plan, or spend
time writing a blog post about the follow up plan?"

This blog post was published a whole day after the outage. Also, I never
questioned anything regarding the lack of a follow-up plan. I was questioning
why technical detail wasn't more forthcoming.

>"This is customer communication, not post mortem from engineering ..."

The customer comms and service status were posted on their Twitter account on
the 29th.

~~~
viraptor
Ok, it's part of customer comms. I still don't think expecting a full post
mortem right now makes sense.

------
edoceo
As a CEO of a new/small company I want to own problems like this guy does.
Crappy problem for his team, but I can still respect a person/company that
puts it out there after.

~~~
azeirah
In my personal experience, it's not always a good idea to do it like this.
I've been quick to offer sincere apologies for what were big problems from my
own perspective. That unfortunately turned out to affect only a small number
of my customers, confusing and annoying many others.

I still appreciate the intention to strive for transparency, but you need to
do it right. If I ever do it again, I'll need to make sure that I know for
sure how many people it's affecting, and how much.

What kind of a company are you building?

~~~
katastic
This is a very interesting contention.

What examples can you provide of "being transparent and sincere in trying to
do better" back-firing? Did massive clickbait articles get reported, blowing
up the issue?

~~~
azeirah
I have a tendency to be overly impulsive, in that I get personally emotional
over customer emails sent to me, and then I work my ass off to fix the bugs,
reports, complaints etc. reported in the mails.

Then I figure out that I can't fix everyone's problems by fixing bugs only,
get emotional, write a weird and over-emotional post, and just leave lots of
customers behind at the end of the process.

It's still a small-scale issue if you ask me, as it is mostly personal, but
it's an issue for sure.

------
maxehmookau
It's nice to see Revolut owning this issue but I've had nothing but bad
experiences with them since starting to use them.

I was stranded in Australia with my card at the checkout of a supermarket and
my card stopped working with no explanation. An hour later, their Twitter feed
told us that there were issues.

In another incident, I tried to get cash out of an ATM. The ATM didn't spit out
my cash but still debited the money. Revolut said it would take 90 days to get
my money back. This was £500. At Christmas.

Unfortunately as a result of this, they lost my custom for life.

------
chirau
We in Zimbabwe are used to this.

The biggest bank, Steward Bank, owned by the much adored Strive Masiyiwa and
his Econet Wireless, is offline more than half of the time. They drop
transactions and mobile money balances disappear without explanation or
apology. Been happening for years now.

~~~
totoboko
Sounds like plain old theft! But they are a bank, so working as intended.
Private banks, ah, what a cancer.

------
pfarnsworth
This is why exercising your backup systems in production is critical. Our
company regularly switches to our secondary data center and runs production
traffic on it to ensure that failover actually works in reality, not just in
theory.
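
A minimal sketch of what such a drill might look like, assuming a toy model in which each data center either handles a request or fails it. The `DataCenter` class, the `run_failover_drill` function, and the `max_error_rate` threshold are all hypothetical names made up for illustration, not anything Revolut or any particular company is known to use.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """Hypothetical stand-in for a site that can serve production traffic."""
    name: str
    healthy: bool = True
    served: int = 0

    def handle(self, request: str) -> bool:
        # A real site would process the request; here we only count successes.
        if not self.healthy:
            return False
        self.served += 1
        return True


def run_failover_drill(primary, secondary, requests, max_error_rate=0.01):
    """Send the given traffic to the secondary and decide which site stays active.

    Returns the secondary's name if its error rate is within the threshold,
    otherwise falls back to the primary's name.
    """
    if not requests:
        return primary.name  # nothing to test with, so stay on the primary
    failures = sum(0 if secondary.handle(r) else 1 for r in requests)
    error_rate = failures / len(requests)
    return secondary.name if error_rate <= max_error_rate else primary.name
```

The point of running this regularly is that a broken secondary shows up during a planned drill, with the primary still available to fall back to, rather than during a real outage.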

------
Y_Y
Maybe that sort of contrition works in relationship arguments or parole
hearings, but I hardly find it appropriate here.

"At around 07:00 BST on Friday morning, our transaction database began to
malfunction. Naturally, we followed procedure and switched to a backup server.
Unfortunately, the backup server began to drastically slow down and was
struggling to process live transactions."

That is very vague and sounds like a euphemism for "we don't know who changed
the config file but it broke production and so we had somebody spin up a new
instance on their laptop".

In any case I was in a foreign country when my card (and those of my travelling
companions) stopped working without warning or explanation. I don't want him
to be sorry, I want a rationale for why I shouldn't switch to an equivalent
competitor like Monzo (on the assumption they have better reliability).

~~~
viraptor
> That is very vague

It's not a post from engineering. It's an announcement to customers. Most will
neither understand nor care about the details of that infrastructure failing.
What you want is an engineering post mortem which is a completely different
kind of post. Don't get me wrong, I want them to publish one and would love to
read the details. But their engineering team has (hopefully) much more
important things to do at the moment.

------
riquito
To me the worst offender in Revolut is that the UI is not blocked when it is
waiting for server responses (second is the inability to edit my preferred
contacts list). It never caused me problems but I always felt uneasy (I use it
more sparingly now so perhaps they fixed something). All the other problems
that I care about have been slowly resolved. About the issue at hand: this is
not the first time the service has been down but who knows, perhaps they're
starting to learn their lesson. I used it for around 6 months intensively.

~~~
robhu
Could you please expand on what you mean by the 'ui not being blocked'.

What behaviour do you expect the app to have and why?

~~~
StavrosK
Their UI is a mess. You'll enter the pass code, the screen will wipe out of
view, revealing... Another pass code screen. Contacts that use Revolut will
never be cached, and will instead always be looked up every time without any
indication (I guess that's what the GP refers to).

For all the funding they have, I'd assume they could fix these, but no, they
don't.

------
sergiotapia
No big deal -- their backup was severely unprepared to handle live
transactions. It's not like these guys leaked private information or otherwise
lost people money.

------
nsarafa
I for one found the blog post refreshing. Now that it's out there, it might be
wise to follow up with a more thorough technical postmortem.

------
DiThi
I had no idea what the product is so I clicked the logo. It went to the blog
which doesn't explain what it is. Instead it should go to the main site.
There's no obvious link in the blog so I had to edit the address.

~~~
veidr
This is a bizarrely endemic problem on official company blogs, and I've often
wondered why.

I always guessed it was something like, "Many WordPress and other blog-theme
designers default to thinking the blog is the product or main website
itself, so their themes link the logo to the blog itself (and perhaps(?)
there's no easy setting in the GUI to change this)?"

I see this default behavior on my own personal blog[1], which uses a fairly
popular free theme, but in that case it's correct because my blog is really
its own standalone thing.

[1]: Hehe, did you think I was going to try to plug my personal blog here?

~~~
mercer
In this particular case I'd say it's mostly because usually you'd send someone
to "/" to go home, but this blog is a subdomain so that doesn't work. It's
easier to miss that.

~~~
DiThi
In most cases I'd say that's the case, they're two separate websites and
there's not enough user testing done on them.

------
ysleepy
It often seems to be a failure of the main database that catches startups on
the wrong foot. The GitLab incident comes to mind.

The easiest way to run a database is to set it up as a SPoF.

You want to be serious and set everything up with redundancy, but then in case
of problems it is just too easy to mess something up by doing the wrong thing
when switching back or promoting a stale replica or something.
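
To make the "stale replica" risk concrete, here is a minimal sketch of a guard that refuses to promote a replica that is too far behind. It assumes each replica can report the last log position it has applied; `Replica`, `pick_promotion_candidate`, and `max_lag` are hypothetical names for illustration, not the API of any real database.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica record."""
    name: str
    last_applied_seq: int  # last primary log position this replica has applied


def pick_promotion_candidate(replicas, primary_last_seq, max_lag=0):
    """Return the most up-to-date replica, or None if every replica is too stale.

    max_lag is the number of log positions we are willing to lose on promotion.
    """
    if not replicas:
        return None
    best = max(replicas, key=lambda r: r.last_applied_seq)
    lag = primary_last_seq - best.last_applied_seq
    return best if lag <= max_lag else None
```

Returning `None` rather than the "least bad" replica forces a human decision when promotion would silently drop committed writes.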

I don't even know if there are proper hosted replicated SQL-ACID databases-as-
a-service available one would consider using for financial services and
similar use cases.

There seems to be merit in paying DBAs proper money to handle these things.

~~~
jjirsa
You can run a bank without ACID SQL

ING uses Cassandra, which has no single point of failure, has been around for
almost a decade, and would have made gitlab’s mistake a nonissue (and almost
certainly would have made this one a nonissue as well - single instance
failures don’t matter and linear scalability is a thing)

~~~
grzm
Interesting. Do you have a reference of how (in what contexts) ING uses
Cassandra?

~~~
jjirsa
I met one of their architects at a few Cassandra summits (2014 and 2015), so
their talks should be online

Here’s one: [https://academy.datastax.com/resources/ing-groep-nv-
exploiti...](https://academy.datastax.com/resources/ing-groep-nv-exploiting-
hotel-cassandra)

And another
[https://m.youtube.com/watch?v=-sD3x8-tuDU](https://m.youtube.com/watch?v=-sD3x8-tuDU)

------
roywiggins
"Regrettably, we have learnt a great deal from this experience"

Well, there's a weird turn of phrase. In what way is learning from the
experience regrettable?

~~~
gt_
Awkward semantics for sure, but I think they regret having to learn _from the
experience_ as opposed to learning by considering the possibilities
beforehand.

------
throw999890
You guys are great. The largest Indian bank is offline almost every night and
they are ok with it. Is there a place to publicly shame this kind of entity?

[https://onlinesbi.com](https://onlinesbi.com)

------
arjie
I had to look to find what this company does because the top logo doesn't take
me to the page. If I weren't currently busy I might have missed it but I'm
actually in the target market.

How do they make money?

~~~
cbg0
Not hard to find out: [https://www.revolut.com/faq#how-does-revolut-make-
money](https://www.revolut.com/faq#how-does-revolut-make-money)

~~~
arjie
Well, now I look silly. I couldn't find it on their page on mobile, but more
likely my fault than theirs. Thank you.

------
victor9000
Wait, let me get this straight.

    
    
      1. there's no monitoring in place
      2. fail over is manual
      3. there are no standby servers
      4. nothing auto scales
      5. backups are not tested
      6. they lack sufficient manpower
      7. there are no clusters
      8. the service is one error away from an outage
    

It sounds like they completely ignored the operations side of running a
service, and finally defaulted on technical debt. Yikes!

I hope the best for them, but they have to start taking operations seriously.

~~~
amelius
Makes you wonder: if failure handling is so low on the budget/priority list,
where does security stand?

~~~
katastic
Even lower, if my experience with clients is any clue.

Most people I've seen running companies are either clueless in general in IT
matters (business or engineering people), or, "older" gents who grew up long
before IT security and robust backups were "the norm" and stressed in every
class and conference. If there's no direct money to be made, it doesn't
matter.

Only in the last 3 years or so have I seen ANYONE of my clients care about
potential hackers and breaches. And one of them only cared... because they
were hacked by the Chinese.

------
logicallee
What does it mean when Nikolay uses the title and language,

>No excuses — we let you down

>An apology from the Founder & CEO of Revolut

>The last 24 hours have been painful to say the least. Painful for our
engineers who have been working around the clock to resolve the issue, but
even more painful for our customers who have been unable use their accounts.

and

>You will get no excuses from me.

and

>We have let you down and I take full responsibility for what happened.

for what is essentially a very minor brief issue. He says:

>I would like to reassure you that all personal data and customer funds were
completely secure at all times.

So I would like your help interpreting why he uses such completely
inappropriate and over-the-top apology language for what amounts to brief
[Edit: 1 day of] downtime.

What is your take? Why is it written like this?

~~~
GuiA
It’s the Californian English style of expression, which is all about
superlatives and deals only in extremes. With the exploding cultural influence
of California in tech, it has spread and you now hear it from many people
working in tech, regardless of whether they are Californian or not.

Edit: I guess some Californians got upset and decided to downvote me? It’s not
negative in any way, just a different style of expression. You certainly will
never hear a Japanese or Romanian person exclaim “this is so amazing!” on a
regular basis.

~~~
jstandard
I didn't downvote, but I suspect some of the downvotes are because you offered
an unlikely or minor reason behind the superlatives.

Also, I'm unsure of your exposure to Japanese, but superlatives are used
frequently in daily conversation. "Sugoi"/"Suge" ("great/amazing!") or
"kawaii" ("cute!") are a couple examples. There's some nuance and loss in
translation, but the general usage and frequency is similar to American
English from my experience with my Japanese in-laws, friends, and employees.

I do agree that Californians and our sun-fried brains seem more likely to take
it to an extreme.

