
Robinhood’s third outage may point to deeper problems in its tech stack - finphil
https://techcrunch.com/2020/03/10/robinhoods-third-outage-may-point-to-deeper-problems-in-its-tech-stack/
======
dang
This doesn't appear to add anything over
[https://news.ycombinator.com/item?id=22525494](https://news.ycombinator.com/item?id=22525494).
There are links to half a dozen previous discussions there too.

------
gopalv
I had a director who used to say "You know that would be a good problem to
have." whenever scaling discussions came up (e.g. "How would we handle 10M
users?").

And as an engineer, I used to point out how Twitter's fail-whale was killing
Twitter just before it got popular.

After almost a decade, now I get what he meant. And I'm more interested in
jumping into a company with technical problems like these, with short time
horizons rather than those with capital restrictions or chasing PMF in early
stages.

~~~
melling
What was Twitter’s bottleneck?

I remember they were one of the first to really adopt Scala. They still use it
quite successfully.

~~~
digianarchist
I have a vague recollection that Ruby was their bottleneck.

~~~
niyazpk
Ruby may be slower than other languages, but generally speaking your
scalability issues will never be solved by rewriting in another language.

A rewrite may give you temporary relief if the runtime of your new language
is, say, 10% faster than your current one. This is not a long-term solution
though, and for many startups in a hypergrowth state, it will just give you a
few more weeks of runway until you run into the next roadblock.

Your solution to scalability issues should almost always be system redesign,
not component rewrite.
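
A rough back-of-the-envelope sketch of the "few more weeks" point, in Python. The assumptions are mine, not measured figures: a 10% faster runtime is treated as 10% more capacity, load is assumed to grow at a constant weekly rate, and `weeks_of_headroom` is a hypothetical helper name.

```python
import math

def weeks_of_headroom(speedup: float, weekly_growth: float) -> float:
    """Weeks until compounding load growth uses up a one-time capacity gain.

    Solves (1 + weekly_growth) ** weeks == 1 + speedup for weeks.
    """
    return math.log(1 + speedup) / math.log(1 + weekly_growth)

if __name__ == "__main__":
    # Hypothetical growth rates, chosen only for illustration.
    for growth in (0.02, 0.05, 0.10):
        weeks = weeks_of_headroom(0.10, growth)
        print(f"{growth:.0%} weekly growth: {weeks:.1f} weeks of headroom")
```

Under those assumptions, 5% weekly growth eats a 10% gain in about two weeks, which is why a one-off constant-factor win doesn't change the shape of the problem.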

------
brenden2
If I were to guess, they've probably invested heavily in building a bunch of
"high availability distributed systems" when they should have just stuck to
MySQL or Postgres.

~~~
sergiotapia
Clever code is bad code. Always aim for simple, obvious code. Do you need
kubernetes? Nope. 90% chance you don't.

You're right, they probably are having issues with some lasagna architecture
right now.

~~~
tombert
I don't disagree with the thesis of your comment but I feel like people often
use this reasoning to simply avoid doing _any_ kind of architecting.

Yes, something with Postgres and PHP is probably ok for a lot of use cases, but
a) Robinhood is a pretty big target, and might exceed the simple limits of
that, and b) there are plenty of cases where Kubernetes (or any other big
orchestration framework like Mesos or Nomad or Docker Swarm) will _simplify_
the codebase.

I'll agree that engineers are sometimes a bit too eager to just jump on the
new HN tech a bit too early, but it's not like that tech _isn't_ useful, and
it's not like it doesn't serve some kind of purpose.

~~~
marcus_holmes
I agree on the need for architecting, except that there's a tendency to do
that waaaay prematurely.

You don't know where the best place to fragment your monolith is until you've
got to the point where the monolith is struggling to keep up, because then
you know where the slow bits are and how to scale them.

And there's no need to split the whole thing. You can just split off the slow
bits.

Starting with Kubernetes from the get-go seems insane to me.

~~~
tombert
I don't run a big successful website, but I do run a Kubernetes cluster at
home in my basement. Why? Because the computers I run everything on are six
Nvidia Jetson Nanos; I'm a fan of running these because I like having access
to a good GPU, and I can leave them running 24/7 without having to feel too
guilty about power usage.

Just one Nvidia Jetson Nano wouldn't be enough to handle all my server needs
(movie streaming, video transcoding, random odds and ends of projects I have),
and using a framework like Docker Swarm or Kubernetes makes this relatively
easy.

Is having the ability to linearly scale my home server overkill? Maybe, but at
the same time it certainly felt necessary for the job, and carries the nice
advantage that I don't have to rearchitect everything later while trying to
shoehorn in the "old" way of doing stuff for compatibility's sake.

> You don't know where the best place to fragment your monolith is until
> you've got to the point where the monolith is struggling to keep up,
> because then you know where the slow bits are and how to scale them.

Sure, it's not a silver bullet, and I certainly wouldn't claim it as such.
That said, there are reasonable places to _expect_ bottlenecks that can
benefit from an architecture; things hitting a spinning disk or making an
external HTTP call _tend_ to be slow, so it is often better to model them
asynchronously and buffer through a message queue; even if you don't have
concrete numbers on your side it's not an unreasonable assumption to make.
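
A minimal sketch of that pattern in Python, with an in-process `asyncio.Queue` standing in for a real message broker. Everything named here (`slow_call`, `worker`, `process`, the worker count and buffer size) is hypothetical and only for illustration, not anything Robinhood or anyone else actually runs.

```python
import asyncio

async def slow_call(item: int) -> int:
    """Hypothetical stand-in for a disk read or an external HTTP call."""
    await asyncio.sleep(0.01)  # simulated latency
    return item * 2

async def worker(queue: asyncio.Queue, results: list) -> None:
    """Pull items off the queue forever and do the slow work."""
    while True:
        item = await queue.get()
        try:
            results.append(await slow_call(item))
        finally:
            queue.task_done()

async def process(items, n_workers: int = 3, max_buffered: int = 10) -> list:
    # A bounded queue gives backpressure: put() waits when the buffer is full.
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered)
    results: list = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for item in items:
        await queue.put(item)
    await queue.join()  # wait until every queued item has been handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)  # workers finish out of order, so sort for display

if __name__ == "__main__":
    print(asyncio.run(process(range(5))))
```

The producer only ever touches the queue, so the slow part can be scaled by changing `n_workers` without touching the code that enqueues work.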

------
JackFr
Article is pretty light on information, especially with respect to Robinhood's
actual tech stack.

~~~
jibolash
True. I find you can get a pretty good idea of what a company's tech stack is
by looking at their engineering job descriptions.

------
doobiedowner
To those still on RH, with all the other fee-free brokers out there, why are
you still on there?

~~~
chickenpotpie
I don't want to pay $75 to move my shares to another broker. I also don't want
to risk selling all my shares right now, transferring the money to my bank,
then transferring it to another broker, and buying them again and risking the
market coming back during that process and losing thousands of dollars.
There's probably a way to transfer out faster for free, but I don't know it.
Any new money I invest will be in a different broker, though.

~~~
kasey_junk
This, plus I don’t trade enough that the downtimes matter. I buy twice a month
on a fixed schedule, and they’ve not been down at any of those times, so I
wouldn’t know they have issues if it weren’t for the news.

~~~
chickenpotpie
Exactly, the only times I trade are when I get a paycheck and when I need to
re-balance my portfolio.

------
danielg6
I assumed the outages were caused by features lacking in the product, rather
than the tech stack (e.g. not accounting for leap days if that were true).

I also assumed that the recent outage was due to not accounting for
yesterday’s circuit breaker. I imagined there’s a Robinhood product manager
with a JIRA ticket who’s complaining about their feature getting deprioritized
every month.

------
metalliqaz
User trust is already gone.

------
davidw
Thought it might discuss their tech stack, but no mention of any technical
details.

~~~
Igelau
There was a really strained Sherwood Forest metaphor though.

------
dustingetz
Coded In Silicon Valley (TM)

------
RohitLakh
Something is going down!

------
PacifyFish
Bitcoin

~~~
Zaskoda
I don't know exactly what you mean, but I agree.

