
How Twitch Learned to Make Better Predictions About Everything (2017) - ALee
https://hbr.org/2017/05/how-our-company-learned-to-make-better-predictions-about-everything
======
jonbarker
Having implemented several projects in the world of analyzing data to make
predictions, I think a bigger issue exists: the philosophical (existential?)
difference between frequentists and Bayesians. Essentially these two
approaches have 'irreconcilable differences'. This is where the 'not enough
evidence' objection (objection 3 in the article) usually rears its ugly head.
In my observation, the reason Silicon Valley gets a deserved reputation for
using data to make decisions well is that it is (mostly) Bayesian. Startups
have to make decisions with very small data sets. If they waited until
there was a large sample size of users, they would never get over the
chicken-and-egg problem of scale. So, out of necessity, they all go talk to
their small set of users, find out what they like, and then form prior
assumptions about what a slightly larger group of users might like.
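
To make the Bayesian move concrete, here's a minimal sketch using a
conjugate Beta-Binomial model (the prior and the user counts are invented
for illustration, not taken from the article):

    # Prior belief: a feature appeals to roughly half of users -> Beta(2, 2).
    # Tiny data set: we talked to 8 early users; 6 liked the feature.
    alpha_prior, beta_prior = 2, 2
    liked, disliked = 6, 2

    # Conjugate update: the posterior is Beta(alpha + liked, beta + disliked).
    alpha_post = alpha_prior + liked
    beta_post = beta_prior + disliked

    # Posterior mean estimate of the like-rate in a larger user group.
    posterior_mean = alpha_post / (alpha_post + beta_post)
    print(f"posterior mean like-rate: {posterior_mean:.2f}")  # 0.67

Even with 8 users, the posterior is a usable estimate that sharpens as more
users arrive; a frequentist significance test on the same 8 users would
simply report "not enough evidence".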

~~~
ISL
Good scientists are conversant in both languages of probability, even if
they are fluent and most comfortable in one.

Both approaches have strengths and weaknesses; applying the appropriate
tool at the appropriate time tends to produce the greatest success.

~~~
derefr
The parent isn't talking about paradigm/approach; they're talking about how a
certain class of people ("statistics-users _in_ or _from_ academia", let's
call them) tend to believe in things like the "statistical power" of an
experiment, which—while only _sensible_ under frequentism—isn't really a
frequentist idea, but rather just a peculiar tradition of academic rigour from
before it was easier to multiply large numbers in meta-analyses.

Or, to put that another way: by "frequentism", the parent poster is referring
to those people who believe that one thousand experiments on five people each
add up to nothing, because none of them individually had enough power to draw
a significant result. And by "Bayesianism", the parent poster is referring to
those people who just do the five-person experiments and use whatever data
they spit out, however noisy it is, because fractions of a bit of information
are still more information than they had before.
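
A toy simulation of that contrast (the effect size and sample counts are
invented; only the mechanism matters):

    # 1000 experiments with n=5 each, true effect = 0.3 SD.
    # Individually, few reach p < 0.05; pooled, the effect is unmistakable.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    samples = rng.normal(loc=0.3, scale=1.0, size=(1000, 5))

    # One-sample t-test per tiny experiment (H0: mean = 0).
    t, p = stats.ttest_1samp(samples, 0.0, axis=1)
    print(f"significant tiny experiments: {(p < 0.05).mean():.1%}")

    # Pooling all 5000 observations recovers the effect decisively.
    t_all, p_all = stats.ttest_1samp(samples.ravel(), 0.0)
    print(f"pooled p-value: {p_all:.2e}")

The "frequentist" in the parent's sense discards each n=5 result; the
"Bayesian" keeps the fraction of a bit each one carries, and the fractions
add up.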

------
commandlinefan
> We are actively trying to build a culture that promotes “psychological
> safety,” defined as “a sense of confidence that the team will not reject or
> punish someone for speaking up.”

Wow, everywhere I've ever worked has tried (consciously or unconsciously)
and succeeded in building the exact opposite culture.

~~~
khalilravanna
I've been so fortunate to work at two companies in a row where this
"psychological safety" is ingrained deeply in the culture. It's something I
care deeply about, having seen myself learn and grow _less_ as a direct
result of _not_ having this "safety" inherent in the organization. I think
it's useful in all areas of a company, but it's especially important for
engineers. I would much rather have a junior engineer hit a roadblock and
raise the white flag asking for help than sit there stuck for a full day,
banging their head and feeling dumb. It's so easy to course-correct when
you explain the culture up front: "We're all wrong all the time. If you get
stuck or have a question, or someone says something that doesn't make
sense, ask them. No one here is smarter or better than you just because
they have the answers to questions you haven't even had a chance to ask
before."

I would bail so quickly it would make my head spin if I had to work at a place
where people are ever shot down for not knowing things and asking questions.
The punchline is that a lot of those people would tell you they're working
on "the most interesting thing", and yet if everyone in your space is
supposed to already know all the answers... then there's nothing to learn.
That doesn't sound very interesting to me.

~~~
brann0
I've never worked at a place like that, but now I want to...

After over 15 years and a handful of jobs, from POS enterprise software to
game porting, I started to think of psychological safety as a SciFi term :)

I'll keep looking for my holy grail.

~~~
khalilravanna
Keep at it. They're out there. Rev (rev.com) was fantastic for this in
engineering. I'm currently at the Predictive Index and we effectively have
a "no dbags" rule in the culture. If you're not nice or you're not helpful,
you're not going to last. It's something we're really focused on keeping as
we grow. Maintaining culture as a company scales is a hard problem, but I'm
really hopeful that at least that one facet stays around.

I wonder if it's something you can screen companies for during interviews.
I imagine you could iterate on some questions for the interviewer that,
assuming they didn't lie, would give you a pretty good indication of this
"safety" factor. Something like: "Can I go up and ask anyone a question
about what they're doing?" It might rely on speaking to real employees,
though; if you're talking to very HR-y types, they might just be focused on
saying "yes" to whatever question you have.

------
jnordwick
I once had an interview at an HFT firm that had a very long quiz. It was
too long to finish completely and accurately in the time given (it would
have taken roughly twice the allotted 45 minutes: about 20 questions plus
20 follow-ups), but you were instructed to go as quickly as possible and
finish as much as you could.

Every other question, a follow-up to the preceding substantive question,
asked you to evaluate how sure you were of your previous answer.

So a question might be a little theory, or a short answer, or maybe ask you
to write code for a simple Bloom filter. The next question would ask how
sure you were that it had no bugs, or that it would work if all values were
4-character strings.
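
For reference, a minimal Bloom filter along the lines of what such a
question might ask for (the bit-array size, hash count, and hash scheme
here are arbitrary choices of mine, not the firm's):

    # k hash functions over an m-bit array. False positives are possible;
    # false negatives are not.
    import hashlib

    class BloomFilter:
        def __init__(self, m=1024, k=3):
            self.m, self.k = m, k
            self.bits = [False] * m

        def _indexes(self, value):
            # Derive k indexes by salting one cryptographic hash.
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{value}".encode()).hexdigest()
                yield int(h, 16) % self.m

        def add(self, value):
            for idx in self._indexes(value):
                self.bits[idx] = True

        def might_contain(self, value):
            return all(self.bits[idx] for idx in self._indexes(value))

    bf = BloomFilter()
    bf.add("abcd")
    print(bf.might_contain("abcd"))  # True
    print(bf.might_contain("wxyz"))  # almost certainly False

The follow-up about 4-character strings is a fair one: the answer turns on
the false-positive rate for the chosen m and k, not on the key length.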

Probably the second-toughest interview I ever went on, but I found the idea
fascinating: the quiz was meant to test your ability to work under pressure
and to evaluate the risks in your own code.

~~~
ghostbrainalpha
That sounds horrible and fantastic at the same time.

But what was the toughest interview if that wasn't it?

~~~
jnordwick
Tie between a trading firm and a bulge-bracket bank. The trading firm was
two days onsite: the first day was a lot of tough programming questions,
and the second day was teleconferencing with other offices, including
regulatory and trading questions. (They did fly me out first class and put
me up in an amazing king suite at the Trump with a full kitchen and living
room.)

The bank's questions were just brutally difficult, from low-level C++ and
Java internals to algorithmic ones where you are at best hoping for a good
approximation. And there was one very good series of questions on modeling
a game where the answer involved using a stochastic matrix or Monte Carlo
simulations.
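
As an illustration of that kind of modeling (a made-up gambler's-ruin game,
not the bank's actual question), the same answer falls out of a stochastic
matrix and out of Monte Carlo:

    # Gambler's ruin on states 0..4: start at 2, absorb at 0 (lose) or
    # 4 (win), fair coin each step.
    import numpy as np

    P = np.zeros((5, 5))
    P[0, 0] = P[4, 4] = 1.0              # absorbing states
    for s in (1, 2, 3):                  # transient states
        P[s, s - 1] = P[s, s + 1] = 0.5

    # Stochastic-matrix approach: iterate the chain to (near) absorption.
    dist = np.zeros(5)
    dist[2] = 1.0
    dist = dist @ np.linalg.matrix_power(P, 1000)
    print(f"P(win) via stochastic matrix: {dist[4]:.3f}")   # ~0.500

    # Monte Carlo approach: simulate many plays of the same game.
    rng = np.random.default_rng(0)
    wins = 0
    for _ in range(100_000):
        s = 2
        while s not in (0, 4):
            s += rng.choice((-1, 1))
        wins += (s == 4)
    print(f"P(win) via Monte Carlo: {wins / 100_000:.3f}")  # ~0.500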

~~~
chasedehan
I also interviewed at some bulge-bracket banks - hands down the most
difficult interviews I ever had (for quant roles). They kept hammering on
really complex math/stats/programming questions. Once you got one "right",
they would hit you with something more difficult. I have read that their
premise was to see how you handled pressure and questions you didn't know
the answer to. In one follow-up I was asked if I remembered a previous
question and whether I had looked it up.

------
maroonblazer
I really enjoyed this article. Only after discovering Critical Chain
project management did I appreciate the benefits of moving away from point
estimates on software projects - or just about any project where life,
limb, or financial ruin isn't at stake. I'm constantly asking people to
estimate the probability associated with whatever commitment they're
making. Not as in "Give me a number," but simply "Is it very likely,
somewhat likely, not likely at all?", etc.

I'm curious to know how they arrived at the 80% interval and whether that
degree of certainty is optimal. One could argue that 50% would be a better
choice if the goal is to improve one's ability to forecast.
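
One way to answer that empirically is to score calibration after the fact:
over many commitments, roughly 80% of outcomes should land inside a stated
80% interval. A sketch with invented estimates:

    # Each tuple is (low, high, actual), e.g. task durations in days.
    estimates = [
        (2, 5, 4), (1, 3, 6), (4, 8, 7), (3, 6, 5), (2, 4, 3),
        (5, 9, 10), (1, 2, 2), (3, 7, 4), (2, 6, 5), (4, 10, 8),
    ]
    hits = sum(low <= actual <= high for low, high, actual in estimates)
    print(f"coverage: {hits / len(estimates):.0%} (target: 80%)")

If coverage runs persistently above or below the target, the estimators
are under- or over-confident, whatever interval width you picked.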

~~~
eropple
In my first job out of college I was regularly asked not for point
estimates but for _time-based LOE estimates_, as a junior engineer. My
inability to honestly produce them, given the lack of information, was my
first point of big me-versus-management friction. Eventually we compromised
on confidence intervals, which TBH I think provided more and better
information, and which a few other developers eventually picked up as well.

I wasn't going to commit myself to a hard number if it wasn't life-or-death
stuff. That company wasn't worth it, and no company since has been.

------
scassidy
Ray Dalio's new book Principles deals a lot with this sort of idea. He is a
huge fan of creating formulas to assist with prediction, and of writing
down problems and your solutions to them so that you can go back and see
whether a solution was effective - and if not, what went wrong and what you
can change to get the desired result.

------
goldenkey
The number of bullshit Hail Mary percentage figures in this article was too
much to get through; I had to stop reading. If you simply think something
is more likely to succeed than not, then just say that. Don't say "60%
chance" when your actual precision is a toggle between "will fail", "might
be OK", "might be mildly successful", and "will probably succeed". When
qualitatives become quantitative out of pedantic microscopy, there is a
problem.

~~~
gwern
With practice, it's easy to make meaningful predictions down to that
granularity. For example, Tetlock finds that the exact percentages given by
the superforecasters in the Good Judgment Project carry real information:
merely rounding their forecasts to the nearest 5% or so makes their overall
performance significantly worse. (This isn't so mysterious if you think of
it in terms of frequencies: predicting a North Korean nuke test at 5%
rather than 10% may seem specious at first, but surely there is a big
subjective difference between 'every 20 years' and 'every 10 years'?)
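
A rough simulation of the rounding effect (the forecast distribution is
invented; only the mechanism matters): round well-calibrated probabilities
to the nearest 10% and the Brier score measurably worsens.

    import numpy as np

    rng = np.random.default_rng(0)
    p = rng.uniform(0, 1, size=200_000)             # calibrated forecasts
    outcomes = rng.uniform(0, 1, size=p.size) < p   # events occur w.p. p

    def brier(forecast, outcome):
        # Mean squared error of probability forecasts; lower is better.
        return np.mean((forecast - outcome) ** 2)

    print(f"exact forecasts: {brier(p, outcomes):.5f}")
    print(f"rounded to 10%:  {brier(np.round(p, 1), outcomes):.5f}")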

------
anotheryou
The company I work for (more or less B2B SaaS) does everything quick and
dirty. Good gut feeling has brought them far, but there is next to no data
apart from total sales and some basic Google Analytics. Stuff like A/B
testing is _far_ away.

Has anyone had a similar situation? How to handle that?

I'd love to make estimates and prove them right or wrong, but bundled
feature releases, marketing, and seasonality all end up in the same "total
sales this month" number. So far I haven't managed to get them to track
even churn/retention in a detailed way.
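
For what it's worth, even crude cohort retention can be computed from raw
activity records before any tooling exists; a minimal sketch with invented
data:

    # Month 0 is each customer's signup month; retention at month n is the
    # share of the cohort still active n months later.
    cohort = {  # customer -> set of month offsets in which they were active
        "a": {0, 1, 2},
        "b": {0},
        "c": {0, 1},
        "d": {0, 1, 2},
    }
    for n in range(3):
        retained = sum(n in months for months in cohort.values())
        print(f"month {n}: {retained / len(cohort):.0%} retained")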

------
dawhizkid
Twitch's Glassdoor reviews have noticeably been tanking since last year.

------
pmiri
Great article. Agile has some similar tactics that are worth preserving,
like the "cone of uncertainty".

