
Programmers Need to Learn Statistics or I Will Kill Them All (2015) - gedrap
https://zedshaw.com/archive/programmers-need-to-learn-statistics-or-i-will-kill-them-all/
======
antiokus314
As a programmer I'll admit to you right now - I don't know shit about
statistics. I took the course during my degree - it was like a shaman was
cursing me. Never gonna catch me talking shit about stats. This post just
validated everything I felt about it.

~~~
guitarbill
(I think?) I agree, but this post didn't make me want to learn more about
stats.

This might be because for me it feels like 20% knowledge will work for 80% of
situations. Maybe it's a fundamental disconnect: am I going to mess around with
R when Python works and is more general purpose? (To be fair, there's loads
of brilliant Python stats code out there.)

At least the standard deviation tip makes sense to me. Yay?

Edit: Man, tried to read up on some stuff and stats is hard. I'm definitely
going to go easy on the performance guys in future. Any more tips for getting
better at this kind of analysis?

------
mamurphy
This could use a (2015) tag.

The article is a bit rambling, but it achieves its objective of convincing me
that (1) the hand-waving that sometimes passes for statistical analysis is not
good enough and (2) it's realistic to do better and expect better.

~~~
gedrap
> This could use a (2015) tag.

Whoops. Edited.

> The article is a bit rambling

Sort of, yeah. The tone is certainly distracting some readers from the
message.

But I believe the message is very important, especially as data-* is becoming
more and more popular: more and more tools measuring performance, platforms
claiming amazing performance, the spread of analytical tools, data science going
mainstream, etc.

Often enough, fundamentally flawed analysis brings more harm than benefit. It
doesn't mean that you shouldn't try anything unless you are an expert in stats.
The issue is often overconfidence and ignorance.

------
mixedCase
> Oh, and you wonder why I say, “he”? I never have this problem with female
> programmers.

Programmers need to stop virtue signaling with gender in absolutely
everything or I will kill them all.

------
thethirdone
I think Zed Shaw should get off of his high horse. He knows a little about
statistics, and thinks he knows plenty.

A distribution that cannot go negative (as in a real-world system) cannot be a
normal distribution, because normal distributions extend infinitely in both
directions. If your standard deviation is small enough relative to the mean,
though, a normal distribution can be a fitting model.

> Almost all of the queries performed great, except one query that had sub-
> second response on average, but a 60 second standard deviation!

If the average response is < 1 sec then a 60 second standard deviation means
that many responses are being made before the query is sent.

Moving from just using averages and thinking about properly measuring is a
good start, but assuming that all distributions are normal is nearly as bad.
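
To put rough numbers on the normal-model point, here is a minimal sketch. The
article only says "sub-second", so the 0.6 s mean is an assumed value, and the
second case (mean 10 s, standard deviation 1 s) is made up purely for contrast.
It computes how much of a normal distribution would fall below zero:

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ Normal(mean, sd).

    Uses erfc rather than erf so that tiny tail probabilities
    are not rounded away to zero.
    """
    z = (x - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Assumed numbers: sub-second mean (0.6 s here), 60 s standard deviation.
p_wide = normal_cdf(0.0, mean=0.6, sd=60.0)

# Hypothetical contrast: standard deviation small relative to the mean.
p_narrow = normal_cdf(0.0, mean=10.0, sd=1.0)

print(f"mean 0.6 s, sd 60 s: P(response < 0) = {p_wide:.3f}")
print(f"mean 10 s,  sd 1 s:  P(response < 0) = {p_narrow:.2e}")
```

With the assumed 0.6 s mean, just under half of the probability mass sits below
zero, which is only a problem if you insist on the normal model. In the narrow
case the negative tail is negligible, so the normal approximation is harmless
there.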

~~~
timmaxw
> If the average response is < 1 sec then a 60 second standard deviation means
> that many responses are being made before the query is sent.

Imagine if 99.99% of queries take 1 millisecond, and the remaining 0.01% of
queries take 6000 seconds. Then the mean is 0.6010 seconds and the standard
deviation is 59.997 seconds.

(Muphry's law strikes again...)
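
The arithmetic is easy to check. A minimal sketch, treating the two cases as a
two-point distribution (the variable names are just for illustration):

```python
import math

# Two-point distribution from the example above:
# 99.99% of queries take 1 ms, 0.01% take 6000 s.
fast_share, fast_time = 0.9999, 0.001   # share of queries, seconds
slow_share, slow_time = 0.0001, 6000.0  # share of queries, seconds

# Mean: E[X] = sum of share * time.
mean = fast_share * fast_time + slow_share * slow_time

# Variance: E[X^2] - E[X]^2, then the square root for the standard deviation.
second_moment = fast_share * fast_time**2 + slow_share * slow_time**2
std_dev = math.sqrt(second_moment - mean**2)

print(f"mean    = {mean:.4f} s")
print(f"std dev = {std_dev:.3f} s")
```

No response time is negative here; the rare 6000-second outliers alone push
the standard deviation to about 60 seconds while the mean stays under a second.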

------
mzw_mzw
> Oh, and you wonder why I say, “he”? I never have this problem with female
> programmers.

gedrap, please don't post sexist material on HN.

