
Programmers Need To Learn Statistics Or I Will Kill Them All (2005) - vaksel
http://www.zedshaw.com/essays/programmer_stats.html
======
tokenadult
The article does seem familiar, and I think I've even seen it on HN before.

<http://news.ycombinator.com/item?id=48006>

Yep. (There seems to be a slight change in the base URL for the submitted
article, but this has been discussed here before.)

Still this is well worth discussing again. It's amazing how much most college-
educated people think they know about statistics that they really don't. Two
of my favorite quick overview articles about statistics, both by Ph.D.
professors of statistics, are "Advice to Mathematics Teachers on Evaluating
Introductory Statistics Textbooks"

<http://statland.org/MAAFIXED.PDF>

(the link is dead because of site maintenance now, but should be fixed soon)

and

"The Introductory Statistics Course: A Ptolemaic Curriculum?"

[http://repositories.cdlib.org/cgi/viewcontent.cgi?article=10...](http://repositories.cdlib.org/cgi/viewcontent.cgi?article=1002&context=uclastat/cts/tise)

Both are thought-provoking articles about what usually isn't taught to
undergraduates about statistics.

~~~
stcredzero
Reminds me of the film clip Alan Kay showed at one of his talks. It was taken
at a Harvard graduation, and it shows many, many Harvard graduates, students,
and faculty saying that the Earth was warmer in summer because its orbit takes
it closer to the sun.

[http://www.sigchi.org/chi98/chikids/vol2/headlines/head2.htm...](http://www.sigchi.org/chi98/chikids/vol2/headlines/head2.html)

~~~
travisjeffery
It's caused by tilt/angle of Earth, correct?

~~~
tokenadult
_It's caused by tilt/angle of Earth, correct?_

Yes. My second-grade teacher made sure her class knew this. A year or two
later I learned that the earth is closer to the sun (at perihelion) when the
SOUTHERN Hemisphere is having its summer, and the Northern Hemisphere is
having its winter.

~~~
nazgulnarsil
which is why the northern hemisphere has a slightly milder climate.

~~~
DannoHung
Huh, is this actually true? I had been under the impression that the orbital
difference was insignificant from a climate standpoint.

~~~
nazgulnarsil
there are actually several things at play: eccentricity, precession, and
obliquity. the net result is that the north gets milder winters and cooler
summers.

~~~
tokenadult
_the northern hemisphere has significantly larger temperature variation than
the southern_

 _the net result is that the north gets milder winters and cooler summers_

One of these statements appears to disagree with the other. If I remember
correctly what I've read, the proportion of land rather than ocean in each
hemisphere plays a major role in climate, as mentioned by the reply saying
that the Northern Hemisphere has more variation in temperature.

------
stcredzero
_Oh, and you wonder why I say, “he”? I never have this problem with female
programmers...I think women are better programmers because they have less ego
and are typically more interested in the gear rather than the pissing
contest._

When I was hanging out in Homer, Alaska, squatting in the tent city on the
beach, I heard that on the halibut fishing tours, the women would catch more
and bigger fish. (And halibut are _big_ fish. A 25-pound halibut is puny. You
can catch 300-pound halibut, and it's not a super-rare event.) This is because
the husbands would bring their fishing egos with them from the lower 48 states
and not listen to the deckhands. But the wives, with no preconceived ideas,
would listen carefully and do things the right way to catch halibut. (As
opposed to trout.)

~~~
biohacker42
I wonder if this is a "feature" of women or of all minorities? For example, do
male nurses show the same egolessness as female programmers?

~~~
delano
I might be misunderstanding your comment, but women are not a minority.

[http://www16.wolframalpha.com/input/?i=male+population+in+no...](http://www16.wolframalpha.com/input/?i=male+population+in+north+america+vs+female+population+in+north+america)

~~~
philwelch
It's bad usage, but in sociological terms, women _are_ a minority because they
aren't the dominant subgroup.

~~~
tjic
> in sociological terms, women are a minority because they aren't the dominant
> subgroup.

This is a useful sentence to read, because it helps remind me that
"sociological terms" have little or nothing to do with honest intellectual
framing of topics.

~~~
jfoutz
I'm going with the idea that you genuinely misunderstood, rather than the idea
that you are trolling.

Consider a monarchy. The king is the majority. It's hard to imagine a world
where everyone you see doesn't have the same power and opportunity that we
enjoy. But you can see that cultures have existed where one guy was more
important than everyone else combined.

You and I know a pointer isn't a dog. However, people are often
irresponsible, and use jargon in common conversation. Take a moment, and
consider what is "honest intellectual framing" and what is a "sociological
term". I think that while you may disagree, the assertion that women have less
power (aren't the dominant subgroup) is a fairly honest analysis.

I haven't commented here in months, and don't plan on replying, but like xkcd
says... someone on the internet is wrong. Take a moment to consider the
possibility your parent poster isn't a fucking moron.

------
WilliamLP
> I think women are better programmers because they have less ego and are
> typically more interested in the gear rather than the pissing contest.

People need to stop writing shit like this. If it's not okay to say that men
are better programmers simply because of their gender (it's not), then it's
not okay to say it the other way around either.

~~~
tjic
> If it's not okay to say that men are better programmers simply because of
> their gender (it's not), then it's not okay to say it the other way around
> either.

God forbid that someone make an argument and try to support it with data, when
one person has already decided for all of us what the correct conclusion is.

WT* ever happened to debate, intellectual discourse, and the marketplace of
ideas?

This is one of the things that I find most disgusting about political
correctness: that it tries to just wall off huge swaths of POTENTIAL
CONCLUSIONS based on an argument that boils down to a misconstrued sense of
manners (at best) or political preferences (at worst).

Want to say that ethanol is a stupid idea? THE DEBATE IS CLOSED - NO SERIOUS
SCIENTIST BELIEVES THAT GLOBAL WARMING IS ANYTHING OTHER THAN A THREAT OF
EXTINCTION.

Want to say that women and men have (a) different average heights; (b)
different standard deviations in intelligence; (c) different hormone levels;
(d) massively different thicknesses in their corpus callosum, and therefore
one or the other _might_, on average, make better
programmers/accountants/engineers? THE DEBATE IS CLOSED. IT IS UNACCEPTABLE TO
SPECULATE ON THIS TOPIC.

I call bullshit on that.

Intellectually honest people respond to facts and arguments with OTHER facts
and arguments.

Intellectually dishonest people try to shut down debates using social control.

WilliamLP writes

"People need to stop writing shit like this"

That's the phrase of a bully, and/or a censor.

~~~
sofal
I think you have some interesting points. I'm going to play Devil's advocate:

An assumption that you make there is that all forms of social control used for
discouraging debates/speculation on certain topics are inherently bad or stem
from dishonesty. It could be that knowing in advance the emotional, legal,
political, or otherwise time-wasting repercussions that a certain type of
debate causes justifies avoiding the discussion altogether. None of us go
after truth in a completely unbiased manner with no agendas whatsoever, though
we may fool ourselves into thinking so.

There doesn't seem to be any disagreement about the social rule of not
discussing politics or religion in this forum. We don't think of this as a
moral rule, but then what are morals?

~~~
tjic
> It could be that knowing in advance the emotional, legal, political, or
> otherwise time-wasting repercussions that a certain type of debate causes
> justifies avoiding the discussion altogether

So persons A, B, and C are interested in having a debate.

Person X "knows" that persons A, B, and C (and some bystanders) would be better
off if they don't even speculate or speak on the topic.

...so person X responds "People need to stop writing shit like this" ?

My objections:

* Why should I accept person X's assertion that he knows - better than I do - what will make me happy, or what will waste my time ?

* If person X truly thinks that, he should make a compelling case, put it up on a website, and respond not with "People need to stop writing shit like this", but with "I think that this debate is fruitless and time-wasting - check out this blog post for why".

* Even if person X is right, for a large percentage of people, trying to shut down speech "for someone's own good" is un-American and illiberal. If it's done under the color of law, it's called "prior restraint".

* Even if person X is right given the conditions on the ground, conditions change, and over time his flawless heuristic for when to force people "to stop writing shit like this" will become more and more disconnected from reality. What's needed is a constant feedback loop that keeps in touch with reality... and "ongoing debate" is the name of that feedback loop.

~~~
rml
Agreed. As Sophocles famously wrote: "Knowledge must come through action; you
can have no test which is not fanciful, save by trial."

------
Freaky
Speaking of statistics, FreeBSD comes with a tool called "ministat", which
accepts a bunch of input files filled with numbers and tells you how
statistically significant the difference is between them. It's frequently used
to demonstrate performance improvements in kernel code:

[http://www.freebsd.org/cgi/man.cgi?query=ministat&apropo...](http://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+8-current&format=html)
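For anyone without FreeBSD handy, here is a minimal Python sketch of a pooled-variance two-sample Student's t statistic, which is the kind of comparison ministat reports. It is not ministat's source; the function name `student_t` and the timing numbers are hypothetical. Turning t into a verdict still needs a t-table: for 8 degrees of freedom, the two-sided 95% critical value is about 2.306.

```python
import math
import statistics


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes both samples come from
    roughly normal distributions with similar variances.
    """
    na, nb = len(a), len(b)
    # statistics.variance uses the n - 1 (sample) denominator.
    va, vb = statistics.variance(a), statistics.variance(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(b) - statistics.mean(a)) / std_err
    return t, df


# Hypothetical run times (seconds) before and after a kernel change.
before = [10.1, 10.3, 9.9, 10.2, 10.0]
after = [9.6, 9.8, 9.5, 9.7, 9.9]
t, df = student_t(before, after)
print(f"t = {t:.2f}, df = {df}")  # |t| is well above 2.306
```

The pooled form is only appropriate when the two variances are similar; with very different spreads, Welch's variant is the usual alternative.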

------
ironkeith
You know, I've read a bunch of Zed's articles, and I always end up thinking
that he's the source of most of his problems. There was a talk I saw him give
(can't find the link) to a bunch of college kids where he basically said
"phone in your job, do the stuff you love on your own time, everyone else is
retarded and they'll never understand you."

Great attitude you have there. I know guys like that; guys who are extremely
egotistical, are always right, and know everything about everything. They're
team-killing, energy-sucking wastes.

He may be able to hack like a dream, and "thanks for Mongrel" and all, but I
wouldn't even want to be in the same room as him. I think from now on I'll do
my best to ignore Zed's perspectives on life; they're more than a little
skewed.

~~~
batasrki
The talk is from CUSEC, probably this one: <http://vimeo.com/2723800>

As for his "attitude", I've actually met him and talked to him face-to-face.
His internet persona is a put-on and he only talks like that to piss people
off. In actuality, he's a really nice guy who is super-smart and who does not
take shit from people. It's actually too bad that there aren't more of him in
our industry. Scope creep, memory leaks, etc. would be a thing of the past.

~~~
PonyGumbo
Thanks for posting this. I went from despising Shaw (based on his writings) to
kind of liking him.

~~~
mahmud
You like someone for being a double-faced troll? IMO, I like him less now
because he actually doesn't believe in what he says online.

------
dschobel
Serious question: who here who has at least a BS in CS didn't have a mandatory
stats class where you learned all about picking sample sizes to give you an
accuracy you're happy with or that an average without a standard deviation is
nigh-meaningless?
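A quick illustration of both points, as a Python sketch with made-up response times: the two samples share a mean of 100 but behave completely differently. The `sample_size` helper is hypothetical and uses the textbook normal-approximation formula n = (z * sigma / E)^2, assuming a known standard deviation sigma and z = 1.96 for 95% confidence.

```python
import math
import statistics

# Hypothetical response times in ms: same mean, very different behavior.
steady = [100, 101, 99, 100, 100]
spiky = [20, 20, 20, 20, 420]

for name, sample in (("steady", steady), ("spiky", spiky)):
    print(name, statistics.mean(sample), round(statistics.stdev(sample), 1))
# Both means are 100; the standard deviations (about 0.7 vs 178.9) tell them apart.


def sample_size(sigma, margin, z=1.96):
    """Samples needed so the mean's margin of error is about `margin`.

    Normal approximation with a known standard deviation `sigma`;
    z = 1.96 corresponds to 95% confidence.
    """
    return math.ceil((z * sigma / margin) ** 2)


print(sample_size(sigma=15, margin=2))  # 217
```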

It's just a fact of our profession: there is a significant percentage of
people who just slap together APIs and have zero understanding of the maths
behind it.

I don't see why it takes Zed 1000 words to say it or why he has to get
sanctimonious about it.

~~~
stcredzero
There's a big percentage of programmers who don't understand O(n^2). On
multiple occasions, I've been taken for a guru because of this one little tidbit.
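The tidbit in question, sketched in Python with hypothetical helper names: checking a list for duplicates by comparing every pair does n * (n - 1) / 2 comparisons, so 100x more data means roughly 10,000x more work, while a set-based pass stays linear.

```python
def has_duplicate_quadratic(items):
    # Compares every pair: n * (n - 1) / 2 comparisons in the worst case, O(n^2).
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    # One pass with a set: O(n) expected time, assuming hashable items.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def worst_case_pairs(n):
    # Comparisons the quadratic version makes when there is no duplicate.
    return n * (n - 1) // 2


print(worst_case_pairs(1_000), worst_case_pairs(100_000))  # 499500 4999950000
```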

What if the medical profession has as much egregious and widespread ignorance
of the basics as programming? Would you be in favor of certification?

~~~
duncanj
In medical professions, you go through a period of training (clinicals,
internship, residency) where people with tons of experience point out that you
never learned anything. This humbling experience is not really available in
the CS world.

~~~
stcredzero
Particularly disturbing, since the only good way I know of to learn
programming and system architecture is through mentoring. Basically, our
mentoring system is haphazard. There's even a lot of anti-mentoring happening
out there.

------
nkurz
I don't like this article. Yes, he's right: an intuitive knowledge of the
relationship between averages and standard deviations is essential. But to
presume that the standard tools of statistics should be applied to every
problem is to miss the point.

On a superficial level, if you are doing overnight processing of log files,
then you probably care more about throughput than latency. In this case,
averages are probably a fine metric. On a slightly deeper level, standard
deviation is only a useful measure if the distribution is known, and in a lot
of real-world cases it is not. The right question isn't whether 100 or 1000
tests on the same data provide sufficient statistical power, but whether the
range of inputs is sufficient to trigger worst-case performance.
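To make the throughput-versus-latency distinction concrete, a small Python sketch with invented latencies: a batch job only cares about the total, but the mean matches no actual request, and only a tail percentile or the maximum exposes the pathological cases. The `percentile` helper is a hypothetical nearest-rank implementation, not a library function.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile, for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 98 fast requests and 2 pathological ones.
latencies = [10] * 98 + [1000, 1200]

total = sum(latencies)             # what an overnight batch job cares about
mean = statistics.mean(latencies)  # 31.8: matches no actual request
p99 = percentile(latencies, 99)    # 1000
worst = max(latencies)             # 1200
print(total, mean, p99, worst)
```

Note that no amount of re-running these same inputs would reveal a worst case that the inputs never trigger, which is the point about input range.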

Now, I presume that Zed knows these things and applies them appropriately, but
the article strikes me as more snide than helpful. Perhaps as others say he's
a great guy in person, but I prefer my stats with less attitude and more
insight. Here, for example: <http://yudkowsky.net/rational/bayes>

[edit: changed my sloppy language from 'has no meaning unless the
distribution is normal' to 'is only a useful measure if the distribution is
known']

~~~
tokenadult
_On a slightly deeper level, standard deviation only has meaning if the
distribution is presumed to be normal_

Are you completely sure about that?

I suppose many readers of this thread are more knowledgeable about statistics
than I am. I would appreciate hearing from the knowledgeable readers whether
or not variance in the observed values makes a difference in the cases
discussed in the submitted article.

~~~
npk
The statement "only has meaning if the distribution is presumed to be normal"
is wrong. The SD is a summary of the spread of a distribution. In fact, for
most centrally concentrated distributions (including a uniform one) +/- 1
sigma corresponds to about 60% of the mass of the distribution. This is an
amazingly useful thing to know.
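The "about 60%" figure is easy to check. In this Python sketch, an evenly spaced grid stands in for a uniform distribution on [0, 1] (an approximation chosen for determinism), and `math.erf` gives the exact Gaussian value for comparison: roughly 58% for the uniform and 68% for the normal. The helper name is hypothetical.

```python
import math
import statistics


def fraction_within_one_sd(values):
    """Share of values within one (population) standard deviation of the mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(1 for v in values if abs(v - mu) <= sd) / len(values)


# Evenly spaced points approximate a uniform distribution on [0, 1].
grid = [(i + 0.5) / 10_000 for i in range(10_000)]
uniform_frac = fraction_within_one_sd(grid)  # close to 1/sqrt(3), about 0.577
normal_frac = math.erf(1 / math.sqrt(2))     # about 0.683 for a Gaussian
print(round(uniform_frac, 3), round(normal_frac, 3))
```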

As the above trivia factoid points out, the standard deviation is an important
summary statistic. More interestingly, by using mean, variance (or SD), skew,
and kurtosis, you can describe almost any centrally concentrated distribution,
even distributions with heavy tails.
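For reference, the four summary statistics mentioned above can be computed directly. This Python sketch uses the population (biased) moment formulas and reports excess kurtosis, which is 0 for a normal distribution; `describe` is a hypothetical name, and the two inputs are toy data.

```python
import statistics


def describe(values):
    """Mean, variance, skewness, and excess kurtosis (population formulas)."""
    n = len(values)
    mu = statistics.fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mu, m2, skew, excess_kurtosis


print(describe([1, 2, 3, 4, 5]))   # symmetric: skew 0, negative excess kurtosis
print(describe([1, 1, 1, 1, 10]))  # one big outlier: positive skew
```

Sample-adjusted versions of skew and kurtosis exist and differ slightly for small n; for heavy-tailed data, the higher moments can also be very unstable from sample to sample.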

I think what the OP meant is that most 3+ sigma results are not truly 3+
sigma, because most distributions in this world are not Gaussian, but instead
have large wings. SD is most useful when you know what the underlying
distribution is. Currently it's more in fashion to communicate spread using
confidence intervals because they presume less about the underlying
distribution.
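The thread doesn't name a specific method, but one common way to get a confidence interval without assuming a particular distribution is the percentile bootstrap (a swapped-in technique, not something anyone above proposed). A minimal Python sketch, with a hypothetical `bootstrap_ci` helper and invented timings; it still assumes the sample is representative, which is exactly the caveat raised in the replies.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.median, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for `stat`.

    Resamples the data with replacement, recomputes the statistic each time,
    and reads the interval off the sorted estimates. No normality assumption,
    but it does assume the sample is representative of the population.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical skewed timings (ms).
timings = [12, 11, 13, 12, 14, 11, 12, 95, 13, 12, 11, 250, 12, 13, 12]
print(bootstrap_ci(timings))                        # interval for the median
print(bootstrap_ci(timings, stat=statistics.fmean)) # interval for the mean
```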

~~~
nkurz
You're right. I was being sloppy.

I should have said something more like "the standard deviation calculated from
a sample set is only generally applicable in so far as one is willing to make
assumptions that the sample set is representative of the distribution as a
whole". The default assumption in traditional statistics (such as quoting
p-values) is that the distribution is normal, which in real-world situations is
often not the case.

Your restatement is right on, although I'd go farther and say that standard
deviations (and confidence intervals) are only useful metrics with regard to
the particular assumptions one is willing to make about the underlying
distribution. Yes, you can calculate these measures, but they won't help you
if your assumptions are irreparably flawed.

------
bena
I think I've read this before when Zed was "so fucking awesome". Is his whole
"dropping the persona" gig just a way to get more mileage out of old articles?

~~~
edmccaffrey
His favorite blog posts are being reposted as essays.

~~~
petercooper
He's the new PG, with most of the wisdom replaced by machismo.

~~~
e4m
Ma Kiz e Moe

------
msluyter
I hadn't seen this, so I was glad it was resubmitted.

I think he has a fair point. Here on HN I see a lot of armchair sociologists
critique the various articles in the social sciences that get posted, but it's
rather unclear to me whether these are well grounded in an actual
understanding of the issues involved, or simply habitual incantations of rules
of thumb such as "correlation doesn't imply causation."

~~~
nazgulnarsil
um...rules of thumb are a bit different from identifying logical fallacies.

------
jfarmer
That's a lot of text to say not much more than "Standard deviation can be as
important as the mean; be careful about confounding variables; and if you're
an engineer, spend more time learning statistics."

Oh, also, "You're all assholes and I rock."

Truly the Carl Sagan of software.

------
grandalf
I think a better way to help would have been to spend the time writing a blog
post introducing some core stats concepts and showing how to use R to do
useful things...

------
callmeed
Can anyone recommend a decent (read "not boring") book on statistics?

Something along the lines of "Naked Economics" for the stats realm ...

~~~
sp332
There's the Cartoon Guide to Statistics
([http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/...](http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025)),
which, despite its name, is a pretty solid and comprehensive book on the
basics. And definitely not boring!

------
kl4m
Why is he mixing computer science with programming? Computer science is just a
branch of mathematics. It doesn't have much to do with what he talks about
afterwards.

~~~
billswift
Computer science is to programming as physics is to engineering. The way SOME
programmers constantly put down computer science, especially academic computer
science, is a sign of the anti-professionalism in the field. (I decided I
better emphasize some programmers since they actually seem to be a very vocal
minority.)

------
voberoi
I think this article is _great_ once you get past the rant-iness of it. He
makes a bunch of valid points and it's true: a lot of people actually don't take
these pretty crucial details into account when they're dealing with
statistics.

It's kind of a shame that his message is diluted by its delivery. Case in
point: about half of the comments here aren't even about statistics!

------
chanux
I always hated stats lectures :( . But the book "Statistics for the Utterly
Confused" helped me in exams. Still, I'm not in love with stats.

Would he kill me?

~~~
Tichy
I don't like your chances.

------
randome
pat on the back for you -- you understand intro college statistics... maybe
you just work with shitty programmers that don't understand statistics????? i
dunno but most of my colleagues know stats pretty damn well. but maybe that's
just because i'm in school

------
nathanwdavis
I love it when Zed Shaw gets pissed. It's fun!!

------
enki
i think we'll have a huge shortage of programmers shortly

------
trezor
I can see some people here claiming this is old (it is) and that it's a repost
(it is), but I'll still vote this up, because especially now with people
yelling "scaling! scaling!" all over the place, I can't imagine a more fitting
time for developers to read this.

~~~
Tamerlin
Scaling? People keep using that word. I don't think it means what they think
it means.

------
ThomPete
99% of all statistics are made up

------
jhancock
I've read this before. Zed does not provide any details on learning or
understanding statistics in this rant.

~~~
billswift
Statistics and probability is a major course of study - far more than could be
usefully taught in a blog. What he tries to do is to motivate people to learn
it - which is the first major step to actually learning it.

~~~
jhancock
ahh..the old "motivation by calling people stupid" trick ;)

------
edu
So after Ruby and Ruby on Rails, Zed has learnt some R and statistics. Having
read almost all of the article, I'd say he basically talks about average,
median, and standard deviation.

He should learn a little bit of complexity and computability, by the way: best
case, worst case, Big O notation, etc.

~~~
jerf
Because when edu writes a blog post, he includes his _entire life history_,
and, for that matter, the _entire history of the universe_ in it.

I know you've already been smacked down but this is a big pet peeve of mine
and worth pointing out. You can always complain that something should have had
more information in it, and since that is always true no matter what, the
complaint is information free. (A value that comes from a universe of one
value contributes zero bits of information.)
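The parenthetical is Shannon entropy. A tiny Python sketch (the `entropy_bits` name is hypothetical) computes the entropy of the observed outcomes: if the same complaint is made every time, its entropy is 0 bits, while a fair coin flip carries exactly 1 bit.

```python
import math
from collections import Counter


def entropy_bits(observations):
    """Shannon entropy, in bits, of the empirical distribution of `observations`."""
    counts = Counter(observations)
    n = len(observations)
    # Sum of p * log2(1/p); an outcome with p = 1 contributes log2(1) = 0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(entropy_bits(["needs more info"] * 50))  # 0.0: the complaint is always made
print(entropy_bits(["heads", "tails"]))        # 1.0: a fair coin carries one bit
```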

~~~
SapphireSun
I'm not so sure about that. A request for more information can be incredibly
important (for instance: Tell me who committed the murder! also see: stubs on
Wikipedia). Essentially, there is a point of diminishing returns, and after a
while, possibly negative returns as the work becomes too hard to assimilate.

However, in this case, I tend to think that Zed included just enough
information to get someone who's clueless started - he even included
references at the end so that if you DO want more, you can easily get it.

