Yep. (There seems to be a slight change in the base URL for the submitted article, but this has been discussed here before.)
Still this is well worth discussing again. It's amazing how much most college-educated people think they know about statistics that they really don't. Two of my favorite quick overview articles about statistics, both by Ph.D. professors of statistics, are "Advice to Mathematics Teachers on Evaluating Introductory Statistics Textbooks"
(the link is dead because of site maintenance now, but should be fixed soon)
"The Introductory Statistics Course: A
Both are thought-provoking articles about what usually isn't taught to undergraduates about statistics.
The importance of explicitly confronting prior knowledge and integrating it into learning is one of the basic principles of learning that came out of the "How People Learn" studies (http://www.nap.edu/openbook.php?record_id=6160).
Yes. My second-grade teacher made sure her class knew this. A year or two later I learned that the earth is closer to the sun (at perihelion) when the SOUTHERN Hemisphere is having its summer, and the Northern Hemisphere is having its winter.
> the net result is that the north gets milder winters and cooler summers
One of these statements appears to disagree with the other. If I remember correctly what I've read, the proportion of land rather than ocean in each hemisphere plays a major role in climate, as mentioned by the reply saying that the Northern Hemisphere has more variation in temperature.
When I was hanging out in Homer Alaska, squatting in the tent city on the beach, I heard that on the halibut fishing tours, the women would catch more and bigger fish. (And halibut are big fish. A 25 pound halibut is puny. You can catch 300 pound halibut, and it's not a super-rare event.) This is because the husbands would bring their fishing egos with them from the lower 48 states and not listen to the deckhands. But the wives, with no preconceived ideas, would listen carefully and do things the right way to catch halibut. (As opposed to trout.)
For many, Irish music is tied up in the context of some sort of romanticized Irish Nationalism, and is only appreciated as some sort of ethnic tchotchke. Their imagination fails at the notion that I'm enjoying it in a purely musical context. I associate those people with the ones in the audience who can't clap on the beat.
Liz Carroll has that story about sitting in a session in Ireland, playing tunes for hours, when someone says "Let's play some of your music!" and launches into Turkey in the Straw....
This is a useful sentence to read, because it helps remind me that "sociological terms" have little or nothing to do with honest intellectual framing of topics.
Consider a monarchy. The king is the majority. It's hard to imagine a world where everyone you see doesn't have the same power and opportunity that we enjoy. But you can see that cultures have existed where one guy was more important than everyone else combined.
You and I know a pointer isn't a dog. However, people are often irresponsible and use jargon in common conversation. Take a moment and consider what is "honest intellectual framing" and what is a "sociological term". I think that while you may disagree, the assertion that women have less power (aren't the dominant subgroup) is a fairly honest analysis.
I haven't commented here in months, and don't plan on replying, but like xkcd says... someone on the internet is wrong. Take a moment to consider the possibility your parent poster isn't a fucking moron.
You're right. "Minority" is a really bad word for what biohacker42 was trying to say, because many "minorities" (in terms of social status) are not numeric minorities. (Blacks in apartheid-era South Africa. Women throughout history.) As jfoutz mentioned, we both know a pointer is not a dog.
It's convenient that you misquoted me, too, because doing so leaves out the part where I already conceded that it's bad usage. Which leaves me at a loss as to what you're trying to add to the discussion--did you say something more than what I've already implied (and exhaustively reiterated here)?
People need to stop writing shit like this. If it's not okay to say that men are better programmers simply because of their gender (it's not), then it's not okay to say it the other way around either.
God forbid that someone make an argument and try to support it with data, when one person has already decided for all of us what the correct conclusion is.
WT* ever happened to debate, intellectual discourse, and the marketplace of ideas?
This is one of the things that I find most disgusting about political correctness: that it tries to just wall off huge swaths of POTENTIAL CONCLUSIONS based on an argument that boils down to a misconstrued sense of manners (at best) or political preferences (at worst).
Want to say that ethanol is a stupid idea? THE DEBATE IS CLOSED - NO SERIOUS SCIENTIST BELIEVES THAT GLOBAL WARMING IS ANYTHING OTHER THAN A THREAT OF EXTINCTION.
Want to say that women and men have (a) different average heights; (b) different standard deviations in intelligence; (c) different hormone levels; (d) massively different thicknesses in their corpus callosums, and therefore one or the other might, on average, make better programmers/accountants/engineers? THE DEBATE IS CLOSED. IT IS UNACCEPTABLE TO SPECULATE ON THIS TOPIC.
I call bullshit on that.
Intellectually honest people respond to facts and arguments with OTHER facts and arguments.
Intellectually dishonest people try to shut down debates using social control.
"People need to stop writing shit like this"
That's the phrase of a bully, and/or a censor.
An assumption that you make there is that all forms of social control used for discouraging debates/speculation on certain topics are inherently bad or stem from dishonesty. It could be that knowing in advance the emotional, legal, political, or otherwise time-wasting repercussions that a certain type of debate causes justifies avoiding the discussion altogether. None of us go after truth in a completely unbiased manner with no agendas whatsoever, though we may fool ourselves in thinking so.
There doesn't seem to be any disagreement about the social rule of not discussing politics or religion in this forum. We don't think of this as a moral rule, but then what are morals?
So person A, and B and C are interested in having a debate.
Person X "knows" that persons A,B,and C (and some bystanders) would be better off if they don't even speculate or speak on the topic.
...so person X responds "People need to stop writing shit like this" ?
* Why should I accept person X's assertion that he knows - better than I do - what will make me happy, or what will waste my time ?
* If person X truly thinks that, he should make a compelling case, put it up on a website, and respond not with "People need to stop writing shit like this", but with "I think that this debate is fruitless and time-wasting - check out this blog post for why".
* Even if person X is right, for a large percent of people, trying to shut down speech "for someone's own good" is un-American, and illiberal. If it's done under the color of law it's called "prior restraint".
* Even if person X is right given the conditions on the ground, conditions change, and over time his flawless heuristic for when to force people "to stop writing shit like this" will become more and more disconnected from reality. What's needed is a constant feedback loop that keeps in touch with reality... and "ongoing debate" is the name of that feedback loop.
It was not presented as a nuanced statistical statement. (Ironically, in an article about how statistical statements need to be more nuanced!) It was presented as a boorish, stupid, and unsupported assertion completely irrelevant to the focus of the article.
There are a few "debates" which should be shut down using social control. Among them are the ones that attempt to box people in and remove their individuality by asserting that some subset of their humanity is the most important thing about them in some context.
You don't need to cite a blog post to do that. There also are many debates which for all intents and purposes are closed, and for which social control or even condescension are appropriate. (NO SERIOUS SCIENTIST BELIEVES THAT THE EARTH IS 6000 YEARS OLD, OR THAT MERCURY IN VACCINES COULD BE A CAUSE OF AUTISM.)
That's well and fine, but the author stated an opinion based on anecdotal evidence. On its own, I think it's a really stupid thing to say, but it fits fine given the tone of his rant.
While I do agree with your central point, I don't think there's any value in exploring whether or not women might make better programmers on average. If I were to hire someone, I'd base it on their past experience and how well the interview(s) went, not their race or sex. There's no harm in speculating and even doing the research if it interests one that much though, just like there's no harm in researching whether or not painting red stripes on your car will make it go faster.
Absolutely. I agree 100%.
OTOH, discussing things in aggregates also makes sense.
If 4 out of 1,000 women would make excellent engineers, and 1 out of 1,000 men would make excellent engineers, and yet we see that the distribution of actual engineers is something other than 4:1, we should investigate.
If, OTOH, the numbers are 1 out of 1,000 and 10 out of 1,000 respectively, and the ratio of actual engineers is 1:10, then we might choose to spend less time and energy on the investigation.
We might want to investigate, but "should" is a strong word. There are all kinds of reasons why this could happen, and some of them might not need fixing.
It could be that the total number of people who would make excellent engineers is too low for the number needed, and that men (in your example) are more likely to make adequate or good engineers than women, even though women are more likely to make excellent ones.
It could be that women in general, in spite of being four times as likely to make excellent engineers, tend not to enjoy engineering for cultural or other reasons.
It could be that the women who would be excellent engineers are in the group that would be pretty good CEOs, and they all become CEOs because there's more money in it.
But you get my point: the failure of actual people to conform to the occupations that they would be best at is not, in and of itself, evidence of a problem.
It's generalising. That is, assuming that one or many things are one way just because some things are one way.
Just because one person is one way, doesn't mean they all are, or even mostly are.
I hope you now see why it is FUCKING STUPID TO BE A SEXIST, racist or any other kind of generalist.
hugs, kiss, love,
ps. sorry for the non-capital PC parts.
Car insurance companies believe that young male drivers are a bigger risk than older female drivers. They have statistics to back this up. Saying that young, male drivers are worse drivers is not ageist or sexist. This sentence is not judging individuals but a demographic group (i.e. it would be wrong for me to say you are a worse driver than someone else, if I had no proof).
The problem with judging programmers in this way is that it's hard to empirically measure results or to even agree what metrics are good or bad.
Great attitude you have there. I know guys like that; guys who are extremely egotistical, are always right, and know everything about everything. They're team-killing, energy-sucking wastes.
He may be able to hack like a dream, and "thanks for Mongrel" and all, but I wouldn't even want to be in the same room as him. I think from now on I'll do my best to ignore Zed's perspectives on life; they're more than a little skewed.
I know the talk you're referring to and he did not say this. He does suggest doing your best and working really hard to make sure your employers get what they want. Do the job you're paid to do and do it well.
What he advises against is going the extra mile for a company that does not trust your judgement and that you have no stake in. Leave work at work. Your job is not your life. You don't owe them anything that they aren't paying for, least of all your creativity.
I don't see how that can be construed as telling them to phone it in.
> everyone else is retarded
In fact he points out several times that the people in charge at most companies are not stupid, they're just clueless about technology; not a very contentious statement, last I checked. The whole first half of the talk is explaining this to an audience that has yet to encounter it in the real world, and how to advocate for superior technical solutions in spite of it, with such "team killing, energy sucking" advice as: be objective, honest, and prepared for hard technical questions. Truly industry-ruining suggestions.
I strongly suggest you watch the talk again. Even for someone that doesn't like him, you seem to have missed the point of it entirely.
What I'm trying to get at is the reason the people in charge of most companies are "clueless about technology" is that some technical people aren't capable of speaking to them in their language. The people in charge are going to be more concerned with how these issues affect their bottom line, if you can't communicate in those terms you'll rarely be effective.
Zed goes on forever about how his solution was technically better in every manner possible, but he was unable to convince the powers that be that he was right. This is a communication issue likely brought about by his "I'm a genius, and you aren't smart enough to understand" attitude. This isn't because business people are arrogant, pig headed assholes (though some certainly are). No, they don't get the tech, but I for one am willing to take the time to help them understand why these things matter and why they should give a shit. Yes, it sucks, but I feel that not doing so would be doing my employer a disservice. What I dislike about that talk is that Zed basically abdicates responsibility for having to communicate with non-technical people because they'll never be smart enough to understand, when it's really his failing that they don't understand in the first place.
Further, Zed's attitude towards others is beyond condescending. He may pay lip service to the people in charge and say "they're just clueless about technology", but it's so abundantly clear how little he values them, and the jobs they provide. "Working hard and doing a good job", but doing something that you know is both wrong and dumb is phoning it in. It's doing what you're supposed to because you couldn't give a shit about the company, or what becomes of it.
It just rubs me the wrong way.
The reason I'm clueless about marketing is not that marketing people aren't capable of speaking to me in my language. It's that, fundamentally, I don't care about marketing. It seems likely to me that the people in charge of most companies really don't care about technology (nor should they except in cases it's critical for the company; I'm not criticizing).
Yes he was. That was the point of him giving advice on the subject. It wouldn't make much sense to suggest a bunch of things that didn't work at all, would it?
> Zed basically abdicates responsibility for having to communicate with non-technical people
He presents a six-point strategy for doing exactly that. I suggest watching the talk again, as you seem to have missed out on most of it.
As for his "attitude", I've actually met him and talked to him face-to-face. His internet persona is a put-on and he only talks like that to piss people off. In actuality, he's a really nice guy who is super-smart and who does not take shit from people. It's actually too bad that there aren't more of him in our industry. Scope creeps, memory leaks, etc. would be a thing of the past.
Also, that was the exact video I was talking about, so thanks.
You do realize that this could have been written by Zed?
It's just a fact of our profession: there is a significant percentage of people who just slap together APIs and have zero understanding of the maths behind them.
I don't see why it takes Zed 1000 words to say it or why he has to get sanctimonious about it.
Any statistical formula will predict its own failure. That's kind of mind-blowing, and the implications aren't always grasped after just one semester (or quarter) studying the topic.
The professor was very poor overall. I was able to cram enough to do well, but I retained very little after the final. I would love a deep understanding built from the foundations-- the coursework we had was a lot more obtuse and "take my word for it".
In a sense, I felt I got a better foundation for statistics from a freshman sociology class than from the Math professor.
I heard the stats class offered where I went to school was a joke, so I didn't take it as an elective. I'm sure it's different depending on your university.
Also, my school was not renowned for any sciences.
Since stats on its own can be a bit abstract, that's a double whammy.
This is sort of Zed's point. It is an intro class into a field with a great deal of depth. All you can do in one undergraduate semester is touch on a few of the most important parts.
Having been through what Zed's been through (on both the giving and receiving end to wit), I can empathize with why he'd get all sanctimonious about it, but I feel that it just makes the problem worse (at least in person). Programmers are already an egotistical lot, and I've learned that directly attacking their ego tends to make things worse.
What can be done then? I don't know. It feels like this is at the core a personality problem, in particular one in which people associate their ability to know with their sense of self, and personality problems are terribly hard to correct.
One thing I never fully understood was how to calculate the propagation of error. There are many useful tricks for reducing the impact of error on your final calculation, and a few things you need to watch to make sure it doesn't increase.
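For what it's worth, the usual first-order rule is easy to sketch. This assumes independent errors, and the function names are my own, not from any library:

```python
import math

def propagate_sum(abs_errors):
    # For z = x + y (or x - y) with independent errors,
    # absolute uncertainties add in quadrature.
    return math.sqrt(sum(e * e for e in abs_errors))

def propagate_product(rel_errors):
    # For z = x * y (or x / y) with independent errors,
    # *relative* uncertainties add in quadrature.
    return math.sqrt(sum(r * r for r in rel_errors))

# z = x + y with x = 10.0 +/- 0.3 and y = 5.0 +/- 0.4
dz = propagate_sum([0.3, 0.4])            # 0.5
# w = x * y with relative errors of 3% and 4%
rel_dw = propagate_product([0.03, 0.04])  # 0.05
```

This also hints at the tricks you mention: averaging n independent measurements shrinks the error of the mean by a factor of sqrt(n), while subtracting two nearly equal numbers can blow the relative error way up.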
I think the fundamental issue is that there's very little focus on statistical practice in most of these courses. Social scientists have it good: they're always taught how to deal with and interpret statistics in the same way they'll have to in their line of work. It's totally useless to throw a bunch of theory at students. I think that teaching statistics in the context of real problems is the only way (most) students will actually learn and come to appreciate how useful it is.
The department revamped the major this year, and they've introduced a mandatory stats class tailored to CS students: cs109.stanford.edu. I haven't looked through it at all, but I think it's a step in the right direction.
Finally, I have to give a shout-out to The Little Handbook of Statistical Practice (http://www.tufts.edu/~gdallal/LHSP.HTM) in this thread. It's an amazing resource for anyone who works with statistics. I've referenced it while doing performance testing, building an A/B testing system, and working on problem sets. From the website:
"My aim is to describe, for better or worse, what I do rather than simply present theory and methods as they appear in standard textbooks. This is about statistical practice--what happens when a statistician (me) deals with data on a daily basis."
Read it now!
I played this game myself with a friend. I sent him ten samples of a (for him) unknown distribution and asked him to estimate the mean. Then 100, then 1000. His estimate of the mean kept changing to higher and higher values, because the samples were drawn from a Pareto (power-law) distribution with a mean of 1000. Such a distribution is almost indistinguishable from one with a mean of infinity, because all the signal is in the very rare, large outliers. If you try to analyze samples from such a process assuming it's Gaussian, nothing will make sense, and the standard deviation will give you an estimated uncertainty of the mean that is far, far below the actual uncertainty.
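Here's a minimal version of that game you can run yourself (stdlib only; the shape parameter 1.05 is my own choice to make the tail extreme):

```python
import random

random.seed(0)

# A Pareto variate with shape ALPHA (and x_m = 1) has mean
# ALPHA / (ALPHA - 1), so this SCALE gives a true mean of 1000
# with a very heavy tail.
ALPHA = 1.05
SCALE = 1000 * (ALPHA - 1) / ALPHA

def sample_mean(n):
    return sum(SCALE * random.paretovariate(ALPHA) for _ in range(n)) / n

for n in (10, 100, 1_000, 100_000):
    print(n, round(sample_mean(n), 1))
```

Even at 100,000 samples the estimate typically sits well below the true mean of 1000, then occasionally jumps far above it when one of the rare giant draws lands in the sample.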
What if the medical profession has as much egregious and widespread ignorance of the basics as programming? Would you be in favor of certification?
Like, if the variance of your result is really high and you're estimating the mean, it's clear that the number of samples you take is important!?!
To complicate matters further, "accuracy" is an extremely tricky concept in statistical analysis because error rates work very differently than they do in, say, physics. In physics, when you measure something you can be sure that your results are accurate to within your instrument's margin of error. In stats, your confidence interval just tells you how likely it is that your results are completely wrong, or even worse, wrong by a completely unknown amount. Every statistical inference you make has a chance of completely blowing up on you. That chance can be defined and reduced, but it can never be eliminated. There are also things like frequentist vs Bayesian statistics, where the interpretation of the same data can be completely different.
Zed may be a jerk sometimes, but on this topic he's dead right. Most programmers are far more confident about this stuff than they should be.
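That "chance of being completely wrong" is easy to see in simulation. This sketch uses the z-based 95% interval (1.96, rather than the slightly wider t interval) for simplicity:

```python
import random
import statistics

random.seed(42)
TRUE_MEAN, SIGMA, N, TRIALS = 50.0, 10.0, 30, 2000

misses = 0
for _ in range(TRIALS):
    xs = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = statistics.mean(xs)
    half = 1.96 * statistics.stdev(xs) / N ** 0.5  # half-width of the ~95% CI
    if not (m - half <= TRUE_MEAN <= m + half):
        misses += 1

print(misses / TRIALS)  # roughly 0.05: about 5% of intervals miss entirely
```

And that's the best case: the data here really is Gaussian. With the heavy-tailed distributions discussed elsewhere in this thread, the miss rate can be far worse than the nominal 5%.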
On a superficial level, if you are doing overnight processing of log files, then you probably care more about throughput than latency. In this case, averages are probably a fine metric. On a slightly deeper level, standard deviation is only a useful measure if the distribution is known, and in a lot of real-world cases it is not. The right question isn't whether 100 or 1000 tests on the same data provide sufficient statistical power, but whether the range of inputs is sufficient to trigger worst-case performance.
Now, I presume that Zed knows these things and applies them appropriately, but the article strikes me as more snide than helpful. Perhaps as others say he's a great guy in person, but I prefer my stats with less attitude and more insight. Here, for example: http://yudkowsky.net/rational/bayes
[edit: changed my sloppy language from 'has no meaning unless the distribution is normal' to 'is only a useful measure if the distribution is known']
Are you completely sure about that?
I suppose many readers of this thread are more knowledgeable about statistics than I am. I would appreciate hearing from the knowledgeable readers whether or not variance in the observed values makes a difference in the cases discussed in the submitted article.
As the above trivia points out, the standard deviation is an important summary statistic. More interestingly, by using mean, variance (or sd), skew, and kurtosis, you can describe almost any centrally concentrated distribution, even distributions with heavy tails.
I think what the OP meant is that most 3+ sigma results are not truly 3+ sigma, because most distributions in this world are not gaussian, but instead have large wings. SD is most useful when you know what the underlying distribution is. Currently it's more in fashion to communicate spread using confidence intervals because they presume less about the underlying distribution.
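A quick way to see the "large wings" effect: compare how often draws land beyond 3 sigma for a true Gaussian versus a simple heavy-tailed stand-in. The mixture below (95% narrow component, 5% five-times-wider component) is my own toy choice, not anything canonical:

```python
import random
import statistics

random.seed(1)
N = 200_000

def gaussian():
    return random.gauss(0.0, 1.0)

def heavy_tailed():
    # 95% ordinary draws, 5% from a component five times wider
    return random.gauss(0.0, 5.0) if random.random() < 0.05 else random.gauss(0.0, 1.0)

fracs = {}
for name, draw in (("gaussian", gaussian), ("heavy-tailed", heavy_tailed)):
    xs = [draw() for _ in range(N)]
    sd = statistics.pstdev(xs)  # each sample judged against its OWN 3-sigma line
    fracs[name] = sum(abs(x) > 3 * sd for x in xs) / N
    print(f"{name}: {fracs[name]:.4%} beyond 3 sigma")
```

For the true Gaussian about 0.27% of draws exceed 3 sigma; the mixture, despite having a perfectly well-defined standard deviation, exceeds its own 3-sigma line several times more often.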
I should have said something more like "the standard deviation calculated from a sample set is only generally applicable insofar as one is willing to make assumptions that the sample set is representative of the distribution as a whole". The default assumption in traditional statistics (such as quoting p-values) is that the distribution is normal, and in real-world situations that is often not the case.
Your restatement is right on, although I'd go farther and say that standard deviations (and confidence intervals) are only useful metrics with regard to the particular assumptions one is willing to make about underlying distribution. Yes, you can calculate these measures, but they won't help you if your assumptions are irreparably flawed.
You could quibble about my exact phrasing, but yes, I'm completely sure about that. This is the 'black swan' problem writ small. I don't mean that a high standard deviation should be ignored for real-world distributions, but I do mean that a low standard deviation carries very little weight unless a normal distribution is presumed.
I'm hard pressed to relate this to the cases discussed in the article, as those cases are shy on detail, but the DB2 example seems most applicable. Although he points to standard deviation as the tell-tale flag here, this is sort of misleading. The exact numerical value for the standard deviation across all queries is meaningless here, as not every query has an equal likelihood of being slow. As he states, the real problem was the terrible performance of a single query.
How many similar queries exist? Will a new query added to the system trigger a similar bug? We don't know, and standard statistics isn't going to help us unless we have an understanding of the underlying mechanism. The key here is not to test a statistically significant subset of all possible queries, but to check the performance of the actual queries executed (as he did).
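A toy illustration of that last point, with made-up timings: the aggregate statistics flag that *something* is wrong, but only the per-query check tells you *what*.

```python
import statistics

# Hypothetical per-query timings in ms: 99 fast queries plus the one
# pathological query from the DB2-style scenario.
timings = {f"q{i}": 2.0 + 0.1 * (i % 5) for i in range(99)}
timings["q99"] = 5000.0

values = list(timings.values())
print("mean:", round(statistics.mean(values), 1))     # dragged way up by one outlier
print("stdev:", round(statistics.pstdev(values), 1))  # huge, but doesn't say which query
worst = max(timings, key=timings.get)
print("worst:", worst, timings[worst])                # the per-query check finds it directly
```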
I think he has a fair point. Here on HN I see a lot of armchair sociologists critique the various articles in the social sciences that get posted, but it's rather unclear to me whether these are well grounded in an actual understanding of the issues involved, or simply habitual incantations of rules of thumb such as "correlation doesn't imply causation."
Oh, also, "You're all assholes and I rock."
Truly the Carl Sagan of software.
Something along the lines of "Naked Economics" for the stats realm ...
It's kind of a shame that his message is diluted by its delivery. Case in point: about half of the comments here aren't even about statistics!
Would he kill me?
He should learn a little bit of complexity and computability theory, by the way: best case, worst case, Big O notation, etc.
I know you've already been smacked down but this is a big pet peeve of mine and worth pointing out. You can always complain that something should have had more information in it, and since that is always true no matter what, the complaint is information free. (A value that comes from a universe of one value contributes zero bits of information.)
However, in this case, I tend to think that Zed included just enough information to get someone who's clueless started - he even included references at the end so that if you DO want more, you can easily get it.
From the article it sounds like he learned stats first.
> studied statistics in grad school,