Existence does not imply correlation (chrisstucchio.com)
36 points by spindritf on Aug 18, 2014 | hide | past | favorite | 49 comments



I'm going to point something out here that's not really central to the argument the article is making.

In programming, in my experience, the nastiest bugs to fix are actually two or three separate bugs interacting in weird ways. If you find a bug like he did, and it's easy to fix and unlikely to break something else, but you can't reason how it could be causing the issue you're seeing, FIX IT ANYWAY. It's quite possible it's interacting in some subtle way with another bug, and fixing it may make the other issue start behaving more consistently and become easier to fix.

It may feel wrong, because you feel like you should set that theoretically unrelated bugfix aside until you can work out the bug you're trying to focus on. In my experience, that's often not the right approach.


Exactly! You hope that by fixing one bug, three issues will go away. In reality you have to fix three bugs to get one issue to go away.


If you find a bug like he did, and it's easy to fix and unlikely to break something else, but you can't reason how it could be causing the issue you're seeing, FIX IT ANYWAY. It's quite possible it's interacting in some subtle way with another bug

Do people really reason like this? I mean, programming is absolutely clear-cut. It's the most clear-cut aspect of life, in some ways. It's purely a logic problem. If you see something that can't possibly be interacting with your problem, then spending additional mental cycles on it is always a waste of your time for solving that problem.

Now, it may be a good idea to fix that new problem. That's perfectly true. But if it can't possibly affect your current problem, then fixing the new problem won't do a darn thing to help you fix your current problem. That sounds like tautology, but your comment is saying the opposite.

You can prove to yourself that a new problem can't possibly be interacting with your current problem. If the problems exist in two separate modules, you look at the connections between the modules and see whether any data can possibly flow from one to the other. If there is no state that flows between them, then the problems are necessarily independent.

I think what you're saying is that "most codebases suck, because they have a lot of interdependencies and are hard to analyze." That's probably true. But resorting to voodoo thinking isn't going to help.


> can't possibly be interacting with your current problem

The only thing not logical with programming is a programmer's ability to fully simulate the circumstances by which all bugs may occur. That ability will vary greatly from programmer to programmer.

So yes, people think like this because it's easier to fix the bug than it is trying to figure out correlation.


Maybe my previous comment was unclear. If so, sorry about that.

The point was that, in programming, there is never any "figure out correlation." You can rule out whether a bug is being caused by a given line of code by examining the flow of data between what you're seeing on screen and the lines of code responsible for what is shown on that screen. A bug is never "correlated" with any given line of code. The line of code is either logically related to the bug, or not related at all.

I'd be interested to hear more about how programming could be made into a correlation game, though. It sounds like a new mental tool that I've never learned, which means I should learn it.


To make programming into a correlation game, build a distributed system and work on performance. You suddenly have flow of data, together with lots of external factors that are difficult to measure (e.g., a spike in latency in US-East but not US-West for 0.5% of packets, or 2% of your shared instances having a noisy neighbor).

In such contexts, you usually also have a LOT of code.

Correlation analysis becomes very important in figuring out which piece of code to even look at. Bugs do become "correlated" with a line of code, because bugs take the form "noisy neighbor + blocking disk read (code) + high latency to master DB => slow response".


Thank you. That's a very interesting way to think about it, and I hadn't considered that before.

I apologize for making so many comments in this submission. I feel pretty terrible about it, because the number of comments is higher than the number of upvotes, which has pushed your submission off of the front page. I didn't realize it was happening until too late. But more than that, in retrospect, I should have behaved differently altogether, which would've resulted in fewer comments. Sorry.


Interesting that you mentioned a distributed system. I've worked on a few in the past and am working on a fairly complicated one right now. That's exactly what I was thinking when I made my comment! :)


You can rarely "rule out" anything based on reasoning about data flow - simply because your reasoning always has the possibility of being flawed. If you had an oracle you could consult while debugging, you could do this. But we rarely are in that situation. Instead, you can only increase your confidence.

In my experience, it is best to fix problems as you find them, even if you are confident that they are not the cause of your current bug. Each thing "wrong" I find in the code leads me to question my current mental model of what is going on. A bug means that, somewhere, there is a difference between my mental model, and what the code actually does. A seemingly unrelated error gives me evidence that my mental model may be even further from reality than I realized. That's bad. At that point, I know of at least two deviations from reality. And just like trying to establish causation in a scientific experiment, you want as few variables as possible.


I am not arguing against fixing problems. I am arguing against voodoo thinking.

I'm getting the feeling that the way I learned to debug is somehow very different from what everyone else here is doing. I've never found a bug I haven't been able to fix or understand. I'm not bragging. It's simply the truth, and it's why I'm mystified about the negative reactions I'm getting here. I've spent my whole life debugging things, and I've never once run across "maybe the bug is caused by this new problem, even though it's obviously completely unrelated, so I'm going to fix it and cross my fingers." It just seems ludicrous, the same way that it was ludicrous to think the alignment of the stars could determine whether you'll have a happy life.

The oracle you consult while debugging is the program itself. If you don't understand the behavior, then you add logging statements until you do. You log everything, absolutely everything, and then reason about what the program is doing when the bug is occurring.

There was exactly one bug I was never able to fix, and it was because the bug resided in closed-source code (D3D9). Multithreading was triggering the bug even though their code claimed to be MT-safe. At that point, there was nothing I could do except revert to single-threaded shader building. It was a terrible race condition that took a full day to track down, but it didn't require any kind of correlation tricks. It simply required that I rule out large swaths of the codebase until it was logically impossible for the bug to be anywhere except in the D3D9 library.

Yes, of course, fix problems as you run across them. But that's a separate discussion altogether.


I am the same way, and I'm similarly proud that I can always fix a bug. It's all just code, and I can always pinpoint the error.

My impression is that you and I would go through very much the same process when debugging code. This sub-thread, however, is already filled with my points on mental models and confidence, so I won't repeat them here. It's not voodoo thinking, but keeping in mind that mental models may be wrong. I am careful to avoid thinking certain things are "impossible" because I know my reasoning is fallible. I instead say, "I find it extremely unlikely, and here's my evidence why."

My bug-that-got-away was a race condition in a lock-free, multithreaded allocator I was porting to the Itanium instruction set. My understanding of the memory model on Itanium wasn't very good, and even the original author of the algorithms was unsure whether it could be ported at all. I eventually decided it was not a good use of my time and moved on. (The algorithm was designed on the Power instruction set, and was easy to port to x86.)


"maybe the bug is caused by this new problem, even though it's obviously completely unrelated, so I'm going to fix it and cross my fingers."

You're mis-stating what I said. I said "you can't reason how it could be causing the issue you're seeing". That's very different from "it's obviously completely unrelated".

Look at the example in the article. We're getting network delays. Here's a bug that would cause network delays. It's not clear why we'd only be seeing delays on the west coast server and not the east coast.

Do we fix the bug and see what happens? Or do we keep researching?

What happens when we've been researching for a couple hours, and the network delays are keeping people from doing their work? What happens after two days of research, and we're losing a few thousand dollars an hour?

Huh, maybe we should fix the bug we can find right now (too many redis connections), and see what happens to our network delay bug.

Maybe it turns the bug we wanted to fix originally from something intermittent into something consistent.

Now if you're having an issue with network latency, and you find a bug in the code that converts floating point numbers into dollars and cents, then obviously that's not what I'm talking about.


Ok, we're clearly talking past each other. Let's reset.

1) You have a bug. Your ovals are coming out as boxes, and you hate anything without rounded corners. So you look into your drawing code to find out what's going on.

2) Along the way, you stumble across a routine that has a bug: Your network code isn't properly handling a failure condition. You're absolutely certain that your oval drawing code isn't tied to network data in any way: it's always supposed to draw ovals, but it's drawing boxes.

Do you fix #2, and cross your fingers that #1 is also fixed?

I would write down #2 (for example, by adding a TODO comment) and then keep trying to fix #1. I wouldn't stop what I'm doing, fix #2, then see if #1 is fixed.

Now, here is your original comment:

In programming, in my experience, the nastiest bugs to fix are actually two or three separate bugs interacting in weird ways. If you find a bug like he did, and it's easy to fix and unlikely to break something else, but you can't reason how it could be causing the issue you're seeing, FIX IT ANYWAY. It's quite possible it's interacting in some subtle way with another bug, and fixing it may make the other issue start behaving more consistently and become easier to fix.

It may feel wrong, because you feel like you should set that theoretically unrelated bugfix aside until you can work out the bug you're trying to focus on. In my experience, that's often not the right approach.

The situation I described above isn't uncommon. You have a bug you're trying to fix, and then you run across a different problem, but it's obviously completely unrelated. You're saying, "Drop what you're doing and fix it." I'm saying, "Focus, and think logically."

I apologize if I'm somehow misrepresenting you. Please correct me if that's the case. Also, I think I'm having an off day, and my comments are coming across as self-centered and snobbish. My apologies.


Race conditions. This bug shows up sometimes while frobbing the dingbat, but only sometimes. The crash is correlated with frobbing the dingbat.


Very much yes. If someone tells me about a bug, and they tell me "It always happen when I do this", my immediate reaction is "Oh, good, that will be easy to fix."

If instead they say, "It sometimes happens when we do this", then that means it's going to require serious investigation because we will need to start correlating different events to see under what circumstances the bug does and does not pop up. That is very much like the scientific method.
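That "correlating different events" step can be as simple as conditioning the failure rate on each candidate event. A toy sketch, with an invented reproduction log for the dingbat-frobbing bug above:

```python
# Invented reproduction log for an intermittent bug: each entry is
# (did_frob_dingbat, crashed). Conditioning the crash rate on each
# candidate event shows which one the crashes cluster around.
log = [
    (True, True), (True, False), (True, True), (False, False),
    (True, True), (False, False), (True, False), (False, False),
]

def crash_rate(entries, flag):
    # Fraction of runs that crashed, among runs where the event
    # flag matched the given value.
    subset = [crashed for frobbed, crashed in entries if frobbed == flag]
    return sum(subset) / len(subset)

print(crash_rate(log, True))   # 0.6 -- crashes cluster around frobbing
print(crash_rate(log, False))  # 0.0 -- and are absent without it
```

With real intermittent bugs the log comes from instrumented runs and the candidate events are things like thread interleavings or input sizes, but the arithmetic is this simple.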


The Udacity course on debugging speaks directly to looking at correlations between bugs and executions of various portions of code (and the same across bugs).

There's some interesting stuff: https://www.udacity.com/course/cs259


"So yes, people think like this because it's easier to fix the bug than it is trying to figure out correlation."

And many times, there's no clear advantage to figuring out the correlation first instead of just fixing the bug and seeing what happens.


But in most real codebases separate units don't interact cleanly. "You look at the connections between the modules and see whether any data can possibly flow from one to the other" is something that would be very hard to do in codebases that aren't perfect. It's probably more of a waste of mental cycles than just putting the bug right, if it's a quick fix. With version control, you can even avoid the risk of making the original problem worse with your fix.


While I don't completely agree with the parent post, I don't think that your reply does it justice.

If you see something that can't possibly be interacting with your problem...

I think that "can't possibly" is the source of trouble. Today it's practically impossible to say with certainty what is actually impossible. There's so much going on at a lower level that's not visible to us, from network routing completely outside our systems, to the internal workings of many large APIs and frameworks, that there's plenty of room for things to happen in ways that are unlikely to us, but in retrospect after we've researched, do fit together.


I guess I just assumed that everyone researches everything. Doesn't everyone look into how network routing works, or how every large API or framework is implemented, etc? If you're running into a bug, and that bug can possibly be affected by any of those things, then you have to dive all the way to the bottom of the swimming pool to see if there's something amiss.


Yes, we do that. But we also have learned, the hard way, to have some humility with that knowledge. Something may disagree with our mental model, but we're careful not to say something is impossible because our mental model may be wrong. (In fact, it is wrong, since we're dealing with a bug.)


"In theory, theory is a lot like practice. In practice, practice is nothing like theory."


When I first started programming every single bug was caused by something that couldn't possibly be related to the problem...


This is a good post to begin with, but then the author shoots himself in the foot with claims like

"A couple of weeks ago I pointed out Sam Altman making bogus arguments about sexism - arguing by juxtaposition that the existence of sexism reduces the number of women in technology."

I think the fact that sexism reduces the desire of women to participate in something is established. It doesn't make sense to dismiss this as an 'argument by juxtaposition' since scientific thought begins with drawing new guesses from established patterns, particularly when the data are found wanting. So now, reading between the lines, we have the dual inference that

1) the author doesn't understand his own argument, and

2) the author has a reactionary axe to grind regarding gender issues in technology.

Whether or not these inferences are true (I don't think they are), a large audience is now lost since most people are socially aware enough to understand that 'sexism doesn't repel women' is a poor null hypothesis, even if they can't articulate these feelings in the language of science.


the fact that sexism reduces the desire of women to participate in something is established

Yeah, I thought so about the influence of parenting on children's outcomes. Then I started reading things like the linked JayMan's article[1] and now I'm not so sure. There probably is some influence but it's quickly starting to look pretty insignificant compared to genetics and just... randomness, luck.

Same with psychiatric drugs. Knowing nothing about psychiatry, I used to think they're really good because they're prescribed by doctors, and my experience with drugs (painkillers and antibiotics) was that drugs are like miracles on demand. Only crazy people would go off them! Then I started reading articles by psychiatrists and learned that many very popular medications help one person in 20 after months of use, if anyone at all. Or that antidepressants may actually increase your chances of offing yourself.

The moment you start digging anywhere outside of massive successes (antibiotics) and physics (where important research is usually 5 sigma solid, though even then...), things look nowhere near as clear cut as we're told in school or by the media.

And yes, sometimes "how do you know?" is just a cop-out. But lately I have been taking that question seriously: how do I know? I blame the Less Wrong crowd abandoning their site and spreading doubt throughout the Internet.

[1] http://jaymans.wordpress.com/2014/05/30/beware-armchair-psyc...


Author here.

Altman provided no argument whatsoever in his post that the existence of a sexist act is the cause of women's underrepresentation in tech. He was fairly careful - he didn't even explicitly state causality, he merely implied it. Maybe an argument is out there somewhere, but not in Altman's post.

I think the fact that sexism reduces the desire of women to participate in something is established.

Directionally, maybe, but I've seen little evidence the magnitude is significant. In spite of the built in sexism, Catholicism and Islam both have far more women than Atheism (or technology). I don't see any reason to believe medicine or law are less sexist than tech, or were less sexist back when women started flooding into those fields.

If you have evidence of correlation, link to it - Sam Altman didn't.

I do have a "reactionary axe" to grind against bad reasoning and this particular topic is a huge source of bad reasoning.


> In spite of the built in sexism, Catholicism and Islam both have far more women than Atheism

Let's break down what's bad about this sentence. It contains:

* a complete misunderstanding of the nature of the sexism we're talking about. A woman who walks into a church will not be constantly devalued for her gender and/or treated like a hot piece instead of a colleague, while simultaneously having her feelings devalued with neckbeardy thought-terminating "rationalism". In technology and business, sexism isn't an abstract teaching most people ignore; it's something in the air.

* At least two of the statistical fallacies you just decried.

* The 'appeal to worse problems' fallacy, and

* the hidden assumption that women's participation in religion is comparable to their choice of careers, as though they 'join' the Catholic Church or Islam the same way one might join a technology company.


Yeah 'something in the air'. What does that mean? That some people (many women) don't like the culture of technology. That's not sexism; that's just a fact. E.g. Some people (many women) don't like hunting - does that make it sexist?


You risk losing money and market share if you think like that.

If you make smaller guns, guns in a range of colours, and nice clothing, you open up the market beyond its current range. The people currently hunting may dislike a powder blue rifle and may scoff at comfy footwear, but you don't care, because you're not selling to those people; you're selling to the people who are not traditionally hunters.


Excellent observation. But still not sexism. It's gotten popular to use blaming language, paint everything with the sexism brush, and generally polarize discussions about workplaces. I don't think that helps get us anywhere. If only more folks thought like you.


No, hunting isn't "sexist", it's gender biased. "Sexist" is a normative term.

It's easy to see where the gender bias comes from: most hunters come to the sport via their fathers, and hunting as a father-son tradition is deeply ingrained in the culture. There are, despite that, plenty of women hunters.

That's the case with hunting. But it's hard to extrapolate from hunting to technology jobs.


In your first bullet point, you are stating a very detailed theory - near as I can tell, A && B && C => low representation (where A is a particular sort of sexism, B is low status men being rational, etc). That's not what Altman did, nor did he provide evidence for that theory or any other.

The "appeal to worse problems" is not a fallacy - I'm advocating searching for examples which vary the inputs. That will allow you to actually measure correlation - if inputs and outputs vary together, you've got correlation. If they don't, you've got nothing.


> I think the fact that sexism reduces the desire of women to participate in something is established.

Nope, I can send you videos from several hip-hop concerts that strongly contradict this statement. I wouldn't, though, because that would be an isolated example, taken out of context. I would, however, like to see whether there are other fields in which women participate a lot despite their being more sexist. 'yummyfajitas mentioned some examples, and it would be interesting to ask people in those fields, "What makes you go into this field despite it having an atmosphere that we in the tech field might describe as sexist?", and learn something from them.

To people who would prefer to see carefully constructed arguments starting from first principles, statements like "oh, of course, something is established, so someone who is questioning the statement has an axe to grind" give off a petulant, uncompromising vibe and should, in my opinion, be avoided. Broadening our horizons a bit as to how people behave in different circumstances might give us some new insights into increasing female participation in tech.

> Most people are socially aware enough to understand that 'sexism doesn't repel women' is a poor null hypothesis

I would like to replace "most people" here with "people in cynicalkane's particular socio-economic milieu". Again, this takes the form of "It is obvious that...", and should be avoided if we want a nuanced, carefully discussed argument about the fairly broad umbrella of sexism.


The Wikipedia term for an article like this is a coatrack. https://en.wikipedia.org/wiki/Wikipedia:Coatrack


> I think the fact that sexism reduces the desire of women to participate in something is established.

[citation needed]


Check his comments, you'll find a healthy dose of (2).


Could you link to them? I'm curious what is now considered "reactionary".


Here's why a reasonable reader might come to the conclusion that this is not in fact a post about correlation, but rather a device for making a drive-by dig at Sam Altman's post:

* The Redis example is forced. Here's why: most engineers faced with that problem would, sooner rather than later, collect stats about the nodes and immediately observe that the root cause was an overloaded server. As the post admits, the cause was immediately obvious. No statistical reasoning was required, and, in fact, this reader doubts that statistics played too much of a role in Stucchio's own diagnosis. (Whether it actually did or didn't isn't material to my point.)

* The insight being provided about correlation is extremely simplistic. Essentially, it communicates in two graphs the definition of correlation. That might make sense if the post were written in a fashion that tried to communicate the fundamental idea of correlation to someone with literally no acquaintance with the term. But it's not; despite having subheads about what "is" and "isn't" correlation, the writing style more clearly signifies that the author is trying to debunk a mistaken idea about what the term is, implying that its readers are already somewhat acquainted with it. Or, put more simply: the fundamental idea the post tries to communicate could have been conveyed in two simple sentences. (Glass houses about my own writing style: duly noted.)

* The transition from technical discussion to Sam Altman is abrupt. More importantly, the Altman subject has much more valence than the point about diagnosing Redis failures. Thought experiment: chop everything in this article after Redis out, and imagine it was posted not by Chris but by some random account. Would anyone pay attention to this post?

Unfortunately, the post doesn't have much insight to offer about Altman's post. It's framed Altman in a manner incompatible with Altman's post --- suggesting, contrary to reality, that Altman was trying to present a complete empirical argument about gender disparities in technology. It then tries to beat Altman over the head with that framing. Altman emerges unscathed, because the author is swinging at his shadow, not him.


Now I hope for a follow-up post that explains what the east coast problem was.


Network trouble inside US-East combined with a call that involved a lot of round trips.

I was asynchronously sending maybe 50 messages out and waiting for replies. In US-East, the standard deviation of network latency went way up, meaning that while most of those messages return in 2-3ms, a few took up to 50. It turns out that I didn't need all 50 responses anyway, so I just set a timeout of 10ms and did the best I could with whatever messages did return.
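A rough sketch of that fix using Python's asyncio (the node count, latencies, and names here are invented for illustration, not the actual system): fan out the requests, wait at most 10 ms, and make do with whichever replies arrived.

```python
import asyncio
import random

random.seed(1)

async def query_node(i: int) -> int:
    # Simulated network call: mostly ~2 ms, with an occasional 50 ms straggler.
    await asyncio.sleep(0.050 if random.random() < 0.05 else 0.002)
    return i

async def fan_out(n: int = 50, timeout: float = 0.010) -> list[int]:
    tasks = [asyncio.create_task(query_node(i)) for i in range(n)]
    # Wait up to `timeout` seconds; whatever hasn't replied goes in `pending`.
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # abandon the slow tail rather than wait for it
    return sorted(t.result() for t in done)

replies = asyncio.run(fan_out())
print(len(replies))  # most, but not necessarily all, of the 50
```

The design choice is to trade completeness for a bounded tail latency: the response is computed from whatever subset of replies beat the deadline.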


I'm waiting for a paper in a peer-reviewed journal - even a journal in the humanities by a feminist cultural critic will do - that quotes statistics (even flippantly, en passant, without analysis) to the point that sexism scares people away from professions.

It sounds true. It kind of feels true to me, even. But there's too many feels being passed for fact lately.


I think that all of these arguments are pivoted off the same idea. Roughly, but not exactly(*):

    (Co-)existence is necessary for correlation
    Correlation is necessary for causality
Or maybe (E > Co > Ca). What this means is two-fold

    1. It's valuable and common to note E and Co
       as suggestive of Co and Ca respectively. If
       I *notice* E then I begin to hypothesize about
       Co because I now have evidence for its
       possibility, if not yet evidence for Co directly.

    2. Being lazy/imprecise in speech or reasoning
       might cause someone to "skip a step" and claim
       something more powerful than what they actually
       have.
With regard to (1), I think it's completely valuable to make these observations. At their heart, they're nothing more than broadcasting and contextualizing data(*), and that's an important function. So (2) is where all of the danger lies---properly contextualizing data and what it actually gives credence to.

So how can we make sure that (2) occurs rarely without precluding (1)?

I think that articles like this one and all around the whole "correlation is not causation" tagline are great. They help to ensure people remember the size of the space between each step in E > Co > Ca.

I think another powerful technique would be to demonstrate more arguments that make the jumps and highlight the properties which allow that to happen. Chris did a good job here demonstrating correlation—what it looks like when it exists or fails to exist—but much, much more can be said about the E > Co jump.

The Co > Ca jump is far more complex. Worse, it's often obscured through opaque words like "randomized, controlled study" or "scientific method" which are actually quite far away from the mechanisms which allow that jump to be made—they're more like implementation details obscuring a great API. Demonstrating clear arguments here (and not "rain + wet grass" oversimplified ones) could be a great boon to public reasoning.

What else can be done to reduce (2) without precluding (1)?

(*): Really (linear) correlation is just one kind of relationship-of-interest between things. This is often pointed out when people talk about correlation in terms like "circular relationships have 0 linear correlation, oh no!"

What I'd like to write instead might be E > M > C where M becomes "model building". Choosing to highlight linear correlation means that you're choosing a linear model. That might be perfect, or it might be wrong, but you still must choose it before you can structure your observations into evidence of some kind of causality. Generally, at this point, you would also want to begin developing covariates all in preparation for the C step.


About linear correlation: there's a Derrida point in statistics where everything becomes relative to a model. Linear correlation is the preferred mode for cardinal (not ordinal) quantities because it's directly related to linear ANOVA (the decomposition of variance between "explained by the covariate" and "unexplained by the covariate") (among many other reasons), but it's a choice. There's nothing outside theoretical decision.


Bear in mind that causality does not imply correlation. (I know I'm arguing with just one sentence in your post, but I think it needs to be pointed out.)

Sometimes you are observing controlled variables, in which case a counter-example like Friedman's thermostat will disprove this notion.


Yeah, I would like to leave "correlation" out of it entirely. The real steps are more like "model building" and "evidence collection", but those are both more complex and not common vernacular.


Causation doesn't imply correlation either, as the causative effect may be cancelled by other effects.


Yes, there's a lot of caveats in this. Especially around the Co component!


'localhost:8000' link on the page? Classy


sexism is not one thing. it doesn't simply exist or not exist. it varies (in quantity, quality, nature, and scope) from company to company, community to community, and relationship to relationship.

"sexism in technology" is not an entity, it is a description of an aggregate. saying that the "sample size" is 1 is absurd.



