One example of an identification strategy here would be to find two languages that are identical in all regards (including in their overall popularity and exposure across the world), but where one was more popular than the other on SO specifically. This would be a matched selection-on-observables strategy, where selection into "treatment" (popularity on SO) is not globally random, but random conditional on certain pre-treatment covariates. Under that assumption, the non-treatment potential outcomes of the two languages would be expected to be the same, so any difference between the observed outcomes (each language's level of popularity on HN) is only a product of the treatment.
Here's a simple inferential threat: the author ascribes the popularity of a technology on SO as causing the popularity of that technology on HN. What if, instead, some third common cause drove both, but produced SO spikes faster than HN spikes? Now, in the world I've described (where the true treatment effect is zero), what statistical test comparing SO and HN data, even one incorporating temporal ordering, would correctly come up with an estimate of 0? If your answer does not come up with an estimate of 0, then its real-world causal estimate is also presumptively wrong.
I also have concerns about how the author measured both treatment and outcome.
Overall I think there is an interesting DESCRIPTIVE (non-causal) question somewhere in this article, but it's bogged down by the author trying to apply something they heard about from a Wikipedia article as though it were a substitute for taking causality seriously. We've all heard Alexander Pope's adage that "a little learning is a dangerous thing".
I did not say that I had found causation, nor did I write that I had not. I specifically stated that I tested the hypothesis of Granger causality, which does not mean causality. Even if the main question of the article were causality, I could not answer it (and I stated this in the text).
Nevertheless, I agree that I should probably have been more careful with the term "causality", so thank you once again for pointing this out.
See: "ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus" http://www.inference.vc/untitled/
When it comes to this SO vs HN article: based on the data, one can look at correlations. But even if there are some delayed correlations, it still does not imply causation (e.g. the HN crowd may be faster to post links to new technologies than the SO crowd is to ask questions about them).
> Here's a simple inferential threat: the author ascribes the popularity of a technology on SO as causing the popularity of that technology on HN. What if, instead, some third common cause drove both, but produced SO spikes faster than HN spikes?
You either need to account for this, or make a convincing argument that no such common cause exists.
It would be good to see your ideal example of a causal analysis, so that people outside statistics can understand your long note better.
The idea is that you've controlled for just about every factor that could affect the rate of unexpected heart attacks, or those factors are evenly distributed throughout both samples because you were careful to sample randomly. Therefore, if there is a difference between the groups, on average, it must be because of the treatment that you introduced to one group and not the other.
I'm hand-waving, of course, and I'm sure there are medical researchers out there who will read my study design and laugh at how badly controlled it is. But that should give you the general picture of one common method used to perform "causal" analysis.
Another technique we might use is a blocked (or stratified) random sample. Knowing that there will be both smokers and non-smokers, we recruit two separate samples and randomize treatment assignment within each. This ensures that smoking status does not predict treatment assignment and guards against the chance imbalance that simple randomization can produce.
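To make the logic concrete, here is a toy simulation of that blocked design (all numbers invented, not from any real study): treatment is randomized within each smoking stratum, and the simple difference in means then recovers the effect we built in.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical subjects: half smokers, half non-smokers.
    smoker = np.repeat([0, 1], 500)

    # Blocked randomization: assign treatment separately within each
    # stratum, so smoking status cannot predict treatment assignment.
    treat = np.zeros_like(smoker)
    for s in (0, 1):
        idx = np.flatnonzero(smoker == s)
        treat[rng.permutation(idx)[: len(idx) // 2]] = 1

    # Invented outcome: smoking raises heart-attack risk; the made-up
    # treatment lowers it by 0.05.
    risk = 0.1 + 0.2 * smoker - 0.05 * treat + rng.normal(0, 0.05, smoker.size)

    # Randomization within blocks makes the difference in means an
    # unbiased estimate of the average treatment effect (about -0.05).
    print(risk[treat == 1].mean() - risk[treat == 0].mean())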
We could also mitigate whatever imbalance does exist by doing a matched analysis, where each treated unit is paired with the control unit that looks most like it (some control units are reused). Or we could match on propensity scores. Or we could weight by inverse propensity weights. Or we could weight using covariate balancing. Or...
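As a sketch of one of those options, here is inverse propensity weighting on simulated data (everything below is invented for illustration): a confounder drives both treatment and outcome, the naive contrast is biased, and the weighted contrast roughly recovers the true effect.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Invented observational data: one confounder drives both treatment
    # take-up and the outcome; the true treatment effect is 1.0.
    confounder = rng.normal(size=2000)
    treat = (rng.normal(size=2000) + confounder > 0).astype(int)
    y = 1.0 * treat + 2.0 * confounder + rng.normal(size=2000)
    X = confounder.reshape(-1, 1)

    # Estimate each unit's propensity to be treated from the covariates.
    ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

    # Inverse propensity weights re-balance treated and control groups.
    w = np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))

    # The naive contrast is confounded; the weighted (Hajek-style)
    # contrast comes out near the true effect of 1.0.
    naive = y[treat == 1].mean() - y[treat == 0].mean()
    ate = (np.average(y[treat == 1], weights=w[treat == 1])
           - np.average(y[treat == 0], weights=w[treat == 0]))
    print(naive, ate)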
My point in doing this info dump is to a) back up nerdponx's example, which is great, and b) illustrate that there's a lot to learn about how statisticians have taken the problem of causal analysis seriously and developed techniques appropriate for answering causal questions.
People on the CS side of things tend to use Pearl's DAGs for conceptualizing this stuff. I'm on the stats/econ side of things, so I use Neyman-Rubin. They're equivalent. Allow me to suggest Imbens and Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences, as a good textbook that we assign to graduate students learning this stuff. Some of my students tell me the "Causal Inference Mixtape" is popular among people who want less statistical theory and more "what should I do as a practitioner". A virtue of both the resources I just mentioned is that they discuss not just experimental designs but also observational data studies, like the one the original post would have wanted to conduct.
This is actually methodologically incorrect. You test Granger non-causality by taking a VAR system, and you do not difference the data for stationarity! You use the cointegration order to determine how many extra lags of the exogenous variable to use beyond the order p that defines the lags tested against the null hypothesis (the order the author says was chosen via AIC, BIC, and inspection of residual autocorrelation).
> “The ADF test, differencing and the Granger causality test were performed on data aggregated to monthly frequency”
Hmm. It looks like differencing was done based on the ADF test... this is not valid for Granger causality.
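For what it's worth, here is a bare-bones sketch of the lag-augmented (Toda-Yamamoto-style) setup on invented data; the column names, dynamics, and lag orders are hypothetical, not taken from the article.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import VAR

    rng = np.random.default_rng(0)

    # Stand-ins for the monthly counts (hypothetical names and dynamics).
    so = np.cumsum(rng.normal(size=120))              # I(1)-looking series
    hn = 0.5 * np.roll(so, 1) + rng.normal(size=120)
    df = pd.DataFrame({"hn": hn, "so": so})

    p = 4      # lag order you would pick by AIC/BIC on the levels
    d_max = 1  # maximum integration order suggested by the ADF tests

    # Fit the VAR on LEVELS with p + d_max lags; no differencing.
    res = VAR(df).fit(p + d_max)

    # Caveat: a strict Toda-Yamamoto Wald test restricts only the first
    # p lags of "so" in the "hn" equation; statsmodels' built-in test
    # restricts all p + d_max lags, so treat this as an approximation.
    print(res.test_causality("hn", ["so"], kind="wald").summary())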
Remember, frequentist statistical tests are full of fraught issues like this.
In the end, a test like this only talks about the following:
“Under the assumed hypothesis of no causality, how extreme do the observed test statistics appear to be?”
— where the test statistics are in part based on estimated model parameters that are highly sensitive to methodological errors, like differencing prior to computing the coefficient-is-non-zero tests in this Granger VAR model.
All of this makes me skeptical of interpreting any of the results from the OP.
A good explanation is here: http://davegiles.blogspot.com/2011/04/testing-for-granger-ca...
A better approach would probably be to treat the data as panel data and either do monthwise hierarchical Bayesian regressions, testing whether the coefficients pooled over time are significantly non-zero, or else use a sort of distributed lag model or instrumental variables model on the monthwise simple regressions. In these cases, it's better to roll up lagged features of the exogenous time series to serve as constructed covariates in the lag model than to rely on differencing and time series tests, which are a minefield of interpretation issues.
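A minimal sketch of that constructed-covariates idea on an invented panel (all column names hypothetical): lags of the exogenous series become ordinary regressors, with no differencing.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    # Invented monthly panel: one row per technology-month.
    panel = pd.DataFrame({
        "tech": np.repeat(["a", "b", "c"], 48),
        "month": np.tile(np.arange(48), 3),
        "so_questions": rng.poisson(50, 144).astype(float),
    })
    panel["hn_score"] = (0.4 * panel.groupby("tech")["so_questions"].shift(1)
                         + rng.normal(0, 5, 144))

    # Roll up lags of the exogenous series as constructed covariates.
    for k in (1, 2, 3):
        panel[f"so_lag{k}"] = panel.groupby("tech")["so_questions"].shift(k)
    panel = panel.dropna()

    # Pooled distributed-lag regression; inspect the lag coefficients.
    X = sm.add_constant(panel[["so_lag1", "so_lag2", "so_lag3"]])
    print(sm.OLS(panel["hn_score"], X).fit().params)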
Checking causality in models like this is very tricky, so a lot of care has to be taken, and you should expect work to proceed slowly and require a lot of posterior checks.
(I am fully aware that I can google it but in my opinion it's usually better to ask someone who is already familiar with the topic because he/she may know good learning materials.)
- http://www.stat.columbia.edu/~gelman/arm/ (Gelman and Hill's book on regression and multilevel/hierarchical models)
If you happen to use Python, pymc3 is a great place to look for hierarchical and Bayesian timeseries models with full examples.
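For example, here is a minimal partial-pooling sketch in pymc3 on invented data (the variable names and model are mine, just to show the shape of such a model): one slope per month, shrunk toward a common mean, so "is the pooled effect non-zero?" becomes a question about the posterior of mu_beta.

    import numpy as np
    import pymc3 as pm

    rng = np.random.default_rng(0)
    n_months, n_obs = 24, 480
    month = rng.integers(0, n_months, n_obs)   # month index per observation
    x = rng.normal(size=n_obs)                 # e.g. a lagged SO feature
    y = 0.3 * x + rng.normal(0.0, 1.0, n_obs)  # invented HN outcome

    # Partially pooled slopes: one per month, shrunk toward a common mean.
    with pm.Model():
        mu_beta = pm.Normal("mu_beta", 0.0, 1.0)
        sigma_beta = pm.HalfNormal("sigma_beta", 1.0)
        beta = pm.Normal("beta", mu_beta, sigma_beta, shape=n_months)
        alpha = pm.Normal("alpha", 0.0, 1.0)
        sigma = pm.HalfNormal("sigma", 1.0)
        pm.Normal("y_obs", alpha + beta[month] * x, sigma, observed=y)
        trace = pm.sample(1000, tune=1000)

    # Check whether the pooled-over-time slope's posterior excludes zero.
    print(pm.summary(trace))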
Personally, I would only expect the former and not the latter. In my opinion, it's more plausible that some third cause drives the increase in popularity on both sites.
The rise of Swift, for example, was driven first by Apple releasing the language, then by the growth of online articles and communities around it, and by its open-sourcing.
This led to an increase in popularity pretty much everywhere, including SO and HN. The two are correlated, but there is no causal relationship between them.
Said differently, what is being measured here is the number of interested questions over certain periods of time, concurrent with the adoption and use of the platforms themselves.
It's not as simple as "correlation is not causation".
This can create a causal feedback cycle that establishes general trends in technology, because it's rooted in self identity and the appearance of intelligence (doesn't have to actually be intelligence).
Correlation is not causation, but correlation combined with a system that intertwines belief, personal motives/self-identity/self-image, and crowd validation can turn correlation into causation.
It's even possible, per Hume, that causation doesn't actually exist.
What is so implausible about this hypothesis?
But I'm puzzled by the framing. If I were looking at the relationship, I'd be asking questions like: Is HN a good place to post if I want to make something popular? If I learn about something on HN, is it likely to become popular? If I want to know what the next big thing is, is HN an indicator of that?
If I'm thinking of SO as the cause of something, it'd be mainly in the category of docs: a supporting factor. So I'd be inclined to measure documentation and developer resources more generally.
Even without fully realizing this (see my reply to @tofflos), I did not exclude such a possibility, and that's why I performed the Granger causality test for both hypotheses (SO influencing HN and HN influencing SO).
> But I'm puzzled by the framing. If I were looking at the relationship, I'd be asking questions like: Is HN a good place to post if I want to make something popular? If I learn about something on HN, is it likely to become popular? If I want to know what the next big thing is, is HN an indicator of that?
I guess these are different questions which are not covered by my analysis.
> If I'm thinking of SO as the cause of something, it'd be mainly in the category of docs: a supporting factor. So I'd be inclined to measure documentation and developer resources more generally.
To be honest, I do not understand this one. Could you please elaborate?
So if I'm looking at technology adoption, SO is in a basket of factors that I think of as supporting. Does this technology have a website? Is it easy to get started? Is there a place where I can chat with people? Is there a meetup? Is there a conference? Are there good docs? Are there good videos? Are there blog posts and books? And, of course, are there good questions and answers?
All of these supporting factors can keep somebody on the road to adoption, in that once somebody has decided to try the technology, they aid the person in getting to a useful result. I don't think they're causal in the sense of initiating anything. But if I squint I could call them causal in the sense that if you invest in them, you see increased adoption. Given their substitutability, though, if I were looking at causality I'd try to find a broader metric than SO alone.
Is that helpful?
How do they approach this?
>Such a situation occurred due to greater number of downvotes than upvotes. This may be result of high number of duplicates since 2014 or the questions which were not formulated in a clear way or were not reproducible (and therefore were downvoted).
- number of comments for questions from a certain life span,
- number of views of questions from a certain life span,
- number of replies for questions from a certain life span.
Perhaps it's just me, but questions (on SO) could be indicative of poor documentation, or of a lack of answers elsewhere. The latter, obviously, runs somewhat counter to popularity.
Off-topic: I do not know how to measure and compare the quality of documentation, and I am curious whether there are any methods; so if you know some and could elaborate on this, I would be grateful.
> To sum up: does popularity of technology on StackOverflow (SO) influence popularity of post about this technology on Hacker News (HN)? There seems to be a relationship between those two portals but I could not determine that popularity on Stack Overflow causes popularity on Hacker News.
In a preliminary way, however, it seemed there used to be a stronger correlation between length of comment (as a proxy for thoughtfulness, perhaps crossed with reading level) and comment score. It also seemed there were fewer of the "I know nothing about field X, but let me pontificate on X" comments. Frequency of one-liners has probably increased, in relative terms. We could probably also look at the relative frequency of comments of the type "Thank you for your (specialist) explanation" or "Very interesting!" over time, as a more or less direct proxy for quality.
One might also be able to come up with a measure of "feeling of safety." In other words, the crowd probably perceives the level of discourse and guards their contributions accordingly. For example, "How do you measure the decline?" is interpretable as a technical question, or as a rhetorical question with "one" replacing "you," or as a snarky attack (by reading it with a sneering emphasis on "you," to indicate skepticism that a decline has happened), and so on. The general tone of replies might serve as an indicator, especially if crossed with something like days-since-joined-HN or karma.
But, really, my original comment was meant to contribute something positive while calling out, en passant, a particularly egregious kind of comment behavior that I am saddened to see on HN, particularly since I perceive a general decline over time.
Full(ish) disclosure: I burned down a previous account because I had made a shameful anti-Muslim comment a la Sam Harris, whom I'd been reading (and was inexplicably taken with) at the time. I could no longer edit or remove the comment by the time I'd come to my senses, so opted to irreversibly change the password on that account to gibberish. This is all just to point out that, despite my apparent HN youth, I can indeed seem to remember a time when thoughtful comments that represented either expertise or care elicited far more upvotes than they do now, and a time when short quips and adolescent snark elicited far swifter and more brutal karmic beat-downs.
The topic of how to define causality has kept philosophers busy for over two thousand years and has yet to be resolved. It is a deep convoluted question with many possible answers which do not satisfy everyone, and yet it remains of some importance. Investigators would like to think that they have found a "cause", which is a deep fundamental relationship and possibly potentially useful.
In the early 1960's I was considering a pair of related stochastic processes which were clearly inter-related and I wanted to know if this relationship could be broken down into a pair of one way relationships. It was suggested to me to look at a definition of causality proposed by a very famous mathematician, Norbert Wiener, so I adapted this definition (Wiener 1956) into a practical form and discussed it.
Applied economists found the definition understandable and useable and applications of it started to appear. However, several writers stated that "of course, this is not real causality, it is only Granger causality." Thus, from the beginning, applications used this term to distinguish it from other possible definitions.
The basic "Granger Causality" definition is quite simple. Suppose that we have three terms, Xt , Yt , and Wt , and that we first attempt to forecast Xt+1 using past terms of Xt and Wt . We then try to forecast Xt+1 using past terms of Xt , Yt , and Wt . If the second forecast is found to be more successful, according to standard cost functions, then the past of Y appears to contain information helping in forecasting Xt+1 that is not in past Xt or Wt . In particular, Wt could be a vector of possible explanatory variables. Thus, Yt would "Granger cause" Xt+1 if (a) Yt occurs before Xt+1 ; and (b) it contains information useful in forecasting Xt+1 that is not found in a group of other appropriate variables.
Naturally, the larger W_t is, and the more carefully its contents are selected, the more stringent a criterion Y_t is passing. Eventually, Y_t might seem to contain unique information about X_{t+1} that is not found in other variables, which is why the "causality" label is perhaps appropriate.
The definition leans heavily on the idea that the cause occurs before the effect, which is the basis of most, but not all, causality definitions. Some implications are that it is possible for Y_t to cause X_{t+1} and for X_t to cause Y_{t+1}, a feedback stochastic system. However, it is not possible for a determinate process, such as an exponential trend, to be a cause or to be caused by another variable.
It is possible to formulate statistical tests for what I now designate as G-causality, and many are available and are described in some econometric textbooks (see also the following section and the references). The definition has been widely cited and applied because it is pragmatic and easy to understand and apply. It is generally agreed that it does not capture all aspects of causality, but enough to be worth considering in an empirical test.
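(Again an aside from me, not part of the quoted text: the textbook bivariate version of such a G-causality test is a one-liner in statsmodels, shown here on invented data constructed so that y genuinely helps forecast x.)

    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(0)

    # Invented stationary pair: y helps forecast x one step ahead.
    y = rng.normal(size=200)
    x = 0.5 * np.roll(y, 1) + rng.normal(0.0, 1.0, 200)

    # Tests whether the SECOND column Granger-causes the first, at lags 1..4.
    grangercausalitytests(np.column_stack([x, y]), maxlag=4)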
There are now a number of alternative definitions in economics, but they are little used as they are less easy to implement.