
Popularity of technology on Stack Overflow and Hacker News: Causality Analysis - l____
https://github.com/dgwozdz/HN_SO_analysis
======
notafraudster
None of this is causal. For a problem to be, in a statistical sense, causally
identified there must be some random or as-if random manipulation of
treatment. The two major ways of thinking about causality in statistics are
Judea Pearl's DAGs (a representation of causes and effects as an acyclic graph
where pathways between variables must be clear of "colliders" which threaten
causal validity) and the Neyman-Rubin causal model (also called "potential
outcomes", where a unit's outcomes under treatment and control,
hypothetically, are considered).

One example of an identification strategy here would be to find two languages
that are identical in all regards (including in their overall popularity and
exposure across the world), but where one was more popular than the other on
SO specifically. This would be a matched selection on observables strategy,
where selection into "treatment" (popularity on SO) is not globally random,
but random conditional on certain pre-treatment covariates, such that the non-
treatment potential outcome of both languages would be expected to be the same
and the difference between the observed outcomes (the level of popularity of
both languages on HN) is only a product of the treatment.

Here's a simple inferential threat; the author ascribes the popularity of
technology on SO as causing the popularity of a technology on HN. What if,
instead, some third common cause caused both, but it caused SO spikes faster
than HN spikes? Now, in the world I've described (where the true treatment
effect is zero), what statistical test involving comparing SO and HN data,
even incorporating temporal ordering, would correctly come up with an estimate
of 0? If your answer does not come up with an estimate of 0, then its real-
world causal estimate is also presumptively wrong.
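
To make that concrete, here's a toy simulation (the "common cause" and all parameters are made up) in which the true SO-to-HN effect is zero, yet past SO values strongly "predict" future HN values, simply because both series react to a shared driver at different speeds:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Latent common cause, e.g. buzz from a product release (a random walk).
z = rng.normal(size=n).cumsum()

# SO reacts to z with a 1-step lag, HN with a 3-step lag; neither series
# causally affects the other, so the true SO -> HN effect is zero.
so = np.roll(z, 1) + rng.normal(scale=0.5, size=n)
hn = np.roll(z, 3) + rng.normal(scale=0.5, size=n)
so[:3] = 0  # discard wrap-around artifacts from np.roll
hn[:3] = 0

# Naive lead-lag correlation: past SO "predicts" future HN, purely
# because SO picks up the common cause sooner.
lag = 2
r = np.corrcoef(so[:-lag], hn[lag:])[0, 1]
print(f"corr(SO_t, HN_t+{lag}) = {r:.2f}")  # strongly positive despite zero effect
```

Any test that reads this lead-lag structure as causal gets the wrong answer by construction.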

I also have concerns about how the author measured both treatment and outcome.

Overall I think there is an interesting DESCRIPTIVE (non-causal) question
somewhere in this article, but it's bogged down by the author trying to apply
something they heard about from a Wikipedia article as though it were a
substitute for taking causality seriously. We've all heard Alexander Pope's
adage that "a little learning is a dangerous thing".

~~~
stared
Measuring causality is hard. Without randomized control trials, you pretty
much need to assume some causal graph structure (plain data is not enough).

See: "ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus"
[http://www.inference.vc/untitled/](http://www.inference.vc/untitled/)

When it comes to this SO vs HN article, based on the data one can look at
correlations. Even with delayed correlations, that still does not imply
causation (e.g. the HN crowd may be faster to post links to new technologies
than the SO crowd is to ask questions).

~~~
nerdponx
Sure, but you need to make your assumed graph convincing. As the GP said,

 _Here's a simple inferential threat; the author ascribes the popularity of
technology on SO as causing the popularity of a technology on HN. What if,
instead, some third common cause caused both, but it caused SO spikes faster
than HN spikes?_

You either need to account for this, or make a convincing argument that no
such common cause exists.

~~~
stared
Sure, I fully agree with that.

------
mlthoughts2018
> “After each time series is transformed (if necessary) to a stationary one, a
> Granger causality test is performed.”

This is actually methodologically incorrect. You test Granger non-causality by
taking a VAR system and you _do not difference the data for stationarity!_ You
use the cointegration order to determine how many extra lags of the exogenous
variable to use beyond the order p that defines the lags tested against the
null hypothesis (what the authors mention is tested by AIC, BIC, and observing
autocorrelation of residuals).

> “The ADF test, differencing and the Granger causality test were performed on
> data aggregated to monthly frequency”

Hmm. It looks like differencing was done based on the ADF test... this is not
valid for Granger causality.

Remember: with frequentist statistical tests, there are many fraught issues
like this.

In the end, a test like this only talks about the following:

“Under the assumed hypothesis of no causality, how extreme do the observed
test statistics appear to be?”

— where the test statistics are in part based on estimated model parameters
that are highly sensitive to methodological errors, like differencing prior to
computing the coefficient-is-non-zero tests in this Granger VAR model.

In the end it makes me skeptical of interpreting any of the results from the
OP.

A good explanation is here: [0].

[0]: <[http://davegiles.blogspot.com/2011/04/testing-for-granger-causality.html](http://davegiles.blogspot.com/2011/04/testing-for-granger-causality.html)>

~~~
thanatropism
They're "data scientists", what did you expect.

~~~
mlthoughts2018
I expect they are paid 2x more than trained statisticians unfortunately.
Happily though, they didn’t try to solve this problem with Judea Pearl-style
causal inference hype.

A better approach to this would probably be treating the data as panel data
and either doing monthwise hierarchical Bayesian regressions and testing
whether the coefficients pooled over time are significantly non-zero, or using
a sort of distributed lag model or instrumental variables model on the
monthwise simple regressions. In these cases, it's better to roll up lagged
features of the exogenous time series to serve as constructed covariates in
the lag model than to rely on differencing and time series tests, which are a
landmine of interpretation issues.

Checking causality in models like this is _very_ tricky, so a lot of care has
to be taken, and you should expect work to proceed slowly and require a lot of
posterior checks.
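
For what it's worth, the "roll up lagged features as constructed covariates" step might look like this in pandas (the panel, column names, and lag window are all hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical monthly panel: one row per (language, month).
df = pd.DataFrame({
    "language": np.repeat(["python", "rust"], 24),
    "month": list(pd.period_range("2016-01", periods=24, freq="M")) * 2,
    "so_score": rng.poisson(50, 48),
    "hn_score": rng.poisson(20, 48),
})

# Construct lagged SO covariates per language for a distributed-lag model,
# instead of differencing the raw series.
so_by_lang = df.groupby("language")["so_score"]
for k in (1, 2, 3):
    df[f"so_lag{k}"] = so_by_lang.shift(k)

df = df.dropna()  # drop the first months without a full lag window
print(df.head())
```

Each row then carries its own lag window, so the monthwise regressions can treat the lags as ordinary covariates.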

~~~
l____
Hey, thank you for the informative comments. I would gladly read more about
hierarchical Bayesian regression. Would you recommend any sources from which I
could learn more?

(I am fully aware that I can google it, but in my opinion it's usually better
to ask someone who is already familiar with the topic, because he/she may know
good learning materials.)

~~~
mlthoughts2018
\- <[http://www.stat.columbia.edu/~gelman/book/](http://www.stat.columbia.edu/~gelman/book/)>

\- <[http://www.stat.columbia.edu/~gelman/arm/](http://www.stat.columbia.edu/~gelman/arm/)>

If you happen to use Python, pymc3 is a great place to look for hierarchical
and Bayesian timeseries models with full examples.

~~~
l____
Thank you. One of my personal goals with this analysis was to learn Python
on the go (I feel more comfortable in R), so I will definitely check out
pymc3.

------
DeusExMachina
As is often repeated, "correlation is not causation".

Personally, I would only expect the former and not the latter. In my opinion
it's more plausible that some other cause drives the increase in popularity on
both sites.

The rise of Swift, for example, was driven first by Apple releasing the
language, then by the growth of online articles and communities around it, and
by its open-sourcing.

This led to an increase in popularity pretty much everywhere, including SO
and HN. These are correlated, but there is no causal relationship between the
two.

~~~
ChrisLomont
That's why the author didn't simply look at correlation, but used Granger
_causality_ tests.

~~~
red75prime
And then the author remarked that Granger causality doesn't indicate
causation. It is a measure of how well one variable predicts upcoming changes
in another. It doesn't mean that direct manipulation of the first variable
will necessarily influence the second one.

~~~
thanatropism
Causation is an ill-defined concept. Already Aristotle had to define several
kinds of causation, and philosophers still struggle with it.

It's even possible, per Hume, that causation doesn't actually exist.

------
wolfgke
> Additionally, in case of two technologies (JQuery and Tensorflow) the
> variables regarding data from SO were pointed out as potential results of
> variables from Hacker News. The idea of Hacker News influencing popularity
> of technology on Stack Overflow is not so easy to accept (at least for me)
> as the opposite one, nevertheless, it shouldn’t be entirely disregarded.

What is so implausible about this hypothesis?

~~~
tofflos
I came here to say the exact same thing. To me the hypothesis put forward by
the author (SO influencing the popularity on HN) seems unlikely. I visit HN to
find out about new technology and only visit SO when the technology I've
already chosen isn't working out - and even then it's only indirectly via
Google search results.

~~~
wpietri
Yeah, I'm sure the causal arrow goes both ways. You learn about it on HN, try
it out, ask a question on SO. Somebody else hears of the tech, visits SO, sees
that there's activity, tries it out, and mentions it on HN.

But I'm puzzled by the framing. If I were looking at the relationship, I'd be
asking questions like: Is HN a good place to post if I want to make something
popular? If I learn about something on HN, is it likely to become popular? If
I want to know what the next big thing is, is HN an indicator of that?

If I'm thinking of SO as the cause of something, it'd be mainly in the
category of docs: a supporting factor. So I'd be inclined to measure
documentation and developer resources more generally.

~~~
l____
> Yeah, I'm sure the causal arrow goes both ways.

Even without fully realizing this (see my reply to @tofflos), I did not
exclude such a possibility, and that's why I performed the Granger causality
test for both hypotheses (SO influencing HN and HN influencing SO).

> But I'm puzzled by the framing. If I were looking at the relationship, I'd
> be asking questions like: Is HN a good place to post if I want to make
> something popular? If I learn about something on HN, is it likely to become
> popular? If I want to know what the next big thing is, is HN an indicator of
> that?

I guess these are different questions which are not covered by my analysis.

> If I'm thinking of SO as the cause of something, it'd be mainly in the
> category of docs: a supporting factor. So I'd be inclined to measure
> documentation and developer resources more generally.

To be honest, I do not understand this one. Could you please elaborate?

~~~
wpietri
Sure. I personally wouldn't put SO down as the cause of anything in that I
think it's intermediate in the kinds of causal chains I see around software
popularity. I almost never hear about a new technology on SO. I'm there
because I am already using it, have a question, and see a Google result.

So if I'm looking at technology adoption, SO is in a basket of factors that I
think of as supporting. Does this technology have a website? Is it easy to get
started? Is there a place where I can chat with people? Is there a meetup? Is
there a conference? Are there good docs? Are there good videos? Are there blog
posts and books? And, of course, are there good questions and answers?

All of these supporting factors can keep somebody on the road to adoption, in
that once somebody has decided to try the technology, they aid the person in
getting to a useful result. I don't think they're causal in the sense of
initiating anything. But if I squint I could call them causal in the sense
that if you invest in them, you see increased adoption. Given their
substitutability, though, if I were looking at causality I'd try to find a
broader metric than SO alone.

Is that helpful?

------
tannhaeuser
I wonder if we could use this kind of correlation analysis to detect
shilling/astroturfing, like when there's a spike on HN that isn't backed up by
SO for a given topic. It's also timely I guess since MS seems to have been on
a spree lately on reddit and elsewhere.

~~~
solarkraft
Meh: a more developer-friendly product would cause less activity on Stack
Overflow, right? How about frameworks with their own forums?

------
amelius
Isn't this one of the fundamental problems of the advertising industry? I.e.,
answering the question "did this ad cause the increase in sales?"

How do _they_ approach this?

~~~
citrablue
There are two primary approaches, both with their issues. Media Mix Modeling
is the top down approach, where you input all marketing activities over time
with corresponding response (sales) data. Direct attribution (multi-touch
attribution) attempts to identify each touchpoint a specific customer had with
your brand, and uses that to infer causality.

[https://medium.com/@BenHinson/understanding-the-
difference-b...](https://medium.com/@BenHinson/understanding-the-difference-
between-digital-attribution-and-media-mix-modeling-c4f7b7a53bbc)

------
stephanheijl
The graphs shown are cumulative, which is defined by the README to be the sum
of all the points up to that date (the most common definition). However, some
graphs show a dropping number of points, like the one for Java, even in the
non-standardized plot.
([https://raw.githubusercontent.com/dgwozdz/HN_SO_analysis/mas...](https://raw.githubusercontent.com/dgwozdz/HN_SO_analysis/master/readme_vis/plots/20180602_java_so_score_sum_cum_hn_all_score_sum_cum_double.png))
Would this indicate some sort of error in the data collection, or did I
misunderstand the "cumulative" label?

~~~
CapitalistCartr
He explains that when discussing HTML.

> Such a situation occurred due to greater number of downvotes than upvotes.
> This may be result of high number of duplicates since 2014 or the questions
> which were not formulated in a clear way or were not reproducible (and
> therefore were downvoted).
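
In other words, the cumulative score falls in any month where downvotes outnumber upvotes; a tiny illustration with made-up monthly net scores:

```python
# Hypothetical monthly net scores (upvotes minus downvotes) for a tag.
monthly_net_score = [5, 3, -2, -4]

cumulative = []
total = 0
for s in monthly_net_score:
    total += s
    cumulative.append(total)

print(cumulative)  # [5, 8, 6, 2] -- the cumulative series declines at the end
```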

~~~
stephanheijl
Thanks for the answer, I missed that paragraph. It seems quite disconcerting
to me that the average question in the Java tag has been rated negatively the
last 4 years. I'm curious as to the specific reason for this downward trend,
which isn't reflected in most of the other languages.

~~~
Macha
One thing that comes to mind for the three which exhibit these patterns is
that they're high-volume, slow-moving ecosystems. Given the SO community's
penchant for closing questions as duplicates, I wonder if we're just at the
point where these languages have exhausted their supply of "sufficiently-
unique-and-not-homeworky questions for SO".

~~~
l____
Hey, I was thinking about writing something like that in the analysis. I
finally decided not to because languages develop over time. I'm not a user of
Java/HTML/Pascal/PHP, but I use SQL quite frequently, and in my opinion it is
possible to ask a well-defined question that has not appeared before, at least
for some "dialects" (if I'm naming it correctly) like T-SQL or Oracle SQL.
Maybe if those "dialects" were investigated, their trends would not be
downward.

------
echelon
You mentioned Rust, but it wasn't included in your analysis. Would you be able
to regenerate the analysis easily to include it? I'd be super interested in
seeing trends around Rust.

~~~
l____
I only reported plots where I found something interesting or saw some
resemblance between HN and SO, and in the case of Rust probably neither was
the case. However, it must be said that I didn't have any quantitative
criterion for this, and the decision was purely subjective.

------
thanatropism
The easiest way to perform this analysis is to use a statistical package that
features Vector Autoregression and Vector-valued Error Correction Models.

Example:
[http://www.statsmodels.org/dev/generated/statsmodels.tsa.vec...](http://www.statsmodels.org/dev/generated/statsmodels.tsa.vector_ar.var_model.VARResults.test_causality.html)

~~~
l____
Thanks! I wasn't aware that such a function exists.

------
chiefalchemist
\- number of times questions from a certain time span (e.g. from a given day)
were tagged as a favourite,

\- number of comments for questions from a certain life span,

\- number of views of questions from a certain life span,

\- number of replies for questions from a certain life span.

Perhaps it's just me, but questions (on SO) could be indicative of poor
documentation, or of a lack of answers elsewhere. The latter, obviously, runs
somewhat counter to popularity.

~~~
l____
Thanks for pointing this out; I didn't think about this. Nevertheless, I
assume that in an ideal situation, where the documentation for two programming
languages is of the same quality, the language with the larger group of users
would generate more questions, so I still consider this measure valid. I am
also aware that we do not live in an ideal world.

Off-topic: I do not know how to measure and compare the quality of
documentation and am curious whether there are any methods, so if you know
some and could elaborate on this, I would be grateful.

------
baxtr
At the bottom of the README you’ll find the answer:

 _> To sum up: does popularity of technology on StackOverflow (SO) influence
popularity of post about this technology on Hacker News (HN)? There seems to
be a relationship between those two portals but I could not determine that
popularity on Stack Overflow causes popularity on Hacker News._

~~~
contingencies
Common sense would dictate that one website does not cause another website to
do anything, particularly sites with vast communities. This is a classic case
of someone misunderstanding the utility of an analytical technique. IMHO they
may have executed it correctly (I didn't bother reading) but they apparently
failed to ask the right question in the first place, or to comprehend the
limited utility of the result.

~~~
theothermkn
It would be interesting to see if the prevalence of comments of the common
internet form “I am deliberately ignorant, but I think...” caused a decline in
quality on HN, or if the decline in HN quality was caused by the prevalence of
such comments.

~~~
baxtr
How do you measure the decline?

~~~
theothermkn
Measuring it would mean first figuring out what was valuable in HN and then
tracking that. It may turn out that what was valuable isn't easily measurable,
even with ML.

In a preliminary way, however, it seemed there used to be a stronger
correlation between length of comment (as a proxy for thoughtfulness, perhaps
crossed with reading level) and comment score. It also seemed there were fewer
of the "I know nothing about field X, but let me pontificate on X" comments.
Frequency of one-liners has probably increased, in relative terms. We could
probably also look at the relative frequency of comments of the type "Thank
you for your (specialist) explanation" or "Very interesting!" over time, as a
more or less direct proxy for quality.

One might also be able to come up with a measure of "feeling of safety." In
other words, the crowd probably perceives the level of discourse and guards
their contributions accordingly. For example, "How do you measure the
decline?," is interpretable as a technical question, or as a rhetorical
question with "one" replacing "you," or as a snarky attack (by reading it with
a sneering emphasis on "you," to indicate skepticism that a decline has
happened), and so on. The general tone of replies might serve as an indicator,
especially if crossed with something like days-since-joined-HN or karma.

But, really, my original comment was meant to contribute something positive
while calling out, en passant, a particularly egregious kind of comment
behavior that I am saddened to see on HN, particularly since I perceive a
general decline over time.

Full(ish) disclosure: I burned down a previous account because I had made a
shameful anti-Muslim comment a la Sam Harris, whom I'd been reading (and was
inexplicably taken with) at the time. I could no longer edit or remove the
comment by the time I'd come to my senses, so opted to irreversibly change the
password on that account to gibberish. This is all just to point out that,
despite my apparent HN youth, I can indeed seem to remember a time when
thoughtful comments that represented either expertise or care elicited far
more upvotes than they do now, and a time when short quips and adolescent
snark elicited far swifter and more brutal karmic beat-downs.

~~~
baxtr
Thank you for your thorough explanation. My question was indeed sincere, in
that I wanted to understand how you perceive the decline. And your statement
makes sense to me. I think HN is becoming much more mainstream these days; my
feeling is that it's growing out of its niche. That is probably driving "the
decline". I don't know whether this is a good or a bad thing. Maybe, however,
the development makes a good case for an HN alternative once enough people
think the way you do.

------
bencollier49
Given that he's tested for lots of languages, shouldn't he be applying
something like a Bonferroni correction to the Granger causality results?
Wouldn't this have the effect of making them all insignificant?
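
For illustration, Bonferroni just divides the significance level by the number of tests; with, say, five languages (the p-values below are made up), many "significant" results can indeed disappear:

```python
# Hypothetical p-values from Granger tests across five technologies.
pvals = [0.012, 0.034, 0.004, 0.21, 0.048]
alpha = 0.05

# Bonferroni: compare each p-value to alpha / m (equivalently, multiply p by m).
adjusted_alpha = alpha / len(pvals)
significant = [p for p in pvals if p < adjusted_alpha]

print(adjusted_alpha)  # 0.01
print(significant)     # [0.004] -- only one result survives the correction
```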

------
nfrankel
"Correlation doesn't imply causation"

~~~
l____
Hey, thanks for the comment. I am fully aware of that and did not state
otherwise in the analysis. I only established that there seems to be some kind
of relationship, but it is not possible to determine causality.

------
oli5679
This source includes a helpful summary of Granger causality, written by
Granger himself.
[http://www.scholarpedia.org/article/Granger_causality](http://www.scholarpedia.org/article/Granger_causality)

 _The topic of how to define causality has kept philosophers busy for over two
thousand years and has yet to be resolved. It is a deep convoluted question
with many possible answers which do not satisfy everyone, and yet it remains
of some importance. Investigators would like to think that they have found a
"cause", which is a deep fundamental relationship and possibly potentially
useful.

In the early 1960's I was considering a pair of related stochastic processes
which were clearly inter-related and I wanted to know if this relationship
could be broken down into a pair of one way relationships. It was suggested to
me to look at a definition of causality proposed by a very famous
mathematician, Norbert Wiener, so I adapted this definition (Wiener 1956) into
a practical form and discussed it.

Applied economists found the definition understandable and useable and
applications of it started to appear. However, several writers stated that "of
course, this is not real causality, it is only Granger causality." Thus, from
the beginning, applications used this term to distinguish it from other
possible definitions.

The basic "Granger Causality" definition is quite simple. Suppose that we have
three terms, X_t, Y_t, and W_t, and that we first attempt to forecast X_{t+1}
using past terms of X_t and W_t. We then try to forecast X_{t+1} using past
terms of X_t, Y_t, and W_t. If the second forecast is found to be more
successful, according to standard cost functions, then the past of Y appears
to contain information helping in forecasting X_{t+1} that is not in past X_t
or W_t. In particular, W_t could be a vector of possible explanatory
variables. Thus, Y_t would "Granger cause" X_{t+1} if (a) Y_t occurs before
X_{t+1}; and (b) it contains information useful in forecasting X_{t+1} that is
not found in a group of other appropriate variables.

Naturally, the larger W_t is, and the more carefully its contents are
selected, the more stringent a criterion Y_t is passing. Eventually, Y_t might
seem to contain unique information about X_{t+1} that is not found in other
variables, which is why the "causality" label is perhaps appropriate.

The definition leans heavily on the idea that the cause occurs before the
effect, which is the basis of most, but not all, causality definitions. Some
implications are that it is possible for Y_t to cause X_{t+1} and for X_t to
cause Y_{t+1}, a feedback stochastic system. However, it is not possible for a
determinate process, such as an exponential trend, to be a cause or to be
caused by another variable.

It is possible to formulate statistical tests for what I now designate as
G-causality, and many are available and are described in some econometric
textbooks (see also the following section and the references). The definition
has been widely cited and applied because it is pragmatic, easy to understand,
and easy to apply. It is generally agreed that it does not capture all aspects
of causality, but enough to be worth considering in an empirical test.

There are now a number of alternative definitions in economics, but they are
little used as they are less easy to implement._

