
A Brief History of the Current Empirical Software Engineering Science Turf War - luu
https://buttondown.email/hillelwayne/archive/science-turf-war/
======
fanf2
There's another review of the same study/replication/rebuttal at
[http://shape-of-code.coding-guidelines.com/2019/11/20/a-stud...](http://shape-of-code.coding-guidelines.com/2019/11/20/a-study-a-replication-and-a-rebuttal-se-research-is-starting-to-become-serious/)
which agrees that the rebuttal is unconvincing and that the replication study
rightly showed the original studies were seriously flawed.

------
hwayne
The day after I wrote this, the replication authors came out with their
rebuttal rebuttal:
[http://janvitek.org/var/rebuttal-rebuttal.pdf](http://janvitek.org/var/rebuttal-rebuttal.pdf)

~~~
wwwigham
> The projects should be chosen by controlling their characteristics rather
> than relying on GitHub “stars” which capture popularity and are unrelated to
> software development.

> “stars” ... are ... unrelated to software development.

I wouldn't call sentiment about software _entirely_ unrelated to software
development - minimally I'd assume that more popular projects, receiving more
traffic due to higher engagement, would also receive more scrutiny, and thus
have more bug(fixe)s. That would be an interesting characteristic to attempt
to select for, no?

> Part of issue 9 is left unchallenged: 34% of the remaining TypeScript
> commits are to type declarations. In TypeScript, some files do not contain
> code, they only have function signatures. These are the most popular and
> biggest projects in the dataset. We corrected for this by removing
> TypeScript.

Those are... still code? At least as much as header files in C are, and they
can totally still contain bugs. Now, if we're talking about how they're often
redistributable copies (like headers), especially way back at the time of the
paper before @types came about, and how duplication due to that may inflate
meaningful SLOC, that's interesting. It would be nice to see a good argument
for correcting and controlling for it, rather than dropping it outright. Then
again, TS has grown a ton in the intervening years, and the "largest" repos
listed are now largely unused and wholly obsolete. You'd probably get
different results with fresher data, too.
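
To make the "declaration files are still code" point concrete, here's a
minimal, entirely hypothetical .d.ts sketch (names invented); it contains
only signatures, yet a wrong signature is a real defect that every caller
type-checks against:

    // math-utils.d.ts -- hypothetical declaration file: signatures only, no bodies.
    // A wrong signature here is still a bug, because callers compile against it.
    export declare function clamp(value: number, min: number, max: number): number;

    // Suppose the real implementation can return undefined for a missing key;
    // this declared return type would let callers skip that check entirely.
    export declare function lookup(key: string): string;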

> The classification of languages is wrong: consider Scala. In FSE, it is
> lumped with Clojure, Erlang & Haskell under the “Functional Paradigm”. For
> this to be meaningful, there must exist some shared attribute these
> languages have that makes programs written in them similar. Referential
> transparency and higher order functions could be that. But, while Scala has
> higher-order functions, it is imperative. So, it is not a perfect match.
> Worse: Java also has higher-order functions, yet it isn’t in that group.

Many newer languages are multiparadigm, but I'd probably still say that Java
encourages an imperative OO style, while Scala prefers a functional
composition style. I don't think "a single unifying characteristic" really
helps, given the feature bleed between modern languages. I think you only get
a great classification by examining, e.g., what the community style standards
are.
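
As a rough sketch of that feature bleed (TypeScript here purely as an
illustration, since the same split exists in Java and Scala), the identical
computation can be written in either register within one language:

    // Imperative style: explicit loop and mutation.
    function sumOfSquaresImperative(xs: number[]): number {
      let total = 0;
      for (const x of xs) {
        total += x * x;
      }
      return total;
    }

    // Functional composition style: higher-order functions, no mutation.
    const sumOfSquaresFunctional = (xs: number[]): number =>
      xs.map(x => x * x).reduce((acc, x) => acc + x, 0);

Which of the two a codebase favours is mostly a community-style question, not
something a one-word paradigm label captures.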

It'd be nice if open data studies like the one under discussion here were
reliably posted in a format where the analysis could be easily recalculated,
readjusted, tweaked, and re-presented. It's the kinda thing that could be neat
to play around with in a notebook.

~~~
TeMPOraL
GitHub stars are first and foremost _bookmarks_. Not expressions of sentiment,
not votes on quality, just _bookmarks_. You see an interesting repo, you star
it so that you can find it later.

------
jrumbut
Some things are worth knowing and some things are easy to measure; rarely do
they overlap. The number of commits mentioning bugs may not be either, but
given how little hard data there is out there, I applaud any effort.

Someday I hope a researcher convinces a few companies to open up their JIRA
histories and is able to really study project completion/success/user reports
of bugs.

~~~
kortilla
> Number of commits mentioning bugs may not be either, but given how little
> hard data there is out there I applaud any effort.

I disagree. The original papers were widely cited and contained such serious
flaws that they were actively harmful. Assuming a commit message mentioning an
“infix operator” is a bug fix is just grossly negligent.

------
ncmncm
tl;dr: Statistical analyses of GitHub are still a dumpster fire of flawed
metrics. Regex matching of commit messages substitutes for bug counts, and
regex matching of source files substitutes for language identification. It is
garbage in, garbage out, with no defensible conclusions.
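
To be concrete about the commit-message problem, here is a minimal sketch of
that style of keyword matching (the pattern below is invented for
illustration, not the one actually used in the papers); it shows how easily
the approach over-counts:

    // Naive "bug commit" classifier: flag any message containing a keyword.
    // Without word boundaries, "infix" matches "fix" -- the kind of false
    // positive called out elsewhere in this thread.
    const bugPattern = /(bug|fix|error|fault|defect)/i;

    const commits = [
      "Fix off-by-one in pagination",            // genuine bug fix
      "Document error-handling conventions",     // documentation, not a fix
      "Add infix operator for vector addition",  // new feature, no bug at all
    ];

    const flagged = commits.filter(msg => bugPattern.test(msg));
    console.log(flagged.length); // 3 -- all three flagged, two are false positives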

------
cryptica
These studies are bs because they capture correlations and confuse them with
causation.

Good studies need controlled laboratory conditions to remove external
variables.

For example, maybe FP projects are written by coders who have more years of
experience on average. In that case it's not about the language paradigm; any
facts derived from the study would actually be about the individuals who
write in that language. And it's not surprising at all that more experienced
individuals write fewer bugs. This is just one factor that may be ignored;
there are probably hundreds that can cause significant distortions in the
results.

It would be like saying that Norwegian is a highly effective language because
Norway has a very high GDP per capita. This completely ignores the more
significant fact that the country has vast amounts of offshore oil to
capitalize on.
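
As a toy sketch of that confounding mechanism (every number below is invented
purely for illustration, not taken from any study):

    // Hypothetical projects where team experience, not paradigm, drives bug rate.
    type Project = { paradigm: "FP" | "OO"; seniorTeam: boolean; bugRate: number };

    const projects: Project[] = [
      { paradigm: "FP", seniorTeam: true,  bugRate: 0.02 },
      { paradigm: "FP", seniorTeam: true,  bugRate: 0.03 },
      { paradigm: "FP", seniorTeam: false, bugRate: 0.08 },
      { paradigm: "OO", seniorTeam: true,  bugRate: 0.02 },
      { paradigm: "OO", seniorTeam: false, bugRate: 0.08 },
      { paradigm: "OO", seniorTeam: false, bugRate: 0.09 },
    ];

    const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
    const rate = (pred: (p: Project) => boolean) =>
      mean(projects.filter(pred).map(p => p.bugRate));

    // Raw comparison: FP looks better only because more of its teams are senior.
    console.log(rate(p => p.paradigm === "FP")); // ~0.043
    console.log(rate(p => p.paradigm === "OO")); // ~0.063

    // Stratified by experience, the apparent paradigm effect mostly vanishes.
    console.log(rate(p => p.paradigm === "FP" && p.seniorTeam));  // 0.025
    console.log(rate(p => p.paradigm === "OO" && p.seniorTeam));  // 0.020
    console.log(rate(p => p.paradigm === "FP" && !p.seniorTeam)); // 0.080
    console.log(rate(p => p.paradigm === "OO" && !p.seniorTeam)); // 0.085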

~~~
xgk

> studies are bs because

_We should not let the perfect be the enemy of the good._ The problem is that
we do not currently know how to carry out studies on the efficacy of
programming language paradigms under "controlled laboratory conditions". It's
known to be hard. Feel free to change this and become famous! The studies
discussed in the article are but first steps towards:

- a better empirical grounding of programming language & software engineering
research;

- more emphasis on reproducibility in science.

I'm glad that Vitek, Berger, et al. are starting serious empirical PL/SE
research, and that they care about reproducibility! Bravo!

