
An alarming number of scientific papers contain Excel errors - pns
https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-alarming-number-of-scientific-papers-contain-excel-errors/
======
denzil_correa
Previous discussion -
[https://news.ycombinator.com/item?id=12349391](https://news.ycombinator.com/item?id=12349391)

------
ramblenode
MS Excel is _absolutely_ unfit for most scientific and engineering problems.

The spreadsheet GUI, lack of good version tracking/history, and eagerness to
coerce data types and "correct" values make it easy to introduce errors that
go unrecognized and propagate through calculations. Unfortunately this
story just keeps repeating itself.

But all of this is just a secondary concern to Excel's real trouble: its
history of incorrectly implementing numerical and statistical procedures. One
could plumb the depths of this topic for hours, but here are a few highlights:
regression formula accepts illegal/nonsensical inputs (e.g. collinear
predictors) and gives illegal/nonsensical outputs [0], variance/standard
deviation change incorrectly with sample size [0], output of a paired t-test
changes when missing values are included [0], formulas are mislabeled [0], v.
2007 gives very wrong answers to 11 of 27 tests in the NIST test suite used
for statistical software benchmarks [1], the random number generator was
broken as late as v. 2007 [1], and calculations relying on any of 12
particular floats display an incorrect result [2]. There are plenty of other
issues mentioned in the links and elsewhere; if you're interested you'll have
no trouble finding them.
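The variance problem is easy to reproduce in general terms with the textbook one-pass formula, which subtracts two huge, nearly equal sums. The sketch below is a generic Python illustration of that numerical hazard, not a reconstruction of what any Excel version does internally: it compares the naive formula with Welford's online algorithm on four values near 10^9 whose true sample variance is 30.

```python
import statistics


def naive_variance(xs):
    """Sample variance via the one-pass 'sum of squares' formula (numerically fragile)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in xs:
        n += 1
        total += x
        total_sq += x * x
    # Subtracting two nearly equal ~4e18 quantities cancels almost every significant digit.
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(xs):
    """Sample variance via Welford's online algorithm (numerically stable)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:    ", naive_variance(data))       # negative on IEEE-754 doubles
print("welford:  ", welford_variance(data))     # 30.0
print("reference:", statistics.variance(data))  # 30.0 (exact rational arithmetic internally)
```

Welford's update only ever works with deviations from a running mean, so it never forms the enormous intermediate sums that cancel in the naive version; this is the kind of failure that accuracy benchmarks like the NIST suite are designed to expose.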

Remember, friends don't let friends use Excel for science. :)

[0]
[http://people.stern.nyu.edu/jsimonof/classes/1305/pdf/excelr...](http://people.stern.nyu.edu/jsimonof/classes/1305/pdf/excelreg.pdf)

[1]
[http://www.pages.drexel.edu/~bdm25/excel2007.pdf](http://www.pages.drexel.edu/~bdm25/excel2007.pdf)

[2] [https://blogs.office.com/2007/09/25/calculation-issue-update...](https://blogs.office.com/2007/09/25/calculation-issue-update/)

Edit: clarify and add a new issue I became aware of while researching further.

~~~
krapht
I'll just quote Stroustrup here: "There are only two kinds of languages: the
ones people complain about and the ones nobody uses." In particular you see
the spreadsheet GUI as a downfall, but I think it is a great enabler that has
allowed millions of non-programmers to create useful programs.

~~~
ramblenode
I would counter with "use the right tool for the job."

Excel is great at churning out fast and dirty estimates for low impact work.
The problem is when it's used for large, complex, or important problems
because these are just not Excel's domain--something obvious when looking at
the kinds of features and bug fixes MS has prioritized over the years.

~~~
7952
It is damning that you cannot use a basic computer tool for anything that is
large, complex, or important. Isn't that exactly what computers are for? Call
me idealistic but I want a spreadsheet that can handle millions of rows. Maybe
add a "strict" mode to keep people out of trouble.

~~~
cm2187
You could have an option to disable any formatting in a tab, and then Excel
should be able to handle a billion rows in that tab. But you can't have all
the flexibility of Excel on a massive scale. Even a laptop today is insanely
fast and has an insane amount of memory.

------
dagaci
First I thought this story was about math and numerical errors, but it's
actually about auto-formatting and auto-correction.

"Excel automatically converting gene names to things like calendar dates or
random numbers"

In this case, I think what is needed is some kind of rudimentary knowledge of
data-types. Or perhaps more simply a scientific template which is actually
plain text by default.

But how are people not noticing auto-correction and auto-formatting taking
place!

The only perfect solution is to hire a developer to build you a data entry
system. The developer can build it without needing to fully understand the
science behind it, and then you'll have a human to take the blame for errors
instead of Excel.
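A cheap first line of defence, in the spirit of "rudimentary knowledge of data-types", is to scan identifiers before they ever touch a spreadsheet and flag the ones a type-guessing importer is likely to rewrite. This is a hypothetical Python check; the two patterns (a month-like prefix plus a day number, e.g. SEPT2 or MARCH1, and digits-E-digits strings that read as scientific notation, e.g. an identifier shaped like 2310009E13) are assumptions about what typically gets mangled, not an exhaustive list of what Excel converts.

```python
import re

# Hypothetical risk patterns; extend them for your own identifier conventions.
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|JUL|JULY|AUG|SEP|SEPT|OCT|NOV|DEC)\d{1,2}$",
    re.IGNORECASE,
)
SCI_NOTATION_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def at_risk(symbols):
    """Return the symbols a type-guessing importer might turn into dates or numbers."""
    return [s for s in symbols if MONTH_LIKE.match(s) or SCI_NOTATION_LIKE.match(s)]


genes = ["TP53", "SEPT2", "MARCH1", "BRCA1", "DEC1", "2310009E13"]
print(at_risk(genes))  # ['SEPT2', 'MARCH1', 'DEC1', '2310009E13']
```

Anything flagged can then be forced to text before import, for example by formatting the receiving column as text.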

~~~
cm2187
The worst thing about this formatting is that Excel will behave differently in
different regions. Another headache in large international
corporations/collaborations.

~~~
leaningtower
But that's an advantage too. As a non-native English speaker, I like it when
companies care about my language and culture and don't force dates into
the MM-DD-YYYY format that only Americans use. Yes, of course, we
should all go for YYYY-MM-DD... fine by me, but now you go tell my mom :¬)

------
keithpeter
[https://help.libreoffice.org/Calc/Deactivating_Automatic_Cha...](https://help.libreoffice.org/Calc/Deactivating_Automatic_Changes)

Type an apostrophe at the beginning of the gene name ('MARCH1) or format the
column for gene names as text (click the column letter, then Format | Cells
and select Text).

If people want to use a spreadsheet application for this kind of data
collection (and that is a big if I think) then they perhaps need to have some
agreed lab protocols for setting up and checking the spreadsheets. This is a
known issue in financial circles...

[http://www.eusprig.org/basic-research.htm](http://www.eusprig.org/basic-research.htm)
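One piece of such a protocol could be an automated check on exported gene columns. A minimal Python sketch, assuming the sheet has been saved as CSV and that converted symbols show up in day-month form such as 2-Sep or 1-Mar (how they actually appear depends on the application and locale); the column name and data are made up for illustration.

```python
import csv
import io
import re

# Assumed shape of a symbol that was auto-converted to a date and then exported.
DAY_MONTH = re.compile(r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$")


def mangled_rows(csv_text, column):
    """Yield (row_number, value) for cells in `column` that look like converted dates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        value = row[column].strip()
        if DAY_MONTH.match(value):
            yield row_number, value


exported = "gene,log2fc\nTP53,1.2\n2-Sep,0.4\nBRCA1,-0.7\n1-Mar,2.1\n"
print(list(mangled_rows(exported, "gene")))  # [(3, '2-Sep'), (5, '1-Mar')]
```

Reading with the `csv` module keeps every cell as a plain string, so the check itself cannot introduce the coercion it is looking for.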

~~~
jacques_chester
Ah, you beat me to EUSPRIG. A fine organisation whose research findings give
me the heebie-jeebies.

------
omginternets
When my Ph.D. is finally done (~3 months), I'll post some of the code I've had
to work with daily for the past three years.

"Spaghetti" doesn't even _begin_ to describe it. "Ball of yarn under a cat-
lady's sofa" comes readily to mind, as does gouging my eyes out and amputating
my fingers.

The problem isn't Excel. The problem is scientists.

~~~
thearn4
During grad school, I found that MATLAB, in particular, lends itself to
spaghettification.

I can think of a lot of reasons why that is, but the one-function-per-file and
single flat directory structure of MATLAB programs is part of it.

Language quirks are another, but I could write an entire book about that.

~~~
omginternets
>I found that MATLAB, in particular, lends itself to spaghettification.

How did you guess? ;)

I've been gently pointing people towards Python for this exact reason. The
younger generations need little convincing, but the old dogs would rather
write the same shitty code.

I suppose change takes time.

~~~
thearn4
Python has definitely been a much better way forward for my work!

------
SNvD7vEJ
Why are the auto-convert 'features' in Excel not opt-in?

When Excel encounters the first cell in a new sheet that it thinks should be
auto-converted, why does it not ask if that is desirable for that sheet?

Like: "Do you want Excel to interpret and auto-convert all strings with format
<X> into the type <Y> in this sheet?"

At least for conversions where the original data is lost.
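The opt-in idea can be made concrete with a toy model of cell entry in which conversion is off unless the user enables it, and the typed text is kept alongside any converted value so nothing is lost. This is a hypothetical Python sketch (the `Cell` type, `enter_cell` function, month table, and the default `year` of 2016, chosen to echo the article's example, are all made up for illustration), not how Excel actually stores cells.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Union

# Hypothetical month-name table for symbols like SEPT2 or MARCH1.
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "MARCH": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "SEPT": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}
MONTH_DAY = re.compile(r"^([A-Za-z]+)(\d{1,2})$")


@dataclass(frozen=True)
class Cell:
    raw: str                 # exactly what the user typed; never discarded
    value: Union[str, date]  # interpreted value used in calculations


def enter_cell(raw: str, convert_dates: bool = False, year: int = 2016) -> Cell:
    """Interpret `raw` as a date only when the sheet has opted in to conversion."""
    if convert_dates:
        m = MONTH_DAY.match(raw)
        if m and m.group(1).upper() in MONTHS:
            try:
                return Cell(raw, date(year, MONTHS[m.group(1).upper()], int(m.group(2))))
            except ValueError:  # e.g. FEB30 is not a real date, so keep the text
                pass
    return Cell(raw, raw)


print(enter_cell("SEPT2"))                      # value stays the string 'SEPT2'
print(enter_cell("SEPT2", convert_dates=True))  # value becomes a date, raw text kept
```

Keeping `raw` next to `value` is what makes the conversion reversible: a user who opted in by mistake can always fall back to the original text.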

~~~
JumpCrisscross
I don't think Excel ever destroys the underlying data. It simply layers
formatting over it. One can override by specifying custom formatting, or opt
out by selecting all cells and setting their format to text.

~~~
wongarsu
Once the data is in the cell, changing cell formatting is a nondestructive
visual layer. But while inputting data, cell formatting changes the
interpretation of the input, which is destructive.

The article describes in some detail how inputting SEPT2 in a cell with
default formatting displays 9/2/2016, but is stored as 42615 (which you get if
you later change the cell to text formatting).
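The 42615 in that example is an Excel date serial number. A small Python sketch for turning such serials back into calendar dates, assuming the default Windows 1900 date system (workbooks using the 1904 system count from a different epoch) and serials of 61 or more, where counting days from 1899-12-30 absorbs Excel's treatment of 1900 as a leap year:

```python
from datetime import date, timedelta

# Effective day 0 for serials >= 61 in Excel's default (Windows) 1900 date system.
EXCEL_1900_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel 1900-system serial number (>= 61) to a calendar date."""
    if serial < 61:
        raise ValueError("serials below 61 are affected by the fictitious 1900-02-29")
    return EXCEL_1900_EPOCH + timedelta(days=serial)


print(excel_serial_to_date(42615))  # 2016-09-02
```

Running this over a column of suspicious integers is one way to see which entries were silently turned into dates on input, as a starting point for working out which symbols they were.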

------
vanderZwan
Highly relevant: Felienne[0] Hermans' compsci research on spreadsheets out
there in the wild, and how to develop software engineering tools to make them
better:

[https://www.youtube.com/watch?v=2Cdgew5zvI4](https://www.youtube.com/watch?v=2Cdgew5zvI4)

[http://www.felienne.com/archives/tag/spreadsheets](http://www.felienne.com/archives/tag/spreadsheets)

[0] pronounced Fay-lee-nuh

------
IndianAstronaut
The same thing happened in economics, involving a major figure in the field.

[http://www.bloomberg.com/news/articles/2013-04-18/faq-reinha...](http://www.bloomberg.com/news/articles/2013-04-18/faq-reinhart-rogoff-and-the-excel-error-that-changed-history)

~~~
leaningtower
Yes, they did not get what the word "hide" means in English...

------
hirenj
It's funny to consider that these errors slipped past the peer review stage.
It really highlights the major issue with reviewing source code published as
part of an analysis.

If there aren't enough resources / skilled eyes to catch these simple errors,
what are the chances they would catch errors in source code too?

~~~
pbhjpbhj
How many of the studies are doing anything statistically interesting though
[ie new/different in the field of statistics]. In most cases it should be
possible to have an analysis program, perhaps specialised to the field, in
which you simply load the data attached to the study and look at however many
regressions or time series or whatever you want - they're standard tools,
surely you can download the data and put it in a GUI and look at the same
graph as is in the paper in 2 or 3 clicks?? If not why not?
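For the common case of a simple linear fit, checking a paper's reported coefficients against its attached data really is only a few lines. A hypothetical Python sketch; the data, the "reported" slope and intercept, and the tolerance are made up for illustration:

```python
import math


def ols_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar = math.fsum(xs) / n
    y_bar = math.fsum(ys) / n
    sxy = math.fsum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = math.fsum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def matches_reported(xs, ys, reported_slope, reported_intercept, tol=1e-6):
    """True if the refit agrees with the paper's reported coefficients within `tol`."""
    slope, intercept = ols_fit(xs, ys)
    return math.isclose(slope, reported_slope, abs_tol=tol) and math.isclose(
        intercept, reported_intercept, abs_tol=tol
    )


# Hypothetical supplementary data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(ols_fit(x, y))  # roughly (1.99, 0.05)
print(matches_reported(x, y, reported_slope=1.99, reported_intercept=0.05))
```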

------
triplesec
Anyone doing serious statistics uses SPSS or R, or similar stats programs. If
not, you deserve all the bad data you get. Using Excel for that is akin to
using a point and shoot camera for a fashion photoshoot, or a crossover car
offroad in Death Valley.

------
Gatsky
Consider that given the poorly conducted statistical analyses, p-hacking etc
that goes on in the life sciences, Excel garbling gene names might actually
improve the net accuracy of the results by removing false positives.

------
Steeeve
Is everybody a Washington Post subscriber? Or have I missed the route around
the paywall somehow?

~~~
Noseshine
I read all my WP articles via right-click "Open link in incognito window".

(OT now:) If anyone thinks that's cheating them out of money - the choices are
not "read for free" and "read for the appropriate price", the choices are
"read for free" or "don't read". The reason I read is to procrastinate, so the
value I get out of it is actually _negative_ (same with HN...). Even their
occasionally excellent articles (like their series on asset forfeiture) are
stuff I'm at most mildly interested in (as a foreigner) and read to distract
myself. That's my issue with today's media: I don't actually feel "informed"
as in "it is good for my life that I know these things". I can't do anything
about 99.99% of the stuff I read about anyway, nor is it a representative
sample of reality; it consists almost solely of reports on the outliers.

~~~
pbhjpbhj
>The reason I read is to procrastinate, so the value I get out of it is
actually negative (same with HN...). //

Entertainment has value doesn't it? It's not a simple financial value like the
simplistic use of opportunity cost as being the price you could bill those
hours at, but it's a useful and functional part of being human.

I don't have a problem with you reading something people broadcast to the
public internet though.

~~~
Noseshine

> Entertainment has value doesn't it?

If it's procrastination the _net_ value is negative. You may put any value you
like on the "entertainment" - but what it displaces has higher value.

If I delay my work (which I do, even now) the overall value of writing
comments on HN or reading a WP article that talks about issues that don't
directly concern me and that I cannot do anything about is negative, even to
myself (and don't try to argue they may concern me indirectly because, well,
_everything_ does).

It's like being addicted to drugs: Sure you can argue if the drugs (and let's
assume those especially crazy and destructive ones) had no value to the person
taking them they would not take them, but a more appropriate model than high
school economics would be the neuroscience of addiction. But even if you
decide to stick to using an economic model you would have to take a very
narrow view - like picking exactly the period where a stock was rising to show
how great a pick that company is - to argue the person gets a positive value
from taking those drugs.

~~~
pbhjpbhj
>You may put any value you like on the "entertainment" - but what it
displaces has higher value. //

Disagree. This conversation has a value, I [likely] can't derive anything
financial from it, one might term it "entertaining" even [that wasn't supposed
to sound quite so denigrating!]. The value is difficult to define, but it
doesn't _remove_ value from my life IMO. It possibly takes some time with
which I can argue for opportunity cost, but I see the conversation as a
generally positive thing.

I'm not sure if a mere conversation can be equated so easily to the value
positions involved in drug addiction. However, I would say that it's a mixed
bag. Some aspects that come out of drug addiction can have positive value -
I'm thinking the progression of the arts: some great works of literature,
paintings, dramatic performances, appear to have at least some relationship to
the artists' drug use [and in some cases addiction, it's hard to know where the
divide is].

>"This is no more true than to say that Van Gogh was only Van Gogh because of
his inner turmoil or than Jean-Michel Basquiat needed heroin to draw or paint.
But it is also worth remembering that it killed them both."
([http://www.worldcrunch.com/culture-society/under-the-influen...](http://www.worldcrunch.com/culture-society/under-the-influence-tracing-a-long-twisted-history-of-artists-and-their-drugs/drugs-art-bizarre-hallucinations-painting/c3s11162/))

Similar ground with a greater focus on musicians -
[http://blogs.scientificamerican.com/mind-guest-blog/creativi...](http://blogs.scientificamerican.com/mind-guest-blog/creativity-madness-and-drugs/).

~~~
Noseshine
You can disagree all you like - on what basis??? How do YOU know what I'm
doing and what my time is worth? This discussion certainly has zero value -
more like a _negative_ value, having to encounter something so annoying. Ridiculous!

> This conversation has a value

I covered that!!! Do you actually READ the comments you respond to? I mean,
without the filter that removes the things that don't fit your narrative?

~~~
pbhjpbhj
Lol, your indignation made me laugh ... um, does laughter have any value in
your philosophical framework?

>I covered that!!! Do you actually READ the comments you respond to? //

Right back at you there - it has a value because I value it. Might seem a bit
too self-referential but that's how value works.

>How do YOU know what I'm doing and what my time is worth? //

I don't. There are extrinsic and intrinsic values for sure - if you're chatting
inanely to me on HN when you would normally be performing successful heart
surgery on people who want to live longer then the opportunity cost [in terms
of life enrichment for the people who would have been saved] is high, for
sure, but that doesn't mean the intrinsic value is negative.

You appear to be arguing that because there is a potential for foregoing
financial gain through having a conversation that the _value_ of the
conversation -- the ability of it to enrich, educate, improve, entertain, etc.
-- is negative. The true value can't be counted; you don't know its effect on
me and I don't know the effect on you (or others who are reading). Maybe an
onlooker has read something in the conversation and that's inspired their PhD
thesis on the teleology of communication.

IMO you appear to too readily decry the measurable negative aspect - potential
for foregoing financial gain (opportunity cost) - whilst you under-estimate
the potential for positive improvement, extrinsic value, and the like.

[FWIW Currently I'm suffering with mental health problems and this
conversation has actually made me realise that I can be positive. I'm not
saying this to try and shoe-horn in an extrinsic value, that's a genuine self-
reflection.]

Happy to hear any further responses if you can steer away from declarations of
"ridiculous!" and "annoying!" and elucidate why you feel it's ridiculous, how
it conflicts with your value judgement in expanded terms?

------
pjmorris
Should decision makers' spreadsheets in business, policy, and government be
peer-reviewed in the same way as scientific papers?

Disclaimers: 1) Yes, scientific peer review needs improvement. 2) Yes,
spreadsheets are not ideal for science... what makes business less important?

------
cm2187
I wonder if that also applies to DNA tests used in criminal investigations...

------
gregn610
An alarming number of business spreadsheets contain Excel errors. But it's the
lingua franca of businesses, departments & teams everywhere.

------
nol13
*Excel mutations

