
How to Call B.S. On Big Data: A Practical Guide - aaronchall
http://www.newyorker.com/tech/elements/how-to-call-bullshit-on-big-data-a-practical-guide
======
bbayles
I recently got entangled with some "big data" and "machine learning" B.S. in
the form of the U.S. health care system.

CMS, the federal agency that administers Medicare, introduced a hospital
quality ratings system last year. It is supposed to combine a variety of
objective metrics into an easy-to-understand grade for hospitals.

However, the techniques they used are _really_ bad. For example, a programming
error makes the model give different results depending on how the data are
sorted. Some measures get negative weights, meaning a hospital would have to do
worse on those measures to get a better rating.
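
For intuition, here's a toy sketch of how that kind of order dependence can
arise (this is not the CMS code; it's a hypothetical k-means fit that seeds
itself from the first rows of the input, so re-sorting identical data can
change the answer):

    # Toy illustration only (not the CMS model): a k-means fit seeded from the
    # first k rows of the input, so re-sorting identical data changes the result.
    import numpy as np

    def kmeans_first_rows_init(X, k=3, iters=100):
        centers = X[:k].copy()  # initialization depends on row order
        for _ in range(iters):
            labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return centers

    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(loc=c, scale=0.4, size=(100, 2))
                        for c in [(0, 0), (4, 0), (0, 4), (4, 4)]])

    c1 = kmeans_first_rows_init(X)                       # original row order
    c2 = kmeans_first_rows_init(X[np.argsort(X[:, 1])])  # same data, sorted by one column
    print(np.round(c1, 2))
    print(np.round(c2, 2))  # the fitted centers can differ purely because of row order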

I wrote more about the technical failures here:
[https://sites.google.com/site/bbayles/index/cms-star-ratings](https://sites.google.com/site/bbayles/index/cms-star-ratings)

Another criticism:
[http://jktgfoundation.org/data/An_Analysis_of_the_Medicare_H...](http://jktgfoundation.org/data/An_Analysis_of_the_Medicare_Hospital_5-S.pdf)

~~~
gabrielgoh
This is very fascinating, so thank you for looking into this. I think you may
have misunderstood the rating system. You said

> Let’s say I really have to produce a rating, though. What would I do? I
> would probably:
> - Find some experts and ask them to assign weights to my various measures
>   on the basis of how much they contribute to quality

But isn't this exactly what the latent variable model (really a PCA) is
doing? The only difference is that rather than having experts pick 60
weights, one for each measure, which would require 60 contentious decisions,
the PCA does some form of dimensionality reduction so the experts need only
pick weights for 7 components, which then "unravel" into 60 weights. This
sounds reasonable to me - assuming, of course, that the PCA components each
have a meaningful interpretation and measure the degree of "good".
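
A rough sketch of that scheme as I understand it (this is not the actual CMS
estimation code; the data and the equal component weights here are made up):

    # Rough sketch of the "weight 7 components, unravel into 60 weights" idea.
    # Not the CMS methodology; the data and component weights are made up.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(42)
    measures = rng.normal(size=(3000, 60))   # stand-in for 60 hospital measures

    pca = PCA(n_components=7).fit(measures)
    component_weights = np.full(7, 1 / 7)    # hypothetical expert weights per component

    # Per-measure weights implied by the component weights, via the loadings.
    measure_weights = pca.components_.T @ component_weights   # shape (60,)

    # Nothing forces these to be positive: a negative entry means a hospital is
    # rewarded for scoring worse on that measure (the complaint upthread).
    print((measure_weights < 0).sum(), "of 60 implied weights are negative")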

~~~
bbayles
I admire your charitable interpretation. I'd be fine with the rating system
using automation to reduce the number of subjective decisions to be made - I
thought the LVM approach was a good one when I first heard about it.

As the second article points out, even this is kind of crazy, and the
implementors didn't seem to care at all about what the results were - for
example, the imaging category is driven by a single measure related to
abdominal CT scans.

------
andrepd
I loved the final paragraph.

> Mind the Bullshit Asymmetry Principle, articulated by the Italian software
> developer Alberto Brandolini in 2013: the amount of energy needed to refute
> bullshit is an order of magnitude bigger than that needed to produce it. Or,
> as Jonathan Swift put it in 1710, “Falsehood flies, and truth comes limping
> after it.”

~~~
NicoJuicy
I've used the following quote on my profile since the beginning of time (of
HN):

Statistics are like bikinis. What they reveal is suggestive, but what they
conceal is vital. ~Aaron Levenstein

It's fundamentally the same point, and it seems it could be used multiple
times in this thread :p

------
CalChris
The UW class, _Calling Bullshit in the Age of Big Data_

[http://callingbullshit.org/syllabus.html](http://callingbullshit.org/syllabus.html)

They say video will become available.

~~~
celias
Videos at
[http://callingbullshit.org/videos.html](http://callingbullshit.org/videos.html)
and at the course link on youtube
[https://www.youtube.com/watch?v=A2OtU5vlR0k&list=PLPnZfvKID1...](https://www.youtube.com/watch?v=A2OtU5vlR0k&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS)

~~~
Real_S
This is an awesome wealth of videos! Check out the Data Visualization section
for some Tufte-inspired lectures.

These guys really know their bullsh!t :)

------
erikb
It is correct that you don't need a math degree to detect data b.s.

However, the suggested "red flag" method is only part of the solution. You
learn to detect the red flags, and you understand that when red flags
accumulate you should start worrying about the truthfulness of what you are
being presented...

and then you hit a bullshitter or liar who, through skill or pure luck,
starts validating things that you know are probably wrong but really really
hope to be true. And suddenly you yourself start to support the b.s. thesis
because you really want it to be true. What good do the red flags do you
then? You will actively try to invalidate them. They may stay objectively
true, but in our poorly structured, limited way of thinking, they won't.

The true art of survival is to detect when something is pulling your inner
optimist. You need to learn to recognize when this guy wakes up. And when he
wakes up you need to assume that you are being cheated, that someone is
actively trying to sell you something you wouldn't buy otherwise. Because if
there is a person with malicious intent, any other way of thinking will let
him win. But if it's just coincidence, it doesn't hurt to protect yourself.

Defending yourself usually means ignoring your pride, ignoring logic, and
simply making sure you don't invest anything: not your time, not your money;
don't sign anything, don't stay on the phone call [1], drop the book.

So the true solution to b.s. is: learn to recognize when something pulls you
in, and when you detect it, start defending against it by not giving in.

[1] i.e.,
[https://www.reddit.com/r/personalfinance/comments/6ix0jy/irs...](https://www.reddit.com/r/personalfinance/comments/6ix0jy/irs_says_i_owe_them_money_theres_a_warrant_for_my/)

~~~
gaius
_and then you hit a bullshitter or liar who, through skill or pure luck,
starts validating things that you know are probably wrong but really really
hope to be true. And suddenly you yourself start to support the b.s. thesis
because you really want it to be true._

There's a word for this kind of person: politician.

~~~
keithpeter
A friend of mine often expresses similar sentiments.

My normal reaction is to ask "what would you have instead of politicians?". He
has problems forming a coherent answer.

What would your answer be?

~~~
laughfactory
The solution is ethical and moral people. Politicians or not, the greatest
modern issue is that we don't value honesty and integrity as a society
anymore. This means, across the board, we have this weird form of nuanced
corruption where everyone lies without compunction just because it's accepted
practice. This means we can't trust politicians, data analysts, data
scientists (anyone who uses data to proclaim "the truth") or anyone who makes
any claim whatsoever to "the truth."

Until, and unless, we as a people value integrity and honesty more highly, we
should hold all "truth" to be suspect, and guilty until proven innocent.

I worked at a job where, as data scientists, we were discouraged from
revealing what we found in the data, and compelled to produce from the data
what our boss thought the answer should be. It was asinine. If you think you
already know the answer, why consult the Oracle? Put your understanding into
practice and find out yourself if you're right. Or, look at the data first
with as little bias as possible, and build testable hypotheses from there. I
think the tendency to take a hypothesis to the data is intended to reflect the
scientific method, but that only works if you're open to whatever the answer
may be. It doesn't work if you just keep pushing to slice the data in ever
crazier ways trying to get it to validate your hypothesis.

~~~
pas
Ethical and moral people fall prey to cognitive biases all the time.

It's not about honesty and integrity if you (or others) don't ask [the right]
questions.

We are biased by ideology, world view, and experience (and
history/genetics/family/friends/aesthetics), by long- and short-term
(self-)interests, and so on. With money it's very common to spot the conflict
of interest, the bias, when people give advice; with other things it's a lot
harder.

That said, people by default are gullible, susceptible to persuasion, and so
on. People are by nature social animals and easy to mislead, to influence,
and so on. We are naive.

It takes training to spot the bullshit, even in our own thinking. (And we
haven't even mentioned the psychopathologies that can also very seriously
undermine our blossoming critical thinking by sheer force of emotions -
anxiety/depression/impulsiveness/xenophobia/ethnophobia - or via a persistent
insistence on fringe patterns - schizoid-type disorders, hallucinations,
paranoia.)

> compelled to produce from the data what our boss thought the answer should
> be.

That's not asinine, that's misconduct. That's fraud.

So, all in all, the situation was never good. The Enlightenment never
happened. It started to, but suddenly stopped.

------
WillPostForFood
Calling BS on big data is really important, but this article is weak. The New
Yorker should be doing better. Try Weapons of Math Destruction by Cathy
O'Neil for a much more informed critique.

[https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815](https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815)

~~~
nickpsecurity
"The New Yorker should be doing better. "

Is that alternative you linked free to read on the New Yorker or some other
site? Or are you saying people willing to pay for a better
article/book/source/training can get one? That's almost always true.

~~~
WillPostForFood
I was saying that the New Yorker usually has very high-quality writing, but
this article isn't up to that standard.

------
wyc
How to Lie with Statistics is a classic on this:

[https://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728](https://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728)

~~~
wuch
Sadly this book gets the description of statistical significance completely
wrong. Not particularly surprising, given how unintuitive the reasoning
behind p-values really is. In Jeffreys' words: "What the use of P implies,
therefore, is that a hypothesis that may be true may be rejected because it
has not predicted observable results that have not occurred. This seems a
remarkable procedure."

------
nl
This is a pretty bad article.

Firstly, the course is called _Calling BS in the age of Big Data_. That's a
big difference.

Secondly, Google fixed Flu Trends and that kind of undermines the article's
whole thesis:
[http://www.nature.com/articles/srep12760](http://www.nature.com/articles/srep12760)

One might almost say I called bullshit on this article.

~~~
YeGoblynQueenne
The article says that Google Flu Trends does worse than a "simple model of
local temperatures". From a very quick, level-1 read [1], the paper you link
to doesn't mention that simpler model. Instead, it compares Google flu to
previous versions of itself.

I guess you can say that Google _improved_ their flu model, but, "fixed"?
Also, I don't see that the article's "thesis" is "undermined". Sorry about the
scare quotes.

I mean that the article is taking small liberties to score a small point
against Big Data™ (and who better to score it against than Google?), but is
that really enough to call bullshit on it?

I don't see anything misleading in the article is what I'm saying. So why
"bullshit"?

___________

[1] Read the abstract and conclusions, eyeball a couple of tables, scan the
rest, i.e. just enough to argue on the internet as if I know what I'm talking
about.

~~~
nl
_The article says that Google Flu Trends does worse than a "simple model of
local temperatures"_

Indeed, that is what the article says. It's bullshit though.

You are right, though, that "Google improved their flu model". No model is
ever "fixed" if that means 100% correct.

Flu trends worked well (much better than a "simple model of local
temperatures") except in the 2009 flu season, when it missed the A/H1N1
pandemic. It was then modified, and these modifications seem to have caused it
to estimate a pandemic in the 2012/13 season which didn't occur.

"simple model[s] of local temperatures" do work quite well as a baseline, but
they don't pickup pandemics either. However, in that 2012/13 season it would
have done better than flu trends. [1] is a good overview.

So this is a complicated topic. I have a research team working on this exact
problem, and we'd love Google search data because there is no doubt that it
can and does work. But like all models it breaks down when something it hasn't
seen before occurs.

My bigger problem is with the thesis of the article. I'd summarize my reading
of that as "big data is BS", which is a more extreme form of their title "How
to Call BS on Big Data".

But the course this is based on _isn't_ that at all. It's about understanding
how big data can be used to draw wrong conclusions, _NOT_ that big data is BS
in any way at all.

I think the course is a very important and useful thing. But what it is doing
is dramatically different from what this article claims, and the way that
they use the implied authority of the course to support their "big data is
BS" claim is what led me to say "bullshit".

[1]
[http://journals.plos.org/ploscompbiol/article?id=10.1371/jou...](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003256)

~~~
nickpsecurity
"No model is ever "fixed" if that means 100% correct."

One definition of broken for a proposed alternative to the status quo is if
the alternative under-performs it. Kind of makes one ask why anyone would
adopt it to begin with. There's a simple model using temperature that works
pretty well. Google's solution is said to perform worse than that with more
false positives. Google's isn't "fixed" or "working" until they show it
outperforms the simple solution that works with similar error margin and cost.

In other words, it isn't good until people would want to give up the existing
method to get the extra benefits or cost savings the new one brings.

~~~
nl
_In other words, it isn't good until people would want to give up the
existing method to get the extra benefits or cost savings the new one
brings._

They do.

The current state-of-the-art methods used "in production" today all use Flu
Trends data from Google[1], other forms of digital data[2], ensemble methods
incorporating them all, or human-based "crowdsourced" forecasting[3].

_One definition of broken for a proposed alternative to the status quo is if
the alternative under-performs it._

Which is not what happened here.

Here's the report on the 2014 CDC Flu Forecasting competition:
[https://bmcinfectdis.biomedcentral.com/articles/10.1186/s128...](https://bmcinfectdis.biomedcentral.com/articles/10.1186/s12879-016-1669-x)

Note that there is no mention of the "simple mean temperature" model. That's
because it _isn't very useful_. That model predicts flu increases in winter,
and picks up minor variations because of weather patterns.

To simplify even further, you can average all the CDC flu data and use that
as your prediction, and in an average year you'll have a decently performing
model.

This isn't useful as a forecast, because the people who need forecasts already
know this.

Better models (e.g., SI, SEIR, Hawkes-process-based, etc.) can sometimes pick up
epidemic or unusual conditions, but only _after_ the conditions have changed.
This is still useful, because there is a (best case) 2 week lag between ground
conditions and CDC data being available.
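
For anyone unfamiliar with the acronyms, these are compartmental models. A
minimal discrete-time SEIR sketch looks like this (the parameters here are
illustrative, not calibrated to any real flu season):

    # Minimal discrete-time SEIR sketch; parameters are illustrative, not calibrated.
    def seir(beta=0.4, sigma=1 / 3, gamma=1 / 7, days=180, n=1_000_000, i0=10):
        s, e, i, r = n - i0, 0.0, float(i0), 0.0
        infectious = []
        for _ in range(days):
            new_e = beta * s * i / n   # S -> E (new exposures)
            new_i = sigma * e          # E -> I (end of latency)
            new_r = gamma * i          # I -> R (recoveries)
            s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
            infectious.append(i)
        return infectious

    curve = seir()
    peak_day = curve.index(max(curve))
    print("peak day:", peak_day, "| infectious at peak:", round(max(curve)))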

Digital surveillance techniques (Flu Trends, Twitter data, etc) all push that
data lag back.

This is incredibly useful for the people who need forecasts because it gives
them lead time.

To understand this you need to consider the metrics. The most common metrics
for flu forecasting are "peak week" and "number of people infected at peak".
Sometimes the total number of people infected in a season is also reported.
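
A toy version of those two metrics, computed from a made-up weekly series:

    # Toy computation of the two forecasting metrics named above, on made-up
    # weekly influenza-like-illness counts (one value per epidemiological week).
    weekly_cases = [120, 150, 210, 340, 610, 980, 1400, 1250, 900, 520, 300, 180]

    peak_week = max(range(len(weekly_cases)), key=lambda w: weekly_cases[w])
    print("peak week:", peak_week,
          "| cases at peak:", weekly_cases[peak_week],
          "| season total:", sum(weekly_cases))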

Temperature-based models do really well on average at both these tasks, but
they fail completely at picking the unusual seasons.

_Google's solution is said to perform worse than that with more false
positives._

Google Flu Trends picked the 2009 epidemic season really well, but failed in
the 2012/13 season (when it falsely picked an epidemic). On average that
might make it worse than a temperature-based model, but that is just bad
selection of metrics.

It's like reporting average income when your sample has a billionaire: the
metric is misleading.
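
The income analogy in actual numbers (made-up sample):

    # The "average income with a billionaire in the sample" analogy in numbers:
    # one extreme observation dominates the mean while the median barely moves.
    from statistics import mean, median

    incomes = [40_000] * 99 + [1_000_000_000]
    print(f"mean:   ${mean(incomes):,.0f}")    # roughly $10 million
    print(f"median: ${median(incomes):,.0f}")  # $40,000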

If that isn't the perfect example of "bullshit" then I don't know what is.

[1]
[http://www.healthmap.org/flutrends/about/](http://www.healthmap.org/flutrends/about/)

[2]
[http://delphi.midas.cs.cmu.edu/nowcast/about.html](http://delphi.midas.cs.cmu.edu/nowcast/about.html)

[https://gcn.com/articles/2016/12/21/cdc-flu-predictions.aspx](https://gcn.com/articles/2016/12/21/cdc-flu-predictions.aspx)

------
privong
There was some discussion on this article ~20 days ago:
[https://news.ycombinator.com/item?id=14476474](https://news.ycombinator.com/item?id=14476474)

------
refurb
Back in undergrad, one of my toxicology courses spent a full class on just
this topic (calling bullshit on scientific studies). We went through 5
different papers from prestigious journals and identified the issues that make
conclusions shaky.

Fascinating stuff. Now whenever I see some extraordinary claim in a paper I
automatically assume it's wrong until proven otherwise. Sadly I'm often
correct.

~~~
tonyarkles
In my CS MSc program, my research group had a weekly paper reading group. This
seems to be a common thing. What I'm not sure about is how common the general
outcome was. We had one faculty member in the group that was _excellent_ at
identifying methodology errors and raising them as discussion points. I'd
guess that in about half of the papers we read he was able to find something
to pick on. Sometimes small, sometimes enough to bring the whole paper into
question. It was a great learning experience and helped me hone my bullshit
detector.

------
chairleader
Any resources for learning the Fermi estimation techniques listed there? Seems
like a collection of complementary skills, each of which could be improved:

memorizing useful facts, selecting facts that lead to a meaningful estimate,
and the mental math to compute the final result.

[https://en.wikipedia.org/wiki/Fermi_problem](https://en.wikipedia.org/wiki/Fermi_problem)
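
For a concrete feel, here is the classic piano-tuners-in-Chicago estimate
written out; every number below is a rough assumption and only the order of
magnitude matters:

    # Classic Fermi estimate (piano tuners in Chicago). Every number is a rough
    # assumption; the point is the order of magnitude, not the exact answer.
    population              = 2_500_000
    people_per_household    = 2.5
    households_with_piano   = 0.05   # roughly 1 in 20
    tunings_per_piano_year  = 1
    tunings_per_tuner_day   = 4
    working_days_per_year   = 250

    pianos = population / people_per_household * households_with_piano
    tunings_demanded = pianos * tunings_per_piano_year                # per year
    tunings_supplied = tunings_per_tuner_day * working_days_per_year  # per tuner, per year
    print(round(tunings_demanded / tunings_supplied), "piano tuners (order of magnitude)")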

~~~
sidlls
Mainly the list you stated.

Practicing basic arithmetic and judicious application of the distributive
property (much like decomposing complicated problems into smaller subproblems)
will take one very far in this sort of thing.

I was introduced to dimensional analysis in my high school physics class. We
generated an expression off by just a constant for some property (which I
don't recall) of a large scale dust cloud simply by identifying pertinent
quantities (e.g. density, the classical gravitational constant) and resolving
the powers each quantity must have in order to yield the correct units
(corrected due to below comments; thanks) of the property. It made an
impression on me, and I used the technique often as a guesstimate to
"motivate" or provide a calibration for a solution to various problems all the
way through grad school. It's not infallible, and can even be wildly
misleading, but it's a fantastic tool.

~~~
imcoconut
Could you elaborate on this? By "resolving the powers" do you mean magnitude
of interacting forces? And by correct dimensions do you mean spatial
dimensions?

~~~
btouellette
By dimensions he means units. Taking into account the units of density and
the gravitational constant (and any other pertinent quantities), and the
units of the quantity you are calculating, you can derive an approximate
formula just by looking at it and saying: okay, this one needs ^2, this one
needs ^-1, and this one needs ^-3 for the end units to work out.

[https://en.wikipedia.org/wiki/Dimensional_analysis](https://en.wikipedia.org/wiki/Dimensional_analysis)
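
A sketch of that exponent-solving step, using G and density to build a
timescale for a dust cloud (whether that was the actual property in the class
upthread is an assumption on my part):

    # Sketch of the exponent-solving step: find a and b so that G^a * rho^b has
    # units of time. Units are tracked as (meter, kilogram, second) exponents.
    import numpy as np

    G      = np.array([ 3, -1, -2])   # m^3 kg^-1 s^-2
    rho    = np.array([-3,  1,  0])   # kg m^-3
    target = np.array([ 0,  0,  1])   # a time: s^1

    # Solve a*G + b*rho = target (least squares: 3 unit equations, 2 unknowns).
    A = np.column_stack([G, rho])
    (a, b), *_ = np.linalg.lstsq(A, target, rcond=None)
    print(f"t ~ G^{a:.2f} * rho^{b:.2f}")   # t ~ 1/sqrt(G * rho), the free-fall time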

------
nikkettt
"Calling B.S." seems to be a snark way of saying "Applying the scientific
method".

------
laughfactory
There's a famous saying, I forget who said it: "In God we trust, all others
must bring data."

This suggests "data," as a thing, is infallible. Or that data holds _The
Truth_.

Problem is, as a data scientist, I've become very skeptical that either is
true. Not that data is useless, but mostly that if there's unequivocal truth
in data it will remain unfound because those searching for it operate under
such profound bias that they will be incapable of either a) finding the truth,
or b) recognizing it.

The better quote, which can be broadly applied to anything data-related, is:
"All models are wrong, but some are still useful."

Usually, I look at data as presenting only one side of the story. And models
as hopefully useful, if used with caution. The proof is always in the pudding:
do actions derived from our understanding of the data yield results? If "yes"
then our understanding of the data contains some difficult-to-quantify level
of truth. Do our classification, clustering, and prediction techniques work?
If "yes" then our models reflect some of the truth (never all of it).

In my six years since college, and going on three as a data scientist, I've
become convinced that intentionally (or not) a great deal of analysis and
modeling (including machine learning models) is fundamentally wrong. Sometimes
because the practitioner, with the best of intentions, screwed up (all too
easy to do), and often because the practitioner used the data to tell whatever
story they wanted to. You can usually manipulate any given data set into
giving the answer you, or your boss, or your boss's boss, thinks is the
"right" answer. And even if you come to the data with the purest intentions
you'll often find "the truth"\--only to have application and time prove it
wrong.

My assessment: data is slippery, and often like wrestling snakes. Or, it's the
modern version of panning for gold. We can make ourselves, or the business,
much richer when we find those rare nuggets within the data which prove, with
application and time, to reflect some measure of truth. The proof is always in
the pudding.

------
jupp0r
Only semi-related but nonetheless interesting:

Calling BS on people claiming to do big data. Ask them exactly how big their
data is and then point out that it fits on one machine, so it's not "big".
This holds regardless of whether they use Hadoop.
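
The back-of-the-envelope version of that check, with illustrative numbers:

    # Back-of-the-envelope "is it actually big?" check; all numbers illustrative.
    rows          = 500_000_000          # half a billion events
    bytes_per_row = 200                  # a few dozen modest fields
    dataset_gb    = rows * bytes_per_row / 1e9
    server_ram_gb = 1024                 # one beefy commodity server

    print(f"{dataset_gb:.0f} GB; fits on one machine: {dataset_gb < server_ram_gb}")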

~~~
nickpsecurity
There's even a standard model for that kind of thing:

[http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...](http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)

------
jgalt212
I look forward to their piece on blockchain.

------
BrandoElFollito
Guesstimates are fantastic; this is one of the first things I always wanted
to instill in my students (physics).

That is, until you hit exponential phenomena, where guesstimating the
exponential part leads to catastrophes. It is worth keeping in mind that the
more linear something is, the more useful the guesstimate will be.
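
A tiny illustration of that failure mode, with made-up numbers: extend a
5%-per-day growth curve by its last observed daily increase (the linear
gut-feel guess) and compare with the true compounding.

    # Linear gut-feel vs. exponential reality, with made-up numbers.
    growth, start = 1.05, 1000
    observed_days, horizon_days = 30, 90

    value_now = start * growth ** observed_days
    last_daily_increase = value_now - start * growth ** (observed_days - 1)

    linear_guess = value_now + last_daily_increase * (horizon_days - observed_days)
    true_value   = start * growth ** horizon_days
    print(round(linear_guess), "vs", round(true_value))   # the linear guess falls far short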

------
yetanotheraccnt
For an interesting philosophical take:

[http://journals.sagepub.com/doi/pdf/10.1177/2053951716664747](http://journals.sagepub.com/doi/pdf/10.1177/2053951716664747)

------
refurb
A great book along these lines is "The Halo Effect". It's more focused on
business books than anything, but it's in the same spirit.

------
gaius
Once you understand Simpson's Paradox and Anscombe's Quartet, you will simply
never believe any statistics that anyone shows you ever. In fact you will
probably never even believe your own calculations, and that's a good thing if
it keeps you on your toes.
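
The classic kidney-stone numbers used to illustrate Simpson's Paradox, as a
quick sanity check: treatment A wins in both subgroups yet loses overall.

    # Classic kidney-stone illustration of Simpson's paradox: treatment A is
    # better in both subgroups but worse overall, because group sizes differ.
    groups = {
        "small stones": {"A": (81, 87),   "B": (234, 270)},   # (successes, treated)
        "large stones": {"A": (192, 263), "B": (55, 80)},
    }

    totals = {"A": [0, 0], "B": [0, 0]}
    for name, arms in groups.items():
        for arm, (ok, n) in arms.items():
            totals[arm][0] += ok
            totals[arm][1] += n
            print(f"{name:12s} {arm}: {ok / n:.0%}")
    for arm, (ok, n) in totals.items():
        print(f"{'overall':12s} {arm}: {ok / n:.0%}")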

~~~
a_imho
I only believe in statistics that I doctored myself

------
MagnumOpus
Misleading clickbait title - the article delivers on "how to call BS on
statistical claims" (and does that well), but it has nigh-on nothing to do
with Big Data.

~~~
cwyers
I think this is part of a trend on HN called "defining clickbait down."
Clickbait is deliberately withholding information or writing a headline in
such a way as to prompt a click to figure out what the article is about. "10
Celebrities You Never Knew Were Gay, #8 Will Surprise You!" And at least one
of the celebrities is Ellen DeGeneres. "You Won't Believe What Happens When
This Homeless Man Doesn't Have Money For Food!" Someone buys him lunch. That's
clickbait. The title to this is just a title. It's not clickbait.

~~~
ouid
Clickbait is whenever your title has less information than your article.

~~~
kgwgk
If the title contains all the information in the article, that's a pretty bad
article.

------
louithethrid
Strange memories on this nervous night in Big Data. Has it been five years,
six? It seems like a lifetime, the kind of peak that never comes again.
Data science in the middle of '10 was a very special time and place to be a
part of. But no explanation, no mix of words or histogram or memories could
touch that sense of knowing that you were there and alive, in that corner of
time in the world, whatever it meant.

There was madness in any direction, at any hour, you could strike sparks
anywhere. There was a fantastic universal sense that whatever we were doing
was right, that we were winning.

And that, I think, was the handle. That sense of inevitable victory over the
forces of Old and Evil. Not in any mean or military sense, we didn't need
that. Our energy would simply prevail. We had all the momentum, we were riding
the crest of a high and beautiful wave.

So, now, less than five years later, you can go on a steep hill in
Buzzwordville and look west. And with the right kind of eyes you can almost see
the High Water Mark. That place where the wave finally broke and rolled back.

------
visvavasu
Thanks

------
alexeiz
A BS article about calling BS on BS. Also, this comment is total BS.

