
More data can make you more wrong - da02
https://www.spectator.co.uk/2016/08/how-more-data-can-make-you-more-wrong/
======
glangdale
The University of Chicago study is interesting and covers nuances not really
dealt with in the article (naturally):
[http://www.pnas.org/content/113/33/9250.full](http://www.pnas.org/content/113/33/9250.full)

with a summary here: [https://newschicagobooth.uchicago.edu/newsroom/problem-
slow-...](https://newschicagobooth.uchicago.edu/newsroom/problem-slow-motion)

The key point neglected in the original article is that the jurors were being
asked about _intent_ and premeditation in the slow-mo study. My knee-jerk
reaction when first reading this article was to assume that the jurors might
have just been able to conquer reasonable doubt more with slo-mo, but the
questions seemed well designed to isolate the fact that slo-mo is creating a
false narrative (even when the viewers can see the real-time clock clicking
over at a very slow rate).

It would be interesting to see whether viewers who got to see both clips would
still have the same perception, and whether ordering matters.

~~~
ballenf
The article states that jurors who saw both were slightly more likely to
convict. That is, the effect of the slow-mo was reduced if they also saw the
real time.

It doesn't address the ordering question.

Thanks for the links. A lot of data there. Hope I don't reach the wrong
conclusion. ;)

------
Silhouette
The essential argument here also applies to any automated decision-making
process, and that is one of the big risks we face in adopting ever more
technology to help run our businesses and governments.

If we're not careful, we'll take all the old problems caused by unjustified
discrimination based on factors like gender or skin colour, which today we (at
least most of us) would consider irrelevant to making most decisions, and
multiply them up many times over to apply to each input into an automated
decision-making process. How do machine learning and statistical analysis
tools distinguish between a causal relationship and mere correlation? And how
often will they actually demonstrate the latter, yet be treated as if they had
found the former?

~~~
Banthum
>"How do machine learning and statistical analysis tools distinguish between a
causal relationship and mere correlation?"

They have no reason to make such a distinction.

If you're trying to look for subjects with a particular property X, but X
is hidden, but you know property Y correlates with X, the correct action is to
choose subjects with property Y. The causative relationship (or lack thereof)
between X and Y is irrelevant.

As a simple example, if you're hiring people to carry 80-pound bags of grain
between trucks, you want the property of physical strength. But, if you only
have resumes to go on, you can't observe physical strength directly. But,
gender correlates with physical strength, and names correlate with gender, so
you'd be rational to choose the resumes with recognizably male names.

The reason we avoid discriminating between various protected groups isn't
because groups don't have characteristics. If that was the case, such
discrimination would be pointless and people would not bother doing it because
it would have a cost but no benefit. Discrimination based on hair color is
like this. Nobody does it and we don't need campaigns to stop it.

In reality, groups of people do have characteristics - physical,
psychological, emotional, intellectual, cultural. However, we avoid
discriminating between protected groups because we believe it's wrong to
subject an individual to that kind of discrimination, even if such
discrimination would be rational for the discriminator. Such discrimination
violates our western ideals of individual rights and equal individual
opportunity.
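
The argument above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the hidden property X and the observable proxy Y are modelled as standard normal variables, and the 0.5 correlation, population size, and top-10% cutoff are made-up values rather than figures from the thread or the article. It shows both halves of the point: selecting on Y raises the average X of the chosen group, yet a noticeable share of the chosen individuals still have below-average X.

```python
import random
import statistics


def simulate_proxy_selection(n=10_000, correlation=0.5, seed=0):
    """Select the top 10% by proxy Y and report what that does to hidden X.

    All parameters are illustrative assumptions, not values from the thread.
    """
    rng = random.Random(seed)
    # Weight on independent noise so that Y also has unit variance.
    noise_weight = (1 - correlation ** 2) ** 0.5

    people = []
    for _ in range(n):
        x = rng.gauss(0, 1)  # hidden property X
        y = correlation * x + noise_weight * rng.gauss(0, 1)  # observable proxy Y
        people.append((x, y))

    mean_all = statistics.fmean(x for x, _ in people)

    # "Rational" selection: keep the 10% with the highest proxy value.
    chosen = sorted(people, key=lambda p: p[1], reverse=True)[: n // 10]
    mean_chosen = statistics.fmean(x for x, _ in chosen)

    # Individuals who passed the proxy filter despite below-average X.
    share_below_average = sum(1 for x, _ in chosen if x < mean_all) / len(chosen)
    return mean_all, mean_chosen, share_below_average


if __name__ == "__main__":
    mean_all, mean_chosen, share_below = simulate_proxy_selection()
    print(f"mean X, whole population: {mean_all:.2f}")
    print(f"mean X, chosen via Y:     {mean_chosen:.2f}")
    print(f"chosen but below average: {share_below:.0%}")
```

The group-level gain and the individual-level misclassification both come out of the same selection rule, which is the tension the comment describes.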

~~~
Silhouette
_However, we avoid discriminating between protected groups because we believe
it's wrong to subject an individual to that kind of discrimination, even if
such discrimination would be rational for the discriminator._

I don't think most of us would have a problem with discriminating on an
_inherent_ property of a group that is _relevant_ to the decision being made.
Indeed, often neither do anti-discrimination laws. Such discrimination can be
objectively justified.

For example, consider a job where a certain level of fitness is a functional
requirement, say a firefighter whose role includes being able to carry someone
out of a burning building. Setting a relatively high bar for physical strength
is going to discriminate against female job applicants and those with various
physical disabilities as their respective groups, but it's not discriminating
against someone because they're female or in a wheelchair, it's discriminating
against them because they can't carry someone out of a burning building and so
that person is going to die. I don't think most of us would have a problem
with this kind of rule. The only thing unfair here is life not making us all
equal, and there isn't much we can do about that.

The problems usually start when either you discriminate against a whole group
based on some property that is somewhat correlated with membership of the
group but not inherent, or you discriminate against a whole group based on a
property that is inherent but isn't actually relevant to the decision being
made.

For example, if a job involves sitting at a desk and using a computer all day,
a woman or someone in a wheelchair can presumably do that just as well as our
hypothetical strong male. In this case, discrimination on the basis of gender
or physical strength simply isn't relevant to the decision being made but
sadly does sometimes happen because of individual prejudice, and thus we have
laws to protect those in more vulnerable situations.

Unfortunately, the kinds of tools and mathematical analyses we're talking
about here don't necessarily see those situations any differently. They may
learn the wrong lesson if there are coincidental, misleading correlations in
whatever data is used to train them, which brings us back to where we came in.
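
A rough sketch of how coincidental correlations arise: the snippet below generates an outcome and many candidate inputs that are all independent random noise, then picks the input that correlates best with the outcome in a small sample. The sample size, number of inputs, and function names are hypothetical choices for illustration only. With enough inputs, one of them will usually look predictive by pure chance, and there is no reason for that correlation to hold on fresh data.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def strongest_chance_correlation(n_samples=30, n_features=1000, seed=1):
    """Return (training correlation, fresh-data correlation) for the
    best-looking of many inputs that are, by construction, pure noise."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]

    # Search many unrelated inputs for the one that "explains" the outcome best.
    best_r = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson(feature, outcome)
        if abs(r) > abs(best_r):
            best_r = r

    # The winning input is noise, so new observations of it are just new,
    # independent draws, as are new observations of the outcome.
    fresh_feature = [rng.gauss(0, 1) for _ in range(n_samples)]
    fresh_outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    fresh_r = pearson(fresh_feature, fresh_outcome)
    return best_r, fresh_r


if __name__ == "__main__":
    train_r, fresh_r = strongest_chance_correlation()
    print(f"best correlation found in training sample: {train_r:+.2f}")
    print(f"same input on fresh data:                  {fresh_r:+.2f}")
```

Adding more inputs makes the best chance correlation look stronger, not weaker, which is one way that more data can make an automated process more wrong.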

------
mgraczyk
The two issues brought up (obstruction in cricket and the murder trial) say
more about rules than they do about evidence.

In the cricket case, the problem seems to be that small differences in a
player's reaction time can make the difference between acceptable and
unacceptable actions. Why should it matter if the player intentionally blocked
the ball? Justice is much more difficult when rules depend on precise
subjective judgement.

The same is true of the murder case. Clearly the robbers committed a second
degree murder. In the hypothetical cases, was the prosecution pushing for
first degree murder instead? Why rely on precise subjective judgement?

This is like trying to decide from a grainy photo whether a character is an 'A'
or a '4'. It's an unavoidably difficult problem, but in the two cases above,
we can change the rules to avoid it.

~~~
ballenf
Intentionality plays a role in most criminal definitions. The exceptions are
the strict liability cases like statutory rape or DUI (just to name a couple).
I think the law looks to intentions because it's very human to base our
judgments on our sense of the person's intent. Beyond me why it developed
evolutionarily (or if it's just a very recent phenomenon), but we have a soft
spot for good intentions.

As to the case at hand, I have similar questions. I would think in the U.S.
this would have just been a felony murder case where intentions are
irrelevant. Agree that a first-degree murder charge in a robbery case would be
odd (unless the robbery was a cover story for the planned murder??).

As for sports, the pendulum seems to have shifted in the other direction with most
rules of the 'strict' variety. And I would agree with you that any rule that
can be formulated as a strict rule, should be.

~~~
slededit
In a world filled with randomness, intent is the only thing we can control. I
think the trend towards statutory crime comes from the fact we feel we've
tamed much of the world. And therefore what occurs is most likely someone's
fault - instead of an act of god.

~~~
ballenf
Very astute observation. Allowing for bad acts or bad outcomes without a
scapegoat (because _accident_ ) makes us feel less in control of our world.

------
zyxzevn
Most of this story is about slow-motion in videos, not about the amount of
data.

To add to it, data-gathering by the NSA can indeed lead to the wrong
conclusions. So can statistics in scientific experiments, which go wrong when
only a part of the data set is selected.

~~~
ballenf
There's a fair bit of discussion of crime scene data gathering beyond video:
CSI-type stuff. Makes the point that when you have a big budget to gather a
bunch of data, you have a lot more leeway to paint a narrative supporting your
theory.

Would have liked to have been given a more concrete example of this, because I
don't have a strong enough imagination to really understand that scenario.
Finding the suspect's fingerprints in more places around the crime scene? I
don't know, I guess that could help convict but could see it just as easily
distracting the jury and muddling the case.

------
MidoAssran
Really interesting, it's like overfitting applied to human decision making!

------
fixxer
This doesn't strike me as having anything to do with data volume. This is much
more about exploiting cognitive biases. This is not a new thing. If anything,
the subject is a classic one.

------
meigwilym
Slow motion video isn't _more_ data, it's the same data just presented in a
different way.

~~~
pedrocr
There's a limit to what we can absorb in real time so slow-motion will tend to
also generate more data ingestion. But most of the effect should indeed be
from our "intent sensing neural nets" being trained for real time speed and
going awry when fed slow motion replays. A big part of figuring out if
something is intentional in something as fast paced as sports will always be
"was there enough time for a human to react?".

------
kwhitefoot
The title is clickbait. The subtitle makes sense though.

~~~
actuallyalys
Clickbait doesn't mean any remotely catchy title. It means sensational and
misleading titles. More data _can_ make you more wrong and that's exactly what
the article is about.

~~~
0xcde4c3db
Agreed. A clickbait title for this article would be something like "The
Shocking Reason that Big Data is Failing" or "Think data leads to good
decisions? Think again.".

~~~
ballenf
or "Slow-Mo Killed the Video Star"

------
known
[https://en.wikipedia.org/wiki/Pareto_principle](https://en.wikipedia.org/wiki/Pareto_principle)

------
peteretep
Be careful with this source; as articulate and urbane as much of the writing
is, it's the UK's Breitbart.

~~~
te_chris
Bullshit. I'm often in disagreement with The Spectator but that is a
ridiculous slur.

~~~
peteretep
I was a print subscriber for two years, read every article in every issue, and
that was my conclusion.

