
Too Good to Be True: How More Evidence Can Decrease Confidence - mhb
http://marginalrevolution.com/marginalrevolution/2016/01/too-good-to-be-true.html#comments
======
Homunculiheaded
There is a similarly interesting result in E.T. Jaynes' "Probability Theory:
The Logic of Science" (chapter 5), where Jaynes demonstrates that increased
evidence can actually decrease your belief in a hypothesis.

Jaynes gives an example of an experiment in which a psychic predicts cards,
getting n out of m correct. In classical hypothesis testing, H0 would be "the
psychic got lucky" and H1 "the psychic has mystic powers". Jaynes first uses
Bayesian reasoning to work backward to determine your prior belief in
psychics: ask how much evidence it would take to convince you, compare that to
the raw likelihood, and what you have is your prior belief quantified.

He then points out that he personally would never believe the subject was
psychic. This is because there are not just H0 and H1, but also H2 "the
psychic is deceiving the experimenters", H3 "the experimenters are making an
error", H4 "the experimenters are fudging the results", H5, and so on. Each of
these has its own prior.

If your prior for believing in psychics is low enough and your prior for "the
experimenters are frauds" is high enough, then the more extreme the evidence,
the more you will be convinced that the experimenters are disreputable con
artists, and subsequently the less you will believe the subject is psychic.
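
To make the mechanism concrete, here is a minimal sketch of the update Jaynes
describes, with made-up per-guess success rates and priors (the specific
numbers are my assumptions, not Jaynes'):

    # Three hypotheses for a subject on a run of correct card guesses:
    #   luck    - each guess right with prob 0.2 (one of five symbols)
    #   psychic - each guess right with prob 0.9
    #   fraud   - each guess right with prob 0.95 (cheating works slightly
    #             better than mystic powers, and has a far larger prior)
    priors = {"luck": 1 - 1e-6 - 1e-4, "psychic": 1e-6, "fraud": 1e-4}
    rates = {"luck": 0.2, "psychic": 0.9, "fraud": 0.95}

    def posterior(n_correct):
        """Posterior over hypotheses after n consecutive correct guesses."""
        joint = {h: priors[h] * rates[h] ** n_correct for h in priors}
        z = sum(joint.values())
        return {h: joint[h] / z for h in joint}

    for n in (0, 5, 10, 20, 50):
        post = posterior(n)
        print(f"n={n:3d}  psychic={post['psychic']:.2e}  fraud={post['fraud']:.2e}")

Run it and P(psychic) rises at first (while "luck" still soaks up most of the
probability mass) and then falls again as "fraud" takes over the denominator:
exactly the more-evidence-less-belief effect.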

This is actually Jaynes' solution to a huge problem with Bayesian reasoning as
a model of human reasoning: "If more data should override a bad prior, then
why, in the 'age of information', does nobody agree on anything?" This example
shows, according to Jaynes, that while we can certainly have irrational
priors, we can still explain human reasoning in Bayesian terms and still get a
situation where two people faced with plentiful information arrive at
contradictory conclusions.

~~~
im2w1l
I don't like the wording "arrive at contradictory conclusions", because it
implies the creation of differences in beliefs.

I'd rather frame it as "maintain contradictory preconceptions". And the cause
of this, I would say, is a _lack_ of information: information capable of
distinguishing between H1-H5.

~~~
im2w1l
Or... now I am not so sure anymore. What if Person A believes H0: 97%, H1: 2%,
H2: 1%, and Person B believes H0: 97%, H1: 1%, H2: 2%? They have quite similar
beliefs, at least by some measures. But if H0 is ruled out, then suddenly
their beliefs will be very different.
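
A quick sketch of why (renormalizing after conditioning on "not H0"; the
numbers are the ones from the comment):

    def renormalize(beliefs, ruled_out):
        """Condition on a hypothesis being ruled out: drop it and rescale."""
        kept = {h: p for h, p in beliefs.items() if h != ruled_out}
        z = sum(kept.values())
        return {h: p / z for h, p in kept.items()}

    a = {"H0": 0.97, "H1": 0.02, "H2": 0.01}
    b = {"H0": 0.97, "H1": 0.01, "H2": 0.02}
    print(renormalize(a, "H0"))  # H1 ~ 0.667, H2 ~ 0.333
    print(renormalize(b, "H0"))  # H1 ~ 0.333, H2 ~ 0.667

Two belief states that differed by one percentage point end up two-to-one in
opposite directions.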

~~~
marvy
How does this contradict your earlier comment? I think it's still valid:
Person A believes H1 (real psychic), Person B believes H2 (talented fraud).
The question now is how to get them to agree. And the answer is: design a
better experiment, such that fraud is not practical, and see what happens.

------
bsbechtel
>>What Gunn et al. show is that the accumulation of consistent (non-noisy)
evidence can reverse one’s confidence surprisingly quickly.

I know this comment won't be accepted very well here on HN, but I think this
is one of the reasons many people don't believe in climate change. Science,
and the subsequent reporting on it, has been consistently reporting for the
past 20 years that climate change is real, and that it's worse than we
thought. Every few months, a new study is widely reported saying that X new
evidence proves that things will be much worse than we thought. If there were
a study every once in a while that said study X about climate change was
wrong, that we didn't consider factor Y, but that climate change still seems
to be headed in a dangerous direction, then I think people might be more
willing to accept the thesis.

Of course, the paper doesn't suggest that we are wrong when we suspect
systemic error, and skeptics of climate change may very well be wrong, but
maybe this will help others understand their skepticism.

~~~
Symmetry
I think you're mixing up the science itself and the reporting on it. If a
paper were published saying that sea-level rise was going to be 20% lower than
expected, I wouldn't expect most people to ever hear about it. Or if they did,
it would be reported as showing that sea-level rise was even more of a threat
than we had believed.

If you want the truth, the IPCC reports are great for providing the scientific
consensus along with the (large) uncertainty we have about exactly what
climate change will look like.

~~~
bsbechtel
That's a fair point, although my point was more about the general population's
perception of climate change and how it is shaped. That perception is most
definitely shaped by the consistent reporting the general media does on
climate change, which filters out the messy noise found in the actual hard
science.

------
muxme
I'm running into this problem on my startup website
([http://muxme.com](http://muxme.com)). I've found that the more evidence I
provide that the site is real, the more people start to question it. E.g., I
post pictures of the receipt of purchase, open-source the raffle code script,
have a live drawing, post the winners' usernames (which reveal quite a bit
about them if you google them, so I encourage them to change their usernames),
post USPS tracking numbers, and post my phone number and address, and everyone
still thinks the site is a scam. Of every company / website I've worked on,
I've never had people so skeptical of giving a name, email address, and phone
number. I've made websites where I charged money and had better growth!

This is some real feedback I've received:

 _" It sounds like a scam to me. If you want to try to scam people by getting
them to click on your fake website, you should have enough common sense not to
use the website as your username. Better luck next time!!!!"_

 _" Scamming people on Christmas Eve using someone's else's platform. Tacky."_

But yeah... this site isn't working. I'm thinking of scrapping it and writing
an app/browser extension that automatically enters you into sweepstakes. Let's
say Person A enters 10 sweepstakes on different websites. Person B enters 5
sweepstakes on different websites. When person C downloads the app, he will
automatically be entered into the sweepstakes that Person A and Person B
entered (via my magic backend system that I have yet to create). The more
people that download the app and enter sweepstakes, the more sweepstakes
everyone with the app will automatically be entered into. Now people will
start winning prizes for literally doing nothing but downloading the app.

~~~
stordoff
I can't quite put my finger on the reason, but I can see why people are
sceptical. A few possible reasons, which may or may not be large factors:

\- Non-obvious business model - It's not obvious how you profit from the
raffles, whereas charging money is generally easier/quicker to understand

\- Use of memes (e.g. [http://muxme.com/giveaways/56-25-toys-r-us-gift-card](http://muxme.com/giveaways/56-25-toys-r-us-gift-card))
doesn't inspire confidence (may be a personal quirk of mine)

\- Layout/design - My initial reaction when the page loaded was "Oh, it's a
penny auction site", which are known for being borderline scams. Obviously it
isn't, but that gut reaction is difficult for you to overcome (I'm thinking of
sites like swoggi.co.uk when I say penny auctions)

~~~
muxme
Only problem is that charging money is illegal. I was thinking of "implying"
that you may have to pay one day, and having the register button say "Sign up
for your free trial", which would be an infinitely long free trial. But then
again, the site is all about trust, and with this method you've already
tricked them from the start.

Yeah, I tried the approach of a quirky, one-owner website, and it didn't work.
I'm still trying to clean up all the jokes, memes, etc.

Hmmmm, yeah I can see how a first impression is that the site is a penny
auction site. I definitely need to rethink the front end design and UX flow.
Maybe don't even show the prizes until they've registered.

~~~
vsync
Yeah, the giant table of "win this! win that!" seems enormously scammy, maybe
because scam ads display something so similar. And the key thing with yours
and theirs: no context, which leaves me wondering "what's the catch?".

Even if you just added a short paragraph explaining the concept (otherwise,
when I get to your site I look around to see what it does, see nothing but
what look like scam ads all over, and conclude "scam site"; if you actually
say what the site's for, the prizes have context) and how it works (otherwise
I'm like, "sure, effectively free cash, yeah right" = scam site), and then
have the prize list, it would make a big difference in first impressions.
Maybe have any list of prizes on the front page not be a countdown of upcoming
prizes, but a list of selected featured prizes ("including these brands!").

------
Lazare
Another way of thinking about it is if your performance metrics, error logs,
or unit tests are _always_ 100% green/perfect with no failures or spikes or
anything, then it's probably a pretty good sign that your metrics/logs/tests
are broken.

------
bitwize
One of the ways in which I can spot a scam is to look it up online. If _all_
of the reviews of the product are glowingly positive (and use similar
language), then I start looking for a pyramid scheme or something driving
sales.

~~~
jacquesm
The error there is that people assume that online reviews are by real,
independent reviewers who have bought the product with their own money.

Sampling online reviews does not at all have the same kind of confidence
levels as independently interviewing actual users of a product.

~~~
bitwize
Here's the thing: in a multilevel marketing scheme, _even the word of the
product users_ is unreliable. This is because MLMs recruit their end users as
a sales and marketing force that works, essentially, for free -- with the
promise of big riches if you sell enough in the program.

You know "The Secret" and all that positive-thinking Aladdin's-genie hogwash?
That's actually the public-facing side of MLM propaganda. Internally, MLMs
pump their customer base up with big promises about how, if you think happy
thoughts, you are guaranteed to fly, with the hidden (or even explicit!)
threat that even the tiniest sad thought will send you crashing to the ground.

And this extends to what people are allowed to say. The key word is "edify".
You're supposed to "edify the product" (talk about how great it is) and "edify
your upline" (talk about what a great guy/gal the person who introduced you
into the program is). This is often backed up with threats and verbal
intimidation and abuse. (e.g., "Only losers say things like that. Losers and
quitters. You're not a loser, are you?" and it gets worse from there.)

So for some of the most prevalent scams, objectivity can't even be expected
from the end purchasers.

------
anotherhacker
Mo' Data, Mo' Problems.

The New Coke flop is a perfect example of this.

New Coke was the result of the largest marketing / consumer research project
ever. The conclusion of this research was to change Coke's formula. Coca-Cola
Chairman Roberto Goizueta claimed that the decision was “one of the easiest we
have ever made”. Coca-Cola thought this way for two reasons:

#1 Research was done with a sample of over 200,000 customers.

#2 Coca-Cola’s researchers triangulated the validity of their data with a
mixed-method approach. They used focus groups, various surveys, and individual
interviews.

All this data and research did the opposite of what it was supposed to. Their
large sample size gave them lots of useless data. Data triangulation, which
was supposed to safeguard them, did the opposite: it convinced them that their
useless data were useful.

Why does this happen?

In an artificial system, variance is unbounded. As your data set grows, that
unbounded variance grows nonlinearly relative to the valid data. As variance
increases, deviations grow larger and happen more frequently. Spurious
relationships grow much faster than authentic ones. The noise becomes the
signal. A toy demonstration of that last point is sketched below.
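
A minimal illustration of the spurious-relationships claim (my toy numbers,
not from the Coke study): generate independent random series and count pairs
that look correlated purely by chance. The count grows roughly with the square
of the number of series, while any genuine relationships would grow at best
linearly.

    import itertools
    import random
    import statistics

    def corr(xs, ys):
        """Pearson correlation of two equal-length lists."""
        mx, my = statistics.mean(xs), statistics.mean(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = (sum((x - mx) ** 2 for x in xs)
               * sum((y - my) ** 2 for y in ys)) ** 0.5
        return num / den

    random.seed(0)
    n_points = 30  # observations per series
    for n_series in (10, 50, 100):
        series = [[random.gauss(0, 1) for _ in range(n_points)]
                  for _ in range(n_series)]
        spurious = sum(1 for a, b in itertools.combinations(series, 2)
                       if abs(corr(a, b)) > 0.5)
        print(n_series, "series ->", spurious, "spurious 'relationships'")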

~~~
anotherhacker
Downvoted? I gave a perfect real-world example of this happening in product
development and marketing research.

------
gtonic
Also see the publications of Dr. Gerd Gigerenzer on this topic, e.g. his book
'Gut Feelings: The Intelligence of the Unconscious'.

------
SixSigma
It's called equivocal data. It is well known in management circles; it's the
part of management that requires experience and judgement rather than facts.
Computers can replace decisions based on unequivocal data.

------
peter303
Maybe later in this century we'll have the alternative of objective Trial by
A.I. instead of trial by peers. Lawyers won't like this, because they win by
manipulating humans.

~~~
jessaustin
I suspect hackers will be more supportive of this reform.

------
tabbott
Sigh. This article describes a mathematical analysis that is more an artifact
of the problem setup than a proof that more evidence decreases confidence. In
particular, it assumes that while actual witnesses convict with a probability
of, e.g., 50%, biased witnesses convict with a probability of 100%(!), which
is an incredibly strong assumption (and in particular, if you changed 100% to
anything else, one of the headline results they advertise, that unanimity is
suspicious while near-unanimity is not, disappears).

Those values of the parameters don't make sense for many real-world scenarios
like picking a subject out of a lineup (you'd have to have a really really bad
lineup to have witnesses be guaranteed to pick the same wrong person!) or to
the output of a panel of judges or jurors.

However, the result is relevant in scenarios where it's very likely that you'd
have a 100% likely error, e.g. you're doing an experiment and your apparatus
isn't functioning properly or you have a bug in your computer program doing
the data analysis. So it's relevant for physicists and computer programmers,
but probably not to the criminal justice system.

For those curious, the math is quite simple. They model a world where one of
the following happens:

* probability p: compromised experiment, in which case fraction y (y=1 in the
paper) of witnesses report guilty

* probability 1-p: normal experiment, in which case fraction x of witnesses
report guilty

Given that, the probability that 100% of N witnesses agree the person is
guilty in each case is:

prob(unanimous and compromised) = p * y^N

prob(unanimous and not compromised) = (1-p) * x^N ~= x^N (since p is assumed small)

So the probability that the experiment is compromised, given a unanimous
outcome, scales with their ratio, p * y^N / x^N = p * (y/x)^N. A few things
you notice:

* If y <= x (as you'd hope to be the case in a lineup, where the bias is less
convincing than having actually seen the criminal!), unanimity is never
suspicious. This is the usual world, where more evidence should increase your
belief in a hypothesis!

* If y > x, you have exponential growth, and so with sufficiently large N
you'll see that unanimous results are almost always the result of a
compromised scenario. However, unless y/x is large and y is close to 1, the
actual probability of unanimity is basically 0, so this will essentially never
happen.

* If y=1 and x << 1 (the case they analyze), with large N, unanimity is
suspicious (but if you change y to 0.95, near-unanimity is, too). This is a
case very relevant in software, but less relevant in the criminal justice
system.

They picked y=1 and x=0.5, which means, roughly, that you start to see a big
effect of this form when N > log_2(1/p), which for their range of p from 0.01
to 0.0001 means N in the ~15-30 range. A sketch of the calculation is below.
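
A minimal sketch of this two-branch model (parameter names as above; the
example values are mine):

    def p_compromised(p, y, x, n):
        """P(compromised | all n witnesses say guilty) in the two-branch model."""
        compromised = p * y ** n      # p * y^N
        normal = (1 - p) * x ** n     # (1-p) * x^N
        return compromised / (compromised + normal)

    # y=1, x=0.5 as in the paper; p=0.01 as a rate of compromised lineups.
    for n in (1, 5, 10, 15, 20):
        print(n, round(p_compromised(p=0.01, y=1.0, x=0.5, n=n), 4))

With these numbers the posterior crosses 50% just above N = log_2(1/p) ~ 6.6
and is essentially 1 by N = 15: unanimity has become evidence of compromise
rather than of guilt.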

Unfortunately, this is part of a much broader trend of people making bold
claims about society based either on a study whose sample was a freshman
psychology class and which achieved 95% confidence (just like ~5% of the
thousands of other studies done on psychology classes!), or on a mathematical
analysis of a confusingly designed problem that doesn't reflect well the broad
swath of reality its authors (or the people reporting on it to a wide
audience!) claim it describes.

~~~
anotherhacker
You're correct in attacking the math the authors use. It's clear they are
bending the math to fit the phenomenon. However, that doesn't invalidate the
hypothesis.

This paper appears heavily influenced by Nassim Nicholas Taleb: systematic
failure, bounded vs unbounded variance, (more data) => (more noise) => (more
spurious relationships), natural vs artificial systems, and even the reference
to the heuristics of ancient cultures.

It looks like the authors were trying to apply math to observed phenomena.
They may not have had the right math to make it work, so they fudged it a bit.

Nassim talks about similar phenomena here:

[http://arxiv.org/a/taleb_n_1.html](http://arxiv.org/a/taleb_n_1.html)

------
throwaway33817
I ran across a very interesting example of this recently. In a new tab, take
some time to analyze this chart for yourself and decide what you think about
it.

[http://geekologie.com/2009/03/percent-of-student-virgins-per...](http://geekologie.com/2009/03/percent-of-student-virgins-per.php)

... done yet?

So, firstly I see that as you go toward more rigor, overall the rate goes up,
with studio art (no rigor, it's just a humanities field) at the left and
mathematics (full rigor, fully abstract) at the right. Like this -
[https://xkcd.com/435/](https://xkcd.com/435/)

Interestingly, computer science (of interest to people here) actually isn't as
far right as you'd think.

But one thing pops out at everyone! There is obviously one MOST interesting
fact in the whole figure: the 0 for studio art. That really got me thinking.

Granted, this isn't the perfect example, because it's not just 0 as opposed to
5-7%; the next value up is 20%.

But what you really start thinking is stuff like: "Is it that they didn't
really have many subjects, just 2 or 3? What is n? Is it that these people are
lying? Is it a real fact?" and so forth.

The fact that it's 0 instead of a few percent immediately raises some
questions.

