
Could It Be? Spooky Experiments That 'See' The Future - zafka
http://www.npr.org/blogs/krulwich/2011/01/04/132622672/could-it-be-spooky-experiments-that-see-the-future
======
garyrob
See [http://commonsenseatheism.com/wp-content/uploads/2010/11/Wag...](http://commonsenseatheism.com/wp-content/uploads/2010/11/Wagenmakers-Why-Psychologists-Must-Change-the-Way-They-Analyze-Their-Data.pdf) ("Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi"). It demolishes the study referred to in the NPR article.

~~~
nervechannel
Just reading it now -- seems like a very thorough takedown of the whole thing,
in fairly non-technical language.

It makes the very good point that this paper runs lots of statistical tests,
and then bases big claims on the small minority that showed a significant
effect. This is in no way restricted to psychology; drug companies, for
example, do this all the time. It's cheating, whether you realize it or not.

Statistical significance only tells you that a result is unlikely to be a
fluke -- not that it definitely isn't a fluke -- but the more tests you do,
the sooner you'll see a fluke on average.
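
A quick way to see this concretely (a minimal sketch, assuming plain
coin-flip "experiments" where no real effect exists by construction):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments, n_flips = 1000, 100

    # Every "experiment" tests a fair coin, so every significant
    # result is a fluke by construction.
    false_positives = 0
    for _ in range(n_experiments):
        heads = rng.binomial(n_flips, 0.5)
        if stats.binomtest(heads, n_flips, 0.5).pvalue < 0.05:
            false_positives += 1

    # Expect a few percent (slightly under 5% -- the discrete
    # binomial test is a little conservative).
    print(f"{false_positives} flukes out of {n_experiments} null experiments")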

~~~
CWuestefeld
_the more tests you do, the sooner you'll see a fluke on average_

In other words, if you toss a coin 1,000 times, then it's hideously unlikely
that you'll see a run of 100 consecutive heads. But if you toss the coin
100,000,000 times, you shouldn't be too surprised to see that 100-toss run
buried in there somewhere, even though the odds of getting 100 in a row are so
small.

Right?

~~~
beagle3
Not really.

If you toss 100,000,000 ≈ 2^27 times, you should only expect a longest run
of about 27 heads. To have a good chance of getting 100 in a row, you need
on the order of 2^100 tosses -- about 10^22 times more.
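
Rough sanity check of that figure (a sketch: a run of k heads has
probability 2^-k and there are ~n places for it to start, so the longest run
you expect in n tosses is about log2(n)):

    import math
    import random

    def longest_run(n_tosses, rng):
        """Longest run of consecutive heads in n fair coin tosses."""
        best = cur = 0
        for _ in range(n_tosses):
            cur = cur + 1 if rng.random() < 0.5 else 0
            best = max(best, cur)
        return best

    rng = random.Random(1)
    n = 10**6   # 10**8 matches the thread but takes a while in pure Python
    print("observed longest run:", longest_run(n, rng))  # typically ~19-20
    print("log2(n):             ", round(math.log2(n)))  # 20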

And the problem is MUCH worse than described above: let's say you test 1000
wrong hypotheses at p=0.05; 50 of those will be accepted as true, even
though all are wrong. If you test 980 wrong hypotheses and 20 right ones,
more than half of those that pass the p=0.05 "golden" significance test will
in fact be wrong.

Now, when you see a medical journal with 20 articles using p=0.05, which do
you think is more probable - that 19 are right and one is wrong, or 19 are
wrong and one is right? The latter has a much higher likelihood.
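
The arithmetic, spelled out (a sketch; the assumption that every true effect
passes the test -- i.e. 100% power -- is mine, and it's generous to the
researchers):

    # 980 wrong hypotheses and 20 right ones, all tested at p = 0.05.
    wrong, right = 980, 20
    alpha, power = 0.05, 1.0   # power = 1.0 assumed (generous)

    false_positives = wrong * alpha   # 49 wrong results pass anyway
    true_positives = right * power    # all 20 real effects pass

    passing = false_positives + true_positives
    # ~71% of "significant" results are wrong, despite p = 0.05.
    print(f"{false_positives:.0f} of {passing:.0f} passing results are wrong"
          f" ({false_positives / passing:.0%})")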

~~~
phob
This is why we replicate studies. Unreplicated parapsychology results are a
dime a dozen.

~~~
beagle3
Who are "we"? Physicists are the only ones who still do, as far as I can tell
-- everyone else prefers to spend their research budget somewhere else.

~~~
nervechannel
Clinical researchers too. Because lives are at stake.

The whole field of systematic reviews and meta-analyses has developed around
the need to aggregate results from multiple studies of the same disease or
treatment, because you can't just trust one isolated result -- it's probably
wrong.

<http://en.wikipedia.org/wiki/Meta-analyses>

Statisticians working in EBM have developed techniques for detecting the
'file-drawer problem' of unpublished negative studies, and correcting for
multiple tests (data-dredging). Other fields have a lot to learn...
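
For anyone curious, the simplest of those multiple-test corrections
(Bonferroni) fits in a few lines -- a sketch, and not specific to EBM:

    # Bonferroni: with m tests, demand p < alpha/m from each one, so the
    # chance of even one fluke across all m tests stays below alpha.
    def bonferroni(p_values, alpha=0.05):
        m = len(p_values)
        return [p < alpha / m for p in p_values]

    # Three tests: only the first survives the corrected threshold 0.0167.
    print(bonferroni([0.001, 0.04, 0.03]))   # [True, False, False]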

~~~
beagle3
Clinical researchers working for non-profits / universities do, occasionally.
I suspect it has become popular recently not because lives are at stake, but
because it lets you publish something meaningful without having to run
complex, error-prone, and lengthy experiments.

Regardless of the true reason, these are _never_ carried out before a new drug
or treatment is approved (because there are usually only one or two studies
supporting said treatment, all positive).

And if you have pointers to techniques developed for/by EBM practitioners, I
would be grateful. Being a Bayesian guy myself and having spent some time
reading Lancet, NEJM and BMJ papers, I'm so far unimpressed, to say the least.

------
nostromo
"Extraordinary claims require extraordinary evidence" -- and I wouldn't count
these small studies as extraordinary.

Unfortunately, publishing these kinds of claims prematurely helps the more
gullible among us fall for ridiculous claims from psychics and others who
would take advantage of them. (The authors of "The Secret", I'm looking at
you.)

Commenters on NPR's website (not exactly the dumbest audience online) have
already demonstrated this problem: "All of you criticizing this need to open
up your minds" and "The future, as well as the past, influence our dreams."

~~~
yummyfajitas
_Unfortunately, publishing these kinds of claims prematurely helps the more
gullible among us fall for ridiculous claims from psychics and others who
would take advantage of them._

True. But on the other hand, publishing ridiculous claims and incorrect
results is a necessary part of science.

When we publish only results we know to be correct, because they agree with
mainstream beliefs, we introduce a bias into the scientific process. In
reality, if you publish 20 experiments at p=0.05 [1], about 1 of them should
be incorrect. If fewer than 1 in 20 of your papers turns out to be wrong
(assuming p=0.05 is the gold standard), you are not doing science.

You can see a perfect illustration of this when people tried to reproduce
Millikan's oil drop experiment. I'll quote Feynman: _Millikan measured the
charge on an electron...got an answer which we now know not to be quite
right...It's interesting to look at the history of measurements of the charge
of an electron, after Millikan. If you plot them as a function of time, you
find that one is a little bit bigger than Millikan's, and the next one's a
little bit bigger than that, and the next one's a little bit bigger than that,
until finally they settle down to a number which is higher.

Why didn't they discover the new number was higher right away? It's a thing
that scientists are ashamed of - this history - because it's apparent that
people did things like this: When they got a number that was too high above
Millikan's, they thought something must be wrong - and they would look for and
find a reason why something might be wrong. When they got a number close to
Millikan's value they didn't look so hard. And so they eliminated the numbers
that were too far off, and did other things like that..._

This is why I'm an advocate of accepting/rejecting scientific papers based
solely on methodology, with referees being given no information about the
conclusions and with authors being forbidden from post-hoc tweaks. You do your
experiment, and if you disagree with Millikan/conclude that ESP exists, so be
it. Everyone is allowed to be wrong 5% of the time.

[1] I'm wearing my frequentist hat for the purposes of this post. Even if you
are a Bayesian, however, you should still publish.

~~~
Eliezer
If you're going to use highly subjective frequentist statistics at all, p <
0.001 should be the _minimum_ gold standard for extraordinary claims. If the
phenomenon is real, and not bad statistics, it only requires two and a half
times as many subjects to get p < 0.001 instead of p < 0.05. Physicists, who
don't want to have to put up with this crap, use p < 0.0001. p < 0.05 is
asking for trouble.
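
The "two and a half times" figure follows from the normal approximation:
required sample size scales with the square of the sum of the critical
z-scores. A sketch (the one-sided test and 80% power are my assumptions):

    from scipy.stats import norm

    def n_ratio(alpha_strict, alpha_loose, power=0.80):
        """How many times more subjects a stricter one-sided threshold
        needs, holding effect size and power fixed."""
        z_power = norm.ppf(power)
        z_strict = norm.ppf(1 - alpha_strict) + z_power
        z_loose = norm.ppf(1 - alpha_loose) + z_power
        return (z_strict / z_loose) ** 2

    print(round(n_ratio(0.001, 0.05), 2))    # ~2.5, as claimed
    print(round(n_ratio(0.0001, 0.05), 2))   # ~3.4 for the physics standard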

~~~
bigfudge
OK, then let's fund psychology like we fund physics. I would love to run
1000+ patient studies to test psychotherapies, and in fact we'd be able to
answer some really interesting questions if we did, but there is currently no
way of doing this.

~~~
Eliezer
I repeat, you do not need 1000 times as many subjects to get results that are
1000 times as significant! If 40 subjects gets you results with p < 0.05, then
100 subjects should get you results with p < 0.001. Doing half as many
experiments and having nearly _all_ the published results being real effects,
instead of most of them failing to replicate when tested, sounds like a
_great_ tradeoff to me.

And I suspect the ultimate reason it's not done this way... is that scientists
in certain fields would publish a lot fewer papers, not slightly fewer but a
lot fewer, if all the effects they were studying had to be real.

~~~
saurik
"1000+", not "1000x". Also, I'm assuming bigfudge was talking about p <
0.0001, given the comparison made to physicists.

~~~
bigfudge
Yes - thanks. The current norm for a 'suitably powered' trial of a
psychotherapy is about 300. We've just got a trial funded for that number
(admittedly in a challenging patient population) which will cost about £2.5m
in research and treatment costs. We would love to run 1000 patients and start
looking at therapist-client interactions and individual differences in
treatment suitability, but that's out of the question.

------
maxklein
This is trivial software to write. Why doesn't someone quickly whip up a web
app that does the picture thing as described in the experiment? Then we could
personally test whether or not we have ESP.

Specs: Two buttons - ESP mode or non-ESP mode. In non-ESP mode, 60 random
pictures are shown and we are to guess. Then it gives the correct one. In ESP
mode, add some porn. If the results are different, then we have ESP. Use
Javascript for the randomisation algorithm so that we can be sure there is no
server trickery being done.
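
The core loop really is trivial. Here's a terminal-mock sketch in Python
rather than the Javascript web app the spec calls for, and without the two
modes -- just the guess/reveal/tally cycle (the target is chosen only after
the guess, following the spec's ordering, not necessarily Bem's exact
protocol):

    import random

    def run_trials(n_trials=60):
        """Terminal mock of the guessing task: pick a side, then the
        target is chosen (only after the guess, so nothing to peek at)."""
        hits = 0
        for i in range(n_trials):
            guess = ""
            while guess not in ("l", "r"):
                guess = input(f"trial {i + 1}/{n_trials} [l/r]: ").strip()
            target = random.choice(["l", "r"])
            hits += guess == target
            print(f"  target was {target}")
        print(f"hit rate: {hits}/{n_trials} = {hits / n_trials:.0%}")

    if __name__ == "__main__":
        run_trials()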

~~~
Fargren
Javascript's random() is only pseudorandom. I think you need a true random
number generator to truly test this.

~~~
endtime
Why would pseudorandomness be insufficient? Don't try to hack the JS and the
effect should be the same.

~~~
Fargren
If it's pseudorandom, that means there's some form of pattern in the numbers
it generates. It's impossible (or at least unreasonably hard) to know whether
the person doing the test may subconsciously guess or estimate the pattern.
That invalidates the whole test.

~~~
endtime
I suppose that's true, though if someone can intuit a crypto-strong PRNG then
that's an interesting and significant result in itself.

~~~
nl
Javascript's random() isn't a crypto-strong PRNG. Depending on how _any_ PRNG
is seeded, guessing its output may not be very hard at all - especially if
you've seen it once before.

E.g., I once had a poker-playing game on an Amstrad that used a PRNG with a
very predictable seeding strategy. I could amaze my friends by knowing
exactly what cards I would be dealt.

~~~
endtime
I'm not suggesting that JS's random() is crypto-strong. I was replying to this
part:

>I think you need a true random number generator to truly test this

------
mcritz
I knew this comment would be up-voted before I wrote it.

------
bluekeybox
Extraordinary claims require extraordinary evidence. The classic 5% (or even
0.1%) statistical threshold is sometimes not enough. See here for an easy-to-
understand example of why that is the case:
[http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_...](http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer)

In other news, capital punishment has been instituted for science journalists
publishing articles whose titles contain a question that can be succinctly
answered with "No."

~~~
gfodor
Can someone please, please explain this to me? I've never understood why the
oft-stated line "extraordinary claims require extraordinary evidence" is
anything other than a clever saying. Why should it be that things that follow
your intuition require any less rigor to prove than those that do not, and
vice versa? Presumably there should be no subjectivity to cold hard science;
evidence is evidence, and a certain quantity of evidence should reflect fact
equally well regardless of how unusual that fact is.

edit: just to note, nowhere in constructing a statistical test is it required
that the creator decide how "extraordinary" the null hypothesis is.

~~~
ewjordan
It's simply the way Bayesian statistics work: if the prior probability of
something happening is very low, then for me to flip from thinking "didn't
happen" to "did" it will take some new information that is very powerful.

If you think that's illogical, I'd ask you to consider why a teacher is more
likely to accept the excuse "my dog ate my homework" than "aliens kidnapped me
and stole it". You seem to be arguing that, given that the evidence is equal
(a mere statement from a kid), the teacher should properly consider both
occurrences to be equally likely.
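
In numbers, with Bayes' rule in odds form (a sketch -- the priors and the
strength of the kid's statement are made up purely to illustrate the
mechanism):

    def posterior(prior, likelihood_ratio):
        """Posterior probability via odds form: post odds = LR * prior odds."""
        prior_odds = prior / (1 - prior)
        post_odds = likelihood_ratio * prior_odds
        return post_odds / (1 + post_odds)

    # Same evidence for both excuses: say the kid's statement multiplies
    # the odds by 10. Only the prior differs.
    print(posterior(0.10, 10))   # dog ate homework: 10% prior -> ~53%
    print(posterior(1e-9, 10))   # alien abduction: 1e-9 prior -> ~1e-8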

~~~
lukev
This is true, but I'd add a caution: just because something _seems_ outlandish
or improbable doesn't mean it actually has a low prior probability. Human
intuition on what's weird and what's not is not a reliable oracle of prior
probability. If you're going to give the prior an actual number, you'd better
base it on actual facts.

In your example, based on existing data, it is indeed fair - some dogs do
sometimes eat homework, whereas there are no verified accounts of aliens
stealing it. So that's a legitimate adjustment of priors. Particularly if you
actually have data on the incidence of paper-hungry dogs.

But in science and philosophy, there are lots of important questions for which
we can't legitimately calculate priors, and "it would be too weird" is not at
all relevant when determining their values.

~~~
phob
But we do have reasonable priors on parapsychology from its wasteland of
unreplicated, flawed studies, with no convincing results despite decades of
effort.

------
qiqing
Hypothesis #1:

53% in an experiment that has 36 trials? Really? 50% is 18/36, but 19/36 is
52.8%, and 19/35 is 54.3%. Depending on the experimental design, it may just
be that near the end of the 20 minutes the subject tends to stop on one of
the last erotic pictures they're likely to guess, and lets the time run out.
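
For scale, here's what 53% means at different sample sizes (a sketch; the
3,600-trial figure is mine, just to show why the aggregate matters far more
than any single 36-trial session):

    from scipy.stats import binomtest

    # 19 hits out of 36 is utterly unremarkable under pure chance:
    print(binomtest(19, 36, 0.5, alternative="greater").pvalue)   # ~0.43

    # The same 53% rate over, say, 100 sessions of 36 trials is not:
    k, n = round(0.53 * 3600), 3600
    print(binomtest(k, n, 0.5, alternative="greater").pvalue)     # ~0.0002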

Hypothesis #2: Depending on how the computer's random number generator was
seeded (and it might have a relatively short repeating sequence), subjects
may have, however unconsciously, "learned" to predict the randomness,
something they would have insufficient motivation to do in the other set of
pictures. [We can test for this by seeing if they were getting better at it
over the course of a session.]

------
Eliezer
[http://lesswrong.com/lw/1ib/parapsychology_the_control_group...](http://lesswrong.com/lw/1ib/parapsychology_the_control_group_for_science/)

------
simon_
Perhaps NPR celebrates inverse April Fool's Day (1/4)?

~~~
Semiapies
That'd be funnier than the simple reduction of "The answer to science article
titles phrased as questions is 'No'."

------
Fargren
Older discussion: <http://news.ycombinator.com/item?id=1878160>

------
zdw
Krulwich's NPR science pieces have some of the best verbal delivery,
storytelling form, and production values - it's worth listening to the audio
versions of them.

Highly recommended if you're thinking about making a podcast.

~~~
yan
Krulwich also co-hosts Radiolab (.org), which is incredible.

------
forensic
Has anyone investigated Dean Radin's work? It demonstrates similar statistical
effects in a huge number of experiments. He has put forth quantum entanglement
as the explanation.

~~~
nervechannel
And how many unpublished studies with null results I wonder?

<http://en.wikipedia.org/wiki/Funnel_plot>

PS non-physicists invoking "quantum stuff" are bullshit merchants, p < 0.001
:-)

------
nwatson
I e-mailed a link to a related article from a few months ago
([http://www.newscientist.com/article/dn19712-evidence-that-we...](http://www.newscientist.com/article/dn19712-evidence-that-we-can-see-the-future-to-be-published.html))
-- I sent it to some co-workers and also copied my wife.

My wife (a psychologist and a Christian) defended the (mostly psychologically
oriented) experiments as posed in the paper and the method behind them,
whereas several atheist/strictly-causality-believing coworkers and also a
conservative Christian with a strong anti-psychology bias dismissed the idea
of spooky action from the future outright. The stormy e-mail exchange raged on
(and I did not contribute to the discussion). The conservative guy accused me
of abandoning my wife in the argument. I believe she's well capable of
handling herself.

In any case I wrote this (bad) poem in response:

    
    
      My act is mostly mute and unseen,
      I wear no costume, I don’t vent my spleen.
      Spending most time behind the stage,
      Conceiving a plot I prepare the cage.
     
      I have few resources, can’t sponsor M-M-A,
      Must find some other way to while away the day.
      I step out briefly to address the crowd,
      I hope today they’ll surely be wowed.
     
      My mind’s been active, reading Hacker News,
      What’s this I see?  Some interesting views,
      on whether the future can affect our present,
      I’m sure this will stoke plenty of dissent.
    
      I have my materials for a good time today,
      Setting the stage is just an e-mail away.
      My fingers fly fast, the idea’s not hokey,
      My actors will soon be addressing the spooky.
     
      I press ‘Send’ and my time on stage is done,
      I’ve set the parameters, now it’s time for fun.
      The actors appear to have done my bidding,
      I just hope it doesn’t end in too much bleeding.
     
      Sure, I’ll show up from time to time,
      The audience gets tired of hearing everyone whine.
      They need to see larger schemes at play,
      The actor’s philosophies won’t save the day.
     
      Arguments, screeds, reasoning galore,
      It’s exciting for a time, not yet a bore.
      I’ll step back just about now,
      It’s time for some more.
     
      This audience of one will now sit back,
      Got a few more bugzillas to whack.
      I won’t make it to peer-reviewed journals,
      But empirically it’s great to see what sprouts from a kernel.

------
joeld42
My bet is a crappy pseudo-random number generator, or other software bug.

~~~
Fargren
Another article about the same study: [http://hplusmagazine.com/editors-blog/precognition-real-corn...](http://hplusmagazine.com/editors-blog/precognition-real-cornell-university-lab-releases-powerful-new-evidence-human-mind-can-)

From the article: "The sequencing of the pictures on these trials was
randomly determined by a randomizing algorithm … and their left/right target
positions were determined by an Araneus Alea I hardware-based random number
generator."

At the very least they were using an Araneus Alea, which is a hardware random
number generator, so the numbers were not predictable. It's possible that the
"randomizing algorithm" did something dumb and made the sequence not random,
but I doubt it.

I think it's more likely that the study was run so many times that it
eventually gave significant results than that the sequence was not random. Or
maybe prescience is real to some degree, or the study is a statistical
glitch.

However, the replication package they provide has the compiled program
without the source code, and that is a red flag to me.

------
ibejoeb
So it looks like I'll be studying my Mega Millions picks tonight. My
methodology:

1. Buy ticket

2. Go AWOL until after the drawing.

3. Study numbers

4. Look up results

5. Profit

~~~
khafra
Judging from the porn study, you'll get better results if you write a script
to screen-scrape the winning numbers after they're posted online, then display
them to you alongside some hot pron, amidst several sets of randomly chosen
lottery numbers paired with SFW pictures.

------
tokenadult
It's time to post a link to Peter Norvig's article "Warning Signs in
Experimental Design and Interpretation"

<http://norvig.com/experiment-design.html>

here on HN again. How well does Bem's set of experiments hold up?

------
david927
This violates our common sense, but it doesn't necessarily violate physics.
We've known for a long time from quantum mechanics that time may not fit our
preconceived notions of it. Entropy only means that time moves forward. The
rest, such as the idea that we can't know the future, we've just assumed.

------
mbrubeck
Here is some very good discussion of (failed) attempts to replicate the study,
and at least one possible methodological flaw that could invalidate some of
the results:

[http://psychsciencenotes.blogspot.com/2010/11/brief-note-dar...](http://psychsciencenotes.blogspot.com/2010/11/brief-note-daryl-bem-and-precognition.html)

 _"The real lesson? This is the level of methodological scrutiny every paper
should receive, and not just the ones you think are crazy: the ones you like
and rely on for your own work should get a good working over like this too
(especially these ones; and I'm as guilty on this as everyone else)."_

~~~
lallysingh
I'm wondering if the real point of the paper is to point out how insufficient
the experimental design & statistics used in its contemporaries.

------
yters
People have been getting results like this for a very long time. The Soviets
took these phenomena very seriously and were trying to establish physical
mechanisms.

Here's one IEEE paper: "a perceptual channel for information transfer over
kilometer distances - historical perspective and recent research"
[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1454...](http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1454382)

Can't find the online version anymore, so here's the version I found (9.5MB
pdf): <http://www.box.net/shared/inxg0nld9r>

------
comice
This was discussed by people on the James Randi forum back in October:
<http://forums.randi.org/showthread.php?t=188366>

------
blago
Pretty sure this experiment will meet "cosmic habituation"
(<http://nyr.kr/fkzAaQ>) very soon, and pretty sure Bem knows it. His time
would be better spent studying why the truth "wears off" (as The New Yorker
put it). What happened to science?

------
raphar
"There was nothing surprising about the results of the psychological
experiments conducted by Dr Bem. The porn used was OURS"

A (future) message from <your favorite porn provider name here>

------
jamesbressi
I am embarrassed to ask, but can someone explain the word-flashing test a
different way? For some reason it's not clicking for me the way it's written
on NPR.

------
barmstrong
Good discussion of this sort of research:

<http://en.wikipedia.org/wiki/Ganzfeld_experiment>

------
tlrobinson
The test didn't include the obvious control group: have the subjects pick a
door but don't show them whether or not they picked the right one.

------
Sauce1971
The question is: do they see the future, or make the future? Some days I'm
convinced computers react to my moods.

~~~
unoti
If you're convinced computers react to your moods, you should consider reading
Zen and the Art of Motorcycle Maintenance. There's an entire chapter dedicated
to the concept. It's philosophy, not science, but interesting and possibly
wise nonetheless.

------
Rickasaurus
I'm very disappointed in HN for voting up this unscientific garbage.

------
hc
And how many studies did he throw away when they didn't get the statistical
significance he was looking for? Now let's average all of them together.

------
maeon3
Figuring out what time is may be what causes civilizations to go extinct. Once
you figure out how to probe the earlier states of the universe, you find
everything vanishes, along with the evidence that the civilization ever
existed. This may be why our visible universe is not teeming with chit-chat.

------
Qz
Wow, the word retyping test seems like a total scam. Of course you're better
able to recall words after a test in which you were already better able to
recall those words. It's called memory. Correlation, not causation, as the
maxim goes.

~~~
forensic
I think you missed the point.

1. People recalled as many words as possible.

2. People were divided into two groups, control and experimental.

3. The experimental group was asked to retype the words.

4. The control group was asked to go home.

5. The experimental group was found to score better on the initial test.

~~~
Andrew_Quentin
I have not read the original article, so maybe I am mistaken, but the post
submitted here seems to suggest that there was only one group, not two. There
was one group; they retyped only half or so of the words, not the other half,
and were found to remember the retyped words better.

I do not know if that makes a difference, but I think it might. The retyped
words might have some quality that, for whatever reason, made them easier to
remember than the non-retyped words.

Unless the original article says the experiment was carried out in the way
you suggest, I do not think there was much control of variables.

