
Warning Signs in Experimental Design and Interpretation (2007) - SatvikBeri
http://norvig.com/experiment-design.html
======
tokenadult
Readers who have observed my behavior after 1976 days of participation on
Hacker News know that this is by far my favorite link to share in a Hacker
News comment, so I'm glad to see this on the front page as an article
submission today.

My one comment about the article is that much of what gets submitted to HN as
a breathless press release on a research "breakthrough" is not even based on
experimental research, but rather on correlational research, so the study goes
wrong with problems that Peter Norvig's excellent article doesn't discuss
much. Many, many submissions to HN are based at bottom on press releases, and
press releases are well known for spinning preliminary research findings
beyond all recognition. This has been commented on in the PhD Comics strip
"The Science News Cycle,"[1] which exaggerates the process only a little.
More serious commentary in the edited group blog post "Related by coincidence
only? University and medical journal press releases versus journal
articles"[2] points to the same danger of taking press releases (and news
aggregator website articles based solely on press releases) too seriously.
Press releases are usually misleading.

But, yes, definitely read the article here, as it will help you check each
submission to Hacker News for how many of the important issues in
interpreting research it does NOT discuss.

[1] [http://www.phdcomics.com/comics.php?f=1174](http://www.phdcomics.com/comics.php?f=1174)

[2] [http://www.sciencebasedmedicine.org/index.php/related-by-coincidence-only-journal-press-releases-versus-journal-articles/](http://www.sciencebasedmedicine.org/index.php/related-by-coincidence-only-journal-press-releases-versus-journal-articles/)

------
news_to_me
Wow, this page has no <html> or <body> tags; it just gets down to business
with <div>. Is it a fragment meant to be included elsewhere?

Can... can the whole Web be like this?

~~~
svachalek
Sort of: [http://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags](http://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags)
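For the curious, the spec linked above means a page like the submission can still be a complete, conforming document: the parser infers the omitted elements. A minimal sketch of a valid HTML5 document with the optional tags left out:

```html
<!DOCTYPE html>
<title>Valid without html, head, or body tags</title>
<div>
  <p>The parser supplies the implied html, head, and body elements,
  so this is a full document, not a fragment.</p>
</div>
```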

------
DEinspanjer
This is an excellent article. It helped put numbers and precise definitions
behind several things I had sorta intuited before and also corrected me on a
couple of others. :)

As part of internalizing the bit about P(H|E) vs P(E|H) [Warning sign I4], I
wrote up a quick gdocs spreadsheet to let me play with the numbers:
[https://docs.google.com/spreadsheets/d/10JrG42iKY-LhcnaKU7O7B5EezOKoZ0OH7Y4HeXZkWIc/edit?usp=sharing](https://docs.google.com/spreadsheets/d/10JrG42iKY-LhcnaKU7O7B5EezOKoZ0OH7Y4HeXZkWIc/edit?usp=sharing)
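The P(H|E) vs P(E|H) confusion in warning sign I4 can be made concrete with a few lines of arithmetic. The numbers below are hypothetical, chosen only to show how far apart the two quantities can be:

```python
# Bayes' rule with made-up numbers: a test that is 90% sensitive, has a
# 5% false-positive rate, and screens for a condition with a 1% base rate.
p_h = 0.01              # prior P(H)
p_e_given_h = 0.90      # P(E|H): sensitivity
p_e_given_not_h = 0.05  # P(E|not H): false-positive rate

# Total probability of observing the evidence E.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e

print(f"P(E|H) = {p_e_given_h:.2f}, but P(H|E) = {p_h_given_e:.3f}")
# P(E|H) = 0.90, but P(H|E) = 0.154
```

Even with a 90% hit rate on the evidence given the hypothesis, the low prior drags the posterior down to about 15%.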

------
eli_gottlieb
It pretty actively depresses me that my statistics class never _actually_
discussed how loose the connection is between hypothesis testing (what we
normally call a statistical test, the frequentist approach) and a genuine
updating of hypothesis probabilities (the Bayesian approach).

Dear frequentists and Bayesians: could you people kiss and make up already?
The rest of us are pretty tired of having to run bad numbers to get papers
published _and_ of having computationally intractable statistics to run.
Please come up with a compromise.

------
cassowary37
I think this conflates poorly designed studies with preliminary or hypothesis-
generating studies. If not for uncontrolled proof-of-concept studies, or
pseudorandomized designs such as those used for post-marketing studies, we
would a) not have any new medications and b) rarely learn about unanticipated
toxicities. It turns out that we clinical-trial folks are more Bayesian than
the cult of p<0.05 would lead you to believe.

At one end of the spectrum, we rely heavily on uncontrolled single- or
multiple-ascending dose studies to prove to ourselves that a treatment is
likely to be safe in next-step studies, and to guess at optimal dose. At the
other, we learn a great deal from post-marketing surveillance about
unanticipated toxicities - because our priors based on big phase 3 studies may
still be insufficient to accurately estimate risk. Neither of these designs
is randomized or placebo-controlled - and in neither case is that an indication
that 'something is wrong', even though an RCT would be _better_ in both
contexts. Better, if cost were no object and patient safety were not a
concern.

I realize my fellow Brunonian does include some offhand caveats - but it
worries me to read comments about how this negates most social science
research.

~~~
darkxanthos
Not at all... He merely states that if someone has the option to use a
control and chooses not to, you should be concerned. Specifically, he notes
that some fields can't have a control (sometimes for ethical reasons).

~~~
cassowary37
I think the distinction is that we /often/ choose not to have a control, for a
multitude of reasons, so as a cause for concern it's highly nonspecific. Much
like the critique of small samples - there are underpowered 5000-patient multi-
center trials, and well-powered 30-patient trials.
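The point that sample size alone doesn't determine power can be sketched with a standard normal-approximation power calculation for a two-sample proportion test. The effect sizes below are hypothetical, picked only to illustrate the contrast:

```python
from math import sqrt, erf

def power_two_proportions(p1, p2, n_per_arm, alpha_z=1.96):
    """Approximate power of a two-sided two-sample proportion test
    (normal approximation; alpha_z = 1.96 corresponds to alpha = 0.05)."""
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    z = abs(p1 - p2) / se - alpha_z
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

# A 5000-patient-per-arm trial chasing a tiny effect is underpowered...
print(power_two_proportions(0.50, 0.51, 5000))  # ~0.17

# ...while a 30-patient-per-arm trial of a large effect is well powered.
print(power_two_proportions(0.20, 0.70, 30))    # ~0.99
```

So "small n" by itself tells you little; what matters is n relative to the effect you're trying to detect.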

As I said, I think this is a nice introduction but can lead one to discount,
well, anything other than large randomized placebo-controlled trials.

------
Mz
I wonder how one moves in the opposite direction. I am not even sure how to
frame my question. I have been given hell quite a lot for trying to explore
it, conversationally. But say you have a working hypothesis and are getting
results and you know of some research that fits with the situation. How does
one effectively present something like that? Not as "proof" but as a place for
others to start thinking about the problem space?

I don't ever seem to see scenarios like that addressed.

~~~
abecedarius
Have you read [http://blog.sethroberts.net/](http://blog.sethroberts.net/) ?
It's not clear to me if that addresses your question, but Roberts criticizes
professional science for too much focus on expensive testing of popular
hypotheses and not enough on generating and cheap winnowing of ideas.

~~~
Mz
I have not seen that before, thank you, but it currently isn't loading
properly for me.

~~~
abecedarius
No idea about your access problem, but posts like these are why it came to
mind: [http://blog.sethroberts.net/category/personal-science/](http://blog.sethroberts.net/category/personal-science/)
[http://blog.sethroberts.net/category/health/](http://blog.sethroberts.net/category/health/)

~~~
Mz
Those links work. Thanks!

------
mathattack
Great post! This shreds much of the research in social science. It should be
required reading for every science major and science journalist.

------
coldcode
I love Norvig's articles, but since they carry no date, I wonder whether I'm
reading the news or ancient history.

~~~
praptak
This one is timeless anyway.

------
SatvikBeri
Google cache link:
[http://webcache.googleusercontent.com/search?q=cache:xeMschA84JIJ:norvig.com/experiment-design.html](http://webcache.googleusercontent.com/search?q=cache:xeMschA84JIJ:norvig.com/experiment-design.html)

------
logfromblammo
And we get 503 responses already.

There won't be much discussion if no one can read the article.

~~~
keithpeter
It's also working for me via Google's cache (search the title + Norvig, then
click the 'cached' link).

