

Secrecy in science is a corrosive force - cwan
http://www.ft.com/cms/s/0/8aefbf52-d9e1-11de-b2d5-00144feabdc0.html

======
billswift
Secrecy in science makes it non-science. If your claims cannot be tested and
falsified, then they are not scientific; hidden data or techniques prevent
replication.

~~~
sethg
I think this is an overstatement. If I say "the mass of an electron is X and
this is how I did the experiment to determine X", you can repeat all the steps
of the same experiment without looking at my original lab notebook.

The more dubious situation is when I say "based on my analysis of a massive
quantity of data, which took my grad students a whole summer of full-time work
to collect and which I won't publish here, I observe the following pattern
which is evidence in favor of theory Y". But even in this case, a competing
scientist with his or her own graduate students can collect data from
different sources, or a different kind of data, and publish a paper saying
"hey, the pattern in this other data provides evidence _against_ theory Y".

------
dtf
I seem to remember reading, maybe about a year ago, of some movement trying to
promote the practice of supplementing scientific papers with reproducible
experiments and derivations, i.e. source code and data. Perhaps it was just in
the field of computer science? To a non-academic like me it sounded like a
completely obvious thing to do, yet you hardly ever seem to see it. Surely,
given the tools the internet offers these days (look at the wonders of GitHub
and its ilk for sharing and modifying code), we should be moving academic
publishing forward to take advantage of them?

~~~
bbgm
There are a number of efforts across the board, including in chemistry and
biology. The extreme end of the "open science" world is Open Notebook Science:

<http://en.wikipedia.org/wiki/Open_Notebook_Science>

In general you are required to put enough into the methods section of a paper
to be able to reproduce the results, but as the computational side gets more
and more complex and supplementary data sets get bigger and more complicated,
the need to make the raw data and (if possible) code available for people to
reproduce the results and/or test the assumptions becomes almost a necessity.

A number of folks are thinking about the publishing side as well. There's PLoS
ONE (<http://plosone.org>), and Cell has been doing some very interesting
prototyping on the "paper of the future". People have already mentioned JoVE
and OpenWetWare. It will happen, but it's going to take a few years. Too many
years of established practice.

------
marciovm123
Scientists spend loads of time worrying about what data to release and how to
present it, because doing so is necessary to communicate their findings
effectively to the rest of the scientific community. The problem with the
"release everything" approach is that so much data is generated every day that
it would be impossible for anyone to make sense of it all. When journalists or
other non-technical people try to analyze raw data or scientists'
communications on their own, they are very likely to misinterpret it or
emphasize the wrong things.

~~~
LargeWu
You seem to be making the assumption that scientists are interpreting their
own data correctly in the first place. Reproducibility isn't just a check on
outright fraud, it's also a way to check for mistakes in the original
findings. The data isn't for lay-people and journalists, it's for other
researchers.

As somebody who has had to compile (economic) datasets for public release, I
understand that there is cleaning of raw data that needs to happen. However,
this needs to happen in a transparent manner, and any transformations should
be clearly called out or explained.

~~~
thras
Or better yet, release the raw data and the transformation scripts.

~~~
alan-crowe
It is important to automate the downstream processing so that you can perform
it on both the raw data and the cleaned data. This lets you investigate the
sensitivity of your conclusions to your various assumptions.

For example, if the conclusions are supported by the cleaned data but not the
raw data, you need to worry about whether you have accidentally written your
conclusions in by hand during the cleaning process. One way to investigate
this is to have two cleaning scripts, a lax one that only does no-brainer
corrections, and a strict one that makes those delicate judgment calls. Then
you can reprocess. If lax cleaning of the data supports your conclusions then
they are fairly secure. If this sensitivity analysis reveals that the
difference between lax and strict cleaning matters to your conclusions, then
you groan, because you have waded into muddy waters and things are much harder
than you initially thought. Maybe you have to recruit an assistant to clean
the data blind, without knowing what the "right" answer is, or maybe you need
to go on field trips to do direct checks on instruments.

If your research has important and expensive implications for public policy
other people will want to do this kind of sensitivity analysis themselves and
reach their own conclusions.
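The lax-versus-strict workflow described above can be sketched in a few lines.
This is a hypothetical illustration, not anyone's actual pipeline: the data,
the cleaning rules (drop missing readings; additionally drop values above 3x
the median), and the downstream analysis (a simple mean) are all made up for
the example.

```python
def clean_lax(records):
    """No-brainer corrections only: drop missing readings."""
    return [r for r in records if r is not None]

def clean_strict(records):
    """Also make a delicate judgment call: drop values above 3x the median."""
    kept = clean_lax(records)
    median = sorted(kept)[len(kept) // 2]
    return [r for r in kept if r <= 3 * median]

def analyze(records):
    """Stand-in for the automated downstream processing: the mean."""
    return sum(records) / len(records)

# Hypothetical raw measurements; 9.5 is a suspect reading.
raw = [1.0, 1.2, None, 0.9, 1.1, 9.5]

results = {
    "lax": analyze(clean_lax(raw)),
    "strict": analyze(clean_strict(raw)),
}

# If "lax" and "strict" would support different conclusions, the
# findings are sensitive to the judgment calls made during cleaning.
print(results)
```

Because both cleaning steps are scripts rather than hand edits, anyone with
the raw data can rerun the whole comparison and see exactly where the two
pipelines diverge.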

------
sethg
IIUC this has long been an issue in archeology: for example, a small committee
of scholars had exclusive access to the Dead Sea Scrolls for decades, and the
scholars dragged their feet on publishing the complete text, until someone
reverse-engineered much of the text from a concordance.

------
known
"The secret to creativity is knowing how to hide your sources." --Einstein

