Hacker News

Surely the flabbergasting thing is not whether this proves or disproves GW but that scientists are allowed to flat-out refuse to release the raw data they used to generate results.



A collection of raw data is full of systematic errors, accidental mistakes, misleading black swans, and false trails (some of which get followed for years before they finally turn out to be false). I've seen several talented, well-trained, and highly experienced scientists fool themselves for decades with their own raw data. That's why it is called raw. That's why you have to analyze data, over and over, until you can't stand it anymore, and only publish the last tiny fraction that comes out: Your best work, the stuff that you're confident in and prepared to stand behind. And that's why there's a lot more to science than just reading a lot of numbers off the front panel of your instrument and sticking them up on the web.

If I were a scientist in a controversial field, where every dropped decimal point, statistical anomaly, and speculative sentence (later to be disproved, and to make even its own author blush with the memory) was liable to be mined out of my notebooks and splashed all over the tabloids, I'd sure as hell refuse to release my raw data. Indeed, I might just decide not to release any data at all, but just switch to another field. That's obviously one of the goals of this campaign of intimidation.


One of the core parts of modern science is reproducible results - to allow anyone to take data, follow through the methods used, and locate errors (or see if something is an anomaly, in the case of experiments). Without it, science is basically meaningless - one must rely on the word of a group of people for their conclusion, and it is essentially pointless to publish the method (as it's impossible for anyone to recreate the research).


You have to release data and methods that allow other people to recreate the research. (And, obviously, your colleagues are free to object that you haven't published enough, and to ask you for more.)

But that's not the same as releasing everything you ever write down to anyone who asks, which is what the original comment seemed to be suggesting.

The problem with your raw data is that, in the hands of an opponent, especially one who argues in bad faith, the word raw is quickly and easily filed off and it gets described as "your data", despite the fact that you threw it away and didn't publish it, presumably for a reason.

It's easy to make a scientist look ridiculous -- to a nonscientist -- by poking fun at their unpublished data, just as it's easy to make a great novelist look ridiculous by poking fun at their grocery lists, their kindergarten handwriting assignments, or their unpublished first drafts.


If one is unwilling to share their data and methods, then they should not participate in scientific research. The American Physical Society, for one, expects scientists to "Expose their ideas and results to independent testing and replication by others. This requires the open exchange of data, procedures and materials."* In this case, reproduction is not an option—even by the original authors.

http://www.aps.org/policy/statements/99_6.cfm


"A collection of raw data is full of systematic errors, accidental mistakes, misleading black swans, and false trails (some of which get followed for years before they finally turn out to be false). I've seen several talented, well-trained, and highly experienced scientists fool themselves for decades with their own raw data."

I've done a lot of data cleaning over the years, some of it geological. Yeah, there are usually some problems. But I've never had one that I couldn't resolve. Your post implicitly assumes that only one or the other can be published. Not true. As a condition of receiving grant money, document and publish ALL raw data and any cleaned data, in addition. The interwebs still has a few bits left to hold the extra.


Drug companies put up with that and more.

If you're going to insist that we spend $100s of billions because of your conclusions ....


Public funding should require full disclosure of all data as well as the paper generated by the research.


I believe in the free availability of all publicly-funded research papers.

And I'd be prepared to argue for the free availability of raw data as well, if we lived in a world where it wouldn't be cherry-picked by axe-grinders and used for character assassination. I don't think we live in that world. Maybe someday.


"[...] if we lived in a world where it wouldn't be cherry-picked by axe-grinders and used for character assassination."

Unfortunately, that is exactly the current problem as well, without the data being available.

The article this discussion is keyed off of, Flawed climate data (http://www.financialpost.com), argues pretty convincingly that the AGW researchers cherry-picked their data to show the "hockey stick."

The article The Economics of Climate Change http://online.wsj.com/article/SB1000142405274870349940457455... argues very convincingly that the AGW promoters in question had a financial axe to grind.

The purloined emails (disclaimer, I only read the commonly reported quotes) were replete with character assassination.

Between my assertions and mechanical_fish's assertions, releasing and hiding both lead to nastiness. IMHO, hiding the data caused as much or more nastiness as releasing it. Given that Good Science is verifiable, that argues strongly that the data must be released.

There's an old legal aphorism that goes, "If you have the facts on your side, pound the facts. If you have the law on your side, pound the law. If you have neither on your side, pound the table." -- http://en.wiktionary.org/wiki/pound_the_table

There is a lot of table pounding going on, and it sounds like it started with the AGW researchers not having quality data that could be pounded on, so they hid the data and pounded the table.


I would rather see raw data released, too. However, mechanical_fish certainly has a valid point.

For example, there's currently a bit of a tiff among some climate scientists over whether the statistical methods used to produce the "hockey stick" graphs are mathematically valid. These other scientists use different statistical methods which produce a different graph, and they argue that there's no intrinsic reason why the "hockey stick" method is better.

However, the "hockey stick" graphs correlate much more closely with CO2 measurements. If this other method is valid, then there needs to be an explanation of the divergence between temperature and CO2, which have otherwise been assumed to be closely related.

Given this disagreement among actual scientists, I can imagine the raucous noise produced when a whole bunch of armchair scientists get ahold of "raw" data and say, "Aha! Your data doesn't match your graphs! We're yanking your funding!"


I'm with you. I really really wanted to read (for example) "An observationally based energy balance for the Earth since 1950". However, all of the references to it that I can find are stuck behind paywalls, despite the paper having been authored primarily by NOAA scientists.

I fully support spending tax dollars on scientific research. That said, I think that the results of this particular bit of research have already been paid for.

I'm considering paying the nine bucks and putting it up for download. Any takers?


I'd download it. I think the idea of a "napster of academic research" is a great idea, fwiw :)


An academic friend of mine pitched that to me as a product idea (for me to develop; not him), but I chickened out thinking of the legal issues.

Also: http://www.techcrunch.com/2009/02/25/mendeley-snags-2-millio...


I downloaded it for you. Give me an email address to send it to you.


Email is now in my profile. Thanks!



