Why the world of scientific research needs to be disrupted (gigaom.com)
107 points by llambda on Nov 8, 2011 | 79 comments



Two comments:

1. Scientists publish in slow peer-reviewed journals because scientific publication has to be peer reviewed. The reviewers are a very small selection of other scientists who are experts in the field of the publication. Until the reviewers signal white smoke, a publication is not science by definition. This way of working is important for keeping quality high. We certainly don't need more quickly published junk that stirs the world but is found to be inaccurate later.

Can we accelerate the process of peer reviewing? I doubt it. As I have written, the reviewers are other scientists who have to find the time to read and comment on new articles. The more time they spend on this, the less time they spend doing research. So the small number of experts is usually what makes the process slow.

You may be quicker with internet publishing if (IF!) you work in a highly dynamic field where other scientists are hungry for your results. As the author writes, this is the exception. Most scientific results will become useful only long after publication, if they become useful at all. In this sense, science is already much faster than application.

2. The author writes that we could make progress faster. Please: why? It seems to be a sign of the times that we want to accelerate everything. Most likely this will only produce more noise. We need to slow down our lives and our thinking to stay accurate and produce real value. I am glad that scientific results, at least in my field, are still reliably scrutinized by many. That way I know that reading and spending the time to understand the material is worth my time.


On #2, it varies by field, but I agree somewhat, in that I generally think there's already a significant problem of people being too quick-on-the-draw without first familiarizing themselves with the literature. There is a lot of stuff published that rehashes (without citation) existing published results or minor variations on them, sometimes from decades earlier, that the authors and reviewers just didn't happen to be familiar with. If you accelerate everything and distribute the peer-review to people with less deep knowledge of the literature, I suspect this will only increase, adding more noise to the literature as we spend the 2010s rediscovering the 1970s, accompanied by triumphant press releases and blog posts about Breakthrough Discoveries.

I notice this particularly in computer science, where it seems nobody is capable of locating or reading journal articles published before around 1995, a handful of canonical "classic" papers excepted.


How much of that is because research from the 1970s is effectively locked away behind ACM and IEEE paywalls? How much research gets rehashed because the researcher can't find the result he or she is looking for via the atrocious search tools provided by the likes of Elsevier?

I think that more openness in scientific publication will in fact mitigate the problem of people being quick-on-the-draw, as contradictory findings and previous publications of similar results will be easier to find, even by the non-scientific audience.


>> I notice this particularly in computer science, where it seems nobody is capable of locating or reading journal articles published before around 1995, a handful of canonical "classic" papers excepted.

> How much of that is because research from the 1970s is effectively locked away behind ACM and IEEE paywalls?

If you're serious about keeping up with research, a couple hundred dollars every year to get through a paywall doesn't sound so bad. The real problem with the stuff from the 1970s and earlier is that it's not readily available even to paying members. When I went looking for an article from 1970, the ACM Digital Library had an entry for it but not the document itself.


"A couple hundred dollars"? I see that you are not familiar with the pricing schemes of Elsevier and others.


I thought what I quoted was about ACM/IEEE. $198 per year (less than that for students) to the ACM gets you unlimited access to their online archives. I didn't check the IEEE rate, as I rarely encounter a paper I have to get from them instead of from ACM -- is it much more? What am I missing out on by not checking Elsevier's stash?


I am in math rather than CS/EE. I am not precisely sure, and indeed Elsevier goes to some effort to make their pricing complicated, but I am pretty sure it runs well into the thousands.

I am a professor at a reasonably good state university, having just come from Stanford. At Stanford they subscribed to everything, and here our department picks and chooses so that I constantly run into paywalls despite a university subscription.

I don't know how much it would be to upgrade to Stanford-level access, but if it were $200 a professor I assume they'd do it. (Certainly I'd pay $200 from my salary for that.) I'm guessing high four or low five figures per prof in the department.

I'm guessing ACM/IEEE are nonprofits? Kudos to them for making their prices reasonable. There are some professional organizations in math (e.g. the AMS) that do something similar. But unfortunately a lot of our journals are published by for-profit companies.


The ACM and the IEEE are the key professional societies for computer science, computer engineering, and electrical engineering research.

Wikipedia says the IEEE is non-profit, not sure about the ACM.

The IEEE leans towards the EE and high-math end of things; the ACM leans towards the CS end of things.

The Journal of the ACM is about US$300/yr for print/online access (nonmember).

I have a wide variety of interests and would prefer full access to about 5-7 journals (some Wiley, some Elsevier). Assuming $300/year prices, that's a pretty steep price to keep up with research interests. :-(


Agreed. Vast amounts of very solid literature have 'fallen' off the public face of the planet in CS, except in the archives of publishing houses and libraries.

(Of course, I'm confident part of the problem is that expensive paywalls discourage exhaustive trawling by non-academics).


I have heard stories implying that you can reliably produce an "original" doctoral thesis by digging out something from the vast amount of work published 70–100 years ago in German.


To add to that, 3. Scientists (particularly academics) should focus on a smaller number of solid, fleshed-out papers rather than seeking out the minimum publishable unit. The latter seems to be a byproduct of the tenure process (IMO) and just adds noise.


How about publishing diffs against an overview (or something like it) instead of self-contained articles?


What do you do with the details too small to belong in an overview?


Overview was probably the wrong word.


The article made the point that this is an "accident of paper". Peer review comes from the expense of curating paper documents.

Of course it is arguable that peer review weeds out noise too. But you have to get your work in front of a peer, and they have to understand your contribution. It's also arguable that peer review leads to stagnation and missed opportunities/discoveries. In fact, surely it does.


>But you have to get your work in front of a peer, and they have to understand your contribution.

A very small (but growing) minority of journals, such as PLoS One, make no effort to understand or estimate the significance of the contribution, only whether or not the work is scientifically sound.


I agree that time isn't a matter of concern for most scientists.

The article seems to be a badly researched rehash of other Open Science articles. Open Science is not about faster delivery, it's about access to information. The problem is not that publications are slow; it's that without paying for the journal, other scientists, the media and non-scientists can't access those papers to learn from them, disseminate them, criticize them or contribute.


Speed is important for a number of reasons: mistakes cost less if they can be quickly corrected, so taking risks and publishing your failures becomes useful (no one publishes failures...). Speed also reduces duplicate work.

Peer review is slow because publishing and retracting used to take a long time; now it is instant. And as for the quality of the reviewers, perhaps it would make sense for journals to employ professional scientists who only review -- they'd be up to date on all the current research (that's their job), could highlight gaps, and could help cross-pollinate other disciplines by reviewing 2 or 3 different but related topics.

The idea that a scientist should do everything seems to be more and more inefficient. We need to break down tasks like we do in the commercial world so that experts can move faster.


> Peer review is slow because publishing and retracting used to take a long time; now it is instant.

Have you ever done a peer review? It took me hours to do each one. To do it well, you have to stare at each procedure and each conclusion and try to imagine what could be going wrong.

You may also have to read a lot of literature, both in general (so that you can recognize the difference between something genuinely new and something that was already published in 1975) and in particular (so that you can make sure the manuscript is not misrepresenting its own references).

And, given what's at stake in a review – you're basically holding a year or more of some grad student's life in your hands, at a minimum – it's only respectful to take it seriously.

> perhaps it would make sense for journals to employ professional scientists who only review...

This is either obvious, or silly. Obvious, to the extent that journals always have employed scientifically trained editors – they're called "editors" – to do everything from tweaking bad phrasing to offering criticism to ultimately making the final decision about what gets published. Silly, because journal editors are rarely experts in your specific field. How can they be? There are a lot of scientific fields. It's impossible to be "up to date on all the current research" in every single one of them, to the requisite level of detail. And most fields aren't awash in money to the extent that they can support a full-time editor with complete expertise. In the general case, you have to rely on peer review because only your peers have the incentive to be experts in your field.


I'm not saying that peer review doesn't take a long time to do right; I'm saying that perhaps a new role could be created whose whole job is to review and stay current on the literature. I'm not saying one person can know everything, but there may be a better way. We have more scientists and more papers being published every year, and I'm making the argument that we've outgrown current procedures. This is not an editorial job; this job does exactly what current peer reviewers do -- but as a full-time occupation.

Could I be wrong? Sure -- but if the field is so big that a dedicated person can't keep on top of it, then there is no chance for a working scientist to do so either. The bigger issue is whether there is an incentive for someone with the skills to do it: why would I want to just review the work of others when I have the skills to do my own?


The point is that the skill sets for doing science and for judging scientific manuscripts are very close together. Keeping on top of one is keeping on top of the other.


Journals should probably pay people to review works (well, really it would be included in a submission cost). But many people are resistant to the idea.


>Speed is important for a number of reasons: mistakes cost less if they can be quickly corrected, so taking risks and publishing your failures becomes useful (no one publishes failures...). Speed also reduces duplicate work.

You're missing the point of scientific publication. The entire point of publication is so others can reproduce your work. Speed doesn't work if you need a billion data points taken over 20 years to prove a long-term issue.

>The idea that a scientist should do everything seems to be more and more inefficient. We need to break down tasks like we do in the commercial world so that experts can move faster.

The problem here is that for a lot of cutting-edge theoretical research, the experts are the only ones qualified to really vet the paper. And the set of experts who: 1. Are capable of understanding everything in the paper and 2. Are willing to put aside their personal research to review the paper is VERY small.

For example, the P=NP proofs produced over the last few years were only presented to a small group of about 20 very qualified mathematicians to vet. I, with an undergrad CS degree, could barely understand the summary of the proofs. As far as I know, those proofs still have not been completely refuted or accepted. I don't think that this is inefficient, it's simply the fact that there aren't enough people with enough time to really work with extraordinarily complex concepts and ideas.


> 1. Are capable of understanding everything in the paper

Agreed. In my academic field (debuggers), understanding and knowing enough to give a real review of the subject requires reading what I would estimate to be somewhere around a thousand papers. It also requires keeping up with the major industrial producers of the product. I think the field has something like two to three thousand papers right now; I haven't checked in a while.

For my work, I am just hitting the seminal papers and trying to avoid spending time reading work that led nowhere. I've read maybe two to three hundred articles (I haven't tracked), and my thesis cites over one hundred.

So no, ordinary people won't do that. Ordinary computer programmers won't do that. Most people are not capable of (or prepared for) understanding everything in a given paper and commenting on what has gone before and what has been tried before.

Academic knowledge maintenance is the total philosophical opposite of tl;dr and Twitter.


Out of curiosity, is there some place to locate a clear listing of the seminal works in your field?

I'm genuinely curious how much of the difficulty here is actual breadth/depth of the subject matter, and how much of the difficulty is due to some systemic inefficiency in the way research is published and consumed.


The trick is usually to look at the references section of a paper. If there's an old paper in there (say, >10 years old), then it's probably seminal. If you see the same paper in a lot of reference sections, then it's probably seminal.

So: to know what's seminal so you can skip reading a bunch of papers, you need to read a bunch of papers. Right.
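
For what it's worth, the counting step of that heuristic is trivial once the reference lists exist; here is a minimal sketch in Python with made-up paper names (the hard part, of course, is reading enough papers to have those lists in the first place):

    from collections import Counter

    # Made-up reference lists, one per paper already read.
    reference_lists = {
        "paper_A": ["Knuth 1974", "Smith 2009", "Jones 2010"],
        "paper_B": ["Knuth 1974", "Lee 2008"],
        "paper_C": ["Knuth 1974", "Smith 2009"],
    }

    # Count how often each reference appears across reference sections.
    counts = Counter(ref for refs in reference_lists.values() for ref in refs)

    # Anything cited by several of the papers you've read is a candidate
    # "seminal" paper; the threshold here is arbitrary.
    seminal_candidates = [ref for ref, n in counts.most_common() if n >= 2]
    print(seminal_candidates)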


Sounds like a machine learning problem.


Not that I've seen, but that may simply mean I haven't stumbled onto the right FAQ.


> You're missing the point of scientific publication. The entire point of publication is so others can reproduce your work. Speed doesn't work if you need a billion data points taken over 20 years to prove a long-term issue.

In theory, that's the point. In practice, no journals ever publish replications, so nobody wastes their time reproducing others' work when they could be working on something publishable or their next grant proposal.


>In practice, no journals ever publish replications, so nobody wastes their time reproducing others' work when they could be working on something publishable or their next grant proposal.

Uh, this is exactly how science works. It's not worth publishing the exact same results of the exact same experiment by multiple people. That only adds noise to the discussion.

If I arrive at the same conclusion after running the experiment again, there is little benefit to anyone if I do a full writeup and publish it. On the other hand, if I am unable to arrive at the same results, there is tremendous value in publishing my findings. Was the original study flawed? Were my own methods? That's what peer review and publishing results help determine.


Not publishing positive replications is just as bad a problem as not publishing negative original results. We have things like BigTable and Hadoop now; if 100 laboratories repeat an experiment and publish their results, we can raise our confidence in the result by combining their likelihood ratios (summing them on a log-odds scale).

Getting more data improves the accuracy of your results even more than using more sophisticated algorithms: http://www.catonmat.net/blog/theorizing-from-data-by-peter-n...
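
To make the "summing on a log-odds scale" idea concrete, here is a minimal sketch in Python with invented numbers (nothing here is taken from the linked post):

    import math

    # Made-up numbers: a 10% prior that the effect is real, and one
    # likelihood ratio per independent replication.
    prior_odds = 0.10 / 0.90
    likelihood_ratios = [3.0, 2.5, 4.0, 1.8]

    # Independent evidence adds on the log-odds scale.
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    posterior = 1.0 / (1.0 + math.exp(-log_odds))

    print(round(posterior, 3))  # pooled confidence that the effect is real (~0.86 here)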


I remember seeing an article suggesting that the incentives related to publication (specifically in medicine) set us up for conditions where a majority of published results are wrong, and we have no way of knowing it:

Effectively requiring a positive result for publication means the same two phenomena can be the subject of multiple studies, with the one fluke that finds a correlation being the one that gets published. At that point, we only get corrected if someone actually attempts to replicate the result, but a replication effort may well be seen as a waste of resources.
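
As a rough illustration of that dynamic (all parameters invented): if many labs test effects that don't actually exist and only "significant" results get written up, the published record consists entirely of flukes.

    import random

    random.seed(1)
    alpha = 0.05      # conventional significance threshold
    n_studies = 1000  # studies of effects that do not actually exist

    # Under the null hypothesis, p-values are uniform on [0, 1].
    p_values = [random.random() for _ in range(n_studies)]

    # Only "significant" results get written up and accepted.
    published = [p for p in p_values if p < alpha]

    print(len(published), "false positives reach the literature out of", n_studies)
    # Without replication attempts, each of these looks like a real finding.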


Speaking as an academic mathematician, I don't think it is so small.

If the paper is groundbreaking, everyone will want to read it anyway.

If it is so-so, an expert will be able to determine if it is correct by skimming it pretty quickly.


I'd make a slight correction and say that the point of publication is not so others can reproduce your work; it is so others can confirm your results.

Exact reproduction can be problematic in science because if there was a flaw in the original experimental design or method, an exact reproduction could "confirm" that same flawed result. It's better if other scientists can learn enough to understand the result, and design their own experiments to confirm or disprove it.

A super simple example: suppose I drop something and then report a value for gravitational acceleration. But what if I dropped a feather? If you simply reproduce the experiment, you'll get the same (wrong) result. Whereas if you select your own object to drop, there's a better chance you'll pick something denser and get a different result.


I'm sorry, your claim seems to be: The process is working perfectly right now. This is absurd. The scientific method is a lot more open to interpretation than you suggest, and indeed it is interpreted with wide variety in different disciplines.

The methods of science will be disrupted and improved. Bright lay people will absolutely be able to poke holes in research once the methods and data are completely open, as has been indisputably proven with the advent and prominence of open source software. Research has shown systemic flaws in numerous disciplines. These sorts of flaws fester because of the closed nature of the system. Open research will have its day. It's only a matter of time.


> Bright lay people will absolutely be able to poke holes in research

By the time they can astutely manage the actual field with understanding, they will no longer be lay people.


Why do we need the process of peer review?

Peer review is not robust against even low levels of collusion (http://arxiv.org/abs/1008.4324v1). Scientists who win the Nobel Prize find their other work suddenly being heavily cited (http://www.nature.com/news/2011/110506/full/news.2011.270.ht...), suggesting either that the community badly failed in recognizing the work's true value or that people are now sucking up & attempting to look better by the halo effect. (A mathematician once told me that often, to boost a paper's acceptance chance, they would add citations to papers by the journal's editors - a practice that will surprise none familiar with Goodhart's law and the use of citations in tenure & grants.)

Physicist Michael Nielsen points out (http://michaelnielsen.org/blog/three-myths-about-scientific-...) that peer review is historically rare (just one of Einstein's 300 papers was peer reviewed! the famous _Nature_ did not institute peer review until 1967), has been poorly studied (http://jama.ama-assn.org/cgi/content/abstract/287/21/2784) & not shown to be effective, is nationally biased (http://jama.ama-assn.org/cgi/content/full/295/14/1675), erroneously rejects many historic discoveries (one study lists "34 Nobel Laureates whose awarded work was rejected by peer review" (http://www.canonicalscience.org/publications/canonicalscienc...); Horrobin 1990 (http://jama.ama-assn.org/content/263/10/1438.abstract) lists others like the discovery of quarks), and catches only a small fraction (http://jama.ama-assn.org/cgi/content/abstract/280/3/237) of errors. And fraud, like the one we just saw in psychology? Forget about it (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjourna...);

> "A pooled weighted average of 1.97% (N = 7, 95%CI: 0.86–4.45) of scientists admitted to have fabricated, falsified or modified data or results at least once –a serious form of misconduct by any standard– and up to 33.7% admitted other questionable research practices. In surveys asking about the behaviour of colleagues, admission rates were 14.12% (N = 12, 95% CI: 9.91–19.72) for falsification, and up to 72% for other questionable research practices....When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than others."

No, peer review is not the secret sauce of science. Replication is more like it.


Your conclusion seems to be: why do we need peer review if the results are this bad?

But an alternative perspective might be: think of how much worse things would be _without_ peer review.


> But an alternative perspective might be: think of how much worse things would be _without_ peer review.

Hey, I have this elephant repellant for sale. I know it works awesome because I haven't ever seen any elephants around here. (I also cited historical data about the absence of peer review working, you know, pretty well.)


In order to check whether what you write makes sense, we would have to read at least all the texts you link to, and probably the links within those texts as well. Done carefully, that would take perhaps a day. Would you prefer that every reader spend that time, or wouldn't it be an advantage if a few knowledgeable experts did the job and signalled to everybody that what you write makes sense?

Sure, peer review is not flawless. But flawed procedures can either be replaced or improved. Improvement keeps what is good and changes the rest. Complete replacement may come with new problems.

Don't think that you can escape vanity where people are involved. Not even in science. There will also always be people who exploit flaws in the system. As long as that is only a few percent, we are doing pretty well.

And yes, I think snowwrestler also has a very good point.


> Would you prefer that every reader spend that time, or wouldn't it be an advantage if a few knowledgeable experts did the job and signalled to everybody that what you write makes sense?

Division of labor does not imply peer review. Peer review is merely an ad hoc, unproven, flawed way of implementing division of labor.


The article makes at least one important point. There are only two incentives for a scientist: publications and grant money. Getting published fast reduces the quality of research. Also, the focus on publications hinders other research outcomes like publicly available data or software.

Industry or start-ups may not make full use of publications that attack the same problem over and over again, but they surely can improve with new data or newly implemented algorithms.

[EDIT] Publicly available data and software make replicability more realistic. Currently, the lack of detail in publications makes it almost impossible.

If only there was some way to influence NIH and NSF grant requirements ...


> There are only two incentives for a scientist: publications and grant money.

I disagree. There are three incentives: first, getting a permanent job; second, getting grant money; and third, doing good work. The first two are unfortunately sometimes in opposition to the third.


The NIH is among the proponents of open access, so it should be one of the easier organisations to convince. I believe all articles resulting from NIH-sponsored research must now be made available under open access.


I'm sorry, but I must disagree with the spirit of your points.

> Scientists publish in slow peer-reviewed journals because scientific publication has to be peer reviewed. The reviewers are a very small selection of other scientists who are experts in the field of the publication.

So, when you throw work out into the open, anyone with an interest in your topic (your functional peers) will be able to do their own analysis of your work (a review)--some sort of peer-review, if you will. All without needing to go through a stodgy paper review process.

> This way of working is important for keeping quality high. We certainly don't need more quickly published junk that stirs the world but is found to be inaccurate later.

So, if there is one thing we've seen in the programming world, it's that secluded cabals of experts don't always produce excellent quality (I submit to you OpenGL, C++, and CORBA, for a few examples).

New patches are submitted to open-source projects all the time; some are accepted, some are rebuked publicly, and some merely catalyze interesting discussion. "Stirring the world" is hardly a bad thing--in the case of Fleischmann-Pons, the debunking happened quickly and the world was free to move on. In a similar avenue, one of the reasons that the eCat story is frustrating is the lack of transparency (both from the inventors and the media refusing to even air solid rebuttals)--were the information more open, we could sort it into bunk/science and move on.

> Can we accelerate the process of peer reviewing? I doubt it. As I have written, the reviewers are other scientists who have to find the time to read and comment on new articles. The more time they spend on this, the less time they spend doing research. So the small number of experts is usually what makes the process slow.

So, you make the (valid) claim that the process can't go faster, predicated on the scarcity of experts. You seem to have the answer at hand: get more experts. This can be accomplished either by getting more researchers reviewing, or by loosening the restrictions on who is considered a valid reviewer. Indeed, places like StackExchange have shown us how to pick out decently good judges of these things--we don't have as great a need for the older method of establishing experts anymore.

> 2. The author writes that we could make progress faster. Please: why? It seems to be a sign of the times that we want to accelerate everything. Most likely this will only produce more noise. We need to slow down our lives and our thinking to stay accurate and produce real value.

In order to help motivate the need for faster progress, please note the following:

Earth has over 7 billion people now. (http://www.unfpa.org/swp/)

The bottom 50% of people in the world (rightly or wrongly) own around 1% of the wealth. (http://escholarship.org/uc/item/3jv048hx#page-4)

We (at least those of us in the first world) don't have a lot of time left, unless we can find ways of solving the Big Problems. Progress in tech is likely the only palatable solution to this predicament. We can't really afford to slow down now.

> I am glad that scientific results, at least in my field, are still reliably scrutinized by many. That way I know that reading and spending the time to understand the material is worth my time.

I'm greatly concerned about this sort of attitude. Might I suggest that an occasional dalliance in "time-wasting" articles might prove refreshing, and further that it could help ferret out ideas that would otherwise be overlooked?

edit: Modified for civility. Sorry folks.


Here's a paper I read yesterday. Kindly review it for me:

http://arxiv.org/pdf/1107.2875.pdf

The scheme you are advocating might have succeeded 200 years ago, when science was happening at a much more fundamental level than it is now. These days, things are so specialized that what the OP says is spot-on: only a handful of people are qualified to knowledgeably comment on your work. What's more, these people are all /incredibly/ busy; I would know, since I spend my time trying to get their attention (grad student) :-) In short, the way we conduct research is kind of like what Churchill said about democracy; it's the worst system except for all the other systems.


Thank you sir/madam for an opportunity to better explore the depths of my ignorance. I have previously wondered what sort of use abstract algebra and geometry would be put to outside of various parts of applied mathematics or physics, and I would now appear to have an excellent example. :)

As I've mentioned in another reply, I'd like to see the system cleaned up and made more accessible. I'm a mechanical engineer by schooling and a graphics programmer by trade, so I don't frighten easily at math or algorithms. That said, this paper gave me quite a start. How deep down the rabbit-hole would you say it is in the field of computer vision? Is it fairly advanced/niche in application, or is it about some sort of foundational knowledge that is supposed to be known backwards and forwards by practitioners of the art?

I feel as though a better way of presenting the papers might aid in understanding (clear list of dependencies for knowledge, clear list of applications, clear note of "hey, this is a minor optimization, so don't sweat the details if you don't get it", etc.), but my personal metrics for judging this may not be correct. Any feedback would be appreciated.


"don't sweat the details if you don't get it" is probably the opposite of what you want to communicate to a reviewer ;) if there is a problem in "the details" the entire work might need to be scrapped.


If you can't be bothered to read new things unless they're "worth your time", the hell with you.

Realistically, even in a narrow field just reading (forget about working through the mathematics, checking the results, etc.) all the papers that are published through the slow existing process would take up more than 100% of working time. Clearly there has to be some kind of filtering process.

As it happens, I think that this process should be both faster and more open, but it is foolish to dismiss realistic concerns about how to decide which research it is most worthwhile to read.

There's a big difference between that and being afraid to be exposed to something outside their comfort zone (I am not too sure what you mean by that). You may hope to be so lucky as to have competitors with people like that on their teams, but would you be happy to have on your own team someone who spent literally their entire workday every day reading reviews of other people's work without contributing anything new of their own?


In my personal experience the main problem with scientific research today is caused by publication counts representing the sole metric for academic success.

Research is 'optimized' for publications and segmented according to conference schedules. That in itself I consider detrimental to the research effort. However, the major damage in terms of advancing research is, in my opinion, caused by the induced tendency of scholars to stick closely to the mainstream paradigms, trends and topics. Naturally it is much 'harder' to do research that does not directly build on the current state of the art. Such research usually takes much longer and therefore results in a lower publication frequency. Secondly, it carries a much higher risk of not bringing about positive results. Thirdly, I deem the chances of acceptance in peer-reviewed conferences and journals to be much lower, as such venues are often biased towards current mainstream research trends. In my observation there is also an aversion to research that questions the state of the art: people who have spent years or decades mastering every aspect of the state of the art have a strong incentive to shoot down anything potentially disruptive.

As such, the incentives are misaligned with science's aim of figuring out new things.


> In my personal experience the main problem with scientific research today is caused by publication counts representing the sole metric for academic success.

It also leads to bloat in the literature: People learn the "least publishable increment" and go publish that.


Exactly. A study by Tulving and Madigan (http://alicekim.ca/AnnRev70.pdf) outlines this nicely. In the words of David T. Lykken (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118...):

In their 1970 Annual Review chapter on memory and verbal learning, Tulving and Madigan reported that they had independently rated 540 published articles in terms of their "contribution to knowledge". With "remarkable agreement", they found that they had sorted two-thirds of the articles into a category labelled "utterly inconsequential". "The primary function these papers serve is to give something to do to people who count papers instead of reading them. Future research and understanding of verbal learning and memory would not be affected at all if none of the papers in this category had seen the light of day"

About 25 percent of the articles were classified as: "run-of-the-mill .. these articles also do not add anything really new to knowledge... [such articles] make one wish that at least some writers, faced with the decision of whether to publish or perish, should have seriously considered the latter alternative"

Only about 10% of the entire set of published papers received the modest compliment of being classified as "worthwhile". Given that memory and verbal learning was then a popular and relatively 'hard' area of psychological research, attracting some of the brightest students, this is a devastating assessment of the end product. Hence, of the research ideas generated by these psychologists, who are all card-carrying scientists and who liked these ideas well enough to invest weeks or months of their lives working on them, less than 25% of 40% of 10% = 1% actually appear to make some sort of contribution to the discipline.

Given that this work was published in the 1970s, when the "publish or perish" pressure was less strongly developed, I expect today's numbers to be worse. Concerning information retrieval, the field of my PhD research, I would certainly attest that this applies.


I have been trawling through large numbers of papers recently as part of my computer science Master's background research, and some papers are quite frankly unintelligible garble... I can't even understand how they passed peer review.

I would say that of the articles I've read, maybe 5% are meaningful contributions that advance the science; maybe 25% contribute something to the current understanding, perhaps 60% are "we did something cool", and 10% are "are these reviewers awake".


Thanks for those links. During my own PhD, I was pretty sure that the "least publishable increment" became negative.

The underlying problem is that there is no penalty for publishing non-increments. For some inexplicable reason, citation impact assessments do not divide by the total number of publications.

I have no doubt that such a normalization factor would go a long way to fixing the problem.
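
As a toy illustration of that normalization (all numbers invented): two researchers with the same citation total look very different once you divide by publication count.

    # Two made-up researchers with identical total citation counts.
    researchers = {
        "A": {"citations": 400, "papers": 10},  # fewer, more substantial papers
        "B": {"citations": 400, "papers": 80},  # many least-publishable increments
    }

    for name, r in researchers.items():
        per_paper = r["citations"] / r["papers"]
        print(name, "total citations:", r["citations"], "| citations per paper:", per_paper)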


There's been an attempt to measure paper quality using the "impact factor", but as with seemingly every measure, it's led to shenanigans. Some publishers encourage authors to cite papers from their journals, and some editors write survey papers that happen to cite a bunch of articles from their own journals. A recent AMS Notices article covered this in math, and I sort of expect it's worse in other disciplines.


To me it looks a lot like search-engine manipulation. Google or Microsoft may not disclose their counter-tactics against rank inflation, but what they could do is provide their solutions for this specific vertical.

As much as I'd like an open-source effort for that, I understand that knowing the rules allows easy manipulation.


Well, there already is Google Scholar. I don't know precisely how much effort goes into the ranking algorithms used there -- but I do already use it in preference to PubMed.


To disrupt scientific research you need to change the way that science is funded. Science is not for-profit, unlike the disrupted industries mentioned here. If you have to show how determining how a flagellum works is going to produce revenue, then you are going to stifle science in another way.

Good science is slow, deliberate, and motivated by open-ended questions and discovery, not by end points.

We need to fund the process of science, and make sure that the knowledge gained is shared publicly. No other disruption is necessary.


"Science is not for-profit"

Science is also not not-for-profit. Plenty of for-profit businesses do research, and publish in scientific journals. Some research fields (drug development comes first to mind) have a very large industry presence.


Tim Gowers, Fields Medallist and the person who started the first Polymath project, just wrote a couple of blog posts (the latter incorporating suggestions from the former) on a new model of math publishing. They are highly detailed and well thought out, and also discuss things one might not understand unless one works in academia.

https://gowers.wordpress.com/2011/10/31/how-might-we-get-to-...

https://gowers.wordpress.com/2011/11/03/a-more-modest-propos...


Another vector for disruption is statistics. Scientists use statistics to analyze the results of their experiments all the time. Unfortunately, most scientists are not statisticians. They spend a lot of time on statistics that they would rather spend doing things they're good at. They make lots of mistakes, for example using the wrong statistical test or deciding what test to use after the experiment, etc.

With today's computer speeds there is no reason to do statistics the way they're doing it. You don't need to make strong and unjustified assumptions about your data just so you can use a simple t-test. Today, with simulation, you can do pretty much any statistical test you want. You can do Bayesian inference.

However, most of these scientists are not capable of programming that themselves in R. What is needed is a simple GUI where they can state their assumptions and enter their data. The program then calculates their posteriors and does any hypothesis test they want. The statistics scientists use today are optimized for pen and paper; those assumptions no longer hold. Who cares if the computation takes 500 milliseconds on a computer instead of 3 milliseconds?
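
To make the simulation point concrete, here is a minimal sketch of a two-sample permutation test in Python; it is not any existing tool's interface, and the measurements are made up. It makes none of the normality assumptions behind a t-test:

    import random

    def permutation_test(a, b, n_iter=10000, seed=0):
        """Two-sided permutation test on the difference of group means."""
        rng = random.Random(seed)
        observed = abs(sum(a) / len(a) - sum(b) / len(b))
        pooled = list(a) + list(b)
        hits = 0
        for _ in range(n_iter):
            rng.shuffle(pooled)
            left, right = pooled[:len(a)], pooled[len(a):]
            diff = abs(sum(left) / len(left) - sum(right) / len(right))
            if diff >= observed:
                hits += 1
        return hits / n_iter  # fraction of shuffles at least as extreme as the data

    # Made-up measurements from two hypothetical groups.
    control = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8]
    treatment = [4.6, 4.9, 4.5, 4.7, 4.3, 4.8]
    print("p-value:", permutation_test(control, treatment))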


A very practical problem with making data available publicly is privacy. This is likely not going to be an issue for data coming out of the LHSC, but it comes into play in CS research. I know a bunch of researchers who work with large datasets that have user locations, cell tower communication, social network data, etc. Of course, the data researchers work with is anonymized. But almost nothing is truly anonymous. You have to assume that researchers working with the data are going to be sensible about how they use it. If not, they will face serious consequences (i.e. losing their jobs and hurting their reputations). A dataset made public typically cannot be controlled in this manner.

Researchers must also go through IRBs (institutional review boards) at their institution prior to engaging in research that deals with human subjects. If the collected data is going to be made available publicly, it makes the process more arduous.

Btw ... there is a decent dataset repository for CS researchers doing mobile computing called CRAWDAD.


"Researchers must also go through IRBs (independent review boards) at their institution prior to engaging in research that deals with human subjects. If the collected data is going to be made available publicly, it makes the process more arduous."

That is not really my experience. Getting IRB approval to release human data just means proper de-identification, which one should do anyway. For example, subjects are generally given randomly generated IDs, with only the PI having the master list.


> Unless scientists and researchers start to put the interests of collaboration and “open science” ahead of their desire to be promoted or win tenure the system will not change

This is the main problem because most won't.


I absolutely would if I could do it and still get funded. I'd love to share everything openly. Just pay my bills as I do it.

Currently, science is funded based on individual results. You can't blame folk for wanting to be promoted or to win tenure.


> Currently, science is funded based on individual results.

Yeah, there's a direct tension between the move over the past few decades towards very competitive grant funding (10% funding rates, short grant terms, much fewer long-term, large block grants for centers), and a goal of having everyone share everything altruistically. If you purposely set up science funding so that it encourages cutthroat competition, people are going to have to behave like cutthroat competitors, and those who try not to will (on average) lose to those who do.


Yeah, the real question is how researchers can show their efforts in an open system so they can get tenure without fear of being "scooped" by being open.


Bingo. There is good reason to fear being scooped. It can cost you funding, and thus, your job.


For sure - I wish we could find ways of seeing whose 'code' was 'forked' and thus whose shoulders everyone is collectively standing on - the sum of that is your advancement of the field, right?

The current system uses paper publishing (count / impact factor) as a proxy for this, but it produces a lot of perverse incentives. If we found a way to see whose concepts were being used and when, people would instead throw out any idea they had, hoping it would be picked up (even unknowingly) by other researchers so they could get some of the credit.

If this could be done, would it fix the status/scooped problem, and is there actually a way to replicate the "fork" structure of something like GitHub? Unfortunately, science is both 1.) hard to pull in-depth semantic data out of (I'm guessing a lot of nuance) and 2.) not formally coded like a programming language.


A scientist who does not get promoted or win tenure is likely to cease being a scientist. Losing those scientists who favour change makes the system even less likely to change.


Actually, there are quite a few points where modern technologies would allow for a significant streamlining of research processes. Off the top of my head (as I work in the field):

1. Data acquisition - you won't believe in how many labs devices with a computer interface (GPIB or whatever) are running in standalone mode, costing hours of grad student time as parameters are changed and values are read off by hand - in the best case people duct-tape something together with a LabVIEW program (see the sketch after this list). No. Just no.

2. Collaborative data sharing - if you want to show your boss a graph, you email him a JPEG - where is the site where you can upload a CSV, show the graph to other people, and edit it together?

3. Writing papers: The state of the art is mailing a LaTeX(!) or Doc(!) file to your colleagues with .v1.edited appended... Reviewing the published material is just the last step.
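
On point 1, here is a minimal sketch of what scripted acquisition could look like using pyvisa; the instrument address and SCPI commands are placeholders for whatever your device actually speaks, so treat it as an outline rather than a drop-in script:

    import csv
    import pyvisa  # assumes the pyvisa package and a VISA backend are installed

    rm = pyvisa.ResourceManager()
    inst = rm.open_resource("GPIB0::12::INSTR")  # placeholder instrument address

    with open("sweep.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["voltage_set_V", "current_read"])
        for mv in range(0, 1001, 50):
            inst.write("SOUR:VOLT %.3f" % (mv / 1000.0))  # placeholder SCPI command
            reading = inst.query("MEAS:CURR?")            # placeholder SCPI command
            writer.writerow([mv / 1000.0, reading.strip()])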

PS: I'm working on a solution for #3 (Etherpad + LaTeX preview + export in the appropriate journal format). Drop me a line if you're interested in details.


My company, Collaborative Drug Discovery (https://www.collaborativedrug.com), has been developing and offering a secure collaborative data-sharing environment for drug discovery data (chemistry and SAR data). There are other tools in the space as well, though labs and companies are slow to change. The sector is very secretive and closed, so it takes time for habits to change.

Some factors have been promoting more collaboration and data sharing:

* The increasing cost of research

* Specialization and the emergence of micro-biotech (5 people biotech startups)

* Foundations like the Bill and Melinda Gates Foundation that push for more collaboration amongst recipients of their grants (disclaimer: Collaborative Drug Discovery has received grants from the Gates Foundation as well)


I think the major funding agencies are aware of the problem and the potential, and are working on solutions in their own way (for example, the NSF Office of Cyberinfrastructure's DataNet program, http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503141). But major cultural change in the way scientific research is conducted just isn't going to happen overnight.


There are already some efforts to speed up peer-review and make more scientific publications available for free to anyone, e.g., http://www.plosone.org/

> Unless scientists and researchers start to put the interests of collaboration and “open science” ahead of their desire to be promoted or win tenure the system will not change

Something similar could be said about so many industries, including our own.


Note that the nature of research varies quite a bit from discipline to discipline. Physics has a preprint-oriented culture that reduces the conflict between individual interests and the wider ends of research.


I wish people would stop using the word "disruptive" all the time...


Why?


Society for Amateur Scientists:

http://www.sas.org/




