

The Future of Science - RichardPrice
http://techcrunch.com/2012/04/29/the-future-of-scienc

======
nkoren
On a related note, some years ago I read an academic paper -- alas, it was
printed on a dead tree, and I can't find a link to a digital version of it --
which pointed out that the rate at which papers were cited was driven
primarily by the rate at which papers were cited. This is not as much of a
tautology as it sounds like.

Think about the process of writing a paper: you do some keyword searches for
recent articles on related subjects. You then look at the bibliographies of
those articles, pick out whatever looks relevant to your topic, look at the
bibliographies of _those_ articles, etc. What this means is that apart from
your initial keyword search, the primary criterion for including an article in
your research is: "has it been cited already?". Relevance is merely a
secondary filter.

This paper pointed out the effects of this phenomenon: the vast majority of
published scientific papers are _never_ cited again; a moderate number are
cited only a few times; and the remaining few -- having reached a
bibliographic critical mass -- are cited thousands of times. The authors of
the paper made a strong case that this was not a good reflection of the
quality of the research. In many cases, reaching bibliographic critical mass
was simply a matter of the almost random chance of acquiring those first
few citations. The authors provided several examples of important scientific
ideas which had been lost for decades, arguably because they had not attracted
a critical mass of citations in the years immediately after publication.
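The skewed distribution described here is exactly what a "rich get richer" citation process produces. A toy preferential-attachment simulation (all parameters here are made up for illustration, not taken from the paper being discussed) shows how a mostly-cite-what's-already-cited rule yields a few blockbuster papers and a long tail of never-cited ones:

```python
import random

random.seed(42)

def simulate_citations(n_papers=10000, refs_per_paper=10, p_preferential=0.9):
    """Toy model: each new paper cites earlier papers, mostly in
    proportion to how often those papers have already been cited."""
    citations = [0] * n_papers
    cited_pool = []  # one entry per citation received, for weighted sampling
    for new in range(1, n_papers):
        for _ in range(min(refs_per_paper, new)):
            if cited_pool and random.random() < p_preferential:
                target = random.choice(cited_pool)  # rich get richer
            else:
                target = random.randrange(new)      # random discovery
            citations[target] += 1
            cited_pool.append(target)
    return citations

cites = simulate_citations()
never = sum(1 for c in cites if c == 0)
print(f"{never} papers never cited; most-cited paper has {max(cites)} citations")
```

Running this, roughly half the simulated papers end up with zero citations while a handful accumulate thousands, even though "quality" plays no role in the model at all.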

In other words, humans suck at pagerank.

Anyhow, it occurred to me that this is a problem which could be solved with
technology. Imagine an online word processor which -- in a sidebar -- suggests
potentially related articles from ArXiv and Google Scholar. This would be
based not on crawling bibliographies, but rather on semantic analysis of the
adjacent paragraphs.
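A minimal sketch of the suggestion engine, using plain TF-IDF cosine similarity between the paragraph being written and candidate abstracts (a crude stand-in for real semantic analysis; the function names and corpus are hypothetical, and a production version would query ArXiv or Google Scholar rather than an in-memory list):

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a small corpus of texts."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for counts in tokenized:
        df.update(counts.keys())
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in counts.items()}
            for counts in tokenized]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest(paragraph, abstracts, top_k=3):
    """Rank candidate abstracts by textual similarity to the paragraph
    being written -- citation counts play no role at all."""
    vecs = tfidf_vectors(abstracts + [paragraph])
    query = vecs[-1]
    scored = [(cosine(query, v), i) for i, v in enumerate(vecs[:-1])]
    return [i for s, i in sorted(scored, reverse=True)[:top_k] if s > 0]
```

The point of the sketch is the ranking criterion: relevance is judged from the text itself, so an uncited-but-relevant paper surfaces just as readily as a famous one.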

I think this would create some real benefits. It would remove much of the
problem of citation bias, ensuring that important ideas aren't lost and
that prior research isn't unwittingly duplicated. Wish I had time to
implement something like this!

~~~
sb
I think your picture of how related work gets gathered is overly simplified.
Since you are submitting to a conference/journal in your field, chances are
high that the reviewers are knowledgeable in the subject area and will point
out errors in attributing due credit to related work.

While I agree that there are systemic problems with peer review and with how
the science "enterprise" works, there is a fitting analogy from politics,
courtesy of Winston Churchill: "the worst form of government except for all
the others."

~~~
nkoren
I think you misunderstand. I'm not talking about attribution errors. I'm
talking about the fact that discoverability and cross-linking are severely
hampered by the bias towards previously cited works. This doesn't create
_errors_ per se, but it can narrow the scope of inquiry to the point where it
becomes detrimental to the institution of science as a whole. It's a naturally
emergent silo, but a silo nonetheless.

An NLP-based referencing system would not need to be _correct_ all the time;
it would merely need to be helpfully _suggestive_. As you're writing your
paper, it would put tips in the sidebar: "Maybe this is relevant? (Hover to
read the abstract)". As long as there isn't an intolerable number of false
positives, it would be quite a useful tool, I think.

~~~
sb
Well, actually, attribution errors were not the only thing I had in mind. Peer
review ensures that colleagues will tell you about related work that you don't
know about. Sometimes, people will tell you that something is related, even
though you yourself don't actually think it is. Only with some time and
acceptance will you see that the remarks really are related, probably not
directly to your own contribution, but to the bigger field that you orient
yourself in.

Come to think of it, this is probably the most important argument against an
NLP-based "recommender." Personally, I might find something like this
interesting, probably even a great help, but at the end of the day, people
need to really read _a lot_ of papers, follow the proceedings of their target
conferences and journals, and ask colleagues for their bibliographies. This
has the added benefit of teaching them how to present their own work in
contrast to others', do meaningful evaluations (in the best of all worlds, of
course!), and figure out who is doing interesting work and might be valuable
to get into contact with. Of course, some parts could be automated, but there
is currently no incentive for scientists to do so.

IMHO, it would be a much more important step for CS researchers to publish
their code, too, because I frequently come across papers that have no
implementation or evaluation at all -- and that's really bad, because then the
least-publishable unit becomes an idea with nice pictures. Researchers can be
very successful using this "publication strategy." Come to think of it, there
should be an alternative to ranking scientists by their number of publications
or their impact; unfortunately, I have no idea what could work instead.

~~~
RichardPrice
I totally agree about the importance of scientists publishing their code. That
is critical. It's one of the many parts of the scientific process where the
community would benefit from greater sharing.

------
simonster
Yes, there is a time lag problem. However, instant distribution has been
around for a long time (in the case of arXiv.org, since 1991). It's widely
accepted in the physics community, but it hasn't gained much traction in most
other scientific disciplines. I think there are two reasons for this: the
chicken and egg problem, and the peer review problem.

The chicken and egg problem is that no one in these disciplines publishes
unreviewed manuscripts because no one reads them. The corollary here is that
if you do something interesting and someone happens to read it, take your
idea, and publish first, as far as credit goes, you're fucked. This happens
with any form of public presentation of ideas, not all that often but often
enough that every scientist knows someone who it has happened to. If you just
sank a year of your life into a project, you want to make damn sure you're
going to get credit for it. At present, instant distribution is too risky. If
the profile of instant distribution can rise to the point where a manuscript
will be sufficiently widely read to be acknowledged as the source of an idea,
scientists in less competitive areas may be more open to it.

The bigger issue is, I think, that scientists actually appreciate peer review.
Peer review ensures both quality and fairness in research. If I read a paper
in a high-impact journal, I generally believe I can trust the results regardless
of who wrote it. By contrast, any reputation-based metrics will be strongly
colored by the reputation of the lab from which the paper originates. (I have
a hunch that this is already true for citation metrics.) Replacing peer review
with reputation-based metrics may mean research gets out there faster, but it
may also mean that a lot of valuable research gets ignored. This still sucks,
and it may suck more. Turning a paper into a startup that may succeed or may
fail depending on how well a scientist can market his or her findings would
absolutely suck ass. IMHO, scientific funding is already too concentrated in
the hands of established labs, and these labs are often too large to make
effective use of their personnel. Reputation-based metrics would only
contribute to this problem. They would also lead to confusion in the popular
press, which is already somewhat incapable of triaging important and
unimportant scientific results. This is a much bigger deal in biomedical
science than in theoretical physics, because the former has direct bearing on
individuals' lives.

On top of this, citation metrics are simply not peer review. In his previous
article, Richard Price pointed out that researchers need to spend a lot of
time performing peer review. This is absolutely the way it should be.
Researchers should spend hours poring over new papers, suggest ways of
improving them to the authors, and ultimately ensure that whatever makes it
into press is as high quality as possible. IMHO, the easiest way to get
quality research out faster is to encourage journals to set shorter peer
review deadlines and encourage researchers to meet them, not to throw away the
entire system.

OTOH, I think open sharing of data sets among researchers will massively
enhance scientific progress, and has a reasonable chance of happening because
the push is coming from funding agencies, not startups. As a scientist, the
idea of being able to ask my own questions with other people's data gets me
far more excited than being able to read their papers before release.

~~~
RichardPrice
I totally agree with you about data-sharing. I wanted to spend more time on
that in the article, but didn't, because I didn't want to make the article too
long. I think the ability to share and ask questions about data really has
enormous potential to drive science forward. The fact that enormous amounts of
scientific data remain private to the lab, unshared, is a big loss to science.
It's going to be very exciting as that data starts getting shared more.

The key to making that happen is disrupting the credit system. Right now
scientists aren't incentivized to curate and share their data, so they don't
put in the work to do it. You can't put data-sets on your resume, much like
you can't put blog posts, or anything that is not a paper. As soon as
scientists start getting credit for sharing data-sets, I think we'll start to
see it happen.

Similar points apply, as you mention, to instant distribution. Instant
distribution will happen more as scientists start getting credit for
scientific ideas that they distribute instantly. You are already seeing some
disruption to the credit system. In the last 5-10 years, since citation counts
have been made publicly available by Google Scholar, citation counts have
started to play a much larger role in resource allocation decisions, e.g.
decisions by hiring committees and grant committees. I did my PhD at Oxford in
philosophy from 2001-2007, and remained involved with some of the hiring
decisions at the Oxford philosophy department until 2011, and it's been very
interesting to watch the increased influence, over those years, of citation
counts in hiring decisions.

Citation counts aren't perfect, but they are another signal. In my experience,
hiring committees are desperate for more signals that they can take into
account when comparing candidates. Comparing candidates is a tough job. As
with any signal, to wield it properly, you need to know its pros and cons.
Fundamentally, what the community is looking for here is a variety of signals
that show how much a highly respected chunk of the scientific community has
interacted with a piece of your content and found it useful.

To get data-sets, and other media, to attract scientific credit, we need to
develop metrics that demonstrate the traction that those pieces of media are
getting in highly respected parts of the scientific community. I think those
metrics will get developed, and that new metrics will play an enormous role in
allowing different kinds of media to be shared, and everything to be shared
faster.

------
3am
Admirable cause, but the author doesn't do themselves any favors by
dramatically overstating the role of publication in knowledge sharing
(informal channels & conferences exist, publication serves more of a
recognition purpose), and with somewhat offensive, unsupported claims like,

"The stakes are high. If these inefficiencies can be removed, science would
accelerate tremendously. A faster science would lead to faster innovation in
medicine and technology. Cancer could be cured 2-3 years sooner than it
otherwise would be, which would save millions of lives"

~~~
RichardPrice
I don't think informal channels, and conferences, which are infrequent, and
really expensive to travel to, are enough. Before the 1600s, science was
largely done by wealthy people who had large enough houses to have a
laboratory in. Scientific results weren't publicly shared; at best they were
shared between the experimenter and a few of his/her friends, who communicated
by private letters.

In the late 1600s, the first journal was founded, and it became the norm for
scientific results to be shared publicly. This era coincided with the birth of
the Scientific Revolution, an incredible flourishing of scientific thinking
that formed the basis of modern science.

I think that with a much more connected scientific community, one that
operated more as a global brain than as relatively disconnected nodes,
scientific progress could double in speed. So if cancer would normally be
cured in x years, I can see that coming down to x/2 years with an accelerated
science, and, given the length of x, that shortening is likely to be a matter
of years.

~~~
3am
Best of luck with academia.edu!

------
reasonattlm
This is a time for revolution in the methods and funding of science, long
overdue and enabled by the internet. It will be a mix of removing the barriers
to entry, blurring the priesthood at the edges, publishing data openly and
iteratively, and drawing crowdfunding directly from interested groups of the
public rather than just talking to the traditional funding bodies.

Astronomy has long been heading in this direction, actually - it's a leading
indicator for where fields like medicine and biotechnology are going. People
can today do useful and novel life science work for a few tens of thousands of
dollars, and open biotechnology groups are starting to formalize (such as
biocurious in the Bay Area).

There is a lot of good science and good application of science that can be
parallelized, broken up into small fragments, distributed amongst
collaborative communities. The SENS Foundation's discovery process for finding
bacterial species that might help in attacking age-related buildup of
lipofuscin, for example: cheap, could be very parallel. In this, these forms
of work are much like software development - consider how that has shifted in
the past few decades from the varied enclosed towers to the open market
squares below.

This greater process is very important to all of us, as it is necessary to
speed up progress in fields that have great potential, such as biotechnology.
Only a fraction of what could be done will be done within our lifetimes
without a great opening of funding and data and the methodologies of getting
the work done.

~~~
fl3tch
I agree with a lot of what you said, but this:

> People can today do useful and novel life science work for a few tens of
> thousands of dollars

makes me wonder if you've ever done bench work or furnished a lab. Sure, you
can do a few weeks or months of work for tens of thousands of dollars (which
doesn't produce _a lot_ of useful results in that time frame, but can produce
some), but that's assuming you have a functioning lab. It often takes $100K
just to stock one, which is why most new investigators get special startup
money just for that. To produce _useful_ results often takes years at a rate
of at least $100K a year. Equipment and reagents (especially enzymes) can be
expensive: $300 for a gel box here, $1000 for a pipette there, $5000 for a
thermocycler, $2000 for an enzyme -- it adds up. And that's not even getting
into salaries. Most people don't work alone.

I think it would be incredibly difficult to crowdfund science a la something
like Kickstarter, especially given the amount of money currently spent on
science (about $50 billion annually in the US alone). But maybe someone on HN
will be the person who proves me wrong.

------
timdellinger
The problem with distributed, grass roots peer review is that you get poor
quality reviewers. The current structure is slow and very "old media", but it
is this way because it's the only way to guarantee quality peer reviews.

If journals cease to exist, and a new publish-it-anywhere-then-publicize-it
paradigm emerges, along with some associated metrics (kinda sorta like
Reddit), then I predict that conference presentations will become the new
metric of success. They have gatekeepers, and scarcity due to limited
bandwidth (i.e. there are a limited number of time slots available). The whole
journal publishing infrastructure will just be shifted over to conferences...
along with the ecosystem of for-profit vs. trade group, etc., and the Slowness
and Single Mode of Publication problems that the OP describes.

------
thisisnotmyname
"The norms don’t encourage the sharing of an interactive, full-color, 3
dimensional model of the protein, even if that would be a more suitable media
format for the kind of knowledge that is being shared."

This is simply wrong - when you solve a protein structure, it is mandatory
that you submit it to the PDB (e.g.
<http://www.pdb.org/pdb/101/motm.do?momID=148>), and nearly every journal I
read has both color figures and extensive online supplementary materials.

~~~
RichardPrice
You're right, there are some trends in the right direction, which is terrific.
But they are early. By and large, scientists only get credit for publishing
papers, which means they aren't taking advantage of the full interactive power
of the web.

It's rare for scientists to share things like data sets, or a video of a
physical process under study. Most graphs or tables in scientific papers are
non-interactive: you can't change the x and y axes, or other properties of the
graph, as you can with graphs in Google Analytics, or with data displayed for
native web consumption generally. Similarly, the code that scientists run on
their data sets, which generates the conclusions that end up in their papers,
doesn't get shared.

I think the key to opening up richer sharing is to provide credit metrics that
incentivize this kind of activity. When scientists can get credit for sharing
data-sets, code, videos, and a wider array of rich media, they will start
sharing more, and taking greater advantage of the rich media power of the web.

------
archgoon
No 3d models for new proteins?

The protein databank exists precisely for that reason.

<http://www.rcsb.org/pdb/home/home.do>

~~~
rflrob
More broadly, lots of data that doesn't lend itself to a single figure has
either repositories (like the Gene Expression Omnibus) or supplemental
attachments to the paper that can be included.

------
wiggins37
I'm glad that the author is thinking about ways to increase communication
between scientific authors, but some of the statements he made, specifically
regarding "curing cancer 2-3 years sooner" make him sound ignorant of some of
the challenges facing researchers. Not all scientific knowledge is presented
only through journal articles. As others have already mentioned, conferences
with "poster presentations" are pretty common in medicine to discuss ideas
before the paper comes out. In addition, labs across the country working on
similar problems often exchange ideas and substrates by email and mail
respectively. I agree that it would be great if there were a more centralized
online repository of information. If anyone has any
experience with blogs, forums or websites specifically addressing oncology
(that are not just press releases) I would appreciate learning about them.

------
tel
_Imagine if all the stories in your Facebook News Feed were 12 months old.
People would be storming the steps of Congress, demanding change._

To play devil's advocate, the time lag forces your conversations to strive for
a higher standard of quality, comprehensiveness, correctness, and context than
Facebook updates could ever be held to.

Then again, striving for that higher standard also invents pseudosciences, bad
statistics, and outright fraud.

In short, I don't think the solution is to replace the paper with something
instantaneous. I agree that instantaneous (public) communication could be
better used in the academic community, but there's a trend that way already as
blog posts begin to signal a certain kind of good advisor.

I especially don't agree that search engines have any business replacing peer
review.

------
stephenhandley
Many of the new approaches to science publishing I’ve seen haven’t done enough
to directly address the silo problem or provide significantly improved
alternatives. I suspect this is primarily because they’re trying to create
viable scientific publishing businesses of their own. I believe taking a
different approach around free, distributed, open source publishing and
aggregation software would be better suited to transforming scientific
communication into a more open, continuous, efficient, and data-driven
process.

more here: <http://tldr.person.sh/on-the-future-of-science>

------
kirk21
It always strikes me how much time it takes to finish a paper (e.g. fitting
the conference template, correcting spelling mistakes, formatting figures)
when you could outsource this. Student assistants are an option.

Furthermore it would be nice to discuss your ideas without having to spend
months writing a paper. Guess there is a difference between alpha and beta
sciences.

Finding relevant conferences is a challenge as well (since I'm still a junior
researcher).

------
mukaiji
I recently had a chit-chat with a 5th-year PhD friend of mine in front of the
Stanford bookstore. We both did academic research, and we both know all too
well the incredible frustration of tech not having fully penetrated academic
research. If you'd ever like to work toward making research faster, reply to
this. We could think about a couple of things and start cranking out some
solutions.

~~~
drewbuschhorn
(Since you're in Cali, and I'm not) I'd like to draw your attention to the
Science Hack Days in SF. Have a look at this wiki[0]. WilliamGunn and cazDev
on GitHub are two people I know who've participated in them, or have some
connections, and might help a project like this happen.

[0]
[http://sciencehackday.pbworks.com/w/page/45740104/SFideas#ag...](http://sciencehackday.pbworks.com/w/page/45740104/SFideas#agora)

