

Machines are better referees than humans but we’ll be sued if we use them - inglesp
http://blogs.ch.cam.ac.uk/pmr/2014/02/18/machines-are-better-referees-than-humans-but-well-be-sued-if-we-use-them/

======
Blahah
Peter Murray Rust (author of this blog post) is a really great man. He's been
a tireless advocate for dismantling privilege and setting knowledge free for
several decades. I'm proud to say he's becoming a sort of mentor to me. Last
week I spent a couple of days with his research group and saw this software in
action - it's really impressive.

They can take an ancient paper with very low quality diagrams of complex
chemical structures, parse the image into an open markup language and
reconstruct the chemical formula and the correct image. Chemical symbols are
just one of many plugins for their core software which interprets
unstructured, information rich data like raster diagrams. They also have
plugins for phylogenetic trees, plots, species names, gene names and reagents.
You can develop plugins easily for whatever you want, and they're recruiting
open source contributors (see
[https://solvers.io/projects/QADhJNcCkcKXfiCQ6](https://solvers.io/projects/QADhJNcCkcKXfiCQ6),
[https://solvers.io/projects/4K3cvLEoHQqhhzBan](https://solvers.io/projects/4K3cvLEoHQqhhzBan)).

As a side effect of how their software works, it can detect tiny suggestive
imperfections in images that reveal scientific fraud. I was shown a demo where
a trace from a mass spec (like this
[http://en.wikipedia.org/wiki/File:ObwiedniaPeptydu.gif](http://en.wikipedia.org/wiki/File:ObwiedniaPeptydu.gif))
was analysed. As well as reading the data from the plot, it revealed a peak
that had been covered up with a square - the author had deliberately obscured
a peak in their data that was inconvenient. Scientific fraud. It's terrifying
that they find this in _most_ chemistry papers they analyse.
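
To give a rough idea of the kind of check involved, here is a minimal sketch. It is not the group's actual method: the function name `trace_gaps` and the binary-raster input format are assumptions made purely for illustration. It flags column ranges where a plotted trace vanishes between inked columns, which is what a blank rectangle pasted over a peak would leave behind.

```python
# Hypothetical illustration only; not taken from the software described above.
# The image is a list of rows, each a list of 0/1 pixels (1 = ink).

def trace_gaps(image: list[list[int]]) -> list[tuple[int, int]]:
    """Return (first_col, last_col) spans with no ink that sit between inked columns."""
    if not image:
        return []
    width = len(image[0])
    # A column counts as inked if any pixel in it is set.
    inked = [any(row[x] for row in image) for x in range(width)]
    gaps = []
    start = None       # first column of the gap currently being tracked
    seen_ink = False   # ignore blank margin to the left of the trace
    for x, has_ink in enumerate(inked):
        if has_ink:
            if start is not None:
                gaps.append((start, x - 1))
                start = None
            seen_ink = True
        elif seen_ink and start is None:
            start = x
    # A blank run that reaches the right edge is margin, not a gap, so it is dropped.
    return gaps


plot = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1],
]
print(trace_gaps(plot))
```

A real plot would need axis and baseline handling first; this only shows the idea of looking for ink that stops where a continuous trace should not.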

Peter's group can analyse thousands or hundreds of thousands of papers an
hour, automatically detecting errors and fraud and simultaneously making the
data, which are _facts_ and therefore not copyrightable, free. This is one of
the best things that has happened to science in many years, except that
publishers deliberately prevent it. Their work also made me realise it would
be possible to continue Aaron Swartz' work on a much bigger scale
([http://blahah.net/2014/02/11/knowledge-sets-us-free/](http://blahah.net/2014/02/11/knowledge-sets-us-free/)).

Academic publishers who are suppressing this are literally the enemies of
humanity.

~~~
yoha
> Academic publishers are literally the enemies of humanity.

I would not be so extreme: they used to be a necessary evil before the
Internet became mainstream. They are more of a millstone from the past that
research still drags with it because changing an institution is hard.

~~~
rayiner
What proponents of "open access science" fail to realize is that science is a
business like any other, and a very big one involving lots of money. Like in
any business, some form of marketing and reputation management is key.
Academic publishers provide essential marketing services: filtering, sorting,
and branding. You self publish, and most people regard you as a kook. You
publish in _Nature_, and suddenly you're someone worth reading. Branding,
credentialing, signaling, all these things are just as important if not more
so in science than in any other field, and that's the service academic
publishers provide.

At the end of the day, nobody is holding a gun to the heads of professors
forcing them to publish in these journals. These are freely-negotiated
agreements between researchers and publishers. The reason researchers continue
to support the existing system is that it serves their purposes and furthers
their careers. Any alternative system will have to provide similar benefits.

~~~
revelation
None of this explains why they should get the copyright.

~~~
rayiner
They negotiate with researchers for it.

~~~
silentOpen
Just like I negotiate with my hospital for medical care payment.

~~~
patio11
Life lesson for US HNers who may never have been in this situation: you can
negotiate with healthcare providers on price. It is actually fairly routine.
They won't think less of you.

70 cents on the dollar delivered immediately in cash beats the heck out of 100
cents on the dollar transferred to a debt-collection agency, worked for 6
months, and negotiated down to 60 cents of which the agency keeps 15.
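
The arithmetic above, as a quick sketch. Amounts are cents per dollar billed, and reading "keeps 15" as 15 cents rather than 15 percent is an assumption; the function names are made up for illustration.

```python
# Compare what the provider nets per dollar billed in the two scenarios.
# Assumption: "the agency keeps 15" means 15 cents, not 15 percent.

def net_cash_now(settlement_cents: int) -> int:
    """The provider keeps the whole negotiated cash settlement."""
    return settlement_cents


def net_via_collections(recovered_cents: int, agency_cut_cents: int) -> int:
    """The provider gets what the agency recovers minus the agency's cut."""
    return recovered_cents - agency_cut_cents


cash = net_cash_now(70)
collections = net_via_collections(60, 15)
print(cash, collections, cash - collections)  # prints: 70 45 25
```

Under that reading the provider nets 45 cents through collections, six months later, against 70 cents in hand today.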

------
yoha
Google cache:
[https://webcache.googleusercontent.com/search?q=cache:b2trH5...](https://webcache.googleusercontent.com/search?q=cache:b2trH5OA3P8J:http://blogs.ch.cam.ac.uk/pmr/2014/02/18/machines-are-better-referees-than-humans-but-well-be-sued-if-we-use-them/)

------
atmosx
When I asked my journalist friend why refs in football (soccer) games
don't use high-tech, he thought about it for 5 minutes and then told me: "If
they use technology it will be really hard to set up games. If you take from a
league the ability to set-up games and promote specific teams/individuals,
then I don't know how the game will be shaped".

Of course it's not _universal_; it's not like _everything_ is a set-up, but it
happens more often than most would likely imagine, especially since betting
came into play.

So there you have it.

~~~
JonnieCache
The reasoning for not using hi-tech refereeing equipment in football is
apparently a desire to keep the game played at the top level the same as that
played in streets and fields the world over.

The friction of imperfect decisions is also considered part of the drama of
the game, rather than a flaw.

~~~
patrickk
> The reasoning for not using hi-tech refereeing equipment in football is
> apparently a desire to keep the game played at the top level the same as
> that played in streets and fields the world over.

That's already gone with goal line technology. In fact, it was never the case
- how many street games/underage kids games even have assistant refs?

> The friction of imperfect decisions is also considered part of the drama of
> the game, rather than a flaw.

That _very much_ depends on who you ask. Personally, I'd like to see some
technology come in (in high profile games played in stadiums already equipped
with tv cameras) to help get rid of the more ridiculous refereeing errors.

The most comical example of contrived, convoluted FIFA rules I can think of is
the Zidane red card in the World Cup final 2006[1]. Allegedly an assistant ref
saw Zidane headbutt another player on a tv monitor, then alerted the ref, who
red-carded Zidane. Technically the ref didn't follow the rules, as to send
someone off the ref or assistants have to have seen the incident in real time.
This means the average football fan with only basic knowledge of the rules has
a better view of controversial, game changing decisions in high profile games
than the ref (fans can watch instant replays in slo mo whereas the ref has to
watch at full speed, sometimes with an obstructed view of an incident.)

[1] [http://www.football-italia.net/42269/ref-who-saw-zidane-hit-...](http://www.football-italia.net/42269/ref-who-saw-zidane-hit-materazzi)

------
JackFr
This should be supported (both financially and ideologically) by the National
Library of Medicine at the National Institutes of Health. The NIH doles out
about $30 billion in research grants every year. If they could spend a tiny
fraction of a percent to dramatically improve the quality of the rest _and_
make such automatic checking a standard practice that would be tremendous bang
for the buck.

Oh yeah -- and they're big enough to fight academic publishers.

------
tomp
Can they release the software to the world? Maybe, if we all make an effort to
analyse whatever papers we can access, we will together make enough noise that
it will be impossible to ignore, and also impossible to silence (cf. The
Pirate Bay). This could be one of the most important advancements of science
in the past few years.

~~~
Blahah
It's all FOSS:

[https://bitbucket.org/petermr](https://bitbucket.org/petermr)
[https://bitbucket.org/petermr/imageanalysis](https://bitbucket.org/petermr/imageanalysis)

~~~
j_m_b
This is great work, another fantasy of mine made reality and posted to HN!

Is there a tutorial for getting started with OSCAR? A "HOWTO" for analyzing a
paper would make this program more accessible. If I could learn how to use it
without spending too much time doing so, I would use it as a tool for reviewing
manuscripts. I would also like to use it on my own manuscripts to find
mistakes.

I have a feature request: optical pattern recognition of mathematical
formulas. It would be awesome to feed a program a pdf and have all of the
mathematical formulas translated to LaTeX.

------
Shivetya
At first I thought the article would be about sports, which in itself would
make for an interesting discussion about using machines to judge rules
adherence, not that I would want to take that human element out of sports.

However this is more along the lines of validating what is published. Of any
group you would hope that scientists and their like would jump on technology
like this so as to provide the most accurate representation of their work
possible. The same goes for publishers: why wouldn't they want to brag that
they use the most advanced interrogation methods for the papers they publish?

I guess they are people too, hyper sensitive that fault will be found

~~~
jsmeaton
> I guess they are people too, hyper sensitive that fault will be found

Or straight up fraud. Or a dismantling of their monopoly of information/data.

------
_greim_
So as a non-scientist, let me see if I understand.

There are lots of uncaught errors floating around out there in scientific
papers, and many of them can now be found with this software. But exposing
the errors so that they can be corrected is tricky because: A) you have to
have legal access to a paper in order to scan it, and B) even if you do have
access, under the current rules only the publishers have the right to expose the
errors, and they're not interested because they want to avoid the
embarrassment.

Am I understanding it?

------
Udo
I see a very exciting possibility for the future of academic papers in certain
disciplines where we could have a machine validation step performed
automatically, not only on submission but as a tool for the author to check
their work. Like a git commit hook that forces a test suite to run. Of course,
this would require some formalism to tag data, diagrams, and formulae but it's
probably in our best interest in the long run to make the body of our research
more machine-accessible anyway.
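
A minimal sketch of what such a hook could look like. Everything here is hypothetical: the brace check is a toy stand-in for a real validator, and the `.tex` filter and the function names are assumptions. The only real external call is `git diff --cached --name-only --diff-filter=ACM`, which lists staged files.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: run validators over staged manuscript files."""
import subprocess
import sys
from typing import Callable

# A check takes a file's text and returns a list of problem descriptions.
Check = Callable[[str], list[str]]


def unbalanced_braces(text: str) -> list[str]:
    """Toy validator: flag unbalanced curly braces (ignores LaTeX escapes)."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if depth < 0:
            return ["closing brace without opening"]
    return [] if depth == 0 else ["unclosed brace"]


def staged_files() -> list[str]:
    """Ask git for paths staged in the index (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def validate(files: dict[str, str], checks: list[Check]) -> list[str]:
    """Run every check on every file; return 'path: message' strings."""
    problems = []
    for path, text in files.items():
        for check in checks:
            problems += [f"{path}: {msg}" for msg in check(text)]
    return problems


def main() -> int:
    """Return 1 (which blocks the commit) if any staged .tex file fails a check."""
    files = {}
    for path in staged_files():
        if path.endswith(".tex"):
            # Simplification: reads the working-tree copy, not the staged blob.
            with open(path, encoding="utf-8") as fh:
                files[path] = fh.read()
    problems = validate(files, [unbalanced_braces])
    for line in problems:
        print(line, file=sys.stderr)
    return 1 if problems else 0

# Saved as .git/hooks/pre-commit, the script would end with:
#     sys.exit(main())
```

Git aborts the commit when the hook exits non-zero, so the author sees the validator's complaints before anything is recorded, the same way a failing test suite would stop a code commit.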

~~~
anonymouz
I have a hard time seeing how anything but a teeny tiny fraction of scientific
results would ever be amenable to such automatic checking. And I am a
mathematician -- in principle this should be easiest in mathematics, since at
least we have well defined axioms and _in principle_ one could derive
everything from those axioms. In practice, this seems completely unfeasible
for most mathematics, at least currently.

~~~
Udo
I'm guessing you're not alone, I would go so far as to say that validation
software would offend most authors - the same way code validation tools hurt
programmers' feelings.

"But I'm right and the tool is wrong! The tool doesn't understand the
complexity/brilliance of my work!" And sometimes you're going to be right with
that assertion. Other times, however, it will push you and your reviewers
towards better quality.

I find it fascinating that the entire article is in fact about this issue, the
copyright thing is purely incidental.

------
sov
For those curious, the 5 membered ring in cyclopiazonic acid should have an NH
group rather than a CH2.

------
bloaf
When people talk about the future, they always seem to think that it will be
the scientific jobs that get roboticized last. I think it will be the
opposite, it won't be long before systems like this one will be able to
analyze the scientific literature, identify shortcomings, and tell us what
experiments to do next. Science will become less about creativity and problem
solving, and more about following directions; eventually becoming completely
automated.

[http://www.aejournal.net/content/2/1/1](http://www.aejournal.net/content/2/1/1)

------
nder
Any chance you could farm out the software to a lab in a country with MUCH
MUCH looser copyright laws, and a court system that would be problematic for
outside lawsuits?

~~~
dhoulb
That's what I was thinking. Find someone who isn't under such a restrictive
licence, and let them feed in the data.

I presume this is what will happen.

------
dflock
This blog post is down, try here:
[http://blogs.ch.cam.ac.uk.nyud.net/pmr/2014/02/18/machines-a...](http://blogs.ch.cam.ac.uk.nyud.net/pmr/2014/02/18/machines-are-better-referees-than-humans-but-well-be-sued-if-we-use-them/)

------
ylem
I suppose one way around this would be for the NSF to require any grant awardees
to deposit their structures in a publicly accessible database...But, I'm a bit
surprised--is there nothing like arxiv.org for chemistry? Why not?

------
nl
There is of course a way around the problems cited in the article.

If the referees ran the software on the preprint it would find the same
problem.

I agree this isn't as good, but it would be a step forward.

------
bloaf
I think the dream would be to couple a literature-analyzer like this with a
specialized search engine like Wolfram Alpha.

