This is the crux of the problem IMHO - at least for the fields I study (AI/ML). Replicating the results in papers I read is way harder than it needs to be. For these fields it should be as simple as firing up a Jupyter notebook and downloading the actual dataset they used (which is much harder to get your hands on than it seems). Very few papers actually contain links to all of this in a final, polished manner so that it's #1 understandable and #2 repeatable.
Honestly, I'd much rather have your actual code and data that you used to get your results than read through the research paper if I had to choose (assuming the paper is not pure theory) - but instead there is a disproportionate focus on paper quality over "project quality" at least IMHO.
I don't really know what the solution is since apparently most academics have been perfectly fine with the status quo. I feel like we could build a much better system if we redefined our goals, since I don't think the current system is optimal for disseminating knowledge or finding and fixing mistakes in research or even generally working in a fast iterative process.
It's like best practices for computer security -- always strive to minimize the attack surface. :) Without source code there is much less stuff to criticize!
I don't know precisely what field you submitted in, but this is maliciously bad reviewing practice. You should have submitted a rebuttal and written to the editor calling the relevance of such "reviews" into question.
I suspect that's also why some papers are unnecessarily verbose and describe simple things in as complicated a way as possible. You can't criticize something that can't be understood.
Hiding code, obfuscating language, fudging data: all are symptoms of the same problem, of being more interested in getting a paper on a CV than in doing research.
There are many circumstances that can put even a good scientist in a situation where he/she has to do this but that's not a good argument for not sharing the code.
I also detest simple things made complex, though. In my experience (which has covered electronics, epidemiology and geography) reviewers tend to pick up on obtuse issues in the text but miss glaring errors in the math. It's sad, and you can see why someone less than scrupulous would exploit that tendency by overcomplicating things. That said, I think plenty of authors are honest but just not very clear thinkers!
Time before the submission deadline is usually used to do more experiments and write text, not to polish the code. And after the deadline there is no hurry. What people (who want to share code) consider important is to release it a bit before the actual conference (but this doesn't transfer to journal-based fields).
Or a paper should be published in a probationary form, and not certified (by the journal) until an independent lab replicates the result. A paper that isn't making adequate progress toward replication should be retracted by the publishing journal.
I'm entirely open to being shown otherwise, but working in that field seems more akin to working in physics than in software engineering (those are engineering problems, after all, not computer science; and sometimes they're only opinions)— and being critiqued for that in an ML/AI paper would be like critiquing a physics paper over the author's coding style— it is misdirected, IMO.
They could be squashing some legitimately good work by being too heavy handed around coding style and build process.
My main bugbear with the whole thing is the incorrect spelling of "artefact". And when that's my main bugbear... well, things aren't too bad!
I'm really sorry to hear about your experience.
The way it should work is: you put your stuff, with code and all data, on GitHub. People interested in the field or working for journals read it and rate it; journals collect links to paper repositories that are highly rated by scientists who themselves have many highly rated papers in the field, and call that publication.
While it's true that minimizing the attack surface is something that can work in papers, in my field reviewers typically don't look at the code. Many of my papers include code or links to it, and I haven't ever had a comment about it in reviews.
If something important is missing or does not make sense, I usually just email the first author. Usually they respond within a couple of days, and unlike looking at code, I can also get an explanation of why they did it that way.
In fact, I don't even usually care that much about stated results (such as improvements in state of the art).
Things that matter are: deep insight into a problem, new angle to look at something, discovery of a new phenomenon, high quality explanation, practical tricks to save resources, and comprehensive prior/related work review. That's why I read papers.
Sometimes you need to replicate exactly the same training method, on exactly the same data - for instance, if you want to use it as a baseline on a known dataset. Then it becomes really important to have the code, because while an adequate replication might be easy, it takes a lot of trial and error to get exactly the same model.
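For what it's worth, a lot of that trial and error starts with randomness. A minimal sketch in Python of pinning the seeds (stdlib only; the numpy/torch calls in the comments are the usual extra steps, not shown running here):

```python
import random

def seed_everything(seed=42):
    """Pin all sources of randomness so a rerun is bit-identical.

    This only covers the stdlib; in a real training run you would also
    call e.g. np.random.seed(seed) and torch.manual_seed(seed), and fix
    whatever cuDNN/determinism flags your framework exposes.
    """
    random.seed(seed)

seed_everything(0)
first_run = [random.random() for _ in range(3)]
seed_everything(0)
second_run = [random.random() for _ in range(3)]
assert first_run == second_run  # identical draws on both "runs"
```

Of course seeding is the easy part; data ordering, parallelism and hardware can still break exact replication, which is exactly why the original code matters so much.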
I wish! :)
you get spammed with "good ideas"
Again, I wish!
In the subfield I'm focused on at the moment (efficient mapping of NN algorithms to specialized hardware, low precision computation, model compression) I don't see good ideas very often (fewer than one good paper a week). Previously I worked on music generation - also didn't really feel spammed with good ideas.
As an example, recently I saw a paper on NN weight quantization, which had a very interesting idea, but the results were not impressive. I don't remember if they had any code published or not, but it didn't matter - I wanted to see what kind of results I'd get if I implemented it. Turned out it works really well, much better than what they reported in the paper.
How would you implement that?
I was leaving it purposefully vague, just do the "inverse" of what it says in that paper.
In a way bringing about the kind of change you reference in scientific publishing would actually be a pretty significant research accomplishment -- the field would be that much better for your efforts! But the road to get there is filled with political wrangling, talking to and serving on committees, probably forming dedicated organizations and painstakingly getting buy-in. This is not something you can realistically achieve without probably a good career's worth of political capital in your field and the drive and people skills to make it happen.
Until it does happen, making your own lab adhere to these standards is admirable but with unfortunately limited upside. I'm not saying the status quo is good, just that there are reasons for it still being the status quo.
I think at first new students know this is wrong, but then get dragged into the circular logic of:
it is standard in the field -> it is ok -> it is standard in the field
Simple - change the incentives. Currently, academics are evaluated based on paper publications not "actual code". If you want code and data to be shipped, create enough incentive for them and you'd see the change.
One problem is bitrot. Stuff that runs now is not guaranteed to work in 1 or 2 years, let alone 10 years.
Even more so when it runs on fancy hardware, like GPUs.
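One cheap mitigation is to at least record what the code ran on. A minimal sketch (the helper name is made up, not from any paper or tool):

```python
import platform
import sys
from importlib import metadata

def environment_report(packages):
    """Capture interpreter, OS and package versions for a paper's appendix."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report
```

It doesn't stop bitrot - a pinned version that no longer installs is still dead code - but it turns "it doesn't run" into "it needs exactly these versions", which is a much better starting point for a replicator ten years later.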
I work with some genome guys and they have this problem: their sequencers basically turn over in a year or two, the advances are so fast. So they have to maintain the specimens as well as all the software versions they used for analysis. It's a pain, but otherwise nothing is reproducible.
In the US at least, research costs a LOT of cash. Many departments are chronically underfunded. In my state, the university only gets ~10% of its funding from the state-house. The rest is grants. The only real writers of grants are the professor corps. So, departments look to the professors to fund the enterprise. Some of my advisers spent about 40 hours per week just on grant writing, neglecting the teaching and research hours required alongside. It is not a fun/good job. So most/all research is done by students, mostly PhD students, with little to no input from their advisers, and it's a stressful mess. As a result, most research is, well, amateur. Stats get mangled, code quality is non-existent, rats get loose, etc. Yes, yes, none of that 'actually' happens, but for real? It's a shitshow.
So, where does that leave the PhD student that has been in the program for 7 years? They may have one first author paper, if that, a thumb-drive filled with nearly unreadable 'data', and a dozen failed experiments. Failed experiments don't get published, mostly because science is hard and doing all the controls to say that you have a genuine/real failure is much harder. So the professor, now running into a very firm deadline to graduate the student via the grad office, must rush and publish something, just to get the student to leave. The professor's track record in graduating students is part of their evaluation, as well as their publication record. Hence, the unreadable graduation paper; one of two types of unreadable paper.
This paper is a targeted missile that is meant to do one thing: get the student off the payroll. It is not meant to be good, or a viable piece of science. It is never meant to be replicated. It is trying to be obtuse. It is there just to graduate a student, nothing more, nothing less.
The other class of unreadable paper is the turf-war paper. These papers are also meant to be just readable enough, but not so much as to be repeatable. The reason is that the paper is a 'big' paper. What is published is meant to stake a claim in a 'big' area of the field. Hopefully this will guarantee more funding in the future, now that the professor is a 'big' player in it. Hopefully no others can report that it is unrepeatable before the next grant comes in. The trick is to make certain that the paper exposes just enough of the experimental design to truly 'claim' the new big thing, but not enough that you can replicate it at all. Karl Deisseroth is infamous for this in the bio world. The paper creates buzz, but safeguards the turf of the lab from any other lab that may want to replicate it independently; they need the first lab to re-do it, and they must come with funding in hand.
So, to sum up: papers are weapons. One type is the missile that causes a student to graduate. The other is a trap with a golden idol on it.
I was already on the way out of science when I started working at that job, but the publish or perish culture really accelerated my departure.
It's also interesting how the current incentives really warp the incentive structures not just at big research universities, but also at small liberal arts colleges. I grew up as a fac brat, and so I've been able to tune into a lot of dialogue about the latest crop of new professors coming in to replace older professors as they retire, and a lot of the older professors are genuinely shocked at how little emphasis the newer professors place on teaching (traditionally what SLACs have focused on) compared to research. Even at schools with around 2000 students, new professors are demanding generous starter packages that no one would really have thought to ask for in the 70s.
Peer review takes a large amount of time from most academics, time that is totally unpaid. With the status quo, we are OK with that - it's a service we do for each other (we need our own papers reviewed, after all), and reviewing also has the advantage of finding new ideas sooner. Although precisely in AI/ML, many academics are currently complaining: due to the rapid expansion of the field, the peer-review load has gone beyond acceptable in many cases. For the last AAAI conference I had to review 6 papers on a fairly tight deadline. In the last 3 months I have reviewed 40 or so papers, and I'm very far from being a top-tier star in my field; there are people who are probably getting many more review requests (although they're probably saying no to some if they want to keep their sanity).
Reviewing code and data seriously can take how long? I would estimate an order of magnitude longer than reviewing a conventional research paper in PDF.
So currently, the situation is that if you post a link to source code you may get some positive reaction in the reviews, but in 99% of the cases reviewers are not going to actually look at the code (or at least not beyond a cursory look to see if it seems coherent at a first glance) because there is just no time.
Unless we fix this, I don't think we will see papers really focusing on the code and data, regardless of good intentions.
So, basically - a paper consists of what it takes to replicate the paper, and the blockchain journal's first step is running the replication.
This would be problematic for papers that require expensive computation, however...
Good God, I'm getting tired of every single thing needing to use the magic word 'blockchain' ATM.
It actually seems like journals could benefit from the application of this technology.
So yes, you don't need a blockchain to set these requirements, and if you're using a blockchain you don't need these requirements, _but_ a blockchain journal and these requirements would likely pair very well together, as they cover each other's weak points (centralized journals might only ensure the journal's publisher can replicate; decentralized journals have to have some kind of automated validation).
Buzzwords become buzzwords because there's something to them, after all.
The article seems to conflate the praxis of science with the archival of it. Scientists do all of the above on gigantic clusters, not in an IPython/Mathematica notebook. The purpose of publishing papers, on the other hand, is adding to the archive of knowledge, and papers can be easily rendered on a laptop with LaTeX.
And they are excellent at archival, by the way. You can see papers from the 19th century still being cited. On the other hand I have had issues running a Mathematica notebook from a few releases back -- and I seriously doubt one will be able to read any of my Mathematica notebooks 150 years from now. The same with the nifty web-based redesign of the Nature paper that is mentioned: I bet the original Nature article will be readable 150 years from now, whereas I doubt the web version will last 20.
I did not claim the opposite, just that it regularly happens without interactive notebooks. This seems like an interesting project though. Regarding the blog posts, it seems that there's a bug that makes all the entries appear as published on September 20, 2017?
The problem of software packaging will be solved by then, at least well enough to trivially emulate any of the popular environments we have today.
There were two forces working against us. First many of the grants came from governments, and a stipulation was that we would devote some resources to helping startups commercialise the output of the research. Some felt that open sourcing would remove the need for the startups to work directly with them to integrate the algorithms, and that this would hurt future grant applications by making the research look ineffective.
The main opposition though came from PhD and Postdoc students. Most didn't want anything related to their work open sourced. They believed that it would make it easy for others to pick up from where they were and render their next paper unpublishable by beating them to the punch.
Sadly I think there was some truth to both claims. Papers are the currency of academics, and all metrics for grants and careers hinge off it. It hinders cooperation and fosters a cynical environment of trying to game the metrics to secure a future in academics.
I don't know how else you would measure academics' performance, but until those incentives change, the journal paper in its current form is going nowhere.
I honestly can't contemplate who in their right mind would want "a future in academics" where academia is defined as a constant stream of metric gaming rather than actually accomplishing what you originally set out to accomplish.
But I think most people tell themselves "once I get the PhD... once I have a permanent position... once I have tenure..." and by that stage they're institutionalized.
Maybe an embargo would work, with results progressively published over time and the grace period extended by reaching goals/milestones.
In other words: use it or lose it.
But I do think there is a difference between scientific professionals communicating amongst each other and scientific communication to the public. And if mathematicians understood Strogatz's paper at the time it was published, and there were enough mathematicians to disseminate the knowledge, then should you require that algorithms be presented as animations?
Part of the reason why mathematicians and computer scientists (as researchers) conceive of new algorithms in the first place is because a lot of them are very strong in visualizing algorithms and 'being their own computer'.
Though, if a scientist wants to appeal to a broader group of scientists, then I'd recommend her or him to use every educational tool possible. For example, they could create an interactive blogpost a la parable of the polygons and link that in their paper.
On an unrelated note, it is such a pity ncase isn't mentioned at all in this article!
Also related is explorabl.es, not everything is science communication in an interactive way but a lot of it is.
Can you explain this in more detail?
If your background is in not-academia, you may well be forgiven for looking at the scene and seeing no progress. However, academia is more slow-moving and conservative in some aspects than other places. This is not a bad thing; not everyone has to move fast and break things. But at academia's normal pace, things are going pretty fast.
These problems are not unique to science. Take any large group of humans, be it government, the military, a company, a set of companies: they will coalesce onto the use of a certain number of tools, and it will take a LOT of energy and work to switch tools. Getting an entire industry to switch tools is extremely hard, perhaps downright impossible, and I'm not sure if it can be done in the rather slow-moving world of academic science. It simply won't be possible to enact a huge change on the system with so much institutional momentum built up.
It seems to me the best solution is to make incremental changes toward a GitHub-like system, training new scientists in grad programs to have better practices. Maybe we can't get researchers to post their full data set for all the public, but perhaps it can be made available to non-competitors - those the researchers aren't actively competing with for the same grants. Maybe researchers can be required by journals to post the peer-review comments/responses, as well as the drafts of the article, alongside the article itself. Maybe we can have researchers post extremely detailed code/methodological information alongside the article, without forcing them to hand over the entire dataset to other researchers?
Finally, maybe the whole incentive structure of academic science needs to change: maybe articles/publishing metrics should be de-valued in favor of teaching skills, mentorship ability, and collaboration within and across disciplines?
Who says we need to convince them? How about we leave them behind? They are rentiers, gatekeeping society's access to publicly funded scientific knowledge. I can't think of a reason why society should allow this hostage situation to continue.
I'm not saying this to be dismissive - I'm strongly in favor of faculties organizing to unseat administrators from their privileged positions. It's baffling to me that bright minds on campuses complain at length about the state of higher education but seem oddly averse to doing anything about it.
As far as tracking incremental improvements over time goes, I think it'll be hard to do better than our current method of including references to papers. It's impossible to track ideas the same way tracking code works (which itself is limited for similar reasons).
It would be nice if you could reference papers in a way that immediately allowed you to access them. And the technology to do so is there, but has limited use if papers aren't freely accessible from the internet.
But, given that the system of papers with references is essentially a DAG, I imagine someone will attempt to 'solve' the problem with blockchains before you can say 'initial coin offering'.
Why? I recently wrote a paper on GitHub. I loved it.
When you have a paper deadline in two weeks and four authors are furiously hacking in edits left and right, Git is absolutely invaluable when writing a paper. I can't fathom how people handled merges of written text without a version control system.
Open development is a policy for these projects and part of the grant stipulations.
Of course, data sets are more closely guarded initially until the groups that created them can publish. After that, though, CERN /LHC has done a decent job of making data publicly available from my understanding (not as someone directly involved).
I would be interested to hear more from scientists involved in projects doing open science.
> It needs to be combined with a mechanism for peer-review and publishing...
Maybe an "interpretation" mechanism, similar to what the Distill project is doing, could serve two purposes at once: review and digestion.
I also feel a bit weird about badging in science in general, since most of the most passionate people I know in science are intrinsically motivated enough that I could never see them concerning themselves with such carrots, unless it meant that they'd get more funds to do more of what they find fun.
I work every day with papers from decades ago, and I hope people will work with my papers in the future. How can I guarantee that researchers in 2050 will be able to run my Jupyter notebooks?
Moreover, it is not uncommon to not be able to publish source code. I can write about models and algorithms, but I am not allowed to publish the code I write for some projects.
If I wanted to prove to someone this statement was true, what would be the most effective way to do that?
Is author basing this conclusion on job postings somewhere?
Has he interviewed anyone working in these fields?
Has he worked in a lab or for a company doing R&D?
How does he know?
What evidence (cf. media hype) could I cite in order to convince someone he is right?
When I look at the other articles he has written, they seem focused on popularised notions about computers, but I do not see any articles about the academic disciplines he mentions.
edit: as is Chris Olah's Distill project:
With iPython this is also an issue -- tracking code in JSON is much less clean than tracking code in text files.
It's interesting that Mathematica and iPython both left code-as-plain-text behind as a storage format. I wonder if it would have been possible to come up with a hybrid solution, i.e. retain plain-text code files but with a serialized data structure (JSON-like, or binary) as the glue.
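For the JSON problem specifically, one workaround is to keep the notebook but diff a plain-text projection of it. A rough sketch (hypothetical helper, handling only the common case; tools like nbdime and jupytext do this properly):

```python
import json

def notebook_to_script(nb_text):
    """Flatten a .ipynb's JSON into plain Python source for cleaner diffs.

    .ipynb stores each cell's source as a list of strings (or one string);
    joining the code cells with blank lines yields a text file that
    version control can diff line by line.
    """
    nb = json.loads(nb_text)
    cells = [
        "".join(cell["source"])
        for cell in nb.get("cells", [])
        if cell["cell_type"] == "code"
    ]
    return "\n\n".join(cells)
```

The point is just that the text projection is trivial to produce, so there's no fundamental reason notebook-based papers have to be opaque to diffing.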
As a philosophical matter, for computation-heavy fields, I would love to see literate programming tools become de rigueur in the peer-reviewed distribution of results. In some fields (AI) this basically happens already - the blog post with code snippets and a link to arXiv at the end is a pretty common thing now.
I think what appeals to me about literate programming style is that it encourages a return to a more clear and expository style of writing, which has been squeezed out of scientific writing in journals over the years. I don't care what instantiation is required to produce a more uniformly clear and cogent document, I just care that it happens.
* Date of publication and dates of research should be required in every paper. It's really difficult to trace out the research path if you start from Google or random papers you find in various archive searches. Yes, that info can be present, but often it's in the metadata where the PDF is linked rather than in the PDF itself. Even worse is the "pubname, vol, issue" info rather than a year... now I have to track down when the publication started publishing, how they mark off volumes, and so on. I just want to know when the thing was published.
* Software versions used - if you are telling me about kernel modules or plugins/interfaces to existing software, I need to know the version to make my stuff work. Again - eventually it can be tracked down, but running a 'git bisect' on some source tree to find out when the code listings will compile is not OK.
* Actual permalinks to data, code, and other supplemental information. Even some 3rd-party escrow service is not a terrible idea. I hate trying to track down something from a paper only to find the link is dead and the info is no longer available or has moved a several-hour Google journey away.
The basic function of a scientific paper is understanding and reproducibility (inspired by jfaucett's comment).
I wonder, is reproducibility necessary? Is it even possible when things get really complex? Isn't consensus enough? I feel in the field of psychology (and most social sciences) that is what happens. I suppose consensus can be easily gamed by publication bias and a whole slew of other things. So I suppose as jfaucett puts it, a "discover for yourself" type of thing should still be there. I wonder how qualitative research could be saved and if you could call it science. In Dutch it is all called "wetenschap" and "weten" means to know.
But how should we go about design then? HCI papers use a lot of design that is never justified. The paper is like: we built a system, it improved our user metrics. But is there any intuition or theory written down as to why they designed something a certain way? Not really.
I suppose one strong way to get reproducibility is by getting all the inputs needed. In a psychology study this means getting a dataset. Correlations are fuzzy but if I get the same answers out of the same dataset, then the claims must be true for that particular dataset.
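At minimum you'd want to pin down *which* dataset, byte for byte. A tiny sketch of the obvious way (the function name and file name are mine, purely illustrative): publish the digest alongside the paper.

```python
import hashlib

def dataset_fingerprint(raw_bytes):
    """SHA-256 digest of the exact dataset file, for replicators to verify."""
    return hashlib.sha256(raw_bytes).hexdigest()

# A replicator first checks they hold identical inputs, e.g. (hypothetical):
# dataset_fingerprint(open("survey_data.csv", "rb").read()) == published_digest
```

If the digests match, any disagreement in results is in the analysis, not the inputs, which narrows the argument considerably.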
Regarding design and qualitative studies, maybe, film everything? The general themes that everybody would agree upon watching everything would be the reproducible part of it?
Ok, I'll stop. The whole idea that a paper needs to satisfy the criterion of reproducibility confuses me when I look at what science is nowadays.
If my results are not reproducible, then I'm basically asking for your trust.
So now, instead of actually doing the experiment, there's an incentive for me to forge my results.
And they don't have to match reality anymore, by the way.
Gotta go; back to working on my paper about psychic powers.
There are a number of problems in scientific publishing. Two big ones are:
1) Distribution hurdles and paywalls imposed by rent seeking journals - who knows how much this has prevented innovation and scientific advancement in the last 20 years
2) Easily replicating experiments / easily verifying accuracy and significance of results - this is related to for instance making data used in research more easily accessible and making it easier to spot p-value hacking
Fixing these might not require a completely new format for papers. Or it could. I can envision solutions both ways.
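On point 2, a toy simulation (mine, not from any article) shows why access to the full analysis matters: run 20 null tests per "study" and report only the best p-value, and a true-null effect looks "significant" far more often than 5% of the time - about 64%, since 1 - 0.95^20 ≈ 0.64.

```python
import math
import random

def two_sided_p(n, sample_mean):
    """Two-sided p-value for the mean of n standard normals under the null."""
    z = sample_mean * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

def best_p_of_k(rng, n=30, k=20):
    """Simulate k null experiments and (questionably) keep only the best p."""
    best = 1.0
    for _ in range(k):
        mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
        best = min(best, two_sided_p(n, mean))
    return best

rng = random.Random(0)
studies = 200
false_hits = sum(best_p_of_k(rng) < 0.05 for _ in range(studies))
print(false_hits / studies)  # roughly 0.64 under the null
```

With the raw data and analysis script in hand, a reader can see how many tests were actually run; with only the PDF, the selective reporting is invisible.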
I really like what the folks from Fermat's Library have been doing. They have been developing tools that are actually useful at the present time and push us in the right direction. I use their arXiv chrome extension https://fermatslibrary.com/librarian all the time for extracting references and bibtex. At the same time they are playing with entirely new concepts - they just posted a neat article on medium about a new unit for academic publishing https://medium.com/@fermatslibrary/a-new-unit-of-academic-pu...
They still think this.
Linnarsson's group just pre-published a paper cataloguing all cell types in the mouse brain, classifying them based on gene expression. The whole reason that I was hired was as an "experiment" to see if there was a way to make the enormous amount of data behind it more accessible for quick explorations than raw dumps of data. The viewer uses a lot of recent (as well as slightly-less-recent-but-underused) browser technologies.
Instead of downloading the full data set (which is typically around 28k genes by N cells, where N is in the tens to hundreds of thousands), only the general metadata plus the requested genes are downloaded, in the form of compressed JSON arrays containing raw numbers or strings. The viewer converts them to Typed Arrays (yes, even the string arrays) and then renders nearly everything on the fly client-side. This also makes it possible to interactively tweak view settings. Because the viewer makes almost no assumptions about what the data represents, we recently re-used the scatterplot view to display individual cells in a tissue section.
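In Python terms (the viewer itself runs in the browser, so this is only a sketch of the idea, with made-up function names): the server compresses and returns just the requested gene's row rather than the whole matrix.

```python
import gzip
import json

def gene_payload(matrix, gene_index):
    """Compress one gene's row of expression values for the client."""
    return gzip.compress(json.dumps(matrix[gene_index]).encode("utf-8"))

def decode_payload(payload):
    """What the viewer does on arrival, before converting to Typed Arrays."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

Per-gene payloads are what keeps the transfer size proportional to what the user actually looks at, rather than to the full 28k-gene dump.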
Furthermore, this data is stored off-line through IndexedDB, so repeat viewings of the same dataset or specific genes within it does not require re-downloading the (meta)data. This minimises data transfer even further, and makes the whole thing a lot snappier (not to mention cheaper to host, which may matter if you're a small research group). The only reason it isn't completely offline-first is that using service workers is giving me weird interactions with react-router. Being the lone developer I have to prioritise other, more pressing bugs.
Personally, I think there isn't enough praise for the pragmatic DocuWiki approach. My contract ends next week. I intend to keep contributing to the viewer, working out the (way too many) rough edges and small bugs that remain, but it won't be full-time. I hope someone will be able to maintain and develop this further. I think the DocuWiki has a better chance of still being on-line and working ten years from now.
 http://loom.linnarssonlab.org/dataset/cells/osmFISH/osmFISH_..., https://i.imgur.com/a7Mjyuu.png
But outside of computer science you need laboratories to replicate experiments. Scientific papers are perfectly fine vehicles to record the necessary information to replicate experiments in this setting. Historically appendices are used for the extended details. And yes, replication is hard, but it's part of science.
Of course, making interactive diagrams often takes dramatically more work than sketching pictures with a pen (or just writing down equations), and mathematicians are not typically trained to do it, so it would be an uphill slog for many. But I would love it if there was more funding/prestige/etc. available for mathematicians to make their papers more accessible by adding better visuals.
But it would be better to react to the substance of the article, which is more interesting.
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
Would you please (re-)read
https://news.ycombinator.com/newsguidelines.html and not post like this here?