Let’s Publish Everything (columbia.edu)
153 points by luu 9 days ago | 27 comments





The lede got buried here. This article does not just propose "let's publish everything", it proposes "let's publish everything and have endorsement as a completely separate step." I think that is a far more rational approach than the current model. The current model assumes that the only way to distribute information is by printing on paper. That is clearly no longer the case, in fact it is a laughable assumption. So let's get the papers published online, and then separately have discussions leading to various kinds of endorsements. Or not, depending on the strength of the arguments.

The same thing is trying to happen in scientific publication that's been happening in every other published medium: scarcity is gone. So why not embrace the abundance? Well if there's any contrary case to be made, it's that the explosion in quantity inevitably reduces the average quality. In the arts, low quality corresponds to that ineffable quality of shittiness. In news, it's fakeness. In science, it's irreproducibility. So what do you do? All the former built-in scarcity-based ways of filtering for quality went out the window along with the scarcity itself. You can no longer use the limited space on physical sheets of papyrus (or vinyl LP records, or celluloid) as an excuse to exclude things. So you need new ways of sorting out good quality work from the (suddenly vast) quantities of mediocre work. Seems like you're doing pretty well if you're lucky enough to have a whole new class of people step in to help with the filtering part. Unless of course you believe a scientific discovery is fundamentally the work of a person, an ego, rather than a fact that existed long before that particular person happened to discover it.

Democratization ends elitism; unfortunately elites sometimes really do consist of the best of a given thing. Especially newer elites. Every elite starts out as a meritocracy and ends up some kind of weird legacy-based cabal/cartel that deserves to be overthrown (that's the one you're probably accustomed to thinking of when you hear the term "elitism").


(I agree with the parent poster, but this seems like the right place to add the following.)

One thing that people who claim "the scarcity is gone, let's just publish everything" do miss is that there still is a scarcity: The time and attention of researchers.

Let me expand a bit on that from my own field of studying reconnection in astrophysical plasma. As you can tell from the description, it is not a well-defined, closed topic. Instead there are tons of more or less adjacent fields: the study of solar flares (which might be triggered by reconnection), the study of coronal heating (the heat might be due to magnetic field getting converted to heat by reconnection), astronomical observations of AGN jets (reconnection might be what produces energetic particles there that we see in the SED of those sources), observations of pulsar wind nebulae (the energetic particles there might have been accelerated at the termination shock, or due to reconnection), observations of giant pulses in (some) pulsars (that might be due to reconnection at the Y point of the current sheet just outside the light cylinder) and so on. On top of the related physical topics I need to keep on top of developments in the simulation method I use (particle-in-cell codes) and in several related simulation methods (either because improvements there might make them viable to study reconnection, which wasn't possible before, or because the neat new trick in method X might also improve the characteristics in PiC codes).

If I wanted to, I could easily spend 100 hours per week just reading papers. But staying current with the field is just 5 or maybe 10 percent of my job. So what I do is the following: Papers that fall close enough to my special topic I will read. All of them. I will actually print them out, annotate them, go over them with a fine-toothed comb. For many other papers I will read the abstracts (10 to 50 each morning) to find the 1 or 2 papers that are worth reading. And this is where selection by the editors and limitations to publications come in. They rank important papers. I am much more likely to read "a novel approach to plasma simulation" if it was accepted into the Astrophysical Journal than if it just appears on ArXiv. Because ApJ does not like pure code papers. So if it made the cut, 2 or 3 experts in the field deemed it worth the time of the community.

Now that doesn't mean that all ArXiv papers are bad, that we should publish less or anything like that. When I talk to a colleague and ask about a technical detail I am SOOOO happy when they say "it's in the Arxiv paper ID 1906.bla". And on the other hand there is the notion of "it was in Nature but it still might be right". Bottom line is:

Do not discount the sorting by topic (numerical vs observational vs theoretical), impact ("here is one data point" vs "here is a completely new approach") and quality ("I'm not too convinced, but maybe it gives somebody an idea" vs "holy smokes how did we all miss that!") that journals provide. Any better, future alternative needs this sorting and ranking. Just dumping it onto the internet is not the solution.


Interesting. This seems to correlate almost exactly with discovery in music streaming too. Given the torrent of new music available, how do you discover the new music that you like?

At the moment, AI is doing a pretty bad job of this - my Spotify Discover Weekly is an interesting listen, but I know it's not the best selection of new music out there suited to my tastes (to be fair, it's not really trying to be that, though). My "recommended" list on Netflix bores me. I get why they're recommending them, but it's all things that I've seen and rejected from My List, rather than things that I would find genuinely new and interesting.

I think this is the next big problem to solve, for everyone. The combination of Discoverability (how do I get my music/paper/novel/art/film discovered by the audience of people who will like it?) and Search (how do I find new music/papers/novels/art/film that suit my tastes/research subject?).


Way back it was always through word of mouth that one discovered things. I still hold it true. You may have 10 friends who have their niche and once in a while they recommend something they think you want to listen to.

I think that the music matters more based on who recommended it. An algorithm won't have the same effect. It's missing the storyline of how you ended up watching x movie or listening to y song.


Really good point. There are a number of bands that I listened to (and ended up liking) purely because of who told me about them.

I think the same applies to most stuff. Your idols influence a lot. I see this as the best reason to not rely purely on algorithms.

Producing playlists is also a sort of art.


You're right, it does need some type of ranking. I think the endorsement could be something like that. Instead of having journals endorse it, let specific academics do it. So if Dr. John Doe really likes an article, he can endorse it personally, and put his personal reputation up to bat for it.

And then have an algorithm that sorts based on endorsements, or how trusted the endorsers are ('Do you agree with this endorsement', or how many endorsements they themselves have). Let the authors easily input all the topics they want as tags, making it easy to search as well. And, of course, public comments about what would need to change to get an endorsement, as well as allowing readers to look at all previous versions of an article to see what changed. Basically, let the academics take care of endorsing and topic tagging themselves.
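A rough sketch of what that sorting could look like, just to make the idea concrete. Everything here is an assumption for illustration: the `Paper` fields, the `endorser_trust` weighting (1 plus the number of endorsements the endorser's own work has received) and the names are made up, not an existing system.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    title: str
    tags: set[str] = field(default_factory=set)        # author-supplied topic tags
    endorsers: list[str] = field(default_factory=list)  # names of endorsing academics


def endorser_trust(name: str, endorsements_received: dict[str, int]) -> float:
    """Hypothetical trust weight: 1 plus the endorsements the endorser has received."""
    return 1.0 + endorsements_received.get(name, 0)


def rank_papers(papers, endorsements_received, tag=None):
    """Return papers (optionally filtered by tag), highest summed endorser trust first."""
    candidates = [p for p in papers if tag is None or tag in p.tags]
    return sorted(
        candidates,
        key=lambda p: sum(endorser_trust(e, endorsements_received) for e in p.endorsers),
        reverse=True,
    )
```

With a weighting like this, one endorsement from a heavily endorsed academic can outrank several endorsements from unknown accounts, which is the "put his personal reputation up to bat" part. Whether that is the right weighting is an open question.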


Reviewing every paper by just one to three qualified people is already hard and puts a lot of strain on the system. Trying to solve that by "just have everybody read everything and endorse what they like" is going to break it. Or rather, most papers will never be read or endorsed by a professional. All it would do is turn science into even more of a matter of who likes whom.

The same is true within IT, you need to read a lot to make sure you're in the loop.

I think it correlates to map structures: if every point in a map is connected to every other point, there will be a large surface for each point in the beginning, but as the number of points grows the surface (of possible attention in this case) will shrink.

How much area of attention is needed to perform the best in creative research, to produce the best work in a given time? Maybe the answer to that question will yield more papers with better quality.

My personal belief is that we need to dream more and do less work to make sure the work that is done is of more quality such that no filtration is needed and we can truly publish everything.


Even if every paper is amazing, I still don't have the time to read about all topics. So even if every submitted paper is great and should be published, we still need sorting by topic.

Add to that the fact that "publish or perish" has led to a decline in at least the amount of "actually new information" per paper, if not outright in the quality of papers. And that is not going away easily.


Yeah, I agree that force publishing something which is not ready is always bad.

Maybe publishing in itself is a way that does not scale well. Text has always been quite a slow way of making progress. And now with VR we have better ways to visualize and learn from others' work.

How would you like to work, given free time and funding? Would you like to be able to keep track of every related research?


This very site is an example of a way to handle abundant content.

"Post all of the things" doesn't mean that content shouldn't be prioritized. The process is just asynchronous. It avoids abuse in specialized fields where a low number of experts are the gateway to publication.


This site works OK for finding content that is interesting to a large community. The problem in science is "how do you find content for the 5 people working on problem X". For every X.

Great insight, thanks! It sounds like AI could play a part here in order to achieve the required efficiency, e.g. sorting out the right papers for you.

All I can say is that the feeds that currently promise to solve that problem (NASA ADS, Google Scholar, ResearchGate and the feeds of several journals) are all pretty bad. I get several suggestions for medical/biophysical papers every day (probably due to "plasma" and "particle in _C_ell").

This is a complex topic and I think Gelman (the linked poster) is either misinterpreting or confused by Kaufman and Glǎveanu (the article he's discussing). Just for some context, I agree with Gelman and KG both in their main arguments.

There are different issues colliding in the open science movement. One is what you're referring to, the fact that scarcity is gone. Combined with the overcrowding and hypercompetitiveness of science, you not only have what you are referring to, a decrease in veridicality, you also have a decrease in signal-to-noise in general. So the nonsense increases, but so do the traditional signals to quality. So nonsense appears in high-profile journals, and very high-quality work appears in low-profile journals or even as "unpublished" pieces.

The other problem though, what my colleagues refer to as the "science police", is an increasing tendency for certain groups to argue that a certain set of practices are not only good, but necessary for "good" science, and by implication, everything that does not is "bad" science, in a black-and-white kind of way.

For one thing, not all problems are with replication. Nonsense can replicate well, and very important legitimate phenomena can be difficult to replicate. If something is really not replicable at all, that's a problem, but replicability per se is only one part of scientific progress, and it comes in degrees with various causes.

It's also much more difficult to determine what is replicable sometimes than it might seem on the surface. Replicate what? What's important to replicate? How? Sometimes this is clear, but other times it is not.

Also, when you really delve into it, there's not really a good rationale for what, exactly, are the important ingredients for open science, or why. For example, is it really necessary to have preregistered studies? What's to keep someone from preregistering but then silently declining to publish null results? Or to "preregister" something they've already collected? If an important unanalyzed pre-existing dataset becomes available, is that "tainted" because it wasn't preregistered? Is it important to preregister, or just to make the data openly available? Is it better to use modeling to identify anomalies in studies, or to rely on preregistration? These issues aren't always clear.

I think there's a sense sometimes that the open science movement is not only trying to dismantle a broken system run by an established elite, but to replace it dogmatically with a new system run by a new elite, with its own imperfect rules. Already I've seen misuses of open science guidelines used to bully and discredit legitimate work (for example, by suggesting that someone is hiding something by not sharing data, when the data contains protected healthcare information and would be accessible to them anyway if they would just go through proper channels). This is tricky to discuss, as you might imagine, so it comes out in pieces like KG's piece. Gelman is asking "why not publish everything", which is responding (I think) to something different from what KG are responding to. Maybe I'm misreading KG, but I think they might also argue "why not publish everything"; they just have a different group they're addressing when they would say that.


> Publishing in Psychological Science and PNAS has value because these journals reject a lot of papers.

That seems to be the crux - scientific papers will have to fall into two camps "blogs or basically deciphering the lab notes of everyone in your field" and "look this is a real effect and worthy of your attention"

People seem to want the second without wading through the first - but I don't think you can


> That seems to be the crux - scientific papers will have to fall into two camps "blogs or basically deciphering the lab notes of everyone in your field" and "look this is a real effect and worthy of your attention"

This ignores reputation and expertise. If you’re studying the sociology of Hollywood there’s Gabriel Rossman and maybe six other people who do quantitative work. They all know each other and read each other’s work. If one of this small group has a grad student who does quantitative sociology work on rich datasets from IMDB or somewhere else, they’ll be introduced at some point while doing their doctorate. Rossman also takes part in other networks within sociology: economic sociologists and Princeton graduates. All of these networks are relatively small, and reputation and gossip travel quickly. People hear that others are good or not getting tenure, etc.

Sociology is a normal social science. Papers go from an idea to a sketch of an idea or maybe a poster to a conference paper or graduate seminar discussion to working paper before finally being published. The peer review step is the last one for the diffusion of knowledge about what’s good. If people stopped publishing in legacy journals tomorrow peer review would still happen, it’d just be post publication peer review of working papers, the system Einstein published all but three of his papers under.

You can get the second without personally wading through the lab notes of everyone in your field because if you’re trying to get attention you need to present your results and advertise them so people will care, unless they already care because you have a reputation.


For PNAS, I've heard, getting sponsored by a NAS member greatly increases chances for publication. So it seems an odd example to cite for "value".

It's more that NAS members can publish some articles basically for free, so co-authoring with them is a good way to get in more easily.

That being said, NAS members I know do not want to erode their reputation by publishing junk.


I am in sympathy with the author's goals but the major problem with publishing everything on arXiv-like forums is that: (i) it becomes impossible to sort out the good stuff from the nonsense produced by cranks, and (ii) it unfairly advantages "high profile" groups.

Today, if I have a grad student interested in security who wants some ideas for things to work on, I could ask them to go look up papers in Oakland, CCS and NDSS over the last couple of years and see if anything catches their fancy. This works because there's a reasonable number of papers that I could reasonably expect a PhD student to look through. Asking them to look up all the stuff that ends up on arXiv or the IACR's ePrints is not reasonable. There's just too much stuff there and most of it is not worth looking at.

So, you might say the endorsements will take care of this. CCS et al. could just endorse some limited subset of the IACR ePrints. But this leads to problem (ii) above. Right now, we have a bunch of high-profile researchers (who are mostly at places like MIT and Stanford) who just put up stuff on arXiv without any peer review and they start picking-up citations right away because of their "elite" status. Some of this is deserved, because these people have done good work. But some of this is also just a publication cartel where everybody cites their friends' work and make it impossible for others to break into a field.

The larger point is that in a scenario where there are so many papers that nobody could possibly look at all of them will lead to a few groups accumulating all the citations and all the awards. This will especially be unfair to researchers in the developing world -- people at places like Bilkent, Tsinghua, and the IITs -- where researchers don't have the PR muscle power to highlight unpublished stuff.


> I could ask them to go look up papers in Oakland, CCS and NDSS over the last couple of years and see if anything catches their fancy.

Isn't this exactly the major thing that is wrong with research today? Limiting work/creativity to a few well known conferences done by elites for elites? I read blog posts, posted daily here on HN, that are way more informative, honest, and replicable than many papers published in the three conferences you named.

> "elite" status. Some of this is deserved, because these people have done good work. But some of this is also just a publication cartel where everybody cites their friends' work and make it impossible for others to break into a field.

Elite status happens exactly because there are conferences like the ones you mentioned. If you work with an advisor that publishes in Oakland, your chances of getting a paper in Oakland gets increased multiplicatively. And hint, that's not because your ideas (or papers) are better than anybody else's.

> The larger point is that in a scenario where there are so many papers that nobody could possibly look at all of them will lead to a few groups accumulating all the citations and all the awards.

This is already happening. Look at all the "prestigious" conferences.

> where researchers don't have the PR muscle power to highlight unpublished stuff.

Who cares? If the work is worth anything, people will cite it. If not, it will remain as is. Why does it matter? Why do you care if 10 people cited your work or 100 people if you are happy with the work?

Unfortunately, nobody in this forsaken field (computer science) cares about the scientific aspect of the field anymore; everybody wants their name to be known and that's all there is to it. The measure of success is how many papers you publish in elite conferences ...

I do actually think that by breaking down all the barriers people care less about having their name in conference X or Y and more about the scientific aspect or citation cartels. First one is good, second one can be fixed (at least more easily than giving a few elites lots of power with no checks and balances).


> Isn't this exactly the major thing that is wrong with research today? Limiting work/creativity to a few well known conferences done by elites for elites? I read blog posts, posted daily here on HN, that are way more informative, honest, and replicable than many papers published in the three conferences you named.

I totally disagree. The quality of papers at the "elite" conferences is way higher than most things I've read on HN. What is an example of an HN post that in your opinion is better than equivalent academic research in that area?

> Elite status happens exactly because there are conferences like the ones you mentioned. If you work with an advisor that publishes in Oakland, your chances of getting a paper in Oakland gets increased multiplicatively. And hint, that's not because your ideas (or papers) are better than anybody else's.

Sure, having an advisor on the Oakland PC helps a great deal. But it doesn't follow that your work is just the same as everyone else's. Have you peer reviewed papers for these conferences? A majority of submissions, even at the "elite" conferences, are just junk. That doesn't mean everything that gets published is not junk, but the stuff that does get published is significantly better than the average submission.

> Who cares? If the work is worth anything, people will cite it. If not, it will remain as is. Why does it matter? Why do you care if 10 people cited your work or 100 people if you are happy with the work?

Because the point of my research is not to sit in an ivory tower and produce academese that no one cares about. The goal is to have real impact on computer system design, and in my specific case, push practitioners towards methodologies that make systems more secure. That's not going to happen if no one reads our work.

Another way of looking at it is that a lot of our work is funded by taxpayer money. They aren't paying us to have fun proving lemmas that no one else cares about; the taxpayer would like us to produce research that results in tangible improvements in computer system design. In the system that we have today, the only way to have this tangible impact is to produce high quality papers that other people read, cite and build on top of.


I believe that there is a misunderstanding regarding the role of academic publishing. Before the Internet era, the only way to make your work known to other scientists was to publish it in a journal—prestigious if possible. Journals were a medium to spread scientific information and make it available.

Nowadays the situation is very much different. For example, there are preprint repositories such as arXiv where researchers can publish papers with very little oversight[1]. The journals don't serve the role of making information available anymore. Their main role is to act as a filter for scientific information, and the reputation of a journal signals whether it's a high-pass or a low-pass filter. Researchers want to publish in prestigious journals since it signals that their work is high quality, and it gives greater exposure to their work—prestigious journals have a broader reader base than unknown journals.

This considered, the author has a point. In experimental domains of science such as Biology, Physics, Chemistry, Psychology (and sometimes even Math! [2]), reproducibility is of paramount importance and publishing negative results helps both researchers save time by not doing repeated work and avoid statistical bias in meta studies.

In fact, what the author of this article implicitly looks for already exists![3] But such journals usually fail to gain traction, and this for a precise reason: writing and publishing a paper takes a great amount of time. Publishing a paper in favor of the null hypothesis usually does not bring much recognition, and the time could instead be used to research or write the next paper.

What can be done then? A solution could be to create a platform where scientific results can be made available without the whole publishing overhead. No literature review, no verbose explanations, only "what we did" and "what we got". And if possible, in a machine-readable format. Any fellow HNers?

[1] https://en.wikipedia.org/wiki/ArXiv#Controversy

[2] https://arxiv.org/abs/1201.0749

[3] http://www.jasnh.com/
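
To make the "what we did" / "what we got" idea a bit more concrete, here is a minimal sketch of what one machine-readable entry might look like. The `ResultRecord` class and all of its field names are assumptions made up for illustration, not an existing schema or platform.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResultRecord:
    what_we_did: str    # short description of the procedure (hypothetical field)
    what_we_got: str    # short description of the outcome (hypothetical field)
    effect_found: bool  # False marks a null / negative result
    sample_size: int


def to_json(record: ResultRecord) -> str:
    """Serialize a record to JSON with stable key order so entries are easy to diff."""
    return json.dumps(asdict(record), sort_keys=True)
```

A real platform would need much more (units, protocols, links to raw data), but even a flat record like this can be searched and aggregated for meta studies without anyone writing a full paper.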


It's a good idea, but without scarcity and the attendant prestige of being a reviewer for fancy publications, how do you get people to volunteer their time to review and endorse papers?

> the authors of the above article, and other people who present similar anti-replication arguments

Study replication is good, and fairly rare.



