Detecting manuscripts and publications from paper mills (wiley.com)
32 points by blopeur 39 days ago | 2 comments

> Paper mills are believed to be fueled by unrealistic publication requirements or quotas, combined with monetary publication rewards to authors [2-5]. Where unrealistic publication requirements are widely applied over long periods of time, this could create a broad and growing base of paper mill clients. To both meet client demand and their capacity to pay, paper mills will likely aim to generate large numbers of manuscripts at minimum cost. This will likely require at least some falsified or fabricated data, as performing genuine experiments could render manuscripts unaffordable.

> At the same time, externally supplied manuscripts should resemble genuine manuscripts if they are to be accepted for publication, so paper mills need to balance their requirements for efficiency and volume with the requirement that their manuscripts also appear to be genuine.


The paper doesn't say much about the state of research around CS / BigTech-related topics. I have been browsing arXiv (and arxiv-sanity.com) for more than a decade, on at least a weekly basis, both for work and to satisfy my private curiosity. My interests are mostly classic CompSci topics and security. The genre seems increasingly polluted by papers which can only be described as BS research.

You can usually smell the bad ones with a few simple rules that tell you what to look for [1]. It increasingly feels like dumpster diving in the domains around AI/ML. Whenever I complain, my partner, who works in healthcare, tells me that medicine is even worse (though I have no evidence for that beyond the anecdotal).

[1] Hanson 1999: Efficient Reading of Papers in Science and Technology: https://www.cs.columbia.edu/~hgs/netbib/efficientReading.pdf

Recent (published in the past month) random examples of what falls, IMO, into the category of BS papers (these aren't outliers but are becoming the norm!):

P4-to-blockchain: A secure blockchain-enabled packet parser for software defined networking: https://www.sciencedirect.com/science/article/pii/S016740481...

Security & Privacy in IoT Using Machine Learning & Blockchain: Threats & Countermeasures https://arxiv.org/abs/2002.03488v1

REST: A thread embedding approach for identifying and classifying user-specified information in security forums https://arxiv.org/abs/2001.02660v1

Artificial Design: Modeling Artificial Super Intelligence with Extended General Relativity and Universal Darwinism via Geometrization for Universal Design Automation https://openreview.net/forum?id=SyxQ_TEFwS

I'm surprised it's taken this long. There is no peer review on arXiv. If you've ever reviewed for a conference or a journal (10-30% acceptance rates), you've seen the "raw feed" of submitted papers, and you realize the average isn't very good. Many get rejected because they have a few fixable flaws, but some are badly flawed (or just plain wrong) and shouldn't be published anywhere. Since arXiv has no peer review, it's virtually the raw feed. I cringe whenever I see arXiv papers cited as if they were published work; doing an end-run around peer review can't be good for science.
