As someone who hasn't been involved with a university or a laboratory for many years, I find myself continually frustrated by how difficult it can be to keep up with new developments in the fields I studied in college.
Example: Cambridge Univ Press takes books that were Victorian databases of medieval records and, in the name of "protecting our precious history", just photocopies the Victorian pages, reprints them on newer paper, and puts them under new copyright. If you want to look something up in the 10-volume database, you can either blindly buy the $300 set of books and hope to find some items of interest scattered therein, or go to the nearest library that has the full set, which turns out to be on the other side of the planet at Leeds Uni in Yorkshire, which will allow you to see these ancient texts printed in 2013 if you pay for library use and make an appointment at least 2 days in advance.
Meanwhile, the full text of the books is on Google Books at Google's expense and available in any browser, but Google is forced by the "protectors of history" at Cambridge to cloak many of the pages of 1000-yr-old data, because history is too precious to allow the unworthy to see their photocopied Victorian texts without first going on an old-fashioned quest, bribing the boatman, answering the troll's three questions, etc.
My hope is that at some point there will be a cultural change among "historical preservation" organizations where they decide that the greatest thing they can do to promote their field is to find every original document in every collection, carefully photograph it in hi-rez using whatever optical frequencies bring out the most faded detail, and contribute it to a free online host. Next step is then to create and freely post transcriptions, translations, and indexes, so that ANYONE can use the data for research, not just those who have enough gold in their purse, time for the quest, and can "answer me these questions three".
I'm trying to do historical research at my own expense and, like many others, will continue to do so as long as I can get access to raw data. Google will provide much of the data for free, and I (and MANY others) will do the analysis for free. It's another "open source" vs "commercial" situation. No one is required to give me expensive stuff for free, BUT if OTHERS are willing to pay the costs to scan, host, and provide it to everyone for free, and the "protectors of our precious heritage" won't allow it, then it's no longer about the love of history but more like a repetition of history: we've got the power, so pay us for our, uh, "services".
As for scanning costs, though, Google is already scanning and hosting many of these materials for free and is being actively prevented from doing more (and even from showing what it has already scanned), so I suspect we aren't far from having all we need for a 21st Century equivalent of a Carnegie Foundation. If Andrew Carnegie could build public libraries around the world, the robber barons of our century could fund an online Library of Alexandria to serve as a central repository. They could then offer to museums, private collectors, etc., to do the scanning and hosting, and maybe even some ad sponsorship or donation mechanism to provide a small income stream back to the owners of the originals.
Not sure why they should.
Should scientists and everyone else get papers for free paid by tax payers? Yes.
Hopefully, more and more researchers will continue to preprint their work. That's something we're trying to help with at https://www.authorea.com (disclosure: I work there).
It'd be cool if it also showed which papers have preprints online. And which (I presume virtually all) are available via SciHub. Even links, if possible :)
I agree, but for the most part newspapers aren't even getting the "referencing" part right. That is, they just lazily write "a new study" instead of providing an actual, useful reference to the primary source. I don't think this would magically change with open access: Even for non-open works there would usually already be the possibility to link to a public abstract, but most science reporting simply does not care enough.
I am usually able to find an Arxiv listing for most of the papers that I read.
And there is always Sci Hub for the ones that do :)
What I'd like to see is an IPFS feature that showed the "least shared" files in a set, so you could say "I want to help host the rarest 10 GB", for example.
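The selection side of that feature is easy to sketch. The following is a hypothetical illustration, not an existing IPFS feature: it assumes you have already obtained a provider count per CID (e.g. by counting results from `ipfs dht findprovs <cid>`), and simply greedily picks the least-replicated files that fit a size budget.

```python
# Hypothetical sketch: choose the "rarest" files in a set until a size
# budget is filled. Provider counts are assumed to be precomputed
# (e.g. from `ipfs dht findprovs`); nothing here calls IPFS itself.

def rarest_within_budget(files, budget_bytes):
    """files: list of (cid, size_bytes, provider_count) tuples.
    Returns CIDs with the fewest providers whose sizes fit the budget."""
    chosen = []
    used = 0
    # Walk candidates from least- to most-replicated.
    for cid, size, providers in sorted(files, key=lambda f: f[2]):
        if used + size <= budget_bytes:
            chosen.append(cid)
            used += size
    return chosen

# Illustrative CIDs and counts (made up for the example).
files = [("QmA", 4_000_000_000, 120),
         ("QmB", 6_000_000_000, 2),
         ("QmC", 3_000_000_000, 5),
         ("QmD", 2_000_000_000, 40)]
print(rarest_within_budget(files, 10_000_000_000))  # ['QmB', 'QmC']
```

A real implementation would also want to re-run this periodically, since provider counts drift as other nodes pin and unpin content.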
Edit: you should be able to find the URLs with some googling, not going to post them here
Which UX is better?
At that time I calculated that it would cost about $1000 in refurbished 4TB external hard drives. Which is probably not the best way to store it.
Or about 2000 Empty Blu-Rays if you want to make searching for an article a nightmare.
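The two estimates are at least mutually consistent. A quick back-of-the-envelope check, assuming ~25 GB single-layer Blu-rays and ~$80 per refurbished 4 TB drive (both unit figures are my assumptions, not from the thread):

```python
import math

corpus_tb = 2000 * 25 / 1000        # 2000 Blu-rays at 25 GB each -> 50 TB
drives = math.ceil(corpus_tb / 4)   # 4 TB drives needed to hold it
cost = drives * 80                  # assumed $80 per refurbished drive
print(corpus_tb, drives, cost)      # 50.0, 13, 1040
```

So ~50 TB either way, and thirteen refurbished drives lands right around the $1000 figure.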
Ask Google about that.
I genuinely am trying to understand.
So yeah, I'm not sure why you'd have to "crawl" through physical media when a search index ought to tell you right where your match is located. Is that not the entire point of indexing and searching?
Say, a file reference within a storage hierarchy or record within a database.
(The concepts are fundamentally identical.)
1. Provide an independent storage tier for the actual content. Say, "stacks". This would store various formats of documents, uniquely identified (say, by a corresponding hash), and maintained independently of access.
2. I'd have an interface tier that would, simply put, be a search interface. The concept of "filesystem as document management" could present this in a familiar, apparently hierarchical fashion, but the hierarchy would actually be searches against various metadata. Say: author, publisher, title, publication date (as range), subject, or keywords.
3. These metadata would comprise one or more indices of the actual contents. The indices are what you want on fast storage, in memory if at all possible. Various forms of caching and distribution might provide for this.
4. In order to associate metadata with works and their attributes, you'd need an intake process. This means that as works are onboarded, they would be, effectively, catalogued and classified, with metadata fields supplied. It turns out that there are available metadata repositories for a large class of works (the US Library of Congress and OCLC are two organisations providing same), and it's possible that various distributed, crowdsourced, automated, AI, or similar mechanisms could be used to fill in metadata for works not otherwise categorised.
5. Further capabilities, access, reports, summaries, groupings, workflows, workgroups, projects, security layers, publication, editing, revision, reference, exploration, etc., might be provided by filesystem analogues where possible, other means where not.
If the overall system sounds a lot like a library and cataloguing system, that might be because it rather much is.
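The numbered steps above might be sketched, very roughly, like this. All class and field names here are illustrative, not anything from the thread: a content-addressed "stacks" tier for storage, and a separate catalogue tier whose metadata index is what actually gets searched.

```python
# Rough sketch of the tiered design: tier 1 is an immutable,
# hash-addressed content store; tiers 2-4 are a metadata catalogue
# that indexes works at intake and answers searches without touching
# the content store at all.
import hashlib

class Stacks:
    """Tier 1: content store, uniquely keyed by document hash."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

class Catalogue:
    """Tiers 2-4: intake assigns metadata; search hits only the index."""
    def __init__(self):
        self._index = {}  # (field, value) -> set of document hashes

    def onboard(self, digest: str, **metadata):
        # Intake: classify the work by supplying metadata fields.
        for field, value in metadata.items():
            self._index.setdefault((field, value.lower()), set()).add(digest)

    def search(self, **query):
        # Intersect the posting sets for each requested field/value.
        hits = [self._index.get((f, v.lower()), set()) for f, v in query.items()]
        return set.intersection(*hits) if hits else set()

stacks = Stacks()
cat = Catalogue()
h = stacks.put(b"full text of the paper ...")
cat.onboard(h, author="Coase", title="The Nature of the Firm", year="1937")
print(cat.search(author="coase", year="1937"))  # set containing h
```

The point of the split is exactly the one made below about Google's web index: the index answers in milliseconds while the (possibly remote, possibly slow) content tier is consulted only on retrieval.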
Google's web index operates in much the same way. Primary storage is the origin server of a given URL. The index is maintained and accessed by Google within its own systems. Google typically returns a response in a few hundredths of a second. Retrieval of the referenced source typically takes a number of seconds, or roughly 2-3 orders of magnitude slower.
1. See: https://redd.it/6bgowu
It would achieve the original goal of the world wide web as envisioned by Tim Berners-Lee, except even better.
A large part of the problem is the ridiculous duration of copyright. "Adsorption of Gases in Multimolecular Layers" is from 1938 and still paywalled.
In practice, almost all papers this popular will be available on random .edu sites and Google Scholar will find those technically-forbidden copies for you. But it is a significant problem if you don't have an institutional affiliation and you want to read articles that aren't among the top 5% cited. (Or at least it was a problem for me before sci-hub; I retained academic contacts who could email me any papers I wanted, but I had to cross a pretty high interest threshold before I'd bug someone to request that favor.)
EDIT: My point here is that the statement in the article "the world’s most important research is inaccessible from the majority of the world" isn't exactly true. This isn't supposed to be an endorsement of academic publishing practices: if anything the fact that these publishers are effectively trying to scam readers out of money is all the more evident.
This statement is still true even with your trick. The vast majority of people don't know about it, and even if they did, it would be technically difficult for many who aren't tech savvy. That's a huge barrier that shouldn't be discounted.
Of them, 5 are actually available directly from the publisher so they shouldn't be listed as paywalls, and all of the remainder are available from at least one of Google Scholar/Google/Libgen; of the 60 actually-paywall papers, 54 are available from GS/G and only 6 force you to go all the way to Libgen. (I am taking the liberty of rehosting 10 of them myself, though, to get them into GS.)
Of the 65, notes on the ones not immediately available in GS:
> Density-functional thermochemistry. III. The role of exact exchange
Citation-only in Google Scholar but easily found in Google or SH/LG.
> Detection of specific sequences among DNA fragments separated by gel-electrophoresis
Paywall-only in Google Scholar, not immediately available in Google but easily gotten from SH/LG.
> Processing of X-ray diffraction data collected in oscillation mode
GS paywall-only, not in Google, but SH/LG.
> Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease
> the attractions of proteins for small molecules and ions
> Helical microtubules of graphitic carbon
GS links to paywall but findable in G.
> A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity
GS/G paywall but SH/LG.
> Phase annealing in SHELX-90: direct methods for larger structures
I am not sure why this one was listed as 'paywall' when it appears to be available directly from the publisher: http://journals.iucr.org/a/issues/1990/06/00/an0278/an0278.p...
> A study of the conditions and mechanism of the diphenylamine reaction for the colorimetric estimation of deoxyribonucleic acid
Also directly available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1215910/pdf/bio...
> Multiple range and multiple F tests
Also directly available (possibly with a free JSTOR account but if not, SH/LG): https://www.jstor.org/stable/pdf/3001478.pdf
> A new look at the statistical model identification
GS paywall but G & SH/LG.
> Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors
> Nitric oxide: physiology, pathophysiology, and pharmacology
G/GS paywall but SH/LG.
> An algorithm for least-squares estimation of nonlinear parameters
GS paywall but G & SH/LG
> A low-viscosity epoxy resin embedding medium for electron microscopy
> Continuous cultures of fused cells secreting antibody of predefined specificity
G, and directly available: http://www.jimmunol.org/content/jimmunol/174/5/2453.full.pdf
> Homeostasis model assessment: insulin resistance and β-cell function from fasting plasma glucose and insulin concentrations in man
Directly available: https://link.springer.com/content/pdf/10.1007/BF00280883.pdf
> I am not sure why this one was listed as 'paywall' when it appears to be available directly from the publisher:
It pops up a http authentication box for me.
> > Multiple range and multiple F tests
> Also directly available (possibly with a free JSTOR account but if not, SH/LG):
You can only view 3 free items every 14 days, wouldn't call that exactly freely available.
Might be referral-based. Try searching the title and going from the abstract.
> You can only view 3 free items every 14 days, wouldn't call that exactly freely available.
There's no verification of .edu addresses or anything, so you can make as many as you need. I wouldn't call that exactly paywalled either.
Even if you don't have an affiliation (or your school doesn't subscribe to a particular journal), if you use Google Scholar to search for an article you can easily find pre-prints which are essentially the same thing. Additionally, if that still fails then the next option is to just email the author - they actually want you to read their work and will just send it out.
The real issue is that the societies are essentially extorting universities for hundreds of thousands of dollars per year when the writers have to pay to submit and readers have to pay to read. Many of the newer journals are becoming open access, but few of them have been able to make enough inroads to be considered "good journals." This is a completely separate topic from the one in the above article.
A very tiny number of people have affiliations; this isn't a realistic option.
> if you use Google Scholar to search for an article you can easily find pre-prints which are essentially the same thing
That hasn't been my experience. Lots of things I can't read.
In my field (microbiology/bioinformatics), it's almost impossible to find pre-prints of paywalled papers, and emailing the authors and hoping that they will respond at all also doesn't seem an especially efficient process for literature research.
The other thing, which may be localised to my field, is that every author recognizes this problem and makes a copy available on their website or university's working paper site. I would only rarely resort to going to an actual journal, because it was way easier to find a copy of the article via Google Scholar.
This is, of course, going to differ for a number of fields and there is a growing trend for authors and journals to open up their articles.
Because it works. It delivers information and knowledge to those who need it.
Because information and knowledge are public goods. As CUNY/GC says, an "increasingly unpopular idea"[1][2][3], but an absolutely correct one.
Because it democratises information.
Because much of the world cannot afford to pay US/EU/JP/AU prices for content. Including many of those in the US/EU/JP/AU. And most certainly virtually all outside. Billions and billions of people.
Because the research is (often) publicly funded, conducted in public institutions, and meant for the public.
Because markets in information simply don't work.
Deadweight losses from restricted access and perverse incentives for publication both taint the system.
Because much of the content, EVERYTHING published before 1962, would have been public domain under the copyright law in force at the time, as would much of what was published up through 1976, given the retrospective extensions of copyright that it and multiple subsequent copyright acts have created.
Because 30% profit margins are excessive by any measure. Greed, in this case, is not good.
Because the interfaces to existing systems, a patchwork of poorly administered, poorly designed, limited-access, partial systems, are frankly far more tedious to navigate than Sci-Hub: submit DOI or URL, get paper.
Because unaffiliated independent research is a thing.
Because the old regime is absolutely unsustainable. It will die. It is dying as we write this.
Because the roles of financing research and publication need not parallel the activity of accessing content. Ronald Coase's "The Nature of the Firm" (1937), a paper which should be public domain today under the law under which it was created and published, and should have been by 1991 at the latest, but isn't, tells us why: transactions themselves have costs.
Because journals no longer serve a primary role as publishers of academic material, but as gatekeepers over academic professional advancement. This perpetuates multiple pathologies: papers don't advance knowledge, academics are blackmailed into the system, and access to knowledge is curtailed.
Because what the academic publishing industry calls "theft" the world calls "research".
[1] See GC Presents: "At the Graduate Center, we believe knowledge is a public good. This idea inspires our research, teaching, and public events. We invite you to join us for timely discussions, diverse cultural perspectives, and thought-provoking ideas."
[2] See GC President Chase F. Robinson, introducing a conversation between Paul Krugman and Olivier Blanchard. A rare moment where the introduction itself contains some provocative thoughts. At about 50s into the video. (The remaining 72 minutes and 20 seconds aren't bad either if you're interested in discussions of global economics.)
[3] Joseph Stiglitz, "Knowledge as a Global Public Good," in Global Public Goods: International Cooperation in the 21st Century, Inge Kaul, Isabelle Grunberg, and Marc A. Stern (eds.), United Nations Development Programme, New York: Oxford University Press, 1999, pp. 308-325.
(This has proved to be among my more popular articles, including being picked up by the Open Access community.)
[SciHub](https://scihub.org/) is a project to "provide free access to research articles and latest research information without any barrier". It can also be used via Telegram at @scihubot.
Plus, I'd be stunned if all the people making the citations actually read the full paper. Some papers are cited because everyone knows you need to cite them.
So yes you ARE paying. Just not directly.
Basic biology and medical research is another story. Until we see fundamental changes in how researchers are assessed, I don't see how the situation is going to change though. But change it must, and will. Eventually.
I used to read the occasional paper, usually related to something outside my degree. Those papers are all legally inaccessible to me now.