JSTOR torrent (thepiratebay.se)
407 points by gasull 1380 days ago | 82 comments

Please keep in mind that JSTOR paid almost $100,000 to digitize these files[1]. I think a lot of people don't realize that they're a non-profit organization with similar goals to many of you.

I don't think it's such a bad idea to give them money to continue digitizing works that no one would have had access to otherwise. They provide full-text search of all of their documents and undoubtedly employ programmers and designers much like yourself.


The non-profit status doesn't automagically make an organization "good". The executives of the institution still get paid and have an interest in perpetuating and growing the organization, even if it goes against the public interest.

I don't know much about JSTOR, but I know the IEEE (for example) can be a good bunch of sharks. In the past, they forced you to hand them your copyright for the privilege of publishing your work in their journals, and proceeded to go after you if you committed the cardinal sin of distributing your own papers through your research website. They also put your work behind a 30$ paywall without, of course, giving you a dime.

Exactly, being a non profit only means that they distribute all their profit amongst themselves as salaries or investments.

The data would be much better off in the hands of an entity that keeps the data accessible and gives the authors a right to reproduce their works elsewhere. A startup which does this would be excellent.

You guys are vastly oversimplifying this whole thing. The only reason JSTOR gets access to these documents in the first place is because of licensing agreements with the publishers.

You can try to start a company that gives it all away for free, but you're not going to be given access to any documents or you'll be sued for releasing the ones you don't have rights to.

This is not as simple as "Just throw that stuff up on a Torrent and we're good to go".

>This is not as simple as "Just throw that stuff up on a Torrent and we're good to go".

As a scientist, that would pretty much solve my access problems, and help me do a better job.

If that were the case, they would not be a non-profit. The IRS zealously audits nonprofits (except for the religious ones) to make sure that their resources are not going to the benefit of insiders. Penalties for such situations are...extreme.

>The executives of the institution still get paid...

God forbid. I hear they even pay their other employees.

>I don't know much about JSTOR, but I know the IEEE (for example) can be a good bunch of sharks. In the past, they forced you to hand them your copyright for the privilege of publishing your work in their journals, and proceeded to go after you if you committed the cardinal sin of distributing your own papers through your research website. They also put your work behind a 30$ paywall without, of course, giving you a dime.

I have yet to hear about a similar story with JSTOR, so I'm not sure how that's relevant.

JStor is not a publisher, so they don't have the same relationship to authors that IEEE does; it's not possible for them to do what you're talking about.

All JStor does is get scholarly output from those who do hold the copyright or publishing rights (for which they pay these rightsholders), aggregate it on their own platform (which has a much better UI than most of their for-profit competitors), and then resell access to others. Their prices are largely determined by those set by the actual rights holders they have to pay for the content.

Whether or not JStor is doing enough to increase public access to scholarly content (I think nobody in the industry really is), they are doing _more_ than most of their peers in the industry, and are _far_ from the worst, the most greedy, or the most venal in the industry.

I'm not saying don't pirate JStor content, pirate whatever scholarly content you want as far as I'm concerned, no skin off my back.

But if you're looking for a target as the worst or the most evil or the most responsible for inequity in and high cost of access to scholarly output, you're looking in the wrong place if you're looking at JStor -- I'd look at the publishers (not aggregators) and for-profit ones rather than non-profit ones. Google "most profitable scholarly publishers" and see what companies you are led to by following links (it won't be JStor).

$100,000 is nothing that a single fundraising call to parties interested in the freedom of information could not collect.

I would personally be ready to donate 100$ immediately.

And then there is the example of the Wikimedia Foundation, which seems to do quite well (and could incorporate this effort, bringing fundraising and software engineering power to the table).

Whining about $100,000 for such a trove of information is laughable.

>I would personally be ready to donate 100$ immediately.

Then you should have no problem paying the fee.

The donation would cover not only his own access, but the access of others who may not be in a position to donate themselves. It therefore only makes sense that many people would be more willing to donate than pay.

JSTOR charges an access fee so that it does not have to waste resources on constant fundraising.

Fundraising for nonprofits is not like fundraising in SV. Nonprofits can easily spend 50% of their income on fundraising because it is damned expensive to convince people to donate.

In case anybody is interested in a quick comparison to another non-profit with a similar mission, Archive.org spends 2% of their budget on fundraising.

I am pleased to see this is the top comment. It is very frustrating to see people who completely disregard the cost and effort associated with digitizing and managing records.

And if any educational institution used torrents instead of paying it would be unethical, but some day, and some day soon, restricting information will need to stop because it must. Armies of people would be perfectly willing to do the job of JSTOR pro bono publico only needing a bit of equipment easily enough donated.

So far 88 Seeds & 261 Leechers.

I think it's fair to donate some money to JSTOR. Even if it's $1-10, they can recover their losses & maybe even profit from digitizing.

If they can open a small donation channel I would like to donate & thank them. Make knowledge easily accessible to all & we should encourage it.

I think it's fair to donate to JSTOR and seed that torrent. Production does not need to be tied to distribution.

Fairer would be to coax researchers to not submit content to closed journals.

JSTOR may be a non-profit, but they earn a lot of money and they're not shy about spending it.


Of course earning money by itself isn't an evil thing. The bigger problem is that JSTOR is part of a system which many people have come to feel is unjust-- a system whereby the public finances research which is then put behind paywalls.

It isn't just wild-eyed hackers who feel this way. Even Donald Knuth has commented about how little value the academic journals really provide, and how much they charge.

It is the public who pays for this system. We pay because our taxes and tuition money subsidize the research that we're not allowed to see. The government should require publicly funded research to be made available on a site like arxiv.org. It is those guys who are really in favor of open access, not JSTOR. Throwing us a bone-- some 80-year old manuscripts which are in the public domain anyway-- shouldn't obscure that.

There are some newer journals which have open access as a central tenet such as PLOS. It's a non-profit and could always use more donations:



The government generally does require that publicly funded research be made publicly available without cost. However, that same statute grants agencies the right to have such research withheld from public distribution for various reasons (though "national security" is the most popular).

Not quite, it's just the Philosophical Transactions of the Royal Society. It's a good start, I guess.

http://news.ycombinator.com/item?id=2789709 (lots of comments)

    > This archive contains 18,592 scientific publications totaling
    > 33GiB, all from Philosophical Transactions of the Royal Society
    > and which should be available to everyone at no cost, but most
    > have previously only been made available at high prices through
    > paywall gatekeepers like JSTOR.
Btw, the court documents from 2011-2012 show that aaronsw transferred his collection to an unidentified server in China. Maybe he has a deadman's switch? Or maybe it's time I go on a modern-day pirate treasure hunt.. yarr.

I've been thinking that perhaps some sort of mass downloading could be organized, to be distributed among current college students with access to JSTOR.

If it is thousands of students all doing a small part of the downloading, what could be done to stop it? The trick would be distributing the tasks, and collecting all the results.

This is all assuming there is no dead-man's switch, but since he went out on his own terms I assume that would be triggered already.

    > to be distributed among current college students
I've been thinking about a mobile proxy app that students run on their phones, and a server that distributes tasks. The app would HTTP itself to the server and ask for a task, then HTTP the results back. Metadata (and the pdf url) would be extracted with zotero/translation-server, and a second request would be sent to phones to finally grab the actual file. Let me know if you're interested, contact deets in profile.

proof of concept of the zotero/translation-server doing its job: https://github.com/kanzure/paperbot
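To make the "server that distributes tasks" idea a bit more concrete, here is a minimal sketch of just the coordination logic: workers pull a task, report a result, and tasks held by a worker that disappears go back in the queue. Everything here is an assumption for illustration (the `TaskCoordinator` class, its method names, the task and worker identifiers); it is an in-memory toy with no networking, not how the app described above or paperbot actually works.

```python
from collections import deque


class TaskCoordinator:
    """Hypothetical in-memory coordinator: hands out tasks, collects results."""

    def __init__(self, tasks):
        self.pending = deque(tasks)   # tasks nobody has claimed yet
        self.in_flight = {}           # task -> worker that claimed it
        self.results = {}             # task -> result reported back

    def request_task(self, worker_id):
        """Give the next unclaimed task to a worker, or None if none remain."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_flight[task] = worker_id
        return task

    def submit_result(self, worker_id, task, result):
        """Accept a result only from the worker that currently holds the task."""
        if self.in_flight.get(task) != worker_id:
            return False
        del self.in_flight[task]
        self.results[task] = result
        return True

    def requeue_abandoned(self, worker_id):
        """Put a vanished worker's unfinished tasks back in the queue."""
        abandoned = [t for t, w in self.in_flight.items() if w == worker_id]
        for task in abandoned:
            del self.in_flight[task]
            self.pending.append(task)
        return len(abandoned)

    def done(self):
        """True once every task has a recorded result."""
        return not self.pending and not self.in_flight
```

The pull model (workers ask for work rather than the server pushing it) is the design choice that matters for phones, since they come and go; `requeue_abandoned` is the piece that keeps a dropped connection from losing a task.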

You might look at the Archive Team for examples of distributed mirroring.

I mentioned the Archive Team in another comment - and they're ON IT.


This archive is not directly related to Aaron Swartz's prosecution, it's something different. "The portion of the collection included in this archive, ones published prior to 1923 and therefore obviously in the public domain, total some 18,592 papers and 33 gigabytes of data."
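For a rough sense of scale, the two figures in that description can be turned into an average size per paper. This is only back-of-the-envelope arithmetic, and it assumes the "33 gigabytes" is the 33GiB quoted from the torrent description (binary units); the variable names are just for illustration.

```python
# Figures taken from the torrent description quoted in this thread.
PAPERS = 18_592
TOTAL_GIB = 33  # assumption: binary gibibytes, as in the "33GiB" quote

# Convert GiB to MiB, then divide by the number of papers.
total_mib = TOTAL_GIB * 1024
avg_mib_per_paper = total_mib / PAPERS

print(f"average size: about {avg_mib_per_paper:.2f} MiB per paper")
```

That works out to a little under 2 MiB per paper, which is plausible for scanned page images of short articles.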

Yes, as the file description says, this was released by Gregory Maxwell rather than Swartz, though it's tangentially related. Maxwell had assembled this collection of public-domain articles earlier, but hadn't decided whether to release it yet. After the Swartz/JSTOR case broke, he was spurred to release this torrent (the linked file description contains a statement from Maxwell explaining his motives).

I don't think Swartz's famous JSTOR collection has surfaced.

To hell with this shit. I don't want what the copyright owners do not want to give. I don't want what they got by arm twisting authors. I don't want what they got for free but now want to make money off. They can die with this in their collective behinds. I will never submit anything to a closed journal, never ever.

Are you an academic? You're right that their arms are being twisted. I can't have an academic career and not publish in closed journals. I hate it and I don't know what I can do about it. If I only publish in open access journals I'm just offering my career up as a meaningless sacrifice.

No longer an academic, but as a student I always appreciated it when authors put their own work on their own websites as a form of common man's dissent. I have done the same with my own meagre two publications, for which IEEE and ACM (or probably Elsevier and friends) charge up to $31 each.

Ironically, due to the closed nature of their websites and the fact that the PDFs on my websites have since been indexed by various 3rd party research portals, they now far outrank the official (paywalled) versions on the ACM and IEEE websites.

Seriously, thanks for doing that. It's a wonderful feeling to search for the title of a paper and have the first result be a PDF link to an academic or personal site rather than a springerlink/JSTOR/citeseer result.

Citeseer is an open indexer, collecting files from the web and making them openly available. Many of those PDF links you mention are the sources for its index.

It is nothing like the Springer, Elsevier, ScienceDirect, Thomson, ... websites. Please don't lump them together.

JSTOR sits somewhere in between.

You're right, I didn't mean to imply that they're all the same. It's just that when a PDF is openly available, it's often one of the first results, so seeing citeseer is a bad sign. I seem to remember them creating pages for citations they didn't have a download link for but now I can't find any examples so I may be misremembering.

Excellent work.

Never did any research worth publishing in a journal. Working a 9 to 5 job now and no longer have the opportunity to research. But I would have done it your way if I could have. An alternative now for me would be to help build open journals. The authors do the research, the publishers are simply a medium. Doesn't make sense for mediums to bite the hands of authors.

Thanks, am in a similar position now. Open journals are tricky though. I would like to think that there is a way for universities to somehow self-fund platforms for peer reviewed publications, but it just isn't happening. They keep paying huge subscription fees to the publishers of established journals with a reputation and history that is difficult to replace. It's a pretty perverse system, especially if you consider that in many countries universities are funded with public money, but I don't see it changing any time soon unless that change comes from within.

Well at least you can also upload your paper on arxiv.org. But I agree, the closed journal is required for the reputation because they are peer reviewed.

(Not sure if they have some policy forbidding authors from posting the paper on arXiv etc. But I don't think so: in the department where I did my thesis, most papers were also available on arxiv.org.)

Most of the major publishers do now allow that kind of "self-archiving" on personal homepages and preprint repositories like arXiv. At least, IEEE, ACM, Springer, and Elsevier do, some of them as of fairly recently.

I've also self-archived some stuff that didn't formally permit it, and haven't heard a complaint. Given publishers' current political interests, I think it's pretty low-risk: I don't think publishers want the publicity that would come from suing an academic for posting a version of his own paper.

The "copyright owners" don't get the copyright through legitimate means. Basically, academics are forced to hand them the copyright to publish in prestigious journals they control. Then the publishers exploit this copyright by putting publicly funded research behind paywalls, and don't give a dime to the original author.

We wish it didn't have to be that way. We wish that information, particularly academic publications, could be shared openly and legally.

Now, it's 1855. You want the slaves to be freed. By law. By right.

So, do you shut down the Underground Railroad?

Comparing intellectual property law to chattel slavery is a bit tasteless, no?

I don't think he is suggesting, explicitly nor implicitly, an equivalency in severity.

He's not suggesting equivalency but he did make a comparison, one which is a bit tasteless, is it not?

No, I don't think so.

If we want to learn from history it is important that we allow ourselves to make comparisons to the past, even if the magnitude of severity is completely off. If we assume that we are progressing as time passes, then with any luck the severity in comparisons to the past should always be off. That should not faze us, but rather be seen as a sign of progress.

Now, if the situation is a teenager calling his mother "literally Hitler" because he was given a bedtime, then Godwin's law is probably pretty applicable, but that is an incredibly extreme case.

The topic here is civil disobedience in the face of unjust laws. One law happens to be much worse than the other, but I simply cannot find the comparison itself to be tasteless or offensive. That period in our history teaches us a painful collective lesson about the ethics of civil disobedience; I would find it more offensive to ignore that lesson.

Assertions of equivalency would be troubling, but I see none here.

Ideally, as history and society progress, there will always be bigger evils in the past than the present. This is a sign of progress. In that light, comparing the evils of the present to the evils of the past is not so tasteless, because there aren't as many evils of the present on the same scale.

Edit: I promise I wrote this without reading jlgreco's comment.

brb, torrenting Django.

Where can I get my medal?

Which Django?

It took a moment for me to realize that you probably didn't mean the Python one.

Underground Railroad: people risking their lives to help slaves fleeing slavery, rape, beatings, murder, and the wholesale destruction of their families.

Aaron Swartz: releasing academic articles that are already available for free or a low cost simply by visiting your local university and acquiring a guest access card.

Not even remotely comparable.

Especially considering JSTOR is a non-profit that has to employ many people with similar occupations to those of HN. They digitize documents that wouldn't have otherwise been available at all before.

It's incredible how if they hadn't digitized the works, no one would be outraged. But when they ask for a fee to cover the costs of such things, everyone paints them as villains who try and lock information away from the masses.

Isn't Google digitizing all printed works they can get their hands on?

Google has ridiculous amounts of capital. They aren't digitizing printed works out of some sort of sense of social service. Street View is much the same way. They aren't doing it to be nice, and they aren't giving you access for free out of the goodness of their hearts.

Everything they do that isn't funded by ad revenue or working towards greater ad revenue is a speculative exploration into future markets that will one day be monetized, but is not currently by virtue of their tremendously deep pockets.

I've been wondering recently- I think companies like Google have started warping expectations beyond the realistic, and I'm both curious and apprehensive where that will take us.

Many of Google's digital works aren't freely available, so I'm not sure where you're going with this.

The original argument was that JSTOR is doing the service of digitizing many papers that are out of copyright, but otherwise not available in electronic form. And if it wasn't for them charging access for this, then the work wouldn't be done. I was pointing out that Google is doing the same thing, and NOT charging for access where possible (i.e., the out-of-copyright works).

Google has several billions in free cash lying around to do stuff like this, and many billions more coming in from its various income sources.

JSTOR does not have many billions in cash, nor any income sources other than access fees.

Ergo, JSTOR must continue to charge an access fee so that it can continue to perform its function of archiving articles and providing access to those articles.

>academic articles that are already available for free or a low cost simply by visiting your local university and acquiring a guest access card.

Haha, try that outside of the US. Also, be aware that your local university pays unbelievably high fees for this access, money that could be better used, by funding scientists for example.

Please downvote this. A torrent of the JSTOR content shouldn't be Aaron Swartz's legacy. With JSTOR, Swartz was making a larger point; if all you have is these docs, you missed it.

Each download isn't a copy of some files, it's a statement that Aaron's cause was the right one, even if I disagreed with his methods.

But this is the point: he made something freely available that is supposed to be freely available. The individuals publishing do publish because they want their papers to be read and cited by others. Unfortunately there is hardly an established free peer reviewing system available, so most research results are only available to a small elite. No longer being a student, I cannot afford to read most papers, because $30 to read one paper is too much. (JSTOR seems to have easier conditions, but when you google papers you sometimes get pointed to this site and sometimes to some other site...)

The torrent isn't his legacy but rather the spirit of his activism.

You can't downvote on HN.

Yes you can. It just requires sufficient karma.

You can downvote comments, not submissions. You can only upvote and flag submissions.

Oh. Thanks, I didn't know this. Curious choice.

Recent change, after the downvote wars this past fall in which Microsoft articles would get unceremoniously downvoted to oblivion the instant they hit the front page.

The idea is that articles should rise to the top on their own merits, but not be pushed to the bottom simply by people disagreeing with their content.

But there is still the flagging. I have noticed some pro-MS articles still disappearing suddenly, due to the flagging I presume.

Articles have never been downvotable, only flaggable.

I think we should pressure JSTOR to release the documents into the public domain. If it is in the best interest.

JSTOR doesn't own the copyrights, they have the documents under licence from the journal publishers.

JSTOR has happily allowed publishers to restrict documents which are lawfully in the public domain and over which the publishers have no say except via the TOS they clickwrap you with when you try to access them.

Excusing JSTOR is like saying "Hate the game, not the players". While JSTOR isn't doing everything in their power to fix this broken system I say: Hate the game _and_ the players.

Ignoring the fact that public domain gets complicated when you're an international organization (there are many journals which are public domain in the US but aren't anywhere else in the world because of historical anomalies in US copyright law), JSTOR is essentially built on relationships.

JSTOR has access to the current archives of the journals because of the relationships it has with the publishers. If the publishers weren't happy with what JSTOR was doing, they could just pull their licences for their current journals. If that happened, JSTOR would basically collapse.

It just doesn't make sense for JSTOR to do it, when a third-party who doesn't have any relationships to the publishers could do it just as well (you can easily get physical copies of most historical public domain journals either via the open market or via libraries and scan them).

JSTOR does not have the power to direct what others do. If the publisher chooses to restrict a document, JSTOR has no say in that decision--JSTOR is merely a repository. JSTOR has access to essentially every academic article in existence precisely because it does not attempt to enforce any political goal on contributors.

If you want to change what publishers do with articles which should be in the public domain, get the law changed. But don't hold JSTOR responsible for something outside of their control.

The law already is what it needs to be. JSTOR has limited, and continues to limit through its terms of service, access to even old documents which are already lawfully in the public domain.

They have operating costs. They're a non-profit but they pay people to digitize the works. It costs a lot of money to access some of these things. It's not prohibitively expensive to access these documents. You can read three articles free every two weeks if you're exceedingly down on your luck.

You couldn't a week ago. This isn't some accident. Their openness comes hard won with threats of their reputation.

Their digitization is largely done by a for-profit company. It's not competitively bid. There are many parties who would digitize these works at no cost, archive.org and Google for example, and who could afford to do it without paywalling the results.

>Their digitization is largely done by a for-profit company.

I was unaware of this. What company does the digitization for them?

Then why don't they do so?

Anyone at all (including you) is free to scan journals that are in the public domain and publish them online for free. If you scan/ocr them then there are many places (including Project Gutenberg and archive.org) who will host them for you.

JSTOR is merely a repository; it does not control what journals do with their articles.

Some contributors to JSTOR also make their articles freely available online. Many do not. Either way, it's not for JSTOR to decide.

So is this safe to dl? Newbie here. I don't even know what these papers are.

I finished downloading this last week.

So, do you want upvotes?

That was the first thing on my mind.
