Hacker News
Aaronsw indicted for hacking MIT network to download millions of JSTOR docs (documentcloud.org)
603 points by Estragon on July 19, 2011 | 313 comments

The repeated use of "stole" in the indictment is interesting, even beyond the usual metaphorical usage to discuss copyright infringement.

In this case, the indictment alleges that the documents were stolen from JSTOR, which does not even own them! In the vast majority of cases JSTOR scanned documents whose copyright is owned by someone else, and acquired or was donated a non-exclusive license to distribute copies via its service. In many cases the documents are even public domain. The indictment continues the theft metaphor by discussing the effort and expense JSTOR incurred in scanning the documents, and the alleged attempt to render this less valuable by redistributing "its" documents, analogizing this to the loss someone suffers in a theft.

But effort expended to build a private repository consisting of copies of things you don't own doesn't give you ownership of the result, any more than Google Books doing the same has given them ownership of the documents that they've scanned. If you scraped Google and "stole" their scans, you would be violating Google's Terms of Service, and Google might indeed feel subjectively like you've taken something of value (their exclusive access to this repository of scans), but I think it would be a stretch to say that you've "stolen" "their" documents.

They are talking about theft of services, not copyright infringement. In any event these charges are going to be very difficult to beat since they're federal, even though there are some obvious holes in the indictment. It will be almost impossible to get any of the evidence thrown out even if there was an illegal search and seizure. His best bet is probably to get the Harvard legal team to go to bat for him, although it's difficult to say how likely that is.

I wasn't really commenting on the legal sufficiency of the indictment, just the rhetorical dishonesty of accusing someone of "steal[ing] well over 4,000,000 articles from JSTOR" (quote from the indictment) when JSTOR didn't own those articles. They could've just alleged violation of JSTOR's TOS and thereby theft of network services. I suspect JSTOR or people sympathetic to them had a hand in writing the indictment, though; JSTOR has a long history of attempting to spread the misinformation that it somehow "owns" its archive.

Well, if someone stole $100,000 of property from a storage facility, that wouldn't mean the storage facility claimed ownership of the property, just that it was the location of the theft. Maybe you're overthinking this a bit.

Well, this is more akin to breaking into the storage facility and making a copy of all the paintings stored there. The value of the goods being stored has not been reduced.

Not to torture this analogy any further, but would you feel safe storing your stuff at such a facility after something like that? No, you'd probably look elsewhere for your storage needs. Breaking in is still bad and would be the subject of criminal charges. If the US attorneys decide that use of the word 'stole' is somewhat over the top, then guess what? they can amend the indictment - just as the defense can amend their motions.

My point is not that the government is correct or morally justified in bringing this indictment, but that getting hung up on terminology like this obscures the legally problematic issue of having (allegedly) bypassed the security systems to download material he was not supposed to have access to, regardless of who actually owns said material.

But: (1) JSTOR isn't a storage facility in that sense; the copyright holders do not pay JSTOR to store their items, so this is a bad analogy.

(2) If the outrage is supposed to be about bypassing security systems, why is the government hung up on the "theft" terminology? Especially when JSTOR, the party arguably injured (in some way not specified), has asked the government not to prosecute?

No, this is clearly a convenient way to get a politically inconvenient person labeled a felon.

1. The analogy is only to point out that a third-party repository can be negatively affected by a break-in even if it doesn't have an ownership interest in the materials it stores.

2. The government is not hung up on the 'theft' terminology. The words 'steal' or 'stole' only appear three times in the 15 page indictment and the actual offenses he is charged with are wire fraud, computer fraud, unlawfully obtaining information from a protected computer, and recklessly damaging a protected computer.

Except that copyright infringement is definitely not theft. The owner doesn't lose his documents. It's fine if your opinion is that copyright infringement is wrong, but let's not call it by inappropriate names.

He is accused of stealing bandwidth from JSTOR, not the documents: "theft of services," not theft of property. Theft of bandwidth is almost as absurd as theft via copying. JSTOR apparently isn't interested in free transmission of knowledge.

If you read the indictment you'll see that they very much are not interested in free transmission of knowledge.

They charge >$50k/yr for access: " For a large research university, this annual subscription fee for JSTOR’s various collections of content can cost more than $50,000."

That price actually seems pretty reasonable for a large research university.

The real question is how much they charge individuals who want to get an article. My first google search (http://www.jstor.org/pss/27757488) results in $12/article. This is very steep when you're trying to do research and don't even know if the article is what you're looking for.

Well, you wouldn't want any old rabble getting access to valuable knowledge. Far better for that access to be safely controlled by the major research institutions, who can clearly be trusted to pursue knowledge in a responsible manner.

How is that reasonable? Sounds like Mr. Swartz was willing to host them for free! And he would have gotten away with it too if it wasn't for those meddling police.

But seriously, $12/article is ludicrous. That must be way above cost recovery or they're not doing a very efficient job of running JSTOR. Perhaps the co-founder of Reddit would do a better job...

Most public libraries have relationships with JSTOR that allow members to access the articles online. I use the Boston Public Library and look up articles via Google Scholar. All free.

Some public libraries do, but the vast majority of public libraries in the world do not.

Are you sure? Maybe not in the world, but I'm pretty sure all large public libraries in the US do subscribe to these kinds of databases.

I admit that I don't have statistics [edit: on libraries], but most libraries in the world are not large or in the US, and JSTOR's prices for a "small" library in "the rest of the world" are much, much larger than [edit: wrong — comparable to or perhaps a bit larger than, but not much, much larger than] their entire budget. Check out http://support.jstor.org/csp/PriceCalculator/. This code (for Chrome) gives me a yearly price of $81162.70, although it hangs the browser for a while first:

    function mouseEvent() { var event = document.createEvent("MouseEvents"); event.initMouseEvent("click", true, true, window, 0, 0, 0, 0, 0, false, false, false, false, 0, null); return event; }
    function each(list, thunk) { list = Array.prototype.slice.call(list); for (var ii = 0; ii < list.length; ii++) { thunk(list[ii]); } }
    each(document.getElementsByClassName('expand'), function(link) { link.dispatchEvent(mouseEvent()) })
    each(document.getElementsByClassName('e-only'), function(link) { link.dispatchEvent(mouseEvent()) })

It's sad that you have to write javascript code to do that! (But also cool that you did. :)

"Complete Current Scholarship Collection" for 22751.90 is a duplicate of all the things above it. So I think some of the entries have been double counted.

The real price for most libraries may be about 1/2 or less of your estimate (they won't be interested in everything). And 20,000 to 40,000 isn't (well, shouldn't be) a lot of money for a public library.

That's the salary for a single employee! I would expect a library to have at least 5 employees, plus a budget to buy books.

Also I would expect a small library to have only a subset of the papers, and for serious research you would need to "go into the city".
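
The double-counting correction above can be checked with a quick back-of-the-envelope sketch. It assumes the $81,162.70 figure quoted earlier simply sums every line item, and that the "Complete Current Scholarship Collection" line ($22,751.90) repeats items already counted. Both are assumptions taken from this thread, not from JSTOR's own pricing rules, and the "half of everything" subset is only the commenter's guess.

```javascript
// Figures quoted in this thread (USD per year); not independently verified.
const scriptedTotal = 81162.70;   // total reported by the click-everything script
const duplicateBundle = 22751.90; // "Complete Current Scholarship Collection"

// Assumption: the bundle repeats items already in the total, so subtract it once.
const deduplicated = scriptedTotal - duplicateBundle;

// Assumption: a typical library wants at most about half of the collections.
const halfSubset = deduplicated / 2;

console.log(deduplicated.toFixed(2)); // 58410.80
console.log(halfSubset.toFixed(2));   // 29205.40
```

Under those assumptions the half-subset figure falls inside the 20,000 to 40,000 range mentioned above.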

Oh, thanks for finding that error!

I think you're thinking very much of US salaries. $40,000 a year shouldn't be a lot of money for a public library in the US, because it's the salary for a single employee (or the total costs for half an employee!), and the wonderful public library system in the US does indeed have multiple libraries. But world GDP per person is about US$10k per year, compared to the US's US$47k — and the bulk of that GDP comes from a few rich countries with only a small fraction of the population. An average country is something like Jamaica, Thailand, or the Dominican Republic, where the per-capita GDP is something like US$8.8k.

So US$40k per year is the salary for almost five employees. Except that within Jamaica or Thailand (or, to a lesser extent, the US) the median salary is much lower. And it's probably not the prime minister's niece who's working the librarian job. So maybe it's more like eight to ten employees.

So, yeah, most libraries — even measured numerically, but especially measured by the number of people who rely on them — are a lot poorer than what you're used to.

I haven't checked yet to see if the National Library here in Buenos Aires has JSTOR access.
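
The salary comparison above comes down to one division. Here is a minimal sketch using only the figures quoted in that comment (a US$40k subscription, US$47k per-capita GDP for the US, and roughly US$8.8k for a "typical" country); the function name is made up for illustration, and per-capita GDP is only a rough stand-in for a librarian's salary, as the comment itself points out.

```javascript
// Figures quoted in the comment above (USD per year); not independently verified.
const subscription = 40000;      // upper end of the estimated JSTOR price
const gdpPerCapita = {
  us: 47000,                     // United States
  typical: 8800,                 // a Jamaica/Thailand/Dominican Republic-like country
};

// Hypothetical helper: how many average incomes one subscription would cover.
function incomesCovered(price, income) {
  return price / income;
}

const usEquivalent = incomesCovered(subscription, gdpPerCapita.us);
const typicalEquivalent = incomesCovered(subscription, gdpPerCapita.typical);

console.log(usEquivalent.toFixed(1));      // 0.9
console.log(typicalEquivalent.toFixed(1)); // 4.5
```

So the same price is a bit under one average income in the US and about four and a half in the "typical" country, which is where the "almost five employees" figure comes from.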

I don't know this for sure, but I suspect that if you contacted JSTOR from a low income country they may give a better deal.

BTW, if you really do need JSTOR, it's not hard to find a library card number from a US library and use that for access anywhere. (Well, I don't know JSTOR specifically, but all the other databases I've used from my library are available to me at home after I put in my library card number.)

Their price schedule divides "Public Library – Small" into "US", "Canada", and "Rest of the World". It's possible that someone phoning them up from Senegal or Paraguay would be able to negotiate a lower price, but it's not as if their existing price list doesn't recognize the existence of different countries. (Still, lumping Switzerland and Malawi into the same category might not represent a deep level of consideration of the issues.)

For what it's worth, I was using their web site from my house here in Argentina, which is usually classified as a "middle-income country," but where you can hire a full-time employee illegally for US$4000 per year.

The prices are the same for all the versions (size or location), so I don't know why they ask.

The only thing that seems to change the price is the organization type.

So a Mercedes should cost 1/10th in Zimbabwe of what it does in the West, if people make 1/10th there?

I was rebutting a factual claim ("Most public libraries have relationships with JSTOR that allow members to access the articles online"), not a normative one. An analogous factual claim might be that most Zimbabweans drive Mercedes. Even without having access to Mercedes's sales figures by nation, that ought to appear unlikely to you?

Yes, let's agree on and further reason from the premise that it is not currently true that most Zimbabweans drive Mercedeses ;)

My point was: your argument seems to be based on refuting the argument that the JSTOR subscription is not expensive for the average library because it is only about one yearly salary of the average rank-and-file employee, by saying that that only holds for the libraries in the US (maybe some parts of Europe, but let's say the US for the sake of this argument), and that in many other countries salaries are lower and therefore the relative cost of a JSTOR subscription higher.

So, my (perhaps naive) interpretation of this is that your ulterior argument is that JSTOR is too expensive for many libraries outside of the US, and that they therefore don't have access to its contents.

What I further deduce from that, and from the context in which you bring it up, is that you don't find it a problem that people take the content from JSTOR and redistribute it to people who don't have easy access to libraries that do have a subscription. Now I'll grant that this is a fairly big leap to make, and maybe you're not holding that position; but within the given context (of people arguing pro and con the actions of the Reddit guy what's-his-name), I think it's not unreasonable of me to assume so, either.

So, to close the circle, my 'question' was (but of course it is a 'question' that is, in the end, a way of stating my position in the discussion...) if it is reasonable to hold that when something is too expensive for people, it is OK to circumvent the rights holders' restrictions on the use of something. (I'm deliberately being vague on issues like 'moral ought' vs 'legal ought', if JSTOR really has a common-law variation of a database right on their collection, jurisdiction etc. - I don't really think they're important for the question at hand).

relationships = they pay the institutional fee (possibly reduced) to JSTOR

> somehow "owns" its archive

It does own its archive. They just may not own the exclusive copyright to the contents of the archive... there is a subtle distinction.

There is no non-exclusive copyright. J-STOR is not the copyright owner, period.

They do own the right to the composition of their collection, so someone who got the whole collection would be liable for infringing their right on the composition of the collection; in contrast, a random sample of articles would infringe on the publishers' IP rights rather than J-STOR's.

The subtle point is that J-STOR is absolutely not interested in the original copyright owners having to hunt down abusers, because that would (in all likelihood) appear like an additional, avoidable hassle to the latter and would make them less likely to agree to have J-STOR distribute their content. [edit: apparently it's the US Attorney General more than J-STOR who is pushing this case forward]

In comparison: if someone sneaks into a cinema to see a movie, you would accuse him of cheating them of the entrance fee, and not of "stealing the movie". If someone sneaks into a cinema and uses his camcorder to record the movie, he is cheating the movie theater of the entrance fee and misappropriating the production company's movie (with the suspicion that he might pirate it later), but he did not steal the movie from the theater. That would involve something like walking away with the movie theater's copy of the movie, which would fulfill the criterion that what's stolen is not there afterwards.

Misappropriation of IP is not stealing. It's unauthorized copying - that certainly has the potential to harm the bottomline of the copyright owner, but with an impact that is much harder to quantify than the stealing of an actual physical thing.

IP owners and their friends who use the word 'stealing' want to frame the situation in such a way that any appeal to the nonexistence of monetary loss is excluded - mostly because these same owners are investors in, and not creators of, the IP and do not have any other perspective than squeezing whatever value they can out of their investment.

(The authors of the original articles probably couldn't care less about some punk illegally downloading their texts, because they don't see any money from it anyways).

Notice on the first page of every JSTOR pdf:

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use.

Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=hyi.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.

"you may not download an entire issue of a journal"

I didn't know this. It's as if the New York Times told paid subscribers they can only read 90% of any one issue.

Yeah, fortunately they don't appear to actually enforce that regularly through technical measures. As a researcher with legitimate paid access (via my institution) to JSTOR, it would be absurd if this were enforced. If there is a special issue of a journal exactly in my research area, I pretty much need to read all the articles in it, or at least skim them. To comply with the terms, do I really have to choose an article to avoid reading, so I only download (N-1) of the articles in the issue?

I agree with most of your comment, but there are a couple of points where I wanted to add some commentary.

Compilation copyright only applies to compilations where some creativity is employed in selecting the items to be included. There is no "sweat of the brow" database right under US law. JSTOR almost certainly does not have a compilation copyright on their collection, since any creativity being employed in selection is being employed by the journals they archive, not JSTOR employees.

At any rate, the indictment does not include any charges of copyright infringement.

> They do own the right to the composition of their collection

This was the point I was trying to make.

They own their database, even if they don't own the articles in it. The GP poster was trying to claim that they didn't "own their archive".

> the right to the composition of their collection

Is this a right recognized under US law?

§ 103. Subject matter of copyright: Compilations and derivative works

(a) The subject matter of copyright as specified by section 102 includes compilations and derivative works, but protection for a work employing preexisting material in which copyright subsists does not extend to any part of the work in which such material has been used unlawfully.

(b) The copyright in a compilation or derivative work extends only to the material contributed by the author of such work, as distinguished from the preexisting material employed in the work, and does not imply any exclusive right in the preexisting material. The copyright in such work is independent of, and does not affect or enlarge the scope, duration, ownership, or subsistence of, any copyright protection in the preexisting material.


IANAL, but I found some testimony from the US Copyright office regarding this.


Excerpt: In the terminology of the copyright law, a database is a “compilation.” The Copyright Act defines a compilation as “a work formed by the collection and assembling of preexisting materials or of data....” (1) Compilations were protected as “books” as early as the Copyright Act of 1790.

By that logic nothing was stolen. They still own their archives.

I hate to quibble over words, but a painting can be stolen "from the Louvre" even if it's there on loan from a private collection, just as you could say, "The necklace was stolen from my jewelry box," without implying that your jewelry box was the legal owner of the necklace.

This is more like taking a photograph of a painting on loan to the Louvre, though--- they're alleging that a copy was made, in violation of their terms of service, of a document that they don't even own (but do host). In that case, I would think that you might be violating the Louvre's camera policy, and you might even be colloquially "stealing" something from the painting's author (e.g. if you go on to publish illicit copies from your photo), but you aren't plausibly stealing anything from the Louvre.

The argument over whether something digital can be stolen at all is a different argument than the one you made in the comment I replied to. The question of taking photographs of artwork (which are not exact copies, but which have their own cultural, educational, and commercial value) is a third question which is interesting in its own right. At this pace, I'm having a hard time keeping track of what we're arguing about.

In any case, my intention was not to signal my support for one of two predefined sides in a battle over the concept of intellectual property; it was just to point out that stealing "from" doesn't have to have the meaning you read into it.

> They are talking about theft of services

If it's theft of service, why measure it by the number of works?

It's like accusing you of using your neighbour's wifi to steal over one thousand emails (which you read from your own email account).

My point being, I have no idea what service they provide; I didn't read the fine article. But if they have an all-you-can-eat plan for, say, $300 a month per seat, then the charges should have been "Joe Doe hacks MIT network and steals $300 worth of replaceable goods".

Along the same lines, if someone breaks into a car wash and uses 5 minutes of their water, he will hardly be convicted of organized crime and causing damages over $300 (because that's what they charge for the premium wash)... Why is it so messed up when you put a computer in the middle?

Yeah, that bothered me too. Especially on page 14, where they demand that he give back the "proceeds obtained." How is that going to be determined in such an unrealistic sense?

You're precisely incorrect as far as the law is concerned.

If I make a painting, I own the copyright. If you take a picture of said painting, you own the copyright on the picture. If someone makes a collage of your picture they own the copyright on the collage. Both of you are liable for copyright infringement against my rights, but this is independent of your own rights as described above.

This is not legal advice; talk to a lawyer.

Let's try another analogy.

What if someone hacked into Netflix and downloaded copies of all of the media that Netflix offered?

> But effort expended to build a private repository consisting of copies of things you don't own doesn't give you ownership of the result

Surprisingly, it sometimes does in the EU: http://en.wikipedia.org/wiki/Database_right

What Aaron did sounds seriously sketchy (sneaking into MIT wiring closets, trying to download the entire database, etc.), a fact that Demand Progress and several commenters here seem to be ignoring.

Defending his actions would require a very strong, multi-pronged version of the argument "if it's physically / technologically possible, it must be ok." Can MIT legally limit guest access to its network? Can JSTOR limit access to its content? Well, technically, their software didn't limit it, right? He just changed his IP address and they let him right back on, gave him permission. And then he had to change his MAC address. And then physically move to a different building.

But it doesn't matter anyway, because legal restrictions are legal restrictions. It's impossible to enforce every legal restriction in software. Put another way, we don't have to read JSTOR's server code to figure out if there's a violation of policy here -- the policy is written out as a legal document.

In the hacker world, there's a tendency to think that if something's possible, even easy, then it shouldn't be considered "breaking in" or "stealing." If my Gmail password is "password," then of course you're going to read my email! I had it coming. In the real world, though, this is still a crime.

>What Aaron did sounds seriously sketchy (sneaking into MIT wiring closets, trying to download the entire database, etc.),

Reminded me of an ancient IP infringement case:


+1 for correlation with ancient Greek mythology.

I seriously recommend Prometheus Unbound for a more modern treatment of the subject.


Compared to that, 35 years in jail seems quite humane ;-)

Clay Davis agrees with you


Right, because the standard penalty for trespass onto campus property is a federal indictment. Good thing no MIT student has ever snuck into a restricted area before.

Only he's a fellow at Harvard, not an MIT student or faculty. So no, they couldn't handle this internally.

Possibly with the local police, but that may have been what caused it to escalate.

Evidently you're unaware of MIT's history of pranks.

Even with MIT's charitable (but declining) tolerance towards student hacking/pranking, they take a very dim view of non-students doing the same (even alumni).

There used to be considerable tolerance of Harvard students doing pranks at MIT also, since Harvard/MIT had a bit of a prank-exchange rivalry (though MIT students engaging in shenanigans on the Harvard campus is more common than vice-versa).

I believe that the storytelling after breaking and entering is an important part of it.

While I'm no MIT student, I have seen the tomb of the unknown tool. (Though, at the time, I was accompanying a bunch of MIT students).

Thanks for the dissenting opinion.

Is this worse than the sort of thing that goes on in the early days of most startups, including our most revered? People around here have a lot of respect for pg, rtm, tlb and their startup Viaweb - go back to "Founders at Work" and read how they got computer time needed to get the startup going. That kind of thing is practically universal in startups. The good ones, anyway.

So Aaron appears to have cut some corners in getting an interesting project off the ground. Slap him on the wrist.

I wonder if it has anything to do with the fact that he's in charge of a PAC rather than a startup. IOW these charges could be politically motivated.

Well, this again is a very hacker-centric perspective. Even if it's truly important that someone be able to surreptitiously copy all of JSTOR once in a while for the sake of innovation, who is in a position to let him off based on that?

Plus, putting everything else aside and evaluating him as a hacker, he doesn't come out looking too good. If he'd scraped all of JSTOR without getting caught, it would make a better story. As it is, he attracted a lot of attention, and the report of how he was caught has an air of inevitability to it. JSTOR called up MIT, MIT was looking for him, and he gave them a lot of time to find him.

sketchy != illegal

Trying to download entire databases!!! Oh my. Won't someone please think of the children!!!

It's definitely illegal, but some are trying to paint it as not sketchy, in what I think are disingenuous ways.

This is the most technically competent charging document I've ever read. I guess there must have been some hackers on the grand jury.

Paragraphs 35 and 36: which "protected computer" on MIT's network did he access? Certainly they're not trying to claim his laptop was a protected computer? Are they talking about the DHCP server or whatever registration frontend MIT has for the DHCP assignments? I have trouble with the concept that a violation of a computer use agreement (when there are no operative security barriers in place) constitutes a violation of the Computer Fraud and Abuse Act. Then again, I've always thought that act was vague and therefore overbroad.

Obviously what he did was bad in some sense (at least from the perspective of JSTOR and MIT), but even if it should be a crime rather than a civil dispute or internal disciplinary action at MIT, I don't like the fact that just about any misbehavior on the internet becomes a federal case because the probability of no interstate resources being used is very low.

Finally, I take issue with the notion that someone who is accessing a service through a public interface is criminally responsible for downtime if too high an access rate causes service degradation or an outage. The claims that JSTOR's servers were overloaded and (one?) even went down at some point are clearly there to set up a later claim of damages. Haven't they heard of rate limiting (in this case, since it was a rogue laptop stashed in a data closet, rate limiting by IP)? That wouldn't work against a concerted denial of service attack, but this was no denial of service attack. JSTOR seems to have been relying on manual intervention to stop article leeching that could lead to a (partial) outage. That's naive, and not a good idea.
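
For readers unfamiliar with the rate limiting mentioned above, here is a minimal per-IP token-bucket sketch. It is illustrative only and says nothing about how JSTOR's servers actually work; the class name, the capacity and refill rate, and the example address (from the reserved documentation range) are all made up.

```javascript
// Hypothetical per-IP token bucket; capacity and refill rate are illustrative.
class IpRateLimiter {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.buckets = new Map(); // ip -> { tokens, last }
  }

  // Returns true if a request from `ip` at time `nowMs` may proceed.
  allow(ip, nowMs) {
    let bucket = this.buckets.get(ip);
    if (!bucket) {
      bucket = { tokens: this.capacity, last: nowMs };
      this.buckets.set(ip, bucket);
    }
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = (nowMs - bucket.last) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond
    );
    bucket.last = nowMs;
    if (bucket.tokens < 1) return false; // over the limit: reject
    bucket.tokens -= 1;
    return true;
  }
}

// Burst of 3 requests allowed, then 1 request per second.
const limiter = new IpRateLimiter(3, 1);
const results = [0, 0, 0, 0, 1000].map((t) => limiter.allow("192.0.2.10", t));
console.log(results); // [ true, true, true, false, true ]
```

A limit keyed on the IP address alone only slows a client that keeps the same address, so it would not by itself stop someone who changes addresses, as the indictment is described elsewhere in this thread as alleging.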

More than likely, the document was written by the prosecutor's office.

The procedure as I understand it is:

* Prosecutor assembles evidence, writes indictment.

* Prosecutor presents evidence to Grand Jury. This may include witnesses or documents.

* Grand Jury votes on whether there is enough there to approve the indictment.

If they've got a computer crimes division, then they're going to have hacker types in the prosecutor's office to do this stuff and get the details right.

The indictment is going to be the most slam dunk part of the evidence that there is, as it's written by the prosecutor and there's no counter to it. If it doesn't look airtight, then it's probably a very weak case.

Though, looking at it here, it's not looking very good for aaronsw. The combination of MAC address spoofing and a locked wiring cabinet shows physical and electronic security that was bypassed, repeatedly. That's easy to explain to a jury.

A funny omission is that the indictment never actually says that the wiring cabinet was locked, or how Aaron supposedly broke into it. I infer that it wasn't locked.

Locked doors are never a problem for MIT students... (and apparently even some Harvard students)

But, given the rest of the indictment, I'd think that the prosecutor would be sure to throw in "Swartz picked the lock on the restricted wiring closet in order to introduce the Acer computer," since it would incline the grand jury to be more likely to hand down an indictment — unless she knew this was false.

> no operative security barriers in place

I don't know... Even if my front door was open, you still aren't allowed to enter my house without my permission.

But there is an element of permission inherent in DHCP. Your device is actively configuring my device specifically to allow network access. It's not an open door; it's a sign saying "This way please". That said, it's obvious this was an attempt to circumvent access controls.

There's an element of permission inherent in a door! I mean, it's a breach in a wall, specifically put there at great additional expense, just to allow people entry! In either case, an enabling technology isn't inherently an invitation. Again, just because you CAN use a technology to do something, doesn't mean you MAY.

And with respect to the comment about "this way please", and the "actively configuring my device", please note that the client initiates the DHCP conversation with a Discovery message.

  Discovery: "Can anyone give me DC info so I can set up my H, please?"
  Offer: "Yes, I can, here's one configuration option!"
  Request: "Yes please, that sounds good, I'll take it"
  Ack: "Okay, you got it".
Note that the DHCP Discovery
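
The four-message exchange paraphrased above can be modelled as a toy simulation. This is a sketch with no real networking: the class and function names, the one-address pool and the MAC address are all made up for illustration, and real DHCP has many more details (broadcasts, lease times, options). It only shows the ordering: the server does nothing until the client sends the first message.

```javascript
// Toy model of the four-message DHCP exchange (no real networking).
// The address pool and MAC address below are made-up illustrative values.
class ToyDhcpServer {
  constructor(pool) {
    this.pool = [...pool];   // addresses still available
    this.leases = new Map(); // mac -> leased address
  }

  // Answers a client's DISCOVER with an OFFER, if any address is free.
  handleDiscover(mac) {
    if (this.pool.length === 0) return null;
    return { type: "OFFER", mac, address: this.pool[0] };
  }

  // Answers a client's REQUEST with an ACK, or a NAK if the address is gone.
  handleRequest(mac, address) {
    const index = this.pool.indexOf(address);
    if (index === -1) return { type: "NAK", mac };
    this.pool.splice(index, 1);
    this.leases.set(mac, address);
    return { type: "ACK", mac, address };
  }
}

// The client drives the conversation: nothing happens until it sends DISCOVER.
function runClient(server, mac) {
  const log = ["DISCOVER"];
  const offer = server.handleDiscover(mac);
  if (!offer) return log;
  log.push(offer.type, "REQUEST");
  const reply = server.handleRequest(mac, offer.address);
  log.push(reply.type);
  return log;
}

const server = new ToyDhcpServer(["192.0.2.50"]);
const transcript = runClient(server, "00:00:5e:00:53:01");
console.log(transcript.join(" -> ")); // DISCOVER -> OFFER -> REQUEST -> ACK
```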

I'm unfamiliar with MIT's guest setup, but I assume they let you get an IP address, but before you can access anything, you have to acknowledge their terms of service / acceptable use policy. If you fail to abide by this, you'd be accessing the network without permission.

You're right that (as alleged), this would be an obvious attempt to circumvent access controls.

Given the way things are worded, I'm guessing that the MIT computer that was improperly accessed was a router or switch. Hell, just plugging into the switch directly could be construed as unapproved access to a computer device. I think the Federal law treats anything with a processor as a computer.

The MIT Guest network is actually pretty awesome, it doesn't ask you to acknowledge anything, it just grants instant access.

Yeah, I think it's a bad argument to personify network protocols or imbue them with intent. Legally speaking, what is more important is the intent of the person using them.

A stranger would probably be allowed to enter the property unless you had posted No Trespassing signs. They'd be required to leave upon request but I'm not sure that simply entering your house via an open door would constitute a crime or tort. An open door could likely be construed as implied consent.

Do you have any basis for claiming that "an open door could likely be construed as implied consent"?

I would not advise that you try that in many parts of the US. You'll be risking getting shot, and the homeowner would not have committed a crime.

>I take issue with the notion that someone who is accessing a service through a public interface is criminally responsible for downtime if too high an access rate causes service degradation or an outage

Ah... well surely you've heard of people being prosecuted for denial of service attacks? Most recently, members of anon getting raided because they used LOIC? If you use a network in a way that is intended to degrade others' quality of service, even if you are just accessing things via normal protocols at a really high rate, you are breaking the law. In this case, it does not look like they are alleging that he intended to cause a service disruption, but they claim that he repeatedly circumvented measures that JSTOR put in place to halt his unauthorized activities, which caused service disruptions for other legitimate users and therefore denial of service.

I think 18USC1030 is pretty broad in its definition of a "computer". Back in the MBTA hacking case, MBTA claimed a magnetized piece of paper was a computer under this clause, and the first judge that looked at it bought that (sanity prevailed and that decision was later overruled). I wouldn't be surprised if they are considering the MIT network, or at least the routers that were configured to prevent his access, to be the protected computer in this case.

Yeah, the switch that he hard-wired into would certainly count as a "computer".

I suppose the router could be construed as a protected computer, since it had blacklisted his MAC address and it is a computer with an OS.

>This is the most technically competent charging document I've ever read.

I'm guessing MIT had a hand in penning it, or at least provided someone who could easily explain the relevant material to the DA/Grand Jury.

I don't like the fact that just about any misbehavior on the internet becomes a federal case because the probability of no interstate resources being used is very low.

So is that why Comcast routes traffic to networks 30 miles away across three states and back?


“It’s even more strange because the alleged victim has settled any claims against Aaron, explained they’ve suffered no loss or damage, and asked the government not to prosecute,” Segal added.

Nowhere do they say he did not do it however.

cached: http://webcache.googleusercontent.com/search?q=cache:http://...

JSTOR is being very vague about their role in this, so that might unfortunately be wrong about just how settled JSTOR considers things on their side. Their statement feels extremely carefully worded: http://about.jstor.org/news-events/news/jstor-statement-misu...

MIT (and the alleged disruption to other MIT JSTOR users' access) may also be relevant to the decision to charge.

My understanding is that MIT has a freewheeling attitude to information, but also deep organizational and funding links to national security institutions. So their hacker ethos might tell them to brush it off, while their federal relationships require them to take a tougher stance.

In terms of broader implications, I would actually have many fewer problems with MIT pressing straightforward trespassing charges. If he broke into a server room and messed with equipment there without the owner's permission, they could prosecute that under very ordinary state criminal law long predating the computer age.

It's the weird "stealing documents from JSTOR" federal case under the Computer Fraud and Abuse Act that's more worrying, because it's extremely vague what those kinds of charges can cover (in some interpretations, essentially any violation of a ToS).

JSTOR knows they'd better be very vague if they don't want too many rather intelligent people wondering just how they actually add value to the chain.

This is almost too good:

"As Swartz entered the wiring closet, he held his bicycle helmet like a mask to shield his face, looking through ventilation holes in the helmet."

I was waiting for a part where he cut eyeholes into a newspaper.

Welcome to Cambridge. This is as gangsta as it gets.

This reminded me of Jacques Mattheij's list of ideas for startups (see http://jacquesmattheij.com/My+list+of+ideas+for+when+you+are...):

"(45) OpenPapers

A place where all academic research that has been funded in part by public funds is published, journals be damned. Hopefully with deep pockets to fight off the lawsuits."

arXiv seems to fit this bill, and has a reasonable bulk data access policy http://arxiv.org/help/bulk_data

For NIH-funded studies, PubMed Central is actually doing a pretty good job. It would be great if PMC actually had a mandate to collect papers for all US federal grant-funded research, though.

This all hinges on what he was going to do with the documents. If he was looking to perform some large-scale analysis (such as he has done before) and publish the results academically, then this would fall under the academic mission of MIT, and therefore be legit. But if this were the case, why go through the hassle of hacking the system? Why not just ask JSTOR for cooperation? Or maybe he did, and they rejected it?

There has got to be more to this story, because I just can't for the life of me believe that he would download the documents to "free" them on the internet (as is alleged).

A detail you missed is that Swartz was not a student or faculty member at MIT. He was a fellow at Harvard.

I wonder what they'll push for. He sounds pretty screwed if this evidence pans out. Looks like he could even end up with a few years' time if the prosecutors want.

1. Wire fraud maxes out at 20 years outside of a presidentially-declared emergency. No fine cap, it seems. http://uscode.house.gov/download/pls/18C63.txt

2. Computer fraud under 1030(a)(4) caps out at 5 years with no prior offense, no fine cap. http://uscode.house.gov/download/pls/18C47.txt

3. 1030(a)(2), (c)(2)(B)(iii) looks to be another cap of 5 years. Ibid.

4. 1030(a)(5)(B), (c)(4)(A)(i)(I),(VI) looks like another cap of 5 years. Ibid.

IANAL, just trying my best to read the code itself.
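
Rough arithmetic on those caps, as a sketch only. It assumes the four maximums listed above are read correctly and that they would be stacked consecutively, which is an assumption; if the sentences ran concurrently, the ceiling would be the longest single count. The dictionary keys are just labels for this example.

```python
# Statutory maximums in years, as read in the list above (not legal advice).
caps_years = {
    "wire fraud": 20,
    "1030(a)(4)": 5,
    "1030(a)(2)": 5,
    "1030(a)(5)(B)": 5,
}

# Assumption: every count is served back to back.
total_if_consecutive = sum(caps_years.values())

# Assumption: every count is served at the same time.
ceiling_if_concurrent = max(caps_years.values())

print(f"consecutive: {total_if_consecutive} years")
print(f"concurrent:  {ceiling_if_concurrent} years")
```

Neither number is a prediction of an actual sentence; they are just the outer bounds under each assumption.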

Ignoring legality, Aaron's actions, case specifics, etc., I have to admit: I really wish that the data in question was free and publicly available.

And you have a right to demand it: by federal law, all publicly funded research should be in the public domain (unless national security blah blah blah...). Unfortunately this law isn't enforced except in some aspects through provisions of all recent NIH grants (and maybe others).

Anybody with a library card to a major public library can get this stuff for free over the web.

There's an enormous difference between controlled access to individual globules of data, and free access to the entirety of a dataset.

When someone risks 35 years in jail for something like this, you know your justice system is broken.

I know he won't get 35 years, but it's nevertheless outrageous that it could happen.

Do you think he'd get life in Italy?

The title is inaccurate.

It is alleged that he signed up for guest accounts on their network with different laptops, changed his MAC address and re-registered if the IP he was using was blocked (by JSTOR) or cut off of the network (by MIT), and finally connected a laptop in a basement networking closet.

I guess you could say that is 'hacking' in the unauthorized access sense, but not in any meaningful sense. It isn't breaking and entering if someone repeatedly trespasses somewhere (say, banned from a store) even if they change their clothes to avoid detection.

He found ways to get around the (minor) protections put in place using a computer. That fits the colloquial definition of hacking. We don't own the term anymore - if we ever did.

IANAL, but legally speaking unauthorized access is a crime regardless of how easy it was to gain access.

This was struck down by the courts by the way. Why? If unauthorized access is a federal offense, and unauthorized access can be determined by (e.g.) a EULA, then it basically allows a business to dictate criminal law. For example:

1. This would allow all of those 'You agree that you are not a law enforcement official' B.S. 'terms of use' on warez servers to actually have teeth.

2. "If you are a {black,hispanic,gay,etc} person, you are not allowed to access my website."

Do any of these sound reasonable? I should hope not.

That's because those restrictions are illegal themselves. "Don't download the whole database" is not an illegal restriction.

* How about "You are not allowed to hyperlink to this page?"

* How about that 'MySpace Hacking' case? From Wikipedia:

  Judge Wu summed up his opinion by stating that allowing a violation of a
  website's Terms of Service to constitute an intentional access of a computer
  without authorization or exceeding authorization would "result in
  transforming section 1030(a)(2)(C) into an overwhelmingly overbroad enactment
  that would convert a multitude of otherwise innocent Internet users into
  misdemeanant criminals." For these reasons, Judge Wu granted Drew's motion
  for acquittal.  Government eventually decided not to appeal [16].

I don't think hyperlinking is the crime in question for this case.

My post wasn't very clear. I was making two separate points.

There are two separate issues as well, whether or not a restriction is valid and the ease with which it is circumvented. Lots of EULA/TOS terms are unreasonably broad. But let's say the TOS says "use of service without a password is unauthorized". Seems reasonable.

Somebody runs a program to guess passwords until they find one that works. "Ah ha," they say, "I have a password, so my usage is now authorized." I don't think a judge or jury will buy that. Even if the password was easy to guess.

I believe that was SpikeGronim's point. Assuming the definition of unauthorized is legit, there's no such "it was too easy" defense. Or "I could do it, therefore I must have been allowed to do it."

I think the parent was just disappointed at using "hacking" to refer to something as mundane as getting guest accounts to the network.

This is a very frightening belief.

How long until posting a negative comment to a blog is "unauthorized access" to that blog? Gaining access was easy: all you had to do was type a comment and hit submit. But some Powers That Be decided they didn't really want you to post that, so now it's a federal computer crime.

Land of the free.

Speaking of "TV drama levels of understanding of criminal justice"... :)

With only a few exceptions, persons accused of crimes are not presumed to have mens rea. Statutory rape, for instance, has "strict liability"; even if you don't know you're committing a crime, you're liable. Most criminal offenses are not like this. The state is required to establish mens rea.

A prosecutor could say that a ToS-infringing blog comment is a criminal violation, but unless that prosecutor can establish that the comment was made in purposeful, knowing, or reckless violation of the ToS, they'd be wasting their time.

So where's the mens rea in this case?

Aaron's? I respectfully decline to lay out a case against Aaron on HN.

That's currently unsettled, and yes a lot of computer-crime experts (even typically law-and-order-leaning ones) consider it a huge problem--- it might actually be a federal crime to post a comment on a blog in violation of the blog's Terms of Service.

This case is currently winding its way through the courts, and will hopefully be overturned: http://volokh.com/2011/06/14/petition-for-rehearing-filed-in...

This problem extends way beyond hacking laws—the average person in the US probably commits several felonies a day: http://www.threefelonies.com/Youtoo/tabid/86/Default.aspx

I read this book and think that federal crimes are punished too harshly, but the people profiled seem far from average.

You're getting hung up on the "access" part, it's the "authorization" that's important. In your example it would likely be almost impossible for the blog owner to prove that someone intent on posting negative comments was not authorized to do so.

As a side note, I think we all should be more careful with "omg, we're losing our freedoms!" comments. They're usually not as clever as they feel at the time, and have the dangerous property of inuring us to such claims.

> It isn't breaking and entering if someone repeatedly trespasses somewhere (say, banned from a store) even if they change their clothes to avoid detection.

It might not be breaking & entering, but it's still trespassing, which is a crime.

> It might not be breaking & entering, but it's still trespassing, which is a crime.

Jaywalking is murder! Well, it's still a crime...

I'm not sure what you're appealing to. Real-world analogies don't always transfer readily to digital law. Even in the words of the ageing CFAA, "exceeding authorization" to steal commercial information is most certainly a punishable crime. You don't have to pull out nmap and zero days to be convicted of hacking.

The indictment mentions only one laptop, BTW. No indication he signed up with "different laptops".

22. On October 8, 2010, Swartz connected a second computer to MIT’s network and registered as a guest, using similar naming conventions: the computer was registered under the name "Grace Host," the computer client name "ghost macbook," and the throw-away e-mail address “ghost42@mailinator.com."

23. The next day, October 9, 2010, Swartz used both the “ghost laptop” and the “ghost macbook” to systematically and rapidly access and download an extraordinary volume of articles from JSTOR. The pace was so fast that it brought down some of JSTOR’s computer servers.

Huh; my bad.

Also mentioned is that Swartz used the network closet to take 2 IP addresses. Are we to infer he hooked up both laptops with an IP each? Or is the real scenario that he was using one laptop under 2 identities? (Not that impossible.)

How is this not "unauthorized access" "in any meaningful sense"?

They blocked his IP, they blocked his MAC, and he hid a machine in a wiring closet to get on MIT's network. What would he have to do to make it "meaningful"?

Crack a password. Use SQL injections. Steal a credit card. Spoof someone else's MAC and IP. Steal a cookie. Something like that.

He was accused of using a guest network account on MIT, with a fake name, new MAC and IP, and throw-away email address. From there, he used a script to download lots of JSTOR documents.

This isn't the internet equivalent of "checking out too many library books". It's the internet equivalent of "checking out too many library books whilst wearing a false mustache".

So it should only be illegal if a hacker wouldn't think it's lame? Got it.

Nah. It should only be illegal if normal people can't do it. Most of what Aaron did is stuff lots of people do.

There's a difference between wearing a dummy badge that says "I am Gary Host", a badge that incorrectly says "I am Bill Gates" (as that would be some kind of identity theft) and forging a passport in "Gary Host"'s name. What aaronsw did was far closer to the first.

Now, you could argue that scripts are power tools, and using them requires a higher standard of behavior. If you are flying a plane, giving dummy credentials over the radio is a lot more serious than a kid doing the same with a toy CB radio.

Even then, the dummy credentials didn't really cause any damage. The damage was done by the script itself. Even if he used his real name, the damage would have been done.

As a hacker, I think this would solve a lot of problems in the world. ;p

I did not say that it was not unauthorized access, just that hacking (in the breaking into computer systems sense) involves actually breaking in.

Changing a MAC address (on a device that you own and control) does not constitute hacking. It is as simple as ifconfig(8) and if you have a consumer router it probably has an option in the web interface to do it.
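
For a sense of how little is involved, here is a minimal Python sketch that only generates a new address; it does not apply it to any interface. `random_local_mac` is a hypothetical helper name. It sets the "locally administered" bit and clears the multicast bit in the first octet, which is the usual convention for a self-assigned unicast address.

```python
import random


def random_local_mac(rng=random):
    """Return a random unicast, locally administered MAC address string."""
    octets = [rng.randrange(256) for _ in range(6)]
    # First octet: set bit 1 (locally administered), clear bit 0 (unicast).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{octet:02x}" for octet in octets)


if __name__ == "__main__":
    print(random_local_mac())
```

The resulting string is what you would then hand to ifconfig(8) or a router's settings page.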

The indictment asserts that Mr. Swartz intended to distribute the files downloaded but did not substantiate this claim. I wonder what proof they have of this? (There are, of course, a great many laws dealing with probable intent that need only convince a jury of said intent without demonstrating its validity.)

That was what he did in that case, but it wasn't what he did with the four hundred thousand law review articles he analyzed for conflicts of interest more recently.

The indictment doesn't have to provide all the evidence presented to the Grand Jury, and the Grand Jury process itself is secret (the actual court case won't be).

I've heard that, in drug law, possession of more than a certain amount is automatically interpreted as intent to distribute. Maybe the prosecutors in this case intend to argue something similar based on the nature or quantity of the documents retrieved.

Right... the real question seems to be what he was going to do with the files. The indictment doesn't really offer any evidence of this, and I doubt Aaron will be doing much talking until his defense.

The Boston Globe's got an article written on this that they're updating: http://www.boston.com/Boston/metrodesk/2011/07/cambridge-man...

Posted in another thread: http://news.ycombinator.com/item?id=2782752

This is not the first time he has done something like this if memory serves me. In late 2008 Mr. Swartz and Carl Malamud went to select libraries, ones with free PACER access, and proceeded to download ~700 GB of information that was behind a paywall. After which they made all of it available on Mr. Malamud's website.


Demand Progress PAC's website is down, but they released a statement:

(from: http://webcache.googleusercontent.com/search?q=cache:9k5ryiX... )

Cambridge, MA– Moments ago, Aaron Swartz, former executive director and founder of Demand Progress, was indicted by the US government. As best as we can tell, he is being charged with allegedly downloading too many scholarly journal articles from the Web. The government contends that downloading said articles is actually felony computer hacking and should be punished with time in prison.

“This makes no sense,” said Demand Progress Executive Director David Segal; “it’s like trying to put someone in jail for allegedly checking too many books out of the library.”

“It’s even more strange because the alleged victim has settled any claims against Aaron, explained they’ve suffered no loss or damage, and asked the government not to prosecute,” Segal added.

James Jacobs, the Government Documents Librarian at Stanford University, also denounced the arrest: “Aaron’s prosecution undermines academic inquiry and democratic principles,” Jacobs said. “It’s incredible that the government would try to lock someone up for allegedly looking up articles at a library.”

Demand Progress is collecting statements of support for Aaron on its website at …URL…

“Aaron’s career has focused on serving the public interest by promoting ethics, open government, and democratic politics,” Segal said. “We hope to soon see him cleared of these bizarre charges.”

Demand Progress is a 500,000-member online activism group that advocates for civil liberties, civil rights, and other progressive causes.

About Aaron

Aaron Swartz is a former executive director and founder of Demand Progress, a nonprofit political action group with more than 500,000 members.

He is the author of numerous articles on a variety of topics, especially the corrupting influence of big money on institutions including nonprofits, the media, politics, and public opinion. In conjunction with Shireen Barday, he downloaded and analyzed 441,170 law review articles to determine the source of their funding; the results were published in the Stanford Law Review. From 2010-11, he researched these topics as a Fellow at the Harvard Ethics Center Lab on Institutional Corruption.

He has also assisted many other researchers in collecting and analyzing large data sets with theinfo.org. His landmark analysis of Wikipedia, Who Writes Wikipedia?, has been widely cited. He helped develop standards and tutorials for Linked Open Data while serving on the W3C’s RDF Core Working Group and helped popularize them as Metadata Advisor to the nonprofit Creative Commons and coauthor of the RSS 1.0 specification.

In 2008, he created the nonprofit site watchdog.net, making it easier for people to find and access government data. He also served on the board of Change Congress, a good government nonprofit.

In 2007, he led the development of the nonprofit Open Library, an ambitious project to collect information about every book ever published. He also cofounded the online news site Reddit, where he released as free software the web framework he developed, web.py.

Press inquiries can be directed to demandprogressinfo@gmail.com or 571- 336- 2637

> “This makes no sense,” said Demand Progress Executive Director David Segal; “it’s like trying to put someone in jail for allegedly checking too many books out of the library.”

No it's not. It's like sneaking into the library at night and making photocopies of all the books. Then, upon getting caught, the perpetrator sneaks back into the library in a different disguise and continues to photocopy more books. Repeat this action of getting caught and sneaking back in a few more times and combine this with the fact that his downloading of documents affected JSTOR performance for other legitimate users of the archive and you get a sense of what he's really done.

How is this excusable?

I'm completely onboard with those who claim that we need some reform in scientific publishing, but Aaron's actions smack of low ethical standards to me, not to mention extremely poor judgement on his part.

EDIT: Hi downvoter! Can you please explain why you think I'm wrong?

Not a downvoter, but the way governments are reacting to those demanding transparency is what I call draconian. This is no longer symmetric opposition, this is a way of terrorizing those who want a little more freedom. Faced with obstacles like the one Aaron faced, I would do the same. So sue me.

Disobedience is not the same as terrorism although they would like you to think so. Disproportionate punishments are what I term a terrorist act. Stop being such a conformist and stop using their language. Every time you test these boundaries and fight for it you will be fighting for your freedoms.

Fair enough. I agree with most of what you're saying here. I definitely believe that any punishment involving more than a fine and/or some community service would be disproportionate to the crime here.

What I didn't like was the statement portraying him as some kind of hero. I'm just pointing out that he's not. He didn't have a legal right to be using the documents, he shouldn't have been trying to download the documents using a guest account at MIT obtained by submitting false information, and he certainly shouldn't have tried to get back into the network after being banned multiple times.

If his goal was to put a lot of scientific papers into the public domain, I can think of many other ways he could have achieved this. So I'm also a bit puzzled by his approach here.

> Stop being such a conformist and stop using their language.

This attack seems rather uncalled for.

> Every time you test these boundaries and fight for it you will be fighting for your freedoms.

I can see where you're coming from on this, but I'm not entirely convinced that you're right.

How did Aaron get access to the for-pay articles (page 9)?

Also: nice going, Aaron! Drag research access into the 21st century, kicking and screaming!

Does anyone think it's odd that an Acer laptop could write these files to disk faster than JSTOR could serve them?

"Does anyone think it's odd that an Acer laptop could write these files to disk faster than JSTOR could serve them?"

Nope. I bet the JSTOR servers are serving many concurrent requests. If he had the servers to himself then yes, that would be surprising.

FTA: This is more than one hundred times the number of downloads during the same period by all the legitimate MIT JSTOR users combined

It sounds like he was using the majority of their resources. But maybe those servers serve lots of places, and they only mentioned MIT as an example.

They do. Universities across Canada also subscribe to this.

Yeah, my college also had it. But the article made it sound like only MIT was using these servers.

Anyone on MITNet has access to JSTOR articles free of charge to the user. Similarly the ACM, IEEE, etc. all have agreements like this with major universities.

Oh, so some articles are available to buy, and the subscription includes those and more? I thought the subscription was the subset.

JSTOR offers packages, and any articles not included in those packages can be purchased by individuals. Most choose instead to request an article via interlibrary loan, and you'd usually have it within 24 hours.

Yeah, I was laughing about how an Acer laptop took down their service and did damage to their network. If I was JSTOR, I wouldn't prosecute just because it makes our company look ridiculous.

Because, of course, JSTOR plans for 100x usage spikes and performs neither logging nor accounting, so serving a file is just as simple for them as downloading it is for the client.

> Does anyone think it's odd that an Acer laptop could write these files to disk faster than JSTOR could serve them?

Why would that be odd? SATA = 3 Gbps throughput with minimal overhead, Ethernet = 1 Gbps with lots of overhead (IP headers, Ethernet headers, HTTP headers)

> SATA = 3 Gbps throughput with minimal overhead

USB HDs top out at about 200 Mbit/s. JSTOR's RAID arrays should be able to drown his laptop without breaking a sweat.
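
Putting the figures quoted in this subthread side by side, as a back-of-the-envelope sketch. These are raw line rates taken from the comments above, ignoring encoding and protocol overhead, and the dictionary labels are mine. The slowest stage in the chain bounds what the laptop could actually save.

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


# Raw rates in Mbit/s, as quoted in the thread (assumed, not measured).
links_mbit = {
    "SATA (3 Gbps)": 3000,
    "Gigabit Ethernet": 1000,
    "USB hard drive (~200 Mbit/s)": 200,
}

for name, rate in links_mbit.items():
    print(f"{name}: {mbit_to_mbyte_per_s(rate):.0f} MB/s raw")

# The slowest link is the bottleneck for the whole download-and-store path.
bottleneck = min(links_mbit, key=links_mbit.get)
print(f"bottleneck: {bottleneck}")
```

If the external disk really was USB, the disk rather than the network would have been the limit on the laptop's side.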

It sounds like JSTOR's servers aren't really optimized for high article download rates, to the point that his one laptop accounted for a significant part of normal continental US load. They probably had some bottleneck in the system that they never noticed before — maybe their logging infrastructure was absurdly slow or something.

JSTOR isn't at MIT; it's an online service. He just needed an MIT IP address to be able to access it without paying.

Did it specify that he was writing to local disk? If I did a stunt like this, I'd probably try to write to some cloud storage like Amazon S3. I'd bet that Amazon's servers can drown JSTOR's, particularly if there's a big pipe like MIT's in-between.

Yeah, it said he went in at least once to swap out the external disk. I assumed it was USB and not eSATA or something.

Also, it's a lot easier to find the owner of an Amazon account (credit card) than a "ghost" laptop under a cardboard box in a closet.

> Does anyone think it's odd that an Acer laptop could write these files to disk faster than JSTOR could serve them?

Not at all. Writing to the hard drive is going to take much less time than downloading the articles from a remote server.

"Aaron Swartz ... was a fellow at Harvard University’s Center for Ethics"

It is not impossible to imagine a code of morality which views his alleged actions as ethical.

I'd argue that he's fully ethical and it's our laws that are on the wrong side here. Pity the latter have the guns...

The only reason he's not a hero is that he didn't put all the PDFs up on a torrent. We are the library now.

Wouldn't any code of ethics endorsed by Harvard view hacking MIT as a virtuous act?

Definitely true -- I thought it was a relevant bit of information so I quoted it.

That's fair, I was just refuting what your comment implied, that it was an unethical action.

(I understand that you may not have intended to imply it. Just wanted to put that out there.)

So since we don't disagree on that: Do you think what he did is ethical?

edit: also, boston.com says he "is a fellow at Harvard University’s Center for Ethics". A quick google says that he was at least as of October 2010:


I guess it's the perfect occasion to link to the work of Eric Schwitzgebel: http://schwitzsplintersethicsprofs.blogspot.com/

"Do Ethicists Steal More Books?" is particularly tasty.

It's hard to trivialize downloading 4 million articles using a web scraper's bag of tricks and then some. If the information was publicly accessible these charges wouldn't stand unless he tried to distribute it. If it was something so commendable, why would you cloak your activities or go to a different university to do your dirt instead of Harvard (where you're a fellow of some sort) or Stanford (where you attended)? Regardless of the motives and ideals or the excess of the charges, this isn't one of those hapless grandma versus the RIAA stories. He must have known what he was doing.

The pricing of, and restrictions on, the dissemination of academic papers are by any rational evaluation nothing short of ridiculous, and they contradict the academic ideal of free exchange of ideas for the advancement of knowledge. However, the history of scholarship is also a history of patronage, academic politics and in-fighting for greater prestige.

It's sad that someone like Aaron has to be treated like a domestic terrorist. It's sad that we have a vindictive justice system willing to flout the Constitution in this day and age with what effectively amounts to cruel and unusual punishment so they can "make an example" out of someone.

However, it's no one's fault that Aaron was so emboldened to take this initiative without sufficiently ensuring that he would be free from criminal prosecution.

Am I alone in thinking that these "hacktivists" will only prompt government to push more frivolous data theft laws and heavier punishment for offenses that may one day victimize hapless, innocent people? It's going to get a lot worse before it gets better.

> However, it's no one's fault that Aaron was so emboldened to take this initiative without sufficiently ensuring that he would be free from criminal prosecution.

Maybe he did it knowing full well what the consequences might be. He seems to be a pretty principled guy.

Consider showing your support for Aaron here:


Demand Progress is an organization Aaron co-founded. They've done some great watchdog work on things like PROTECT IP, the Patriot Act, the Internet Blacklist Bill etc.

There is a riser closet in my office with various internet service providers' wiring in it feeding the entire building.

If I were to enter this riser closet and plug my laptop into one of these lines, I would be charged with theft of service and deservedly be sent to jail. It doesn't matter if the door is locked or not. It doesn't matter what kind of security they put in place or not. It doesn't matter if I only sent a few bytes of data on their network and didn't harm anyone else's service. It is still theft of service.

Why the hell is MIT stashing information in closed systems in the first place? I thought the idea (OCW etc.) was to enable more people to learn, participate and benefit from the work of academics and researchers. Hell, I even donate a few hundred bucks every now and then to OCW.

It is mind boggling how the supposedly smart people are not getting their heads out of their asses so late in a world frighteningly short on distribution of knowledge that can be effectively used to solve the wicked problems that are crippling it for so long.

We really need a global, openly accessible knowledge network and a platform where all eligible can contribute and collaborate to research at least when it comes to areas that impact human society at large - medicines, natural resources etc. It is hard otherwise to see how things like Cancer and Energy shortage can be tackled.

The papers are from JSTOR, not MIT. MIT pays for a subscription to JSTOR like many other universities do.

But is it clear that MIT does not also store its research papers in JSTOR instead of making them publicly available? Reading the article I did not get that impression.

While many researchers at MIT publish in journals that sometimes have restrictions on the distribution of their papers (exclusivity, etc.), most MIT publications are issued via:

1) DSpace http://dspace.mit.edu (an open publishing platform for academic material. Most all theses produced by MIT researchers can be accessed by the public here)

2) as an MIT technical report (mostly published via dspace)

3) as an MIT "Working Paper"

You can find more information here: http://libraries.mit.edu/docs/research-publications.html

Certainly, a huge number of MIT publications do end up being mirrored in JSTOR, but they are usually published via whatever journal or conference proceeding they are accepted to first. If an author cannot find a journal that is willing to publish their paper, then they will probably issue it as an MIT technical report.

Ok, that's actually useful information. I find MIT's approach fairly reasonable - maybe all others should follow suit and JSTOR-like walls would not be necessary some day.

I donate to OCW as well (LOVE OCW) but I don't think any of this information is on a closed system. I believe JSTOR allowed all MIT IP addresses access (for free) to every article in their system.

Aaron's downloads came from an MIT address to the JSTOR database.

I agree with you that knowledge needs to be more accessible, but this is not the method to achieve it.

can someone please explain what the deal is here for us uninitiated? sounds like they are throwing the book at him for stealing books? seriously? why is the prosecution being so aggressive? did he profit from it or something? this sounds so petty

The cynic in me suspects that he has annoyed someone in power with his many other political activities (Demand Progress etc). When I first had contact with Aaron he was just this guy who wrote rss2email, it's been very inspiring to see him move on to hacking politics.

He violated Terms of Service for a large database of journals at an egregious rate. If he had stolen say...10,000 articles this may be a non-issue. The fact that they can tie 4,000,000 articles to him is what makes this so bad.

The other part that is extremely bad is his measure to continually evade MIT MAC banning through spoofing. That measure of evasion proves, to some extent, his intent to steal and is a form of computer fraud.

I wouldn't be too surprised if him causing a month (months?) long interruption of JSTOR for an entire campus deeply involved in government contracts was also part of the ire.

He posted on his blog yesterday that there would be a "major announcement" on blog.demandprogress.org today, but nothing has been posted.


He also wrote this yesterday: http://www.aaronsw.com/weblog/delegation

It doesn't seem directly related but still curious.

33. Swartz intended to distribute a significant portion of JSTOR’s archive of digitized journal articles through one or more file-sharing sites.

How do they know this? Has he said something to that effect?

--edited for formatting.

Well, I'm going to guess that the prosecution is putting that in there just in case they can get the jury to agree. They can claim anything they want, the jury has to decide if it's true without a doubt.

Um, I don't know, maybe because he wrote a manifesto to that effect and has circulated various methods for doing so? He was sponsoring a google group for article requests for awhile. Seems to be gone now.

Context: JSTOR blocks you automatically if you download articles in quick succession.
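To make "blocks you automatically" concrete, here is a minimal sketch of one common way a server can flag rapid-fire requests: a sliding-window counter per client. This is only an illustration under assumed numbers; nothing in the thread says how JSTOR's monitoring actually works, and the class name, the limit, and the window length below are all hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recently allowed requests

    def allow(self, now: float) -> bool:
        # Discard requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        # Refuse the request if the window is already full.
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True


# Assumed policy for illustration: 3 requests per 10 seconds.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
decisions = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 3.0)]
```

With those assumed settings the fourth request inside the window is refused, and requests are accepted again once the oldest one ages out.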

Wait a minute. All he needed was a guest account to access JSTOR? That's like saying, ANYONE IS ALLOWED TO DOWNLOAD FROM JSTOR. This isn't just bad security, this is no security.

Most academic journal repositories grant institutional access based on IP address blocks. Some institutions keep this narrow and force you to use an HTTP proxy, which gives them the ability to put additional institutional-level authentication in place. The upside is that you can access the journals from off-network, the downside is using the proxy can be a major pain. Other networks, e.g. MIT, are permissive with the IP restriction and don't mandate the use of one centralized proxy as long as you are on-network. I suppose that may change after this case.
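For anyone unfamiliar with IP-based institutional access, a rough sketch of the check is below. It only illustrates the idea of matching a client address against allowlisted blocks; the address ranges are reserved documentation ranges (not any real campus), and the function name is made up for this example.

```python
import ipaddress

# Hypothetical institutional blocks, using RFC 5737 documentation ranges.
CAMPUS_ALLOWLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def has_institutional_access(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted block."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in CAMPUS_ALLOWLIST)


on_campus = has_institutional_access("192.0.2.17")    # inside an allowlisted block
off_campus = has_institutional_access("203.0.113.5")  # outside every block
```

Note that a check like this says nothing about who the user is, only where the request comes from, which is why anyone on the network (guest or not) looks the same to the vendor.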

No, he connected to MIT's wireless network using a guest account. JSTOR grants full access to all MIT IP addresses.

Ballsy, considering his previous brush with the FBI over similar things: http://www.aaronsw.com/weblog/fbifile

From the JSTOR website (http://about.jstor.org/participate-jstor/worldwide-access)

"Our ultimate long-term objective is to make JSTOR available to everyone who wants access to it, while doing so in a way that ensures sustainability of the service."

Cynically, it seems like the bit about "ensures sustainability" can be translated as "we will aggressively prosecute in order to protect our bloated salaries."

The salaries of Ithaka Harbors Inc. (JSTOR's holding company), a supposedly not-for-profit organisation are indeed bloated. See pp. 7-8, 23, 26 of http://www.guidestar.org/FinDocuments/2009/133/857/2009-1338...

For those that don't like reading, the document describes multiple employees of the "nonprofit" with compensation in excess of 200k.

JSTOR's not doing any prosecution. In fact, if you read other comments / stories, it sounds like they don't want the DA to prosecute him.

I'm probably Advocating Crime but, couldn't a bunch of people coordinate and do what he was doing, over a year or two, distributed, from several universities? You totally could. The more, the less noticeable, and punishable.

edit: it wouldn't surprise me if something like that showed up; the problem has been highlighted, the legal issues made clearer, and JSTOR bloodied. Go Aaron.

There's a loosely-organized project at Wikimedia Commons to liberate at least the public-domain works locked up in JSTOR. This almost certainly violates JSTOR access policies, but once downloaded and stripped of their JSTOR title page, it's probably not illegal for the Wikimedia Foundation to host the result, since the result is in the public domain, even if ToS were violated in the document's acquisition.

See: http://commons.wikimedia.org/wiki/Category:Scanned_English_t...

JSTOR might watermark PDFs to prevent this, as some other similar services do.

So he allegedly goes out and buys a laptop just to do this heist...and then he blows his cover by doing a scrape fast enough to apparently bring down some of the MIT servers? Why was he in such a rush?

Most likely the servers crashed for whatever reason, and they looked at the logs and saw lots of recent accesses from a particular computer, and post hoc ergo propter hoc. Sysadmins often blame outlier users for crashes.

At the same time, he did this from Sept-Jan at full speed, and still only got half the archive... so..

JSTOR's statement http://about.jstor.org/news-events/news/jstor-statement-misu...

JSTOR Statement: Misuse Incident and Criminal Case

The United States Department of Justice announced today the criminal indictment of an individual, Aaron Swartz, on charges related to computer fraud and abuse stemming from his misuse of the JSTOR database. We have been subpoenaed by the United States Attorney’s Office in this case and are fully cooperating. While we cannot comment on this case, we would like to share background information about the incident and about our mission and work with the academic community and the public.

What Happened

Last fall and winter, JSTOR experienced a significant misuse of our database. A substantial portion of our publisher partners’ content was downloaded in an unauthorized fashion using the network at the Massachusetts Institute of Technology, one of our participating institutions. The content taken was systematically downloaded using an approach designed to avoid detection by our monitoring systems.

The downloaded content included over 4 million articles, book reviews, and other content from our publisher partner’s academic journals and other publications; it did not include any personally identifying information about JSTOR users.

We stopped this downloading activity, and the individual responsible, Mr. Swartz, was identified. We secured from Mr. Swartz the content that was taken, and received confirmation that the content was not and would not be used, copied, transferred, or distributed.

The criminal investigation and today’s indictment of Mr. Swartz has been directed by the United States Attorney’s Office.

Our Mission and Work

Our mission at JSTOR is supporting scholarly work and access to knowledge around the world. Faculty, teachers, and students at more than 7,000 institutions in 153 countries rely upon us for affordable and in some cases free access to content on JSTOR. Since our founding in 1995, we have digitized the complete back runs of nearly 1,400 academic journals from over 800 publishers. Our ultimate objective is to provide affordable access to scholarly content to anyone who needs it.

It is important to note that we support and encourage the legitimate use of large sets of content from JSTOR for research purposes. We regularly provide scholars with access to content for this purpose. Our Data for Research site (http://dfr.jstor.org) was established expressly to support text mining and other projects, and our Advanced Technologies Group is an eager collaborator with researchers in the academic community.

Even as we work to increase access, usage, and the impact of scholarship, we must also be responsible stewards of this content. We monitor usage to guard against unauthorized use of the material in JSTOR, which is how we became aware of this particular incident.

"Our mission at JSTOR is supporting scholarly work and access to knowledge around the world. Faculty, teachers, and students at more than 7,000 institutions in 153 countries rely upon us for affordable and in some cases free access to content on JSTOR. Since our founding in 1995, we have digitized the complete back runs of nearly 1,400 academic journals from over 800 publishers. Our ultimate objective is to provide affordable access to scholarly content to anyone who needs it."

AHAHAHAHAHAHAHA, at 50K a year? When they won't even sell it to institutions they don't consider proper schools? Somebody needs to re-evaluate their mission statement...

50k is cheap as hell. UMass Boston had 125,000 JSTOR downloads last year. Let's assume we pay $50k (close enough), that's 40 cents an article. This is cheap as hell compared to other databases that charge more than that for a single search!
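The arithmetic behind that 40-cent figure, as a quick sketch. The $50k fee is the commenter's stated approximation, not a confirmed price, and the function name is just for illustration.

```python
def cost_per_download(annual_fee_usd: float, downloads: int) -> float:
    """Average cost per downloaded article for one year of access."""
    return annual_fee_usd / downloads


# Figures from the comment above: roughly $50k per year, 125,000 downloads.
per_article = cost_per_download(50_000, 125_000)
print(f"${per_article:.2f} per article")
```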

It's further harming the possibilities of being an independent scholar or even reading academic literature outside academia, though, which I think is a significant detriment to academia. I'm not as worried about the $50k a university library has to pay, but if you're an independent scholar for even one year, it becomes clear how harmful to the research community JSTOR is.

Journals that used to sell archive access to individuals for, say, $50 or $100 annually, now won't sell you a membership at all, because to save money on hosting they've moved their subscription infrastructure to JSTOR, and JSTOR refuses to sell individual subscriptions. So you end up "stealing" your access; at various times I've gotten my JSTOR access via ssh -D proxies to a friendly grad student. Now I try to return the favor by providing such access to independent scholars where needed. This sort of gaming shouldn't be necessary with a non-profit organization that is supposed to be working in the public interest, though. Hell, public domain journals from the 18th century that were scanned using public grant money are locked up behind a JSTOR paywall!

Many university libraries allow you to join as a "friend" of the library, which typically costs ~$100/year and gives you full access to all library resources.

Most such libraries will restrict your use in some serious way (compared to what proper campus members get.)

For example, the University of Maryland will let you check out books and no more.

I see your point, but you could just as easily walk into your local university, use their network as a guest, and access millions of dollars worth of content for free. Most databases allow you to download PDFs so you can download a bunch of articles and take them home, legally. Proxying through your grad student friends is breaking the law.

Even among state universities, access to electronic journals for guests is becoming increasingly rare (usually prohibited in the license agreement). Also, if you look too much into how scholarly communications works, the idea of charging 50k is insane, since the vast majority of the cost of journals is actually paid for by universities in the first place.

edit: although I do agree with your point that with the current state of things the cost per usage of JSTOR is significantly less than most other electronic journal collections

I would be impressed if it's an actual trend that guests were being blocked from accessing licensed content.

I work in this field and I can say that almost all databases and journals have IP authentication, generally with the entire campus IP range whitelisted. If the vendors don't implement that, they generally use password-protected accounts. There are some alternatives such as uploading a list of barcodes, referring URLs (I'm not kidding, ProQuest for example allows this), Athens, Shibboleth, and a few others. These, however, are not commonly implemented by libraries because they are not widely adopted by vendors and require additional IT support that libraries simply don't have.

Therefore, unless libraries proxy all their users through their proxy servers (They don't for on-campus users. Usually, all links will go through the proxy, but if on campus the proxy will redirect you directly to the database to avoid the overhead), it would be near impossible to enforce the restriction of guests. I would wager that in many of these scenarios, it's all smoke and mirrors, and that access is really there if you know what you're doing (e.g. just visit the vendor site while on-campus without going through the library links).

I've been at at least one university (a smallish liberal-arts college) that used a separate foo_guest wifi network for guests, which dumped people into an IP range that wasn't in the database-access whitelist (enrolled students and faculty would connect to a different wifi network, using their username/password, that was whitelisted). I agree that that isn't usual practice, though, despite being technically required by some database agreements.

The configuration I've seen is to have not only guest accounts but also guest computers which are on a separate IP range, same with the guest wireless network. The important issue is actually what the details of your license agreement say regarding guest users. Some state universities will actually request for this to be specifically allowed, since in many states it's considered, even if unofficially, a right of taxpayers to have access if they need it.

But many state schools let you join the library even if you're not a student. e.g. http://www.lib.iastate.edu/info/6197 - $20/year for a non-affiliated member.

ISU's extramural borrower card doesn't allow holders remote access to (paid) academic databases.

"It's further harming the possibilities of being an independent scholar or even reading academic literature outside academia,"

And giving it away for free harms the possibilities of being an independent scholar as well. Do you seriously not understand that?

They are probably bound by contracts with the actual content owners which limits who they can give access to. Not saying it's right but even the price is probably not all their doing.

This seems contrary to Demand Progress's statement:

“It’s even more strange because the alleged victim has settled any claims against Aaron, explained they’ve suffered no loss or damage, and asked the government not to prosecute,” Segal added.

I got the impression Demand Progress was referring in that sentence to MIT as the "alleged victim", which seems slightly disingenuous to me. It also seems plausible that MIT would behave in the manner described. However I have not yet seen a statement from MIT that would corroborate this, either.

And even if it were true, the "alleged victims" don't get to decide whether the government prosecutes. The only cases where the victim can force the prosecution to abandon their case are those crimes such as spousal abuse where the only hard evidence is the testimony of the victim themselves. In other cases, the government can get all of the evidence it needs through subpoena.

That sentence may have been updated to clarify: “It’s even more strange because JSTOR has settled any claims against Aaron, explained they’ve suffered no loss or damage, and asked the government not to prosecute,” Segal added.

Yeah, I haven't seen anything that backs up Demand Progress's position.

There's a horrible misuse of the word "taken", when "copied" is so much more accurate.

Will this be sufficient to ward off a hackstorm against JSTOR?

This sounds very hypocritical. If their "Mission and Work" is to support scholarly work and academic usage, and encourage usage of large sets of data, why are they suing Aaron's ass instead of working with him to further understand what were his needs that their system was lacking and how they could reduce that gap? Why else would Aaron hack their system if not for research?

Sounds like instead of spending money in improving their system and security, they found a more beneficial (to their eyes) approach: charging Aaron will instead set the example.

Edit: see @mbreese's correction.

Edit 2: looks like my comment missed some important pieces of the puzzle. See further down the thread.

Here's what's supposed to happen: A student on campus wants to do a research project analyzing a large number of journal articles, or even just the metadata around a large number of journal articles. They approach their librarian, the librarian approaches the journal vendor (JSTOR or someone else), and everyone works together to find a way to get the student their data. Maybe the vendor hands over a special dataset. Maybe they give them back-end access to their database. Maybe they allow the student to run a Python script against their website, but only in the middle of the night so as not to slow down service for other users.

Here's what sometimes happens instead: The student writes and runs a clever script, the vendor notices that their servers have slowed due to automated script activity on their webpages, shuts down access from that IP address, and lets the school know. University IT staff and librarians drop what they're doing and try to track down the party responsible. Once they've been identified, the nice librarian has to have a talk with the student about what's permitted under the university's license agreement with the publisher, and together they go to the data vendor to ask forgiveness and permission. They usually get it.

Here's what Aaron did: He walked onto the MIT campus, set up a script not to analyze metadata but to actually download large numbers of documents, and when his IP was blocked, he used traditional hacking as well as Johnny Long-style "no-tech hacking" to get around it. http://video.google.com/videoplay?docid=-2160824376898701015

Publishers, rightly or wrongly, assume that someone systematically downloading entire journal runs is intending to set up a shadow database to give away their content for free. The feds seem to agree, in this case.

The thing that gets me is that JSTOR was more than reasonable in this case. They didn't immediately shut down the whole campus (like some overly-aggressive publishers do), but started with the IP addresses involved. When it didn't stop, they had to cut off access completely. Aaron, who isn't even a student at MIT, managed to kick the whole campus off JSTOR for weeks, and as soon as they restored access, he went at it again. If I were a librarian at MIT, I'd want the book thrown at him.

It's outright sociopathic to suggest that justice for a few weeks inconvenience is 35 years in prison and a million dollar fine.

It's excessive, and I don't think that's what he'll serve. But he was warned multiple times that MIT and JSTOR didn't want him doing what he was doing, and he continued doing it, so as far as I can tell he's getting exactly what he asked for. He probably thinks he's the Rosa Parks of scholarly communication. I think he's reckless and grandiose.

They aren't suing him... this is a criminal case:

The criminal investigation and today’s indictment of Mr. Swartz has been directed by the United States Attorney’s Office

You're right, thanks. Replace "suing" by "charging" and strike "make them a little richer and". The rest of the point still stands.

Swartz's organization seems to pretty clearly claim that both MIT and JSTOR asked the feds not to charge him. While JSTOR's statement doesn't actually say thatº, it certainly seems plausible to me since they very clearly say that they privately engaged and dealt with the matter with Swartz. You simply don't do that if you're pursuing criminal charges.

º I would probably be hesitant to say that publicly as well, considering the feds came down with a monster 35-year indictment against a juicy target. The "victim" announcing on Day 1 that they don't support the charges would not be accepted very charitably by federal prosecutors.

Unless JSTOR is itself at risk of prosecution for something, it doesn't make a whole lot of sense to me that they'd care how Federal prosecutors felt about their public statements.

I immediately thought of that book as well. As Silverglate frequently points out, the federal attorney is using extreme penalties to force a plea deal in this case. It's de facto denial of a fair trial when the Sword of Damocles (i.e. 35 years in prison) is hanging over your head.

Oh well we can't be upsetting the government. Carry on.

That does shed a new light to their statement. Seems like both parties made some lousy calls which led to a very ugly situation.

I don't understand the legal mechanisms that could lead to JSTOR not showing any public support for Aaron, or whether those exist, but I hope they will show some support during the process (as well as MIT). Gov shouldn't put digital theft in the same basket as physical theft.

There are no legal mechanisms, but showing support for someone that intended to destroy the value of their publisher's copyright would probably displease their partners...

Err, I should have said "someone who is accused of intending..."

I don't think it's "them" at all. They sound as if they'd be happier if the incident was already closed, as they thought it was.

Victims don't charge.

Why did Aaron hack their system instead of working with them to further understand what were his needs that their system was lacking and how they could reduce that gap?

Is the DOJ trying to 'make an example' here? JSTOR and MIT aren't pursuing this, the Feds are.

I don't understand why he was so desperate to access it via MIT. There are dozens of libraries in the Boston area with access to JSTOR with guest access to their network. If he wasn't in such a rush, he could have easily bounced around campuses and likely have avoided detection.

Perhaps to have enough legitimate traffic surrounding you to mask the fact that you're trying to download every single article.

I admire Aaron's persistence. Sadly it was quite rude of him to repeatedly crash JSTOR's servers. If only he'd throttled his script back a bit.

Liberating those documents from JSTOR would have been quite a gift to society.

If the indictment is accurate, he did throttle back his script a bit, and continued to run it for several months after the last JSTOR server crash or blockage.

Why would he do this? What is the purpose of having all those documents?

This is from his personal site, http://www.aaronsw.com:

He is the author of numerous articles on a variety of topics, especially the corrupting influence of big money on institutions including nonprofits, the media, politics, and public opinion. In conjunction with Shireen Barday, he downloaded and analyzed 441,170 law review articles to determine the source of their funding; the results were published in the Stanford Law Review. From 2010-11, he researched these topics as a Fellow at the Harvard Ethics Center Lab on Institutional Corruption.

He has also assisted many other researchers in collecting and analyzing large data sets with theinfo.org. His landmark analysis of Wikipedia, Who Writes Wikipedia?, has been widely cited.

He liberated millions of documents from the PACER legal archive a while back:


All of which were public domain court documents, it should be noted.

He has a history of collecting and analyzing large data sets. This sounds about par for the course.

33. Swartz intended to distribute a significant portion of JSTOR’s archive of digitized journal articles through one or more file-sharing sites.

Interestingly, while the Grand Jury Indictment does discuss evidence for many of the other accusations, they do not discuss any evidence for this intent-to-distribute claim.

Conjecture? Since his previous projects seem to involve downloading and analyzing huge sets of data.

This does not necessarily mean shovelling the articles out to file-sharing sites.

Information wants to be free.

Information doesn't want to be anthropormphized.

"anthropormphized" --> "anthropomorphi[zs]ed", from anthropos = "human" + morphe = "form" + ize/ise = "make", i.e. "make the form of a human".
