I'll try to answer any questions that pop up here as well.
In addition to the structured text we're currently serving through the API, we also have 300 DPI color scans and per-word coordinates and confidence scores, so there's a lot more we can do with the OCR data that isn't exposed yet.
What do you think would happen if somebody with a research agreement downloaded everything and released it all to the public? You'd probably then find that the two situations are very similar.
Now, what would happen if you breached the research agreement? You’d probably be sued for breach of contract. But that’s not what case.law is doing. It’s abiding by everyone’s legal rights: the original publishers who collected, archived, indexed, and added annotations to the case law, and the folks who helped digitize versions that could be freely distributed. The government paid for the courts so the cases are free, but it didn’t pay for all those other things and they aren’t free.
> If someone did a very similar thing here to what Swartz did with JSTOR, this situation would then be very similar to what Swartz did with JSTOR.
Analogously, one could ask what would happen if someone avoided their taxes vs. evaded their taxes. Both could be seen as morally the same act, but the legal consequences are different.
PACER is literally a read-only view into the same databases courts and lawyers use to file documents and orders in cases. Some people want a mass-publishing system for court documents, and maybe we should build such a thing. But calls to abuse PACER for that purpose are just an end-run around the political challenges of getting the government to spend public money building such a system.
If I understand you correctly there is such a thing. https://www.courtlistener.com/recap/ is a public archive populated by browser plugins by paid PACER users.
The issue is that PACER is designed primarily for attorneys. That's why the usage fees are so high--it's basically a tax on attorneys that goes to funding the operations of the courts. (Pro se individuals are entitled to receive filings in their cases for free.)
The open access folks have a legitimate point that PACER makes it hard for the public to access those same documents. But the solution to that isn't to abuse PACER. If we think everyone should have free access to these documents, the solution is to build a website where these things are published. And, since that would undercut the value of PACER, arrangements would have to be made to replace that revenue with general appropriations.
Note that the reason we would want to do this is that these are public records, not that they constitute "the law." Court opinions with precedential value are already published on courts' websites in PDF format. What PACER contains is everything else.
Makes me wonder how much overreach goes on and we don't even know about it...
Ortiz and Heymann are definitely horrible people in my eyes but we have to think of them as responding to incentives. They saw they had an opportunity to pad their numbers and went for it. I don't think we have done anything to fix the core issue, which I think is how do we judge the performance of a prosecutor?
I'll read a bit more on the site, but offhand does anyone know if this is an ongoing effort, in that new (2019 and beyond) cases will be brought in as well?
We're hopeful that all courts will switch to official digital-first publishing over the next few years, as a few courts already have. Once the transition is complete, it might make sense for us to go back and fill in the gap volumes.
They were all unwilling.
PACER is at least somewhat reasonable, since we did not offer to pay them the $145 million a year they were making at the time, and they felt Congress would kill them if they gave up that revenue source, which is probably not wrong.
(For the others, what we offered was much more than they were making.)
So I'm not sure why you have such hope.
Of the states that have moved to digital-first publishing, just about all have struck agreements with Lexis, etc. whereby they have token "free access" sites and the data is otherwise still locked up.
Hey, I work at the Library Innovation Lab -- being hopeful about open access scenarios is just one of the services I provide. :)
But basically I'm hopeful in this instance because (a) there's less and less incentive for commercial publishers to try to control this particular low-bandwidth stream of public domain text; (b) there are more and more platforms that would benefit from a standard open feed; and (c) the courts have had a lot more time to think about it (in the grand scheme of how old courts are vs. how old the internet is) and to see other courts try it first.
On that last point, we have a dozen or so state supreme courts already using another service of ours, Perma.cc, so we do have some idea how they think about adopting new technology.
Would love to chat more with you or anyone else who's been thinking about how to crack this -- your experience sounds really interesting, and "hopeful" doesn't mean I think it'll be easy. Contact info is in my profile.