
In memory of Aaron, bulk XML of every federal and state law and court ruling - friendofaclu
http://webpolicy.org/category/empirical-law/legal-data/
======
slapshot
Which versions are these? I ask because, to lawyers, pagination matters -- a
LOT. Lawyers refer to a case by the book and page number, and refer to parts
of a case by the page numbers. It's probably not a great system, but the
courts don't have anchors in text for better or worse. Also, what version is
it as to corrections? Most courts issue typo changes after the fact. Most are
minor, but a few do materially affect the meaning (such as changing the page
numbers of the parts of a different case that have been overruled).

Don't get me wrong, it's better than nothing, but to get any buy-in from
existing lawyers these issues need to be addressed. I'm concerned that they
haven't on the site.

~~~
rayiner
> Lawyers refer to a case by the book and page number, and refer to parts of a
> case by the page numbers. It's probably not a great system [...]

It's a fantastic system (and a wonderful demonstration of the advantages of
immutable graph data structures). You can read cases from a century ago and
still find every cited reference. Hyperlinking pales in comparison.
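
To make the volume/reporter/page scheme concrete, here is a minimal Python sketch that splits a citation such as `410 U.S. 113, 120` into volume, reporter, first page, and an optional pin-cite page. The `Citation` type, the `parse_citation` name, and the regex are illustrative assumptions; this is nowhere near a complete Blue Book parser.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    volume: int               # the "book" number
    reporter: str             # abbreviated reporter name, e.g. "U.S." or "F.2d"
    first_page: int           # page on which the case begins
    pin_page: Optional[int]   # specific page being cited, if any


# Hypothetical pattern: "<volume> <reporter> <page>[, <pin page>]"
_CITE_RE = re.compile(
    r"^\s*(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Za-z][A-Za-z0-9.\s]*?)\s+"
    r"(?P<page>\d+)"
    r"(?:,\s*(?P<pin>\d+))?\s*$"
)


def parse_citation(text: str) -> Optional[Citation]:
    """Return the parts of a simple reporter citation, or None if it does not match."""
    m = _CITE_RE.match(text)
    if m is None:
        return None
    pin = m.group("pin")
    return Citation(
        volume=int(m.group("volume")),
        reporter=m.group("reporter").strip(),
        first_page=int(m.group("page")),
        pin_page=int(pin) if pin else None,
    )
```

Because the printed volume never changes, the (volume, reporter, page) triple acts as a stable key, which is why the references still resolve a century later.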

~~~
gnu8
It's not really hyperlinking that fails, but the fact that web sites (the ones
that don't disappear without a trace that is) don't maintain their structure
over time. Many don't even attempt to provide meaningful forwards, instead
just dumping you back to the front page. Microsoft and Oracle are particularly
egregious offenders.

What we need is a technical mechanism for embedding referenced source
documents into a document, in a way that is as easy as hyperlinking to add the
references and follow them. Probably also a new fair-use provision in
copyright law as well.

~~~
wlesieutre
I think it's fair to say that hyperlinking fails. Sites not maintaining
their structure over time would be the cause of the failure, but the link
still doesn't work... Semantics maybe, but eh.

Agreed on having a way to embed documents. Especially in a way that supported
some kind of signing. If I could have a reputable third party (web.archive.org
or anybody else) sign an embedded snippet of a page and say "Yes, this was
actually posted on X website at Y time" that would be fantastic.
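
A rough Python sketch of what such an attestation could look like, using only the standard library. Everything here is an assumption for illustration: the `attest_snippet` and `verify_attestation` names, the record layout, and above all the use of HMAC with a shared secret. A real third party would sign with a public-key scheme so that anyone could verify without holding the key.

```python
import hashlib
import hmac
import json


def attest_snippet(snippet: str, url: str, timestamp: str, key: bytes) -> dict:
    """Build a record saying 'this text was seen at url at timestamp' and sign it."""
    record = {
        "url": url,
        "timestamp": timestamp,
        "sha256": hashlib.sha256(snippet.encode("utf-8")).hexdigest(),
    }
    # Serialize deterministically so signer and verifier hash the same bytes.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    # HMAC is a stand-in for a real public-key signature (assumption).
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_attestation(snippet: str, record: dict, key: bytes) -> bool:
    """Check that the snippet matches the record and the record is untampered."""
    claimed = {k: v for k, v in record.items() if k != "signature"}
    digest = hashlib.sha256(snippet.encode("utf-8")).hexdigest()
    if claimed.get("sha256") != digest:
        return False
    payload = json.dumps(claimed, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))
```

The record travels with the embedded snippet, so changing the text, the URL, or the timestamp makes verification fail.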

------
tgflynn
A few questions come to mind that aren't answered by perusing the site, though
I haven't yet looked at the downloadable files.

1) How comprehensive are the court decisions? For example, which Federal
Courts are covered and for what time periods? If there are variations in
coverage of state courts, what are the high, low, and typical cases of coverage,
both for dates and court levels?

2) How was the court decision data obtained? I was under the impression that
there were significant obstacles to obtaining much of this data since, although
statutes are available freely online for many jurisdictions, access to court
decisions is typically very costly. I once paid several hundred dollars for a
month's access to NYS court decisions and I believe that service no longer
exists, having been replaced by much more expensive long term plans that are
out of reach of anyone except law firms or large corporations.

Getting court decisions online for free or at an affordable cost would be of
great benefit to anyone needing/wanting to do legal research in the US and
would help improve the increasingly dismal state of democracy in this country.

~~~
MWil
When I get a chance to finish the 6GB of the District Courts I'll try and
remember to come back and update you all

However, I can say that I extracted my state's cases and it came out to 100k+
so it's certainly more comprehensive than any other free data source (for my
state)

EDIT: Downloads went down (Dropbox) right as I posted so I reached out to
author to see how to get my hands on the rest of the files

FINAL EDIT: See my note below about him being on vacation and taking a look
when he gets back near better internet.

------
panarky
This is awesome, thank you for sharing these. I was able to get a couple
states, but it looks like Dropbox has cut you off.

    
    
      Connecting to dl.dropboxusercontent.com
      (dl.dropboxusercontent.com)|23.23.88.93|:443... connected.
      HTTP request sent, awaiting response... 509 Bandwidth Error
      2014-01-08 23:36:42 ERROR 509: Bandwidth Error.
    

Dropbox limits personal accounts to 20GB per day of public sharing.

[https://www.dropbox.com/help/45/en](https://www.dropbox.com/help/45/en)

~~~
MWil
He said he pays for a pro account, not sure how much they offer but we
exceeded that apparently.

------
FireBeyond
A sincere question, is the collation of this material and its publication,
/actually/ dedicated to Aaron Swartz (as I see
[https://www.google.com/#q=aaron+swartz+site:webpolicy.org&sa...](https://www.google.com/#q=aaron+swartz+site:webpolicy.org&safe=off)
shows zero results) or rather, editorializing / opining by the submitter?

~~~
tzs
It was posted by a brand new account, and this is the only activity on the
account. Probably doesn't know the rule about changing titles, and stuck in
the Swartz reference to get more people to click.

I'm a little surprised the moderators haven't fixed it yet.

------
thehooplehead
It looks like Dropbox just shut down the account's transfers. I take it a
couple dozen people started mass-downloading each state's data. This would be
the ideal use for a torrent network, right?

------
MWil
I've emailed Jonathan at his Stanford email to ask him to put it up as a
torrent

Will update if I hear back

~~~
MWil
UPDATE

I just heard back and he's on vacation (go figure!).

He says he thought he had a pretty generous amount of traffic but he'll take a
look when he can.

~~~
pessimizer
Torrents are definitely the way to deal with data this size.

------
NatW
See [http://freelawproject.org/](http://freelawproject.org/) for excellent
free/open source legal data.

~~~
esbranson
I don't think it provides state law, does it? Or state court decisions? Only
federal court opinions, right?

~~~
anseljh
Free Law Project does have some state courts.

------
esbranson
I hope those working on this type of material consider Akoma Ntoso
([http://www.akomantoso.org/](http://www.akomantoso.org/)), currently being
standardized as OASIS LegalDocML
([http://www.oasis-open.org/committees/legaldocml](http://www.oasis-open.org/committees/legaldocml)),
and maybe OASIS LegalXML
([https://www.oasis-open.org/committees/legalxml-courtfiling](https://www.oasis-open.org/committees/legalxml-courtfiling)).

I think Akoma Ntoso would make bulk access, maybe even piecemeal API access as
with other similar works like this, easier for consumption (think NLTK). The
Italian Senate (the Senate in Rome) uses it, the Library of Congress has
introduced some "data challenges" using it as well, and I think it is the
future. Using a common data format / XML schema has its advantages.

------
_delirium
Is this related to the XML data collected by public.resource.org? E.g. Supreme
Court decisions are available in XML here:
[https://bulk.resource.org/courts.gov/c/US/](https://bulk.resource.org/courts.gov/c/US/)

~~~
esbranson
I don't know, but Public.Resource.Org only has legislation for a few states
and territories.

------
WaterSponge
Is there a free or paid service that has this type of data and an API for
accessing it?

~~~
MWil
There's CourtListener and I'm sure several others.

[http://freelawproject.org/](http://freelawproject.org/)

I'm building a product and API that would take advantage of the whole
collection but it's not ready for primetime yet.

~~~
WaterSponge
Thanks... I ask because I wonder if law firms do any data analysis (e.g.
sentiment analysis) across all court opinions, judges, attorneys and outcomes.
Also whether states have data warehouses to review the application of laws,
local and federal.

~~~
anseljh
The Administrative Office of the US Courts and the Federal Judicial Center
collect and publish some statistics, but they're pretty basic. For example,
they collect "case type" as a single field with a single value. Most cases
have more than one kind of claim, so this is incredibly under-inclusive. Also
there aren't codes for lots of kinds of cases.

[http://www.uscourts.gov/FederalCourts/UnderstandingtheFedera...](http://www.uscourts.gov/FederalCourts/UnderstandingtheFederalCourts/AdministrativeOffice.aspx)
[http://www.fjc.gov/](http://www.fjc.gov/)

States are a grab bag but generally statistics-poor.

------
pseingatl
Not so long ago, this collection would have been priceless. 10-15 years ago
there was an article in Wired about efforts to obtain access to case law,
which was pretty much locked down by West Publishing and Lexis/Nexis. So a few
comments:

1) To make this set usable from a practical point of view you have to know
when it starts and finishes. "[E]very federal court ruling" is a bold
statement. Federal Reporter Third? All 1000 volumes of F2d? What about the
original Federal Reporter? F.Supp.? Not all federal district court decisions
are published. Since our federal courts have become criminal courts (starting
in the 1980's) most of the written decisions will be at the appellate level.
What about "Do Not Publish" opinions? There are thousands of them and they are
still useful. Usually only DoJ has copies.

2) Not having everything is not critical to the practical value of the set. In
the 1990's a West salesman would tell you that there was no need to buy
anything before 500 F2d if you were trying to put together a small federal
library. For most states they would try to sell you everything, except perhaps
New York, California and a few others. The issue is updating. Florida updates
(or used to) its appellate decisions on a monthly basis. You could sign up and
they would send you a zip file every month. I don't know if all states do
this. The problem of recency is a major one. A case could have been decided
yesterday but you won't find out about it for a month. You can fix the problem
on appeal--theoretically, assuming a client who wants to pay--because judges
will not, except in rare cases, revisit older decisions they have made because
case law that was not available at the time was dispositive.

3\. The issue of citing to a particular page of a decision in addition to the
official citation is not a huge problem. In many states, appellate decisions
are relatively short and court rules have provided for the use of just the
official citation. Cites to new Westlaw and Lexis cases do not have page
numbers. When page numbers are unavailable, you can cite them as ( U.S.
)(2014) [my Blue Book syntax is probably a little off here]. If you cite an
unpublished opinion you normally have to provide the judge and your
counterparty a copy of the decision.

4\. FLITE was the U.S. Air Force's effort to computerize case law in the
1980's. Westlaw and Lexis fought ferociously to prevent this database from
being released to the public. They were successful. The same is the case with
JURIS, a DoJ caselaw database. Now there are several providers (such as
Fastcase) which compete with Westlaw and Juris. Access to PACER, the U.S.
courts' database of cases, is limited. Efforts to mass download the database
have been frustrated. The courts use PACER as a revenue tool. Also, criminal
cases at the district court level are not on PACER (unless this has changed)
supposedly to protect informants. So it would be interesting to know how this
database was obtained.

5\. Putting aside the practical value of this database, once the extent of the
content is established, it could have real value for researchers. Could it be
used to spot trends in the law? I wonder what might be shown if tools to
measure things like historical market performance were used to analyze the
database. You could see all sorts of data points for terms like "Dalkon
Shield" or "asbestos" occurring within specific time ranges. There is
definitely a "me too" aspect to the law. And while judges make law all the
time, they have no control (usually) over the cases brought to them. Do cases
involving "terrorists" match the pattern of cases involving "communists"? Or,
say in the period 1910-1920, "Germans"? On a practical level, what is the
statistical incidence of cases involving the Statute of Frauds? The "ancient
document" exception to the hearsay rule? Are criminal conversation causes of
action really coming back? If historically the incidence of data points A,
then B, always led to C, can an analysis of such points today be of any use in
predicting future decisions?

Just a few thoughts.
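
As a rough illustration of point 5, here is a small Python sketch that counts, per year, how many opinions mention a given term. The input shape (an iterable of `(year, text)` pairs) and the `term_counts_by_year` name are assumptions; nothing here reflects how the actual XML dump is structured, so extracting the year and text from it is left out.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple


def term_counts_by_year(opinions: Iterable[Tuple[int, str]], term: str) -> Dict[int, int]:
    """Count opinions per year whose text contains `term` (case-insensitive, whole words)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts: Counter = Counter()
    for year, text in opinions:
        # Count each opinion once, however many times the term appears in it.
        if pattern.search(text):
            counts[year] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up sample data, purely to show the call shape.
    sample = [
        (1975, "the Dalkon Shield litigation"),
        (1975, "asbestos exposure at the plant"),
        (1976, "dalkon shield again"),
        (1976, "further Dalkon Shield claims"),
    ]
    print(term_counts_by_year(sample, "Dalkon Shield"))  # {1975: 1, 1976: 2}
```

Dividing each year's count by the total number of opinions for that year would give an incidence rate rather than a raw count, which matters if coverage varies by period.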

~~~
esbranson
A major part of the work at hand:

> federal and state law

Your entire post is about:

> case law

Suffice to say, those in the biz tend to focus mostly on case law, probably
because they know the basics of statutory and common law, case law is not
codified, and (all) case law is not offered for free even on horribly
designed, practically useless government websites.

But, at least for my purposes, state legislation and statutory codifications
(and their regulatory counterparts) are also very important to have bulk
access to. Even outdated materials are useful, as once I know a section is
relevant (because I can run complicated queries against entire datasets), I
can begin research using more arcane methods (such as government websites and
printed materials.) My treks through the California Codes and the California
Code of Regulations would not have been possible without bulk access (I had to
do it myself of course, after the Legislative Counsel got forced by CFAC/FAC
and MAPLight.org to release the DB), and there was little case law on the
issues I was doing research on (that I knew or know of) to guide me.

Outdated material may not be so useful for practicing lawyers, but it's
_extremely_ useful for the 99%.

~~~
pseingatl
Unfortunately, in the U.S. statutory law is of limited utility without the case
law which, in interpreting it, modifies it. I agree that there are all sorts of
forgotten gems to be mined--such as when the U.S. Congress established a
church.

~~~
esbranson
Unfortunately, the case law is also of limited utility without the
statutory/regulatory law underlying it.

~~~
pseingatl
Statutes play less of a role than you would think. I'm not saying that they
are unnecessary.

------
anseljh
I'd love for the title of this post to be accurate, but it's not. Many court
rulings are available only in sliced tree format for a modest copying fee from
a clerk behind a glass window. Sorry.

------
joshuaheard
This is fantastic! Now you need a script to convert the daily slip opinions
from the courts and the updated statutes from the legislatures to add them to
the databases to keep the content current.

------
chris_wot
Has someone mirrored this?

------
dynamic99
Links are down...

