
Case law is set free; what next? - anigbrowl
http://googlescholar.blogspot.com/2014/10/caselaw-is-set-free-what-next.html
======
lalawyer14
Google deserves plenty of credit for Scholar. It's the best public legal
research system out there.

Let's not pretend, though, that Google Scholar has "set free" court opinions
or offers "open access." You're trapped in Google's ecosystem. You can't
download opinions in bulk for any reason, whether it's legal practice or
academic analysis. You certainly can't build your own legal research system.

The best "open access" resource I've found, oddly enough, is bulk XML from a
Stanford grad student. It includes over 10 million documents in a fairly
standard format.

[http://webpolicy.org/2012/12/28/advancing-empirical-legal-
sc...](http://webpolicy.org/2012/12/28/advancing-empirical-legal-scholarship/)

[http://webpolicy.org/2013/12/29/advancing-empirical-legal-
sc...](http://webpolicy.org/2013/12/29/advancing-empirical-legal-scholarship-
state-materials/)

[http://webpolicy.org/2013/05/03/advancing-empirical-legal-
sc...](http://webpolicy.org/2013/05/03/advancing-empirical-legal-scholarship-
federal-appellate-opinions-and-rules/)

Also, it appears to be replicated on Carl Malamud's public.resource.org.

[https://law.resource.org/pub/us/case/federal/](https://law.resource.org/pub/us/case/federal/)

[https://law.resource.org/pub/us/case/state/](https://law.resource.org/pub/us/case/state/)

------
thinkcomp
Here are some suggestions for what's next:

\- Free access to all of PACER:
[http://www.plainsite.org/dockets/29himg3wm/california-
northe...](http://www.plainsite.org/dockets/29himg3wm/california-northern-
district-court/think-computer-foundation-et-al-v-administrative-office-of-the-
united-states-courts-et-al/)

\- Vendor-agnostic legal citations:
[http://www.plainsite.org/articles/20140115/a-modest-
proposal...](http://www.plainsite.org/articles/20140115/a-modest-proposal-
modern-legal-citations/)

\- Standardized digital legal opinions, motions, and other documents:
[http://www.plainsite.org/articles/20140116/a-notsomodest-
pro...](http://www.plainsite.org/articles/20140116/a-notsomodest-proposal-no-
more-legal-pdfs/)

\- Better free access to state court systems:
[http://www.plainsite.org/articles/20140114/xerox-strongly-
ur...](http://www.plainsite.org/articles/20140114/xerox-strongly-urges-you-
not-to-copy-this-data/)

\- Coming soon: the Bad Lawyer Database (BLDB), a compendium of lawyers who
have been formally sanctioned, disciplined, and themselves involved in
litigation concerning their conduct

------
mountainair
Please, oh please, make this a reality: "By classifying regulations using the
same system that science librarians use to organize papers in agriculture, we
can determine which scientific papers may form the rationale for particular
regulations, and link the regulations to the papers that explain the
underlying science."

I cannot begin to describe how incredibly useful this would be. Regulators
rely on the information provided by regulatory attorneys to craft their
policies, so it's critical that the attorneys have a deep understanding of the
issues. And paid legal research tools just don't/can't/won't provide that sort
of information. In this vein, it would be wonderful to include social sciences
- topics like economics and finance. Perhaps SSRN is a good option.

~~~
nl
I work in related field(s) in the technical sense (i.e., NLP/Knowledge
Engineering etc., but not related to law or legal services at all).

Is this a real problem? Is it really as simple as some ontology linking?

~~~
mountainair
It's a problem I face almost every day.

The role of a regulatory attorney is to explain to the regulatory agency why
they should do something (think: PPM in power plant emissions, high-frequency
trading controls, restrictions on flight paths and requirements for airport
construction, you name it, it's regulated). But the attorney doing the
explaining is trained in law, not in whatever the technical subject matter is.
So the attorney relies on his client's experts in the field for information.
But if an attorney doesn't have a basic understanding of the technical
aspects, he won't know what questions to ask to get the right details, and he
won't be able to make meaningful strategy decisions. In turn, most regulatory
agencies are required by law to make decisions based only on the documents and
information provided to them in the hearing/filing process. And all those
documents are prepared by attorneys. If the attorneys miss a detail, the
regulatory agency misses it too.

I would envision this sort of tool as providing background that will allow the
attorney to ask the right questions, rather than a complete education on the
technical subject matter.

~~~
nl
Thanks. I appreciate you taking the time to write an answer.

------
Brian-Puccio
Carl Malamud [0] has been a big advocate for making not just case law [1], but
public safety codes [2], tax records of non-profits [3] and other supposedly
"free" information free and easy to access as well. He has led efforts to
digitize, through scanning, documents that are only available in print format,
often at considerable expense [4], and he even had a (failed) Kickstarter to
make the safety codes of the world accessible [5].

[0]
[http://en.wikipedia.org/wiki/Carl_Malamud](http://en.wikipedia.org/wiki/Carl_Malamud)

[1] [https://law.resource.org](https://law.resource.org)

[2]
[https://law.resource.org/pub/us/code/safety.html](https://law.resource.org/pub/us/code/safety.html)

[3] [http://philanthropy.com/article/Open-Records-Activist-
Shuts/...](http://philanthropy.com/article/Open-Records-Activist-Shuts/147177)

[4] [https://yeswescan.org](https://yeswescan.org)

[5]
[https://www.kickstarter.com/projects/publicresource/public-s...](https://www.kickstarter.com/projects/publicresource/public-
safety-codes-of-the-world-stand-up-for-safe)

------
tzs
Some people might be curious how lawyers were able to do legal research
efficiently and effectively before computers and online legal databases. Here
is a brief overview. I'll write in the present tense for convenience, but keep
in mind that the "present" for this is before the age of computer law. Circa
1960, for instance.

First, let's look at statutes. The output of Congress is a stream of laws. The
first law that comes out of Congress #X and is signed by the President is
Public Law #X-1. The second one is P.L. #X-2, and so on. These public laws are
collected and published in a series of books called the "Statutes at Large".

The Statutes at Large is not a convenient tool for legal research, since it is
just a sequential listing of the laws passed by Congress and signed by the
President. There's no organization by subject, so in theory you would have to
look at everything starting at page 1 of the first volume up to the last page
of the present volume (and then look at laws that have been passed since the
last volume was printed...) and note which are relevant to the problem you are
researching.

To make it easier to find law, private publishers took the Public Laws and
organized them into a code. A code is basically a statement of the law,
organized by subject rather than chronologically. These privately published
codes had no official status. The official statement of the law remained the
Statutes at Large.

In 1874 Congress made an official code of the US laws, and they updated it in
1878. These were authoritative, by which I mean if the code said one thing and
the underlying Public Law said something else, the code version won. This
meant that if you were researching a topic in, say, 1890, you would start at
the 1878 code, and then only had to look at the Statutes at Large from between
then and 1890, instead of having to go back to the very beginning.

Congress got its act together in the '20s, and started producing an official
United States Code, and updating it every six years, with annual supplements.
It's important to note that the USC is not automatically the official
authoritative statement of the law. The law codified in the USC only
supersedes the Statutes at Large when Congress explicitly says so. Congress
does so on a title by title basis, by passing a law that basically says that
title X of the USC is now the complete statement of a particular area of law,
superseding all prior Public Laws in that area.

In addition to the official USC, published by the government, there are
unofficial versions. The most important is the United States Code Annotated,
published by a private company, West Publishing. It consists of the text of
the USC with, as you've probably guessed from the name, annotations
supplied by West. The annotations give for each section of the USC a list of
appellate cases that have cited or construed that part of the code, along
with a summary of what that case said. They also give legislative history
information. The USCA is immensely useful. Suppose you have a question about
fair use in copyright. You can read what the statute says in the USC or in
USCA. But in USCA you will also see summaries of hundreds of court cases that
have interpreted that statute--organized by West in a logical system based on
what aspect of fair use they were construing.

Case law is similar. Courts issue opinions, and those opinions are officially
collected and published in chronological order in a series of volumes. As with
the similar Statutes at Large, this is not the best form for research.

Private publishers stepped in to address this. West Publishing is again one of
the top players. They produced an outline of the law...a giant tree that
breaks the law down into multiple levels of categories and sub-categories and
assigns an identifier to each leaf. As court decisions come out, West takes
them, identifies which areas of the law they touch on, and writes a short
summary of what the court said about that area of law. This is published in a
series of volumes, which is indexed by the leaf identifiers. There are other
index volumes published that index this collection by time. The net result is
that if you are interested in some particular area of law, you can find it in
West's outline, get the relevant identifiers, and then go to West's index
books and get pointed to the relevant cases. You can then look those up in
West's books and read the notes to find out which cases you need to look at in
detail.

That same outline and identifier system in West's case reporters is also used
in West's USCA, so it all fits together.

Once you have found a relevant case, you need to find out if it is still good
law. You don't want to build your argument around an appellate court decision
that was later overturned by the Supreme Court. That is very embarrassing for
a lawyer.

For this, you turn to a series of books from the Frank Shepard Company. These
books list cases, and then tell you what other cases cite them and how they
cite them (e.g., agreed with them, overturned them, distinguished them, and so
on).

It sounds a bit awkward, but it actually works very well. With a good law
library, you can reasonably answer any legal question concerning the law in
your jurisdiction without too much flailing about. There are frequent
supplements to the books (each book has a slip in the cover to insert a
supplement), and cumulative indexes every so often so that you do not have to
go through all the index volumes sequentially to find things.

If you need to research something outside your jurisdiction, it can be harder.
A small county in Idaho, for instance, might not have Florida state court
decisions in its library, so if for some reason Florida case law is relevant
the lawyers or their researchers might need to go to a bigger library.
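
The workflow above (outline, leaf identifiers, digest summaries, then a citator check) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the identifiers, case names, summaries, and treatments are all made up, and the data structures are not West's or Shepard's actual formats.

```python
# Hypothetical outline: topic path -> leaf identifier (not a real numbering scheme).
OUTLINE = {
    ("Copyright", "Fair use", "Parody"): "COPY-101",
    ("Copyright", "Fair use", "Education"): "COPY-102",
}

# Hypothetical digest: leaf identifier -> list of (case, one-line summary).
DIGEST = {
    "COPY-101": [
        ("Case A", "Parody may qualify as fair use."),
        ("Case B", "Commercial parody weighed against fair use."),
    ],
}

# Hypothetical citator: case -> list of (later citing case, treatment).
CITATOR = {
    "Case A": [("Case C", "followed")],
    "Case B": [("Case D", "overturned")],
}


def find_leaves(*prefix):
    """Return the leaf identifiers filed under a topic path prefix."""
    return sorted(
        leaf for path, leaf in OUTLINE.items() if path[: len(prefix)] == prefix
    )


def still_good_law(case):
    """A case counts as good law here if no later case overturned it."""
    return all(treatment != "overturned" for _, treatment in CITATOR.get(case, []))


def research(*prefix):
    """Outline -> identifiers -> digest entries, dropping overturned cases."""
    results = []
    for leaf in find_leaves(*prefix):
        for case, summary in DIGEST.get(leaf, []):
            if still_good_law(case):
                results.append((leaf, case, summary))
    return results
```

Calling `research("Copyright", "Fair use")` walks both leaves, finds digest entries only under the first, and drops the hypothetical "Case B" because the citator shows it was overturned.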

~~~
w1ntermute
Have these systems been completely superseded by digital ones, or do many
lawyers continue to use paper-based systems for their research?

~~~
rayiner
The digital systems are direct continuations of the paper ones. For example,
federal court opinions are published on a court's website in PDF form. West
Publishing collects them and prints bound volumes of the Federal Reporter. In
parallel, it pulls the text of the opinions into the Westlaw online service,
and incorporates the pagination of the printed versions by embedding starred
page numbers into the text.

Also, West still categorizes the digital system using the index developed for
paper cases. Using this index is usually faster than using search queries. In
fact, search really doesn't work very well for legal research, and Google-like
free form search works particularly badly (it tends to take you to highly-
cited cases rather than highly-relevant ones). I generally start my research
by looking up the topic in the digital version of a printed treatise or legal
encyclopedia and then doing a reverse citation search of the listed cases.
Failing that, I'll do low-level search queries (e.g., "foo" within N words of
"bar").

But the only time I ever actually used a book was when I was working for a
judge. Every now and then West screws up and the pagination differs between
the digital and printed versions of the Federal Reporter. So we'd do our final
cite-check using the printed volumes.
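
For anyone wondering what a "foo" within N words of "bar" query amounts to, here is a minimal Python sketch. It assumes a naive tokenizer and treats "within N words" as "token positions differ by at most N"; the function name is made up, and this does not try to reproduce the connector syntax or exact semantics of any commercial service.

```python
import re


def within_n_words(text, a, b, n):
    """Return True if words a and b occur at most n tokens apart, in either order."""
    # Naive tokenization: lowercase runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)
```

With the sentence "The court held that fair use applies", `within_n_words(text, "court", "use", 4)` matches because the two words are four tokens apart, while the same query with `n=3` does not.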

~~~
Animats
The reason the West system works is that law is retrospective. Cases cite
older cases. Amendments to laws cite older laws. Much of the "annotation"
involves inverting those backlinks, so you can look up legislation and find
all the cases which reference it. That process is now, of course, automated.
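
As a rough sketch of what "inverting those backlinks" means, assuming you already have, for each document, the list of things it cites (the case names below are placeholders and the function name is invented):

```python
from collections import defaultdict


def invert_citations(cites):
    """Turn 'document -> what it cites' into 'authority -> who cites it'."""
    cited_by = defaultdict(list)
    for source, targets in cites.items():
        for target in targets:
            cited_by[target].append(source)
    # Sort for stable output and return a plain dict.
    return {target: sorted(sources) for target, sources in cited_by.items()}


# Placeholder input: two later cases citing a statute and an earlier case.
forward = {
    "Case C": ["17 U.S.C. § 107", "Case A"],
    "Case D": ["17 U.S.C. § 107"],
}
backward = invert_citations(forward)
```

Looking up `backward["17 U.S.C. § 107"]` then gives every case in the input that references that section, which is the raw material for an annotation.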

------
zedpm
Providing open access to caselaw (and constitutions, statutes, and
periodicals, and ...) is wonderful, but there's a lot of room left in the
realm of legal-oriented computing and software. The big boys (Westlaw and
LexisNexis) and the other guys (FastCase, Bloomberg, etc.) provide a lot more
than case text and search capability. Signals/Shepardization provides a quick
way to understand treatment and subsequent history. Headnotes provide expert
analysis and highlights for a given document.

Other functionality includes citation formatting, which, as this article
briefly touches on, can be devilishly complex. The ability to generate a
formatted pin cite to a case (or other document) in Bluebook or another
format could be a huge time saver.

Non-US legal systems are a whole other can of worms and a huge opportunity.

These are just a few quick points; again, there's a huge amount of room to
innovate in legal-oriented software.
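
To make the pin cite point concrete, here is a deliberately simplified Python sketch. It only covers the basic "name, volume reporter page, pin (court year)" shape of a full case citation; the class and function names are made up, and it ignores the parts that make real citation formatting hard (abbreviation tables, typeface, short forms, parallel cites, subsequent history).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseCite:
    name: str
    volume: int
    reporter: str
    first_page: int
    year: int
    court: Optional[str] = None  # left out when the reporter implies the court


def format_cite(cite, pin=None):
    """Build 'Name, Vol Reporter Page[, Pin] ([Court ]Year)'."""
    text = f"{cite.name}, {cite.volume} {cite.reporter} {cite.first_page}"
    if pin is not None:
        text += f", {pin}"
    paren = f"{cite.court} {cite.year}" if cite.court else str(cite.year)
    return f"{text} ({paren})"


brown = CaseCite("Brown v. Board of Education", 347, "U.S.", 483, 1954)
print(format_cite(brown, pin=495))
# Brown v. Board of Education, 347 U.S. 483, 495 (1954)
```

A proper formatter would also abbreviate the case name per the style guide in use, which this sketch leaves untouched.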

~~~
digikata
I've wondered in the past how well some expert system like Watson would do
with legal databases. There's a lot of speculation on how automation is going
to reduce factory jobs, but very little discussion about how Watson could maybe
wipe out much of the human time needed for legal research. Has the hurdle mostly
been that the data is locked up in difficult-to-access databases?

~~~
rhino369
There isn't a ton of human time needed for legal research as it is. The time
is in the fact finding, analysis, and arguments, not finding the law.

I'm a junior associate in a law firm, which means the legal research falls
squarely on my shoulders (partners don't do legal research). And even then
it's not a huge part of my job. Probably less than 10%.

Better index searching could be of help, but it'll be hard to actually make it
better. Legal research databases are a huge industry and they spend a ton on
improving search capabilities, but it is hard.

I think the search capability will be evolutionary, not really revolutionary.
But maybe I'm underestimating Watson.

------
joshuaheard
I don't see that it includes codes or statutes (not for Washington state
anyway). That would seem like something to do next.

------
anonbanker
Replying as breadcrumbs to an example of a High-Quality HN thread.

