
Harvard Law Library sacrifices trove of legal volumes to digitize them - wellokthen
http://www.nytimes.com/2015/10/29/us/harvard-law-library-sacrifices-a-trove-for-the-sake-of-a-free-database.html
======
JackC
I work at the Harvard Library Innovation Lab with the folks who are making
this happen. Super excited that it's finally public. If anyone has questions
I'm happy to dig up answers.

Here's how big this is: we don't even know yet how many cases we'll end up
with, to within the nearest million.

PSA: we're hiring a devops engineer[1]. In addition to building amazing tools
to access all this data, we're running a distributed linkrot preservation
service[2] that after just two years is in use by 40% of American law schools
and 10% of state supreme courts; an open-textbook-as-forkable-playlist[3] tool
in use at Harvard Law and a half dozen other law schools; and a research
project on distributed encrypted library archives[4] for preserving high-value
cultural records. We're basically the alien in the brain of a 200-year-old
library -- it's a fun place to work.

[1] [http://librarylab.law.harvard.edu/blog/2015/10/20/hiring-devops-energy-wanted/](http://librarylab.law.harvard.edu/blog/2015/10/20/hiring-devops-energy-wanted/)
[2] [http://perma.cc](http://perma.cc)
[3] [http://librarylab.law.harvard.edu/projects/h2o](http://librarylab.law.harvard.edu/projects/h2o)
[4] [http://librarylab.law.harvard.edu/projects/time-capsule-encryption](http://librarylab.law.harvard.edu/projects/time-capsule-encryption)

~~~
nekopa
Thanks for stopping by! Do you happen to know what type of license the
materials will be under when they open up access?

I am an English teacher, and about 80% of my students are professional
lawyers, so I wonder how free I will be to use this material in my classes. I
already use Harvard Law School's free case studies, they're great, and under
CC license if I remember correctly.

~~~
JackC
Bottom line: everything becomes public domain after eight years. Before then,
you'll be able to search/view/download up to 500 cases per day through either
a web interface or API. As far as I know there's no licensing on the
individual cases you download.

For academic researchers, before the eight years are up, we can also provide a
full data dump -- you just have to sign an agreement not to redistribute bulk
data.

------
jakeogh
"You can imagine the way your heart skips a small beat when you put a book
under a chopper like that," he said. After the volumes are scanned, workers
reattach the spine to the pages, encase the book in shrink-wrap and, he said,
"put it back in the depository for the apocalypse."

------
david-given
Vernor Vinge's _Rainbows End_ has a massive book-scanning project which
consists of dropping books into an industrial shredder and then blowing the
shreds through a long tube lined with cameras. The little fragments of imagery
are then stitched together using sufficiently advanced software.

This isn't there yet, but I wonder if that's where we're heading...

~~~
Natanael_L
Too inefficient. Opening up the books and slicing off the pages cleanly would
be more effective, and we already have age-old matching technology from bill
counters and card sorters.

~~~
david-given
Why do you call it inefficient? They were scanning a book every few seconds,
while cutting the spines off and running the pages through a sheet feeder
requires sustained human interaction for every book.

~~~
GFK_of_xmaspast
> They were scanning a book every few seconds

What's the hurry?

~~~
bpodgursky
IIRC that was part of the plot too. There was no strong reason to prefer this
technology, except the company that owned the destructive-scanning IP was
pushing their solution before the other ones matured enough to be cost-
effective.

------
andyjohnson0
_Harvard Law Library sacrifices trove of legal volumes to digitize them_

Please don't editorialise submission titles [1]. The title of the NYT article
is "Harvard Law Library Readies Trove of Decisions for Digital Age".

[1] FWIW, I agree with the sentiment of loss implied by "sacrifices". On a
worse day than this I might even think of these books as being mutilated.

~~~
cooper12
Eh, it's an overly sentimental way to look at things. Yeah, on the surface it
sounds like these books are being damaged, but in reality they're being given
new life. Before, you'd have to be physically present at Harvard to access
these tomes, but after digitization they'll be available and searchable by
anyone around the world. People might actually stumble upon the information via
a related search instead of having to know of the existence of the specific
document. This also increases their shelf life dramatically because they'll be
spread digitally instead of being at the mercy of one fire or flood.

------
ethanpil
I'm curious as to why they need to destroy the books? Remember Google's open
book scanner?

[http://linearbookscanner.org/](http://linearbookscanner.org/)

[https://code.google.com/p/linear-book-scanner/](https://code.google.com/p/linear-book-scanner/)

~~~
cokernel
Presumably they found it more important to preserve the pages than to preserve
the volume binding. The FAQ for the linear book scanner indicates that
Prototype 1 mangles pages in some way (tears or folds) in about 45% of books
it scans.

Moreover, it used to be common to create bound volumes by binding multiple
issues of a serial together. (In some cases the bindery would crop pages to
fit!) The binding is often so tight that the volume cannot open flat enough
for a full scan. Separating the pages from the spine allows for the entire
page to be imaged without distortion.

Edit: Incorporated correction. I had accidentally stated the claim more
strongly than the FAQ supported; however, my point does not depend on the
claim being as strong as the form in which I had stated it.

~~~
markbao
Correction: it mangles one or two pages in 45% of books it scans.

    
    
        Prototype 1 could scan the majority of books without 
        damage, but may tear one or two pages in some books. Out 
        of 50 books tested, 45% had one or two of their pages 
        either torn or folded. This is a very early prototype and 
        there are many areas for improvement in the design.
    

[http://linearbookscanner.org/faq/](http://linearbookscanner.org/faq/)

~~~
cokernel
Incidentally, 45% of 50 is not an integer. This makes me wonder how many books
were actually tested. Or were they just rounding up from 44%?
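
A quick way to check the arithmetic, as a minimal sketch: with exactly 50 books, every possible count is an even percentage (k/50 = 2k%), so no count gives 45%, rounded or not. The helper name `counts_matching` is made up for illustration, and the rounding rule (nearest whole percent) is an assumption, since the FAQ doesn't say how the figure was produced.

```python
from fractions import Fraction


def counts_matching(n, pct=45):
    """Return every k in 0..n whose share k/n rounds to pct percent.

    Hypothetical helper. Exact fractions avoid floating-point surprises;
    round() on a Fraction rounds to the nearest integer (ties to even).
    """
    return [k for k in range(n + 1) if round(Fraction(100 * k, n)) == pct]


# Try a few sample sizes around the quoted "50 books".
for n in (40, 49, 50, 51, 60):
    print(n, counts_matching(n))
# 40 [18]
# 49 [22]
# 50 []
# 51 [23]
# 60 [27]
```

So 22 of 50 is 44% and 23 of 50 is 46%; a reported 45% would fit 22 of 49 or 23 of 51 under rounding, or 18 of 40 exactly.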

------
NatW
If you want to search court cases today for free, check out
[https://www.courtlistener.com](https://www.courtlistener.com) , part of
[http://freelawproject.org](http://freelawproject.org)

~~~
nkw
Those guys do really good work. It would be nice if this effort (the Harvard
one) coordinated with the Free Law Project.

------
kbutler
Rather than sacrificing the volumes, they are sacrificing the bindings. The
re-bound pages themselves will likely be much better preserved in archival
storage than they would be in human-accessible, human-handled form on the
shelves.

------
JoeAltmaier
I'm wondering about the actual value of old cases. Surely there's a half-life
for law. This worship of precedent as semi-holy writing is strange to me. As an
engineer, I find old books far less useful than new ones. Things change; people
change; law changes too. Maybe it's just fine the books were mutilated for this
process. It's a way of 'burning bridges' and putting it all behind us.

~~~
rtkwe
Old cases are valid so long as the laws haven't changed, there hasn't been a
new ruling that overrides them, and the situation is similar. Judges aren't
completely bound by precedent either: if the situation means it shouldn't apply,
they're able to rule differently and explain why in their judgments. The
concept of precedent is really important in law because of all the vagaries
and rules surrounding laws and how they're applied. It also supports the even
application of the law so that laws and their effects are more predictable.

It also makes cases easier to argue by mapping out the corner cases of the
law and legal concepts and keeps every lawyer from having to be an absolute
expert in every facet of every case they're litigating. The same applies to
judges.

Precedents do have a limited lifetime, but it's not measured in years: if
someone is being tried under the same law as existed 100 years ago, old rulings
can be just as relevant as one made last year.

------
n0us
I had the good fortune to work in the same office as Alex Gulotta during an
internship my senior year. This was before he left to work at Bay Area
Legal Aid, but he was a huge presence and boost for the office in
Charlottesville.

