
Perma.cc helps scholars and courts create permalinks of the web sites they cite - danso
https://perma.cc
======
JackC
Hi! I work on Perma.cc at the Harvard Library Innovation Lab.

For what it's worth, the way I think about permanence as a lawyer and
programmer is that nothing is really permanent. We have court decisions from
hundreds of years ago, for example, but we might easily lose them in the next
apocalypse. We've lost lots of records of equal value already. (For a great
example, check out the fire-damaged Jefferson collection at the Library of
Congress -- there are many published books from Thomas Jefferson's personal
library that, once they were lost, could never be replaced.)

The reason we've kept court decisions as long as we have is that we make lots
of copies, and keep them in lots of libraries, and libraries care about
hanging on to things.

But court decisions now cite to random web pages all the time, and the
half-life of a URL is something like 18 months. So with Perma we're opening up
law library shelves to web pages as well as court decisions -- allowing law
reviews, courts, and others to say, "this web page is culturally important
enough for me to cite to, so please hang onto it." To me this is the magical
weirdness of the internet -- no one has ever quite had this level of write
access to library shelves.

This has made a huge difference for web preservation in the legal field in a
short period of time -- we now have over half of American law schools signed
on to use Perma, several US government agencies, and nine state supreme courts
with more on the way.

On the backend we're building out a private LOCKSS network that will put those
records in the hands of lots of libraries. (We're hiring for this, by the way
-- drop me a line if you're in Boston.) We can't prove that any of those
libraries will still be around in a few decades, but we think it's a good bet.

Happy to answer any questions ...

~~~
kawera
Any special reason for choosing mysql over postgres?

~~~
JackC
Nope -- other than, it's a decision that was made before I got there and there
hasn't been a reason to change.

------
anc84
[https://perma.cc/contingency-plan](https://perma.cc/contingency-plan) looks
nice but how is it enforced? How can we be _guaranteed_ that this will happen?
I have heard enough shallow promises torn apart when the grim reaper came to a
web service so I have zero trust.

Why not publish the full database continuously all the time? Openly licensed,
freely available.

edit: "Perma.cc users will now be able to create up to 10 links per month".
That's very very little.

Typos: "preseve phsyical"

~~~
JackC
Thanks -- typo will be fixed on our next push.[1]

There are limits to how open we can be with data, because the service is used
for e.g. court opinions that aren't published yet. But for public links we're
working on an API, Memento endpoints, Internet Archive mirrors, etc. -- it's
definitely something we're thinking about.

Re: sustainability (and account limits), as you can tell we're not trying to
maximize growth here. Quite the opposite -- we're opening up access only as
much as we feel we can support for the long term. It's a free service and
there are no promises, but it's run by the Harvard Law School Library and
isn't a short term play.

[1] [https://github.com/harvard-lil/perma/commit/ca84be28ccccf0f1...](https://github.com/harvard-lil/perma/commit/ca84be28ccccf0f1873358aec9a42be94b5f96b1)

------
kennydude
I tend to just use Archive.org for this kind of thing. They've been around a
long time and will probably still be around for a long time to come.
[http://archive.org/web/](http://archive.org/web/)

~~~
afandian
Does Archive.org commit to providing a persistent identifier? It's one thing
having a link to the content, another having an identifier which will continue
to identify the content into the future.

~~~
kennydude
Yes. It uses the date in the URL.

~~~
afandian
Persistent URLs can be opaque. Persistence is more about committing that, for
example, the URL structures won't ever change in a backward-incompatible way.

~~~
greglindahl
To explain a bit better: every capture has a crawl time, that time is embedded
in the WARC with that capture in it, and that's how the CDX index can refer to
that page. Anyone who got our WARCs could easily present the URL structures
the same way we do. That time is an essential part of the capture. We worked
with others to make sure that the WARC standard does this.

This is actually a really important aspect of Wayback links. Since the URL and
time are always explicit in the Wayback URL, you don't have to depend on some
opaque database to learn important things about that link. An archiving
service that just has opaque links is like a bit.ly shortened link: you have
absolutely no idea what it is if bit.ly dies.
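
As a rough illustration of that point, here is a minimal sketch of recovering the capture time and original URL from a Wayback link with nothing but the link itself. The function name `parse_wayback_url` is made up for this example, and the pattern only handles the plain `/web/<14-digit timestamp>/<url>` shape, not variants with modifiers after the timestamp.

```python
import re
from datetime import datetime, timezone

# Plain Wayback capture links look like:
#   https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
# This pattern is an assumption for the sketch and ignores other variants.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(url):
    """Return (capture_time, original_url), or None if the link does not match."""
    match = WAYBACK_RE.match(url)
    if match is None:
        return None
    stamp, original = match.groups()
    # The timestamp is in UTC; attach the timezone explicitly.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


if __name__ == "__main__":
    print(parse_wayback_url(
        "https://web.archive.org/web/20150831120000/http://example.com/page"
    ))
```

An opaque link such as `http://perma.cc/ABCD-1234` returns `None` here, which is the contrast being drawn: there is nothing in the link itself to parse.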

------
robto
I wonder if this could be improved with IPFS[0] - it would gain the benefits
of a distributed infrastructure, content addressing is built in, and it has an
http portal for convenience.

[0][http://ipfs.io/](http://ipfs.io/)

------
mhuffman
... until perma.cc goes down, or runs out of money, that is.

~~~
yannis
>Perma.cc is developed and maintained by the Harvard Library Innovation Lab at
the Harvard Law School Library. Perma.cc is administered by a consortium of
libraries, with each library assisting its local journals and faculty users.

This is a problem for many people, so hopefully this will survive.

------
exit
i wonder what happens to tlds like .cc/Cocos Islands when rising sea levels
wash away their associated territories

~~~
pcora
they probably become a TLD without country affiliation. like .amazon or
.google

~~~
SXX
More like .su

------
pcora
great idea, but having a centralized service is risky. if they go away, you
will not only have a 404 but a dns error. and while it's nice that they have a
contingency plan, abiding by it may not be possible.

~~~
JackC
Yeah, this is a basic problem with the web as it works now -- Perma's long-
term storage can and will be decentralized behind the scenes, but control over
DNS routing is inherently centralized.

I don't really know a way around this. When a lawyer is reading a court
decision that includes a link, and the link is broken, showing them "(archived
at perma GUID ABCD-1234)" won't mean a thing to them. Showing them "(archived
at [http://perma.cc/ABCD-1234](http://perma.cc/ABCD-1234))" actually solves
their problem.

~~~
extra88
Think about how it works with other legal citations. "Fed. R. Civ. P.
12(b)(6)." means something to a lawyer and is irrespective of where they would
physically find that court rule to read.

An issue with Perma's id format is it doesn't contain anything to
differentiate it from any other use of two sets of four uppercase letters or
numbers separated by a dash, it's not "LAWCITE:A1C4-5F7H" it's just
"A1C4-5F7H." The domain name could serve that purpose so wherever else Perma's
content is stored, the route should contain perma.cc. So all of the following
would have the same content, e.g.:
[http://perma.cc/48VC-ZS62](http://perma.cc/48VC-ZS62)
[http://archive.org/perma.cc/48VC-ZS62](http://archive.org/perma.cc/48VC-ZS62)
[http://doomsday.preppers/rebuilding-America/perma.cc/48VC-ZS...](http://doomsday.preppers/rebuilding-America/perma.cc/48VC-ZS62)

And in reference to the other comment, if the .cc TLD goes away for some
reason, "perma.cc" could still remain a part of the URL even at the project's
"home" site, e.g.:
[http://perma.law.harvard.edu/perma.cc/48VC-ZS62](http://perma.law.harvard.edu/perma.cc/48VC-ZS62)
[http://perma-cc.com/perma.cc/48VC-ZS62](http://perma-cc.com/perma.cc/48VC-ZS62)
[http://perma.mars/perma.cc/48VC-ZS62](http://perma.mars/perma.cc/48VC-ZS62)
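
To make the idea concrete, here is a minimal sketch of how a reader's tool could pull the identifier out of any of those mirror URLs. It assumes the ID shape described above (two groups of four uppercase letters or digits joined by a dash) and assumes mirrors keep the literal `perma.cc/` prefix in the path; `extract_perma_id` is a hypothetical helper, not part of Perma.

```python
import re

# Assumption: an ID is two groups of four uppercase letters or digits
# joined by a dash, and every mirror keeps "perma.cc/" right before it.
PERMA_RE = re.compile(r"perma\.cc/([A-Z0-9]{4}-[A-Z0-9]{4})(?![A-Z0-9])")


def extract_perma_id(url):
    """Return the Perma-style ID embedded in a URL, or None if there is none."""
    match = PERMA_RE.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    mirrors = [
        "http://perma.cc/48VC-ZS62",
        "http://archive.org/perma.cc/48VC-ZS62",
        "http://perma.law.harvard.edu/perma.cc/48VC-ZS62",
    ]
    for url in mirrors:
        print(url, "->", extract_perma_id(url))
```

A bare "48VC-ZS62" with no `perma.cc/` in front of it is deliberately not matched, since without the prefix there is nothing to tell it apart from any other string of that shape.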

~~~
JackC
> Think about how it works with other legal citations. "Fed. R. Civ. P.
> 12(b)(6)." means something to a lawyer and is irrespective of where they
> would physically find that court rule to read.

Yeah! When I first started up with Perma I advocated a really aggressive shift
in how we cite websites, using something that looks a lot more like other
legal citations. Maybe something like:

    
    
      Example Title, perma.cc/T75S-NF5K (<original domain>, <capture date>).
    

... where "perma.cc/####" can be treated as a URL if you like, but also just
as a legal cite like "### U.S. ###". This looks _so much_ nicer in legal
citations. There are lots of interesting variations along these lines.
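
Purely as a sketch of that template, and not anything Perma or the Bluebook actually specifies, a formatter could look like the following. The function name and the ISO date format are assumptions made for illustration.

```python
from datetime import date


def format_perma_cite(title, perma_id, original_domain, captured):
    """Build a cite in the 'Title, perma.cc/ID (domain, date).' shape.

    The ISO date format is an assumption; the template above does not
    say how the capture date should be written.
    """
    return f"{title}, perma.cc/{perma_id} ({original_domain}, {captured.isoformat()})."


if __name__ == "__main__":
    print(format_perma_cite("Example Title", "T75S-NF5K", "example.com", date(2015, 8, 31)))
```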

Buuuut that's basically a non-starter for most of the legal profession
(including courts and law reviews) that just want their citations to make
sense to readers today. For now the Bluebook is recommending a much more
verbose vendor-neutral "(archived at <url>)" citation format (with Perma as an
example!), and we're happy with that.

