
Federal Judge Says Internet Archive's Wayback Machine a Legit Source of Evidence - kpwags
https://www.techdirt.com/articles/20160518/08175934474/federal-judge-says-internet-archives-wayback-machine-perfectly-legitimate-source-evidence.shtml
======
throwaway7767
One point to note here is that the Wayback Machine obeys robots.txt
retroactively, so sites can hide evidence by changing their robots.txt to
disallow indexing of specific content. The data won't be purged from the
servers, but it will not be displayed through the site.

This is quite disappointing (though probably wise from a legal standpoint). It
makes it less useful if a party in a lawsuit can retroactively hide evidence
of wrongdoing, and thereby deny access to evidence.

It's also a problem if a domain name changes ownership and the new owner
suddenly adds a restrictive robots.txt - the old content will no longer be
accessible even though the current owner has no claim to it.
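
To make the robots.txt mechanism concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It only shows how a newly added Disallow rule changes the answer for an archive crawler. The `ia_archiver` user-agent name, the sample rules, the example URL, and the `is_blocked` helper are assumptions for illustration, not the Wayback Machine's actual code.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt at crawl time: everything is allowed.
OLD_ROBOTS = "User-agent: *\nDisallow:\n"

# Hypothetical robots.txt added later (e.g. by a new domain owner).
NEW_ROBOTS = "User-agent: ia_archiver\nDisallow: /\n"

URL = "http://example.com/old-page.html"

print(is_blocked(OLD_ROBOTS, "ia_archiver", URL))  # allowed under the old rules
print(is_blocked(NEW_ROBOTS, "ia_archiver", URL))  # blocked under the new rules
```

If the archive checks the current robots.txt at display time rather than the one in force at crawl time, the second result is what hides the old snapshot.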

~~~
bonyt
In 2009, there was an interesting case where a federal district court ordered
a plaintiff to disable its robots.txt to allow the Wayback Machine to disclose
an old version of the plaintiff's website.[1] Someone at the Internet Archive
provided a declaration stating that it would place a significant burden on
them[2] to respond to a subpoena themselves, whereas all the plaintiff had to
do was modify its robots.txt. The court reasoned that since the plaintiff had
the technical ability to un-block access, they could be compelled to do so,
and made them disable their robots.txt.

[1]: [http://www.american-justice.org/upload/page/123/69/docket-18...](http://www.american-justice.org/upload/page/123/69/docket-187-order-on-IA-motion.pdf)

[2]: [https://tonybox.net/tmp/ia_decl_pacer.pdf](https://tonybox.net/tmp/ia_decl_pacer.pdf)
(downloaded from PACER)

~~~
nightcracker
> The court reasoned that since the plaintiff had the technical ability to
> un-block access, they could be compelled to do so, and made them disable
> their robots.txt.

How is this not indirect self-incrimination?

~~~
anigbrowl
For one, losing in a civil trial isn't the same as being subject to criminal
penalties, so you can't self-incriminate unless what you're hiding would
itself have been a crime - many contractual disputes have no criminal
dimension whatsoever and so your liberty isn't at risk.

But more to the point, archive.org already has a copy of what you previously
made public, and there's no reason to burden them with the expense of manually
retrieving it as they're not party to the case, and it would be wasteful of
the court's resources and archive.org's time to make them sue you for the
expense involved.

Here's a more detailed breakdown of how the 5th applies in a civil litigation
context:
[http://www.litigationandtrial.com/2013/04/articles/attorney/...](http://www.litigationandtrial.com/2013/04/articles/attorney/pleading-the-fifth-adverse-inferences/)

The bottom line is that the 5th can help you stay out of prison, but you can't
use it just to avoid losing a dispute.

~~~
Lawtonfogle
Sounds like saying "we already have clear evidence, but it'll take time to dig
through it, so you are now compelled to show us where you hid the body."

------
throwaway7767
In many European countries, we have legal deposit laws that require the
national library to run archival web crawls, at least of the country TLD and
sometimes on a best-effort basis for material outside the TLD that's
considered relevant (based on language, for example).

These laws specify that the crawls are stored untampered and guarantee that
the results can be considered valid evidence.

Interestingly, that's how the Internet Archive's Heritrix crawler came about -
the Nordic national libraries were saddled with this requirement but didn't
really have the technical infrastructure to implement it. They formed a
coalition among themselves and brought the Internet Archive into it (the
IIPC[0]), and used it to fund development of Heritrix.

[0] [http://netpreserve.org/](http://netpreserve.org/)

~~~
corecoder
That's interesting, I'd like to know more about the specific European
situation on this.

There have been a few cases (in Italy, but not only) where someone sued
someone else for defamation (or is it libel?), bringing a screenshot of a
tweet, Facebook post etc. as "proof"; all such cases have been dropped because
a screenshot cannot possibly be used as evidence.

It is not clear (at least not to me) how someone could proceed in order to
obtain proof in these cases.

~~~
pdabbadabba
> defamation (or is it libel?)

Both are right. Defamation is a general term that encompasses both libel and
slander. Libel is written, slander is spoken. When in doubt, just say
"defamation."

> a screenshot cannot possibly be used as evidence

Generally speaking, I see no reason why this should be the case. And
screenshots are routinely used as evidence in U.S. courts. Of course, if the
opposing side challenges the accuracy of the screenshot, then you'll need to
give more evidence (testimony, probably) about how it was produced. But that
doesn't mean that screenshots are per se unreliable.

~~~
sandworm101
You always need testimony in US courts. Evidence is not just admitted; someone
has to vouch for it (not the legal term).

~~~
pdabbadabba
Yes, that's right. I was just trying to stay out of the procedural weeds a
bit.

------
aakilfernandes
Would be great if archival services recursively hashed their documents and put
the hashes in a blockchain. Then you'd get 100% certainty the records haven't
been updated since they were first recorded.
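
One way to read "recursively hashed" is a Merkle-tree-style digest, where a single root hash commits to every archived document. The sketch below is a minimal illustration under that assumption. The `merkle_root` function, the leaf and node prefixes, and the sample snapshots are hypothetical and not something any archive is known to use.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(documents):
    """Return one 32-byte digest that covers every document, in order."""
    if not documents:
        raise ValueError("need at least one document")
    # Leaves and inner nodes use different prefixes so they cannot be confused.
    level = [_h(b"\x00" + doc) for doc in documents]
    while len(level) > 1:
        paired = [_h(b"\x01" + level[i] + level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # an unpaired node is carried up unchanged
        level = paired
    return level[0]


# Hypothetical archived snapshots.
snapshots = [
    b"<html>page one</html>",
    b"<html>page two</html>",
    b"<html>page three</html>",
]
root = merkle_root(snapshots)
print(root.hex())
```

Only the root would need to be published somewhere append-only. Changing any byte of any snapshot later yields a different root, so the edit is detectable by anyone who kept the published value.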

~~~
ikeboy
If they aren't trusted, timestamping doesn't help. If they are, it's not
needed.

What use case do you have in mind where it helps if the archive proves
timestamps?

(Timestamping can certainly help sometimes, but with an archive you're
trusting them anyway.)

~~~
vidarh
A blockchain does not have to just confirm the timestamp. It can record a
consensus of facts about the page as well. E.g. have multiple parties run
crawlers and confirm that a majority of them agree about the content of the
page to some delta (and include the deltas) at time X.

Do it right and the blockchain can _provide_ trust by ensuring that the record
demonstrates that a sufficient number of _other parties_ have confirmed each
part of the record.

You have a point in that if an archive is trusted the motivations for doing
this largely fall away. The problem is of course that we don't know if the
archive will always be trustworthy (e.g. at some point they may accidentally
hire someone who is not trustworthy into a position where they are able to do
damage), and if/when they're not is when they're likely to be most resistant
to putting in place means to prove they are trustworthy.
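
As a rough sketch of the "multiple parties run crawlers" idea, the Python below takes what several independent crawlers fetched and returns the content hash that a strict majority agree on. It is a simplification: it requires byte-identical content and ignores the deltas mentioned above. The `majority_digest` function and the crawler names are hypothetical.

```python
import hashlib
from collections import Counter


def majority_digest(observations):
    """observations maps crawler name -> page bytes fetched at time X.

    Returns the SHA-256 hex digest seen by a strict majority, or None.
    """
    if not observations:
        return None
    counts = Counter(
        hashlib.sha256(body).hexdigest() for body in observations.values()
    )
    digest, votes = counts.most_common(1)[0]
    return digest if votes * 2 > len(observations) else None


# Hypothetical results from three independent crawlers.
observations = {
    "crawler_a": b"<p>We never said that.</p>",
    "crawler_b": b"<p>We never said that.</p>",
    "crawler_c": b"<p>We said exactly that.</p>",
}
agreed = majority_digest(observations)
print(agreed)
```

The agreed digest, together with the list of crawlers that reported it, is the kind of entry that could then be written to the shared record.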

~~~
ikeboy
If "n parties confirm at time X that the page said something" is good enough,
then "n parties confirm that the page said something at time X" should also be
good enough. If you trust that collection of parties to not have colluded at
time X, you should also trust them to not be colluding now.

~~~
vidarh
But this requires the record to be maintained at a sufficient number of those
parties, and that's a problem - maintaining records over long periods of time
is hard. This is why creating a cryptographically secured record is worthwhile
- it does not require the parties to still be around to vouch for the record.

~~~
ikeboy
But that doesn't require a blockchain: each party can individually sign it
with their own key, and some third party can collect all the records and the
signatures.

~~~
vidarh
Now you have to trust that the third party doesn't omit anything or add entries
from other parties.

There are certainly alternative means to guarantee the same properties,
depending on your trust model, but they will quickly start acquiring the same
complexities.

------
jedberg
Sounds like now is a great time to be a WM employee if you don't mind taking
bribes from criminals.

I love the WM, but this is terrible, because they don't have controls around
the chain of evidence. Any page can be modified in the archive by an employee,
both before and after a page has been identified as evidence.

~~~
jedberg
A note to the above since I can't edit anymore: I'm basing what I said on a
guess about their internal controls and assuming they don't spend the time and
money to maintain a chain of evidence, but I have no direct knowledge of their
internal controls.

------
koolba
Is there any sort of blockchain based authentication of the data saved by
archive.org?

I'm not saying I don't trust them[1] but this seems like a perfect use case
for saving the content hash to prove that content X existed at least as early
as time Y.

[1]: _Well maybe I am saying that..._

------
MicroBerto
We recently posted an exposé that created a legal "situation". It includes
archive.org links which definitely help.

Earlier for another situation, my lawyer stated that they've successfully used
it in the past.

If you're going to get a bit crazy on your blog,
[https://archive.org/web/](https://archive.org/web/) (see Save Page Now
section) is great stuff.

However, one big issue is that Facebook is now blocking the ability to archive
links that go directly to comments made in public postings.

So does anyone have a workaround or an archive.org-like site that can archive
Facebook comments, full with working JS that allows the exact comment to get
archived? (to get the URL of the comment, right-click on the timestamp and
copy link).

------
ikeboy
How hard is it to MITM the IA over HTTP, thus producing fake evidence that a
site said something once?

~~~
thatcat
I wonder if they save the IP address for each scraping session.

~~~
ikeboy
I also wonder if they respect HSTS and HPKP for a given site, making this
attack only possible against sites without such protection.

------
grenoire
I think it's time to create the Wayback Machine of the Wayback Machine.

------
rjdevereux
A Federal judge said "legit"? Times are a changin'

~~~
greglindahl
No, it's a consequence of HN limiting the length of titles.

------
awqrre
Fabricating evidence just got easier... you only have to hack one site...

