
Removing Your Site from the Wayback Machine - caltlgin
https://nixnet.xyz/blog/excluding-your-site-from-the-wayback-machine-keybase-only/
======
lkramer
I'm having the opposite problem. I want a site I admin added back to the
Wayback Machine; the previous owner of the domain got it removed. It seems
really hard to get anyone's attention. I've tried email and Twitter...

~~~
6d6b73
Have you tried
[https://twitter.com/textfiles/](https://twitter.com/textfiles/) ?

~~~
toomuchtodo
Jason also frequents here under the same username.

------
tfolbrecht
For every copy of your site made by archive.org, there are thousands from
webcrawlers that don't respect robots.txt.

A tasteful way to go about this would be to give context to past posts with a
note at the same URL.

To the people who want to be forgotten, you will be, but the internet is too
valuable to the future for retroactive censorship.

~~~
luckylion
> but the internet is too valuable to the future for retroactive censorship

If I don't allow you to copy and publish my content, I'm not censoring you.
Using that term in this context makes no sense.

As for webcrawlers: sure, they are out there. And when their makers publish
your content, they are infringing your rights and (in many countries)
committing a crime.

~~~
tfolbrecht
I'm not a lawyer, but I'd argue you've published your content by making it
available in your index. An archive or library doesn't publish content; it
offers access to content that has already been published.

If you made a reasonable effort to control access to your works that might be
a different story.

~~~
luckylion
That's true for books (and public libraries, a status which the Internet
Archive holds in the US, if I'm not mistaken; that doesn't change their status
w.r.t. the rest of the world, however), but I don't believe you can directly
transfer rules regarding books to websites.

The Internet Archive publishes the content. They don't use a robots.txt, a
canonical tag (they do set a Link header) or a robots meta tag asking search
engines not to index their version. For all intents and purposes, it's just a
copy of your content published on their website.

I understand their goal, I'm not opposed to it on a fundamental level, but I
do believe that the choice of participation should rest with the content
creator.

~~~
gpav
"...but I don't believe you can directly transfer rules regarding books to
websites."

The rules for copyright do apply to websites. Here is what the US Copyright
Office says:

"What does copyright protect? Copyright, a form of intellectual property law,
protects original works of authorship including literary, dramatic, musical,
and artistic works, such as poetry, novels, movies, songs, computer software,
and architecture. Copyright does not protect facts, ideas, systems, or methods
of operation, although it may protect the way these things are expressed. See
Circular 1, Copyright Basics, section 'What Works Are Protected.' ... When is
my work protected? Your work is under copyright protection the moment it is
created and fixed in a tangible form that it is perceptible either directly or
with the aid of a machine or device."

Source URL = [https://www.copyright.gov/help/faq/faq-general.html](https://www.copyright.gov/help/faq/faq-general.html)

More FAQs:
[https://www.copyright.gov/help/faq/](https://www.copyright.gov/help/faq/)

'Circular 1' URL =
[https://www.copyright.gov/circs/circ01.pdf](https://www.copyright.gov/circs/circ01.pdf)

~~~
ghaff
The difference is that, while the contents of a book are covered by copyright
in the same way that a website, a handwritten letter, or a photograph is, the
physical artifact that is the book itself can be loaned, sold, etc. because of
the first-sale doctrine.

------
Animats
"archive.org" obeys "robots.txt". Over-obeys, in fact. If you add a robots.txt
file that locks them out, old archives disappear, too. Even if the domain name
has changed hands.
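A minimal sketch of the kind of robots.txt lockout described above, using Python's standard `urllib.robotparser` to show how a crawler that honors robots.txt would read it. The `ia_archiver` user-agent token is an assumption here (it is the token historically cited for excluding a site from the Wayback Machine), and, per the 2017 archive.org blog post quoted below, the archive may not honor robots.txt in every case; this only shows what the file means to a compliant crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that locks out only the archive's crawler.
# The "ia_archiver" token is an assumption; check the archive's current docs.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    # Parse the rules from memory; no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("ia_archiver", "https://example.com/old-post.html"))   # False
    print(allowed("SomeOtherBot", "https://example.com/old-post.html"))  # True
```

Because the file has no `User-agent: *` group, every crawler other than the one named is still allowed, which is why a blanket rule left behind by a later domain owner can hide the earlier archives as well.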

~~~
testcross
> A few months ago we stopped referring to robots.txt files on U.S. government
> and military web sites for both crawling and displaying web pages (though we
> respond to removal requests sent to info@archive.org). As we have moved
> towards broader access it has not caused problems, which we take as a good
> sign. We are now looking to do this more broadly.

It might not be true forever. Unfortunately.

[https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...](https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/)

~~~
JoshTriplett
This is an important and useful improvement. Many domain parkers/squatters/etc
who snap up dead domains have robots.txt files that block everything or almost
everything, breaking the ability to see the previous site via archive.org.

(Side note: domain name expiration was a mistake.)

~~~
judge2020
A fun thing to do before your website expires is set up HSTS with
"includeSubDomains" and enroll your website in HSTS preload. Many of the bots
that backorder domains in order to put ad pages on them don't use SSL at all
(not even Let's Encrypt) and the domain ends up becoming useless for them.
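For the HSTS preload step, the header requirements published at hstspreload.org are, as I understand them, a `max-age` of at least 31536000 seconds (one year) plus the `includeSubDomains` and `preload` directives. The site must also have a valid certificate, redirect HTTP to HTTPS, and serve all subdomains over HTTPS, which a header check alone cannot verify. A minimal sketch of such a header check follows; the function name is hypothetical, and the threshold should be re-checked against the current preload rules.

```python
# Minimum max-age accepted by the preload list (one year), per hstspreload.org
# at the time of writing; treat this value as an assumption and re-check it.
MIN_PRELOAD_MAX_AGE = 31_536_000


def header_meets_preload_rules(value: str) -> bool:
    """Check a Strict-Transport-Security header value (hypothetical helper).

    Only the header is checked; certificate validity, HTTP->HTTPS redirects
    and HTTPS on every subdomain must be verified separately.
    """
    directives = {}
    for part in value.split(";"):
        name, _, arg = part.strip().partition("=")
        if name:
            # Directive names are case-insensitive; max-age may be quoted.
            directives[name.strip().lower()] = arg.strip().strip('"')
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        return False  # missing or non-numeric max-age
    return (
        max_age >= MIN_PRELOAD_MAX_AGE
        and "includesubdomains" in directives
        and "preload" in directives
    )


if __name__ == "__main__":
    print(header_meets_preload_rules("max-age=63072000; includeSubDomains; preload"))  # True
    print(header_meets_preload_rules("max-age=300; includeSubDomains"))  # False
```

A value that passes would be served on the base domain as `Strict-Transport-Security: max-age=63072000; includeSubDomains; preload`.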

------
maguay
Don’t do this. Please don’t do this. The Wayback Machine is one of the only
records of history we have on the internet, often the only way to look back
and see what has been. It’s invaluable for that—and if your site is to be any
part of the internet’s history, it should be available there too.

~~~
mprev
This fundamentalism is unhelpful.

What if people don’t want to be part of internet history? It’s hard enough to
be anonymous, or even to move on from mistakes, as it is.

~~~
shifto
Then why make a site? That's like saying you don't want any photos of you to
exist, but you've been outside for years while people were taking photos and
now you're telling them to delete those.

~~~
mprev
So, what you’re saying is that a decision you make as, say, a 17 year old is
one that you must stand by for the rest of your life?

And, yes, I get it, there’s more than the IA but it’s a point of principle for
me. I’m not talking about erasing newspaper articles but rather the blog a kid
posts when they are naive.

~~~
dkersten
> So, what you’re saying is that a decision you make as, say, a 17 year old is
> one that you must stand by for the rest of your life?

That's basically the way it is anyway. If you go out in public (physically or
virtually), you've lost some control over how long-reaching your actions might
be. If you do something stupid in public, you can't prevent people from
posting their videos/photos of it, or just talking about you. The internet is
no different. While you can get your stuff removed from some places, you have
no control over it generally speaking. Somebody might have screenshots for
example.

~~~
sasasassy
In most parts of the world you are entitled to privacy even if you are
outside, and you can demand photos/videos taken of you without your consent to
be deleted.

Plus, just because I have a website, that doesn't make its content public
domain, for some business to copy all its contents and publish them without
my knowledge.

~~~
pbhjpbhj
>you can demand photos/videos taken of you without your consent to be deleted.
//

Can you name maybe five large countries where that's true? It's not true in
the USA, nor the UK AFAIK. I understand it's not true in Germany either.

So I only know of counterexamples; I'd be interested to hear of any. China and
Russia don't seem likely to have such laws. Maybe they're common in South
America or Africa?

------
Causality1
Does Archive.org make any judgment calls when it comes to honoring requests to
remove content? For example, I can see people trying to scrub evidence of
their own lies or promises or other damaging misdeeds asking for their content
to be removed.

~~~
briandear
It feels like burning old newspapers if a subject of an old story doesn’t like
the story. Or a book author forcing a library to remove her books from the
shelves. There is something Orwellian about letting people purge history of
what they don’t like. When something is published and public, the bell has
already been rung. Should we force people who saw the original content to
never speak of it? Can we sue them to prevent them from talking about the
“bad” content? Erasing sites from Wayback, to me, feels like the sanctioning
of censorship, or erasing history. The so-called “right” to be forgotten is a
strange right in free societies. Does the right to be forgotten give people
the right to destroy old newspapers that someone has saved? Can people go into
someone’s home and seize books that depict the claimant in a negative light?
Wayback is like a photo gallery of the past. We shouldn’t be allowing people
to rewrite history.

~~~
tannhaeuser
I personally find a "right to be forgotten" as such laughable, but I
understand it was introduced specifically so that something stupid you did or
said, or an unflattering photo taken of you, doesn't end up as your only
public record in times of clickbait and staged polarizing crap. Then there's
the problem of copyrighted material, and of publishing stuff on your site with
the intent of making money off your users' attention via ads, one of the very
few avenues of financing content creation. All these concerns have to be
balanced against one another, which creates a difficult legal environment for
archival sites.

~~~
Fnoord
Think of all the things you did as teenager or child. Now think of all these
things as being documented on the internet. Do we really want to be haunted by
our past in such a way?

~~~
rootlocus
Yes, we all agree children and teenagers do stupid shit. Since we all
acknowledge it, can we be mature about it and not "haunt" people with it?

~~~
Fnoord
Easy to say as a bystander. What if it's your kid who's being bullied or who
got bullied? What are you gonna do about it? My other post in this thread
addresses that point [1]

[1]
[https://news.ycombinator.com/item?id=20162590](https://news.ycombinator.com/item?id=20162590)

~~~
rootlocus
I'm sorry, but these kinds of straw men make discussions on the subject
impossible. I argued that we shouldn't judge adults based on the stupid things
they did as children.

> Do we really want to be haunted by our past in such a way?

To which my reply was: let's be mature about it and not care about
trivialities from someone's past.

Now you change the subject to: "What if it's your kid who's being bullied or
who got bullied?". My kid being bullied "right now" is not the same as "my kid did
some stupid shit 10 years ago and people are making fun of him now because of
it". This is another problem with another solution, and it's not something I
argued about.

------
dredmorbius
Since the question of why removal might be sensible has been raised, I thought
I'd offer a historical perspective.

From a 1966 BBC documentary:

_"Well, he who has access to information controls the game. This is very
dangerous. I think both your country and mine have never trusted the
government completely. We do so for good reason. Here we have a mechanism that
could be abused. Here we have a mechanism that would allow the creation of a
dictator. . ._

_I've yet to see an expression by anyone in Congress about this new type of
danger. In fact, we see proposals for centralizing information, we see
proposals for rushing ahead into new, more efficient computer information
systems, and very little thought is being given to the dangers of the misuse
of these systems. . . I ask a lot of people about privacy, why they valued it,
and I was surprised by the number of people who said "Well, I don't do
anything wrong. Why should I worry about privacy?" And then, on the other
hand, I think there's a more wise group that says, 'Privacy is really the
right to be wrong, then go on and live the rest of your life, without having
it mark you forever.' I tend to think this latter view is the view we should
hold."_

[https://youtube.com/watch?v=FwaDvJYZTVk&t=29m31s](https://youtube.com/watch?v=FwaDvJYZTVk&t=29m31s)

The speaker is Paul Baran, of the RAND Corporation, and the inventor of
packet-based switching -- the technology which makes the Internet possible.

If you want to know who could possibly have foreseen the negative consequences
of universal information networks: their creator did.

Baran's full archive of RAND publications is now freely downloadable from
RAND, following my request for access in July 2018, for which I'm immensely
grateful.

Backstory:

[https://web.archive.org/web/20180725104347/https://plus.goog...](https://web.archive.org/web/20180725104347/https://plus.google.com/104092656004159577193/posts/J7oL5ZwuTzY)

Archive:

[https://www.rand.org/pubs/authors/b/baran_paul.html](https://www.rand.org/pubs/authors/b/baran_paul.html)

------
bookofjoe
The comments here bring to mind this: Attorney to witness: What did you see?
Witness: It looked like he was saying "XYZ." Opposing attorney: Objection.
Judge: Sustained. The jury will disregard the previous question and answer.
—Except: the jury has heard it — too late.

------
pleasecalllater
It's good that the only place where people still cannot remove information is
a library.

