
The Internet Archive is waging war on misinformation - praveenscience
https://www.ft.com/content/5be1f2ee-d60b-11e9-a0bd-ab8ec6435630
======
ineedasername
I like that the internet archive exists, but I don't like how they implement
their opt-out policy. Sites can opt-out at any time, which I suppose is the
correct thing to allow, but once they opt out then everything the Archive has
gathered up to that point also goes away, which seems wrong.

~~~
toomuchtodo
It doesn’t go away, it’s just not public. This is to comply with copyright
law. It’s still safely stored on disk.

~~~
ineedasername
Okay, if they have no choice in it, then I understand. But as for keeping a
copy on disk that practically no one can ever see, to me that seems like a
distinction without a difference.

~~~
diggan
I'm pretty sure you can visit the building that Internet Archive operates from
([https://archive.org/about/contact.php](https://archive.org/about/contact.php))
and freely surf/look at all the content, even the content that is not
available from the web version.

~~~
raxxorrax
I see no advantage in restricting access. I just believe this to be a
mechanism to hide uncomfortable content, which I believe has no benefit as
long as no personal information is revealed.

------
devmunchies
> _But Mr Graham argued that simply removing false information or offensive
> content isn’t necessarily the answer. Hateful material need not remain
> publicly available, he said, but certain researchers and politicians should
> be able to study it._

They're planting the seed and seeing how the public reacts. This is
ridiculous! Mark something "controversial", "possibly fake", but don't memory-
hole us. An archive only archiving history they agree with is not worth the
funding.

~~~
JulianMorrison
There is no public-interest benefit in hate screeds. Those are memetic viruses
with established body counts and should be studied, if at all, in the manner
one studies smallpox.

~~~
fromthestart
There are people who would put Das Kapital in the same category of unacceptable
speech as others would put Mein Kampf.

You have no right to quarantine information like you would a virus, because
there is no way to agree on which information is "bad". Further, even content
that is hateful or dangerous overall can have parts with merit; not everything
Hitler said or did, for example, was unreasonable or evil.

Another example: what about the Unabomber Manifesto? Yeah, he was a terrorist
and he wrote some arguably hateful things about liberalism/leftism, but there
was also a wealth of thoughtful observations regarding the human condition in
his work.

All information should be free for study by anyone.

~~~
JulianMorrison
And they would be wrong. Liberation literature is not in the same category as
hate screeds. I don't buy the moral abdication you're selling. You know what's
evil - it's the stuff full of race hate and misogyny. There is no moral
equivalence.

~~~
journalctl
It’s amazing how if you try to make a progressive political point here, you
tend to get criticized. But when it comes time to judge the morality of things
like racism or misogyny, suddenly it’s impossible to know the morality of any
idea ever.

~~~
lolinder
People aren't arguing that racism and misogyny are moral, they're arguing that
if you allow any censorship at all, then someone must be appointed censor, and
you may not agree with that someone. The only way to be sure that your ideas
can be freely expressed is to allow everyone, no matter how immoral, to freely
express their ideas.

------
hugh4life
Extremely lame article...

I remember that about 10 years ago either Google or the Internet Archive had a
temporary search engine that went through the Wayback Machine... and a lot of
interesting things were found. It was through this, IIRC, that it was discovered
that Obama filled out a questionnaire in 1996 saying he supported same-sex
marriage when that was not his public position at the time. I've always
wondered why that search engine went away and who currently has access to do
such searches.

~~~
5-
You appear to be remembering Google publishing their 2001 search index, which
was briefly available in October 2008: [https://www.cnet.com/news/who-were-you-in-2001-check-googles...](https://www.cnet.com/news/who-were-you-in-2001-check-googles-old-index/)

An extremely useful thing. I wonder how the legal side of things was managed,
even for such a short exposure.

------
jakeogh
It's pretty obvious they were next on the hit list.

[https://news.ycombinator.com/item?id=20623177](https://news.ycombinator.com/item?id=20623177)

archive.org is a huge problem for the rewriters.

------
haunter
Does anyone know how the Internet Archive gets around the DMCA? It's full of
pirated games, for example. I'm aware that archiving something is one thing,
but anyone can download it as well, so it's essentially distribution too. Just
curious how it works. Personally, that's how I get old SNES, NES, and DS games.

~~~
gwern
The IA actually has a DMCA exemption for the pirated games:
[https://archive.org/about/dmca.php](https://archive.org/about/dmca.php)

~~~
rasz
Not so sure about that; their big Amiga collection lasted a couple of days.

~~~
LukeShu
Specifically, the part of the DMCA that the exemption is for is the part that
makes it criminal to bypass copy-protection; it does not grant them additional
rights to make or distribute copies.

There are lots of forms of copying where the copying itself is legal under
copyright law. However, the DMCA made it criminal to bypass copy-protection
mechanisms, _even if_ the copying itself would otherwise be legal. That's
what the exemption was for.

------
Johnny555
If they really want to be trusted as a source of truth, they need to find a
way to publish cryptographic signatures of every page that can be validated to
prove that the page was captured when they say it was. Maybe some sort of
blockchain.

Otherwise there's no reason to trust that an Internet Archive snapshot is an
accurate representation of the archived page.
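A minimal sketch of the "some sort of blockchain" idea, assuming GNU coreutils' `sha256sum` and a record layout invented purely for illustration: each capture is folded into a running head hash that commits to every earlier capture. This is not how the Internet Archive stores captures; `hash_stdin`, `append_capture`, and `HEAD` are hypothetical names.

```shell
#!/bin/sh
# Sketch of a hash-chained capture log ("some sort of blockchain", minus
# any consensus). Assumptions: GNU coreutils sha256sum is installed, and the
# record layout "previous-head url timestamp content-digest" is invented here.

# Print the hex SHA-256 of standard input.
hash_stdin() {
    sha256sum | cut -d' ' -f1
}

# Starting head for an empty log: the hash of the empty string.
HEAD=$(printf '' | hash_stdin)

# append_capture URL TIMESTAMP FILE
# Fold one capture into HEAD, so HEAD commits to every capture appended so far.
append_capture() {
    digest=$(hash_stdin < "$3")
    HEAD=$(printf '%s %s %s %s' "$HEAD" "$1" "$2" "$digest" | hash_stdin)
}
```

Usage: `append_capture "https://example.com/" 20190912000000 page.html; echo "$HEAD"`. If the archive periodically published `HEAD` somewhere it doesn't control, silently editing any earlier capture would no longer reproduce the published value. This only detects tampering after a head has been published; it can't prove the capture matched the live page in the first place.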

~~~
khawkins
I wouldn't say "no reason"; I'd just say that they should be aware that
they are functionally similar to a news website in that they live and die on
their reputation. The second someone can demonstrate that they've manually
misrepresented some archived page they lose nearly all of that credibility.

Better than crypto stuff, which might not scale well, would be to have multiple
independent archives run by people with a diversity of political views and
nationalities. The authenticity can be easily verified if the same page is
archived to a site run by left-wing people in San Francisco and also a site
run by right-wing people in London.

~~~
Johnny555
But you'd need more than two; otherwise, who would break the tie when the
left-leaning site says one thing and the right-leaning site says the opposite?

Running a general internet archive isn't cheap, so it's not like there would
be dozens of them.
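The tie-breaking point can be made concrete with a small sketch. It assumes, hypothetically, that each independent archive publishes a content digest for its capture of the same URL at about the same time; the digests and the `majority_digest` helper are illustrative, not any real archive's interface. A version is accepted only if a strict majority of archives report the same digest.

```shell
#!/bin/sh
# majority_digest DIGEST...
# Print the digest reported by a strict majority of archives; fail otherwise.
# Hypothetical helper: the digests are assumed to be published by each archive.
majority_digest() {
    [ "$#" -gt 0 ] || return 1
    n=$#
    # Count identical digests and keep the most common line ("COUNT DIGEST").
    top=$(printf '%s\n' "$@" | sort | uniq -c | sort -rn | head -n 1)
    set -- $top
    if [ $(( $1 * 2 )) -gt "$n" ]; then
        printf '%s\n' "$2"
    else
        return 1
    fi
}
```

With two archives that disagree there is no strict majority, which is exactly the tie. In practice, honest captures of a dynamic page (ads, timestamps, personalized content) would often differ byte for byte, so a real scheme would need to normalize pages before hashing.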

------
3xblah
"Mr Graham said he was an "optimist", but conceded that the archive had not
yet saved as much as he would like. Take YouTube, for example: the team is
only archiving a "small fraction" of all the videos published each week."

Is it true one can nudge the IA crawlers to archive a page that would
otherwise be ignored?

[https://blog.archive.org/2017/01/25/see-something-save-somet...](https://blog.archive.org/2017/01/25/see-something-save-something/)

Is this as simple as sending a GET request, something like the following?[1]

    curl -s -o /dev/null "https://web.archive.org/web/save/https://www.youtube.com/watch?v=5oBh6Ng8bI4"

[1] Are web developers the only ones who use "Overrides" in Chrome DevTools? For
example, could users replace external resources such as tracking pixels or
JS/CSS that obscures text with their own resource files?
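A sketch of the same request wrapped in a small helper, assuming the public Save Page Now endpoint accepts the target URL appended to `https://web.archive.org/save/` and that a plain GET is enough to request a capture (the service may rate-limit or refuse). `SAVE_ENDPOINT`, `save_url`, and `request_capture` are hypothetical names, not part of any official client.

```shell
#!/bin/sh
# Sketch only: request a Wayback Machine capture via "Save Page Now".
# Assumption: appending the target URL to this endpoint and sending a plain
# GET is enough to request a capture; the service may rate-limit or refuse.
SAVE_ENDPOINT="${SAVE_ENDPOINT:-https://web.archive.org/save}"

# Print the Save Page Now URL for a target page.
save_url() {
    printf '%s/%s\n' "$SAVE_ENDPOINT" "$1"
}

# Send the request, discard the response body, print only the HTTP status.
request_capture() {
    curl -s -o /dev/null -w '%{http_code}\n' "$(save_url "$1")"
}
```

Usage: `request_capture "https://www.youtube.com/watch?v=5oBh6Ng8bI4"`. Lowercase `-o` names the output file, so `-o /dev/null` discards the body; uppercase `-O` takes no argument and saves under the remote file name instead. Quoting the URL keeps shells like zsh from treating `?` as a glob.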

~~~
crisnoble
If you put that URL as a link in your video description, Google's crawlers
might follow it, triggering the same GET request as curl.

------
taborj
They mention archiving a bunch of Trump stuff, but are they just focusing on
conservatives? There's a bunch of misinformation on both sides, and if they're
really after the truth, they should be paying the same amount of time and
attention to both right _and_ left.

Somewhat related: it's sad but perhaps not inevitable that everyone basically
has to do their own research to find out what's actually going on. These days,
I don't trust any one source; I find out what that source used as _their_
source, and go read that. And if that source has a source, I look at that.
This means I often find myself looking at the raw figures on, say, gun
deaths or immigration. It's extremely time-consuming, but
immensely illuminating.

If only there were some organizations that would do this for me, without
adding in some bias or interpretation. We could call them "news
organizations."

~~~
SamBam
What gives you the impression that they're "focusing on conservatives?" Do you
have evidence for this statement?

You mention archiving Trump's tweets. Do you think that they might do this
because he is the President of the United States?

~~~
weberc2
(I don't have a dog in this fight) From TFA: "The result is Trump and Brexit!"
and generally the lack of anything disparaging toward the left. The article
and the Internet Archive plainly aren't "focused on conservatives", but it sort
of undermines its purported mission (commitment to preserving the truth) in
the implication that conservatives are unique in espousing misinformation,
even if the implication is unintentional.

~~~
magashna
Wouldn't it be up to each side to submit links to the IA? Also, if a site opts
out and its historical data is wiped, that isn't the IA's bias; it's the site
choosing to opt out.

~~~
weberc2
Per the article, we're discussing IA's efforts to flag "misinformation".

------
ravenstine
Archiving the web is a fine thing on its own, but why must everything become a
"war" against misinformation, fake news, etc.? It can simply be a side benefit
to having the Internet Archive. But I'm wary of anyone who claims to be trying
to combat misinformation, because usually there's a political agenda. (and I
don't care whose agenda it is, but by all means, _downvote_ me...)

I don't want the Internet Archive to be around for the sake of keeping tabs on
Trump or pointing out fake news. Just archive web content. The end, please.

~~~
devmunchies
Agreed. The Unix philosophy: _This service only archives the web. To know
whether the content is true, see: <other service>_

------
neonate
[http://archive.is/M7zGj](http://archive.is/M7zGj)

~~~
IfOnlyYouKnew
This is funny, but the FT has actually opened its paywall today, so this may
be a good time to check out everything you didn't get to read in the past.

------
scoutt
I am glad the Archive exists, and I'm a frequent user. The only thing that
gets me thinking is that at some point in the future, they will own the past.
That is, if it isn't in the archive, then it didn't happen (as far as web
content is concerned). And by owning the past, they would (probably) be able to
manipulate it.

One can say the same thing about Wikipedia, but there are other encyclopedias
that can be used to check a certain fact.

Do we need an archive of the Internet Archive?

------
RobertRoberts
9/11 in the Archive.

Go look at all the news websites on September 10th, 2001... and back about a
month or two. All empty until 9/11.

~~~
abacadaba
Nah, I remember pretty well; there was literally nothing in the news for months
before that. The top story was shark attacks or something.

~~~
briantakita
9/10/2001: Donald Rumsfeld admitting that $2.3 trillion was unaccounted for
from the Defense budget. Backpage story for sure.
[https://www.youtube.com/watch?v=xU4GdHLUHwU](https://www.youtube.com/watch?v=xU4GdHLUHwU)

~~~
abacadaba
o_0

------
ash
> Around the sides of the room [in Internet Archive headquarters] stand around
> 130 3ft porcelain figurines — replicas of every employee who has spent at
> least three years at the archive.

Wow. That's a lot of figurines.

------
rkagerer
Thank you for using the word "misinformation" instead of Fake News.

------
siliconunit
I find this kind of news deeply worrying... Critical thinking should be applied
to everything, and purging an archive of some of its controversial stuff
(albeit a common practice in real ones too, as pointed out) is a terrifying
prospect. What's the next subject? Where do you stop? Is the internet to
become a 'validated safe space'? A chilling prospect. If you cannot handle
discussion about absolutely anything without being afraid or influenced,
something is deeply wrong with the way you have been educated.

------
brewsterkahle1
Full copy of the article:
[https://archive.org/details/howtheinternetarchiveiswagingwar...](https://archive.org/details/howtheinternetarchiveiswagingwaronmisinformation)

------
dredmorbius
OT: the FT's paywall appears to have opened up, apparently in general, not
just for HN referrers.

I'd previously been using the archive.is / archive.fo workaround.

The FT is among the very highest-quality mainstream/establishment news
publications, and is well worth the subscription, if you can afford it. It
will reward your attention even if not.

Update: this is a one-day promo:

[https://4estate.media/@aendrew/102814393931699671](https://4estate.media/@aendrew/102814393931699671)

------
jimmays
With the GDPR and all, it's impossible not to provide an opt-out of some kind
in this day and age.

------
kd3
These guys are heroes.

~~~
tpmx
I used to think that. Then I tried volunteering via their IRC channel and was
met with some next-level hostility.

My read is:

a) They're doing a job that needs to be done.

b) They're _relatively_ flush with cash from donors.

c) They're not nearly as efficient as they could be.

d) The people there seemed more concerned about competition than
collaboration.

e) Probably most controversial: The organization seems like it has stagnated.

~~~
ineedasername
How did you approach them, what did you ask to help with? Not saying you did
this, but there's a big difference between "hey folks, how can I help?"
compared to "I noticed your page load times are ridiculous, I can get you to a
place that isn't awful".

~~~
scrollaway
FWIW this is a fairly common experience. Jason Scott is quite famously a
problematic guy. "But", he volunteers an insane amount so he gets a pass on a
lot of his antics (hands up if you've heard that story before).

It's common to get kicked out of the IRC channel for pissing him off over
basically anything. I used to hang out on IRC and saw it happen a lot. It
eventually happened to me as well (a couple of years ago). I didn't go back;
it's not worth putting up with him. It didn't stop me from contributing to the
archive / AT / archival communities; in fact, shortly after I got kicked out, I
was invited by someone else to a Jason-Scott-free community that I contributed
to for a while.

~~~
tpmx
That resonates a lot with my experience :/.

Now he's employed by the IA, so not volunteering, right?

------
rc_kas
I've got to donate to them again. I always forget about them, but those
fuckers are heroes.

------
jakeogh
Meta: Are we going to be clicking these cookie notifications until the end of
time? I have JS off, so I usually don't see them... /thanks EU

------
s9w
This is a blatant and cheap hit piece, 100% politics, and it should not be here.

~~~
dole
2 paragraphs out of 21 mentioned Trump, 2-3 mentioned politics otherwise.

