Hacker News
Web Archive appears to be down (archive.org)
93 points by pana1 on Nov 27, 2022 | 18 comments



I recently wondered (and I assume the answer is on the website that is currently down) what guarantees that the Web Archive will survive longer than your standard website.

What makes it an archive in terms of longevity? Is it backed by governments or NGOs? What if the funding stops? Etc.


I used to work in web preservation, and we often discussed and collaborated with the Internet Archive.

I was working on a French government-funded web preservation project.

Many countries have such government-led projects (France, Norway, Iceland, Italy, the USA via the Library of Congress, Slovenia). For a full list of who does this, see https://netpreserve.org/

Short answer: partial replication of the data at other institutions, plus overlap between the Internet Archive's collection and the archives held by other institutions.

Many projects like the Internet Archive exist, but they are not necessarily publicly accessible via the internet (in France, the archive can be consulted on dedicated computers in public libraries).

If you are interested in the topic, also take a look at the LOCKSS project from Stanford: https://www.lockss.org/
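The idea behind LOCKSS ("Lots of Copies Keep Stuff Safe") is that several independent institutions hold copies of the same material and periodically compare them, so a damaged copy can be spotted and repaired from the others. A minimal sketch of that comparison, assuming each replica can be read as raw bytes and the institution names are hypothetical. Real LOCKSS runs a more elaborate polling protocol among peers, so this only illustrates the majority-vote idea:

```python
import hashlib
from collections import Counter


def audit_replicas(replicas):
    """Hash every replica and flag the ones that disagree with the majority.

    `replicas` maps a (hypothetical) institution name to the bytes it holds.
    Returns (majority_digest, sorted list of institutions needing repair).
    """
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # Without a strict majority we cannot tell which copy is authoritative.
        raise ValueError("no majority among replicas")
    damaged = sorted(name for name, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


if __name__ == "__main__":
    page = b"<html>archived page</html>"
    replicas = {
        "institution_a": page,
        "institution_b": page,
        "institution_c": page[:-1],  # simulated truncation / bit rot
    }
    digest, damaged = audit_replicas(replicas)
    print(damaged)  # ['institution_c']
```

The more independent copies there are, the more damaged ones a vote like this can tolerate, which is one reason replication across several institutions matters for longevity.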


Is there an existing service where I can query a URL and get a list of preservation projects that have successfully archived that URL?


You are looking for Memento.

- http://timetravel.mementoweb.org/ (search seems to be down)

- https://www.webarchive.org.uk/mementos/search (search not responsive for me)

- https://mementoweb.github.io/SiteStory/redirector.html (protocol and tools)
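Memento (RFC 7089) lets an archive publish its captures of a URL as a TimeMap in `application/link-format`: each capture is a `<URI>` whose `rel` contains `memento` and which carries a `datetime` attribute. A minimal sketch that parses such a TimeMap and counts captures per archive host. The endpoint constant is an assumption (the Wayback Machine's link-format TimeMap path); an aggregator such as the TimeTravel service above would, if it exposes a similar TimeMap, list captures from several archives in one response. The archive hosts in the sample are hypothetical:

```python
import re
from collections import Counter
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit
from urllib.request import urlopen

# Assumed endpoint; append the target URL to it.
TIMEMAP_ENDPOINT = "http://web.archive.org/web/timemap/link/"

_ENTRY = re.compile(r"<([^>]*)>([^<]*)")       # <URI> followed by its ;-separated params
_PARAM = re.compile(r'(\w+)\s*=\s*"([^"]*)"')  # key="value" pairs


def parse_timemap(text):
    """Return (memento_uri, capture_datetime) pairs from a link-format TimeMap."""
    mementos = []
    for uri, params in _ENTRY.findall(text):
        attrs = dict(_PARAM.findall(params))
        # rel may hold several tokens, e.g. "first memento".
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((uri, parsedate_to_datetime(attrs["datetime"])))
    return mementos


def captures_per_archive(mementos):
    """Count captures by the host that serves each memento."""
    return Counter(urlsplit(uri).netloc for uri, _ in mementos)


def fetch_timemap(target, endpoint=TIMEMAP_ENDPOINT, timeout=30):
    """Download the TimeMap for `target` (requires network access)."""
    with urlopen(endpoint + target, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Offline sample with hypothetical hosts; use fetch_timemap("example.com") to go online.
    sample = (
        '<http://example.com/>; rel="original",\n'
        '<https://archive-a.example/2002/http://example.com/>; rel="first memento"; '
        'datetime="Sun, 20 Jan 2002 14:25:10 GMT",\n'
        '<https://archive-b.example/2010/http://example.com/>; rel="memento"; '
        'datetime="Fri, 01 Jan 2010 00:00:00 GMT"'
    )
    mementos = parse_timemap(sample)
    for uri, when in mementos:
        print(when.isoformat(), uri)
    print(captures_per_archive(mementos))
```

The regex split on `<` is deliberate: `datetime` values contain commas, so naively splitting the TimeMap on `,` would break entries apart.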


There are a few browser add-ons that will list about a dozen archives for every page you visit. However, that doesn't include non-public services like the one mentioned above.


How does replication in the EU work with right-to-be-forgotten laws?


There are public interest exceptions to European right-to-be-forgotten laws.

Cases are handled on an individual basis. Sometimes those exceptions apply; sometimes the needs of the individual supersede those public interests.


That’s why they’re collaborating with IPFS, a p2p content-addressed file system.
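"Content-addressed" means a file is requested by a hash of its bytes rather than by the server hosting it, so any peer holding the same bytes can serve it and the requester can verify what it received. A toy sketch of that idea; real IPFS CIDs wrap the digest in multihash/multibase encodings and split large files into a Merkle DAG, so the plain SHA-256 hex address and the `ContentStore`/`fetch` names here are illustrative simplifications, not the IPFS format or API:

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the value itself."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # storing identical bytes again is a no-op
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]  # KeyError if this peer lacks the content
        # The requester can check the bytes against the address it asked for,
        # which is what lets untrusted mirrors serve the content.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


def fetch(address, peers):
    """Try each peer in turn; return the first copy whose hash verifies."""
    for peer in peers:
        try:
            return peer.get(address)
        except (KeyError, ValueError):
            continue
    raise KeyError(address)


if __name__ == "__main__":
    mirror_a, mirror_b = ContentStore(), ContentStore()
    addr = mirror_b.put(b"snapshot of example.com")
    # mirror_a has nothing, so the request falls through to mirror_b.
    print(fetch(addr, [mirror_a, mirror_b]))
```

Because the address does not name a host, a copy held by any mirror is as good as the original, which is the resilience property the parent comment is pointing at.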


I haven't been able to figure out how to access the IPFS versions of the archive. They technically have a version of their site hosted through IPFS (https://www-dweb-cors.dev.archive.org/web), but when I search for a specific URL like nytimes.com, it just redirects to the standard archive.org URL.


I've been thinking about this too. It should be backed by a p2p network to be resilient. But maybe there are mirrors that simply aren't advertised, for security reasons? (There's plenty of copyright infringement on archive.org.)


Their infrastructure is explained in this talk: https://www.youtube.com/watch?v=neBeDgICOeA


Canonical link is at https://archive.org/details/jonah-edwards-presentation -- I'm also always happy to answer any questions folks might have about it if I see them.


I’m not overly hopeful. I suspect the ill-thought-out ‘COVID library’ (the National Emergency Library) ends up getting…expensive, at best.


You can see the loss of traffic in their charts [1] (from archive stats [2]). Looks like it started on the morning of the 26th Nov.

[1] https://analytics1.archive.org/stats/wb.php

[2] https://archive.org/stats/


And it seems to be back now. Also visible in the charts.


Just visit it in the wayback machine...


Just the web subdomain seems to be down.


Since a dead link doesn't reveal much: this outage appears to affect just the Wayback Machine. The other collections appear to be accessible through http://archive.org



