
It's interesting to think about how HTML could be modified to fix the issue. Initial thought: along with HREF, provide AREF, a list of archive links. The browser could automatically try a backup if the main link fails, and the user should be able to right-click the link to select a specific backup. Another idea is to let the web-page author provide a rewrite rule that automatically generates Wayback Machine (or whatever) links from the original. This seems less error-prone, and browsers could provide a default that authors could override.
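To make the fallback idea concrete, here is a rough sketch of the resolution order a browser might follow: the original HREF first, then the author's AREF list, then a rewrite rule. Everything here is hypothetical: `resolve_link`, `wayback_rewrite`, and the `is_reachable` callback are made-up names, and the default rule assumes that prefixing a URL with `http://web.archive.org/web/` lands on an archived copy.

```python
from typing import Callable, Iterable, Optional


def wayback_rewrite(href: str) -> str:
    """Hypothetical default rewrite rule: prefix the original URL.

    Assumes the archive serves a capture at this path; an author could
    override the rule with any other function from URL to URL.
    """
    return "http://web.archive.org/web/" + href


def resolve_link(
    href: str,
    arefs: Iterable[str],
    is_reachable: Callable[[str], bool],
    rewrite: Optional[Callable[[str], str]] = wayback_rewrite,
) -> Optional[str]:
    """Return the first reachable candidate, or None if all of them fail.

    Order: the original HREF, then each AREF backup in the order the
    author listed them, then the rewritten archive URL (if a rule is set).
    """
    candidates = [href, *arefs]
    if rewrite is not None:
        candidates.append(rewrite(href))
    for url in candidates:
        if is_reachable(url):
            return url
    return None
```

The reachability check is passed in as a callback so the sketch stays independent of any network library; a real browser would also have to decide what counts as "failed" (a 404, a timeout, a parked domain), which is where the corner cases start.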

Anyway, the fix should work even with plain HTML. I'm sure there are a bunch of corner cases and security issues involved.

Well, as mentioned by others, there is a browser extension. It's interesting to read the issues people have with it:


So this is a little indirect, but it does avoid the case where the Wayback machine goes down (or is subverted): include a HASHREF which is a hash of the state of the content when linked. Then you could find the resource using the content-addressable system of your choice. (Including, it must be said, the wayback machine itself).
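A minimal sketch of what a HASHREF lookup could look like, assuming SHA-256 as the hash and an in-memory dictionary standing in for whatever content-addressable system you pick. `ContentStore`, `content_hash`, and the idea that HASHREF carries a hex digest are all my own illustration, not an existing standard.

```python
import hashlib
from typing import Dict, Optional


def content_hash(data: bytes) -> str:
    """Hash the raw bytes of a resource (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy content-addressable store: blobs are keyed by their own hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store a blob and return the hash an author would put in HASHREF."""
        digest = content_hash(data)
        self._blobs[digest] = data
        return digest

    def get(self, hashref: str) -> Optional[bytes]:
        """Look up a blob by hash, re-verifying it before returning.

        The re-check is what lets a reader detect a store that has been
        subverted: bytes that no longer match the hash are rejected.
        """
        data = self._blobs.get(hashref)
        if data is None or content_hash(data) != hashref:
            return None
        return data
```

Because the key is derived from the content, any archive that holds the same bytes can answer the lookup, and the reader never has to trust the archive itself.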

I've found that web pages have so much dynamic content these days that even something that feels relatively static generates two different hashes almost on every pageload.

Indeed. I don't think you could or should hash the DOM, not least because it is, in general, the structured output of a program. Ideally you could hash the source. This might be a huge problem for single page applications, except you can always pre-render a SPA at any given URL, which solves the problem. (This is done all the time - the most elegant way is to run e.g. React on the server to pre-render, but you can also use another templating system in an arbitrary language, although you end up implementing every feature maybe not twice, but about 1.5x.)

> (Including, it must be said, the wayback machine itself).

Citation needed? E.g. something like http://web.archive.org/cdx/search/cdx?url=http://haskell.cs.... produces lines of the form:

  edu,yale,cs,haskell)/wp-content/uploads/2011/01/haskell-report-1.2.pdf 20170628055823 http://haskell.cs.yale.edu/wp-content/uploads/2011/01/haskell-report-1.2.pdf warc/revisit - WVI3426JEX42SRMSYNK74V2B7IEIYHAS 563
But there seems to be no documented way to turn WVI3426JEX42SRMSYNK74V2B7IEIYHAS (which I presume to be the hash) into an actual file. (Though http://web.archive.org/web/$DATEim_/$URL works fine, so it hasn't been a problem in practice.)
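For what it's worth, here is a small sketch of pulling the fields out of a line like the one above and building the date-based URL from it. The field names in `CdxRecord` are my assumption about the column order (seven space-separated columns, with the presumed hash in the sixth); `parse_cdx_line` and `capture_url` are made-up helper names, and the URL template is just the `$DATEim_/$URL` form mentioned above.

```python
from typing import NamedTuple


class CdxRecord(NamedTuple):
    """One line of CDX output; the column names are assumed, not confirmed."""
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str


def parse_cdx_line(line: str) -> CdxRecord:
    """Split a space-separated CDX line into its seven columns."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return CdxRecord(*fields)


def capture_url(record: CdxRecord) -> str:
    """Build the date-based URL, i.e. http://web.archive.org/web/$DATEim_/$URL."""
    return f"http://web.archive.org/web/{record.timestamp}im_/{record.original}"


line = (
    "edu,yale,cs,haskell)/wp-content/uploads/2011/01/haskell-report-1.2.pdf "
    "20170628055823 "
    "http://haskell.cs.yale.edu/wp-content/uploads/2011/01/haskell-report-1.2.pdf "
    "warc/revisit - WVI3426JEX42SRMSYNK74V2B7IEIYHAS 563"
)
record = parse_cdx_line(line)
print(record.digest)
print(capture_url(record))
```

This only goes from (date, URL) to a file; it does nothing to go from the digest alone to a file, which is the missing piece.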

> Citation needed

Oh, sorry, I don't think the WM supports this today. I only meant that it could support it "trivially" (I put that in quotes since I don't know how WM is implemented. But in theory it would be easy to hash all their content and add an endpoint that maps from hashes to URLs).

My point was that you could add an addressing system that is independent of the Wayback Machine but that you could still (theoretically) use with it. You'd have to add the facility to the WM, though.

Ah, that's disappointing, but oh well.

This is literally where my brain was going and I was glad to see someone went in the same direction. Given the <img> tag’s addition of srcset in recent years, there is precedent for doing something more with href.

Yup, I've been using the extension for probably about a year now and get the same issues they do. It really isn't that bad; most of the time backing out of the message once or twice does the trick. It's funny, though, because most of the time I get that message when going to the IA web uploader.
