The service periodically checks the original sources and, when a source fails, automatically redirects broken links to its archive.org snapshot. Each redirect can be customized or set to fall back to the author’s website.
The usual problem is that anything meant to prevent link rot is itself prone to rot.
Which is why my SaaS solution has a self-offboarding function: writers can use custom domains for their links and export fully functional nginx- and Apache-based redirection config files at any time. I also have a contingency plan for the next decade, which includes setting up such a low-cost redirection server myself.
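For illustration, an exported nginx config could be little more than a list of 301s, one location per permanent link (domain and paths here are hypothetical, not from the actual export):

```nginx
# Hypothetical exported redirect map: each permanent link becomes a 301.
server {
    listen 80;
    server_name links.example-author.com;  # assumed custom domain

    # Source still alive: redirect straight to the current URL.
    location = /book1/ch4-sources {
        return 301 https://example.com/moved-article;
    }

    # Source dead: redirect to the archive.org snapshot instead.
    location = /book1/ch3-fig2 {
        return 301 https://web.archive.org/web/2019/http://example.com/article;
    }
}
```

Exact-match `location =` blocks with `return 301` keep the whole thing static, which is what makes this kind of forwarding so cheap to host.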
I needed this service for my own books, so I just built it. Dogfooding is what I recommend to bootstrappers anyway, so I am my first customer.
It’s highly reliable (it’s a 301 forwarding system after all) and very cheap to host.
You can find this at https://permanent.link
So, yeah, then you end up linking to the archive.org page (and if it's not there, you can submit it to archive.org for archiving). For a time, archive.org delisted pages if the current robots.txt blocked access (even if the page had been archived a decade earlier); I can't verify whether this is still the case.
A quick, rough scan of ~400 links from my decaying blog shows that about 40% either do not resolve at all or return content from a link farm (another 30-40% appear to redirect to https versions of the site, but I didn't check further to see whether the content was what was intended).
It provides a list of broken links and a list of redirected links. I first ‘fixed’ the redirects to ensure I wasn’t redirecting to content that had been replaced or turned into a link farm.
For fixing the broken links, I found many were available via archive.org, and swapping them out was a two-click breeze. For the non-archived pages, I often found the new URL for the same resource by Googling the link.
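A scan like the one above is easy to script. A minimal sketch using only the standard library (the ok/redirected/broken split matches the report described; detecting link-farm content would need extra heuristics not shown here):

```python
# Minimal link-rot scanner: classify each URL as ok, redirected, or broken.
import urllib.error
import urllib.request


def _default_fetch(url):
    """Follow redirects and return the final URL; raise on failure."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.geturl()


def classify(url, fetch=_default_fetch):
    """Return 'ok', 'redirected', or 'broken' for a single URL."""
    try:
        final = fetch(url)
    except (urllib.error.URLError, OSError):
        return "broken"
    return "ok" if final == url else "redirected"


def scan(urls, fetch=_default_fetch):
    """Group URLs into the three buckets, like the lists described above."""
    report = {"ok": [], "redirected": [], "broken": []}
    for url in urls:
        report[classify(url, fetch)].append(url)
    return report
```

The `fetch` parameter is injectable so the logic can be tested without hitting the network; a real run should also rate-limit and retry.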
My only request would be a feature to archive all outgoing links.
> Filecoin (⨎) is an open-source, public cryptocurrency and digital payment system intended to be a blockchain-based cooperative digital storage and data retrieval method. It is made by Protocol Labs and builds on top of InterPlanetary File System, allowing users to rent unused hard drive space. A blockchain mechanism is used to register the deals. It is a decentralized storage system that aims to “store humanity’s most important information.” Filecoin is an open protocol backed by a blockchain that records commitments made by the network’s participants, with transactions made using FIL, the blockchain’s native currency. The blockchain is based on both proof-of-replication and proof-of-spacetime.
I guess it’s more complicated than just that, because you’d have to take into account the date the page was originally linked to in order to get the closest match, but it should be doable.
There might very well be reasons this isn’t a good idea; it’s just something that occurred to me that might be useful, since I have a lot of old links and no doubt many of them are broken now.
You would also end up sending the validation request on every client page view.
If the URL returns a 404, check the Internet Archive; if it’s there, return that, otherwise return a 404.
If CORS is an issue just proxy the requests via the server that is serving the website.
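Server-side, that fallback is a small proxy step. A sketch against the Wayback Machine's availability API (the endpoint is real; the `timestamp` parameter lets you ask for the snapshot closest to the date the page was originally linked, as discussed above; error handling is minimal):

```python
# If a URL 404s, ask the Wayback Machine for the snapshot closest to a
# given date (YYYYMMDD); return that snapshot's URL, or None if unarchived.
import json
import urllib.parse
import urllib.request

WAYBACK_API = "https://archive.org/wayback/available"


def wayback_fallback(url, linked_on="20120101", fetch_json=None):
    query = urllib.parse.urlencode({"url": url, "timestamp": linked_on})
    api_url = f"{WAYBACK_API}?{query}"
    if fetch_json is None:
        def fetch_json(u):
            with urllib.request.urlopen(u, timeout=10) as resp:
                return json.load(resp)
    data = fetch_json(api_url)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None
```

Because the proxy makes this request server-side, the browser never talks to archive.org directly and CORS never comes up.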
The links of big institutions used to change a lot; much less so now. Some, like WIRED, actually go to the trouble of making old content links resolve to their newer addresses.
If the content would still have value in 40 years, it will probably survive. (See the NYTimes archive.) But links themselves have little value (unless they're widely published on paper!). There may be a few big institutions thinking that far out.
Worrying about link rot, some years ago I wrote a little local webserver that took a link via a bookmarklet and saved the page in multiple formats:
- html with wget (--page-requisites --convert-links), packed into a tar.gz
- pdf with wkhtmltopdf
- txt with links
- png with firefox --screenshot
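Such a saver can mostly just shell out to those four tools. A sketch of the command-building half (the flags are plausible guesses at a working invocation, not the original's exact setup):

```python
# Build the four archiving commands for a URL; a runner would pass each
# list to subprocess.run. Flags for wget/wkhtmltopdf/links/firefox are
# illustrative, not taken from the original webserver.
from urllib.parse import urlparse


def build_commands(url, outdir):
    name = urlparse(url).netloc or "page"
    return [
        ["wget", "--page-requisites", "--convert-links",
         "--directory-prefix", f"{outdir}/{name}-html", url],
        ["wkhtmltopdf", url, f"{outdir}/{name}.pdf"],
        ["links", "-dump", url],  # runner redirects stdout to {name}.txt
        ["firefox", "--headless", "--screenshot",
         f"{outdir}/{name}.png", url],
    ]
```

Keeping command construction separate from execution makes it easy to add or swap formats later without touching the webserver/bookmarklet plumbing.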