We're working on several initiatives, like bringing OpenAnnotations to the Wayback Machine, reducing link rot (see: Vinay Goel & http://vinay.blog.archive.org/2014/12/17/9/), universalizing access to open-access publications, and supporting efforts to distribute and decentralize the web.
Send me an email if you're interested in checking out our slack channel and meeting fellow contributors. Also, if you have a public-good or open-access project you would like help with or resources for, we'd love to hear about it --
Link rot is only going to get worse from here. It's real, and it's awful.
It's not even random websites. The NYT will post a link to a popular YouTube video, the video gets pulled two days later, and one week after the article goes up the links are already stale.
As Wikipedia gets more 'reputable', newspapers will post links to it. Wikipedia being what it is, the referenced page will go stale.
So here's an idea for a service: hash all the pages a website links to, outgoing and internal. If one changes, either 1) give users options to review it, 2) update the link, or 3) delete the reference. If the original website 404s, change the link to the archive page. If a video link is dead, provide tools inside the dashboard to search for similar videos to link instead. For Wikipedia, automatically link to a specific revision in the page's history, not the live page. This is different from the idea posted below because it integrates with existing publishing systems, so it'd be more B2B, and you'd have cash flow right away.
For links to social media, use a 'photo snippet' tool that checks whether the link is still valid and falls back to a saved image of the post if it's dead.
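The core check could look something like this rough sketch (the function name, the REVIEW flag, and hashing raw HTML are my inventions; a real system would hash extracted text so markup churn doesn't trigger false positives):

    import hashlib
    import requests

    def check_link(url: str, known_hash: str) -> str:
        """Return the URL to publish: the original if unchanged, the
        Wayback copy if dead, or a review flag if the content changed."""
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            resp = None
        if resp is None or resp.status_code == 404:
            # Dead link: web.archive.org/web/<url> redirects to the latest snapshot.
            return "https://web.archive.org/web/" + url
        digest = hashlib.sha256(resp.content).hexdigest()
        if digest != known_hash:
            return "REVIEW"  # content changed; surface options in the dashboard
        return url  # still valid, leave as-is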
I'm certain people would pay GOOD MONEY for this service. I know I would if I were running a publication.
Give me some spare change if it succeeds. : )
I've personally started saving sites I want to keep using the print-to-PDF feature. Bookmarks aren't enough when you really care to save the data.
If you want to do this yourself, there are several crawlers out there you can use to fetch the data.
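Even without a full crawler, a few lines of Python will keep dated local copies (names are mine; a real crawler would also pull images and CSS):

    from datetime import date
    from pathlib import Path
    import requests

    def save_page(url: str, root: str = "saved-pages") -> Path:
        """Keep a dated local copy of a page's HTML."""
        resp = requests.get(url, timeout=15)
        resp.raise_for_status()
        # Derive a filesystem-safe name from the URL.
        safe = "".join(c if c.isalnum() else "_" for c in url)[:80]
        out = Path(root) / (date.today().isoformat() + "_" + safe + ".html")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(resp.content)
        return out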
Many publications have been reluctant to link out to sites over the years because of this very thing--link rot.
Unless the page was non-notable and got deleted, there'll be a redirect.
And the Tool Support section - the most valuable part of the page, which I vehemently contend constitutes essential secondary source material - remains deleted. It took me 15 minutes of digging through the page history to find it, which I'd never have done if I didn't already know it was there.
EDIT: This is a hack until the content-addressable web arrives.
I haven't done it yet, but I'd be interested in others' experiences. It would be interesting to have a crawler that does it automatically for each new post.
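A rough sketch of that crawler, using the Wayback Machine's public save endpoint (the helper name is mine):

    import requests
    from bs4 import BeautifulSoup

    def archive_outbound_links(post_url: str) -> None:
        """Ask the Wayback Machine to snapshot every external link in a post."""
        html = requests.get(post_url, timeout=10).text
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            href = a["href"]
            if href.startswith("http"):
                # GET https://web.archive.org/save/<url> triggers a fresh crawl.
                resp = requests.get("https://web.archive.org/save/" + href, timeout=60)
                print(href, "->", resp.status_code)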
Also, copyright issues aside, the prospect of a recursively archived Web is a little mind-boggling. But storage keeps getting cheaper, so hey.
I now save every single page I find important or interesting on my local HDD and then move it to a backup disk later on.
The addon can be found here: https://github.com/rahiel/archiveror.
There could even be an integration with Delicious or WordPress. Wikipedia is working on a bot for external links in their content.
But here's one saved the same day that doesn't:
It's really sad, and more than a little disheartening.
In updating it recently (a couple of months ago) I found many of the links simply 404'd, with no redirect at all. Wikipedia was pretty bad for this too, as the author found.
I took to hosting all the files I linked to myself, to at least be sure they'd stay around, but it's a bit of a losing battle when it comes to links to normal webpages.
If the link rot keeps happening, I might link to archive.org pages instead in the future. Perhaps that would be a more stable option.
The one big problem with Wikipedia redirects is links to sections: if a page is moved or disappears, the software notices, but it doesn't notice when a section is renamed. So the anchor in an intra-wiki link is frequently broken.
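A link checker could catch that case by testing the fragment itself, something like this sketch (helper name is mine):

    import requests
    from urllib.parse import urldefrag
    from bs4 import BeautifulSoup

    def anchor_is_broken(link: str) -> bool:
        """True if the #fragment no longer matches any id/name on the page."""
        url, fragment = urldefrag(link)
        if not fragment:
            return False  # nothing to break
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        # Fragments resolve to an element id or an old-style <a name="...">.
        return (soup.find(id=fragment) is None
                and soup.find("a", attrs={"name": fragment}) is None)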
Archive.org, or maybe something like wallabag, or even a screenshot - but anyway, to work around link rot you're better off with a local cached copy.
A couple of issues to account for: people who squat expired pages, and temporary failures that make a live link look dead.
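For the temporary-failure half, something like this sketch (thresholds and names are mine; some servers reject HEAD, so a fuller version would retry with GET) only calls a link dead after repeated hard failures:

    import time
    import requests

    TRANSIENT = {429, 500, 502, 503, 504}  # statuses that are often temporary

    def link_is_dead(url: str, attempts: int = 3, wait: float = 60.0) -> bool:
        """Declare a link dead only after repeated hard failures."""
        for attempt in range(attempts):
            try:
                resp = requests.head(url, allow_redirects=True, timeout=10)
                if resp.status_code < 400:
                    return False  # alive (squatters answer 200 too: hash check goes here)
                if resp.status_code not in TRANSIENT:
                    return True   # hard failure like 404 or 410
            except requests.RequestException:
                pass  # connection error: treat as transient
            time.sleep(wait * (attempt + 1))  # back off before re-checking
        return True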
One of the things that I do when I put up a page, though, is make PDFs of everything that I link to. This way, if the link dries up, or is changed, there's still something to fall back to.
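If you'd rather script that than click print-to-PDF, a rough sketch using headless Chromium's --print-to-pdf flag (the helper name is mine, and the binary may be chromium or google-chrome on your system):

    import subprocess
    from pathlib import Path

    def save_linked_pdf(url: str, out_dir: str = "link-archive") -> Path:
        """Print a linked page to PDF so there's a fallback if it rots."""
        Path(out_dir).mkdir(exist_ok=True)
        # Derive a filesystem-safe name from the URL.
        name = "".join(c if c.isalnum() else "_" for c in url)[:80] + ".pdf"
        out = Path(out_dir) / name
        # Chromium's headless mode can print any URL straight to PDF.
        subprocess.run(["chromium", "--headless", f"--print-to-pdf={out}", url],
                       check=True)
        return out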
Pardon my ignorance and a probably silly question, but what is the point of making data transfers to/from a blog hosted at your own domain more secure? If I understand correctly, it will just hide which particular articles a reader visits (not that they visit this particular domain), and it will prevent caching along the wire. I suppose there must be some real benefit right in front of me (see "everyone else seems to have performed the same upgrade"), but I cannot see it. Any clues?
The biggest issue with this approach would probably be legal, as you'd find yourself redistributing copyrighted works. Could a fair-use argument be made, since it would be furthering public discourse?