

Web Decay Graph - mzl
https://www.tbray.org/ongoing/When/201x/2015/05/25/URI-decay

======
alkonaut
I gave up on considering the web to be a "web" of hyperlinks. These days the
www just feels like a collection of siloed (namespaced) applications. I expect
a link to be valid at least long enough to use in an email (i.e. days/weeks),
but I wouldn't use a link in, say, a blog post without also copying the
relevant information out of the page. The URL is then just a source reference
identifier, not a hyperlink that is expected to go anywhere.

~~~
frik
It's not that bad. And about silos: years-old bookmarks for Facebook, YouTube,
Flickr, IMDb, Amazon and Twitter work fine. As for MySpace, GeoCities, Tripod
and Friendster, you'd better visit their backups on Archive.org:
[https://web.archive.org/web/20060202160308/http://www.myspac...](https://web.archive.org/web/20060202160308/http://www.myspace.com/)

~~~
rspeer
YouTube isn't a good example there. I often find links to YouTube videos that
have been taken down by a copyright claim (whether legitimate or not) or been
"made private" by the uploader.

------
splitbrain
A commenter raises an interesting question:

 _" How did you detect "decay"? Just based on HTTP codes, or by actually
looking at the linked content? On my own blog, I found more often than I like
that old links still "work" per HTTP, but now refer to something rather
different from the content that I originally intended to refer to."_

What would be a good way to detect such spam sites automatically? Looking for
the link's title in the remote HTML? Checking for common domain-placeholder
page content and spam words? Maybe Google has some API one could use?
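
One minimal sketch of those heuristics, assuming `requests` is available (the
keyword list and the title-matching idea are guesses, not a proven method):

```python
# Hypothetical heuristics: a link is flagged as decayed if the page errors
# out, the expected title text is gone, or domain-parking vocabulary appears.
import requests

PARKING_PHRASES = [
    "this domain is for sale",
    "buy this domain",
    "domain parking",
    "related searches",
]

def looks_decayed(url, expected_title):
    try:
        resp = requests.get(url, timeout=10, allow_redirects=True)
    except requests.RequestException:
        return True  # unreachable counts as decayed
    if resp.status_code >= 400:
        return True
    body = resp.text.lower()
    # The title we originally linked to should still appear somewhere.
    if expected_title.lower() not in body:
        return True
    # Placeholder/parking pages tend to share a small vocabulary.
    return any(phrase in body for phrase in PARKING_PHRASES)
```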

~~~
socket0
I've run a similar test on my own blog (links dating back to ~2000), and I had
the same problem. Some sites do a permanent redirect on broken links, others
don't even redirect but show generic content on the original URL. I guess your
success in automating this would depend on the nature of the links, but from a
completely random collection the only success I had was with visual
inspection.

(Someone with far too much time on their hands could probably write a script
to attempt to retrieve a copy of the page from the Wayback Machine from around
the time the link was posted, then calculate the percentage change compared to
the current version. Not really reliable, but worth a try.)
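
Something along those lines could lean on the Wayback Machine's availability
API plus a crude text diff; the similarity cutoff suggested in the final
comment is an arbitrary guess:

```python
# Fetch the snapshot closest to when the link was posted, then measure how
# much the live page has drifted from it.
import difflib
import requests

def drift_ratio(url, posted_yyyymmdd):
    """Return similarity (0..1) between the archived and current page."""
    avail = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url, "timestamp": posted_yyyymmdd},
        timeout=10,
    ).json()
    closest = avail.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # nothing archived to compare against
    archived = requests.get(closest["url"], timeout=10).text
    current = requests.get(url, timeout=10).text
    return difflib.SequenceMatcher(None, archived, current).ratio()

# e.g. drift_ratio("http://example.com/post", "20050101") < 0.5
# => page has probably changed beyond recognition
```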

------
greggman
Donate to the Internet Archive. Then once in a while run some script to check
your links; if they fail or point to crap, link to the archive? Check them
again later just in case you got a false positive?

Yeah, I know it won't work for all links.
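
A minimal sketch of that check-twice-then-archive idea (the delay, the status
test, and the archive URL form are my assumptions):

```python
import time
import requests

def is_alive(url):
    try:
        return requests.head(url, timeout=10,
                             allow_redirects=True).status_code < 400
    except requests.RequestException:
        return False

def archive_link(url):
    # Without a timestamp, the Wayback Machine redirects to its latest snapshot.
    return "https://web.archive.org/web/" + url

def check_links(urls, recheck_delay=3600):
    failed = [u for u in urls if not is_alive(u)]
    time.sleep(recheck_delay)  # re-test later to weed out false positives
    return {u: archive_link(u) for u in failed if not is_alive(u)}
```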

~~~
MasterScrat
Sounds like there should be a service for that (a rough sketch follows the list):

* make a copy of all linked resources when each post is published

* regularly compare the linked resources to the copies to make sure the links still work as expected

* when a link dies, automatically replace it with a link to the copy
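
A minimal sketch of how those three steps could hang together, with all names
hypothetical and a plain HTTP status check standing in for a fuller content
comparison:

```python
import hashlib
import pathlib
import requests

SNAP_DIR = pathlib.Path("snapshots")

def snapshot(url):
    """Step 1: copy the linked resource when the post is published."""
    SNAP_DIR.mkdir(exist_ok=True)
    path = SNAP_DIR / (hashlib.sha1(url.encode()).hexdigest() + ".html")
    path.write_text(requests.get(url, timeout=10).text, encoding="utf-8")
    return path

def link_is_dead(url):
    """Step 2 (simplified): re-fetch and see whether the link still resolves."""
    try:
        return requests.get(url, timeout=10).status_code >= 400
    except requests.RequestException:
        return True

def repair(post_html, url, snapshot_path):
    """Step 3: swap a dead link for a link to the local copy."""
    if link_is_dead(url):
        post_html = post_html.replace(url, snapshot_path.resolve().as_uri())
    return post_html
```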

~~~
gosub
More than a service (the service provider could also disappear), it should be
a function of the publishing platform.

------
ssn
You can find a lot of academic research in this area:
[https://scholar.google.com/scholar?hl=en&q=web+decay](https://scholar.google.com/scholar?hl=en&q=web+decay)

------
raesene9
This is why, for my own personal research and archiving, I moved from keeping
"interesting links" to using a web clipper to take a copy of the page. Too
many times I'd think "Oh, there was an interesting article on that" only to
find it had gone, so now I take a copy of the article for later reference.

~~~
pronoiac
Pinboard can keep archives of bookmarked pages, and can also watch, say, your
Twitter for new links.

~~~
ge0rg
But who will keep archives of Pinboard?

Seriously, this approach will only shift the data from one unreliable online
service to another. Having decentralized offline copies of relevant
information is much better. And maybe we will be able to come up with a
mechanism to coordinate these decentralized backups, à la Freenet [0].

[0] [https://freenetproject.org/](https://freenetproject.org/)

~~~
pronoiac
I think Pinboard is deliberately aiming at changing less than most web sites,
and he _probably_ has better backups than I do for my own computers. Also, I
have lots of devices, and it's nice to be able to save and access the archives
from _all_ of them.

Disclosure: I use the bookmarking there, not the archiving.

------
z3t4
I think one of the problems is that many people do URLs the wrong way. And I
cannot blame them, as writing information into the URL at least used to give
better search-engine ranking. Most SEO people will say that the URL is
important! So, like many other problems with the WWW, we should blame Google :P

One example of good URLs is HN:
[https://news.ycombinator.com/item?id=9637215](https://news.ycombinator.com/item?id=9637215)

It will not help ranking on search engines, but the URL will hopefully never
change.

~~~
M2Ys4U
Indeed, Cool URIs Don't Change:
[http://www.w3.org/Provider/Style/URI.html](http://www.w3.org/Provider/Style/URI.html)

~~~
tzakrajs
What a wonderful fantasy that would have been.

------
atap
Hmmm, the Y axis of the graph is tough to interpret.

Isn't it kind of weird to use a percentage for a line graph? With percentages,
the goal is to provide an obvious fractional breakdown that viewers can
readily sum to 100% visually.

I have no idea how to sum a curve and reconcile back to the original universe.

If the complete data set contains 12,373 links, then how many have decayed?
Based on that graph, I have no idea.

------
username3
> From: Charlie (May 31 2015, at 20:36)

Bret Victor posted some interesting thoughts on this subject a few days ago:

[http://worrydream.com/TheWebOfAlexandria/](http://worrydream.com/TheWebOfAlexandria/)

[http://worrydream.com/TheWebOfAlexandria/2.html](http://worrydream.com/TheWebOfAlexandria/2.html)


------
sebastianconcpt
Internet's Natural Selection silently doing its job

