
GitHub: a case study in link maintenance and 404 pages - chrismorgan
http://chrismorgan.info/blog/github-links-case-study.html
======
dreamdu5t
Surprised nobody mentioned the extremely annoying and misleading 404 response
when you are denied permission to view a private repo. I can't tell you how
many times people at work tell me my link doesn't work because they aren't
signed in.

~~~
holman
We do that to avoid leaking the presence of the repository, if it's private.
It's a bit of a pain, but privacy's important to us.

~~~
Strilanc
Perhaps, when the link could be a private repository, the error page should
state it's _either_ missing or inaccessible.

All we need is an error code 402.5 "plausible deniability between unauthorized
and not found"...

~~~
holman
The spec actually uses a 404 specifically for this purpose:
[http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html)

~~~
wpietri
Sure, and everybody who has read the spec knows that. Alas, that's 1% of your
user base.

For the rest of your users it wouldn't hurt to say, "You might see something
different if you're logged in." Or if they are logged in, saying, "Were you
expecting something different? Maybe you just don't have access yet."
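
A minimal sketch of how that could work without leaking anything, assuming a made-up in-memory `REPOS` table and a hypothetical `respond` helper (this is not GitHub's actual code): the status and body are identical for a missing repo and an inaccessible one, and the hint depends only on whether the viewer is logged in.

```python
from typing import Optional, Tuple

# Hypothetical data for illustration only.
REPOS = {
    "acme/public": {"private": False, "members": set()},
    "acme/secret": {"private": True, "members": {"alice"}},
}

def respond(repo: str, user: Optional[str]) -> Tuple[int, str]:
    """Return (status, body); user is None for an anonymous visitor."""
    info = REPOS.get(repo)
    if info is not None and (not info["private"] or user in info["members"]):
        return 200, f"contents of {repo}"
    # Missing and inaccessible repos fall through to the same branch, so the
    # response cannot be used to probe whether a private repo exists.
    if user is None:
        hint = "You might see something different if you're logged in."
    else:
        hint = "Were you expecting something different? Maybe you just don't have access yet."
    return 404, "Not found. " + hint
```

Because the hint is chosen from the viewer's login state alone, `respond("acme/secret", None)` and `respond("acme/missing", None)` return exactly the same thing.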

~~~
bowyakka
Wait, so we hold GitHub to account for creating link rot and not applying the
spec, but when they do implement the spec, for good reason, we say "but only
1% of your user base reads the spec"? That's a bit of a double standard.

~~~
wpietri
Are you talking to me? I don't think I said anything like the first half of
that.

------
fullsailor
> GitHub uses the branch name; you can replace it with a changeset ID and
> it’ll work, but you’ll need to find a changeset ID.

You can press 'y' to expand the URL to its canonical form.
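
Conceptually, the expansion swaps the branch name in the URL for a full commit SHA. A rough sketch of that rewrite, assuming a hypothetical `to_permalink` helper and that you already know the branch name and the SHA (this is not how GitHub implements it, and the SHA below is only an illustrative value):

```python
from urllib.parse import urlsplit, urlunsplit

def to_permalink(url: str, branch: str, sha: str) -> str:
    """Replace the branch in a /owner/repo/blob/<branch>/<path> URL with sha."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected shape: ['', owner, repo, 'blob', <branch...>, <file path...>]
    if len(segments) < 5 or segments[3] != "blob":
        raise ValueError("not a blob URL")
    rest = "/".join(segments[4:])
    # Branch names may contain slashes, so the caller supplies the branch
    # rather than having it guessed from the URL.
    if rest != branch and not rest.startswith(branch + "/"):
        raise ValueError("URL does not point at the given branch")
    new_path = "/".join(segments[:4] + [sha]) + rest[len(branch):]
    return urlunsplit(parts._replace(path=new_path))

print(to_permalink(
    "https://github.com/holman/dotfiles/blob/master/osx/set-defaults.sh",
    "master",
    "0fe9e9963b2389eae4c9de49a4873bd819e19067",
))
```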

~~~
chrismorgan
Oh, cool. I didn’t know that. It’s a pity they don’t make it more obvious.
Post updated to include this tidbit of information. Thanks!

------
chrismorgan
I'm really interested in a GitHub person's take on this (holman?). Is my
notion of the balance reasonable? (You've got the data, I'm just guessing with
it.) Do you think you're likely to assess improving GitHub's 404 page at
least?

~~~
holman
You bring up some good points, and we're always improving pages like these
(the 404 itself has gone through a few versions this year already).

I don't think the answer is forcing the canonical URL on every page request,
though. I think it's important to retain branch names instead of the full sha.
I find this, for example:

    https://github.com/holman/dotfiles/blob/master/osx/set-defaults.sh

…far more usable and meaningful than this:

    https://github.com/holman/dotfiles/blob/0fe9e9963b2389eae4c9de49a4873bd819e19067/osx/set-defaults.sh

It'd be great to be able to support file renames and deleted branches better
in the product, but that takes some time to build out. Hopefully we can do
better with that in the future (and we have been working on things like this
recently; supporting repository redirects was a huge one, really).

------
susi22
GitHub can definitely improve on this. For instance, if I want to edit a file
like so:

[https://github.com/takezoe/gitbucket/edit/master/README.md](https://github.com/takezoe/gitbucket/edit/master/README.md)

you will see a 404 with no further information if you're not logged in.

------
chrismorgan
By the way, I apologise to anyone looking for a decent 404 page from my site.
I initially configured that site (with WebFaction) as a static site, and they
don't expose a way of providing a custom 404 page for that. I should switch it
to Apache so that I can get a 404 page, but I haven't got round to doing that.
(More importantly, I haven't designed the 404 page I want, so the other part
of the effort would be wasted.)

At least I gave
[http://relink.chrismorgan.info](http://relink.chrismorgan.info) a proper 404
page. The sentiments displayed on it are the same as what I would put for
[http://chrismorgan.info](http://chrismorgan.info), though: you won't get a
404 page on my site unless you broke the link yourself.

------
mbesto
> _Link maintenance is hard; the web doesn’t just automatically stay intact;
> it requires effort on your part._

This sounds all well and good in theory, but - commercially speaking - becomes
very impractical for many businesses. I'd be really curious what the ROI would
be on a more rigorous link maintenance practice. My general impression is that
it's simply not worth it.

> _I should switch it to Apache so that I can get a 404 page, but I haven't
> got round to doing that._ [1]

Interesting comment from the OP, whom I suspect also recognizes (quite
possibly like GitHub) that the time it takes to do this type of stuff hardly
outweighs the benefit.

[1] [https://news.ycombinator.com/item?id=6495675](https://news.ycombinator.com/item?id=6495675)

~~~
wpietri
This may happen to be true for some businesses, but it's not _necessarily_
true. Some toolsets may make it hard, which raises the I. Others make it easy
or free, so people naturally do it. The smaller the I gets, the more likely a
given R is worth it.

I also think it's easy to underestimate the R here. It's very hard to detect
things like brand damage and lost leads, especially when somebody turns up on
your site once, thinks you guys are chumps because of a bad first impression,
and never comes back. Whereas the I is obvious, because the cost is all
internal to the company.

The two factors, alas, reinforce one another. Link rot isn't an obvious
problem when setting up a system, because nobody is linking in. By the time
the problem becomes noticeable, the system is hard to change, making the cost
of a fix high. And that trains people to treat link rot as unimportant,
deepening the cycle.

------
plorkyeran
My favorite part about the GitHub 404 page is that it consistently locks up
Firefox for me.

~~~
yeukhon
I don't see how that's possible. Go report the bug. This is a bizarre bug.

------
yeukhon
I don't necessarily agree with the author. For example, if someone wants their
privacy and deletes their Facebook account, then when you search for the
account it should say the account does not exist. It doesn't say it's gone. It
just says it is not found. A 404 has good security and privacy implications.
For example, when you make a repo private, anyone who tries to access it
without proper permission should see a 404 instead of a 401.

------
untilHellbanned
The OP makes valid points, but it's hard to hate on GitHub too much. Their
users are the ones breaking the links. But it's also hard to hate on the users
too, because the whole point is to develop software and that development is
never done.

~~~
chrismorgan
It depends; users break some of the links, but GitHub's design for non-
permanence is, to my mind, the biggest problem.

~~~
untilHellbanned
fair point

------
wil421
Isn't this Security 101: limit what external users can gather about your
systems? Especially when an error occurs, you don't want an exception to be
thrown and the stack trace displayed to the whole internet.

------
dasil003
Totally tangential, but does Mercurial really not allow you to delete things
ever? So, for instance, when you accidentally commit a 100MB PSD file and then
need to remove it, there's no way to do that?

~~~
chrismorgan
Mercurial _does_ let you do it in much the same way Git would, but it requires
you to do it very deliberately, enabling (bundled) extensions in the config.
For example, the "strip" command, part of mq. As with Git, doing things like
that if you've pushed publicly will be difficult and require coordination.

------
samdunne
[http://notfound.org/](http://notfound.org/)

~~~
ygra
I wonder whether that works at all for web sites that have essentially global
reach. Children from the US are unlikely to appear in Germany all of a sudden,
I guess. (Incidentally, the example page for »Other countries« was in Greek
which I cannot even read).

~~~
samdunne
For something like GitHub, the more international it is, the better.

------
deathanatos
If links should be permanent, what happens after an HTTP DELETE?

~~~
dragonwriter
410 GONE

EDIT: To be clear, this is more in regard to the original article's
qualification on the permanence of links: _If for some reason it can’t exist
any more, don’t just let it go: it should show useful error._ Using 410 GONE
instead of 404 NOT FOUND after a DELETE is a fairly minimal but direct
application of this principle.
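
A minimal sketch of that behaviour, assuming a hypothetical in-memory `ResourceStore` that remembers which paths were deleted (a real server would need to persist those tombstones somewhere):

```python
class ResourceStore:
    """Toy resource store that distinguishes 'never existed' from 'deleted'."""

    def __init__(self):
        self._live = {}        # path -> body
        self._deleted = set()  # tombstones for paths that were removed

    def put(self, path, body):
        self._live[path] = body
        self._deleted.discard(path)  # re-creating a resource clears its tombstone

    def delete(self, path):
        if path in self._live:
            del self._live[path]
            self._deleted.add(path)
            return 204  # No Content: deletion succeeded
        return 410 if path in self._deleted else 404

    def get(self, path):
        if path in self._live:
            return 200, self._live[path]
        if path in self._deleted:
            return 410, "Gone"  # it used to exist; say so
        return 404, "Not Found"  # never existed, as far as we know


store = ResourceStore()
store.put("/posts/1", "hello")
store.delete("/posts/1")
print(store.get("/posts/1"))  # (410, 'Gone')
print(store.get("/posts/2"))  # (404, 'Not Found')
```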

