To be honest, I suspect this is a bug rather than deliberate; otherwise you'd have thought they'd have notified people.
I'll try it tonight if no one else gets a chance.
Locally, this code reports 200 OK for all of those except the last, which reports a 302 redirect (to the https version, I presume). As you can see, on Google they all fail with an internal exception.
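For anyone who wants to reproduce it, something along these lines is all the local check needs to be (just a sketch, not the actual code referred to above; http.client doesn't follow redirects, so the 302 shows up as a 302, and the URLs here are placeholders):

    import http.client
    import urllib.parse

    def status_of(url):
        # Issue a plain GET and report the raw status code, without
        # following redirects.
        parts = urllib.parse.urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        return conn.getresponse().status

    for url in ["http://example.com/", "http://example.com/redirect"]:
        print(url, status_of(url))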
Paypal does have a universal Disallow in robots.txt, so I thought I'd set up why.gd to do the same and see whether that was the cause. But it works fine.
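For reference, a universal Disallow is just the two-line robots.txt:

    User-agent: *
    Disallow: /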
Anyone else find any other URLs that are blocked?
(Ordinarily I wouldn't complain, but it's happened about six times in the last 2-3 days.)
aren't dupes, even though they link to the exact same page. There really isn't anything you can do about this particular case except error-prone heuristics.
Perhaps the "solution" is to ignore query strings....how many sites use them to distinguish content anymore? Alternately...compare the content of the <head> tag on the linked page? That wouldn't be a perfect solution, but it would probably go a long way.
In any case, I think HN should strip the hash and what follows for purposes of dupe detection, but keep them in the link in case someone actually wants to link to a specific spot in the page.
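Stripping the fragment for the dupe check while keeping the submitted URL intact is only a couple of lines, e.g. with Python's urllib.parse (placeholder URL):

    from urllib.parse import urldefrag

    submitted = "http://example.com/article?id=42#comments"  # placeholder
    dedupe_key, fragment = urldefrag(submitted)
    # dedupe_key == "http://example.com/article?id=42" -- compare this
    # against existing submissions; store `submitted` unchanged as the link.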
To answer your question: query strings ("?foo=bar&a=b&c") are widely used. Among other places, HN itself uses them. :) They also show up whenever you submit a form with GET.
I had forgotten that HN uses query strings to reference articles... D'oh. I figured that by now everyone had adopted the URL-mapping approach. Anyway, detecting collisions based on the <head> tag still seems like a possibility.