Yeah, it's a difficult problem. In this case, it was a single character that I didn't notice. I'm not even clear on how it got there (or why it wasn't on the second submission URL).

Perhaps the "solution" is to ignore query strings. How many sites still use them to distinguish content? Alternatively, compare the content of the <head> tag on the linked pages. That wouldn't be a perfect solution, but it would probably go a long way.

The hash is of course for linking sections within the page, so I suspect the culprit is a link you clicked on the page itself before submitting the story. There are a couple of links with href="#" there, although they all have JavaScript event handlers that cancel the default action.

Do you have JavaScript disabled by any chance, or do you use some obscure browser?

In any case, I think HN should strip the hash and what follows for purposes of dupe detection, but keep them in the link in case someone actually wants to link to a specific spot in the page.
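A rough sketch of that idea in Python, to make it concrete. This is not how HN does it; `dupe_key` and `is_duplicate` are hypothetical names, and the only assumption is that stored and submitted URLs are plain strings. The original URL is left untouched, so the stored link still points at the specific spot in the page.

```python
from typing import Iterable
from urllib.parse import urldefrag


def dupe_key(url: str) -> str:
    """Return the URL without its '#fragment', for comparison only."""
    # urldefrag splits off the fragment; a bare trailing '#' is dropped too.
    return urldefrag(url).url


def is_duplicate(submitted: str, existing: Iterable[str]) -> bool:
    """True if the submitted URL matches an existing one once fragments are ignored."""
    key = dupe_key(submitted)
    return any(dupe_key(u) == key for u in existing)
```

The submitted string itself is never modified, only the key derived from it, so a stray trailing `#` no longer defeats the check.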

To answer your question: query strings ("?foo=bar&a=b&c") are widely used. Among other places, HN itself uses them. :) They also show up whenever you submit a form with GET.
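To illustrate why dropping the query string outright is risky, here is a small Python sketch. The two item URLs are made up for illustration, and `strip_query` is a hypothetical helper, not anything HN uses.

```python
from urllib.parse import parse_qs, urlsplit


def strip_query(url: str) -> str:
    """Drop both the query string and the fragment from a URL."""
    return urlsplit(url)._replace(query="", fragment="").geturl()


# Illustrative URLs: two different items that differ only in the query string.
first = "https://news.ycombinator.com/item?id=1"
second = "https://news.ycombinator.com/item?id=2"

# Once the query string is gone, the two items are indistinguishable.
collide = strip_query(first) == strip_query(second)

# The example query string from above, parsed into its parameters.
# keep_blank_values=True keeps the valueless "c" parameter.
params = parse_qs("foo=bar&a=b&c", keep_blank_values=True)
```

Here `collide` comes out true, which is exactly the false positive you would get on any site that selects content by query parameter.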

I honestly couldn't tell you how I got to the page (or what I clicked on once I got there), but it probably involved clicking a number of links.

I had forgotten that HN was using query strings to reference articles. D'oh. I figured that everyone had adopted the URL-mapping approach by now. Anyway, detecting collisions based on the head tag still seems like a possibility.

You mean matching the title tag if the pre-query-string portion of the URL matches? That could certainly work. Maybe this is something to test with xirium's latest content scrape.

Yep. Or, if you were really worried about false positives, the contents of the entire head.
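For what it's worth, both variants can be sketched with Python's standard library. Everything here is an assumption for illustration: the names `HeadExtractor`, `fingerprint`, and `probably_same` are hypothetical, the page HTML is assumed to be already fetched, and pages without a usable title or head are treated as non-matching.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HeadExtractor(HTMLParser):
    """Collect the <title> text and a flat summary of everything inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title_parts = []
        self.head_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            # Record the tag with its attributes in a stable order.
            self.head_parts.append((tag, tuple(sorted((k, v or "") for k, v in attrs))))
            if tag == "title":
                self.in_title = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_head and text:
            self.head_parts.append(text)
            if self.in_title:
                self.title_parts.append(text)


def fingerprint(html, whole_head=False):
    """Return the normalized title, or a tuple describing the entire head."""
    parser = HeadExtractor()
    parser.feed(html)
    parser.close()
    if whole_head:
        return tuple(parser.head_parts)
    return " ".join(" ".join(parser.title_parts).split())


def base_url(url):
    """The pre-query-string portion of the URL (fragment dropped as well)."""
    return urlsplit(url)._replace(query="", fragment="").geturl()


def probably_same(url_a, html_a, url_b, html_b, whole_head=False):
    """Same base URL and same title (or same whole head) suggests a dupe."""
    if base_url(url_a) != base_url(url_b):
        return False
    fp_a = fingerprint(html_a, whole_head)
    fp_b = fingerprint(html_b, whole_head)
    # An empty fingerprint (no title or head found) never counts as a match.
    return bool(fp_a) and fp_a == fp_b
```

Passing `whole_head=True` gives the stricter check: two pages with the same title but different meta tags stop matching, which trades some missed dupes for fewer false positives.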
