There could simply be a "report duplicate" link (where you specify which article it's a repost of) next to each new post for the first hour after it's posted. If enough people say it's a duplicate of a particular article, the new post is taken down and its comment thread is merged into the original's.
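A rough sketch of how that reporting rule might look, purely as an illustration. The one-hour window comes from the suggestion above; the threshold of 3 reports, the `Post` class, and `report_duplicate` are all made-up names and values, not anything from the actual HN source.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

REPORT_WINDOW_SECONDS = 60 * 60  # the "first hour" from the suggestion
REPORT_THRESHOLD = 3             # assumption: what counts as "enough people"


@dataclass
class Post:
    post_id: int
    posted_at: float                              # Unix timestamp
    comments: list = field(default_factory=list)
    reports: dict = field(default_factory=dict)   # reporter -> claimed original id
    merged_into: Optional[int] = None


def report_duplicate(posts, post_id, original_id, reporter, now):
    """Record one report; return True if it caused the post to be merged."""
    post = posts[post_id]
    # Reports only count while the post is unmerged and under an hour old.
    if post.merged_into is not None or now - post.posted_at > REPORT_WINDOW_SECONDS:
        return False
    # The claimed original must be a different, known post.
    if original_id == post_id or original_id not in posts:
        return False
    post.reports[reporter] = original_id  # one vote per reporter, last one wins
    target, votes = Counter(post.reports.values()).most_common(1)[0]
    if votes < REPORT_THRESHOLD:
        return False
    # Enough people agree on the same original: merge the comment threads.
    posts[target].comments.extend(post.comments)
    post.comments.clear()
    post.merged_into = target
    return True
```

Counting votes per claimed original (rather than total reports) means three people pointing at three different articles would not trigger a merge.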
So don't strip them indiscriminately. Have the HN site drop them one by one, compare each page it gets back to the page at the original link, and keep only those that actually affect the resulting page.
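Something like this, assuming "them" means URL query parameters. The `canonicalize` function and the `fetch` callable are hypothetical names for illustration; `fetch` is passed in so the idea can be tried without hitting the network. Exact page equality is also an assumption: real pages often differ between requests (ads, timestamps), so a real version would probably need a fuzzier comparison.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url, fetch):
    """Keep only the query parameters whose removal changes the fetched page.

    `fetch` is any callable that takes a URL and returns the page body.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    original_page = fetch(url)
    kept = []
    for i, param in enumerate(params):
        # Rebuild the URL with exactly one parameter dropped.
        without = params[:i] + params[i + 1:]
        candidate = urlunsplit(parts._replace(query=urlencode(without)))
        # If the page changed, this parameter matters, so keep it.
        if fetch(candidate) != original_page:
            kept.append(param)
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two submissions would then be treated as the same link if their canonicalized URLs match. The cost is one extra request per parameter for every submission.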
I agree. I've also noticed that the same story as reported by two different news sources will get posted, which is essentially the same thing.
Google does some pretty cool probability stuff (at least, I think that's how they do it) to figure out which articles are the same for news.google.com. Something like that would be really cool on news.yc.
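I don't know what Google actually does, so this is not their method. It is just one simple, well-known way to guess whether two texts are near-duplicates: break each into overlapping word triples ("shingles") and measure the Jaccard overlap between the two sets. The function names and the 0.5 threshold are arbitrary choices for the sketch.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word tuples in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def same_story(a, b, threshold=0.5):
    """Guess that two texts are the same story if enough shingles overlap."""
    return jaccard(a, b) >= threshold
```

This catches reposts and lightly edited wire copy. Two independently written reports of the same event share few exact word triples, so they would need something smarter than this.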
I know, the source is open.... but I'm clearly busy ;)