Similarly, coding algorithms on whiteboards doesn't tell you much other than how good the candidate is at coding algorithms on whiteboards. Given that the vast majority of Google's revenue comes from either search (which was built over a decade ago) or companies it has acquired, and given the numerous well-hyped failures (Buzz, Wave, +, etc.), I don't think it's actually working all that well for Google either.
So is it more likely that Google is filled with idiots who can't see the error of their ways, or that, across tens of thousands of individual hires, this is the most cost-effective approach and produces the best results (minimizing false positives) on average?
College-admissions-style interviewing just doesn't make sense for a company like Google.
The players invited to the combine are the ones teams are considering drafting anyway; all the 40 times do is move players up or down the list, generally by small amounts. The point isn't that 40 times are useless; it's that they provide very little additional information about a player. Champ Bailey was going to be a high draft pick no matter what he did at the combine, and everyone already knew that Trindon Holliday was fast but probably too small to succeed in the NFL.
Likewise, someone with a 3.9 from MIT or a bunch of good open source work who's coming for an in-person interview is already qualified, and the whiteboard doesn't tell you anything new. I'd guess Google sticks with whiteboard interviews for the same reasons teams tout 40 times: it's good marketing both internally (making decisions seem less arbitrary) and externally ("look how tough our interviews are" is a more socially acceptable way of saying "look how smart we are"), and it allows people to deflect blame if a hire doesn't work out. Judging by the number of posts about Google interviews I see here and elsewhere, the marketing is certainly successful.
Also, I think your view of the interview pool is somewhat skewed. Most of the candidates I see do not have 3.9s from MIT (BTW, I believe MIT uses a 5-point GPA scale, so it really would be 4.9), and a lot didn't go to Ivy League universities.
I'm kinda curious what it'd look like if you took the 2002 version of Google and used it on today's Internet. My guess is it would feel incredibly dated and virtually useless because of spam. We have a couple of archived UX studies that were done with the old (pre-2010) UI; I remember that when we launched the new one, everybody said "Eww, I hate the new UI. Why change a good thing, Google?" and now when they look at the old UI they're like "Omigod, I can't believe I ever managed to look at that. It's like something straight out of 1998."
That's a pretty fantastic hit rate, especially given how rare it is for any draft pick to work out. Similarly, if Google tries a bunch of projects of which only 10% are expected to work, but 20% of them end up working, then they still did a great job even though 80% of their projects are failures.