The forty-yard dash provides one of the most important signals for football recruiting. Essentially, it's a measurement of the time from when the athlete takes his hands off the start line to when he crosses the finish line. Most commonly, it is run in cleats on grass with no pads, and certainly no players trying to prevent you from reaching the goal line. In addition to that, the best technique for running a forty, literally falling forward to build momentum before lifting your hand off the ground, is actually illegal for offensive players (and pretty pointless for defensive ones).
So why is this completely unrelated activity so important for measuring football players? Because it takes ten seconds to measure, and a guy who can run a 4.2 forty can almost always play pretty damn well.
Tech interviews today are a low-cost way of getting a rough idea of a candidate's ability to code. Judging from Google's output, they're pretty damn effective for them.
Except that the 40 really isn't a very good measure - if it were, the Oakland Raiders would be winning the Super Bowl every year instead of missing the playoffs. Of the 15 fastest players in the last 12 years (http://en.wikipedia.org/wiki/40-yard_dash), only 2 are stars, and most are bench guys or out of the league. The 40 has its uses, but it needs to be taken as one fairly small factor in evaluating a player, and even then it's only really relevant for certain positions.
Similarly, coding algorithms on whiteboards doesn't tell you much other than how good the candidate is at coding algorithms on whiteboards. Given that the vast majority of Google's revenue comes from either search (which was built over a decade ago) or companies they've acquired, and given the numerous well-hyped failures (Buzz, Wave, +, etc.), I don't think it's actually working all that well for them either.
I'm not saying it's a perfect indicator, I'm saying it's a cheap and easy indicator. Certainly there are outliers in every direction, but a running back who runs a 5.0 forty just isn't going to be very good (unless he's like 400 pounds).
So is it more likely that Google is filled with idiots who can't see the error of their ways, or that, across tens of thousands of individual hires, this is the most cost-effective approach and produces the best results (minimizing false positives) on average?
College-admissions style interviewing just doesn't make sense for a company like Google.
Actually, I'd say that whiteboard interviews at Google are an arbitrary way of selecting from already qualified candidates, just as 40 times might aid a team in choosing between 2 otherwise similar players.
The players invited to the combine are the ones teams are considering drafting anyway; all the 40 times do is move players up or down the list by generally small amounts. The point isn't that 40 times are useless, it's that they provide very little additional information about a player. Champ Bailey was going to be a high draft pick no matter what he did at the combine, and everyone already knew that Trindon Holliday was fast but probably too small to succeed in the NFL.
Likewise, someone with a 3.9 from MIT or a bunch of good open source work who's coming for an in-person interview is already qualified, and the whiteboard doesn't tell you anything new. I'd guess Google sticks with whiteboard interviews for the same reasons teams tout 40 times - it's good marketing both internally (making decisions seem less arbitrary) and externally ("look how tough our interviews are" is a more socially acceptable way of saying "look how smart we are"), and it allows people to deflect blame if a hire doesn't work out. Judging by the number of posts about Google interviews I see here and elsewhere, the marketing is certainly successful.
The vast majority of interviews do not result in a hire. (Some of my coworkers have reported giving 20 in a row without a single offer.)
Also, I think your view of the interview pool is somewhat skewed. Most of the candidates I see do not have 3.9s from MIT (BTW, I believe MIT has a 5-point GPA, so it really would be 4.9), and a lot didn't go to Ivy League universities.
Google has separate tracks for FE SWEs as well. It's just that you also need a basic proficiency with C++/Java and algorithms to be a Google FE SWE, while I'm not sure that Facebook requires that? (Perhaps because Facebook's frontends are written in PHP instead of C++/Java.)
FWIW, Search is continually being rewritten, and the bulk of the current codebase was written in the past 2 years or so.
I'm kinda curious what it'd look like if you took the 2002 version of Google and used it on today's Internet. My guess is it would feel incredibly dated and virtually useless because of spam. We have a couple archived UX studies that were done with the old (pre-2010) UI; I remember that when we launched the new UI, everybody said "Eww, I hate the new UI. Why change a good thing, Google?" and now when they look at the old UI they're like "Omigod, I can't believe I ever managed to look at that. It's like something straight out of 1998."
Looking at that list... I wouldn't count the three from 2010/2011 because it's too early to tell, so that leaves 12 players. Of those 12, I would say there are 2 superstars (Bailey, Johnson), 2 great players (Rodgers-Cromartie, Routt), a #1 wideout and potential emerging star (Heyward-Bey), and 3 players with promising early careers that were derailed by injury/death (Mathis, Washington, Williams).
That's a pretty fantastic hit rate, especially given how rare it is for any draft pick to work out. Similarly, if Google tries a bunch of projects of which only 10% are expected to work, but 20% of them end up working, then they still did a great job even though 80% of their projects are failures.
This thinking is exactly what Moneyball was all about in baseball. Football is looking at the wrong things.
For instance, 225-pound bench press reps, vertical leap, reach, etc. (all NFL combine measurements). Great, you are measuring and ranking, but are you doing anything meaningful? Are you actually looking at the right things? Probably not.
If they were, Wes Welker, Tom Brady, and Victor Cruz (all in the past Super Bowl) would have been first-round picks, probably top 10, not undrafted or late-round picks. And JaMarcus Russell and Ryan Leaf wouldn't have been drafted #1 and #2 overall (all the physical tools but not the mental ones - flameouts).
The point is that we as people tend to think we know what to measure and track, but we likely don't. Frankly, we are probably making it up on the fly and we convince ourselves and others that these are good measures, and until someone figures out the next best thing, they actually are. But, as I said, they are probably not the best, or maybe even good, in the infinite wisdom sense.
But think about it this way. Both football and programming have ways we can actually see if someone can do what they say they can do: literally, look at the film. Look at game day film on a player. Look at GitHub or other places for programmers. And just as you would give football players REAL scenarios that test their ability to think through a problem in real time, do the same with a programmer. Put these two together and you get rid of the people who can't think (Russell and Leaf) and you get rid of the "physical specimens" that can't play (most any Oakland WR drafted under Davis).
I personally think this is a much better approach, one that weeds out the most people. Refine your questions and technique and you can spot the people who can and can't perform pretty quickly. If you are unsure, give them a simulated game (a programming problem to do at home) to see what they can do.