AngelList has the largest data set of startups, investor profiles and activity on the planet. That makes Nivi and Naval some of the most interesting players in Silicon Valley. I suspect we'll be seeing a very interesting investment thesis coming from them in the next couple of years.
I've talked with the Kauffman Foundation, and one of the reasons Kauffman funded them was to have access to that data.
It does not really matter if you are using hard data like number of previous companies, or soft data like impressions based on personality. In fact, these days I would say success depends more on the VC(s) involved than the founders. Mark Cuban was at least half-right when he compared tech investments to a Ponzi scheme (http://blogs.wsj.com/venturecapital/2011/08/15/mark-cuban-th...).
This is, obviously, a generalization -- there are a few companies that stand on their own feet, and have even done so without outside investment. But in the majority of cases it seems like startups are acquired because some kind of "inside" deal is going on. When you think about it, companies rarely benefit from acquisitions and mergers - instead they slowly die. To list a few: AOL, Bebo, Myspace, Flip, MapQuest, AltaVista, Netscape, Broadcast.com, Excite, Lycos, Ask Jeeves, Sun Microsystems...
What startups are still going strong post-acquisition? I can think of YouTube off the top of my head... what else? If the parent companies are (generally) not benefiting from acquisitions, why are they happening?
The interesting thing here is how alike the two scenarios are - despite how prevalent statistics are in baseball, they're still not the be-all and end-all.
There's quite a story from not too long ago involving the LA Dodgers - for the life of me I can't remember the names or the time period, but it will hopefully come to me when I'm more awake (or someone here might refresh my memory). Every day, an expert in sabermetrics (aka baseball stats) would prepare a huge load of paperwork for the Dodgers manager ahead of that day's game, and every day the manager would say thanks, wait until he had left the room, then chuck it all in the trash.
Tech investors aren't the only people who value gut feelings over statistics.
Re: predicting it based on qualities of the founders: pg has said determination is very important - based on data from startups, it turned out much more important than anticipated. Therefore, one would expect an objective measure of determination to have (some) predictive power. Maybe not as much as the present YC process; but it would be hard data, and a new perspective which might be revealing. Certainly, scientific at least!
It sounds like a sloppy thing to measure; but Martin Seligman (learned helplessness/learned optimism) has quizzes to measure optimism/pessimism, which have experimentally demonstrated predictive power (and actually have been used to supplement job interviewing, with measurable success). So measuring determination might be possible. There are even checks to prevent/detect cheating. Of course, extra smart candidates with million-dollar (VC) motivation may quickly subvert any test. Still, it would be interesting, intellectually, to see if it did work.
Might also be interesting to ask one of Seligman's students to assess YC candidates for optimism (his definition means that you bounce back from failure with energy - e.g. a pivot). It seems plausible to me that that would also be predictive of startup success (and also predictive of determination - the ability to keep going).
It's funny, I would assume pg - as in "A Plan for Spam" pg - would be supremely interested in running some kind of Bayesian predictive model over YC candidates. Maybe the subtext of that quote was that he tried it and it didn't give any useful data.
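For anyone wondering what "a Bayesian predictive model over YC candidates" could even look like, here is a rough sketch in the spirit of a naive Bayes spam filter. To be clear, this is not anything pg or YC is known to use: the feature names (repeat_founder, technical_cofounder), the six toy rows and the outcomes are all made up for illustration.

```python
from math import exp, log

def train(rows, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing (alpha)."""
    features = sorted(rows[0])
    model = {"features": features, "prior": {}, "p": {}}
    for cls in (True, False):
        members = [r for r, y in zip(rows, labels) if y == cls]
        # Smoothed class prior and per-feature P(feature = 1 | class).
        model["prior"][cls] = (len(members) + alpha) / (len(rows) + 2 * alpha)
        model["p"][cls] = {
            f: (sum(r[f] for r in members) + alpha) / (len(members) + 2 * alpha)
            for f in features
        }
    return model

def predict(model, row):
    """Return P(success | row) under the naive independence assumption."""
    log_score = {}
    for cls in (True, False):
        s = log(model["prior"][cls])
        for f in model["features"]:
            p = model["p"][cls][f]
            s += log(p if row[f] else 1.0 - p)
        log_score[cls] = s
    # Normalise in a numerically safe way.
    m = max(log_score.values())
    weight = {c: exp(v - m) for c, v in log_score.items()}
    return weight[True] / (weight[True] + weight[False])

# Hypothetical training data: 1 = trait present, label True = "succeeded".
rows = [
    {"repeat_founder": 1, "technical_cofounder": 1},
    {"repeat_founder": 1, "technical_cofounder": 0},
    {"repeat_founder": 0, "technical_cofounder": 1},
    {"repeat_founder": 0, "technical_cofounder": 0},
    {"repeat_founder": 0, "technical_cofounder": 0},
    {"repeat_founder": 0, "technical_cofounder": 1},
]
labels = [True, True, False, False, False, False]

model = train(rows, labels)
strong = predict(model, {"repeat_founder": 1, "technical_cofounder": 1})
weak = predict(model, {"repeat_founder": 0, "technical_cofounder": 0})
print(f"strong={strong:.3f} weak={weak:.3f}")
```

With six rows this proves nothing, of course; the point is only that the mechanics are the same as in spam filtering, and the hard part would be getting enough honest outcome data.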
An interesting conjecture is that numerically predicting startup success (defined as a high standard deviation of return on investment) might actually be impossible because any venture risky enough to get those kinds of results would fall outside the acceptable error bars of the predictive model. The equivalent in spam filtering would be if you wanted a system to show you only messages that were 99% likely to be spam, but still not spam.
I'm not sure if that actually makes any sense; someone feel free to jump in and tear it to shreds.
This data-driven approach doesn't even work well for other sports, e.g. soccer. I guess the reason it works so well for baseball is that it's made up of a large number of well-defined, tiny one-on-one faceoffs.