It would be hard to disclose the raw data without also disclosing the secrets of individual startups, but we might publish some trendlines one day.
In a sense, I already do disclose such data in a general way, because whenever I write or say anything about startups, I wouldn't contradict trends I know about.
I'm sure you would find a way to anonymize the data if it would result in a compelling story for YC.
But as we all know, startup outcomes follow a power law. Dropbox and AirBnB make for great YC PR, but they are certainly nothing like a statistical mean outcome (aka "expected value").
I'd say it is pretty reasonable to postulate that "doing things the YC way" would result in a worse expected outcome for an individual founder vs. retaining control of a profitable business. If this statement is wrong, please refute it with data.
The way things are right now, YC is selling the "possibility" of becoming an outlier while downplaying what "typical results" usually look like. If you were in a consumer product market, this could well break FTC rules on truth in advertising:
http://www.ftc.gov/speeches/starek/nima96d4.shtm
EDIT / RESPONSE 1: This is not an "accusation". I am simply stating as fact that YC's model is focused on promoting its largest outlier successes (which form the bulk of YC's portfolio value) while releasing no real data on the "mean outcome". Take it for what it is worth. As for my statement comparing "mean outcomes", it is obviously just a subjective judgement based on anecdotal evidence, because there is no publicly released data from YC.
EDIT / RESPONSE 2: Let's be realistic. Any VC firm can release its percentages of IPOs, M&A exits and failures, as well as IRR figures and exit bands. There is no reason to include any company-specific proprietary data. The only reason for YC not to give such estimates is that it would highlight the fact that most startups are nothing like Dropbox.
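To make the power-law point concrete, here is a minimal sketch of how the mean, the median, and the single largest outcome drift apart when outcomes are heavy-tailed. Everything in it is an assumption for illustration: the Pareto shape `alpha=1.2`, the sample size, and the function names are made up, and none of it is YC data.

```python
import random
import statistics


def simulate_outcomes(n=10_000, alpha=1.2, seed=0):
    """Draw n hypothetical outcomes from a Pareto distribution.

    alpha is an assumed shape parameter, not an estimate from real data.
    """
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]


def summarize(outcomes):
    """Return the mean, median, max, and the share of total value held by the top 1%."""
    ranked = sorted(outcomes, reverse=True)
    top_count = max(1, len(ranked) // 100)  # at least one outcome counts as "top"
    return {
        "mean": statistics.fmean(ranked),
        "median": statistics.median(ranked),
        "max": ranked[0],
        "top_1pct_share": sum(ranked[:top_count]) / sum(ranked),
    }


if __name__ == "__main__":
    for name, value in summarize(simulate_outcomes()).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the mean sits well above the median because a handful of outliers carry much of the total, so "mean outcome" and "typical outcome" are different questions, and the biggest winner resembles neither.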
No, actually he can't anonymize that data. With a short list (under 1000) of companies and this much variation, the numbers themselves might as well be a signature.
Also, his company has spent time and money to gain access to data that should help him pick emerging companies. I'm not sure why you would feel entitled to that. I'm sure they are working on some data project to help automate decisions on applicants; if not, they will be eventually.
That data will give them an edge when making offers. When asymmetrical information is working to your advantage, why would you give that up?
You just made another case against incubators / accelerators in general.
If they have to rely on "asymmetrical information" to get access to deal flow and make competitive offers, and if releasing the data would hurt their attractiveness to prospective investees, why exactly are they a good deal for entrepreneurs (who can actually build a business)?
There is an old saying: if you do not know who the fool at the poker table is, it's you.
> In a sense, I already do disclose such data in a general way, because whenever I write or say anything about startups, I wouldn't contradict trends I know about.
That's a bit circular. If people question the things you write or say about startups, you tell them they are backed by empirical data; then, when someone asks for this data, you tell them it's implied by the things you write or say about startups.
I can totally understand why most of that data is hard or impossible to disclose; that's obvious. But you can't cite secrets as evidence. All you can do is say "trust me, I know what I'm talking about". You can't have your cake and eat it too.
Obviously no one is questioning the existence of that data, and nobody's suggesting you're cooking it up. They're just wondering whether they'd spot the same patterns in it as the ones you perceived to be there.
To sum it up, whenever you write or say anything about startups, the only thing you disclose about such data in a general way is the conclusions you drew from it: what you perceived (not "know") to be a trend. That is probably useful and very valuable, but it is something entirely different from the data itself.
Association is not the same as causation. I think it would be more useful to derive, or at least infer, the causes of success or failure from your data.
For example: if someone can't convince other people to join, how will he be able to convince customers or investors later on? This implies causation. I would love to hear more causal explanations like this for the single-founder startup.
There are relationships and associations everywhere. If people look at the data on skin color and success, they will find a relationship, but that doesn't mean people succeed because of their skin color. Causation is more useful information than mere association.
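The difference between association and causation can be sketched with a small simulation. This is a toy model under stated assumptions, not a claim about any real founder data: a hypothetical `hidden` factor drives both an observed `trait` and `success`, while the trait itself has no effect on success at all. All names and noise levels are invented for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=10_000, seed=0):
    """Generate (hidden, trait, success) rows; trait never feeds into success."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hidden = rng.randint(0, 1)              # assumed unobserved common cause
        trait = hidden + rng.gauss(0, 0.5)      # depends only on hidden
        success = hidden + rng.gauss(0, 0.5)    # depends only on hidden
        rows.append((hidden, trait, success))
    return rows


def correlations(rows):
    """Correlation of trait and success overall, and within each hidden level."""
    overall = pearson([t for _, t, _ in rows], [s for _, _, s in rows])
    within = {}
    for level in (0, 1):
        group = [(t, s) for h, t, s in rows if h == level]
        within[level] = pearson([t for t, _ in group], [s for _, s in group])
    return overall, within


if __name__ == "__main__":
    overall, within = correlations(simulate())
    print("overall:", round(overall, 3))
    print("within hidden=0:", round(within[0], 3))
    print("within hidden=1:", round(within[1], 3))
```

In this toy setup the trait and success are clearly correlated overall, yet the correlation largely disappears once the hidden factor is held fixed. That is the pattern to rule out before reading a correlation in startup data as a cause.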