

A Statistical Portrait of a Y Combinator Batch - glaugh
http://blog.statwing.com/a-statistical-portrait-of-a-y-combinator-batch/

======
polyfractal
This is a really excellent example of using your own product to generate
interesting content as a way to drive traffic back to your product.

Great work Statwing. Can't wait until I have some data that needs analyzing so
I can use your service.

------
martythemaniak
"Higher values for Number of Employees/Contractors (FTEs) are weakly
associated with higher values for Average Age of Company's Founders (Rounded)"

I had a suspicion that that might be true, but I wonder why that is? Perhaps
older founders tackle problems that need more domain expertise and more
people? Or perhaps they can rely on savings and have been able to bootstrap a
little better than high-school/college grads?

Anyway, good job on StatWing, I love playing around with numbers and graphs.
Perhaps some public datasets will help people get more familiar with the app
and serve as demo.

~~~
jeremyjh
It would be interesting to know if the average hours worked per week would
also be correlated with # of employees and age. My hypothesis is that older
founders will not be as likely to get involved in a startup that would have
them doing everything themselves and working the kind of hours thus entailed.
They will tend to go for those startups where for whatever reason (early
revenue streams) they can staff up sooner.

~~~
benreyes
Regarding older founders: I'm currently working with one of the founders who
is a statistical anomaly at the upper range of the graph. Reality, as I
perceive it, is quite the opposite of that hypothesis; perhaps this is why
they are a statistical anomaly within YC.

------
austinlyons
Thanks for posting. It would be fun to see the ages of accepted YC applicants
compared with the rejected applicants. I'm not sure how easy it would be to
get the data of the rejected applicants though. Maybe they would self-report
their information if you posted something here on HN.

~~~
glaugh
Really good idea. We should definitely do that.

------
incision
I wonder, is that spike at 39 people thinking "Shit, I'm about to turn 40.
It's now or never!"

~~~
rhplus
The 'spike' is the difference between 1 datapoint and 2 datapoints.

~~~
tgrass
A difference of 1 percent of the datapoints.

~~~
brlewis
Yes. It looks like slightly less than 2% vs. slightly less than 1%. The
statwing dataset for companies shows 80 companies with an average of 2.38
founders, meaning there are 190 founders in the batch. So I think it's 3
39-year-old founders vs 1 each at nearby ages.
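
The arithmetic above can be checked with a quick sketch. It uses only the two
figures quoted in this comment (80 companies, 2.38 founders on average); the
`share` helper is a made-up name for illustration, and nothing here comes from
the underlying dataset.

```python
# Figures quoted in the comment above; not taken from the raw dataset.
companies = 80
avg_founders = 2.38

# 80 * 2.38 = 190.4, which rounds to 190 founders.
founders = round(companies * avg_founders)


def share(count, total=founders):
    """Percentage of all founders represented by `count` people."""
    return 100 * count / total


# Show how much of the histogram a handful of founders accounts for.
for count in (1, 2, 3, 4):
    print(f"{count} founder(s) = {share(count):.2f}% of the batch")
```

With roughly 190 founders, each additional person moves a bar by about half a
percentage point, so a "spike" can be a difference of one or two people.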

------
wtvanhest
It would be interesting to put this against the ages of Gen Y distribution. I
believe this grouping would actually look relatively old to the peak in
population if we assumed only Gen Y would apply.

I am basing this on my memory that 1990 was the peak year for those born in
Gen Y. (I cannot find the data set to back it up, but I bet someone else knows
where to get it).

+ a few outside of Gen Y.

------
breckenedge
Good stuff. I imported and played with a regulatory dataset.

The results mostly confirm industry suspicions that enforcement differs the
most based on what region an operator is in (poor regulatory performance
operators are mostly located in the same regulatory region).

What was neat was how little individual manufacturers' designs mattered. But
over time, it was either hugely advantageous or hugely disadvantageous to
simultaneously operate multiple types of designs. Example: in 2007 it was
about 5% better to simultaneously operate multiple designs, but in 2010, it
was about 17% worse.

Also confirmed that it was much, much better (from a penalty standpoint) to
find and self-disclose regulatory non-compliance rather than to let the
regulator find it.

Awesome work guys! Will there be an ability to play with the time dimension
soon?

~~~
glaugh
Awesome! That's really cool.

Time is tricky. That's among our most requested features, though. So we won't
get to it in the very very near future, but it's definitely on the roadmap.

Thanks for the comments, really appreciate it!

------
alexshye
I'm interested in the 15% of YC startups with a single founder. I wonder if
that number is higher than average compared to other classes? And if so, how
much higher? I'm also curious if there is a correlation with the average age.

Anyone privy to this information and willing to share?

------
zackzackzack
So I rifled through the source of the webpage and found there is no way to
download the actual data from that page. All the calculations are being done
on the server side and the summary results are getting sent over via HTTP.

An interesting model for sure, and one that will ultimately make technical
sense but cause enterprise woes in the future. I'm not sure if businesses will
want to upload the data that would most benefit from the StatWing treatment.
It looks like they have realized that, though. Maybe they are aiming to cut
their teeth on people who generate a lot of data and then take a stab at going
enterprise via partnerships with other companies that already have a strong
presence in big companies but are lacking in analytics.

~~~
jimmytucson
Just out of curiosity, what made you think you would find the raw data on the
client?

~~~
zackzackzack
For most of the visualizations I've found on Hacker News, the data is usually
available directly to the client, either through an API or a static file
(.csv, .json, etc.). StatWing is already using d3 to display their work, so it
is possible they were also using crossfilter to do filtering as well.

So, really, past experience with this sort of thing and seeing that they were
using d3.

------
pdog
Any thoughts on why the ages of 26 and 27 appear to be the mode (especially
for "social" startups)?

~~~
Cyranix
Speaking as a 27-year-old, it seems fairly reasonable to me. This is the age
where you've had roughly 5 years of career experience in software or web
development (if you went to college), and in the adolescence of your career
you may have encountered a problem or a market that seems interesting and may
also have a desire to be your own boss and avoid the tedium/politics/etc. of
the places you've been so far (because, naturally, you won't make the same
poor decisions!).

Not sure that I can speak to the prevalence of social startups in this age
range, apart from the obvious "kids these days" take on it. Bear in mind,
though, that it's not purely a representation of what 26- and 27-year-old
founders are doing -- it's also reflective of YC's position.

------
ekianjo
Statistical data? I mean, the few graphs are interesting, but that's VERY
little data being displayed at all. I was expecting much more before clicking
this link. Tufte would be mad at the abuse of space for the ridiculously small
amount of data actually displayed.

------
redcircle
Statwing is doing it right: it is super easy to navigate to their home page
from the blog, by clicking on their prominent logo, which brings you directly
to their main page.

------
qq66
What exactly is a "social" company vs. a non-social one? Some companies are
clearly in one bucket or another but I'm wondering what kinds of companies are
near the boundary.

~~~
lejohnq
Sorry that's a little confusing, especially since most companies nowadays have
social components. For this dataset we categorized social based on whether or
not the social component is critical to their business.

------
jedberg
I love this product. It makes a hard concept easy, it looks pretty, and it's
fast.

I can't wait to see what this team does next!

~~~
achompas
_It makes a hard concept easy_

I love what Statwing is doing here, but they could be providing people with
enough information to be "dangerous."

The employee count vs. founder age analysis in another thread is a perfect
example. Posters are trying to explain why employee counts rise with founder
ages, when a glance at the plot suggests the effect results from two companies
with abnormally high (~4 standard deviations from the mean) employee counts.

Statwing is definitely pretty and fast! I'm curious, however, to see how
they'll work to help people with diverse backgrounds interpret results.
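
To make the outlier point concrete, here is a minimal sketch using entirely
made-up numbers (not the YC data, which isn't downloadable). The `pearson`
helper and every value below are assumptions for illustration. It shows how
two extreme points can produce a positive correlation in data that otherwise
has none.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# SYNTHETIC data, invented for this example only.
# Typical companies: employee count does not trend with founder age.
ages = [22, 24, 26, 28, 30]
employees = [2, 3, 2, 3, 2]

# Two hypothetical outliers: older founders with unusually large teams.
outlier_ages = [40, 42]
outlier_employees = [20, 25]

r_without = pearson(ages, employees)
r_with = pearson(ages + outlier_ages, employees + outlier_employees)

print(f"correlation without outliers: {r_without:.2f}")
print(f"correlation with outliers:    {r_with:.2f}")
```

In this toy case the correlation is essentially zero until the two outliers
are added, at which point it becomes strongly positive. The real dataset may
behave differently; the point is only that a reported association is worth
checking against the scatter plot before explaining it.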

------
chimi
Can you let us download the data and run our own analyses?

~~~
glaugh
Unfortunately not. While there's nothing particularly identifying about the
dataset, we collected this data with the understanding that we wouldn't do
that. Sorry!

~~~
zaroth
Why make it harder for users to do something which is already possible using
your own filter feature?

For example, data on one company:

# of Founders: 1
Founder Age: 43
Number of Months Worked: 20
Number of Employees / Contractors (FTE): 2
Social? No
Mobile? No

