
Ask HN: Data questions for YC? - snowmaker
Hi Hacker News,

I'm Jared, I work at Y Combinator. We're going to be talking and writing about
how we use data at YC, and I'd like to get the community's feedback on what
would be most interesting. For example, some topics we're considering are:

- What data you should have before applying to YC

- How YC uses data to help evaluate applications

- What mistakes we've seen early stage startups make with data

We'll be hosting a talk on this and then writing a blog post to follow. What
would you like to hear about?
======
baldajan
Sam Altman recently tweeted that YC now has an AI to help partners review
applications [1]. I'm curious what that AI does.

Does it scrape data from the application to display it on the side (like
downloads, revenue, MAU, etc.)? Does it compare an application with a similar
YC company (one that either failed or succeeded) so that a partner can draw a
correlation?

I'm just shooting in the dark, and have no clue if it actually has anything to
do with data.

[1]
[https://twitter.com/sama/status/715617188841795585](https://twitter.com/sama/status/715617188841795585)

~~~
Mahn
Wouldn't it be hilarious if their AI turned out to be a script that simply
does a regexp match on the application along the lines of "/the (next)?
(google|facebook|instagram|uber) (of \d+)?/" to automatically reject it :)
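
For illustration only, here is a minimal Python sketch of the kind of filter
joked about above. The pattern is a hypothetical adaptation of the quoted one,
not anything YC is known to use: the optional groups carry their own spaces,
and the trailing group accepts "of" or "for" followed by a word rather than
digits. The names `CLICHE` and `looks_like_cliche` are made up for this
example.

```python
import re

# Hypothetical adaptation of the regexp quoted in the comment above.
# Assumptions: case-insensitive matching, and "of"/"for" followed by a word.
CLICHE = re.compile(
    r"\bthe (?:next )?(google|facebook|instagram|uber)\b(?: (?:of|for) \w+)?",
    re.IGNORECASE,
)


def looks_like_cliche(text: str) -> bool:
    """Return True if the text contains a 'the next X of Y' style pitch."""
    return CLICHE.search(text) is not None


if __name__ == "__main__":
    for pitch in (
        "We are the next Uber for dog walking",
        "We sell accounting software to dentists",
    ):
        print(pitch, "->", looks_like_cliche(pitch))
```

Note that a pitch phrased as "Uber for X" without a leading "the" would not
match this pattern.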

~~~
tedmiston
This is why I found it peculiar they chose to describe most F2 cos that way:

> Awesound - AdWords for podcasts

> Bulletin - Airbnb for Retail

> Müvr Labs - Fitbit for your knees

> Palaround - Tinder for private networks

> Sage - Uber for eldercare workers

> ...

[https://blog.ycombinator.com/first-fellowship-virtual-
demo-d...](https://blog.ycombinator.com/first-fellowship-virtual-demo-day)

------
xavierwjc
1. Which data points have proved "most" useful in evaluating startups, in
YC's experience? Hopefully the rest of the startup community can contribute
and add to this data set.

2. What are some data points YC wishes it had when evaluating startups? This
would be a very interesting discussion topic. It could be something that the
startup community works on together as we move forward... to make the
industry/process more "scientific".

~~~
tedmiston
The biggest one I've heard consistently from founders is 10% weekly growth (in
whatever metric is important to you, like MAU, etc.), which is considered "the
gold standard".

~~~
xavierwjc
It would be great to get data from other players in the startup community to
test this "10% weekly growth". How correlated is this 10% growth rate with the
success of a startup? What about 7%? 8%? What about other data points, like
founders working on the startup 24/7 compared to those who take breaks? How
long are the breaks? etc.

The more data points, the less we have to guess. That's what YC is trying to
achieve, right? Bring sanity to the chaos of choosing and incubating startups.
This idea of sharing data is a great start!
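
To put the weekly growth rates mentioned above in perspective, here is a small
Python sketch of the compounding arithmetic. It assumes a constant weekly rate
sustained over 52 weeks, which is an idealization; the function names are made
up for this example, and nothing here says how growth correlates with success.

```python
import math


def growth_multiple(weekly_rate: float, weeks: int) -> float:
    """Total multiple after compounding a constant weekly growth rate."""
    return (1.0 + weekly_rate) ** weeks


def weeks_to_double(weekly_rate: float) -> float:
    """Number of weeks needed to double at a constant weekly growth rate."""
    return math.log(2.0) / math.log(1.0 + weekly_rate)


if __name__ == "__main__":
    # Compare the rates raised in the thread: 7%, 8%, and 10% per week.
    for rate in (0.07, 0.08, 0.10):
        print(
            f"{rate:.0%} weekly -> {growth_multiple(rate, 52):.1f}x in 52 weeks, "
            f"doubles every {weeks_to_double(rate):.1f} weeks"
        )
```

Under these assumptions, 10% weekly compounds to roughly 142x over a year,
while 7% compounds to roughly 34x, so small differences in the weekly rate
lead to very different yearly outcomes.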

------
ALee
1) Testing PG's Top Ten Things That Kill Startups - co-founder problems,
fundraising, etc.

2) Is the biggest mistake usually that people don't fire fast?

3) We know the unicorn growth rate - but what is the median/average growth
rate for a semi-successful YC company?

4) What makes a company part of the walking dead?

5) What are the primary channels a lot of YC companies use for user
acquisition? How has that changed?

6) What are the primary reasons companies are rejected by YC? Does it differ
by partner, or is it uniformly the same?

7) We know that airlines are hard. It'd be great to see what specific
categories of startups that YC has funded that are just more difficult to grow
in or are easier to get traction in - e.g. consumer v. enterprise, analytics,
gaming/media?

8) In fact, I'd love to see a review of PG's claims in general either backed
by data or refuted by data. I suspect they're backed by data, but it'll likely
give more insight into the key area that people find interesting. For example,
are more people now finally choosing NOT to go to grad school?

------
dlo
Do you collect data on start-ups that:

- Didn't get into YC, e.g. funding, exits

- Didn't get into YC the first time but got into YC on a subsequent attempt

I'd love to see this data.

~~~
snowmaker
That's a good one. Yes, we do collect data on both of those.

~~~
dlo
Also: which start-ups didn't apply to YC at all and how well they did.

------
pgroves
If there is already a prototype built:

1) What kind of usage numbers "look good"?

2) What kind of ROI on adwords campaigns (or whatever) "look good"?

Or maybe it would be better to frame these as null-hypotheses: E.g. what makes
an early adwords campaign look like the whole idea should be scrapped?

~~~
AznHisoka
I doubt the failure of an AdWords campaign would factor in much. It's so hit
and miss in many industries.

------
emmasz
It would be helpful to know more data on the companies that weren’t accepted
or failed after:

- the amount of time founders spent on their project before applying

- correlation between age and university studies, previous work experience of
founders at big companies

- what kind of tasks did they outsource

------
mrdrozdov
It would be great to know how many companies have already incorporated before
applying to YC, and what their structure looks like (C-Corp vs LLC, split of
ownership, etc.). Some friends and I have each started our own companies
recently and we're going through this sort of thing for the first time, and we
wish we had better information. When YC visited NYU in the Fall I asked a
similar question and was told that registering as a C-Corp through clerky was
the way to go, but it was already too late for us (we're an LLC). Now I'm
curious what it takes to go from LLC to C-Corp, or if it's even a big deal at
this stage in the process.

~~~
snowmaker
I'll just answer this one. The answer is, if you haven't incorporated yet, we
recommend you incorporate as a Delaware C-corporation. If you've already
incorporated as something else, then don't worry about it. If you get accepted
to YC, we'll help you change your corporate structure and it's not that big a
deal.

------
lucas3677
Perhaps more digestible as a series of blog posts, it would be interesting to
read the stories of several startups as to how their collection and use of
data changed over time (from initial prototype through current time).

------
sethbannon
Hey Jared, there are now startups that offer to detect customers that might
churn before they do based on things like product usage data. I'd be curious
to hear if YC has thought about an early detection system like this for
startups it's funded that might be in need of some extra attention, based on
the data it gets from founders. It could also be interesting to see if YC could
predict, in a systematic way, a startup that's about to break out.

------
Mz
To me it is interesting that YC puts so much focus on the founders. While you
need an idea to start with, the idea can change or the company can pivot. Yet,
when there has been discussion of the social angle of doing business at YC, it
tends to not be backed up by data and, as such, it often seems to be raked
over the coals in HN discussions. So, I would be interested in any data you
have that supports some of the more hand wavy sounding social things that YC
does.

I am aware that studies are often a case of GIGO and this is an inherently
hard space to quantify. But I have taken classes in this area and read works
that were research based, such as _Getting to Yes_ , and I am always
interested in any solid data that relates to what many people feel is a soft
science at best, thus not worth taking seriously.

Thanks.

------
RickS
This might fall under the "mistakes" category - What data is bullshit / what
metrics are either useless or red flags?

~~~
snowmaker
It does, but that's a good one since it comes up again and again.

------
soneca
When is the ideal time to have a full time data scientist on a startup team?

Would a consultant, or a developer acting as a part-time data scientist, work?

------
tedmiston
How the percentage of startups that you accept varies by industry.

For example, there was a sneaker startup in the most recent batch, which was
the first one admitted to YC (AFAIK), but probably not the first to apply.

I'd be curious to see if you had data over the past few batches to demonstrate
trends like this space growing.

------
bigohms
We all know the % of success stories from accepted companies. How does YC know
it made scientifically sound decisions to reject companies that then went on
to succeed via other means? How can YC know the quality of its decisions is
getting better without this data?

------
seeing
1) The number of attempts from startups to go after a fairly well-defined idea
(more specific than general). A high number of attempts indicates a problem is
unsolved. A low number indicates unexplored terrain.

Ideally I'd also like to know a bit more, like the most common angles
attempted (in general terms, not specific), although I can imagine how this
might be disclosing people's ideas. Example: x% of attempts at idea A going
after the high end failed, but y% of going after the low end survived and
morphed or expanded to something else that has potential.

2) Which areas have the most statistically significant change in number of
applications since the previous funding cycle. e.g. Developer Tools vs
Developing Countries. Because it could point to upward and downward trends.

Ideally I'd also like to know about ideas in general too.

3) The top 3 questions on the YC application that startups answer most poorly.

4) The top 3 questions startups that are accepted and/or do well answer the
best.

5) The biggest mistakes you see in applications and how they can be avoided
(if they can.) Or the most common things missing.

6) The most convincing arguments you see in applications, and how they're made
(e.g. by providing numbers and percentages, rather than saying "we're growing
a lot".) Example: "if X is true, then Y" is more convincing than "I know Y
will work", because you show that at least you're aware of X and are digging
into it.

7) What most applications think will never work for a fairly well-defined
idea, because it's such a good source of new ideas. But only if it's safe to
assume the startup that applied won't pursue it, so as not to run into
confidentiality issues.

This can be information in the aggregate. e.g. show the top 2 things most
applications think will never work, which could account for 80% of the
applications, but don't show the outlier 20% which may be a good idea. In
other words, provide corrective information, but don't provide what might be
the answer.

8) If there's a positive or negative correlation between confidence in the
idea and being selected or doing well. Ideally, I'd like to know if I'm
writing my application in a way that suggests I have no idea what I'm talking
about. But only if you believe providing this will help the startups and the
application process.

9) Which parts of the application process that founders often think look like
negatives turn out to actually be positives. So startups don't get
discouraged.

10) To the extent each question on the application is scored and the score can
be useful if it's communicated back to the startup, provide the score back to
the startup.

Ideally, do this fast enough so the startup can rethink its answers. e.g. in
cases where the startup had better info to supply but didn't, or if you
believe it would help.

~~~
snowmaker
Those are great suggestions, thank you!

(1) and (10) are tricky because of confidentiality; the rest we can mostly do.

------
S4M
I'd love to see the distributions of the valuations/revenues/number of
employees/etc. for all YC companies and per batch.

------
tmaly
I would like to hear more about data as it relates to customer development.

How was data used to select the potential market?

How was data used to craft the customer interview questions?

How was results data from customer interviews used to make a decision to pivot
or move forward?

------
koolba
Where/how do you house and query YC data?

Formats, file types, databases, tools etc...

