Hacker News
Ask HN: Data questions for YC?
63 points by snowmaker on Apr 4, 2016 | 29 comments
Hi Hacker News,

I'm Jared, I work at Y Combinator. We're going to be talking and writing about how we use data at YC, and I'd like to get the community's feedback on what would be most interesting. For example, some topics we're considering are:

- What data you should have before applying to YC

- How YC uses data to help evaluate applications

- What mistakes we've seen early stage startups make with data

We'll be hosting a talk on this and then writing a blog post to follow. What would you like to hear about?

Sam Altman recently tweeted that YC now has an AI to help partners review applications [1]. I'm curious what that AI does.

Does it scrape data from the application and display it alongside (downloads, revenue, MAU, etc.)? Does it compare an application with similar YC companies (that either failed or succeeded) so that a partner can draw a comparison?

I'm just shooting in the dark, and have no clue if it actually has anything to do with data.

[1] https://twitter.com/sama/status/715617188841795585

Wouldn't it be hilarious if their AI turned out to be a script that simply runs a regexp match like "/the (next)? (google|facebook|instagram|uber) (of \d+)?/" against the application and automatically rejects it on a match :)
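For what it's worth, that joke screen almost works as written. Here's a minimal sketch in Python, with the spacing in the pattern loosened (and `\d+` swapped for `\w+`) so it actually matches pitch-speak; this is purely hypothetical and obviously not YC's real reviewer:

```python
import re

# Tongue-in-cheek auto-rejector: flags "the next Google of X" style pitches.
# Pattern adapted from the comment above; spacing made optional so it matches.
PITCH_CLICHE = re.compile(
    r"the (next )?(google|facebook|instagram|uber)( of \w+)?",
    re.IGNORECASE,
)

def auto_reject(application_text: str) -> bool:
    """Return True if the application leans on a 'next X' cliche."""
    return PITCH_CLICHE.search(application_text) is not None
```

So `auto_reject("We are the next Uber of eldercare")` flags the pitch, while "AdWords for podcasts" sails through.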

This is why I found it peculiar they chose to describe most F2 cos that way:

> Awesound - AdWords for podcasts

> Bulletin - Airbnb for Retail

> Müvr Labs - Fitbit for your knees

> Palaround - Tinder for private networks

> Sage - Uber for eldercare workers

> ...


That's a great idea; we should talk about this. It was a fun project!

1. From YC's experience, what data points have proved to be the most useful in evaluating startups? Hopefully the rest of the startup community can contribute and add to this data set.

2. What data points does YC wish it had when evaluating startups? This would be a very interesting discussion topic. It could be something the startup community works on together as we move forward... to make the industry/process more "scientific".

The biggest one I've heard consistently from founders is that 10% weekly growth (in whatever metric is important to you, like MAU) is considered "the gold standard".

It would be great to get data from other players in the startup community to test this "10% weekly growth" figure. How correlated is this 10% growth rate with the success of a startup? What about 7%? 8%? What about other data points, like founders working on the startup 24/7 compared to those who take breaks? How much break? Etc.
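As a bit of supporting arithmetic: the gap between 7%, 8%, and 10% weekly is enormous once it compounds over a year, which is why the exact threshold matters so much. A quick illustration (the correlation with success is, of course, the open question):

```python
# Annualized multiple implied by a constant weekly growth rate.
# Illustrative arithmetic only, not a claim about any real benchmark.
def annual_multiple(weekly_rate: float, weeks: int = 52) -> float:
    return (1 + weekly_rate) ** weeks

for rate in (0.05, 0.07, 0.08, 0.10):
    print(f"{rate:.0%}/week -> {annual_multiple(rate):6.1f}x per year")
```

10%/week works out to roughly 142x per year, versus about 34x at 7%/week: a seemingly small difference in the weekly rate is a 4x difference in the annual outcome.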

The more data points, the less we have to guess. That's what YC is trying to achieve, right? Bringing sanity to the chaos of choosing and incubating startups. This idea of sharing data is a great start!

1) Testing PG's Top Ten Things That Kill Startups - co-founder problems, fundraising, etc.

2) Is the biggest mistake usually that people don't fire fast enough?

3) We know the unicorn growth rate - but what is the median/average growth rate for a semi-successful YC company?

4) What makes a company part of the walking dead?

5) What are the primary channels a lot of YC companies use for user acquisition? How has that changed?

6) What are the primary reasons applications to YC are rejected? Do they differ by partner, or are they uniformly the same?

7) We know that airlines are hard. It'd be great to see which specific categories of startups YC has funded are just more difficult to grow in, or easier to get traction in, e.g. consumer vs. enterprise, analytics, gaming/media.

8) In fact, I'd love to see a review of PG's claims in general, either backed or refuted by data. I suspect they're backed by data, but it'll likely give more insight into the key areas people find interesting. For example, are more people now finally choosing NOT to go to grad school?

Do you collect data on start-ups that:

- Didn't get into YC (e.g. their later funding, exits)

- Didn't get into YC the first time but got in on a subsequent attempt

I'd love to see this data.

That's a good one. Yes, we do collect data on both of those.

Also: which start-ups didn't apply to YC at all and how well they did.

If there is already a prototype built:

1) What kind of usage numbers "look good"?

2) What kind of ROI on AdWords campaigns (or whatever) "looks good"?

Or maybe it would be better to frame these as null hypotheses: e.g. what makes an early AdWords campaign suggest the whole idea should be scrapped?

I doubt the failure of an AdWords campaign would factor in much. It's so hit and miss in many industries.

It would be helpful to see more data on the companies that weren't accepted, or that failed afterwards:

- the amount of time founders spent on their project before applying

- correlations with founder age, university studies, and previous work experience at big companies

- what kinds of tasks they outsourced

It would be great to know how many companies have already incorporated before applying to YC, and what their structure looks like (C-Corp vs. LLC, split of ownership, etc.). Several friends and I have started our own companies recently and are going through this sort of thing for the first time, and we wish we had better information. When YC visited NYU in the fall I asked a similar question and was told that registering as a C-Corp through Clerky was the way to go, but it was already too late for us (we're an LLC). Now I'm curious what it takes to go from LLC to C-Corp, or if it's even a big deal at this stage in the process.

I'll just answer this one. The answer is, if you haven't incorporated yet, we recommend you incorporate as a Delaware C-corporation. If you've already incorporated as something else, then don't worry about it. If you get accepted to YC, we'll help you change your corporate structure and it's not that big a deal.

Perhaps more digestible as a series of blog posts, it would be interesting to read the stories of several startups as to how their collection and use of data changed over time (from initial prototype through current time).

Hey Jared, there are now startups that offer to detect customers that might churn before they do based on things like product usage data. I'd be curious to hear if YC has thought about an early detection system like this for startups it's funded that might be in need of some extra attention, based on the data it gets from founders. Could also be interesting to see if YC could predict a startup that's about to breakout in a systematic way.
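To make that idea concrete, here is a toy sketch of what such an early-warning score might look like over founder-reported metrics. The features, weights, and threshold are all invented for illustration; nothing here reflects YC's actual data or any real model:

```python
# Toy early-warning score for portfolio companies.
# All features and weights are hypothetical, chosen only to illustrate the idea.
def risk_score(weekly_growth: float, runway_months: float,
               founder_checkins_per_month: int) -> float:
    """Higher score = more likely to need extra attention."""
    score = 0.0
    if weekly_growth < 0.02:             # flat or shrinking usage
        score += 0.5
    if runway_months < 6:                # short runway
        score += 0.3
    if founder_checkins_per_month == 0:  # founders have gone quiet
        score += 0.2
    return score

def needs_attention(company: dict, threshold: float = 0.5) -> bool:
    return risk_score(**company) >= threshold
```

A real system would presumably learn the weights from historical outcomes rather than hand-tune them, but even a crude score like this could triage which companies get a check-in call.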

To me it is interesting that YC puts so much focus on the founders. While you need an idea to start with, the idea can change or the company can pivot. Yet, when there has been discussion of the social angle of doing business at YC, it tends to not be backed up by data and, as such, it often seems to be raked over the coals in HN discussions. So, I would be interested in any data you have that supports some of the more hand wavy sounding social things that YC does.

I am aware that studies are often a case of GIGO and this is an inherently hard space to quantify. But I have taken classes in this area and read works that were research based, such as Getting to Yes, and I am always interested in any solid data that relates to what many people feel is a soft science at best, thus not worth taking seriously.


This might fall under the "mistakes" category: what data is bullshit / what metrics are either useless or red flags?

It does, but that's a good one since it comes up again and again.

When is the ideal time to have a full time data scientist on a startup team?

Does a consultant, or a developer acting as a part-time data scientist, work?

How the percentage of startups that you accept varies by industry.

For example, there was a sneaker startup in the most recent batch, which was (AFAIK) the first of its kind admitted to YC, but probably not the first to apply.

I'd be curious to see if you had data over the past few batches to demonstrate trends like this space growing.

We all know the % of success stories among accepted companies. How does YC know it made scientifically sound decisions when it rejected companies that then went on to succeed via other means? How can YC know the quality of its decisions is getting better without this data?

1) The number of attempts from startups to go after a fairly well-defined idea (more specific than general). A high number of attempts indicates a problem is unsolved. A low number indicates unexplored terrain.

Ideally I'd also like to know a bit more, like the most common angles attempted (in general terms, not specific), although I can imagine how this might be disclosing people's ideas. Example: x% of attempts at idea A going after the high end failed, but y% of going after the low end survived and morphed or expanded to something else that has potential.

2) Which areas have the most statistically significant change in the number of applications since the previous funding cycle (e.g. Developer Tools vs. Developing Countries)? It could point to upward and downward trends.

Ideally I'd also like to know about ideas in general too.

3) The top 3 questions on the YC application that startups answer most poorly.

4) The top 3 questions startups that are accepted and/or do well answer the best.

5) The biggest mistakes you see in applications and how they can be avoided (if they can.) Or the most common things missing.

6) The most convincing arguments you see in applications, and how they're made (e.g. by providing numbers and percentages, rather than saying "we're growing a lot".) Example: "if X is true, then Y" is more convincing than "I know Y will work", because you show that at least you're aware of X and are digging into it.

7) What most applicants think will never work for a fairly well-defined idea, because it's such a good source of new ideas. But only if it's safe to assume the startup that applied won't pursue it, so as not to run into confidentiality issues.

This can be information in the aggregate: e.g. show the top 2 things most applications think will never work, which could account for 80% of the applications, but don't show the outlier 20%, which may be a good idea. In other words, provide corrective information, but don't give away what might be the answer.

8) If there's a positive or negative correlation between confidence in the idea and being selected or doing well. Ideally, I'd like to know if I'm writing my application in a way that suggests I have no idea what I'm talking about. But only if you believe providing this will help the startups and the application process.

9) Which parts of the application process that founders often think look like negatives turn out to actually be positives. So startups don't get discouraged.

10) To the extent each question on the application is scored, and the score would be useful if communicated back to the startup, provide the score back to the startup.

Ideally, do this fast enough so the startup can rethink its answers. e.g. in cases where the startup had better info to supply but didn't, or if you believe it would help.
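For item (2) above, "statistically significant change" could be checked with something as simple as a two-proportion z-test on application counts per category per cycle. A sketch (the counts below are made up for illustration):

```python
import math

# Two-proportion z-test: did a category's share of applications change
# significantly between two funding cycles? Counts here are invented.
def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# e.g. Developer Tools: 120 of 6000 apps last cycle vs 180 of 6500 this cycle
z = two_proportion_z(120, 6000, 180, 6500)
print(f"z = {z:.2f}")  # |z| > 1.96 ~ significant at the 5% level
```

With these made-up counts the z-statistic comes out around 2.8, i.e. the category's share genuinely grew rather than just bouncing around with batch size.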

Those are great suggestions, thank you!

(1) and (10) are tricky because of confidentiality, the rest we can mostly do.

I'd love to see the distributions of valuations/revenues/number of employees/etc. for all YC companies and per batch.

I would like to hear more about data as it relates to customer development.

How was data used to select the potential market?

How was data used to craft the customer interview questions?

How were results from customer interviews used to decide whether to pivot or move forward?

Where/how do you house and query YC data?

Formats, file types, databases, tools, etc.
