
How Mattermark Teamed Up With Bloomberg Beta to Predict Who Will Start Companies - dmor
http://mattermark.com/how-mattermark-teamed-up-with-bloomberg-beta-to-predict-who-will-start-companies-next/
======
onion2k
Take two random people, Bob and Sarah. The chance that each of them is going
to start a company is equal.

 _Leave Bob to get on with his life. Maybe he'll start a company, maybe he
won't._

 _Invite Sarah to a private event at a well-funded VC company and call it
"Future Founders"._

If either of them definitely wasn't going to start a company then they still
won't. If either definitely was then they still will. Nothing has changed.

But if either of them had considered starting, and was wondering whether to or
not, being labelled a "Future Founder" and being granted access to a group of
349 other people who rank highly on an entrepreneurial scale, plus direct
contact with high profile VCs seems likely to influence their decision. That
nudge could easily account for the difference between founders and non-
founders.

Did Mattermark factor that into their findings? It seems a bit self-fulfilling,
as prophetic judgements go.

~~~
roybahat
Hey... I run Bloomberg Beta. I think that creating a self-fulfilling prophecy
is absolutely a dynamic here. Our goal wasn't to succeed at predicting so much
as to get to know great people before they start companies. If that
action induces them to start companies, and they have a great experience doing
it, but our prediction's accuracy is compromised -- that's fine. We're using
data as a tool to make things better, not for its predictive value in and of
itself.

------
7Figures2Commas
The title here seems quite exaggerated, as does the claim that "Mattermark
Founder Prediction Is 25X Better Than Chance."

Predicting who in a group of 1.5 million technology professionals is likely to
start a company presents an unsupervised learning problem. Short of contacting
all 1.5 million people and asking them, there is no way to confirm whether the
predictions the system made are correct, so you cannot make claims about
efficacy.

There are approaches used to deal with unsupervised learning problems, but
there are no details in this post even indicating that the folks involved
recognized they were dealing with an unsupervised learning problem in the
first place. Instead, we just have claims like "While we believe the future
founders group has a 17% chance — 25x higher..." for which no further
information is provided.
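As a sanity check, the quoted figures can at least be unpacked: if the "future founders" group has a 17% chance and that is 25x better than chance, the implied base rate for the general population is about 0.68%, or roughly 1 in 150. A minimal back-of-the-envelope sketch (the 17% and 25x are as quoted in the post; the base rate is an inference, not a number they provide):

```python
group_rate = 0.17   # claimed chance for the "future founders" group
lift = 25           # claimed improvement over chance

# The base rate implied by those two claims -- about 0.68%,
# i.e. roughly 1 in 147 of the 1.5M professionals scanned.
implied_base_rate = group_rate / lift

print(f"implied base rate: {implied_base_rate:.2%}")
```

Nothing in the post says how that base rate was estimated, which is exactly the missing detail.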

Perhaps more interesting than the bold claims sans important technical details
is the notion that an early-stage fund would look to court potential founders
before they even made the decision to become founders. While it's true that
many seed stage investors adhere to the mantra of "we invest in people," this
is as good an example as I've seen of the fact that there is currently way too
much capital chasing too few opportunities.

------
squigs25
Wow! This is really cool.

In statistics and machine learning this would be considered an unbalanced data
set: predicting who will start a company when the vast majority of people will
not is a very difficult task. It's similar to predicting who will be a
terrorist (another really difficult problem).

I think the threshold they are using is way off, however. Even if someone has
only a 5% chance of becoming a founder (or less), that's pretty significant. I
understand that lowering it would probably increase the flagged population by
many orders of magnitude, but converting only 17% of 350 means ~60 startups
will be found as a result of this program. Given that the large majority of
those are likely to fail, the numbers could be better.

Some really interesting predictors might be: what meetup groups the individual
belongs to, what their current job title is, what skills and connections they
have on LinkedIn and Facebook, how many founders they are "connected" to, and
who they follow on Twitter.

It's also worth mentioning that this is probably biased, because the data set
of individuals includes data points for founders only after they became
founders. You would ideally want the data from before they became a founder.
Perhaps over time this model would get better, as non-founder individuals
become founders.
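To illustrate the imbalance point: when founders are this rare, a model can score near-perfect accuracy while finding no one, which is why threshold choice and precision/recall matter far more than raw accuracy. A toy sketch (the 1-in-150 base rate is an illustrative assumption, loosely in the ballpark implied by the 17%/25x figures):

```python
import random

random.seed(0)

# Assumed base rate: roughly 1 future founder per 150 people.
population = 100_000
labels = [1 if random.random() < 1 / 150 else 0 for _ in range(population)]

# A degenerate model that predicts "will not start a company" for everyone.
predictions = [0] * population

accuracy = sum(p == y for p, y in zip(predictions, labels)) / population
founders_found = sum(p and y for p, y in zip(predictions, labels))

print(f"accuracy: {accuracy:.1%}, founders found: {founders_found}")
# ~99%+ accuracy, zero founders found -- accuracy alone is meaningless here.
```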

~~~
kevin_morrill
CTO of Mattermark here. It is a really interesting problem because, as you say,
even if you boost the odds 25x they're still really low. We trained on a data
set of venture-backed founders (e.g. Series A or beyond), which is a bit
higher bar than just any founder. The hope is that once you reach Series A
you're less likely to fail than if you'd only raised seed funding. At some
point we want to go back and look at what differentiates founders that reach
seed vs. venture backing.

~~~
JasonCEC
Can you talk a bit about your feature selection or models?

I run a statistical quality control company using machine learning, and
picking up on flaws with tiny probabilities (one batch in every twenty or
thirty million) might benefit from similar techniques!
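For scale: at defect rates that low, even assembling a labeled training set is a challenge, since supervised methods need some minimum number of positive examples. A rough calculation (the 1-in-25-million rate is just the midpoint of the range mentioned, and the 100-positives target is an arbitrary illustrative minimum):

```python
# Midpoint of the "one batch in every twenty or thirty million" range.
defect_rate = 1 / 25_000_000

# Batches you'd need to inspect to expect ~100 labeled flawed batches,
# a bare minimum for most supervised approaches.
target_positives = 100
batches_needed = target_positives / defect_rate

print(f"batches needed: {batches_needed:,.0f}")  # 2,500,000,000
```

Which is part of why techniques from the imbalanced-classification literature (anomaly detection, cost-sensitive thresholds) tend to apply to both problems.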

------
ASquare
Related:
[https://news.ycombinator.com/item?id=7465150](https://news.ycombinator.com/item?id=7465150)

------
seanccox
Hmmm... My email must've ended up in the spam box...

------
dotBen
I'm curious where the data came from - LinkedIn seems obvious but I'm not
aware they make that kind of corpus available for purchase.

