

Show HN: Startup data 2000-2012 from CrunchBase - miquelcamps
http://betabeers.com/uploads/estudios/crunchbase-startup-data/?

======
jpdoctor
Crunchbase is pretty incomplete before 2004 it seems.

Several of the comm companies I checked were missing.

Edit: Upon further testing, it seems that many of the communications wipeouts
($40+M) from that era are missing. It's almost as if the people who funded
them don't want the magnitude of the mistakes in the record.

Here's an example: Photonex, raised $170M [1] over the course of their
lifetime and not a trace on crunchbase of who the culprits were.

Another quick example: Solinet [2] raised a pile of dough (they later renamed
themselves Ceyba).

[1] http://www.lightreading.com/ip-convergence/photonex-scores-huge-3rd-round/240045206

[2] http://www.lightreading.com/ip-convergence/solinet-systems-scores-93-million/240048732

------
ig1
(I run SeedTable which also does analytics on Crunchbase data)

The problem with ranking by funding/acquisition amounts is the data is pretty
dirty, and outliers are disproportionately likely to be incorrect data
(because someone fat fingered a number to be an order of magnitude larger, or
put a foreign currency amount in as USD). Although the acq data is better than
the funding data.

You might also want to extract stuff like biotech from the data because it's
fundamentally a completely separate market from software tech.
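
A minimal sketch of the outlier concern above, assuming the funding amounts
have already been loaded into a plain list of numbers. The function name
`flag_suspect_amounts` and the default factor of 10 are illustrative choices,
not anything SeedTable or Crunchbase uses. It only marks values for manual
review; a flagged amount may still be correct.

```python
from statistics import median


def flag_suspect_amounts(amounts, factor=10):
    """Return the amounts that exceed `factor` times the median.

    A value an order of magnitude above the median is a candidate for a
    data-entry error (extra zero, foreign currency entered as USD).
    """
    if not amounts:
        return []
    threshold = factor * median(amounts)
    return [a for a in amounts if a > threshold]


# Made-up round sizes, for illustration only.
sample_rounds = [1_000_000, 2_000_000, 3_000_000, 50_000_000]
suspects = flag_suspect_amounts(sample_rounds)
```

In practice the comparison would probably be done per round type or per
sector, since a late-stage round is legitimately much larger than a seed
round.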

~~~
ipedrazas
Still, in that table we can see that in 2012 there was less funding and fewer
acquisitions.

It will be interesting to see what happens during 2013, but it seems that
there's not as much money as in 2010/2011.

~~~
ig1
Yes the counted metrics are much more reliable.

It's much harder for TC to misenter an acquisition/funding round than it is
for them to accidentally add an extra zero to a number.

------
aaasen
It's crazy that there are so many huge companies that I've never heard of.
Instagram's acquisition was huge news at $1B, but I didn't hear a peep about
Ariba at $4.3B or Genzyme at $20B.

It's easy to think that consumer oriented web startups are what's hot, but
this data proves otherwise.

~~~
jpdoctor
> _It's easy to think that consumer oriented web startups are what's hot, but
> this data proves otherwise._

I think what it proves: Crunchbase serves the PR segment of consumer oriented
web companies, and fails to serve other segments well. Let's face it, it's not
so much a database as it is part of the techcrunch PR machine.

------
bdcravens
T-Mobile is listed as the top acquisition of 2011, but that deal was blocked
by the DoJ and later dropped by AT&T. In other words, the charts are based on
announcements, not necessarily consummated transactions.

~~~
samspenc
Exactly what I was thinking.

------
ronnix
Were some companies acquired multiple times the same year, or is there a bug
somewhere? (RazorFish is listed 3 times in 2002, DoubleClick and Skype twice
in 2005, Getty Images twice in 2008, Sterling Commerce twice in 2009.)
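
One way to spot this kind of repeat is to count rows per (year, target) pair.
The sketch below assumes each acquisition is a tuple of (date, target,
acquirer, price) with an ISO-style date string; that layout and the function
name `find_duplicate_acquisitions` are assumptions, and the sample rows are
made up.

```python
from collections import Counter


def find_duplicate_acquisitions(rows):
    """Return {(year, target): count} for targets listed more than once
    in the same year."""
    counts = Counter((date[:4], target) for date, target, _acquirer, _price in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows, for illustration only.
sample_rows = [
    ("2002-01-15", "ExampleCo", "BuyerA", None),
    ("2002-01-15", "ExampleCo", "BuyerA", None),
    ("2002-06-01", "OtherCo", "BuyerB", None),
]
duplicates = find_duplicate_acquisitions(sample_rows)
```

A repeat is not always a bug: a company can genuinely change hands twice in a
year, so the acquirer column is worth checking before deleting anything.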

~~~
miquelcamps
ouch! fixed, thanks :)

------
skhamkar
Here are some links to CSV files based on miquelcamps's sql file.

Acquisitions <http://db.tt/h6PoPnCn>

Companies <http://db.tt/UNYulmJD>

Funding <http://db.tt/SHa45HHc>

Words <http://db.tt/mJMCIREX>

------
hkmurakami
For funding categories by year, I really think it would be nicer/more-useful
if we could look at a time-series line graph by sector over a longer time
period (5+ years) to see trends.
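
A rough sketch of the aggregation such a chart would need: total funding per
sector per year. It assumes funding rows of (date, company, size) and a
separate company-to-category mapping; both shapes, the function name, and the
sample values are hypothetical.

```python
from collections import defaultdict


def funding_by_sector_and_year(funding_rows, category_of):
    """Return {category: {year: total_funding}} with years in ascending order."""
    totals = defaultdict(lambda: defaultdict(float))
    for date, company, size in funding_rows:
        year = int(date[:4])
        category = category_of.get(company, "unknown")
        totals[category][year] += size
    return {cat: dict(sorted(years.items())) for cat, years in totals.items()}


# Made-up data, for illustration only.
sample_categories = {"ExampleCo": "software", "OtherCo": "biotech"}
sample_funding = [
    ("2011-02-01", "ExampleCo", 2.0),
    ("2010-05-01", "ExampleCo", 1.0),
    ("2011-07-01", "OtherCo", 5.0),
    ("2011-09-01", "ExampleCo", 3.0),
]
series = funding_by_sector_and_year(sample_funding, sample_categories)
```

Each inner dictionary is then one line on the chart, with the year on the
x-axis.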

------
minimaxir
Which endpoint(s) of the CrunchBase API are you using? I'm planning on doing
something similar graphically.

~~~
miquelcamps
Honestly, I have not used the API; I've done some web scraping. If you want,
you can download the database here ;)

<https://www.dropbox.com/s/pzhqhtk4g23temz/crunchbase.sql.zip>

~~~
wilson_guaraca
Hey Miguel! I am currently writing a thesis on venture capital and I have
similar data from a database called VentureXpert. I will share it with you via
twitter once I finish in a couple of weeks. I am curious to see how different
our data looks. Thanks for posting.

Twitter: @wguaraca

------
ev_ancasey
cool stuff. how are you grouping data startups within the funding categories
section?

i'd be interested to see the breakdown in recent years of startups that offer
a data product, i.e. data infrastructure, ad optimization, user tracking, etc

~~~
miquelcamps
I would like to be more specific with the categories, but on Crunchbase the
main category for most internet-based startups is "Consumer Web".

Example: <http://www.crunchbase.com/company/rent-com>

If you want, you can download the database and check the companies table
<https://www.dropbox.com/s/pzhqhtk4g23temz/crunchbase.sql.zip>

------
sylvinus
Funny that the 3 "top paying" acquirers are telcos and not Googles :)

------
ejfox
Hello I'd like your data please.

~~~
miquelcamps
You can download it here ;)
<https://www.dropbox.com/s/pzhqhtk4g23temz/crunchbase.sql.zip>

~~~
kevin_morrill
Is your list of companies in this db everything in their entity list of
companies, or just ones that got acquired?

~~~
miquelcamps
there are 4 tables:

- acquisitions (date, target, acquirer, price)
- companies (date, name, description, category, url, twitter)
- funding (date, company, round, size, investors)
- words (for the tagcloud)
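
For anyone who wants to poke at the data, here is a small sketch of counting
acquisitions per year. It uses an in-memory SQLite table built from the
acquisitions columns listed above; the column types, the ISO date format, and
the sample rows are assumptions, and the real dump may need converting before
SQLite can load it.

```python
import sqlite3


def acquisitions_per_year(conn):
    """Return [(year, count), ...] ordered by year."""
    query = """
        SELECT substr(date, 1, 4) AS year, COUNT(*) AS n
        FROM acquisitions
        GROUP BY year
        ORDER BY year
    """
    return [(row[0], row[1]) for row in conn.execute(query)]


# Assumed column types; only the column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE acquisitions (date TEXT, target TEXT, acquirer TEXT, price REAL)"
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO acquisitions VALUES (?, ?, ?, ?)",
    [
        ("2011-03-01", "ExampleTargetA", "ExampleBuyer", 1.0e6),
        ("2011-08-15", "ExampleTargetB", "ExampleBuyer", 2.0e6),
        ("2012-01-10", "ExampleTargetC", "OtherBuyer", None),
    ],
)
per_year = acquisitions_per_year(conn)
```

Counting rows rather than summing prices sidesteps most of the bad-amount
problem mentioned earlier in the thread.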

