

How to Build a Data Startup - rjurney
http://www.forbes.com/2010/11/02/startup-facebook-twitter-technology-data.html

======
tricky
My wife was part of a data startup. They'd take freely available epidemiology
data, massage it into a spreadsheet, and sell it to pharma. They do really
well, but only because they know their data and they know that their customers
know that they know what they know. You know?

exactly.

~~~
zackattack
Would you kindly go into greater detail?

~~~
tricky
You develop a drug that will work on a couple of different diseases (or, more
likely, different cancers). You can only afford to run a clinical trial on one
disease. How do you choose? You hire a company that has a good handle on the
data (a lot of which is gathered by the CDC and other government orgs who make
it freely available). The data people mine that data and come back to say, "x%
more people die from disease 1 than 2, but insurance will reimburse more for
disease 2, so go with 2."

I'm totally making that up, but it is something you could feasibly derive from
the data. Mostly it was along the lines of "teens get acne; American teens
would rather use a pill than a cream, so develop a pill for them and a cream
for the rest of the world."
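
As a rough illustration of that kind of comparison, here is a minimal Python
sketch. Every name and number in it (the `Indication` class, the death counts,
the patient counts, the reimbursement figures) is invented for the example,
just like the scenario above. It only shows how ranking by expected reimbursed
revenue can disagree with ranking by deaths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indication:
    """One disease a drug could be trialled against (all fields hypothetical)."""
    name: str
    annual_deaths: int                # made-up figure, stands in for public mortality data
    treatable_patients: int           # made-up addressable population
    reimbursement_per_patient: float  # made-up insurer reimbursement, in USD

    @property
    def expected_revenue(self) -> float:
        # Crude proxy: patients treated times what insurance pays per patient.
        return self.treatable_patients * self.reimbursement_per_patient


def pick_trial_target(candidates):
    """Return the candidate with the highest expected reimbursed revenue."""
    return max(candidates, key=lambda c: c.expected_revenue)


disease_1 = Indication("disease 1", 12_000, 50_000, 2_000.0)
disease_2 = Indication("disease 2", 10_000, 45_000, 3_500.0)
candidates = [disease_1, disease_2]

deadlier = max(candidates, key=lambda c: c.annual_deaths)
target = pick_trial_target(candidates)

# How many more people die from disease 1 than from disease 2, in percent.
extra_deaths_pct = (
    100 * (disease_1.annual_deaths - disease_2.annual_deaths)
    / disease_2.annual_deaths
)

print(
    f"{extra_deaths_pct:.0f}% more people die from {deadlier.name}, "
    f"but reimbursement favors {target.name}."
)
```

A real analysis would obviously need many more inputs (trial cost, odds of
approval, competing drugs); the point is only that the recommendation falls
out of combining a couple of public columns.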

~~~
wlievens
So it's market research as a product rather than a service, so to speak?

------
psynix
Much easier to read on the original site:
[http://radar.oreilly.com/2010/10/strata-week-building-data-s...](http://radar.oreilly.com/2010/10/strata-week-building-data-star.html)

------
physcab
I think there are multiple ways to start a startup based on data.

1) Take other people's data and learn how to represent it in a way that will
let them understand it better. See FlowingData, Stamen, and NY Times graphics
([http://www.nytimes.com/interactive/2009/03/10/us/20090310-im...](http://www.nytimes.com/interactive/2009/03/10/us/20090310-immigration-explorer.html)).

2) Take other people's data, glean information from it, and offer a new
service based on that information. See FlightCaster and TweetFeel.

3) Develop tools that will allow people to play with their own data and do
their own analysis in-house. See Datameer, Google Prediction API, and
Palantir.

Whatever the route, you'll probably need someone who is comfortable dealing
with databases, a graphic artist to make the information pretty, and someone
with the algorithmic knowledge to capture new insights from massive amounts of
data.

------
anthonyb
This article is really just a rehash of a tiny part of O'Reilly's roundup:
[http://radar.oreilly.com/2010/10/strata-week-building-data-s...](http://radar.oreilly.com/2010/10/strata-week-building-data-star.html)

You're much better off reading the originals, which I think have already been
posted on HN anyway:

[http://datasyndrome.com/post/1375987697/analytic-product-tea...](http://datasyndrome.com/post/1375987697/analytic-product-teams)

and

[http://petewarden.typepad.com/searchbrowser/2010/10/how-to-t...](http://petewarden.typepad.com/searchbrowser/2010/10/how-to-turn-data-into-money.html)

------
il
I'm wondering how they determined that a data startup needs exactly those
three founders.

I think a data startup needs two founders: a hacker to collect and analyze the
data and a business guy to provide actionable recommendations and sell it.

But then again, I'm currently a single founder working on a data startup and
wearing all of the hats, so what do I know?

Shameless Plug: Anyone want to analyze huge datasets and create
recommendations with me? Email me!

~~~
anthonyb
It's much clearer in the original article; this one is really just a
rehash/mix and match.

[http://datasyndrome.com/post/1375987697/analytic-product-tea...](http://datasyndrome.com/post/1375987697/analytic-product-teams)

------
benzheren
Visualization is a key part of working with data; I highly recommend all of
Edward Tufte's books on information visualization.

------
gyardley
An OK article, but if you're seriously considering building a data startup,
the single most important thing is to know exactly how you will scale
beforehand. Bad architecture is expensive, and trying to switch architectures
midstream is a nightmare.

~~~
il
I'm going to get flamed for this, but I think, even for a data startup,
worrying about scaling before you have users or traction is premature,
especially with how cheap hardware is becoming.

Case in point: I'm currently hacking together an inefficient, unoptimized
prototype analyzing pretty large datasets on probably the worst architecture
for this kind of thing known to man, and the whole thing still runs pretty
well on a single $50 VPS.

~~~
gyardley
Do you have full control over the amount of data your system is taking in?

The startup I founded had analytics code in a ton of iPhone applications and
was handling the load just fine right up until the day it suddenly wasn't. By
that point we had customers who relied on us, and we had to deal with it very
quickly. Not fun. And there's certainly more to scaling than just cheap
architecture. We thought EC2 would handle the overflow until we unexpectedly
became completely I/O bound. Firing up a few more instances can't fix that.

If you're just running some scraper and can control what you're taking in,
that's a completely different story.

~~~
il
You're absolutely right, I hadn't considered analytics as an example.

Some data startups I've seen, as well as my own project, take in existing data
sets and simply generate reports from them for customers. That makes it a lot
easier to scale.

------
gsteph22
The world is data. This is a killer article.

~~~
dstorrs
Personally, I found it a bit fluffy. It pretty much boils down to:

"Data startups need three bodies (hustler, designer, prodineer). Talk to
customers early. Here are the levels of knowledge: 1) data, 2) charts, 3)
reports, 4) actionable analytics; higher numbered levels are more valuable."

~~~
rjurney
Some other posts at www.datasyndrome.com are less fluffy on the same topic.
The 3 founders one was written to suggest a 3rd founder to a startup. Data
hackers tend to underestimate the importance of hustling and design.

~~~
sparky
Surely they have numbers to back them up on that.

~~~
rjurney
I don't think data on this is available, just experience.

