
Datasets and Data-driven Startups - mattyb
http://measuringmeasures.com/blog/2010/4/29/datasets-and-data-driven-startups.html
======
tow21
_"Why would you want to start a data-driven startup?"_

In our case (<http://timetric.com>) - it was at least in part due to having
spent several years beavering away trying to build products for the management
of scientific data, before discovering that outside of science:

a) there's lots of other interesting data that can benefit from better management

b) people will pay for it :-)

The article's a very good overview of lots of the issues we've faced in
building a business around data.

------
physcab
To present an alternative viewpoint on this matter, I would say for people
looking to start data-driven startups to first think about your application.
What do you want to make? Then think about how you can glean information from
the normal usage of that product.

For example, I think it's much more risky to think "hmm, what datasets are out
there and how can I build a business off of them?" than "hmm, what business do
I want to create, and can I monetize through data acquisition?"

In the first case, you are going to be dependent on the availability of someone
else's data. You don't know how it was collected, and you have no idea how it
could have been doctored. In the second case, you create your business with a
data-driven mindset, and you control all those parameters yourself. Then you
are just at the mercy of how well your data reporting tools are working.

~~~
patio11
I think data is a detail and decisions are solid gold.

"We'll tell you the status of this customer's last forty-seven credit card
payments" is a detail to your customers. An enormously non-trivial
undertaking, but ultimately a detail.

"Hiya. We're Fair Isaac and we are going to purge the profession of
underwriters from the face of the earth, saving you billions of dollars,
decreasing your loan processing time from weeks to literally seconds, and
allowing your consumer lending to scale in ways you cannot even imagine. By
the way, some numbers are involved." is a wee bit more compelling.

~~~
arohner
Exactly. Though I would go even farther than that. Similar to "People don't
pay for drills, they pay for holes", "People don't pay for data, they pay for
decisions". Data without actionable decisions are worthless.

~~~
dman
Bloomberg and Reuters would like to disagree. The collection of reliable,
cleaned data in the real world is such a chore, and has such a strong network
effect, that it is a good model on which to build a business. E.g., Mint
might have cashed out with the buyout, but Yodlee is going to see a check
every year from Intuit for a long, long time to come.

~~~
arohner
Yes, with emphasis on _reliable, cleaned_. That data is worthwhile _because_
it is actionable. The value-add in FlightCaster or Fair Isaac is that it
converts unactionable to actionable information.

~~~
dman
I haven't used either FlightCaster or Fair Isaac extensively. A cursory glance
at their websites suggests that both companies actually transform the
available data substantially, almost to the point where it is fundamentally
new information. FlightCaster does this with their machine learning voodoo,
while Fair Isaac does it by summarising and analysing existing data to create
a credit score or report. In either case there is a worthwhile value-add. From
my point of view, what these two companies do is very different from Reuters
and Bloomberg, who take great pains not to do analysis (that would be a
conflict of interest with their clients). In short, the post I replied to
made the assertion that data by itself has no value. I quote: "People don't
pay for data, they pay for decisions". The two examples I provided (Reuters
and Bloomberg) provide merely scrubbed data; they do not provide analysis.
Hence there appears to be a sizeable market for data.

------
louislouis
I'm looking to collect or download a dataset for music consisting of info such
as artist/album/song. Any idea where to grab this from? There used to be a
list hosted on Google around 2 yrs ago, but I can't find it anymore.

~~~
physcab
Yes. Lots of info out there. You can use our API at Grooveshark available
here: <http://tinysong.com/api>

You can also bulk download Discogs here: <http://www.discogs.com/data/>

And MusicBrainz has one here: <http://musicbrainz.org/>

You can also bulk download Wikipedia, cross-reference page titles with a set
of artist names, and grab whatever information you want if your regex skills
are magical.
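A minimal sketch of that title/artist cross-referencing step, in Python. It assumes you already have the page titles as plain strings with underscores standing in for spaces (the way Wikipedia's title-list dumps write them), and it strips only a small, hypothetical set of disambiguation suffixes such as "(band)". The function names are illustrative, not from any library.

```python
import re
import unicodedata

# Hypothetical list of disambiguation suffixes; extend it for your data.
_DISAMBIG = re.compile(
    r"\s*\((?:band|musician|singer|rapper|group|duo)\)$", re.IGNORECASE
)


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def title_to_name(title: str) -> str:
    """Turn a title like 'Sigur_Rós_(band)' into 'Sigur Rós'."""
    return _DISAMBIG.sub("", title.replace("_", " ")).strip()


def match_artists(titles, artists):
    """Map each artist to the first title whose normalized name equals it."""
    wanted = {normalize(a): a for a in artists}
    matches = {}
    for title in titles:
        key = normalize(title_to_name(title))
        if key in wanted and wanted[key] not in matches:
            matches[wanted[key]] = title
    return matches
```

For example, `match_artists(["Radiohead", "Sigur_Rós_(band)", "Paris"], ["radiohead", "Sigur Ros", "Björk"])` returns `{"radiohead": "Radiohead", "Sigur Ros": "Sigur_Rós_(band)"}`. Exact matching on normalized names is deliberately conservative. Artists named after common words will still collide with unrelated articles, so expect to review matches by hand.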

A general word of caution: User-sourced music information is very very messy.
Be prepared for a lot of misspellings, bad metadata, missing information, etc.
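One way to cope with the misspellings is to normalize names and then fall back to fuzzy matching against a trusted canonical list. The sketch below uses Python's standard `difflib.get_close_matches`. The cleanup rules, the canonical list, and the 0.85 cutoff are all assumptions you would tune on your own data.

```python
import difflib
import re


def clean(raw: str) -> str:
    """Crude cleanup: lowercase, drop a leading 'the', remove punctuation."""
    s = raw.lower().strip()
    s = re.sub(r"^the\s+", "", s)   # "the beatles" -> "beatles"
    s = re.sub(r"[^\w\s]", "", s)   # drop punctuation such as "!" or "."
    return " ".join(s.split())


def canonicalize(raw, canonical, cutoff=0.85):
    """Return the best-matching canonical name, or None if nothing is close enough."""
    lookup = {clean(c): c for c in canonical}
    key = clean(raw)
    if key in lookup:  # exact hit after cleanup
        return lookup[key]
    # Fuzzy fallback: difflib scores candidates with SequenceMatcher ratios.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None
```

With `canonical = ["The Beatles", "Radiohead", "Led Zeppelin"]`, `canonicalize("Led Zepplin", canonical)` returns `"Led Zeppelin"`, while `canonicalize("Metallica", canonical)` returns `None`. Returning `None` instead of the nearest guess keeps bad merges out of the cleaned dataset, at the cost of leaving some rows for manual review.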

~~~
nerfhammer
re: extracting Wikipedia data

Someone's already done this for you; see dbpedia.org.

------
ableal
_[...] the unglamourous quest to get the data to a point where [...]_

That, and once there are results, carting them back into the 'production line'
and plugging them where they'll do any good.

Logistics.

