
YC-Funded Data Marketplace is an Amazon for Structured Information - jmorin007
http://techcrunch.com/2010/03/18/yc-funded-data-marketplace-is-an-amazon-for-structured-financial-information/
======
snewe
One reason i-banks and academics pay top dollar for data is reliability and
consistency. How does Data Marketplace address errors in user-collected data?
Do they guarantee its quality? If I spend $10,000 on a quarterly update on
CRSP and find a systematic mistake, they will fix it without charge. Such
mistakes are rare because CRSP has a reputation to maintain. I wonder
if users hired to collect data have the same constraints.

~~~
matthodan
We want to be as transparent as possible about who is selling the data.
Ultimately, it will be the reputation of the seller that determines how
'trustworthy' the data is (and how much you should be willing to pay for it).

~~~
grinich
Will there be bids for exclusive rights to a certain data source?

~~~
matthodan
If users want it-- definitely. We'll need to see how many people contact us
with these types of requests.

------
lg
Hey, it sounds like these guys would be a perfect customer for ITA's Needle.
Needle provides a GUI-based way for nonprogrammers to scrape data, clean it
up, and get it into a structured database where they can create views on it
and all that databasey stuff. <http://www.needlebase.com/>

~~~
joshu
I hope this is at least a tenth as awesome as it sounds. Have you used it?

~~~
lg
I watched an internal demo :) Where they scraped a table off an MLB site,
live, using their GUI and prettified the data a bit. If you check the blog
there's some demonstrations, and the Village Voice gave it a test drive:
[http://blogs.villagevoice.com/music/archives/2010/01/pazz_jo...](http://blogs.villagevoice.com/music/archives/2010/01/pazz_jop_bonus_2.php)

~~~
joshu
I'm in Mountain View. Can I get a demo?

~~~
lg
Up to the team. If you email me, I'll forward it to them. Or just use the
contact email on the needlebase site; that goes right to them.

------
coderdude
How funny! Right as I start talking about my startup to buy and sell datasets,
it turns out YC had something like this in secret the whole time. At least I
know I'm doing something worthwhile.

~~~
matthodan
Let's talk! I like where you are going with Webscaled-- maybe we could work
together. Shoot me an email when you get a chance: matt @
datamarketplace.com

------
kogir
Sounds a little like Microsoft's "Dallas":

<http://www.microsoft.com/WindowsAzure/dallas/>

~~~
jfarmer
Reminded me most of Factual:
<http://techcrunch.com/2010/02/03/factual-1-million-seed/>

------
_delirium
As a pay-for-service approach, it seems reasonable: you say you need some
data, the appropriate people come up with it in the format you want, and you
pay and get the data. But the Amazon-like angle seems odder to me: How will
each of the bits of structured information manage to be sold more than once?
Since most isn't copyrightable, anyone can buy a copy and then put it up
online for free. I could go buy that Wal-Mart Store Locations data set for $30
right now, and publish it on my blog. Are you hoping EULAs will be enough to
prevent this, or just that in practice it won't happen enough to matter, or
people won't notice that the data is free online and continue buying it
anyway?

~~~
matthodan
This is an interesting question that we've been giving a lot of thought to
lately. We hope that it won't be a big problem, but if it is we will need to
implement public EULAs and a dispute resolution process.

In terms of multiple purchases, we suspect there will be some datasets that
are more popular than others. A list of Wal-Mart stores for example could be
useful to a number of different people, such as investors, real estate
developers, business planners, marketers, etc.

~~~
gbookman
I think you guys are doing the right thing by not worrying about EULAs and
other protection mechanisms at this stage in the game. Stay focused on
satisfying your customers and improving the product as much as possible until
you start making major headway.

------
jackowayed
Is it just me, or are more YC companies launching early this cycle? I can
count 5--Etacts, Crocodoc, Cardpool, 140bets, and Data Marketplace--that have
already launched.

Is this just a product of them having more companies this cycle, something
they're stressing this year, or just a coincidence?

~~~
lloydarmbrust
I think the competition to be a YC company is high enough now that most
companies show up the first day with an actual product.

~~~
stevedewald
Yeah, it's true. We definitely felt like we were playing catch-up from day 1.
Not that it's a competition, but I think companies benefit more from YC having
a working product earlier on.

~~~
lloydarmbrust
I was referring to the competition to get in.

It makes sense that if YC is getting more applications then their applicants
will be better and more prepared. Several companies in this round had already
produced several products, were ramen profitable, and had been operating for
sometimes years previous to becoming a YC company.

~~~
maxwin
They will still get lots of help. Correct me if I am wrong. I think the
biggest advantage of being a YC company right now is the kind of press,
reputation, and trust of investors founders might get. PG's advice is
certainly valuable but their connections are the biggest deal.

~~~
lloydarmbrust
I don't think you can put the value of YC in one sentence.

------
nfriedly
Nice - I already made a sale!

I didn't see it posted anywhere, but on a $4 sale, I received $3.68, so that's
an 8% fee if it's a flat rate. (Or it could be something like $0.28 + 1%)

@matthodan, you might want to post that somewhere / make it more obvious.

~~~
nfriedly
[update]

A second sale of a $2 item brought in $1.84, so it appears to be an 8% fee
collected off each sale.
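
The two sales can be checked against both fee guesses. This is only a rough sketch: the function names and the two candidate models are illustrative assumptions built from the numbers in these two comments, not the site's published fee schedule.

```python
from decimal import Decimal

# Hypothetical fee models, reconstructed only from the figures quoted above.
def payout_flat(price, rate=Decimal("0.08")):
    """Seller payout if the fee is a flat percentage of the sale price."""
    return price - price * rate

def payout_fixed_plus_pct(price, fixed=Decimal("0.28"), rate=Decimal("0.01")):
    """Seller payout if the fee is a fixed amount plus a small percentage."""
    return price - fixed - price * rate

MODELS = {
    "flat 8%": payout_flat,
    "$0.28 + 1%": payout_fixed_plus_pct,
}

# Observed (sale price, amount received) pairs from the two sales.
SALES = [
    (Decimal("4.00"), Decimal("3.68")),
    (Decimal("2.00"), Decimal("1.84")),
]

def consistent_models(sales):
    """Return the names of the models that reproduce every observed payout."""
    return [
        name
        for name, payout in MODELS.items()
        if all(payout(price) == received for price, received in sales)
    ]

if __name__ == "__main__":
    print(consistent_models(SALES[:1]))  # first sale only
    print(consistent_models(SALES))      # both sales
```

Both models fit the $4 sale on its own, which is why one data point could not settle it. Only the flat 8% model also fits the $2 sale, since $0.28 + 1% would have paid out $1.70.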

------
spicyj
I'm getting an error when trying to upload a CSV data set. :( Not very
encouraging to be prompted with an error at my first attempt to use the
product. Maybe TC jumped on the story a little soon?

~~~
stevedewald
Sorry to hear that. Would you mind emailing me the CSV file at
steve@datamarketplace.com? I'll try and recreate the problem and let you know
when it's fixed.

------
JshWright
Seems like an awfully low signal-to-noise ratio in the requests at the moment.
The vast majority of them are very unrealistic, if not outright spam.

"Request For Data List of all Assisted Living care centers in the U.S.,
including name, address, phone number, website and any photos on their
website. There should be about 20,000 of them. Budget: $20.00 Deadline: March
25, 2010 10:40 PM"

Yeah... I'll get right on that...

~~~
rokhayakebe
Some data providers already have a master DB, so all they need to do is write
a few queries and pull the info and sell it n times.

------
aheilbut
This looks really interesting, though if one is looking for publicly available
data, I think it's an open question of how much of the value is in providing a
marketplace vs. actually doing the aggregation, organization and some
analysis.

Two other interesting startups in this space that are focussed more on the
data aggregation end are <http://www.AnythingResearch.com> and
<http://www.AggData.com> (which has dozens of retail datasets just like the
Walmart store example, presumably gathered by screen-scraping).

~~~
matthodan
Data Marketplace can help people who are looking for data connect with people
who are able to collect it. One of the problems with all of the data online is
that it is in too raw a format to be useful. Someone must scrape/combine/clean
the data before it can be analyzed. By connecting business people who maybe
don't know how to scrape data with programmers/aggregators who specialize in
it, we hope to make it a lot easier to find useful data.

------
shafqat
Such a simple idea, but very compelling and nicely executed.

How is the pricing determined? Simply based on what the data seller/aggregator
thinks he can get?

The "requesting for data" reminds me a bit of a more structured mechanical turk.

~~~
matthodan
Pricing is set by the provider, but we don't mind haggling. We're trying to
build a community around buying and selling data tables, so if the price isn't
right, that's good feedback for the seller.

------
danteembermage
I'm just about finished with a dissertation in Finance (teaching job starts in
August). Obviously I'm very excited about this launch, I have a couple
painfully acquired datasets I wouldn't mind seeing a dollar return from and
it'd be great to save some effort going forward, especially after I have a
salary to spend on that text box.

~~~
matthodan
We'd love to help you get started! If you have any trouble adding your
datasets, please send me an email at matt @ datamarketplace.com and I'll make
sure they get added asap.

------
sachinag
I would love the ability to browse available datasets without having to
register. And I'd really love to be able to request a dataset without having
to register.

Structured datasets are hard, and the hardest part is finding one that updates
as time passes and new data is available - how are y'all handling that?

~~~
matthodan
You should be able to browse the data that is available without registering by
clicking on "Buy Data" in the header. We make people register when they
request a dataset because we want to be able to let them know once we've found
their data.

We're working on an API that will allow sellers to programmatically refresh
their dataset to keep it up-to-date with the source. We're also working on a
search and filtering system that ranks datasets based on when they were last
updated.

~~~
sachinag
Ah, OK, I thought that was just a representative sample since there were only
five datasets.

I'd recommend you find, package, and sell other datasets that exist out there
(Freebase, etc.).

I disagree with the registration requirement - if I want to be updated, make
that an opt-in box - but I acknowledge that it's a justifiable product
decision.

------
adora
This is every grad student researcher's dream... data finding and cleaning is
such a major hassle. If only I had this a couple years ago...

This is going to be great especially when the new Census comes out. Lots of
data to be sifted through and organized in unique ways that the government
never does for us.

~~~
matthodan
Thanks for the feedback! We'll have to devote some resources towards reaching
out to the academic population.

~~~
adora
No problem. I think you can get A LOT of data from grad students. They hold
the keys to a lot of it, and aren't held to some contract to never release it.

------
smakz
I'm sure there's an opportunity here for someone to hook this up with Amazon's
mturk and provide an "automated" data provider ...

~~~
matthodan
We're working on it :)

------
rokhayakebe
I would redesign the site to be more enterprise-like than web-application-like.
This is more b2b. I think I am going to buy or sell some data on this
site.

