Introducing the Priceonomics Business Model: Data Crawling Services (priceonomics.com)
112 points by kevinburke on Nov 26, 2013 | 32 comments



I think this is excellent! I loved the original idea behind Priceonomics (I had built some toy side projects exploring the same concept) and have enjoyed the direction change to focus on understanding pricing rather than just returning dumb results with a web scraper.

This announcement ties it all together. Philosophically, it matters less what actual products/services they're shipping than that they're sticking to their original mission -- discovering the best price for a commodity and sharing it with the world.

While we're on the subject of philosophy and prices: I really liked this ribbonfarm article about bartering that seems relevant; I think you'll enjoy it:

http://www.ribbonfarm.com/2008/03/16/bargaining-with-your-ri...


Do you guys have any concerns about websites' terms of use restrictions?

I've done a bit of crawling and have always been curious how selling this sort of service would handle those potential use restrictions / legal issues.


I've been building a product search tool and wanted to scrape data to power it. It'd be similar to the caching and scraping Google does.

But I spoke to a lawyer and she strongly advised against it (she said if the Terms of Service of a site say it isn't allowed, you have to follow that).

Dunno. It confuses me how something can be supposedly illegal, yet many companies do it all the time without a concern.


These people seem to be making a decent business out of selling scraped data - http://www.aggdata.com/ There might be others too.


Yes, wouldn't you be a legal target for many companies once they discovered that you were scraping their information? I've done crawling myself and wonder if I could actually resell the data without running afoul of the law.


The answer is complicated, because the law isn't settled yet. I doubt it will be for centuries.

I've written "Is scraping legal?" here, with details, and my own view morally. https://blog.scraperwiki.com/2012/04/is-scraping-legal/


Don't put information on the public internet if you don't want the public to read it. You can keep your private information nice and safe behind a firewall, or on a private network not attached to the internet. Easy peasy.


Reading is very different from republishing.


I believe you have to look at the robots.txt file. If it lets you crawl, you should be fine.
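
For what it's worth, checking robots.txt programmatically is trivial with Python 3's standard library. A minimal sketch -- the site URL and user-agent string are just placeholders:

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")  # placeholder site
    rp.read()

    # True if the rules allow this agent to fetch the page
    print(rp.can_fetch("MyCrawler/1.0", "http://example.com/listings"))

Whether honoring robots.txt actually shields you legally is a separate question, of course.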


Does the robots.txt have any legal significance? I would think that the Terms of Use can be more thorough and are more legally relevant.


Economist and long time web scraper here.

In your original business model you wanted to understand the price of everything. In what ways did the problem of a lack of information on the demand side come up? That is, it is easy to scrape the price in many markets (supply side), but what kinds of conversations came up within your team about the lack of information on how many units were actually sold at a posted price?

By the way, glad to see you guys were able to make a business out of crawling. I've landed a handful of freelance gigs since leaving grad school based on scraping data for clients, but never tried to expand it to anything beyond consulting projects.


Not an economist, but I have been mulling over a project surrounding scraping and pricing.

Without having access to the actual monetary transaction data, how does one know what was sold and for how much? Without this (or a mechanism by which the lister closes or updates the listing), how do you know anything was actually sold?


"how does one know what was sold and for how much?"

With domain name sales, for example, only a small fraction of transactions have public prices. I've been doing it for 16 or 17 years and have never made any of my data public, nor have the people I've consulted for.

Another example might be commercial rents. You can track asking rents, but you can't really get a handle on actual rent paid, since there are many deal factors (renovations, free rent, triple net, etc.) that would change the numbers significantly.


Also an economist and a data scraper/consultant here -- depending on the data, sometimes all you need to figure out is correlation: frequency of updates, listings being live for X time, clusters of listings around Y days, etc.

In terms of a few real-life examples: on the one hand you have eBay, which provides you with sold data (API through Terapeak). On the other hand you have Craigslist, which is kinda opaque and hates scraping, but you can monitor listings and their half-life. (Listings that disappear quickly presumably sold fast; listings that stick around for weeks, relisted over and over, presumably have lower liquidity and/or are priced high.)
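
To make the half-life idea concrete, here's a rough sketch of the classification step, assuming you already record when each daily scrape first and last saw a listing. The thresholds are arbitrary assumptions, not calibrated values, and a listing vanishing is only a proxy for a sale:

    from datetime import date, timedelta

    def classify(first_seen, last_seen, quick_days=3, stale_days=21):
        """Crude liquidity proxy based on how long a listing survived.
        Disappearing fast *may* mean it sold; lingering for weeks
        suggests low liquidity and/or a too-high price."""
        lifetime = last_seen - first_seen
        if lifetime <= timedelta(days=quick_days):
            return "gone fast: possibly sold"
        if lifetime >= timedelta(days=stale_days):
            return "stale: illiquid or overpriced"
        return "indeterminate"

    print(classify(date(2013, 11, 1), date(2013, 11, 2)))   # gone fast
    print(classify(date(2013, 10, 1), date(2013, 11, 20)))  # stale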


eBay's completed listings is definitely one of the best sources of sales data on the Internet that I'm aware of. Besides that, in some cases there are ways to imperfectly estimate quantities when best-seller rankings are available (e.g. at Amazon) -- Chevalier and Goolsbee were the first to suggest this approach back in 2003.[1]

As you mentioned, monitoring half-life is another imperfect approach, but it is of course plagued by false positives (a listing goes away but no sale was made). There was a Google Tech Talk many years ago where some economists took this approach[2], except they were looking at pricing power instead of measuring quantity sold.

[1] http://www.hss.caltech.edu/~mshum/ec106/chevaliergoolsbee.pd...

[2] https://www.youtube.com/watch?v=SfjAezl3-cU#t=27m20s
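
For anyone curious, the rank-based estimate in [1] amounts to fitting a power law between best-seller rank and quantity sold: log(Q) = a - b*log(rank). A toy sketch -- the coefficients below are placeholders, since in practice you calibrate a and b against items whose true sales you know:

    import math

    def rank_to_quantity(rank, a=10.0, b=1.2):
        """Power-law approximation log(Q) = a - b*log(rank).
        a and b are made-up values here; in practice they are
        estimated from items with externally known sales."""
        return math.exp(a - b * math.log(rank))

    print(rank_to_quantity(100))     # implied units/period at rank 100
    print(rank_to_quantity(10000))   # orders of magnitude fewer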


Although it would only be available for a fraction of prices, the delta in "quantity available" between scrapes could provide some data.

Most websites don't expose this info to the end user, but it could be used on those that do.
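
A quick sketch of what that delta computation could look like, assuming two scrapes of displayed stock counts keyed by SKU (the numbers are made up):

    def estimated_sales(prev, curr):
        """Drop in displayed 'quantity available' between scrapes.
        Restocks push the raw delta negative, so clamp at zero --
        this undercounts when a restock and sales coincide."""
        return {sku: max(prev[sku] - curr[sku], 0)
                for sku in prev.keys() & curr.keys()}

    yesterday = {"sku-1": 14, "sku-2": 3}
    today     = {"sku-1": 11, "sku-2": 5}   # sku-2 was restocked
    print(estimated_sales(yesterday, today))  # {'sku-1': 3, 'sku-2': 0}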


I'm pursuing something even less structured -- forum posts. It may be a lost cause.


Is this service going to be very different (in final output) from what you'd get with http://import.io? Import.io takes the stance that they just provide you with the tools to scrape a page, so they are not really the ones doing the scraping (legally speaking). They also provide an API, so it's easy to consume the scraped data in your programs.

I guess with Import.io, if something breaks, you have to redo the scraping logic yourself. Maybe Priceonomics will manage that on the user's behalf, and that's their value-add? It looks like they may be doing some intelligent scraping as well -- not everyone is going to put "Dell LCD Monitor $400" in their title, but they will do the normalization for you (just a guess).

I have used import.io and it works very well for most sites and it is fairly easy to create a scraper.
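
On the normalization guess: even a crude first pass at pulling a price out of a free-form listing title is just a regex. A toy sketch (real listings would need handling for currencies, ranges, "OBO", etc.):

    import re

    PRICE_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")

    def extract_price(title):
        """Pull the first $-prefixed number out of a title."""
        m = PRICE_RE.search(title)
        return float(m.group(1).replace(",", "")) if m else None

    print(extract_price("Dell LCD Monitor $400 like new"))  # 400.0
    print(extract_price("Aeron chair, make an offer"))      # None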


Great pivot, guys! If any of you are working on textual insights with scraping, please let me know. I do PDFs and text docs, as well as more traditional HTML.

I think vertical-specific e-commerce pricing data like what these guys are doing is a hard problem (I have clients in this space as well), and it's great to specialize in something like this. I think the real value comes from an end-to-end service like what these guys are offering.

Current projects for clients include custom sentiment-analysis engines, real-time event streams, ad-intel-type projects, location data, and even menus.

Much of this is done with deep learning and more generalizable models that are likely to fit your domain.

I'm also working on a dashboard with some customers now that will let you avoid having to contact a service provider.

Email is in my profile if there are any requests. I hope to put up a good portfolio site soon -- I only started this a few months ago, after building out my own text analytics engine.

Also again: best of luck to priceonomics here. It's a lucrative market if done right.


The idea of doing a bunch of different kinds of data is a bit spooky. It's really hard to have the best data in the world in just a single vertical.


What was the original Priceonomics product?


Their very first blog post [0] said:

"We’re building the price guide for everything, from bicycles to boats, Aeron chairs to iPads. We hope as you look to buy or sell things you can use our data to get a good deal (or at least prevent yourself from getting ripped off). For example Priceonomics can show you how much a used iPad 2 3G should cost, as well as a whole range of prices"

[0] http://blog.priceonomics.com/post/14567999429/how-to-use-pri...


Sounds just like Needlebase, the project ITA Software developed in the late 2000s, which was shut down after Google acquired them. http://www.quora.com/Needlebase


It's good to see a publisher trying a new business model other than advertising and paywalls. I like to see companies leveraging their core competency and providing it as a service. Priceonomics is killing it!


Basically, you seem to be going the lifestyle-business route. Good for you!!!


This answers the previously unanswered question about where they get their data :-) So yes, they crawl Craigslist and other sites. Now, is it more like 80legs or something else? Hard to say.


You guys are great at blogging. Have you thought about pivoting?


Rohin from Priceonomics here. Thanks for the kind words.

Well, we're trying to build a company that one day has hundreds of writers and engineers working together. So, we've been writing all along and want to do more of it for sure.


Pretty sure data scraping is their pivot, as articulated in this article.


No post is good without good data, and they are good at that too.


How much would it cost to scrape about 100,000 data points each month?


What's the crawling frequency you guys support? E.g., hourly updates, daily updates, ...



