Retailers can often get product data directly from the manufacturer. This is true 100% of the time when the retailer is drop-shipping. The problem isn't really obtaining this data; it's that the canned data you get is pretty worthless. Thousands of online retailers were hit hard by the Panda update (some time ago) because they all used the exact same descriptions provided by the manufacturers. So a lot of time was spent on custom descriptions and "romance copy." If you can find a way to offer unique product descriptions, or at the very least somewhat unique ones, then you could save these companies money.
Another problem online retailers face is categorization: not on their own sites, but in mapping their items to the various taxonomies employed by Amazon, Buy.com, Shop.com, PriceGrabber, and so on forever and ever. Find a way to provide the mappings to each of these companies' taxonomies and people will pay you. The alternatives are doing it all by hand, using a script to cover most of the ground and doing the rest by hand, or outsourcing it. If I had had a service at my fingertips that could have done all that for me when I needed it, I'd have happily recommended it to the boss.
You should charge for these services, of course.
We use Naive Bayesian classifiers and other heuristics for categorization to help us with the disambiguation of products (merging multiple sources referring to the same product into one clean record). Knowing which category a product comes from greatly narrows down the search space for disambiguation.
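To make the idea concrete, here is a minimal sketch of a Naive Bayes categorizer over product titles. It is not our production code: the class name, the whitespace tokenization, the add-one smoothing, and the training titles are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCategorizer:
    """Toy multinomial Naive Bayes over product-title words (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of titles seen
        self.vocab = set()                       # every word seen during training

    def train(self, title, category):
        words = title.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, title):
        words = title.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category, docs in self.doc_counts.items():
            # log P(category) + sum of log P(word | category), add-one smoothed
            score = math.log(docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical training data, made up for this example.
categorizer = NaiveBayesCategorizer()
categorizer.train("samsung 40 inch led tv", "televisions")
categorizer.train("sony bravia hdtv", "televisions")
categorizer.train("canon eos dslr camera", "cameras")
categorizer.train("nikon coolpix digital camera", "cameras")
```

Once a title is assigned a category, candidate duplicates only need to be compared against records in that same category, which is where the narrowing of the search space comes from.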
We're planning to release this as an endpoint for the API, and we are currently working on fine-tuning the categorizer for this purpose. If you have any suggestions or ideas, drop me an e-mail at shawn[at]semantics3.com!
It's worth it if you don't have a ton of development resources and are trying to get exposure fast.
However, if you can do it yourself, you can take advantage of optimizations you just can't get from canned solutions.
Also, not all of your competitors will have the same advantages you do.
> Category-Specific Product Mapping Template: Create category-specific product templates that use common attributes and data manipulation technology to map your product information to the marketplace’s specifications.
Very cool! Thanks for the tip.
I wonder how I can tell what product sources are available, and whether new ones appear? I.e. is this only going to cover amazon.com, or does it know about nordstrom.com too?
We do provide the affiliate purchase links to the products, available under the 'offers' field. Currently you can only figure out when new merchants appear by polling the product (a 'pull' mechanism). One idea we have been toying with is a 'push' mechanism, where you subscribe to a particular product through our API and we notify you of any change (a price change, a new merchant selling it, or the product being discontinued). Drop me a mail at varun <at> semantics3.com if you want more details or wish to bounce ideas off us.
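Until something like that exists, the 'pull' approach amounts to fetching a product periodically and diffing its offers against the previous snapshot. A rough sketch of the diffing step follows; the shape of each offer (a dict with "seller" and "price" keys) is an assumption for illustration, not the documented response format.

```python
def diff_offers(previous, current):
    """Compare two snapshots of a product's offers.

    Each snapshot is assumed to be a list of dicts with hypothetical
    "seller" and "price" keys; adapt the key names to the real response.
    """
    prev = {offer["seller"]: offer["price"] for offer in previous}
    curr = {offer["seller"]: offer["price"] for offer in current}
    return {
        "new_sellers": sorted(curr.keys() - prev.keys()),
        "gone_sellers": sorted(prev.keys() - curr.keys()),
        "price_changes": {
            seller: (prev[seller], curr[seller])
            for seller in sorted(prev.keys() & curr.keys())
            if prev[seller] != curr[seller]
        },
    }


# Made-up snapshots from two consecutive polls.
yesterday = [{"seller": "amazon.com", "price": 499.99},
             {"seller": "buy.com", "price": 510.00}]
today = [{"seller": "amazon.com", "price": 479.99},
         {"seller": "nordstrom.com", "price": 505.00}]
changes = diff_offers(yesterday, today)
```

An empty "new_sellers" list together with an empty "price_changes" dict means nothing changed between polls, so the caller can skip any downstream work.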
1. The price info, I assume, is for the US only, right?
2. Are you only analysing new products or also the used ones?
I created a similar (but very, very modest) proof of concept to track mercadolibre's prices (http://numok.com/products/view/samsung-t24a550/9), however it seems to be unusable without a human verifying each listing, as you state in your blog:
> This isn’t the highest price that we’ve recorded for a product though. Turns out this Samsung TV was priced at $1,000,000,000,000.00 ($1 trillion) in early November last year. A dozen sales of this would have gone a long way towards offsetting the American national debt!
3. Are you doing this validation in some way, or should unrealistic prices be expected when using your API?
As stated before, great pricing! Although I'm not sure how the product limit works for the two initial account types (up to 10,000).
Overall, I'm glad to have this API, thank you!
2. We're analyzing used and refurbished products as well. Each offer is tagged with a "condition" field that conveys this.
3. The question of whether a product's price is right or wrong is, we realized with time, subjective. Yes, $1,000,000,000,000.00 is very unlikely, but where does one draw the line? Hence, we don't mark something as bad data and remove it from the database at the data layer. But we do handle this problem at the search layer - we internally rank products based on factors such as their (estimated) genuineness, popularity and so on. For the user, what this means is that when you query the API, only the most relevant products will be returned. The ranking system is constantly learning, so the vision is that it'll get better with time and data.
Thanks for your words about the pricing. Each API query returns up to 10 products; the free plan provides 1,000 API queries a day. So you could retrieve up to 10,000 products each day. Hope that clarifies. Glad you find the API useful - I'd love to know more about how you plan to use it!
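If you want a stricter filter on your own side, one simple option is to flag prices that sit far from the median of a product's recorded prices. The sketch below is an illustration of that idea only, not how our ranking works; the function name and the factor of 10 are arbitrary assumptions, which is exactly the "where do you draw the line" problem.

```python
from statistics import median


def flag_suspicious_prices(prices, factor=10):
    """Return prices more than `factor` times above or below the median.

    `factor` is an arbitrary threshold picked for illustration.
    """
    positive = [p for p in prices if p > 0]
    if not positive:
        return []
    mid = median(positive)
    return [p for p in positive if p > mid * factor or p < mid / factor]


# Made-up price history containing one absurd listing.
history = [499.99, 529.00, 515.00, 1_000_000_000_000.00]
suspicious = flag_suspicious_prices(history)
```

The median is used instead of the mean because a single trillion-dollar listing would drag the mean far enough to hide itself.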
Basically it boils down to three things:
1. If the site is slow, crawl slooowly.
2. If you see non-200 HTTP status codes, stop!
3. Obey robots.txt and speed restrictions.
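Those three rules can be sketched with nothing but Python's standard `urllib.robotparser`. The helper names, the one-second minimum, and the "wait twice the last response time" heuristic are assumptions for illustration; the robots.txt content is made up so the example runs without touching the network.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def next_delay(parser, user_agent, last_response_seconds, minimum=1.0):
    """Rules 1 and 3: slow down for slow sites and honor any Crawl-delay."""
    declared = parser.crawl_delay(user_agent) or 0
    return max(minimum, declared, 2 * last_response_seconds)


def should_stop(status_code):
    """Rule 2: anything other than 200 means stop crawling."""
    return status_code != 200


# Hypothetical robots.txt and bot name.
rules = load_rules("User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n")
allowed = rules.can_fetch("mybot", "http://example.com/products/1")
blocked = rules.can_fetch("mybot", "http://example.com/private/x")
```

In a real crawler you would call `can_fetch` before every request, sleep for `next_delay(...)` seconds between requests, and break out of the loop as soon as `should_stop` returns True.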
The variety of sources seems extremely minimal and the majority of the data appears to be coming from Amazon.
When I sampled the data (pulling products from each topic), less than 1% of the products had data from a source other than Amazon.
Even so, very fun and interesting service. Look forward to playing around more.
Very curious to know why you would choose PayPal as your payment processor considering the alternatives out there.
If I provide a random book barcode, it's usually an ISBN (10 or 13 digits).
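For anyone accepting barcodes as input, telling the two ISBN forms apart and checking them is cheap to do locally before calling any API. This is a small sketch using the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) check-digit rules; the function names are my own.

```python
DIGITS = "0123456789"


def is_valid_isbn10(code):
    """ISBN-10: weights 10..1, sum divisible by 11; last char may be X (= 10)."""
    if len(code) != 10:
        return False
    total = 0
    for i, ch in enumerate(code):
        if ch in DIGITS:
            value = int(ch)
        elif ch in "Xx" and i == 9:
            value = 10
        else:
            return False
        total += (10 - i) * value
    return total % 11 == 0


def is_valid_isbn13(code):
    """ISBN-13: alternating weights 1 and 3, sum divisible by 10."""
    if len(code) != 13 or any(ch not in DIGITS for ch in code):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(code))
    return total % 10 == 0


def classify_barcode(raw):
    """Return "ISBN-10", "ISBN-13", or None after stripping hyphens and spaces."""
    code = raw.replace("-", "").replace(" ", "")
    if is_valid_isbn10(code):
        return "ISBN-10"
    if is_valid_isbn13(code):
        return "ISBN-13"
    return None
```

A scanned barcode that returns None here is either not a book or was misread, so there is no point spending an API query on it.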
Please check out the Book Prices API from DataWeave here: http://www.dataweave.in/apis/dataset-Book-Price-Search-By-IS... (Full Disclosure: I am an employee of DataWeave). Currently, we are serving data from Indian eCommerce stores, but expansion to other geographies is in the pipeline. You can search by ISBN as well as many other fields.