Here are two real problems you can tackle for online retailers:
Retailers can often get product data directly from the manufacturer. This is true 100% of the time when the retailer is drop-shipping. The problem isn't obtaining this data; it's that the canned data you get is pretty worthless. Thousands of online retailers were hit hard by the Panda update (some time ago) because they all used the exact same descriptions provided by the manufacturers. So a lot of time was spent on custom descriptions and "romance copy." If you can find a way to offer unique product descriptions, or at the very least somewhat unique ones, then you could save these companies money.
Another problem online retailers face is categorization. Not on their own sites, but in mapping their items to the various taxonomies employed by Amazon, Buy.com, Shop.com, PriceGrabber, and so on forever and ever. Find a way to provide the mappings to each of these companies' taxonomies and people will pay you. The alternatives are doing it all by hand, using a script to cover most of the ground and doing the rest by hand, or outsourcing it. If I'd had a service at my fingertips that could have done all that for me when I needed it, I'd have happily recommended it to the boss.
Hi, I'm Shawn, and I work on categorization at Semantics3.
We use Naive Bayesian classifiers and other heuristics for categorization to help us with the disambiguation of products (merging multiple sources referring to the same product into one clean record). Knowing which category a product comes from greatly narrows down the search space for disambiguation.
We're planning to release this as an endpoint for the API, and we are currently working on fine-tuning the categorizer for this purpose. If you have any suggestions/ideas, drop me an e-mail at shawn[at]semantics3.com!
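For readers unfamiliar with the technique mentioned above, here is a minimal sketch of a Naive Bayes categorizer over product titles. It is not Semantics3's implementation: the class name, the whitespace tokenizer, the add-one smoothing and the toy training titles are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a product title and split it on whitespace."""
    return title.lower().split()


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes over title tokens with add-one smoothing."""

    def __init__(self):
        self.category_counts = Counter()          # training titles per category
        self.token_counts = defaultdict(Counter)  # token frequencies per category
        self.vocabulary = set()

    def train(self, title, category):
        self.category_counts[category] += 1
        for token in tokenize(title):
            self.token_counts[category][token] += 1
            self.vocabulary.add(token)

    def classify(self, title):
        """Return the most probable category, or None if nothing was trained."""
        total_titles = sum(self.category_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category, n_titles in self.category_counts.items():
            # Work in log space: log prior plus the sum of log likelihoods.
            score = math.log(n_titles / total_titles)
            n_tokens = sum(self.token_counts[category].values())
            for token in tokenize(title):
                count = self.token_counts[category][token]
                score += math.log((count + 1) / (n_tokens + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Toy training data (made up for this example).
categorizer = NaiveBayesCategorizer()
categorizer.train("samsung 24 inch led tv", "televisions")
categorizer.train("lg 42 inch plasma tv", "televisions")
categorizer.train("nike running shoes", "shoes")
categorizer.train("adidas leather shoes", "shoes")
```

With a predicted category in hand, a disambiguation step would only need to compare a new listing against records in that category, which is the search-space narrowing described above.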
Channel Advisor solves your second problem.
The first problem can be solved with some creative mashups of data, images and videos... plus all you really need is 2-3 sentences to be considered unique content.
Just checked out their marketplace material[1] and found this:
> Category-Specific Product Mapping Template: Create category-specific product templates that use common attributes and data manipulation technology to map your product information to the marketplace’s specifications.
Having historical pricing data seems particularly interesting. I've thought about building a fashion site that predicts pricing trends so you can predict when to buy items on sale, and it seems like this could serve as the pricing infrastructure.
I wonder how I can tell what product sources are available, and whether new ones appear? I.e. is this only going to cover amazon.com, or does it know about nordstrom.com too?
That's a great idea. Pricing trends analysis could be done through the historical data we provide across the different merchants.
We do provide the affiliate purchase links to the products, available under the 'offers' field. Currently you can only figure out when new merchants appear by polling the product (a 'pull' mechanism). One idea we have been toying with is a 'push' mechanism where you subscribe to a particular product through our API and we notify you of any change (a price change, a new merchant selling it, or the product being discontinued). Drop me a mail at varun <at> semantics3.com if you want more details or wish to bounce off ideas.
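On the client side, the 'pull' approach amounts to fetching the product periodically and diffing the offers against the previous snapshot. A rough sketch follows; only the 'offers' field comes from the comment above, while the `merchant` and `price` keys and the function name are hypothetical and may not match the real response format.

```python
def diff_offers(previous, current):
    """Compare two snapshots of a product's offers, keyed by merchant.

    Each snapshot is a list of dicts with assumed 'merchant' and 'price' keys.
    """
    old = {offer["merchant"]: offer["price"] for offer in previous}
    new = {offer["merchant"]: offer["price"] for offer in current}
    return {
        "new_merchants": sorted(set(new) - set(old)),
        "dropped_merchants": sorted(set(old) - set(new)),
        "price_changes": {
            merchant: (old[merchant], new[merchant])
            for merchant in sorted(set(old) & set(new))
            if old[merchant] != new[merchant]
        },
    }


# Two made-up snapshots taken on consecutive polls.
yesterday = [
    {"merchant": "amazon.com", "price": 499.0},
    {"merchant": "buy.com", "price": 510.0},
]
today = [
    {"merchant": "amazon.com", "price": 479.0},
    {"merchant": "nordstrom.com", "price": 505.0},
]
changes = diff_offers(yesterday, today)
```

A 'push' mechanism would move this comparison to the server, so clients would not have to spend API queries on products that have not changed.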
1. The prices info, I assume is for US only, right?
2. Are you only analysing new products or also the used ones?
I created a similar (but very, very modest) proof of concept to track mercadolibre's prices (http://numok.com/products/view/samsung-t24a550/9), however it seems to be unusable without a human verifying each listing, as you state in your blog:
> This isn’t the highest price that we’ve recorded for a product though. Turns out this Samsung TV was priced at $1,000,000,000,000.00 ($1 trillion) in early November last year. A dozen sales of this would have gone a long way towards offsetting the American national debt!
3. Are you doing this validation in some way, or should unrealistic prices be expected when using your API?
As stated before, great pricing! Although I'm not sure how the product limit works for the two initial account types (up to 10,000).
1. Right now, we're focusing on the US. But we've made room for expanding internationally (the "currency" and "geo" fields are in place with this in mind).
2. We're analyzing used and refurbished products as well. Each offer is tagged with a "condition" field that conveys this.
3. Whether the price of a product is right or wrong is, we realized with time, subjective. Yes, $1,000,000,000,000.00 is very unlikely, but where does one draw the line? Hence, we don't mark something as bad data and remove it from the database at the data layer. But we do handle this problem at the search layer: we internally rank products based on factors such as their (estimated) genuineness, popularity and so on. For the user, this means that when you query the API, only the most relevant products will be returned. The ranking system is constantly learning, so the vision is that it'll get better with time and data.
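To illustrate the difference between ranking and deleting, here is a small sketch that orders prices by how plausible they look without discarding any of them. The scoring rule (distance from the median in orders of magnitude) is an assumption chosen for the example; the ranking factors actually used are only described above in general terms.

```python
import math
from statistics import median


def plausibility(price, reference):
    """Score in (0, 1]: 1.0 at the reference price, falling as the price
    moves away from it by orders of magnitude. Prices must be positive."""
    return 1.0 / (1.0 + abs(math.log10(price / reference)))


def rank_prices(prices):
    """Return every price, most plausible first; nothing is removed."""
    reference = median(prices)
    return sorted(prices, key=lambda p: plausibility(p, reference), reverse=True)


# Made-up prices for one TV, including the trillion-dollar listing.
ranked = rank_prices([499.0, 520.0, 479.0, 1_000_000_000_000.0])
```

The outlier stays in the data but sinks to the bottom of the ranking, so a query that returns only the top results would not surface it.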
Thanks for your words about the pricing. Each API query returns up to 10 products; the free plan provides 1,000 API queries a day. So you could retrieve up to 10,000 products each day. Hope that clarifies. Glad you find the API useful - I'd love to know more about how you plan to use it!
I have done some scraping of amazon.com previously, but they are pretty good at detecting bots and shutting them down. How did you get around this problem when scraping millions of pages?
Basically it boils down to three things:
1. If the site is slow, crawl slooowly.
2. If you see non-200 HTTP status codes, stop!
3. Obey robots.txt and speed restrictions.
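These three rules can be sketched with Python's standard `urllib.robotparser`. This is an illustration rather than the crawler described in this thread: the `fetch` callable, the `example-bot` user agent and the one-second fallback delay are assumptions, and `fetch` is passed in so the loop stays independent of any HTTP library.

```python
import time
import urllib.robotparser

DEFAULT_DELAY_SECONDS = 1.0  # assumed fallback when robots.txt sets no Crawl-delay


def polite_crawl(urls, robots_txt, fetch, user_agent="example-bot", sleep=time.sleep):
    """Fetch urls in order while following the three rules above.

    fetch(url) is a caller-supplied function returning (status_code, body).
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        delay = DEFAULT_DELAY_SECONDS

    pages = []
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # rule 3: skip anything robots.txt disallows
        started = time.monotonic()
        status, body = fetch(url)
        elapsed = time.monotonic() - started
        if status != 200:
            break  # rule 2: stop on any non-200 response
        pages.append((url, body))
        # Rules 1 and 3: wait at least the Crawl-delay, longer if the site is slow.
        sleep(max(delay, elapsed))
    return pages
```

Waiting for at least as long as the last response took is one simple way to back off when a site slows down; a production crawler would likely use something more careful.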
We were originally based out of Singapore and PayPal was the best option we had. If we had had the option of Stripe then, we certainly would have gone with them :)
Looks great, nice work! Signed up for a free account to try it out. The pricing on the Large Booster Pack (150K calls for $159) doesn't seem correct given the pricing/value on the Small and Medium packs...
We aggregate data from a variety of sources (crawling, data dumps, RSS feeds, and in some cases even manual curation), after which we integrate them into our data pipeline. We update them using a power law distribution, where the top 1% of best-selling products (based on our internal ranking system) is updated hourly, the next 3% every two hours, etc. The whole index is refreshed at the end of each month.
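As a rough sketch of the tiering described above, a product's best-seller rank can be mapped to a refresh interval. Only the two tiers stated in the comment are encoded; the lower tiers are not spelled out, so the function returns `None` for them rather than guessing. The function name and the 0-based rank convention are assumptions.

```python
def refresh_interval_hours(rank, total):
    """Map a 0-based best-seller rank to a refresh interval in hours.

    Returns None for products below the two stated tiers; those are only
    known to be covered by the monthly refresh of the whole index.
    """
    if rank * 100 < total * 1:  # top 1% of products
        return 1
    if rank * 100 < total * 4:  # next 3% (ranks between 1% and 4%)
        return 2
    return None  # lower tiers are not stated in the thread


# With a hypothetical index of 1,000 products:
intervals = [refresh_interval_hours(rank, 1000) for rank in (0, 9, 10, 39, 40)]
```

Integer comparisons are used instead of dividing the rank by the total, which avoids floating-point edge cases at the tier boundaries.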
As of now we are only indexing US prices - but we plan to expand out to the UK and Germany next. Drop me a note at varun <at> semantics3.com, I would love to chat with you!
The guideline behind the free plan was to provide enough calls and functionality for any developer to build, launch and maintain a moderately sized app. Shout-out to aghi from Mashape for helping us with the pricing.
We don't have books yet. That's been a common request, so we'll be launching with books by the end of the month. If you'd like, I can notify you as soon as we do.
Please check out the Book Prices API from DataWeave here: http://www.dataweave.in/apis/dataset-Book-Price-Search-By-IS... (Full Disclosure: I am an employee of DataWeave). Currently, we are serving data from Indian eCommerce stores, but expansion to other geographies is in the pipeline. You can search by ISBN as well as many other fields.
Noted. I'm curious about the motivating factor though - would you like the graphs to validate the depth of the data, get you excited with potential possibilities? Or something else?
You should charge for these services, of course.