Show HN: Semantics3 – API for Products and Prices (semantics3.com)
108 points by netvarun on Feb 14, 2013 | 33 comments

Here are two real problems you can tackle for online retailers:

Retailers can often get product data directly from the manufacturer. This is true 100% of the time when the retailer is drop-shipping. The problem isn't obtaining this data; it's that the canned data you get is pretty worthless. Thousands of online retailers were hit hard by Google's Panda update (some time ago) because they all used the exact same descriptions provided by the manufacturers. So a lot of time was spent on custom descriptions and "romance copy." If you can find a way to offer unique product descriptions, or at the very least somewhat unique ones, you could save these companies money.

Another problem online retailers face is categorization. Not on their own sites, but in mapping their items to the various taxonomies employed by Amazon, Buy.com, Shop.com, PriceGrabber, and so on forever and ever. Find a way to provide the mappings to each of these companies' taxonomies and people will pay you. The alternatives are doing it all by hand, using a script to cover most of the ground and doing the rest by hand, or outsourcing it. If I had had a service at my fingertips that could have done all that for me when I needed it, I'd have happily recommended it to the boss.

You should charge for these services, of course.

Hi, I'm Shawn, and I work on categorization at Semantics3.

We use Naive Bayesian classifiers and other heuristics for categorization to help us with the disambiguation of products (merging multiple sources referring to the same product into one clean record). Knowing which category a product comes from greatly narrows down the search space for disambiguation.
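
For readers who haven't met Naive Bayes before, here is a minimal sketch of the general technique applied to product titles. It is not Semantics3's implementation, which isn't shown in this thread; the class name, the training titles and the two categories are all made up for illustration, and a real categorizer would need far more data and features.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a product title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration only; not any vendor's actual classifier.
    """

    def __init__(self):
        self.category_counts = Counter()          # training titles per category
        self.token_counts = defaultdict(Counter)  # token counts per category
        self.vocabulary = set()

    def train(self, title, category):
        self.category_counts[category] += 1
        for token in tokenize(title):
            self.token_counts[category][token] += 1
            self.vocabulary.add(token)

    def classify(self, title):
        total_titles = sum(self.category_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = tokenize(title)
        best_category, best_score = None, -math.inf
        for category, title_count in self.category_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(title_count / total_titles)
            total_tokens = sum(self.token_counts[category].values())
            for token in tokens:
                count = self.token_counts[category][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up training data, purely for demonstration.
categorizer = NaiveBayesCategorizer()
categorizer.train("Samsung 24 inch LED HDTV monitor", "electronics")
categorizer.train("Sony 40 inch LCD television", "electronics")
categorizer.train("Men's cotton crew neck t-shirt", "apparel")
categorizer.train("Women's slim fit denim jeans", "apparel")

print(categorizer.classify("Samsung 32 inch LED television"))
```

Once a title has a predicted category, it only needs to be compared against other records in that category, which is the narrowing of the search space described above.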

We're planning to release this as an endpoint for the API, and we are currently working on fine-tuning the categorizer for this purpose. If you have any suggestions or ideas, drop me an e-mail at shawn[at]semantics3.com!

I used this service before: https://www.singlefeed.com/

It's worth it if you don't have a ton of development resources and are trying to get exposure fast.

However, if you can do it yourself, you can take advantage of optimizations you just can't get from canned solutions.

Also, not all of your competitors have the same advantages you do.

Channel Advisor solves your second problem. The first problem can be solved with some creative mashups of data, images and videos... plus all you really need is 2-3 sentences to be considered unique content.

Just checked out their marketplace material[1] and found this:

> Category-Specific Product Mapping Template: Create category-specific product templates that use common attributes and data manipulation technology to map your product information to the marketplace’s specifications.

Very cool! Thanks for the tip.

[1] http://go.channeladvisor.com/rs/channeladvisor/images/us-ds-...

Having historical pricing data seems particularly interesting. I've thought about building a fashion site that predicts pricing trends so you can predict when to buy items on sale, and it seems like this could serve as the pricing infrastructure.

I wonder how I can tell what product sources are available, and whether new ones appear? I.e. is this only going to cover amazon.com, or does it know about nordstrom.com too?

That's a great idea. Pricing trends analysis could be done through the historical data we provide across the different merchants.

We do provide the affiliate purchase links to the products, available under the 'offers' field. Currently you can only figure out when new merchants appear by polling the product (a 'pull' mechanism). One idea we have been toying with is a 'push' mechanism, where you subscribe to a particular product through our API and we notify you of any change (a price change, a new merchant selling it, or the product being discontinued). Drop me a mail at varun <at> semantics3.com if you want more details or wish to bounce off ideas.
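
Until something like the push mechanism exists, the pull approach comes down to comparing two snapshots of a product's offers. A rough sketch follows, assuming each snapshot has already been reduced to a plain merchant-to-price dictionary. That shape, the `diff_offers` name and the event tuples are assumptions for illustration; the real layout of the 'offers' field isn't shown in this thread.

```python
def diff_offers(previous, current):
    """Compare two snapshots of one product's offers.

    Both arguments map a merchant name to its current price (a hypothetical
    shape, not the documented API response). Returns a list of change events.
    """
    events = []
    for merchant, price in current.items():
        if merchant not in previous:
            events.append(("new_merchant", merchant, price))
        elif previous[merchant] != price:
            events.append(("price_change", merchant, previous[merchant], price))
    for merchant in previous:
        if merchant not in current:
            events.append(("merchant_removed", merchant))
    return events


# Illustrative snapshots from two consecutive polls of the same product.
yesterday = {"amazon.com": 499.99, "buy.com": 510.00}
today = {"amazon.com": 479.99, "nordstrom.com": 485.00}

for event in diff_offers(yesterday, today):
    print(event)
```

A subscriber-side poller would store the last snapshot per product and run this diff after every fetch.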

So cool, a few questions:

1. The price info, I assume, is for the US only, right? 2. Are you only analysing new products, or also used ones?

I created a similar (but very, very modest) proof of concept to track mercadolibre's prices (http://numok.com/products/view/samsung-t24a550/9), however it seems to be unusable without a human verifying each listing, as you state in your blog:

> This isn’t the highest price that we’ve recorded for a product though. Turns out this Samsung TV was priced at $1,000,000,000,000.00 ($1 trillion) in early November last year. A dozen sales of this would have gone a long way towards offsetting the American national debt!

3. Are you doing this validation in some way, or should unrealistic prices be expected when using your API?

As stated before, great pricing! Although I'm not sure how the product limit works for the two initial account types (up to 10,000).

Overall, I'm glad to have this API, thank you!

1. Right now, we're focusing on the US. But we've made room for expanding internationally (the "currency" and "geo" fields are in place with this in mind).

2. We're analyzing used and refurbished products as well. Each offer is tagged with a "condition" field that conveys this.

3. The question of whether a price of a product is right or wrong is, we realized with time, subjective. Yes, $1,000,000,000,000.00 is very unlikely, but where does one draw the line? Hence, we don't mark something as bad data and remove it from the database at the data layer. But we do handle this problem at the search layer - we internally rank products based on factors such as their (estimated) genuineness, popularity and so on. For the user, what this means is that when you query the API, only the most relevant products will be returned. The ranking system is constantly learning, so the vision is that it'll get better with time and data.
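
To make the "rank instead of delete" idea concrete, here is one possible sketch of a plausibility signal based on distance from the median price. This is a swapped-in, generic technique (median absolute deviation), not the ranking system described above, whose factors aren't disclosed here; the function names and the scoring formula are assumptions.

```python
from statistics import median


def plausibility_scores(prices):
    """Score each price in (0, 1]; prices near the median score higher.

    Uses the median absolute deviation (MAD) as the scale, so one absurd
    price cannot distort the yardstick. Illustrative heuristic only.
    """
    center = median(prices)
    mad = median([abs(p - center) for p in prices])
    # Fall back to a non-zero scale when all prices are nearly identical.
    scale = mad if mad > 0 else max(abs(center), 1.0)
    return [1.0 / (1.0 + abs(p - center) / scale) for p in prices]


def rank_by_plausibility(prices):
    """Return every price, most plausible first. Nothing is discarded."""
    scored = zip(plausibility_scores(prices), prices)
    return [price for _, price in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Three ordinary offers plus the $1 trillion listing quoted earlier.
offers = [499.99, 479.99, 510.00, 1_000_000_000_000.00]
print(rank_by_plausibility(offers))
```

The trillion-dollar listing stays in the data but sinks to the bottom of the ranking, so a query that returns only the top results would never surface it.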

Thanks for your words about the pricing. Each API query returns up to 10 products; the free plan provides 1,000 API queries a day. So you could retrieve up to 10,000 products each day. Hope that clarifies. Glad you find the API useful - I'd love to know more about how you plan to use it!

I have done some scraping of amazon.com previously, but they are pretty good at detecting bots and shutting them down. How did you get around this problem when scraping millions of pages?

Some great advice here on crawling at scale, which has inspired our crawlers a lot: http://news.ycombinator.com/item?id=4367933

Basically it boils down to three things: 1. If the site is slow, crawl slooowly. 2. If you see non-200 HTTP status codes, stop! 3. Obey robots.txt and speed restrictions.
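
Those three rules can be sketched as a single decision function using Python's standard `urllib.robotparser`. This is only an illustration of the advice, not the crawler discussed here: the `next_action` helper, the one-second base delay and the "wait twice the last response time" rule are assumptions, and the robots.txt content is invented.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def next_action(parser, user_agent, url, last_status, last_latency, base_delay=1.0):
    """Decide what to do before requesting `url`.

    Returns ("stop", 0.0), ("skip", 0.0) or ("fetch", seconds_to_wait).
    `last_status` and `last_latency` describe the previous response from
    the same site (use None and 0.0 before the first request).
    """
    # Rule 2: any non-200 response means back off entirely.
    if last_status is not None and last_status != 200:
        return ("stop", 0.0)
    # Rule 3: obey robots.txt exclusions.
    if not parser.can_fetch(user_agent, url):
        return ("skip", 0.0)
    # Rule 3 again: honour Crawl-delay when the site declares one.
    delay = max(base_delay, parser.crawl_delay(user_agent) or 0.0)
    # Rule 1: a slow site gets an even longer pause (factor 2 is arbitrary).
    delay = max(delay, 2.0 * last_latency)
    return ("fetch", delay)


# Invented robots.txt for demonstration.
rules = load_rules("User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n")
print(next_action(rules, "examplebot", "https://example.com/products/1", 200, 0.3))
print(next_action(rules, "examplebot", "https://example.com/private/x", 200, 0.3))
print(next_action(rules, "examplebot", "https://example.com/products/2", 503, 0.3))
```

The caller is expected to sleep for the returned delay before fetching, and to stop crawling the site altogether on "stop".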

This is very cool, tons of potential for both fun and meaningful use here, but I do have a bone to pick.

The variety of sources seems extremely minimal and the majority of the data appears to be coming from Amazon.

When I sampled the data (pulling products from each topic), less than 1% of the products had data from a source other than Amazon.

Even so, very fun and interesting service. Look forward to playing around more.

"You can use either your PayPal account or your credit card to make your payment. All payments are handled through PayPal and are completely secure."

Very curious to know why you would choose PayPal as your payment processor considering the alternatives out there.

We were originally based out of Singapore and PayPal was the best option we had. If we had had the option of Stripe then, we certainly would have gone with them :)

Makes sense. Thanks for the reply.

Looks great, nice work! Signed up for a free account to try it out. The pricing on the Large Booster Pack (150K calls for $159) doesn't seem correct given the pricing/value on the Small and Medium packs...

Thanks for pointing that out. We're fixing it now!

How does this work? Is there a database of the most popular online stores and you're constantly scraping (or GETing) product data?

We aggregate data from a variety of sources (crawling, data dumps, RSS feeds, and in some cases even manual curation), after which we integrate them into our data pipeline. We update them using a power law distribution, where the top 1% of best-selling products (based on our internal ranking system) is updated hourly, the next 3% every two hours, and so on. The whole index is refreshed at the end of each month.
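
As a rough illustration of that tiered schedule, the sketch below maps a product's sales rank to a refresh interval. Only the two tiers stated above are encoded. The tiers hidden behind "and so on" aren't known, so everything below the top 4% simply falls through to a 30-day interval standing in for the monthly full refresh; that fallback and all names are assumptions.

```python
from datetime import timedelta

# (cumulative percent of catalogue, refresh interval); only the stated tiers.
REFRESH_TIERS = [
    (1, timedelta(hours=1)),  # top 1% of best sellers
    (4, timedelta(hours=2)),  # next 3%, i.e. up to the top 4%
]

# Assumption: stand-in for "the whole index is refreshed each month".
FALLBACK_INTERVAL = timedelta(days=30)


def refresh_interval(rank, catalogue_size):
    """Return how often to refresh the product at 1-based sales `rank`."""
    for cumulative_percent, interval in REFRESH_TIERS:
        # Integer arithmetic avoids floating-point edge cases at tier borders.
        if rank * 100 <= cumulative_percent * catalogue_size:
            return interval
    return FALLBACK_INTERVAL


for rank in (1, 10, 25, 500):
    print(rank, refresh_interval(rank, 1000))
```

With a 1,000-product catalogue, ranks 1 to 10 land in the hourly tier, ranks 11 to 40 in the two-hour tier, and the rest in the fallback.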

Very cool. Thanks for the explanation.

I really like this product and would like to use it in the UK. Can pricing be set to GBP, or is it just USD pricing?

As of now we are only indexing US prices - but we plan to expand out to the UK and Germany next. Drop me a note at varun <at> semantics3.com, I would love to chat with you!

Love the free plan, and the cheap pricing option for people who want to run this on a small scale.

The guideline behind the free plan was to provide enough calls and functionality for any developer to build, launch and maintain a moderately sized app. Shout-out to aghi from Mashape for helping us with the pricing.

Looks like a well built product. Nice to see this from Singapore!

how does this do books?

If I provide a random book barcode, it's usually an ISBN (10 or 13 digits)
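
For reference, telling the two ISBN lengths apart and checking them is straightforward with the standard check-digit rules. The sketch below is not tied to any API in this thread; the function names are made up, and the sample number is a commonly cited textbook example.

```python
DIGITS = set("0123456789")


def normalize(code):
    """Strip the hyphens and spaces often printed inside an ISBN."""
    return code.replace("-", "").replace(" ", "")


def is_valid_isbn10(isbn):
    """ISBN-10: weights 10 down to 1, sum divisible by 11; last char may be X."""
    if len(isbn) != 10 or not set(isbn[:9]) <= DIGITS:
        return False
    last = isbn[9]
    if last not in DIGITS and last not in "Xx":
        return False
    values = [int(c) for c in isbn[:9]] + [10 if last in "Xx" else int(last)]
    return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0


def is_valid_isbn13(isbn):
    """ISBN-13: alternating weights 1 and 3, sum divisible by 10."""
    if len(isbn) != 13 or not set(isbn) <= DIGITS:
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(isbn))
    return total % 10 == 0


def isbn10_to_isbn13(isbn10):
    """Prefix 978, drop the old check digit, and compute the new one."""
    core = "978" + isbn10[:9]
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


sample = normalize("0-306-40615-2")
print(is_valid_isbn10(sample), isbn10_to_isbn13(sample))
```

Converting every ISBN-10 to its ISBN-13 form before a lookup gives a single key per book regardless of which barcode format was scanned.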

We don't have books yet. That's been a common request, so we'll be launching with books by the end of the month. If you'd like, I can notify you as soon as we do.

Books would be great - could you notify me too? srmorrisonjit at google's email


Please check out the Book Prices API from DataWeave here: http://www.dataweave.in/apis/dataset-Book-Price-Search-By-IS... (Full Disclosure: I am an employee of DataWeave). Currently, we are serving data from Indian eCommerce stores, but expansion to other geographies is in the pipeline. You can search by ISBN as well as many other fields.

I sure will!

sure, my email is hayk.saakian@<google's popular email service>

Needs more examples with charts to show off the historical data, patterns, etc.

Noted. I'm curious about the motivating factor though - would you like the graphs to validate the depth of the data, get you excited with potential possibilities? Or something else?
