
API that uses neural networks to scrape product data - Buneme
https://mlscrapedemo.herokuapp.com/
======
Buneme
Hello,

Over the past few months I've been working on a neural-network based web
scraper for e-commerce websites. The aim is to be able to scrape product data
from any product page (so far it extracts the name, price, main image URL and
technical specification of the product).

I've developed a working prototype of the API along with a demo page (with
rate limits, so please use it within reason!) in order to get some feedback
before I carry on with the project.

Because the API is only a prototype, some features are currently missing but
will be added later on. For example:

• Only English sites that use GBP, EUR or USD are supported.
• I haven't finished integrating my computer vision algorithms, which means
  that in some specific situations the API might not detect a strikethrough
  and will therefore mix up the "current price" of the product with the "old
  price".
• The service is running on a hobby Heroku server, so the API takes a few
  seconds longer than it otherwise would.
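
To make the price limitation concrete, here is a minimal text-only sketch of reading GBP, EUR and USD prices from a page snippet and guessing which one is current. It is not the prototype's method (that relies on computer vision to spot the strikethrough); the names `CURRENCY_SYMBOLS`, `parse_prices` and `guess_current_price` are hypothetical, and the "lowest price wins" rule is only an assumption that a discounted price is lower than the old one.

```python
import re
from decimal import Decimal

# Hypothetical mapping for the three currencies the prototype says it supports.
CURRENCY_SYMBOLS = {"£": "GBP", "€": "EUR", "$": "USD"}

# A currency symbol, optional whitespace, then digits with optional
# thousands separators and an optional decimal part.
PRICE_RE = re.compile(r"([£€$])\s*(\d[\d,]*(?:\.\d{1,2})?)")


def parse_prices(text):
    """Return (currency, amount) pairs in the order they appear in text."""
    return [
        (CURRENCY_SYMBOLS[symbol], Decimal(number.replace(",", "")))
        for symbol, number in PRICE_RE.findall(text)
    ]


def guess_current_price(text):
    """Assumption: without strikethrough detection, treat the lowest price as current."""
    prices = parse_prices(text)
    if not prices:
        return None
    return min(prices, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(guess_current_price("Was £49.99 Now £39.99"))
```

The guess fails whenever the page shows a lower unrelated price (a delivery fee, an accessory), which is the kind of case where detecting the strikethrough itself is needed.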

I would appreciate any feedback on the API, in particular:

• Is there any other product data that you would like to see (e.g. product
  ID, delivery costs, etc.)?
• What sort of applications would you use this API for once it's fully
  developed?
• Apart from e-commerce sites, what other types of websites would you like to
  see an API for (e.g. news websites, real estate listings, etc.)?

~~~
billconan
We really need a service like this.

We are a furniture e-commerce company. Our vendors don't provide detailed
product feeds, so we have to rely on scraping.

The most difficult part of scraping the data is that we need to scrape all the
product options (material, color, size, ...).

Each option is a different SKU. See
https://www.article.com/product/11833/sven-charme-tan-sofa

We also need to build NLP models to understand product dimensions and weight
(useful when estimating shipping fees).
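
As a rough illustration of the dimensions problem, the sketch below uses a regular expression rather than an NLP model, and only handles the simple "W x D x H unit" pattern. The names `parse_dimensions_cm` and `volume_m3` are hypothetical, and real product pages will need far more patterns than this.

```python
import re

# Conversion factors to centimetres for the units this sketch recognises.
UNIT_TO_CM = {"cm": 1.0, "mm": 0.1, "in": 2.54, '"': 2.54}

# Three numbers separated by "x" or "×", followed by a unit,
# e.g. "200 x 90 x 80 cm" or '88 x 38 x 34"'.
DIMENSIONS_RE = re.compile(
    r'(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)'
    r'\s*(cm|mm|in|")(?![a-z])',
    re.IGNORECASE,
)


def parse_dimensions_cm(text):
    """Return the first three-part dimension in text as centimetres, or None."""
    match = DIMENSIONS_RE.search(text)
    if match is None:
        return None
    factor = UNIT_TO_CM[match.group(4).lower()]
    return tuple(round(float(match.group(i)) * factor, 2) for i in (1, 2, 3))


def volume_m3(dimensions_cm):
    """Bounding-box volume in cubic metres, a common input to shipping estimates."""
    width, depth, height = dimensions_cm
    return width * depth * height / 1_000_000


if __name__ == "__main__":
    dims = parse_dimensions_cm("Overall size: 200 x 90 x 80 cm")
    print(dims, volume_m3(dims))
```

Labelled formats such as "W 88 in, D 38 in" or dimensions spread across a spec table are not covered, which is where a learned model would likely earn its keep.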

~~~
jimhi
Hey - if this is true, my company already has a pretty good solution for
getting product info in a standard format from tens of thousands of websites.
My company also has to gather, format, and estimate dimensions and weight
because we do only international shipping. Try out a random product url on
zipx.com for an example.

Should we talk? My email is in my profile. I think there might be several ways
your company and mine can help each other actually...

~~~
Buneme
Thanks for getting in touch, I've just sent over an email.

------
forgingahead
Well done for getting it set up! How are you training your model? Feeding in
scraped product pages alongside metadata from an API to train it? And what are
your training sources?

Very nice idea, looking forward to seeing how it develops!

~~~
Buneme
Yes, that's pretty much it - using a mix of e-commerce APIs and manual work to
create data, and then using that as training data.

------
flem10
Interesting project.

Are there any prerequisites for the product page URL?

I just tested it out on a few e-commerce websites (fashion) and all the values
returned from the product pages were null.

~~~
Buneme
It's currently a prototype, so I haven't fully finished integrating all the
computer vision algorithms which means that it may miss some data in certain
situations - but the purpose at this stage was just to see if this is
something people would genuinely use and to get general feedback to help me
decide what features to prioritise going forward.

~~~
flem10
Got it.

I have a side project I've been meaning to finish, where I would definitely
use something like this (basically identifying price arbitrage opportunities
across luxury fashion retailers).

I will bookmark and keep an eye out for your progress.
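
For readers wondering what the arbitrage use case might look like once prices are scraped, here is a minimal sketch. It assumes the same product has already been matched across retailers and that all prices are in one currency; `find_arbitrage` and the retailer names in the example are hypothetical.

```python
from decimal import Decimal


def find_arbitrage(offers, min_spread=Decimal("0")):
    """Find products whose price differs between retailers.

    offers: iterable of (product_id, retailer, price) tuples in one currency.
    Returns a list of dicts sorted by spread, largest first.
    """
    by_product = {}
    for product_id, retailer, price in offers:
        by_product.setdefault(product_id, []).append((price, retailer))

    results = []
    for product_id, listings in by_product.items():
        if len(listings) < 2:
            continue  # a price gap needs at least two retailers
        low_price, low_retailer = min(listings)
        high_price, high_retailer = max(listings)
        spread = high_price - low_price
        if spread > min_spread:
            results.append({
                "product_id": product_id,
                "buy_from": low_retailer,
                "sell_at": high_retailer,
                "spread": spread,
            })
    return sorted(results, key=lambda row: row["spread"], reverse=True)


if __name__ == "__main__":
    sample = [
        ("bag-1", "shop_a", Decimal("950")),
        ("bag-1", "shop_b", Decimal("1200")),
    ]
    print(find_arbitrage(sample))
```

The spread here ignores shipping, taxes, returns and scraping mistakes, so a single misread price would show up as a false opportunity.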

------
klmadfejno
Sounds like a lot of room for costly error. See the DoorDash pizza arbitrage.

