
Launch HN: Depict.ai (YC S20) – Product recommendations for any e-commerce store - antonoo
Hey there! We are Oliver and Anton, the founders of Depict.ai. We help online stores challenge Amazon by building recommender systems that don't require any sales or behavioral data at all.

Today, most recommender systems are based on a class of methods commonly called ‘collaborative filtering’, which means that they generate recommendations based on users' past behavior. This method is used successfully by Amazon and Netflix (see the Netflix Prize: https://en.wikipedia.org/wiki/Netflix_Prize). It is used far less successfully by smaller companies that lack the critical mass of historical behavioral data required to use those models effectively. This generally results in the cold start problem (https://en.wikipedia.org/wiki/Cold_start_(recommender_systems)) and a worse customer experience. We solve this by focusing not on understanding the customer but on understanding the product.

The way we do this is with machine learning techniques that create vector representations of products based on the products' images and descriptions, and then recommend matching products using these vector representations. More specifically, we have found a way to scrape the web and then train massive neural networks on e-commerce products. This makes it possible to leverage large amounts of product metadata to make truly impressive recommendations for any e-commerce store.

One analogy we like: just as almost no single company has enough sales or behavioral data to consistently predict, for instance, credit card fraud on its own, almost no e-commerce company has enough data to generate good recommendations based only on its own information.
Stripe can make excellent fraud detection models by pooling transactions from many smaller companies, and we can do the same thing for personalizing e-commerce stores by pooling product metadata.

Through A/B tests we have proved that we can increase top-line revenue by 4-6% for almost any e-commerce store. To prove our value, we offer the tests and setup 100% free. We make money by taking a cut of the revenue uplift we generate in the A/B tests. We have also found that the sales and decision cycle gets much shorter because we don't depend on the customer's user data. You can see us live at Staples Nordics and kitchentime.com, among others.

Oliver and I have several years of experience applying recommender systems within e-commerce and education, respectively, and felt uneasy about a winner-takes-all development where the largest companies could use their data supremacy to out-personalize any smaller company. Our goal is to build a company that can offer the best personalization to any e-commerce store, not just the ones with enough data.

Do you think our approach seems interesting, crazy, lazy or somewhere in the middle? We'd love any feedback. Please feel free to shoot us comments below or DM us; we'll be here to answer your thoughts and gather feedback!
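As a rough illustration of the idea in the post (represent each product as a vector and recommend the products whose vectors lie closest to the one being viewed), here is a minimal Python sketch using cosine similarity. It assumes the embeddings already exist; the product IDs and 3-dimensional vectors are made up, and in practice they would come from an image/text encoder. The post does not describe Depict.ai's actual models, so this is not their implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_id: str, embeddings: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k products whose embeddings are most similar to query_id's (brute force)."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vec), pid)
        for pid, vec in embeddings.items()
        if pid != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored[:k]]


# Hypothetical embeddings; real ones would be learned from product images and text.
catalog = {
    "espresso-cup": [0.9, 0.1, 0.0],
    "saucer": [0.8, 0.2, 0.1],
    "frying-pan": [0.1, 0.9, 0.2],
    "wok": [0.0, 0.8, 0.3],
}
print(recommend("espresso-cup", catalog, k=2))  # ['saucer', 'frying-pan']
```

A brute-force scan like this is fine for a few thousand products; larger catalogs typically switch to an approximate nearest-neighbor index.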
======
riddlemethat
I worked with a team that built an engine that did exactly this. There are
complex issues to resolve. Mostly, the market for retail analytics is very
small, so even though this type of data fabric can offer retailers incredible
insights into what they should be purchasing and how to bundle/market their
products together, they won't pay what it costs to scrape so much data to
generate the recommendations.

The product recommendation angle for eCommerce is a better angle but only
works well for big companies where you have enough data at the onset to drive
better recommendations. With smaller companies and lesser known products you
must make probabilistic determinations based on image analysis and context
structure that will be mostly guesswork until you have real data, as you
surmised with your A/B testing.

Anyway, it seems you already got some major clients under your belt and have
proven a track record. Hope you are able to succeed in your quest to make
better recommendations work for small business with the data fabric you
created.

Happy to chat through my experiences if you have interest. hn (at) strapr
(dot) com is my email.

~~~
antonoo
Thanks for sharing your experience; we will connect with you by email!

------
serendipityrecs
Cool idea. I've been working on an adjacent product (serendipityrecs.com), but
mine is more targeted towards consumers as opposed to B2B. I think I
gravitated towards B2C by default because as an engineer, I don't want to deal
with sales, but your product makes sense. I'll be interested in following your
progress over the next few years to see how your thesis plays out.

Couple questions

- How well do your recommendations hold up against Amazon's? Since you're
scraping the metadata, you should be able to generate recs for Amazon items
from their own catalogue. This might be an interesting product / demo for
potential customers.

- Once you hook up your system to your customer's back end, how do you learn
from the behavioral data you get from them? That's straightforward for cf/mf,
but can be tricky to integrate into what you already have.

- You talk about Stripe pooling the data from their customers. I think the
analogue for you would be pooling the behavioral data from your customers as
opposed to the metadata. Have you thought about this?

- It sounds like you're doing nearest neighbors on the vector representation.
You may already know this, but LSH is a fast way to do this when you have many
items.

- Do you embed all items from your different customers into the same vector
space? That would be ideal from the POV of creating a pooled dataset that
would be helpful for all future customers, but sounds tricky given that
everyone likely has their own idiosyncratic system.
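The LSH suggestion can be sketched with random-hyperplane hashing (SimHash-style signatures), a standard LSH family for cosine similarity: each random hyperplane contributes one bit saying which side of it a vector falls on, and the chance that two vectors disagree on a bit is proportional to the angle between them (θ/π), so similar products tend to land in the same bucket. This is a single-table, illustrative Python sketch; the dimension, number of planes and product vectors are arbitrary, and practical systems use several hash tables to improve recall.

```python
import random


def make_hyperplanes(dim: int, n_planes: int, seed: int = 0) -> list[list[float]]:
    """Draw random Gaussian normal vectors; each defines one hyperplane / hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec: list[float], planes: list[list[float]]) -> tuple[int, ...]:
    """One bit per hyperplane: 1 if vec is on its non-negative side, else 0."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)


def build_index(embeddings: dict[str, list[float]],
                planes: list[list[float]]) -> dict[tuple[int, ...], list[str]]:
    """Group product IDs into buckets keyed by their bit signature."""
    buckets: dict[tuple[int, ...], list[str]] = {}
    for pid, vec in embeddings.items():
        buckets.setdefault(signature(vec, planes), []).append(pid)
    return buckets


def candidates(vec: list[float], planes: list[list[float]],
               buckets: dict[tuple[int, ...], list[str]]) -> list[str]:
    """Products sharing the query's bucket; exact similarity is then computed only on these."""
    return buckets.get(signature(vec, planes), [])


planes = make_hyperplanes(dim=3, n_planes=4, seed=42)
catalog = {"a": [1.0, 0.0, 0.2], "b": [0.9, 0.1, 0.25], "c": [-1.0, 0.5, 0.0]}
index = build_index(catalog, planes)
print(candidates(catalog["a"], planes, index))  # "b" is likely, but not guaranteed, to appear
```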

Best of luck! Lmk if you'd like to talk shop sometime, I also have several
years of experience with recommender systems (my email is in my profile).

~~~
antonoo
Hey, interesting to hear from you. We'll connect by email as a follow-up, since
we seem to have quite a lot in common.

On the first question regarding comparing with Amazon: That’s a great point.
We can actually personalize those kinds of demos for each specific customer,
since the marginal cost of scraping yet another store is pretty low given the
infrastructure we have put in place. See an example here:
[https://demo.depict.ai/madstyleshop](https://demo.depict.ai/madstyleshop)

------
chudaka_pi
So are you a scraping (data acquisition) engine, an AI engine, a catalogue as
a service engine, a recommendation engine, or all of the above? There's no info
on your model capabilities or data scope (brands/retailers/SKUs). It seems like
you are vectorizing SKUs, which is a well-understood problem. How does your
platform compare to markable.ai (look beyond the Visual AI; these guys have a
robust vectorization pipeline, and they are scraping continuously, which takes
a team of engineers) or visenze.com (a massive platform covering most of
e-commerce: almost a billion SKUs, styles, occasions, lots of AI)? You guys
seem knowledgeable, but it looks like multiple products all in one and a
fairly small team. Good luck, regardless!

------
sanj
I built a recommender like this at a prior job. We carefully tested it against
the original “naive” algorithm, which used direct user behavior clustering.

What was interesting is that the naive algorithm got better over time and the
incremental benefit of our new code got smaller.

Why?

Because the training data for the naive algo included user behavior from the
new one. As we created better recommendations, users clicked on them and that
fed into the old algo!

Coming to your product: what is to prevent a customer from using it for a few
weeks, copying down the results, and then using those recommendations forever?

They’ll get most of the benefit for very small cost.

~~~
antonoo
Hey! While we're mainly looking at the product metadata, our algorithm also
learns user-specific patterns for each store. That makes the recommendations
hard to copy, since two people could look at the same product but get
different recommendations. Also, what to recommend for a particular product can
depend on more factors than just person & product; seasonality & trends also
have an influence (to varying degrees).

~~~
sanj
All of these things are true. But they need to be of sufficient magnitude,
compared with simple recommendations that use zero knowledge about the user,
to warrant the ongoing cost.

For the space I was in, recommending hotels, the incremental value of the user
behavior was minimal.

On the other hand, getting the zero knowledge version working really well was
worth an extra 11.6%.

------
jonas_b
Congratulations guys. I met two of you when you visited our agency and told
your story. Very impressive, and for those of you reading, these guys were
still in high school when we met them, maybe still are? Stoked that you got
into YC!!

~~~
antonoo
Haha, that's awesome! So cool that you still remember us from the days of
knocking on doors around Stockholm:)

------
an_opabinia
> This generally results in the cold start problem

If I'm an online store with 100 products, couldn't I just punch the products
into Amazon on a fresh account, then copy the search results? 100 products
would maybe take me 20 minutes a day, but if you're saying there's a
4-6% lift, seems like it's worth it?

If it was 1,000 products, maybe I do this once a week for 200 minutes? Etc.
etc.

Here's what'll happen: Your online store won't have most of the products on
Amazon's recommended list. Isn't that the problem?

So no matter what, don't I eventually have to scale to Amazon size to get the
value out of collaborative filtering?

Maybe no small business has that real supply chain. They are just front-
running other stuff. But hey, that's their prerogative - to try to be Amazon
without doing the stuff that actually makes Amazon successful.

> Netflix Prize

They don't even use those methods anymore. And that competition was much more
about how to do IT and ensemble methods than any one particular approach,
since that's how you get to #1.

Netflix Prize is sort of the opposite narrative of what you're actually doing.
If you're seeking something that normal people recognize, just stick to
talking about Amazon.

> Do you think our approach seems interesting, crazy, lazy or somewhere in the
> middle?

At least the premise doesn't add up.

Considering the data gathering, it seems easier to do user-product
collaborative filtering.

Considering the math, it seems easier to do user-product collaborative
filtering. You can bootstrap the weights for, e.g., a non-negative matrix
factorization collaborative filtering model from existing recommendations.
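To make the NMF suggestion concrete, here is a minimal pure-Python sketch of non-negative matrix factorization using the classic Lee-Seung multiplicative updates for squared error. The matrix is hypothetical: rows are a store's products, columns are candidate products, and a 1 means the candidate appeared in that product's copied "also bought" list. After factoring, W @ H gives smoothed scores that can also rank pairs that were 0 in the input. The rank, iteration count and seed are arbitrary choices.

```python
import random


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(row) for row in zip(*m)]


def nmf(v, rank: int, iters: int = 200, seed: int = 0, eps: float = 1e-9):
    """Factor non-negative V (n x m) as W @ H with W (n x rank), H (rank x m) >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h) -> float:
    """Squared reconstruction error ||V - W H||^2."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


# Hypothetical co-recommendation matrix copied from "also bought" lists.
V = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
W, H = nmf(V, rank=2)
scores = matmul(W, H)  # smoothed product-to-product affinity scores
```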

Is there going to be something important encoded in the image or metadata you
can relate to other things? It seems easiest to just use the keywords. Like
you don't need a picture of guacamole to know it goes with tortilla chips,
it's in the keywords.

Then again, the whole point is to find serendipitous stuff from your existing
user data. If you only offer 100 products, none of them will serendipitously
end up in shopping carts together, because that's so few products. It's already
curated to such a degree that collaborative filtering will not find anything you
don't already know.

~~~
shoguning
> it's in the keywords

This assumes that the source and target listings definitely have accurate
keywords.

There would be _plenty_ of value in a model that can use either keywords OR
images to effectively make recs.

> Considering the data gathering, it seems easier to do user-product
> collaborative filtering.

What user data are you talking about here? The customer doesn't necessarily
have user data.

I personally think this approach makes a lot of sense. If it works, a
one-size-fits-all recommender would be really useful and easy to sell.

~~~
an_opabinia
> What user data are you talking about here?

The user data from punching searches into Amazon and copying their "Also
bought" recommended product lists.

------
mlthoughts2018
Taking a cut of revenue seems extreme because what you are doing is extremely
commoditized.

ML teams at hosting platforms like Wix or Shopify or Squarespace could offer
the same as a built-in or slightly higher-tier premium feature, paying a tiny
fixed cost instead of a share of revenue uplift.

This could even be basically an intern or new-grad project at tech companies
like that; the technology for the model is very simple. The devil would be in
the details of integrating with the data model backing those platforms'
e-commerce shop products, but you could solve it once and then immediately
offer it to all your customers, and out of the box for new customers.

The part of your idea that makes me skeptical is the scalability of applying
your recommendation approach to bespoke customers. Like, I’m sure you can do
it, but with nowhere near the same reach or efficiency or price point as
well-capitalized major store hosting platforms.

~~~
antonoo
Yes, we have seen that simple integration and A/B test setup are key for our
customers.

It is about integrating both with the website and with the catalog of existing
products, and right now this is easier with us than with any other provider,
since we have built tooling that makes it very efficient.

------
sheeshkebab
It’s an interesting service, although your pricing model makes for a very
tough sell (and complex to even technically consider - ab tests? Need detailed
sales data too? Forget it...)

~~~
antonoo
Hey! When we do the A/B tests, you will objectively see how much more money we
make for you. The pricing can get a little bit technical compared to other
solutions, but it all boils down to a simple _fixed_ monthly recurring fee
based on the revenue uplift.

~~~
alehul
Hey Anton, love the product - wanted to chime in on the A/B test pricing
model.

I was PM for a company that grew to $2.5m ARR using an A/B test to prove our
value and charging on the delta. (We also had the same technical execution via
JS widget).

It added some friction; a few customers actually wanted a fixed rate and
didn't understand the A/B test premise (it's all in the presentation). Most
absolutely loved it, though, and were ready to sign up after hearing it.

There are certainly best practices in the execution of this pricing strategy,
and if you ever want to talk through those, let me know. Would be happy to
chat. :)

------
zkid18
Oliver, Anton, congrats on the launch. RecSys analyst here.

Correct me if I'm wrong, but afaiu, you have designed a black-box content-based
recommender system for the e-commerce domain by scraping publicly available
data. I love your business model, though I have a couple of questions:

1. A/B testing in RecSys is a tricky process in terms of further
interpretation. How do you choose the control and test groups? I would love to
go beyond the revenue percentage uplift when evaluating the new model. Btw, do
you have your own A/B testing environment?

2. Are you targeting one specific problem, like cold start or checkout
recommendations, or do you have a general solution?

3. Are you planning to open-source your model?

4. Do you have any WordPress/Shopify plugins?

Anyway, I really like your idea and would love to contribute.

Let's stay in touch via twitter: @kidrulit.

~~~
antonoo
These are some great questions!

1. In order for e-stores to trust the results, we tend to use Google Optimize
for the A/B tests (which randomly assigns 50% of users to see the store's
previous recommendations).

2. Currently our main focus is recommendations on the product page, but we
do aim to also add our recommendations in other spaces, where the checkout &
landing pages are some clear alternatives! Mostly a bandwidth question at this
point.

3. We might open-source parts of it sometime in the future, though it's not
something we're actively considering right now.

4. Not yet! We integrate by injecting a JS widget on their site, through
which we also track user behaviour.
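Item 1 relies on Google Optimize to randomize who sees which recommendations. For anyone running such a split themselves, one common approach (not necessarily how Google Optimize does it) is to hash a stable user ID together with an experiment name, so a returning visitor always lands in the same group while different experiments get independent splits. A minimal sketch; the user and experiment IDs are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing experiment and user ID together keeps each user in the same group
    on every visit, while different experiments get independent splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits (32 bits) to a number in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"


groups = [assign_variant(f"user-{i}", "pdp-recs-test") for i in range(10_000)]
print(groups.count("treatment") / len(groups))  # close to 0.5
```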

Sounds great - let's follow up on Twitter

------
KaoruAoiShiho
How does it compare to recombee or AWS personalize? I'm in the market for this
but I guess I'm a bit more technical than the people you're selling to and so
can use stuff that's a tad lower level?

~~~
antonoo
We have done A/B tests against AWS Personalize and other players like Nosto,
Apptus, and more, and to date we have not failed a single A/B test!

~~~
danpalmer
> to date we have not failed a single A/B test

This smells fishy. If you're experimenting with a typical p < 0.05 (which is
often too high for ecommerce optimisation), surely you'd expect to fail 1 in
20 by chance, even if your product is better?
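A rough way to reason about this in standard frequentist terms: a 0.05 significance level caps the false-positive rate when there is no real difference, while the chance that a genuinely better variant still fails to reach significance is 1 minus the test's statistical power, which depends on the effect size and the traffic. The sketch below approximates the power of a two-sided two-proportion z-test with the normal approximation. The 2% conversion rate, 5% relative lift and traffic figure are illustrative assumptions, not numbers from the thread.

```python
import math
from statistics import NormalDist


def power_two_proportions(p_control: float, p_treatment: float,
                          n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with unpooled variances and ignores the
    negligible chance of a significant result in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = math.sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treatment * (1 - p_treatment) / n_per_arm
    )
    effect = abs(p_treatment - p_control) / se
    return std_normal.cdf(effect - z_crit)


# Hypothetical: 2.0% baseline conversion, 2.1% with the new recommendations
# (a 5% relative lift), 50,000 visitors in each arm.
print(round(power_two_proportions(0.020, 0.021, 50_000), 2))  # 0.2
```

Under these made-up numbers, roughly four out of five such tests would fail to reach significance even though the variant really is better.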

------
brecs
This sounds very exciting! I'm interested in the A/B tests you ran to show
revenue lift from your recommendations. What is the baseline model you test
against? It seems to me that the most "fair" comparison would be to set up
exactly the same vector representations and neural network model for only a
single company at a time, and compare the performance to demonstrate that it
is really your approach of combining different companies' datasets that
provides the extra value here. Is that what you guys did?

~~~
antonoo
In A/B tests we have compared against e-commerce stores' existing
recommendations, which are either manual or made by other recommender systems.
We will consider running experiments like the one you describe; it is a good
idea!

------
bartkappenburg
We [0] use a combination of Elastic’s “More Like This” query (which uses the
aforementioned vector spaces et al.) and view, click and sales data plus ML
(on-site behaviour). This prevents the Cold Start Problem and improves with
more data.

How does your scraping hold up against the already pretty effective More Like
This query in ES? That one is backed by years of research and gives very good
results.

[0] [https://www.conversify.com](https://www.conversify.com)
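For comparison, Elasticsearch's More Like This query works on terms: per its documentation, it analyzes the input text, selects the terms with the highest tf-idf, and forms a disjunctive query from them. The following is a deliberately simplified, hypothetical Python sketch of term-based similarity in the same spirit (TF-IDF vectors compared by cosine similarity), not Elasticsearch's actual scoring. The product titles are made up.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) using whitespace tokenization."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    return {
        doc_id: {term: count * math.log(n_docs / df[term])
                 for term, count in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }


def sparse_cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(doc_id: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the k documents most similar to doc_id by TF-IDF cosine similarity."""
    vectors = tfidf_vectors(docs)
    query = vectors[doc_id]
    scored = sorted(
        ((sparse_cosine(query, vec), other) for other, vec in vectors.items() if other != doc_id),
        reverse=True,
    )
    return [other for _, other in scored[:k]]


titles = {
    "p1": "stainless steel espresso cup",
    "p2": "porcelain espresso cup with saucer",
    "p3": "cast iron frying pan",
    "p4": "nonstick frying pan",
}
print(more_like_this("p1", titles, k=1))  # ['p2']
```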

~~~
shoguning
Generating the vector representation is surely the hard part here, or at least
the differentiator.

The search algorithm, as you point out, is pretty much commoditized at this
point.

------
thegginthesky
Hi! Congrats on the launch!

Using content-based recommendation is interesting but requires constant
scraping for more data. Plus, the whole cost of curating the dataset and
guaranteeing data quality can be extra challenging. How are you getting around
these problems?

Also, your approach with A/B tests is interesting, but how would you do it for
smaller shops? Wouldn't it take too long to give appropriate results? Or are
you using a Bayesian Test Methodology?

~~~
antonoo
Yup, you're right! We have built very significant infrastructure for stable
and scalable scraping that ensures the best data quality. This is one of the
things we spend the most time on.

Thanks! Yes, in order for shops to trust the A/B test results, we tend to use
Google Optimize, which uses Bayesian inference.
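One common Bayesian formulation for conversion-rate A/B tests (not necessarily the exact model Google Optimize uses) puts a Beta prior on each variant's conversion rate and reports the probability that variant B's rate exceeds A's. With uniform Beta(1, 1) priors that probability has an exact closed-form sum, sketched below in Python; the counts in the example are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b), via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """P(rate_B > rate_A) with independent Beta(1, 1) priors on both conversion rates.

    Posteriors are Beta(1 + conversions, 1 + non-conversions); the sum below is
    the exact closed form, with one term per conversion in B (plus one).
    """
    alpha_a, beta_a = 1 + conv_a, 1 + n_a - conv_a
    alpha_b, beta_b = 1 + conv_b, 1 + n_b - conv_b
    total = 0.0
    for i in range(alpha_b):
        total += math.exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - math.log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


# Hypothetical counts: 200 of 10,000 visitors convert in A, 240 of 10,000 in B.
print(prob_b_beats_a(200, 10_000, 240, 10_000))
```

Unlike a p-value, this number can be read directly as "how likely is B better, given the data and the prior", which is part of why it is easier to explain to store owners.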

------
notdang
For example, I am into coffee roasting at home. The cheapest device to roast
coffee at home is a popcorn maker (not all of them, just some brands that meet
certain power requirements). When I look at those devices on Amazon, they
recommend green coffee beans to me, which is correct and helpful. I guess
Amazon is using collaborative filtering. What would a content-based
recommendation system do in this case?

------
Boxxed
Without the behavioral data you'll just plain miss out on correlations that
simply can't be described by product metadata alone. Is this a big deal? It
seems like that would be one of the big advantages of collaborative filtering.
I guess that advantage is probably larger on sites with a huge product variety
(e.g. Amazon) which aren't really your audience.

~~~
antonoo
From A/B tests we have seen that having good representations of products is
much more important than tons of behavioral data. In the cases when we do use
behavioral data, we have found that unless we use it in conjunction with our
product representations, we do not get much leverage from it.
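One simple way to combine the two signals, sketched below purely as an illustration (the reply does not say how Depict.ai does it), is to shrink a noisy behavioral estimate toward a content-based similarity: with little traffic the content score dominates, and as co-view data accumulates the observed behavior takes over. The co-view rate, the shrinkage constant of 50 views and the function name are all hypothetical choices.

```python
def blended_score(content_sim: float, co_views: int, views: int,
                  shrinkage: float = 50.0) -> float:
    """Blend a content-based similarity with an observed co-view rate.

    co_views / views is the share of viewers of the source product who also
    viewed the candidate. Its weight, views / (views + shrinkage), starts near 0
    for new products (cold start) and approaches 1 as data accumulates.
    """
    if views == 0:
        return content_sim  # no behavioral data yet: rely on content alone
    behavioral = co_views / views
    weight = views / (views + shrinkage)
    return weight * behavioral + (1 - weight) * content_sim


print(blended_score(0.8, co_views=0, views=0))          # new product: 0.8
print(blended_score(0.8, co_views=10, views=50))        # equal weights: 0.5
print(blended_score(0.8, co_views=2_000, views=10_000))  # mostly behavioral
```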

------
ssharp
What was the control in the A/B test?

I've seen lots of recommendation algorithms fail against curated
recommendations. It would be really interesting to see where / what this
approach beats.

Would also be really curious about how stores have reacted to the revenue model.
Is that a one-time fee based on the A/B test results or are you capturing a
cut of the uplift in perpetuity?

~~~
antonoo
Great question! We have done many, many A/B tests, and it's often the case that
the control group sees manually curated recommendations. In general, the more
SKUs a store has, the better our method performs, since stores that use classic
collaborative filtering based methods have to deal with sparse datasets & cold
start problems. And naturally, if you have more products, it becomes a huge
pain to do them all manually, especially if you have a reasonable product
turnover.

------
Winterflow3r
Hey! Congratulations! Really happy for the success of my fellow Stockholmers!
Can I ask how you integrate with your customers' online stores? I've always
been puzzled about how third-party recommender services integrate with
someone's existing shop. Is there some sort of JS widget or similar you add to
the customer's existing site?

~~~
antonoo
Thanks a lot! Yes, exactly: we do our integration + user tracking through a JS
widget we inject on their site. Since most e-commerce companies don't have
in-house tech people, just copy-pasting our script lets us integrate many times
faster and more cheaply than alternative methods would allow.

~~~
Winterflow3r
Thank you for the reply! That makes sense as an approach :) Wish you all the
best!

------
swyx
great pitch! i dont work in ecommerce so i cant attest to the appeal of it but
sounds like it might work to layperson ears.

one nit - "we lift revenue by 4-6%" doesn't feel like a very impressive number
(it may be within the bounds of normal noise for a smaller ecom site?). that
said, im very much not an ecomm guy. is this a bigger deal than it initially
reads?

i also feel like recommender systems work much better for netflix (infinite
consumption) than for ecommerce (where if i already bought a shoe i normally
dont want another). perhaps this tech is better applied to _media_ than to
ecomm?

~~~
Exuma
My biz partner would kill to have a 4-6% boost. If you're spending 100K+ a day
on ads, that's a significant ROI lift.

~~~
swyx
the tricky thing is attributing causality to random noise. if normal noise is
+/-1%, then yeah sure 4-6% is legit. but if you get 4-6% variation _normally_
, then its kinda hard to tell if the recommender is doing anything or not.
ofc, if it persists over an extended period, then that's cool.

im not saying this has no value, im saying people without a baseline (like me)
have no idea how to assess this tagline and maybe there might be a better one
to pursue.

~~~
Exuma
At high volume ROI is extremely consistent generally, but yes I imagine with
lower volume stores it's much harder to determine that.
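To put rough numbers on how traffic affects this, the standard normal-approximation formula for a two-proportion test gives the visitors needed per arm to detect a given relative lift. The 2% baseline conversion rate, 5% lift, 5% significance level and 80% power below are illustrative assumptions; revenue metrics also add order-value variance on top of conversion, so real requirements can be higher.

```python
import math
from statistics import NormalDist


def visitors_per_arm(base_rate: float, relative_lift: float,
                     alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in each arm of a two-sided two-proportion z-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical store: 2% conversion rate, hoping to detect a 5% relative lift.
print(visitors_per_arm(0.02, 0.05))  # roughly 315,000 visitors per arm
```

At a hypothetical 10,000 visitors a day split evenly, that works out to roughly two months of traffic, which is why low-volume stores struggle to confirm single-digit lifts.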

------
tariqueshams
It makes sense that a recommender system would get customers to spend more
money on a site, but how does this help stores compete with Amazon? Interesting revenue
model.

~~~
antonoo
Up to 35% of Amazon's transactions come from automated recommendations – which
they have become really good at optimizing. We help everyone else reach these
numbers even though they do not have the same volumes of data or technical
expertise as Amazon.

------
Finbarr
Very cool! Most ecommerce stores don't have enough data to do product
recommendations with the existing tools. This is much needed.

------
rrwright
Website link: [https://depict.ai/](https://depict.ai/)

------
eries
Impressive idea - best of luck with it

~~~
antonoo
Thank you!

------
i386
I’d be keen to try this - can you work with Squarespace?
james@wildspiritdistilling.co

------
ianmchenry
How do you use the metadata associated with the images to power better
recommendations?

~~~
antonoo
Thanks for asking. We usually do the tagging implicitly ourselves when we make
our product recommendations. You can add manual rules on top if you want to
up-weight a new collection of products that has just launched, for example.
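The manual rules mentioned here could, for instance, be applied as multipliers on the model's scores before re-ranking. A minimal sketch; the rule format, tags, product IDs and boost factor are hypothetical and not Depict.ai's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class BoostRule:
    """Hypothetical merchandising rule: multiply the score of products carrying a tag."""
    tag: str
    factor: float


def apply_rules(scores: dict[str, float], tags: dict[str, set[str]],
                rules: list[BoostRule]) -> list[str]:
    """Apply every matching boost rule to the model scores, then re-rank."""
    adjusted = {}
    for pid, score in scores.items():
        for rule in rules:
            if rule.tag in tags.get(pid, set()):
                score *= rule.factor
        adjusted[pid] = score
    return sorted(adjusted, key=adjusted.get, reverse=True)


model_scores = {"mug-a": 0.9, "mug-b": 0.8, "mug-new": 0.6}
product_tags = {"mug-new": {"new-collection"}}
rules = [BoostRule(tag="new-collection", factor=1.6)]
print(apply_rules(model_scores, product_tags, rules))  # ['mug-new', 'mug-a', 'mug-b']
```

Multiplying keeps the boost proportional to the model's own score; an additive bump would be an equally plausible design.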

------
noteanddata
for this part, "scrape the web", what is the web part, is it other e-commerce
sites including amazon and so on? do you foresee any issues/risks about that?

~~~
antonoo
Yes, mainly other e-commerce stores. Larger sites like Amazon & Alibaba tend
to be extra effective to scrape since they already have good metadata that we
can draw on. We don't currently foresee any major risks; the tide seems to be
turning more and more towards a free web in that sense:
[https://towardsdatascience.com/web-scraping-is-now-legal-6bf...](https://towardsdatascience.com/web-scraping-is-now-legal-6bf0e5730a78)

------
shayankh
why not call it content based recommender systems? also do you think in future
you could turn it into a hybrid system when you have user feedback?

