Two Decades of Recommender Systems at Amazon.com (computer.org)
230 points by chwolfe on July 5, 2017 | 74 comments

20 years and they still think I need more than one vacuum.

ht: https://twitter.com/kibblesmith/status/724817086309142529

Reminds me of Netflix recommendations. They keep recommending I "continue watching" the ~45 seconds of credits I missed from the last dozen shows I watched. Most lists have movies/shows I've already seen at the top still.

I'm sure there's some reason behind it all, whether technological, user behavior, or obscuring a small catalog. I wish I knew what it was.

Or if I watch a couple episodes of a show and decide I don't like it and click the "Stop recommending this show" button, it'll still show up in the "Continue Watching" list. What's the point of the button?

You can go into your viewing activity (only available from your account page on the desktop site, I think) and delete the show from there, and then it should remove it from "Continue Watching".

It's not an obvious or straightforward thing to need to do, but seems to work.

That's definitely available in the app as well.

IIRC, "Continue Watching" shows things you haven't finished watching in most-to-least recent order. There isn't any aspect of "recommendation" to it beyond that; you could go find the worst rated, least recommended content on Netflix, play it for a minute, and it'll be first on your "Continue Watching" list.

Have you ever seen something being _recommended_ after you clicked that button?

It's a UX failure. There's no point in recommending something that I've seen and there is almost no way I can know that I don't want to watch something without seeing at least a little bit of it. For all intents and purposes, the "Continue Watching" list is a recommendation of things to watch. If I thumbs down or ask Netflix to stop recommending it, I shouldn't see it prominently displayed anywhere when I open the app.

When I consider these UX fails I often attribute them to too much faith in A/B testing. That may not be the cause, but it's common for people to complain about X in the Netflix UI and then have either employees or apologists come in and talk about how there have been 100k variants vetted through A/B testing and say the way it works now is the way most people like it. However, this approach tends to fall flat on cross-cutting and more creatively challenging UX concerns. Sometimes you need a Cambrian explosion.

I have a ton of half watched shows that I keep queued for when I'm in the mood, which is usually in the winter. So it works fine for me. It's a reminder of "oh yeah I was watching that before vacation".

Or Netflix decides to 'hide' a show I want to continue watching, and I have to search for it.

That sucks.

The problem is that the algorithm seems to be unable to distinguish interchangeable items (e.g. vacuum cleaners, lawnmowers, Swiss Army knives) which you need exactly one of, similar items (e.g. different death metal albums, or Hammer horror movies) which if you bought one of, you'll probably want some others, and connected items (e.g. different parts of a course) where you'll probably want all the others.

It's weird, though; seems like an "average time between purchases" associated with each category would be an obvious metric to use in the algo.
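To make the "average time between purchases" idea concrete, here is a minimal sketch, assuming a purchase log of (customer, category, date) tuples. The function name `mean_days_between_purchases` and the sample data are hypothetical; nothing here reflects what Amazon actually computes.

```python
from collections import defaultdict
from datetime import date

def mean_days_between_purchases(purchases):
    """purchases: iterable of (customer, category, date) tuples.

    Returns {category: mean gap in days between consecutive purchases
    made by the same customer}. Categories nobody has bought twice
    are omitted, since no gap can be measured for them.
    """
    days_by_key = defaultdict(list)
    for customer, category, day in purchases:
        days_by_key[(customer, category)].append(day)

    gaps = defaultdict(list)
    for (_, category), days in days_by_key.items():
        days.sort()
        for earlier, later in zip(days, days[1:]):
            gaps[category].append((later - earlier).days)

    return {category: sum(g) / len(g) for category, g in gaps.items()}

# Hypothetical purchase log, for illustration only.
purchases = [
    ("ann", "paper towels", date(2017, 1, 1)),
    ("ann", "paper towels", date(2017, 1, 31)),
    ("ann", "paper towels", date(2017, 3, 2)),
    ("bob", "vacuum", date(2012, 6, 1)),
    ("bob", "vacuum", date(2017, 6, 1)),
]
intervals = mean_days_between_purchases(purchases)
print(intervals)
```

With this toy data the paper towels come out at a 30-day interval and the vacuum at roughly five years, which is the kind of per-category signal the comment above is asking for.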

A small problem might be that they're probably trying to optimize that metric, i.e. reduce your average time between purchases in each category...

Some fungibility factor attached to each product wouldn't be amiss here

I can certainly imagine a world in which you, having just purchased a vacuum, are more likely to purchase a different vacuum than any random person they just chose to show that recommendation to.

For example, you might return your first one and then want to buy a different one. Instead of letting you forget about it and then months later potentially buying elsewhere, they preempt you and offer new ones soon after.

This is obviously speculation, but I think Amazon could come up with something better if they felt it pressing i.e. recommending the same product category was no better than random products.

But there are so many other things that make more sense: just imagine what a salesman would try to do, and the answer is not "another vacuum". They could try to get me to buy bags to go with the vacuum. They might think I am into cleaning right now and try to sell me a feather duster. It just makes so little sense that their reaction to "bought a thing" is to run ads showing me "same thing you just bought" instead of "things that go well with the thing you just bought"... particularly as they know what those things are: before you leave Amazon's website they are all over "people who bought this tend to buy this also". When I buy a camera they should try to sell me a lens cap. When I buy exercise equipment they should try to sell me weights or towels or mats. When I buy a DVD player they should try to sell me DVDs. Yet for the next two weeks they insist on spending ad money showing me the thing I just bought as if I want to buy it again: what the hell? :/

Definitely! They should be using Apriori or some other association rule learning algorithm for this.

They already use these techniques to suggest things before you make the purchase, but it would be great to extend it over a long period of time, like you said. A few months later they suggest carpet cleaner or bags for your vacuum.
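For anyone curious what association rule learning looks like, here is a minimal sketch of the Apriori idea restricted to pairs of items: only items that are frequent on their own are allowed to form candidate pairs. The baskets, the thresholds, and the name `pair_rules` are made up for illustration; this is not a claim about Amazon's implementation.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return rules (lhs, rhs, confidence, lift) for frequent item pairs."""
    baskets = [set(b) for b in baskets]
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)

    # Apriori pruning: a pair can only be frequent if both items are.
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}

    pair_counts = Counter(
        pair
        for b in baskets
        for pair in combinations(sorted(b & frequent), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]       # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. P(rhs) overall
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(confidence, 2), round(lift, 2)))
    return sorted(rules)

# Hypothetical order baskets, for illustration only.
baskets = [
    ["vacuum", "vacuum bags"],
    ["vacuum", "vacuum bags", "carpet cleaner"],
    ["vacuum", "carpet cleaner"],
    ["vacuum bags"],
    ["camera", "lens cap"],
]
rules = pair_rules(baskets)
for rule in rules:
    print(rule)
```

A lift above 1 means the right-hand item is bought more often alongside the left-hand item than it is overall, which is what makes "bags with a vacuum" a better suggestion than "another vacuum".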

Heck, the small online shop where I bought my robot lawnmower does this. Last year I bought replacement knives for the robot. This year they sent me a mail at the same time, asking if I wanted to buy another pair of replacement knives, or perhaps other accessories or a new battery. And yes, I wanted another set of knives - click, done.

Indeed, but that would require common sense. That's a lot of work, as Doug Lenat will readily confirm.

My instinct is that they are willing to live with showing some comparatively weak recommendations occasionally if that means not adding layers and layers of exceptions on top of their basic recommendation engine. And/or they have projected that the ROI isn't there for building in more sophisticated recommendations.

At Amazon's scale, a very very small increase in the effectiveness of recommendations can represent millions in revenue. They probably do have layers of exceptions, and custom-built software used by teams of "editors" to manually manage these exceptions. Not to mention the experimentation system where they are simultaneously testing hundreds of things from CSS one-liners to new ranking algorithms.

It's probably a pretty complex beast with the weight and inertia of a core system that's been continuously in production through 20 years of growth. They probably continue to invest heavily in it, the ROI is there, it's just a really hard problem for them at this point. Their engineers probably curse its limitations and would love to rewrite/replace old and outdated parts, and indeed there will be teams working on that, but those projects will either die out or spend so much time reaching feature parity with the bloated existing system that it ends up looking much the same.

Why would this be an "exception"? The basic item-based collaborative filtering described at the beginning of the article finds items that are likely to be purchased by someone who has purchased a specific item.

I'm pretty sure that it's true that someone who buys a vacuum is much more likely to buy another vacuum. It may seem counterintuitive, and may not be true for you after you just bought a vacuum, but I see no reason to believe that Amazon's algorithm isn't working properly and giving Amazon a lot of value.

It seems like we're all saying the same things - I agree it is likely that someone who has bought a vacuum before is more likely to buy another. But to make the recommendations really shine, it should wait n months before suggesting that again, where n will vary quite a lot by product type - those are the sort of exceptions I'm talking about.

It should not require layers and layers of exceptions to categorize products that get repeat buys (paper towels), similar buys (detective novels), or one-time buys (vacuum cleaners), and make recommendations accordingly.
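As a rough sketch of what "wait n months, where n varies by product type" could look like, here is a filter that drops candidates whose category was bought too recently. The cooldown table, the item names, and `filter_recommendations` are all hypothetical; in a real system the cooldowns would presumably be learned from data rather than hard-coded.

```python
from datetime import date, timedelta

# Hypothetical cooldowns in days: long for one-time buys, short for
# repeat buys, zero for "similar item" categories such as novels.
COOLDOWN_DAYS = {
    "vacuum": 3 * 365,
    "paper towels": 21,
    "detective novel": 0,
}

def filter_recommendations(candidates, last_purchase, today):
    """Drop candidates whose category is still inside its cooldown.

    candidates: list of (item, category) pairs.
    last_purchase: {category: date of the most recent purchase}.
    """
    kept = []
    for item, category in candidates:
        last = last_purchase.get(category)
        cooldown = timedelta(days=COOLDOWN_DAYS.get(category, 0))
        if last is not None and today - last < cooldown:
            continue  # bought too recently to suggest this category again
        kept.append(item)
    return kept

# Hypothetical data, for illustration only.
last_purchase = {
    "vacuum": date(2017, 7, 1),
    "paper towels": date(2017, 6, 1),
    "detective novel": date(2017, 7, 4),
}
candidates = [
    ("another upright vacuum", "vacuum"),
    ("paper towels 12-pack", "paper towels"),
    ("another detective novel", "detective novel"),
    ("feather duster", "cleaning"),
]
kept = filter_recommendations(candidates, last_purchase, date(2017, 7, 5))
print(kept)
```

The second vacuum is suppressed four days after the purchase, while the paper towels, the novel, and the never-bought feather duster all pass through.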

> This is obviously speculation, but I think Amazon could come up with something better if they felt it pressing i.e. recommending the same product category was no better than random products.

You'd think they'd have so much data on buyer behaviour that they could recommend you e.g. other cleaning products that vacuum cleaner buyers tend to buy, like mops, dusters and sprays. I can't imagine how repeatedly recommending the same thing you just bought, which you're not likely to buy again for years, is going to yield the best results. Even manually selected recommendations would do better than that, but I'm willing to be convinced, as obviously it's in their best interest to be smart here. You'd think you'd even explicitly build in a rule in your recommender system not to recommend something that has been recently bought.

I wonder if the issue is that essentially, nearly everyone buys a vacuum at some point. So, you end up where the only distinct group of vacuum-buyers from whom you can drive recommendation behaviour is people who use Amazon only to buy vacuums, basically you're clustered into either the entire world or a tiny group of Amazon vacuum fanatics.

Another issue might just be that the recommendation engine updates almost immediately: go to the homepage after taking any action (browsing, buying) and it's already updated. So they might've gone for "good enough, but fast" rather than "accurate, but slow".

Same experience. I ordered a pair of speaker stands from Amazon on Monday.

This morning they sent me an email saying "Amazon.com has new recommendations for you based on your browsing history." It was a bunch of other speaker stands.

Well, do you happen to know how many other people are repeat buyers of similar products? Maybe a large number of buyers end up buying a second set of speaker stands because they don't like the ones they just bought or because they're enthusiasts who gift them or have multiple audio setups. Just because it seems unlikely someone would be a repeat buyer in a given product category doesn't mean it isn't profitable to target every buyer as if they might be.

Almost no one? Obvious anecdote alert, but I don't know a single person who shops on Amazon who hasn't made the exact same observation. Many/most big ticket items are, for most people, a once-every-X-years purchase, and yet Amazon's recommendation system can be counted upon to suggest another TV, another DSLR...it's remarkably terrible for a system that's been under active development for twenty years.

Plus my speaker stands haven't even been delivered yet. Amazon knows this.

Anecdata ahoy: I bought a TV from Amazon and they stopped showing me more TVs after I rated it. Maybe they think you're still in the market and might send those stands/DSLRs back because you didn't rate them.

If Amazon's default assumption is "The product we sold you was garbage, maybe you want to buy a different one instead" then I think they have even bigger problems.

EDIT: And if I liked it so much that I wanted to gift it to someone, why show me the 5 competing stands instead of the one I know is good from personal experience?

A person who has multiple audio setups wouldn't need prompting about it either. If they like the ones they got, they'll order them again. If they didn't like them, they'll look for something else without needing an email about it.

It's an interesting point, but I would hope that they would factor in the negative impact on non-repeat-buyers. Some buyers (at the very least: me) feel remorse or regret over their purchase if they are presented with a bunch of other options which may be better or cheaper than whatever they chose to buy.

Yep, this. I look at something and it recommends it to me for months. I buy something and same thing. It's quite obtuse. Maybe there's something to it, maybe it does a good job growing revenue, but there's nothing elegant about it.

> Maybe there's something to it, maybe it does a good job growing revenue, but there's nothing elegant about it.

It almost certainly does a good job growing revenue, and they almost certainly care more about revenue than about the perception of "elegance." You say that as if it's unusual or unexpected.

> You say that as if it's unusual or unexpected.

Absolutely not :P. Hey, whatever works. But I think many people on this subthread, including me, are wondering whether something more elegant would produce more revenue.

To be fair, I myself have two vacuums. Maybe buy another and see what happens.

Given that Amazon's system can be described as: "... for every item i1, we want every item i2 that was purchased with unusually high frequency by people who bought i1", I'd say it's completely rational that their system recommends you vacuum cleaners if you've already bought one.
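A toy version of "purchased with unusually high frequency" can be sketched by comparing how often two items are actually bought by the same customer with how often that would happen if purchases were independent. The independence baseline, the score `(observed - expected) / sqrt(expected)`, the name `related_items`, and the data are all assumptions made for illustration; the production system is surely more careful than this.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

def related_items(histories, top_k=2):
    """For each item i1, list the items i2 co-purchased more often than
    a naive independence baseline predicts, best first.

    histories: {customer: set of purchased items}.
    """
    n_customers = len(histories)
    bought = Counter(item for items in histories.values() for item in items)
    together = Counter(
        pair
        for items in histories.values()
        for pair in permutations(sorted(items), 2)
    )

    scores = defaultdict(list)
    for (i1, i2), observed in together.items():
        # Co-purchases expected if buying i1 and buying i2 were unrelated.
        expected = bought[i1] * bought[i2] / n_customers
        score = (observed - expected) / sqrt(expected)
        if score > 0:  # keep only "unusually high" co-purchase counts
            scores[i1].append((score, i2))

    return {
        i1: [i2 for _, i2 in sorted(scored, reverse=True)[:top_k]]
        for i1, scored in scores.items()
    }

# Hypothetical purchase histories, for illustration only.
histories = {
    "a": {"vacuum", "vacuum bags", "novel"},
    "b": {"vacuum", "vacuum bags"},
    "c": {"vacuum", "novel"},
    "d": {"novel", "bookmark"},
    "e": {"novel"},
}
related = related_items(histories)
print(related)
```

Note that the novel is bought by two of the three vacuum buyers yet is not recommended for the vacuum, because nearly everyone buys the novel anyway. Whether a second vacuum would score well under such a scheme depends entirely on the real co-purchase data.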

I bought an expensive camera in March, they still insist I need 5 more from different vendors.

The suggestions are probably based on browsing history rather than on buying history. It's not that surprising; you likely browse for multiple items, which helps make inferences about what you might want to buy. You only buy a small fraction of the items you browse, many of them unrelated. Taking one of those points (your purchase) and guessing that it belongs to a cluster of points that shouldn't be suggestions is not obvious. Further, basing suggestions on purchases might give terrible results given that your purchases are mostly unrelated.

It's certainly an annoying behaviour - but my personal suspicion is that it arose from the system's origin as the recommender for a bookstore.

I mean if I read a book by P.G.Wodehouse it in fact IS likely that I want to read another similar (but not identical) book by P.G.Wodehouse.

Quite how they managed to hang onto this quirk is a different question that I can't even fathom the answer to (I don't believe they haven't noticed and I find it very hard to believe this is actually the optimum sales technique for vacuum cleaners!)

Maybe they've worked out that showing people more of what they just bought makes people go on and on about Amazon's recommendation system all the time.

And when people are complaining about something they endlessly repeat "Amazon Amazon Amazon Amazon".

When you have nothing to complain about you're less likely to evangelise.

In some senses, their reasoning is rational. Some models work for some situations better than others...that recommender model might work exceptionally well for someone trying out different brands of the same consumable product (like natural deodorants, for example). For other products, an accessory product recommender model might work better (like for vacuums and vacuum attachments). And since the number of scenarios is infinite, you might as well try as many as possible and stick with all of the ones that show promise.

Which is fine until 90% of your browsing consists of scrolling past different recommended product widgets, which seems to be the case at Amazon now. At some point, taking different working models and trying to work on converging their benefits into a more monolithic model would be a huge benefit.

Don't think there is much rationale behind all these models. It's more like P(would buy a vacuum | bought a vacuum) > P(would buy X | bought a vacuum) where X is a single product. Now P(would buy a vacuum | bought a vacuum) < sum(P (would buy X | bought a vacuum)) for X that is not a vacuum, but what would be the recommendation? Hey, you bought a vacuum, come back and buy some non-vacuum stuff?

For most recommendation UIs, you would need a hero item that makes people want to click. It might turn out that another vacuum is probably the best item for some people to click on, and they go on to buy other stuff once they are on the site.

The reason you see such an obvious false positive in this case isn't because people who bought vacuums are likely to buy another, but rather that people who look at vacuums are likely to buy a vacuum, and the model hasn't accounted for whether you've already bought one.

A different recommender might use different types of conditionals (items bought instead of items looked at, for example), and also have success in different areas (like recommending iPhone cases for iPhone owners). In order to converge the models in a Bayesian framework you'd have to deal with the combinatorial explosion of products and event conditionals, which might be pretty gnarly. But some convergence work would be better than none, otherwise you end up with 20 different recommender widgets on a page.

Overall I don't think Amazon's approach to date has been bad...it's just time to clean up a bit.

Why would you need different models like "accessory recommender"? If they have enough data to conclude that purchasers of itemA are highly likely to purchase itemB, why not just recommend itemB, even if itemA and itemB are both vacuums and a few people will complain about that on technology forums?

We always had three vacuums in the house growing up: a main canister vacuum (kept in a closet), an upright for the hard kitchen floor (kept handy), and a dustbuster (also kept handy).

Three tools with different uses.

My personal ultimate example of this: I bought a lawnmower, and then for two months all of my Amazon advertisements, the ones following me around the web, were for lawnmowers.

Even with a vacuum I could see some weird "yes, maybe I could buy one as a gift." But a lawnmower? No.

Just for months? Amazon keeps recommending me the same 90s Radiohead albums based on stuff I bought a decade+ ago. Retargeting can be so on point these days so I don't get why Amazon's recommender systems have been horrible for years. All they have to do is to peek at my cart, which I use as a wishlist, to figure out what I'm currently interested in.

Well, if you bought some particular brand I won't name, you will need one per year.

I read this paper last week and took notes. Here is my summary of the paper. It was a fun read. http://muratbuffalo.blogspot.com/2017/07/paper-summary-two-d...

> initially I was worried I wouldn't understand or enjoy the article

Surprised to read this coming from an engineering professor. I guess we mortals have more in common with professors than we think :) thanks for the humility.

Thanks for these and other posts and generally putting in effort and making the commentary publicly available.


20 years, and I still can't see a chronological listing of the other books in a series when shopping for my Kindle.

Sometimes it works surprisingly well, I think I remember a presentation where somebody demonstrated that searching for components of a meth lab on amazon would have amazon recommending to you other components to help you finish the lab.

I investigated different algorithms for recommendation systems once and was astonished to find that the best one was also the most simple one: Jaccard similarity. No other similarities nor a proprietary custom built algorithm (which was the actual target of the investigation) could beat it.
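For reference, Jaccard similarity is just the size of the intersection of two sets divided by the size of their union. A minimal sketch follows, assuming each item is represented by the set of users who bought it (attribute sets would work the same way); the item names, user ids, and `most_similar` helper are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(target, buyers_by_item):
    """Rank all other items by Jaccard similarity to `target`, best first."""
    return sorted(
        (
            (jaccard(buyers_by_item[target], buyers), item)
            for item, buyers in buyers_by_item.items()
            if item != target
        ),
        reverse=True,
    )

# Hypothetical data: item -> set of users who bought it.
buyers_by_item = {
    "game A": {"u1", "u2", "u3"},
    "game B": {"u2", "u3", "u4"},
    "game C": {"u5"},
}
ranking = most_similar("game A", buyers_by_item)
print(ranking)
```

Games A and B share two of four distinct buyers, so they score 0.5, while game C shares none and scores 0.0.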

This implies that you know all the relevant attributes of the items you're recommending. For something like music, books, or movies, there might literally be thousands of them, most of which are unidentifiable, any of which might differ in importance from one item to another.

I built a collaborative filtering system (http://web.onetel.com/~hibou/morse/MORSE.html) and didn't have to worry about who directed a film, when it was released, where it was filmed, its budget, who starred in it, or what its genre or plot elements were. All the relevant information was implicit in how the people who saw it rated it out of ten.
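To illustrate how ratings alone can carry the relevant information, here is a minimal sketch of textbook user-based collaborative filtering: Pearson correlation over co-rated titles, then a similarity-weighted average of neighbours' deviations from their own mean. This is not the algorithm behind the linked system, which is not described in this thread; the names `pearson`, `predict`, and `min_overlap`, and all the ratings, are assumptions for illustration.

```python
from math import sqrt

def pearson(r1, r2, min_overlap=2):
    """Pearson correlation over co-rated titles.

    Returns 0.0 when fewer than `min_overlap` titles are shared or when
    either user's shared ratings have no variance.
    """
    common = set(r1) & set(r2)
    if len(common) < min_overlap:
        return 0.0
    m1 = sum(r1[t] for t in common) / len(common)
    m2 = sum(r2[t] for t in common) / len(common)
    cov = sum((r1[t] - m1) * (r2[t] - m2) for t in common)
    v1 = sum((r1[t] - m1) ** 2 for t in common)
    v2 = sum((r2[t] - m2) ** 2 for t in common)
    if v1 == 0 or v2 == 0:
        return 0.0
    return cov / sqrt(v1 * v2)

def predict(user, title, ratings):
    """Predict `user`'s rating of `title` from positively correlated users."""
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or title not in theirs:
            continue
        sim = pearson(mine, theirs)
        if sim <= 0:
            continue  # ignore users with opposite or unrelated taste
        their_mean = sum(theirs.values()) / len(theirs)
        num += sim * (theirs[title] - their_mean)
        den += sim
    # Fall back to the user's own mean when no neighbour qualifies.
    return my_mean + num / den if den else my_mean

# Hypothetical ratings out of ten, for illustration only.
ratings = {
    "ann": {"film A": 9, "film B": 3, "film C": 8},
    "bob": {"film A": 8, "film B": 2, "film C": 9, "film D": 9},
    "cat": {"film A": 2, "film B": 9, "film D": 3},
}
prediction = predict("ann", "film D", ratings)
print(round(prediction, 2))
```

Ann's taste tracks Bob's and runs opposite to Cat's, so only Bob's enthusiasm for film D moves the prediction above Ann's own average. Raising `min_overlap` trades coverage for confidence when users share few rated titles.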

Thank you for the link, I intend on studying it carefully this weekend. I may want to implement something similar using board game data from boardgamegeek.com (I've already been working on a recommendation system). BGG has a 1-10 point rating scale (using floating point numbers), so I suspect the method described will fit.

My only concern is the number of mutually rated titles between users in board gaming is probably lower than movies. Which I suspect will reduce confidence rates.

The current approach (with a Jaccard system) requires a degree of human intervention and only works for some users. It meets the goal of recommending titles for me, but it'd be nice to expose the system externally.

Could be interesting for you: https://en.wikipedia.org/wiki/Netflix_Prize

Funny fact: 4 years ago I was looking for something that I would today call a "transitive similarity index" to measure the similarity of shopping baskets from brick-and-mortar stores. Because I didn't find anything, I "invented" my own algorithm. This year I implemented the same algorithm in a system which has been live since yesterday. Googling the Jaccard similarity, I just realized that I re-invented a transitive version of it. Now my baby has a name. :)

Yep, I am definitely going to buy a $800 camera the day after I bought a slightly better one!

Related: Deep Neural Networks for YouTube Recommendations https://research.google.com/pubs/pub45530.html (Discussion: https://news.ycombinator.com/item?id=12426064)

Anecdotal, but I've been their customer for all those years, and besides the few books early on, absolutely one hundred percent of their recommendations were utterly useless.

I bought some embedded system accessories two years ago for college and I am still getting offered embedded system accessories.

Amazon's recommender system is garbage (for me).

I use the amazon app to price match at another retailer.

The last scan or search always is somewhere in a recommendation (& one of my game apps)

If you want to see a truly terrible recommendation system try aliexpress.

Indeed. But then I end up looking at things I would never have searched for. Today I was looking for a UV sensor module and searched for "8511", a portion of the part number for a common one. First 4 results were UV sensors. Next one was "Women Police Costume Uniform Secductive Role Play Sexy Lingerie Sex Outfit Babydoll Fancy Halloween Costume Sexy Cop Outfit", followed by "Synthetic Druzy Mineral Stone Double Flared Saddle Ear Gauge Wood Flesh Tunnel Plug Piercing Body Jewelry Expanders". To be fair, I should have included "uv sensor" in my search phrase. But then I would have missed that cop outfit.

Just in the event someone out there is looking to work on these types of problems, Amazon's Personalization teams are absolutely hiring.

Homepage: https://www.amazon.jobs/en/teams/personalization-and-recomme...

Applied Science: https://www.amazon.jobs/jobs/372996

Software Engineering: http://www.amazon.jobs/jobs/549950 http://www.amazon.jobs/jobs/402127 http://www.amazon.jobs/jobs/430623

Software Managers: http://www.amazon.jobs/jobs/385902 http://www.amazon.jobs/jobs/437405 http://www.amazon.jobs/jobs/489879

If none of those strikes your fancy but you are passionate about working in this space, feel free to send me an email: ${HN_USERNAME}@amazon.com.

Just think: all those things you can't stand that Amazon's recommendation engine does, here's your chance to fix them for everyone.

Given how long some of these problems have been going on, it may be a chance to learn why it's not possible to fix them within their current organizational structure.

"Oh wow, so you have discovered that after someone buys something, and you keep recommending the same thing, a large number of people go out of their way to buy other stuff to remove the recommendation. So we actually make more money recommending the last thing that they would want to buy?"

Are they hiring a product manager?
