Instagram’s Explore Recommender System (instagram-engineering.com)
109 points by YoavShapira on Nov 26, 2019 | 55 comments



It can go pretty wrong - look at a picture of a gun one time and IG will start shoving gun pictures down your throat. It's horribly unforgiving and can't easily be tuned by the end user. You learn pretty quickly never to look even once at something unless you want a whole lot of that same thing force-fed to you. On the other hand, start looking at huskies and you'll get tons of puppers filling your explore. :)


Agreed, it would be preferable if you could turn it off altogether and just get a wide variety of popular posts as that seems to match the title of 'explore' more closely. I'm into lifting but my explore has turned into a flood of fit instagram models.


> I'm into lifting but my explore has turned into a flood of fit instagram models.

This has been my experience as well. I end up using the "Don't show images like this for this hashtag" feature but they always seem to come back.


This is so amusing to me. I imagine their team goes to great lengths to obtain training data, clean it up and feed it to their ML algorithm.

Yet the most critical input data - direct feedback from the user on the traits that shape their interests, is completely ignored/not collected.


Presumably IG is optimizing for something different than what you as a user would be optimizing for. Possible candidates would be “engagement” or ad impressions or clicks. As a user you probably don’t care to maximize how many ads you see or even how much you use the app. You just want to see things that interest you. And of course there’s absolutely no reason why IG would optimize for that.


You may or may not be right. I don't use instagram because of this. But how could they possibly know how relevant the data is to even their model since they do not even collect that data in the first place?


Well, as others have noted, apparently they do have some affordance for that. But in general, the argument "why don't they collect data X to see if it's relevant to their goals" could be applied to literally any possible data X. Of course intuition must be used to focus data collection efforts.


This is one of the reasons why I would never use such an app. If a tool would rather guess what I want (and obviously get it completely wrong) rather than ask me then I consider the tool defective and stop wasting my time with it.


That is one of the things I like about the YouTube one - the option to remove from your view history seemingly does have an effect on the recommendation engine.


> seemingly

It definitely does. The homepage goes back to recommending a mix of my subscriptions and content generally related to them.

I like to reset my Youtube viewing history a couple times a month to see which direction my viewing habits will take my recommendations this time. I'll often ratchet into new territory. This month it was Warcraft 3: Reforged gameplay (Grubby), a game I haven't played in 15+ years.

The month before it was fiction book review channels. It's fun to change it up.

In Youtube's recent homepage update this month, they also rolled out a "Don't recommend this channel anymore" button on the dropdown which is very welcome.


Thanks for this suggestion — will try this as a way to escape the ‘filter bubble’. To clarify: do you reset completely, retaining absolutely no likes, subscriptions, etc., or do you find it sufficient to clear your watched videos?


That happened to me! For some reason IG decided that I was super interested in barbers of arab countries (what I assumed were arab countries) and it was constantly showing me barbershop videos and fancy men's hair cuts. I had to block all those accounts just to stop the spam (and thankfully they weren't that many).


Same thing happens with youtube, at least in my experience. Whenever I click by mistake on a video that really doesn't interest me, I feel like I'm doomed.


If your watch history is on, removing that specific entry usually fixes the problem. Though, if your watch history is off and empty...


Thanks for the hint, gonna clean something


Youtube seems heavily biased towards things I’ve viewed most recently on the homepage. I actually like Youtube’s recommendations more than any other website, especially the similar videos section on video pages themselves. It seems to recommend genuinely related videos and usually ones with a certain base level of ‘quality’. It’s good at things like picking out dates in the video title and listing other videos for that date/event.


I got hacked at one point, and I really should restart my account it's so bad. Every time I pull it up it's just big butts...


Instagram's Explore page is encouraging narrowcasting, which is arguably very detrimental to society/users in general.


There is a "see fewer posts like this" button (as mentioned in the text)

But yeah it can be off-putting.


This feature currently isn't present in either the web or pwa version of Instagram.

That aside, my personal experience would be vastly improved by having a "hide all images with text in it" option.


This option is not working for me at all.

I wonder if it's because when I see a thumbnail of something unwanted, I first have to open the post in order to access the menu, so the app counts it as a view first, adding the unwanted post to things I "engage" with.


Try long pressing and then sliding the post up. You'll get the same menu without popping into the full post view. I don't think it counts as strongly, because I've been able to eliminate a lot of things this way.


I think it is driven by likes and bookmarks, if you browse your likes and bookmarks you might see the source of certain recommendations, and unliking and unmarking should control it. (I don't actually know, but this is how I believe it works.)


Not entirely. Likes and saves increase the chance of seeing some types of content, but I also started seeing guns and Kardashians at one point without ever liking a gun post or a Kardashian post. So simply viewing things will cause them to show up.


In my experience that button does nothing to prevent the spam of images and videos of a certain topic.


If instagram's product KPIs were more in line with what I wanted as a user, this would be great...but they're not and my explore feed is frequently filled with models, child musical prodigies, and other popcorn-esque content.

Compare that to Spotify, whose goal I presume is to get me to listen to more music and buy tickets and merch through their occasional marketing.

I'm a music snob but damn does Spotify get me great recommendations on new releases, my discover weekly, and more. Not only that, I've bought tickets through their frequent listener promotions probably more than 10 times at this point.


I've had terrible luck the past couple years with Spotify's Discover Weekly. Last time I remember it being good was Fall 2016. Now my "Discover" Weekly playlist has me "discovering" the same exact songs over and over.

I've been pigeonholed way beyond what I thought possible. Do other users really engage with the same 10 songs over and over so much that this is the default behavior of their recommendation engine?

I get "Discover" tracks which are from the same album I have downloaded to my phone!


Spotify's Discover Weekly has improved a lot since I started following different artists. At the beginning it was more like what you just said, the same music over and over every time, until I started following new artists.

The same thing applies for New Releases.


Good call. I agree with you and I don't feel that requiring that level of user engagement should be necessary to grant them a decent experience.


Yeah I'm getting this problem as well.

I need a more diverse and robust recommendation system from Spotify (at this point I'm addicted, I listen maybe 5 hours a day on average), and a lot of the time I get the same handful of songs again and again.

99% of the weeks Discover Weekly has come out, they have 1 or 2 real nice songs but the rest are the same "garbage" I've been listening to for a while.

Anyways, I agree, Discover Weekly needs a revamp.


I have a separate problem. I got a puppy, and there was a time when I had to put in headphones while I was crate training him and listen to rain sounds to fall asleep instead of his cries. Months later, all I get on my discover feed is a bunch of mellow rain/sleepy songs. I don't actually listen to that crap, I just want it periodically for certain things. There needs to be a way to tag songs as "DO NOT CONSIDER" for the discover feature.


Similar situation here, I use spotify for music during D&D games (you can find some amazing playlists on reddit / dnd forums), but now 20% of what's in my mixes is generic fantasy and halloween background ambiance.


What sort of KPIs would you prefer? I work on similar things at IG and curious what you would suggest.


I think it’s inherent in a free ad driven product that it wants me to spend as many minutes per day as possible browsing through images and videos. It’s like television; it’s not like IG wants me to look at one high value image and call it a day. If I listen to music all day however, I am still going about the rest of my life. Instagram success means my direct attention...direct attention is easier to get with racy photos and surprising content.


It's so interesting how powerful word embeddings (or in this case account embeddings) are. This reminds me a bit of the 538 article[1] about doing math on subreddits - I'd be interested to see what sort of math you could do on instagram account embeddings. What happens when you subtract two celebrities from each other?

Also, I'm curious about the tradeoffs of revealing this information - does knowing this make it easier to game the instagram algorithm? From this article I'd think that having a more narrowly targeted account (for example someone putting selfies on one account and landscape photos on another) might make their embedding more similar to others. Another thought is that maybe someone liking a bunch of things unrelated to their content would make them wrongly appear in certain explore pages.

[1] https://fivethirtyeight.com/features/dissecting-trumps-most-...
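To make the "math on account embeddings" idea concrete, here is a minimal sketch of subtracting one account vector from another and looking up the nearest remaining account by cosine similarity. The account names and 3-d vectors are invented for illustration; real embeddings would be learned from interaction data and have many more dimensions.

```python
import math

# Hypothetical toy embeddings (assumption: not real Instagram data).
EMBEDDINGS = {
    "chef_a": [0.9, 0.1, 0.0],
    "chef_b": [0.8, 0.2, 0.1],
    "surfer": [0.1, 0.9, 0.1],
    "surf_chef": [0.7, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def nearest(query, exclude=()):
    """Name of the account whose embedding is most similar to the query vector."""
    candidates = [
        (name, cosine(query, vec))
        for name, vec in EMBEDDINGS.items()
        if name not in exclude
    ]
    return max(candidates, key=lambda pair: pair[1])[0]


# Removing the "cooking" direction from a surfing chef should leave surfing.
difference = subtract(EMBEDDINGS["surf_chef"], EMBEDDINGS["chef_a"])
print(nearest(difference, exclude={"surf_chef", "chef_a"}))
```

With these made-up vectors the difference points toward the surfing account, which is the same kind of result the subreddit algebra in [1] produces.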


Is this really AI? This seems like simple classification and ranking. Honestly I didn't see anything new in there that hasn't been around for the past 10 years. KNN? NDCG? That's entry level ML. TFA does throw around neural networks a bit, but doesn't go into any detail.

EDIT: Maybe I'm just thrown off by the "Powered by AI" part of the article title. I was expecting more I suppose.
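For anyone who hasn't met NDCG, a rough sketch of the metric in plain Python follows. It uses the linear-gain form (relevance divided by a log2 position discount); some write-ups use 2^rel - 1 as the gain instead, and nothing here is claimed to be the exact variant Instagram uses.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each relevance is discounted by log2 of its position."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG of the given ordering divided by the DCG of the ideal (sorted) ordering."""
    ranked = relevances[:k] if k is not None else relevances
    ideal = sorted(relevances, reverse=True)[: len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# A perfectly ordered list scores 1.0; a reversed one scores lower.
print(ndcg([3, 2, 1]), ndcg([1, 2, 3]))
```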


Why would something need to use techniques less than 10 years old to qualify as being AI? The field of artificial intelligence has been around for well over 50 years. Long before the most recent boom in AI research and marketing, AI has been a serious academic field and has been taught as part of a standard computer science curriculum. Very basic techniques like A* search are widely recognized as being AI techniques.

There’s really no point in having a semantic argument, but just know that if you wish to do so you are arguing against many decades of wide usage of the term.


the concepts are simple but doing it at scale is hard. there are some good insights on scaling here:

* scaling to more engineers/products: IGQL is an interesting way to compose ML pipelines with straightforward syntax

* scaling to more ranking candidates: an active user with a large follow graph who loads the explore tab likely has millions of eligible candidates - how do you load those fast? the idea of using a "distilled" model as a first, light ranking before using a full model as the final predictor is a good intuitive idea that I haven't seen described before.

* scaling KNN is hard: FB has done interesting work to make approximate nearest neighbor search fast, and opensourced it (the FAISS library which is referred to in the post). the improvements here are certainly non-trivial.

* scaling to more users: creating useful general purpose user embeddings is hard!

* scaling to more objectives: instagram has many business objectives, e.g. likes, follows, minimizing hides, so there is a need to have multiple models making many predictions. There is also a need to weight them intelligently, which is where the Bayesian optimization libraries come in.

in some sense, nothing is truly AI, but this is useful work which you can learn a lot from.
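A rough sketch of two of the bullets above: a cheap first-pass score trims the candidate pool, then a weighted combination of per-objective predictions orders the survivors. Everything here is hypothetical, including the field names, the weights, and the fact that the "full model" predictions are precomputed; in a real system the expensive model would run only on the shortlist, and the weights would be tuned rather than hand-picked.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    media_id: str
    cheap_score: float  # stand-in for the light "distilled" model's output
    p_like: float       # stand-ins for the full model's per-objective predictions
    p_follow: float
    p_hide: float


# Hypothetical weights; a negative weight penalizes predicted hides.
WEIGHTS = {"like": 1.0, "follow": 2.0, "hide": -5.0}


def value(c):
    """Weighted sum of the per-objective predictions."""
    return (
        WEIGHTS["like"] * c.p_like
        + WEIGHTS["follow"] * c.p_follow
        + WEIGHTS["hide"] * c.p_hide
    )


def rank(candidates, first_pass_size, final_size):
    """Stage 1: keep the top items by cheap score. Stage 2: order them by weighted value."""
    shortlist = sorted(candidates, key=lambda c: c.cheap_score, reverse=True)
    shortlist = shortlist[:first_pass_size]
    return sorted(shortlist, key=value, reverse=True)[:final_size]


pool = [
    Candidate("a", 0.9, 0.5, 0.1, 0.0),
    Candidate("b", 0.8, 0.9, 0.0, 0.2),
    Candidate("c", 0.7, 0.3, 0.3, 0.0),
    Candidate("d", 0.1, 1.0, 1.0, 0.0),  # best final value, but cut in the first pass
]
print([c.media_id for c in rank(pool, first_pass_size=3, final_size=2)])
```

Candidate "d" shows the tradeoff of the cascade: the first pass is fast but can drop items the full model would have liked.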


Just read it as "Powered by big data" or "Powered by software" or "Powered by mathematics". It sounds fake and dumb. If it sounds fake and dumb to you, an ML practitioner, then maybe this article was not written for you. If you read the article itself, you get the feeling that AI was tacked on at the last minute, likely with an intent that has nothing to do with knowledge sharing.


True. It rightfully sits under the engineering category, because all of this is just engineering. Doing it at scale is challenging, but with Facebook's resources it is more a problem waiting to be tackled. The stakes are not high, because it CAN be tackled; it is just a question of how many resources you want to pour in.

What makes this mildly interesting is the IGQL, but again, without knowing the full syntax, it feels pretty restrictive.

Youtube is doing much more advanced stuff in this regard, like reinforcement learning at scale, compared to Instagram.


Anything moderately related to software and data is called AI now, because that's what gets the upvotes.


AFAIK classification, deep learning, ML, etc. all are valid subsets of AI


It's more like complex linear regressions. The term "AI" evokes the idea of computers possessing something similar to human intelligence which it very much is not. We don't even properly understand how the human brain works.


There’s no need for AI to actually work the same way that the human brain works, and that’s never been part of the definition of “AI.”


Yeah, I am totally with you. AI can mean linear regressions and often does.

"look Ma i'm writing my own AI algorithm!!!" ... "writes linear regression by hand in python"


During a recent discussion on this topic, this emerged with regard to "what is AI":

A. System should, without prompting, identify areas of improvement and innovation

B. Automated collection of data and the processing thereof, combined with application towards a concrete goal — does not qualify under A.

C. Part of A. is willing and unwilling discovery and exposure to both benevolent and adversarial environments and operating conditions


This is just moving the goalposts around so that you can take advantage of the hype boost you get from calling everything "AI".


Embeddings have only really been popularized in the last few years. The 'original' word2vec paper was 2013. It's really only been maybe two years that using embeddings on other things has been popularized.


http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

Bengio has a paper in 2003 that describes almost the same idea as word2vec (CBOW model to be exact).


I've worked at a large e-commerce company before and it's surprising how many of the basic techniques like embedding, seed accounts, round robin diversification are exactly the same. I used to wonder if some of the apparent idiosyncrasies in our system were shared by other companies. It's uncanny how much of it is industry standard.
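For readers unfamiliar with round robin diversification, a minimal sketch is below: take one item from each candidate source in turn so that no single source dominates the top of the list. The source lists and item names are made up, and real systems likely add more rules (per-source caps, score thresholds) than this shows.

```python
from itertools import zip_longest


def round_robin(sources):
    """Interleave ranked lists, one item per source per pass, skipping duplicates."""
    sentinel = object()  # marks exhausted sources so falsy items are still kept
    seen = set()
    merged = []
    for tier in zip_longest(*sources, fillvalue=sentinel):
        for item in tier:
            if item is not sentinel and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Three hypothetical candidate sources of different lengths.
print(round_robin([["a1", "a2", "a3"], ["b1"], ["c1", "a1"]]))
```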


Yeah, I have come to a similar conclusion. It is also very clear that unless academic papers are simple enough to be understood (like word2vec), they rarely make it to production.


So IG isn't using collaborative filtering? The whole process starts w/ simple NN search in the account embedding space. Those candidates are then passed to the ranking stack.

This makes sense w/ what I see in IG recs: past behavior is strongly reinforced w/ little diversity. Filter Bubble/Pigeon Hole problem.

So in conclusion, I would argue that the IG explore tab doesn't have ANY explore at all!


Interesting, and looks like the present choices have evolved over time.

Seems like they are moving towards a structured RL implementation. There are elements of it, a follow-up post on some components would be interesting.


Honestly...it feels pretty standard stuff.


[flagged]


smart. nobody questions it. corporate spends millions making shitty ads that basically say AI will make our farmers better and therefore help the world. I love when someone asks me about AI and i'm like yeah it's a pile of shit that nobody understands but sells like hot garbage. Now before any hardcore deep learning bros get at me, sure there are cutting edge AI applications and science stuff going on. But i'm talking about commercialized shit that sells AI like a feature. It's just a pile of poop but people gobble it up.



