Hacker News

Yes, the algorithm accounts for the exploration-vs-exploitation problem.[1] Right now it uses a simple epsilon-greedy strategy: 20% of the newsletters you receive are picked completely at random. I also use a technique I call "popularity smoothing," which limits the number of times that popular newsletters can be forwarded.

I think the concerns about ML causing echo chambers/other problems, while not completely unfounded, are overblown (perhaps due to the overall anti-big-tech sentiment). I think human behavior plus ease of sharing online is a much larger factor. I'm optimistic that ML filtering can actually help people get a much larger variety of information, which is one of my goals for this.

[1] https://en.wikipedia.org/wiki/Multi-armed_bandit
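The epsilon-greedy strategy described above can be sketched in a few lines. This is only an illustration under assumed names (`pick_newsletter`, `predicted_rating`), not the actual implementation:

```python
import random

EPSILON = 0.2  # fraction of recommendations drawn uniformly at random

def pick_newsletter(candidates, predicted_rating, rng=random):
    """Epsilon-greedy: explore with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return rng.choice(candidates)  # explore: uniform random pick
    # exploit: pick the candidate with the highest predicted rating
    return max(candidates, key=predicted_rating)

# Toy usage with a made-up rating table:
ratings = {"a": 4.2, "b": 3.1, "c": 4.8}
choice = pick_newsletter(list(ratings), ratings.get)
```

The "popularity smoothing" cap on forwards would sit on top of this, filtering `candidates` before selection.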




Thank you for sharing your approach, I find it very interesting. I've worked on some systems that also optimize for "serendipity"[1], which means optimizing not only for ranking accuracy but also for adding new relevant content into the mix. (I believe you need a lot of data for this to be viable, though.)

> I think the concerns about ML causing echo chambers/other problems, while not completely unfounded, are overblown

I disagree that it's overblown. This is a widely discussed topic in ML research, especially around recommender systems [0]. While I do agree that ML systems have enormous potential in augmenting human capability, we should be addressing possible flaws such as the one mentioned.

I'm personally interested in how to address bias in machine learning systems, as it impacts so much of my professional work. I also think bias, or the echo chamber, isn't unique to ML, since we see it throughout the world and the institutions that surround us; but we are in a unique position to address these problems directly in the systems we create.

[0] https://arxiv.org/abs/2010.03240

[1] https://link.springer.com/article/10.1007/s11390-020-0135-9


> I disagree that it's overblown.

Fair enough. To be clear, I'm not saying ML bias isn't a significant problem, and I'll certainly continue to address it as the project grows. Based on this, it sounds like we're mostly in agreement:

> But I also think bias, or echo-chamber, isn't unique to ML since we see it so much in the world and institutions that surround us, but we are in a unique position to address these problems directly on the systems we create.

I often run into people (usually not ML practitioners) who appear to think that ML inherently results in filter bubbles/bias (as opposed to specific ML implementations) and thus think the entire approach should be abandoned; whereas I think ML, with a proper focus on reducing bias, is one of our most promising options.


Instead of 20% being completely random, could you instead present some newsletters that are completely opposite to what the reader prefers? For example, if someone actively reads conservative political commentary, adding in some random newsletters about cars or travel doesn't reduce bias (correct me if I'm wrong). Wouldn't it be better to offer some percentage of liberal political commentary, and vice versa? My suggestion is imperfect because it leaves out the middle/moderate viewpoint and touts only the extremes, but ultimately I'm trying to drive toward not diversity of topics but diversity of viewpoints on topics of interest.


Something like that could be an option; see also this other comment.[1] However, every time I've thought "hey, what if we made the 20% better by...", my conclusion has been "oh, but then we might never recommend [items with some set of properties that should get recommended], and the algorithm won't be able to correct itself." Any time you try to optimize in a certain direction, you'll always introduce at least a little bias[2], and randomness at least gives a chance of breaking out of that. So something like this could be a good idea, but I wouldn't want it to completely replace random recommendations.

[1] https://news.ycombinator.com/item?id=27669863

[2] This experience has affected the way I perceive advice from people on the internet


Out of curiosity, I wonder if newsletter diversity would be improved if you made suggestions by clustering the newsletters by content (similar to your other comment about seeding the network), but picking from the newsletters furthest from any cluster.


Yes, I would like to do that--combine it with the rating prediction, so e.g. if two newsletters both have a predicted rating of 4 stars, prefer the one whose content is most dissimilar to the user's positively rated newsletters so far.
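That tie-breaking idea could be sketched roughly as follows; the content vectors, names, and scoring are illustrative assumptions, not the actual system:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pick_diverse(candidates, liked_vectors):
    """candidates: list of (newsletter_id, predicted_rating, content_vector).
    Sort by predicted rating (descending); break ties by preferring the
    candidate least similar to anything the user has rated positively."""
    def max_sim(vec):
        return max((cosine(vec, lv) for lv in liked_vectors), default=0.0)
    return min(candidates, key=lambda c: (-c[1], max_sim(c[2])))[0]
```

So with two 4-star candidates, the one whose vector is furthest from the user's liked content wins the tie.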



