

Building the Next New York Times Recommendation Engine - jprob
http://open.blogs.nytimes.com//2015/08/11/building-the-next-new-york-times-recommendation-engine/

======
flashman
I'm really impressed that NYT took the time to document this. It's always
interesting to see the different recommendation models evaluated and applied
to real-world situations.

I've been pursuing a collaborative filtering approach to product
recommendation lately ('people who bought this also bought that'), but perhaps
LDA would let me model our products based on their metadata ('people who
bought products broadly like this also bought products broadly like that').
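
For anyone following along, here is a minimal sketch of the 'people who bought this also bought that' idea as item-to-item collaborative filtering. It is only an illustration under stated assumptions: the `purchases` data, the function names, and the choice of cosine similarity over co-purchase counts are all made up for the example, not anything taken from the NYT article.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase baskets: user -> set of product ids (made-up data).
purchases = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove"},
    "carol": {"tent", "lantern"},
    "dave": {"novel", "lantern"},
}


def item_similarities(baskets):
    """Cosine similarity between items, computed from binary co-purchases."""
    item_count = defaultdict(int)  # number of buyers per item
    pair_count = defaultdict(int)  # number of buyers who bought both items
    for items in baskets.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), both in pair_count.items():
        score = both / sqrt(item_count[a] * item_count[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def also_bought(item, sims, k=3):
    """Return up to k items most similar to `item`, best first."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


sims = item_similarities(purchases)
print(also_bought("tent", sims))
```

Note that an item with no co-purchases gets no neighbours at all here, which is roughly the gap a metadata-based model such as LDA would be meant to fill.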

------
muktabh
We make a contextual recommendation engine as a service for online publishers
at our startup ParallelDots. We also discovered that tags don't really work
well for recommendations on our clients' websites. We ended up using
unsupervised word embeddings with autoencoders on top of them to solve the
problem. We still don't use it for personalization though, just for
contextually similar articles. Great to see some similar problems being
solved at the New York Times too. :)
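
A rough sketch of what "contextually similar articles" from embeddings can look like, heavily simplified: it averages word vectors and ranks by cosine similarity, and it leaves out the autoencoder step entirely. The tiny `WORD_VECTORS` table, the function names, and the sample headlines are invented for illustration; real embeddings would be learned from a corpus.

```python
from math import sqrt

# Made-up 3-dimensional word vectors; real embeddings would be learned
# from a corpus and have far more dimensions.
WORD_VECTORS = {
    "election": [0.9, 0.1, 0.0],
    "senate": [0.8, 0.2, 0.1],
    "vote": [0.7, 0.1, 0.2],
    "goal": [0.0, 0.9, 0.1],
    "striker": [0.1, 0.8, 0.0],
    "match": [0.1, 0.7, 0.2],
}


def embed(text):
    """Average the vectors of known words; return None if none are known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, articles):
    """Return the article whose averaged embedding is closest to the query."""
    q = embed(query)
    if q is None:
        return None
    scored = []
    for title in articles:
        vec = embed(title)
        if vec is not None:
            scored.append((cosine(q, vec), title))
    return max(scored)[1] if scored else None


articles = ["senate vote delayed", "striker scores late goal", "match ends in draw"]
print(most_similar("election results", articles))
```

Because nothing here depends on editor-assigned tags, two articles can match even when they share no tag, which is the problem described above.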

------
ThomPete
The way I see it, the primary thing to solve for any recommendation engine is
to optimize for serendipity, i.e. allowing you to get information you didn't
know you wanted.

This basically also means finding, e.g., articles that are not written by the
NYT.

Newspapers' problem is that their primarily omnibus approach to what's
relevant doesn't really do justice to the vast amount of insightful
information that exists out there.

So that's the whole issue, IMO, with all newspapers/media these days: they are
building silos where none should really exist, and this is one of the primary
reasons why people don't consider them valuable anymore.

~~~
volaski
Maybe you had no idea, but what you describe already exists in the form of
native ads and recommendation widgets like Outbrain. And here's what I do when
I run into them: I rarely click them. When I'm on NYT, I don't want to click
out to some "recommended" website that doesn't have as high journalistic
integrity as NYT (let's not get into a needless argument about whether that
itself is correct or not). My point is, I disagree with your argument that
"different source" has anything to do with serendipity. Also, you say silos
shouldn't exist, but I don't see a reason why. Sure, there are clearly cases
where certain companies siloing up their users' data is bad for humanity, but
in this case it doesn't even make sense (what even is a "silo" in this context
anyway?). People create silos because of demand. Imagine if NYT started
opening up and let any random guy on the web write articles on their front
page: what would they gain by "opening up" their silo? Most readers of NYT are
there exactly because it's a silo that guarantees a certain degree of quality.
Once they start "opening up", you'll probably be the first to say "yeah, New
York Times is done now, it's all low quality now".

~~~
ThomPete
I don't think we are talking about the same thing.

To give a sense of where I am coming from, I would like to refer to some of my
writing on the subject.

[http://000fff.org/#/slaves-of-the-feed-this-is-not-the-realt...](http://000fff.org/#/slaves-of-the-feed-this-is-not-the-realtime-weve-been-looking-for)

and

[http://000fff.org/#/how-to-think-like-facebook-and-twitter](http://000fff.org/#/how-to-think-like-facebook-and-twitter)

It's about something slightly different than what you seem to imply, sorry if
that was imprecise.

------
bcaine
Fun read. Topic modeling can be fascinating to work with.

Curious how they measured the performance of their model, and whether they
found a "best" number of topics for LDA, beyond which the model stopped
getting much benefit from having more topics.

I'd imagine an increased number of topics would have some interesting side
effects, such as producing recommendations that are too narrow.

------
doppenhe
We have built this and anybody can use it:
[https://Algorithmia.com/recommends](https://Algorithmia.com/recommends). 2
lines of JS to implement. Currently serving the geekwire.com recs. You can
also modify it further (see blog.Algorithmia.com).

The article is awesome though, good on NYT.

~~~
bydamn989
Nice product you have there. Not only did it take 10 minutes to run, it also
managed to return no results.

~~~
doppenhe
That should not be the case, sorry you had a not-so-great experience. Please
share the URL you used with Diego at Algorithmia dot com so we can debug and
get back to you.

------
ersii
I think it'd be great if you'd have this kind of information in your help
section later on, for anxious people like me who are very wary of even having
a recommendation engine at a newspaper. I was actually on my way to sign up
for a subscription after reading "A Renegade Trawler, Hunted for 10,000 Miles
by Vigilantes" by Ian Urbina - but held back for the moment to give it more
thought.

That said, I guess I could see a point in it, maybe retaining users /
subscribers if it's good enough. (I'd still appreciate it a lot more if this
functionality could be turned off for users who request it though.)

~~~
buckbova
My first reaction to your comment was that you're overreacting and that
targeted news stories based on topics you enjoy are a great thing.

But after some thought on the recommendation engine, this seems more like a
confirmation bias engine. Not something I'd want from a "news" source.

~~~
untog
I'm not really sure how it would be confirmation bias. The NYT doesn't have
multiple stories on the same topic with differing conclusions that it A/B
tests.

As the article states, it'll suggest articles about Hillary Clinton if you've
read articles about her previously, but it doesn't say it'll only give you
positive ones. There _is_ a chance that it'll narrow people's interests (if
you only read sports, for example) but that already happens anyway.

