
Are We Making Much Progress? Analysis of Recent Neural Recommendation Approaches - sndean
https://arxiv.org/abs/1907.06902v1
======
olooney
Eerily reminiscent of the replication crisis in psychology and the social
sciences. My key takeaways:

1) Half the papers couldn't be reproduced on a technical level. Publish your
code _and_ your data, people!

2) Most of these papers use "weak baselines" so they can show some kind of
improvement and get their paper published. I'm conflicted about this because
if we require every paper to beat state-of-the-art, we'd (collectively, as the
entire discipline) be lucky to publish one paper a year. From one point of
view, these papers actually represent a form of publishing negative results -
we tried this and it _didn't_ work - which isn't a bad thing. But the biased
way it's presented makes it harder to separate the wheat from the chaff.

3) It's not obvious that we're going to squeeze any more business value from
this particular stone. Sometimes all the useful information in a dataset can
be found with a fairly simple algorithm. Not everything benefits from a more
complex representation, and sometimes you can't fix that with regularization
or more data. Sometimes you just have to use the simple model and accept that
it captures all the signal that's available and everything else is noise.
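
To make (3) concrete, "a fairly simple algorithm" can be as basic as a
non-personalized popularity recommender. A rough sketch (illustrative names,
not from the paper):

    # Recommend the k globally most popular items the user hasn't seen yet.
    # Purely illustrative; "interactions" and "seen_by_user" are made-up names.
    from collections import Counter

    def top_popular(interactions, seen_by_user, k=10):
        """interactions: list of (user_id, item_id) pairs."""
        popularity = Counter(item for _, item in interactions)
        ranked = [item for item, _ in popularity.most_common()]
        return {
            user: [item for item in ranked if item not in seen][:k]
            for user, seen in seen_by_user.items()
        }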

~~~
joe_the_user
_It's not obvious that we're going to squeeze any more business value from
this particular stone._ (i.e. get more value from recommendations)

I could even argue "relying primarily on recommendations is a dark pattern";
in many platforms, the approach is replacing good search with good
recommendation. Essentially, you don't get a _say_ - a way to explicitly
specify your results - instead you get an opaque mix of "things you want and
things we want you to want" and you're supposed to be happy with this. You can
see this in action in Amazon recommendations, YouTube and Google - actual
control is replaced by "we know what you're thinking" (which indeed works some
of the time but is infuriating when it doesn't, and leaves lots of room for
nefarious effects, e.g. YouTube's fascist/extremist propaganda effects, etc.).

And someone always pipes in here with "users are dumb and can only deal with
recommendations since they can barely figure out toasters". Well, sites kind
of need to educate their users; users actually have learned a bit given the
dominance of the Internet over the last 30 years; and a general awareness of
the problems of recommendation-reliance can contribute to change - just as,
right now, a lot of supposedly not-dumb developers and managers think of
recommendations as a benevolent or merely neutral approach.

~~~
IfOnlyYouKnew
That's a rather cynical take. The objective function is clearly defined as
"giving recommendations the user agrees with". As such, a well-working
algorithm is clearly superior to search, almost by definition.
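
(For concreteness, "giving recommendations the user agrees with" is usually
operationalized as an accuracy metric over held-out interactions, e.g.
recall@k. A rough sketch, with illustrative names:)

    # Fraction of a user's held-out items that show up in the top-k list.
    def recall_at_k(recommended, held_out, k=10):
        """recommended: ranked list of item ids; held_out: set of item ids."""
        if not held_out:
            return 0.0
        hits = len(set(recommended[:k]) & held_out)
        return hits / min(k, len(held_out))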

Here's an optimistic cultural take: good recommendations take you outside your
bubble. It's _not_ following Return of the Jedi with Episode I, but changing
to something different, that you enjoy even though you would not have expected
it.

~~~
joe_the_user
_That's a rather cynical take. The objective function is clearly defined as
"giving recommendations the user agrees with"._

I wouldn't call it cynical at all. I view it as idealistic. The user should be
in charge; the user should have tools to formulate a query describing what
they want. Doubting the value of anything other than the user being in charge
goes with this. You can say we're each optimistic about something different,
but then we come to objective comparisons.

 _Here's an optimistic cultural take: good recommendations take you outside
your bubble. It's not following Return of the Jedi with Episode I, but
changing to something different, that you enjoy even though you would not have
expected it._

The thing to consider with your comparison is this: most YouTube
recommendations _don't_ give anything new at all. They're usually some other
thing from whatever is considered the top ten. Google gives ten mainstream
movies, YouTube gives the top ten of whatever song era you are looking at. The
engines _never, ever find "hidden gems"_. The rise of recommendation engines
resulted in a homogenization of the web - we have all experienced this. And
I'd claim better algorithms can't change that, since the algorithms just don't
have enough information about why a user likes song/movie/product X; even for
songs alone there are multiple qualities that different people look for (etc.;
back to the GP's and OP's fine arguments).

And further, if a user _wanted_ something new, you could have a "choose at
random" button that let them know what they were getting into. Further, I
don't think a certain amount of recommendation within tool-type processes is
bad, but it can devolve. Google rose by being more likely to indeed get people
what they were thinking of, but that vein has been mined to the point that all
that's left is ... low quality trash. But I'm optimistic there are ways to do
better.

------
azhenley
To answer if we are making much progress, here’s a quote: “Specifically, we
considered 18 algorithms that were presented at top-level research conferences
in the last years. Only 7 of them could be reproduced with reasonable effort.
For these methods, it however turned out that 6 of them can often be
outperformed with comparably simple heuristic methods, e.g., based on nearest-
neighbor or graph-based techniques. The remaining one clearly outperformed the
baselines but did not consistently outperform a well-tuned non-neural linear
ranking method.”
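
For a sense of what "comparably simple heuristic methods, e.g., based on
nearest-neighbor" look like in practice, here's a rough sketch (my own
illustration, not the paper's code) of an item-based kNN recommender over an
implicit-feedback matrix:

    # Score items by cosine similarity to the items a user already interacted
    # with; keep only the k most similar neighbors per item. Illustrative only.
    import numpy as np

    def item_knn_scores(X, k_neighbors=100):
        """X: dense binary user-item interaction matrix (n_users x n_items)."""
        norms = np.linalg.norm(X, axis=0) + 1e-10
        Xn = X / norms
        sim = Xn.T @ Xn                   # item-item cosine similarity
        np.fill_diagonal(sim, 0.0)
        for i in range(sim.shape[0]):     # sparsify to top-k neighbors
            weak = np.argsort(sim[i])[:-k_neighbors]
            sim[i, weak] = 0.0
        scores = X @ sim                  # user x item preference scores
        scores[X > 0] = -np.inf           # don't re-recommend seen items
        return scores                     # rank each user's row by score

The paper's point is that, once such baselines are carefully tuned
(neighborhood size, weighting scheme, etc.), most of the neural methods don't
beat them.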

~~~
kariluoma
The method that outperformed the others was MultVAE [1].

[1]: [https://arxiv.org/abs/1802.05814](https://arxiv.org/abs/1802.05814)
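
For context, the idea behind Mult-VAE is a variational autoencoder over each
user's bag of item interactions, trained with a multinomial likelihood and an
annealed KL term. A very rough sketch of the idea (my paraphrase, not the
authors' code; layer sizes and the beta weight below are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultVAE(nn.Module):
        def __init__(self, n_items, hidden=600, latent=200):
            super().__init__()
            self.enc = nn.Linear(n_items, hidden)
            self.mu = nn.Linear(hidden, latent)
            self.logvar = nn.Linear(hidden, latent)
            self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.Tanh(),
                                     nn.Linear(hidden, n_items))

        def forward(self, x, beta=0.2):
            # x: batch of (binary) user-item interaction vectors
            h = torch.tanh(self.enc(F.normalize(x)))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparam.
            logits = self.dec(z)
            # multinomial log-likelihood plus (annealed) KL regularizer
            nll = -(F.log_softmax(logits, dim=-1) * x).sum(-1).mean()
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
            return logits, nll + beta * kl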

------
anjc
There are fields (recommender systems in particular, in my opinion) where you
will not get published if you report a strong baseline and your approach does
not outperform it, but you may get published if you use weak baselines and
your approach outperforms them. This doesn't make sense to me, as surely it's
possible for promising new approaches to not yet outperform current
state-of-the-art methods.

These peer review aberrations incentivise such bad research practices, I
think.

------
yorwba
Previously:
[https://news.ycombinator.com/item?id=20495047](https://news.ycombinator.com/item?id=20495047)

------
joker3
Most papers in most disciplines end up not being worth much, and the
incentives regarding publishing hold across all of computer science. Is there
really something particularly bad going on in deep learning? Or is it just the
usual process?

~~~
newen
Most papers are slight improvements, or barely an improvement after
selectively comparing against other methods or using particular datasets that
work well with that method, or after mangling the results so much with
vagaries and statistics that the results look kind of good if you squint a
lot. That's just how it is because 99% of academics are not geniuses but they
still need to graduate or get their tenure. But the papers of non-geniuses
provide ideas and an environment for the geniuses to publish their work and
flourish, so it works out in the end.

