
Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform - inputcoffee
https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/
======
AndrewKemendo
This is not a research paper. In fact, most ML papers aren't research papers.
Compare the FB paper to the first result in biorxiv under the genetics heading
[1]. There are basically no similarities other than being done in LaTeX. I
never expect a research paper to talk about how the research affected business
processes, but again this isn't research in any traditional sense.

What this is, is documentation of how Facebook implemented a technology stack
that uses reinforcement learning techniques to do something. Namely:
"Notifications at Facebook"

So what can other developers and business owners take from this? I don't see
anything about the downstream product impacts. Does it improve conversion to
paid rates for users? Does it reduce human labor? How does it improve benefits
to users? All I see them write are two things:

"We observed a significant improvement in activity and meaningful
interactions by deploying an RL based policy for certain types of
notifications, replacing the previous system based on supervised learning."

I'm sorry but there is absolutely nothing rigorous in that statement. How are
"meaningful interactions" defined? Hopefully they aren't still arguing the
formula (more interaction = makes users better off).

"After deploying the DQN model, we were able to improve daily, weekly, and
monthly metrics without sacrificing notification quality."

Improve for whom? Obviously for Facebook and for how much activity people
generate. Not necessarily for the user, who may not actually be getting more
value from it.

What's the Return on Investment for this system?

Listen, I'm a huge fan of being open with business practices, research
etc...I'm also obsessive about RL and making progress in the field.

What I can't stand, however, is the lack of rigorous, tangible proof that RL
is making things better for users or for society broadly, or even, in most
cases, that we're getting positive ROI for the effort we're putting into ML/DL.

I've built these tools at scale so it hurts to say this, but the economics
just aren't yet lining up here across the entire ML/DL industry and that has
me worried that another AI winter is coming.

[1][https://www.biorxiv.org/content/early/2018/11/01/422345](https://www.biorxiv.org/content/early/2018/11/01/422345)

~~~
ma2rten
This kind of paper which describes a system is not uncommon in computer
science. It's a way for future papers which use the system to cite it.

For example the scikit learn paper:
[http://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11...](http://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf)

~~~
AndrewKemendo
Indeed! This is an endemic issue in the computer sciences.

I don't think it's a bad way to approach knowledge dissemination, by the way;
it is, however, indicative of the broader problem of reproducibility and
explainability in AI.

I bring this point up simply to be another voice stating that we need more
rigorous methodology in AI research if we are going to make advances that are
focused first on knowledge, rather than primarily applicability of technology.

Way back before AI was cool, there was a good paper on this [1] that is very
relevant to today.

Quoting from the abstract:

"There are two central problems concerning the methodology and foundations of
Artificial Intelligence (AI). One is to find a technique for defining problems
in AI. The other is to find a technique for testing hypotheses in AI. There
are, as of now, no solutions to these two problems. The former problem has
been neglected because researchers have found it difficult to define AI
problems in a traditional manner. The second problem has not been tackled
seriously, with the result that supposedly competing AI hypotheses are
typically non-comparable."

[1][https://link.springer.com/chapter/10.1007/978-1-4471-3542-5_...](https://link.springer.com/chapter/10.1007/978-1-4471-3542-5_3)

------
inputcoffee
In case you want to see the actual paper: [https://research.fb.com/wp-
content/uploads/2018/10/Horizon-F...](https://research.fb.com/wp-
content/uploads/2018/10/Horizon-Facebooks-Open-Source-Applied-Reinforcement-
Learning-Platform.pdf)

~~~
Diederich
Thanks for the link!

I recommend that anybody who is interested in this read section 9 in this
paper: "NOTIFICATIONS AT FACEBOOK". It brings into focus real ways that this
technology is used and is useful.

~~~
sova
The description in the paper is cryptic and hand-wavy, but hey, I'm glad they
have good results. Going from pedals to a combustion engine clearly has
efficiency benefits, but they gloss over the real flesh of the fruit by not
describing how their particular engine is better or tuned. I wish they had more
discussion of notifications as Markov chains and of how they evaluate
interaction.

------
dheera
If they want to increase adoption, they really need to make this stuff easier
to install. I mean _zero_ friction. Either pip, or an Ubuntu PPA that "just
works".

Caffe2 install page: "We only support Anaconda packages at the moment. If you
do not wish to use Anaconda, then you must build Caffe2 from source." => We
are a company with a 400B+ market cap but are too lazy to support more than
one installation configuration. Good luck dealing with dependency hell, poor
ML grad student researcher.

MXNet install page: "You can either upgrade your CUDA install or install the
MXNet package that supports your CUDA version." => We welcome you with open
arms regardless of your configuration! No matter your setup, we have a
pre-built package for you!

~~~
Digitalghost
Hey! Have you tried our docker install? We tried to make it as simple as
possible.

I agree that installing without docker is a pain, but you should follow our
install guide. Particularly for Caffe2, it's included in PyTorch 1.0 so you
don't have to install it separately :-).

------
rjammala
Github repo:
[https://github.com/facebookresearch/Horizon](https://github.com/facebookresearch/Horizon)

~~~
maldeh
Ah, it has the fabled BSD+ license:
[https://github.com/facebookresearch/Horizon/blob/master/PATE...](https://github.com/facebookresearch/Horizon/blob/master/PATENTS)

I guess that's not too unusual for their open source contributions at this
point.

~~~
Digitalghost
Hey! Thanks for your feedback. I have to admit that I'm a license noob and
picked an option basically at random. I am changing Horizon to a BSD license
(no PATENTS part).

~~~
maldeh
Wow, that's... fantastic and generous, I wasn't expecting this response. The
change makes this framework so much more inviting to use.

(I'm also surprised Facebook's legal / open source policy would allow such
flexibility around licenses.)

------
sytelus
One of the most interesting parts of the paper is how RL is used, especially
since one of the goals for Horizon seems to be tackling problems where a
simulator isn't available. One such problem is push notifications:

 _Historically, we have used supervised learning models for predicting click
through rate (CTR) and likelihood that the notification leads to meaningful
interactions._

 _We introduced a new policy that uses Horizon to train a Discrete-Action DQN
model for sending push notifications to address the problems above. The Markov
Decision Process (MDP) is based on a sequence of notification candidates for a
particular person. The actions here are sending and dropping the notification,
and the state describes a set of features about the person and the
notification candidate. There are rewards for interactions and activity on
Facebook, with a penalty for sending the notification to control the volume of
notifications sent. The policy optimizes for the long term value and is able
to capture incremental effects of sending the notification by comparing the
Q-values of the send and don’t send action._
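To make that concrete, here's a toy sketch of what the serving-time decision described above could look like. None of this is from the paper or Facebook's code: the linear Q-function, feature names, and weights are all made-up stand-ins for a trained DQN. Since the send penalty is folded into the training rewards, the deployed policy only needs to compare the two Q-values:

```python
import numpy as np

SEND, DROP = 0, 1

def q_values(state, weights):
    """Toy linear Q-function standing in for a trained DQN.
    Returns [Q(s, SEND), Q(s, DROP)]."""
    return weights @ state

def decide(state, weights):
    """The per-notification send penalty is assumed to be baked into the
    training rewards, so serving is just an argmax over the two actions.
    Q(s, SEND) - Q(s, DROP) is the incremental long-term value of sending."""
    q = q_values(state, weights)
    return SEND if q[SEND] > q[DROP] else DROP

# Hypothetical 3-feature state: user activity, candidate relevance, recency.
state = np.array([0.8, 0.6, 0.2])
weights = np.array([[0.5, 0.9, -0.1],   # row for Q(s, SEND)
                    [0.2, 0.1,  0.3]])  # row for Q(s, DROP)
print(decide(state, weights))  # prints 0 (SEND): Q-gap 0.92 - 0.28 > 0
```

The interesting part is exactly the Q-value comparison: a supervised CTR model scores one notification in isolation, while the DQN's send/drop gap estimates the incremental effect of this send on long-term activity.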

------
Geee
This kind of AI applied at scale (Facebook) scares me, mainly because humans
are on the other side of the feedback loop: not only is the AI adapting, but
people are adapting too. Over time this kind of runaway feedback loop could
lead to anything.

~~~
wrkronmiller
How does that differ substantially from humans interacting with the
environment or other humans?

~~~
ramses0
Minimally: rate of change > capacity to adapt, and lack of natural predators,
right?

------
pesenti
Blog post: [https://code.fb.com/ml-
applications/horizon/](https://code.fb.com/ml-applications/horizon/) (would be
a better link if that can be changed)

------
amrrs
Discussion on Google's Dopamine - its Reinforcement Learning Framework
[https://news.ycombinator.com/item?id=15648746](https://news.ycombinator.com/item?id=15648746)

~~~
traek
That discussion is on Dopamine the startup, not the framework from Google.

Google's framework is at
[https://github.com/google/dopamine](https://github.com/google/dopamine) and I
don't believe it's generated discussion on HN before.

