
Lessons Learned Reproducing a Deep Reinforcement Learning Paper - fishfish
http://amid.fish/reproducing-deep-rl
======
Radim
_" Switching from experimenting a lot and thinking a little to experimenting a
little and thinking a lot was a key turnaround in productivity. When debugging
with long iteration times, you really need to pour time into the hypothesis-
forming step - thinking about what all the possibilities are, how likely they
seem on their own, and how likely they seem in light of everything you’ve seen
so far."_

Not to glorify bygone days, but isn't there a certain charm to having to wait
your turn to feed punch cards into a mainframe?

Having a "single shot", with long iteration cycles, indeed does something
strange to the way you approach programming and bugs and program design.

Anecdote: my long iteration cycles were caused by my mom restricting access to
my C64. I filled page after page (paper!) with code, anxious for any chance to
try it. I like to think I learned something during these "mental dry runs"…

 _On topic_ : I wrote a rant about the "Mummy Effect" of trying to reproduce
ML papers (which we do regularly): [https://rare-technologies.com/mummy-
effect-bridging-gap-betw...](https://rare-technologies.com/mummy-effect-
bridging-gap-between-academia-industry/)

~~~
ssivark
As they say, _a few days in the laboratory might save a few hours in the
library!_

Alternately, _a few days of doing might save a few hours of thinking._ :-)

~~~
hulahoof
The flavour I'd always heard was _a day of coding saves an hour of planning_

------
shmageggy
A lot of these points apply to non-RL and non-Deep-Learning projects as well.
I'm currently working on a machine learning project that has nothing to do
with RL, neural networks, Tensorflow, or GPUs, but I have encountered many of
the same surprises and have converged on many of the same solutions and
insights. Here are the ones that I think transfer over to general machine
learning projects:

\- Implementation time vs debugging time: I've spent vastly more time
debugging and rewriting than implementing. Furthermore, I was surprised at how
much time I spent just setting up the infrastructure to allow debugging.
Corollary: don't take shortcuts in the initial implementation. Literally every
hack I put in to get something working quickly has come back to bite me, and
I've had to rewrite or refactor every one of them.

\- Thinking vs doing, and being explicit: There are too many degrees of
freedom to just try random shit. One of the biggest tools I found (like the
author did) was writing everything down. I have four-plus notebooks filled
with literal prose conversations with myself, interspersed with diagrams and
drawings. I credit that with getting me past many mental roadblocks.

\- Measure and test: Without unit and integration tests, I'd probably go in
circles forever. It's too easy to change something to fix a problem and have
that break something else. I now have a test for even the most mundane
assumptions in my model and my code, after too many times realizing "oh,
before I changed X I could assume Y was true but that no longer holds". (A
sketch of what I mean follows this list.)
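
To make that last point concrete, here's a minimal sketch of the kind of
assumption test I mean (the file paths and the specific invariants are
hypothetical, just to show the pattern):

    import numpy as np

    def test_labels_are_binary():
        # Assumption relied on downstream: labels are strictly 0/1.
        # If a preprocessing change breaks this, fail loudly here rather
        # than silently corrupting the loss.
        y = np.load("data/labels.npy")  # hypothetical path
        assert set(np.unique(y)) <= {0, 1}

    def test_features_are_imputed():
        # Assumption: missing values were already filled in upstream.
        X = np.load("data/features.npy")  # hypothetical path
        assert not np.isnan(X).any()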

I feel like engineering for machine learning requires a different skill-set
than traditional programming and even traditional AI programming. Taking a
very calculated and deductive approach like the author suggests seems to be
crucial.

~~~
deong
This is exactly right.

When I was teaching machine learning classes, the biggest hurdle I had to get
a lot of students over was their history and expectation that they could
implement something, fix the compiler errors, fix the runtime crashes, and
then just declare victory. Machine learning is basically the study of getting
a computer to give you an answer you don't already know, so you have to be in
a constant state of defensiveness, never quite trusting that things are
correct.

------
m_ke
A good way to get around the long iteration cycles is to develop on subsets of
the data or on simpler tasks. With computer vision you can either start with
something like MNIST (which might not always be ideal, because you'll need to
adjust your architecture) or take a subset of about 1,000 examples from your
real training set and get your model to overfit it.
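
A rough sketch of that overfit check in Keras (the tiny architecture is a
placeholder, and MNIST is used only to keep the example self-contained):

    import tensorflow as tf

    # Take a small subset: ~1000 examples.
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_small = x_train[:1000].astype("float32") / 255.0
    y_small = y_train[:1000]

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # A healthy implementation should drive training accuracy to ~100%
    # on a subset this small; if it can't, something is broken.
    model.fit(x_small, y_small, epochs=50, batch_size=32, verbose=0)
    _, acc = model.evaluate(x_small, y_small, verbose=0)
    assert acc > 0.99, f"failed to overfit: train accuracy {acc:.3f}"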

Another thing that helps is having a way to queue up multiple experiments.
It's not that hard to set up with Celery.
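
Something like this is all it takes (the broker URL and the task body are
placeholders):

    # tasks.py -- start a worker with: celery -A tasks worker
    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task
    def run_experiment(config):
        # Placeholder body: train with the given hyperparameters and
        # write the results somewhere durable (disk, a database, etc.).
        print("running", config)

    # Then, from a driver script or a shell, queue up a sweep:
    #   for lr in (1e-2, 1e-3, 1e-4):
    #       run_experiment.delay({"lr": lr})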

Also, I'd really recommend starting with the dumbest baseline model possible
and getting the evaluation pipeline around it working. You can then focus on
iterating and measuring how much each change moves the needle.
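
For instance, with scikit-learn's DummyClassifier standing in as the dumbest
possible baseline (the dataset here is a placeholder):

    from sklearn.datasets import load_digits
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Placeholder data; swap in your real dataset.
    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # The end-to-end evaluation pipeline is the point: once this runs,
    # every real model gets compared against this floor.
    baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
    print("baseline accuracy:", accuracy_score(y_te, baseline.predict(X_te)))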

------
superdimwit
What is the tool you are using to write your 'logs'? (pictured in this image
[http://amid.fish/images/rl_logs.jpg](http://amid.fish/images/rl_logs.jpg))

~~~
jquast
That looks like a Jupyter notebook, or the FloydHub service he talks about
mid-article.

~~~
m_ke
I'm pretty sure that's Bear, a great markdown-based notes app.

[http://www.bear-writer.com/](http://www.bear-writer.com/)

~~~
fishfish
Yeah, Bear :)

------
thisisit
>Switching from experimenting a lot and thinking a little to experimenting a
little and thinking a lot was a key turnaround in productivity.

I think a corollary might be: know your data and the processing steps from the
paper very well.

