
Why Reproducibility Matters in Deep Learning - etrain
http://determined.ai/blog/reproducibility-in-ml/
======
mlthoughts2018
I don't think third-party management of the dimensions of variability is the
right idea (i.e. don't treat a third-party tool like the PEDL tool advertised
in the link as a problem-solver; anything third party is just yet another
dimension of variability that can make reproducibility harder).

Rather, start from a software craftsmanship mindset even for the earliest
stages of prototypes, and certainly for things that actually yield production-
facing models or tools.

One big piece of advice along these lines is to use containers, VMs, and
environment management tools, as well as build tools, all the time. This stuff
should absolutely be part of the first commits to your new prototype's repo.
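As a sketch of what that first commit might contain, here's a minimal Dockerfile. The image tag, package file, and environment variable names are illustrative assumptions, not recommendations from the post:

```dockerfile
# Pin the base image to an exact tag (or better, a digest) so the
# training environment cannot drift underneath you.
FROM python:3.10.13-slim

WORKDIR /app

# Install exact, locked dependency versions kept in source control.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Dimensions of variability live here, under version control.
ENV TRAIN_SEED=42 \
    DATA_VERSION=v1

CMD ["python", "train.py"]
```

The point is that every version number and knob is in the repo from day one, so "it worked on my machine" never enters the picture.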

Choose something that works well for the team overall, codify it in writing
with some team agreements on the expected standards, and then really stick to
it. Banish the temptation to just tweak some code and run an experiment in an
ad hoc way. You'll quickly find that this constraint lets you solve problems
_faster_ and enables _more_ experimentation, even if it feels at first like
your ad hoc freedom is being encroached on.

Just one example: use a Docker container along with your programming
language's packaging tooling to standardize the _exact_ training-time
software environment, and use an external config file or environment
variables defined in e.g. a Dockerfile as the source-controlled means by
which dimensions of variability are allowed to change.
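To make that concrete, a training script might read every tunable knob from the environment, so the Dockerfile (or CI config) stays the single source of truth. The variable names (TRAIN_SEED, LEARNING_RATE, DATA_VERSION) are hypothetical, just to illustrate the pattern:

```python
import os

def load_training_config():
    """Read each 'dimension of variability' from environment variables
    (hypothetical names), falling back to source-controlled defaults."""
    return {
        "seed": int(os.environ.get("TRAIN_SEED", "42")),
        "learning_rate": float(os.environ.get("LEARNING_RATE", "0.001")),
        "data_version": os.environ.get("DATA_VERSION", "v1"),
    }

if __name__ == "__main__":
    config = load_training_config()
    print(config)
```

Nothing about the experiment is decided ad hoc at the keyboard; changing a knob means changing the Dockerfile or config file and committing it.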

Want to re-train the model exactly? It should be as simple as checking out the
repo and e.g. running one Make command that rebuilds a Docker container for
you, retrieves the right data for you, sets environment variables, etc.
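A sketch of what that single entry point could look like. The image name, env file, and helper script are hypothetical placeholders:

```make
IMAGE := my-model-trainer    # hypothetical image name

.PHONY: train
train:
	# Rebuild the exact environment from the committed Dockerfile.
	docker build -t $(IMAGE) .
	# Retrieve the pinned dataset (hypothetical helper script).
	./scripts/fetch_data.sh
	# Run training with the source-controlled variables.
	docker run --env-file train.env $(IMAGE)
```

Anyone on the team (or a CI machine) who checks out the repo gets the same model from the same command.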

Want to vary that model to do an experiment with a new version of TensorFlow,
a new CUDNN version, a new model architecture, swap in a new classifier
algorithm, etc.? Then make a branch in version control for your experiment
and change these items in source (whether the Makefile, the Dockerfile, or
the actual code), so that the experiment can be reproduced by CI and by
teammates, and they can easily see the code defining the experiment during
code review, e.g. in a pull request.

You would be totally free to use this article's recommended PEDL if you want,
but use it _inside the defined environment_, not externally as some ad hoc
thing installed on your machine or in your local work environment.

I could go on about the other benefits this brings, like discouraging people
from writing experiments in an inappropriate environment, such as a Jupyter
notebook, but maybe that is for another time.

