
Ask HN: How do you version control your neural nets? - mlejva
When I started working with neural nets I instinctively started using git.
Soon I realised that git wasn't working for me. Working with neural nets seems far more empirical than working with a 'regular' project, where you have a very specific feature (e.g. a login feature): you create a branch, implement the feature, merge it into your develop branch, and move on to the next one.

The same approach doesn't work with neural nets for me. There's 'only' one feature you want to implement - you want your neural net to generalise better/generate better images/etc. (depending on the type of problem you are solving). This is very abstract, though. You often don't even know what the solution is until you empirically tweak several hyperparameters and watch the loss and accuracy. This makes the branch model impossible to use, I think.

Consider this: you create a branch where you want to use, for example, convolutional layers. Then you find out that your neural net performs worse. What should you do now? You can't merge this branch into your develop branch, since it's basically a 'dead end' branch. On the other hand, if you delete this branch you lose the information that you've already tried this model. This also produces a huge number of branches, since there is an enormous number of combinations for your model (e.g. convolutional layers may yield better accuracy when used with a different loss function).

I've ended up with a single branch and a text file where I manually log all the models I have tried so far and their performance. This creates nontrivial overhead, though.
======
btown
If your neural net config is in a relatively standalone file, or you can mark
it with a special comment block, you could have your test runner actually read
the source file, regex it out, and concat the source block, date, current git
SHA, and performance metrics into a "neural_runs.txt" file. If something else
about your data pipeline is changing as well, e.g. filter settings on your
image preprocessing, you can throw that in there too.
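A sketch of what that runner hook might look like, assuming the config block is fenced by hypothetical `# NN-CONFIG-START` / `# NN-CONFIG-END` comment markers (the marker names and the function shape are my invention, not btown's):

```python
import re
import subprocess
from datetime import datetime, timezone

# Matches everything between the (assumed) marker comments.
CONFIG_RE = re.compile(r"# NN-CONFIG-START\n(.*?)# NN-CONFIG-END", re.DOTALL)

def log_run(source_path, metrics, log_path="neural_runs.txt"):
    """Extract the marked config block from source_path and append it,
    with a timestamp, git SHA, and performance metrics, to the run log."""
    source = open(source_path).read()
    match = CONFIG_RE.search(source)
    config = match.group(1) if match else "<no config block found>"
    # Best-effort SHA: falls back gracefully outside a git repo.
    sha = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True,
    ).stdout.strip() or "<no git sha>"
    entry = "\n".join([
        "=" * 40,
        datetime.now(timezone.utc).isoformat(),
        f"git: {sha}",
        f"metrics: {metrics}",
        config,
    ])
    with open(log_path, "a") as f:
        f.write(entry + "\n")
```

Calling `log_run("model.py", {"acc": 0.93, "loss": 0.21})` at the end of each training run is all it takes to build up the greppable history described below.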

If you check this in, then every commit will include the diff of everything
you tried to get there alongside the final source file, and additionally that
file will serve as a single historical record for everything you tried for all
time. Asking yourself a month later "did I ever try cross entropy" is as easy
as grepping the file.

Heck, you could insert into a database as well if you really wanted to, and
visualize your performance changes over time a la
[http://isfiberreadyyet.com/](http://isfiberreadyyet.com/) . Sky's the limit.

------
kixiQu
I am very interested to see what people's answers are for this, because I pine
for a version control system designed for the twists and turns of experimental
investigation rather than the needs of engineering implementation. I very much
suspect that some sort of structured approach to one's commit messages might
be key, and a careful mapping of DAG concepts to experimental ones--branching
as the modification of an independent variable, with a base commit selected as
the control point of comparison? Would one want to be able to rebase in order
to compare against a different point? What would the semantics of merges
represent?

------
cityhall
I've been trying to do this better recently after having some non-reproducible
results. I've settled on taking all hyperparameters (including booleans like
whether to use batch norm) from a global dict. Instead of commenting and
uncommenting lines, I look up a key with a default value, adding the default
to the dict if it wasn't there. Then I print and log the dict with the
results.

I end up with a bunch of code like:

    if get_param('use_convnet_for_thing1', True):
        convnet1_params = get_param('convnet1_params', None)
        thing1 = build_convnet(thing1_input, convnet1_params)
    elif ...

By logging the hyperparameter dict, source checkpoint, and rand seed, results
should be reproducible.
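A minimal sketch of such a `get_param` helper (the implementation and the `set_param`/`dump_params` names are assumptions, not the commenter's actual code): a module-level dict that records every default it hands out, so the final dict is a complete record of the run.

```python
# Global registry of hyperparameters used in this run.
_params = {}

def get_param(key, default):
    """Return the hyperparameter for `key`, recording the default
    in the registry if the key was never set explicitly."""
    if key not in _params:
        _params[key] = default
    return _params[key]

def set_param(key, value):
    """Override a hyperparameter before the run starts."""
    _params[key] = value

def dump_params():
    """Return the full dict, to print and log alongside the results."""
    return dict(_params)
```

Because untouched keys still land in the registry via their defaults, `dump_params()` captures every knob the run actually consulted, not just the ones you remembered to set.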

This works well for rapid iteration like in jupyter notebooks. For models that
take days to train, you might as well use source control for your scripts.

------
dwhitena
Great questions and discussions. I'm definitely passionate about versioning in
the context of models and data science for both data and code. I work full
time on the open source Pachyderm project (pachyderm.io), and we have users
versioning their data and models in our system. Basically, you can output
checkpoints, weights, etc. from your modeling and have that data versioned
automatically in Pachyderm. Then if you utilize that persisted model in a data
pipeline for inference, you can have total provenance over which versions of
which models created which results (and which training data was used to create
that version of the model, etc.).

------
taroth
Shameless plug: [https://hyperdash.io](https://hyperdash.io)

I got tired of maintaining one-off scripts to do recording, so I started
working with friends on a dedicated solution. Today it lets you stream logs
via a small Python library, then view individual training runs on an
iOS/Android app. Takes less than a minute to get set up.

We're planning on expanding to model versioning in the next few weeks.
Interesting to see how others are thinking about it. If you have model
versioning thoughts you don't feel like posting here, drop me a note at
andrew@hyperdash.io

------
agitator
Maybe write a shell macro to pull accuracy and error into the commit message
along with your comment on the changes. You could also automate branching when
your test results are worse than before: if you hit a dead end and realize down
the line that the experiment didn't go well, you can head back to where you
branched; if the end result works, you can merge back into your starting
branch.
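A rough sketch of the metrics-in-the-commit-message idea, written here in Python rather than as a shell macro; the `metrics.txt` file name and its contents are assumptions about what the training script writes out:

```python
import subprocess

def commit_with_metrics(message, metrics_path="metrics.txt"):
    """Commit staged/tracked changes, appending the latest accuracy and
    error (read from a file the training script writes) to the message."""
    try:
        metrics = open(metrics_path).read().strip()
    except FileNotFoundError:
        metrics = "no metrics recorded"
    full_message = f"{message}\n\n[{metrics}]"
    subprocess.run(["git", "commit", "-am", full_message], check=True)
```

With this, `git log --oneline` doubles as a crude experiment log, and the comparison against the previous run's numbers could be bolted on in the same place.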

------
rpedela
Is there any value to the code in failed attempts or do you just want a log of
things you have tried?

If the former, you could try a single experiment branch and use tags to denote
different experiments: add a tag when you finish an experiment, then overwrite
with your changes for the next experiment, and repeat. This would keep all the
changes without leaving a huge number of dead branches, and the branch could be
merged when necessary.
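One way this tag-per-experiment loop might be scripted (a sketch; the sequential `exp-NNN` naming scheme is an invented convention, not something the commenter specified):

```python
import subprocess

def tag_experiment(summary):
    """Create the next sequential experiment tag (exp-001, exp-002, ...)
    on the current commit, recording the result summary in the tag message."""
    out = subprocess.run(
        ["git", "tag", "--list", "exp-*"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    # Parse the numeric suffixes of existing tags to find the next number.
    nums = [int(t.split("-")[1]) for t in out if t.split("-")[1].isdigit()]
    next_tag = f"exp-{max(nums, default=0) + 1:03d}"
    subprocess.run(["git", "tag", "-a", next_tag, "-m", summary], check=True)
    return next_tag
```

Later, `git tag -n` lists every experiment with its summary, and `git checkout exp-007` recovers the exact code for any past attempt.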

If the latter, why not an experiment log that is checked in which has a
similar form to a change log? Or maybe create an issue and branch for each
experiment then update the issue with results and delete the branch?

------
p1esk
Why would you manually log your models? In my NN experiments, I automatically
write the list of all hyperparameter values and the corresponding performance
to a file. In addition, I automatically generate and save graphs showing the
results, typically one graph per nested 'for' loop.

------
kungito
Why is it so bad if you have many branches?

~~~
mlejva
It has several disadvantages: (1) it creates nontrivial overhead in your
workflow - you're basically creating git branches every few lines of code (I
guess part of it could be somehow automated). (2) It feels like overkill to
create a whole new branch just to, for example, change a single variable. (3)
A lot of those branches would remain "unfinished". By that I mean they would
exist only to inform you that you had tried that model of neural net; you
couldn't merge them into anything, since those changes would make your net
perform worse. (4) If you wanted to see the code of some specific model you
had tried before, you would always need to switch to a different branch. This
makes for a bulky workflow.

~~~
gayprogrammer
There's data, and then there's metadata. Consider how you want to shift
information between your files of data (variables), and the tree-like
structure in a version control timeline.

You shouldn't need to record _how_ the data changes directly alongside the
data. That would be like commenting out old code for every change instead of
making new commits. It just defeats the purpose of using version control.

Branching is metadata about _how_ you have based changes off a given starting
point, and committing records the actual, linear changes. Nobody ever said a
branch _must_ be merged into another branch--that's just typical of how bugs
get fixed--it's not important to the tree data structure.

I would think that switching branches to see the code of some specific model
would simplify my workflow; I would only have to manage a single set of data,
instead of accommodating multiple model versions in my working directory. If
you are going to be recording version/parent information every few lines of
code anyway, then you might as well do it in a tree/timeline data structure
with lots of tooling available.

You're right that branching could be a lot of overhead done manually as you
are doing it, but automating git (for example) to create or switch branches
should be neither difficult nor particularly slow. There are also ways to
avoid being overwhelmed by noise, such as searching only branches reachable
from specific commits, limiting the depth of results, etc.
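For instance, branch creation and the suggested contains-filtering could be scripted along these lines (a sketch; the `exp/` branch prefix is an assumed naming convention):

```python
import subprocess

def start_experiment(name):
    """Create and switch to a fresh experiment branch off the current commit."""
    subprocess.run(["git", "checkout", "-q", "-b", f"exp/{name}"], check=True)

def experiments_containing(commit="HEAD"):
    """List experiment branches that contain the given commit --
    a quick way to see which runs were based on it."""
    out = subprocess.run(
        ["git", "branch", "--list", "exp/*", "--contains", commit],
        capture_output=True, text=True, check=True,
    ).stdout
    # Strip the "* " marker git puts on the current branch.
    return [line.strip("* ").strip() for line in out.splitlines()]
```

Here each experiment still gets its own branch, but the branches become queryable metadata rather than a pile you have to eyeball.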

------
andbberger
Not just for neural nets - balancing experimentation against building reusable
tools is probably the biggest logistical challenge in scientific programming
in general.

I've converged to a workflow where I maintain a library with a main project
pipeline and reusable tools for the project, and do all scripting with jupyter
(all notebooks version controlled).

I've found that machine learning projects can be pretty effectively
parametrized with config dicts for data, training and the model. Each type of
config gets its own pipelined method that does all of the library calls -
pipeline_batch_gen, pipeline_train, pipeline_build_model.

Example of a poorly organized config from a project:

    model_config = {
        'optimizer': optimizer,
        'clip_grad': clip_grads,
        'name': model_name,
        'residual': residual,
        'n_conv_filters': n_conv_filters,
        'n_output_hus': n_output_hus,
        'activation': activation,
        'batch_norm': batch_norm,
        'output_bn': output_bn,
        'generation': generation,
        'data_spec': {
            'uniform_frac': uniform_frac,
            'include_augment': True,
            'batch_size': batch_size,
            'bulk_chunk_size': bulk_chunk_size,
            'max_bulk_chunk_size': max_bulk_chunk_size,
            'loss_weighter': loss_weight,
        },
        'train_spec': {
            'early_stopping_patience': early_stopping_patience,
            'lr_plateau_patience': lr_plateau_patience,
            'learning_rate': init_lr,
            'clip_grads': clip_grads,
            'partial_weight': partial_weight,
        },
    }

I've wanted to give Sacred a try
([https://github.com/IDSIA/sacred](https://github.com/IDSIA/sacred)) - it
looks promising, but I haven't tried it yet so can't comment.

I still tend to keep track of model performance by hand, though. But I always
have the notebooks I can go back to for reference. This is something Sacred
could help a lot with.

Another very non-trivial aspect of this kind of work is the compute/storage
infrastructure you need to scale beyond a single workstation.

We have a nice system here where $HOME lives on NFS and gets mounted when you
log into any machine on the network - I can hardcode paths in my code and
count on every worker having the same filesystem. I can't imagine how we would
do distributed jobs without NFS. That's not a very realistic solution for
homegamers though - you need a very fast network and expensive commodity
hardware. And sys admins.

Does anyone have a solution for that half of the problem? I've seen a number
of merkle-tree based data version control solutions recently...

