Ask HN: How do you version control your neural nets?
42 points by mlejva 95 days ago | 13 comments
When I started working with neural nets I instinctively started using git. Soon I realised that git isn't working for me. Working with neural nets seems way more empirical than working with a 'regular' project where you have a very specific feature (e.g. login feature): you create a branch where you implement this feature. Once the feature is implemented you merge with your develop branch and you can move to another feature.

The same approach doesn't work with neural nets for me. There's 'only' one feature you want to implement - you want your neural net to generalise better/generate better images/etc. (depending on the type of problem you are solving). This is very abstract, though. One often doesn't even know what the solution is until you empirically tweak several hyperparameters and look at the loss function and accuracy. This makes the branch model impossible to use, I think. Consider this: you create a branch where you want to use convolutional layers, for example. Then you find out that your neural net performs worse. What should you do now? You can't merge this branch into your develop branch, since it's basically a 'dead end' branch. On the other hand, if you delete this branch, you lose the information that you've already tried this model of your net. This also produces a huge number of branches, since there's an enormous number of combinations for your model (e.g. convolutional layers may yield better accuracy when used with a different loss function).

I've ended up with a single branch and a text file where I manually log all models I have tried so far and their performance. This creates nontrivial overhead though.




If your neural net config is in a relatively standalone file, or you can mark it with a special comment block, you could have your test runner actually read the source file, regex it out, and concat the source block, date, current git SHA, and performance metrics into a "neural_runs.txt" file. If something else about your data pipeline is changing as well, e.g. filter settings on your image preprocessing, you can throw that in there too.
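Roughly something like this (just a sketch; the "# --- config ---" markers and the metrics dict are assumptions for illustration):

    # Sketch: pull the marked config block out of the source file and append
    # it to neural_runs.txt with the date, current git SHA, and metrics.
    import re
    import subprocess
    from datetime import datetime

    def log_run(source_path, metrics, log_path="neural_runs.txt"):
        source = open(source_path).read()
        match = re.search(r"# --- config ---\n(.*?)# --- end config ---",
                          source, re.DOTALL)
        config_block = match.group(1) if match else "<no config block found>"
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"]).decode().strip()
        with open(log_path, "a") as f:
            f.write("=== %s @ %s ===\n" % (datetime.now().isoformat(), sha))
            f.write(config_block + "\n")
            f.write("metrics: %r\n\n" % (metrics,))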

If you check this in, then every commit will include the diff of everything you tried to get there alongside the final source file, and additionally that file will serve as a single historical record for everything you tried for all time. Asking yourself a month later "did I ever try cross entropy" is as easy as grepping the file.

Heck, you could insert into a database as well if you really wanted to, and visualize your performance changes over time a la http://isfiberreadyyet.com/ . Sky's the limit.


I am very interested to see what people's answers are for this, because I pine for a version control system designed for the twists and turns of experimental investigation rather than the needs of engineering implementation. I very much suspect that some sort of structured approach to one's commit messages might be key, along with a careful mapping of DAG concepts to experimental ones--branching as the modification of an independent variable, with a base commit selected as the control point of comparison. Would one want to be able to rebase in order to compare against a different point? What would the semantics of merges represent?


I've been trying to do this better recently after having some non-reproducible results. I've settled on taking all hyperparameters (including booleans like whether to use batch norm) from a global dict. Instead of commenting and uncommenting lines, I look up a key with a default value, adding the default to the dict if it wasn't there. Then I print and log the dict with the results.

I end up with a bunch of code like:

    if get_param('use_convnet_for_thing1', True):
        convnet1_params = get_param('convnet1_params', None)
        thing1 = build_convnet(thing1_input, convnet1_params)
    elif ...
By logging the hyperparameter dict, source checkpoint, and random seed, results should be reproducible.
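
A minimal sketch of the get_param idea (not the exact implementation):

    # Global hyperparameter dict; every lookup records the default it fell
    # back to, so the logged dict covers every parameter the run actually used.
    hyperparams = {}

    def get_param(key, default):
        if key not in hyperparams:
            hyperparams[key] = default
        return hyperparams[key]

    # At the end of a run, print/log hyperparams alongside the results.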

This works well for rapid iteration like in jupyter notebooks. For models that take days to train, you might as well use source control for your scripts.


Great questions and discussions. I'm definitely passionate about versioning in the context of models and data science for both data and code. I work full time on the open source Pachyderm project (pachyderm.io), and we have users versioning their data and models in our system. Basically, you can output checkpoints, weights, etc. from your modeling and have that data versioned automatically in Pachyderm. Then if you utilize that persisted model in a data pipeline for inference, you can have total provenance over which versions of which models created which results (and which training data was used to create that version of the model, etc.).


Shameless plug: https://hyperdash.io

I got tired of maintaining one-off scripts to do recording, so I started working with friends on a dedicated solution. Today it lets you stream logs via a small Python library, then view individual training runs on an iOS/Android app. Takes less than a minute to get set up.

We're planning on expanding to model versioning in the next few weeks. Interesting to see how others are thinking about it. If you have model versioning thoughts you don't feel like posting here, drop me a note at andrew@hyperdash.io


Maybe write a shell macro to pull accuracy and error into the commit message along with your comment on the changes (something like the sketch below). You could also add automation to branch automatically whenever your test results are worse than before: if the experiment turns out to be a dead end down the line, you can head back to where you branched, and if the end result works, you can merge back into your starting branch.
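
In Python rather than shell, the first part could be as simple as (a sketch; the metric names are placeholders):

    # Sketch: fold the latest metrics into the commit message.
    import subprocess

    def commit_with_metrics(message, accuracy, error):
        full_message = "%s\n\naccuracy=%.4f error=%.4f" % (message, accuracy, error)
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", full_message], check=True)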


Is there any value to the code in failed attempts or do you just want a log of things you have tried?

If the former, you could try a single experiment branch and use tags to denote different experiments. Add a tag when you finish an experiment, then overwrite with your changes for the next experiment and repeat. This would keep all the changes while not having a huge number of dead branches, and the branch could be merged when necessary.

If the latter, why not an experiment log that is checked in which has a similar form to a change log? Or maybe create an issue and branch for each experiment then update the issue with results and delete the branch?


Why would you manually log your models? In my NN experiments, I automatically write the list of all hyperparameter values and the corresponding performance to a file. In addition, I automatically generate and save graphs showing the results, typically one graph per nested 'for' loop.
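
In concrete terms, a sweep like that can be as simple as the following (a sketch; train_and_evaluate and the parameter grid stand in for whatever your experiment actually does):

    # Sketch of an automatic sweep log written out as JSON.
    import json
    import itertools

    results = []
    for lr, batch_size in itertools.product([1e-2, 1e-3, 1e-4], [32, 64, 128]):
        accuracy = train_and_evaluate(lr=lr, batch_size=batch_size)  # your training code
        results.append({"lr": lr, "batch_size": batch_size, "accuracy": accuracy})

    with open("sweep_results.json", "w") as f:
        json.dump(results, f, indent=2)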


Why is it so bad if you have many branches?


It has several disadvantages: (1) it creates nontrivial overhead for your workflow; you're basically creating a git branch every few lines of code (I guess part of it could be automated). (2) It feels like overkill to create a whole new branch just to change, for example, a single variable. (3) A lot of those branches would remain "unfinished". By that I mean they would exist only to record that you have tried that model of neural net; you couldn't merge them into anything, since those changes made your net perform worse. (4) If you wanted to see the code of some specific model you had tried before, you would always need to switch to a different branch. This makes for a bulky workflow.


There's data, and then there's metadata. Consider how you want to shift information between your files of data (variables), and the tree-like structure in a version control timeline.

You shouldn't need to record how the data changes directly alongside the data. That would be like commenting out old code for every change instead of making new commits. It just defeats the purpose of using version control.

Branching is metadata about how you have based changes off a given starting point, and committing records the actual, linear changes. Nobody ever said a branch must be merged into another branch--that's just typical of how bugs get fixed--it's not important to the tree data structure.

I would think that switching branches to see the code of some specific model would simplify my workflow; I would only have to manage a single set of data, instead of accommodating multiple model versions in my working directory. If you are going to be recording version/parent information every few lines of code anyway, then you might as well do it in a tree/timeline data structure with lots of tooling available.

You're right that branching could be a lot of overhead done manually as you are doing it, but automating git (for example) to create or switch branches should not be difficult or particularly slow. There are also ways to avoid being overwhelmed by noise, such as searching only branches that contain specific commits, limiting the depth of results, etc.
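
As a rough illustration of how little automation it takes (a sketch, not a recommendation of any particular layout):

    # Sketch: create and switch to a branch named after the experiment
    # before each run, so every attempt is recorded in the tree.
    import subprocess

    def start_experiment_branch(name):
        subprocess.run(["git", "checkout", "-b", "experiment/%s" % name], check=True)

And for the noise problem, `git branch --contains <sha>` limits the listing to branches reachable from a given commit.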


For #4, maybe `git worktree add` could help. The docs give a clear example of how to use it.


Not just for neural nets - balancing experimentation against building reusable tools is probably the biggest logistical challenge in scientific programming in general.

I've converged to a workflow where I maintain a library with a main project pipeline and reusable tools for the project, and do all scripting with jupyter (all notebooks version controlled).

I've found that machine learning projects can be pretty effectively parametrized with config dicts for data, training and the model. Each type of config gets its own pipelined method that does all of the library calls - pipeline_batch_gen, pipeline_train, pipeline_build_model.

Example of a poorly organized config from a project:

    model_config = {
        'optimizer': optimizer,
        'clip_grad': clip_grads,
        'name': model_name,
        'residual': residual,
        'n_conv_filters': n_conv_filters,
        'n_output_hus': n_output_hus,
        'activation': activation,
        'batch_norm': batch_norm,
        'output_bn': output_bn,
        'generation': generation,
        'data_spec': {
            'uniform_frac': uniform_frac,
            'include_augment': True,
            'batch_size': batch_size,
            'bulk_chunk_size': bulk_chunk_size,
            'max_bulk_chunk_size': max_bulk_chunk_size,
            'loss_weighter': loss_weight
        },
        'train_spec': {
            'early_stopping_patience': early_stopping_patience,
            'lr_plateau_patience': lr_plateau_patience,
            'learning_rate': init_lr,
            'clip_grads': clip_grads,
            'partial_weight': partial_weight
        }
    }
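
The pipelined methods then just take these dicts, roughly along these lines (a shape sketch only; the bodies are omitted and are not the actual project code):

    # Each stage consumes its own config dict; bodies would contain all of
    # the library calls for that stage.
    def pipeline_build_model(model_config):
        ...

    def pipeline_batch_gen(data_config):
        ...

    def pipeline_train(model, batch_gen, train_config):
        ...

    model = pipeline_build_model(model_config)
    batches = pipeline_batch_gen(model_config['data_spec'])
    pipeline_train(model, batches, model_config['train_spec'])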

I've wanted to give Sacred a try https://github.com/IDSIA/sacred - looks promising but haven't tried yet so can't comment.

I still tend to keep track of model performance by hand, though. But I always have the notebooks I can go back to for reference. This is something Sacred could help a lot with.

Another very non-trivial aspect of this kind of work is the compute/storage infrastructure you need to scale beyond a single workstation.

We have a nice system here where $HOME lives on NFS and gets mounted when you log into any machine on the network - I can hardcode paths in my code and count on every worker having the same filesystem. I can't imagine how we would do distributed jobs without NFS. That's not a very realistic solution for homegamers though - you need a very fast network and expensive commodity hardware. And sys admins.

Does anyone have a solution for that half of the problem? I've seen a number of merkle-tree based data version control solutions recently...



