
Ask HN: How to factor DL training pipelines - mlevental
I'm getting to a point where I'm implementing a lot of different DL models for my research (in PyTorch), and it's obvious to me there's a lot of boilerplate (creating the data loaders, forward pass, backward pass, val (.eval) pass, collecting statistics, saving checkpoints, etc.). So, like a good software engineer, I try to factor these "pipelines" into reusable components (a train step, a val step, etc.). The problem is that every single time I feel like I have a good factorization, I find some quirky model or training-trick edge case (e.g. needing to do something with the optimizer at some point in the training process).

Any advice on either how to factor this, or on some framework that already does it for me? I've looked at fastai's callback model and initially tried to roll my own version, but it ends up being so brittle, with very leaky function interfaces (passing boolean flags, branching on reflection, etc.).

My kingdom for a monad!
======
chillee
I suppose the reason PyTorch doesn't implement this boilerplate for you
is all of these quirky edge cases :)

I think PyTorch Lightning might be worth taking a look at:
[https://github.com/williamFalcon/pytorch-lightning](https://github.com/williamFalcon/pytorch-lightning)
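
For a feel of the general idea, here is a rough, framework-agnostic sketch of a loop that owns the control flow and calls named hooks. It is not Lightning's or fastai's actual API: `Hook`, `State`, `fit`, `LossHistory`, `HalveLrAt` and `toy_step` are all made-up names, and the "learning rate" is just a number in a dict standing in for whatever you would do to a real optimizer. In PyTorch, `step_fn` would presumably wrap the forward pass, backward pass and `optimizer.step()`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence


@dataclass
class State:
    """Everything hooks may read or mutate, passed around as one object
    instead of a growing list of boolean flags."""
    epoch: int = 0
    step: int = 0
    loss: float = 0.0
    extras: Dict[str, Any] = field(default_factory=dict)


class Hook:
    """Base class (hypothetical): override only the events you care about."""

    def on_epoch_start(self, state: State) -> None: ...
    def on_step_end(self, state: State) -> None: ...
    def on_epoch_end(self, state: State) -> None: ...


def fit(
    step_fn: Callable[[Any, State], float],
    batches: Sequence[Any],
    epochs: int,
    hooks: List[Hook],
    state: Optional[State] = None,
) -> State:
    """Generic loop: knows nothing about models, only when to call what."""
    state = state if state is not None else State()
    for epoch in range(epochs):
        state.epoch = epoch
        for h in hooks:
            h.on_epoch_start(state)
        for batch in batches:
            state.loss = step_fn(batch, state)
            state.step += 1
            for h in hooks:
                h.on_step_end(state)
        for h in hooks:
            h.on_epoch_end(state)
    return state


class LossHistory(Hook):
    """Statistics collection lives in a hook, not in the loop."""

    def __init__(self) -> None:
        self.losses: List[float] = []

    def on_step_end(self, state: State) -> None:
        self.losses.append(state.loss)


class HalveLrAt(Hook):
    """The 'quirky edge case': touch the optimizer setting at one epoch."""

    def __init__(self, epoch: int) -> None:
        self.epoch = epoch

    def on_epoch_start(self, state: State) -> None:
        if state.epoch == self.epoch:
            state.extras["lr"] *= 0.5


def toy_step(batch: float, state: State) -> float:
    # Stand-in for forward/backward/optimizer step; returns a fake "loss".
    return batch * state.extras["lr"]


history = LossHistory()
final = fit(
    toy_step,
    batches=[1.0, 2.0],
    epochs=2,
    hooks=[history, HalveLrAt(1)],
    state=State(extras={"lr": 0.1}),
)
```

The point of the shape is that a new training trick becomes a new `Hook` subclass rather than another flag threaded through the loop. It won't cover every case (anything that needs to change the order of the loop itself still means editing `fit`), which is probably why the real libraries expose so many hook points.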

~~~
mlevental
well, i wasn't expecting pytorch to implement it (since pytorch is a graph
compiler, or something like that, and therefore low-level). i was just
wondering if there was something exactly like what you've linked to. so
thanks!

