
This is a great debugging tool for deep learning. Often your code might have been running for 20 hrs on 16 GPUs and you just don't want to restart from scratch again.


You don't have to, even without this tool: you can already save and load models to and from disk with the APIs the frameworks/libraries provide (e.g. [1]). Handy for when you need to hard reset, move, and/or share data.

[1] https://pytorch.org/tutorials/beginner/saving_loading_models...
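
For what it's worth, the save/resume pattern looks roughly like the sketch below. It is framework-agnostic on purpose: `pickle` stands in for the framework's own save/load calls (in PyTorch, `torch.save` / `torch.load`), and `save_checkpoint`, `load_checkpoint`, `train` and the toy state dict are made-up names for illustration, not anything from the tutorial in [1].

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    # Write to a temp file first, then rename, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    # Return the saved state, or None if there is nothing to resume from.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, total_steps, stop_after=None):
    # Resume from the checkpoint if one exists, otherwise start fresh.
    state = load_checkpoint(path) or {"step": 0, "weight": 0.0}
    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            break  # simulate the job being interrupted
        state["weight"] += 0.5  # stand-in for one optimizer step
        state["step"] += 1
        save_checkpoint(path, state)
    return state


ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
interrupted = train(ckpt_path, total_steps=10, stop_after=4)  # stops early
resumed = train(ckpt_path, total_steps=10)  # picks up at step 4, not step 0
```

The second `train` call continues from the last saved step instead of starting over, which is the whole point when the first run cost 20 hrs.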


Yeah. But wouldn't it be nice / cool if the training process is interactive, reproducible, recordable and replayable?


Not sure I totally agree that this achieves those goals, but I can't argue that those aren't ideal goals.

> interactive

AFAIK, as you type Python code, the changes are picked up on the next loop iteration. That's a heavily constrained kind of "interactive" (changes only propagate forward into an active execution). I mean, wouldn't a Python shell with shell history be more interactive and hit all your goals? Sure, the advantage here is that you can change code based on output feedback, but if you are too slow you have to backtrack, and if you are too fast you just wait anyway. And if you want to change code for earlier loop iterations, you are kaput, since those already ran against a different version of the code.
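
To make the "forward only" point concrete, here is a toy sketch of reload-on-every-iteration. It is an assumption about the general mechanism, not the tool's actual implementation; `run_loop`, `step` and the temp file are hypothetical names, and the mid-loop `write_text` simulates someone saving an edit in their editor.

```python
import pathlib
import tempfile

OLD_SOURCE = "def step(i):\n    return f'v1 step {i}'\n"
NEW_SOURCE = "def step(i):\n    return f'v2 step {i}'\n"


def run_loop(source_path, n_iters, edit_at, new_source):
    outputs = []
    for i in range(n_iters):
        if i == edit_at:
            # Simulates the user saving an edit while the loop is running.
            source_path.write_text(new_source)
        # Re-read and re-execute the source before every iteration,
        # so the latest saved version of step() is the one that runs.
        namespace = {}
        exec(compile(source_path.read_text(), str(source_path), "exec"), namespace)
        outputs.append(namespace["step"](i))
    return outputs


source_path = pathlib.Path(tempfile.mkdtemp()) / "loop_body.py"
source_path.write_text(OLD_SOURCE)
outputs = run_loop(source_path, n_iters=4, edit_at=2, new_source=NEW_SOURCE)
```

Iterations 0 and 1 ran the old `step`, iterations 2 and 3 ran the new one, and nothing goes back to redo the first two.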

> reproducible

How can you get this, i.e. reproduce any loop changes, outside of the IDE/editor's undo/redo stack? The "reloading" function hides all the magic of which code produced which output.

> recordable, replayable

This moves the "history" into the IDE/editor's undo/redo stack. I think this is worse, because what happens when you close the IDE/editor without having recorded which console output came from which version of the code?
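
A rough sketch of what "recording which output came from which version" could look like: tag every output with a short hash of the source that produced it, and keep the source per hash. This is only an illustration of the idea; `source_version`, `run_and_record` and the two toy versions are hypothetical, and nothing here is claimed to be something the tool does.

```python
import hashlib


def source_version(source):
    # Short content hash that identifies one exact version of the code.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:8]


def run_and_record(sources_per_iteration):
    # sources_per_iteration[i] is the loop-body source in effect at iteration i.
    versions = {}  # hash -> full source, so any version can be recovered later
    log = []       # (iteration, hash, output)
    for i, source in enumerate(sources_per_iteration):
        version = source_version(source)
        versions[version] = source
        namespace = {}
        exec(source, namespace)
        log.append((i, version, namespace["step"](i)))
    return versions, log


V1 = "def step(i):\n    return i + 1\n"
V2 = "def step(i):\n    return i * 10\n"
versions, log = run_and_record([V1, V1, V2])
```

With the hash next to each output, the history survives closing the editor: look up the hash in `versions` and you have the exact code that printed that line.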

My main gripe is this sentence in the README:

> This lets you e.g. add logging, print statistics or save the model without restarting the training and, therefore, without losing the training progress.

If the italicised part is the prime concern, and the purpose of this tool is to solve it, I'm just saying PyTorch already lets you save/persist training progress. Maybe the author didn't know about the save/load + Python shell combo?



