
Saving memory using gradient-checkpointing - stablemap
https://github.com/openai/gradient-checkpointing
======
eggie5
It takes about 4GB of memory to train a VGG network w/ batch size 8 on
TensorFlow. Could I use a larger batch size at the expense of computation time
w/ this module?

~~~
iaw
Yes, potentially a batch size of 64 with roughly a 25% increase in run time,
based on the figures they reported.
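
The trade-off comes from recomputing activations during the backward pass
instead of storing them all. This is a pure-Python toy sketch of that idea
(not the repo's TensorFlow implementation): a chain of scalar "layers" where
only every k-th activation is checkpointed, and each segment is re-run forward
from its checkpoint when backprop reaches it. The `layer`/`backprop_checkpointed`
names are illustrative, not from the repo.

```python
def layer(x, w):
    # toy "layer": scalar multiply, standing in for a network layer
    return x * w

def layer_grad(x, w, upstream):
    # gradients of y = x * w w.r.t. its input x and weight w
    return upstream * w, upstream * x

def backprop_checkpointed(x0, weights, k):
    """Backprop through a chain, storing only every k-th activation.

    Peak stored activations drop from n to roughly n/k + k (the
    checkpoints plus one recomputed segment), at the cost of one
    extra forward pass per segment -- memory for compute.
    """
    n = len(weights)
    # forward pass: keep checkpoints only, discard the rest
    ckpts = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = layer(x, w)
        if (i + 1) % k == 0:
            ckpts[i + 1] = x
    # backward pass: recompute each segment from its checkpoint
    grads = [0.0] * n
    upstream = 1.0  # d(output)/d(output)
    for seg_start in reversed(range(0, n, k)):
        seg_end = min(seg_start + k, n)
        acts = [ckpts[seg_start]]
        for i in range(seg_start, seg_end):
            acts.append(layer(acts[-1], weights[i]))
        for i in range(seg_end - 1, seg_start - 1, -1):
            upstream, gw = layer_grad(acts[i - seg_start], weights[i], upstream)
            grads[i] = gw
    return grads
```

With k near sqrt(n) this gives the O(sqrt(n)) activation memory the repo
advertises, which is what makes the much larger batch size fit.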

------
supermdguy
I wonder if this would work when using Keras with the TensorFlow backend. Has
anyone tried it?

