quadrature 28 days ago | on: Microsoft CTO says he wants to swap most AMD and N...

I'm not very well versed, but I believe that training requires more memory than inference, because it has to store the intermediate computations (activations) of the forward pass so that you can calculate gradients for each layer.
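A minimal sketch of that idea, assuming a toy network where each layer is an elementwise y = max(0, w * x) with a single scalar weight (all function names here are hypothetical, not any framework's API). Inference keeps only the current activation, while training caches every layer's input because the backward pass needs it to compute each layer's gradient.

```python
# Toy network: each layer computes y = max(0, w * x) elementwise with one scalar weight.
# Hypothetical illustration only, not how any real framework stores activations.

def relu(v):
    return v if v > 0 else 0.0

def forward_inference(weights, x):
    """Run all layers, keeping only the current activation (earlier ones are dropped)."""
    for w in weights:
        x = [relu(w * xi) for xi in x]
    return x

def forward_train(weights, x):
    """Same computation, but cache each layer's input for the backward pass."""
    cache = []
    for w in weights:
        cache.append(x)  # this per-layer cache is the extra memory training needs
        x = [relu(w * xi) for xi in x]
    return x, cache

def backward(weights, cache, grad_out):
    """Backpropagate d(loss)/d(output) and return d(loss)/d(w) for every layer."""
    grads = [0.0] * len(weights)
    g = grad_out
    for i in reversed(range(len(weights))):
        w, x_in = weights[i], cache[i]
        # ReLU derivative: 1 where the pre-activation w * x was positive, else 0
        mask = [1.0 if w * xi > 0 else 0.0 for xi in x_in]
        g_pre = [gi * mi for gi, mi in zip(g, mask)]
        # The weight gradient needs this layer's cached input
        grads[i] = sum(gp * xi for gp, xi in zip(g_pre, x_in))
        # Gradient with respect to this layer's input, passed to the layer below
        g = [gp * w for gp in g_pre]
    return grads

weights = [0.5, 2.0, 1.5]
x = [1.0, -2.0, 3.0]
out, cache = forward_train(weights, x)
grads = backward(weights, cache, [1.0] * len(out))  # loss = sum(out)
print(out, grads)
print("floats cached for training:", sum(len(c) for c in cache))
print("floats held at once for inference:", len(x))
```

With 3 layers of width 3, training keeps 9 cached values alive until the backward pass, versus 3 at a time for inference. In a real model that cache scales with layers times batch size times activation size, which is why it can dominate memory.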