The gist of it is that we endeavour to provide ‘useful’ values for the gradient at non-differentiable points, even if the traditional derivative is not defined or infinite.
You could imagine it as us smoothing out edges or discontinuities, ideally in a way that that makes things like gradient descent well behaved.
The gist of it is that we endeavour to provide ‘useful’ values for the gradient at non-differentiable points, even if the traditional derivative is not defined or infinite.
You could imagine it as us smoothing out edges or discontinuities, ideally in a way that that makes things like gradient descent well behaved.