
This is a problem in theory, but not really in practice for a well-designed AD system. We wrote a bit about this here: https://juliadiff.org/ChainRulesCore.jl/dev/maths/nondiff_po...

The gist of it is that we endeavour to provide ‘useful’ values for the gradient at non-differentiable points, even if the traditional derivative is not defined or infinite.

You could imagine it as us smoothing out edges or discontinuities, ideally in a way that makes things like gradient descent well behaved.
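For instance (just a sketch using ChainRulesCore.jl, not the exact rule the package ships): ReLU's derivative is undefined at zero, but a custom rrule can return the subgradient 0 there, so gradient descent always receives a finite, sensible value.

    using ChainRulesCore

    relu(x) = max(x, zero(x))

    function ChainRulesCore.rrule(::typeof(relu), x::Real)
        y = relu(x)
        function relu_pullback(ȳ)
            # Derivative is 1 for x > 0 and 0 for x < 0; at the kink x == 0
            # we choose the subgradient 0, so nothing undefined ever propagates.
            # NoTangent() is the tangent for the `relu` function object itself.
            return NoTangent(), ȳ * (x > 0)
        end
        return y, relu_pullback
    end

Choosing 0 at the kink means an optimiser sitting exactly on the corner isn't pushed anywhere, which is usually the behaviour you want.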


