
The title seems pretty cool!

Could anyone tell me whether there is a motivation for an ordinary ML engineer such as myself to learn this topic? It seems the ideas presented in this paper are really helpful for those who develop DL frameworks like PyTorch, but what about those who only use frameworks?

However, regardless of how useful these ideas are for me, I really respect researchers who publish excellent papers.




AD is important for ML practitioners to understand in the same way that compilers are important for programmers to understand. You can get away without knowing all the details, but it helps to understand where your gradients come from. However, this paper is probably not a good place to start if you're new to AD. If you want a better introduction, here are a few good resources:

Autodidact is a pedagogical implementation of AD: https://github.com/mattjj/autodidact

A nice literature review from JMLR: http://www.jmlr.org/papers/volume18/17-468/17-468.pdf
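To make "where your gradients come from" a bit more concrete, here is a toy reverse-mode sketch in plain Python. It is not taken from any of the resources above or from any real framework; the names `Var` and `backward` are made up for illustration, and it only handles scalar add and multiply. Each operation records its inputs along with the local derivative, and the backward pass walks that record in reverse applying the chain rule.

```python
class Var:
    """A scalar that remembers how it was computed (hypothetical toy class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(out):
    """Accumulate d(out)/d(v) into v.grad for every v that out depends on."""
    order, seen = [], set()

    def visit(v):
        # Post-order DFS: inputs are appended before the nodes that use them.
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)

    visit(out)
    out.grad = 1.0
    # Reversed order: each node's grad is complete before it is pushed to its inputs.
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad


x, y = Var(3.0), Var(4.0)
z = x * y + x * x  # z = xy + x^2
backward(z)
print(z.value, x.grad, y.grad)  # dz/dx = y + 2x, dz/dy = x
```

Note that `x` gets gradient contributions from three places in the graph (once via `x * y`, twice via `x * x`), which is why the backward pass accumulates with `+=`.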

This paper reinterprets AD through the lens of category theory, an abstraction for modeling a wide class of problems in math and CS. It provides a language to describe these problems in a simple and powerful way, and is the foundation for a lot of work in functional programming (if you're interested in that kind of stuff). There was a thread on HN recently that discusses why category theory is useful: https://news.ycombinator.com/item?id=18267536

"Category Theory for the Working Hacker" by Philip Wadler is a great talk if you're interested in learning more: https://www.youtube.com/watch?v=gui_SE8rJUM

Also recommend checking out Bartosz Milewski's "Category Theory for Programmers": https://github.com/hmemcpy/milewski-ctfp-pdf


You actually want to know the gist of how these autodiff libraries work so that you know a) which approaches are fast and which lead to giant, complex gradient graphs, and b) which approaches are stable and which lead to numerically unstable gradients.

You would be surprised how much code is out there (even for influential papers) whose graphs are obviously bad, or whose graphs could easily be fixed for more stability, because people don't think about what the gradient looks like when they should.
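A small illustration of the stability point, as a sketch in plain Python rather than any particular framework. Take softplus, log(1 + exp(x)). Differentiating it term by term gives exp(x) / (1 + exp(x)), which is roughly what a naively built graph would evaluate. The helper `exp_ieee` is my own assumption: it mimics the IEEE behavior of array libraries, which return inf on overflow instead of raising like `math.exp` does.

```python
import math


def exp_ieee(x):
    # Assumption: mimic array-library behavior (inf on overflow);
    # math.exp raises OverflowError instead.
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def softplus_grad_naive(x):
    # Chain rule applied literally to log(1 + exp(x)).
    e = exp_ieee(x)
    return e / (1.0 + e)  # inf / inf -> nan for large x


def softplus_grad_stable(x):
    # Same function (the sigmoid), arranged so exp never sees a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


for x in (0.5, 1000.0):
    print(x, softplus_grad_naive(x), softplus_grad_stable(x))
```

Both versions agree for moderate inputs, but at x = 1000 the naive one produces nan while the rearranged one returns 1.0. The math is identical; only the shape of the computation differs.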



