Jeremy here. Here to answer any questions or comments that you have.
But more importantly - I need to mention that Terence Parr did nearly all the work on this. He shared my passion for making something that anyone could read on any device to such an extent that he ended up creating a new tool for generating fast, mobile-friendly math-heavy texts: https://github.com/parrt/bookish . (We tried KaTeX, MathJax, and pretty much everything else, but nothing rendered everything properly.)
I've never found anything that introduces the necessary matrix calculus for deep learning clearly, correctly, and accessibly - so I'm happy that this now exists.
Terence here. Jeremy's role was critical in terms of direction and content for the article. Who better than he to describe the math needs for deep learning. :)
Question: when I learned vector calculus back in college, we used Marsden & Tromba as our textbook, and they equate the derivative of a function from R^n -> R^m with the Jacobian. Is matrix calculus the same thing, just with slightly different notation?
Typographic advice: the body text has very long lines in a desktop browser, which makes it a bit slow and tiring to read. I’d say the ideal is somewhere between 1/2 and 2/3 this length. I’d recommend keeping the same width on screen but bumping the font size up by 30%.
As an extra minor nit, italicizing functions like sin, etc. is also somewhat unconventional in mathematical typesetting.
I agree that the font should be bigger. I need to learn more CSS in order to switch between font sizes per platform. The font of the text is easy, but all of the images were generated from LaTeX using a specific font size. I need to scale the in-line equation images as the font size bumps up.
It's a variable name for a polymorphic function, a Taylor series, the Euler formula, etc., depending on context. That's a significant difference from singular types, but being a variable name for an abstract concept is in principle no different from typesetting x. This fits neatly with "everything is an object reference" and might be more of a programmer's perspective.
This is great! My graduate advisor made a really great matrix calculus study sheet for me a long time ago that was absolutely invaluable in learning ML. (I mean it - this is great not just for DL but for all sorts of reasoning in ML.)
Oh in that case I'm just confused - that's what I know as Einstein notation. Modern physics papers seemed to use much more complex notation, but I probably just misunderstood.
If we're talking about Einstein notation, then I'm a fan - `np.einsum()` is often a great way to create fast tensor computations with minimal code.
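For anyone who hasn't used it, here's a rough sketch of the kind of thing I mean (the shapes and variable names are made up purely for illustration):

    import numpy as np

    # Hypothetical shapes: a batch of 32 weight matrices and 32 input vectors.
    W = np.random.rand(32, 10, 20)
    x = np.random.rand(32, 20)

    # y[b, i] = sum_j W[b, i, j] * x[b, j] -- the repeated index j is summed over,
    # exactly as in Einstein notation.
    y = np.einsum('bij,bj->bi', W, x)

    # Same result via an explicit batched matmul, for comparison.
    assert np.allclose(y, (W @ x[:, :, None]).squeeze(-1))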
Index notation isn't useful for calculation; it's useful for algebra, i.e., if you want closed-form solutions to tensor equations. My distinct impression is that in ML no one cares about that, because eventually everything gets a numerical treatment.
Hi Jeremy, thanks for creating so many great learning resources. Myself and a fellow ML enthusiast are starting a deep learning meetup here in Phoenix next month. Do you have any advice for creating a welcoming environment for people to learn/teach? Thanks!
Probably better asking that at http://forums.fast.ai , since a lot of participants there have set up local meetups, and some of them have gotten pretty big. I haven't created a meetup myself - my focus has been on the online course and community, frankly.
If you're looking at this with the intention of getting started in Deep Learning and feeling overwhelmed by the math then Andrew Ng offers a great course on Coursera that goes over all of the formulas needed to calculate the forward propagation, loss computation, backward propagation, and gradient descent. Highly recommend it for anyone interested in breaking into the field of machine learning.
This is great. But I find the advantage of Coursera is that it incorporates quizzes and programming homework into the lectures, reinforcing the learning. Also, the material is updated and more relevant to today's ML problems than his 2008 lectures on YouTube.
Thanks for this. I was taking Andrew Ng's course, but the way he glosses over the calculus and then expects the student to understand the implications at the end of a lecture was a turn-off, so I dropped it. I hated the feeling that I wasn't learning, just memorizing solutions.
You might prefer the approach at http://course.fast.ai - all the concepts are taught with code, instead of math, and understanding is developed by running experiments.
I've taken 3/5 of his Deep Learning MOOC and thought the calculus, at least its implementation, was very well explained. Were you dissatisfied with the lack of depth or lack of explanation? Just curious.
Indeed, most tools surprisingly lack this ability. I was shocked when I needed to break my calculations down into piecewise form when doing matrix calculus with SymPy.
But how do you compute the derivative of x'Ax in Mathematica (x being a vector and A being a matrix)? What you have pointed out covers only scalar derivatives, if I am not mistaken.
Mathematica can definitely compute the derivatives if you fix the size of the matrix. This isn't very useful if you're trying to compute the derivative of an expression with arbitrary sized matrices.
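For what it's worth, here's a sketch of that fixed-size workaround in SymPy (variable names are just for illustration): differentiate x'Ax element by element and check the result against the known closed form (A + A^T)x.

    import sympy as sp

    n = 3  # fix the size; this won't work symbolically for arbitrary-sized matrices
    x = sp.Matrix(n, 1, lambda i, _: sp.Symbol(f'x{i+1}'))
    A = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'a{i+1}{j+1}'))

    f = (x.T * A * x)[0, 0]                                # the scalar x'Ax
    grad = sp.Matrix(n, 1, lambda i, _: sp.diff(f, x[i]))  # element-wise gradient

    # The known closed form is (A + A^T) x; the difference should simplify to zero.
    residual = (grad - (A + A.T) * x).applyfunc(sp.simplify)
    assert residual == sp.zeros(n, 1)
    print(grad)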
Wow, this is really a great resource. I wish it had been available a few years back when I took the free online version of CS231n. The hardest part (for me, anyway) was the long-forgotten Calculus needed for backprop. Especially as applied to matrices. I struggled at the time to find accessible explanations of many of the matrix operations, and you seem to have it all laid out here. Thank you.
Thanks so much for this. I have no interest in deep learning (at the moment), but I was working through some papers about the Lucas-Kanade tracker, and this paper explains some of the underlying math in just the right amount of detail. The authors usually show the beginning and end point and just say something like "using the chain rule" we arrive at ... It took me a while to understand what they were saying, and this paper helps a lot.
The math is super easy, but keeping all the notations and conventions in my head is hard. I've never seen it laid out this nicely before. Thanks!
Hiya. That's funny because it's exactly what caused us to write this article. Jeremy and I were working on an automatic differentiation tool and couldn't find any description of the appropriate matrix calculus that explained the steps. Everything is just providing the solution without the intervening steps. We decided to write it down so we never have to figure out the notation again. haha
Matrix calculus is a bit screwy when you realize that there are two possible notations for representing matrix derivatives (numerator vs. denominator layout; numerator layout is used in this guide). Plus, the notation is not very expressive for doing calculations unless you commit some basic results to memory, which is why, as a physicist, I would recommend working in tensor calculus notation during calculations and translating back to matrix notation when writing up the results.
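To make the difference concrete (my own summary, nothing article-specific): for y in R^m and x in R^n, the two layouts differ only by a transpose:

    % numerator layout: an m x n Jacobian, rows follow y
    \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)_{ij} = \frac{\partial y_i}{\partial x_j}

    % denominator layout: the n x m transpose, rows follow x
    \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)_{ij} = \frac{\partial y_j}{\partial x_i}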
I was also surprised when I saw that there was no standard notation for Jacobian matrices. We use the numerator notation in the article, but point out that there are papers that use the denominator notation. I think I remember from engineering school that we used numerator notation so we stuck with that.
This is what index notation is good for, and I encourage everyone to learn it. Jacobians are dx_a/dx_b, two indices, and clearly b belongs to the derivative. Whether it's rows or columns is an implementation detail of how you're storing these numbers.
Index notation also seems natural for programming: an element A[i,j] or a slice Z[3,4,:] are precisely this.
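As a toy sanity check in code (my own sketch, not from the article): for y = Ax the Jacobian J[a, b] = dy_a/dx_b is just A itself, and a finite-difference loop over the index b recovers it regardless of how the numbers are stored.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    f = lambda x: A @ x   # y = A x, so dy_a/dx_b = A[a, b]

    def numerical_jacobian(f, x, eps=1e-6):
        y = f(x)
        J = np.zeros((y.size, x.size))
        for b in range(x.size):
            dx = np.zeros_like(x)
            dx[b] = eps
            J[:, b] = (f(x + dx) - y) / eps   # column b holds dy/dx_b
        return J

    x0 = np.array([1.0, -2.0])
    print(np.allclose(numerical_jacobian(f, x0), A, atol=1e-4))   # True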
I agree. If your matrix calculus involves more complicated use of derivative operators, you can't treat it like linear algebra anymore. Better to break it down into something like tensor notation first and convert back to matrices at the end: https://en.wikipedia.org/wiki/Del. Specifically, I was always confused by the material derivative of a vector field when presented as either matrix or vector calculus. If you represent it in tensor notation (or explicitly break it out as operations on basis vectors) it works out nicely.
So are we turning machine learning into a Euclidean distance calculation [with] many dimensions, with different weights for each dimension?
That’s... not that sexy. But at least it makes sense to anyone with an undergrad degree in CS or math, which is something neural networks never accomplished.
In school, I didn't make it much past basic calculus/algebra. As a self-taught programmer (my highest level of education is a high-school diploma), I seriously wish I could go back and put more effort into math. I love looking at these types of topics, but I have absolutely no clue what I'm looking at.
If anyone can recommend any books, courses, or any other material that starts from high-school level math, and gradually increases in complexity, I would love to look at it.
I am writing a book that I hope can serve such a purpose. Would you be interested in taking a look? If so, shoot me an email at mathintersectprogramming@gmail.com
Fair warning, I have shown it to a programmer who claimed some level of "math phobia," and they said the first chapter was too difficult. I rewrote that chapter since, and I think it is better, but I could use some feedback :)
Well... this paper is really designed to be accessible with just high school math, if you take your time (a few weeks or months) and follow the references. Any time it relies on some concept, it includes a link to learning more about that concept, and also has a link to a forum where you can ask questions if you get stuck. There's also a table of all notation used.
If you give it a go and find you're not successful, I'd be interested to hear the first point where you got stuck and couldn't get unstuck, since that would suggest a need for us to improve our paper!
I would like to be able to read the math in DL papers.
(Sorry, I know I'm asking for something that's too broad.)
1) How much of the notation in those papers does this document cover?
2) When I read a paper and am not sure what the math means, does that mean I haven't grokked the subject yet, or that the math in that paper goes beyond what's given in this Matrix Calculus document (assuming I've studied the document well)?
I would say try to get as much of the intuition behind the math as you can. Knowing what an equation means, rather than fully understanding the Greek notation, is what matters in practice. If you want to substantially contribute to the theoretical CS literature, you will need a good handle on the notation (for obvious reasons).
Note: I come from math and Econ, so the split between practitioner and theorists might be different for CS/ML.
While matrix derivatives are important, there is also a lot of other math in DL papers. In particular, a lot of the probability side concerns expectations, KL divergences, entropy, etc., which are all defined in terms of integrals or sums. You need undergraduate-level probability background.
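For reference, the discrete forms of those three quantities are the standard textbook definitions (integrals replace the sums in the continuous case):

    \mathbb{E}_{p}[f(x)] = \sum_x p(x)\,f(x), \qquad
    H(p) = -\sum_x p(x)\log p(x), \qquad
    D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x)\log\frac{p(x)}{q(x)}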
The first 5 chapters of the Goodfellow deep learning book are a great resource for understanding the probability, linear algebra, optimization, and information theory you need to digest deep learning papers.
The book "Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd Edition" by Magnus and Neudecker, is available at http://www.janmagnus.nl/misc/mdc2007-3rdedition (468 pages).