
Matrix Calculus for Deep Learning - jph00
http://parrt.cs.usfca.edu/doc/matrix-calculus/index.html
======
jph00
Jeremy here, happy to answer any questions or comments you have.

But more importantly - I need to mention that Terence Parr did nearly all the
work on this. He shared my passion for making something that anyone could read
on any device to such an extent that he ended up creating a new tool for
generating fast, mobile-friendly math-heavy texts:
[https://github.com/parrt/bookish](https://github.com/parrt/bookish). (We
tried KaTeX, MathJax, and pretty much everything else, but nothing rendered
everything properly.)

I've never found anything that introduces the necessary matrix calculus for
deep learning clearly, correctly, and accessibly - so I'm happy that this now
exists.

~~~
jacobolus
Typographic advice: the body text has very long lines in a desktop browser,
which makes it a bit slow and tiring to read. I’d say the ideal is somewhere
between 1/2 and 2/3 this length. I’d recommend keeping the same width on
screen but bumping the font size up by 30%.

As an extra minor nit, italicizing functions like _sin_, etc. is also
somewhat unconventional in mathematical typesetting.

~~~
parrt
I agree that the font should be bigger. I need to learn more CSS in order to
switch between font sizes per platform. The text font is easy, but all of the
images were generated from LaTeX at a specific font size. I need to scale the
in-line equation images as the font size bumps up.

~~~
dbetteridge
The magic incantation here is probably media queries:

    @media (max-width: 768px) {
        p {
            font-size: 1rem;
        }
    }

~~~
parrt
We also have to adjust the image sizes for the in-line equations. That's what
I need to figure out :)
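
One possible approach (a sketch only; the class name here is hypothetical,
assuming the generated <img> tags can be given one): size the inline images
in em units so they track whatever font size the platform ends up with.

    /* hypothetical class on the LaTeX-generated inline images */
    img.inline-eq {
        height: 1.2em;            /* tracks the surrounding font size */
        width: auto;              /* preserve the aspect ratio */
        vertical-align: -0.25em;  /* rough baseline alignment */
    }

    @media (min-width: 769px) {
        body { font-size: 1.3rem; }  /* desktop bump; images follow via em */
    }

With that, a single font-size change per breakpoint scales the text and the
equation images together.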

------
marrowgari
If you're looking at this with the intention of getting started in deep
learning and feeling overwhelmed by the math, then Andrew Ng offers a great
course on Coursera that goes over all of the formulas needed for forward
propagation, loss computation, backward propagation, and gradient descent.
Highly recommend it for anyone interested in breaking into the field of
machine learning.
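
(For reference, all of those steps funnel into the single update rule

    \theta \leftarrow \theta - \alpha \, \frac{\partial J(\theta)}{\partial \theta}

where J is the loss and \alpha the learning rate; the matrix calculus in the
article is exactly what makes \partial J / \partial \theta computable when
\theta is a whole weight matrix.)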

~~~
ransom1538
It is also all free on YouTube:
[https://www.youtube.com/watch?v=UzxYlbK2c7E](https://www.youtube.com/watch?v=UzxYlbK2c7E)

~~~
wuliwong
They also have it all on Stanford's site with some other information and
course materials.

[https://see.stanford.edu/Course/CS229](https://see.stanford.edu/Course/CS229)

------
smrtinsert
Thanks for this. I was taking Andrew Ng's course, but the way he glosses over
the calculus and then expects the student to understand the implications at
the end of the lecture was a turn-off, so I dropped it. I hated the feeling
that I wasn't learning, just memorizing solutions.

~~~
jph00
You might prefer the approach at
[http://course.fast.ai](http://course.fast.ai) - all the concepts are taught
with code instead of math, and understanding is developed by running
experiments.

~~~
joshgel
I did both and found fast.ai so much easier to understand for someone without
a background in math, like me.

------
calebh
Fortunately, there is now a website capable of doing matrix calculus!
[http://www.matrixcalculus.org](http://www.matrixcalculus.org)

Mathematica doesn't seem to be able to do matrix calculus, which surprised me
quite a bit.

~~~
improbable22
Mathematica can absolutely do all of this! D[matrix, x] ... or things like
this:

f[x_, y_] := x^2 + Sin[y]

vars = {x, y};

Table[D[f[x, y], var1, var2], {var1, vars}, {var2, vars}] // MatrixForm
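
For this f, that table is just the Hessian; working the scalar derivatives by
hand, it should come out as

    H = \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x \, \partial y \\ \partial^2 f/\partial y \, \partial x & \partial^2 f/\partial y^2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & -\sin y \end{pmatrix}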

~~~
SoerenL
But how do you compute the derivative of x'Ax in Mathematica (x being a
vector and A being a matrix)? What you have shown covers only scalar
derivatives, if I am not mistaken.

~~~
improbable22
Like this, perhaps?

A = {{1,2},{3,4}}

vec = {x^2, x^3}

D[vec.A.vec, x]

Or perhaps like this, again the table of derivatives:

xvec = {x1, x2}

Table[D[xvec.A.xvec, x], {x, xvec}]

(all untested... one typo caught...)
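
As a sanity check, the closed-form identity these should reproduce is

    \frac{\partial}{\partial \mathbf{x}} \left( \mathbf{x}^\top A \mathbf{x} \right) = \mathbf{x}^\top (A + A^\top)

in the numerator layout the article uses (or (A + A^\top)\mathbf{x} as a
column gradient), which for symmetric A reduces to 2\mathbf{x}^\top A (or
2A\mathbf{x}).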

~~~
calebh
Mathematica can definitely compute the derivatives if you fix the size of the
matrix. This isn't very useful if you're trying to compute the derivative of
an expression with arbitrarily sized matrices.

~~~
improbable22
You can do general matrices too; what do you have in mind?

aa[x_] = {{1, 2}, {3, 4}} x

bb[x_] = {x^2, x^3}

D[a[x].b[x], x, x]  (* for any suitable tensors *)

% /. {a -> aa, b -> bb}

------
kaffeinecoma
Wow, this is really a great resource. I wish it had been available a few years
back when I took the free online version of CS231n. The hardest part (for me,
anyway) was the long-forgotten calculus needed for backprop, especially as
applied to matrices. I struggled at the time to find accessible explanations
of many of the matrix operations, and you seem to have it all laid out here.
Thank you.

------
zwieback
Thanks so much for this. I have no interest in deep learning (at the moment)
but I was working through some papers about the Lucas-Kanade tracker and this
paper explains some of the underlying math in just the right amount of detail.
The authors usually show the beginning and end points and just say something
like "using the chain rule, we arrive at ...". It took me a while to
understand what they were saying, and this paper helps a lot.

The math is super easy, but keeping all the notations and conventions in my
head is hard. I've never seen it laid out this nicely before. Thanks!

~~~
parrt
Hiya. That's funny, because it's exactly what caused us to write this article.
Jeremy and I were working on an automatic differentiation tool and couldn't
find any description of the appropriate matrix calculus that explained the
steps. Everything just provides the solution without the intervening steps.
We decided to write it down so we never have to figure out the notation
again. haha

------
adyavanapalli
Matrix calculus is a bit screwy when you realize that there are two possible
notations for matrix derivatives (numerator vs. denominator layout; numerator
layout is used in this guide). Plus, the notation is not very self-explanatory
for doing calculations unless you commit some basic results to memory, which
is why, as a physicist, I would recommend working in tensor calculus notation
during calculations and translating back to matrix notation when writing up
the results.
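
Concretely, for f : \mathbb{R}^n \to \mathbb{R}^m the two layouts differ by a
transpose:

    \text{numerator:}\quad \left( \frac{\partial f}{\partial x} \right)_{ij} = \frac{\partial f_i}{\partial x_j} \quad (m \times n), \qquad \text{denominator:}\quad \left( \frac{\partial f}{\partial x} \right)_{ij} = \frac{\partial f_j}{\partial x_i} \quad (n \times m)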

~~~
parrt
I was also surprised when I saw that there was no standard notation for
Jacobian matrices. We use the numerator notation in the article, but point out
that there are papers that use the denominator notation. I think I remember
from engineering school that we used numerator notation, so we stuck with
that.

~~~
improbable22
This is what index notation is good for, and I encourage everyone to learn it.
Jacobians are ∂y_a/∂x_b: two indices, and clearly b belongs to the derivative.
Whether it's rows or columns is an implementation detail of how you're storing
these numbers.

Index notation also seems natural for programming: an element A[i,j] or a
slice Z[3,4,:] is precisely this.
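
For example, the chain rule written in index notation leaves no room for
layout ambiguity, because the summed index pairs up unambiguously:

    \frac{\partial z_a}{\partial x_c} = \sum_b \frac{\partial z_a}{\partial y_b} \frac{\partial y_b}{\partial x_c}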

------
poster123
The book "Matrix Differential Calculus with Applications in Statistics and
Econometrics, 3rd Edition" by Magnus and Neudecker, is available at
[http://www.janmagnus.nl/misc/mdc2007-3rdedition](http://www.janmagnus.nl/misc/mdc2007-3rdedition)
(468 pages).

------
hinkley
So are we turning machine learning into a Euclidean distance calculation
[with] many dimensions, with different weights for each dimension?

That’s... not that sexy. But at least it makes sense to anyone with an
undergrad degree in CS or math, which is something neural networks never
accomplished.

------
letlambda
>For functions of a single parameter, the partial derivative operator ∂/∂x is
equivalent to the full derivative operator d/dx (for sufficiently smooth
functions).

Are you pulling my leg here, or do I need to scrap my understanding of
calculus?

If they exist, how could they not be equivalent?

------
lprd
In school, I didn't make it much past basic calculus/algebra. As a self-taught
programmer (my highest level of education is a high-school diploma), I
seriously wish I could go back and put more effort into math. I love looking
at these types of topics, but I have absolutely no clue what I'm looking at.

If anyone can recommend any books, courses, or any other material that starts
from high-school level math, and gradually increases in complexity, I would
love to look at it.

Cheers :)

~~~
KSS42
I would start by watching the Linear Algebra and Calculus videos by
3Blue1Brown. This will give you an intuitive understanding of both.

[https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)

[https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53...](https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr)

~~~
lprd
Thank you!

------
sydl
Thanks for this great contribution.

I would like to be able to read the math in DL papers. (Sorry, I know I'm
asking for something too broad.)

1) How much of the notation in those papers does this document cover?

2) When I read a paper and am not sure what the math means, does that mean
that I have not grokked the subject yet, or does the math in that paper go
beyond what is given in this Matrix Calculus document (assuming I have
studied this document well)?

~~~
blt
While matrix derivatives are important, there is also a lot of other math in
DL papers. In particular, a lot of the probability side concerns expectations,
KL divergences, entropy, etc., which are all defined in terms of integrals or
sums. You need undergraduate-level probability background.
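
For instance, in their discrete forms (sums become integrals for densities):

    \mathbb{E}_p[f] = \sum_x p(x) f(x), \qquad H(p) = -\sum_x p(x) \log p(x), \qquad \mathrm{KL}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}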

~~~
jph00
The first 5 chapters of the Goodfellow deep learning book are a great resource
for understanding the probability, linear algebra, optimization, and
information theory you need to digest deep learning papers.

------
tw1010
Like, did we ever get this much math on a regular basis on HN a few years
ago? It's exciting to see how math is seeping into engineering culture.

------
dbetteridge
Fantastic content, thank you.

If I could make one request, it would be a bit of margin/padding on the left
of the body text. It would make the page more readable on mobile.

------
cjhanks
Bookmarked; this is a very nice reference that I will need to read in greater
depth.

------
gigatexal
Reading this shows me just how little I know about advanced mathematics.

------
peetle
So great! Thanks Terence and Jeremy!

------
bahram_banisadr
Beautifully straightforward. Going to be referring to this for quick
refreshers.

