The matrix calculus you need for deep learning (2018) (explained.ai)
224 points by cpp_frog 9 months ago | 40 comments



Related:

The matrix calculus you need for deep learning (2018) - https://news.ycombinator.com/item?id=26676729 - April 2021 (40 comments)

Matrix calculus for deep learning part 2 - https://news.ycombinator.com/item?id=23358761 - May 2020 (6 comments)

Matrix Calculus for Deep Learning - https://news.ycombinator.com/item?id=21661545 - Nov 2019 (47 comments)

The Matrix Calculus You Need for Deep Learning - https://news.ycombinator.com/item?id=17422770 - June 2018 (77 comments)

Matrix Calculus for Deep Learning - https://news.ycombinator.com/item?id=16267178 - Jan 2018 (81 comments)


The article/webpage is a nice walk-through for the uninitiated. Half the challenge of doing matrix calculus is remembering the dimension of the object you are dealing with (scalar, vector, matrix, higher-dim tensor).

Ultimately, the point of using matrix calculus (or matrices in general) is not just concision of notation but also understanding that matrices are operators acting on members of some spaces, i.e. vectors. It is this higher level abstraction that makes matrices powerful.

For people who are familiar with the concepts but need a concise refresher, the Wikipedia page serves well:

https://en.wikipedia.org/wiki/Matrix_calculus
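
To make the dimension bookkeeping concrete, here is a minimal PyTorch sketch (sizes picked arbitrarily, just an illustration) checking that the Jacobian of y = Wx is just W, with the expected m x n shape:

    import torch

    m, n = 3, 5
    W = torch.randn(m, n)
    x = torch.randn(n)

    # Jacobian of the map v -> W @ v, evaluated at x
    J = torch.autograd.functional.jacobian(lambda v: W @ v, x)
    print(J.shape)               # torch.Size([3, 5]): an m x n matrix
    print(torch.allclose(J, W))  # True: d(Wx)/dx = W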


To add to this: these operators are also "polymorphic"; for matrix multiplication the only operations you need are (non-commutative) multiplication and addition, so you can use elements of any non-commutative ring, i.e. a set of elements with those two operations :D

Matrices themselves form non-commutative rings too; and based on this, you can think of a 4N x 4N matrix as a 4x4 matrix whose elements are NxN matrices [1] :D

[1] https://youtu.be/FX4C-JpTFgY?list=PL49CF3715CB9EF31D&t=1107

You already know whose lecture it is :D

I love math... I should have become a mathematician.


You can even generalize linear algebra algorithms to closed semirings and have some really cool algorithms pop out, like finding the shortest path in graphs. There's a great paper called "Fun with Semirings" that goes into more detail; unfortunately it looks like the PDF isn't easily available online any more, but I found some slides[1] that seem to cover the same ideas well enough.

[1]: https://pdfs.semanticscholar.org/2e43/477e26a54b2d1a046c2140...
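
As a small illustration of the semiring idea (a numpy sketch; the tiny graph is made up): swap (+, *) for (min, +), and repeated "matrix powers" of the adjacency matrix give all-pairs shortest paths:

    import numpy as np

    INF = np.inf
    # adjacency matrix of a small weighted digraph; A[i, j] = weight of edge i -> j
    A = np.array([[  0,   3, INF,   7],
                  [  8,   0,   2, INF],
                  [  5, INF,   0,   1],
                  [  2, INF, INF,   0]], dtype=float)

    def minplus(X, Y):
        # "matrix product" over the (min, +) semiring:
        # result[i, j] = min over k of (X[i, k] + Y[k, j])
        return np.min(X[:, :, None] + Y[None, :, :], axis=1)

    D = A
    for _ in range(len(A) - 1):    # (n-1)-fold semiring power of A
        D = minplus(D, A)
    print(D)                       # all-pairs shortest-path distances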


Okay I went over the slides and good lord this would have made my life easier not too long ago.


This deserves its own HN post imho.


Re [1]: it's fairly concrete to simply say that matrix multiplication can be performed block-wise.


I don't disagree, but that is just one example of matrix multiplication. The gist is not that you can do block multiplication, but that you can define matrices over any non-commutative ring, which includes other matrices, i.e. blocks.


Yeah, matrices are more abstract. I guess I am just pointing out that your concrete example of non-commutative rings (matrices of matrices) still needs a proof that the 4N x 4N matrix of scalars and the 4 x 4 matrix of N x N blocks behave the same way.

Block matrix multiplication demonstrates the equivalence.
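
For what it's worth, a quick numpy check (block count and block size picked arbitrarily) that the blockwise product agrees with the ordinary one:

    import numpy as np

    N = 3
    A = np.random.rand(4 * N, 4 * N)
    B = np.random.rand(4 * N, 4 * N)

    # carve each matrix into a 4 x 4 grid of N x N blocks
    blocks = lambda M: [[M[i*N:(i+1)*N, j*N:(j+1)*N] for j in range(4)]
                        for i in range(4)]
    Ab, Bb = blocks(A), blocks(B)

    # block (i, j) of the product = sum over k of Ab[i][k] @ Bb[k][j]
    # (the usual formula, with N x N blocks as the ring elements)
    Cb = [[sum(Ab[i][k] @ Bb[k][j] for k in range(4)) for j in range(4)]
          for i in range(4)]

    print(np.allclose(A @ B, np.block(Cb)))  # True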


This is the resource I wish I had in 2018. Every grad school course had a Linear Algebra review lecture but never got into the Matrix Calculus I actually needed.


That was my struggle, too. Imperial College London has a small online course which covers similar topics (https://www.coursera.org/learn/multivariate-calculus-machine...). It helped a lot.


I just finished my first year of an AI bachelor's. We covered Linear Algebra with basic matrix calculations and theorems, so much calculus that the notes take up 3 GB of space, physics, psychology, some very outdated logic classes, and a basic Python course that left many of the students wondering how to import a library.


True, this was a designated resource during my studies (2020-2022), but those were post-2018.


Please change the link to the original source:

https://arxiv.org/abs/1802.01528

---

EDIT: It turns out explained.ai is the personal website of one of the authors, so there's no need to change the link. See comment below.


:) Yeah, I use my own internal markdown to generate really nice HTML (with fast LaTeX-derived images for equations) and also full-on LaTeX. (The tool is https://github.com/parrt/bookish.)

I prefer reading on the web unless I'm offline. The LaTeX is super handy for printing a nice document.


Even though it's shockingly common, I never cease to be surprised and delighted when authors who are on HN take the time to reply to comments about their work.

Thank you for doing this with Jeremy and sharing it with the world!


Sure thing! Very enjoyable to have people use our work.


Explained.ai seems to be Terence Parr's personal site.


Thank you for pointing it out. I edited my comment.


I finished Vector Calculus last year and have no experience in machine learning, but this seems exceptionally thorough. It would have made my life easier to have a practical explanation rather than a purely mathematical one, but such is the life of the engineering student, I guess.


Glad to be of assistance! Yeah, it really annoyed me that this critical information was not collected in any one particular spot.


I followed this when I was learning DL through Andrew Ng's course. In one of the lessons, he gave the formula for calculating the loss as well as its derivatives.

I tried deriving these formulas from scratch using what I learned from OP's post, but it felt like there was something missing. I think it boils down to me not knowing how to aggregate those element-wise derivatives into matrix form. In the end it was the Matrix Cookbook and some notes from Stanford's CS231n that helped me grok it fully.
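
In case it helps anyone else stuck on the same step, here is a rough numpy sketch (my own toy example, not the exact formula from the course) of how the element-wise derivatives collapse into one matrix expression for logistic regression, with a finite-difference check:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 50, 4
    X = rng.normal(size=(m, n))          # design matrix
    y = rng.integers(0, 2, size=m)       # binary labels
    w = rng.normal(size=n)               # weights

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    loss = lambda w: -np.mean(y * np.log(sigmoid(X @ w))
                              + (1 - y) * np.log(1 - sigmoid(X @ w)))

    # the element-wise derivatives aggregated into matrix form
    grad = X.T @ (sigmoid(X @ w) - y) / m

    # finite-difference check of the first coordinate
    eps, e0 = 1e-6, np.eye(n)[0]
    print(grad[0], (loss(w + eps * e0) - loss(w - eps * e0)) / (2 * eps))  # nearly equal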


Oh nice, I did most of this in school, and during my non-CS engineering degree. Thanks for sharing!

Always wanted to dip my toes into ML, but I've never been convinced of its usefulness to the average solo developer, in terms of things you can build with this new knowledge. Likely I don't know enough about it to make that call, though.


Here’s an ML project I’ve been working on as a solo dev:

https://github.com/williamcotton/chordviz

Labeling software in React, a CNN in PyTorch, and on-device prediction in SwiftUI. 12,000 (and counting) hand-labeled images of my hand on a guitar fretboard!


There are two common beliefs: that you don't need math for ML, and that you need a lot of math for ML. So let me clarify:

You don't need math to make a model perform well, but you do need math to know why your model is wrong.


Another matrix math reference: https://github.com/r-barnes/MatrixForensics


I just had a quick look at it. A good summary.

It seems these topics are covered in the first one or two semesters of a math degree. Of course, the university treatment is a bit more advanced.


We just released a comprehensive online course on Multivariable Calculus (https://mathacademy.com/courses/multivariable-calculus), and we also have a course on Mathematics for Machine Learning (https://mathacademy.com/courses/mathematics-for-machine-lear...) that covers just the matrix calculus you need in addition to just the linear algebra and statistics you need, etc. I'm a founder and would be happy to answer any questions you might have.


I understand you don't have a free trial; is there any chance you have a demo somewhere of what it actually looks like, though? Like a tiny sample lesson or something along those lines? It looks interesting, but I'm just uncertain as to what it actually "feels" like in practice vs., let's say, Brilliant, etc.

I only see pictures; I'm curious about the extent of the interaction in the linear algebra/matrix calculus content specifically.


That's a good point! We definitely need to add some more information to the website. In the meantime, if you send an email to support@mathacademy.com, I'd be happy to give you a demo over Zoom and answer any questions you might have.


Who do you think Mathematics for Machine Learning benefits? In my opinion, the plethora of courses and articles available in that vein is useful mostly to people who recently went through college-level Linear Algebra.

I'd like more resources geared toward people who are done with Khan Academy and want something equally well made for more advanced topics.


The Mathematics for Machine Learning course doesn't assume knowledge of Linear Algebra, but covers the basics of Linear Algebra you'll need along with the basics of Multivariable Calculus, Statistics, Probability, etc. It does, however, assume knowledge of high-school math and Single Variable Calculus. If you've been out of school for a while, our adaptive diagnostic exam will identify your knowledge gaps and create a custom course for you that includes the necessary remediation.

If you're REALLY rusty (maybe you've been out of school for 5+ years), or you just never learned the material that well in the first place, then you might want to start with one of our Mathematical Foundations courses, which will scaffold you up to the level where you can handle the content in Mathematics for Machine Learning. More info can be found here: https://mathacademy.com/courses

The Mathematics for Machine Learning course would be ideal for anyone who majored in a STEM subject like CS (or at least has a solid mathematical foundation) and is interested in doing work in machine learning.


Appreciate the reply; hopefully I'll be subscribing to your service at the beginning of next year (after I am done with Khan Academy math).


vec(ABC)=kron(C.T,A)vec(C) is all you need for matrix calculus!


Can anyone provide an intuitive explanation?


I guess op meant "vec(ABC)=kron(B.T,A)vec(C)", and my attempt at explaining it would be:

If you take the result of transforming the column vectors of the C matrix by AB and vectorize it, you get the same as first vectorizing C and then transforming it by a block matrix obtained as the Kronecker product of B transposed and A.

The significance is that it performs a reduction of matrix calculus to vector calculus (i.e., it shows that you can convert any matrix calculus operation/formula/statement into a vector calculus operation/formula/statement).


They have an error in their formula, but the vectorized form (stacking the columns of the matrix to form a vector) of the triple matrix product (A times B times C) can be rewritten in a form involving Kronecker products acting on another vectorized matrix.

I wouldn't say that is everything, but it is a useful trick.


That is just reading out the equation in English. My question is, why is it so?


The correct version can be found here: https://en.wikipedia.org/wiki/Kronecker_product#Matrix_equat...

The answer to why it is so is pretty trivial (just do the indexing for each element) if you know the definition of the Kronecker product and what the 'vec' operation is.

For an intuitive explanation, try thinking through how the matrix multiplication works and consider how the Kronecker product pattern applies to the vector.

This honestly isn't a super interesting result, and I would say the original commenter was overstating its importance in matrix calculus. It really is more useful for solving certain matrix problems, or for speeding up some tensor product calculations when things have a certain structure. For example, if we have a discretization of a PDE, then depending on the representation, the operator in the discrete space may be a sum of Kronecker products, so applying it can be fast using matrix multiplies without ever storing the Kroneckered matrices.
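
For concreteness, here is a small numpy check of the identity from that Wikipedia section, vec(A X B) = kron(B^T, A) vec(X), along with the practical point about never materializing the Kronecker factor (just a sketch; the shapes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 4))
    X = rng.normal(size=(4, 5))
    B = rng.normal(size=(5, 2))

    vec = lambda M: M.reshape(-1, order="F")   # stack the columns

    # vec(A X B) == kron(B^T, A) vec(X)
    print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))  # True

    # in practice you apply the operator as A @ X @ B and never form
    # the 6 x 20 Kronecker matrix explicitly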


Darn good post!




