
Matrix Calculus for Deep Learning - yarapavan
https://explained.ai/matrix-calculus/index.html
======
madenine
I’m in favor of there being more and better resources to learn anything out
there, but every time I see deep learning 101 type material, all I can think
is “who is this for?”.

In ~July 2016 I was at a presentation by NVidia at GW in DC. They showed off
how easy it was to build out and train a model using some of their tooling
(Digits maybe?). After the demo they opened it up for questions and a grad
student ‘asked’ “You just did in 10 mins with 30 lines of code what I worked
on for an entire semester”.

That’s been the trajectory of the tools and increasing abstraction in this
space. It’s just getting easier and easier to build models that work (which is
great), and it gets easier and easier to do so without knowing more than an
extremely high level overview of the math behind it all.

So while this looks like a great resource - who’s it for?

For jobs/problems that need you to have a thorough understanding of the math
and theory behind the networks this isn’t going to cut it.

For jobs/problems that need you to get something working, math or not - this
likely isn't necessary to get started.

So it’s for people that have been getting into DL but also haven’t bothered or
needed to look up the math concepts?

~~~
nabla9
> So while this looks like a great resource - who’s it for?

I'll give you an analogy: electricity. Who needs to know complex numbers and
differential equations to understand electricity? A technician, a civil
engineer, a scientist, or a research engineer?

A technician who just wires the house doesn't need math. They just read the
wiring instructions and follow standard practices. Nvidia boasts about the
tools it builds for 'ML technicians' in this analogy.

You need to know math if you are building new architectures and applying
complex models to something nontrivial. It's not going to work the first time,
and you need to know what's going on. Even if you are the 'civil engineer' in
this analogy, you should be able to read the math and understand it even if
you don't do the math yourself. You won't be able to do literature research
and learn new things if you can't read math fluently.

If you are a programmer who is given ML tools to implement something someone
else designed and understood, or you just use existing models, you don't need
this. Your career might benefit from knowing it, but you can manage without.

~~~
p1esk
I believe the OP's point was that the math described in the article is too
simple and not enough to do any serious research. Anyone who attempts to do NN
research already knows this material (and a lot more). This tutorial could be
useful to someone who wanted to implement simple backprop from scratch, but
all DL libraries already do it automatically. Someone who just wants to learn
a bit about NNs to classify images or generate text does not need to know
this, and someone who wants to make a breakthrough in NN theory already knows
it. So yes, it's not very clear who the target audience is here. I'm guessing
it's for a bright high schooler who just learned calculus and is interested
in how NNs work. For such students I'd recommend reading
[http://neuralnetworksanddeeplearning.com](http://neuralnetworksanddeeplearning.com)
instead.
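For what it's worth, the "simple backprop from scratch" use case the parent mentions looks roughly like this - a minimal sketch of a one-hidden-layer network with hand-derived gradients (the toy task, shapes, and variable names are my own, not from the article):

```python
import numpy as np

# Tiny one-hidden-layer ReLU network on a toy regression task,
# with the backward pass written out by hand via the matrix chain rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                    # 64 samples, 3 features
y = (X @ np.array([1.0, -2.0, 0.5]))[:, None]   # linear target, shape (64, 1)

W1 = rng.normal(size=(3, 8)) * 0.1
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)) * 0.1
b2 = np.zeros(1)

lr = 0.1
for step in range(500):
    # forward pass
    z = X @ W1 + b1           # (64, 8)
    h = np.maximum(z, 0.0)    # ReLU
    pred = h @ W2 + b2        # (64, 1)
    loss = np.mean((pred - y) ** 2)

    # backward pass: each line is one application of the chain rule
    dpred = 2.0 * (pred - y) / len(X)   # dL/dpred
    dW2 = h.T @ dpred                   # dL/dW2
    db2 = dpred.sum(axis=0)
    dh = dpred @ W2.T                   # dL/dh
    dz = dh * (z > 0)                   # ReLU gates the gradient
    dW1 = X.T @ dz
    db1 = dz.sum(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")  # loss should shrink toward zero
```

Autograd in any DL library does exactly this bookkeeping for you, which is the parent's point - but writing it once by hand is where the article's matrix calculus actually gets exercised.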

~~~
hgoel
But someone who wants to contribute to the research doesn't just have this
knowledge pop into their mind out of nowhere. They're going to learn it from
somewhere, and what's wrong with one more resource to help with that?

------
olooney
I am very impressed with the clarity of presentation here. I usually link to
The Matrix Cookbook[1] when I need to cite a reference for matrix calculus
theorems but I might reference this instead in the future. I particularly like
the section on the vector chain rule (which is very clear) and the section on
element-wise operations (which uses novel notation to present many results in
a compact form).

[1]:
[https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf)
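The vector chain rule in question - the Jacobian of a composition is the product of the Jacobians - is also easy to sanity-check numerically. A small sketch with toy functions of my own choosing (not from the article):

```python
import numpy as np

# Vector chain rule: J_{f∘g}(x) = J_f(g(x)) @ J_g(x).
# Toy example with g: R^2 -> R^3 and f: R^3 -> R^2,
# checked against a central finite-difference Jacobian.
def g(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def Jg(x):
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0, 2 * x[1]]])

def f(u):
    return np.array([u[0] + u[1], u[1] * u[2]])

def Jf(u):
    return np.array([[1.0, 1.0, 0.0],
                     [0.0, u[2], u[1]]])

x = np.array([0.7, -1.3])
analytic = Jf(g(x)) @ Jg(x)   # the chain rule, as a matrix product

# finite-difference Jacobian of the composition, column by column
eps = 1e-6
numeric = np.column_stack([
    (f(g(x + eps * e)) - f(g(x - eps * e))) / (2 * eps)
    for e in np.eye(2)
])

print(np.max(np.abs(analytic - numeric)))  # tiny: the two agree
```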

------
fg6hr
A genuine question: is there any math behind ML at all? For example, is there
any solid theory, with proven theorems, that would tell us what happens when
we add another conv layer here or use a 3x3 conv kernel instead of a 2x2 one
over there, or replace that tanh with a relu? From my limited understanding,
ML roughly works like this: we shuffle around the ML graph, using some
intuition, off-load it to a cluster of GPUs that costs $10k/hour, feed it a
dataset with 1 billion images and see what happens; but nobody can predict the
behavior of training or convergence or accuracy based on the ML graph and data
alone.

~~~
jeeceebees
Of course there is. All the building blocks that people mix and match in
networks nowadays were introduced at some point.

The papers that introduced batch norm, adaptive instance norm, attention
heads, or any other module used in a network have an extensive discussion of
the motivation for their existence, some derivation or proof that they do what
you want, and an empirical test to show they help in practice. The reason some
losses allow GANs to converge in certain situations while others don't isn't a
complete mystery; there is theory that supports this.

Researchers designing new models are considering weak points in old
approaches, identifying why they aren't working correctly, and proposing
something new that solves a part of the problem. All of this is done by
looking at the math behind all the operations in the network (or at least the
parts relevant to a certain question).

That nobody really knows how AI works is one of those myths told by the media.
Just because the model weights aren't interpretable doesn't mean we don't know
why that model works well. It just takes quite a bit of maths knowledge to
really understand state-of-the-art models. All that knowledge is also easily
packaged into modern frameworks that make it easy to use without a deep
knowledge of why it works. All of this contributes to the feeling that nobody
really knows what's going on, while in reality it's only the majority of
people that don't know what's going on ;)

~~~
p1esk
_nobody really knows how AI works is one of those myths told by the media_

It's not a myth. No one really understands how neural networks work. We don't
know why a particular model works well. Or why any model works well. For
example no one can answer why NNs generalize so well even when they have
enough learning capacity to memorize all training examples. We can guess, but
we don't know for sure. Most of the proofs you see in papers are there as
fillers, so that papers seem more convincing. We rarely can prove anything
mathematically about NNs that has any practical value or leads to any
breakthroughs in understanding.

If we did really understand how NNs work, then we wouldn't need to do
expensive hyperparameter searches - we would have a way to determine the
optimal ones given a particular architecture and training data. And we
wouldn't need to do expensive architecture searches, yet the best of the
latest convnets have been found through NAS (e.g. EfficientNet), and there's
very little math involved in the process - it's pretty much just random
search.

Funny you mentioned the batchnorm paper - we still don't know why batchnorm is
so effective - the paper gave an explanation (covariate shift reduction) which
later was shown to be wrong (batchnorm does not reduce it), then several other
explanations were suggested (smoother loss surface, easier gradient flow,
etc), but we still don't know for sure. Pretty much every good idea in the NN
field is a result of lots of experimentation, good intuition developed in the
process, looking at how the brain does it, and practical constraints. And yes,
sometimes we're looking at the equations, and thinking hard, and sometimes we
see a better way to do stuff. But usually it starts with empirical tests, and
if successful, some math is used in the attempt to explain things. Not the
other way around.
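It's worth noting how little there is to the operation itself, which makes the lack of a settled explanation more striking. A minimal sketch of training-mode batchnorm (my own variable names; real implementations also track running statistics for inference):

```python
import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then rescale and shift."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta              # learnable rescale/shift

# features arrive with arbitrary mean/scale; batchnorm standardizes them
x = np.random.default_rng(1).normal(loc=3.0, scale=2.0, size=(128, 4))
out = batchnorm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))  # per-feature mean ≈ 0, std ≈ 1
```

Three lines of arithmetic, and the debate above is entirely about why those three lines help optimization so much.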

NNs are currently at a similar point as where physics was before Newton and
before calculus.

~~~
chestervonwinch
> NNs are currently at a similar point as where physics was before Newton and
> before calculus.

I'm more inclined to compare with the era after Newton and Leibniz, but prior
to the development of rigorous analysis. If you look at this time period, the
analogy fits a bit better IMO -- you have a proliferation of people using
calculus techniques to great advantage for solving practical problems, but no
real foundations propping the whole thing up (e.g., no definition of a limit,
continuity, notions of how to deal with infinite series, etc.).

~~~
p1esk
Maybe. On the other hand, maybe a rigorous mathematical analysis of NNs is as
useful as a rigorous mathematical analysis of computer architectures - not
very useful. Maybe all you need is just to keep scaling it up, adding some
clever optimizations in the process (none of the great CPU ideas like caches,
pipelining, out of order execution, branch prediction, etc came from rigorous
mathematical analysis).

Or maybe it's as useful as a rigorous mathematical analysis of a brain -
again, not very useful, because for us (people who develop AI systems), it
would be far more valuable to understand a brain on a circuit level, or an
architecture level, rather than on a mathematical theory level. The latter
would be interesting, but probably too complex to be useful, while the former
would most likely lead to dramatic breakthroughs in terms of performance and
capabilities of the AI systems.

So maybe we just need to keep doing what we have been doing in the DL field for the
last 10 years - trying/revisiting various ideas, scaling them up, and evolving
the architectures the same way we've been evolving our computers for the last
100 years, with the hope there will be more clues from neuroscience. I think
we just need more ideas like transformers, capsules, or neural Turing
machines, and computers that are getting ~20% faster every year.

------
master_yoda_1
If anybody wants to read the real stuff, here is the reference: Matrix
Computations
[https://www.cs.cornell.edu/cv/Books/GVL/index.htm](https://www.cs.cornell.edu/cv/Books/GVL/index.htm)

~~~
FabHK
The book by Golub (RIP) and Van Loan goes way beyond what's discussed here
(SVD, QR decomposition, eigenvalue computations, iterative solvers, error
analysis, etc.), and it's focused more on (numerical) linear algebra than on
calculus.

