
Relearning Matrices as Linear Functions - dhruvp
https://www.dhruvonmath.com/2018/12/31/matrices/
======
_hardwaregeek
Linear Algebra, at least at my school, is taught pretty poorly. Instead of
teaching the beauty of transformations, the course is bogged down in
numerical nonsense and tedious calculations (who wants to find the inverse of
a 3x3 matrix? Bueller? Bueller?). Only after learning Algebra and
homomorphisms, isomorphisms and automorphisms did I appreciate the importance
of linear transformations. Stuff like Singular Value Decomposition gets a lot
more interesting once you know some basic Algebra. I suppose Linear can't get
too abstract because non math majors have to take it, but starting from
generalized ideas of transformations is a far better way to teach it imo.

~~~
postsantum
That was exactly my experience. Struggled with matrix theory at uni doing
some bullshit exercises, but started to grasp the topic only when I needed to
apply a linear transformation in a game

~~~
datasciencetext
I think the situation has improved somewhat as visualization tools have become
easier to use. We made this simple visual [1] to help people understand what
they might get out of linear algebra, and it was easy enough for some
statisticians to accomplish.

[1] [https://datasciencetexts.com/subjects/linear_algebra.html](https://datasciencetexts.com/subjects/linear_algebra.html)

~~~
steve19
Nice site, but it's worth giving some info about yourself on the site and why
I should trust your advice, given that these books are expensive.

In elementary machine learning, you give two options. You should really
include Introduction to Statistical Learning, by the same folks who wrote ESL.
It's a great book that covers the same ground as ESL but with less math.

~~~
datasciencetext
Thanks for the feedback! ISL is indeed a good option, especially for the more
application-oriented; it's on the todo list!

------
andrewla
It took until I started learning differential geometry in the form of General
Relativity to arrive at this insight, even though I feel like the notion of a
matrix as a linear map was drilled in pretty thoroughly. The notion of matrix
multiplication as function composition was presented almost as an interesting
side effect of matrix multiplication -- that is, multiplication by these rules
came first, and, hey, look, they compose!
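The "hey, look, they compose!" observation can be checked in a few lines. A
pure-Python sketch (matrices and vector are made up; no libraries assumed)
showing that multiplying the matrices first and then applying the product
gives the same result as applying each map in turn:

```python
def matvec(M, v):
    """Apply the linear map represented by matrix M to vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    """The matrix of the composed map f_A . f_B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
v = [5, 7]

# Applying B, then A ...
composed = matvec(A, matvec(B, v))
# ... equals applying the single matrix A*B:
assert composed == matvec(matmul(A, B), v)
```

Taught the other way around, composition is the definition and the
multiplication rule is what you derive from it.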

Personally I found the tensor algebra perspective to be much more intuitive
than either of these, with matrices thrown in mostly as a computational
device. Even a vector (through the dot product) is just a linear function on
other vectors, and the notion of function composition carries through to that
and to higher-order tensors.

Covariance and contravariance are a little more complicated to completely
grok, but for most applications in Euclidean space (where the metric is the
identity function) the distinction is of more theoretical interest anyway.

~~~
ijidak
The metric?

~~~
throwawaymath
A _metric_ is a distance function. Defining a metric on a space is one of the
ways to create a topology.

I'm not sure what the parent means by the metric being the identity function,
however. The Euclidean metric is basically the hypotenuse of a triangle
parameterized by two vectors. The adjacent and opposite sides of the triangle
are measured to be the Euclidean norm of each vector (their length), and the
hypotenuse is the shortest distance between them.

The Euclidean metric is not the _only_ metric - you can define distance
however you'd like as long as it's consistent. But I'm not sure how the
identity function works as a metric, because that would map a vector to
another vector, not a scalar.

~~~
andrewla
In differential geometry the metric [1] is a tensor that defines the
relationship of vectors in the space to vectors in the tangent space. The
identity function as a metric means that you are in a locally flat space where
geodesics (the path taken by traveling in a given direction) are straight
lines.

A metric in a traditional metric space is a global distance function; you can
use the metric tensor in a Riemannian manifold to allow integration to find
the distance between two points.

[1]
[https://en.wikipedia.org/wiki/Metric_tensor](https://en.wikipedia.org/wiki/Metric_tensor)

~~~
throwawaymath
Ah, so that's the setting we're talking about. Thanks for explaining that.

------
munchbunny
In my high school matrices were first taught in geometry class, starting with
using matrices as affine transformations in 2-d and then 3-d, and using that
to teach concepts like what eigenvectors/values are, the equivalence of matrix
and function composition, etc.

That was taught right after a unit on complex numbers and trigonometry so that
we could see the parallels between composing polynomial functions on complex
numbers and composing affine transformations.
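One concrete instance of that parallel, as a pure-Python sketch (the angles
are made up; nothing here is taken from that curriculum): composing two plane
rotations gives the rotation by the summed angle, exactly as multiplying
e^(i·θ₁) by e^(i·θ₂) gives e^(i·(θ₁+θ₂)).

```python
import cmath
import math

def rotation(theta):
    """2x2 matrix rotating the plane by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t1, t2 = 0.3, 1.1

# Composing the two rotations ...
R = matmul(rotation(t1), rotation(t2))
# ... matches a single rotation by the summed angle:
for row_R, row_S in zip(R, rotation(t1 + t2)):
    assert all(math.isclose(a, b) for a, b in zip(row_R, row_S))

# The complex-number analogue of the same fact:
z = cmath.exp(1j * t1) * cmath.exp(1j * t2)
assert cmath.isclose(z, cmath.exp(1j * (t1 + t2)))
```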

To this day I think that was one of the most beautiful and eye opening lessons
I've had in mathematics.

In hindsight, I think I got lucky that the teachers who wrote the curriculum
this way were math, physics, and comp sci masters/phd's who looked at their
own educations and decided that geometry class was a great Trojan horse for
linear algebra.

~~~
rramadass
You certainly were lucky to be taught Linear Algebra in such a manner! I came
to understand the importance of such an approach only after a lot of head-
scratching and self-study. IMO, a beautiful and important branch of
"Practical" Maths has been needlessly obscured by the pedantic formalism
espoused by the teaching community. Linear Algebra SHOULD always be taught
alongside Coordinate/Analytic Geometry and Trigonometry for proper intuition.

I found the book "Practical Linear Algebra: A Geometry Toolbox" very helpful
in my study.

------
whatshisface
FWIW, I was told that matrices are linear maps pretty early on in my
education. Are there any college-level linear algebra / matrix calculation
courses that _don't_ tell students about that?

~~~
btilly
Sadly, there are. Or at least were.

When I went through university the standard set of courses was a Calculus
course that was mostly about derivatives, a second one that was mostly about
integrals, a third Calculus course that was about multi-variable Calculus.
That third course necessarily had to teach matrices, and it taught them as
rote calculations. There was a follow-up differential equations course which
refreshed people's memories of matrices... as a rote calculation.

It was done this way because the multi-variable Calculus course was a
prerequisite for a lot of physics+engineering courses. So a lot of students
wanted to take that sequence. Differential equations were a prerequisite for
some other advanced courses. Linear algebra was pretty much just for math
majors.

~~~
jimhefferon
For me also. It is still that way in many programs, as far as I can tell.

------
dhruvp
Hey OP here!

When I first was introduced to matrices (high school) it was in the context of
systems of equations. Matrices were a shorthand for writing out the equations
and happened to have interesting rules for addition etc. It took me a while to
think of them as functions in their own right and not just tables. This
post is my attempt to relearn them as functions, which has helped me develop a
much stronger intuition for linear algebra. That’s my motivation for this post
and why I decided to work on it. Feedback is more than welcome.

~~~
msla
What got me for a while was the concept of a tensor:

For example: What is a tensor?

Wrong way to answer it: Well, the number 5 is a tensor. So's a row vector.
So's a column vector. So's the dot product and the cross product. So's a two-
dimensional matrix. So's a four-dimensional matrix, just... don't ask me to
write one on the board, eh? So's this Greek letter with smaller Greek letters
arranged on its top right and bottom right. Literally anything you can think
of is a tensor, now... try to find some conceptual unity.

Then coordinate-free fanaticism kicked in, robbing the purported explanations
of any explanatory power in terms of practical applications of tensors. The
only thing they could _do_ was shift indices around.

What finally made it stick is decomposing every mathematical concept into
three parts:

1\. Intuition, or why we have the concept to begin with.

2\. Definitions, or the axioms which "are" the concept in some formal sense.

3\. Implementations, or how we write specific instances of the concept down,
including things like the source code of software which implements the
concept.

~~~
brianberns
As a layman, the word "tensor" always intimidated me. As a programmer, I was
surprised then when I found out that a tensor is just a multi-dimensional
array (where the number of dimensions can be as small as 0). That was a
concept I was already quite comfortable with.

~~~
soVeryTired
That's a bit like saying a vector is 'a row of numbers'. Not incorrect, but
missing the point. What matters is what vectors _do_. It's the properties like
pointwise addition, scalar multiplication, and existence of an inverse that
make vectors vectors.

------
michelpp
This is great and a nice mathematical approach to the ideas of matrices.
Another great resource is 3blue1brown's essence of linear algebra:

[https://www.3blue1brown.com/essence-of-linear-algebra-page](https://www.3blue1brown.com/essence-of-linear-algebra-page)

Math is Fun also has a nice writeup that explains matrix multiplication with a
real world example of a bakery making pies and tracking costs:

[https://www.mathsisfun.com/algebra/matrix-multiplying.html](https://www.mathsisfun.com/algebra/matrix-multiplying.html)

------
noobermin
One of the things that always irked me about the term "linear transformation"
is that it doesn't include affine transformations, which is funny because back
in elementary school you learn that a "linear equation" looks like mx + b. Of
course, as the article notes, "linearity" for vector spaces (or modules) means
linearity in the arguments, while "linear" for a child in school means
"something like a line on graph paper". This is yet another example of
terminology in the way mathematics is taught, possibly kept for historical
reasons, that leads to even more confusion.

PS: in case you didn't know, affine transformations are not linear (unless
b = 0):

    f(x) = mx + b  =>
    f(x + y) = m(x + y) + b  ≠  (mx + b) + (my + b) = f(x) + f(y)
    f(cx) = mcx + b  ≠  c(mx + b) = c·f(x)
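A quick numeric check of the same point, as a sketch in pure Python (the
values of m, b, x, y, c here are made up for illustration):

```python
# A hypothetical affine map f(x) = m*x + b with b != 0.
m, b = 3, 5
f = lambda x: m * x + b

x, y, c = 2, 4, 10

# Additivity fails whenever b != 0: the b gets added twice on the right.
assert f(x + y) != f(x) + f(y)   # 23 vs 28

# Homogeneity fails too: b is not scaled on the left.
assert f(c * x) != c * f(x)      # 65 vs 110
```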

------
tptacek
Their most recent post about kernels is even better than this:

[https://www.dhruvonmath.com/2019/04/04/kernels/](https://www.dhruvonmath.com/2019/04/04/kernels/)

The matrix/function stuff is elementary enough that I understand it
intuitively (I suck at math), although it's neat to be reminded that given
enough independent points you can reconstruct the function (this breaks a
variety of bad ciphers, sometimes including ciphers that otherwise look
strong).
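The reconstruction trick is easy to demonstrate in miniature. A pure-Python
sketch (the "secret" matrix and the query values are invented for
illustration): querying a 2-d linear map at the two standard basis vectors
hands you the columns of its matrix, which then predicts the map everywhere.

```python
def matvec(M, v):
    """Apply the linear map represented by matrix M to vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# An unknown linear map (pretend we can only query it, not read it):
SECRET = [[2, -1], [4, 3]]
query = lambda v: matvec(SECRET, v)

# Query it at the two basis vectors; the responses are exactly the columns.
c1, c2 = query([1, 0]), query([0, 1])
recovered = [[c1[0], c2[0]], [c1[1], c2[1]]]

# The recovered matrix now predicts the map on any other input:
assert recovered == SECRET
assert query([7, 9]) == matvec(recovered, [7, 9])
```

This is the sense in which "enough independent points" pins down a linear
function completely.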

The kernel post actually does some neat stuff with the kernel, which I found
more intuitively accessible than (say) what Strang does with nullspaces.

------
adenadel
If you're interested in this approach to linear algebra you should read Linear
Algebra Done Right by Sheldon Axler.

~~~
avip
Or pretty much any other Linear Algebra book.

~~~
ulucs
Axler's book has the advantage of skipping determinants in order to provide a
more intuitive approach to linear algebra.

~~~
throwawaymath
I strongly disagree skipping determinants provides a more intuitive approach
to linear algebra. I don't know your background, but I'd venture a guess you
feel it does because the Laplace expansion formula for computing the
determinant[1] feels uninspired and out of place.

The reason determinants are hard to teach (in my opinion) is because a
rigorous derivation of their formula isn't possible without first teaching
multilinear algebra and constructing the exterior algebra. Once you do those
things, the natural geometric interpretation of the determinant basically
falls onto your lap. But it's still very useful for e.g. computing eigenvalues
and using the characteristic polynomial, so it's taught before that context
can be formalized.

Professors shouldn't teach determinants in the context of matrices, at least
not at first. That's heavily computation-focused, and the symbol pushing looks
really unmotivated and strange to students. Instead they should teach the
basis-free definition of determinants (i.e. focus on the linear map, not the
matrix transformation representing the linear map for some basis). Then the
determinant is "only" the volume of the image of the unit hypercube under the
linear transformation, which is where the parallelepiped comes in. If the
linear transformation is invertible, the unit hypercube is transformed from an
_n_ -dimensional cube into an _n_ -dimensional parallelepiped, from which you
can geometrically see the way the linear map transforms the entire vector
space it's defined over.
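In two dimensions this is simple to verify by hand: the determinant ad − bc of
a 2×2 matrix equals the signed area of the image of the unit square, i.e. the
parallelogram spanned by the matrix's columns. A minimal pure-Python sketch
(the example matrix is made up):

```python
def det2(M):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c

def signed_area(u, v):
    """Signed area of the parallelogram spanned by 2-d vectors u and v
    (the 2-d cross product)."""
    return u[0] * v[1] - u[1] * v[0]

A = [[2, 1], [1, 3]]
# The unit square's sides e1 = (1, 0) and e2 = (0, 1) map to the columns of A:
col1 = [A[0][0], A[1][0]]
col2 = [A[0][1], A[1][1]]

# Determinant == area scaling factor of the unit square.
assert det2(A) == signed_area(col1, col2) == 5
```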

3Blue1Brown has a very good video on the geometry underlying the
determinant[2]. For a more rigorous presentation which constructs the exterior
algebra and derives the determinant formula using the wedge product, Noam
Elkies has notes[3][4] for when he teaches Math 55A at Harvard. Incidentally
Noam Elkies uses Axler's book, and while he obviously approves of it he's
pretty upfront in asserting that the determinant should be taught anyway[5].

________________________

1\. [http://mathb.in/33068](http://mathb.in/33068)

2\.
[https://www.youtube.com/watch?v=Ip3X9LOh2dk](https://www.youtube.com/watch?v=Ip3X9LOh2dk)

3\.
[http://www.math.harvard.edu/~elkies/M55a.10/p8.pdf](http://www.math.harvard.edu/~elkies/M55a.10/p8.pdf)

4\.
[http://www.math.harvard.edu/~elkies/M55a.10/p9.pdf](http://www.math.harvard.edu/~elkies/M55a.10/p9.pdf)

5\.
[http://www.math.harvard.edu/~elkies/M55a.10/index.html](http://www.math.harvard.edu/~elkies/M55a.10/index.html)

~~~
rocqua
I agree, the way I still see determinant is as the 'volume scaling factor' of
a linear transformation.

This means it makes sense that det(A) = 0 means A is non-invertible. It also
makes a lot of sense when the jacobian pops up in the multi-dimensional chain
rule.

Given the above, and the Cayley–Hamilton theorem, I never really had to know
why the determinant was calculated the way it is. The above give enough of an
interface to work with it.

------
meuk
It recently occurred to me that if you use the fact that matrices represent
linear functions, you don't have to do tedious algebra to prove that matrix
multiplication is associative (that is, (A * B) * C = A * (B * C), which
allows us to write A * B * C without brackets, since it doesn't matter how we
place them anyway).

For a matrix M, denote f_M(x) = M * x. Then f_{A * B}(x) = f_A(f_B(x)), so
that f_{(A * B) * C}(x) = f_{A * B}(f_C(x)) = f_A(f_B(f_C(x))), and also
f_{A * (B * C)}(x) = f_A(f_{B * C}(x)) = f_A(f_B(f_C(x))).

So f_{(A * B) * C}(x) = f_{A * (B * C)}(x) for all x, hence
(A * B) * C = A * (B * C).
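The identity is also easy to sanity-check numerically. A pure-Python sketch
with small made-up matrices (no libraries assumed):

```python
def matmul(A, B):
    """Multiply matrices, i.e. build the matrix of the composed map f_A . f_B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 2]]

# Grouping doesn't matter: both sides are the matrix of f_A . f_B . f_C.
assert matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
```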

------
adamnemecek
Conjugate transpose and other adjoints are kinda nuts, they are the other part
of the story

[http://www.reproducibility.org/RSF/book/bei/conj/paper_html/...](http://www.reproducibility.org/RSF/book/bei/conj/paper_html/index.html)

Esp the ray tracing/topology relationship is nuts.

------
ivan_ah
Nice! The illustrations + color coding for the vectors are very useful.

Here is a video tutorial that goes through some of the same topics (build up
matrix product from the general principle of a linear function with vector
inputs):
[https://www.youtube.com/watch?v=WfrwVMTgrfc](https://www.youtube.com/watch?v=WfrwVMTgrfc)

Associated Jupyter notebook here:
[https://github.com/minireference/noBSLAnotebooks/blob/master...](https://github.com/minireference/noBSLAnotebooks/blob/master/chapter02_linearity_intuition.ipynb)

------
Jun8
Good, intuitive introduction to matrices. Next steps could be showing that
there are infinitely many different matrix representations of a linear map
(different from the polynomials) and they can be used for function spaces,
too.

One question that usually pops up, and that I was confused about till
recently: are rank-two tensors equivalent to matrices? The answer is no, e.g.
see here:
[https://physics.stackexchange.com/questions/20437/are-matric...](https://physics.stackexchange.com/questions/20437/are-matrices-and-second-rank-tensors-the-same-thing)

~~~
dhruvp
Hey!

Thanks for the feedback. I go into this in the next post on eigenvectors here:
[https://www.dhruvonmath.com/2019/02/25/eigenvectors/](https://www.dhruvonmath.com/2019/02/25/eigenvectors/).
I start by discussing basis vectors which I believe is what you’re looking for
in your comment.

------
S4M
I just skimmed the article quickly. Are there other ways to learn about
matrices? If you don't treat them as linear maps, they are just boring
grids of numbers and matrix multiplication doesn't make any sense.

~~~
thegabriele
Which is precisely how they were presented to me at college.

------
sytelus
The basic equivalence is fine, but what about all the other things you can do
with matrices but can't do with functions? For example, what is the equivalent
of the transpose for functions? How about eigenvalues or Gaussian elimination?
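One standard answer for the transpose: at the function level it is the
adjoint, the unique map Aᵀ satisfying ⟨Ax, y⟩ = ⟨x, Aᵀy⟩ for all x and y
(with the usual dot product). A pure-Python sketch of that identity, with a
made-up matrix and vectors:

```python
def matvec(M, v):
    """Apply the linear map represented by matrix M to vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    """Swap rows and columns."""
    return [list(row) for row in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2, 0], [3, 1, 4]]   # a map R^3 -> R^2
x = [1, 2, 3]                # lives in R^3
y = [5, 7]                   # lives in R^2

# <Ax, y> == <x, A^T y>: the defining property of the adjoint.
assert dot(matvec(A, x), y) == dot(x, matvec(transpose(A), y))
```

So "transpose" does have a coordinate-free, function-level meaning, even
though it looks like pure symbol shuffling on the grid of numbers.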

------
zwieback
Nice article. That's how I learned matrices in high school in Germany. Maybe
it's different here in the US, I'll have to take a look at my daughters'
textbooks.

~~~
a_t48
They were in the textbook in my high school, but we always skipped that
chapter.

------
kregasaurusrex
Having not taken a linear algebra course in college, does anyone have a
recommendation for a book/course to follow?

~~~
rocqua
That would heavily depend on whether you are coming at it from a theoretical
math p.o.v. or a more applied p.o.v.

Not that the applied approach should leave out the theory, because theoretical
insights like the one in this article give a great and intuitive understanding
of linear algebra. However, the more theoretical treatments tend to set up
things like rings, modules, and even category theory, which are much less
useful from an applied perspective.

For the theoretical approach I've heard good things about 'linear algebra done
right'. I imagine it is less appealing for the applied approach. All I can say
is be wary of the 'shut up and calculate' mindset in linear algebra. Getting
the ideas behind the concepts is essentially a shortcut to understanding
linear algebra without any downsides.

------
diehunde
Gilbert Strang uses a similar approach in his Linear Algebra lectures. Much
more intuitive

------
mikorym
The next relearning step is to construct the category where arrows are
matrices...

~~~
wolfgke
> The next relearning step is to construct the category where arrows are
> matrices...

Why not the category of vector spaces (morphisms are linear maps)?

~~~
mikorym
So yes, this is equivalent to FinVect over the field from which the matrix
entries are drawn.

The difference is that here you construct the category from a simpler premise.
To construct FinVect you need to include all set objects with structure
satisfying some axioms.

The category of matrices simply has the positive integers as objects, with the
n x m matrices as the morphisms from m to n. Composition is matrix
multiplication.

Here [1] is a nice overview. If you can follow what is going on there, it is
worthwhile looking at parts II, III and IV.

[1] [https://unapologetic.wordpress.com/2008/06/02/the-category-o...](https://unapologetic.wordpress.com/2008/06/02/the-category-of-matrices-i/)

------
je42
This was an important result in the linear algebra class for first-year
math/cs/eng students at my university.

------
Grustaf
What could possibly be a more basic understanding of a matrix in mathematics?
There’s a reason they teach you Linear Algebra before anything else.

------
j7ake
Who is "we" in this context?

~~~
dhruvp
Hey! I wrote this article - “we” is referring to people who had a similar
educational experience to me. I was introduced to matrices as a tool for
solutions to systems of equations. I always wish I was taught the functional
perspective from the beginning.

