Hacker News new | past | comments | ask | show | jobs | submit login
Relearning Matrices as Linear Functions (dhruvonmath.com)
294 points by dhruvp 6 months ago | hide | past | web | favorite | 93 comments

Linear Algebra, at least at my school, is taught pretty poorly. Instead of teaching the beauty of transformations, the course is boggled down in numerical nonsense and tedious calculations (who wants to find the inverse of a 3x3 matrix? Bueller? Bueller?). Only after learning Algebra and homomorphisms, isomorphisms and automorphisms did I appreciate the importance of linear transformations. Stuff like Singular Value Decomposition gets a lot more interesting once you know some basic Algebra. I suppose Linear can't get too abstract because non math majors have to take it, but starting from generalized ideas of transformations is a far better way to teach it imo.

That was exactly my experience. Struggled with matrices theory at uni doing some bullshit exercises but started to grasp the topic only when I needed to apply some linear transformation in a game

I think the situation has improved somewhat as visualization tools have become easier to use. We made this simple visual [1] to help people understand what they might get out of linear algebra, and it was easy enough for some statisticians to accomplish.


Nice site, but it's worth giving some info about yourself on the site and why I should trust your advice, given that these books are expensive.

In elementary machine learning, you give two options. You should really include introduction to statistical learning by the same folks who wrote ESL. It's a great book that covers the same ground as ESL but with less math.

Thanks for the feedback! ISL is indeed a good option, especially for the more application-oriented; it's on the todo list!

At mine we have an "applied" linear algebra course for engineering students and the "normal" one that math people take.

I didn't pass the "normal" one but I think that's because I had another 300 level math course, capstone, and four other CS courses at the same time. I'm certain I wouldn't have passed the applied one, it looked very tedious and that's usually what gets me with homework.

It took until I started learning differential geometry in the form of General Relativity to arrive at this insight, even though I feel like the notion of a matrix as a linear map was drilled in pretty thoroughly. The notion of matrix multiplication as function composition was presented almost as an interesting side effect of matrix multiplication -- that is, multiplication by these rules came first, and, hey, look, they compose!

Personally I found the prospect of tensor algebra to be much more intuitive than either of these; with matrices thrown in mostly as a computational device. Even a vector (through the dot product) is just a linear function on other vectors, and the notion of function composition carries through to that and to higher-order tensors.

Covariance and contravariance are a little more complicated to completely grok, but for most applications in Euclidean space (where the metric is the identity function) the distinction is of more theoretical interest anyway.

I found this series by eigenchris helpful to understand tensors https://www.youtube.com/watch?v=8ptMTLzV4-I&list=PLJHszsWbB6...

The metric?

A metric is a distance function. Defining a metric on a space is one of ways you create a topology.

I'm not sure what the parent means by the metric being the identity function, however. The Euclidean metric is basically the hypotenuse of a triangle parameterized by two vectors. The adjacent and opposite sides of the triangle are measured to be the Euclidean norm of each vector (their length), and the hypotenuse is the shortest distance between them.

The Euclidean metric is not the only metric - you can define distance however you'd like as long as it's consistent. But I'm not sure how the identity function works as a metric, because that would map a vector to another vector, not a scalar.

In differential geometry the metric [1] is a tensor that defines the relationship of vectors in the space to vectors in the tangent space. The identity function as a metric means that you are in a locally flat space where geodesics (the path taken by traveling in a given direction) are straight lines.

A metric in a traditional metric space is a global distance function; you can use the metric tensor in a Riemannian manifold to allow integration to find the distance between two points.

[1] https://en.wikipedia.org/wiki/Metric_tensor

Ah, so that's the setting we're talking about. Thanks for explaining that.

The metric in a vector space is a dot product. If you just have one vector space, it’s not that interesting then, but if you have many vector spaces all glued together (like a tangent space to a curved surface), then looking at how the dot product varies between nearby tangent spaces tells you a lot about the surface (Gaussian curvature and so on).

In my high school matrices were first taught in geometry class, starting with using matrices as affine transformations in 2-d and then 3-d, and using that to teach concepts like what eigenvectors/values are, the equivalence of matrix and function composition, etc.

That was taught right after a unit on complex numbers and trigonometry so that we could see the parallels between composing polynomial functions on complex numbers and composing affine transformations.

To this day I think that was one of the most beautiful and eye opening lessons I've had in mathematics.

In hindsight, I think I got lucky that the teachers who wrote the curriculum this way were math, physics, and comp sci masters/phd's who looked at their own educations and decided that geometry class was a great Trojan horse for linear algebra.

You certainly were lucky to be taught Linear Algebra in such a manner! I came to understand the importance of such an approach only after a lot of head-scratching and self-study. IMO, a beautiful and important branch of "Practical" Maths has been needlessly obscured by the pedantic formalism espoused by the teaching community. Linear Algebra SHOULD always be taught alongside Coordinate/Analytic Geometry and Trigonometry for proper intuition.

I found the book "Practical Linear Algebra: A Geometry Toolbox" very helpful in my study.

FWIW, I was told that matrices are linear maps pretty early on in my education. Are there any college level linear algebra / matrix calculations courses that don't tell students about that?

Sadly, there are. Or at least were.

When I went through university the standard set of courses was a Calculus course that was mostly about derivatives, a second one that was mostly about integrals, a third Calculus course that was about multi-variable Calculus. That third course necessarily had to teach matrices, and taught it as rote calculations. There was a follow-up differential equations course which refreshed people's memories of matrices..as a rote calculation.

It was done this way because the multi-variable Calculus course was a prerequisite for a lot of physics+engineering courses. So a lot of students wanted to take that sequence. Differential equations were a prerequisite for some other advanced courses. Linear algebra was pretty much just for math majors.

For me also. It is still that way in many programs, that I can tell.

It's possible if you learned matrices outside the context of a linear algebra class that you might be mystified about what's actually happening.

I took Linear Algebra the same semester I took Computer Graphics, which worked out really well for me - the first half of LA taught me everything I needed to know about transformation matrices, and the second half of CG covered 3D graphics in OpenGL. The first half of CG was all 2D graphics stuff, and the second half of LA was about eigenvectors/values - I've forgotten everything from that part of the class.

> FWIW, I was told that matrices are linear maps pretty early on in my education. Are there any college level linear algebra / matrix calculations courses that don't tell students about that?

I know they taught us about matrices in high school, but I don't recall them talking about any applications at all. I think the topic was pretty drained of context, just rote application of the rules for add/subtract/multiply/etc.

Sure they'll tell you that in passing, but won't really explain what that actually means. Certainly Linear Algebra 1 at college was a lot of apply this method to this object (that we'll call a matrix) to calculate this thing called a determinant. Don't worry about what it is or what it represents, just check if it's zero or not. If it's not zero apply this other method to calculate its inverse. Repeat.

I agree. It's called Linear Algebra for a reason. :) That said, I still like the presentation given in this article.

Same here in the UK - a big chunk of the Oxford School Mathematics Project O-level syllabus in the early 80s comprised the various transformations and the matrix operations needed to create them.

I'm sure I was told, but I don't think it was strongly emphasized by my instructors. It comes pretty late in Strang's text, for instance.

I find Strang's text to be unnecessarily tedious. Both of Lang's LA textbooks (Intro to LA, and LA) both take linear maps as the core point of the text.

Hey OP here!

When I first was introduced to matrices (high school) it was in the context of systems of equations. Matrices were a shorthand for writing out the equations and happened to have interesting rules for addition etc. It took me a while to think about them as functions on their own right and not just tables. This post is my attempt to relearn them as functions which has helped me develop a much stronger intuition for linear algebra. That’s my motivation for this post and why I decided to work on it. Feedback is more than welcome.

What got me for a while was the concept of a tensor:

For example: What is a tensor?

Wrong way to answer it: Well, the number 5 is a tensor. So's a row vector. So's a column vector. So's the dot product and the cross product. So's a two-dimensional matrix. So's a four-dimensional matrix, just... don't ask me to write one on the board, eh? So's this Greek letter with smaller Greek letters arranged on its top right and bottom right. Literally anything you can think of is a tensor, now... try to find some conceptual unity.

Then coordinate-free fanaticism kicked in, robbing the purported explanations of any explanatory power in terms of practical applications of tensors. The only thing they could do was shift indices around.

What finally made it stick is decomposing every mathematical concept into three parts:

1. Intuition, or why we have the concept to begin with.

2. Definitions, or the axioms which "are" the concept in some formal sense.

3. Implementations, or how we write specific instances of the concept down, including things like the source code of software which implements the concept.

If you ask a mathematician a tensor is an element of a tensor product, just like a vector is an element of a vector space. This moves the question to "what is a tensor product", which you can think about as a way to turn bilinear maps into linear maps (this is an informal statement of the universal property of the tensor product, you also need a proof of existence of such an object, but it's easy for vector spaces and alright for modules after seeing enough algebra)

Crikey, I hope I never have to talk to that mathematician! That's a terse, unintuitive definition that isn't very helpful unless you're already familiar with the concepts. (Also maybe you meant linear maps into bilinear?)

Reminds me of the time an algebraist mentioned to me that he was working on profinite group theory. I asked what a profinite group was, and he immediately replied 'an inverse limit of an inverse system', with no follow up. Well thanks buddy, that really opened my eyes.

Math is just a much deeper topic than most others. The things people do in research level math can take a really long time to explain to a lay person because of the many layers of abstraction involved.

It is a very deep and specialised topic. However, there are ways to convey intuition to a 'mathematically mature' audience, and there are quick definitions that are correct but unenlightening. I much prefer the former :)

No, it turns bilinear maps into linear one! If you have three R-modules (one can read K-vector spaces if unfamiliar with modules) N,M,P and a bilinear map N×M→P then there is a unique linear map N⊗M→P compatible with the map N×M→N⊗M which is part of the structure of a tensor product. (What's really going on here in fancy terms is the so called Hom-Tensor adjunction because the _⊗M functor is adjoint to the Hom(M,_) functor, but just thinking about bilinear and linear maps is much clearer)

Ok fair enough, I think I get you.

I think the ideas behind the coordiate-free formulation of tensor calculus make it relatively easy though.

A tensor is a function that takes an ordered set of N covariant vectors (i.e. row vectors) and M contravariant vectors (i.e. column vectors) and spits out a real number. It has to be linear in each of its arguments.

I'm pretty sure all the complicated transforms follow from that definition (though you may have to assume the Leibniz rule - I can't remember), and from ordinary calculus.

As a layman, the word "tensor" always intimidated me. As a programmer, I was surprised then when I found out that a tensor is just a multi-dimensional array (where the number of dimensions can be as small as 0). That was a concept I was already quite comfortable with.

That's a bit like saying a vector is 'a row of numbers'. Not incorrect, but missing the point. What matters is what vectors do. It's the properties like pointwise addition, scalar multiplication, and existence of an inverse that make vectors vectors.

You're confusing a tensor with its representation. Tensors are objects which must obey a certain set of rules. (Which rules depends on whether you're talking to a mathematician or a physicist.)

That’s not really what a tensor is; this simplification is due to tensorflow I think?

I always just thought of it as a thing that is indexable.

It's a nice article - you focus on matrices as a kind of operator that takes a vector as input and produces another vector. This is one side of the coin.

The other interpretation is that matrices are functions that take two arguments (a row vector and a column vector) and produce a real number. IMO this interpretation opens the door to deeper mathematics. It links in to the idea that a column vector is a functional on a row vector (and vice versa), giving you the notion of dual space, and ultimately leading on to differential forms. It also makes tensor analysis much more natural in general.

If you're going to attempt a definition like this you need some more conditions (approaching concepts like linearity, for instance). Otherwise you can have a black box like x^3y^3-u^3v^3 that takes in [u,v] and [x,y]^T and spits out a real number; that's not a matrix-y operation.

That didn’t make any sense to, and I work with matrices every day. Are you trying to describe a dot product?

I'm describing a somewhat unusual way of thinking about vectors, matrices etc. At least, it's unusual from the perspective of someone with an engineering / CS background.

First think about row and column vectors. A row vector and a column vector can be combined via standard matrix multiplication to produce a real number. From that perspective, a row vector is a function that takes a column vector and returns a real number. Similarly, column vectors take row vectors as arguments and produce real numbers.

It turns out that row (column) vectors are the only linear functions on column (row) vectors. This result is known as the Reisz representation theorem. If I give you a linear function on a row vector, you can find a column vector so that computing my function is equivalent to calculating a matrix multiply with your column vector.

Now on to matrices. Matrices take one row vector and one column vector and produce a real number. I can feed a matrix a single argument - the row vector, say - so that it becomes a function that takes one more argument (the column vector) before it returns a real number. Sort of like currying in functional programming. But as we said, the only linear functions that map column vectors into real numbers are row vectors. So by feeding our matrix one row vector, we've produced another row vector. This is the "matrices transform vectors" perspective in the OP's article. But I think the "Matrices are linear functions" perspective is more general and more powerful.

This perspective of vectors, matrices, etc... as functions might seem needlessly convoluted. But I think it's the right way to think about these objects. Tricky concepts like the tensor product and vector space duality become relatively trivial once you come to see all these objects as functions.

I appreciate you trying to explain it, however I believe it would have really helped if you started from the examples where this way of thinking is useful: you mentioned tensor product and vector space duality, however unless someone is already familiar with these concepts (I'm not) then it does seem needlessly convoluted. Are there any practical applications of these concepts that you can describe?

I haven't been involved in abstract math in close to a decade, but I think it's a description of the (general) inner product. So, a generalization of the dot product. The classic dot product is that operation with the identity matrix. My understanding is that using matrices that way is very common in physics.

Any example of some practical use that would make it easier to understand?

>geometrically, all linear maps can be thought of as rotations and scalings.

and reflections.

A reflection is just a scaling with a negative factor.

And projections. Without them you only get linear maps with nonzero determinant.

Shearings cannot be represented in this way.

Yes they can. This follows from singular value decomposition. Let S be the matrix representation of a shear transformation. There exist rotation matrices R, B and a diagonal matrix D such that S = RDC, where C is the transpose of B. D is the matrix representation of a scaling transformation and R, B are the matrix representations of rotation transformations. Since S is a product of rotation and scaling matrices, its corresponding linear transformation is a composition of rotations and scalings.

It would ordinarily be weird to represent shear transformations using rotations and scalings because shear matrices are elementary. But it checks out.

OK, point taken. I considered "scaling" in a less general sense (scalar multiple of the unit matrix), while you want to allow arbitrary diagonal entries. My definition is to my knowledge the common one in linear algebra textbooks because in yours, the feasible maps depend on the chosen basis.

EDIT: To state my point more clearly: in textbooks, "scaling" is the linear map that is induced by the "scalar multiplication" in the definition of the vector space (that is why both terms start with "scal").

Reminds me of the old "hack" to use three shear transformations to rotate an image.

The idea being that a shear is relatively much faster on weaker CPUs, relative to doing a "proper" (reverse mapping) rotation.

A nice write-up can be found here: https://www.ocf.berkeley.edu/~fricke/projects/israel/paeth/r...

Notably though, shearings are very 'rare'. Any pertubation will make a shearing no longer a shearing. At least, if I remember correctly.

Same for the the orthogonal matrices, or the diagonal matrices, or the symmetric matrices, or the unit determinant matrices, or the singular matrices ... They are all sets of Lebesgue measure zero.

Orthogonal, diagonal, symmetric, and unit-determinant matrices are all sub-groups though, which makes them 'more special' then all shearing matrices.

Singular matrices are special in the sense that they keep the matrix monoid from being a group. My category theory isn't strong enough to characterize it, but this probably also has a name.

Edit: I think the singular matrices are the 'kernel' of the right adjoint of the forgetful functor from the category of groups to the category of monoids. Though I must admit a lot of that sentence is my stringing together words I only vaguely know.

They can if you add a dimension to the space. That's one of the reasons 3d graphics use 4d vectors and matrices.

You're talking about translation.

No, I wasn't, but I did confuse the terms. Shear can be done without the extra dimension. Skew transforms require the extra dimension, as does translation.

What do you mean by skew? A perspective transformation (homography)? I'm not sure it's standard terminology.

This is great and a nice mathematical approach to the ideas of matrices. Another great resource is 3blue1brown's essence of linear algebra:


Math is Fun also has a nice writeup that explain matrix multiplication from a real world example of a bakery making pies and tracking costs:


One of the things that always irked me about the term "linear transformation" is it doesn't include affline transformations, which is funny because back in elementary school, you learn that a "linear equation" looks like Mx + b. Of course, the article states the term "linearity" when talking vector spaces (or modules) means linearity in arguments, while the term linear for a child in school means "something like a line on graph paper", and this is yet another example of terminology in the way mathematics is taught, possibly for historical reasons, that leads to even more confusion.

PS. incase you didn't know, affline transformations are not linear:

  f(x) = mx + b =>
  f(x+y) = m(x+y) + b /= mx+b + my+b = f(x) + f(y),
  f(cx) = c m x + b /= c(mx + b) = c f(x)

Their most recent post about kernels is even better than this:


The matrix/function stuff is elementary enough that I understand it intuitively (I suck at math), although it's neat to be reminded that given a enough independent points you can reconstruct the function (this breaks a variety of bad ciphers, sometimes including ciphers that otherwise look strong).

The kernel post actually does some neat stuff with the kernel, which I found more intuitively accessible than (say) what Strang does with nullspaces.

If you're interested in this approach to linear algebra you should read Linear Algebra Done Right by Sheldon Axler.

Or pretty much any other Linear Algebra book.

I guess the distinction (in my mind) is the perspective that Linear Algebra Done Right takes in that they don't focus on matrix representations.

Axler's book has the advantage of skipping determinants in order to provide a more intuitive approach to linear algebra.

I strongly disagree skipping determinants provides a more intuitive approach to linear algebra. I don't know your background, but I'd venture a guess you feel it does because the Laplace expansion formula for computing the determinant[1] feels uninspired and out of place.

The reason determinants are hard to teach (in my opinion) is because a rigorous derivation of their formula isn't possible without first teaching multilinear algebra and constructing the exterior algebra. Once you do those things, the natural geometric interpretation of the determinant basically falls onto your lap. But it's still very useful for e.g. computing eigenvalues and using the characteristic polynomial, so it's taught before that context can be formalized.

Professors shouldn't teach determinants in the context of matrices, at least not at first. That's heavily computation-focused, and the symbol pushing looks really unmotivated and strange to students. Instead they should teach the basis-free definition of determinants (i.e. focus on the linear map, not the matrix transformation representing the linear map for some basis). Then the determinant is "only" the volume of the image of the unit hypercube under the linear transformation, which is where the parallelepiped comes in. If the linear transformation is invertible, the unit hypercube is transformed from an n-dimensional cube into an n-dimensional parallelogram, from which you can geometrically see the way the linear map transforms the entire vector space it's defined over.

3Blue1Brown has a very good video on the geometry underlying the determinant[2]. For a more rigorous presentation which constructs the exterior algebra and derives the determinant formula using the wedge product, Noam Elkies has notes[3][4] for when he teaches Math 55A at Harvard. Incidentally Noam Elkies uses Axler's book, and while he obviously approves of it he's pretty upfront in asserting that the determinant should be taught anyway[5].


1. http://mathb.in/33068

2. https://www.youtube.com/watch?v=Ip3X9LOh2dk

3. http://www.math.harvard.edu/~elkies/M55a.10/p8.pdf

4. http://www.math.harvard.edu/~elkies/M55a.10/p9.pdf

5. http://www.math.harvard.edu/~elkies/M55a.10/index.html

I agree, the way I still see determinant is as the 'volume scaling factor' of a linear transformation.

This means it makes sense that det(A) = 0 means A is non-invertible. It also makes a lot of sense when the jacobian pops up in the multi-dimensional chain rule.

Given the above, and the Cayley–Hamilton theorem, I never really had to know why the determinant was calculated the way it is. The above give enough of an interface to work with it.

It recently occurred to me that if you use that matrices represent linear functions, you don't have to do tedious math to prove that matrix multiplication is associative (that is, (A * B) * C = A * (B * C), which allows us to write A * B * C without brackets, since it doesn't matter how we place the brackets anyway).

For a matrix M, denote f_M(x) = M * x. Then f_{A * B} = f_A(f_B(x)) so that f_{(A * B) * C} = f_{A * B}(f_C(x)) = f_A(f_B(f_C(x))) and also f_{A * (B * C)} = f_A(f_{B * C}(x)) = f_A(f_B(f_C(x))).

So f_{(A * B) * C} = f_{A * (B * C)} = f_A(f_B(f_C(x)))

Conjugate transpose and other adjoints are kinda nuts, they are the other part of the story


Esp the ray tracing/topology relationship is nuts.

Nice! The illustrations + color coding for the vectors are very useful.

Here is a video tutorial that goes through some of the same topics (build up matrix product from the general principle of a linear function with vector inputs): https://www.youtube.com/watch?v=WfrwVMTgrfc

Associated Jupyter notebook here: https://github.com/minireference/noBSLAnotebooks/blob/master...

Good, intuitive introduction to matrices. Next steps could be showing that there are infinitely many different matrix representations of a linear map (different from the polynomials) and they can be used for function spaces, too.

One question that usually pops up that I was confused about till recently: are rank two tensor equivalent to matrices? Answer is no, e.g. see here: https://physics.stackexchange.com/questions/20437/are-matric...


Thanks for the feedback. I go into this in the next post on eigenvectors here: https://www.dhruvonmath.com/2019/02/25/eigenvectors/. I start by discussing basis vectors which I believe is what you’re looking for in your comment.

I just skimmed the article quickly. Are there other ways to learn about matrices? If you don't treat them as linear applications, they are just boring grids of numbers and the matrices multiplication doesn't make any sense.

Which is precisely how they were presented to me at college.

The basic equivalency is fine but what about all other things you can do with matrices but can’t do with functions? For example, what is the equivalent of transpose in functions? How about Eigen values or Gaussian elimination?

Nice article. That's how I learned matrices in high school in Germany. Maybe it's different here in the US, I'll have to take a look at my daughters' textbooks.

They were in the textbook in my high school, but we always skipped that chapter.

Having not taken a linear algebra course in college, does anyone have a recommendation for a book/course to follow?

That would heavily depend on whether you are coming at it from a theoretical math p.o.v. or a more applied p.o.v.

Not that the applied approach should leave out the theory, because theoretical stuff like this article give a great and intuitive understanding of linear algebra. However, the more theoretical treatments should set up things like rings, modules, and even category theory that are much less useful from an applied perspective.

For the theoretical approach I've heard good things about 'linear algebra done right'. I imagine it is less appealing for the applied approach. All I can say is be wary of the 'shut up and calculate' mindset in linear algebra. Getting the ideas behind the concepts is essentially a shortcut to understanding linear algebra without any downsides.

Gilbert Strang uses similar approach on his Linear Algebra lectures. Much more intuitive

The next relearning step is to construct the category where arrows are matrices...

> The next relearning step is to construct the category where arrows are matrices...

Why not the category of vector spaces (morphisms are linear maps)?

So yes, this is equivalent to FinVect of the field of which entries in the matrices consist.

The difference is that here you construct the category from a simpler premise. To construct FinVect you need to include all set objects with structure satisfying some axioms.

The category of matrices is simply positive integers with as morphisms n x m matrices between the two integers. Composition is matrix multiplication.

Here [1] is a nice overview. If you can follow what is going on there, it is worth while looking at II, III and IV.

[1] https://unapologetic.wordpress.com/2008/06/02/the-category-o...

Isn't that the same?

I suppose that technically, the 'arrows are matrices' definition rules out infinite dimensional vector spaces, but I'd guess that OP meant to include them.

An argument against would be to keep to a small category.

Is it Vect?

this was an important result in the linear algebra class for first year math/cs/eng students at my university.

What could possibly be a more basic understanding of a matrix in mathematics? There’s a reason the first teach you Linear Algebra before anything else.

Who is "we" in this context?

Hey! I wrote this article - “we” is referring to people who had a similar educational experience to me. I was introduced to matrices as a tool for solutions to systems of equations. I always wish I was taught the functional perspective from the beginning.

me for example. My teachers in college did a real shit job at teaching this subject.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact