Hacker News
Linear algebra tutorial in four pages (minireference.com)
256 points by ivan_ah on Dec 10, 2013 | 111 comments

A few minor mistakes (perhaps just in my eyes), but overall pretty good.

The hardest part about teaching linear algebra is that nobody explains the big picture. I teach mathematics and computer science and regularly tutor linear algebra students, and I encounter students all the time who ask me "What are vectors good for? I thought all we cared about were matrices and doing RREF and stuff."

For this reason, I deemphasize computations and emphasize the connection between linear maps and matrices. It can be summed up as follows: if you fix a basis, every linear map is a matrix and every matrix is a linear map, and operations on functions (like composition, inversion, whatever) correspond to operations on the corresponding matrices.

It's definitely not an analogy or anything in "scare quotes" that would imply something different is going on behind the scenes. It's exactly the same thing.

Other questions usually left out of discussions about linear algebra (and these notes): what are orthogonal vectors good for? Why would we ever want a basis besides the standard basis in real Euclidean space? Is Euclidean space the only vector space out there? Do vectors have to be lists of numbers?

> What are orthogonal vectors good for?

Any set of n linearly independent vectors B_a = {\vec{a}_i}_{i=1..n} in a vector space can be used as a coordinate system, but the "computational cost" of finding the coordinates w.r.t. the basis B_a will be annoying. Each time you want the coordinates of a vector, you have to solve a system of linear equations.

A basis consisting of orthogonal vectors {\vec{e}_i}_{i=1..n} is way cooler because you can calculate the coefficients of any vector using the formula v_i = (\vec{v} · \vec{e}_i)/||\vec{e}_i||².

Of course the best thing is to have an orthonormal basis B_s = {\hat{e}_i}_{i=1..n}, so that the coefficients of a vector w.r.t. B_s can be calculated simply as v_i = \vec{v} · \hat{e}_i, i.e., the projection of \vec{v} onto the \hat{e}_i subspace.
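To make the orthonormal case concrete, here's a quick NumPy sketch (the basis and vector are made up for illustration):

```python
import numpy as np

# An orthonormal basis for R^2: the standard basis rotated by 45 degrees.
e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2)

v = np.array([3.0, 1.0])

# Coefficients w.r.t. the orthonormal basis: just dot products, no system to solve.
v1 = v @ e1
v2 = v @ e2

# Sanity check: the coefficients reconstruct v exactly.
assert np.allclose(v1 * e1 + v2 * e2, v)
```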

> Why would we ever want a basis besides the standard basis in real Euclidean space?

Hmm... I'm thinking eigenbases? The action of a matrix A on vectors expressed in terms of its eigenbasis is just a scaling, i.e. {B_e}_[A]_{B_e} = Q^{-1} {B_s}_[A]_{B_s} Q = Λ, a diagonal matrix.
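For instance (a made-up symmetric matrix, so the eigenbasis is orthonormal):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

evals, Q = np.linalg.eigh(A)    # columns of Q form the eigenbasis B_e

# Changing to the eigenbasis diagonalizes A: Q^{-1} A Q = Λ.
Lam = np.linalg.inv(Q) @ A @ Q
assert np.allclose(Lam, np.diag(evals))   # A acts as pure scaling in B_e
```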

> Is Euclidean space the only vector space out there?
> Do vectors have to be lists of numbers?

Good point. If I bump this to 5-6 pages, I will try to include something about generalized vector spaces. Orthogonal polynomials could be a good one to cover.

Computer graphics is full of fun topics that answer these questions in accessible and visible/tangible ways!

Skeletal character rigging in particular motivates a nice understanding of orthonormal bases and of why a basis other than the standard one (the identity matrix) is very useful.

Even simple 2D conversions between screen space and pixel space, for example, can be useful motivational examples: it's the thing your web browser did to render what you're reading. :)

Not that anyone would want to cover those topics in a 4 page linear algebra primer, but maybe there is some inspiration there for ways to include pictures that explain, instead of more math notation... ;)

The last 2 questions there are interesting and worthwhile topics for the interested student, but I'd say, just my $0.02, don't ruin a good thing by stuffing too much into it...

> Each time you want the coordinates of a vector, you have to solve a system of linear equations.

This is wrong, unless I'm misreading you. The first time you want the coordinates of a vector in a nonstandard basis you need to construct the change-of-basis matrix, which requires inverting a matrix (same cost as solving a linear system), but after that it's just plain matrix-vector multiplication.
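In NumPy terms (a randomly generated basis, purely for illustration): pay the inversion/factorization cost once, and every subsequent coordinate lookup is a single matrix-vector product.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))     # columns: a generic non-orthogonal basis

# One-time cost: invert (or, in serious code, factor) the basis matrix.
B_inv = np.linalg.inv(B)

# Every later lookup is just a matrix-vector multiplication.
v = np.array([1.0, 2.0, 3.0])
coords = B_inv @ v

assert np.allclose(B @ coords, v)   # the coordinates reassemble v from B's columns
```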

I knew the answers to these questions :) I'm just saying they belong in a discussion of the topics. Maybe I just like to take math slower than a 4-page summary.

Maybe bump it to two different handouts?

This is my favorite introductory textbook on linear algebra which goes to great lengths to avoid matrix and determinant computations:


I read Axler's book quite recently, after poring over reviews of linear algebra books. I did like it quite a lot -- it has a very clean, approachable narrative style. Having completed it, though, I feel there's maybe not enough operational knowledge gained from it. Meaning, it explains the theory and the motivations well, but one might augment it with Schaum's[1], or some such guided exercises.


Thanks, I have been doing a lot of machine learning hacks lately and almost everything involves "vector spaces"; 249 pages should be fun. Just finishing up "Think Bayes" by Allen Downey.

Axler is not a text in applied linear algebra; it is a book you should read after you are already familiar with both basic abstract algebra and basic linear algebra (i.e. you know how to multiply matrices, etc.). It then provides a nice motivation for linear algebra as a subfield of abstract algebra. You may still, however, not find the ideas of determinant, trace and other practical concepts very intuitive after reading Axler.

Basic abstract algebra? Definitely not needed to use that book to its full potential.

> Why would we ever want a basis besides the standard basis in real Euclidean space?

I think it can be explained from an SVD point of view: the standard basis is not likely to be the optimal one and may not capture the most variance. A basis aligned with the directions of larger variance will presumably contain more information.

After obtaining the good basis, we can approximate the original matrix using only the directions with the largest variances (often far fewer than the standard basis vectors). This is useful for finding a shorter representation that still preserves much of the distance information.
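A NumPy sketch of that idea, using a synthetic matrix that is approximately rank 2 (the data is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: two strong directions plus a little noise.
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))
A = A + 0.01 * rng.standard_normal((50, 30))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top k singular directions: a much shorter representation.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-2 approximation preserves almost everything.
rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
assert rel_err < 0.05
```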

I think this post explains the intuition pretty clearly:


Wow, I just read that entire paper. Thank you for linking, it was super helpful!

Couldn't agree more! "Matrix is a function" is a super useful analogy, especially for us programmers, who deal with functions every day.

Well, it's not an analogy! There's a 1-1 correspondence between matrices and linear maps over a vector space with a fixed basis. "A matrix represents a function" is no more an analogy than "The symbols 'V', '5', and '五' represent the number five" is an analogy.

It's more like matrices are a useful way of representing certain functions, just like tally marks are a more useful way of representing numbers when you're counting the number of guests as they arrive but arabic numerals are more useful when you're doing arithmetic. Because these functions — linear maps — have a special structure to them, we can define matrix "multiplication" in a way that corresponds to the composition of linear maps.

In other words, it's not a happy accident. We invented matrices as a way to reason about and compute with linear maps more effectively. For example, "diagonal matrices" and "upper-triangular matrices" (among others) play important roles throughout linear algebra, but without the matrix representation of linear maps it'd be hard to talk about these things.
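A tiny NumPy demonstration of that correspondence (the two maps are arbitrary examples):

```python
import numpy as np

# Two linear maps on R^2: a 90-degree rotation and an axis-aligned scaling.
rotate = np.array([[0.0, -1.0],
                   [1.0,  0.0]])
scale = np.array([[2.0, 0.0],
                  [0.0, 3.0]])

v = np.array([1.0, 1.0])

# Applying the functions one after the other ("first scale, then rotate") ...
composed_pointwise = rotate @ (scale @ v)

# ... gives the same result as multiplying the matrices once up front.
composed_matrix = (rotate @ scale) @ v

assert np.allclose(composed_pointwise, composed_matrix)
```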

Here's the man you have to thank for putting all this together: http://en.wikipedia.org/wiki/Arthur_Cayley

And matrix mult ~= function composition. I had a long aha moment reading this.

That's a good idea. Perhaps when teaching people familiar with SVG, you can start by showing them the transform matrix syntax and explaining what it does. https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/t...

So... are you going to answer them?

Gilbert Strang explains clearly why we need change of basis with an example of image compression here, http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-...

How about "Is Euclidean space the only vector space out there?" And "Do vectors have to be lists of numbers?"

Perhaps with some examples?

> "Do vectors have to be lists of numbers?"

No. They just have to satisfy the vector space axioms (greatly simplifying: you can add vectors together and multiply them by scalars, and they behave as you would expect: associative, commutative, distributive, etc., as appropriate).

So, in this regard, functions can be vectors, if you define addition to be pointwise addition. For example, consider the set of continuous functions on the unit interval [0, 1]. Add two together and you get another continuous function on the unit interval, multiply by a scalar and you get another continuous function on the unit interval, etc.
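In code the same construction looks like this (plain Python, with functions as the "vectors"):

```python
import math

# Pointwise operations turn functions into vectors.
def add(f, g):
    return lambda x: f(x) + g(x)

def scale(c, f):
    return lambda x: c * f(x)

# The "vector" 2*sin + cos, built from the operations above.
h = add(scale(2.0, math.sin), math.cos)

assert abs(h(0.5) - (2.0 * math.sin(0.5) + math.cos(0.5))) < 1e-12
```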

It turns out that, if you restrict to spaces of finite dimension, then Euclidean space is the only vector space out there, up to your choice of labelling the basis elements and up to a certain linear transformation.

In this sense, vectors can always be written as lists of numbers, but no, they don't have to be this in general. For example, you could have a vector space of smooth functions. It's infinite dimensional, and its elements (its vectors) are functions, not tuples of numbers.

You must also add the condition that your scalars are real numbers. There are many finite dimensional vector spaces that are nothing like Euclidean space: any finite extension of a finite field, for example. This comes from Galois theory, but is not just abstract nonsense. One application to computing is if your scalars are only 0 or 1 and all operations are done mod 2, then you have a framework for doing error correcting codes, among other things. We call this set of scalars either the finite field of size 2 or the Galois field of size 2, aka GF(2).
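A tiny plain-Python illustration of GF(2) vectors and the simplest error-detecting code, a single parity bit (the `encode` helper is a textbook construction written for this example, not any library's API):

```python
# Vectors over GF(2): addition is componentwise sum mod 2, i.e. XOR.
u = [1, 0, 1, 1]
v = [0, 1, 1, 0]
assert [(a + b) % 2 for a, b in zip(u, v)] == [1, 1, 0, 1]

# Parity-bit code: append the mod-2 sum, so every codeword has even parity.
def encode(bits):
    return bits + [sum(bits) % 2]

word = encode([1, 0, 1])        # -> [1, 0, 1, 0]
assert sum(word) % 2 == 0       # valid codeword: even parity

word[1] ^= 1                    # a single bit flips in transit
assert sum(word) % 2 == 1       # the parity check catches it
```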

Yes, what I should have said is: if k is any field then n-dimensional vector spaces over k are isomorphic to k^n. Unless I've had too much to drink this evening.

It's a little surprising, in a "no-bullshit" discussion of "theoretical and computational aspects of linear algebra," to see matrix inversion touted as the way to solve linear equations. The guide literally introduces the examples by saying "Dude, enough with the theory talk, let's see some calculations." Yet standard numerical practice avoids literal inversion, in favor of factorization methods.

E.g., It is common practice to write the form A^{-1}b for economy of notation in mathematical formulas... The trouble is that a reader unfamiliar with numerical computation might assume that we actually compute A^{-1}... On most computers it is always more effective to calculate A^{-1}b by solving the linear system Ax = b using matrix factorization methods... (Dennis & Schnabel, "Numerical Methods for Unconstrained Optimization and Nonlinear Equations", section 3.2).

E.g., As a final example we show how to avoid the pitfall of explicit inverse computation... The point of this example is to stress that when a matrix inverse is encountered in a formula, we must think in terms of solving equations rather than in terms of explicit inverse formation. (Golub and Van Loan, "Matrix Computations", section 3.4.11).
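The two computations side by side in NumPy (random data for illustration); they agree numerically here, but `solve` is the one that factors A instead of forming A^{-1}:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

# What the notation A^{-1} b literally says ...
x_via_inverse = np.linalg.inv(A) @ b

# ... and what numerical practice recommends: factor A and solve Ax = b.
x_via_solve = np.linalg.solve(A, b)

assert np.allclose(x_via_inverse, x_via_solve)
assert np.allclose(A @ x_via_solve, b)
```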

Yes, one almost never inverts a matrix using a computer, and one never inverts a matrix by hand. What does that teach you? I was pretty surprised to see trivialities like this in a four-page summary. But, unfortunately, this is just a poor summary. It also barely covers the spectral decomposition, and has no mention at all of the singular-value decomposition.

we do matrix inversion by hand in mathematics :D

As someone with a math degree, I love this. However, I think the author over-estimates the familiarity of a typical high school student with mathematical notation:

> The only prerequisite for this tutorial is a basic understanding of high school math concepts

I think fundamentally the material in these four pages is accessible to many high school graduates, but perhaps not in this concise rendering (which is awesome for me, but probably overwhelming for someone not familiar with set-theoretic notation, summation notation, etc.).

You are right. I am totally trying to sneak the \in and the \mathbb{R}'s in there to see if people will notice.

Previously I had \forall in there too, but went over and removed things. On the TODO list is to introduce the { obj | obj description } notation for defining sets (in this case vector spaces).

Well, I noticed! It stood out pretty clearly, I'd say.

I really don't think that the intended audience of this sort of ground-up summary is going to be comfortable with \mathbb{R}^{m\times n} notation all over the place. Yes, you explain what the notation means, but there's zero chance that a reader who didn't already know that notation will become fluent in it right away: they're going to be flipping back to the definition every time. That might be okay if you're trying to teach all of math, but a person reading "Linear Algebra in Four Pages" presumably neither wants nor needs that full skill set. (I'd say that {obj|obj description} notation would make the text even more opaque to novices, no matter how cleanly introduced.)

Why not just make a less precise reference to a table of "numbers" (which, as an added bonus, would leave the door open to complex matrices without having to start from scratch)?

Twist: This is a great argument to teach notation earlier!

A little observation: You repeatedly use language that fails to distinguish definitions & properties vs. effective computational methods.

For example, in section G:

> To find the eigenvalue of a matrix we start from the eigenvalue equation ....

Solving the resulting equation is one way of computing eigenvalues. But it might not be the one you want to use in some practical situation.

Just before that, in section F:

> The determinant of a matrix, ... serves to check if a matrix is invertible or not.

It is true that a square matrix is invertible iff it has nonzero determinant. It certainly is not true that, for a matrix of any size, computing the determinant is a good method for checking whether a matrix is invertible.

> Solving the resulting equation is one way of computing eigenvalues. But it might not be the one you want to use in some practical situation.

Related fun fact: Solving the eigenvalue equation is (in general) impossible in closed form for matrices 5x5 or larger, because there's no closed form solution to the quintic. So some other algorithm for finding eigenvectors and eigenvalues is often absolutely required in practice.
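Power iteration is the simplest example of such an algorithm; here is a minimal sketch (real libraries use much more refined variants, e.g. QR iteration):

```python
import numpy as np

def power_iteration(A, iters=200):
    """Iteratively approximate the dominant eigenpair of A."""
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = A @ v                  # push v toward the dominant eigenvector
        v = v / np.linalg.norm(v)  # renormalize to avoid overflow
    return v @ A @ v, v            # Rayleigh quotient, eigenvector estimate

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, v = power_iteration(A)

assert abs(lam - 3.0) < 1e-8       # dominant eigenvalue of this A is 3
assert np.allclose(A @ v, lam * v)
```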

Well, that depends on whether you want the exact eigenvalues.

> It certainly is not true that, for a matrix of any size, computing the determinant is a good method for checking whether a matrix is invertible.

Can you explain this part? I don't understand what you mean, and I have an Advanced Linear Algebra final in a few days, so I should probably know this stuff.

I thought the determinant was not defined for non-square matrices. Also, I thought that inverses only existed for square matrices (implying that the determinant check is a good way of testing invertibility).

It's a valid way to test whether a (square) matrix has an inverse, but computing big determinants is a lot of effort (both from a "student work" perspective and from a "computational cost" perspective). The point is that in many cases, there are enormously more efficient ways to find an inverse (or, more generally, to solve matrix problems with inverses in them).

Oh, okay. Thanks for the clarification.

Good luck to anyone who has a linear algebra exam coming up!

I also have a similar short tutorial for Newtonian mechanics here: http://cnd.mcgill.ca/~ivan/miniref/mech_in_7_pages.pdf

LinAlg 2 coming up in two days. It's a fun course!

I really enjoyed the Gilbert Strang videos on linear algebra back when I was taking the course at McGill: http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-...

He has an exceedingly dry sense of humor. It's quite nice, when you're 40 minutes into an involved lecture. The series is indeed worth the watch.

Yes, very entertaining and enlightening (even if you already know the subject). Love Strang's teaching style.

I agree, his lecture deriving the determinant from a few desirable properties is a classic in my mind.

Just to throw this out there in the hope it is useful, I have been using this book as a review of linear algebra:



This book is awesome! The entire text is interspersed with collapsible examples and exercises.

Roger Horn is one of the best linear algebra and matrix guys around, and I was, by wide margins, the star student in his class, effortlessly.

For the notes, I found serious problems in just the first half of the first column on the first page.

F'get about the four pages.

If enough people want an outline in a few pages, then I'll consider knocking one out and putting up somewhere as a PDF file.

I'd be happy to see three serious problems described in the first column of the first page.

I'll take a crack at being as pedantic as possible. I'm just doing this to play along.

1) "Linear algebra is the math of vectors and matrices" - I think this would be better phrased as "the math of vectors and linear transformations." Matrices are a convenient way to represent finite dimensional linear transformations, but I think it's putting the cart before the horse to say that linear algebra is the math of matrices. This is a minor problem, though.

2) "A vector ⃗v ∈ R^n is an array of n real numbers" - In general vectors are not arrays of numbers, real or otherwise. This kind of definition will run into problems when you get into Hilbert/Banach spaces.

3) "the function f(x) = ln(x) has the inverse f^−1(x) = e^x" - this is only true for x > 0 if we're talking about the real-valued logarithm function, since it's not defined for x <= 0.

Thank you, that is awesome. So are all the problems ones of a pedantic nature? For example I understood the inverse function description as a definition even though there was a range (which I also knew) for which it was invalid.

Generally I consider those sorts of things to be the difference between writing 'proofs' versus explaining concepts.

Yeah, I wouldn't really consider any of those things to be "serious" problems, at least for the level that this guide is pitched at.

> 2) "A vector v⃗ ∈ R^n is an array of n real numbers" - In general vectors are not arrays of numbers, real or otherwise. This kind of definition will run into problems when you get into Hilbert/Banach spaces.

Surely this is ok if it is parsed as "(a vector in R^n) is an array of n real numbers".

That's better, but still not entirely correct. An array of real numbers is not itself a vector. It is a representation of a vector in a particular basis. This is a technical point but very important to grok, especially when starting to look at vector spaces other than F^n.

> An array of real numbers is not itself a vector. It is a representation of a vector in a particular basis.

No, such an array can be either a vector in its own right or the representation you mentioned (which is itself a vector).

To make such things make sense, we need a definition of a vector space. It can be tempting to omit that definition from an outline, but the definition is just crucial, e.g., when talking about subspaces, which come up constantly. E.g., for an m x n matrix A and n x 1 'vector' x, the set of all x such that Ax = 0 is a subspace, but to know this we need the definition of a vector space.

(1) In R^n, none of n, R, or R^n has been defined.

Better: Let n be a positive integer, and let R be the set of real numbers. Then R^n is the set of all n-tuples over the real numbers. Such an n-tuple is a list of n numbers; e.g., (1, 3, 2.6, 1, 7) is an n-tuple for n = 5.

For positive integers m and n, an m x n (read "m by n") (real) 'matrix' is a rectangular array of real numbers with m rows and n columns. Typically, for integer i = 1, 2, ..., m and integer j = 1, 2, ..., n, in an m x n matrix A = [a_{ij}], a_{ij} is the number, or 'component', in row i and column j. An example of a 2 x 3 matrix is

     1 9  4
     4 0 -1
(typically surrounded with square brackets not easy to type without, say, TeX).

Typically an n-tuple x in R^n is regarded as an n x 1 matrix. In this way, with m x n matrix A, m x 1 vector b, and matrix multiplication (we have yet to define) Ax, we can write Ax = b as a system of m linear equations in n unknowns x, and this is the most important foundation and application of linear algebra.
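For example, the system x + 2y = 5, 3x + 4y = 11 in this matrix form, solved here with NumPy:

```python
import numpy as np

# The system   x + 2y = 5
#             3x + 4y = 11   written as Ax = b:
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 11.0])

x = np.linalg.solve(A, b)
assert np.allclose(x, [1.0, 2.0])   # x = 1, y = 2
```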

(2) Similarly for positive integer n and the set of complex numbers C, C^n is the set of all n-tuples of complex numbers. And, we can have an m x n matrix of complex numbers.

Once you get very far in linear algebra, complex numbers become essential, if only from the fundamental theorem of algebra (that is, from factoring a polynomial). In particular, eigenvalues are important, and they are often complex numbers.

(3) R^n and C^n are examples of 'vector spaces', but there are also many other examples.

For a 'vector space', we have a non-empty set V of 'vectors' and a 'field' F. Note: A 'field' is defined carefully in courses in abstract algebra, but the two leading examples in practice are R and C. Most commonly F = R or F = C, but there is also interest in other fields, e.g., the integers modulo a prime number. Here we assume that F = R or F = C.

On V we have an operation 'addition' denoted by +. So, for vectors x in V and y in V, there is a vector x + y in V. Addition is commutative so that x + y = y + x. Vector addition is also associative, that is, for x, y, z in V,

(x + y) + z = x + (y + z)

Also there is a vector 0 in V so that for any x in V we have 0 + x = x.

Also for x in V, there is -x in V so that x + (-x) = x - x = 0.

So, set V with operation + forms a commutative group (defined carefully in courses in abstract algebra).

On V with F we have an operation 'multiplication'. Then for x in V and a in F, with this 'multiplication' the 'product' ax is in V.

For x, y in V and a, b in F, we have two distributive laws, (a + b)x = ax + bx and a(x + y) = ax + ay. Also, for 0 in F, 0x is the vector 0. For the vector 0, a0 is also the vector 0.

We have that (-1)x = -x.
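A quick numerical spot-check of these axioms for R^4 (random vectors, real scalars; a sanity check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
x, y, z = rng.standard_normal((3, 4))   # three vectors in V = R^4
a, b = 2.0, -1.5                        # two scalars in F = R

assert np.allclose(x + y, y + x)                 # commutativity
assert np.allclose((x + y) + z, x + (y + z))     # associativity
assert np.allclose(x + np.zeros(4), x)           # additive identity
assert np.allclose(x + (-x), np.zeros(4))        # additive inverse
assert np.allclose((a + b) * x, a * x + b * x)   # distributivity over scalars
assert np.allclose(a * (x + y), a * x + a * y)   # distributivity over vectors
assert np.allclose((-1.0) * x, -x)
```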

For some examples of vector spaces, (A) let F = R and let V be the set of all monophonic audio signals exactly 10 seconds long, (B) let F = R and let V be the set of all real valued random variables X where E[X^2] is finite, and (C) let F = R and for m x n matrix A and n x 1 vector x, let V be the set of all x such that Ax = 0 (so far the matrix product Ax has not been defined, but it will be defined so that Ax = 0 is a system of m linear equations in n unknowns, the components of x).

(4) For set S, x in S means that x is an 'element' of S.

(5) The notation (R, R, R) = R^3 is neither good nor standard.

The notation R^3 = R X R X R as the set theory Cartesian product is standard.

Actually, we already know just what (R, R, R) is -- a 3-tuple and not a set of all 3-tuples of real numbers.

(6) R^{m x n} is questionable notation and, anyway, needs a definition.

(7) For the "power" of that mathematics, one might illustrate it with some applications. One of the most important is solving systems of linear equations. For more, a matrix can describe a linear transformation, and linearity is a powerful assumption that commonly holds with good accuracy in practice. So, vectors and matrices become powerful means of working with linear systems where a matrix describes the system and the inputs and outputs of that system are vectors.

From G. F. Simmons, the two pillars of analysis in mathematics are continuity and linearity, and linear algebra emphasizes both, especially linearity.

More examples of linearity include both differentiation and integration in calculus and the Fourier transform.

(8) The definition of inverse function is wrong. That definition holds only when the given function f is 1-1. Even if f is not 1-1, the inverse function is still useful, but its values can be sets of values, that is, the inverse function is multiple-valued. E.g., just take f(x) = x^2. Then f^{-1}(1) = {-1, 1}.
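A toy illustration over a finite domain (the `preimage` helper is hypothetical, written just to make the set-valued inverse concrete):

```python
# For a non-1-1 function, the "inverse" at a point is a set of values.
def preimage(f, y, domain):
    return {x for x in domain if f(x) == y}

domain = range(-10, 11)
assert preimage(lambda x: x * x, 1, domain) == {-1, 1}   # f(x) = x^2
assert preimage(lambda x: x * x, 4, domain) == {-2, 2}
```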

And modest too.

Horn wrote about my work, "Best performance in the course throughout the semester. Knows this material cold."

It was just true. Why? It was an 'advanced' course in linear algebra and matrix theory, and I'd never had even a first course in the subject.

But long before the course, in total I'd done much more than just such a course.

I had had two good courses in abstract algebra, each of which had provided a good foundation for linear algebra and also touched on the subject. I had taken a reading course in theoretical physics based heavily on matrix theory. And I had written my undergraduate honors paper on group representation theory, essentially in linear algebra.

I'd worked carefully through two of the best books on linear algebra, one by E. Nearing and the other by P. Halmos. The Halmos book is famous as the crown jewel of books on linear algebra. Why? It can be good to look at linear algebra as a baby version of Hilbert space theory, heavily from von Neumann, and Halmos wrote his book when he was an assistant to von Neumann at the Institute for Advanced Study. So, deliberately, the Halmos book is an introduction to Hilbert space theory. It also has the profound perspective of one of the best mathematicians of all time and the elegance of thought of von Neumann.

For applications, in my career and in my own study, I'd worked through stacks of books and papers on applications in numerical linear algebra, multivariate statistics, optimization, curve fitting, signal processing, especially around the fast Fourier transform, differential equations, advanced calculus, deterministic optimal control theory, exterior algebra, etc.

At one point I wrote my own notes on much of linear algebra, with black ink on white paper, 100% cotton I got especially for the notes. My work was very careful although not elegant.

For review, just before the course, I wrote out from memory 80 pages covering about 2/3rds of the course. As Horn wrote, I knew the subject cold, long before I took his course. Horn's homework, tests, and final exam were all challenging for the other students, and I effortlessly blew them away, by large margins on all three. E.g., I never studied for the tests or either the mid-term or final exam. When doing the homework, I nearly never looked at my notes from class and, instead, just wrote out the solutions based on what I already knew.

I didn't belong in the course and had told the department that at the beginning. The department's reaction was to smile and say "Take the course anyway.". So, I did.

For me, the course was a huge waste: From a long commute to campus and taking care of my wife, I was short on time, but the Horn homework had me up nearly all night and then exhausted the next day once a week and, thus, ate into a big fraction of the best time I had each week. Some of my other, more important work suffered. When I did the Horn homework, I was usually just exhausted. I'd get the papers back, graded, in about two weeks, recognize my handwriting, but, from having been so tired, actually had no memory of having done the work. Bummer. It was a trial by fire and exhaustion, not education. I was being asked to 'prove myself', not learn something.

In the program, I only wanted to learn some good material and had no desire at all to 'compete', certainly not in that linear algebra course beneath me. I didn't know I was blowing away the other students until the grader showed me the scores on homework and tests, all except the final exam.

In many ways Horn was a very good teacher, but, in total, he blew away nearly all the students. If I hadn't already known the material coming into the course, then I couldn't have learned it in the course. The way I had learned the material was from some slow, careful, thoughtful study and a lot of work with applications, and the course offered neither.

Would you mind dropping the titles of those books. I think I found the P Halmos (http://www.amazon.com/Algebra-Problem-Dolciani-Mathematical-...), but I'm having some trouble finding the Nearing.

Also, what background do you think is necessary for these texts?

> I'm having some trouble finding the Nearing.

In my first post here, I misspelled the name. As in my first response to your post, the correct spelling is Nering.

> Also, what background do you think is necessary for these texts?

The background to get started in linear algebra is essentially just high school algebra. In college, linear algebra is commonly the next course after calculus.

For more, after linear algebra, commonly there is a course in 'analysis' such as Rudin's 'Principles' in my list. Without a good course, this book would be tough reading.

The applications of linear algebra are so broad that for a good sampling of those applications the prerequisites in total are essentially at least a good undergraduate major in mathematics.

Two of the biggest 'branches' of mathematics are analysis and algebra. Essentially the first course in 'analysis' is calculus.

As in Simmons (in this list), the two most important pillars of analysis are continuity and linearity.

The first place to learn linearity well is 'linear algebra' -- or, much the same thing, matrix theory or finite dimensional vector spaces.

Then the algebraic aspects can be emphasized and can lead to applications in, say, the theory of error correcting codes.

For the analysis aspects, can continue with Hilbert and Banach spaces and more and also various applications.

Much of the mathematics in applied mathematics, science, engineering, and technology is linear algebra.

Here is a list of books relevant to linear algebra and/or to the course I took from Roger Horn. This list also contains maybe 60% of what I'd worked from before Horn's course that let me do well in that course.


Linear Algebra


Horn's course was close to

Roger A. Horn, Charles R. Johnson, 'Matrix Analysis', 0-521-38632-2, Cambridge University Press, 1990.

with also a few topics from

Roger A. Horn, Charles R. Johnson, 'Topics in Matrix Analysis', 0-521-46713-6, Cambridge University Press, 1994.


The course also used for reference and some topics and exercises

Richard Bellman, 'Introduction to Matrix Analysis: Second Edition', McGraw-Hill, New York, 1970.

This book is just packed with little results; at some point can get the impression that the author went on and on ... writing. Bellman was a very bright guy in mathematics, engineering, and medicine.


Relatively easy to read and relatively close to applications, and another book Horn's course used as a reference, is

Ben Noble, 'Applied Linear Algebra', Prentice-Hall, Englewood Cliffs, NJ, 1969.

Some edition of this book may be a good place to start for a student interested in applications now.


About the easiest reading in this list, and my first text on linear algebra, was

D. C. Murdoch, 'Linear Algebra for Undergraduates', John Wiley and Sons, New York, 1957.

This book is awfully old, but what it has are still the basics.


For my undergraduate honors paper I made some use of, and later read carefully nearly all of,

Evar D. Nering, 'Linear Algebra and Matrix Theory', John Wiley and Sons, New York, 1964.

The main part of this book is a relatively solid start, maybe a bit terse and advanced for a first text. The book also has in the back a collection of advanced topics, some of which might be quite good to know at some point and difficult to get elsewhere.

One of the topics in the back is linear programming, and for that I'd recommend something else, e.g., Chv'atal and/or Bazaraa and Jarvis in this list.


Likely the crown jewel of books on linear algebra is

Paul R. Halmos, 'Finite-Dimensional Vector Spaces, Second Edition', D. Van Nostrand Company, Inc., Princeton, New Jersey, 1958.

Halmos wrote this in about 1942 when he was an assistant to von Neumann at the Institute for Advanced Study. The book is intended to be a finite dimensional introduction to Hilbert space theory, or how to do linear algebra using mostly only what also works in Hilbert space. It's likely fair to credit von Neumann with Hilbert space.

The book is elegant.

Apparently at one time Harvard's course Math 55, which has a colorful description, used this text by Halmos and also, as elsewhere in this list, Rudin's 'Principles' and Spivak.


Long highly regarded as a linear algebra text is

Hoffman and Kunze, 'Linear Algebra, Second Edition', Prentice-Hall, Englewood Cliffs, New Jersey, 1971.


Numerical Methods


If you want to take numerical computations in linear algebra seriously, then consider the next book, or something better if you can find it:

George E. Forsythe and Cleve B. Moler, 'Computer Solution of Linear Algebraic Systems', Prentice-Hall, Englewood Cliffs, 1967.


Multivariate Statistics

My main start with multivariate statistics was

N. R. Draper and H. Smith, 'Applied Regression Analysis', John Wiley and Sons, New York, 1968.

Apparently later editions of this book remain of interest.


A relatively serious book on 'regression analysis' is

C. Radhakrishna Rao, 'Linear Statistical Inference and Its Applications: Second Edition', ISBN 0-471-70823-2, John Wiley and Sons, New York, 1967.


Three famous, general books on multivariate statistics are

Maurice M. Tatsuoka, 'Multivariate Analysis: Techniques for Educational and Psychological Research', John Wiley and Sons, 1971.

William W. Cooley and Paul R. Lohnes, 'Multivariate Data Analysis', John Wiley and Sons, New York, 1971.

Donald F. Morrison, 'Multivariate Statistical Methods: Second Edition', ISBN 0-07-043186-8, McGraw-Hill, New York, 1976.


Analysis of Variance

A highly regarded first book on analysis of variance and experimental design is

George W. Snedecor and William G. Cochran, 'Statistical Methods, Sixth Edition', ISBN 0-8138-1560-6, The Iowa State University Press, Ames, Iowa, 1971.

and a famous, more mathematical, book is

Henry Scheffé, 'Analysis of Variance', John Wiley and Sons, New York, 1967.


Linear Optimization

A highly polished book on linear programming is

Vašek Chvátal, 'Linear Programming', ISBN 0-7167-1587-2, W. H. Freeman, New York, 1983.


Nicely written and with more emphasis on the important special case of network flows is

Mokhtar S. Bazaraa and John J. Jarvis, 'Linear Programming and Network Flows', ISBN 0-471-06015-1, John Wiley and Sons, New York, 1977.


A grand applied mathematics dessert buffet, based on Banach space and the Hahn-Banach theorem, is

David G. Luenberger, 'Optimization by Vector Space Methods', John Wiley and Sons, Inc., New York, 1969.


Mathematical Analysis Relevant to Understanding Linearity


Long the first place a math student gets a fully serious encounter with calculus and closely related topics has been

Walter Rudin, 'Principles of Mathematical Analysis, Third Edition', McGraw-Hill, New York, 1964.

The first chapters of this book do well as an introduction to metric spaces, and that work applies fully to vector spaces.


A nice place to get comfortable doing mathematics in several dimensions is

Wendell H. Fleming, 'Functions of Several Variables', Addison-Wesley, Reading, Massachusetts, 1965.

Some of the material here is also good for optimization.


Another place to get comfortable doing mathematics in several dimensions is

Michael Spivak, 'Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus', W. A. Benjamin, New York, 1965.


The first half, the 'real' half, of the next book has polished introductions to Hilbert and Banach spaces, which are some of the most important vector spaces:

Walter Rudin, 'Real and Complex Analysis', ISBN 07-054232-5, McGraw-Hill, New York, 1966.


An elegant introduction to how to get comfortable in metric space is

George F. Simmons, 'Introduction to Topology and Modern Analysis', McGraw Hill, New York, 1963.


Ordinary Differential Equations


Linear algebra is important, at some points crucial, for ordinary differential equations; a polished introduction from a world expert is

Earl A. Coddington, 'An Introduction to Ordinary Differential Equations', Prentice-Hall, Englewood Cliffs, NJ, 1961.

Modded down because I found two useless statements in the first two sentences of your post.

I'd appreciate that. My linear algebra is extremely basic (all picked up as needed for an AI course). Any well-rounded ground up resources would be awesome.

I'd be keen to see your outline.

If this post is representative, I want to read as little of your writing as possible.

yet here you are posting on an internet forum

Roger Horn is one of the best linear algebra and matrix guys around, and I was, by wide margins, the star student in his class, effortlessly.

That's good, because you clearly wouldn't have been the star student in English class.

As English, there's nothing wrong with what I wrote.

and I was, effortlessly and by a wide margin, the star student in his class.

Adding a request.

Please do.

Please do. I got lost in section B.

If I write a PDF file outline of linear algebra, where can I post it?

Any suggestions?

You could make a Dropbox file public with little effort, or create a Scribd account and post it there. Thank you in advance.


Yes please!


Erk, I don't think condensing so much information into the smallest space possible is the best way to learn something (or even to review it).

I'm all for a no bullshit and quick way to get something. That's why I sometimes check learnXinYminutes.com or some random cheatsheets on Google. But this doesn't make it for me.

Btw, if you really want to get a good grasp of linear algebra you should check out Gilbert Strang's video courses on MIT OpenCourseWare. They are amazing and soooo easy to understand you don't even need to like mathematics to watch them. I haven't come across a better resource for starting with linear algebra.

Course codes 18.06SC and 18.06, I think, if anyone else was looking. Thanks for the pointer, really appreciated!

Got excited for a moment...then did some prerequisites digging and ended up with a prerequisites dependency chain:

18.06 -> 18.02 -> 18.01

where 18.02 = Multivariable Calc and 18.01 = Single Variable Calc.

Considering that my whole point of learning linear algebra was to clear it as a roadblock for Machine Learning, this is what my whole Dependency chain looks like:

Machine Learning -> Linear Algebra -> Multivariable Calc -> Single Variable Calc -> high school algebra and trigonometry.

I have a feeling I'll end up sticking to being a web developer :)

Calculus gets a bad rap for being difficult, but if you're learning on your own, you can just focus on the ideas and not on the arduous computation (which something like Maxima can do for you). The core ideas of calculus can probably be learned in a week. Review all the trig on Khan Academy, then try watching some calculus lectures, focusing on the big ideas, not memorizing rules for computing derivatives or integrals.

BTW, you probably don't need much calculus to learn most of the linear algebra you need; those requirements are mostly there for mathematical maturity, plus then being able to assign more interesting exercises.

Gilbert Strang made a great series of lectures on the big ideas of calculus sans most of the computation - http://www.youtube.com/playlist?list=PLBE9407EA64E2C318

You don't really need those calc courses to learn linear algebra. It's more about "mathematical maturity": familiarity with vectors, functions, inverse functions, a general intuition about multidimensional coordinates, and so on. A programming background would probably go a long way (although I don't speak from experience; I learned a lot of math before I started programming).

It's hard to overstate the importance of linear algebra in software. It's really worth learning.

I think the calc course prerequisites have an even simpler explanation: mathematicians don't usually manage to mention the existence of vectors until the multivariable calculus class. If you're already familiar with that idea, you shouldn't need to know anything about partial derivatives and surface integrals to understand linear algebra!

Don't look at the prerequisites; just dig right in. I don't think you'll have any problems. Linear algebra is something that kinda stands on its own.

> this is what my whole Dependency chain looks like

You should not think like that when learning. If you want to be efficient just take shortcuts.

Learn machine learning (btw there are great courses on Coursera and from Stanford about that), and if you're stuck on something just check Wikipedia or another resource.

Why do linear algebra teaching materials never mention what a determinant is?

(It's the product of the eigenvalues.)

> It's the product of the eigenvalues.

It sure is (provided you count multiplicities correctly), but that's not the one-sentence explanation I would have gone with, even to someone with much more intuition about eigenvalues than you'd get from this document. I would say the determinant is the volume scale factor (or hypervolume scale factor, in the general case).
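Both readings are easy to check numerically; here is a small numpy sketch (the matrix A is just an arbitrary example of mine):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# View 1: det(A) is the product of the eigenvalues (with multiplicity).
prod_eigs = np.prod(np.linalg.eigvals(A))

# View 2: det(A) is the (signed) volume scale factor -- the unit square
# spanned by e1, e2 maps to the parallelogram spanned by the columns of A.
d = np.linalg.det(A)

print(prod_eigs.real, d)   # both approximately 6.0
```

For a triangular matrix like this one, the eigenvalues are just the diagonal entries, so both views reduce to 2 · 3 = 6.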

Actually, what's really interesting to me is your general point: why do <teaching materials for mathematics subject X> never mention <the key insights that made subject X clear to me>? And why does this question not get asked more often? The nearest I've been able to come to a plausible answer is a mixture of (i) the key insight varies from person to person more than you'd think, combined with the closely related (ii) you can't teach key insights just by saying a few words.

Thank you for your comment! First off, the explanation regarding det(A) = the volume scaling of the transformation T_A associated with the matrix A: I've been stuck for over a week now trying to word this any other possible way because, in the current ordering of the sections, I cover determinants before linear transformations. Perhaps there is no better one-sentence explanation than talking about the volume, and I should reorder the sections...

You've raised a very important point regarding "routes to concepts" which really should be asked more often!

> (ii) you can't teach key insights just by saying a few words.

Generally true, though we have to say that the key difficulty in communicating insights is missing prerequisites. Therefore, if you think very carefully about the prerequisite structure (i.e. a model of the reader's previous knowledge) you can do lots of interesting stuff in very few words.

> (i) the key insight varies from person to person more than you'd think,

Let G=(V,E), where V is the set of math concepts and E is the set of links between them. Then there are as many ways to click on a concept x as there are in-links for x! In this case we have at least three routes:

  Route 1: geometric
   lin. trans T_A = {matrix:A, B_in:{e1,..,en}, B_out:{f1,..,fn}} 
   ---> what happens to a unit cube 1x1x1,
        represented  (1,...1) w.r.t. B_in,
        after going through T_A?
        ---> output will be (1,..,1) w.r.t. B_out
             ---> observe that cols of A are f1..fn
                     det(A) = (hyper)volume of (hyper)parallelepiped 
                              formed by vectors f1..fn
   Route 2: computational
    det(A) = { formula for 2x2, formula for 3x3, ... }
    ---> an easy test for invertibility of an n by n matrix
    sidenote: see Cramer's rule for another application of dets
   Route 3: algebraic 
    given a 2x2 matrix A, 
     find coefficients B, T, D that satisfy the following matrix equation:
          B*A^2  + T*A  + D*I   = 0,
     with B = 1; the linear coefficient is T = -tr(A) = -sum_i a_ii,
     and the constant term is D = det(A).
     sidenote:  B*λ^2 + T*λ + D = {characteristic poly. of A} = det(A-λI)

So perhaps the task of teaching math is not so much to find "the right" or "the best" route to a concept, but to collect many explanations of routes (edges in G), then come up with a coherent narrative that covers as many edges as possible (while respecting the partial order of prerequisites).
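For what it's worth, Route 3 can be checked numerically: with the conventional signs it is the Cayley-Hamilton theorem for 2x2 matrices, A² - tr(A)·A + det(A)·I = 0. A quick numpy sketch (the example matrix is mine):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

T = np.trace(A)          # tr(A) = 5
D = np.linalg.det(A)     # det(A) = -2 (up to floating point)

# Cayley-Hamilton for a 2x2 matrix: A^2 - tr(A)*A + det(A)*I = 0
residual = A @ A - T * A + D * np.eye(2)
print(np.allclose(residual, 0))   # True
```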

Uh, so you think this varies significantly from person to person? I don't know about you, but I think

> The determinant of a matrix is the product of its eigenvalues, and calculating it allows you to check if the matrix is invertible or not.

is a heck of a lot better than

> The determinant of a matrix is "a special way" to combine the entries of a matrix that serves to check if a matrix is invertible or not.

to pretty much any person.

I love a lesson that doesn't include ANY real world examples. What's the purpose of this document? Does it accomplish that purpose?

I'll have to add some examples, yes >> TODO.txt

purpose = to give a quick intro to the subject (the most important idea at least, namely, that matrix-vector product can represent linear transformations).

Suppose that you are driving a car and you hit a big tree. The main eigenvector is in the direction from the car to the tree, and the eigenvalue measures the deformation of the car produced by the accident. If the car is half as long after the accident, then the eigenvalue is 1/2.

Wow amazing stuff here, I agree with you that most math textbooks in our education system fail to explain concepts which are supposed to be very simple. Thank you so much for making things simpler. I'm having MATH 270 final next week at McGill, this is going to help a lot in studying :)

No pseudo-inverse, no singular value decomposition, no least squares regression, no k-nearest neighbors, and I'm sure I'm leaving out many other things---all fundamental linear algebra, very much needed for understanding and developing computer science (e.g. machine learning, data storage and compression), telecommunications (e.g. queueing theory, multimedia coding and streaming) and many other fields of engineering. I love-hate maths (I really struggle with it), but I honestly think our linear algebra book back in grad school was 600 pages for a good reason; you just can't do proper engineering without it. (Also I have another really thick book on the most important numerical methods for implementing that algebra on a computer, so, come on! :-))

This is really basic. No one (who uses the subject) should need a cheat sheet for this... Also, there are really good books out there, and linalg is a must nowadays. As a textbook: http://www.amazon.com/Linear-Algebra-Right-Undergraduate-Mat... As a reference: http://www.amazon.com/Matrix-Analysis-Roger-Horn/dp/05213863...

Suppose that you want to project a vector of data (x1, x2, x3, ..., xn) onto the one-dimensional subspace generated by the vector (1, 1, ..., 1). What's the projection in this case?

Hint: You obtain the most important concept of statistics.
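For anyone who wants to check the hint numerically, a small numpy sketch (the data vector x is an arbitrary example of mine):

```python
import numpy as np

x = np.array([3.0, 5.0, 7.0, 9.0])
ones = np.ones_like(x)

# Projection of x onto span{(1,...,1)}:
# coefficient = (x . ones) / (ones . ones) = sum(x) / n
coeff = (x @ ones) / (ones @ ones)
proj = coeff * ones

print(coeff)   # 6.0 -- exactly the sample mean of x
print(proj)    # [6. 6. 6. 6.]
```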

I wish I had this in grad school when I had forgotten all my undergrad Linear Algebra. It's not a very hard or deep aspect of math if you understand the fundamentals, so this is very useful.

Courses would benefit from quick-reference guides like the one at the end of Dror Bar Natan's paper on Khovanov homology: http://arxiv.org/abs/math/0201043 He says "It can fit inside your wallet."

See: http://www.math.toronto.edu/drorbn/Talks/HUJI-011104/cube.gi...

Very cool; I like the title of the associated book, "No Bullshit Guide To Linear Algebra", too.

Does anyone know of a nice, short summary of discrete mathematics to go along with this?

Studying at a university where all this and more is part of the first year education of computer scientists, I - probably foolishly - assumed basic linear algebra was common knowledge in the community. Now my interest is piqued: Which educational path / career path did you take ending up in IT and (more specifically) how much mathematical education did it include?

What is your educational background and which route

I bought this guy's previous book and the print quality was crap. I hope he improves this on his next book.

On page 3, section B, Using elementary matrices: to remember which matrix corresponds to a row operation, just apply that operation to the identity matrix and you obtain the associated elementary matrix.
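That trick is easy to demonstrate in numpy (the particular row operation and matrix here are arbitrary examples of mine):

```python
import numpy as np

# Row operation: R2 <- R2 - 2*R1.  Apply it to the identity matrix
# to obtain the corresponding elementary matrix E.
E = np.eye(3)
E[1, :] -= 2 * E[0, :]

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 3.0, 4.0],
              [0.0, 1.0, 2.0]])

# Applying the row operation directly to A...
B = A.copy()
B[1, :] -= 2 * B[0, :]

# ...gives the same result as left-multiplying A by E.
print(np.allclose(E @ A, B))   # True
```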

Here is a 1914 book on calculus: 'Calculus Made Easy' by Silvanus P. Thompson.


Thanks for the post; I've been trying to learn linear algebra for a long time but keep getting stuck / bored.

For me what makes it really interesting is the realization that all of linear regression (think y = mx + b) is based on linear algebra, specifically the notion that figuring out a best-fit line (in the case of a single-dimensional input variable) is the projection of the N-dimensional vector of observations (where you have N data points) onto a best-fitting 2-dimensional subspace (assuming you're fitting a slope and an intercept).

When thought of this way, a lot of linear algebra has geometric interpretations and for me this makes it a lot less abstract.

Nearly everything in linear algebra has important geometric interpretations.

In addition, a lot that works in the 3-space we live in (for the non-relativistic approximation) generalizes to linear algebra, vector spaces more generally, and, in particular, both Hilbert space and Banach space.

Yeah, dergachev is pointing to one of the cool direct applications of linear algebra to machine learning.

Suppose you are given the data D = (r_1; r_2; ...; r_N), where each row r_i is an (n+1)-vector r_i = (a_i1, a_i2, ..., a_in, b_i). Each row consists of some observation data. We want to predict a future b_j given the future \vec{a}_j, knowing that we have seen {r_i}_{i=1..N} (the data set consists of N data rows r_i where both \vec{a}_i and b_i are known).

One simple model for b_i given \vec{a}_i = (a_i1, a_i2, ..., a_in) is a linear model with $n$ parameters m_1, m_2, ..., m_n:

   y_m(x_1,...,x_n)  =  m_1x_1 + m_2x_2 + ... + m_nx_n  =  \vec{m} · \vec{x}
If the model is good then y_m(\vec{a}_i) approximates b_i well.

But startup wisdom dictates that we should measure everything!

Enter error term:

   e_i(\vec{m}) = |y_m(\vec{a}_i)   - b_i|²,
the squared absolute-value of the difference between the model's prediction and the actual output---hence the name error term.

Our goal is to make the sum $S$ of all the error terms as small as possible.

   S(\vec{m}) = \sum_{i=1}^{N} e_i(\vec{m})
Note that the "total squared error" is a function of the model parameters \vec{m}.

At this point we have reached a level of complexity that becomes difficult to follow. Linear algebra to the rescue!

We can express the "vector prediction" of the model y_m in "one shot" in terms of the following matrix equation:

  A\vec{m} = \vec{b}

  where A is an N by n matrix  (contains the a_ij part of the data)
  \vec{m} is an n by 1 vector  (the model parameters---the unknowns)
  \vec{b} is an N by 1 vector  (contains the b_i part of the data)

To find \vec{m}, we must solve this matrix equation; however, A is not a square matrix: A is a tall skinny matrix (N >> n), so there is no A^{-1}.

Okay, so we don't have an A^{-1} to throw at the equation A\vec{m} = \vec{b} to cancel the A, but what else could we throw at it? Let's throw A^T at it!

     A^T A \vec{m}  =  A^T \vec{b}
     \   /
       N   (an n by n matrix) 

       N \vec{m}  = A^T \vec{b}
Now the thing to observe is that if N is invertible, then we can find \vec{m} using

        \vec{m}  = N^{-1} A^T \vec{b}

This solution to the problem is known as the "least squares fit" solution (i.e. choice of parameters for the model m_1, m_2, ..., m_n). This name comes from the fact that the vector \vec{m} is equal to the output of the following optimization problem

    \vec{m}   =  minimize S(\vec{m}) over all \vec{m} 
Proof: http://en.wikipedia.org/wiki/Linear_least_squares_(mathemati...

Technical detail: the matrix N=A^T*A is invertible if and only if the columns of A are linearly independent.

TL;DR. When you have to do a "linear regression" model of data matrix X and labels \vec{y}, the best (in the sense of least squared error) linear model is \vec{m} = (X^T X)^{-1} X^T \vec{y}.
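The TL;DR formula is only a few lines of numpy. A sketch on synthetic data (the data, seed, and "true" parameters here are mine; also, np.linalg.solve on the normal equations is preferable to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall skinny data matrix X (N >> n) and labels y from a known linear model
N, n = 100, 3
X = rng.standard_normal((N, n))
true_m = np.array([2.0, -1.0, 0.5])
y = X @ true_m + 0.01 * rng.standard_normal(N)

# Normal-equations solution m = (X^T X)^{-1} X^T y,
# computed by solving (X^T X) m = X^T y rather than inverting
m_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The library's least-squares routine gives the same answer
m_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(m_normal, m_lstsq))   # True
```

The recovered \vec{m} lands very close to the true parameters because the noise term is small.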

That literally made more sense than anything I've learned this semester. :)

Can you please post the tex associated with the pdf? Thanks in advance.

Does anyone have a recommendation for an equally good and concise guide for Discrete Mathematics by any chance?

Literally just finished this exam. Too bad I didn't see this summary 4 hours ago!

Linear algebra class in one abbreviation: LAPACK.
