
Linear algebra tutorial in four pages - ivan_ah
http://minireference.com/blog/linear-algebra-tutorial/
======
j2kun
A few minor mistakes (perhaps just in my eyes), but overall pretty good.

The hardest part about teaching linear algebra is that nobody explains the big
picture. I teach mathematics and computer science and regularly tutor linear
algebra students, and I encounter students all the time who ask me "What are
vectors good for? I thought all we cared about were matrices and doing RREF
and stuff."

For this reason, I deemphasize computations and emphasize the connection
between linear maps and matrices. It can be summed up as follows: if you fix a
basis, every linear map is a matrix and every matrix is a linear map, and
operations on functions (like composition, inversion, whatever) correspond to
operations on the corresponding matrices.

It's definitely not an analogy or anything in "scare quotes" that would imply
something different is going on behind the scenes. It's exactly the same
thing.

Other questions usually left out of discussions about linear algebra (and
these notes): what are orthogonal vectors good for? Why would we ever want a
basis besides the standard basis in real Euclidean space? Is Euclidean space
the only vector space out there? Do vectors have to be lists of numbers?

~~~
ivan_ah
> What are orthogonal vectors good for?

Any set of n linearly independent vectors B_a={\vec{a}_i}_{i=1..n} in an
n-dimensional vector space can be used as a coordinate system, but the
"computational cost" of finding coordinates w.r.t. the basis B_a is annoying:
each time you want the coordinates of a vector you have to solve a system of
linear equations.

A basis consisting of orthogonal vectors {\vec{e}_i}_{i=1..n} is way cooler
because you can calculate the coefficients of any vector using the formula
v_i = (\vec{v} · \vec{e}_i)/||\vec{e}_i||².

Of course the best thing is to have an orthonormal basis
B_s={\hat{e}_i}_{i=1..n}, so that the coefficients of a vector w.r.t. B_s can
be calculated simply as v_i = \vec{v} · \hat{e}_i: a projection onto the
\hat{e}_i subspace.
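
To make this concrete, here is a minimal NumPy sketch (my own illustration, not
from the tutorial): with a generic basis you solve a linear system, with an
orthonormal basis each coefficient is a single dot product.

    import numpy as np

    v = np.array([3.0, 4.0])

    # Generic basis: finding coordinates means solving B c = v.
    B = np.column_stack([[1.0, 1.0], [1.0, 2.0]])    # basis vectors as columns
    c_generic = np.linalg.solve(B, v)

    # Orthonormal basis: coefficient i is just the dot product v . e_i.
    E = np.column_stack([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    c_ortho = E.T @ v

    # Both sets of coefficients reconstruct v.
    assert np.allclose(B @ c_generic, v) and np.allclose(E @ c_ortho, v)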

> Why would we ever want a basis besides the standard basis in real Euclidean
> space?

Hmm... I'm thinking eigenbases? The action of a matrix A on vectors expressed
in terms of its eigenbasis is just a scaling, i.e. _{B_e}[A]_{B_e} =
Q^{-1} _{B_s}[A]_{B_s} Q = diagonal matrix = Λ.
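
(A quick numerical check of that, as a sketch of my own with a hypothetical
symmetric matrix:)

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])           # symmetric => orthogonal eigenbasis
    evals, Q = np.linalg.eigh(A)         # columns of Q are the eigenbasis B_e

    # In the eigenbasis the action of A is a pure scaling: Q^{-1} A Q = Λ.
    assert np.allclose(np.linalg.inv(Q) @ A @ Q, np.diag(evals))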

> Is Euclidean space the only vector space out there? Do vectors have to be
> lists of numbers?

Good point. If I bump this to 5-6 pages, I will try to include something about
generalized vector spaces. Orthogonal polynomials could be a good one to
cover.

~~~
dahart
Computer graphics is full of fun topics that answer these questions in
accessible and visible/tangible ways!

Skeletal character rigging in particular motivates a nice understanding of
orthonormal bases and why a basis other than the identity matrix is very
useful.

Even simple 2d conversions between screen space & pixel space, for example,
can be useful motivational examples: it's the thing your web browser did to
render what you're reading. :)

Not that anyone would want to cover those topics in a 4 page linear algebra
primer, but maybe there is some inspiration there for ways to include pictures
that explain, instead of more math notation... ;)

The last 2 questions there are interesting and worthwhile topics for the
interested student, but I'd say, just my $0.02, don't ruin a good thing by
stuffing too much into it...

------
dmlorenzetti
It's a little surprising, in a "no-bullshit" discussion of "theoretical and
computational aspects of linear algebra," to see matrix inversion touted as
the way to solve linear equations. The guide literally introduces the examples
by saying "Dude, enough with the theory talk, let's see some calculations."
Yet standard numerical practice avoids literal inversion, in favor of
factorization methods.

E.g., _It is common practice to write the form A^{-1}b for economy of notation
in mathematical formulas... The trouble is that a reader unfamiliar with
numerical computation might assume that we actually compute A^{-1}... On most
computers it is always more effective to calculate A^{-1}b by solving the
linear system Ax = b using matrix factorization methods..._ (Dennis &
Schnabel, "Numerical Methods for Unconstrained Optimization and Nonlinear
Equations", section 3.2).

E.g., _As a final example we show how to avoid the pitfall of explicit inverse
computation... The point of this example is to stress that when a matrix
inverse is encountered in a formula, we must think in terms of solving
equations rather than in terms of explicit inverse formation._ (Golub and Van
Loan, "Matrix Computations", section 3.4.11).

~~~
greeneggs
Yes, one almost never inverts a matrix using a computer, and one _never_
inverts a matrix by hand. What does that teach you? I was pretty surprised to
see trivialities like this in a four-page summary. But, unfortunately, this is
just a poor summary. It also barely covers the spectral decomposition, and has
no mention at all of the singular-value decomposition.

~~~
baby
we do matrix inversion by hand in mathematics :D

------
nilkn
As someone with a math degree, I love this. However, I think the author
overestimates the familiarity of a typical high school student with
mathematical notation:

> The only prerequisite for this tutorial is a basic understanding of high
> school math concepts

I think fundamentally the material in these four pages is accessible to many
high school graduates, but perhaps not in this concise rendering (which is
awesome for me, but probably overwhelming for someone not familiar with set-
theoretic notation, summation notation, etc.).

~~~
ivan_ah
You are right. I am totally trying to sneak the \in and the \mathbb{R}'s in
there to see if people will notice.

Previously I had \forall in there too, but went over and removed things. On
the TODO list is to introduce the { obj | obj description } notation for
defining sets (in this case vector spaces).

~~~
Steuard
Well, I noticed! It stood out pretty clearly, I'd say.

I really don't think that the intended audience of this sort of ground-up
summary is going to be comfortable with \mathbb{R}^{m\times n} notation all
over the place. Yes, you explain what the notation means, but there's zero
chance that a reader who didn't already know that notation will become fluent
in it right away: they're going to be flipping back to the definition every
time. That might be okay if you're trying to teach all of math, but a person
reading "Linear Algebra in Four Pages" presumably neither wants nor needs that
full skill set. (I'd say that {obj|obj description} notation would make the
text even more opaque to novices, no matter how cleanly introduced.)

Why not just make a less precise reference to a table of "numbers" (which, as
an added bonus, would leave the door open to complex matrices without having
to start from scratch)?

------
ggchappell
A little observation: you repeatedly use language that fails to distinguish
definitions & properties from effective computational methods.

For example, in section G:

> To find the eigenvalue of a matrix we start from the eigenvalue equation
> ....

Solving the resulting equation is one way of computing eigenvalues. But it
might not be the one you want to use in some practical situation.

Just before that, in section F:

> The determinant of a matrix, ... serves to check if a matrix is invertible
> or not.

It is true that a square matrix is invertible iff it has nonzero determinant.
It certainly is _not_ true that, for a matrix of any size, computing the
determinant is a good method for checking whether a matrix is invertible.
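
A concrete illustration of the pitfall (my own example): for a large but
perfectly well-behaved matrix the determinant can underflow to zero, so rank or
condition number is the better practical check.

    import numpy as np

    A = 0.1 * np.eye(400)              # invertible; the inverse is simply 10*I

    print(np.linalg.det(A))            # 1e-400 underflows to 0.0 in float64
    print(np.linalg.matrix_rank(A))    # 400: full rank
    print(np.linalg.cond(A))           # 1.0: perfectly conditioned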

~~~
Steuard
> Solving the resulting equation is one way of computing eigenvalues. But it
> might not be the one you want to use in some practical situation.

Related fun fact: solving the eigenvalue equation is (in general) _impossible_
in closed form for matrices 5x5 or larger, because there's no closed-form
solution to the general quintic. So some other algorithm for finding
eigenvectors and eigenvalues is often absolutely required in practice.

~~~
ggchappell
Well, that depends on whether you want the _exact_ eigenvalues.

------
ivan_ah
Good luck to anyone who has a linear algebra exam coming up!

I also have a similar short tutorial for Newtonian mechanics here:
[http://cnd.mcgill.ca/~ivan/miniref/mech_in_7_pages.pdf](http://cnd.mcgill.ca/~ivan/miniref/mech_in_7_pages.pdf)

~~~
whitewhim
LinAlg 2 coming up in two days. It's a fun course!

------
dergachev
I really enjoyed the Gilbert Strang videos on linear algebra back when I was
taking the course at McGill:
[http://ocw.mit.edu/courses/mathematics/18-06-linear-
algebra-...](http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-
spring-2010/)

~~~
aet
Yes, very entertaining and enlightening (even if you already know the
subject). Love Strang's teaching style.

~~~
blt
I agree, his lecture deriving the determinant from a few desirable properties
is a classic in my mind.

------
elvinmelvin
Just to throw this out there in the hope it is useful, I have been using this
book as a review of linear algebra:

[http://linear.ups.edu/html/fcla.html](http://linear.ups.edu/html/fcla.html)

~~~
ivan_ah
+1

This book is awesome! The entire text is interspersed with collapsible
examples and exercises.

------
graycat
Roger Horn is one of the best linear algebra and matrix guys around, and I
was, by wide margins, the star student in his class, effortlessly.

For the notes, I found serious problems in just the first half of the first
column on the first page.

F'get about the four pages.

If enough people want an outline in a few pages, then I'll consider knocking
one out and putting it up somewhere as a PDF file.

~~~
ChuckMcM
I'd be happy to see three serious problems described in the first column of
the first page.

~~~
defen
I'll take a crack at being as pedantic as possible. I'm just doing this to
play along.

1) "Linear algebra is the math of vectors and matrices" \- I think this would
be better phrased as "the math of vectors and linear transformations."
Matrices are a convenient way to represent finite dimensional linear
transformations, but I think it's putting the cart before the horse to say
that linear algebra is the math of matrices. This is a minor problem, though.

2) "A vector ⃗v ∈ R^n is an array of n real numbers" \- In general vectors are
not arrays of numbers, real or otherwise. This kind of definition will run
into problems when you get into Hilbert/Banach spaces.

3) "the function f(x) = ln(x) has the inverse f^−1(x) = e^x" \- this is only
true for x > 0 if we're talking about the real-valued logarithm function,
since it's not defined for x <= 0.

~~~
ChuckMcM
Thank you, that is awesome. So are all the problems ones of a pedantic nature?
For example I understood the inverse function description as a definition even
though there was a range (which I also knew) for which it was invalid.

Generally I consider those sorts of things to be the difference between
writing 'proofs' versus explaining concepts.

~~~
defen
Yeah, I wouldn't really consider any of those things to be "serious" problems,
at least for the level that this guide is pitched at.

------
baby
Erk, I don't think condensing so much information into the smallest space
possible is the best way to learn something (or even to review it).

I'm all for a no-bullshit and quick way to get something. That's why I
sometimes check learnXinYminutes.com or some random cheatsheets on Google. But
this doesn't do it for me.

Btw, if you really want to get a good grasp on Linear Algebra you should check
Gilbert Strang's video courses on MIT OpenCourseWare. They are amazing and
soooo easy to understand you don't even need to like mathematics to watch
them. I haven't come across a better resource for getting started with Linear
Algebra.

~~~
luuse
Course codes 18.06SC and 18.06, I think, if anyone else was looking. Thanks
for the pointer, really appreciated!

~~~
udit99
Got excited for a moment...then did some prerequisites digging and ended up
with a prerequisites dependency chain:

18.06 -> 18.02 -> 18.01

Where 18.02 = Multivariable Calc and 18.01 = Single Variable Calc.

Considering that my whole point of learning linear algebra was to clear it as
a roadblock for Machine Learning, this is what my whole Dependency chain looks
like:

Machine Learning -> Linear Algebra -> Multivariable Calc -> Single Variable
Calc -> high school algebra and trigonometry.

I have a feeling I'll end up sticking to being a web developer :)

~~~
tokenrove
Calculus gets a bad rap for being difficult, but if you're learning on your
own, you can just focus on the ideas and not on the arduous computation (which
something like maxima can do for you). The core ideas of calculus can probably
be learned in a week. Review all the trig on Khan Academy, then try watching
some calculus lectures, focusing on the big ideas, not memorizing rules for
computing derivatives or integrals.

BTW, you probably don't need much calculus to learn most of the linear algebra
you need; those requirements are mostly there for mathematical maturity, plus
then being able to assign more interesting exercises.

~~~
elq
Gilbert Strang made a great series of lectures on the big ideas of calculus
sans most of the computation -
[http://www.youtube.com/playlist?list=PLBE9407EA64E2C318](http://www.youtube.com/playlist?list=PLBE9407EA64E2C318)

------
wfunction
Why do linear algebra teaching materials never mention what a determinant
_is_?

(It's the product of the eigenvalues.)

~~~
waqf
> _It's the product of the eigenvalues._

It sure is (provided you count multiplicities correctly), but that's not the
one-sentence explanation I would have gone with, even to someone with much
more intuition about eigenvalues than you'd get from this document. I would
say the determinant is the volume scale factor (or hypervolume scale factor,
in the general case).
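
Both readings are easy to check numerically; a small sketch of my own:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])

    # Product of the eigenvalues (counted with multiplicity) equals det(A).
    print(np.prod(np.linalg.eigvals(A)))   # 6.0
    print(np.linalg.det(A))                # 6.0

    # Volume (here: area) scale factor: the unit square spanned by e1, e2
    # maps to the parallelogram spanned by A@e1 and A@e2.
    u, v = A @ np.array([1.0, 0.0]), A @ np.array([0.0, 1.0])
    print(abs(u[0] * v[1] - u[1] * v[0]))  # parallelogram area formula: 6.0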

Actually, what's really interesting to me is your general point: why do
_<teaching materials for mathematics subject X>_ never mention _<the key
insights that made subject X clear to me>_? And why does this question not get
asked more often? The nearest I've been able to come to a plausible answer is
a mixture of (i) the key insight varies from person to person more than you'd
think, combined with the closely related (ii) you can't teach key insights
just by saying a few words.

~~~
ivan_ah
Thank you for your comment! First off, regarding the explanation det(A) = the
volume scaling of the transformation T_A associated with the matrix A: I've
been stuck for over a week now trying to word this any other way, because in
the current ordering of the sections I'm covering determinants before linear
transformations. Perhaps there is no better one-sentence explanation than
talking about the volume, and I should reorder the sections...

You've raised a very important point regarding "routes to concepts" which
really should be asked more often!

> (ii) you can't teach key insights just by saying a few words.

Generally true, though we have to say that the key difficulty in communicating
insights is missing prerequisites. Therefore, if you think very carefully
about the prerequisite structure (i.e. a model of the reader's previous
knowledge) you can do lots of interesting stuff in very few words.

> (i) the key insight varies from person to person more than you'd think,

Let G=(V,E) where V is the set of math concepts, and E are the links between
them. Then there are as many ways to click on a concept x as there are
in-links for x! In this case we have at least three routes:

    
    
      Route 1: geometric
       lin. trans. T_A = {matrix:A, B_in:{e1,..,en}, B_out:{f1,..,fn}}
       ---> what happens to the unit cube 1x1x...x1
            spanned by e1,..,en after going through T_A?
            ---> it maps to the parallelepiped spanned by f1,..,fn,
                 which are the columns of A (the images of e1,..,en),
                 therefore
                    det(A) = (hyper)volume of the (hyper)parallelepiped
                             formed by the vectors f1,..,fn

      Route 2: computational
       det(A) = { formula for 2x2, formula for 3x3, ... }
       ---> an easy test for invertibility of an n by n matrix
       sidenote: see Cramer's rule for another application of dets

      Route 3: algebraic
       given a 2x2 matrix A,
       find coefficients B, T, D that satisfy the matrix equation
            B*A^2 + T*A + D*I = 0;
       with B = 1, the linear coefficient is T = -tr(A) = -sum_i a_ii
       and the constant term is D = det(A).
       sidenote: B*λ^2 + T*λ + D = {characteristic poly. of A} = det(A - λI)
    
    

So perhaps the task of teaching math is not so much to try to find "the right"
or "the best" route to a concept, but to collect many explanations of routes
(edges in G), then come up with a coherent narrative that covers as many edges
as possible (while respecting the partial order of prerequisites).
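
(As a quick numerical sanity check of Route 3 above, a minimal sketch of my
own, using the sign convention of the characteristic polynomial
λ² - tr(A)λ + det(A):)

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    T = -np.trace(A)          # linear coefficient
    D = np.linalg.det(A)      # constant term

    # Cayley-Hamilton for the 2x2 case: A^2 + T*A + D*I = 0
    assert np.allclose(A @ A + T * A + D * np.eye(2), np.zeros((2, 2)))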

------
source99
I love a lesson that doesn't include ANY real world examples. What's the
purpose of this document? Does it accomplish that purpose?

~~~
ivan_ah
I'll have to add some examples, yes >> TODO.txt

purpose = to give a quick intro to the subject (the most important idea at
least, namely, that the matrix-vector product can represent linear
transformations).

------
starstart
Suppose that you are driving a car and you hit a big tree. The main
eigenvector is in the direction from the car to the tree, and the eigenvalue
measures the deformation of the car produced by the accident. If the car is
half as long after the accident, then the eigenvalue is 1/2.

------
michaelchum
Wow, amazing stuff here. I agree with you that most math textbooks in our
education system fail to explain concepts that are supposed to be very simple.
Thank you so much for making things simpler. I have my MATH 270 final next
week at McGill; this is going to help a lot in studying :)

------
gaelow
No pseudo-inverse, no singular value decomposition, no least squares
regression, no k-nearest neighbors, and I'm sure I'm leaving out many other
things - all fundamental linear algebra very much needed for understanding and
developing computer science (e.g. machine learning, data storage and
compression), telecommunications (e.g. queueing theory, multimedia coding and
streaming) and many other fields related to engineering. I love-hate maths (I
really struggle with them), but I honestly think our linear algebra book back
in grad school was 600 pages for a good reason: you just can't do proper
engineering without it. (Also I have another really thick book on the most
important numerical methods for implementing that algebra on a computer, so,
come on! :-))

------
jakab922
This is really basic. No one (who uses the subject) should need a cheat sheet
for this... Also there are really good books out there, and linalg is a must
nowadays. As for a textbook: [http://www.amazon.com/Linear-Algebra-Right-
Undergraduate-Mat...](http://www.amazon.com/Linear-Algebra-Right-
Undergraduate-
Mathematics/dp/0387982582/ref=sr_1_1?ie=UTF8&qid=1386775556&sr=8-1&keywords=linear+algebra+done+right)
As for a reference: [http://www.amazon.com/Matrix-Analysis-Roger-
Horn/dp/05213863...](http://www.amazon.com/Matrix-Analysis-Roger-
Horn/dp/0521386322/ref=sr_1_3?s=books&ie=UTF8&qid=1386775486&sr=1-3&keywords=matrix)

------
starstart
Suppose that you want to project a vector of data (x1,x2,x3,...,xn) onto the
one-dimensional subspace generated by the vector (1,1,...,1). What's the
projection in this case?

Hint: You obtain the most important concept of statistics.
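
(For anyone who wants the hint spelled out, a minimal NumPy sketch of my own:)

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 8.0])
    ones = np.ones_like(x)

    # Projection of x onto span{(1,...,1)}: coefficient = (x·1)/(1·1)
    coeff = (x @ ones) / (ones @ ones)
    print(coeff)                # 5.0 -- exactly np.mean(x)
    print(coeff * ones)         # [5. 5. 5. 5.], the projected vector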

------
mathattack
I wish I had this in grad school when I had forgotten all my undergrad Linear
Algebra. It's not a very hard or deep aspect of math if you understand the
fundamentals, so this is very useful.

------
mrcactu5
Courses would benefit from "quick-reference guides" like the one at the end of
Dror Bar-Natan's paper on Khovanov homology
[http://arxiv.org/abs/math/0201043](http://arxiv.org/abs/math/0201043). He says
"It can fit inside your wallet."

See:
[http://www.math.toronto.edu/drorbn/Talks/HUJI-011104/cube.gi...](http://www.math.toronto.edu/drorbn/Talks/HUJI-011104/cube.gif)

------
tdicola
Very cool, I like the title of the associated book "No Bullshit Guide To
Linear Algebra" too.

Does anyone know of a nice, short summary of discrete mathematics to go along
with this?

------
Perseids
Studying at a university where all this and more is part of the first-year
education of computer scientists, I - probably foolishly - assumed basic
linear algebra was common knowledge in the community. Now my interest is
piqued: which educational path / career path did you take to end up in IT and
(more specifically) how much mathematical education did it include?


------
dionyziz
I bought this guy's previous book and the print quality was crap. I hope he
improves this on his next book.

------
starstart
On page 3, section B (Using elementary matrices): to remember which matrix
corresponds to a row operation, just apply that operation to the identity
matrix and you obtain the associated elementary matrix.
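
A minimal NumPy illustration of that trick (my own example):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    # Row operation R2 <- R2 - 3*R1, applied to the identity...
    E = np.eye(2)
    E[1] = E[1] - 3 * E[0]      # E = [[1, 0], [-3, 1]], the elementary matrix

    # ...and left-multiplying by E performs the same row operation on A.
    print(E @ A)                # [[ 1.  2.]
                                #  [ 0. -2.]]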

------
anoncow
Here is a 1914 book on calculus: Calculus Made Easy by Silvanus P. Thompson.

[http://www.gutenberg.org/ebooks/33283](http://www.gutenberg.org/ebooks/33283)

------
nathan-wailes
Thanks for the post. I've been trying to learn linear algebra for a long time
but keep getting stuck / bored.

~~~
dergachev
For me what makes it really interesting is the realization that all of linear
regression (think y = mx + b) is based on linear algebra, specifically the
notion that figuring out a best-fit line (in the case of a single-dimensional
input variable) is a projection of the N-dimensional vector of observations
(where you have N data points) onto a 2-dimensional subspace (assuming you're
fitting a slope and an intercept).

When thought of this way, a lot of linear algebra has geometric
interpretations and for me this makes it a lot less abstract.

~~~
ivan_ah
Yeah, dergachev is pointing to one of the cool direct applications of linear
algebra to machine learning.

Suppose you are given the data D = (r_1; r_2; ...; r_N) where each row
r_i = (a_i1, a_i2, ..., a_in, b_i) consists of some observation data. We want
to predict a future b_j given the future \vec{a}_j, given that we have seen
{r_i}_{i=1..N} (the data set consists of N data rows r_i where both \vec{a}_i
and b_i are known).

One simple model for b_i given \vec{a}_i = (a_i1, a_i2, ..., a_in) is a linear
model with n parameters m_1, m_2, ..., m_n:

    
    
       y_m(x_1,...,x_n) =  m_1x_1 + m_2x_2 + ... + m_nx_n  =  \vec{m} · \vec{x}
    

If the model is good then y_m(\vec{a}_i) approximates b_i well.

But startup wisdom dictates that we should measure everything!

Enter error term:

    
    
       e_i(\vec{m}) = |y_m(\vec{a}_i)   - b_i|²,
    

the squared absolute-value of the _difference_ between the model's prediction
and the actual output---hence the name error term.

Our goal is to make the sum S of all the error terms as small as possible:

    
    
       S(\vec{m}) = \sum_{i=1}^{N} e_i(\vec{m})
    

Note that the "total squared error" is a function of the model parameters
\vec{m}.

At this point we have reached a level of complexity that becomes difficult to
follow. Linear algebra to the rescue!

We can express the "vector prediction" of the model y_m in "one shot" in terms
of the following matrix equation:

    
    
      A\vec{m} = \vec{b}
    
      where A is an N by n matrix  (contains the a_:: part of the data)
      \vec{m} is an n by 1 vector  (model parameters---the unknown)
      \vec{b} is an N by 1 vector  (contains the b_: part of the data)
    
    

To find \vec{m}, we must solve this matrix equation; however, A is not a
square matrix: A is a tall skinny matrix (N >> n), so there is no A^{-1}.

Okay, so we don't have an A^{-1} to throw at the equation A\vec{m}=\vec{b} to
cancel the A, but what else could we throw at it? Let's throw A^T at it!

    
    
         A^T A \vec{m}  =  A^T \vec{b}
         \   /
           N   (an n by n matrix) 
    
           N \vec{m}  = A^T \vec{b}
    

Now the thing to observe is that if N is invertible, then we can find \vec{m}
using

    
    
            \vec{m}  = N^{-1} A^T \vec{b}
    
    

This solution to the problem is known as the "least squares fit" solution
(i.e. choice of parameters for the model m_1, m_2, ..., m_n). This name comes
from the fact that the vector \vec{m} is equal to the output of the following
optimization problem

    
    
        \vec{m}   =  minimize S(\vec{m}) over all \vec{m} 
    

Proof:
[http://en.wikipedia.org/wiki/Linear_least_squares_(mathemati...](http://en.wikipedia.org/wiki/Linear_least_squares_\(mathematics\)#Derivation_directly_in_terms_of_matrices)

Technical detail: the matrix N=A^T*A is invertible if and only if the columns
of A are linearly independent.

TL;DR. When you have to do a "linear regression" model of data matrix X and
labels \vec{y}, the best (in the sense of least squared error) linear model is
\vec{m} = (X^T X)^{-1} X^T \vec{y}.
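
For completeness, a minimal NumPy sketch of the above (my own illustration; in
practice a factorization-based solver such as np.linalg.lstsq is preferred over
forming (X^T X)^{-1} explicitly):

    import numpy as np

    rng = np.random.default_rng(1)
    N, n = 200, 3
    X = rng.standard_normal((N, n))                  # tall skinny data matrix A
    m_true = np.array([1.5, -2.0, 0.5])
    y = X @ m_true + 0.01 * rng.standard_normal(N)   # labels b with a little noise

    # Normal equations, as derived above: (X^T X) m = X^T y
    m_normal = np.linalg.solve(X.T @ X, X.T @ y)

    # Library least-squares solver (SVD-based), numerically preferable
    m_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

    assert np.allclose(m_normal, m_lstsq)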

~~~
somethingnew
That literally made more sense than anything I've learned this semester. :)

------
aves
Does anyone have a recommendation for an equally good and concise guide for
Discrete Mathematics by any chance?

------
dblarons
Literally just finished this exam. Too bad I didn't see this summary 4 hours
ago!

------
af3
Linear algebra class in one abbreviation: LAPACK.

