A few minor mistakes (perhaps just in my eyes), but overall pretty good.
The hardest part about teaching linear algebra is that nobody explains the big picture. I teach mathematics and computer science and regularly tutor linear algebra students, and I encounter students all the time who ask me "What are vectors good for? I thought all we cared about were matrices and doing RREF and stuff."
For this reason, I deemphasize computations and emphasize the connection between linear maps and matrices. It can be summed up as follows: if you fix a basis, every linear map is a matrix and every matrix is a linear map, and operations on functions (like composition, inversion, whatever) correspond to operations on the corresponding matrices.
It's definitely not an analogy or anything in "scare quotes" that would imply something different is going on behind the scenes. It's exactly the same thing.
Other questions usually left out of discussions about linear algebra (and these notes): what are orthogonal vectors good for? Why would we ever want a basis besides the standard basis in real Euclidean space? Is Euclidean space the only vector space out there? Do vectors have to be lists of numbers?
Any set of n linearly independent vectors B_a = {\vec{a}_i}_{i=1..n} in an n-dimensional vector space can be used as a coordinate system, but the "computational cost" of finding coordinates w.r.t. the basis B_a will be annoying. Each time you want to find the coordinates of a vector you have to solve a system of linear equations.
A basis consisting of orthogonal vectors {\vec{e}_i}_{i=1..n} is way cooler because you can calculate the coefficients of any vector using the formula v_i = (\vec{v} · \vec{e}_i)/||\vec{e}_i||².
Of course the best thing is to have an orthonormal basis B_s = {\hat{e}_i}_{i=1..n}, so that the coefficients of a vector w.r.t. B_s can be calculated simply as v_i = \vec{v} · \hat{e}_i.
Each coefficient v_i is the component of \vec{v} along \hat{e}_i, i.e., a projection onto the \hat{e}_i subspace.
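For anyone who wants to play with this, here is a minimal numpy sketch of the contrast (the particular vectors and basis are made up for illustration): with a general basis you solve B c = v, while with an orthonormal basis you just take dot products.

    import numpy as np

    # A made-up vector in R^3.
    v = np.array([2.0, 1.0, 3.0])

    # A general (non-orthogonal) basis: coordinates require solving B c = v.
    B = np.column_stack([[1, 1, 0], [0, 1, 1], [1, 0, 1]]).astype(float)
    coords_general = np.linalg.solve(B, v)

    # An orthonormal basis: coordinates are just dot products v . e_i.
    E, _ = np.linalg.qr(B)            # orthonormalize the same basis
    coords_orthonormal = E.T @ v      # v_i = v . e_i

    # Both reconstruct v exactly (up to floating point).
    assert np.allclose(B @ coords_general, v)
    assert np.allclose(E @ coords_orthonormal, v)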
> Why would we ever want a basis besides the standard basis in real Euclidean space?
Hmm... I'm thinking eigenbases? The action of a matrix A on vectors expressed in terms of its eigenbasis is just a scaling, i.e. _{B_e}[A]_{B_e} = Q^{-1} _{B_s}[A]_{B_s} Q = diagonal matrix = Λ.
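A minimal numpy sketch of that diagonalization, with a made-up symmetric matrix (so the eigenbasis is guaranteed to be real and orthogonal):

    import numpy as np

    # A made-up diagonalizable matrix (symmetric, so the eigenbasis is orthogonal).
    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    evals, Q = np.linalg.eig(A)        # columns of Q are the eigenvectors
    Lambda = np.linalg.inv(Q) @ A @ Q  # change of basis to the eigenbasis

    # In the eigenbasis, A acts as a pure scaling: Lambda is (numerically) diagonal.
    assert np.allclose(Lambda, np.diag(evals))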
> Is Euclidean space the only vector space out there?
> Do vectors have to be lists of numbers?
Good point.
If I bump this to 5-6 pages, I will try to include something about generalized vector spaces.
Orthogonal polynomials could be a good one to cover.
Computer graphics is full of fun topics that answer these questions in accessible and visible/tangible ways!
Skeletal character rigging in particular motivates a nice understanding of orthonormal bases and why a basis other than the identity matrix is very useful.
Even simple 2d conversions between screen space & pixel space, for example, can be useful motivational examples -- it's the thing your web browser did to render what you're reading. :)
Not that anyone would want to cover those topics in a 4 page linear algebra primer, but maybe there is some inspiration there for ways to include pictures that explain, instead of more math notation... ;)
The last 2 questions there are interesting and worthwhile topics for the interested student, but I'd say, just my $0.02, don't ruin a good thing by stuffing too much into it...
> Each time you want to find the coordinate of a vector you have solve a system of linear equations.
This is wrong, unless I'm misreading you. The first time you want the coordinates of a vector in a nonstandard basis you need to construct the matrix that does the change of basis, which requires inverting a matrix (same cost as solving an LSE), but after that, it's just plain matrix-vector multiplication.
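A small sketch of that point, assuming scipy is available (the basis matrix is made up): factor the basis matrix once, then each subsequent coordinate computation is just a cheap pair of triangular solves, no explicit inverse needed.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    # Made-up nonstandard basis, as columns of B (3x3, invertible).
    B = np.array([[1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])

    lu, piv = lu_factor(B)            # pay the O(n^3) cost once

    # Afterwards, each change of coordinates is cheap.
    for v in [np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])]:
        coords = lu_solve((lu, piv), v)   # coordinates of v w.r.t. the basis B
        assert np.allclose(B @ coords, v)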
I knew the answers to these questions :) I'm just saying they belong in a discussion of the topics. Maybe I just like to take math slower than a 4-page summary.
I read Axler's book quite recently, after poring over reviews of linear algebra books. I did like it quite a lot -- it has a very clean, approachable narrative style. Having completed it, though, I feel there's maybe not enough operational knowledge gained from it. Meaning, it explains the theory and the motivations well, but one might augment it with Schaum's[1], or some such guided exercises.
Thanks, I have been doing a lot of machine learning hacks lately and almost everything involves "vector spaces". 249 pages should be fun; I'm just finishing up "Think Bayes" by Allen Downey.
Axler is not a text in applied linear algebra, it is a book you should read after you are already familiar with both basic abstract algebra and basic linear algebra (i.e. you know how to multiply matrices etc.), it then provides a nice motivation for linear algebra as a subfield of abstract algebra. You may still, however, not find the ideas of determinant, trace and other practical concepts very intuitive after reading Axler.
> Why would we ever want a basis besides the standard basis in real Euclidean space?
I think it can be explained from an SVD point of view: the standard basis is likely not the optimal one and doesn't capture the most variance. A basis aligned with the directions of larger variance will presumably contain more information.
After obtaining a good basis, we can approximate the original matrix using only the basis vectors with the largest variances (often far fewer than the standard basis). This is useful for finding a shorter representation that still preserves much of the distance information.
I think this post explains the intuition pretty clearly:
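Not the linked post, but here is a minimal numpy sketch of the idea (the data matrix is made up): keep only the directions with the largest singular values and you get the best low-rank approximation of the data.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10)) @ rng.normal(size=(10, 10))  # made-up data matrix

    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    k = 3                                          # keep the k strongest directions
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation of X

    # The error is controlled by the singular values that were thrown away.
    err = np.linalg.norm(X - X_k)                  # Frobenius norm of the residual
    print(err, np.sqrt(np.sum(s[k:] ** 2)))        # these two numbers agree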
Well, it's not an analogy! There's a 1-1 correspondence between matrices and linear maps over a vector space with a fixed basis. "A matrix represents a function" is no more an analogy than "The symbols 'V', '5', and '五' represent the number five" is an analogy.
It's more like matrices are a useful way of representing certain functions, just like tally marks are a more useful way of representing numbers when you're counting the number of guests as they arrive but arabic numerals are more useful when you're doing arithmetic. Because these functions — linear maps — have a special structure to them, we can define matrix "multiplication" in a way that corresponds to the composition of linear maps.
In other words, it's not a happy accident. We invented matrices as a way to reason about and compute with linear maps more effectively. For example, "diagonal matrices" and "upper-triangular matrices" (among others) play important roles throughout linear algebra, but without the matrix representation of linear maps it'd be hard to talk about these things.
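A tiny numpy sketch of that correspondence, with two made-up maps on R^2 (a rotation and a scaling): composing the functions gives the same result as multiplying their matrices, and inverting the function corresponds to inverting the matrix.

    import numpy as np

    # Two made-up linear maps on R^2, written both as functions and as matrices.
    A = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotate by 90 degrees
    B = np.array([[2.0, 0.0], [0.0, 3.0]])    # scale the axes by 2 and 3

    f = lambda v: A @ v
    g = lambda v: B @ v

    v = np.array([1.0, 1.0])

    # Composition of the maps corresponds to the product of the matrices.
    assert np.allclose(f(g(v)), (A @ B) @ v)
    # Inverting the map corresponds to inverting the matrix.
    assert np.allclose(np.linalg.inv(A) @ f(v), v)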
No. They just have to satisfy the vector space axioms (greatly simplifying: you can add vectors together and multiply them by scalars, and they behave as you would expect -- associative, commutative, distributive, etc., as appropriate).
So, in this regard, functions can be vectors, if you define addition to be pointwise addition. For example, consider the set of continuous functions on the unit interval [0, 1]. Add two together and you get another continuous function on the unit interval, multiply by a scalar and you get another continuous function on the unit interval, etc.
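A toy sketch of that example (evaluating on a grid only for display; the actual space is infinite-dimensional):

    import numpy as np

    # Two continuous functions on [0, 1], treated as "vectors".
    f = lambda x: np.sin(np.pi * x)
    g = lambda x: x ** 2

    # Vector addition = pointwise addition; scalar multiplication = pointwise scaling.
    h = lambda x: f(x) + 3.0 * g(x)   # still a continuous function on [0, 1]

    xs = np.linspace(0.0, 1.0, 5)
    print(h(xs))                      # evaluating the "vector" h at a few points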
It turns out that, if you restrict to spaces of finite dimension, then Euclidean space is the only vector space out there, up to your choice of labelling the basis elements and up to a certain linear transformation.
In this sense, vectors can always be written as lists of numbers, but no, they don't have to be this in general. For example, you could have a vector space of smooth functions. It's infinite dimensional, and its elements (its vectors) are functions, not tuples of numbers.
You must also add the condition that your scalars are real numbers. There are many finite dimensional vector spaces that are nothing like Euclidean space: any finite extension of a finite field, for example. This comes from Galois theory, but is not just abstract nonsense. One application to computing is if your scalars are only 0 or 1 and all operations are done mod 2, then you have a framework for doing error correcting codes, among other things. We call this set of scalars either the finite field of size 2 or the Galois field of size 2, aka GF(2).
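A small numpy sketch of GF(2) arithmetic, with a made-up parity-check matrix (purely illustrative, not a real code): addition of vectors is XOR, and everything is done mod 2.

    import numpy as np

    # Vectors over GF(2): components are 0 or 1, addition is XOR (i.e. mod 2).
    u = np.array([1, 0, 1, 1], dtype=np.uint8)
    v = np.array([0, 1, 1, 0], dtype=np.uint8)

    print((u + v) % 2)        # [1 1 0 1] -- vector addition in GF(2)^4

    # A made-up parity-check matrix H; a word r passes the check iff H r = 0 (mod 2).
    H = np.array([[1, 1, 1, 0],
                  [0, 1, 1, 1]], dtype=np.uint8)
    r = np.array([1, 0, 1, 0], dtype=np.uint8)
    syndrome = (H @ r) % 2    # a nonzero syndrome signals an error
    print(syndrome)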
Yes, what I should have said is: if k is an algebraically closed field then n-dimensional vector spaces over k are isomorphic to k^n. Unless I've had too much to drink this evening.
It's a little surprising, in a "no-bullshit" discussion of "theoretical and computational aspects of linear algebra," to see matrix inversion touted as the way to solve linear equations. The guide literally introduces the examples by saying "Dude, enough with the theory talk, let's see some calculations." Yet standard numerical practice avoids literal inversion, in favor of factorization methods.
E.g., It is common practice to write the form A^{-1}b for economy of notation in mathematical formulas... The trouble is that a reader unfamiliar with numerical computation might assume that we actually compute A^{-1}... On most computers it is always more effective to calculate A^{-1}b by solving the linear system Ax = b using matrix factorization methods... (Dennis & Schnabel, "Numerical Methods for Unconstrained Optimization and Nonlinear Equations", section 3.2).
E.g., As a final example we show how to avoid the pitfall of explicit inverse computation... The point of this example is to stress that when a matrix inverse is encountered in a formula, we must think in terms of solving equations rather than in terms of explicit inverse formation. (Golub and Van Loan, "Matrix Computations", section 3.4.11).
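A quick numpy illustration of the same advice, on a made-up random system: prefer a factorization-based solver over forming the inverse explicitly.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(500, 500))
    b = rng.normal(size=500)

    x_solve = np.linalg.solve(A, b)    # factorization-based; what you normally want
    x_inv = np.linalg.inv(A) @ b       # explicit inverse; more work, often less accurate

    # Compare the residuals of the two answers.
    print(np.linalg.norm(A @ x_solve - b), np.linalg.norm(A @ x_inv - b))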
Yes, one almost never inverts a matrix using a computer, and one never inverts a matrix by hand. What does that teach you? I was pretty surprised to see trivialities like this in a four-page summary. But, unfortunately, this is just a poor summary. It also barely covers the spectral decomposition, and has no mention at all of the singular-value decomposition.
As someone with a math degree, I love this. However, I think the author over-estimates the familiarity of a typical high school student with mathematical notation:
> The only prerequisite for this tutorial is a basic understanding of high school math concepts
I think fundamentally the material in these four pages is accessible to many high school graduates, but perhaps not in this concise rendering (which is awesome for me, but probably overwhelming for someone not familiar with set-theoretic notation, summation notation, etc.).
You are right. I am totally trying to sneak in the \in and the \mathbb{R}'s in there to see if people will notice.
Previously I had \forall in there too, but went over and removed things.
On the TODO list is to introduce the { obj | obj description } notation for defining sets (in this case vector spaces).
Well, I noticed! It stood out pretty clearly, I'd say.
I really don't think that the intended audience of this sort of ground-up summary is going to be comfortable with \mathbb{R}^{m\times n} notation all over the place. Yes, you explain what the notation means, but there's zero chance that a reader who didn't already know that notation will become fluent in it right away: they're going to be flipping back to the definition every time. That might be okay if you're trying to teach all of math, but a person reading "Linear Algebra in Four Pages" presumably neither wants nor needs that full skill set. (I'd say that {obj|obj description} notation would make the text even more opaque to novices, no matter how cleanly introduced.)
Why not just make a less precise reference to a table of "numbers" (which, as an added bonus, would leave the door open to complex matrices without having to start from scratch)?
A little observation: You repeatedly use language that fails to distinguish definitions & properties vs. effective computational methods.
For example, in section G:
> To find the eigenvalue of a matrix we start from the eigenvalue equation ....
Solving the resulting equation is one way of computing eigenvalues. But it might not be the one you want to use in some practical situation.
Just before that, in section F:
> The determinant of a matrix, ... serves to check if a matrix is invertible or not.
It is true that a square matrix is invertible iff it has nonzero determinant. It certainly is not true that, for a matrix of any size, computing the determinant is a good method for checking whether a matrix is invertible.
> Solving the resulting equation is one way of computing eigenvalues. But it might not be the one you want to use in some practical situation.
Related fun fact: Solving the eigenvalue equation is (in general) impossible in closed form for matrices 5x5 or larger, because there's no closed form solution to the quintic. So some other algorithm for finding eigenvectors and eigenvalues is often absolutely required in practice.
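As a sketch of "some other algorithm": plain power iteration approximates the dominant eigenpair by repeated multiplication (assuming a single dominant eigenvalue exists), with no characteristic polynomial in sight; library routines like np.linalg.eigvals are likewise iterative under the hood.

    import numpy as np

    def power_iteration(A, iters=1000):
        """Approximate the dominant eigenvalue/eigenvector of A by repeated multiplication."""
        v = np.random.default_rng(0).normal(size=A.shape[0])
        for _ in range(iters):
            v = A @ v
            v /= np.linalg.norm(v)
        lam = v @ A @ v          # Rayleigh quotient estimate of the eigenvalue
        return lam, v

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    lam, v = power_iteration(A)
    print(lam)                          # ~5.0, the dominant eigenvalue of this A
    print(np.linalg.eigvals(A))         # library routine for comparison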
> It certainly is not true that, for a matrix of any size, computing the determinant is a good method for checking whether a matrix is invertible.
Can you explain this part? I don't understand what you mean, and I have an Advanced Linear Algebra final in a few days, so I should probably know this stuff.
I thought the determinant was not defined for non-square matrices. Also, I thought that inverses only existed for square matrices (implying that the determinant check is a good way of testing invertibility).
It's a valid way to test whether a (square) matrix has an inverse, but computing big determinants is a lot of effort (both from a "student work" perspective and from a "computational cost" perspective). The point is that in many cases, there are enormously more efficient ways to find an inverse (or, more generally, to solve matrix problems with inverses in them).
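A small numpy illustration of why the raw determinant is also a poor numerical test (the matrix is made up): a perfectly well-conditioned matrix can have a tiny determinant, while rank and condition-number estimates tell the real story.

    import numpy as np

    n = 100
    A = 0.1 * np.eye(n)            # perfectly invertible, perfectly conditioned

    print(np.linalg.det(A))        # 1e-100: looks "almost singular" but isn't
    print(np.linalg.cond(A))       # 1.0: as well conditioned as possible
    print(np.linalg.matrix_rank(A) == n)   # True: full rank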
I'll take a crack at being as pedantic as possible. I'm just doing this to play along.
1) "Linear algebra is the math of vectors and matrices" - I think this would be better phrased as "the math of vectors and linear transformations." Matrices are a convenient way to represent finite dimensional linear transformations, but I think it's putting the cart before the horse to say that linear algebra is the math of matrices. This is a minor problem, though.
2) "A vector ⃗v ∈ R^n is an array of n real numbers" - In general vectors are not arrays of numbers, real or otherwise. This kind of definition will run into problems when you get into Hilbert/Banach spaces.
3) "the function f(x) = ln(x) has the inverse f^−1(x) = e^x" - this is only true for x > 0 if we're talking about the real-valued logarithm function, since it's not defined for x <= 0.
Thank you, that is awesome. So are all the problems ones of a pedantic nature? For example I understood the inverse function description as a definition even though there was a range (which I also knew) for which it was invalid.
Generally I consider those sorts of things to be the difference between writing 'proofs' versus explaining concepts.
> 2) "A vector v⃗ ∈ R^n is an array of n real numbers" - In general vectors are not arrays of numbers, real or otherwise. This kind of definition will run into problems when you get into Hilbert/Banach spaces.
Surely this is ok if it is parsed as "(a vector in R^n) is an array of n real numbers".
That's better, but still not entirely correct. An array of real numbers is not itself a vector. It is a representation of a vector in a particular basis. This is a technical point but very important to grok, especially when starting to look at vector spaces other than F^n.
> An array of real numbers is not itself a vector. It is a representation of a vector in a particular basis.
No, such an array can be either a vector or the representation (which also has to be a vector) you mentioned.
To make such things make sense, we need a definition of a vector space. It can be tempting to omit that definition from an outline, but the definition is just crucial, e.g., when talking about subspaces, which are often crucial. E.g., for an m x n matrix A and an n x 1 'vector' x, the set of all x such that Ax = 0 is a subspace, but to know this we need the definition of a vector space.

(1) In R^n, none of n, R, or R^n has been defined. Better: Let n be a positive integer, and let R be the set of real numbers. Then R^n is the set of all n-tuples over the real numbers. Such an n-tuple is a list of n numbers; e.g., (1, 3, 2.6, 1, 7) is an n-tuple for n = 5.
For positive integers m and n, an m x n (read "m by n") (real) 'matrix' is a rectangular array of real numbers with m rows and n columns. Typically, for integers i = 1, 2, ..., m and j = 1, 2, ..., n, in an m x n matrix A = [a_{ij}], a_{ij} is the number, or 'component', in row i and column j. An example of a 2 x 3 matrix is

  1  9  4
  4  0 -1

(typically surrounded with square brackets, not easy to type without, say, TeX).

Typically an n-tuple x in R^n is regarded as an n x 1 matrix. In this way, with an m x n matrix A, an m x 1 vector b, and the matrix product Ax (which we have yet to define), we can write Ax = b as a system of m linear equations in the n unknowns, the components of x, and this is the most important foundation and application of linear algebra.
(2) Similarly, for a positive integer n and the set of complex numbers C, C^n is the set of all n-tuples of complex numbers. And we can have an m x n matrix of complex numbers.

Once you get very far in linear algebra, complex numbers become essential, if only from the fundamental theorem of algebra (that is, from factoring a polynomial). In particular, eigenvalues are important, and they are often complex numbers.

(3) R^n and C^n are examples of 'vector spaces', but there are also many other examples.

For a 'vector space', we have a non-empty set V of 'vectors' and a 'field' F. Note: a 'field' is defined carefully in courses in abstract algebra, but the two leading examples in practice are R and C. Most commonly F = R or F = C, but there is also interest in other fields, e.g., the integers modulo a prime number. Here we assume that F = R or F = C.
On V we have an operation 'addition' denoted by +. So, for vectors x in V and y in V, there is a vector x + y in V. Addition is commutative, so that x + y = y + x. Vector addition is also associative, that is, for x, y, z in V,

  (x + y) + z = x + (y + z)

Also there is a vector 0 in V so that for any x in V we have 0 + x = x. And for each x in V there is a -x in V so that x + (-x) = x - x = 0. So the set V with the operation + forms a commutative group (defined carefully in courses in abstract algebra).

On V with F we have an operation 'multiplication'. Then for x in V and a in F, with this 'multiplication' the 'product' ax is in V.

For x, y in V and a, b in F, we have two distributive laws, (a + b)x = ax + bx and a(x + y) = ax + ay. Also, for 0 in F, 0x is the vector 0, and for the vector 0, a0 is also the vector 0. We also have that (-1)x = -x.
For some examples of vector spaces: (A) let F = R and let V be the set of all monophonic audio signals exactly 10 seconds long; (B) let F = R and let V be the set of all real-valued random variables X where E[X^2] is finite; and (C) let F = R and, for an m x n matrix A and n x 1 vectors x, let V be the set of all x such that Ax = 0 (so far the matrix product Ax has not been defined, but it will be defined so that Ax = 0 is a system of m linear equations in the n unknowns, the components of x).
(4) For a set S, x in S means that x is an 'element' of S.

(5) The notation (R, R, R) = R^3 is neither good nor standard. The notation R^3 = R x R x R as the set-theoretic Cartesian product is standard. Actually, we already know just what (R, R, R) is -- a 3-tuple, and not the set of all 3-tuples of real numbers.

(6) R^{m x n} is questionable notation and, anyway, needs a definition.
(6) For the "power" of that mathematics, might
illustrate that with some applications. One of the
most important is solving systems of linear
equations. For more, a matrix can describe a linear
transformation, and linearity is a powerful
assumption that commonly holds with good accuracy in
practice. So, vectors and matricies become powerful
means of working with linear systems where a matrix
describes the system and the inputs and outputs of
that system are vectors.
From G. F. Simmons, the two pillars of analysis in
mathematics are continuity and linearity, and linear
algebra emphsizes both, especially linearity.
More examples of linearity include both
differentiation and integration in calculus and the
Fourier transform.
(8) The definition of the inverse function is wrong. That definition holds only when the given function f is 1-1. Even if f is not 1-1, the inverse function is still useful, but its values can be sets of values; that is, the inverse function is multiply valued. E.g., just take f(x) = x^2. Then f^{-1}(1) = {-1, 1}.
Horn wrote about my work, "Best performance in the course throughout the semester. Knows this material cold."

It was just true. Why? It was an 'advanced' course in linear algebra and matrix theory, and I'd never had even a first course in the subject. But long before the course, in total I'd done much more than just such a course.

I had had two good courses in abstract algebra, each of which had provided a good foundation for linear algebra and also touched on the subject. I had taken a reading course in theoretical physics based heavily on matrix theory. And I had written my undergraduate honors paper on group representation theory, essentially in linear algebra.
I'd worked carefully through two of the best books on linear algebra, one by E. Nearing and the other by P. Halmos. The Halmos book is famous as the crown jewel of books on linear algebra. Why? It can be good to look at linear algebra as a baby version of Hilbert space theory, heavily from von Neumann, and Halmos wrote his book when he was an assistant to von Neumann at the Institute for Advanced Study. So, deliberately, the Halmos book is an introduction to Hilbert space theory. It also has the profound perspective of one of the best mathematicians of all time and the elegance of thought of von Neumann.

For applications, in my career and in my own study, I'd worked through stacks of books and papers on applications in numerical linear algebra, multivariate statistics, optimization, curve fitting, signal processing (especially around the fast Fourier transform), differential equations, advanced calculus, deterministic optimal control theory, exterior algebra, etc.

At one point I wrote my own notes on much of linear algebra, with black ink on white paper, 100% cotton I got especially for the notes. My work was very careful although not elegant.
For review, just before the course, I wrote out from memory 80 pages covering about 2/3rds of the course. As Horn wrote, I knew the subject cold, long before I took his course. Horn's homework, tests, and final exam were all challenging for the other students, and I effortlessly blew them away, by large margins on all three. E.g., I never studied for the tests or either the mid-term or final exam. When doing the homework, I nearly never looked at my notes from class and, instead, just wrote out the solutions based on what I already knew.

I didn't belong in the course and had told the department that at the beginning. The department's reaction was to smile and say "Take the course anyway." So, I did.
For me, the course was a huge waste: from a long commute to campus and taking care of my wife, I was short on time, but the Horn homework had me up nearly all night, and then exhausted the next day, once a week, and thus ate into a big fraction of the best time I had each week. Some of my other, more important work suffered. When I did the Horn homework, I was usually just exhausted. I'd get the papers back, graded, in about two weeks, recognize my handwriting, but, from having been so tired, actually had no memory of having done the work. Bummer. It was a trial by fire and exhaustion, not education. I was being asked to 'prove myself', not learn something.

In the program, I only wanted to learn some good material and had no desire at all to 'compete', certainly not in that linear algebra course beneath me. I didn't know I was blowing away the other students until the grader showed me the scores on homework and tests, all except the final exam.

In many ways Horn was a very good teacher, but, in total, he blew away nearly all the students. If I hadn't already known the material coming into the course, then I couldn't have learned it in the course. The way I had learned the material was from some slow, careful, thoughtful study and a lot of work with applications, and the course offered neither.
In my first post here, I misspelled the name. As in my first response to your post, the correct spelling is Nering.

> Also, what background do you think is necessary for these texts?

The background to get started in linear algebra is essentially just high school algebra. In college, linear algebra is commonly the next course after calculus.

For more, after linear algebra there is commonly a course in 'analysis' such as Rudin's 'Principles' in my list. Without a good course, this book would be tough reading.

The applications of linear algebra are so broad that for a good sampling of those applications the prerequisites in total are essentially at least a good undergraduate major in mathematics.
Two of the biggest 'branches' of mathematics are analysis and algebra. Essentially the first course in 'analysis' is calculus.

As in Simmons (in this list), the two most important pillars of analysis are continuity and linearity. The first place to learn linearity well is 'linear algebra' -- or, much the same thing, matrix theory or finite-dimensional vector spaces.

Then the algebraic aspects can be emphasized and can lead to applications in, say, the theory of error-correcting codes. For the analysis aspects, one can continue with Hilbert and Banach spaces and more, and also various applications.

Much of the mathematics in applied mathematics, science, engineering, and technology is linear algebra.

Here is a list of books relevant to linear algebra and/or to the course I took from Roger Horn. This list also contains maybe 60% of what I'd worked from before Horn's course that let me do well in that course.
========================================
Linear Algebra
----------------------------------------
Horn's course was close to
Roger A. Horn, Charles R. Johnson,
'Matrix Analysis',
0-521-38632-2,
Cambridge University Press, 1990.
with also a few topics from
Roger A. Horn, Charles R. Johnson,
'Topics in Matrix Analysis',
0-521-46713-6,
Cambridge University Press, 1994.
----------------------------------------
The course also used for reference and some topics
and exercises
Richard Bellman,
'Introduction to Matrix Analysis:
Second Edition',
McGraw-Hill,
New York,
1970.
This book is just packed with little results; at some point one can get the impression that the author went on and on ... writing. Bellman was a very bright guy in mathematics, engineering, and medicine.
----------------------------------------
Relatively easy to read and relatively close to
applications, and another book Horn's course used as
a reference, is
Ben Noble,
'Applied Linear Algebra',
Prentice-Hall,
Englewood Cliffs, NJ,
1969.
Some edition of this book may be a good place to
start for a student interested in applications now.
----------------------------------------
About the easiest reading in this list, and my first
text on linear algebra, was
D. C. Murdoch,
'Linear Algebra for Undergraduates',
John Wiley and Sons,
New York,
1957.
This book is awfully old, but what it has are still
the basics.
----------------------------------------
For my undergraduate honors paper I made some use
of, and later read carefully nearly all of,
Evar D. Nering,
'Linear Algebra and Matrix Theory',
John Wiley and Sons,
New York,
1964.
The main part of this book is a relatively solid start, maybe a bit terse and advanced for a first text. The book also has in the back a collection of advanced topics, some of which might be quite good to know at some point and difficult to get elsewhere.

One of the topics in the back is linear programming, and for that I'd recommend something else, e.g., Chvátal and/or Bazaraa and Jarvis in this list.
----------------------------------------
Likely the crown jewel of books on linear algebra is
Paul R. Halmos,
'Finite-Dimensional Vector Spaces, Second Edition',
D. Van Nostrand Company, Inc.,
Princeton, New Jersey,
1958.
Halmos wrote this in about 1942 when he was an assistant to von Neumann at the Institute for Advanced Study. The book is intended to be a finite-dimensional introduction to Hilbert space theory, or how to do linear algebra using mostly only what also works in Hilbert space. It's likely fair to credit von Neumann with Hilbert space. The book is elegant.

Apparently at one time Harvard's course Math 55, with a colorful description at, used this text by Halmos and also, as also in this list, Rudin's 'Principles' and Spivak.
----------------------------------------
Long highly regarded as a linear algebra text is
Hoffman and Kunze,
'Linear Algebra, Second Edition',
Prentice-Hall,
Englewood Cliffs,
New Jersey,
1971.
========================================
Numerical Methods
----------------------------------------
If you want to take numerical computations in linear algebra seriously, then consider the next book, or something better if you can find it:
George E. Forsythe and
Cleve B. Moler,
'Computer Solution of Linear Algebraic Systems',
Prentice-Hall,
Englewood Cliffs,
1967.
========================================
Multivariate Statistics
My main start with multivariate statistics was
N. R. Draper and
H. Smith,
'Applied Regression Analysis',
John Wiley and Sons,
New York,
1968.
Apparently later editions of this book remain of
interest.
----------------------------------------
A relatively serious book on 'regression analysis'
is
C. Radhakrishna Rao,
'Linear Statistical Inference and
Its Applications:
Second Edition',
ISBN 0-471-70823-2,
John Wiley and Sons,
New York,
1967.
----------------------------------------
Three famous, general books on multivariate
statistics are
Maurice M. Tatsuoka,
'Multivariate Analysis:
Techniques for Educational and
Psychological Research',
John Wiley and Sons,
1971.
William W. Cooley and
Paul R. Lohnes,
'Multivariate Data Analysis',
John Wiley and Sons,
New York,
1971.
Donald F. Morrison,
'Multivariate Statistical Methods:
Second Edition',
ISBN 0-07-043186-8,
McGraw-Hill,
New York,
1976.
========================================
Analysis of Variance
A highly regarded first book on analysis of variance and experimental design is
George W. Snedecor and William G. Cochran,
'Statistical Methods,
Sixth Edition',
ISBN 0-8138-1560-6,
The Iowa State University Press,
Ames, Iowa,
1971.
and a famous, more mathematical, book is
Henry Scheffé,
'Analysis of Variance',
John Wiley and Sons,
New York,
1967.
========================================
Linear Optimization
A highly polished book on linear programming is
Vašek Chvátal,
'Linear Programming',
ISBN 0-7167-1587-2,
W. H. Freeman,
New York,
1983.
----------------------------------------
Nicely written and with more emphasis on the
important special case of network flows is
Mokhtar S. Bazaraa and
John J. Jarvis,
'Linear Programming and Network Flows',
ISBN 0-471-06015-1,
John Wiley and Sons,
New York,
1977.
----------------------------------------
A grand applied mathematics dessert buffet, based on
Banach space and the Hahn-Banach theorem is
David G. Luenberger,
'Optimization by Vector Space Methods',
John Wiley and Sons, Inc., New York, 1969.
========================================
Mathematical Analysis Relevant to Understanding
Linearity
----------------------------------------
Long the first place a math student gets a fully
serious encounter with calculus and closely related
topics has been
Walter Rudin,
'Principles of Mathematical Analysis, Third Edition',
McGraw-Hill,
New York, 1964.
The first chapters of this book do well as an
introduction to metric spaces, and that work applies
fully to vector spaces.
----------------------------------------
A nice place to get comfortable doing mathematics in
several dimensions is
Wendell H. Fleming,
'Functions of Several Variables',
Addison-Wesley,
Reading, Massachusetts,
1965.
Some of the material here is also good for
optimization.
----------------------------------------
Another place to get comfortable doing mathematics
in several dimensions is
Michael Spivak,
'Calculus on Manifolds:
A Modern Approach to Classical Theorems
of Advanced Calculus',
W. A. Benjamin,
New York,
1965.
----------------------------------------
The first half, the 'real' half, of the next book has polished introductions to Hilbert and Banach spaces, which are some of the most important vector spaces:
Walter Rudin,
'Real and Complex Analysis',
ISBN 07-054232-5,
McGraw-Hill,
New York,
1966.
----------------------------------------
An elegant introduction to how to get comfortable in
metric space is
George F. Simmons,
'Introduction to Topology and Modern Analysis',
McGraw Hill,
New York,
1963.
========================================
Ordinary Differential Equations
----------------------------------------
Linear algebra is important, at some points crucial, for ordinary differential equations; a polished introduction from a world expert is
Earl A. Coddington,
'An Introduction to Ordinary
Differential Equations',
Prentice-Hall,
Englewood Cliffs, NJ,
1961.
I'd appreciate that. My linear algebra is extremely basic (all picked up as needed for an AI course). Any well-rounded ground up resources would be awesome.
Erk, I don't think condensing information into the smallest space possible is the best way to learn something (or even review it).
I'm all for a no-bullshit and quick way to get something. That's why I sometimes check learnXinYminutes.com or some random cheatsheets on Google. But this one doesn't do it for me.
Btw, if you really want to get a good grasp of Linear Algebra you should check Gilbert Strang's video courses on MIT OpenCourseWare. They are amazing and soooo easy to understand you don't even need to like mathematics to watch them. I haven't come across a better resource for starting with Linear Algebra.
Got excited for a moment...then did some prerequisites digging and ended up with a prerequisites dependency chain:
18.06 -> 18.02 -> 18.01
Where
18.02 = Multivariable Calc
18.01 = Single Variable Calc
Considering that my whole point of learning linear algebra was to clear it as a roadblock for Machine Learning, this is what my whole Dependency chain looks like:
Machine Learning -> Linear Algebra -> Multivariable Calc -> Single Variable Calc -> high school algebra and trigonometry.
I have a feeling I'll end up sticking to being a web developer :)
Calculus gets a bad rap for being difficult, but if you're learning on your own, you can just focus on the ideas and not on the arduous computation (which something like maxima can do for you). The core ideas of calculus can probably be learned in a week. Review all the trig on Khan Academy, then try watching some calculus lectures, focusing on the big ideas, not memorizing rules for computing derivatives or integrals.
BTW, you probably don't need much calculus to learn most of the linear algebra you need; those requirements are mostly there for mathematical maturity, plus then being able to assign more interesting exercises.
You don't really need those calc courses to learn linear algebra. It's more about "mathematical maturity": familiarity with vectors, functions, inverse functions, a general intuition about multidimensional coordinates, and so on. A programming background would probably go a long way (although I don't speak from experience; I learned a lot of math before I started programming).
It's hard to overstate the importance of linear algebra in software. It's really worth learning.
I think the calc course prerequisites have an even simpler explanation: mathematicians don't usually manage to mention the existence of vectors until the multivariable calculus class. If you're already familiar with that idea, you shouldn't need to know anything about partial derivatives and surface integrals to understand linear algebra!
Don't look at the prerequisites and dig right in. I don't think you'll have any problems. Linear Algebra is something that kinda stand on its own.
> this is what my whole Dependency chain looks like
You should not think like that when learning. If you want to be efficient just take shortcuts.
Learn machine learning (btw there are great courses on coursera or Stanford about that), and if you're stuck on something just check on wikipedia or another resource.
It sure is (provided you count multiplicities correctly), but that's not the one-sentence explanation I would have gone with, even to someone with much more intuition about eigenvalues than you'd get from this document. I would say the determinant is the volume scale factor (or hypervolume scale factor, in the general case).
Actually, what's really interesting to me is your general point: why do <teaching materials for mathematics subject X> never mention <the key insights that made subject X clear to me>? And why does this question not get asked more often? The nearest I've been able to come to a plausible answer is a mixture of (i) the key insight varies from person to person more than you'd think, combined with the closely related (ii) you can't teach key insights just by saying a few words.
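A minimal 2-D numpy sketch of the "volume scale factor" reading, with a made-up matrix: the unit square maps to the parallelogram spanned by the columns of A, and its area is |det A|.

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

    # Columns of A are the images of the standard basis vectors e1, e2,
    # so the unit square maps to the parallelogram spanned by A[:,0] and A[:,1].
    a1, a2 = A[:, 0], A[:, 1]
    area = abs(a1[0] * a2[1] - a1[1] * a2[0])   # parallelogram area via the 2x2 cross product

    assert np.isclose(area, abs(np.linalg.det(A)))   # 5.0: the area scale factor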
Thank you for your comment! First off, regarding the explanation det(A) = the volume scaling of the transformation T_A associated with the matrix A: I've been stuck for over a week now trying to word this any other way because, in the current ordering of the sections, I'm covering determinants before linear transformations.
Perhaps there is no better one-sentence explanation than talking about the volume, and I should reorder the sections...
You've raised a very important point regarding "routes to concepts" which really should be asked more often!
> (ii) you can't teach key insights just by saying a few words.
Generally true, though we have to say that the key difficulty in communicating insights is a lack of prerequisites. Therefore, if you think very carefully about the prerequisite structure (i.e., a model of the reader's previous knowledge) you can do lots of interesting stuff in very few words.
> (i) the key insight varies from person to person more than you'd think,
Let G = (V, E) where V is the set of math concepts and E is the set of links between them. Then there are as many ways to "click" on a concept x as there are in-links for x! In this case we have at least three routes:
Route 1: geometric
lin. trans T_A = {matrix:A, B_in:{e1,..,en}, B_out:{f1,..,fn}}
---> what happens to a unit cube 1x1x1,
represented (1,...1) w.r.t. B_in,
after going through T_A?
---> output will be (1,..,1) w.r.t. B_out
---> observe that cols of A are f1..fn
therefore
det(A) = (hyper)volume of (hyper)parallelepiped
formed by vectors f1..fn
Route 2: computational
det(A) = { formula for 2x2, formula for 3x3, ... }
---> an easy test for invertibility of an n by n matrix
sidenote: see Cramer's rule for another application of dets
Route 3: algebraic
given a 2x2 matrix A,
find coefficients T and D that satisfy the matrix equation
A^2 - T*A + D*I = 0;
the linear coefficient is the trace T = sum_i a_ii, and the constant term is D = det(A).
sidenote: λ^2 - T*λ + D = {characteristic poly. of A} = det(A - λI)
So perhaps the task of teaching math is not so much to find "the right" or "the best" route to a concept, but to collect many explanations of routes (edges in G), then come up with a coherent narrative that covers as many edges as possible (while respecting the partial order of prerequisites).
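For what it's worth, a tiny numpy check of Route 3 (and the sidenote) on a made-up 2x2 matrix:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    T = np.trace(A)       # 5.0
    D = np.linalg.det(A)  # 6.0

    # Route 3 (algebraic): Cayley-Hamilton for 2x2 matrices, A^2 - T*A + D*I = 0.
    assert np.allclose(A @ A - T * A + D * np.eye(2), 0.0)

    # Sidenote check: det(A - lam*I) = lam^2 - T*lam + D for any lam.
    lam = 1.7
    assert np.isclose(np.linalg.det(A - lam * np.eye(2)), lam**2 - T * lam + D)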
purpose = to give a quick intro to the subject (the most important idea at least, namely, that matrix-vector product can represent linear transformations).
Suppose that you are driving a car and you hit a big tree. The main eigenvector is in the direction from the car to the tree, and the eigenvalue measures the deformation of the car produced by the accident. If the car is half as long after the accident, then the eigenvalue is 1/2.
Wow amazing stuff here, I agree with you that most math textbooks in our education system fail to explain concepts which are supposed to be very simple. Thank you so much for making things simpler. I'm having MATH 270 final next week at McGill, this is going to help a lot in studying :)
No pseudo-inverse, no singular value decomposition, no least squares regression, no k-nearest neighbors and I'm sure I'm leaving many other things, all fundamental algebra very much needed for understanding and developing of computer science (e.g. machine learning, data storage and compression), telecommunications (e.g. queueing theory, multimedia coding and streaming) and many other fields related to engineering. I lovehate maths (I really struggle with them), but I honestly think our linear algebra book back in grad school was 600 pages for a good reason, you just can't do proper engineering without it. (Also I have another really thick book on the most important numerical methods for implementing that algebra on a computer, so, come on! :-))
Suppose that you want to project a vector of data (x1, x2, x3, ..., xn) onto the one-dimensional subspace generated by the vector (1, 1, ..., 1). What's the projection in this case?
Hint: You obtain the most important concept of statistics.
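(Spoiler, as a numpy sketch with made-up data: the projection coefficient is (x·1)/(1·1), i.e., the sample mean.)

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 8.0])        # made-up data vector
    ones = np.ones_like(x)

    # Projection of x onto span{(1,...,1)}: coefficient = (x . ones)/(ones . ones) = mean(x).
    coeff = (x @ ones) / (ones @ ones)
    proj = coeff * ones

    print(coeff)                    # 5.0 -- the sample mean
    assert np.allclose(proj, np.mean(x) * ones)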
I wish I had this in grad school when I had forgotten all my undergrad Linear Algebra. It's not a very hard or deep aspect of math if you understand the fundamentals, so this is very useful.
Courses would benefit from "quick-reference guides" like the one at the end of Dror Bar Natan's paper on Khovanov homology (http://arxiv.org/abs/math/0201043). He says "It can fit inside your wallet."
Studying at a university where all this and more is part of the first-year education of computer scientists, I - probably foolishly - assumed basic linear algebra was common knowledge in the community. Now my interest is piqued: Which educational path / career path did you take to end up in IT, and (more specifically) how much mathematical education did it include?
What is your educational background, and which route did you take?
On page 3, section B (Using elementary matrices): to remember which matrix corresponds to a row operation, just apply that operation to the identity matrix and you obtain the associated elementary matrix.
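A tiny numpy sketch of that mnemonic (the matrix and row operation are made up for illustration):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    # Row operation: R2 <- R2 - 3*R1.  Apply it to the identity to get E...
    E = np.eye(2)
    E[1, :] = E[1, :] - 3.0 * E[0, :]     # E = [[1, 0], [-3, 1]]

    # ...then left-multiplying by E performs the same row operation on A.
    print(E @ A)                          # [[1, 2], [0, -2]]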
For me what makes it really interesting is the realization that all of linear regression (think y = mx + b) is based on linear algebra, specifically the notion that figuring out a best-fit line (in the case of a single-dimensional input variable) is a projection of the N-dimensional vector of observations (where you have N data points) onto the best-fitting 2-dimensional subspace (assuming you're fitting a slope and an intercept).
When thought of this way, a lot of linear algebra has geometric interpretations and for me this makes it a lot less abstract.
Nearly everything in linear algebra has important geometric interpretations. In addition, a lot that works in the 3-space we live in (in the non-relativistic approximation) generalizes to linear algebra, to vector spaces more generally, and, in particular, to both Hilbert space and Banach space.
Yeah, dergachev is pointing to one of the cool direct applications of linear algebra to machine learning.
Suppose you are given the data D = (r_1; r_2; ...; r_N) where each row r_i = (a_i1, a_i2, ..., a_in, b_i).
Each row consists of some observation data. We want to predict a future b_j given the future \vec{a}_j, given that we have seen {r_i}_{i=1...N} (the data set consists of N data rows r_i where both \vec{a}_i and b_i are known).
One simple model for b_i given \vec{a}_i = (a_i1, a_i2, ..., a_in) is a linear model with n parameters m_1, m_2, ..., m_n:
y_m(\vec{a}_i) = m_1 a_i1 + m_2 a_i2 + ... + m_n a_in = \vec{m} · \vec{a}_i
If the model is good then y_m(\vec{a}_i) approximates b_i well.
But startup wisdom dictates that we should measure everything!
Enter error term:
e_i(\vec{m}) = |y_m(\vec{a}_i) - b_i|²,
the squared absolute-value of the difference between the model's prediction and the actual output---hence the name error term.
Our goal is to make the sum $S$ of all the error terms as small as possible.
S(\vec{m}) = \sum_{i=1}^{N} e_i(\vec{m})
Note that the "total squared error" is a function of the model parameters \vec{m}.
At this point we have reached a level of complexity that becomes difficult to follow.
Linear algebra to the rescue!
We can express the "vector prediction" of the model y_m in "one shot" in terms
of the following matrix equation:
A\vec{m} = \vec{b}
where A is an N by n matrix (contains the a_{ij} part of the data),
\vec{m} is an n by 1 vector (the model parameters---the unknown), and
\vec{b} is an N by 1 vector (contains the b_i part of the data).
To find \vec{m}, we must solve this matrix equation,
however A is not a square matrix: A is a tall skinny matrix N >> n,
so there is no A^{-1}.
Okay so we don't have a A^{-1} to throw at the equation A\vec{m}=\vec{b} to cancel the A, but what else could we throw at it. Let's throw A^T at it!
A^T A \vec{m} = A^T \vec{b}
Call N = A^T A (an n by n matrix), so this reads
N \vec{m} = A^T \vec{b}
Now the thing to observe is that if N is invertible,
then we can find \vec{m} using
\vec{m} = N^{-1} A^T \vec{b}
This solution to the problem is known as the "least squares fit" solution (i.e. choice of parameters for the model m_1, m_2, ..., m_n). This name comes from the fact that the vector \vec{m} is equal to the output of the following optimization problem: \vec{m} = argmin_{\vec{m}} S(\vec{m}) = argmin_{\vec{m}} ||A\vec{m} - \vec{b}||².
Technical detail: the matrix N=A^T*A is invertible if and only if the columns of A are linearly independent.
TL;DR. When you have to do a "linear regression" model of data matrix X and labels \vec{y},
the best (in the sense of least squared error) linear model is \vec{m} = (X^T X)^{-1} X^T \vec{y}.
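A hedged numpy sketch of that TL;DR with synthetic data (and, per the discussion elsewhere in this thread, solving the normal equations rather than forming an explicit inverse):

    import numpy as np

    rng = np.random.default_rng(42)
    N, n = 100, 3
    A = rng.normal(size=(N, n))                   # tall, skinny data matrix
    m_true = np.array([1.5, -2.0, 0.5])
    b = A @ m_true + 0.01 * rng.normal(size=N)    # noisy observations

    # Normal equations, as above: m = (A^T A)^{-1} A^T b,
    # but solved rather than explicitly inverted.
    m_hat = np.linalg.solve(A.T @ A, A.T @ b)

    # Same answer from the library's least-squares routine.
    m_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    assert np.allclose(m_hat, m_lstsq)
    print(m_hat)      # close to m_true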