Why are determinants defined the weird way they are? (askamathematician.com)
65 points by ColinWright 1214 days ago | 32 comments

There are two good, correct, equivalent ways to think of the determinant, of varying generality. One is as the function from a ring of square matrices to the underlying field (e.g. from R^(n^2) -> R) that sends identity to identity, is alternating (swapping two rows or columns negates the function) and is multilinear (is a linear function in each of the columns independently). These properties are all useful and important on their own, so there is motivation to study a function which has all of them. It's not obvious that such a function exists, but you can prove that. As it turns out, these three properties uniquely determine such a function, which makes it seem like that function might be really important!
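To make the three properties concrete, here is a minimal sketch in plain Python that checks them on one small example. The helper names `det` and `with_column` are made up for illustration, and the cofactor expansion used here is just one convenient way to compute the value, not something the comment above prescribes.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        # Minor: drop row 0 and column `col`.
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def with_column(m, j, column):
    """Return a copy of m whose j-th column is replaced by `column`."""
    return [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(m)]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
a = [[2, 1, 0], [1, 3, 4], [0, 5, 6]]

# Normalization: the identity matrix is sent to 1.
print(det(identity))

# Alternating: swapping columns 0 and 1 negates the value.
swapped = [[row[1], row[0], row[2]] for row in a]
print(det(a), det(swapped))

# Multilinear: column 0 of `a` equals u + v, so the value splits into two pieces,
# and scaling that column by 3 scales the value by 3.
u, v = [1, 1, -1], [1, 0, 1]
print(det(with_column(a, 0, u)) + det(with_column(a, 0, v)))
print(det(with_column(a, 0, [3 * x for x in (2, 1, 0)])))
```

The checks only exercise one matrix, so they illustrate the properties rather than prove them.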

There's a more general definition too, which is based around the wedge product, a quintessential object in algebra and calculus. There's a good exposition here: http://codeblank.com/~int/det.pdf .

Ehm, the question is rather strange: as a matter of fact, what is called a determinant is interesting because of its properties, which is what you may call the 'definition'. A bit of a circular argument but: the 'determinant' property is interesting, so it is given a name.

The 'property' can be one of many.

The most 'obvious' being that the only 'coherent' way to define 'area', 'volume', etc... so that it behaves 'linearly' [i.e. correctly for parallelepipeds] ends up being quite 'complicated' but quite useful at the same time and it receives a name.

As it happens, it turns out to be quite useful for solving linear equations in theory as well (although NOT for solving them in the real world).

So... It is not weird, it is what it is because it happens to be that way and then it receives a name.

There are much weirder things you can do with a matrix which do not have a specific name. Determinants have one because they are useful.

Frequently when taught, people are shown how to compute a determinant, and shown some uses of a determinant, but never really figure out what it "is".

This says - "Here are some useful properties that we want something to have, let's derive what that thing is. Ah, this is what we call a determinant."

It's not that it's circular, or back-to-front, it's just trying to give additional context and motivation.

It might be even easier to understand without mentioning parallelepipeds. If you apply the matrix to each point of an arbitrarily shaped object, the resulting object's volume will be multiplied by the determinant of the matrix, possibly with a minus sign if orientation is reversed.
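A small 2D sketch of that claim, assuming a polygon stands in for the "arbitrarily shaped object" and using the shoelace formula for its signed area. The names `signed_area` and `apply` and the sample shape are mine, chosen only for illustration.

```python
def signed_area(points):
    """Shoelace formula: signed area of a polygon given as (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2


def apply(m, p):
    """Apply the 2x2 matrix m to the point p."""
    (a, b), (c, d) = m
    x, y = p
    return (a * x + b * y, c * x + d * y)


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# An irregular pentagon, listed counterclockwise.
shape = [(0, 0), (4, 0), (5, 2), (2, 5), (-1, 3)]

stretch = [[2, 1], [0, 3]]   # determinant 6
flip = [[0, 1], [1, 0]]      # determinant -1, reverses orientation

for m in (stretch, flip):
    mapped = [apply(m, p) for p in shape]
    # The ratio of signed areas should match the determinant, sign included.
    print(det2(m), signed_area(mapped) / signed_area(shape))
```

The sign of the ratio is the "minus sign if orientation is reversed": the flipped polygon is traversed clockwise.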

Determinants: (a) aren't necessary for most of the traditional linear algebra curriculum, (b) involve a lot of tedious computation for students, (c) do not generalize to the infinite-dimensional case, (d) are almost never used in computational linear algebra, which is how the vast majority of students will actually tackle these problems in their careers as scientists and engineers.

I'm a big fan of Axler's(1) approach of simply avoiding determinants entirely as much as possible, though his text isn't really suitable for a general course. An approach that combines elements of Axler's and factorizations that are actually used in computation (Schur form, the SVD, ...) can be nice, and is more useful to non-math/physics majors than the standard curriculum.

(1) "Linear Algebra Done Right"

Determinants (a) are helpful if you want to invert matrices, (b) are easy to calculate for 2x2 and 3x3 matrices, (c) nicely generalise to the infinite-dimensional case, where you take appropriate generalisations of sums and the totally antisymmetric tensor occurring in the definition of the determinant, (d) are widely used in theoretical physics and hence should at least be included in first year undergrad classes on linear algebra in physics and maths.

Determinants are sometimes useful for symbolic inversion of matrices, but are not used for numerical inversion of non-trivial matrices (for reasons of both performance and stability).

It has been a long time since I studied functional analysis, and I was never a physicist of any sort, but my limited understanding is that there are two common generalizations for the infinite case, both of which handle the cases of most interest to physics but are not available for general operators, and which mostly agree up to a constant factor, or are at least capable of producing compatible results in some sense. I'm not sure I'd say they "nicely generalize", though I'd admit to being hasty in saying that they don't generalize.

And yes, I agree that they're widely used in physics (which is why I think they should be taught to math/physics majors).

Determinants are never actually used to invert matrices. In fact, I can't think of any computational math that even calculates an inverse matrix. It's faster, lower storage (unless you invert in place?) and far more numerically stable to solve Ax=b by factoring A as LU/chol/maybe QR. Even if you had multiple b you would just save your factorization.
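For readers who have not seen the "factor once, solve many times" pattern, a rough sketch in plain Python follows. `lu_factor` and `lu_solve` are hand-rolled helpers written for this illustration (not a library API), and a real project would call a tuned LAPACK-backed routine instead.

```python
def lu_factor(a):
    """LU factorization with partial pivoting.

    Returns (lu, perm): lu packs the unit-lower factor L below the diagonal
    and U on and above it; perm[k] is the original index of row k.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest pivot in column k for numerical stability.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using a factorization produced by lu_factor."""
    n = len(lu)
    # Forward substitution: L y = P b.
    y = [float(b[p]) for p in perm]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution: U x = y.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


a = [[4, 3, 0], [6, 3, 1], [0, 2, 5]]
lu, perm = lu_factor(a)          # factor once
x1 = lu_solve(lu, perm, [10, 15, 19])   # reuse for each right-hand side
x2 = lu_solve(lu, perm, [3, 2, -3])
print(x1, x2)
```

No inverse is ever formed: each extra right-hand side costs only two triangular solves.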

In which bit of math do people actually instantiate an inverse?

> In which bit of math do people actually instantiate an inverse?

I am assuming you are being rhetorical here. I also think your working hypothesis is that the only use of a matrix inverse is to solve a linear system of equations. Under this hypothesis it does make sense that there is no utility to computing inverses. However, this hypothesis is incorrect. There are tons of cases where you would want to get at the entries of an inverse, but that have nothing to do with solving equations.

Here are a few examples: consider data generated from a multivariate Gaussian. Now you want to know which components are conditionally independent of each other. This is given by whether the corresponding entry of the inverse of the correlation matrix is zero or not. Another example: say you want to minimize a function defined over matrices, but want to minimize it only over the set of positive semi-definite matrices. One way of doing this is to use a barrier function: a function that blows up astronomically when you approach the boundary of your constraint set. For positive definite matrices such a barrier function is the negative logarithm of the determinant. I could go on...
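A hedged sketch of the barrier idea, taking the barrier to be minus log det (my assumption about the sign convention) and computing it through a Cholesky factor rather than through the determinant itself. The function names and the 2x2 test family are illustrative only.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a; raises if a is not positive definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l


def log_det(a):
    """log(det(a)) for positive definite a: det(a) is the squared product of L's diagonal."""
    l = cholesky(a)
    return 2.0 * sum(math.log(l[i][i]) for i in range(len(a)))


def barrier(a):
    """Barrier value: finite inside the positive definite cone, large near its boundary."""
    return -log_det(a)


# [[1, t], [t, 1]] has determinant 1 - t^2, so it approaches the boundary as t -> 1.
for t in (0.0, 0.9, 0.999):
    print(t, barrier([[1.0, t], [t, 1.0]]))
```

Outside the cone the Cholesky step fails, which is a common way to detect that an optimizer has stepped too far.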

> In which bit of math do people actually instantiate an inverse?

In symbolic computations with very structured matrices, one sometimes constructs explicit inverses. This is an extreme corner case, and never really comes up except in some very simple physical models (in my experience).

I find that at a high Bachelor's or low Master's level in physics I always have to re-explain determinants. People learn the idea and they take away "it's really complicated and has something to do with whether the matrix is invertible."

An eigenvector of a matrix M is a vector v such that M v = k v -- in this direction, the matrix only scales the vector and does not rotate it to some other direction. Some matrices do not have a complete set of eigenvectors -- a rotation matrix is a good example! -- but then there is a generalized notion of eigenvectors which you can fall back on.[1] You can prove that you always have a complete set of eigenvectors for any square matrix, in the generalized sense.

The trace is the sum of the eigenvalues; the determinant is the product of the eigenvalues. This drives in part the parallelepiped interpretation that the article is giving; in fact if you have a full set of eigenvectors then any volume can be made up out of little boxes made of the eigenvectors, and so any hyper-volume must transform under the linear map by multiplying by the determinant. This in turn tells you why det(A B) = det(A) det(B) -- B must scale volumes by some amount, then A must scale volumes by some amount, and to compose those two scalings you must take their product.
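A quick numerical check of those statements on a 2x2 example, written as a sketch. The matrices and the eigenpairs below are ones I picked by hand; the code verifies the eigenpairs directly from M v = k v before using them.

```python
def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


a = [[4, 1], [2, 3]]

# Candidate (eigenvalue, eigenvector) pairs, checked against the definition.
eigenpairs = [(5, [1, 1]), (2, [1, -2])]
for eigenvalue, vector in eigenpairs:
    print(matvec(a, vector) == [eigenvalue * x for x in vector])

eigenvalues = [eigenvalue for eigenvalue, _ in eigenpairs]
trace = a[0][0] + a[1][1]
print(trace == sum(eigenvalues))                       # trace = sum
print(det2(a) == eigenvalues[0] * eigenvalues[1])      # determinant = product

# Composing two maps multiplies their volume scale factors.
b = [[1, 2], [3, 4]]
print(det2(matmul(a, b)) == det2(a) * det2(b))
```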

A matrix is not invertible if it projects away some axis, so that information is lost. This means that one of the eigenvalues is 0. Take the product of the eigenvalues -- if it's 0, the matrix is not invertible! Done, simple.

It tells you that the determinant of a triangular matrix is the product of the diagonal elements (since those are the eigenvalues), and it also tells you that a determinant of a block-diagonal matrix is given by the product of the determinants of the blocks. This is important because the whole "why does the determinant flip sign when you interchange two columns?" question is now answered. To flip two columns, you multiply by a matrix which looks like:

    1 0 0 0 0
    0 1 0 0 0
    0 0 0 1 0
    0 0 1 0 0
    0 0 0 0 1
In other words, it's block diagonal with two blocks being identity matrices and one block being [0 1; 1 0]. That block has eigenvectors [1 1] with eigenvalue +1 and [1 -1] with eigenvalue -1, so it has determinant -1. The determinants of the two identity blocks are also 1, so the whole matrix determinant is therefore -1, and det(A B) = det(A) det(B) = - det(A), where B is the block-diagonal column-swapping matrix.

With a bit of effort you can figure out the antisymmetric form for actually computing the determinant without diagonalizing the matrix; but this comment is long enough as it is. The only thing which is much simpler about the antisymmetric symbolic form for the determinant is that you immediately see that the determinant is symmetric under transpose, which tells you something a bit surprising: that a square matrix has the same left-eigenvalues as right-eigenvalues (since an eigenvalue is a solution λ of det(A - λ I) = 0, and all of 0, det, and I are transpose-symmetric).
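For anyone who wants to see the antisymmetric form spelled out, here is a sketch of the sum over permutations in plain Python, used to check the 5x5 column-swap matrix from this comment and the transpose symmetry. The helper names and the triangular test matrix are my own choices; the sum has n! terms, so this is only sensible for tiny matrices.

```python
from itertools import permutations


def sign(perm):
    """+1 for an even permutation, -1 for an odd one (counted via inversions)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(m):
    """Sum over permutations p of sign(p) * m[0][p[0]] * ... * m[n-1][p[n-1]]."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for row in range(n):
            term *= m[row][perm[row]]
        total += term
    return total


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# The column-swapping matrix from the comment above.
swap = [[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1]]

# Upper triangular, so its determinant should be the product of the diagonal.
a = [[1, 7, 2, 0, 3],
     [0, 2, 5, 1, 4],
     [0, 0, 3, 6, 1],
     [0, 0, 0, 4, 2],
     [0, 0, 0, 0, 5]]

print(det(swap))                 # determinant of the swap matrix
print(det(a), det(matmul(a, swap)))   # swapping two columns flips the sign
print(det(transpose(a)) == det(a))    # symmetric under transpose
```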

[1] https://en.wikipedia.org/wiki/Generalized_eigenvector

> Some matrices do not have a complete set of eigenvectors -- a rotation matrix is a good example!

This is incorrect. A rotation matrix is orthogonal and therefore has a complete set of eigenvectors (with all eigenvalues z satisfying |z|=1). For example, the eigenvalues of:

     0 1
    -1 0
are z=+/-i and the eigenvectors are [1,i] and [1,-i].

The right example would be a matrix like this one:

    1 0 0
    1 1 0
    0 1 1
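Both examples can be checked by hand or with a few lines of Python. This is only a sketch: it uses the built-in complex type for the rotation, and for the 3x3 matrix it looks at N = A - I, where N^2 != 0 but N^3 = 0 indicates a single Jordan block and hence only one eigenvector direction.

```python
def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Rotation by a quarter turn: complex eigenvalues +i and -i.
rotation = [[0, 1], [-1, 0]]
rotation_checks = []
for eigenvalue, vector in [(1j, [1, 1j]), (-1j, [1, -1j])]:
    rotation_checks.append(matvec(rotation, vector) == [eigenvalue * x for x in vector])
print(rotation_checks)

# The defective example: its only eigenvalue is 1 (it is triangular).
a = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
n = [[a[i][j] - (1 if i == j else 0) for j in range(3)] for i in range(3)]  # A - I

zero = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
n2 = matmul(n, n)
n3 = matmul(n2, n)
print(n2 != zero, n3 == zero)

# N v = [0, v0, v1], so N v = 0 forces v0 = v1 = 0: eigenvectors are multiples of e3.
print(matvec(n, [0, 0, 1]))
```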

You're right, I should have just stuck with Jordan blocks, and I'll try to remember to do that next time I explain this to someone.

Thanks for this very lucid comment. Even people who've been working with linear algebra for years have trouble wrapping their heads around the fact that rotations in 3D have eigenvalues and eigenvectors that aren't real.

Wow, thinking of the determinant as the product of generalized eigenvalues is much more intuitive than the computational definition. Thanks!

I was taught all the useful properties of a determinant, but not what a determinant actually was. Now everything makes much better sense!

The determinant is just the thing that satisfies those properties. It's a label we apply to a function that follows certain laws. You might be curious where those laws come from, or why we want those laws to be satisfied, but that's a different question.

You're looking for a deeper meaning than exists. Mathematical objects are labels applied to entities that satisfy a certain set of properties.

Even worse, he never explained why it was weird, as per the title, nor even why it's not really weird after all.

A better title and overall theme would be something like "A really good short summary of why determinants are useful and their applications".

You should reread the post. The author lays out a couple of interesting properties and derives the function that satisfies them. It turns out that this function is the usual determinant. Then he shows some examples of applying this new intuition.

Not getting it. What makes it weird?

It's like listening to people talk about "power" WRT computers: unless they know Ohm's law, you'd think there's gun barrels in there or little politicians.

Weird is in the eye of the beholder. Let me put it this way, do you understand why matrix multiplication is defined the way it is? That looks weird too if you've never tried to derive it for yourself. That is, it's the lack of intuition and familiarity with the underlying concepts (vector algebra) that make it look weird. Also, the question is about a formula.

The determinant is a function from a matrix to a scalar that behaves in a nice way. The motivation for the particular properties chosen for "nice"-ness are complicated and out of the scope of the author's answer. But the idea is that you want some number that can represent the matrix in certain contexts (exactly the contexts defined by the listed properties). From there, you just go through the derivation laid out in the OP and you wind up with a computable function. That's where the determinant comes from, that's why it's "weird", etc.. It's a derived function. It just so happens that there are other emergent properties that make it rather useful for doing practical things with matrices.

> It's like listening to people talk about "power" WRT computers: unless they know Ohm's law, you'd think there's gun barrels in there or little politicians.

I don't follow, the argument presented in the OP is semi-rigorous; words are chosen carefully and used with their usual mathematical definitions, not with the intuitive English definitions.

After reading down the list of other questions on the right, I think this is my new favorite website.

Is volume right? I thought if I double a 3d object its volume increases by 2^3?

That's if you double all of the dimensions. This is talking about doubling just one of the dimensions.


    " ... doubling the length of any of the sides doubles
    the volume.  This also means that the determinant is
    linear (in each column)."
Then if you double each of three sides then you've increased the volume by a factor of 2x2x2 = 2^3 = 8.
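The same arithmetic as a tiny sketch, assuming the columns of a 3x3 matrix are the edge vectors of the parallelepiped. The matrix and the name `det3` are illustrative.

```python
def det3(m):
    """3x3 determinant written out explicitly."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Columns are the three edge vectors of a slanted box.
box = [[1, 1, 0],
       [0, 2, 1],
       [0, 0, 3]]

# Double only the first edge (column 0).
one_side_doubled = [[2 * row[0], row[1], row[2]] for row in box]

# Double every edge, i.e. scale the whole matrix by 2.
all_sides_doubled = [[2 * x for x in row] for row in box]

print(det3(box), det3(one_side_doubled), det3(all_sides_doubled))
```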

I misread that. Thanks!

And as a physicist reduced an abstract concept to geometry, a million mathematicians cried out in terror.

> a million mathematicians cried out in terror

I'm sure this is meant as joshing, but I don't really get it. I reduce abstract concepts to geometry all the time in my classes-- every math person does. For instance on the first day of Calc, I reduce the rate of change of a fcn to the slope of a tangent line.

It is true though that at least in the US many students see determinants for the first time in Calc III where they are used for Jacobians and are introduced as computational gadgets (many instructors would say that they did not have the extra day to describe them in another way, such as the linked-to article does). That's too bad, and I could definitely understand it leaving a bad taste.

The difference is calc 1 is not higher level math and the derivative of a real function is not an abstract concept. Professional mathematicians/grad students/high level undergrads don't think of the determinant via some weird geometric intuition, as that won't really provide enough information or rigor to do anything useful.

It seems to me that how a person takes this statement would depend largely on which stage of mathematical education they are primarily in: http://terrytao.wordpress.com/career-advice/there%E2%80%99s-...

Umm... what? Theoretical mathematicians are always trying to reduce complicated concepts to geometry, or to other intuitive and more easily understood phenomena.

I wonder how he would explain rhombuses with negative area.

In uni we defined determinants starting with systems of linear equations and showing that hey, all these formulas for solving m-by-n systems can be generalised using this one monstrous equation, that has a simple recursive definition. We'll call it a determinant.

> I wonder how he would explain rhombuses with negative area.

The rhombus is in the II or IV quadrant. Alternatively, the endpoint of the vector sum is of the form (-x, y) or (x, -y), where x and y are positive numbers.
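One way to check where the sign comes from is to compute the 2x2 determinant for a pair of edge vectors in both orders. In this sketch (example vectors are mine) both vectors sit in the first quadrant, and the sign still flips when their order is swapped, which suggests the sign tracks orientation: positive when the second vector is counterclockwise from the first.

```python
def signed_area(u, v):
    """Signed area of the parallelogram spanned by u and v (2x2 determinant)."""
    return u[0] * v[1] - u[1] * v[0]


u, v = (2, 1), (1, 2)

print(signed_area(u, v))   # v is counterclockwise from u
print(signed_area(v, u))   # same parallelogram, edges listed in the other order
```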
