I suppose I should also say that I never "learned" matrix calculus either, in the sense of internalizing the various features unique to matrices under derivatives and integrals. The calculations I refer to above are crude, naive ones in scalar notation under whatever coordinate system seems appropriate.
However, thanks to Minka's notes and the Matrix Cookbook, I was eventually able to pick up easy techniques for these derivations! They're certainly no substitute for first getting a handle on the theory by studying a textbook, but these pattern-matching shorthands are important practical techniques.
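For a flavor of the kind of shorthand those references tabulate, here are two standard identities (written in denominator layout, i.e. gradients shaped like their inputs; see the convention discussion below):

\frac{\partial}{\partial \mathbf{x}} \left( \mathbf{x}^\top A \mathbf{x} \right) = (A + A^\top)\,\mathbf{x}

\frac{\partial}{\partial X} \operatorname{tr}(A X) = A^\top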
Here’s a great resource if you’re starting out today: http://datasciencemasters.org
The notation used here is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.
Two competing notational conventions split the field of matrix calculus into two separate groups. The two groups can be distinguished by whether they write the derivative of a scalar with respect to a vector as a column vector or a row vector. Both of these conventions are possible even when the common assumption is made that vectors should be treated as column vectors when combined with matrices (rather than row vectors). A single convention can be somewhat standard throughout a single field that commonly uses matrix calculus (e.g. econometrics, statistics, estimation theory and machine learning). However, even within a given field different authors can be found using competing conventions. Authors of both groups often write as though their specific convention is standard.
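For concreteness, for a scalar y and a column vector \mathbf{x} \in \mathbb{R}^n, the two conventions are (standard definitions, sketched in LaTeX):

% Numerator layout: the derivative of a scalar with respect to
% a column vector is written as a row vector.
\frac{\partial y}{\partial \mathbf{x}} = \left( \frac{\partial y}{\partial x_1}, \ldots, \frac{\partial y}{\partial x_n} \right)

% Denominator layout: the same derivative is written as a column vector.
\frac{\partial y}{\partial \mathbf{x}} = \left( \frac{\partial y}{\partial x_1}, \ldots, \frac{\partial y}{\partial x_n} \right)^\top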
Seriously? So if I want to read a paper that uses Matrix Calculus, it's not enough to just understand Matrix Calculus in general... no, first I have to decipher which of a legion of possible notations the author used, and then keep that state in mind when thinking about that paper in relation to another, which might use yet another notation.
I understand that ultimately nobody is in a position to mandate the adoption of a universal standard, but part of me wishes there were one (this is, of course, not a problem limited to Matrix Calculus).
sign conventions in thermodynamics
conventions in Fourier transforms
short billion vs long billion
calorie vs Calorie
English vs. metric units
SI vs cgs metric
esu vs Gaussian units in EM
Unfortunately, several other notations are commonly used, as summarized in the following table. The notation Aᵀ is used in this work.
Aᵀ — This work; Golub and Van Loan (1996), Strang (1988)
Ã — Arfken (1985, p. 201), Griffiths (1987, p. 223)
A' — Ayres (1962, p. 11), Courant and Hilbert (1989, p. 9)
What the characteristic polynomial lets you do is compute large matrix powers faster and more accurately: by the Cayley-Hamilton theorem, very large powers can be rewritten in terms of the smaller powers you have already computed.
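A minimal sketch of that trick in Python (my own illustration, not from this thread) for the 2x2 case, where Cayley-Hamilton gives A^2 = tr(A) A - det(A) I:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
t, d = np.trace(A), np.linalg.det(A)

# Represent A^n as a*A + b*I and update the coefficients step by step:
# (a*A + b*I) @ A = a*A^2 + b*A = (a*t + b)*A - (a*d)*I
a, b = 1.0, 0.0  # A^1 = 1*A + 0*I
n = 8
for _ in range(n - 1):
    a, b = a * t + b, -a * d

print(np.allclose(a * A + b * np.eye(2), np.linalg.matrix_power(A, n)))  # True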
1) It's designed to describe graphical representation, not semantics.
2) Parsing it properly is very difficult, especially considering how macros and other definitions can influence the syntax.
1) Assume a standard interpretation of symbols (e.g. this site uses a-g for scalars, h-z for vectors and A-Z for matrices) and make those assumptions explicit and modifiable (e.g. WolframAlpha displays "assuming X is a Y, use as a Z instead"); a toy sketch follows this list.
2) Support only the subset that MathJax can handle, which seems to be enough for most purposes.
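A toy sketch of the symbol convention from (1), assuming single-letter names (symbol_kind is a made-up helper for illustration, not this site's actual code):

def symbol_kind(name: str) -> str:
    if name.isupper():
        return "matrix"   # A-Z
    if name <= "g":
        return "scalar"   # a-g
    return "vector"       # h-z

print(symbol_kind("a"), symbol_kind("x"), symbol_kind("A"))
# scalar vector matrix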
I definitely agree that LaTeX is not the optimal input format for that purpose, though.
If you want to start with LaTeX, then the parser from this library might be useful: https://github.com/Khan/KAS
Methods are highly heuristic, and for some languages (all expressions with, say, \pi and the exp function, or something similar) deciding equality is in general undecidable.
import sympy
from sympy.abc import x, y, alpha, s
quad = s ** 2 - alpha * s - 2
# Let s1 and s2 be the two solutions to the quadratic equation 'quad == 0'
s1, s2 = sympy.solve(quad, s)
u = (x - s2) / (x - s1) * (y - s1) / (y - s2)
f1 = (s2 - s1 * u) / (1 - u)
f2 = (x * y - alpha * x - 2) / (y - x)
# Claim: f1 is equal to f2
print(sympy.simplify(f1 - f2) == 0)  # Prints "True"
>>> import sympy as sp
>>> x = sp.Symbol('x')
>>> sp.simplify(sp.Implies(sp.Eq(x**2 + 2*x + 1, 0), sp.Eq(x, -1)))
Eq(x, -1) | Ne(x**2 + 2*x + 1, 0)
>>> sp.solve(sp.simplify(sp.Implies(sp.Eq(x**2 + 2*x + 1, 0), sp.Eq(x, -1))))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/.local/lib/python3.5/site-packages/sympy/solvers/solvers.py", line 1065, in solve
solution = _solve(f, *symbols, **flags)
File "/home/user/.local/lib/python3.5/site-packages/sympy/solvers/solvers.py", line 1401, in _solve
f_num, sol = solve_linear(f, symbols=symbols)
File "/home/user/.local/lib/python3.5/site-packages/sympy/solvers/solvers.py", line 1971, in solve_linear
eq = lhs - rhs
TypeError: unsupported operand type(s) for -: 'Or' and 'int'
>>> sp.solveset(sp.simplify(sp.Implies(sp.Eq(x**2 + 2*x + 1, 0), sp.Eq(x, -1))))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/.local/lib/python3.5/site-packages/sympy/solvers/solveset.py", line 880, in solveset
raise ValueError("%s is not a valid SymPy expression" % (f))
ValueError: Eq(x, -1) | Ne(x**2 + 2*x + 1, 0) is not a valid SymPy expression
From this I learnt the hard way that tokenisation and representation (in LaTeX or MathML) do not belong in the same place as a CAS (computer algebra system).
But maybe your example can convince me otherwise. Could you show the specific LaTeX code in question and describe how your program mishandled it?
I’m not a programmer. I erred in trying to do a programmer’s job. Also, I discovered why Nassim Nicholas Taleb has a reputation for being rather uncharitable (but deep down I feel like I deserved it, because hey, I stated a mathematical untruth).
Internally we check it numerically, i.e., we generate some random data for the given variables and check the derivative by comparing it to a finite-difference approximation. We will ship this code (hopefully soon) with one of the next versions; you can then check it yourself. Otherwise, as far as I know no other tool computes matrix derivatives, so I understand it is hard to convince anyone of the correctness of the results. But I hope the numerical tests will be helpful.
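For anyone who wants to do a similar check by hand, here is a minimal sketch of the idea in NumPy (my own code, not the tool's), using a derivative known in closed form: f(X) = tr(X^T X), with gradient 2X.

import numpy as np

def f(X):
    return np.trace(X.T @ X)

def claimed_grad(X):
    return 2 * X  # the closed-form derivative we want to verify

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
eps = 1e-6

# Approximate each entry of the gradient with a central difference.
num_grad = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        num_grad[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(num_grad, claimed_grad(X), atol=1e-5))  # True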
Implies[x^2 + 2 x + 1 == 0, x == -1] // FullSimplify
[In] FullSimplify[expr1 == expr2]
The way I dealt with it is to first translate the vectorized expression into so-called Einstein notation, i.e. an indexed expression with implicit sums over repeated indices. E.g. the matrix product `Z = X * Y` may be written in it as:
Z[i,j] = X[i,k] * Y[k,j] # implicitly sum over k
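In NumPy, this indexed form maps directly onto einsum, where repeated indices are summed implicitly (a small illustration of the notation, not the poster's library):

import numpy as np

X = np.random.rand(3, 4)
Y = np.random.rand(4, 5)
Z = np.einsum('ik,kj->ij', X, Y)  # Z[i,j] = sum over k of X[i,k] * Y[k,j]
print(np.allclose(Z, X @ Y))      # True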
Unfortunately, the only way to calculate such expressions efficiently is to convert them back to vectorized notation, which is not always possible (e.g. because of sparse structure) and is very error-prone.
The good news is that if the result of the whole expression is a scalar, all the derivatives will have the same number of dimensions as the corresponding inputs. E.g. in:
y = sum(W * X + b)
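here W, X and b all keep their own shapes in the derivative. A quick NumPy sketch of that shape claim, assuming '*' is elementwise and b is broadcast (my reading of the snippet above):

import numpy as np

W = np.random.rand(2, 3)
X = np.random.rand(2, 3)
b = np.random.rand(3)                # broadcast across rows

y = (W * X + b).sum()                # scalar result
dy_dW = X                            # dy/dW has W's shape
dy_dX = W                            # dy/dX has X's shape
dy_db = np.full_like(b, W.shape[0])  # each b[j] appears once per row

print(dy_dW.shape == W.shape, dy_dX.shape == X.shape, dy_db.shape == b.shape)
# True True True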
A theoretical description of the method for the first library can be found in  (pages 1338-1343, caution: 76M), while the set of rules I've derived is in .