I have no idea what he's advocating for. Gaussian elimination? Matrix decomposition? The first doesn't work for non-square systems, and the second is often still slower than gradient descent (in particular when you don't need the exact minimum, such as in data science).
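To make the comparison concrete, here is a minimal sketch on a tall (non-square) least-squares problem, assuming NumPy. The function name `lstsq_gradient_descent`, the problem sizes, and the step count are made up for illustration, and this is not a benchmark: it only shows that a fixed number of gradient steps lands close to the decomposition-based answer.

```python
import numpy as np

def lstsq_gradient_descent(A, b, steps=500):
    """Approximately minimize 0.5 * ||A x - b||^2 with fixed-step gradient descent."""
    # Step size 1/L, where L is the squared spectral norm of A
    # (the largest eigenvalue of A^T A).
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b)  # gradient of the least-squares objective
        x -= grad / L
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))  # 200 equations, 5 unknowns: not square
b = rng.standard_normal(200)

# Decomposition-based solve (NumPy's lstsq uses an SVD-based LAPACK routine).
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Iterative solve; stopping earlier trades accuracy for less work.
x_gd = lstsq_gradient_descent(A, b)
gap = np.linalg.norm(x_gd - x_exact)
```

Lowering `steps` gives a rougher answer for less work, which is the trade-off that matters when the exact minimum isn't needed.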