To understand the mechanics of matrix multiplication, I found it useful to think in terms of an analogy: a grid made of two layers of pipes, one layer (the input pipes) running in one direction and the other (the output pipes) laid orthogonally to it. At each crossing between an input pipe and an output pipe there is a "tap", representing a number (a multiplication factor, really) in the matrix. I illustrated this in this little drawing: http://imgur.com/gallery/gBs64
The point, then, is that the taps (again, representing the matrix values) determine how much of each item in the input vector should be mixed into each item in the output vector.
This analogy has the limitation that the taps are allowed to enhance the flow, not just limit it as physical taps would. That is, a tap can output more than 100% of its input :P Also, while this way of illustrating it may make some sense for matrix * vector multiplication, matrix * matrix would probably become a prohibitively cluttered image.
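For anyone who prefers code to drawings, here is a rough sketch of the pipe-and-tap picture for matrix * vector. The names (`flow_through_taps`, `taps`, `inputs`) are made up for illustration, and it assumes `taps[i][j]` is the tap where input pipe `j` crosses output pipe `i`.

```python
def flow_through_taps(taps, inputs):
    """Mix input pipes into output pipes.

    taps[i][j] is the (hypothetical) tap setting where input pipe j
    crosses output pipe i, i.e. the matrix entry at row i, column j.
    """
    outputs = [0.0] * len(taps)
    for i, row in enumerate(taps):
        for j, tap in enumerate(row):
            # Each tap lets through tap * inputs[j] into output pipe i.
            outputs[i] += tap * inputs[j]
    return outputs


# Two output pipes crossing three input pipes (a 2x3 matrix).
taps = [[1.0, 0.5, 0.0],
        [0.0, 2.0, 1.0]]
inputs = [4.0, 2.0, 1.0]

result = flow_through_taps(taps, inputs)
print(result)  # [5.0, 5.0]
```

Note the tap set to 2.0: that is the "more than 100%" case, which a physical tap could not do.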
I really like that. And matrix * matrix would just have to be 3D, with a whole cube of taps. Sure, not easy to illustrate, but the concept extends well. It also makes it more obvious what the complexity of the operation is.
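A small sketch of why the cube makes the cost visible, assuming plain nested lists and a hypothetical helper `matmul_count`: every (i, j, k) triple is one tap in the cube, so an n x m matrix times an m x p matrix uses n * p * m multiplications.

```python
def matmul_count(a, b):
    """Multiply a (n x m) by b (m x p), counting one 'tap' per multiplication."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    taps_used = 0
    for i in range(n):          # output row
        for j in range(p):      # output column
            for k in range(m):  # depth of the cube: the shared dimension
                c[i][j] += a[i][k] * b[k][j]
                taps_used += 1
    return c, taps_used


a = [[1, 2, 3],
     [4, 5, 6]]   # 2x3
b = [[1, 0],
     [0, 1],
     [1, 1]]      # 3x2

product, count = matmul_count(a, b)
print(product)  # [[4, 5], [10, 11]]
print(count)    # 12, i.e. 2 * 2 * 3
```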
Look at the animation in the article, after the second matrix has been rotated and put on top of the first one. Then flip its top up so the two matrices are orthogonal. Finally, rotate the whole thing 90 degrees to the left (rotating along the axis that goes from the top of the page to the bottom of the page). Now you can see that the result will be a 2x3 matrix. And the values in each spot will be the sum of the products beneath it.
That's pretty cool. For me, everything changed when I started thinking in terms of column spaces and linear transformations. Matrices are not just a series of multiplications and additions. They describe so much more.
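One way to see the column view in code (just a sketch, with a made-up helper `combine_columns`): matrix * vector is a weighted sum of the matrix's columns, with the vector entries as the weights, so the result always lies in the column space.

```python
def combine_columns(matrix, weights):
    """Return the sum of weights[j] * (column j of matrix)."""
    rows = len(matrix)
    result = [0] * rows
    for j, w in enumerate(weights):
        column = [matrix[i][j] for i in range(rows)]
        # Add w copies of this column to the running total.
        result = [r + w * c for r, c in zip(result, column)]
    return result


matrix = [[1, 2],
          [3, 4],
          [5, 6]]
weights = [10, 1]

combined = combine_columns(matrix, weights)
print(combined)  # [12, 34, 56] = 10 * [1, 3, 5] + 1 * [2, 4, 6]
```

It gives the same numbers as the row-by-row "multiply and add" recipe, just grouped by column instead of by row.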