An important note about numpy broadcasting: numpy broadcasts from the back, so your life improves dramatically when you reference axes from the back as well, i.e. use axis references < 0. So if you want to reference a row, refer to axis=-1. This will ALWAYS refer to the row (the first broadcasting dimension), whether you have a 1D vector, a 2D matrix, or any N-D array. Numpy is deeply unfriendly if you don't do this. To smooth out this and similar issues, there's the numpysane library. But simply using negative axis references goes a long way.
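A minimal sketch of the payoff (sum is just an example reduction; any axis-taking operation behaves the same way):

    import numpy as np

    # Summing "along a row" with axis=-1 works identically for 1D and
    # 2D inputs, because numpy aligns dimensions from the back.
    v = np.array([1.0, 2.0, 3.0])        # shape (3,)
    M = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])      # shape (2, 3)

    print(v.sum(axis=-1))    # 6.0        -- the single row
    print(M.sum(axis=-1))    # [ 6. 15.]  -- one sum per row

    # With a front-relative axis, the same call means different things
    # depending on dimensionality: axis=0 is the row of v, but the
    # columns of M.
    print(v.sum(axis=0))     # 6.0
    print(M.sum(axis=0))     # [5. 7. 9.]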
Interesting idea. This makes intuitive sense to me but I’ve never seen code written like this or attempted to write code like that. I’ll have to try it out next time I’m using numpy.
I find NumPy way too complex for the relatively simple operations used in machine learning. The number of implicit rules (broadcasting, silent int64 => double conversion, einsum complexities, etc.) is just mind-boggling.
The result is a couple of dense lines, but one cannot just read them without doing a deep analysis of each line.
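Two of those implicit rules in a minimal sketch (the point being that neither operation raises an error):

    import numpy as np

    # Implicit promotion: int64 + float silently becomes float64, whose
    # 53-bit mantissa cannot represent every 64-bit integer.
    a = np.int64(2**53 + 1)
    print(a)          # 9007199254740993
    print(a + 0.0)    # 9007199254740992.0 -- the +1 is silently gone

    # Implicit broadcasting: shapes (3,) and (3, 1) combine into (3, 3)
    # instead of raising a shape error.
    x = np.array([1, 2, 3])          # shape (3,)
    y = np.array([[1], [2], [3]])    # shape (3, 1)
    print((x + y).shape)             # (3, 3)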
It is a pity that this has been accepted as the standard for machine learning. Worse, now every package has its own variant of NumPy (e.g. "import jax.numpy as jnp" in the article), which is incompatible with the standard one:
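To give one concrete incompatibility as a sketch: JAX arrays are immutable, so code that mutates a numpy array in place fails on a jax.numpy array and has to be rewritten with JAX's .at[] API.

    import numpy as np
    import jax.numpy as jnp

    x = np.zeros(3)
    x[0] = 1.0              # fine: numpy arrays are mutable

    y = jnp.zeros(3)
    # y[0] = 1.0            # TypeError: JAX arrays are immutable
    y = y.at[0].set(1.0)    # the functional replacement in jax.numpy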
I really would like a simpler array library that does stricter type checking, supports saner type specifications for composite types, does not broadcast automatically (except perhaps for matrix * scalar) and does one operation at a time. Casting should be explicit as well.
Bonus points if it isn't inextricably tied to Python.
I think numpy closely maps to how I think, so it's not as hard for me to read these dense lines as it would be to read expanded versions. I think my point of view is shared by a lot of leading researchers, and this is why it is used so heavily.
The kinds of type safety you want might be good for other use cases but for ML research they get in the way too much.
Don’t know if the author will see this: in the table at the end of the article, the text descriptions of dot product and matrix multiplication are swapped.
Otherwise, great article! Didn’t know this existed in numpy. A really neat way to express matrix operations.