They are talking about total operations, but since big O notation ignores constant factors, and since they explicitly ignore the addition steps (because for large n the multiplications dominate), they are, in effect, only counting the multiplications.
To answer the GP, the minimum amount of work necessary is to write a number for each element of the output matrix, which scales as n^2 since there are n^2 elements. It is of course only possible to beat this if you can make assumptions like “the diagonal of my matrix is nonzero and everything else is zero”, but that’s not general matrix multiplication anymore, that’s a specialized sparse matrix routine. For a dense, “normal” matrix, O(n^2) is the absolute best we could hope for.
We're saying approximately the same thing, though I'd argue that your phrasing is slightly more confusing to people who aren't super familiar with Big O notation (and therefore don't exactly understand what is meant by "ignores constant factors" and the like).
Reads from the two input matrices and writes to the final matrix aren't at all the crux of the problem, so I don't think it makes sense to focus on either of those things in your explanation. The goal of the problem is to reduce the number of multiplications required to find the values that you then write (and there will always be n^2 writes). There will also be at least 2n^2 reads, but likewise that isn't relevant, because it's not what you're trying to optimize.
The relevant idea here is that the naive method of multiplying two matrices involves n multiplications for every row-column pair, for a total time complexity of O(n^3) (n rows, n columns, n multiplications per pair). The goal is to bring down that third factor, from n to something smaller, with the holy grail obviously being a constant (which would make the whole thing O(n^2)).
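For concreteness, here's the schoolbook algorithm described above, as a minimal Python sketch (the names are mine):

    def matmul_naive(A, B):
        # Schoolbook multiplication of two n x n matrices:
        # n multiply-adds for each of the n^2 output entries, so n^3 total.
        n = len(A)
        C = [[0] * n for _ in range(n)]
        for i in range(n):          # n rows
            for j in range(n):      # n columns
                for k in range(n):  # n multiplications per (i, j) pair
                    C[i][j] += A[i][k] * B[k][j]
        return C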
It's not at all obvious to me why the number of multiplications must be at least n². Can you prove that? (To be clear, n² multiplications is most certainly the lower bound, but I can't come up with a good argument as to why.)
This is why the trivial "you must read n² entries from each input matrix and write n² entries in the end" is helpful: It's an extremely obvious lower bound on the asymptotic complexity. Sure, it's not what we're trying to reduce (and getting < n² multiplications would be a ridiculous result), but anyone can see that it's true.
If A is zero except for its first column (with entries a_i) and B is zero except for its first row (with entries b_j), then entry (i, j) of A * B is exactly a_i * b_j, so the single product A * B computes all n^2 pairwise products a_i * b_j.
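To spell that construction out with a concrete check (a small Python sketch; the names are mine): put arbitrary values a in A's first column and b in B's first row, zero everywhere else, and every entry of the product is one of the pairwise products.

    n = 3
    a = [2, 3, 5]
    b = [7, 11, 13]
    A = [[a[i] if j == 0 else 0 for j in range(n)] for i in range(n)]
    B = [[b[j] if i == 0 else 0 for j in range(n)] for i in range(n)]
    # Entry (i, j) of A*B is sum_k A[i][k] * B[k][j]; only k = 0 survives,
    # so C[i][j] = a[i] * b[j]: the product contains all n^2 pairwise products.
    C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    assert all(C[i][j] == a[i] * b[j] for i in range(n) for j in range(n))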
To be honest I’m not sure whether that really proves it. Does producing a single product actually require performing a multiplication?
By definition, a matrix multiplication takes rows and multiplies them by columns. For a pair of square matrices, where there are n rows in A and n columns in B, you (by definition) need to multiply them by each other at least once, otherwise it's not matrix multiplication, it's matrix... something else.
> By definition, a matrix multiplication takes rows and multiplies them by columns.
> you (by definition) need to multiply them by each other at least once
Why does multiplying a row with a column require at least one multiplication? "By definition" it takes n multiplications (!), but in the context of matrix multiplication we can do it with fewer, possibly with O(1) multiplications (see the sketch below). And it seems neither you nor I can come up with a good argument for why it can't take fewer, despite discussing it for a while.
Which is why "you need to write n² entries" is used as an argument for the O(n²) bound, even if it has nothing to do with the number of multiplications.
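On the "we can do it with fewer" point: a concrete classical example is Winograd's inner-product trick, which computes a dot product of even length n with only n/2 multiplications that actually mix the two vectors; the correction terms depend on only one vector each, so in a full matrix product they're computed once per row/column and amortize away. A hedged Python sketch (my own naming):

    def dot_winograd(x, y):
        # Winograd's trick for an inner product of even length n.
        # Only the n/2 multiplications in `s` involve both x and y;
        # cx depends only on x and cy only on y, so in a full matrix
        # multiplication they are computed once per row / per column,
        # roughly halving the total multiplication count vs. schoolbook.
        n = len(x)
        assert n % 2 == 0
        cx = sum(x[2*i] * x[2*i + 1] for i in range(n // 2))
        cy = sum(y[2*i] * y[2*i + 1] for i in range(n // 2))
        s = sum((x[2*i] + y[2*i + 1]) * (x[2*i + 1] + y[2*i]) for i in range(n // 2))
        return s - cx - cy

    assert dot_winograd([1, 2, 3, 4], [5, 6, 7, 8]) == 70  # 1*5 + 2*6 + 3*7 + 4*8

That's nowhere near O(1) multiplications per pair, but it does show the "by definition it takes n multiplications" count is not a real barrier.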