You have two matrices in allocated memory, A and B, and you want to multiply them and store the result in allocated memory C, as efficiently as possible using 8 cores. How exactly do you do this without threads?
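For comparison, here is roughly what the threaded baseline looks like. This is only a minimal sketch, not a tuned implementation: the function name `parallel_matmul`, the row-major layout, and the choice to split the rows of C evenly across workers are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: multiply row-major A (n x k) by B (k x m) into C (n x m).
// A, B and C are shared by all workers; each worker writes a disjoint block of
// rows of C, so no locking is needed.
void parallel_matmul(const double* A, const double* B, double* C,
                     std::size_t n, std::size_t k, std::size_t m,
                     unsigned num_threads = 8) {
    if (num_threads == 0) num_threads = 1;

    // Compute rows [row_begin, row_end) of C.
    auto worker = [=](std::size_t row_begin, std::size_t row_end) {
        for (std::size_t i = row_begin; i < row_end; ++i) {
            for (std::size_t j = 0; j < m; ++j) C[i * m + j] = 0.0;
            // i-p-j loop order keeps the inner loop contiguous in B and C.
            for (std::size_t p = 0; p < k; ++p) {
                const double a = A[i * k + p];
                for (std::size_t j = 0; j < m; ++j)
                    C[i * m + j] += a * B[p * m + j];
            }
        }
    };

    // Hand each thread a contiguous chunk of rows (the last chunk may be short).
    std::vector<std::thread> pool;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= n) break;  // fewer rows than threads
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back(worker, begin, end);
    }
    for (auto& th : pool) th.join();
}
```

Calling `parallel_matmul(A, B, C, n, k, m)` works directly on the already allocated buffers, with no copies. A serious version would add cache blocking or just call a multithreaded BLAS, but the point stands: all 8 workers read the same A and B and write into the same C.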
I asked for an efficient, thread-free solution to the very common problem of matrix multiplication on multiprocessor CPUs. You gave me nonsense answers like vector operations (which use only one CPU at a time) and GPUs (memory transfer bottleneck, the need to copy arrays, significantly less RAM than the CPU has, and GPUs make it a completely different game). So no, I'm not moving the goalposts.
And GPUs can spend more time shuffling data through their horribly bottlenecked memory interfaces than the CPU would take to handle the matrix operation, especially if the GPU is already being used for graphics.
Multi-core systems, and tools that can use them, are valuable and very much a reality right now.