For depth: If I have at least n^2 processors, I can send a row and column to eac...

dragontamer · on March 23, 2021

Hmmmm. Your O(n^2) step seems unnecessary to me, so the answer is O(n) depth for the naive case.

-----------

Processors are generally assumed to be self-numbered. For example, processor #100 knows that it is processor #100.

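    // Each thread derives its (x, y) cell from its own index; no communication needed.
    int xloc = threadIdx.x % matrix_width;   // column
    int yloc = threadIdx.x / matrix_width;   // row

    performMatrixCalc(xloc, yloc);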

So the depth is apparently only O(n): each processor does its n multiply-adds sequentially. Broadcasting matrix_width to all processors takes O(log(n)), and that seems to be the only communication needed to organize the calculation, so the total stays O(n).

sdenton4 · on March 23, 2021

Nice, thanks!