
For depth:

If I have at least n^2 processors, I can send a row and column to each processor, which can compute the inner product in linear time. So O(n^2) time to coordinate the work, and O(n) to actually do it.




Hmmmm. Your O(n^2) step seems unnecessary to me. Therefore: the answer is O(n) depth for the naive case.

-----------

Processors are generally assumed to be self-numbered. Ex: processor #100 knows it is processor #100.

    // Each processor derives its output cell from its own index.
    int xloc = threadIdx.x % matrix_width;
    int yloc = threadIdx.x / matrix_width;

    performMatrixCalc(xloc, yloc);

Therefore, the depth is apparently only O(n): O(log(n)) to broadcast matrix_width to all processors, which seems to be the only communication needed to organize the calculation, plus O(n) for each processor's inner product.
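
For anyone who wants to try it, here is a rough sketch of the same idea as a complete CUDA program. It is only an illustration under a few assumptions: the kernel name naiveMatMul, the size n = 64, the test data, and the 256-thread block size are all made up for the demo, and the index uses blockIdx.x * blockDim.x + threadIdx.x rather than threadIdx.x alone because one block holds at most 1024 threads. Passing matrix_width as a kernel argument stands in for the broadcast step.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per output cell. Each thread works out its own (xloc, yloc)
// from its index, then computes one inner product in O(n) sequential steps.
__global__ void naiveMatMul(const float* a, const float* b, float* c,
                            int matrix_width)
{
    // Global index across all blocks (assumption: 1D grid of 1D blocks).
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= matrix_width * matrix_width) return;  // surplus threads idle

    int xloc = idx % matrix_width;  // column of the output cell
    int yloc = idx / matrix_width;  // row of the output cell

    float sum = 0.0f;
    for (int k = 0; k < matrix_width; ++k)
        sum += a[yloc * matrix_width + k] * b[k * matrix_width + xloc];
    c[yloc * matrix_width + xloc] = sum;
}

int main()
{
    const int n = 64;  // hypothetical matrix width for the demo
    const size_t bytes = size_t(n) * n * sizeof(float);

    // Small integer-valued inputs so float sums are exact.
    std::vector<float> a(n * n), b(n * n), c(n * n);
    for (int i = 0; i < n * n; ++i) {
        a[i] = float(i % 7);
        b[i] = float(i % 5);
    }

    float *da = nullptr, *db = nullptr, *dc = nullptr;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                             // threads per block
    const int blocks = (n * n + threads - 1) / threads;  // cover all n^2 cells
    naiveMatMul<<<blocks, threads>>>(da, db, dc, n);

    // This copy waits for the kernel to finish; its status also reports
    // any earlier launch or execution error.
    cudaError_t err = cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare against a plain sequential triple loop on the host.
    int mismatches = 0;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k)
                ref += a[y * n + k] * b[k * n + x];
            if (ref != c[y * n + x]) ++mismatches;
        }
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Something like nvcc on this file should build it. A real GPU does not have n^2 processors for large n, so the O(n) depth only describes the idealized model; the hardware schedules the blocks in waves.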


Nice, thanks!



