Seems really neat, if a bit click-baity.
Paper: https://dl.acm.org/doi/pdf/10.1145/3613424.3614285
Code: https://github.com/escalab/SHMT
Kernels listed in the repo:
> [ blackscholes_2d, dct8x8_2d, dwt, hotspot_2d, srad_2d, sobel_2d, npu_sobel_2d, minimum_2d, mean_2d, laplacian_2d, fft_2d, histogram_2d ]

And in src/kernels (https://github.com/jk78346/SHMT/tree/main/src/kernels):
> [ convolutionFFT2D ]
For the kernels already implemented, what is the kernel context-switching overhead on this platform, and at what workload sizes does the tradeoff favor offloading?
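One way to get at that question empirically is to measure per-element cost across input sizes: if the per-element time drops sharply as the workload grows, fixed per-call dispatch/context-switch overhead dominates the small sizes. A minimal sketch, using numpy's `fft2` as a stand-in for any of the listed kernels (the kernel choice and sizes are illustrative, not from the paper):

```python
# Sketch: estimate fixed per-call overhead vs. throughput for a kernel.
# numpy's fft2 stands in for an fft_2d-style SHMT kernel; all names
# and sizes here are illustrative assumptions, not from the repo.
import time
import numpy as np

def time_kernel(n, reps=20):
    """Median wall time of one 2D FFT call on an n x n input."""
    x = np.random.rand(n, n)
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        np.fft.fft2(x)
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

sizes = [16, 64, 256, 1024]
per_elem = {n: time_kernel(n) / (n * n) for n in sizes}

# If per-element cost falls sharply with n, fixed dispatch overhead
# dominates the small workloads -- those are the sizes where paying a
# context switch to offload would not be worth it.
for n in sizes:
    print(f"{n}x{n}: {per_elem[n]:.3e} s/element")
```

The same harness, pointed at a CPU path vs. an accelerator path for one of the repo's kernels, would locate the crossover size directly.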