
They implemented a kernel driver that processes can submit work items to (e.g., multiply this vector, perform a summing reduction, etc.). It balances the computation across the available hardware.
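To make the balancing idea concrete, here is a toy user-space sketch (this is not the SHMT driver API, and the device names and cost model are assumptions): work items go into a shared pool and a greedy scheduler assigns each one to whichever simulated device currently has the least queued work.

```python
# Toy sketch of greedy work balancing across heterogeneous devices.
# NOT the actual SHMT kernel interface; names and costs are invented.
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    queued: float = 0.0                      # total cost assigned so far
    items: list = field(default_factory=list)

def submit(devices, item, cost):
    """Assign `item` to the least-loaded device (greedy load balancing)."""
    target = min(devices, key=lambda d: d.queued)
    target.queued += cost
    target.items.append(item)
    return target.name

# Hypothetical device set and work-item costs, for illustration only.
devices = [Device("cpu"), Device("gpu"), Device("edge_tpu")]
for i, cost in enumerate([4.0, 1.0, 2.0, 3.0, 1.0]):
    submit(devices, f"work-{i}", cost)
```

In the real system the assignment would presumably also weigh per-device throughput and transfer costs, but the least-loaded heuristic shows the shape of the problem.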

Seems really neat, if not a bit click-baity.

Paper: https://dl.acm.org/doi/pdf/10.1145/3613424.3614285

Code: https://github.com/escalab/SHMT




The work would have to be large enough to be worth the cost of kernel context switch IIUC.
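A back-of-envelope way to see that tradeoff: offloading only pays once the per-element speedup recoups the fixed submission cost. All the numbers below are assumptions for illustration, not measurements of SHMT or any platform.

```python
# Break-even sketch for offload overhead (all figures are assumptions).
def break_even_elems(overhead_s, cpu_s_per_elem, accel_s_per_elem):
    """Smallest workload size n where overhead + n*accel < n*cpu."""
    saved_per_elem = cpu_s_per_elem - accel_s_per_elem
    if saved_per_elem <= 0:
        return None  # accelerator is no faster; offload never wins
    return int(overhead_s / saved_per_elem) + 1

# e.g. 10 us of submission/context-switch overhead, CPU at 2 ns/element,
# accelerator at 0.5 ns/element (hypothetical numbers):
n = break_even_elems(10e-6, 2e-9, 0.5e-9)
```

With those made-up numbers the crossover lands in the thousands of elements, which matches the intuition that only reasonably large kernels are worth submitting.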


src/Python/kernels/ground_truth_functions.py: https://github.com/jk78346/SHMT/blob/main/src/Python/kernels... :

> [ blackscholes_2d, dct8x8_2d, dwt, hotsplot_2d, srad_2d, sobel_2d, npu_sobel_2d, minimum_2d, mean_2d, laplacian_2d, fft_2d, histogram_2d ]

src/kernels: https://github.com/jk78346/SHMT/tree/main/src/kernels :

> [ convolutionFFT2D, ]
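For a sense of what these ground-truth kernels compute, here is a plain-Python sketch of what a `mean_2d` might look like (a k x k mean filter with zero padding); the name mirrors the repo's list but this body is my assumption, not the repo's code.

```python
# Hedged sketch of a mean_2d ground-truth kernel: naive k x k mean
# filter with zero padding at the borders. Assumed, not from the repo.
def mean_2d(img, k=3):
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx]  # out-of-bounds counts as 0
            out[y][x] = acc / (k * k)
    return out

# Interior of a constant image is unchanged; borders dim from padding.
center = mean_2d([[1.0] * 5 for _ in range(5)])[2][2]
```

Each entry in the list above is presumably a similarly small, embarrassingly parallel 2D kernel, which is exactly the kind of work that can be split across devices.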

What is the (platform-dependent) kernel context-switch overhead tradeoff, i.e., at what workload sizes do the already implemented functions break even?



