Memory and ILP handling in 2D convolutions (riemani.ca)
40 points by pdziepak 61 days ago | 6 comments



Years ago I started a collection of convolution optimization resources: https://github.com/mratsim/laser/wiki/Convolution-optimisati...

Also checked and apparently Nvidia Cutlass now supports generic convolutions: https://github.com/NVIDIA/cutlass


Interesting article, thanks, IMHO mostly for the low level performance analysis.

When it comes to actual computation of convolutions, the fast Fourier transform should at least be mentioned, even if in passing. Early in grad school I peeked at the source for R's density() function, and was blown away that it was using FFT, and that I had not picked up that trick in my math classes (or maybe I had just forgotten it...)

For a 2d example:

https://stackoverflow.com/questions/50453981/implement-2d-co...

And a recent HN thread that was very good:

https://news.ycombinator.com/item?id=40840396
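
For anyone who hasn't seen the trick, here is a minimal sketch of it in plain Python: zero-pad both arrays to the full output size, transform, multiply pointwise, and transform back. The function names (`fft`, `fft2`, `fft_convolve2d`, `direct_convolve2d`) are made up for illustration, the FFT is a textbook recursive radix-2 version rather than anything tuned, and I'm assuming small rectangular list-of-lists inputs. In real code you would reach for a library FFT instead.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(a, inverse=False):
    """2D transform: 1D FFT over every row, then over every column."""
    rows = [fft(list(r), inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p


def pad(a, h, w):
    """Copy `a` into the top-left corner of an h-by-w array of zeros."""
    out = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            out[i][j] = v
    return out


def fft_convolve2d(image, kernel):
    """Full 2D linear convolution via the convolution theorem."""
    oh = len(image) + len(kernel) - 1
    ow = len(image[0]) + len(kernel[0]) - 1
    # Pad to at least the full output size so the circular convolution
    # computed by the FFT matches the linear one (no wrap-around).
    h, w = next_pow2(oh), next_pow2(ow)
    fa = fft2(pad(image, h, w))
    fb = fft2(pad(kernel, h, w))
    prod = [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(fa, fb)]
    res = fft2(prod, inverse=True)
    scale = h * w  # the inverse transform above is unnormalized
    return [[res[i][j].real / scale for j in range(ow)] for i in range(oh)]


def direct_convolve2d(image, kernel):
    """Reference O(N^2 K^2) full convolution, for checking the FFT version."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for i in range(ih):
        for j in range(iw):
            for u in range(kh):
                for v in range(kw):
                    out[i + u][j + v] += image[i][j] * kernel[u][v]
    return out
```

The FFT route only pays off once the kernel is reasonably large; for the small kernels common in image processing, the direct loop (suitably vectorized) is usually the better choice.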


As cool as this is, I can't help but think how pointless the goal itself is.

XDNA 2 will have 12 TFLOPs, roughly matching the 96-core Threadripper Pro 7995WX at a much lower price point.


These sorts of computations generally just get fed bigger inputs as compute gets better.

Also, plenty of Threadrippers exist out there already. If you get access to some cluster, it might have whatever type of chip in it. If I have access to a cluster with many 7995s, I don't really care too much about what's available on the consumer side.


ILP is instruction-level parallelism, if you had a hard time remembering like me.


I was thinking of Integer Linear Programming when I saw the title. Just another example of why acronyms are bad.



