Thanks! Although I still have to actually graduate and the paper is in review, so maybe your congratulations are a bit premature! :)
> A long time ago, I worked on optimizing broadcast operations on GPUs [1].
Something similar happens in Futhark, actually. When something like `[1,2,3] + 4` is elaborated to `map (+) [1,2,3] (rep 4)`, the `rep` is eliminated by pushing the `4` into the `map`, giving `map (+4) [1,2,3]`, so the replicated array is never materialized. Futhark then compiles the result to efficient CUDA/OpenCL/whatever.
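To make the rewrite concrete, here is a rough sketch in Python rather than Futhark. It only illustrates the two forms of the expression and is not how the compiler represents or transforms them; the names `elaborated` and `fused` are made up for this example.

```python
from itertools import repeat

def elaborated(xs, c):
    # Mirrors `map (+) xs (rep c)`: the scalar is first replicated
    # to the length of xs, then the two sequences are added pairwise.
    replicated = repeat(c, len(xs))
    return list(map(lambda x, y: x + y, xs, replicated))

def fused(xs, c):
    # Mirrors `map (+c) xs`: the scalar is pushed into the mapped
    # function, so no replicated sequence is needed.
    return [x + c for x in xs]

print(elaborated([1, 2, 3], 4))
print(fused([1, 2, 3], 4))
```

Both functions compute the same result; the second just never builds the intermediate `rep`.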