Singe: Leveraging Warp Specialization for High Performance on GPUs [pdf] (stanford.edu)
36 points by eslaught on April 28, 2014 | 4 comments



My first reaction: Some Nvidia cards can synchronize between warps? Nice!

I've been living in the OpenCL world, which is pretty much everyone except Nvidia (Nvidia intentionally ignores OpenCL and cripples its support for it), so I had unfortunately missed this development.

On the other hand, the particular use case in the article was to circumvent other limitations of the architecture, such as the relatively small register file and the single program counter shared by the 32 threads of a warp. Clever and interesting nevertheless.
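
To make the "synchronize between warps" idea a bit more concrete, here is a minimal warp-specialization sketch in CUDA (my own toy example, not anything from the paper): one warp per block acts as a producer that stages a tile through shared memory, and the remaining warps consume it. For simplicity it synchronizes with the block-wide __syncthreads(); as I understand it, the Singe-generated code instead uses named barriers (PTX bar.sync / bar.arrive with an explicit barrier id) so that only the warps involved in a particular hand-off have to wait.

  // Minimal warp-specialization sketch (toy example, assumptions noted above).
  // Launch with THREADS (= 256) threads per block, i.e. 8 warps:
  // warp 0 is the producer, warps 1..7 are consumers.
  #define TILE    256   // elements staged per block per iteration
  #define THREADS 256

  __global__ void warp_specialized_saxpy(const float *x, const float *y,
                                         float *out, float a, int n)
  {
      __shared__ float sx[TILE], sy[TILE];

      const int warp_id   = threadIdx.x / warpSize;   // role: 0 = producer, else consumer
      const int lane      = threadIdx.x % warpSize;
      const int consumers = THREADS - warpSize;        // number of consumer threads

      for (int base = blockIdx.x * TILE; base < n; base += gridDim.x * TILE) {
          if (warp_id == 0) {
              // Producer warp: stage the next tile from global into shared memory.
              for (int i = lane; i < TILE; i += warpSize)
                  if (base + i < n) { sx[i] = x[base + i]; sy[i] = y[base + i]; }
          }
          __syncthreads();   // hand-off: the tile is now visible to the consumers

          if (warp_id != 0) {
              // Consumer warps: do the arithmetic on the staged tile.
              for (int i = threadIdx.x - warpSize; i < TILE; i += consumers)
                  if (base + i < n) out[base + i] = a * sx[i] + sy[i];
          }
          __syncthreads();   // make sure everyone is done before the buffer is reused
      }
  }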


There are rumours that Nvidia will sooner rather than later support OpenCL 1.2 [0] - apparently CUDA 6 contains a stub library that has OpenCL 1.2 symbols and more.

[0] http://www.phoronix.com/scan.php?page=news_item&px=MTY2OTg
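
For anyone who wants to check what their installed driver actually advertises, a small host-side probe using the standard OpenCL platform API looks like the sketch below (plain C/C++, buildable with nvcc or anything that links against libOpenCL); the version string it prints is exactly what the rumoured update would bump to 1.2.

  // Print the OpenCL version string each installed platform reports.
  // Build with e.g.:  gcc check_cl.c -lOpenCL   (or nvcc check_cl.cu -lOpenCL)
  #include <CL/cl.h>
  #include <stdio.h>

  int main(void) {
      cl_platform_id platforms[16];
      cl_uint count = 0;
      clGetPlatformIDs(16, platforms, &count);

      for (cl_uint i = 0; i < count && i < 16; ++i) {
          char name[256] = {0}, version[256] = {0};
          clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
          clGetPlatformInfo(platforms[i], CL_PLATFORM_VERSION, sizeof(version), version, NULL);
          // Nvidia currently reports something like "OpenCL 1.1 CUDA ...";
          // a string starting with "OpenCL 1.2" here would confirm the rumour.
          printf("%s : %s\n", name, version);
      }
      return 0;
  }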


As a guess, this paper is appearing here not only because it's cool, but also because the "Functional Programming Principles in Scala" course (by Martin Odersky) has just restarted on Coursera.

He mentioned GPU-related DSLs in his OSCON Java 2011 keynote (see http://www.youtube.com/watch?v=3jg1AheF4n0 at ~14m20s), which was one of the listed 'Learning Resources'. However, the Stanford group he was involved with was doing 'Liszt' and this is 'Singe' (and his name isn't on the paper), so I'm wondering if there isn't some kind of internal race going on...


It appears that these algorithms aren't a good candidate for a GPU. They require complicated producer-consumer arrangements, and only run on the GPU in a reduced mode, which calls the scientific merit into question. One wonders why the authors couldn't have arbitrarily expanded or reduced their data set, or perhaps done their algorithm in two passes.



