CS 179: GPU Programming (caltech.edu)
331 points by kercker on June 14, 2016 | 31 comments

I'm taking a free course in CUDA programming on Udacity at the moment that's co-taught by a guy from NVIDIA Research and a professor from UC Davis. If you're looking for something that starts from the basics and is really easy to follow, I highly recommend it.


I'm working on a Ph.D. in GPU architecture, and this course is the real deal. It goes beyond how to run things on a GPU to analyzing the runtime and work efficiency of algorithms suited to the GPU.

Wen-mei Hwu's lectures on "Advanced Algorithmic Techniques for GPUs" (first lecture slides: http://iccs.lbl.gov/assets/docs/2011-01-24/lecture1_computat...) are a gold mine of GPU programming techniques. I believe he has published several books on the topic too, and released a benchmark suite (Parboil http://impact.crhc.illinois.edu/parboil/parboil.aspx) optimized with these techniques.

He runs a Coursera MOOC too. How useful is that? I gave up early on because the homework was very simple.

The university course is also a joke, except for the project, which is "what you make of it".

I took the course at UIUC (ECE 408) 2 years ago. While the assignments weren't too challenging, I thought they were thorough in covering the material from class, and the material from class came straight from Professor Hwu's book.

Plus, the final exam was extremely harsh so I wouldn't have called it a joke.

Compared to the physics classes or that algorithm class, I put my brain on auto-pilot. Sure, the final was hard but mostly because I didn't have time to write code in a word document, and people who went to the official exam area actually got significantly more time (I took it a few years before you).

Many things were missing from that class, including how to improve performance by ensuring that each warp receives the optimal data size, for example by using float4.
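
For anyone unfamiliar with the float4 point: a warp is 32 threads, and a single load instruction can fetch up to 16 bytes per thread, so reading `float4` instead of `float` moves the same data with a quarter of the load instructions. A back-of-the-envelope sketch in plain C++ (not CUDA; the helper names are made up for illustration, and it only counts instructions, it says nothing about achieved bandwidth):

```cpp
#include <cstddef>

// Assumption: 32 threads per warp, as on current NVIDIA GPUs.
constexpr std::size_t kWarpSize = 32;

// Hypothetical helper: warp-wide load instructions needed to read `n_floats`
// contiguous floats when every thread loads `floats_per_load` floats per
// instruction (1 for float, 4 for float4).
constexpr std::size_t warp_load_instructions(std::size_t n_floats,
                                             std::size_t floats_per_load) {
    const std::size_t floats_per_warp_load = kWarpSize * floats_per_load;
    // Round up so a partial final load still counts as one instruction.
    return (n_floats + floats_per_warp_load - 1) / floats_per_warp_load;
}

// Hypothetical helper: bytes requested by one warp-wide load instruction.
constexpr std::size_t bytes_per_warp_load(std::size_t floats_per_load) {
    return kWarpSize * floats_per_load * sizeof(float);
}
```

With 4-byte floats, `bytes_per_warp_load(1)` is 128 and `bytes_per_warp_load(4)` is 512, which is the "data size per warp" difference the comment is getting at.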

I could have learned the same stuff by looking at his MOOC - which is what OP got bored of doing.

It is very cool to see that the class is being taught by a group of juniors/seniors (checked the top two, first one was a senior and second one was a junior), and an appointed faculty member is listed only as a supervisor ....

I am really interested in the class outcome, and would love to hear what the students in class feel about this arrangement ....

I can see the good things about this. It gives the instructor/TA students an opportunity to grow while giving a peer-learning atmosphere to students in class. Plus, the students in class will learn from their peers who have the latest working knowledge of CUDA fresh in their heads, and this arrangement also frees up a faculty member (or two) from having to prepare the course so that they can do their faculty/research work (prepping and teaching a class, especially an interesting and engaging one, is a really draining experience on the part of the faculty as well).

The only downside I can see is managing the class well enough so that class time is efficiently utilized. But I believe this should be covered by the faculty member in the supervisor position ....

I took this class as a junior and then later co-taught this class as a senior undergrad in 2015.

The motivation behind the student taught class is that it allows for more classes to be taught than could happen otherwise.

As a student: Like any other class, the quality greatly depends on the work put in by the instructors. I think a student instructor is more likely to care about the quality of teaching, but also more likely to be overworked and not have enough time to dedicate to the course. I didn't think the course was particularly good when I took it due to lack of time from the instructors, but I'm glad the course was offered and that I took it as it got my feet wet with GPU programming.

After taking the course, I did an internship doing GPU programming. Doing this internship, I learned a ton and had a lot of ideas about how to improve the course. This put the idea of teaching the course in my head.

As an instructor: One other student and I designed the curriculum, gave the lectures, made the problem sets, did everything. We had a 3rd student who helped with grading. Teaching the course was hugely valuable to me, and also a ton of work. The course was hugely valuable because I learned a ton about GPU programming by teaching it and answering questions. As part of my motivation for teaching was to make the course more how I thought it should be, I didn't reuse many materials from the year before and spent many hours making lecture slides and problem sets. Towards the end of the course, I fell short on time and the lectures and problem sets weren't as good as they could have been. We made the class have a large final project of the student's choosing, and a few awesome things were made. Overall, I'm glad I taught the class, and I think I mostly accomplished what I wanted with improving the learning outcome for students.

Respect. Teaching a course is tough, especially for the first time, and even more so as an undergrad.

I took this class and it was taught by students. It was really good, but I believe our TAs were above average. Definitely agree about the mixed results of students teaching.

Do you have the course content uploaded somewhere? If not, do you mind uploading it and sharing the link here?

The lecture slides from 2016 are on the course website (which this HN post links to).

> It is very cool to see that the class is being taught by a group of juniors/seniors

I'm not sure how common such courses are, but where I went they were called "Student Directed Seminars" and allowed for some interesting courses that normally wouldn't get offered.

For someone who knows a thing or two about CUDA and parallel programming already, the best reference is Paulius Micikevicius’ presentations. If the words in them mean something to you, these 100+ slides explain more about the hardware and programming model than any other documentation you’ll find elsewhere.


If you want to really master CUDA, Nvidia GPUs and the various programming model tradeoffs, the best thing is to write a GEMM kernel and a sort kernel from scratch. To take it even further, write two of each: one that optimizes large GEMMs/sorts, and one that optimizes for batches of small GEMMs (or large GEMMs with tiny (<16 or <32) `k` or another dim) / batches of small sorts. Specialization for different problem configurations is often the name of the game.

For GEMM, you can work through the simple GEMM example in the CUDA documentation, then take a look at the Volkov GEMM from 2008, then the MAGMA GEMM, then the Junjie Lai / INRIA GEMM, then eventually the Scott Gray / Nervana SASS implementation, in increasing order of complexity and state-of-the-art-ness.
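
As a warm-up before any of those, here is a minimal CPU-side sketch in plain C++ of the tiling idea that blocked GEMM kernels are built around. It is not CUDA and not any of the implementations named above; `gemm_tiled` and its `tile` parameter are hypothetical names, and row-major storage is assumed. In a GPU kernel, each (i0, j0) tile of C would typically map to a thread block, and the A and B sub-blocks for each k-slab would be staged in shared memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// C = A * B for row-major matrices: A is m x k, B is k x n, C is m x n.
// `tile` plays the role of the thread-block tile edge in a GPU kernel.
std::vector<float> gemm_tiled(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t m, std::size_t n, std::size_t k,
                              std::size_t tile = 16) {
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i0 = 0; i0 < m; i0 += tile) {          // tile row of C
        for (std::size_t j0 = 0; j0 < n; j0 += tile) {      // tile column of C
            for (std::size_t p0 = 0; p0 < k; p0 += tile) {  // slab along k
                // Clamp tile bounds so ragged edges are handled.
                const std::size_t i1 = std::min(i0 + tile, m);
                const std::size_t j1 = std::min(j0 + tile, n);
                const std::size_t p1 = std::min(p0 + tile, k);
                // Accumulate this slab's contribution into the C tile.
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t p = p0; p < p1; ++p) {
                        const float a = A[i * k + p];
                        for (std::size_t j = j0; j < j1; ++j) {
                            C[i * n + j] += a * B[p * n + j];
                        }
                    }
                }
            }
        }
    }
    return C;
}
```

The specialization point above shows up even here: with a tiny `k`, the slab loop runs once or twice and most of the work is loading and storing C, so a kernel tuned for large square GEMMs is a poor fit.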

I took this class last year. Although it was nice to see undergraduates instructing the class, the lack of teaching experience really showed: the students were pretty rough around the edges in terms of their examples and explanations. About 2/3rds of classes ended early (at least this is better than heavily wasted time). This somewhat fits in with the unofficial Caltech policy of "figuring out the finer details on your own".

That said, I thought the practical nature of the class was a refreshing switch from the heavily theoretical foundation of my other CS coursework experiences.

The lecture slides are very good.

For anybody following along, there are 2 other books, Wrox's Professional CUDA C Programming and CUDA for Engineers, which would ease entry for those who aren't versed in HPC (PDE solvers, BLAS/LAPACK, Fourier transforms etc). The Storti/Yurtoglu book is the best intro I've seen to the topic. The Wrox book covers a lot of the material in Wilt's Handbook, not as exhaustively, but more up to date (Kepler vs Fermi).


There's other course material online: UIUC, Oxford (especially good, IMO)




Are there any course videos available anywhere?

Anyone know how this compares to the course(s) NVIDIA offers on Udacity?

Is there any resource as good, that targets a recent version of OpenCL?

Wish there were videos available for the same course! Can someone suggest a good lecture series with videos, other than Udacity?

Why CUDA and not OpenCL, I wonder?

NVIDIA does a great job at dragging their heels on OCL support and a heavy marketing push on CUDA. IMO they produce a much higher-quality product than AMD.

If I were NVIDIA I'd probably donate scores of servers+GPUs to schools like Caltech in order to inspire curriculum just like this.

In fact, that's what NVIDIA did at my alma mater, Grinnell College. I believe the intent was for courses like the OS course to be taught using CUDA (at least to some degree). I don't think that has panned out, but now a tiny liberal arts college has a ton of GPUs to use.

It'll be interesting to see how things develop with HIP (http://gpuopen.com/compute-product/hip-convert-cuda-to-porta...)


I've looked at CUDA and OpenCL for my own development. I'd love to use OpenCL on the principle of openness and avoiding lock-in to NVidia.

But as far as I can tell, OpenCL is nothing like what a single developer would want to use for a large-scale GPGPU program. It seems to be a layer that emulates all the varying features of all the manufacturers' chips to avoid advantaging any single manufacturer. Tons of boilerplate and none of the detailed "how to do X with CUDA" examples and manuals that NVIDIA has produced. It's not just that it's kind of hard and unfeatured; given that it will basically continue to suck for the purposes of most GPGPU development, one can expect it to die or be suddenly replaced by something better. Basically, "open standards" from consortiums of manufacturers have only existed to allow other large companies to produce just a few applications on top of them. I think any developer wants something they can "just program", and CUDA is miles ahead on that and seems likely to stay that way.

When I was learning this stuff, CUDA had better documentation (perhaps this has changed recently... I haven't done anything with GPUs in a while), so it was a bit easier to learn. But my guess is that the instructor for this course happens to be more familiar with CUDA so they chose to teach that. Harvard has a similar course that used to be taught with CUDA and is now taught with OpenCL.

CUDA is generally considered a better API and more optimal (likely because of NVIDIA's driver support). Google's RenderScript is a close competitor that actually looks a lot like CUDA (probably for those reasons) as well as Metal (though not as much).

On the other hand, for using it from a higher-level language (through various bindings) and writing my own kernels instead of just using Nvidia's libraries I find OpenCL much easier.

Another reason is that I can run that on my AMD GPU :)

In one of the hottest areas right now, deep learning, all frameworks use CUDA, not OpenCL.

Nvidia dominates the market.

Sponsorship, perhaps.
