
CS 179: GPU Programming - kercker
http://courses.cms.caltech.edu/cs179/
======
ChristianGeek
I'm taking a free course in CUDA programming on Udacity at the moment that's
co-taught by a guy from NVIDIA Research and a professor from UC Davis. If
you're looking for something that starts from the basics and is really easy to
follow, I highly recommend it.

[https://www.udacity.com/course/intro-to-parallel-programming--cs344](https://www.udacity.com/course/intro-to-parallel-programming--cs344)

~~~
jkloosterman
I'm working on a Ph.D. in GPU architecture, and this course is the real deal.
It goes beyond how to run things on a GPU to analyzing the runtime and work
efficiency of algorithms suited to the GPU.

Wen-mei Hwu's lectures on "Advanced Algorithmic Techniques for GPUs" (first
lecture slides:
[http://iccs.lbl.gov/assets/docs/2011-01-24/lecture1_computational_thinking_Berkeley_2011.pdf](http://iccs.lbl.gov/assets/docs/2011-01-24/lecture1_computational_thinking_Berkeley_2011.pdf))
are a gold mine of GPU programming techniques. I believe he has published
several books on the topic too, and released a benchmark suite (Parboil,
[http://impact.crhc.illinois.edu/parboil/parboil.aspx](http://impact.crhc.illinois.edu/parboil/parboil.aspx))
optimized with these techniques.

~~~
clw8
He runs a Coursera MOOC too; how useful is that? I gave up on it early because
the homework was very simple.

~~~
frozenport
The university course is also a joke, except for the project, which is "what
you make of it".

~~~
oplav
I took the course at UIUC (ECE 408) 2 years ago. While the assignments weren't
too challenging, I thought they were thorough in covering the material from
class, and the material from class came straight from Professor Hwu's book.

Plus, the final exam was extremely harsh, so I wouldn't call it a joke.

~~~
frozenport
Compared to the physics classes or the algorithms class, I put my brain on
auto-pilot. Sure, the final was hard, but mostly because I didn't have time to
write code in a Word document, and people who went to the official exam area
actually got significantly more time (I took it a few years before you did).

Many things were missing from that class, including how to improve performance
by ensuring that each warp loads data in optimally sized chunks, for example
by using float4.
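To illustrate the float4 point: CUDA's built-in float4 is a 16-byte vector
type, so each thread issues one 128-bit load instead of four 32-bit loads, and
a warp moves four times the data per memory instruction. A minimal sketch of
the pattern in plain C++ (Float4 here is a hypothetical stand-in for CUDA's
float4, just to show the reinterpret-the-buffer idea):

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's built-in float4 (same 16-byte layout).
struct Float4 { float x, y, z, w; };

// Scalar copy: one 4-byte element moved per load.
void copy_scalar(const float* src, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Vectorized copy: reinterpret 16-byte-aligned buffers as Float4 so each
// load moves 16 bytes -- a quarter of the load instructions when n % 4 == 0.
void copy_vec4(const float* src, float* dst, std::size_t n) {
    const Float4* s = reinterpret_cast<const Float4*>(src);
    Float4* d = reinterpret_cast<Float4*>(dst);
    for (std::size_t i = 0; i < n / 4; ++i) d[i] = s[i];
}
```

In a real kernel the same cast is applied to device pointers (which cudaMalloc
already aligns suitably), and the loop index becomes the thread index.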

I could have learned the same stuff by looking at his MOOC - which is what OP
got bored of doing.

------
anocendi
It is very cool to see that the class is being taught by a group of
juniors/seniors (I checked the top two: the first was a senior and the second
a junior), with a faculty member listed only as a supervisor.

I am really interested in the class outcome, and would love to hear how the
students in the class feel about this arrangement.

I can see the good things about this. It gives the instructor/TA students an
opportunity to grow while creating a peer-learning atmosphere for the students
in the class. Plus, the students will learn from peers who have the latest
working knowledge of CUDA fresh in their heads, and the arrangement also frees
up a faculty member (or two) from having to prepare the course, so they can do
their faculty/research work. (Prepping and teaching a class, especially an
interesting and engaging one, is a really draining experience for the faculty
as well.)

The only downside I can see is managing the class well enough that class time
is used efficiently. But I believe that should be covered by the faculty
member in the supervisor position.

~~~
lightcatcher
I took this class as a junior and then later co-taught this class as a senior
undergrad in 2015.

The motivation behind the student taught class is that it allows for more
classes to be taught than could happen otherwise.

As a student: Like any other class, the quality greatly depends on the work
put in by the instructors. I think a student instructor is more likely to care
about the quality of teaching, but also more likely to be overworked and not
have enough time to dedicate to the course. I didn't think the course was
particularly good when I took it due to lack of time from the instructors, but
I'm glad the course was offered and that I took it as it got my feet wet with
GPU programming.

After taking the course, I did an internship doing GPU programming. Doing this
internship, I learned a ton and had a lot of ideas about how to improve the
course. This put the idea of teaching the course in my head.

As an instructor: Another student and I designed the curriculum, gave the
lectures, made the problem sets, did everything; a third student helped with
grading. Teaching the course was hugely valuable to me, and also a ton of
work: I learned a ton about GPU programming by teaching it and answering
questions. Since part of my motivation for teaching was to make the course
more how I thought it should be, I didn't reuse many materials from the year
before and spent many hours making lecture slides and problem sets. Towards
the end of the course, I fell short on time and the lectures and problem sets
weren't as good as they could have been. We gave the class a large final
project of the student's choosing, and a few awesome things were made.
Overall, I'm glad I taught the class, and I think I mostly accomplished what I
wanted in improving the learning outcomes for students.

~~~
sjs7007
Do you have the course content uploaded somewhere? If not, do you mind
uploading it and sharing the link here?

~~~
lightcatcher
The lecture slides from 2016 are on the course website (which this HN post
links to).

------
jhj
For someone who already knows a thing or two about CUDA and parallel
programming, the best reference is Paulius Micikevicius’ presentations. If the
words in them mean something to you, these 100+ slides explain more about the
hardware and programming model than any other documentation you’ll find
elsewhere.

[http://on-demand.gputechconf.com/gtc/2013/presentations/S3466-Programming-Guidelines-GPU-Architecture.pdf](http://on-demand.gputechconf.com/gtc/2013/presentations/S3466-Programming-Guidelines-GPU-Architecture.pdf)

If you want to really master CUDA, Nvidia GPUs, and the various programming
model tradeoffs, the best thing is to write a GEMM kernel and a sort kernel
from scratch. To take it even further, write two of each: one optimized for
large GEMMs/sorts, and one optimized for batches of small GEMMs (or large
GEMMs with a tiny (<16 or <32) `k` or another dim) / batches of small sorts.
Specialization for different problem configurations is often the name of the
game.
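Before optimizing anything on the GPU, it helps to have a trivially correct
reference to validate the fancy kernels against. A sketch of that baseline in
plain C++ (row-major, C = A * B; names and layout here are my own, not from
any particular library):

```cpp
#include <cstddef>
#include <vector>

// Naive row-major GEMM reference: C[m][n] = sum over k of A[m][k] * B[k][n].
// Every tiled/vectorized GPU kernel gets checked against something like this.
std::vector<float> gemm_ref(const std::vector<float>& A,
                            const std::vector<float>& B,
                            int M, int N, int K) {
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
    for (int m = 0; m < M; ++m)
        for (int k = 0; k < K; ++k) {   // k in the middle loop for locality
            float a = A[m * K + k];
            for (int n = 0; n < N; ++n)
                C[m * N + n] += a * B[k * N + n];
        }
    return C;
}
```

The GPU versions all compute this same loop nest; they differ in how they tile
it across blocks, warps, shared memory, and registers, which is where the
interesting tradeoffs live.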

For GEMM, you can work through the simple GEMM example in the CUDA
documentation, then take a look at the Volkov GEMM from 2008, then the MAGMA
GEMM, then the Junjie Lai / INRIA GEMM, then eventually the Scott Gray /
Nervana SASS implementation, in increasing order of complexity and state-of-
the-art-ness.

------
rmonroe
I took this class last year. Although it was nice to see undergraduates
instructing the class, the lack of teaching experience really showed: the
students were pretty rough around the edges in their examples and
explanations. About two-thirds of the classes ended early (at least that's
better than time heavily wasted). This somewhat fits the unofficial Caltech
policy of "figuring out the finer details on your own".

That said, I thought the practical nature of the class was a refreshing switch
from the heavily theoretical foundation of my other CS coursework experiences.

------
gaius
Why CUDA not OpenCL I wonder?

~~~
wyldfire
NVIDIA does a great job of dragging their heels on OpenCL support while making
a heavy marketing push for CUDA. IMO they also produce a much higher-quality
product than AMD.

If I were NVIDIA I'd probably donate scores of servers+GPUs to schools like
Caltech in order to inspire curriculum just like this.

~~~
cbgb
In fact, that's what NVIDIA did at my alma mater, Grinnell College. I believe
the intent was for courses like the OS course to be taught using CUDA (at
least to some degree). I don't think that has panned out, but now a tiny
liberal arts college has a ton of GPUs to use.

------
gtani
The lecture slides are very good.

For anybody following along, there are two other books, Wrox's Professional
CUDA C Programming and CUDA for Engineers, which would ease entry for those
who aren't versed in HPC (PDE solvers, BLAS/LAPACK, Fourier transforms, etc.).
The Storti/Yurtoglu book is the best intro I've seen to the topic; the Wrox
book covers a lot of the material in Wilt's CUDA Handbook, not as
exhaustively, but more up to date (Kepler vs Fermi).

________________________

There's other course material online from UIUC, UCSD, and Oxford (the Oxford
one is especially good, IMO):

[http://people.maths.ox.ac.uk/gilesm/cuda/](http://people.maths.ox.ac.uk/gilesm/cuda/)

[http://cseweb.ucsd.edu/classes/fa15/cse260-a/lectures.html](http://cseweb.ucsd.edu/classes/fa15/cse260-a/lectures.html)

[https://www.coursera.org/course/hetero](https://www.coursera.org/course/hetero)

------
Negative1
Are there any course videos available anywhere?

------
bathory
Is there any resource as good, that targets a recent version of OpenCL?

------
joosebox
Anyone know how this compares to the course(s) NVIDIA offers on Udacity?

------
hubatrix
Wish there were videos available for this course! Can someone suggest a good
lecture series with videos, other than Udacity?

