
You've mentioned twice now that GPUs can't do conditional execution; that's incorrect. GPUs have had branch instructions for quite a while now.

The limitation is that every thread on a core is executing the same instruction. If some of them take one branch and the rest take another, both branches have to be executed one after the other while masking out the threads it doesn't apply to. That reduces the performance you get, but it's certainly possible.
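To make the masking concrete, here is a minimal sketch in plain Python (not real GPU code) that simulates one group of threads running in lockstep through an if/else. The function name `run_warp_divergent` and the even/odd branch bodies are hypothetical, chosen only for illustration. It counts how many passes the group needs: one if every thread agrees on the branch, two if they diverge.

```python
def run_warp_divergent(values):
    """Simulate lockstep threads executing, one value per thread:
        if v % 2 == 0: v = v // 2
        else:          v = 3 * v + 1
    Returns (results, number_of_branch_passes).
    """
    take_then = [v % 2 == 0 for v in values]  # per-thread predicate
    out = list(values)
    passes = 0

    # Pass 1: the "then" branch. Threads whose predicate is false are masked off.
    if any(take_then):
        passes += 1
        for lane, active in enumerate(take_then):
            if active:
                out[lane] = values[lane] // 2

    # Pass 2: the "else" branch, with the mask inverted.
    if not all(take_then):
        passes += 1
        for lane, active in enumerate(take_then):
            if not active:
                out[lane] = 3 * values[lane] + 1

    return out, passes


uniform = run_warp_divergent([2, 4, 6, 8])   # every thread takes "then"
mixed = run_warp_divergent([1, 2, 3, 4])     # threads disagree
print(uniform)  # ([1, 2, 3, 4], 1)
print(mixed)    # ([4, 1, 10, 2], 2)
```

The results are the same as if each thread had branched on its own; only the pass count, and so the running time, changes.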

It's also worth pointing out that GPUs these days have a number of independent cores, each of which can execute different instructions simultaneously.
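One consequence, sketched below as a plain Python simulation: the two-pass cost is only paid inside a lockstep group whose threads disagree, so the same mix of branch outcomes can be cheap or expensive depending on how it is laid out across groups. The group size of 4 and the name `branch_passes` are assumptions for illustration (NVIDIA hardware, for example, uses groups of 32 threads).

```python
WARP_SIZE = 4  # hypothetical group size, kept small for readability


def branch_passes(predicates, warp_size=WARP_SIZE):
    """Count total branch passes when threads are split into lockstep groups.

    A group whose threads all agree needs one pass; a group with both
    outcomes present needs two (one per branch, with masking).
    """
    total = 0
    for start in range(0, len(predicates), warp_size):
        group = predicates[start:start + warp_size]
        total += len(set(group))  # 1 if uniform, 2 if divergent
    return total


interleaved = [True, False] * 4          # every group sees both outcomes
grouped = [True] * 4 + [False] * 4       # each group is uniform
print(branch_passes(interleaved))  # 4
print(branch_passes(grouped))      # 2
```

Both inputs have four threads per branch, but the grouped layout lets each group run a single branch.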

Hope that clears it up for you.



  > The limitation is that every thread on a core is
  > executing the same instruction. If some of them take
  > one branch and the rest take another, both branches
  > have to be executed one after the other while masking
  > out the threads it doesn't apply to.
By my definition, that's far from conditional branching. And I point it out because even SIMD can have conditional branching (using the CPU's branch predictor.)

  > That reduces the performance you get
That's an understatement. It complicates the implementation significantly. Parallel-oriented (?) programming is quite hard by itself without all this.

In many non-trivial situations you need to check loop limits or do plain bounds checks on data structures.
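For reference, the most common form of such a bounds check in GPU-style code is a one-line guard: threads are launched in fixed-size blocks, so the thread count is rounded up past the data length and the extra threads have to skip the work. A minimal Python simulation of that pattern, assuming a hypothetical `launch_scale` helper and a block size of 4:

```python
def launch_scale(data, factor, block_size=4):
    """Simulate a per-thread kernel that scales each element by `factor`.

    Threads come in whole blocks, so the launch may contain more threads
    than elements; the `i < n` guard keeps the extras from touching memory.
    Returns (results, number_of_threads_that_were_guarded_out).
    """
    n = len(data)
    num_blocks = (n + block_size - 1) // block_size  # round up to whole blocks
    out = list(data)
    guarded_out = 0

    for block in range(num_blocks):
        for thread in range(block_size):
            i = block * block_size + thread  # global thread index
            if i < n:                        # the bounds check
                out[i] = data[i] * factor
            else:
                guarded_out += 1             # thread idles for this step

    return out, guarded_out


print(launch_scale([1, 2, 3, 4, 5], 10))  # ([10, 20, 30, 40, 50], 3)
```

Here five elements need two blocks of four threads, so three threads fail the guard and do nothing.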

Don't get me wrong, I think GPGPU is amazing. What pisses me off is the overblown marketing and all the noise from people repeating it like canon when they have clearly never implemented a single non-trivial program in OpenMP/CUDA/OpenCL/SIMD.

  > Hope that clears it up for you.
That phrasing can carry an implied personal attack, but let's not fall into personal attacks, shall we? (And I'm foreign and could be reading too much into it.)


"And I point it out because even SIMD can have conditional branching (using the CPU's branch predictor.)"

This doesn't sound correct to me, but perhaps I'm missing something? I can't see how the branch predictor is relevant here. Can you explain in more detail?

"That's an understatement. It complicates the implementation significantly. Parallel-oriented (?) programming is quite hard by itself without all this."

How does this complicate it? You don't have to implement the masking yourself. If you're coding in CUDA, OpenCL or any of the common shading languages, you write if-statements just as you would in C, and the compiler and hardware do the right thing. Honestly, the only concern is the performance degradation you get.

"That phrasing can carry an implied personal attack"

There was none intended. I was hoping that would make it sound helpful but I guess it didn't work. Text can be a tricky medium. Sorry.



