
Ask HN: Parallelizing GPU (graphics) workloads? - gfxquestion
Working on an application that needs to render lots of short animations on-demand.

I'm wondering things like: how are parallel GPU workloads scheduled? Is it better to use one process and batch GPU calls? Looking for material that will help me better understand this topic.

I can provide more implementation details if necessary, thanks!

(Currently using OpenGL, exporting each frame with `glReadPixels`.)
======
Jasper_
The glReadPixels call is going to be your primary issue, since it blocks until
the GPU finishes its work, which means the CPU can't submit additional frames
in the meantime. The "proper" way to do this is to use a Pixel Buffer Object
and a sync fence. That way, you can submit frames, have a thread dedicated to
checking the fence and doing the readback, and keep both the CPU and GPU busy.
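
A minimal sketch of that PBO-plus-fence pattern (assuming a current GL 3.2+
context; `WIDTH`, `HEIGHT`, and the single-buffered PBO are illustrative, and
in practice you would rotate through two or more PBOs so readback of frame N
overlaps rendering of frame N+1):

```c
/* Sketch: asynchronous readback via a Pixel Buffer Object and a fence.
   Assumes a current OpenGL 3.2+ context; WIDTH/HEIGHT are placeholders. */
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, WIDTH * HEIGHT * 4, NULL, GL_STREAM_READ);

/* After rendering frame N: with a PBO bound to GL_PIXEL_PACK_BUFFER,
   glReadPixels starts an async GPU->buffer copy and returns immediately. */
glReadPixels(0, 0, WIDTH, HEIGHT, GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

/* ... submit more frames here; the GPU keeps working ... */

/* Later (e.g. on a dedicated readback thread): wait on the fence, then map. */
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, (GLuint64)1000000000);
glDeleteSync(fence);
const void *pixels = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                      WIDTH * HEIGHT * 4, GL_MAP_READ_BIT);
/* ... hand pixels off to the exporter/encoder ... */
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
```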

In terms of actual scheduling, it's kind of complicated, but for the most part
the APIs require that draws happen in-order. Don't really expect to get many
savings out of multi-threading your render loop in OpenGL, so that means we
move to more standard game optimizations. Memory bandwidth is likely to be
your biggest bottleneck, so make sure to do any buffer transfers at the start
of the frame or scene, and definitely don't touch them between draws. Do a
clear at the start of each frame (GPUs can do "fast clears", which let the ROP
unit simply overwrite it instead of having to pull in the contents of the
previous frame). Use modern GL best practices: that means uniform buffer
objects, since the classic uniform setters will do transfers mid-frame. Use a
good profiler for your platform to determine whether any bottlenecks or
slowness is on the CPU in your code, on the CPU in the driver, or on the GPU.
If it's on the CPU, consider moving to a modern API like D3D12, Metal or
Vulkan, but expect that to take some work.
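
For the uniform buffer point, a sketch of the idea (the struct layout and
binding point are illustrative, and assume a GL 3.1+ context with a shader
declaring a matching `uniform` block):

```c
/* Sketch: upload all per-frame uniforms once, at the start of the frame,
   instead of calling glUniform* between draws. Layout is illustrative. */
struct FrameUniforms {
    float view_proj[16];
    float time;
    float _pad[3];  /* pad to a multiple of 16 bytes for std140 */
};

GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferData(GL_UNIFORM_BUFFER, sizeof(struct FrameUniforms),
             NULL, GL_DYNAMIC_DRAW);

/* Once per frame, before any draws: */
struct FrameUniforms u = { /* ... fill in ... */ };
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof u, &u);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);  /* binding point 0 in the shader */

/* ... issue all draws without touching any buffers ... */
```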

------
ArtWomb
This is an active topic for browser based machine learning. Try searching
around "Neural Nets in WebGL". Trick is to avoid CPU lookup via glReadPixels()
as that sync will be slow. Create one giant texture atlas. And leverage GLSL
shader calls for parallelism.

TensorFire

[https://tenso.rs/](https://tenso.rs/)

------
Const-me
> how are parallel GPU workloads scheduled?

On a typical Windows PC, it's pretty simple. The OS just throws tasks at the
hardware; if they take more than a couple of seconds to complete, the OS
concludes the GPU has hung and resets the driver.

Virtualization-targeted GPUs have real schedulers, but such systems are rare.

> exporting each frame with `glReadPixels`

I recommend researching how to stop doing that and switch to the hardware
video encoder (Intel Quick Sync, NVIDIA NVENC; AMD has one, too).
Unfortunately for you, integrating them with OpenGL can be very hard. It's
quite easy with Direct3D, BTW, because of Media Foundation.

------
gavanwoolery
There probably is not an advantage to using multiple threads, unless you
really need to free up the main thread, but even then I would just batch the
calls sparsely enough that they don't hog the main thread. IIRC you cannot
use the same context from multiple threads in vanilla OpenGL, but
multi-threaded command recording did become available with Vulkan.

