
GPUs are practically the "OS-free" compute device of choice these days.

In most cases, GPUs work through work queues. The CPU submits work to the GPU (via OpenCL command queues or CUDA streams), and the GPU "plucks" work off of the queue. There are dozens of SMs (Nvidia), CUs (AMD GCN), or WGPs (AMD RDNA) that pluck work off concurrently.

Kernels then run to completion. Upon completion, the stream fires an event (which oftentimes automatically enqueues another task). If the CPU has work to do in the meantime, it can receive an async notification; otherwise it can simply block and wait.
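That submit/complete/notify cycle can be sketched with the CUDA runtime API. This is a minimal illustration, not anyone's production code: the kernel, buffer size, and callback name are all made up, but `cudaLaunchHostFunc` is the real mechanism for getting an async "done" message on the CPU side.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread handles one element.
__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Host callback: fires once everything enqueued before it has completed.
// This is where you could enqueue the next task.
void CUDART_CB onDone(void *userData) {
    printf("kernel finished\n");
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMalloc(&d, n * sizeof(float));

    myKernel<<<(n + 255) / 256, 256, 0, stream>>>(d, n);  // async submit
    cudaLaunchHostFunc(stream, onDone, nullptr);          // async "done" message
    cudaStreamSynchronize(stream);                        // or just block here

    cudaFree(d);
    cudaStreamDestroy(stream);
}
```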

--------

There are other models, of course; not everyone follows the standard methodology. "Persistent kernels" start up early and loop forever. You interact with them by passing data over PCIe, and the kernel itself contains queue/load-balancing logic to pick up that data somehow (rather than letting the device drivers / hardware scheduler do it).
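A hedged sketch of that pattern: one long-lived kernel spinning on a control word in host-mapped (zero-copy) pinned memory. The "queue" here is degenerately a single slot, and the names and trivial doubling "work" are illustrative only; real persistent kernels use proper ring buffers and many threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Long-lived kernel: loops until the host posts a shutdown sentinel.
__global__ void persistentWorker(volatile int *ctrl, volatile float *slot) {
    while (true) {
        int cmd = *ctrl;            // poll the flag the CPU writes over PCIe
        if (cmd == -1) return;      // shutdown sentinel
        if (cmd == 1) {             // a unit of work was posted
            *slot = *slot * 2.0f;   // "process" it
            __threadfence_system(); // make the result visible to the host
            *ctrl = 0;              // signal completion back
        }
    }
}

int main() {
    int *ctrl;  float *slot;
    // Mapped pinned memory: with unified virtual addressing, these host
    // pointers are directly usable by the kernel.
    cudaHostAlloc(&ctrl, sizeof(int),   cudaHostAllocMapped);
    cudaHostAlloc(&slot, sizeof(float), cudaHostAllocMapped);
    *ctrl = 0; *slot = 21.0f;

    persistentWorker<<<1, 1>>>(ctrl, slot);   // starts and loops "forever"

    *ctrl = 1;                                // post work over PCIe
    while (*ctrl != 0) { }                    // spin until the GPU finishes it
    printf("result: %f\n", *slot);

    *ctrl = -1;                               // tell the kernel to exit
    cudaDeviceSynchronize();
    cudaFreeHost(ctrl); cudaFreeHost(slot);
}
```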

------

The thing about GPU-level tasks is that you have so many tiny tasks (e.g. shoot a ray, or render pixel (0,0), render pixel (1,0), etc.) that just pulling tasks off of a queue is the most efficient approach. Lots of small, roughly equal-sized tasks can be load-balanced with simple strategies.
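The pixel example above maps directly onto a launch: one thread per pixel, and the hardware scheduler distributes blocks across SMs/CUs with no explicit load balancing in your code. The gradient "shading" is a placeholder for real work.

```cuda
#include <cuda_runtime.h>

// One tiny task per thread: render pixel (x, y).
__global__ void shade(uchar4 *img, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    unsigned char r = (unsigned char)(255 * x / w);
    unsigned char g = (unsigned char)(255 * y / h);
    img[y * w + x] = make_uchar4(r, g, 0, 255);
}

int main() {
    const int w = 1920, h = 1080;
    uchar4 *img;
    cudaMalloc(&img, w * h * sizeof(uchar4));
    dim3 block(16, 16);                 // 256 equal-sized tasks per block
    dim3 grid((w + 15) / 16, (h + 15) / 16);
    shade<<<grid, block>>>(img, w, h);  // hardware plucks blocks off the queue
    cudaDeviceSynchronize();
    cudaFree(img);
}
```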




