If you pass single operations between threads you will run slower, not faster, because of the synchronization overhead.
Whenever I parallelize something I build batching in from the beginning. Most other people seem to think it is an optimization you can do later, but the way I see it, if your goal is to get any speedup at all, batching is essential.
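A minimal sketch of what "batching from the beginning" can look like, in Python. The names `batched`, `process_batch`, and `parallel_map_batched` are made up for illustration, and squaring numbers stands in for real work. The point is only the shape: each handoff to a worker carries a whole chunk, so the synchronization cost is paid once per batch rather than once per item. It is not a benchmark, and in CPython the GIL means threads will not speed up CPU-bound work like this anyway.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of up to `size` items (hypothetical helper)."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Placeholder work: one call handles a whole batch."""
    return [x * x for x in batch]


def parallel_map_batched(items, batch_size=1000, workers=4):
    """Hand whole batches to the pool, so there is one handoff per batch."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in submission order.
        for chunk in pool.map(process_batch, batched(items, batch_size)):
            results.extend(chunk)
    return results
```

With `batch_size=1` this degenerates into the one-operation-per-handoff case described above; the batch size is the knob to tune against the per-handoff cost.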
> This implementation
> additionally can distribute pools of parallel processes over native
> OS level threads, thus taking advantage of multicore architectures.
> Locking overhead should be small, as data is normally not shared
> among processes executing on different threads.
Yeah, I agree, but as the sibling comment said, if you are not in a distributed setting and have an N:M threading model, the overhead can be low enough that it's often not much of an issue. Otherwise, I'm a little unclear whether concurrent logic programming supports, e.g., asynchronous comms with pipelined requests. If not, I guess to some extent you could do it at the application level?
For my current side project I am making collages out of images I took with my DSLR. I like to use expensive scaling algorithms because I’ll have to look at the result when I am done.
For this the natural ‘batch’ is one image and that justifies the overhead easily.