
I always struggle with this. Even though I understand the difference between IO-bound and CPU-bound tasks, a real application is rarely just one or the other. For example, to train a machine learning model I first need to query a database or read some files, which is IO-bound, but then training a huge model becomes a CPU-bound task. How do you go about this?

Would you then separate the logic in such a way that you can use both multiprocessing and multithreading/coroutines? Is that even possible with async? I have limited experience, but it feels like the moment you introduce async, everything in the code has to be async.
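For what it's worth, you can mix the two: keep the IO on the event loop and hand the CPU-bound part to a process pool via run_in_executor. A minimal sketch (the train_step/load_batch functions are made-up stand-ins, not a real training loop):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def train_step(batch):
    # CPU-bound work; runs in a separate process, so it doesn't block the loop
    return sum(x * x for x in batch)

async def load_batch(i):
    # IO-bound work (e.g. a DB query or file read) stays on the event loop
    await asyncio.sleep(0.01)  # stand-in for real IO
    return list(range(i, i + 4))

async def main():
    loop = asyncio.get_running_loop()
    results = []
    with ProcessPoolExecutor() as pool:
        for i in range(3):
            batch = await load_batch(i)                            # async IO
            result = await loop.run_in_executor(pool, train_step, batch)  # CPU
            results.append(result)
    return results

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Only the orchestration code has to be async here; train_step itself is plain synchronous Python.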



Queues.

If you want to do concurrent programming and pass data between different tasks, then thread-safe queues are a simple and effective way to orchestrate the complexity, in my experience.
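In Python this is basically queue.Queue plus a sentinel value. A toy producer/consumer sketch (the doubling step is just a stand-in for real per-item work):

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)          # queue.Queue is thread-safe; no explicit locks needed
    q.put(None)              # sentinel: tell the consumer we're done

def consumer(q, results):
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 2)  # stand-in for the real work

q = queue.Queue(maxsize=8)   # a bounded queue gives you backpressure for free
results = []
t1 = threading.Thread(target=producer, args=(q, range(5)))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 2, 4, 6, 8]
```

The bounded maxsize is the useful part: a slow consumer makes the producer block instead of letting memory grow without limit.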

After all that's how parallel processing is managed in the Post Office.


To be honest, your best cheap win for making pipelines fast is to get all your data into RAM and then run your models. Ingestion I/O has lots of surprising bottlenecks, from small-file I/O to NFS to decoding of e.g. image/video frames.

If you can afford it, create a standardized representation for your data and keep it in memory as much as possible. If that's not feasible, write the parsed representation into uncompressed tar files and load those at the start of each batch job.
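The tar trick needs nothing beyond the stdlib: one sequential read pulls a whole shard into RAM, avoiding the per-file overhead. A sketch (the shard path and sample payloads are made up for illustration):

```python
import io
import tarfile

def write_shard(path, samples):
    # Pack already-parsed samples into an uncompressed tar "shard"
    with tarfile.open(path, "w") as tar:      # mode "w" = no compression
        for name, payload in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

def load_shard(path):
    # One big sequential read instead of thousands of small-file opens
    data = {}
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            data[member.name] = tar.extractfile(member).read()
    return data

samples = [("a.bin", b"\x00\x01"), ("b.bin", b"\x02\x03")]
write_shard("shard.tar", samples)
loaded = load_shard("shard.tar")
print(sorted(loaded))  # ['a.bin', 'b.bin']
```

Keeping the tar uncompressed matters: the payloads are already parsed, so compression would only add CPU cost on the load path.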



