I'm really excited to see folks starting to talk about parallelizing machine learning. The conversation has been dominated by GPU-friendly techniques - a classic example of "everything looks like a nail when you have a hammer".
I hope we start seeing more massively parallel training strategies (most likely with GPUs under the hood still)
This is a strange comment to see. The techniques in TFA have been at the heart of the last few years of progress in large-scale language models, image generation, etc?
Dude, parallelization is exactly how non-trivial neural networks have been trained since the beginning.
Parallel processing, as in SIMD architectures, is what makes deep learning possible now.
Parallel computing, as in stacking graphics cards, is how literally everything is done everywhere - from big tech companies to not-so-well-funded data science departments at unknown unis.
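To make the "stacking graphics cards" point concrete, here is a minimal sketch of the synchronous data-parallel pattern most of these setups boil down to: each device computes gradients on its own shard of the batch, then the gradients are averaged with an all-reduce so every replica applies the same update. (Written with jax.pmap purely for illustration; the toy linear model, batch sizes, and learning rate are made up, not anything from TFA.)

    from functools import partial

    import jax
    import jax.numpy as jnp

    def loss_fn(params, x, y):
        # Toy linear model with a mean-squared-error loss.
        pred = x @ params["w"] + params["b"]
        return jnp.mean((pred - y) ** 2)

    @partial(jax.pmap, axis_name="devices")
    def train_step(params, x, y):
        # Each device gets its own shard of the batch and computes local gradients.
        grads = jax.grad(loss_fn)(params, x, y)
        # The all-reduce: average gradients across devices so every replica stays in sync.
        grads = jax.lax.pmean(grads, axis_name="devices")
        # Plain SGD update, applied identically on every device.
        return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

    n = jax.local_device_count()
    params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
    # Replicate the parameters onto every device; shard the data along the leading axis.
    params = jax.device_put_replicated(params, jax.local_devices())
    x = jnp.ones((n, 32, 4))  # n per-device batches of 32 examples, 4 features each
    y = jnp.ones((n, 32, 1))
    params = train_step(params, x, y)

The same idea, scaled across machines instead of local devices, is essentially what the big labs run; the all-reduce just happens over a network fabric instead of PCIe/NVLink.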