This is false. You can prototype a network on an M1 [1], and teacher-student models are a de facto standard for scaling down.
You can trivially run transfer learning on an M1 to prototype and see whether a particular backbone fits a small dataset, then kick off training on a cloud instance with the larger dataset for a few days.
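The workflow above can be sketched in a few lines of Keras: freeze a pretrained backbone, bolt on a small head, and sanity-check it on the small dataset locally before scaling up. This is a minimal sketch, not anyone's exact setup; `train_ds` is a hypothetical `tf.data.Dataset` of (image, label) pairs, and in practice you'd pass `weights="imagenet"` to actually reuse pretrained features.

```python
import tensorflow as tf

# Pretrained backbone without its classification head.
# Assumption: weights=None here to keep the sketch self-contained;
# use weights="imagenet" in practice (downloads pretrained weights).
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None
)
backbone.trainable = False  # freeze it for fast local prototyping

# Small trainable head for the new task (10 classes as an example).
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=5)  # quick fit on the M1 to vet the backbone
```

If the frozen backbone plus a tiny head already separates your classes, that's a strong signal it's worth paying for a multi-day cloud run on the full dataset.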
[1] https://blog.tensorflow.org/2020/11/accelerating-tensorflow-...