It looks like significantly better training accuracy is reached in a small fraction of the training steps previously required. A ~14x speedup in training is a big deal: it would allow many more models to be trained (and so more research ideas to be tried) in a given period of time.