In theory, it is exciting to use evolutionary algorithms instead of SGD, but the authors only train populations of networks with fewer than 100 neurons and about 1,000 parameters, and only on toy problems. It would have been more interesting if they had shown speedups on problems with millions of parameters.