OTOH, it obviously matters a lot if you're constantly iterating and training multiple times a day or whatever.
It takes a week to train a standard AlexNet model on 1 GPU on ImageNet (and this is pretty far from state of the art).
It takes 4 GPUs 2 weeks to train a marginally-below-state-of-the-art image classifier on ImageNet (http://torch.ch/blog/2016/02/04/resnets.html) - the 101-layer deep residual network. That would be roughly 20 weeks on a cluster of CPUs. (State of the art is 152 layers; I don't have the numbers, but I'd guesstimate 3-4 weeks to train on 4 GPUs.)
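That guesstimate can be sanity-checked with a naive back-of-envelope calculation; the linear-in-depth scaling and the ~10x CPU slowdown factor below are my own assumptions (the 10x is just what the 20-week CPU figure implies), not numbers from the blog post.

```python
# Back-of-envelope sketch of the training-time estimates above.
# Assumes (naively) that training time scales roughly linearly with
# network depth, and that the CPU setup is ~10x slower than 4 GPUs.

GPU_WEEKS_RESNET_101 = 2.0   # 4 GPUs, ResNet-101 (torch.ch blog post)
CPU_SLOWDOWN = 10.0          # rough factor implied by the 20-week CPU figure

def scale_by_depth(base_weeks, base_layers, target_layers):
    """Linear-in-depth extrapolation of training time."""
    return base_weeks * target_layers / base_layers

cpu_weeks = GPU_WEEKS_RESNET_101 * CPU_SLOWDOWN
resnet_152_weeks = scale_by_depth(GPU_WEEKS_RESNET_101, 101, 152)

print(f"ResNet-101 on CPUs: ~{cpu_weeks:.0f} weeks")
print(f"ResNet-152 on 4 GPUs: ~{resnet_152_weeks:.1f} weeks")
```

The linear extrapolation lands at ~3 weeks for 152 layers, consistent with the 3-4 week guess; in practice deeper nets also change per-layer cost and convergence behavior, so treat this as a rough lower bound.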