It's nice to see an(other) implementation of this paper. I looked through the paper's references and didn't find any links to source code.
It seems like the original implementation still has a few undocumented tricks up its sleeve for improving accuracy that have yet to be figured out.
It's not yet as fast as cuDNN, nor is it integrated into major deep learning libraries, but at least we can expect some competition in the coming years.
Xeons are server CPUs. So whoever bothers buying Xeons with scientific computing in mind may as well go all the way and buy nVidia GPUs instead.
So instead of making that framework available to all Intel Haswell and newer families and trying to dissuade customers from buying nVidia GPUs, they're selling themselves short.
Of course, many Theano-based scripts you'll find out there are probably only tested in one very specific environment, and may assume some Unix-like setup. But that's not something you can really solve, other than by contributing to the script and fixing it.
About the related Nvidia CUDA discussion: OpenCL support for Theano is in the works. Not sure how far along it is.
The registration form is the reason it's not included in the dl-machine Amazon EC2 image for instance: https://github.com/deeplearningparis/dl-machine
I think it might do better if there were some level of semantic tagging and weighting of different aspects of an artist's technique: identify sky, water, buildings, faces, plants, etc. (none of which is particularly beyond the capability of current image classifiers), then weight the style per region (see the sketch below). I could easily imagine this turning into a 'Rembrandtize Me' selfie-filtering app.
Then it's really only a matter of time before these techniques are extended to 'make my vacation photos look like they were taken by Ansel Adams', then 'show me Star Wars as if Alfred Hitchcock had directed it', or 'play me Smells Like Teen Spirit as if it had been sung by Elvis'. Neural Remixing.
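To make the region-weighting idea concrete, here's a minimal numpy sketch: it assumes you already have segmentation masks for the image (from whatever segmenter you like) and applies a separately weighted Gram-matrix style loss per region. All names here (`region_style_loss`, the mask/weight dictionaries) are illustrative assumptions, not anything from the paper or this implementation.

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, height*width) feature map."""
    return features @ features.T

def region_style_loss(gen_feats, style_feats, masks, weights):
    """Per-region weighted style loss (hypothetical sketch).

    gen_feats, style_feats: (C, H, W) feature maps from some conv layer.
    masks: dict of region name -> (H, W) binary mask.
    weights: dict of region name -> scalar importance of that region.
    """
    C, H, W = gen_feats.shape
    loss = 0.0
    for region, mask in masks.items():
        m = mask.reshape(1, H * W)          # broadcast mask over channels
        g = gen_feats.reshape(C, H * W) * m  # masked generated features
        s = style_feats.reshape(C, H * W) * m  # masked style features
        n = max(m.sum(), 1.0)               # pixels in this region
        loss += weights.get(region, 1.0) * \
            np.sum((gram(g) - gram(s)) ** 2) / (4 * C**2 * n**2)
    return loss

# Toy usage with random feature maps and two regions:
rng = np.random.default_rng(0)
feats_gen = rng.standard_normal((8, 16, 16))
feats_style = rng.standard_normal((8, 16, 16))
masks = {"sky": np.zeros((16, 16)), "rest": np.ones((16, 16))}
masks["sky"][:8, :] = 1.0
masks["rest"][:8, :] = 0.0
weights = {"sky": 2.0, "rest": 1.0}  # e.g. emphasize the artist's sky treatment
print(region_style_loss(feats_gen, feats_style, masks, weights))
```

In a real pipeline the feature maps would come from a VGG-style network as in the paper, and this loss would simply replace the single global style term during optimization.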