The Intel AI Lab has an introduction to NLP (https://ai.intel.com/deep-learning-foundations-to-enable-nat...) and an optimized TensorFlow build (https://ai.intel.com/tensorflow/).
One surprising research result in NLP is that a simple convolutional architecture often outperforms canonical recurrent networks. See the CMU lab's sequence modeling benchmarks and Temporal Convolutional Networks (TCN): https://github.com/locuslab/TCN
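The building block behind TCNs is a causal dilated convolution: each output depends only on current and past inputs, and stacking layers with growing dilation expands the receptive field exponentially. A minimal sketch of that idea in plain Python (an illustration of the mechanism, not the locuslab/TCN code itself):

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """Causal 1-D convolution: out[t] depends only on x[t], x[t-d], x[t-2d], ...

    x        -- input sequence (list of floats)
    weights  -- filter taps; weights[0] is applied to the most recent sample
    dilation -- gap between taps; stacking layers with dilations 1, 2, 4, ...
                grows the receptive field exponentially with depth
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding: never peeks at the future
                acc += w * x[idx]
        out.append(acc)
    return out

seq = [1.0, 2.0, 3.0, 4.0]
print(causal_dilated_conv1d(seq, [0.5, 0.5], dilation=1))
# → [0.5, 1.5, 2.5, 3.5]: each output averages the current and previous sample
```

A real TCN wraps this in residual blocks with learned weights; the causality constraint is what lets a convolutional net stand in for an RNN on sequence tasks.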
If you're interested in Nervana, here are some specifics: the chip accelerates neural networks in hardware, targeting inference workloads. Notable features include fixed-point math, Ice Lake cores, a 10-nanometer process, on-chip memory managed directly by software, and hardware-optimized inter-chip parallelism.
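For those unfamiliar with the fixed-point math angle: inference chips store weights and activations as small integers plus a scale factor, so the multiply-accumulate units can be pure integer hardware. A rough sketch of the principle (function names are illustrative, not the Nervana toolchain's API):

```python
def quantize(x, frac_bits=8):
    """Map a float to a signed fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))

def fixed_point_dot(xs, ws, frac_bits=8):
    """Dot product carried out entirely in integers, rescaled once at the end."""
    qx = [quantize(v, frac_bits) for v in xs]
    qw = [quantize(v, frac_bits) for v in ws]
    acc = sum(a * b for a, b in zip(qx, qw))  # integer multiply-accumulate
    return acc / (1 << (2 * frac_bits))       # undo both scale factors

print(fixed_point_dot([0.5, -0.25], [1.0, 2.0]))  # → 0.0, matching the float dot product
```

The quantization introduces a small rounding error per value, which is usually acceptable at inference time in exchange for much cheaper, denser arithmetic than floating point.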
I've worked for Intel, and I'm stoked to see the AI NLP progress.
I imagine models need to be deployed more often than they're built, though I'd thought the pain point was usually the latter.