I think TensorFlow is developed in a very principled manner, focusing on the ability to add more ops and platforms in the future. While this has hurt TF's usability in the short term, I am bullish on its future.
Don't be shy. Do share what that reason is.
> PyTorch's LSTM network is faster because, by default, it uses cuDNN's LSTM implementation, which fuses layers, steps, and point-wise operations. See the blog post on this here.
> TensorFlow's RNNs (in r1.2), by default, do not use cuDNN's RNN, and their 'call' function describes only one time-step of computation, so a lot of optimization opportunities are lost. On the flip side, though, this gives the user much more flexibility, provided they know what they are doing.
There are 'fused' implementations in tf, but they aren't the default, and I haven't tried them out yet...
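To make the fused-vs-per-step distinction concrete, here's a minimal sketch in PyTorch (sizes and variable names are just illustrative): `torch.nn.LSTM` processes the whole sequence in one call (and dispatches to cuDNN's fused kernel on a GPU), while looping over `torch.nn.LSTMCell` does one time-step per call, like TF r1.2's default RNN `call`.

```python
import torch

seq_len, batch, hidden = 5, 3, 8
x = torch.randn(seq_len, batch, hidden)

# Whole sequence in one call; on GPU this hits cuDNN's fused LSTM kernel.
fused = torch.nn.LSTM(hidden, hidden)
out_fused, _ = fused(x)

# One time-step per call: each iteration is a separate op graph, so the
# framework can't fuse across steps.
cell = torch.nn.LSTMCell(hidden, hidden)
h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)
outs = []
for t in range(seq_len):
    h, c = cell(x[t], (h, c))
    outs.append(h)
out_stepped = torch.stack(outs)

print(out_fused.shape, out_stepped.shape)  # both torch.Size([5, 3, 8])
```

Both produce the same-shaped output; the difference is purely in how much work the framework can batch and fuse per kernel launch.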
EDIT: My bad, I did not see at the end of the article that 1-bit SGD is not enabled on Keras yet, so the performance wins are coming from somewhere else. Neato.
MSFT employees talk about LSTM gains on the original submission: https://news.ycombinator.com/item?id=14473255
> TensorFlow shared the training script for Inception V3, and offered pre-trained models to download. However, it is difficult to retrain the model and achieve the same accuracy, because that requires additional understanding of details such as data pre-processing and augmentation. The best accuracy achieved by a third party (Keras in this case) is about 0.6% worse than what the original paper reported. Researchers in the CNTK team worked hard and were able to train a CNTK Inception V3 model with 5.972% top-5 error, even better than the original paper reported!
This suggests that an improvement in accuracy is possible by switching to CNTK, which is why I included an accuracy metric from both frameworks (and also for sanity checking, as you note).
But it's worth noting that this code is all released:
It may be hard to replicate that across all platforms, though -- as an example, the distortions include using four different image resizing algorithms.
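As a small illustration of why this matters for reproducibility, here's a sketch using Pillow (library choice and filter set are my own, not from the Inception pipeline): the same image resized to the same target size with different resampling algorithms yields different pixels, so matching a preprocessing pipeline means matching the exact filter, not just the output size.

```python
import numpy as np
from PIL import Image

# A random "image" stands in for a real training example.
rng = np.random.default_rng(0)
img = Image.fromarray((rng.random((64, 64, 3)) * 255).astype("uint8"))

filters = {
    "nearest": Image.NEAREST,
    "bilinear": Image.BILINEAR,
    "bicubic": Image.BICUBIC,
    "lanczos": Image.LANCZOS,
}
resized = {name: np.asarray(img.resize((32, 32), f))
           for name, f in filters.items()}

# Same input, same target size, four different pixel-level results.
for name in ("bilinear", "bicubic", "lanczos"):
    same = np.array_equal(resized["nearest"], resized[name])
    print(name, "matches nearest:", same)
```

Multiply that by crop strategy, interpolation at eval time, and augmentation order, and "the same preprocessing" becomes a surprisingly long checklist.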
Some of it was true preprocessing, i.e., cleaning up the ImageNet data. I wrote a bit about that here: https://da-data.blogspot.com/2016/02/cleaning-imagenet-datas...
(tl;dr - there are some invalid images and bboxes, etc., and some papers chose to deal with the "blacklisted" images differently.)
See for example the TF vs Torch implementations of Pix2Pix https://github.com/affinelayer/pix2pix-tensorflow (at the bottom of the page).
The reasons for this are many: different order of operations, failure to propagate random seeds, different order of processing files, etc. I don't think there is a structured study of this, so that would be a great thing to do!
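The "different order of operations" point is easy to demonstrate in isolation (a toy sketch, not from any framework): floating-point addition isn't associative, so merely summing the same numbers in a different order can change the result in the last bits, which is one reason two runs of the same model can drift apart.

```python
import random

# The same 100k values, summed forwards and backwards.
random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(values)
backward = sum(reversed(values))

diff = abs(forward - backward)
print(diff)  # tiny, but typically nonzero
```

A last-bit discrepancy like this is harmless on its own, but fed back through thousands of training steps it compounds, which is why bitwise reproducibility requires pinning down operation order, not just seeds.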
I did go through the rigmarole of building TensorFlow for GPU training in GCE (installing NVIDIA drivers, cuDNN, etc.), and Docker would definitely have been a boon! Or it would be nice if GCE had an image marketplace like AWS.