I was also told that doing it the real way using Tensorflow would be the way to go, and I would agree with that sentiment if my problem were Google scale, which it wasn't. In fact, I would argue that most workloads around the world are not Google scale, and neither are most Google workloads.
This attitude of "real deep learning engineers use Tensorflow" is an unhelpful way of saying "I agree that the API is unreadable, but I've invested so much time in the ecosystem that I refuse to see its usability problems". Kind of reminds me of assembly programmers who thought C wasn't for l33t 10x pwner programmers.
The problem with TensorFlow is mainly that you, as a user, have to build a data-dependency graph. This is something a C compiler can do very well, but Python is not so suitable for that.
So, in my view, TensorFlow chose the wrong substrate for their "more efficient" library. Instead, they should have developed their own language, where the whole data-flow graph determination could be implicit, and not a concern for the programmer.
However, computing a data-flow graph as you go (handled by the library, not the user), as I think some libraries do, is quite a good approach, since the overhead is quite small (percentage-wise) compared to the large tensor operations that can be performed in highly optimized code.
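Roughly, the as-you-go style means the graph is recorded while ordinary code runs; a minimal PyTorch sketch (PyTorch being one library that works this way):

import torch

# The graph is recorded as this code executes; the user never builds it explicitly.
x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()
y.backward()     # gradients computed from the recorded graph
print(x.grad)    # tensor([2., 2., 2.])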
You just described Swift for TensorFlow.
However, I'd like to see some numbers on how more efficient it is to build a graph in advance, given that the lion's share of the computations will be in tensor math anyway (which can be heavily optimized, and is independent of the graph).
This one? https://www.tensorflow.org/api_docs/swift/
Or, rather, it is hard, but the difficulty is in getting an intuition for which part of this weird multi-layer net is producing this weird behavior, whether it's an artefact or something interesting, whether the connectivity is complete, and whether I should change the learning rate and activation functions.
The real reason to use Tensorflow is the same reason you might use a Go framework instead of Rails: in your heart you have this hope that this thing will one day grow into a really large project and support lots of people and that will be easier with this scalable, optimized code.
It's not even that you'll hit Google scale, it's that you'll hit popular scale and still serve the whole thing out of your Digital Ocean droplet.
Are you saying that model inference is slower or less efficient for a model built and trained in Keras, than the same model architecture built directly in tensorflow?
I do think that pure TF would be easier to scale up over multiple servers etc., but that's only because I don't know how it would work in Keras. Maybe it's easy.
Do you think I'll be able to use DL?
Use the right tool for the job. Keras can get you to a working model faster. I am not sure what the current situation is, but in the past it was not possible to dump and freeze Keras' Tensorflow graphs. This can be a problem if you want to embed a model in a non-Python application.
This attitude of "real deep learning engineers use Tensorflow"
Real engineers use whatever they need to use. But I think that you are overstating the difficulty of Tensorflow. Over the last 6 months, we have hired a couple of students for a research project. Since we standardized on Tensorflow, they had to implement new models in Tensorflow. All of them were up to speed in Tensorflow pretty quickly (they mostly do RNNs and seq2seq learning).
You can get a direct reference to the graphs if you want, that will let you do anything tensorflow lets you do. I think this is what you want:
import tensorflow as tf
from keras import backend as K

# This assumes your Keras `model` is ready to be called with .predict()
sess = K.get_session()
graph_def = sess.graph.as_graph_def()
# Keep only the nodes needed to compute your outputs
output_node_names = [out.op.name for out in model.outputs]
frozen_graph = tf.graph_util.convert_variables_to_constants(
    sess, graph_def, output_node_names)
encoded_frozen_graph = frozen_graph.SerializeToString()
Not to mention you can more easily use channels-first data, quantize to FP16/INT8 more easily, and export to ONNX for use w/ Tensor-RT and/or Intel Nervana.
This was never true.
There was no obvious Keras API for this, but you could build a model with the Keras API, then use the TF API to save it. The inference API would be the TF API (i.e. you'd need to find the names of all your input and output tensors and use those with Session.run).
Except that this was true. I do not remember the exact details, because this was the end of 2015 or beginning of 2016. But dumping/freezing definitely failed on some graphs constructed with Keras.
There was no obvious Keras API for this, but you could build a model with the Keras API, then use the TF API to save it.
That was easy to figure out. Read the backend implementation and you can see how you can get the graph definition, etc.
It's funny because this is the same attitude C/C++ programmers have towards developers using other languages now...
For all of Keras/PyTorch/Tensorflow, you'll need to learn the API - but if you have any ML background, that should be straightforward.
Though, debugging matters. In TF it is easy to get errors and spend a lot of time searching for them. In PyTorch it is straightforward. It matters the most when the network, or cost function, is not standard (think: YOLO architecture).
E.g. when I wanted to write a differentiable decision tree, it took me way longer in TF (which I already knew) than with PyTorch, with its tutorial open on another pane.
See e.g. [1] and [2] linked below.
For model ensembling, it's even easier. After training, in Keras you could simply load your multiple models and create a new Model() object that does nothing but use a merge layer (with mode set to averaging) to average across multiple input models, even if the models share layers or have other crazy constraints. Writing that final ensemble is extremely easy in Keras.
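Sketched out (assuming Keras 2, where the averaging merge is the average/Average layer; the snapshot paths and input shape here are made up):

from keras.models import Model, load_model
from keras.layers import Input, average

# Hypothetical snapshots; the models are assumed to share the same input shape
members = [load_model(p) for p in ['m1.h5', 'm2.h5', 'm3.h5']]

inp = Input(shape=(224, 224, 3))   # assumed input shape
ensemble = Model(inputs=inp, outputs=average([m(inp) for m in members]))
# ensemble.predict() now returns the average of the member predictions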
In my experience researching and productionizing very deep Keras models for an image processing use case that has moderately tight performance constraints, Keras has proved to scale extremely well and the code remains dead simple the whole time.
[1]: < https://blog.keras.io/keras-as-a-simplified-interface-to-ten... >
[2]: < https://www.tensorflow.org/programmers_guide/estimators#crea... >
What do you use to orchestrate distributed training in Keras?
This is a bit of a mistaken question, because you would not "convert" a DataGenerator into an estimator input. Instead, you can just wrap the DataGenerator in a simple function that lazily outputs the next batch of training examples. Input functions for Estimators are just functions that accept no arguments and produce a 2-tuple, whose first component is a dictionary of named inputs and whose second component is the target values. You can write your own wrapper function that consumes from a DataGenerator and normalizes the output to that format. I'm sure there will be a helper function to do this automatically in the future, but it's about as easy as can be to just wrap it in a function anyway.
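A rough sketch of what I mean, using tf.data.Dataset.from_generator (the generator, feature name, shapes and dtypes here are hypothetical):

import tensorflow as tf

def batches():
    # my_keras_generator is assumed to yield (batch_x, batch_y) numpy arrays
    for batch_x, batch_y in my_keras_generator:
        yield {"images": batch_x}, batch_y

def input_fn():
    # Estimator input_fn: no arguments, returns (features dict, labels)
    return tf.data.Dataset.from_generator(
        batches,
        output_types=({"images": tf.float32}, tf.int64),
        output_shapes=({"images": tf.TensorShape([None, 224, 224, 3])},
                       tf.TensorShape([None])))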
> "add class weights, custom loss functions"
This too seems mistaken, because this is part of the compiled Keras model, before ever converting anything to a TensorFlow Estimator. You can use whatever you want for this: the Keras Model.compile function accepts dictionaries for loss and loss_weights, as well as custom add_loss usage in your own layers (even pass-through layers that don't affect the computation graph).
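For example (the output names, weights and my_custom_loss here are made up):

model.compile(optimizer='adam',
              loss={'main_out': 'categorical_crossentropy',
                    'aux_out': my_custom_loss},      # any callable works
              loss_weights={'main_out': 1.0, 'aux_out': 0.2},
              metrics=['accuracy'])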
> "plug-in various Keras-based callbacks as well"
This is admittedly slightly harder, but I think it's a little bit of an unfair question because Keras offers far more functionality in its Callbacks than TensorFlow offers with predefined hooks. "Penalizing" Keras because TensorFlow offers less functionality doesn't seem right.
Either way, this is also not too hard. For any Callback you want to use from Keras, you basically just write a tiny wrapper class that subclasses from session_run_hook.SessionRunHook from tensorflow, and then maps the TensorFlow naming conventions, like "begin" or "before_run" etc., to wrap the equivalent method from the Keras callback, like "on_train_begin", or "on_epoch_end".
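Something along these lines (a rough sketch; the method mapping is simplified and epoch boundaries aren't handled, since TF hooks have no epoch notion):

import tensorflow as tf

class KerasCallbackHook(tf.train.SessionRunHook):
    """Wrap a Keras Callback so it can be passed to an Estimator as a hook."""
    def __init__(self, keras_callback):
        self._cb = keras_callback

    def begin(self):
        self._cb.on_train_begin()

    def before_run(self, run_context):
        self._cb.on_batch_begin(batch=None)

    def after_run(self, run_context, run_values):
        self._cb.on_batch_end(batch=None)

    def end(self, session):
        self._cb.on_train_end()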
The bigger point is that this headache is because of TensorFlow. Both because TF chose a really silly class design for the SessionRunHooks thing, making automatic conversion from Keras (which has the more established set of pre-existing callbacks) harder for no good reason, and also because TensorFlow lacks functionality that Keras gives you for free.
For orchestration, my team just uses a simple GPU cluster where the native device placement primitives with TensorFlow allow us to scale to as many GPUs as we've needed (max in the dozens).
For distributing and orchestrating over larger clusters, Keras provides some good alternatives right on its own FAQ page:
< https://keras.io/why-use-keras/#keras-has-strong-multi-gpu-s... >
In the end, I would not claim you can immediately translate every complex feature of Keras, like deep custom callbacks or something, over to TensorFlow ... but that's usually not a big deal. Most times, you just want to port a fairly standard model to the Estimator API, and for this, it "just works" directly and is easy to use for local, small-ish clusters of GPUs.
When you have a much rarer problem that needs a huge GPU cluster, then use the other suggestions like dist-keras or Horovod, or write your own simple map-reduce-ish wrapper to put data on different nodes and deploy e.g. a containerized training application.
Also people need to definitely keep in mind that most of the limitations are TensorFlow's own fault for not designing things to be compatible with heavily used Keras features like Callbacks out of the box. TensorFlow has a history of doing this, and has been very developer-unfriendly in this way even when it has no downside or impact on performance or anything. The core TensorFlow designs suffer from an unfortunate "not invented here" kind of philosophy, even when dealing with Keras.
You probably know those simple generators aren't recommended for use with Keras; instead keras.utils.Sequence is preferred, due to (from the Keras docs): "Sequence are a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch which is not the case with generators."
I couldn't see any equivalent of this for estimators, sadly, and wrapping it up in a naive generator seemed like a functionality downgrade.
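(For reference, a minimal Sequence is only a few lines - this sketch assumes in-memory numpy arrays x and y:)

import numpy as np
from keras.utils import Sequence

class ArraySequence(Sequence):
    def __init__(self, x, y, batch_size=32):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[sl], self.y[sl]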
> because this is part of the compiled Keras model, before ever converting anything to TensorFlow Estimator
Right, you specify a loss function before compiling; however, if it is a custom one and you for some reason need to reload a model snapshot (i.e. resuming training), you need to provide it separately or the loading fails. I haven't found any docs on this. Imagine your training optimizer automatically generating loss functions by means of function composition, e.g. you put in a mix of +-*/, log, exp, tanh etc. based on past training experience of what helped in individual cases/literature, then take 1000s of these loss functions and push them to a large cluster where they are scored on how well they performed, keeping only the best-performing ones.
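For reference, reloading seems to need the custom function passed back in via custom_objects (a sketch with hypothetical names):

from keras.models import load_model

def my_custom_loss(y_true, y_pred):
    ...  # same definition used at compile time

model = load_model('snapshot.h5',
                   custom_objects={'my_custom_loss': my_custom_loss})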
Class weights are specified in fit_generator(), not at compile time; again, here I couldn't find any description of how to convert Keras' weight dictionary to what TensorFlow needs.
> Callbacks... "Penalizing" Keras because TensorFlow offers less functionality doesn't seem right.
The thing here is that some of those callbacks are mandatory for training to converge, e.g. decreasing the learning rate, escaping plateau situations, computing various stats that aren't provided by Keras (beyond loss/accuracy; you might want F1, Fleiss/Cohen's kappa, Matthews correlation coefficient, AUC ROC etc.) that might be decisive for keeping/discarding a model; then also multi-GPU callbacks; some people even use callbacks to perform the whole distributed computation as well. In my examples, if I remove any of those callbacks, my models won't achieve any kind of usability, but with those callbacks I match world-class results. I couldn't find any non-insane way to map them to TensorFlow prior to our conversation.
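(A typical example of what I mean - a few-line Keras callback reporting F1 each epoch, assuming a held-out x_val/y_val and a binary model:)

from keras.callbacks import Callback
from sklearn.metrics import f1_score

class F1Callback(Callback):
    """Sketch: report F1 on a held-out set after every epoch."""
    def __init__(self, x_val, y_val):
        super(F1Callback, self).__init__()
        self.x_val, self.y_val = x_val, y_val

    def on_epoch_end(self, epoch, logs=None):
        preds = (self.model.predict(self.x_val).ravel() > 0.5).astype(int)
        print("epoch %d - val F1: %.4f" % (epoch, f1_score(self.y_val, preds)))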
As I mentioned, I have a very large cluster, each node with multiple GPUs, so I need orchestration both of hyperparameters/loss functions across nodes and, within each node, across multiple GPUs.
The page from Keras you mentioned was precisely my starting point, and of those, tf.Estimator seemed the least devops-intensive way to go (Horovod needs MPI, and CERNDB/Keras needs Spark).
I'll take a deeper look into SessionRunHook you mentioned - thanks! ;-)
> "computing various stats that aren't provided by Keras.."
It seems like you have this backward. Keras provides the easy interface to create the custom callbacks. That's why you can create extra convergence metrics, etc., that are far harder to use if implementing in pure TensorFlow. The part where TensorFlow is specifically lacking functionality is in its ability to handle these callbacks (both pre-built in Keras or user-defined). I've had good success with the solution I mentioned with SessionRunHooks, but still, it is a terrible design choice by the TensorFlow people to create this in a way that is not directly compatible with all the work Keras had done.
> "from those tf.Estimator seemed the last devops-intense way to go (Horovod needs MPI and CERNDB/Keras Spark)."
Just based on how poorly designed the tf.Estimator API is though, I'm not actually sure the other methods would require less devops or less investment. In some cases for standard models, yes. But if you've already committed to using Keras for very customized situations, then going back to the dark ages with native TensorFlow will often be much more work and more error prone than using the other solutions. The Horovod dependence on MPI in particular is fairly simple and needs little management. Most people having done ML / stats PhDs will already have managed far more difficult situations with MPI previously anyway, or at least have the Linux skills needed. The point is you have a fighting chance, whereas deciphering undocumented and badly designed corners of TensorFlow often leaves you with no fighting chance.
You can convert the Keras model to TF pretty easily if you need to, as long as you use the TF backend. I did this, and converted the string preprocessing in TF so the model could be used in TF serving taking only the string as input.
If you have experience with learning, or teaching Deep Learning with PyTorch or Keras, we’d love to hear your thoughts about them.
My adviser decided (wisely) that we all needed to learn NN, and we settled on Tensorflow. That went... poorly. I've told this before: the Seq2Seq tutorial was designed for an older version of TF, and it triggered a bug that was not fixed because that way to do Seq2Seq was deprecated and a new tutorial was coming "soon". The "tutorial" was also just a code dump with barely any comments.
Eventually we had new people coming in with even less theoretical background than ours (we had read papers for at least 6 months), and that's when we realised it would not work at all. So we organised a 1-week hackathon with PyTorch, and we've been using it ever since.
That sounds interesting, are you at liberty to say what you are doing?
I also work on production systems built around deep ResNet architecture for computer vision tasks, and my team does this using solely Keras, including when we do distributed training.
Just adding this thought in case anyone mistakenly thinks you have to start out all-in using only TensorFlow because you might expect to need distributed training at some point.
: < https://news.ycombinator.com/item?id=17416904 >
I find it less useful to see comparisons of "top 50 deep learning frameworks for 2018" which include esoteric stuff that is only there for sake of completeness.
This way a person branching out from Tensorflow (I assume it's Tensorflow) knows which two frameworks to try out, and what to look for.
Though, as I see, it does load - sometimes with a considerable delay (5-10 sec).
Optionally, you can also use tf.keras in combination w/ eager execution, enabling you to write code like this: https://colab.research.google.com/github/tensorflow/tensorfl...
One suggestion to the authors: the benchmark figures are interesting, but I wish you had shown CPU only results also. At work, I have all the GPU resources I need but for my home projects, which are all NLP deep learning experiments, I usually rent a many core large memory server with no GPUs (GPUs seem to speed up RNNs less than other model types).
First figure from our paper: how the LSTM with a twist allows for the equivalent speed of a plain convnet by running efficiently in parallel on GPUs, like image-processing convnets.
Best of all, as it's only an "LSTM with (these) twists", it's drop-in compatible with existing LSTMs but can get you a 2-17 times speed-up over NVIDIA's cuDNN LSTM - essentially speed equivalent to the TCN or WaveNet speed-up.
That's why Baidu implemented QRNN in their production Deep Voice 2 neural text-to-speech (TTS) system.
This isn't to say TCN or QRNN is better, simply that it's dangerous to flat out say _no_ if you're not actually certain or don't correctly recall the underlying information.
Disclaimer: I'm the co-author of the QRNN.
Double disclaimer: The TCN paper cites the QRNN but decides not to test against it. They also show results over one of my datasets.
For trying out deep learning, or building on existing models, PyTorch or Keras may be easier to grasp. But when making new models that involve a lot of math, Theano/Tensorflow is more helpful IMO.
For using models it may not matter that much (though, again, read YOLO in TF and PyTorch and then decide which is cleaner :)).
For new models which go beyond a standard ConvNet/LSTM... well, PyTorch is heaven, Theano sounds like a torture.
YOLO is a quite standard feed-forward model in my opinion. I mean the math part, which I am more concerned with.
I have never used Theano before; my impression is that Tensorflow followed its static graph approach.
If data and model issues are mixed, I often resort to line-by-line debugging to isolate the real problem, which often takes more time.
I mean, just looking at the "getting started, 30 seconds to Keras", there are so many magic strings and options. Of course, if one is well-versed in this domain, they make sense. But it's hard to grasp, and Keras is supposed to be the high-level one.
Why, just why.
Despite all this, I wholeheartedly recommend this course, it demystified DL for me.
I think overall if your goal is to use a preexisting architecture to get quick results fastai is a great point to start. If you want to build your own architecture, reach one level of abstraction lower.
Edit: I liked this PyTorch youtube series quite a bit: https://www.youtube.com/watch?list=PLlMkM4tgfjnJ3I-dbhO9JTw7...
but I can guess that topMovieIDx is the index of the top movie, V converts to a vector(?), dunno about m.ib, and to_np converts to a numpy array.
I do believe C# has some machine learning libraries, but afaik they aren't anywhere near the level of Tensorflow or Keras.
CNTK has a C++ API but the documentation is unfortunately just "read the header file" https://docs.microsoft.com/en-us/cognitive-toolkit/cntk-libr...
Also Python obv, or use it as a backend to Keras (in R).
Also, with eager execution, Tensorflow has become much more accessible to new users.
Having said that, the world would likely be a better place if everyone just used PyTorch :)
For more advanced training for business or Kaggle competitions (version controlling of code and results, advanced charts): https://neptune.ml/
Even defining a custom deep CNN for multiple image prediction tasks (so, deep and custom architecture), Keras holds up well — and creating your own layers in Keras is very easy.
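For instance, a custom layer only needs build/call/compute_output_shape - a minimal sketch (the layer itself is made up, just to show the pattern):

from keras import backend as K
from keras.layers import Layer

class ScaledDense(Layer):
    """Toy example: a dense layer with a learnable output scale."""
    def __init__(self, units, **kwargs):
        self.units = units
        super(ScaledDense, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel',
                                      shape=(int(input_shape[-1]), self.units),
                                      initializer='glorot_uniform')
        self.scale = self.add_weight(name='scale', shape=(1,),
                                     initializer='ones')
        super(ScaledDense, self).build(input_shape)

    def call(self, inputs):
        return self.scale * K.dot(inputs, self.kernel)

    def compute_output_shape(self, input_shape):
        return input_shape[:-1] + (self.units,)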
# Example of using Sequential (the layer sizes here are just illustrative)
import torch.nn as nn
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
They are simple and basic; the difference between 5 lines of code and 20 lines doesn't matter much. You spend very little time actually coding these layers. Understanding the model and the default parameters used underneath is more important.
It would be nice to see some examples with skip-layers, weight sharing etc. Do you have to drop the sequential model to do them or not?
tl;dr: not nearly as popular (which means: fewer tutorials, less documentation, fewer examples, less integration with other systems, less community support for development or discussions)
Sure, all frameworks do have some goal, and once one is confident in DL, they may be a good choice. As you can see from the plots there, MXNet is very fast for some applications.
Currently I've been training a CNN model in Keras with good success, and using custom scripts to port it to a TensorFlow model. The .h5 file from Keras helps a lot with this step.
Next step is compiling a shared Tensorflow library so I can deploy the trained model in C++ (project requirement) and this has been a pain in the ass, regardless of framework...
and.... I'm also using React Native because I dont like Apple and hopefully I can use a friends computer the moment I compile for iphone.
If I had to summarise the frameworks in a few words, they would be: Keras for speed, Tensorflow for production, Pytorch for research.
I don't know much about Python :/
I am planning to organize a TensorFlow.js bootcamp, but here it is more difficult (as data preprocessing, and debugging in general, is way more difficult in JS than in Python).
I just had the hope, that integrating this into existing JS code-bases would be easier with TensorFlow.js :)