Maybe I'll give TF another try, but right now I'm really liking PyTorch. With TensorFlow I always felt like my models were buried deep in the machine and it was very hard to inspect and change them, and if I wanted to do something non-standard (which for me is most of the time) it was difficult even with Keras. With PyTorch, though, I connect things however I want, write whatever training logic I want, and I feel like my model is right in my hands. It's great for research and proofs-of-concept. Maybe for production too.
TF's deprecation velocity was way too high for my taste. Things we wrote would stop working randomly with their updates. I feel much the same as you about the models being "buried too deep" in their (ever-changing) machine. I much preferred how easy it was to hack Caffe V1 (once you got past the funky names, etc).
These days, I really like MXNet. Torch was a disaster, but PyTorch is much better. It's not bad in production, definitely my #2.
Maybe I'm not up to speed with the latest PyTorch, but to me Keras feels much more natural. In Keras, if you want to define a deep learning network, you just do that: you specify the first layer, the second layer, etc., then you calibrate over some test and validation samples, using a certain flavor of gradient descent, for a given loss function. In PyTorch, you have to define a class, with a constructor, some method called "forward", I don't know, maybe if I follow an example to the end I'll get the hang of it. My problem is that I don't want to write object-oriented programming, I want to do machine learning. Keras doesn't force me to know what a class is, or what it means to inherit from nn.Module, or that a method in Python needs 'self' as its first parameter. PyTorch, at least in the examples I saw online, wants me to do just that, and that's a turnoff.
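To show what I mean, here's the same tiny two-layer classifier in both APIs (a minimal sketch; the layer sizes are made up):

    # Keras: declare the layers in order, then compile.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

    # PyTorch: subclass nn.Module, create layers in __init__,
    # then wire them together in forward.
    import torch
    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(10, 64)
            self.fc2 = nn.Linear(64, 2)

        def forward(self, x):
            return self.fc2(torch.relu(self.fc1(x)))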
On the other hand, in Keras I can't (easily) change the architecture of a learner after I've defined it. I can't prune some nodes and split others; maybe that's easy in PyTorch. If that's the case, I'll take a second look. Until then, I'm really tempted to invest some time in MXNet when I have a chance, as the book "Dive into Deep Learning" appears to be quite good.
in Keras I can't (easily) change the architecture of a learner after I've defined it
I'm not sure what you mean here, because it's precisely PT that lets you change the architecture after you define it, while TF/Keras uses a static precompiled graph. That's now changing with eager mode, but it used to be the main advantage of PT.
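For example, in PyTorch the graph is rebuilt on every forward pass, so it can depend on the data (a minimal sketch; the random depth is just for illustration):

    # A dynamic network: the graph depth varies from one forward pass to
    # the next, which a static precompiled graph can't express directly.
    import random
    import torch
    import torch.nn as nn

    class DynamicNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(8, 8)

        def forward(self, x):
            for _ in range(random.randint(1, 4)):  # depth chosen at run time
                x = torch.relu(self.layer(x))
            return x

    DynamicNet()(torch.randn(8))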
That was what I heard too, that in PyTorch you can adaptively build your graph. In Keras you could in principle emulate this in a brute-force way: you define the graph, train it, save the weights, analyze them, decide which nodes to remove and which to add, and then rebuild the graph from scratch, reusing the adjusted weights from the previous round (roughly the cycle sketched below). I say in principle; personally I never did it. Some people hint that in PT this is a breeze, and I'd be curious to see some example. If you have any links, that would be much, much appreciated.
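For what it's worth, the brute-force Keras cycle I mean might look roughly like this (a sketch, assuming a single hidden layer; the pruning threshold is made up):

    # Train, find near-dead hidden units, rebuild a smaller model
    # from scratch, and carry the surviving weights over.
    import numpy as np
    from tensorflow import keras

    def build(hidden):
        return keras.Sequential([
            keras.layers.Dense(hidden, activation="relu", input_shape=(10,)),
            keras.layers.Dense(1),
        ])

    model = build(32)
    model.compile(optimizer="adam", loss="mse")
    # ... model.fit(...) here ...

    w1, b1 = model.layers[0].get_weights()
    w2, b2 = model.layers[1].get_weights()
    keep = np.abs(w2).sum(axis=1) > 1e-3      # hidden units that still matter

    model2 = build(int(keep.sum()))           # rebuild with fewer units
    model2.layers[0].set_weights([w1[:, keep], b1[keep]])
    model2.layers[1].set_weights([w2[keep, :], b2])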
In a large NN a lot of nodes end up being useless. You can remove them without degrading the performance of the NN.
To be more concrete, here's a link [1] to Google's neural network playground. I built a network with 5 layers and 37 hidden nodes. It trains quite well, but the last layer has 2 nodes that contribute very little weight to the final output. The app allows you to change their weight (you click on the corresponding line and edit). If you change the weight to zero (effectively dropping the node), the classifier, if anything, gets better. My guess is that you can easily remove about half of the nodes. Conversely, if you look at the nodes with the highest outgoing weights, you can in principle clone them and halve the outgoing weight for both the original and the clone. With this configuration the network output is exactly the same, but if you continue training it allows more flexibility, as the original and the clone are allowed to diverge.
These kinds of operations are not possible in Keras. Are they in PyTorch? If not, then what kind of dynamic graphs are possible? What can one do with PyTorch that one can't do with Keras?
Your first example is commonly referred to as network pruning, and it's typically used to compress a model (the nodes are still there, but a network with sparse weights can be stored in compressed form). It's also possible to remove the nodes themselves, rather than individual weights. This is typically done at the filter level (for convnets), so that entire filters are removed.
The second example (cloning the nodes) is typically done to improve network robustness, by preventing important nodes from becoming a single point of failure.
To do either one during training you need dynamic graphs, so either PyTorch or TF eager mode. Here's one filter pruning implementation: https://github.com/jacobgil/pytorch-pruning
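And the two playground operations from upthread, done by hand in PyTorch (a minimal sketch; the layer sizes are made up):

    import torch
    import torch.nn as nn

    fc1 = nn.Linear(10, 5)   # hidden layer
    fc2 = nn.Linear(5, 1)    # output layer

    with torch.no_grad():
        # "Prune" hidden unit 3: zero its incoming and outgoing weights.
        fc1.weight[3].zero_(); fc1.bias[3].zero_()
        fc2.weight[:, 3].zero_()

        # "Clone" hidden unit 0: widen both layers by one unit, copy the
        # unit, and halve its outgoing weight for original and clone, so
        # the network output stays exactly the same.
        new_fc1 = nn.Linear(10, 6)
        new_fc1.weight[:5] = fc1.weight; new_fc1.bias[:5] = fc1.bias
        new_fc1.weight[5] = fc1.weight[0]; new_fc1.bias[5] = fc1.bias[0]

        new_fc2 = nn.Linear(6, 1)
        new_fc2.bias.copy_(fc2.bias)
        new_fc2.weight[:, :5] = fc2.weight
        new_fc2.weight[:, 0] = fc2.weight[:, 0] / 2
        new_fc2.weight[:, 5] = fc2.weight[:, 0] / 2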
Might give it another try, but my latest incursion into the TensorFlow universe did not end pleasantly. I ended up recoding everything in PyTorch; it took me less than a day to do the stuff that had taken me more than a week in TF. One problem is that there are too many ways to do the same thing in TF, and it's hard to transition from one to the other.
Yeah, the only reason to use TF is really its deployment friendliness. If PyTorch addressed that more comprehensively, there'd be no good reason to use TF at all. For research PyTorch blows TF out of the water completely, and it's been that way for years, ever since it came out.
For me it's deploying to mobile, mostly. There's ONNX, but it doesn't seem terribly mature, it doesn't support some of the common ops, and e.g. FB's own Caffe2 doesn't run it natively. There's also no mature tooling to produce quantized models. TF remains the only real option for quantization-aware training, or even for easy post-training quantization.
Specifically, my life would be a lot easier if I could save a mobilenet-style model to e.g. ONNX or some other static graph format that does not require model code in order to load the weights. I'd then like to be able to load this saved model directly into something on Android and iOS that can use the GPU and DSP present on the chip, with minimal extra futzing.
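The export half of that is already straightforward, at least for standard ops (a sketch using a stock torchvision MobileNet; it's the on-device loading and quantization side that's missing):

    # Trace a MobileNet and save it as a static ONNX graph that can be
    # loaded without the original model code.
    import torch
    import torchvision

    model = torchvision.models.mobilenet_v2(pretrained=True).eval()
    dummy = torch.randn(1, 3, 224, 224)   # example input for tracing
    torch.onnx.export(model, dummy, "mobilenet.onnx",
                      input_names=["image"], output_names=["logits"])

By contrast, TF's post-training quantization path really is a couple of lines (again a sketch, with a toy stand-in model):

    # Weight quantization via the TFLite converter.
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(10,))])
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    open("model.tflite", "wb").write(converter.convert())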
One of the major benefits of TF 2.0 is apparently the capability to quickly deploy to TPUs with a single parameter change. (I haven't tried it; I've just followed the marketing.)
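The marketed path looks roughly like this (a sketch I haven't run; the TPU address is a placeholder):

    # TF 2.x distribution-strategy route: the model code stays the same,
    # only the strategy (and the cluster address) changes.
    import tensorflow as tf

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="grpc://...")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)

    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(10,))])
        model.compile(optimizer="sgd", loss="mse")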
AFAIK, this is still being worked on for PyTorch via XLA, but it's not quite there yet.
I've found that with TF in general you can only go "quickly" if everything works. If anything is busted you're more or less screwed, because it's so opaque. In contrast, PyTorch lets you inspect whatever you want by setting a pdb breakpoint, and when it gives you errors you can usually figure out what's wrong without debugging. The importance of this cannot be overstated.
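That workflow is literally just a breakpoint in forward (a minimal sketch):

    # Pause mid-forward and poke at the live tensors.
    import pdb
    import torch
    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(10, 2)

        def forward(self, x):
            h = self.fc(x)
            pdb.set_trace()   # inspect h.shape, h.mean(), x, self.fc.weight, ...
            return h

    Net()(torch.randn(4, 10))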
This is one thing that confuses me. Why is Keras still a separate brand? Why isn't everything under the plain tensorflow namespace, instead of having to do tf.keras all the time? I really wish TF just had one API and just one thing to learn.
I doubt it; it's more likely that they're creating higher-level abstractions atop the lower-level ones and advertising/documenting the higher-level ones.
Keras is part of the problem for me. It's rather rigid and hard to get around. It works super well for the regular use case; on the other hand, when you want to start doing custom stuff, it's hell.
I am very happy that Google has realized the importance of usability. Hopefully that comes with concomitant improvements in the TF documentation, which, while thorough, is completely unusable and lacks good examples for complex things.
I have spent a few evenings playing with early releases. I like the ‘turtles all the way down’ idea, but I am waiting to see more mature releases. I have spent much more time with TensorFlow.js, which works well and has many great examples.
From what I understand this is mostly because they hired the Swift guy.
I understand the benefits compared to Python (although I would have preferred Go or Kotlin). But what happens when the guy eventually moves on in a year or two?
That is not what their employees talk about at CppCon, LLVM conferences, ISO C++ meetings, or the Java Language Summit (yes, they come around in spite of Android).
Nice to see the project moving along. I'm just getting started with the basics for a wayfinding application, so I'll probably start off with version 2 then.
Hopefully by the time stable comes around I'll be near production ready as well.
A bit off-topic, but does TF or PyTorch work nicely with AMD GPUs?
I'd rather not have to deal with Nvidia's blob drivers if at all possible.