
Only a few months ago, people were saying that the deep learning library ecosystem was starting to stabilize. I never saw that as the case. The latest frontier for deep learning libraries is ensuring efficient support for dynamic computation graphs.

Dynamic computation graphs arise whenever the amount of work that needs to be done is variable. This may be when we're processing text, where one example is a few words and another is paragraphs of text, or when we're performing operations against a tree structure of variable size. The problem is especially prominent in certain subfields, such as natural language processing, where I spend most of my time.

PyTorch tackles this very well, as do Chainer[1] and DyNet[2]. Indeed, PyTorch's construction was directly informed by Chainer[3], though re-architected and designed to be even faster still. I have seen all of these receive renewed interest in recent months, particularly amongst researchers performing cutting-edge work in the domain. When you're working with new architectures, you want the most flexibility possible, and these frameworks allow for that.

As a counterpoint, TensorFlow does not handle these dynamic graph cases well at all. There are some primitive dynamic constructs, but they're inflexible and usually quite limiting. In the near future there are plans to make TensorFlow more dynamic, but adding that in after the fact is going to be a challenge, especially to do efficiently.

Disclosure: My team at Salesforce Research use Chainer extensively and my colleague James Bradbury was a contributor to PyTorch whilst it was in stealth mode. We're planning to transition from Chainer to PyTorch for future work.

[1]: http://chainer.org/

[2]: https://github.com/clab/dynet

[3]: https://twitter.com/jekbradbury/status/821786330459836416




Could you elaborate on what you find lacking in TensorFlow? I regularly use TensorFlow for exactly these sorts of dynamic graphs, and it seems to work fairly well; I haven't used Chainer or DyNet extensively, so I'm curious to see what I'm missing!


When you say "exactly these sorts of dynamic graphs", what do you mean? TensorFlow has support for dynamic-length RNN unrolling, but that really doesn't extend well to any dynamic graph structure, such as recursive tree structure creation. Since the computation graph has a different shape and size for every input, such graphs are difficult to batch, and any pre-defined static graph is likely either excessive (wasting computation) or inexpressive.

The primary issue is that the computation graph is not built imperatively - you define it explicitly up front. Chainer describes this as the difference between "Define-and-Run" frameworks and "Define-by-Run" frameworks[1].

TensorFlow is "Define-and-Run": loops and conditionals end up needing to be defined and injected into the graph structure before it's run. This means there are "tf.while_loop" operations, for example - you can't use a "while" loop as it exists in Python or C++. This makes debugging difficult, as the process of defining the computation graph is separate from its usage, and it also restricts the flexibility of the model.
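For concreteness, here's a minimal sketch of what that looks like (not code from the thread; written against the TF 1.x-era API being discussed). Even a trivial five-step loop has to be expressed as graph operations rather than as a Python loop:

    import tensorflow as tf  # TF 1.x-era API

    i = tf.constant(0)
    x = tf.eye(3)
    w = tf.random_normal((3, 3))

    # The loop must be encoded as graph ops up front; a plain Python `while`
    # would only run at graph-construction time, not at execution time.
    cond = lambda i, x: i < 5
    body = lambda i, x: [i + 1, tf.matmul(x, w)]
    _, result = tf.while_loop(cond, body, [i, x])

    with tf.Session() as sess:
        print(sess.run(result))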

In comparison, Chainer, PyTorch, and DyNet are all "Define-by-Run", meaning the graph structure is defined on-the-fly via the actual forward computation. This is a far more natural style of programming. If you perform a for loop in Python, you're actually performing a for loop in the graph structure as well.
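By contrast, here's a minimal Define-by-Run sketch (again illustrative only, written in PyTorch style against the current API): the number of steps is simply the length of the input, and the autograd graph is built by an ordinary Python loop as it runs.

    import torch
    import torch.nn as nn

    cell = nn.Linear(10, 10)

    def forward(tokens):                  # `tokens`: a list of 10-dim tensors, any length
        h = torch.zeros(10)
        for t in tokens:                  # an ordinary Python for loop builds the graph
            h = torch.tanh(cell(t) + h)
        return h

    out = forward([torch.randn(10) for _ in range(7)])  # 7 steps for this input
    out.sum().backward()                  # backprop through exactly the 7 steps that ran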

This has been a large enough issue that, very recently, a team at Google created "TensorFlow Fold"[2], still unreleased and unpublished, to handle dynamic computation graphs. In it, they specifically tackle dynamic batching for the tree-structured LSTM architecture.

If you compare the best example of recursive neural networks in TensorFlow[3] (quite complex and finicky in the details) to the example that comes with Chainer[4], which is perfectly Pythonic and standard code, it's pretty clear why one might prefer "Define-by-Run" ;)

[1]: http://docs.chainer.org/en/stable/tutorial/basic.html

[2]: https://openreview.net/pdf?id=ryrGawqex

[3]: https://github.com/bogatyy/cs224d/tree/master/assignment3

[4]: https://github.com/pfnet/chainer/blob/master/examples/sentim...


Ah, fair enough, I see your point. An imperative approach (versus TensorFlow's semi-declarative approach) can be easier to specialize to dynamic compute graphs.

I personally think the approach used in TensorFlow is preferable - having a static graph enables a lot of convenient operations, such as storing a fixed graph data structure, shipping models that are independent of code, and performing graph transformations. But you're right that it entails a bit more complexity, and that implementing something like recursive neural networks, while totally possible in a neat way, ends up taking a bit more effort. I think that the trade-off is worth it in the long run, and that the design of TensorFlow is very much influenced by the long-run view (at the expense of immediate simplicity...).

The ops underlying TensorFlow's `tf.while_loop` are actually quite flexible, so I imagine you can create a lot of different looping constructs with them, including ones that easily handle recursive neural networks.

Thanks for pointing out a problem that I haven't really thought about before!


I'm intrigued by PyTorch but I am really having a hard time grokking what you mean by the whole "that really doesn't extend well to any dynamic graph structure, such as recursive tree structure creation. Since the computation graph has a different shape and size for every input, such graphs are difficult to batch, and any pre-defined static graph is likely either excessive (wasting computation) or inexpressive."

Would you mind providing a concrete example to relate to? Again, I'm intrigued by PyTorch, so I want to learn more about it vs TF...


You can build the symbolic computation graph and do the computation at the same time, while defining the network architecture, thus gaining the ability to be "dynamic" while also supporting advanced features through the symbolic representation you build on the side.

In fact, with DyNet or PyTorch, you still need to bookkeep the graph you traversed (the tape) because no one is doing forward-mode AD. If that's the case, why not have a good symbolic computation graph library and build the dynamic features on top of it? (I am not saying TensorFlow is a good symbolic computation graph library to build upon, just arguing that starting with a define-compile-run library doesn't necessarily hinder your ability to support dynamic graphs.)


The biggest hindrance to doing this is language constructs that cannot be expressed, or are only inconveniently expressed, in the symbolic graph, such as Python's if vs tf.cond and for vs theano.scan, or conditioning on arbitrary Python code (not tensor operations). So building an eagerly evaluating symbolic graph framework that is allowed to do arbitrary things would mean that you would (to an extent) reimplement the language you are working with.


Let's assume TensorFlow has basic symbolic computation graph expressiveness. What you would do is build a symbolic representation while executing your graph inline; your symbolic representation doesn't need to have any control structure - it is simpler than that. You execute the while loop in Python as usual, and your symbolic representation won't have tf.while_loop in it at all - it will simply be the execution you performed so far (e.g. a matrix multiply, five times).

Once you have a reasonable symbolic computation graph library, you don't need to explicitly build a "tape", because the symbolic representation already records the order of execution, and reverse-mode AD and even graph optimizations (applying CSE, etc.) come naturally as well.
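A toy sketch of that idea (hypothetical code, not any existing library): execute eagerly while recording each operation into a flat trace. The trace contains no control flow at all - just the five matrix multiplies that actually happened - and reverse AD or optimization passes can then walk that trace.

    import numpy as np

    tape = []  # a flat record of (op_name, inputs, output), built as we execute

    def matmul(a, b):
        out = a @ b
        tape.append(("matmul", (a, b), out))  # record the op for later reverse AD / optimization
        return out

    x = np.eye(3)
    w = np.random.randn(3, 3)
    for _ in range(5):         # an ordinary Python loop drives execution
        x = matmul(x, w)       # the trace gains one straight-line matmul per iteration

    print([name for name, _, _ in tape])  # ['matmul', 'matmul', 'matmul', 'matmul', 'matmul']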


How is adding dynamic graphs to TensorFlow "after the fact" while adding it to Torch isn't? (Torch is much older than TF).


Torch was never written as a static graph computation framework. Torch was/is more of a tensor manipulation library where you execute individual operations step by step, and the graph can be tracked and constructed incrementally from those operations. For this reason, much of PyTorch is about building a layer on top of the underlying components (which are focused on efficiently manipulating tensors and/or implementing low level NN components) rather than re-architecting Torch itself.

This won't be the same for TensorFlow as it was written with the concept of a static computation graph at its core. I'm certainly not saying it's impossible to re-architect - and many smart people in the community and at Google are devoting thinking and code to it - but simply that the process will be far more painful as it was not written with this as an intended purpose.

To note - there are many advantages to static computation graphs. Of particular interest to Google is that static graphs let them distribute their computations very effectively over large amounts of hardware. Being able to do this with a dynamic computation graph would be far more problematic.


Thanks for the clarification.

Does the upcoming XLA interact with this as well? I.e. compilation would be too costly for dynamic graphs, and so it would only make sense for static graphs?


I am not highly clued in to XLA as it's new, quite experimental, and, most honestly, I've just not looked at it in detail. Given XLA provides compilation, JIT or ahead-of-time, it doesn't really (yet) factor into the dynamic graph discussion.

What would theoretically be interesting is a JIT for dynamic computation graphs. Frequent subgraphs could be optimized, cached, and re-used when appropriate, similar to a JIT for JavaScript. No doubt they're already pondering such things.

https://www.tensorflow.org/versions/master/experimental/xla/


Chainer's Define-by-Run approach is also described here: https://www.oreilly.com/learning/complex-neural-networks-mad...


Any particular reason you prefer PyTorch over DyNet?


If you want to use Go, Gorgonia also features dynamic graphs the way Chainer does (as well as Theano-style compile-and-execute machines).


One question: how do you save a dynamic network if it changes from time to time (e.g. from sample to sample)?


You save the parameters and the code of the model definition.
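A minimal sketch of what that looks like in PyTorch-style code (the TinyDynamicModel class is hypothetical, purely for illustration): only the parameters are written to disk, while the dynamic structure is rebuilt by the code every time forward() runs.

    import torch
    import torch.nn as nn

    class TinyDynamicModel(nn.Module):          # hypothetical example class
        def __init__(self):
            super().__init__()
            self.cell = nn.Linear(8, 8)

        def forward(self, x, steps):            # graph shape varies with `steps`
            for _ in range(steps):
                x = torch.tanh(self.cell(x))
            return x

    model = TinyDynamicModel()
    torch.save(model.state_dict(), "params.pt")      # persist learned parameters only

    restored = TinyDynamicModel()                    # the structure comes from the code
    restored.load_state_dict(torch.load("params.pt"))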



