Dynamic computation graphs arise whenever the amount of work that needs to be done is variable. This may be when we're processing text, where one example is a few words and another is several paragraphs, or when we're performing operations against a tree structure of variable size. The problem is especially prominent in certain subfields, such as natural language processing, where I spend most of my time.
PyTorch tackles this very well, as do Chainer and DyNet. Indeed, PyTorch's construction was directly informed by Chainer, though re-architected and designed to be even faster still. I have seen all of these receive renewed interest in recent months, particularly amongst researchers performing cutting-edge research in the domain. When you're working with new architectures, you want the most flexibility possible, and these frameworks allow for that.
As a counterpoint, TensorFlow does not handle these dynamic graph cases well at all. There are some primitive dynamic constructs but they're not flexible and usually quite limiting. In the near future there are plans to allow TensorFlow to become more dynamic, but adding it in after the fact is going to be a challenge, especially to do efficiently.
Disclosure: My team at Salesforce Research use Chainer extensively and my colleague James Bradbury was a contributor to PyTorch whilst it was in stealth mode. We're planning to transition from Chainer to PyTorch for future work.
The primary issue is that the computation graph is not imperative - you define it explicitly. Chainer describes this as the difference between "Define-and-Run" frameworks and "Define-by-Run" frameworks.
TensorFlow is "Define-and-Run". Loops and conditionals end up needing to be defined and injected into the graph structure before it's run. This means there are "tf.while_loop" operations, for example - you can't use a "while" loop as it exists in Python or C++. This makes debugging difficult, as the process of defining the computation graph is separate from using it, and it also restricts the flexibility of the model.
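To make the "Define-and-Run" point concrete, here is a toy sketch in plain Python. It is not TensorFlow code: `Placeholder`, `Const`, `Mul`, `Less`, `WhileLoop`, and `run` are hypothetical names invented for illustration. The point is only that the loop has to exist as a node in a data structure before any value is computed, and a separate interpreter executes it later.

```python
# Toy "Define-and-Run" graph. Every name below is hypothetical and only
# illustrates the idea; none of this is the TensorFlow API.

class Placeholder:
    def __init__(self, name):
        self.name = name

class Const:
    def __init__(self, value):
        self.value = value

class Mul:
    def __init__(self, a, b):
        self.a, self.b = a, b

class Less:
    def __init__(self, a, b):
        self.a, self.b = a, b

class WhileLoop:
    """Loop node: repeat `body` while `cond` holds, threading one state value."""
    def __init__(self, cond, body, init, state):
        self.cond, self.body, self.init, self.state = cond, body, init, state

def run(node, feed):
    """Separate execution phase: interpret the already-built graph."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Mul):
        return run(node.a, feed) * run(node.b, feed)
    if isinstance(node, Less):
        return run(node.a, feed) < run(node.b, feed)
    if isinstance(node, WhileLoop):
        value = run(node.init, feed)
        # The loop state is rebound in the feed on every iteration.
        while run(node.cond, {**feed, node.state.name: value}):
            value = run(node.body, {**feed, node.state.name: value})
        return value
    raise TypeError(f"unknown node: {node!r}")

# Phase 1: define. Nothing is computed here, so a Python `while` cannot be
# used: the values it would test do not exist yet.
x = Placeholder("x")
limit = Placeholder("limit")
doubled_until_limit = WhileLoop(
    cond=Less(x, limit),           # keep going while x < limit
    body=Mul(x, Const(2)),         # x <- x * 2
    init=Placeholder("start"),
    state=x,
)

# Phase 2: run, possibly many times with different inputs.
result = run(doubled_until_limit, {"start": 1, "limit": 100})
```

If something goes wrong inside the loop, the error surfaces in `run`, far from the line that built the `WhileLoop` node, which is roughly the debugging pain described above.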
In comparison, Chainer, PyTorch, and DyNet are all "Define-by-Run", meaning the graph structure is defined on-the-fly via the actual forward computation. This is a far more natural style of programming. If you perform a for loop in Python, you're actually performing a for loop in the graph structure as well.
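A minimal sketch of what "Define-by-Run" means, in plain Python. This is not how Chainer, PyTorch, or DyNet are implemented; `Var`, `backward`, and `power` are hypothetical names for illustration. Each arithmetic operation records its inputs as it executes, so an ordinary Python `for` loop ends up shaping the graph.

```python
# Toy "Define-by-Run" autodiff. `Var`, `backward`, and `power` are
# hypothetical names, not the Chainer, PyTorch, or DyNet API.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_var, local_gradient_wrt_that_parent).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(output):
    """Reverse-mode pass over whatever graph the forward code happened to build."""
    order, seen = [], set()

    def visit(v):
        # Post-order DFS: parents are appended before the nodes that use them.
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)

    visit(output)
    output.grad = 1.0
    # Walk from the output back towards the inputs, applying the chain rule.
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

def power(x, n):
    # An ordinary Python loop: each iteration adds one multiply to the graph,
    # so the graph's depth depends on the runtime value of n.
    result = Var(1.0)
    for _ in range(n):
        result = result * x
    return result

x = Var(3.0)
y = power(x, 4)   # four multiplies recorded during the forward pass
backward(y)       # y.value is 3**4, x.grad is 4 * 3**3
```

Calling `power(x, 2)` on the next input would build a shallower graph with no extra declarations, which is the flexibility being described.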
This has been a large enough issue that, very recently, a team at Google created "TensorFlow Fold", still unreleased and unpublished, that handles dynamic computation graphs. In it they tackle specifically dynamic batching within the tree structured LSTM architecture.
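Since Fold is unpublished, the following is only a guess at the general flavour of dynamic batching, not their method. It is a plain-Python sketch with a hypothetical helper, `schedule_by_height`, and an assumed tree encoding (leaves are strings, internal nodes are `(left, right)` tuples). Nodes from different trees that sit at the same height do not depend on each other, so they could be fed to one batched operation.

```python
from collections import defaultdict

# Assumed encoding: a leaf is a string, an internal node is a (left, right) tuple.
def schedule_by_height(trees):
    """Group every node of every tree by its height (leaves have height 0)."""
    levels = defaultdict(list)

    def height(node):
        if isinstance(node, str):
            h = 0
        else:
            h = 1 + max(height(child) for child in node)
        levels[h].append(node)
        return h

    for tree in trees:
        height(tree)
    # Level h only depends on levels below h, so levels can run in order.
    return [levels[h] for h in sorted(levels)]

trees = [
    (("the", "cat"), "sat"),
    ("hello", "world"),
]
batches = schedule_by_height(trees)
# Eight nodes in total, but only three batched steps: leaves, height 1, height 2.
```

A real implementation would also have to track which batch row feeds which parent; this sketch only shows the grouping.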
If you compare the best example of recursive neural networks in TensorFlow (quite complex and finicky in the details) to the example that comes with Chainer, which is perfectly Pythonic and standard code, it's pretty clear why one might prefer "Define-by-Run" ;)
I personally think the approach used in TensorFlow is preferable – having a static graph enables a lot of convenient operations, such as storing a fixed graph data structure, shipping models that are independent of code, performing graph transformations. But you're right that it entails a bit more complexity, and that implementing something like recursive neural networks, while totally possible in a neat way, ends up taking a bit more effort. I think that the trade-off is worth it in the long run, and that the design of TensorFlow is very much influenced by the long-run view (at the expense of immediate simplicity...).
The ops underlying TensorFlow's `tf.while_loop` are actually quite flexible, so I imagine you can create a lot of different looping constructs with them, including ones that easily handle recursive neural networks.
Thanks for pointing out a problem that I haven't really thought about before!
Would you mind providing a concrete example to relate to? Again, intrigued by PT, so I want to learn more about it vs TF...
In fact, with DyNet or PyTorch, you still need to keep a record of the graph you traversed (the tape), because no one is doing forward AD. If that's the case, why not have a good library for symbolic computation graphs and build dynamic features on top of it? (I am not saying TensorFlow is a good symbolic computation graph library to build upon, just arguing that starting with a define-compile-run library doesn't necessarily hinder your ability to support dynamic graphs.)
Once you have a reasonable symbolic computation graph library, you don't need to explicitly build a "tape", because the symbolic representation will record the order of execution, and reverse AD and even graph optimization (applying CSE etc.) come naturally as well.
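As a rough illustration of that claim, here is a plain-Python sketch of a symbolic graph builder. `Graph` and its methods are hypothetical names, not any real library. Nodes are stored in creation order, which doubles as an execution order, and identical subexpressions are interned so CSE falls out of construction.

```python
# Hypothetical mini symbolic-graph builder; CSE is done by interning nodes.
class Graph:
    def __init__(self):
        self.nodes = []     # nodes in creation order: (op, arg, ...)
        self._index = {}    # node -> id, so structurally equal nodes are reused

    def _intern(self, node):
        if node not in self._index:
            self._index[node] = len(self.nodes)
            self.nodes.append(node)
        return self._index[node]

    def var(self, name):
        return self._intern(("var", name))

    def add(self, a, b):
        return self._intern(("add", a, b))

    def mul(self, a, b):
        return self._intern(("mul", a, b))

    def evaluate(self, target, env):
        # Creation order is a valid execution order: arguments always have
        # smaller ids than the node that uses them.
        values = []
        for node in self.nodes[: target + 1]:
            op = node[0]
            if op == "var":
                values.append(env[node[1]])
            elif op == "add":
                values.append(values[node[1]] + values[node[2]])
            else:  # "mul"
                values.append(values[node[1]] * values[node[2]])
        return values[target]

g = Graph()
x, y = g.var("x"), g.var("y")
# (x + y) * (x + y): the second g.add(x, y) returns the existing node id.
out = g.mul(g.add(x, y), g.add(x, y))
```

Here `g.nodes` ends up with four entries rather than five, because the repeated `x + y` is shared. A reverse AD pass could walk the same list backwards without any separate tape.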
This won't be the same for TensorFlow as it was written with the concept of a static computation graph at its core. I'm certainly not saying it's impossible to re-architect - and many smart people in the community and at Google are devoting thinking and code to it - but simply that the process will be far more painful as it was not written with this as an intended purpose.
To note - there are many advantages to static computation graphs. Of particular interest to Google is that they distribute their computations very effectively over large amounts of hardware. Being able to do this with a dynamic computation graph would be far more problematic.
Does the upcoming XLA interact with this as well? I.e. compilation would be too costly for dynamic graphs, and so it would only make sense for static graphs?
* Digital Reasoning
The maintainers work at Facebook AI Research
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Notably absent is the otherwise Facebook-typical PATENTS license thing. Which I see as a good sign.
Also, it doesn't look like this has happened just now? PRs in the repo go back a couple months and the repo has 100+ contributors.
The C libraries are shared among the Lua and Python variants
I am wondering if CUDA is mandatory for installing torch? I use a MacBook Air, which doesn't have a dedicated graphics card, so I'm not sure if torch can be installed and used on my machine.
I use it more and more for hobby projects. Combine it with LuaJIT (which torch uses) and you have the fastest interpreted language around. Give it a try.
Pretty much no way to use neural networks (except for playing, like above) without writing code.
 - http://kur.deepgram.com/
Lua is less used than Python in the scientific community, and a lot of the most innovative machine learning researchers already work with C++ and Python. Using yet another language with only marginal benefit increases cognitive load and drains from the researcher's mental innovation budget, forcing the researcher to learn the ins and outs of Lua rather than working on innovative machine learning solutions.
Lua is a nice language.
Python 3 is a nice language and there are many new exciting features and development styles (hello async programming?) in the making which will prevent a monoculture from forming in the near term.
asyncio success story: https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-pytho...
Then use Lua for that, if you are more comfortable there and want/need the speed bump. There's nothing that says an entire project or whatnot has to be developed in a singular language.
Use each tool to its strengths, as your needs, requirements, and abilities dictate.
Is that true even if the Python used is PyPy rather than CPython?
The Python that you write when using these frameworks is just the glue code / scripts. All you're doing is calling the framework's functions. As researchers, we throw most of it away. The stuff that survives is self-contained and usually short. You're not writing 100k+ line codebases.
Lua may be faster for certain tasks (data processing), but the time it takes for those tasks is usually a rounding error in deep learning. Not to mention you can still code in C/C++ with PyTorch.
If there is a monoculture in machine learning, it would be the deep learning monoculture.
If only Mike Pall created a transpiler infrastructure layer on top of LuaJIT.
Just a personal (anti-)preference I guess