
PyTorch – Tensors and Dynamic neural networks in Python - programnature
http://pytorch.org/
======
Smerity
Only a few months ago people were saying that the deep learning library
ecosystem was starting to stabilize. I never saw that as the case. The latest
frontier for deep learning libraries is ensuring efficient support for dynamic
computation graphs.

Dynamic computation graphs arise whenever the amount of work that needs to be
done is variable. This may be when we're processing text, where one example is
a few words and another is paragraphs of text, or when we're performing
operations against a tree structure of variable size. This problem is
especially prominent in certain subfields, such as natural language
processing, where I spend most of my time.

PyTorch tackles this very well, as do Chainer[1] and DyNet[2]. Indeed,
PyTorch's construction was directly informed by Chainer[3], though
re-architected and designed to be even faster still. I have seen all of these receive renewed
interest in recent months, particularly amongst many researchers performing
cutting edge research in the domain. When you're working with new
architectures, you want the most flexibility possible, and these frameworks
allow for that.

As a counterpoint, TensorFlow does not handle these dynamic graph cases well
at all. There are some primitive dynamic constructs but they're not flexible
and usually quite limiting. In the near future there are plans to allow
TensorFlow to become more dynamic, but adding it in after the fact is going to
be a challenge, especially to do efficiently.

Disclosure: My team at Salesforce Research uses Chainer extensively, and my
colleague James Bradbury was a contributor to PyTorch whilst it was in stealth
mode. We're planning to transition from Chainer to PyTorch for future work.

[1]: [http://chainer.org/](http://chainer.org/)

[2]: [https://github.com/clab/dynet](https://github.com/clab/dynet)

[3]: [https://twitter.com/jekbradbury/status/821786330459836416](https://twitter.com/jekbradbury/status/821786330459836416)

~~~
PieSquared
Could you elaborate on what you find lacking in TensorFlow? I regularly use
TensorFlow for exactly these sorts of dynamic graphs, and it seems to work
fairly well; I haven't used Chainer or DyNet extensively, so I'm curious to
see what I'm missing!

~~~
Smerity
When you say "exactly these sorts of dynamic graphs", what do you mean?
TensorFlow has support for dynamic length RNN unrolling but that really
doesn't extend well to any dynamic graph structure such as recursive tree
structure creation. Since the computation graph has a different shape and size
for every input they are difficult to batch and any pre-defined static graph
is likely excessive, wasting computation, or inexpressive.

The primary issue is that the computation graph is not built imperatively -
you define it explicitly, up front. Chainer describes this as the difference
between "Define-and-Run" frameworks and "Define-by-Run" frameworks[1].

TensorFlow is "Define-and-Run". For loops and conditionals end up needing to
be defined and injected into the graph structure before it's run. This means
there are "tf.while_loop" operations for example - you can't use a "while"
loop as it exists in Python or C++. This makes debugging difficult as the
process of defining the computation graph is separate to the usage of it and
also restricts the flexibility of the model.
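
To make that concrete, here's a minimal sketch of what even a trivial
counting loop looks like with graph-mode `tf.while_loop` - the condition and
body have to be wrapped up as graph ops before anything executes:

```python
import tensorflow as tf

# Define-and-Run: the loop itself must be expressed as graph ops up front.
i = tf.constant(0)
cond = lambda i: tf.less(i, 10)          # loop condition, as a graph op
body = lambda i: tf.add(i, 1)            # loop body, as a graph op
result = tf.while_loop(cond, body, [i])  # injected into the static graph

with tf.Session() as sess:               # nothing runs until the session does
    print(sess.run(result))              # 10
```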

In comparison, Chainer, PyTorch, and DyNet are all "Define-by-Run", meaning
the graph structure is defined on the fly via the actual forward computation.
This is a far more natural style of programming. If you perform a for loop in
Python, you're actually performing a for loop in the graph structure as well.
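
For contrast, a minimal sketch of the same idea in PyTorch (using the
`autograd.Variable` API; the shapes here are arbitrary) - an RNN-style loop
that unrolls for exactly as many steps as this particular input needs:

```python
import torch
from torch.autograd import Variable

# Define-by-Run: an ordinary Python loop builds the graph as it executes.
W = Variable(torch.randn(10, 10), requires_grad=True)
h = Variable(torch.zeros(1, 10))

sequence = [Variable(torch.randn(1, 10)) for _ in range(5)]  # any length
for x in sequence:               # a plain Python for loop
    h = torch.tanh(x + h.mm(W))  # each call adds nodes to the graph

h.sum().backward()               # backprop through however many steps ran
```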

This has been a large enough issue that, very recently, a team at Google
created "TensorFlow Fold"[2], still unreleased and unpublished, to handle
dynamic computation graphs. In it, they specifically tackle dynamic batching
within the tree-structured LSTM architecture.

If you compare the best example of recursive neural networks in TensorFlow[3]
(quite complex and finicky in the details) to the example that comes with
Chainer[4], which is perfectly Pythonic and standard code, it's pretty clear
why one might prefer "Define-by-Run" ;)

[1]: [http://docs.chainer.org/en/stable/tutorial/basic.html](http://docs.chainer.org/en/stable/tutorial/basic.html)

[2]: [https://openreview.net/pdf?id=ryrGawqex](https://openreview.net/pdf?id=ryrGawqex)

[3]: [https://github.com/bogatyy/cs224d/tree/master/assignment3](https://github.com/bogatyy/cs224d/tree/master/assignment3)

[4]: [https://github.com/pfnet/chainer/blob/master/examples/sentiment/train_sentiment.py#L125](https://github.com/pfnet/chainer/blob/master/examples/sentiment/train_sentiment.py#L125)

~~~
PieSquared
Ah, fair enough, I see your point. An imperative approach (versus TensorFlow's
semi-declarative approach) can be easier to specialize to dynamic compute
graphs.

I personally think the approach used in TensorFlow is preferable: having a
static graph enables a lot of convenient operations, such as storing a fixed
graph data structure, shipping models that are independent of code, and
performing graph transformations. But you're right that it entails a bit more complexity,
and that implementing something like recursive neural networks, while totally
possible in a neat way, ends up taking a bit more effort. I think that the
trade-off is worth it in the long run, and that the design of TensorFlow is
very much influenced by the long-run view (at the expense of immediate
simplicity...).

The ops underlying TensorFlow's `tf.while_loop` are actually quite flexible,
so I imagine you can create a lot of different looping constructs with them,
including ones that easily handle recursive neural networks.

Thanks for pointing out a problem that I haven't really thought about before!

------
smhx
It's a community-driven project, a Python take on Torch
([http://torch.ch/](http://torch.ch/)). Several folks have been involved in
development and use so far (a non-exhaustive list):

* Facebook
* Twitter
* NVIDIA
* SalesForce
* ParisTech
* CMU
* Digital Reasoning
* INRIA
* ENS

The maintainers work at Facebook AI Research.

~~~
tsomctl
Not only that, but it appears to use the same core C library (TH) as Lua Torch.

~~~
smhx
We actually share the same git subtree between the Lua and Python variants:
TH, THNN, THC, and THCUNN are shared.

~~~
divbit
For a few weeks now, I have been turning over in the back of my mind the idea
of attempting a Julia interface to Torch, using the ccall interface:
[http://docs.julialang.org/en/release-0.5/manual/calling-c-and-fortran-code/?highlight=ccall](http://docs.julialang.org/en/release-0.5/manual/calling-c-and-fortran-code/?highlight=ccall).
Do you have any thoughts / recommendations w.r.t. that? (This would be more of
a fun weekend(s) project for me than anything else.) My goal would be to have
the tensors override the .* and * operators as used here:
[https://gist.github.com/divbit/ec57ad2f1989bf13aecdf9e1e10563f0](https://gist.github.com/divbit/ec57ad2f1989bf13aecdf9e1e10563f0)

------
spyspy
This project aside, I'm in love with that setup UI on the homepage telling you
exactly how to get started given your current setup.

~~~
artursapek
Agreed. Reminds me of this scary page I found the other day when googling
"certbot setup":

[https://certbot.eff.org/all-instructions/](https://certbot.eff.org/all-instructions/)

------
programnature
Actually, it's not clear if there is an official affiliation with Facebook,
other than some of the primary devs working there.

~~~
throwawayish
Copyright (c) 2016- Facebook, Inc (Adam Paszke)

Copyright (c) 2014- Facebook, Inc (Soumith Chintala)

Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)

Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)

Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)

Copyright (c) 2011-2013 NYU (Clement Farabet)

Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon
Bottou, Iain Melvin, Jason Weston)

Copyright (c) 2006 Idiap Research Institute (Samy Bengio)

Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy
Bengio, Johnny Mariethoz)

Notably absent is the otherwise Facebook-typical PATENTS file, which I see as
a good sign.

Also, it doesn't look like this has happened just now? PRs in the repo go back
a couple of months, and the repo has 100+ contributors.

~~~
smhx
It's the same license file as
[https://github.com/torch/torch7](https://github.com/torch/torch7) and
[http://torch.ch](http://torch.ch).

The C libraries are shared among the Lua and Python variants.

------
tdees40
At this point I've used PyTorch, TensorFlow, and Theano. Which one do people
prefer? I haven't done a ton of benchmarking, but I'm not seeing huge
differences in speed (mostly executing on the GPU).

~~~
sandGorgon
Keras is going to be the interface to TensorFlow -
[https://news.ycombinator.com/item?id=13413487](https://news.ycombinator.com/item?id=13413487)

~~~
tdees40
Yes, but Keras works just fine using Theano as a backend as well...

------
taterbase
Is there any reason this might not work on Windows? I see no installation docs
for it.

~~~
smhx
The C libraries are compatible with Windows; they are used in the Torch
Windows ports. We just don't have any Windows devs on the project to help
maintain it :( .

~~~
randomx89
Are you guys looking for Windows devs to contribute or help maintain it? I'd
be interested in helping out if I can. I currently use Chainer, but I'd like
to try PyTorch.

~~~
apaszke
Yes! There's an issue on that, where we'll be coordinating the work:
[https://github.com/pytorch/pytorch/issues/494](https://github.com/pytorch/pytorch/issues/494)

------
EternalData
Been using PyTorch for a few things. Love how it integrates with NumPy.
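
For anyone who hasn't seen it, a minimal sketch of that interop -
`torch.from_numpy` and `.numpy()` share memory rather than copying:

```python
import numpy as np
import torch

# NumPy arrays and Torch tensors can share memory in both directions.
a = np.ones((2, 3))
t = torch.from_numpy(a)  # zero-copy: t is a view of a's memory
t.mul_(2)                # an in-place op on the tensor...
print(a)                 # ...shows up in the NumPy array: all 2s

b = t.numpy()            # back to NumPy, also without copying
```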

------
theoracle101
Most important question: is this still 1-indexed? (Lua was 1-indexed, which
means that when porting code you need to be aware of this.)

~~~
apaszke
No! Python-style 0-based indexing everywhere.
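
A quick sketch of the difference:

```python
import torch

t = torch.Tensor([10, 20, 30])
print(t[0])  # 10.0 -- the first element is index 0, as in Python/NumPy
# In Lua Torch, the first element would have been t[1]
```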

------
rtcoms
I've never fiddled with machine learning, so I don't know anything about it.

I am wondering if CUDA is mandatory for a Torch installation? I use a MacBook
Air, which doesn't have a graphics card, so I'm not sure if Torch can be
installed and used on my machine.

~~~
itg
It's not mandatary, but for some problems such as using image data, it
provides as substantial speedup when training a classifier.
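
For what it's worth, the GPU is also opt-in at the code level; a minimal
sketch of the usual pattern:

```python
import torch

# CUDA is optional: the same code runs on the CPU, and tensors are moved
# to the GPU only when one is actually available.
x = torch.randn(64, 3, 32, 32)  # e.g. a batch of small images
if torch.cuda.is_available():
    x = x.cuda()
```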

------
baq
Very nice to see Python 3.5 there.

------
jbsimpson
This is really interesting. I've been wanting to learn more about Torch for a
while, but have been reluctant to commit to learning Lua.

~~~
veli_joza
Lua is a pleasure to learn and use. The language core is so simple and
elegant that you can learn it in a day. The standard library is also very
light, which is both a strength and a weakness.

I use it more and more for hobby projects. Combine it with LuaJIT (which Torch
uses) and you have the fastest interpreted language around. Give it a try.

~~~
etiene
I want to reiterate this. I started learning it out of guilt, because it was
created at the university where I studied. Then I realised it was really a
pleasure to use. I still use it in many hobby projects nowadays whenever I
can.

------
ankitml
I am confused by the license file. What does it mean? Some rights reserved
and copyright... It doesn't look like a real open source project.

~~~
yincrash
It is a standard 3-clause BSD license. The "All rights reserved" portion
definitely adds ambiguity (and, of all the major OSS licenses, it only exists
in the BSD license). There is a StackExchange answer that goes into the
history of it[1].

[1] [http://opensource.stackexchange.com/questions/2121/mit-license-and-all-rights-reserved](http://opensource.stackexchange.com/questions/2121/mit-license-and-all-rights-reserved)

~~~
ankitml
Got it. It makes sense now.

------
gallerdude
What's the highest-level neural network lib I can use? I'm a total
programming idiot, but I find neural nets fascinating.

~~~
visarga
Keras requires just a few lines of code; it's designed for ease of use and
practicality.
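
For example, a minimal sketch of a small classifier in Keras (the layer sizes
here are arbitrary):

```python
from keras.models import Sequential
from keras.layers import Dense

# A tiny fully-connected classifier: 100 inputs -> 64 hidden -> 10 classes
model = Sequential()
model.add(Dense(64, input_dim=100, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')
# model.fit(X_train, y_train) would then train it
```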

~~~
apaszke
torch.nn offers a very similar interface to Keras (e.g. see the AlexNet
definition at
[https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py#L13](https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py#L13)).
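
To illustrate the similarity, a minimal sketch of the same kind of model with
torch.nn (again, arbitrary sizes):

```python
import torch.nn as nn

# nn.Sequential reads much like Keras' Sequential container
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
```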

------
aaron-lebo
Is this related to lua's Torch at all?

[http://torch.ch/](http://torch.ch/)

~~~
zo7
They don't seem to say it explicitly, but it might be using the same core
code, given the structure of the framework and their mention that it's a
mature codebase several years old. The license file also goes back to NYU
before being taken over by Facebook, similar to Torch.

~~~
apaszke
The core libraries are the same as in Lua torch, but the interface is
redesigned and new.

------
0mp
It is worth adding that there is a work-in-progress branch focused on making
PyTorch tensors distributable across machines in a master-workers model:
[https://github.com/apaszke/pytorch-dist/](https://github.com/apaszke/pytorch-dist/)

------
shmatt
I've been running their dcgan.torch code over the past few days, and the
results have been pretty amazing for plug-and-play.

------
vegabook
Guess there's no escaping Python. I had hoped Lua(JIT) might emerge as a
scientific programming alternative, but with Torch now throwing its hat into
the Python ring, I sense a monoculture in the making. Bit of a shame really,
because Lua is a nice language and was an interesting alternative.

~~~
jjawssd
Lua is extremely flexible, to the point where there is basically no standard
library. This causes problems with code reuse and moving between codebases,
because everyone does things drastically differently. Compare this to NumPy in
the Python world: a single fundamental package for scientific computing in
Python.

Lua is less used than Python in the scientific community, and a lot of the
most innovative machine learning researchers already work with C++ and Python.
Using yet another language with only marginal benefit increases cognitive load
and drains the researcher's mental innovation budget, forcing the researcher
to learn the ins and outs of Lua rather than work on innovative machine
learning solutions.

Lua is a nice language. Python 3 is a nice language too, and there are many
exciting new features and development styles (hello, async programming?) in
the making which will prevent a monoculture from forming in the near term.

~~~
vegabook
Thanks for the interesting and informative comment. Do I sense just a tiny
bit of regret though? Yet another Python interface. YAPI. You heard it here
first. And no, Py3 is not that nice. Too much cruft by far. And Lua is miles
faster than Python when you're outside the tensor domain, i.e. while you're
sourcing and wrangling your data. Arguably LuaJIT obviates the need for C,
something you can't say about Python. Disclosure: I am a massive, but
increasingly disenchanted, user of Python. I had actually started looking at
Torch7, foregoing TensorFlow, precisely because of Lua. But the walls are
closing in....

~~~
jjawssd
A very large portion of performance problems can be mitigated with the use of
Cython and the new asyncio stuff.

asyncio success story: [https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-python/](https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-python/)

Cython: [http://scikit-learn.org/stable/developers/performance.html](http://scikit-learn.org/stable/developers/performance.html)

~~~
vegabook
LuaJIT is at least 10x faster than Python and easily obviates the need to
mess around with Cython. That's an easy win for Lua. Let's be honest: Torch
has decided that if you can't beat them, join them. It's about network
effects, not about Python being intrinsically better than Lua.

~~~
baq
but why do you care if luajit is faster than python if everything that matters
is computed on the GPU anyways?

------
plg
Every time I decide I'm going to get into Python frameworks again, and I start
looking at code, and I see people making everything object-oriented, I bail

Just a personal (anti-)preference I guess

~~~
apaszke
But it is possible to write your model in purely functional style. Check out
the PR to examples repo with functional ResNets
[https://github.com/pytorch/examples/pull/22](https://github.com/pytorch/examples/pull/22).
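
As a rough illustration of what that looks like (a minimal sketch using the
`autograd.Variable` API; the shapes are arbitrary), the parameters become
plain variables and the model is just a function:

```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable

# A purely functional model: parameters are plain Variables and the
# forward pass is a free function, with no nn.Module classes involved.
w1 = Variable(torch.randn(100, 64) * 0.01, requires_grad=True)
w2 = Variable(torch.randn(64, 10) * 0.01, requires_grad=True)

def model(x):
    return F.relu(x.mm(w1)).mm(w2)

x = Variable(torch.randn(32, 100))
model(x).sum().backward()  # gradients accumulate in w1.grad and w2.grad
```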

