
Ask HN: Why TensorFlow instead of Theano for deep learning? - DrNuke
I am an average solo user / applied researcher using Windows locally with a GTX 1070 8GB, looking for speed and documentation first, and Theano is way ahead in those departments. That said, we are also told TensorFlow is the next big thing because of scalability (?). TensorFlow works under Windows with Anaconda and Python 3.5 through Keras as well, so I do have it available and can run the benchmarks. Where do we stand, really? Thanks.
======
asah
TensorFlow automatically discovers and uses GPUs and multiple cores, and I'm
assuming Google is working on better support for multiple GPUs, which
currently require hacks/tweaking to get speedups (it's easy to 'use' them).

TensorFlow is a platform "winner", and approximately 100% of new innovations
will quickly be ported to TensorFlow - TBD which of the others will "keep up"
as innovations continue to come out.

other recommendations:

\- by default, TensorFlow allocates ~100% of GPU RAM per process. You'll
want to control this: [https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory](https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory)
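A minimal sketch of that control knob, using the TF 1.x-era session config (either option alone is usually enough):

```python
import tensorflow as tf

# Sketch: stop TensorFlow from reserving the whole GPU per process.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                      # allocate lazily
# config.gpu_options.per_process_gpu_memory_fraction = 0.4  # or hard-cap at 40%

sess = tf.Session(config=config)
```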

\- Keras. yes, this. Dramatically reduces code by 2-10x, without loss of
control AFAICT.

\- cloud hardware. Pretty quickly, you'll want to scale and run multiple tests
at once, and e.g. quickly back up & copy data, replicate system images, etc. I
use Google Cloud Hosting and it's much easier (and cheaper) than AWS. Haven't
tried Azure but have heard good things. At least once, Google's internet
bandwidth has saved me hours waiting for data transfers.

~~~
laingc
Your comments are mostly valid, but I disagree about Keras. Although it's
marvellous for patching something together quickly, if you want to branch out
at all then it quickly becomes an absolute mess.

Far better, in my view, is to work with the newer Estimators API. It is almost
as fool-proof as Keras, but instead of trying to be a framework as such, the
Estimators/learn API essentially just wraps up some of the boilerplate that
you need with raw TensorFlow, and internally looks fairly similar to the code
you might write yourself. Consequently, it preserves the composability of TF
far better than Keras does.

~~~
asah
I'll try it!

------
rryan
Anyone who loves the Lisp concept of "code is data" will love TensorFlow.

Instead of coding imperatively, you write code to build a computation graph.
The graph is a data structure that fully describes the computation you want to
perform (e.g. training or inference of a machine learning model).

* That graph can be executed immediately, or stored for later.

* Since it's a serializable data structure, you can version it quite easily.

* You can deploy it to production without production having to depend on ANY of the code that _built_ the graph, only the runtime necessary to _execute_ it.

* You can run a compiler on it (such as XLA or TensorFlow's built in graph rewriter) to produce a more efficient version of the graph.

* In some circumstances, you can even compile the runtime away, producing a single .h/.o that you can link directly into e.g. a mobile app.
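The "code is data" idea can be shown with a toy, framework-free sketch (this is not TensorFlow's actual API, just the concept): the graph is plain serializable data, built separately from the tiny runtime that executes it.

```python
import json

# Build a computation graph as plain data; nothing runs at build time.
graph = {
    "x":   {"op": "const", "value": 3.0},
    "y":   {"op": "const", "value": 4.0},
    "sum": {"op": "add", "inputs": ["x", "y"]},
    "out": {"op": "mul", "inputs": ["sum", "sum"]},
}

# Pure data serializes trivially: version it, or ship it to a runtime
# that never saw the code that built it.
wire = json.dumps(graph)

def run(g, node):
    """A minimal 'runtime': evaluate one node by recursion."""
    n = g[node]
    if n["op"] == "const":
        return n["value"]
    args = [run(g, name) for name in n["inputs"]]
    return {"add": lambda a: a[0] + a[1],
            "mul": lambda a: a[0] * a[1]}[n["op"]](args)

print(run(json.loads(wire), "out"))  # (3 + 4) * (3 + 4) = 49.0
```

The deployment side only needs `run` and the serialized graph, which is the point of the bullet about production not depending on the builder code.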

It's a beautiful and highly useful abstraction that allows TensorFlow to have
both a great development and production story in one framework. Most
frameworks only have a good story for either development or production.

If you are a machine learning researcher who doesn't need or care about
deploying your work (i.e. mostly publishing papers), you may not want the
overhead of having to deal with building a graph, and may prefer something
that computes imperatively like PyTorch. If you are building products /
services that use ML and developing/training your own models (as opposed to
taking pre-trained models and using them), there is really no credible
competitor to TensorFlow.

Disclaimer: I work at Google. I spend all day writing TensorFlow models. I'm
not on the TensorFlow team nor do I speak for them or Google.

~~~
fnl
> If you are building products / services that use ML and developing/training
> your own models (as opposed to taking pre-trained models and using them),
> there is really no credible competitor to TensorFlow.

MXNet has amalgamation: [http://mxnet.io/how_to/smart_device.html#amalgamation-making-the-whole-system-a-single-file](http://mxnet.io/how_to/smart_device.html#amalgamation-making-the-whole-system-a-single-file)

CNTK provides a managed ("evaluation") library solution to deploy your models
and embed them in C, C++, C#, Python, and even an experimental Java version.
[https://docs.microsoft.com/en-us/cognitive-toolkit/CNTK-Evaluation-Overview](https://docs.microsoft.com/en-us/cognitive-toolkit/CNTK-Evaluation-Overview)

How is that not competitive with TF? MXNet's approach is a bit unwieldy, yes,
but seems easily streamlined. And CNTK's deployment method looks perfectly
fine. Note I haven't checked other DL libs, but it seems unreasonable to claim
that Microsoft and Amazon have no "competitive" solution for deployment.

~~~
fnl
I also completely forgot about Caffe(2), which I recall as always having been
the most easily deployable library, and possibly DL4J.

[http://www.cio.com/article/3193689/artificial-intelligence/which-deep-learning-network-is-best-for-you.html](http://www.cio.com/article/3193689/artificial-intelligence/which-deep-learning-network-is-best-for-you.html)

~~~
agibsonccc
Disclaimer: I built dl4j and will be highly biased.

A lot of what we see is deployment to production. We are embedded in a few
Apache projects now, as well as other "enterprise" suites like KNIME.

The reason for this is simplicity and integration with the JVM, as well as
supplemental add-ons for things like ETL (see: JDBC, HDFS, Kafka, Spark, ...)
and what I think is the easiest way to do both multi-threaded model serving
and data-parallel training (ParallelWrapper and ParallelInference).

We also made things tunable from the JVM (e.g. you can configure CUDA, native
CPU variables, BLAS libraries, ...).

We import python models as well.

We aren't heavily used in the research world but are used at scale (especially
in China).

Something we have coming up is the ability to import TensorFlow models (right
now we have Keras 1 and need to add Keras 2). The only thing we are missing
and finishing out now (among other projects) is autodiff. We will then have a
lot of the same properties as the other "computation graph" frameworks like
TF, Theano, and PyTorch.

I'd also note we already have a Chainer/PyTorch-like computation-graph API
built into our neural net DSL.

In the next release we will also have our new parameter server using Aeron.
Aeron is miles ahead of gRPC ([https://github.com/benalexau/rpc-bench](https://github.com/benalexau/rpc-bench)),
being used in the low-latency/quant world as well as being the default
transport for Akka now.

------
sirfz
We moved over to TensorFlow from Theano around a year ago. I'm a software
engineer on the team, and here's what I think the advantages are from my POV:

1) The transition was fairly straightforward; both APIs' interfaces are more-
or-less similar and share some design characteristics.

2) Having said that, TF's API is easier to use and without a doubt a lot
easier to read.

3) Consistency: deploying Theano in different environments surprised me on
several occasions with output different from that of the training environment.
TF is more consistent on this front (I never had such issues).

4) Running multiprocessing with Theano + GPU is a disaster (due to forking),
so I ended up having to create process pools before initializing Theano. No
such issues with TF.
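The workaround in (4) looks roughly like this (a framework-agnostic sketch; the comment about where `import theano` would go is the hypothetical part, and plain CPU work stands in for real jobs):

```python
import multiprocessing as mp

def work(x):
    # CPU-side work executed in the forked children.
    return x * x

if __name__ == "__main__":
    # Create the worker pool FIRST: children are forked before any GPU
    # context exists, so they don't inherit a CUDA handle that breaks
    # on fork.
    pool = mp.Pool(2)

    # Only now initialize the GPU framework in the parent process
    # (e.g. `import theano` would go here).

    print(sorted(pool.map(work, range(4))))  # [0, 1, 4, 9]
    pool.close()
    pool.join()
```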

5) TF provides many helpful operators (such as queues and batching ops) as
well as monitoring tools (Tensorboard) and debugging tools.

6) Its development is extremely rapid, new releases every couple of months
with a lot of improvements and new features every time.

In short, TF is what Theano should have been. A lot of new papers are being
implemented in TF as well, so it helps to understand it.

~~~
digitalzombie
> 6) Its development is extremely rapid, new releases every couple of months
> with a lot of improvements and new features every time.

How stable is the API, then?

I think Google is a bit notorious for this (e.g. Angular vs Angular 2).

~~~
sirfz
All changes are very well documented for each release [1]. The biggest changes
I've seen were, as expected, in upgrading from 0.x to 1.x; these were mostly
argument-name changes made to comply with NumPy's naming conventions (a
welcome change). They also provided a tool that automatically converts 0.x
code to 1.x, which was helpful.

It is not necessary to upgrade if you're satisfied with your current version,
and I certainly won't deploy a different version to production than the one
used for training. Upgrading is mainly worth it when there are performance
improvements or new features/tools that make a big impact (and those are the
changes we're most interested in and look forward to every release).

[1]
[https://github.com/tensorflow/tensorflow/releases](https://github.com/tensorflow/tensorflow/releases)

------
paulsutter
Fortunately, it's not an irrevocable decision like choosing a JavaScript
framework. With deep learning, you spend a lot of time considering a small
amount of code.

We use several frameworks because sample code from different papers uses
different frameworks. It's not that big of a deal.

------
cs702
The main reason to bet on TensorFlow is that it seems to have by far the
greatest adoption of all frameworks, as evidenced by github statistics, HN
polls, and other surveys:

* [https://twitter.com/fchollet/status/765212287531495424](https://twitter.com/fchollet/status/765212287531495424)

* [https://news.ycombinator.com/item?id=12391744](https://news.ycombinator.com/item?id=12391744)

* [https://github.com/aymericdamien/TopDeepLearning](https://github.com/aymericdamien/TopDeepLearning)

~~~
aisofteng
Selection bias could mean that you're substantively wrong.

------
julsimon
When it comes to scalability, Apache MXNet
([http://mxnet.io/](http://mxnet.io/)) is actually the best choice. Multi-GPU
support and distributed training on multiple hosts are extremely easy to set
up. It's also supported by Keras (still in beta, though).

------
visarga
TensorFlow is better for deployment. Pytorch is better for research.
Theano/Keras is simpler to use and a little faster than TensorFlow

~~~
DLEnthusiast
"PyTorch is better for research" is a weird, unsubstantiated statement. The
fact is that few serious researchers use PyTorch (and even those complain
about it). It's mostly grad students in a handful of labs. The only
researchers I know who use PyTorch are from Facebook, and that's because
they were implicitly forced to use it (PyTorch is developed by Facebook).

According to [https://medium.com/@karpathy/icml-accepted-papers-institution-stats-bad8d2943f5d](https://medium.com/@karpathy/icml-accepted-papers-institution-stats-bad8d2943f5d),
3 of the top research labs in the world are DeepMind, Google Brain (and the
rest of Google), and Microsoft Research. Let's see:

* DeepMind: TensorFlow

* Google Brain: TensorFlow

* Microsoft Research: CNTK

Ok, so what about academia? The top deep learning groups in academia are:

* Montreal: Theano

* Toronto: TensorFlow

* IDSIA: TensorFlow

So, what about the greater academic research community? Maybe we could get
some data about who uses what by looking at the frameworks cited by
researchers in their papers. Andrej did that: it's mainly TensorFlow and
Caffe. [https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106](https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106)

~~~
chenzhekl
Few people use PyTorch largely because it is relatively new (0.1.12). It
doesn't even have distributed training capabilities yet (coming in 0.2). Your
arguments say nothing about the frameworks themselves. It's unfair!

When people say PyTorch is better for research, they mean it is more flexible,
and it is easier to implement non-trivial network architectures with it, such
as recursive networks, which are a cumbersome task for TensorFlow. MXNet's
documentation provides a good overview of these two different styles
([http://mxnet.io/architecture/program_model.html](http://mxnet.io/architecture/program_model.html)).

~~~
DLEnthusiast
All of these frameworks are "relatively new". TensorFlow: 1.6 years. CNTK: 1
year. PyTorch: 0.5 year. Are they really impossible to compare?

> When people say PyTorch is better for research, they mean

That's not what "people" say. They tend to say the opposite. Maybe we can ask
OP what he meant when he said it.

> it is easier to implement non-trivial network architectures with it, such as
> recursive network

It is interesting that you mention recursive networks. There are only a few
dozen researchers who work with recursive networks, and they are all
accounted for; we know what tools they use. They use Chainer and DyNet.

~~~
pilooch
With all due respect, you don't appear to know much about what you are
commenting on. There is a huge community of research and applications around
RNNs and all other architectures. PyTorch has an extremely vivid and
fast-growing codebase across all neural architectures and applications; it's
remarkable, actually. One reason might be that it is simple and effective,
including easy debugging. Another is that research now is much more focused on
new, possibly complex and exotic architectures, new optimizers, and
understanding inner guts and behavior, including theoretically. And it appears
PyTorch makes that easy. Don't throw it away too fast as a DL enthusiast :)

------
ssivark
An observation from taking a step back: the discussion about deep learning
frameworks seems almost as complicated as the JavaScript framework discussions
of a couple of years ago. Google and Facebook pushing their own frameworks
(among other participants) also adds to the deja vu!

Why is the choice of framework such a big deal? Is it unreasonable to expect
someone well-versed in one framework to pick up another reasonably fast _if
/when collaborating with someone proficient in the latter_ ?

~~~
laingc
To answer your first question, I actually don't think it's a big deal.

It may become important if you end up having a ton of models running in
production that need to be maintained and further developed, but in general
for new applications I would say that substantially less than 5% of your time
would (should!) be spent actually writing any code.

------
anxman
One other benefit of TensorFlow is that transitioning to cloud-based
processing on Google Cloud / Tensor Processing Units is seamless. It will
turbocharge your training compared to typical GPU performance.

Disclosure: Work for Google Cloud

~~~
asah
Any ETA on those TPUs? Stop teasing! ;-)

------
nafizh
Have you considered using PyTorch? Many in the DL community think it is the
next big thing, as it is more intuitive and dynamic than TensorFlow.

------
k__
What's the best thing to build when starting TF? Like, the todo list of TF?

~~~
mmv
TensorFlow has a nice introduction tutorial using the MNIST dataset for
recognition of handwritten digits.

Creating a small neural network and training it on the MNIST dataset is like
the 'todo list' starter project for this kind of framework.
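At its core, that starter project amounts to the following framework-free sketch (a single sigmoid "neuron" trained by gradient descent on toy data; the real tutorial uses MNIST and a proper framework):

```python
import math
import random

random.seed(0)

# Toy data: the label is 1.0 when the input is positive.
xs = [random.uniform(-1, 1) for _ in range(100)]
data = [(x, 1.0 if x > 0 else 0.0) for x in xs]

w, b, lr = 0.0, 0.0, 0.5  # one weight, one bias, learning rate

def predict(x):
    """A single sigmoid 'neuron'."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Plain stochastic gradient descent on the logistic loss.
for epoch in range(200):
    for x, y in data:
        p = predict(x)
        w -= lr * (p - y) * x
        b -= lr * (p - y)

acc = sum((predict(x) > 0.5) == (y == 1.0) for x, y in data) / len(data)
print("training accuracy:", acc)
```

A framework replaces the hand-written gradient updates with automatic differentiation and scales the same loop to many layers and GPU tensors.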

------
massaman_yams
I found this to be informative: [https://svds.com/getting-started-deep-learning/](https://svds.com/getting-started-deep-learning/)

------
ma2rten
Here are some reasons why TensorFlow might be better:

* more widely used, more example code

* developed by a bigger team, likely to improve faster

* easier to deploy

* training with Cloud ML

* better support for distributed training

* no compile time (Theano's can be long, especially for RNNs)

------
yuanchuan
TensorFlow might not be the fastest in terms of computation speed, but it can
be taken from research to production with TensorFlow Serving.

As such, you won't need to reimplement or convert your model into another
format for use.

------
torbjorn
TensorFlow has TensorBoard, a great application that allows you to explore
your models in depth. It makes neural networks less of a black box.

------
johnsmith21006
Over 60k stars on github for TF. It won.

~~~
ci5er
If people weren't lemmings, that would be a valuable indicator. But, it turns
out that...

------
manis404
Personally, I use a combination of Tensorflow and Appex frameworks. I find
Theano simply lacking in features.

------
chronic7ui
r/MachineLearning

------
he0001
Use an AI program to answer that question!

~~~
he0001
Wouldn't it be reasonable to use an AI program to tell us which AI program is
the best? That would be some sort of Turing test in itself? And if not, so
much for AI?

