Ask HN: Why TensorFlow instead of Theano for deep learning?
138 points by DrNuke 160 days ago | 49 comments
I am an average solo user / applied researcher using Windows locally with a GTX 1070 8GB, looking for speed and documentation first, and Theano is way ahead in those departments. That said, we are also told TensorFlow is the next big thing because of scalability (?). TensorFlow also works under Windows with Anaconda and Python 3.5 through Keras, so I do have it available and can try the benchmarks. Where do we stand, really? Thanks.



TensorFlow automatically discovers and uses GPUs and multiple cores, and I'm assuming Google is working on better support for multiple GPUs, which currently require hacks/tweaking to get speedups (it's easy to 'use' them).

TensorFlow is the platform "winner", and approximately 100% of new innovations will quickly be ported to TensorFlow. TBD which of the others will "keep up" as innovations continue to come out.

Other recommendations:

- by default, TensorFlow allocates 100% of GPU RAM for each process. You'll want to control this (a minimal sketch follows this list): https://stackoverflow.com/questions/34199233/how-to-prevent-...

- Keras. yes, this. Dramatically reduces code by 2-10x, without loss of control AFAICT.

- cloud hardware. Pretty quickly you'll want to scale, run multiple tests at once, quickly back up & copy data, replicate system images, etc. I use Google Cloud and it's much easier (and cheaper) than AWS. Haven't tried Azure but have heard good things. At least once, Google's internet bandwidth has saved me hours of waiting on data transfers.
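On the GPU RAM point above, the usual fix goes through the session config. A minimal sketch (TF 1.x API; use whichever of the two options suits you):

    import tensorflow as tf

    config = tf.ConfigProto()
    # Grow GPU memory usage on demand instead of grabbing it all up front...
    config.gpu_options.allow_growth = True
    # ...or cap the fraction of GPU RAM this process may claim
    config.gpu_options.per_process_gpu_memory_fraction = 0.4
    sess = tf.Session(config=config)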


Your comments are mostly valid, but I disagree about Keras. Although it's marvellous for patching something together quickly, if you want to branch out at all then it quickly becomes an absolute mess.

Far better, in my view, is to work with the newer Estimators API. It is almost as fool-proof as Keras, but instead of trying to be a framework as such, the Estimators/learn API essentially just wraps up some of the boilerplate that you need with raw TensorFlow, and internally looks fairly similar to the code you might write yourself (a quick sketch below). Consequently, it preserves the composability of TF far better than Keras.
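Roughly, a minimal sketch of the shape (TF 1.x tf.estimator; the one-layer model and model_dir here are made up):

    import tensorflow as tf

    def model_fn(features, labels, mode):
        # Ordinary TF ops; the Estimator only wraps sessions, checkpoints, etc.
        logits = tf.layers.dense(features["x"], units=10)
        loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
        train_op = tf.train.AdamOptimizer().minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/my_model")
    # estimator.train(input_fn=...), .evaluate(...), .predict(...)

Because model_fn is plain TF code, you can drop down to raw ops anywhere without fighting a framework.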


I'll try it!


Keras definitely means you lose control, but it's a tradeoff that's worth making in many cases.


dumb/quick q: why not simply add classes/functions to Keras? I read the code and it looks pretty simple...


Anyone who loves the Lisp concept of "code is data" will love TensorFlow.

Instead of coding imperatively, you write code to build a computation graph. The graph is a data structure that fully describes the computation you want to perform (e.g. training or inference of a machine learning model).

* That graph can be executed immediately, or stored for later.

* Since it's a serializable data structure, you can version it quite easily.

* You can deploy it to production without production having to depend on ANY of the code that built the graph, only the runtime necessary to execute it.

* You can run a compiler on it (such as XLA or TensorFlow's built-in graph rewriter) to produce a more efficient version of the graph.

* In some circumstances, you can even compile the runtime away, producing a single .h/.o that you can link directly into e.g. a mobile app.

It's a beautiful and highly useful abstraction that allows TensorFlow to have both a great development and production story in one framework. Most frameworks only have a good story for either development or production.
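Concretely, a minimal sketch (TF 1.x; the /tmp path is arbitrary):

    import tensorflow as tf

    # Building the graph executes nothing; it just constructs a data structure
    x = tf.placeholder(tf.float32, name="x")
    y = tf.square(x, name="y")

    # The graph is a serializable protobuf: version it, ship it, compile it
    tf.train.write_graph(tf.get_default_graph().as_graph_def(), "/tmp", "graph.pbtxt")

    # Execute it now, or load graph.pbtxt later in a runtime with none of this code
    with tf.Session() as sess:
        print(sess.run(y, feed_dict={x: 3.0}))  # 9.0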

If you are a machine learning researcher who doesn't need or care about deploying your work (i.e. mostly publishing papers), you may not want the overhead of having to deal with building a graph, and may prefer something that computes imperatively like PyTorch. If you are building products / services that use ML and developing/training your own models (as opposed to taking pre-trained models and using them), there is really no credible competitor to TensorFlow.

Disclaimer: I work at Google. I spend all day writing TensorFlow models. I'm not on the TensorFlow team nor do I speak for them or Google.


> If you are building products / services that use ML and developing/training your own models (as opposed to taking pre-trained models and using them), there is really no credible competitor to TensorFlow.

MXNet has amalgamation: http://mxnet.io/how_to/smart_device.html#amalgamation-making...

CNTK provides a managed ("evaluation") library solution to deploy your models and embed them in C, C++, C#, Python, and even an experimental Java version. https://docs.microsoft.com/en-us/cognitive-toolkit/CNTK-Eval...

How's that not competitive with TF? MXNet's approach is a bit unwieldy, yes, but seems easily streamlined. And CNTK's deployment method looks perfectly fine. Note I haven't checked other DL libs, but it seems unreasonable that Microsoft and Amazon have no "competitive" solution for deployment.


I also completely forgot about Caffe(2), which I recall has always been the most easily deployable library, and possibly DL4J.

http://www.cio.com/article/3193689/artificial-intelligence/w...


Disclaimer: I built DL4J, so I'm highly biased.

A lot of what we see is deployment to production. We are embedded in a few Apache projects now, as well as other "enterprise" suites like KNIME.

The reason for this is simplicity and integration with the JVM, plus supplemental add-ons for things like ETL (see: JDBC, HDFS, Kafka, Spark, ...), as well as what I think is the easiest way to do both multi-threaded model serving and data-parallel training (ParallelWrapper and ParallelInference).

We also made things tunable from the JVM (e.g. you can configure CUDA, native CPU variables, BLAS libraries, ...).

We import python models as well.

We aren't heavily used in the research world, but we are used at scale (especially in China).

Something we have coming up is the ability to import TensorFlow models (right now we have Keras 1 and need to add Keras 2). The only thing we are missing, and finishing out now (among other projects), is autodiff. With that we will have a lot of the same properties as the other "computation graph" frameworks like TF, Theano, and PyTorch.

I'd also note we already have a Chainer/PyTorch-like computation-graph API built into our neural net DSL.

In the next release we will also have our new parameter server, built on Aeron. Aeron is miles ahead of gRPC (https://github.com/benalexau/rpc-bench); it's used in the low-latency/quant world and is now the default transport for Akka.


We moved over to TensorFlow from Theano around a year ago. I'm a software engineer on the team, and here's what I think the advantages are from my POV:

1) The transition was fairly straightforward; both APIs' interfaces are more-or-less similar and share some design characteristics.

2) Having said that, TF's API is easier to use and without a doubt a lot easier to read.

3) Consistency: Deploying Theano in different environments surprised me on several occasions with different output compared to the training environment. TF is more consistent on this front (never had such issues).

4) Running multiprocessing with Theano + GPU is a disaster (due to forking), so I ended up having to create process pools before initializing Theano. No such issues with TF.

5) TF provides many helpful operators (such as queues and batching ops; a sketch follows this list) as well as monitoring tools (TensorBoard) and debugging tools.

6) Its development is extremely rapid, new releases every couple of months with a lot of improvements and new features every time.
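For point 5, a minimal sketch of a queue-based input pipeline (TF 1.x; the file name and feature spec are made up):

    import tensorflow as tf

    # Queue runners read and batch data in background threads
    queue = tf.train.string_input_producer(["train.tfrecords"], num_epochs=5)
    reader = tf.TFRecordReader()
    _, serialized = reader.read(queue)
    example = tf.parse_single_example(serialized, {
        "image": tf.FixedLenFeature([784], tf.float32),
        "label": tf.FixedLenFeature([], tf.int64),
    })
    images, labels = tf.train.shuffle_batch(
        [example["image"], example["label"]],
        batch_size=32, capacity=2000, min_after_dequeue=1000)

    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(),
                  tf.local_variables_initializer()])
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        # ... training loop consuming images/labels ...
        coord.request_stop()
        coord.join(threads)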

In short, TF is what Theano should have been. A lot of new papers are being implemented in TF as well, so it helps to understand it.


> 6) Its development is extremely rapid, new releases every couple of months with a lot of improvements and new features every time.

How stable is the API then?

I think Google is a bit notorious for this (e.g. Angular vs Angular 2).


All changes are very well documented for each release [1]. The biggest changes I've seen were, as expected, in upgrading from 0.x to 1.x; these were mostly function argument name changes made to comply with numpy's naming (a welcome change). They did provide tools that automatically convert 0.x code to 1.x, so that was helpful.

It is not necessary to upgrade if you're satisfied with the current version, and I certainly won't deploy a different version to production than the one used for training. Upgrading is mainly worth it if there are performance improvements or new features/tools that make a big impact (and those are the changes we're mostly interested in and look forward to every release).

[1] https://github.com/tensorflow/tensorflow/releases


I am also interested in that - people seem to be complaining a lot about broken RNN(Cell) functionality and insufficient regression testing. https://news.ycombinator.com/item?id=14576912


Fortunately, it's not an irrevocable decision like choosing a JavaScript framework. With deep learning you spend a lot of time considering a small amount of code.

We use several frameworks because sample code from different papers uses different frameworks. It's not that big of a deal.


The main reason to bet on TensorFlow is that it seems to have by far the greatest adoption of all frameworks, as evidenced by github statistics, HN polls, and other surveys:

* https://twitter.com/fchollet/status/765212287531495424

* https://news.ycombinator.com/item?id=12391744

* https://github.com/aymericdamien/TopDeepLearning


Selection bias could mean that you're substantively wrong.


When it comes to scalability, Apache MXNet (http://mxnet.io/) is actually the best choice. Multi-GPU support and distributed training on multiple hosts are extremely easy to set up. It's also supported by Keras (still in beta, though).
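For example, a minimal sketch (assumes two GPUs; the tiny network is made up):

    import mxnet as mx

    # Data parallelism is just a list of contexts; each batch is split across them
    net = mx.sym.Variable("data")
    net = mx.sym.FullyConnected(net, num_hidden=10)
    net = mx.sym.SoftmaxOutput(net, name="softmax")
    module = mx.mod.Module(symbol=net, context=[mx.gpu(0), mx.gpu(1)])
    # module.fit(train_iter, num_epoch=10)  # runs on both GPUs unchanged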


TensorFlow is better for deployment. PyTorch is better for research. Theano/Keras is simpler to use and a little faster than TensorFlow.


"PyTorch is better for research" is a weird, unsubstantiated statement. The fact is that few serious researchers use PyTorch (and even those complain about it). It's mostly grad students in a handful of labs. The only researchers I know who use PyTorch have been from FaceBook, and that's because they were implicitly forced to use it (PyTorch is developed by FaceBook).

According to https://medium.com/@karpathy/icml-accepted-papers-institutio... , 3 of the top research labs in the world are DeepMind, Google Brain (and the rest of Google), and Microsoft Research. Let's see:

* DeepMind: TensorFlow

* Google Brain: TensorFlow

* Microsoft Research: CNTK

Ok, so what about academia? The top deep learning groups in academia are:

* Montreal: Theano

* Toronto: TensorFlow

* IDSIA: TensorFlow

So, what about the greater academic research community? Maybe we could get some data about who uses what by looking at the frameworks cited by researchers in their papers. Andrej did that: it's mainly TensorFlow and Caffe. https://medium.com/@karpathy/a-peek-at-trends-in-machine-lea...


Few people use PyTorch largely because it is relatively new (0.1.12). It doesn't even have distributed training capabilities yet (coming in 0.2). Your arguments don't say anything about the frameworks themselves. It's unfair!

When people say PyTorch is better for research, they mean it is more flexible and that it is easier to implement non-trivial network architectures with it, such as recursive networks, which are a cumbersome task in TensorFlow. MXNet's documentation provides a good overview of these two styles (http://mxnet.io/architecture/program_model.html).
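A minimal sketch of what the dynamic style buys you (PyTorch 0.1.x-era API; shapes and depth are made up):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    cell = nn.Linear(10, 10)

    # The graph is rebuilt on every forward pass, so plain Python control flow
    # (loops, recursion over a parse tree, ...) defines the architecture
    def forward(x, depth):
        h = x
        for _ in range(depth):  # depth can differ from one example to the next
            h = torch.tanh(cell(h))
        return h

    out = forward(Variable(torch.randn(1, 10)), depth=3)
    out.sum().backward()  # autograd follows whatever path was actually taken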


Yep, to make it clear: TensorFlow is like Angular (acclaimed, widely used) and PyTorch is like React (much more flexible and composable). Funnily enough, they are made by Google and Facebook respectively. History repeats itself.


All of these frameworks are "relatively new". TensorFlow: 1.6 years. CNTK: 1 year. PyTorch: 0.5 year. Are they really impossible to compare?

> When people say PyTorch is better for research, they mean

That's not what "people" say. They tend to say the opposite. Maybe we can ask OP what he meant when he said it.

> it is easier to implement non-trivial network architectures with it, such as recursive network

It is interesting that you mention recursive networks. There are only a few dozen researchers who work with recursive networks, and they are all accounted for; we know what tools they use. They use Chainer and DyNet.


With all due respect, you don't appear to know much about what you are commenting on. There is a huge community of research and applications around RNNs and all the other architectures. PyTorch has an extremely vivid and fast-growing codebase across all neural architectures and applications; it's remarkable, actually. One reason might be that it is simple and effective, including easy debugging. Another is that research now is heavily focused on new, possibly complex and exotic architectures, new optimizers, and understanding the inner guts and behavior, including theoretically. And it appears PyTorch makes that easy. Don't throw it away too fast as a DL enthusiast :)


> All of these frameworks are "relatively new". TensorFlow: 1.6 years. CNTK: 1 year. PyTorch: 0.5 year. Are they really impossible to compare?

1.6 years is a long time in the DL community.

> That's not what "people" say. They tend to say the opposite. Maybe we can ask OP what he meant when he said it.

Go ahead! Ask it.

> They use Chainer and DyNet.

You know Chainer came well before PyTorch and heavily influenced PyTorch's design. You keep saying "XX uses XX". Why not talk about the frameworks themselves?

If you insist on your view, let's settle down here. I don't want to start a framework war; I just want all frameworks to be considered equally.


As long as you're citing @karpathy, "I've been using PyTorch a few months now and I've never felt better. I have more energy. My skin is clearer. My eye sight has improved." (https://twitter.com/karpathy/status/868178954032513024).

My two cents as a researcher who has used Theano, Caffe, PyTorch and TF: they all have their pros and cons. After starting out with Theano, I really appreciate the dynamic nature of PyTorch: it makes debugging and exploration easier compared to the static frameworks. Researchers tend to value these features over deployability, scalability and raw speed (though PyTorch is no slouch). So I fully expect PyTorch to get a lot of momentum in the near future.


> @karpathy, "I've been using PyTorch a few months now and I've never felt better. I have more energy. My skin is clearer. My eye sight has improved."

http://www.oneweirdkerneltrick.com/


Your list of who uses what is kind of contaminated by who wrote the code; I don't think it proves anything.

Obviously Google Brain uses TF and Montreal uses Theano - they wrote them. DeepMind uses TF, but they used Torch before the Google takeover. Similarly, Google and Toronto are deeply intertwined.


I am kind of new to all of this, but as far as I understand you can use Keras with TensorFlow as well.


Yes you can - if you're new, it's a remarkably painless way to get started, at least compared to the pain you would otherwise endure :-)
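A minimal sketch of what getting started looks like (Keras 2 API with the TensorFlow backend; layer sizes are made up):

    from keras.models import Sequential
    from keras.layers import Dense

    # A complete trainable classifier in a handful of lines
    model = Sequential([
        Dense(64, activation="relu", input_shape=(784,)),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, batch_size=32, epochs=5)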


I remember when it took me an hour to understand what regularization is. These simple frameworks take that for granted. They are not really accessible if you don't have a good intuition about the algorithms, hyperparameters and architectures.


> PyTorch is better for research.

and yet a lot of researchers are using Caffe.


An observation when taking a step back: the discussion about deep learning frameworks seems almost as complicated as the JavaScript framework discussions a couple of years ago. Google and Facebook pushing their own frameworks (among other participants) also adds to the déjà vu!

Why is the choice of framework such a big deal? Is it unreasonable to expect someone well-versed in one framework to be able to pick up another reasonably fast if/when collaborating with someone proficient in the latter?


To answer your first question, I actually don't think it's a big deal.

It may become important if you end up having a ton of models running in production that need to be maintained and developed further, but in general, for new applications, I would say that substantially less than 5% of your time would (should!) be spent actually writing any code.


One other benefit of TensorFlow is that transitioning to cloud-based processing on Google Cloud / Tensor Processing Units is seamless. It will turbocharge your training compared to typical GPU performance.

Disclosure: Work for Google Cloud


Any ETA on those TPUs? Stop teasing! ;-)


Have you considered using PyTorch? Many in the DL community think it is the next big thing, as it is more intuitive and dynamic than TensorFlow.


What's the best thing to build when starting TF? Like, the todo list of TF?


TensorFlow has a nice introductory tutorial using the MNIST dataset for recognition of handwritten digits.

Creating a small neural network and training it on the MNIST dataset is the 'todo list' starter project for this kind of framework.
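For reference, the tutorial's softmax-regression starter boils down to roughly this (TF 1.x):

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets("/tmp/mnist", one_hot=True)

    # Softmax regression: the 'hello world' model of the official tutorial
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.matmul(x, W) + b

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})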


You can train on the MNIST handwritten digit dataset.


I found this to be informative - https://svds.com/getting-started-deep-learning/


Here are some reasons why TensorFlow might be better:

* more widely used, more example code

* developed by a bigger team, likely to improve faster

* easier to deploy

* training with Cloud ML

* better support for distributed training

* no compile step (Theano's graph compilation can be long, especially for RNNs)


TensorFlow might not be the fastest in terms of computation speed, but it can be used all the way from research to production with TensorFlow Serving.

As such, you won't need to reimplement or convert your model to another format for deployment.
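The export step is roughly this (TF 1.x SavedModel API; the path is made up - Serving expects a numeric version directory):

    import tensorflow as tf

    builder = tf.saved_model.builder.SavedModelBuilder("/tmp/export/1")
    with tf.Session(graph=tf.Graph()) as sess:
        # ... rebuild or restore the trained model here ...
        builder.add_meta_graph_and_variables(
            sess, [tf.saved_model.tag_constants.SERVING])
    builder.save()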


TensorFlow has TensorBoard, a great application that allows you to explore your models in depth. It makes neural networks less of a black box.
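Wiring it up is a few lines (TF 1.x; the 'loss' here is a stand-in scalar):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, name="x")
    loss = tf.reduce_mean(tf.square(x))   # stand-in for a real training loss
    tf.summary.scalar("loss", loss)       # shows up as a live chart
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter("/tmp/logs", tf.get_default_graph())

    with tf.Session() as sess:
        for step in range(100):
            summary = sess.run(merged, feed_dict={x: [step * 0.1]})
            writer.add_summary(summary, step)
    # then: tensorboard --logdir /tmp/logs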


Over 60k stars on GitHub for TF. It won.


If people weren't lemmings, that would be a valuable indicator. But, it turns out that...


Personally, I use a combination of TensorFlow and Appex frameworks. I find Theano simply lacking in features.


r/MachineLearning


Use an AI program to answer that question!


Wouldn't it be somewhat reasonable to use an AI program to tell us which AI program is best? That would be some sort of Turing test in itself? And if not, so much for AI?



