Hacker News
Eager Execution: An imperative, define-by-run interface to TensorFlow (googleblog.com)
125 points by alextp on Oct 31, 2017 | 35 comments

You can read more about it in the blog post ( https://research.googleblog.com/2017/10/eager-execution-impe... ) or the README ( https://github.com/tensorflow/tensorflow/tree/master/tensorf... ). This is still a preview release, so you may hit some rough edges.

Looking forward to your feedback as you try it out.

I'm on the team that worked on this -- happy to answer questions!
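A rough way to picture what "define-by-run" means is a toy autodiff tape in plain Python. This is only an illustrative sketch, not TensorFlow's actual implementation: because the tape is recorded as operations execute, values are available immediately and ordinary Python control flow and print-debugging just work.

```python
# Toy define-by-run autodiff in plain Python -- an illustration of the
# idea, not TensorFlow's implementation.

class Var:
    def __init__(self, value):
        self.value = value      # computed immediately ("eagerly")
        self.grad_fn = None     # local gradient rule, recorded as the op runs
        self.grad = 0.0

    def __add__(self, other):
        out = Var(self.value + other.value)
        out.grad_fn = lambda g: [(self, g), (other, g)]
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        out.grad_fn = lambda g: [(self, g * other.value),
                                 (other, g * self.value)]
        return out

def backward(out):
    # Naive traversal; a real implementation walks the recorded tape in
    # reverse topological order, but this suffices for the example below.
    out.grad = 1.0
    stack = [out]
    while stack:
        node = stack.pop()
        if node.grad_fn is None:
            continue
        for parent, g in node.grad_fn(node.grad):
            parent.grad += g
            stack.append(parent)

x = Var(3.0)
y = x * x + x            # evaluated immediately, no session or graph build
backward(y)
print(y.value, x.grad)   # 12.0 7.0  (dy/dx = 2x + 1 at x = 3)
```

The key contrast with graph-mode TensorFlow is that there is no separate "build the graph, then run it in a session" phase; each op runs when the line of Python executes.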

Hot damn, this has got me all giddy. How will this work on single-node multi-GPU systems? For example, with PyTorch you have to use threading, multiprocessing, or even MPI. Can you think of a not-too-scary way to use eager execution with multiple GPUs?

We're still fairly early in the project, so for now threading is the only supported way.

We can do better, however, and we're working on ways to make fuller use of the hardware (for example, if you have no data-dependent choices in your model, we can enqueue kernels in parallel on all GPUs in your machine at once from a single Python thread, which will perform much better than explicit Python multithreading).
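The "one Python thread per device" pattern mentioned above can be sketched in pure Python. Everything here is a stand-in: `DEVICES`, `run_step`, and the shard math are hypothetical names and computations, not a TensorFlow API, and the "kernel" is just a toy function.

```python
# Hedged sketch of one-thread-per-device data parallelism.  In real eager
# code, run_step would place ops on `device` and run one training step on
# that shard; here it is a stand-in computation so the sketch is runnable.
from concurrent.futures import ThreadPoolExecutor

DEVICES = ["gpu:0", "gpu:1", "gpu:2", "gpu:3"]  # hypothetical device names

def run_step(device, shard):
    # Stand-in for a per-device computation on `shard`.
    return sum(shard) * 2

batch = list(range(8))
# Round-robin split of the batch across devices.
shards = [batch[i::len(DEVICES)] for i in range(len(DEVICES))]

# One thread per device, so kernels for each device are enqueued
# concurrently rather than serially from a single loop.
with ThreadPoolExecutor(max_workers=len(DEVICES)) as pool:
    results = list(pool.map(run_step, DEVICES, shards))

total = sum(results)   # combine per-device results (e.g. summed gradients)
print(total)           # → 56
```

The enqueue-in-parallel-from-one-thread approach described in the comment would remove the need for these explicit Python threads entirely.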

Stay on the lookout as we release new experimental APIs to leverage multiple GPUs and multiple machines.

Announcing TensorFlow's new development roadmap mandate: copy everything PyTorch is doing :-)

I think you mean Google is following the leadership of Chainer, like Facebook already does? PyTorch started as a Chainer fork. Its dynamic graph internals are all from Chainer.

This isn't art. There are no points for originality. If open source projects borrow the best parts from each other, that's a good thing.

It's not a bad thing. It's good for users. But give credit to the leaders in the field. If you make an iPod clone, you call it an iPod clone, not a clone of the Zune HD.

Chainer started it, was around years earlier, and it still has more users. So Google is not copying PyTorch, it's copying Chainer.

Do you have a source for Chainer currently having more users than TensorFlow?

Not more users than TensorFlow. But maybe more users than other dynamic deep learning frameworks (PyTorch, Gluon, DyNet...).

Totally agree!

This is the first time I am hearing this. I thought PyTorch was based on Torch (like the name implies). Do you have a reference or more information?

PyTorch uses the same backend as Torch (cutorch for GPU, etc.), but its Python API is almost the same as Chainer's. On that point, you could say PyTorch "copied" Chainer.

Based on your reasoning, PyTorch is copying TensorFlow's static optimizations and production capability with JIT and ONNX, then? I've seen many folks requesting an imperative API.

You can't please everybody; whether or not they listen to users, people still complain. If both are making an effort to improve, though, the community can only benefit from this competitiveness.

I'm usually against this type of framework baiting, but being a tensorflow guy myself & having just spent the week coding with pytorch full time.... this is basically identical to pytorch

What are the strengths and weaknesses of each? I've been using keras but planning on diving into a real deal framework next. Tensorflow is appealing for the momentum it has in the community, but pytorch looks easier to learn.

Doing image classification, object localization, and homography (given an input image, which of my known template images matches it and in what orientation).

I think Keras is a real deal framework. It provides a higher-level API than most other frameworks, but it has pretty sweet portability of models across frameworks and platforms and most research papers are implementable in Keras without too much trouble.

In my opinion, the real appeal of PyTorch or Chainer is that they are similar to the NumPy API, so the learning curve is gentle. The NN construction and gradient parts are framework-specific, but all the glue is regular Python, unlike Keras, TensorFlow, etc.
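To illustrate the point about regular-Python glue, here is a framework-free sketch (all names are made up): in a define-by-run style, the model is just the code that runs, so data-dependent control flow and print-debugging work out of the box, with no graph-construction layer in between.

```python
# Framework-free illustration of why define-by-run feels like plain
# Python: the "graph" is whatever code executes, and can differ per input.

def relu(x):
    return x if x > 0 else 0.0

def forward(x, depth):
    # Ordinary Python loop; `depth` can be data-dependent, which is
    # awkward to express in a static graph but trivial here.
    h = x
    for _ in range(depth):
        h = relu(h * 0.5 - 0.1)
    return h

print(forward(2.0, 3))   # three toy "layers" applied imperatively
print(forward(-1.0, 1))  # negative input dies at the first relu
```

In graph-mode TensorFlow, the same variable-depth loop would need `tf.while_loop` or graph rebuilding; here it is just a `for` statement.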

It's actually very good that they are copying good things from other frameworks.

The question now is: are TensorFlow eager's RNNs as slow as PyTorch's?

(I'm author of the TF rnn api & tf.contrib.seq2seq)

There's a lot of work being done on this specific part. If you have a standard RNN architecture you want to run, you can probably use the cudnn code in tf.contrib.cudnn_rnn to get a super fast implementation.

There is some performance work that needs to be done on properly caching weights between time steps of an RNN if you use a tf.nn.RNNCell. Currently, if you want to implement a custom architecture, or a seq2seq decoder, or an RL agent, this is the API you would want to use. Several of the eager benchmarks are based on this API, so that performance will only improve.
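The weight-caching issue above can be pictured with a plain-Python stand-in for an RNNCell-style loop (this is not the tf.nn.RNNCell API; `cell`, `run_rnn`, and the sizes are illustrative): the recurrent weights are read once per sequence, outside the time-step loop, rather than re-fetched on every step.

```python
# Plain-Python stand-in for an RNNCell-style time-step loop, illustrating
# why weights should be fetched once per sequence rather than per step.
import math
import random

random.seed(0)
HIDDEN = 4

def make_weights():
    # Stand-in for the cell's recurrent weight matrix.
    return [[random.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
            for _ in range(HIDDEN)]

def cell(h, x, W):
    # One recurrent step: h' = tanh(W @ h + x), written out by hand.
    return [math.tanh(sum(W[i][j] * h[j] for j in range(HIDDEN)) + x[i])
            for i in range(HIDDEN)]

def run_rnn(inputs):
    W = make_weights()   # weights obtained once per sequence, then reused
    h = [0.0] * HIDDEN
    for x in inputs:     # a plain Python loop over time steps (eager style)
        h = cell(h, x, W)
    return h

seq = [[0.1] * HIDDEN for _ in range(5)]
h_final = run_rnn(seq)
print(len(h_final))      # → 4
```

In eager mode the per-step Python overhead multiplies with sequence length, which is why caching work like this (and the cudnn kernels for standard architectures) matters for RNN performance.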

I'm hopeful that for the next major release, we'll also have support for eager in tf.contrib.seq2seq.

TensorFlow: everything to all people.

Eager is actually not as innocent as "open-source projects borrowing the best parts from each other", as some commenters here suggest.

Google is attempting to dominate the machine-learning API and the Python ecosystem for scientific computing.

The company that controls the API influences which apps are built on it and how. Think about how Google bundled Android services on top of Android, and how that posed an existential threat to other companies. That's what's coming for TensorFlow. Many developers are too naive to realize it, or too short-sighted to care.

Huh? They're attempting to dominate the machine learning ecosystem by writing a bunch of free and high quality machine learning libraries? What exactly are they doing wrong?

I wouldn't compare a permissively licensed library to Android services at all.

I'm surprised I have to write this, but Google is not a charity. They are pouring commercial resources into Tensorflow for a reason. That reason is Google Cloud. Tensorflow is a Trojan horse to get people to use Google Cloud and other paid Google products. How do I know this? Because Tensorflow works better on Google Cloud than anywhere else, and Google is making a concerted effort to catch up with AWS in cloud, mostly through machine learning.

I didn't compare Tensorflow to Android services. I said that Tensorflow would serve as the basis of a service bundle, much like Android did. Let's come back in a couple years and I'll tell you I told you so.

> I'm surprised I have to write this,

Insulting the reader

> but Google is not a charity

Truism.

> They are pouring commercial resources...

As opposed to "non-commercial resources"?

> ... for a reason.

Everything happens for a reason.

> That reason is Google Cloud.

> How do I know this?

Pray tell!

> Because Tensorflow works better on Google Cloud than anywhere else.

This is the only real argument in this conspiracy. And if "anywhere" includes the users' hardware, it's wrong: tensorflow runs flawlessly on any Linux/NVIDIA hardware. Maybe it works better with GCE than AWS, but that would once again fall into that "rather unsurprising" category of factoids.

> Google is making a concerted effort to catch up with AWS in cloud, mostly through machine learning.

This can be re-written as "Google has a cloud offering, which it tries to sell. And right now, machine learning is pretty hot". Throwing a "concerted effort" in there is just trying to jazz it up to something ominous. Which it isn't.

> I didn't compare Tensorflow to Android services. I said that Tensorflow would serve as the basis of a service bundle, much like Android did.

"The basis of a service bundle" actually doesn't sound that scary. Nobody is disputing that Google offers services build on tensorflow. It just isn't any sort of "Trojan horse" conspiracy, and it is somewhat limited by the fact the tensorflow is OSS licensed and could be forked by anybody people suddenly find out it's full of geek soldiers.

> truism

Maybe, but people in this thread treat Tensorflow's creation as an act of simple altruism.

> And if "anywhere" includes the users' hardware, it's wrong: tensorflow runs flawlessly on any Linux/NVIDIA hardware. Maybe it works better with GCE than AWS, but that would once again fall into that "rather unsurprising" category of factoids.

Sorry, Tensorflow is slow on GPUs compared to other frameworks. This is not just an early blip; it's a consistent pattern that has been repeatedly demonstrated. Why is Tensorflow slow on commodity hardware? Why isn't Google, with its infinite resources, making Tensorflow run as fast as other frameworks on GPUs? Because it needs to demonstrate an advantage on the Google Cloud with TPUs.

On that cloud, it surrounds Tensorflow with other functionality that makes it easy to build AI, functionality that isn't part of the Tensorflow project. Tensorflow is hard and inefficient to serve for inference, for example.

Machine learning is Google cloud's only hope to salvage Diane Greene's efforts and extend their dominance to a new sector. They're running a distant fourth.

> actually doesn't sound that scary.

It sounds scary to a lot of companies that don't want to be controlled or destroyed by Google. But by all means, lend them a hand, geek soldier.

In what way is Tensorflow working better on Google Cloud? Are they tuning the ML code for specifics of their infrastructure or does Google Cloud just have more tooling for Tensorflow?

Disclaimer: I am notoriously anti-Google and have tons of reasons to post these links. We push on hybrid cloud/on-prem deep learning with our own deep learning framework that competes with the commercial sides of TensorFlow, MXNet, etc.

Sample of search results: https://cloud.google.com/ml-engine/ https://cloud.google.com/tpu/

Even their docs: https://cloud.google.com/tpu/

Marketing content/training: https://www.coursera.org/learn/serverless-machine-learning-g...

vs (1 link I found with googling) for AWS: https://aws.amazon.com/tensorflow/

If we push the amazon equivalent though, run this: site:amazon.com aws mxnet

Every cloud vendor has their own framework. Microsoft has CNTK on azure as well.

Google doesn't want a repeat of what happened with map reduce and hadoop: https://www.quora.com/What-is-the-relationship-between-MapRe...

That being said, as a user: Just take it. You benefit from vendors competing. Google would love to pay you to use their tools.

Hey guys, if I could request... Please fix the serialization story for tensorflow. There are six googleable methods to export from tensorflow, and nobody knows what will work on the cloud, what can be exported from CloudML, and what can be loaded on Android.

It has to be consistent and there has to be one way to do it.

I personally have a 10 message thread with Google cloud support on exporting a Cloud trained model to tensorflow and nobody could figure it out [Case #13619720].

Did you try using SavedModel? It should be seamless to use downstream with tensorflow serving and it's not that hard to get estimators to spit those out.

I really wish. https://github.com/tensorflow/tensorflow/issues/12750

In fact, if you dig up the case, even official support told me that SavedModel needs some freezing using Bazel, otherwise it doesn't work.

The github page and stackoverflow are full of these. If you can, please take the message to the other side :(

I don't think the cloud guys (where training will happen in distributed mode) talk to the android guys (where models will be used after quantization). There is a huge serialization problem that all of us are currently struggling with.

Ah, I didn't know SavedModel didn't work in android. I think freezing is still the way to go there? I'm sorry, I don't personally work on the mobile side of things.

I should apologize for hijacking this thread (and I'll stop here). But Tensorflow is getting to be unusable because of the serialization story. We don't have such issues with Caffe2 or anywhere else. It essentially means different parts of the tensorflow ecosystem are unable to talk to each other.

I really pray the tensorflow teams give it due importance.

I'm the original author of the freeze_graph script, so I'm to blame for a lot of the on-going mess here. For what it's worth I'm actively working on cleaning this up, since I know what a painful experience it is. Apologies for everyone who's struggled with this, and I will take a look at the case number mentioned above and follow up internally to see if there's anything I can help with.

Thanks for this! I would like to bring two things to your attention :

1. We don't know what to use and it's very confusing. For example, now there is https://stackoverflow.com/questions/42216208/should-tensorfl.... Will freeze_graph become canonical and we forget about SavedModel? And everything else deprecated? It should be part of the core API and workable on CloudML, where we don't have a lot of control over running scripts and certainly not Bazel builds.

2. The Android/iOS story. Now you have the Pixel Visual Core as well... Please make it seamless all the way to Android or iOS or Raspberry Pi (whatever you guys support).
