TensorFlow: open-source library for machine intelligence (tensorflow.org)
1559 points by jmomarty on Nov 9, 2015 | 196 comments

This is really significant. At this moment in history, the growth of computer power has made a bunch of important signal-processing and statistical tasks just feasible, so we are seeing things like self-driving cars, superhuman image recognition, and so on. But it's been very difficult to take advantage of the available computational power, because it's in the form of GPUs and clusters. TensorFlow is a library designed to make it easy to do exactly these things, and to scale them with your available computing power, along with libraries of the latest tricks in neural networks and machine learning (which is pretty close to "statistics").

As a bonus, it has built-in automatic differentiation so that you can run gradient descent on any algorithm — which means that you can just write a program to evaluate the goodness of a solution and efficiently iterate it to a local maximum. If you do this enough times, hopefully you'll find a global maximum. There are a variety of other numerical optimization algorithms, but gradient descent is very simple and broadly applicable.
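The idea can be sketched in a few lines of plain Python. Here a finite-difference gradient stands in for the automatic differentiation a library like TensorFlow provides, and we follow the gradient uphill because we're maximizing "goodness" (toy code, not TensorFlow's API):

```python
# Illustrative only: a finite-difference gradient stands in for
# automatic differentiation.

def grad(f, x, h=1e-6):
    """Numerical derivative of a scalar function f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def ascend(f, x, steps=200, lr=0.1):
    """Follow the gradient uphill to a local maximum of f."""
    for _ in range(steps):
        x += lr * grad(f, x)
    return x

# A "goodness" function whose best solution is at x = 3.
goodness = lambda x: -(x - 3.0) ** 2
best = ascend(goodness, x=0.0)   # converges near 3.0
```

Restarting `ascend` from many different starting points is the "do this enough times" part: each run finds some local maximum, and you keep the best one.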

And it runs in IPython (now Jupyter), which is a really amazingly powerful way to do exploratory software development. If you haven't tried Jupyter/IPython, google up a screencast and take a look.

I'm just repeating the stuff that's on the home page in a different form, but this is a really big deal. Most of the most significant software of the next five or ten years is going to be built in TensorFlow or something like it. (Maybe Dan Amelang's Nile; see Bret Victor's amazing video on the dataflow visualization he and Dan built for Nile recently at https://youtu.be/oUaOucZRlmE?t=24m53s, or try the live demo at http://tinlizzie.org/dbjr/high_contrast.html, or download and build the Nile system from https://github.com/damelang/nile.)

Apparently the Googlers think that too, because among the 39 authors of the white paper are several shining stars of systems research, including some who may be up for a Turing Award in the next decade or two.

What about Torch or Theano, which allow you to use multiple GPUs and clusters? They also have a wide array of libraries which let you extend their capabilities (iTorch etc.). Torch is also very fast, with most parts written in C, so I don't know whether TensorFlow would really be that fast compared to existing libraries.

One thing I found interesting is the ability to use the software you designed in research directly in production (without having to rewrite the whole thing in Java).

That said, I'm excited to see Google release this. Hopefully it will encourage Facebook or Baidu to release their tools as well.

As far as I can tell, this may not be faster than theano/torch, but it doesn't sound like it'll be slower. It takes a similar approach of sending all the big computation stuff out to efficient implementations. The "slow" language code is simply building up a graph of work to be done by the efficient implementations.
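That deferred-execution idea can be illustrated with a toy sketch (hypothetical classes, not TensorFlow's actual API): the host language only assembles a graph of operations; nothing is computed until the graph is run, at which point a fast backend could take over.

```python
import operator

# Toy sketch of graph building vs. execution. In a real library the
# run() step would dispatch to optimized C++/GPU kernels.

class Node:
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs
    def run(self):
        args = [i.run() if isinstance(i, Node) else i for i in self.inputs]
        return self.fn(*args)

# Building the graph does no arithmetic yet...
graph = Node(operator.add, Node(operator.mul, 2, 3), 4)
# ...until the whole graph is handed off for evaluation.
result = graph.run()   # 10
```

Because the library sees the whole graph before running it, it can fuse operations, pick devices, and schedule work in ways an eagerly-executing Python loop never could.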

Do 10% speedups really matter for its success, if both libraries are ultimately calling into CUDNN? I do not think so.

Right now, Torch/Caffe adoption is really good with lots of research code being released for it (even from within Google DeepMind). So it will be interesting to see how this catches up amongst the research and developer community.

As a side note, it feels like the front-page picture of the interactive computational graph is a bit misleading - it looks like the interface still uses Python (or C++), not a fancy GUI... But yes, it looks very similar to a great tool from Yoshua Bengio's group - Theano. So, great! Another amazing tool to take away the grind of doing ML projects!

TensorFlow seems to have better multi cpu/gpu support than Theano.

Agreed, this is huge. I've been thinking along these lines for a while, but more focused on traditional programming systems. The approach of programming via data flows, and visually representing those, is going to be enormous in the coming decade, IMHO. Especially in asynchronous and heterogeneous programming environments.

Actually, it feels like I've been "scooped" somewhat. At the same time, though, it's awesome to find that many of the core assumptions I've been working on formulating for traditional programming systems are already being used by other top engineers/researchers at Google. Thanks for posting about the who's-who of systems research. I'll have to check up on their research more!

I think there is a lot of room left to apply this approach to a lot of existing systems outside of machine learning. We're working on releasing a prototype in the coming months at our new startup (vexpr.com) based on applying these same data-flow principles to normal programming projects. Actually this might give a huge boost in presenting the underlying approach to systems engineering to audiences.

I wouldn't worry too much about being scooped by this--the idea of data flow programming has been around since the 60s. Tensorflow looks like a very nice implementation of those ideas for neural network training, and is potentially useful for more general numerical computing, but to what extent remains to be seen.

Yeah ... I'm a big fan of data-flow programming (Been working on my own approach for a little while too ...)

Nice, lol, hope TensorFlow didn't scoop any of your ideas too. ;) Overall, it's exciting to see data-flow paradigms gain popularity.

What kind of dataflow approach are you most interested in? I've been shifting my approach to work within existing systems. One thing I've been doing is abusing the intent of web components and DOM events to create data-visualization tools.

Recently my team and I decided to focus down a little more. Have you had success in bridging the gap between esoteric academic papers and industry?

Nothing too serious ... mostly to glue embedded machine vision system components together in a simulation environment.

Do you guys know of any theoretical analysis of data-flow vs. control-flow software?

This question is going unanswered not because there is no such analysis, but because there is far too much of it to fit a good summary into this margin. It includes most of modern compiler and CPU design, as well as most current HPC work.

That's a fair point overall. It's basically the entirety of the debate between functional and imperative languages. Nonetheless, it is still a pretty good question for someone trying to understand the differences and why they might be useful.

@hmehta, one of my recent favorite videos generally on this topic has been from Curry On conference [1]. It's not strictly data-flow vs control flow, but it does discuss some of the more practical differences in between using monads for structuring data flows vs imperative control.

My take is that monads can be roughly considered as wrapping up imperative control flow along with the data - e.g. wrapping the if/else choices into a data structure. I actually found studying actual category theory easier than all the "easy introduction" material on monads. The ideas, when presented straightforwardly and technically correctly, are simpler than we give them credit for.

[1]: https://www.youtube.com/watch?v=449j7oKQVkc

One small note on the paper itself: I love that the paper's authors are in alphabetical order, without any stupid jostling over first/last authorship.

I think that's fantastic and wish academia went that way too (yeah right).

This is the convention in mathematics.

From the AMS statement on The Culture of Research and Scholarship in Mathematics: "For this reason, mathematicians traditionally list authors on joint papers in alphabetical order. An analysis of journal articles with at least one U.S. based author shows that nearly half were jointly authored. Of these, more than 75% listed the authors in alphabetical order. In pure mathematics, nearly all joint papers (over 90%) list authors alphabetically."


Why is alphabetical order great? It just means that people with last names starting with Z get penalized every single time for choosing the wrong parents! I would think a randomized order would be chosen - especially by mathematicians.

Why "penalized"? If there's no significance attached to author ordering, why should it matter who's listed first?

(I can actually answer my own question to some degree: the "alphabetical authors" convention is also the norm in my own field of high energy physics. Only after I finally landed a tenure-track job and served on another search committee did I realize that folks from other fields of physics would see my lack of first-author publications as a black mark. But in principle, among those who know the convention, order ought to be unimportant, shouldn't it?)

"Choose" parents with a surname starting with A then.

You can also legally change your last name to anything you want at any point in time. If you care enough.

Or use a pseudonym.

Meet Mr. Aaberg.

Interestingly, with this sounding like a somewhat realistic Norwegian/Danish name, it should actually show up at the bottom by the sorting rules of those two languages.

The digraph "aa" is an archaic representation of the modern character "å", the last letter of the Norwegian and Danish alphabets, and should be sorted accordingly.

Mr. Aaberg has nuthin' on Mr. Aaaron

If only it were so simple. It depends on the field, but in the biological sciences it really matters where you are in the authorship list. If you are first, it is assumed you did most of the work, and if you are last, you were the head honcho of the group. When you apply for scholarships and grants, the committees will only consider the first and last authors as important; everyone else is just filler. It's really important to be last when you are trying to get grants, as so much emphasis is placed on track record.

I think that is usual for any large project. Take for instance the papers published by CERN.

Also consider that a single "order" might not make any sense. How do you determine if somebody "contributed" more than anybody else?

I realize there's jostling going on, but I personally appreciate that I can tell from the order of names whom to contact with inquiries.

It isn't that simple, as there is a cultural component to the order in which authors are listed (https://en.m.wikipedia.org/wiki/Academic_authorship#Order_of...)

Also, at CERN, with papers with 1000+ authors, any chance of picking the 'right' name from the list approaches zero, anyways. Such papers typically list a "corresponding author" to whom to send inquiries.

Alphabetic order also gave us the fun Google search phrase "Does G. Aad exist?" (Gives a link to https://jeffollerton.wordpress.com/2014/05/28/does-gaad-exis..., but Google also suggests a correction to your "typo")

Depending on how long it takes to "evaluate the goodness of a solution", techniques like multi-start gradient descent can rapidly become intractable, especially in higher dimensions. There are a handful of open-source libraries out there that try to tackle this time-consuming and expensive black-box optimization problem from a more Bayesian approach [1] [2]. There's also a YC company doing this as a service (full disclosure, I'm a co-founder) [3]. Combining these methods with TensorFlow could be very powerful and is one step closer to more automatic ML workflows.

[1]: https://github.com/Yelp/MOE (full disclosure, I co-wrote this)

[2]: https://github.com/hyperopt/hyperopt

[3]: https://sigopt.com/cases/machine_learning
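For a sense of why the naive approach gets expensive, here is a minimal multi-start sketch in plain Python (illustrative only; the function and parameter choices are made up). Each restart is a full optimization run, which is exactly the cost the Bayesian tools above try to avoid:

```python
import random

def grad(f, x, h=1e-6):
    # finite-difference stand-in for automatic differentiation
    return (f(x + h) - f(x - h)) / (2 * h)

def descend(f, x, steps=300, lr=0.01):
    for _ in range(steps):
        x -= lr * grad(f, x)
    return x

def multi_start(f, starts=20, lo=-3.0, hi=3.0, seed=0):
    # every restart pays the full cost of a local optimization
    rng = random.Random(seed)
    finishes = [descend(f, rng.uniform(lo, hi)) for _ in range(starts)]
    return min(finishes, key=f)

# Objective with a local minimum near x = +1.98 and the global one near x = -2.03.
f = lambda x: (x * x - 4) ** 2 + x
best = multi_start(f)   # lands near the global minimum
```

With 20 restarts at 300 evaluations each, that is already 6,000+ evaluations of `f` for one scalar parameter; when each evaluation is "train a neural net", this blows up fast.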

Could something like MOE be made to run on TensorFlow?

Yes, if you could expose the parameters to MOE from within TensorFlow. We're going to work on an example of showing how this can be done with SigOpt, which has a similar interface.

You've just described Theano. I don't know why you think this is a watershed moment in history.

> I didn't see the author list on the white paper.

The white paper starts with the author list.

What is Nile?

I can't find it on Google.

Nile is a language built by Dan Amelang, a researcher at the Viewpoints Research Institute. It is made for graphics rendering, with the goal of having no incidental complexity.

2D vector graphics rendering, for example, takes ~10,000 lines of code using traditional libraries like Cairo (used by Firefox) or Skia (used by Android and Chrome). In Nile, it's 200 lines of code.

Check out this repo, and the pdf inside the readme https://github.com/damelang/nile

It was also built with OMeta.

That tool can be used to solve a lot more problems. REBOL/Red and Racket are also taking DSL approach.

This is awesome, thanks so much for sharing this - and I love how whimsical the docs are:

Our pond is a perfect 500 x 500 square, as is the case for most ponds found in nature.

Some feedback on the PDE aspect: is there any nice way to write your model and grid using TensorFlow and still obtain the free derivatives/optimized computation graph if you use a different solver to actually integrate the equations? The example shows off a very simple toy first-order Euler solver, but in practice this is grossly inadequate. I suspect it's very difficult if not impossible - especially since most solver libraries are very old Fortran libs, and carrying the reverse-mode differentiation through them is a nightmare. Still, this makes for very nice building blocks.

Another question - would you consider adding an example of using this together with Google Ceres - to make the two autodiffs play nice together?

Maybe of interest: this is (amongst others) by Jeff Dean, famous for Bigtable, Map Reduce, Spanner, Google Web Search, Protocol Buffers, LevelDB, and many, many other things. He's a Google Senior Fellow, but also doubles as the patron saint of Google Engineering.

He is also a ninja, an astronaut, a master chef, and a damn good fellow!

Did they give him a Turing Award yet?

> The TensorFlow Python API requires Python 2.7

The whole scientific Python stack is available for Python 3. This seems like a somewhat backwards thing to do -- or perhaps the requirement is intended to mean at least 2.7?

Edit: added context

Edited again: More careful wording

We're looking to support Python 3 -- there are a few changes we are aware of that are required, and we welcome contributions to help!

Tracking here: https://github.com/tensorflow/tensorflow/issues/1

Thanks for a constructive reply to a (well-meaning, but) not very nice comment!

Congrats on the library, which looks awesome.

Criticism is considered "not very nice"? Your initial comment wasn't impolite or anything.

Python 3 is very much needed! <3 because it's issue #1 :)

Is this compatible with PyPy (or planned to be)? PyPy is such an awesome project, done on such a small budget, that it needs some well-deserved love!

This "backwards" meme doesn't make any sense. Python 2 and Python 3 are different languages that happen to share the same name, a lot of syntax, and near-source compatibility, but they've been developed concurrently for ten years now. It's kind of like C and C++. If the current Python 2 team (which is part of the Python Software Foundation) abandons it, probably somebody else will take over maintenance, because it seems unlikely that it will fall out of use.

If the Python Software Foundation drops the ball, someone else will pick it up.

Let me start by saying I understand what you're saying. I understand why what you're saying is true. However, many people see (and consequently think) this:

"Short version: Python 2.x is legacy, Python 3.x is the present and future of the language"

... which was lifted from https://wiki.python.org/moin/Python2orPython3. It follows that some people would think something along the lines of "...so this library was built for legacy Python?"

The Python Software Foundation has stated explicitly that Python 2.7 will EOL in 2020...

...but the science guys have been offered zero incentive to move; this "end of life" argument has been worked to death without success, so it's perhaps time to shift focus to the science community's real needs (i.e. GPU support).

Check out Dynd.

Python 2 is dead.

Like it or not, it is very much not dead.

Hmmm. Google does not seem to agree. Your assertion is a wish, not a fact. Dogmatic 3.x people are being shown, yet again, the uncomfortable, brutal truth that 3.x is not happening in science.

Seriously. If you're advising someone considering Python on the 2-vs-3 question today, how can you not tell them that possibly one of the most important new machine learning libraries only works on 2.7?

First off, 3.x is happening in science. I'm a scientist, and I use 3.x, and as of this fall, every single scientific package that I use has been ported to 3.x. Using @ for matrix multiplication is also a pretty big deal to me.
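For the curious, `@` (PEP 465, new in Python 3.5) dispatches to a `__matmul__` method; numpy arrays support it natively, but a dependency-free toy class shows the mechanics:

```python
# The "@" operator (PEP 465, Python 3.5+) calls __matmul__.
# numpy supports it natively; this toy class just shows the hook.

class Matrix:
    def __init__(self, rows):
        self.rows = rows
    def __matmul__(self, other):
        # plain O(n^3) matrix product
        cols = list(zip(*other.rows))
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in cols]
                       for row in self.rows])

a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
c = a @ b   # instead of chains of a.dot(b).dot(...)
# c.rows == [[19, 22], [43, 50]]
```

The win is readability: `(H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)` reads like the underlying math, where nested `dot` calls do not.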

It's worth remembering that most scientists don't do any machine learning, because it's not all that useful in many domains. A lot of science is trying to find an explanatory model for the observations. Machine learning is much better at finding a good predictive model for the observations that may not offer any insight to what is happening behind the scenes.

Yes - I have also moved (a holdout library was ported recently). I even liked it. But here I am, thinking about moving backward again after having crossed that Rubicon.

Three's problem is that even if 95% of stuff is on it, basically 100% of stuff is on 2.7, including brand-new stuff. Say you use 10 libraries on average and 95% of libraries have been ported: the odds of having at least one library missing are still 40% (1 - 0.95^10). This ratio goes up the more specialised your work, or the greater your investment in legacy code, which is why most 3.x people are generalists/web people who cannot understand the 2.x position. For me, this TensorFlow library transports me right back into exactly that 40% problem. The problem does not exist the other way around, unless you absolutely need asyncio instead of something that's already been there for years.
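The arithmetic behind that 40% figure checks out, assuming ported-ness is independent across libraries:

```python
# Chance that at least one of your 10 libraries is among the unported 5%.
p_ported = 0.95
n_libs = 10
p_at_least_one_missing = 1 - p_ported ** n_libs
# ≈ 0.401, i.e. roughly 40%
```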

In my case I'm going to persevere and put 2.7 back into a virtualenv or something, but it's really not ideal, and my point stands: if 3.x gave science/engineering people something properly compelling (the @ operator is not enough, in my view), then the above problem would not be happening.

For example, a difficult-but-possible megawin would be to put GPU computing natively into Python. The GPU is more than 50% of the compute silicon in most computers nowadays (the Iris 6200 takes up ~50% of the die in recent Intel Core i7 parts). If you include multicore and discrete GPUs, then standard Python is only targeting about 10% of the available compute silicon in a modern PC! Imagine how rapidly everybody would move if there were a GPU-enabled Numpy array in 3.6.

I am really excited about this library. Tensors are the future; matrices are going to look so old in a few years. In fairness, Theano already knew that. I cannot wait to dig into this and am very relieved that Google's best are still using Python for this endeavour...but...using Python 2.7.

I have just recently been persuaded by the community that 3.5 is cost free, and here I have this enormous counterexample. For the science guys, clearly, the message is not getting through, and I'm not surprised: 3.x offers them nothing. Hence they're blissfully continuing with 2.7.

So I guess I'll have to run two Pythons on my system. Not the ideal situation.

Now if Python 3.(x > 5) would give us native GPU, which might require putting (a GPU-enabled) Numpy into the standard library...as opposed to, say, spending 2 years re-inventing Tornado....

> Tensors are the future; matrices are going to look so old in a few years.

I am honestly curious about this point of view. Is there any example where actual multidimensional tensors have any relevance? What I mostly see around is just standard linear algebra operations on matrices and vectors lifted to higher-dimensional tensors pointwise (for instance, applying a certain operation to all 2-dimensional subtensors of a given tensor).

I never saw the need for modeling matrices in terms of tensors - the latter seem just more complex to use and usually tensor operations (like symmetrization, antisymmetrization, wedge products...) are not even available in libraries

(And by the way, both matrices and tensors are now more than one century old...)

> What I mostly see around is just standard linear algebra operations on matrices and vectors lifted to higher-dimensional tensors point-wise

Equally, what is matrix multiplication but a bunch of 1-dimensional dot products applied pointwise? Why do we need matrices?

I do get what you're saying, and that part of it is that ML / CS folk just use 'tensor' as a fancy word for a multi-dimensional array, whereas physics folk use the word for a related coordinate-basis-independent geometric concept. But for numerical computing broadcasting simple operations over slices of some big array is really useful thing to be able to do fast and to express concisely.

Numerics libraries which don't bother to generalise arrays beyond rank 1 and 2 always feel rather inelegant and limiting to me. Rank-3+ arrays are often really useful (images, video, sensory data, count data grouped by more than 2 factors, ...), and lots of operations generalise to them in a nice way. Good array programming environments (numpy, torch, APL) take advantage of this to provide an elegant general framework for broadcasting of operations without ugly special cases.

The traditional pure-mathematical way of looking at it is that vectors are members of a vector space. Matrices are linear maps, and matrix multiplication is composition of linear maps.

So what algebraic concept do tensors correspond to?

Multilinear maps. You can view contraction with a vector (a dot product) as mapping a vector into the scalars, contraction with a matrix as mapping two vectors into the scalars, and contraction with a tensor as mapping several vectors into the scalars. You don't always have to contract all the indices at once, so with a rank m tensor, you can map n vectors into a collection of m - n vectors.
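A concrete toy example of that contraction, with plain nested lists standing in for a real tensor library: contracting one index of a rank-3 tensor with a vector leaves a rank-2 tensor (a matrix), i.e. T_ijk v_k -> M_ij.

```python
# Contract the last index of a rank-3 tensor with a vector: T_ijk v_k -> M_ij.

def contract_last(tensor3, vec):
    return [[sum(t_ijk * v_k for t_ijk, v_k in zip(row, vec))
             for row in plane]
            for plane in tensor3]

# A 2x2x2 tensor contracted with a 2-vector gives a 2x2 matrix.
T = [[[1, 0], [0, 1]],
     [[2, 0], [0, 2]]]
v = [3, 4]
M = contract_last(T, v)   # [[3, 4], [6, 8]]
```

Contracting `M` with another vector would map it down to a vector, matching the "rank m tensor maps n vectors into m - n remaining indices" picture above.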

In my domain (finance), correlations between vectors are unstable but (maybe) dependent on cross-sectional relationships in the problem space. Some of the mathematics behind elastic-body deformation (car tyres in mechanical engineering, fluid dynamics in weather forecasting) has high applicability. Tensors are required.

It's true that tensors are hard to reason about - they overclock my brain most of the time - but there is no doubt that, just like moving from scalars to vectors massively increases your real-world modelling power, so does the natural extrapolation to matrices, and from there, tensors.

It's not that tensors are particularly hard to reason about. It's just that the way they are represented as multidimensional arrays hides very well the fact that they are elements of a tensor product (hence tensors). Natural operations like, well, the tensor product are often not even available.

I agree that tensors are often used simply as a convenient representation for multiple lower-dimensional objects. I suspect that is because there is still plenty of value in many fields, including mine, in exploring scalar, vector, and matrix representations of problems, so tensors go unexploited. They're used as convenient data structures, not as algorithmic necessities. Still, as more and more people gain access to data and explore it, increasing competition and reducing the "alpha" of simpler analyses, moving deeper into dimensionality will be the only choice for those who want to innovate, in my view. So it is probably necessary to have the tools at our disposal already.

>Tensors are the future

"Tensor said the tensor. Tension, apprehension, And dissension have begun."


Tensors were the future even before they were cool.

It's open source, so feel free to port it to Python 3.

It has a C++ API, so there is no lock-in to Python, either 2.7 or 3.x.

It looks like the C++ API is for executing graphs only, which is kind of odd.

So, I have a very simple question:

I want to start deep learning now, to implement RNNs, autoencoders, Q-learning on a very specific application (not images).

I've read a lot of papers but have not done any DL implementations yet (though plenty of classical ML); my question is very simple:

Where do I start ????

Should I use Torch ? Theano ? Theano + pylearn2 ? Theano + pylearn + Lasagne ? Caffe ?

Or should I just switch to this magical new library directly ?

I feel confused.. any advice ?

Methinks embrace the magical library written by engineers and researchers at Google instead of by a random goulash of machine learning grad students with no investment in the outcome other than graduating.

> with no investment in the outcome other than graduating

Come on. Grad students are motivated by advancing the field.

... are they not? Maybe you are also saying that the ones who write libraries are motivated only by graduating.

Sure, we all start out that way. But by the 3rd or 4th year, you frequently just want to write that thesis and get on to the next thing because you've already advanced the field as much as you're going to as a grad student.

I have a large package of code on sourceforge that consists of my grad school and postdoctoral efforts. If I compare its style to what 2 decades in the industry since has taught me, it's laughable. That said, it is C code, so I occasionally plunder it for algorithms I figured out way back when.

Nobody forced us to open-source Lasagne, so I think that remark was a bit unfair. If we really didn't care about anything but graduating, why would we bother going through the trouble of sharing the code in the first place?

But I do see your point. Google obviously has a lot more manpower to spend on this, so it might be a better bet in the long run.

It's also worth comparing this to a few similar projects that have been announced recently: MXNet (http://mxnet.readthedocs.org/en/latest/), Chainer (http://chainer.org/) and CGT (http://rll.berkeley.edu/cgt/). And Theano of course, which has been the flag-bearer of the computational graph approach in deep learning for many years.

Actually, Lasagne is pretty good; I wasn't targeting you (but then you're pretty good at winning Kaggle competitions, so perhaps there's a connection there, no?)...

I'm thinking mostly of Theano, which, from a performance standpoint, appears to have died the death of a thousand inexperienced cooks in the kitchen. The ~1000x performance regressions it invokes when a junior data scientist goes off the rails and ends up with a Python inner loop amidst GPU kernels are just depressing and seemingly unfixable. Hopefully, TensorFlow will be better, if only because it was written in a world now very aware of Pixel's Law.

MXNet is awesome, but perhaps a little too parameter-servery for my personal tastes, and I'm now wondering what the point of CGT is, other than to be Coke to Google's Pepsi. I also think the whole deep-learning-framework business model just took a torpedo amidships (and not long after the layoffs at Ersatz).

Finally, I had never heard of Chainer until today, thanks! That said, without autograd functionality, the people I work with would probably stick with Caffe + cuDNN.

Pixel's Law? What's that? Can't find it on Google...

Interesting, it does appear to be gone...

Basically, old NVIDIA marketing material used to state that GPUs double in performance every year or so, whilst Moore's Law is slowing down with respect to CPU performance because clock speeds have been flat for a long time.

This isn't strictly true, because core counts, SIMD width, and vector-unit counts in CPUs have all been increasing. However, from the perspective of a single-threaded C application, it is indeed so. CUDA/OpenCL, OTOH, automagically subsume multi-core, multi-threading, and SIMD into the language itself, so the hypothetical "single-threaded" CUDA app just keeps getting better(tm).

The reality, though (IMO of course), is that Intel promises and delivers backwards compatibility at the expense of free performance beer. In contrast, NVIDIA delivers performance at the expense of 100% backwards compatibility for optimized code (but read each GPU's whitepaper and spend a week refactoring your code per GPU generation, and you get both - also IMO, and from experience). To be fair, if you refactor your code every time they improve AVX/SSE, CPUs are a lot mightier than what Python/JavaScript/R usually imply.

Well, wait and see how support goes, etc. In the meantime, take up Keras or Lasagne and do your stuff, i.e. acquire hands-on experience now. When the time comes that you need to go into production, or you need a model that is not available in your current lib or framework, then consider a switch. The point is that experience acquired with one framework will carry over regardless.

In my view, these frameworks are like programming languages, you do need to pick a new one up from time to time, but you can do most of the things with all of them.

Right now Keras is your best bet for most deep learning tasks, and Caffe is best for image processing (i.e. convolutional neural networks). Keep an eye on tensorflow though because it may come to be the gold standard over the next several months to year.

Edit: Btw, François Chollet, the author of keras, made the following tweet this morning: "For those wondering about Keras and TensorFlow: I hope to start working on porting Keras to TensorFlow soon (Theano support will continue)."

I tried going through the site and also the comments but couldn't wrap my head around what this library actually is. It sounds awesome based on the response/comments. Can anyone explain this to a layman?

It is a library for the optimization of machine learning algorithms, similar to Theano. You write a model you want to optimize as a graph of symbolic expressions, and the library will compile this graph to be executed on a target platform (could be CPU, GPU). This is really neat because you essentially wrote your program in Python and now it's going to be running as optimized C++ code (or as optimized kernels on your GPU).

The fact that you defined your model as a symbolic graph means you can derive gradient equations automatically, ridding you of a rather tedious step in optimization. With such tools, the workflow for a practitioner is much simpler: 1. code models as graphs of symbolic expressions, and 2. define objective functions (e.g. for a regression problem, your objective would be the root mean squared error, plus some regularization). Disclaimer: I did not look at the code, but from what I understand it is pretty much the same as Theano (which I've been using a lot lately).
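The automatic-gradient step can be illustrated with a toy reverse-mode differentiator in plain Python (a sketch of the general idea only, not how Theano or TensorFlow implement it): each operation records its inputs and local derivatives, and a backward pass accumulates gradients through the graph.

```python
# Toy reverse-mode autodiff: operations build a graph with local
# derivatives attached, then gradients flow backward through it.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # (parent, local_derivative) pairs
        self.grad = 0.0
    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])
    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backprop(node, upstream=1.0):
    # push each path's gradient contribution down the graph
    # (recursion is fine for small graphs like this one)
    node.grad += upstream
    for parent, local in node.parents:
        backprop(parent, upstream * local)

x, y = Var(2.0), Var(3.0)
loss = x * y + x          # value: 2*3 + 2 = 8
backprop(loss)
# x.grad == 4.0 (= y + 1), y.grad == 2.0 (= x)
```

Writing the model once and getting gradients "for free" like this is the tedious-step-removal the comment above describes.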

If you want to train neural nets, you can either rewrite everything from scratch and get a bug-ridden sub-optimal implementation, or you can use a kind of off-the-shelf library.

The problem is that there are about 3-5 alternatives out there, and none of them are mature enough or convincing enough to dominate. The field changes so fast that it's easy for them to become obsolete.

What you're seeing here is the enthusiasm of people who really want to get a good tool with proper support, and be able to stick with it. I'm still not sure if TensorFlow is that tool, but it depends on what will happen to it during the coming years.

It looks to me like the big win here is that this is an ML library just like all the others, except that the code is not written for a specific type of processor (GPU versus CPU), nor for a specific compute level (desktop, server, phone). Its target is to be a general-purpose ML library. This should have the effect of getting ML solutions into usable applications more quickly, since you don't have to develop on one platform and then port to another. It just works everywhere. Best guess on my part, based on what I'm reading on the site.

It enables you to do lots of maths in parallel in a pretty scalable way.

I'm not a machine learning/math guy by any means - but the whitepaper (http://download.tensorflow.org/paper/whitepaper2015.pdf) is pretty good and offers some very interesting concepts and comparisons.

Does anyone know how this compares to Twitter's recently released torch autograd?


EDIT: To be a bit more specific, autograd just requires that one writes the forward model, and it can involve any program logic like conditionals, etc, so long as the end result is differentiable. It then takes care of all the gradient calculations.
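As a rough illustration of that point: the forward model may contain ordinary program logic (here a conditional), and the gradient is still obtained mechanically. In this sketch, central finite differences stand in for the exact gradients a tool like autograd would compute; the model itself is made up.

```python
# Hypothetical forward model with a conditional; differentiable a.e.
def forward(x):
    if x > 0:
        return x * x
    return -x

# Finite-difference stand-in for automatic differentiation.
def grad(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(grad(forward, 3.0))   # ~6.0 (derivative of x^2 at 3)
print(grad(forward, -2.0))  # ~-1.0 (derivative of -x)
```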

I really wish people would stop calling multidimensional arrays tensors.

There is already well-defined language for this: multidimensional array is fine, pseudotensor is fine. Tensor is confusing if you had any background in tensor calculus before the word became machine learning's flavour of the month.

Still, this does look pretty cool overall. Processing large volumes of data is becoming increasingly less specialised and more generic, which can only be good. :)

I'd say tensor is exactly as confusing as vector -- vectors are just elements of some vector spaces, and you get a list of numbers by choosing a basis of that vector space. Similarly, tensors are just elements of a tensor product of some vector spaces (or modules), and you get a multidimensional array by choosing bases in the factors.

Can't agree more. The first time I looked up tensors when they appeared in Torch I got totally confused by the description of tensors used in physics.

The problem is complicated a bit more because physicists use the word "tensor" to mean a tensor field, i.e. a collection of tensors on which calculus makes sense.

Tensor is a much better term than multidimensional array:

- It's not as verbose

- The concept of tensor has linear algebra operations associated with it while multidimensional array is just a programming term without any attached Math operations.

> The concept of tensor has linear algebra operations associated with it while multidimensional array is just a programming term without any attached Math operations.

But that's the precise problem. The multidimensional arrays that programmers call "tensors" do not generally have the operations defined on them that you would expect from a tensor, such as index contraction, tensor products, or differentiation. At the same time, a tensor is different from an array, just like a number is different from its decimal representation. These distinctions are important when you want to change the basis, a very useful and frequent operation for both tensors and numbers.

Then again, this battle was already fought and lost for vectors, which programmers specialised to mean "arrays" instead of meaning "elements of a linear space".
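For what it's worth, the operations mentioned above (tensor products, contraction) are easy to state even on bare nested lists. A pure-Python sketch with illustrative names:

```python
def tensor_product(u, v):
    """Outer product of two rank-1 tensors: (u ⊗ v)[i][j] = u[i]*v[j]."""
    return [[ui * vj for vj in v] for ui in u]

def contract(t):
    """Contract the two indices of a rank-2 tensor (i.e. take the trace)."""
    return sum(t[i][i] for i in range(len(t)))

t = tensor_product([1, 2], [3, 4])
print(t)            # [[3, 4], [6, 8]]
print(contract(t))  # 3 + 8 = 11
```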

True, and pseudotensor can be a mouthful too, but it's still confusing.

François Chollet, the author of keras, made the following tweet this morning: "For those wondering about Keras and TensorFlow: I hope to start working on porting Keras to TensorFlow soon (Theano support will continue)."

"TensorFlow is an Open Source Software Library for Machine Intelligence" and then later "TensorFlow™ is an open source software library for numerical computation using data flow graphs."

So it seems to be a dataflow computation library that is being used for AI/learning. Since I know almost nothing about either, I'm wondering if this (dataflow) approach has other applications unrelated to deep learning. Any comments?

Also, i wonder if this could be implemented in hardware - say on a FPGA?

FPGAs are not measuring up to GPUs yet for these tasks. There's too much floating-point math (and yes, Altera is addressing this) for forward prediction, and way too much intermediate state to save for training (GPU memory controllers are 'da bomb for this). Finally, OpenCL compilation time on FPGAs is measured in hours as compared to seconds for GPUs.

That said, address all of the above and maybe NVIDIA's stock won't hit 50 next year.

There are a few people that seem to be working with integer-based deep learning and FPGAs, and even though they're at the beginning of research, things are looking pretty promising. Here's a paper that was released earlier this year that seems to show that at least some algorithms can see a big benefit from FPGAs http://arxiv.org/pdf/1502.02551v1.pdf .

Now THAT has some potential. Both for FPGA's and SIMD/MIMD architectures. Thanks for the link.

For those looking for more context, here's the Google research blog post:


Honestly, the raving that's going on in this thread is unwarranted. This is a very nice, well put together library, but it does not do anything fundamentally different from what has already been done with Theano / Torch. It is not a "game-changer" or a spectacular moment in history as some people seem to be saying.

Are there any major conceptual differences to Theano? Not that I wouldn't appreciate a more polished, well funded competitor in the same space.

It looks like using TensorFlow from Python will feel quite familiar to a Theano user, starting with the separation of graph building and graph running, but also down into the details of how variables, inputs and 'givens' (called feed dicts in tensorflow) are handled.

I'm looking at the RNN implementation right now (https://github.com/tensorflow/tensorflow/blob/master/tensorf...). It looks like the loop over the time frames is actually in Python itself.

    for time, input_ in enumerate(inputs): ...
This confuses me a bit. Maybe the shape is not symbolic but must be fixed.

I also haven't seen some theano.scan equivalent. Which is not needed in many cases when you know the shape in advance.

I think this loop actually still only builds the graph -- what `scan` would do. The computation still happens outside of python. That is, in tensorflow they perhaps don't need `scan` because a loop with repeated assignments "just works"... Let's try this:

It seems like in TensorFlow you can say:

    import tensorflow as tf 
    sess = tf.InteractiveSession() # magic incantation

    state = init_state = tf.Variable(1) # initialise a scalar variable

    states = []
    for step in range(10):
         # this seems to define a graph node that updates `state`:
         state = tf.add(state, state)
         states.append(state)

At this point, `states` is a list of symbolic tensors. Now if you query for their values:

    print sess.run(states)
    >>> [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
you get what you would naively expect. I don't think that would work in Theano. Cool.

Why wouldn't this work in Theano?

    >>> import theano
    >>> import theano.tensor as T
    >>> state = theano.shared(1.0)
    >>> states = []
    >>> for step in range(10):
    >>>     state = state + state
    >>>     states.append(state)
    >>> f = theano.function([], states)
    >>> f()

Thanks! When I tried this before, I thought compilation was stuck in an infinite loop and gave up after about a minute. But you're right, it works. Though on my machine, this took two and a half minutes to compile (ten times as long as compiling a small convnet). For 10 recurrence steps, that's weird, right? And the TensorFlow thing above runs instantly.

Agreed. Theano has trouble dealing efficiently with very deeply nested graphs.

You're right. There is not currently a theano.scan equivalent that dynamically loops over a dimension of a tensor.

That said, you can do a lot with truncated BPTT and LSTM. See the sequence modeling tutorial on tensorflow.org for more details.

Geoffrey Hinton and Jeff Dean will give separate tutorials at NIPS. https://nips.cc/Conferences/2015/Schedule?day=0

Google says they are working on a distributed version and will release when it is ready. https://github.com/tensorflow/tensorflow/issues/12#issuecomm...

Many many thanks in advance to anyone who answers:

I've futzed around with ML docs (dataquest, random articles, first 3 weeks of andrew ng course on coursera) & what I don't get about this (http://tensorflow.org/tutorials/mnist/beginners/index.md) is how to actually make a prediction / classification on something that is not "test" data. Can anyone point me in the right direction of how to "use" a model once I have it validated, like we do at the end of that tutorial?

How does TensorFlow compare to Torch7 (http://torch.ch)? They look very similar to me.

Actually, this looks much more like Theano to me. It's all symbolic, it has symbolic differentiation, it's Python and it even seems to have a similar API like Theano.

In the Whitepaper (http://download.tensorflow.org/paper/whitepaper2015.pdf), it's compared to Theano, Torch and others.

That might be a strong advantage in its favor, though. Building a better version of a proven concept with similar usage/APIs often gets more market uptake.

TensorFlow seems to emphasise the distributed computing aspects a lot more compared to Torch7. Also, TensorFlow is made for deploying machine learning pipelines in general, whereas Torch7 focuses on deep neural networks specifically.

> This open source release supports single machines and mobile devices.

Can someone clarify if "single machines" means the Apache license only applies to single machines and not distributed systems?

Sidenote: I wish every open source project had a release video.

> Can someone clarify if "single machines" means the Apache license only applies to single machines and not distributed systems?

I'm not aware that there is a way to do that and still legitimately call it the Apache license. I took the statement to mean that the source does not have the capability of parallelizing across machine boundaries.

There's a note on the site about being able to move computation to a process on a different machine, but it does mention copying the model. If I understand correctly, I think models tend to be quite small, though.

Does it actually make sense to train a model simultaneously on different machines? I can understand running the same model on lots of machines, of course.

Yes, there are two types of parallelism: model and data. Data parallelism is simply training the model on multiple computers with different minibatches and aggregating the gradients. Model parallelism is hosting different parts of the model on different computers. Of course, the two can be combined. A good explanation is the One Weird Trick paper: http://arxiv.org/abs/1404.5997
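The data-parallel case can be sketched in a few lines. In this toy (a one-parameter model with made-up data), the "workers" are just a list comprehension, but in a real system each gradient would be computed on a separate machine:

```python
def grad_on_batch(w, batch):
    # d/dw of mean squared error for the model y = w*x over one minibatch
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, minibatches, lr=0.1):
    grads = [grad_on_batch(w, b) for b in minibatches]  # one per "worker"
    return w - lr * sum(grads) / len(grads)             # aggregate and apply

w = 0.0
batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]  # data: y = 2x
for _ in range(200):
    w = data_parallel_step(w, batches)
print(round(w, 3))  # converges to 2.0
```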

I was interested in whether the license only permits a single machine.

Nope, nothing in the license prohibiting it. Seems more a case that they just haven't released the glue for it to run on multiple machines, but hope to have it added one way or another.

If there is randomness in some part of the training process (and most surely there will be), then it makes sense. Different trainings may end up in different local minimums/maximums, so there is value in running multiple trainings simultaneously in different machines.
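A toy demonstration of that point, with a made-up non-convex objective: independent gradient-descent runs from random starting points land in different local minima, so running several in parallel and keeping the best is worthwhile.

```python
import random

def f(x):
    return x**4 - 3 * x**2 + x   # illustrative objective, two local minima

def descend(x, lr=0.01, steps=500):
    for _ in range(steps):
        x -= lr * (4 * x**3 - 6 * x + 1)   # f'(x)
    return x

random.seed(0)
runs = [descend(random.uniform(-2, 2)) for _ in range(8)]
minima = sorted(set(round(x, 3) for x in runs))
best = min(runs, key=f)
print(minima)   # two distinct local minima, roughly -1.30 and 1.14
```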

Is it common for code to be open sourced, but only for use on single machines?

The distributed part of the system probably depends on other components of Google's proprietary infrastructure, e.g., for cluster job scheduling. So it's a bit of work to tear that out and replace with something that's externally usable. I can't tell whether they've committed to doing that, but I hope they do.

Minor nitpick: The authors are not the first to recognize that 'tensor' makes a great (brand) name. As someone who is paid to think about tensors, I find this a bit annoying, because the first thing in my mind when I read a page like the OP is "where are the tensors?". Alas, there are none.

For a mathematician, a "multidimensional data array" is not a tensor. It is a tuple. A tensor is a tuple with much more structure, associated to linear actions on its components. Said differently, if you can't tell me what 7*[Alice] means, you have no right to call [Alice] a tensor.

I'm kinda floored by this. I have been developing a flow-based language to do this, and even started with the approach of using Python before moving to scripting directly in C++. Maybe I'll tweak my own project. Is anyone playing with low-level GPU deployments, like Metal?

[EDIT] Ok, so there is some GPU - should probably play some more before asking obvious questions. Really keen on scripting out flow directly.

link to your project ?

Sorry, not ready for prime-time. Closest is a visualization of the previous version of the script: https://www.youtube.com/watch?v=a703TTbxghc

That version wasn't a NN graph for ML; more of an executable flow graph for visual music. Will post something on https://github.com/musesum in a couple months.

I can't wait to try this tonight. I have a fun "messing around" project (I am using GnuGo to generate as much training data as I need for playing Go on a very small board, and I am almost to the point of starting to train and test models).

BTW, Ruby has always been my scripting language but because of the wealth of libraries for deep learning I am thinking of switching to Python.

I believe the big facebook and google Go networks were trained on data from professional games. Probably you'd get better performance if you did that as well.

Why switch? I use Ruby, when I can and Python when I can't.

Would really love to see ruby forks of popular ml libs.

Or is it because Python is better perfomance-wise?

Really good question since I really enjoy using Ruby, more so even than Clojure and Haskell.

I too would like to see Ruby ports of some of the popular ML libraries. The classifier gem provides an example.

I have started to experiment with it. Really awesome documentation and setup instructions (using Ubuntu).

AWS has GPU clusters already available FYI: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_clu...

Not sure how much effort is required to use them with this library.

At 2:21 in the video you can see a link to corp.google.com.

I love their UI for single sign on.

Hmmm, the video appears to only be 2:17 long now...

1:52 actually, sorry I was mistaken.

Any reason you wrote the Python wrapper in SWIG instead of Cython?

The nice thing about Cython is that you can wire up two C extensions to talk to each other directly, without going through Python. Cython also gives you 2/3 compatibility out of the box.

SWIG supports many other languages besides Python.

What kind of tool can be used to create that animated illustration? Looks really neat.

It looks like it is TensorBoard, included with TensorFlow http://tensorflow.org/how_tos/summaries_and_tensorboard/inde...

The programming model here is very similar to that of the language I work on, Streams Processing Language, normally just called SPL. A paper we have submitted on the language itself: http://www.scott-a-s.com/files/ibm_tr_2014.pdf

Our domain is online stream processing for, generally, the big data space. I do think, however, that describing computations in this manner gives enormous flexibility to runtime systems to actually exploit all available parallelism dynamically.

Seems like Ben Lorica's call for open source tensor libraries for data science [1] was answered!

[1.a] http://radar.oreilly.com/2015/03/lets-build-open-source-tens... (March 2015)

[1.b] http://radar.oreilly.com/2015/05/the-tensor-renaissance-in-d... (May 2015)

Interesting! I wrote in a draft post back in late 2013 [1], that asked:

"What if one could have a fully declarative “matrix language” in which all data transformations ever needed could be declaratively defined in a way that is very easy to comprehend?"

I'm now pondering whether TensorFlow isn't quite an answer to this question?

[1] Posted the draft now for reference: http://bionics.it/posts/matrix-transformation-as-model-for-d...

The idea of representing a program by a declarative computation graph of matrix transformations has been big in deep learning research for a few years. Theano [1] is the canonical example, though more recently the space has gotten bigger with other frameworks including CGT [2] and now TensorFlow. The computational graph abstraction is really nice in part because it gives you very straightforward automatic differentiation, which dovetails very nicely with the trend in machine learning of casting a wide range of algorithms as (stochastic) gradient ascent on some loss function.

[1] http://deeplearning.net/software/theano/

[2] http://rll.berkeley.edu/cgt/

Thanks! Yea, though in fact my idea is more about wiring the actual (matrix/tensor) operation with dataflow, rather than just the dataflow between such operations.

But it might be a bit of a different problem area :)

I am a newbie to machine learning.

How does it compete to Azure ML besides it is open source?

It's not really directly comparable. This is a computational framework which you can use to implement a variety of ML algorithms (or in general numerical computation), so the interface to this is lower-level than Azure ML.

If Azure ML is a bunch of premade sandcastle molds, TensorFlow is a more accurate, faster way to pour sand. You make the mold.

TensorFlow is essentially its own language; Azure ML lets you use a range of languages of your choice to build machine learning pipelines on Azure. Try it at https://studio.azureml.net/ and follow the tutorial.

This looks similar to Orange. Orange has a visual designer, easy data access (SQL, arbitrary files), and lots of widgets, but it's a desktop tool, not a library, and it doesn't standardize the data structures.



The newest version of Orange (version 3) is standardizing on the scientific python stack (numpy, scipy, scikit-learn)

Does anyone know if TensorFlow can apply algebraic simplifications and numerical optimisations to the compute graph, in the way that Theano does with its optimisations?

Sounds like it doesn't suffer from the (alleged) slow compile times of Theano, but I wonder if the flipside of that is that you have to implement larger-scale custom Ops (like torch's layers) in order to ensure that a composite compute graph is implemented optimally?
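On the first question: the kind of rewrite Theano does can be pictured as a pass over the expression graph. Here's a purely illustrative toy constant-folder/simplifier over tuple expressions (not Theano's or TensorFlow's actual machinery):

```python
def simplify(expr):
    """Expressions are numbers, variable names, or ('add'|'mul', a, b)."""
    if not isinstance(expr, tuple):
        return expr
    op, a, b = expr
    a, b = simplify(a), simplify(b)
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b if op == 'add' else a * b    # constant folding
    if op == 'mul' and 1 in (a, b):               # x * 1 -> x
        return a if b == 1 else b
    if op == 'add' and 0 in (a, b):               # x + 0 -> x
        return a if b == 0 else b
    return (op, a, b)

print(simplify(('add', ('mul', 'x', 1), ('mul', 2, 3))))  # ('add', 'x', 6)
```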

This is very similar to what we did in Samsung several years ago: https://velesnet.ml

Anyone experienced with machine learning want to try this out on some high frequency trading data? I'm in the process of preparing a data set for more traditional analysis and would be willing to share.

We are in Colorado, but I'm happy to work with someone remotely.

Something tells me that Facebook's AI group will release its React to their Angular

Didn't read the paper, but when I saw that a main focus is on distribution I started thinking about how awesome it would be if people could start crowdsourcing compute resources to solve some big problems.

Found a mistake in the docs:


In the "Variables" example, looks like a variable name got changed but not updated everywhere:

    # at the start:
    var = tf.Variable(0, name="counter")

    # then:
    one = tf.constant(1)
    new_value = tf.add(state, one)
    update = tf.assign(state, new_value)    
The variable "state" should be the variable "var", or vice-versa.

Does it support OpenCL and thus Intel integrated and discrete AMD GPUs?

Appears to be only CUDA from the code. Also, no distributed-systems support has been released. And I haven't been able to get it to work since this morning.

Requires nvidia capability >=3.5 at that (GK11x and Maxwell) which rules out anything I've currently got (a couple of 760's/GK104's)... which is the chip Amazon EC2 uses as well.

They're probably using Dynamic Parallelism.

The diagram at least looks a lot like Max/MSP and PureData, which are data-flow tools for processing MIDI and audio data. Could this be used to implement something along those lines?

What would be really awesome is if Andrew Ng or Norvig built a course around TensorFlow. It's really not useful for a beginner to be learning everything in Matlab.

My first thought when reading the docs was: this is a game changer for the efficiency of applied ML (nlp, vision, speech) phd students. Coming from a school where there's a bit of a libertarian "write it all yourself, from scratch!" ethos, I always marveled at how much mileage other students got from research groups that built off a common codebase. Exciting to start to see glimmers of that possibility across the entire field.

There are already a few university ML courses out there using Theano (for which Tensorflow is essentially a drop-in replacement), and I think this will be a much bigger trend over the next few years. IMHO for a first course it's useful to do some work at the Matlab/numpy level just so you get experience with deriving/implementing gradients yourself, but for larger (deep) models automatic differentiation is an amazing productivity boost that should make it possible to cover a lot of interesting topics that you'd otherwise not have space for.

Actually, numpy would be a brilliant starting point, but I'm not able to find any popular courses that don't use Matlab (or some dialect thereof).

disclaimer: I have no idea what I'm talking about, but I do know that coursera and stanford courses are the oft cited ones and they use matlab/octave.

The same goes for Torch: New York, Oxford ...

Great stuff to see that Google is releasing their own tools for research and production systems. Does anybody have a grasp of the main differences between the triple Ts: Tensorflow, Theano and Torch?

As far as I can see Theano and Tensorflow support dataflow-like computations, automatic differentiation and GPU support via CUDA.

Finally, why didn't Google, the LISA Lab, and Facebook work together on building a single library instead of three?

Quite an ugly set of dependencies to build. In particular, bootstrapping bazel is excruciating.

Is there a part of TensorFlow that generates that slick diagram on the front page?

This seems really similar to Brain Simulator: http://www.goodai.com/#!brain-simulator/c81c

Can anybody tell me if this can calculate the gradients of a conditional graph? (Something that is not implemented in Theano, and which is giving me a huge headache right now.)

GPU support, woohoo!

And just 4 years after their insistence that GPUs would never have a role at Google... How 'bout that?

Google is a big place, with diverse opinions. Even better, it encourages updating one's opinion in face of new evidence. GPUs have proven themselves cost-effective in tackling a number of vision problems [and others], Google has those problems, so the opinion was updated and the problems were solved in a cost-effective manner.

That... Was not... My experience there...

But kudos to the deep learning guys for overcoming that potential energy barrier NVIDIA couldn't surmount on their own...

To me, processing lots and lots of data with "relatively simple" algorithms spells GPU, pretty much. Machine learning seemed destined to bump into GPUs sooner or later.

Google is certainly Nvidia's top customer.

Sure, now. 4 years ago, not so much.

One might even conclude that one person inside Google had written a brawny and influential paper that was right in a lot of ways, but wimpily dead-wrong to apply to GPUs.

Might one conclude that getting hung up on one opinion from a large group of people projected forward in time is... not very useful?

Might one conclude that this influential opinion might have created a big old stinky career mess for someone in the field that took many months to clean up and cost one a fair chunk of change to do so? So perhaps one could be forgiven for noting such a disruptive change in viewpoint?

Now we all must own our choices and one was very silly, stupid, and naive to blindly accept a position at Google, but everyone makes mistakes, no?

Further, TensorFlow looks fantastic but given that even just a year ago, when one was recruited to return to Google for !(Deep Learning) but for something which also happened to be another braindead obvious choice to run on GPUs, one was informed said viewpoint on GPUs was still in effect and declined.

So what I'd really love to know is how the message finally sunk in? I'm betting there's a really great story here.

Uh, so what happened? Were you working on a GPU project at Google before management came around to this way of thinking, and it got canned?

That sucks, but that kind of thing happens a lot. It's not that surprising. Google wanted a homogeneous infrastructure for a long time. But a new application (neural networks) motivated a new infrastructure (GPUs in clusters).

There was that Stanford paper everyone talks about which compared training a neural network on Google's 16K cores (DistBelief paper) vs a handful of GPUs with infiniband. Even Andrew Ng has seemed to subtly criticize Google for thinking "the old way" (i.e. "cloud" technology vs HPC technology, HPC being more effective for neural nets)

Also, I don't quite understand your comment about brawny vs. wimpy. Yes Urs wrote a rebuttal against wimpy cores. As far as I remember, his argument is basically that there is some portion of software on the critical latency path that's not multithreaded. wimpy cores make the most sense in a world of perfect parallelism.

I'm done ranting about Google and GPUs at last. From a technical standpoint, I believe TensorFlow vindicates my viewpoint and that's enough for me. This is closure I really needed, OK? And besides, now I can concentrate my energy on asking WTH happened to Android performance between 4.x and 5.x J/K?

And from a professional standpoint, I moved on and fixed the damage I let them inflict on my career. I eventually built exactly what I was told couldn't and shouldn't be done with GPUs.

But back then, the argument Google made to me was that too much of the workloads I wanted to run on GPUs were serialized (hence brawny versus wimpy). And from a classical parallel algorithms perspective, they were correct. The problem is that from a GPU programming perspective, they weren't even wrong(tm). And at the time, I had ~5 years of CUDA programming (very early adopter) and 30,000+ lines of CUDA code to back that up whereas the people I tried to convince had zero such experience so there was just no way. My argument boiled down to TLDR: O(n^2) or higher probably belongs on GPUs. Would anyone disagree these days?

More economic argument: any sufficiently restricted and important algorithm will fund enough (1) PhDs to pull out the parallelism and then (2) devoted engineers to map it down successively close to the metal (perf/watt), even if the legacy code is polished. CPU->GPU->FPGA->ASICS. Regex, neural nets, doesn't really matter.

I was going to dive into Theano. Is this much different than Theano? Or better?

Could this system be used to implement a Gaussian Process?

I'm really excited to dig in and read the source, actually.

Thank you for open sourcing this.

Great job guys!

Just found what I'm going to do today

Great job, guys! The Docker of Neural Compute..

For those interested in a minimal data flow JavaScript Engine see my project dflow: https://www.npmjs.com/package/dflow

OTOH code completion is still only mediocre for most programming languages! Of course, those things aren't really related, but it feels easier than self-driving cars.

If we could stop using the term Neural Networks in relation to computing, that would be great. Computers don't have neurons.

If we get hung up on every abuse of terminology we run into, nobody will get anything useful done.

At this point the name is well established. If it makes you feel better, just remember to tag "artificial" on to the beginning, nobody will mind.

There are artificial and biological neural networks. The artificial one is based upon the biological one. Therefore dropping the 'artificial' or 'biological' seems fine to me.

This is practical, but I strongly disagree with it semantically. The term should always be accompanied by 'artificial', or another name should be invented, as they are vastly different fields. Search for "neural network" and I assure you that you will no longer find biological information; the newer usage is crowding it out.
