As a bonus, it has built-in automatic differentiation, so you can run gradient descent on anything you can express as a program: just write a program that scores how good a solution is, and the library will efficiently iterate it toward a local optimum. Do this from enough starting points and hopefully you'll find the global optimum. There are a variety of other numerical optimization algorithms, but gradient descent is very simple and broadly applicable.
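As a very rough sketch (my own toy example, not from the docs, so treat the exact calls as approximate), the workflow is: define a score, ask the optimizer to minimize it, and iterate:

    import tensorflow as tf

    x = tf.Variable(0.0)                      # the thing being optimized
    loss = tf.square(x - 3.0)                 # any differentiable "badness" score
    step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    sess = tf.Session()
    sess.run(tf.initialize_all_variables())
    for _ in range(100):
        sess.run(step)                        # one gradient step, gradients derived automatically
    print(sess.run(x))                        # ends up near 3.0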
And it runs in IPython (now Jupyter), which is a really amazingly powerful way to do exploratory software development. If you haven't tried Jupyter/IPython, google up a screencast and take a look.
I'm just repeating the stuff that's on the home page in a different form, but this is a really big deal. Much of the most significant software of the next five or ten years is going to be built in TensorFlow or something like it. (Maybe Dan Amelang's Nile; see Bret Victor's amazing video on the dataflow visualization he and Dan recently built for Nile at https://youtu.be/oUaOucZRlmE?t=24m53s, or try the live demo at http://tinlizzie.org/dbjr/high_contrast.html, or download and build the Nile system from https://github.com/damelang/nile.)
Apparently the Googlers think that too, because among the 39 authors of the white paper are several shining stars of systems research, including some who may be up for a Turing Award in the next decade or two.
One thing I found interesting is the ability to use the software you designed in research directly in production, without having to rewrite the whole thing in Java.
That said, I'm excited to see Google release this. Hopefully it will encourage Facebook or Baidu to release theirs as well.
Right now, Torch/Caffe adoption is really good, with lots of research code being released for them (even from within Google DeepMind). So it will be interesting to see how this catches up amongst the research and developer community.
Actually, it feels like I've been "scooped" somewhat. At the same time, though, it's awesome to find that many of the core assumptions I've been formulating for traditional programming systems are already being used by other top engineers/researchers at Google. Thanks for posting about the who's who of systems research. I'll have to look into their work more!
I think there is a lot of room left to apply this approach to existing systems outside of machine learning. We're working on releasing a prototype in the coming months at our new startup (vexpr.com), based on applying these same data-flow principles to ordinary programming projects. This might actually give a huge boost to presenting the underlying approach to systems engineering to a wider audience.
What kind of dataflow approach are you most interested in? I've been shifting my approach to work within existing systems. One thing I've been doing is abusing the intent of web components and DOM events to create data visualization tools.
Recently my team and I decided to focus down a little more. Have you had success in bridging the gap between esoteric academic papers and industry?
@hmehta, one of my recent favorite videos on this general topic is from the Curry On conference. It's not strictly data flow vs. control flow, but it does discuss some of the more practical differences between using monads to structure data flow and imperative control flow.
My take is that monads can roughly be thought of as wrapping up imperative control flow along with the data, e.g. wrapping the if/else choices into a data structure. I've actually found studying real category theory easier than all the "easy introduction" material about monads. The ideas, when presented straightforwardly and technically correctly, are simpler than we give them credit for.
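A toy sketch of what I mean in plain Python (my own illustration, nothing to do with the talk): the failure branch lives in the wrapper, not at every call site.

    class Maybe:
        def __init__(self, value=None, ok=True):
            self.value, self.ok = value, ok
        def bind(self, f):
            # the if/else choice is wrapped up here, once
            return f(self.value) if self.ok else self

    def safe_div(x, y):
        return Maybe(x / y) if y != 0 else Maybe(ok=False)

    result = Maybe(10).bind(lambda v: safe_div(v, 2)).bind(lambda v: safe_div(v, 0))
    print(result.ok)  # False: the failure was threaded through automatically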
I think that's fantastic and wish academia went that way too (yeah right).
From the AMS statement on The Culture of Research and Scholarship in Mathematics:
"For this reason, mathematicians traditionally list authors on joint papers in alphabetical order. An analysis of journal articles with at least one U.S. based author shows that nearly half were jointly authored. Of these, more than 75% listed the authors in alphabetical order. In pure mathematics, nearly all joint papers (over 90%) list authors alphabetically."
(I can actually answer my own question to some degree: the "alphabetical authors" convention is also the norm in my own field of high energy physics. Only after I finally landed a tenure-track job and served on another search committee did I realize that folks from other fields of physics would see my lack of first-author publications as a black mark. But in principle, among those who know the convention, order ought to be unimportant, shouldn't it?)
Or use a pseudonym.
The digraph "aa" is an archaic representation of the modern character "å", the last letter of the Norwegian and Danish alphabets, and should be sorted accordingly.
Also consider that a single "order" might not make any sense. How do you determine if somebody "contributed" more than anybody else?
Also, at CERN, with papers with 1000+ authors, any chance of picking the 'right' name from the list approaches zero, anyways. Such papers typically list a "corresponding author" to whom to send inquiries.
Alphabetic order also gave us the fun Google search phrase "Does G. Aad exist?" (Gives a link to https://jeffollerton.wordpress.com/2014/05/28/does-gaad-exis..., but Google also suggests a correction to your "typo")
: https://github.com/Yelp/MOE (full disclosure, I co-wrote this)
starts with the Author list.
I can't find it on Google.
2D vector graphics rendering, as an example, takes ~10,000 lines of code using traditional libraries like Cairo (used by Firefox) or Skia (used by Android and Chrome). In Nile, it's 200 lines of code.
Check out this repo, and the PDF inside the README: https://github.com/damelang/nile
That tool can be used to solve a lot more problems. REBOL/Red and Racket are also taking the DSL approach.
Our pond is a perfect 500 x 500 square, as is the case for most ponds found in nature.
Some feedback on the PDE aspect: is there any nice way to write your model and grid using TensorFlow, and still get the free derivatives/optimized computation graph if you use a different solver to actually integrate the equations? The example shows off a very simple toy first-order Euler solver, but in practice this is grossly inadequate. I suspect it's very difficult if not impossible, especially since most solver libraries are very old Fortran libs, and carrying reverse-mode differentiation through them is a nightmare. Still, this makes for very nice building blocks.
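For concreteness, the clunky version of what I'm imagining would be to keep the model in the graph and let the external integrator call back into it at every step. A toy sketch (my own names, untested against the released API, and it crosses the Python boundary on every evaluation):

    import numpy as np
    import tensorflow as tf
    from scipy.integrate import odeint

    u = tf.placeholder(tf.float32, shape=[3])             # state on a (tiny) grid
    k = tf.constant(-0.5)
    dudt = k * u                                          # toy right-hand side, du/dt = k*u
    sensitivity = tf.gradients(tf.reduce_sum(dudt), u)[0] # the "free" derivatives are still there

    sess = tf.Session()

    def rhs(u_val, t):
        # the external integrator calls back into the TensorFlow graph at each step
        return sess.run(dudt, feed_dict={u: u_val.astype(np.float32)})

    trajectory = odeint(rhs, np.ones(3), np.linspace(0.0, 1.0, 50))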
Another question - would you consider adding an example of using this together with Google Ceres - to make the two autodiffs play nice together?
The whole scientific Python stack is available for Python 3. This seems like a somewhat backwards thing to do -- or perhaps the requirement is intended to mean at least 2.7?
Edit: added context
Edited again: More careful wording
Tracking here: https://github.com/tensorflow/tensorflow/issues/1
Congrats on the library, which looks awesome.
If the Python Software Foundation drops the ball, someone else will pick it up.
"Short version: Python 2.x is legacy, Python 3.x is the present and future of the language"
... which was lifted from https://wiki.python.org/moin/Python2orPython3. It follows that some people would think something along the lines of "...so this library was built for legacy Python?"
Seriously. If you're advising someone considering Python on the 2 vs. 3 question today, how can you not tell them that possibly one of the most important new machine learning libraries only works on 2.7?
It's worth remembering that most scientists don't do any machine learning, because it's not all that useful in many domains. A lot of science is trying to find an explanatory model for the observations. Machine learning is much better at finding a good predictive model for the observations that may not offer any insight to what is happening behind the scenes.
Python 3's problem is that even if 95% of packages are on it, basically 100% of packages are on 2.7, including brand new stuff. Say you use 10 libraries on average and 95% of libraries have been ported; the odds of at least one of yours being missing are still 40% (1 - 0.95^10). This ratio gets worse the more specialised your work is, or the greater your investment in legacy code, which is why most 3.x people are generalists/web folks who can't understand the 2.x position. For me, this TensorFlow library drops me right back into exactly that 40% problem. The problem doesn't exist the other way around, unless you absolutely must use asyncio instead of something that's already been around for years.
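A quick sanity check of that figure, assuming the ten libraries are independent:

    p_missing = 1 - 0.95 ** 10
    print(round(p_missing, 2))   # 0.4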
In my case I'm going to persevere and put 2.7 back into a virtualenv or something, but it's really not ideal, and my point stands: if 3.x gave science/engineering people something properly compelling (the @ operator is not enough, in my view), then this problem would not be happening.
For example, a difficult-but-possible megawin would be to put GPU computing natively into Python. The GPU is more than 50% of the compute silicon in most computers nowadays (the Iris 6200 takes up 50% of the die in recent Intel Core i7s). If you include multicore and discrete GPUs, then standard Python is only targeting about 10% of the available compute silicon in a modern PC! Imagine how rapidly everybody would move if there were a GPU-enabled NumPy array in 3.6.
I have just recently been persuaded by the community that 3.5 is cost free, and here I have this enormous counterexample. For the science guys, clearly, the message is not getting through, and I'm not surprised: 3.x offers them nothing. Hence they're blissfully continuing with 2.7.
So I guess I'll have to run two Pythons on my system. Not the ideal situation.
Now if Python 3.(x > 5) gave us native GPU support, which might require putting (a GPU-enabled) NumPy into the standard library... as opposed to, say, spending 2 years re-inventing Tornado...
I am honestly curious about this point of view. Is there any example where actual multidimensional tensors have any relevance? What I mostly see around is just standard linear algebra operations on matrices and vectors, lifted pointwise to higher-dimensional tensors (for instance, applying a certain operation to all 2-dimensional subtensors of a given tensor).
I never saw the need for modeling matrices in terms of tensors; the latter just seem more complex to use, and usually tensor operations (like symmetrization, antisymmetrization, wedge products...) are not even available in libraries.
(And by the way, both matrices and tensors are now more than one century old...)
Equally, what is matrix multiplication but a bunch of 1-dimensional dot products applied pointwise? Why do we need matrices?
I do get what you're saying, and part of it is that ML/CS folk just use 'tensor' as a fancy word for a multi-dimensional array, whereas physics folk use the word for a related coordinate-basis-independent geometric concept. But for numerical computing, broadcasting simple operations over slices of some big array is a really useful thing to be able to do fast and to express concisely.
Numerics libraries which don't bother to generalise arrays beyond rank 1 and 2 always feel rather inelegant and limiting to me. Rank-3+ arrays are often really useful (images, video, sensory data, count data grouped by more than 2 factors, ...), and lots of operations generalise to them in a nice way. Good array programming environments (numpy, torch, APL) take advantage of this to provide an elegant general framework for broadcasting operations without ugly special cases.
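A trivial numpy example of the kind of thing I mean (shapes made up):

    import numpy as np

    images = np.random.rand(10, 32, 32, 3)              # a batch of 10 small RGB images
    mean_per_channel = images.mean(axis=(0, 1, 2))      # shape (3,): reduce over every axis but channels
    whitened = (images - mean_per_channel) / images.std(axis=(0, 1, 2))  # broadcasts back over the rank-4 array
    brightness = images.mean(axis=(1, 2, 3))            # one number per image, no explicit loops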
So what algebraic concept do tensors correspond to?
It's true that tensors are hard to reason about - they overclock my brain most of the time - but there is no doubt that, just like moving from scalars to vectors massively increases your real-world modelling power, so does the natural extrapolation to matrices, and from there, tensors.
"Tensor said the tensor.
And dissension have begun."
Tensors were the future even before they were cool.
I want to start deep learning now, to implement RNNs, autoencoders, Q-learning on a very specific application (not images).
I've read a lot of papers but haven't done any DL implementations yet (though plenty of classical ML), so my question is very simple:
Where do I start ????
Should I use Torch? Theano? Theano + pylearn2? Theano + pylearn2 + Lasagne? Caffe?
Or should I just switch to this magical new library directly ?
I feel confused... any advice?
Come on. Grad students are motivated by advancing the field.
... are they not? Or maybe you're saying that the ones who write libraries are motivated only by graduating.
I have a large package of code on sourceforge that consists of my grad school and postdoctoral efforts. If I compare its style to what 2 decades in the industry since has taught me, it's laughable. That said, it is C code, so I occasionally plunder it for algorithms I figured out way back when.
But I do see your point. Google obviously has a lot more manpower to spend on this, so it might be a better bet in the long run.
It's also worth comparing this to a few similar projects that have been announced recently: MXNet (http://mxnet.readthedocs.org/en/latest/), Chainer (http://chainer.org/) and CGT (http://rll.berkeley.edu/cgt/). And Theano of course, which has been the flag-bearer of the computational graph approach in deep learning for many years.
I'm thinking mostly of Theano, which, from a performance standpoint, appears to have died the death of a thousand inexperienced cooks in the kitchen. The ~1000x performance regressions it invokes when a junior data scientist goes off the rails and ends up with a Python inner loop amidst GPU kernels are just depressing and seemingly unfixable. Hopefully TensorFlow will be better, if only because it was written in a world now very aware of Pixel's Law.
MXNet is awesome, but perhaps a little too parameter-servery for my personal tastes, and I'm wondering what the point of CGT is now, other than to be Coke to Google's Pepsi. I also think the whole deep-learning-framework business model just took a torpedo amidships (and not long after the layoffs at Ersatz).
Finally, I had never heard of Chainer until today, thanks! That said, without autograd functionality, the people I work with would probably stick with Caffe + cuDNN.
Basically, old NVIDIA marketing material used to state that GPUs double in performance every year or so, whilst Moore's Law is slowing down with respect to CPU performance because clock speeds have been flat for a long time.
This isn't strictly true, because core count, SIMD width and vector unit counts in CPUs have all been increasing. However, from the perspective of a single-threaded C application, it is indeed so. CUDA/OpenCL, OTOH, automagically subsume multi-core, multi-threading, and SIMD into the language itself, so the hypothetical "single-threaded" CUDA app just keeps getting better(tm).
In my view, these frameworks are like programming languages: you do need to pick a new one up from time to time, but you can do most things with any of them.
Edit: Btw, François Chollet, the author of keras, made the following tweet this morning: "For those wondering about Keras and TensorFlow: I hope to start working on porting Keras to TensorFlow soon (Theano support will continue)."
The fact that you define your model as a symbolic graph means gradient equations can be derived automatically, ridding you of a rather tedious step in optimization. With such tools, the workflow for a practitioner is much simpler: (1) code models as graphs of symbolic expressions, and (2) define objective functions (e.g. for a regression problem, your objective would be the root mean squared error, plus some regularization). Disclaimer: I did not look at the code, but from what I understand it is pretty much the same as Theano (which I've been using a lot lately).
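In Theano, for instance, the two steps look roughly like this (a toy regression written from memory, so treat it as a sketch rather than working code):

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.matrix('X')                          # symbolic inputs
    y = T.vector('y')
    w = theano.shared(np.zeros(5), name='w')   # parameters

    # 1) the model, coded as a graph of symbolic expressions
    prediction = T.dot(X, w)
    # 2) the objective: RMSE plus a bit of L2 regularization
    loss = T.sqrt(T.mean((prediction - y) ** 2)) + 0.01 * T.sum(w ** 2)

    # the tedious part is derived for you
    grad_w = T.grad(loss, w)
    train = theano.function([X, y], loss, updates=[(w, w - 0.1 * grad_w)])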
The problem is that there are about 3-5 alternatives out there, and none of them are mature enough or convincing enough to dominate. The field changes so fast that it's easy for them to become obsolete.
What you're seeing here is the enthusiasm of people who really want to get a good tool with proper support, and be able to stick with it. I'm still not sure if TensorFlow is that tool, but it depends on what will happen to it during the coming years.
EDIT: To be a bit more specific, autograd just requires that you write the forward model, which can involve any program logic (conditionals, etc.), so long as the end result is differentiable. It then takes care of all the gradient calculations.
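A from-memory sketch with the autograd package, just to show the shape of it (names are mine):

    import autograd.numpy as np
    from autograd import grad

    def forward(x):
        # ordinary Python control flow is fine in the forward model
        if x > 0:
            return np.tanh(x) ** 2
        return np.sin(x)

    dforward = grad(forward)     # the gradient function is built for you
    print(dforward(1.5))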
There is already well-defined language for this: "multidimensional array" is fine, "pseudotensor" is fine. "Tensor" is confusing if you have any background in tensor calculus from before the word became the machine learning flavour of the month.
Still, this does look pretty cool overall. Processing large volumes of data is becoming increasingly less specialised and more generic, which can only be good. :)
- It's not as verbose
- The concept of a tensor has linear algebra operations associated with it, while "multidimensional array" is just a programming term without any attached math operations.
But that's the precise problem. The multidimensional arrays that programmers call "tensors" do not generally have the operations defined on them that you would expect from a tensor, such as index contraction, tensor products, or differentiation. At the same time, a tensor is different from an array, just like a number is different from its decimal representation. These distinctions are important when you want to change the basis, a very useful and frequent operation for both tensors and numbers.
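For what it's worth, the closest most numerics libraries come is something like numpy's einsum; a toy example of a contraction and a tensor product (my own illustration):

    import numpy as np

    A = np.random.rand(3, 4, 5)
    B = np.random.rand(5, 4)
    contraction = np.einsum('ijk,kj->i', A, B)             # contract over the j and k indices
    outer = np.einsum('i,j->ij', np.ones(3), np.ones(4))   # tensor (outer) product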
Then again, this battle was already fought and lost for vectors, which programmers specialised to mean "arrays" instead of meaning "elements of a linear space".
So it seems to be a dataflow computation library that is being used for AI/learning. Since I know almost nothing about either, I'm wondering if this (dataflow) approach has other applications unrelated to deep learning. Any comments?
Also, I wonder if this could be implemented in hardware, say on an FPGA?
That said, address all of the above and maybe NVIDIA's stock won't hit 50 next year.
It looks like using TensorFlow from Python will feel quite familiar to a Theano user, starting with the separation of graph building and graph running, but also down into the details of how variables, inputs and 'givens' (called feed dicts in tensorflow) are handled.
for time, input_ in enumerate(inputs): ...
I also haven't seen a theano.scan equivalent, which is not needed in many cases when you know the shape in advance.
It seems like in TensorFlow you can say:
import tensorflow as tf

sess = tf.InteractiveSession()  # magic incantation
state = init_state = tf.Variable(1)  # initialise a scalar variable
states = []
for step in range(10):
    # this seems to define a graph that updates `state`:
    state = tf.add(state, state)
    states.append(state)
tf.initialize_all_variables().run()
print(sess.run(states))
# -> [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
>>> import theano
>>> import theano.tensor as T
>>> state = theano.shared(1.0)
>>> states = []
>>> for step in range(10):
...     state = state + state
...     states.append(state)
>>> f = theano.function([], states)
That said, you can do a lot with truncated BPTT and LSTM. See the sequence modeling tutorial on tensorflow.org for more details.
I've futzed around with ML docs (Dataquest, random articles, the first 3 weeks of the Andrew Ng course on Coursera), and what I don't get about this (http://tensorflow.org/tutorials/mnist/beginners/index.md) is how to actually make a prediction/classification on something that is not "test" data. Can anyone point me in the right direction on how to "use" a model once I have it validated, like we do at the end of that tutorial?
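Is it just something like the following, with x, y and sess as defined in that tutorial, and my_images a hypothetical [N, 784] array of my own data? I'm guessing here, so not sure this is the intended way:

    import tensorflow as tf

    prediction = tf.argmax(y, 1)                             # index of the most likely digit
    labels = sess.run(prediction, feed_dict={x: my_images})  # my_images: hypothetical new inputs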
In the Whitepaper (http://download.tensorflow.org/paper/whitepaper2015.pdf), it's compared to Theano, Torch and others.
Can someone clarify if "single machines" means the Apache license only applies to single machines and not distributed systems?
Sidenote: I wish every open source project had a release video.
I'm not aware that there is a way to do that and still legitimately call it the Apache license. I took the statement to mean that the source does not have the capability of parallelizing across machine boundaries.
Does it actually make sense to train a model simultaneously on different machines? I can understand running the same model on lots of machines, of course.
For a mathematician, a "multidimensional data array" is not a tensor. It is a tuple. A tensor is a tuple with much more structure, associated with linear actions on its components. Said differently, if you can't tell me what 7*[Alice] means, you have no right to call [Alice] a tensor.
[EDIT] Ok, so there is some GPU - should probably play some more before asking obvious questions. Really keen on scripting out flow directly.
That version wasn't a NN graph for ML; more of an executable flow graph for visual music. Will post something on https://github.com/musesum in a couple months.
BTW, Ruby has always been my scripting language but because of the wealth of libraries for deep learning I am thinking of switching to Python.
Would really love to see ruby forks of popular ml libs.
Or is it because Python is better performance-wise?
I too would like to see Ruby ports of some of the popular ML libraries. The classifier gem provides an example.
Not sure how much effort is required to use them with this library.
I love their UI for single sign on.
The nice thing about Cython is that you can wire up two C extensions to talk to each other directly, without going through Python. Cython also gives you 2/3 compatibility out of the box.
Our domain is online stream processing for, generally, the big data space. I do think, however, that describing computations in this manner gives enormous flexibility to runtime systems to actually exploit all available parallelism dynamically.
[1.a] http://radar.oreilly.com/2015/03/lets-build-open-source-tens... (March 2015)
[1.b] http://radar.oreilly.com/2015/05/the-tensor-renaissance-in-d... (May 2015)
"What if one could have a fully declarative “matrix language” in which all data transformations ever needed could be declaratively defined in a way that is very easy to comprehend?"
I'm now pondering whether TensorFlow isn't in fact an answer to this question.
 Posted the draft now for reference: http://bionics.it/posts/matrix-transformation-as-model-for-d...
But it might be a bit of a different problem area :)
How does it compare to Azure ML, besides being open source?
If Azure ML is a bunch of premade sandcastle molds, TensorFlow is a more accurate, faster way to pour sand. You make the mold.
Sounds like it doesn't suffer from the (alleged) slow compile times of Theano, but I wonder if the flipside of that is that you have to implement larger-scale custom Ops (like torch's layers) in order to ensure that a composite compute graph is implemented optimally?
We are in Colorado, but I'm happy to work with someone remotely.
In the "Variables" example, looks like a variable name got changed but not updated everywhere:
# at the start:
var = tf.Variable(0, name="counter")
one = tf.constant(1)
new_value = tf.add(state, one)
update = tf.assign(state, new_value)
They're probably using Dynamic Parallelism.
Disclaimer: I have no idea what I'm talking about, but I do know that the Coursera and Stanford courses are the oft-cited ones, and they use MATLAB/Octave.
As far as I can see, Theano and TensorFlow both support dataflow-like computations, automatic differentiation and GPU acceleration via CUDA.
Finally, why didn't Google, the LISA lab and Facebook work together on building a single library instead of three?
But kudos to the deep learning guys for overcoming that potential energy barrier NVIDIA couldn't surmount on their own...
One might even conclude that one person inside Google had written a brawny and influential paper that was right in a lot of ways, but wimpily dead-wrong to apply to GPUs.
Now we all must own our choices and one was very silly, stupid, and naive to blindly accept a position at Google, but everyone makes mistakes, no?
Further, TensorFlow looks fantastic, but even just a year ago, when one was recruited to return to Google for !(Deep Learning), for something which also happened to be another braindead-obvious choice to run on GPUs, one was informed that said viewpoint on GPUs was still in effect, and one declined.
So what I'd really love to know is how the message finally sunk in? I'm betting there's a really great story here.
That sucks, but that kind of thing happens a lot. It's not that surprising. Google wanted a homogeneous infrastructure for a long time. But a new application (neural networks) motivated a new infrastructure (GPUs in clusters).
There was that Stanford paper everyone talks about which compared training a neural network on Google's 16K cores (DistBelief paper) vs a handful of GPUs with infiniband. Even Andrew Ng has seemed to subtly criticize Google for thinking "the old way" (i.e. "cloud" technology vs HPC technology, HPC being more effective for neural nets)
Also, I don't quite understand your comment about brawny vs. wimpy. Yes, Urs wrote a rebuttal against wimpy cores. As far as I remember, his argument is basically that some portion of software on the critical latency path isn't multithreaded; wimpy cores make the most sense in a world of perfect parallelism.
And from a professional standpoint, I moved on and fixed the damage I let them inflict on my career. I eventually built exactly what I was told couldn't and shouldn't be done with GPUs.
But back then, the argument Google made to me was that too much of the workloads I wanted to run on GPUs were serialized (hence brawny versus wimpy). And from a classical parallel algorithms perspective, they were correct. The problem is that from a GPU programming perspective, they weren't even wrong(tm). And at the time, I had ~5 years of CUDA programming (very early adopter) and 30,000+ lines of CUDA code to back that up whereas the people I tried to convince had zero such experience so there was just no way. My argument boiled down to TLDR: O(n^2) or higher probably belongs on GPUs. Would anyone disagree these days?
Thank you for open sourcing this.
At this point the name is well established. If it makes you feel better, just remember to tag "artificial" on to the beginning, nobody will mind.