Hacker News
DeepMind moves to TensorFlow (googleresearch.blogspot.com)
420 points by hektik on Apr 29, 2016 | 79 comments

This is great news! One of the most intimidating things about getting started with deep learning if you want to understand and extend cutting-edge work is the Tower of Babel situation: aside from mastering some quite difficult and opaque concepts, you need to learn multiple frameworks in multiple languages, some of which are quite uncommon. (Want to use Torch? You have to learn Lua. Want to use Theano or Caffe? Need to learn Python too. Need to implement optimizations? Hope you've mastered C++.)

And DeepMind's research output was a major reason to need to use Torch, and hence have to learn Lua.

But by switching over to TensorFlow, this means you now have one language to learn which is supported well by all the major frameworks - Python - and you can benefit from several frameworks (Theano, Keras, TensorFlow). So the language barrier is reduced and you can focus on the framework and actual NN stuff. Further, this will also drive consolidation onto TensorFlow, reducing the framework mental overhead. As long as TF is up to the job, and it reportedly is, this will benefit the deep learning community considerably.

I'd been wondering myself what language and framework I should focus on when I start studying NNs, and this settles it for me: Python and TensorFlow.

You're overstating things a bit. There has never been a "need to learn multiple frameworks in multiple languages." As a beginner, you pick one and go with that (as evidenced by the fact that... you've picked one!). This announcement doesn't change that situation.

Nobody needs to use multiple frameworks unless you're someone (not a beginner) who wants to be able to take the code from research papers or something.

Researchers need multiple frameworks because the feature sets aren't the same in each.

I haven't done anything with deep learning, but I worked in a research lab with others who did. On the image processing side, we prototyped code in OpenCV, both with Python and C++, and MATLAB (with toolboxes) regularly because of this.

At the end of the day they have a limited amount of time and just want to test their idea the fastest way they can.

I don't think this is the reason. You can do pretty much anything in Torch/Theano. But, researchers are lazy and writing code is only a means to an end. They will shamelessly copy/hack as much open source code as possible, so if a framework has some quick module for a desired algorithm, or if a researcher wrote their paper in some framework, and you want to build off that paper, then you'll just copy that.

That's definitely correct.

One related unsolved problem in that type of research is getting people to share their actual code. Especially when multiple universities are at the bleeding edge in a field, they often publish just enough [1] to prove their point without giving everyone else the same foundation to build on easily. Even in science it gets political... who knew.

1: i.e., just their algorithms, or their code without very useful implementation-level optimizations

In ML, you can generally email the authors and very often they will be willing to send you (their really crappy) code. Although it probably helped that I sent these emails from my academic email address.

> There has never been a "need to learn multiple frameworks in multiple languages."

> you can generally email the authors and very often they will be willing to send you (their really crappy) code.

Code obtained from multiple authors, or even from the same author but different time periods, is code written using multiple frameworks in multiple languages. Standardizing on Python / TensorFlow reduces cognitive load along the way and is likely to speed up the field. If speed is what the field was missing :)

I guess that by the time the paper is in the public domain things have moved on - it's six months from submission to publication.

> Nobody needs to use multiple frameworks unless you're a researcher or something who wants to be able to use the code from other people's papers.

I never said I wasn't. I'm interested in more than one thing, and so it makes me sad to see that the cutting-edge stuff I'm interested in is implemented in a variety of languages and frameworks (typically for no better reason than that particular researcher uses that particular framework), which means I either need to invest a huge amount of time and effort into becoming a polyglot or abandon any understanding deeper than the paper itself.

I agree with you; your original comment was talking about beginners specifically, though.

That being said, it's the natural state of any field moving fast. Web development has even worse fragmentation. Mobile development has about the same amount of fragmentation. There are a dozen distributed computing frameworks (Hadoop? A dozen things on top of Hadoop? Spark? Mesos? Kubernetes?).

I agree. The frameworks are similar enough that you don't really get much from switching between them. There are differences, sure, but they are far smaller than the similarities.

Can you speak to implementing algorithms in TensorFlow? My impression is that it's still quite difficult to introduce something interesting that doesn't already exist. This is basically the case for all the frameworks, but I've found that even though Torch is written in Lua, there is much, much less abstraction.

Implementing a new operation in TensorFlow does require a fair amount of overhead, and you'll need to know CUDA if you want to create something that works on (NVIDIA) GPUs.

The "Creating an Op" how-to[1] goes over the basic steps necessary to implement an operation, but there are a couple of things missing. Notably, proper documentation for the various functions and classes used when writing operations, as well as information on writing (and registering) a Python wrapper for additional functionality.

All this said, if you want to commit your work to the GitHub master, the Google team working on the repository does an excellent job of walking contributors through the steps necessary to get their work over the hump.

[1]: https://www.tensorflow.org/versions/master/how_tos/adding_an...

TensorFlow makes some things difficult in truly baffling ways. For example, the last time I checked there's no built-in way to get gradients w/r/t say a single weight, rather you get a gradient sum. So if you want to do anything with the Jacobian (e.g. contractive autoencoders) you are better off using Theano.
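To make the distinction concrete, here is a small plain-numpy sketch (not TensorFlow or Theano code; the function `f` and its shapes are made up for illustration) of the difference between the full Jacobian of a vector-valued function and the summed gradient that a `tf.gradients`-style call returns:

```python
import numpy as np

def f(x):
    # Toy vector-valued function f: R^2 -> R^2
    return np.array([x[0] ** 2, x[0] * x[1]])

def jacobian_fd(f, x, eps=1e-6):
    # Finite-difference Jacobian: one column per input dimension.
    m = f(x).shape[0]
    J = np.zeros((m, x.shape[0]))
    for j in range(x.shape[0]):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

x = np.array([2.0, 3.0])
J = jacobian_fd(f, x)        # full Jacobian: [[2*x0, 0], [x1, x0]]
grad_of_sum = J.sum(axis=0)  # what a gradients-of-summed-output call gives you
```

The point is that `grad_of_sum` collapses the rows of `J`, so anything that needs the per-output rows (like a contractive penalty on the Jacobian norm) can't recover them from it.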

I can't speak for TensorFlow, but I've personally found Theano to be the most easily extensible among the widespread deep/reinforcement/gradient learning libraries (Caffe, Torch, Keras, Chainer). There does seem to be a performance gap between Theano and Caffe/Torch, but it's great for prototyping.

If anyone wants to switch to TensorFlow but misses the Torch interface, you will always have Keras: https://github.com/fchollet/keras

I also recommend reading @fchollet's guide on integrating Keras and TensorFlow, especially for those wanting to implement novel components at a lower level :) http://blog.keras.io/keras-as-a-simplified-interface-to-tens...
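For anyone wondering what that interface buys you: a Keras Sequential model of Dense layers is, under the hood, just a stack of affine maps with nonlinearities. A rough plain-numpy sketch (not actual Keras code; the weights here are random placeholders):

```python
import numpy as np

# Sketch of the forward pass a two-layer Keras model expresses, roughly:
#   model = Sequential([Dense(4, activation='tanh'), Dense(1)])

rng = np.random.RandomState(0)

def dense(x, W, b, activation=None):
    # A Dense layer is an affine map plus an optional nonlinearity.
    z = x @ W + b
    return np.tanh(z) if activation == 'tanh' else z

W1, b1 = rng.randn(3, 4), np.zeros(4)   # layer 1: 3 inputs -> 4 units
W2, b2 = rng.randn(4, 1), np.zeros(1)   # layer 2: 4 units -> 1 output

x = rng.randn(5, 3)                     # batch of 5 examples
h = dense(x, W1, b1, activation='tanh')
y = dense(h, W2, b2)
```

Keras's value is that it lets you declare this stack in a couple of lines and handles the training loop, while still compiling down to Theano or TensorFlow ops.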

I second that. Keras was the first really intuitive interface to neural networks that I stumbled upon. It was also convenient that it was built in Python and played well with the numpy/scipy ecosystem. Thank you fchollet!

Thanks for posting that - keras looks a lot easier to use for mere amateurs like myself.

Keras is the best framework out there.

A simple python layer is exactly what I was looking for. Thanks!

I like these comments on the Reddit discussion: it's not like DeepMind ever really open sourced anything (other than their Atari code from years ago).

A Google team switching over to a product maintained by another Google team makes a lot of sense for the team. They get instant development/deployment/infra support and huge influence over the development roadmap.

Hopefully this motivates them to open source much more...

To be clear, TensorFlow is about a lot more than deep learning. It's a distributed math library, a bit like Theano. Its ultimate rivals in the Python ecosystem are NumPy and SciPy and even scikit-learn. You'll see the TF team implement a lot more algorithms on top of their numerical computing core eventually. (In the JVM world, I work on ND4J -- http://ND4J.org -- and we see a lot of similarities, which is why I bring this up.)

So is Torch though.

Besides, deep learning is mostly just matrix operations anyway, so you're kind of saying "TensorFlow is about a lot more than matrix operations - it's a matrix library too"...

Kind of. Deep learning is about more than matrix operations, and matrix operations are useful for applications other than deep learning, so I believe the distinction is worth making. Just like with programming languages, which may all be used for the same application, it's all about what you make easy to do, and what you make difficult. I'm saying the TF's intention is to make many things beyond DL easy, although people think of it chiefly as a DL library atm.

Stanford's CS224d: Deep Learning for Natural Language Processing uses TensorFlow. Although they have only just got up to the part where they are beginning to use it.

Here's the "Introduction to TensorFlow" lecture.


You don't need to watch the previous 6 lectures to make sense of it, but it would help if you knew a bit (though not in great detail) about neural nets, e.g. if the terms forward propagation, backward propagation and gradient descent mean something to you.


Slightly off topic, but for anyone who is taking this course... are the materials only related to NLP, or are the techniques much more broadly applicable to other areas of deep learning? (A cursory look at the syllabus suggests the latter, but it would be great if someone actually taking the course could comment.)

I've watched all 8 available videos, which is as far as my knowledge goes. They cover background on gradients, calculating derivatives, an introduction to word vectors and how they relate to each other, recurrent neural nets and how to push time series through them, an introduction to TensorFlow, and finally how to scan backwards and forwards through "time" in an RNN (in NLP, each word in a sentence is a time step).
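The "each word is a time step" idea can be sketched in a few lines of plain numpy (an illustration only, with made-up sizes and random weights, not code from the course):

```python
import numpy as np

# Minimal sketch of pushing a "time series" (a sentence) through a
# recurrent net: one hidden state carried across time steps (words).

rng = np.random.RandomState(0)
d_in, d_h = 300, 50                      # word-vector size, hidden size
W_xh = rng.randn(d_in, d_h) * 0.01       # input-to-hidden weights
W_hh = rng.randn(d_h, d_h) * 0.01        # hidden-to-hidden (recurrent) weights

def rnn_forward(word_vectors):
    h = np.zeros(d_h)
    states = []
    for x_t in word_vectors:             # each word is one time step
        h = np.tanh(x_t @ W_xh + h @ W_hh)
        states.append(h)
    return states

sentence = rng.randn(7, d_in)            # a 7-word "sentence"
states = rnn_forward(sentence)           # one hidden state per word
```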

Word vectors are "just" high-dimensional entities - 100-300 dimensions - used as input. So the introduction to them was about how you go about building a dataset that is a collection of 50,000 column vectors, each of which is 300 rows, and then how to use that to build a neural net to do useful work.
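The shape of that dataset, and the usual "which words are related?" lookup on it, is easy to sketch in numpy (random vectors here stand in for trained embeddings; the word id is arbitrary):

```python
import numpy as np

# A 50,000-word vocabulary, each word a 300-dimensional vector,
# plus a cosine-similarity nearest-neighbour lookup.

rng = np.random.RandomState(42)
vocab_size, dim = 50000, 300
embeddings = rng.randn(vocab_size, dim)   # one row per word

def most_similar(word_id, k=5):
    # Cosine similarity of one row against every row, top-k ids.
    v = embeddings[word_id]
    sims = embeddings @ v / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v))
    return np.argsort(-sims)[:k]

neighbours = most_similar(123)            # a word's nearest neighbours
```

With trained (rather than random) vectors, the neighbours of a word end up being semantically related words, which is what makes the representation useful as input to a net.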

The conclusion is that all the work done on syntax, grammar and word classification can effectively be replaced by having a huge corpus (even all of Wikipedia counts as small), 300 dimensions for each word, and a loss function to classify each word.

One can imagine how that would be applied to sales data of multiple products or other data.

It goes on to suggest how sentiment analysis is performed and how entity recognition would work (entities being places, names of people and companies).

The info has been general but described in terms of NLP; the techniques so far are not just for use in NLP.

I'm not an NLP person, and tbh I've never even made a neural net (although I could if I had a reason); I'm just interested in the subject.

> The conclusion is that all the work done on syntax, grammar and word classification can effectively be replaced by having a huge corpus

Is that a surprise? You don't teach a child how to speak by telling him about verbs and grammar. He will learn how to use them without having any formal idea about what they are.

Apparently it was a surprise to the AI NLP teams that spent years doing manual classification when a deep NN suddenly outperformed them without any prior knowledge. Just make a 300-dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!

> Apparently it was a surprise to the AI NLP teams [...]

Similar techniques were well known and used for years in NLP. E.g. Brown clustering has been used since the early nineties and has been shown to improve certain NLP tasks by quite an amount. NMF has also been used for quite some time to obtain distributed representations of words. Also, many of the techniques used in NLP now (word embeddings, deep nets) have been known for quite a while. However, the lack of training data and computational power prevented these techniques from taking off earlier.

> Just make a 300-dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!

The 'rules of language' don't just fall out of word vectors. They fall out of embeddings combined with certain network topologies and supervised training. In my experience (working on dependency parsing), you also typically get better results by encoding language-specific knowledge. E.g. if your language is morphologically rich or does a lot of compounding, the coverage of word vectors is going to be pretty bad (compared to e.g. English). You will have to think about morphology and compounds as well. One of our papers that was recently accepted at ACL describes a substantial improvement in parsing German when incorporating/learning explicit information about clausal structure (topological fields).

Being able to train extremely good classifiers with a large amount of automatic feature formation does not mean that all the insights previously gained in linguistics or computational linguistics are suddenly worthless.

(Nonetheless, it's an exciting time to be in NLP.)

I was oversimplifying a tad and being conversational (and I'm not an expert, not even much beyond beginner).

It is indeed an exciting time.

> Apparently it was a surprise to the AI NLP teams that spent years doing manual classification when a deep NN suddenly outperformed them without any prior knowledge. Just make a 300-dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!

Hogwash! While there is certainly some truth to what you say and how "Deep Learning" has become mainstream in NLP over the last two years, it is far from as easy as you portray it to be.

The key paradigm shift has been in the downplay (not removal, mind you) of hand-crafted features and moving away from imposing constraints on your model. State-of-the-art NLP research, in general, no longer tends to spend time coming up with new indicator features, coming up with clever constraints, or finding ways of training models that require approximation techniques to even be feasible computationally. Instead, models tend to learn in an end-to-end fashion, where manipulating the model structure is significantly easier and we now learn features as opposed to specify them by hand. This is great and something I am happy to be a part of, but, if you want state-of-the-art results it is still fairly common to mix in some "old-school" features as well, just to squeeze that very last bit of performance out of your model.

It is also not fair to say "without any prior knowledge". Even if you train a parser in the new paradigm (like Vinyals et al. (2014)), you still need to supply your model with training data describing syntactic structure, this data was largely constructed by linguists in the 90s. The same thing goes for pretty much any NLP task beyond simple lexical semantics. We also knew that distributional features were useful even before the "Deep Learning" revolution, see Turian et al. (2010) for example, where the "Deep Learning" methods of that time were defeated by an "old-school" co-occurrence clustering method from the early 90s. Heck, the whole idea of distributional semantics was alive and well throughout the early 2000s and can trace its roots back to work such as Harris (1954) and arguably even the later Wittgenstein.

Note that I am saying all of this as a "Deep Learner" that has been pushing this agenda for about four years now, and I will continue to work along these lines since I think that "Deep Learning" (or rather Representation Learning) is currently the best approach for semantics in NLP. But hype is dangerous, even if it in many ways supports my cause.

Thank you for the input. Yes, I was being a bit flippant and shallow - well, more conversational really.

You're right about hype being dangerous.

An interview with a pioneer in the field


A child learns much more, and more deeply, about language from just a fraction of that amount of unsupervised data. The point is that the mechanisms are entirely different; it's not very useful to compare.

Thanks for posting these!

This is a pleasant surprise. The more people that work on TensorFlow the better, especially as the DeepMind team will be more aligned with extending TensorFlow's research potential.

I am curious how well TensorFlow fits for many of DeepMind's tasks though. Much of their recent work has been in reinforcement algorithms and hard stochastic decision tasks (think gradient approximation via Monte Carlo simulations rather than exactly computed gradients) which TensorFlow hasn't traditionally been used for.

Has anyone seen TensorFlow efficiently used for such tasks? I'm hoping that DeepMind will release models showing me what I've been doing wrong! =]

(note: I produce novel models in TensorFlow for research but they're mostly fully differentiable end-to-end backpropagation tasks - I might have just missed how to apply it efficiently to these other domains)

TensorFlow is the machine learning codebase, but how do machine learning research teams typically manage their training sets, dataset metadata, and collaboration on these large datasets?

Here are a couple of examples from the Python deep learning ecosystem:

IDSIA, affiliated with Juergen Schmidhuber and many other leading ML researchers, has released Sacred, "a tool to help you configure, organize, log and reproduce experiments." https://github.com/IDSIA/sacred

MILA, affiliated with Yoshua Bengio and the Theano project, offers fuel, "a data pipeline framework for machine learning": https://github.com/mila-udem/fuel

Most teams I have seen have either template scripts or boilerplate that generates datasets, and share both the generated data and the scripts via normal ways that people share data and code: disk, S3, github, emailing of notebooks, etc.

It requires a fair amount of set-up, but works surprisingly well once a core team and set of problems are established.

We are building mldb.ai to help bring the data and the algorithms for ML together in a less ad-hoc manner and to help move things out of research and into prod once they are ready. Many of the hosted ML solutions (Azure ML, Amazon ML, Google Data Lab, etc.) and other toolkits (e.g. GraphLab) are working on similar ML workflow and organizational structure problems.

Which projects do you know that use "disk, S3, github, ..." to share their datasets? I'm curious because I haven't read about any ML projects actually using hosted ML solutions like Amazon ML + S3. I've only seen Amazon recommend Amazon ML.

S3 is a good way to share files

I'm working toward moving my deep learning project DeepSentry into e-commerce. DeepSentry was developed in Python 3 and C. After any significant amount of time (60+ days), the dataset per host is in the TB-PB range. So I've been digging into implementing a Big Data solution, and I've found that Apache Spark on Hadoop/MapR with Postgres is popular.

NVIDIA's NVCC has performance & compile-time issues with TensorFlow.[1]

NVCC vs GPUCC benchmarks show 8%-250% slower compilation & 3.7%-51% slower runtimes.[2]

Google uses GPUCC internally, so they weren't optimising for NVCC.

The LLVM-based GPUCC is the first fully open-source toolchain for CUDA.

Google announced that the guts of GPUCC will make their way into Clang.

[1] https://plus.google.com/+VincentVanhoucke/posts/6RQmgqcmx2d [2] http://research.google.com/pubs/pub45226.html

This is a very money-where-their-mouth-is move. Like they said, moving away from Torch is a big deal.

I know that Google has been criticized for not dogfooding GCS - does anyone know if that has changed? For example, does DeepMind use it?

I'll speak for BigQuery, since that's the product I know best. BigQuery itself is used ubiquitously at Google internally. I've offered evidence to the writer who made that argument, but unfortunately he was not willing to change his stance.

Why is it called TensorFlow? Do the multi-dimensional matrices that exchange data between the nodes transform like tensors? If so, when does the need to transform them arise?

> Do the multi-dimensional matrices that exchange data between the nodes transform like tensors?

Yes, if you design the model/graph that way.

> If so, when does the need arise to transform them?

The need arises whenever tensors are needed. For deep learning, most people treat them like multidimensional arrays. TensorFlow is an excellent name.

Multidimensional arrays are a thing of the past. Now we call them tensors. Get with the program or become an aging, forgotten physicist not involved in deep learning.

Haha. Hope machine learning students won't be equally annoyed by physicists misusing "their" tensors.

Don't forget the branding angle.

A lot of people heard of tensors as something used in quantum physics, which is considered by many the most advanced/difficult hard science.

So using the word Tensor suggest highly advanced stuff used by very smart people.

Expect much more Tensor stuff in the future.

Other physics terms have the same high branding potential. "Gauge" comes to mind. However, almost nobody outside of physics/maths has heard of this one (in the "gauge symmetry" sense, not the "wire gauge" one), so it would need some time to grow.

It uses Tensor in the computer-sciencey-we-abuse-terms sense of "a multidimensional matrix", not in the physics sense. It could be called multidimensionalmatrixflow, but I'm glad I don't have to type that on a daily basis. :)

Doesn't the word 'tensor' "abuse" the terminology in exactly the same way as 'matrix' does?

Sure, you can be mathy and insist that these are all abstract things and transformations between them, but meanwhile CS people will keep calling arrays "vectors", "matrices", and "tensors".

Some of the operations that are performed on the tensors in a neural network are non-linear. An example might be taking the tanh of all of the elements of the tensor. For these steps, you won't have invariance (or covariance) under change of basis.

Even in physics, there are applications of tensors which essentially treat tensors as multidimensional arrays (see for example, tensor networks) with no predefined transformation properties. But the operations done on tensors are always linear.

It is a dataflow language that deals with tensors (multi-dimensional arrays), so tensor + dataflow = TensorFlow.

Tensor in TensorFlow is just a typed multi-dimensional array.

Anyone know whether they are primarily working on Python 2 or 3?

The library is compatible with both versions of Python, but I feel like Python 2.7 is still the mainstream one.

+this. We (the TensorFlow team) use python2.7 by default, but work hard to make sure that we maintain compatibility. Our tests explicitly run on both platforms - http://ci.tensorflow.org/
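One common way to keep a single codebase running on both Python 2.7 and 3.x is `__future__` imports at the top of every module, which make 2.7 behave like 3 for the risky constructs (a general-style sketch, not a claim about TensorFlow's exact conventions):

```python
# Make Python 2.7 behave like Python 3 for the constructs that differ.
from __future__ import absolute_import, division, print_function

# True division everywhere (in plain Python 2, 3 / 2 == 1):
ratio = 3 / 2

# print is a function, not a statement, on both versions:
print("ratio =", ratio)
```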

I guess this (switching from Torch to other deep learning libraries) will become a trend as deep learning has become more mainstream in tech companies. I'd say Facebook, Twitter and others who use Torch (I don't know of any others, actually) will move away from Torch gradually, unless the Torch community steps its game up.

I'm a layman, but I find it quite interesting that a big release such as TensorFlow doesn't affect more people outside Google - or at least that's my impression. One would think, at least, that online store recommendations would become better, or something like that.

TensorFlow doesn't make the algorithms more effective; it just makes them easier to describe and, more recently, quicker to train and test. Also, with the kind of predictions Google is making, it's very unlikely that you'd notice improvements, since they would be gradual.

...but if you want to make your algorithms more effective, you'd probably benefit if they were easier to describe, quicker to train and test, and you'd want to take advantage of gradual improvements. Right?

Not so, for the same reason that low-level languages are computationally more effective but harder to describe things in and more difficult to develop code in.

Lua is more low level and has an extremely isolated and fractured community relative to the current Python ecosystem. It is also non-intuitive and has negligible benefits compared to the current scientific Python ecosystem.

I find the abstractions offered by Python and its standard library to be very easy to comprehend, write, and maintain relative to Lua.

..."easier to describe" makes it sound like it's a HIGHER level language, not a lower level language.

Yes, probably. The improvements would still be gradual, though.

This is probably because many people are using pre-existing tools that do very similar things as TensorFlow. To the best of my knowledge, it's not implementing some radical new algorithm with better classification performance, so "online store recommendations," etc. won't become better but may become more widely used.

Why, yes, of course recommendation engines will get better with these improvements in machine learning. I'd even say that significant, noticeable improvements are not unlikely. Consider image classification, playing Go etc., where neural nets have advanced the state of the art from "research interest" to "reliable product" within a few years.

There may be some reasons why recommendation engines may not advance as fast:

- It's always been more valuable than playing Go, allowing vastly larger resources to be dedicated to optimizing the current models.

- Image processing and NLP each profited from specific inventions (e.g. CNNs allowing position-independent feature detection in images).

I'm actually looking forward to a time where amazon can recommend me books based on the shoes I bought last year. I don't think I've ever seen a recommendation engine that impressed me.

OP is referring to the existence of mature tools like Theano and Torch. If Deep Learning can help, Amazon is already using it internally, possibly with either their own framework, or with one of the other open-source ones. If something will increase revenue or decrease cost, you can be assured there's already an initiative for it.

Tensorflow is not adding some magical improvements to Machine Learning, it's just one more framework (from a reputable company). The hard(er) part is getting data, cleaning it, testing, making sure it works in production, and updating as data changes.

The business cycle of adopting an open-source tool takes a while. For instance, many e-shops rely on 3rd-party services. Someone needs to proactively start playing with TF, then convince the leadership to adopt it, and then it gets deployed. As with every other solution, it takes a couple of years until the circle closes (look how long it took the industry to adopt Hadoop, Spark, etc.).

It takes a while for the ecosystem around TF to settle. I feel Google wants companies to spring up around TF and use it in every industry.

I'd say now is the time to start a niche company on top of TF. You'll be acquired faster than you can say "minimum viable product".

Makes sense. If you can make a viable product atop TensorFlow, then acqui-hiring you is probably a good bargain.

Tensorflow is just a development framework. It's not a plug and play solution.

So when do we get to see the AlphaGo code?

I guess they don't want to be under Facebook's thumb (didn't they invent Torch?)

This is huge news for the AI space. May move things forward a couple of years.

I think the neat thing about Google is the high degree of cross-fertilization between teams. In many organizations, teams rarely share information, either for political reasons or due to a lack of sharing culture in the company as a whole. That said, this framework/API change doesn't really surprise me; DeepMind's stack was more a proof of concept than a battle-tested framework, unlike TensorFlow. So in that sense this news isn't surprising at all.

Should we be worried or glad that a potential future Skynet is written in C++?
