And DeepMind's research output was a major reason people needed to use Torch, and hence had to learn Lua.
But with DeepMind switching over to TensorFlow, there is now one language to learn that is well supported by all the major frameworks - Python - and you can benefit from several frameworks (Theano, Keras, TensorFlow). So the language barrier is reduced and you can focus on the framework and the actual NN stuff. Further, this will also drive consolidation onto TensorFlow, reducing the framework mental overhead. As long as TF is up to the job, and it reportedly is, this will benefit the deep learning community considerably.
I'd been wondering myself what language and framework I should focus on when I start studying NNs, and this settles it for me: Python and TensorFlow.
Nobody needs to use multiple frameworks unless they're someone (not a beginner) who wants to be able to take the code from research papers or something.
I haven't done anything with deep learning, but I worked in a research lab with others who did. On the image processing side, we regularly prototyped code in OpenCV (both Python and C++) and in MATLAB (with toolboxes) because of this.
At the end of the day they have a limited amount of time and just want to test their idea the fastest way they can.
One related unsolved problem in that type of research is getting people to share their actual code. Especially when multiple universities are at the bleeding edge in a field, they often publish just enough to prove their point without giving everyone else the same foundation to build on easily. Even in science it gets political... who knew.
1: i.e., just their algorithms, or their code without very useful implementation-level optimizations
> you can generally email the authors and very often they will be willing to send you (their really crappy) code.
Code obtained from multiple authors, or even from the same author at different points in time, is code written in multiple frameworks and multiple languages. Standardizing on Python / TensorFlow reduces the cognitive load along the way and is likely to speed up the field. If speed is what the field was missing :)
I never said I wasn't. I'm interested in more than one thing, and so it makes me sad to see that the cutting-edge stuff I'm interested in is implemented in a variety of languages and frameworks (typically for no better reason than that particular researcher uses that particular framework), which means I either need to invest a huge amount of time and effort into becoming a polyglot or abandon any understanding deeper than the paper itself.
That being said, it's the natural state of any field moving fast. Web development has even worse fragmentation. Mobile development has about the same amount of fragmentation. There are a dozen distributed computing frameworks (Hadoop? A dozen things on top of Hadoop? Spark? Mesos? Kubernetes?).
The "Creating an Op" how-to goes over the basic steps necessary to implement an operation, but there are a couple of things missing. Notably, proper documentation for the various functions and classes used when writing operations, as well as information on writing (and registering) a Python wrapper for additional functionality.
All this said, if you want to commit your work to the GitHub master, the Google team working on the repository does an excellent job of walking contributors through the steps necessary to get their work over the hump.
Beyond that, a Google team switching over to a product maintained by another Google team makes a lot of sense for the team. They get instant development/deployment/infra support and a lot of say over the development roadmap.
Hopefully this motivates them to open source much more...
Besides, deep learning is mostly just matrix operations anyway, so you're kind of saying "TensorFlow is about a lot more than matrix operations - it's a matrix library too"...
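To make that concrete, a fully connected softmax layer in TensorFlow really is just a matrix multiply plus a nonlinearity - a rough sketch along the lines of the beginner MNIST tutorial, with arbitrary shapes:

    import tensorflow as tf

    # 784 input features, 10 output classes - just x*W + b followed by softmax.
    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.softmax(tf.matmul(x, W) + b)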
Here's the "Introduction to TensorFlow" lecture.
You don't need to watch the previous 6 lectures to make sense of it, but it would help if you knew a bit (not in great detail) about neural nets, e.g. if the terms forward propagation, backward propagation and gradient descent mean something to you.
Word vectors are "just" high-dimensional entities - 100-300 dimensions - used as input. So the introduction to them was about how you go about building a dataset that is a collection of 50,000 column vectors, each of which is 300 rows, and then how to use that to go on and build a neural net to do useful work.
The conclusion is that all the work done on syntax, grammar and word classification can effectively be replaced by having a huge corpus (even all of Wikipedia counts as small), 300 dimensions for each word, and a loss function to classify each word.
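In TensorFlow terms, my own rough sketch of that recipe (not the lecture's code) would look like the usual word2vec setup: a 50,000 x 300 embedding matrix plus a sampled NCE loss that tries to classify each word from its context:

    import tensorflow as tf

    vocab_size, dim = 50000, 300

    # One 300-dimensional vector per word in the vocabulary.
    embeddings = tf.Variable(tf.random_uniform([vocab_size, dim], -1.0, 1.0))
    nce_weights = tf.Variable(tf.truncated_normal([vocab_size, dim], stddev=0.05))
    nce_biases = tf.Variable(tf.zeros([vocab_size]))

    context_ids = tf.placeholder(tf.int32, shape=[None])     # input (context) words
    target_ids = tf.placeholder(tf.int64, shape=[None, 1])   # words to predict

    inputs = tf.nn.embedding_lookup(embeddings, context_ids)
    # The "loss function to classify each word": noise-contrastive estimation
    # over 64 sampled negatives instead of a full 50,000-way softmax.
    loss = tf.reduce_mean(
        tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                       labels=target_ids, inputs=inputs,
                       num_sampled=64, num_classes=vocab_size))
    train_op = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

After training, the rows of the embeddings variable are the word vectors.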
One can imagine how that would be applied to sales data of multiple products or other data.
It goes on to suggest how sentiment analysis is performed and how entity recognition would work (entities being places, names of people and companies).
The info has been general but described in terms of NLP; the techniques so far are not just for use in NLP.
I'm not an NLP person and tbh I've never even made a neural net (although I could if I had a reason); I'm just interested in the subject.
Is that a surprise? You don't teach a child how to speak by telling him about verbs and grammar. He will learn how to use them without having any formal idea about what they are.
Similar techniques were well known and used for years in NLP. E.g. Brown clustering has been used since the early nineties and has been shown to improve certain NLP tasks by quite an amount. NMF has also been used for quite some time to obtain distributed representations of words. Also, many of the techniques used in NLP now (word embeddings, deep nets) have been known for quite a while. However, the lack of training data and computational power prevented these techniques from taking off earlier.
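For instance, getting distributed representations out of NMF takes only a few lines with off-the-shelf tools - a toy sketch with a random stand-in matrix (in practice you'd use word-by-context co-occurrence counts):

    import numpy as np
    from sklearn.decomposition import NMF

    # Stand-in for a word-by-context co-occurrence matrix (must be non-negative).
    cooccurrence = np.random.poisson(1.0, size=(1000, 1000)).astype(float)

    nmf = NMF(n_components=50, init='nndsvd', max_iter=200)
    word_vectors = nmf.fit_transform(cooccurrence)   # one 50-d vector per word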
Just make a 300-dimensional vector of the occurrence frequencies of word combinations and out fall the rules of language!
The 'rules of language' don't just fall out of word vectors. They fall out of embeddings combined with certain network topologies and supervised training. In my experience (working on dependency parsing), you also typically get better results by encoding language-specific knowledge. E.g. if your language is morphologically rich or does a lot of compounding, the coverage of word vectors is going to be pretty bad (compared to e.g. English). You will have to think about morphology and compounds as well. One of our papers that was recently accepted at ACL describes a substantial improvement in parsing German when incorporating/learning explicit information about clausal structure (topological fields).
Being able to train extremely good classifiers with a large amount of automatic feature formation does not mean that all the insights previously gained in linguistics or computational linguistics are suddenly worthless.
(Nonetheless, it's an exciting time to be in NLP.)
It is indeed an exciting time.
Hogwash! While there is certainly some truth to what you say and how "Deep Learning" has become mainstream in NLP over the last two years, it is far from as easy as you portray it to be.
The key paradigm shift has been in the downplay (not removal, mind you) of hand-crafted features and moving away from imposing constraints on your model. State-of-the-art NLP research, in general, no longer tends to spend time coming up with new indicator features, coming up with clever constraints, or finding ways of training models that require approximation techniques to even be feasible computationally. Instead, models tend to learn in an end-to-end fashion, where manipulating the model structure is significantly easier and we now learn features as opposed to specify them by hand. This is great and something I am happy to be a part of, but, if you want state-of-the-art results it is still fairly common to mix in some "old-school" features as well, just to squeeze that very last bit of performance out of your model.
It is also not fair to say "without any prior knowledge". Even if you train a parser in the new paradigm (like Vinyals et al. (2014)), you still need to supply your model with training data describing syntactic structure, and this data was largely constructed by linguists in the 90s. The same thing goes for pretty much any NLP task beyond simple lexical semantics. We also knew that distributional features were useful even before the "Deep Learning" revolution; see Turian et al. (2010) for example, where the "Deep Learning" methods of that time were defeated by an "old-school" co-occurrence clustering method from the early 90s. Heck, the whole idea of distributional semantics was alive and well throughout the early 2000s and can trace its roots back to work such as Harris (1954) and arguably even the later Wittgenstein.
Note that I am saying all of this as a "Deep Learner" that has been pushing this agenda for about four years now, and I will continue to work along these lines since I think that "Deep Learning" (or rather Representation Learning) is currently the best approach for semantics in NLP. But hype is dangerous, even if it in many ways supports my cause.
You're right about hype being dangerous.
I am curious how well TensorFlow fits many of DeepMind's tasks though. Much of their recent work has been in reinforcement learning algorithms and hard stochastic decision tasks (think gradient approximation via Monte Carlo simulations rather than exactly computed gradients), which TensorFlow hasn't traditionally been used for.
Has anyone seen TensorFlow efficiently used for such tasks? I'm hoping that DeepMind will release models showing me what I've been doing wrong! =]
(note: I produce novel models in TensorFlow for research but they're mostly fully differentiable end-to-end backpropagation tasks - I might have just missed how to apply it efficiently to these other domains)
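For what it's worth, the closest I've come is writing the score-function (REINFORCE-style) estimator directly into the graph - a rough sketch of what I mean, with purely illustrative names and shapes, nothing DeepMind-specific:

    import tensorflow as tf

    states = tf.placeholder(tf.float32, shape=[None, 4])
    rewards = tf.placeholder(tf.float32, shape=[None])

    # A tiny policy over two discrete actions.
    W = tf.Variable(tf.random_normal([4, 2], stddev=0.1))
    b = tf.Variable(tf.zeros([2]))
    logits = tf.matmul(states, W) + b

    # Sampling is not differentiable, so the gradient comes from the surrogate
    # loss below: E[-reward * log pi(action | state)], the Monte Carlo estimate.
    actions = tf.multinomial(logits, num_samples=1)
    log_probs = -tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.reshape(actions, [-1]), logits=logits)

    surrogate_loss = -tf.reduce_mean(tf.stop_gradient(rewards) * log_probs)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(surrogate_loss)

It works, but it never feels as natural as the fully differentiable stuff, which is why I'm curious what DeepMind's internal usage looks like.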
IDSIA, affiliated with Juergen Schmidhuber and many other leading ML researchers, has released Sacred, "a tool to help you configure, organize, log and reproduce experiments." https://github.com/IDSIA/sacred
MILA, affiliated with Yoshua Bengio and the Theano project, offers fuel, "a data pipeline framework for machine learning": https://github.com/mila-udem/fuel
It requires a fair amount of set-up, but works surprisingly well once there is a core team and problems established.
We are building mldb.ai to help bring the data and the algorithms for ML together in a less ad-hoc manner and to help move things out of research and into prod once they are ready. Many of the hosted ML solutions (Azure ML, Amazon ML, Google Data Lab, etc.) and other toolkits (e.g. GraphLab) are working on similar ML workflow and organizational structure problems.
NVCC vs GPUCC benchmarks: 8%-250% slower compilation & 3.7%-51% slower runtimes.
Google uses GPUCC internally, so they weren't optimising for NVCC.
The LLVM-based GPUCC is the first fully open-source toolchain for CUDA.
Google announced that the guts of GPUCC will make their way into Clang.
I know that Google has been criticized for not dog-fooding GCS; does anyone know if that has changed? For example, does DeepMind use it?
Yes, if you design the model/graph that way.
> If so, when does the need arise to transform them?
The need arises whenever tensors are needed. For deep learning, most people treat them like multidimensional arrays. TensorFlow is an excellent name.
Multidimensional arrays are a thing of the past. Now we call them tensors. Get with the program or become an aging, forgotten physicist not involved in deep learning.
A lot of people heard of tensors as something used in quantum physics, which is considered by many the most advanced/difficult hard science.
So using the word Tensor suggests highly advanced stuff used by very smart people.
Expect much more Tensor stuff in the future.
Other physics terms have the same high branding potential. "Gauge" comes to my mind. However, almost nobody outside of physics/maths has heard of this one (in the "gauge symmetry" sense, not the "wire gauge" one), so it would need some time to grow.
Sure, you can be mathy and insist that these are all abstract things and transformations between them, but meanwhile CS people will keep calling arrays "vectors", "matrices", and "tensors".
Even in physics, there are applications of tensors which essentially treat them as multidimensional arrays (see, for example, tensor networks) with no predefined transformation properties. But the operations done on tensors are always linear.
Lua is lower level and has an extremely isolated and fractured community relative to the current Python ecosystem. It is also non-intuitive and has negligible benefits compared to the current scientific Python ecosystem.
I find the abstractions offered by Python and its standard library to be very easy to comprehend, write, and maintain relative to Lua.
There are some reasons why recommendation engines may not advance as fast:
- It's always been more valuable than playing Go, allowing vastly larger resources to be dedicated to optimizing the current models
- Image processing and NLP each profited from specific inventions (e.g. CNNs allowing position-independent feature detection in images).
I'm actually looking forward to a time when Amazon can recommend me books based on the shoes I bought last year. I don't think I've ever seen a recommendation engine that impressed me.
TensorFlow is not adding some magical improvement to Machine Learning; it's just one more framework (from a reputable company). The hard(er) part is getting data, cleaning it, testing, making sure it works in production, and updating as data changes.
I'd say now is the time to start a niche company on top of TF. You'll be acquired faster than you can say "minimum viable product".