
Google Open-Sourcing TensorFlow Shows AI's Future Is Data, Not Code - walterbell
http://www.wired.com/2015/11/google-open-sourcing-tensorflow-shows-ais-future-is-data-not-code/
======
chimtim
IMHO, Google released TensorFlow because AI is currently being driven by
research, and researchers were mostly writing code for Torch, which is used at
Facebook. So FB folks were enjoying lots of new algorithms, benchmarked
against their systems. Google open-sourcing TF shows that the benefits of
open-sourcing outweigh the benefits of staying closed. Even if the future
is data and not code, companies will not open-source their code unless they
have something to gain.

~~~
datashovel
Yes, I think at Google's scale one of the biggest pain points is almost
certainly integrating new employees to do things "the Google way". Instead of
picking up qualified people who may or may not buy in to what you're doing
internally, why not pick from a group of candidates who have already bought in
and have already taught themselves, at zero cost to you, how to do things "the
Google way".

Also, as far as I understand, TensorFlow is not technically a set of
proprietary algorithms. It's basically a framework for ML.

~~~
munificent
> I think at Google's scale one of the biggest pain points is almost certainly
> integrating new employees to do things "the Google way".

Ramping up at Google is definitely a stressful process, but I don't think open
sourcing technology puts much of a dent in that. Even if you show up your
first day at work knowing every tool Google uses, on your second day there
will be some new tool and something else will be deprecated. Within a couple
of years, damn near every piece of software you were familiar with will have
been replaced by something different. You are in a constant state of learning.

This is, I think, one of the reasons Google places such a premium on
algorithms and data structures in hiring. They are some of the few things that
don't change often and being familiar with fundamental concepts makes it much
easier to quickly pick up a new tool that uses them.

~~~
datashovel
Thanks for the insights. I can definitely see what you're saying from the
perspective of a software engineer.

I wonder if those same concepts translate over to onboarding people who might
have a weaker programming background, such as mathematicians / theorists who
may have a more difficult time making the switch, for example if they've been
using the same toolset their entire careers.

This last point isn't in response to your comment, but more a response to the
ideas presented in the article. I don't necessarily buy the idea that the data
is incentive enough for researchers at the top of their fields to leave what
they're doing to go work at Google. Surely they have access to plenty of
public datasets large enough to accomplish what they want to accomplish. So
the requirement for switching tools may be a much bigger hurdle when trying to
recruit for ML. Maybe I'm wrong in assuming they are targets for employment at
Google.

------
zhanwei
Yes, data is important to AI but open-sourcing TensorFlow doesn't mean that
code is not. Rather, it means that data and code have different strategic
value. Data is their secret sauce and code is their network. The more people
use and contribute to the library, the better the code gets.

Also important to note: TensorFlow is probably not the complete package of
what they use at Google.

~~~
eva1984
Second the last point. Without Google's infrastructure, you can't effectively
leverage more data.

------
wslh
This is old news: Peter Norvig, The Unreasonable Effectiveness of Data (2011):
[https://www.youtube.com/watch?v=yvDCzhbjYWs](https://www.youtube.com/watch?v=yvDCzhbjYWs)
and a previous article from 2009:
[http://static.googleusercontent.com/media/research.google.co...](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf)

------
abhshkdz
Aren't we getting ahead of ourselves here? It's _just_ been released; we
don't even have favorable benchmarks yet. Sure, it's by Google and lots of big
names are associated with its development, but it still has to be adopted by
the community and proven to be faster/easier/more flexible than Torch/Theano.

------
cwyers
This is like saying that the future of houses is wood, not hammers and saws.
The future of houses is in HOUSES.

------
kybernetyk
But code is data :)

~~~
jlas
Found the Lisp programmer!

------
cgio
Why is every other article playing the anti-privacy game these days? If
anything, the discussion in the article should read as pro-privacy. If there
is so much value in private information, then beyond the ethical motivation,
there might also be a financial motivation to safeguard our data. Even more so
in enterprise environments, as the boundaries between personal and business
devices get blurrier. This is also why I think enterprise will move on from
BYOD to MYOD (make your own device).

~~~
nindalf
I think you wanted to comment on the NYT article on encryption [1]. This is a
link to a Google AI library.

[1] -
[https://news.ycombinator.com/item?id=10580412](https://news.ycombinator.com/item?id=10580412)

~~~
danlindley
The article refers to Apple's taking a more "extreme" stance on privacy and
being at a disadvantage for doing so. While I appreciate that the article is
about data in AI, it neglects to even entertain the idea that privacy may be
more important in the long term than whatever added benefit comes from using
the personal information entrusted to them.

------
davidy123
Not a word in the article about Wikipedia, which is the source of so much
learning material for all players. Not to mention Wikidata, which Google
supports. It's really a shame how concrete signs of a thriving commons, a
really exciting thing in the world and as important as the internet, are
ignored by surface-level press.

------
masonhipp
It's been pretty clear for a while now that data has enormous financial and
strategic value. But that doesn't mean code _isn't_ the future. You don't get
much value out of one without the other, at least in an age where most of our
data is not easily understood.

The biggest issue with our learning algorithms is that they are incredibly
complicated and require high levels of mathematical understanding. The number
of people driving forward machine-learning is small simply because it is such
a difficult subject. There are many more people aggregating large and
interesting collections of data. I think by releasing TensorFlow Google is
encouraging data-collection built around their software; making it easier for
a majority of people to benefit from machine learning while ensuring the
continuation of their own product, code, data-collection, and ecosystem.

------
0xdeadbeefbabe
This isn't news to Wired, is it? I thought Wired knew what Andrew Ng said: "I
think AI is akin to building a rocket ship. You need a huge engine and a lot
of fuel. If you have a large engine and a tiny amount of fuel, you won’t make
it to orbit. If you have a tiny engine and a ton of fuel, you can’t even lift
off. To build a rocket you need a huge engine and a lot of fuel.

The analogy to deep learning [one of the key processes in creating artificial
intelligence] is that the rocket engine is the deep learning models and the
fuel is the huge amounts of data we can feed to these algorithms."
[http://www.wired.com/brandlab/2015/05/andrew-ng-deep-learnin...](http://www.wired.com/brandlab/2015/05/andrew-ng-deep-learning-mandate-humans-not-just-machines/)

------
tacos
The "future" of AI is systems, not code _or_ data individually. It's also the
past and present. Funny, that.

------
asgard1024
I think _the data_ can be, and should be, crowdsourced, the way Wikipedia and
OpenStreetMap are. However, I have no idea how to do that. How do you take two
learned neural networks (or whatever) for a specific application (like image
recognition) and merge them efficiently? I think that needs to be figured out
first.

~~~
joshmarlow
I don't know much about actually _merging_ networks. I knew a guy who trained
multiple networks in parallel for his M.S. and then combined them by just
averaging the corresponding weights. It seemed to work well. For this to work,
the networks needed to have the same architecture (same number of layers,
nodes in each layer and same connectivity between layers).

Now if you just want to train multiple neural networks (or other classifiers)
on different datasets (to have different strengths) then you can keep them
separate and build a composite system that lets each network "vote" on an
answer to a given problem; the decision of the overall system is a weighted
sum of the components. See [0].

[0] -
[https://en.wikipedia.org/wiki/Ensemble_learning](https://en.wikipedia.org/wiki/Ensemble_learning)
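For concreteness, here's a toy pure-Python sketch of both ideas from this
comment; the function names and numbers are mine, not from any real library:

```python
def average_weights(net_a, net_b):
    """Merge two SAME-architecture networks by averaging corresponding
    weights. Each network is modeled as a list of layers; each layer is
    a flat list of weights."""
    return [
        [(wa + wb) / 2.0 for wa, wb in zip(layer_a, layer_b)]
        for layer_a, layer_b in zip(net_a, net_b)
    ]

def weighted_vote(predictions, weights):
    """Ensemble decision: a weighted sum of each model's class scores.
    predictions: one score-per-class list per model.
    weights: how much trust each model gets.
    Returns the index of the winning class."""
    n_classes = len(predictions[0])
    totals = [
        sum(w * scores[c] for scores, w in zip(predictions, weights))
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=totals.__getitem__)

# Two one-layer "networks" merged by weight averaging:
print(average_weights([[0.25, 0.8]], [[0.75, 0.0]]))  # [[0.5, 0.4]]

# Three models score two classes; the weighted sum picks class 0:
print(weighted_vote([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]],
                    [0.5, 0.3, 0.2]))  # 0
```

Real systems would of course use tensors and a proper framework, but the
combination rules themselves are this simple.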

------
graycat
Data? Code? I vote for engineering and, there, sometimes applied math.

E.g., once some colleagues and I gave a paper at an AAAI IAAI conference at
Stanford. All the good work was just engineering. For our work, basically just
some code, later I found and published some applied math that did much better.

------
data_spy
This article is spot on. 'In the Plex' constantly referred to Google's search
being strong because it had such rich search data and not because of
predictive algorithms. They used the experience of what other people searched
to determine what was best to show.

------
1024core
There's a reason the field is called "data mining", and not "code mining" :D

~~~
p1esk
Data mining is done with code.

------
fauigerzigerk
Here's a provocative idea: Maybe it's because deep learning and all the other
popular AI algorithms are complete and utter rubbish.

Maybe using them has nothing to do with standing on the shoulders of giants
but much more with standing on the shoulders of the local maximum that is
achievable by throwing insane amounts of data at dumb algorithms.

If you could choose between access to algorithms and data structures that
exactly mimic the human brain and a data set that contains everything all
humans taken together know, what would you choose?

~~~
bduerst
They're not rubbish, they have their uses. You kind of outlined where deep
learning tends to shine - when you have large amounts of data and massive CPU
infrastructure. Pattern recognition and machine learning work, but typically
require more manual overhead than throwing tons of data at "dumb" algorithms
to achieve the same output.

In the end, you need both the algorithms and the data to do the work, and
choosing between the two leaves you still wanting the other.

~~~
fauigerzigerk
Of course, and I don't seriously claim that they are rubbish. They are useful
and I admire some of the people who have developed them.

But I think we need to question why data seems to have this outsized value
compared to algorithms. I don't think it is some sort of
information-theoretic invariant. It's a relationship between the specific
algorithms and the specific sort of data we have.

------
enlightenedfool
I beg to differ. Assuming we want to make machines intelligent by mimicking
humans, more focus would go into modeling the brain, which means algorithms. A
human (perhaps even a baby) can see one image (and its context?) and then
identify anything similar. It's not processing terabytes of similar images to
"learn" about the object.

~~~
RobertoG
"It's not processing terabytes of similar images to "learn" about the object."

Evolution did the processing and saved the results as the architecture of the
visual cortex. I suppose that, in a way, it's a work in progress.

------
yeukhon
I thought TensorFlow was designed to run mostly on Google's infrastructure,
although I have seen a post about someone trying to get the CUDA code working
on an Amazon GPU instance. Am I mistaken?

~~~
steamer25
I haven't tried it out yet but one of their 'selling' points is portability.
From the current front page of tensorflow.org:

"TensorFlow runs on CPUs or GPUs, and on desktop, server, or mobile computing
platforms. Want to play around with a machine learning idea on your laptop
without need of any special hardware? TensorFlow has you covered. Ready to
scale-up and train that model faster on GPUs with no code changes? TensorFlow
has you covered. Want to deploy that trained model on mobile as part of your
product? TensorFlow has you covered. Changed your mind and want to run the
model as a service in the cloud? Containerize with Docker and TensorFlow just
works."

------
siscia
Machines by themselves don't and never will understand data. What we feed into
the machine must be carefully cleaned; it won't work on the first run, and it
is necessary to have at least an idea of why it isn't working...

Even if the software were trivial, and it is not, a lot of specialized,
highly skilled work is still necessary to make the whole AI deal work...

Not to mention that to collect data you need well-crafted software...

~~~
pjmlp
Until they learn to think by themselves and release Skynet upon us....

Jokes aside, it is perfectly natural that if we ever manage to understand how
biological computers work, we might be able to make them think just like us.

It doesn't need to be now, it can take a few hundred years more, assuming we
don't destroy ourselves until then.

~~~
yeukhon
But you need guidance throughout your life to understand things and build
knowledge. You can have a computer as sophisticated as a human brain. I am
sure that in the history of evolution our human ancestors were not told by a
deer how to start a fire or make clothes. But I think they slowly built up
the knowledge and passed it on to the next generation. You are right that
there is brain in DNA.

