
How Google Is Remaking Itself for “Machine Learning First” - steven
https://backchannel.com/how-google-is-remaking-itself-as-a-machine-learning-first-company-ada63defcb70#.nljh17nb5
======
arbre
I don't believe in "everyone should work on machine learning". I have worked on
several deep learning models, but I don't really like the work. It is a very
different job from software engineering, in my opinion: ML is more about
gathering data and tuning models than about building things. I have spent
months working on models while barely writing any code. It is more efficient
to have ML experts focus on the modeling and software engineers use the model.

I do believe, however, that some hands-on experience is needed to understand
what is possible, to get the best out of existing tools, and to be able to
communicate your needs to machine learning engineers.

~~~
giardini
I concur. ML isn't programming per se; it is experimental problem-solving with
a particular dataset and algorithm. Your result may or may not work well, may
or may not generalise, and will almost certainly not contribute anything new
to any discipline, even to ML itself. When all the ML work is done we'll have
great pattern recognizers but nothing remotely akin to thought. And we won't
understand how they work or the best way to build the next one. It isn't AI,
although it is a part of AI, just as the visual system is part of AI.

I was reading Domingos' "The Master Algorithm" several days ago and a
mathematician inquired about the book. He knew a group of ML developers. His
opinion was that "ML doesn't look very interesting: all you do is play with
the parameters, turn the knobs, and/or change the model until something works.
There's no real progress there; nothing substantial."

Rather than sending a battalion of bright developers into the ML swamp, where
they will largely be frustrated, learn little, and contribute less, I'd be
tempted to guide them into other fields.

~~~
tdullien
I am a mathematician by trade, and was doing development along with other
stuff (reverse engineering and security work, first in my own company, then at
Google). So ...

1) I think a working knowledge of ML is extremely useful to many developers,
and generally under-taught in universities. See the old Joel article which
mentions "Google uses Bayesian filtering like MS uses the IF statement":
[http://www.joelonsoftware.com/items/2005/10/17.html](http://www.joelonsoftware.com/items/2005/10/17.html).
A well-rounded developer should know the basics (logistic regression, SVMs,
some things about CNNs, etc.); it will make them much more adept at
problem-solving. I suspect Google's internal push to get people up to speed is
not meant to turn them all into ML researchers, but rather to make sure that
everybody "knows the basics well enough".

So I think it is useful to teach developers about the things ML has to offer.
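To make "the basics" concrete: logistic regression, the first item on that list, fits in a couple dozen lines of numpy. This is just an illustrative sketch; the toy data, learning rate, and iteration count here are invented for the example.

```python
import numpy as np

# Toy binary classification: label points by whether x1 + x2 > 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

# Logistic regression trained by plain gradient descent on the log-loss.
w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid of the linear score
    grad_w = X.T @ (p - y) / len(y)          # gradient of the average log-loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = np.mean(preds == y)
print(f"training accuracy: {accuracy:.2f}")
```

The point isn't that a developer should hand-roll this in production, but that seeing the whole model in twenty lines demystifies what the libraries do.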

2) Mathematically, it seems that in ML the "engineering" side has run far
ahead of the theory side. The sudden breakthrough in the mid-2000s is IMO
still not fully understood - and parts of it may have been quite accidental.
Initially, pre-training was thought to be the big breakthrough, but it is
quite unclear _what_ the big breakthrough actually was. It could be that the
increase in data and compute, together with the switch to minibatch SGD, is
what explains why modern DNNs generalize well (interesting paper on the topic:
[https://arxiv.org/abs/1509.01240](https://arxiv.org/abs/1509.01240)). There
is a lot of good mathematics still to be written, but I am not sure whether
the folks at Google will write it - given the incentive structures
(performance reviews, impact statements), it is unlikely that somebody gets
promoted for "cleaning up the theory".

3) From a development perspective: there are a ton of interesting engineering
problems underneath the progress in ML. Look at Jeff Dean: he is a superstar
_engineer_ , not necessarily a mathematician, and much of the progress the
Google Brain team made consisted of engineering advances in scaling and
distributing training - so by training the engineers in ML, you also get
better infrastructure over time.

So I don't think they are sending "developers into ML swamps"; I think they
are trying to reach the point where "Google uses DNNs like MS uses IF".

Cheers, Thomas

~~~
randcraw
I don't think your points are invalid, but I think you overvalue the data
that's available and relevant to most programming tasks. And without novel
data, ML can offer little novel value.

Google, Facebook, M$ Research, and perhaps Yahoo are extreme outliers. They
have zettabytes of broad unstructured text data, so they mine it. Everybody
else has megabytes of narrow structured data, most of it commercial
transactions for their products. That stuff has already been mined effectively
by traditional OLAP methods. Most, if not all, of the value has been
extracted.

Mainstream software apps have yet to show the value of using ML. Such apps
have access to very limited data of very narrow relevance. The utility of ML
in such domains isn't new; it's classic optimization, or Bayesian
anticipation. But it's not a game changer. Frankly, the use of ML in most
mainstream apps is more likely to add distraction and annoyance as the
computer mispredicts your intent -- like Microsoft Bob did.

Maybe "life in the cloud" will create new opportunities for smarter software.
But I _definitely_ don't want free apps making their own decisions when to
notify me. I guarantee that will get old immediately. So how _will_ this work?
Frankly, I can't guess. Like Apple's iAds, programming ML into the mainstream
or cloud sounds like an idea that will serve the software / cloud vendor far
better than the user.

~~~
dgacmu
Think outside consumer-facing applications. Medicine, biology, geology (oil,
gas, and mining), finance, transportation. Tons of data, tons of dollars, and
important problems.

~~~
randcraw
I work in a big pharma analyzing image and experimental data. In a prior life
I analyzed social cliques from vast numbers of user transactions. In both
cases it seems like greater volumes of data should lead to deeper insights.
But as it happens, the amount of useful actionable _information_ in that data
was surprisingly limited.

Often the available sensors/assays failed to detect reliable information. Or
the phenomenon of interest depended on too many interacting variables,
expressed over too great a dynamic range, for us to detect reliably or model
usefully. (The present lull in genomics R&D illustrates this well, as does the
automated interpretation of signals like EEG and NMR spectra.) And the signals
we can extract are often uninterpretable or sporadic. Alas, gathering more
data won't yield more signal. Given the present limits on sensor resolution,
you just get more mixed signals.

The potential of all ML is limited by the depth of the data needed to
discriminate subtler signals. In the domains you mention (medicine, biology,
geology, other sciences), I'm convinced we need better sensors more than we
need greater amounts of the data already available. We need better hypotheses,
which lead to better ideas of where to look and what to look for. In general,
ML can't help with that. Until we can better imagine how the mechanism _might_
work, our questions remain too vague.

In short, I'm afraid that applying ML to most software apps will suffer from
the same limited ROI. I suspect that most app and user data is too shallow for
mining to add appreciable value, no matter how clever the mining is.

------
yomly
Articles like this for me tend to vindicate Google's notorious hiring
processes.

While it is true that most people will not need to whiteboard a binary-tree
inversion in their day-to-day work, Google seems to expect its engineers to be
able to throw themselves at any problem they're given, to pivot in skillset
quickly, and to have an appreciation of all the developments going on around
them so they can apply any novel ideas developed internally to what they are
currently working on.

In those cases, hiring based on sound knowledge of CS fundamentals seems like
a good bet...

60k engineers is a pretty terrifying number though.

~~~
gonyea
Google's largely moved away from those BS questions. Those questions just
select for people who memorize answers on LeetCode but aren't actually capable
of producing anything.

~~~
throwaway42069
I know two people who've interviewed at Google in the past three months and
have received a full slate of computer science homework problems.

------
xenihn
Does anyone happen to have a suggested self-teaching path for machine
learning, i.e. books and courses? I know that Andrew Ng's course is a great
resource, but I also know I'm not ready to start it yet. I'm actually way
behind on the mathematical prerequisites, so recommendations for those would
be greatly appreciated as well. I've never taken a statistics course, and
never received any formal mathematics education past trig. I know I'm looking
at a good six months to a year just to catch up on the math alone.

~~~
sputknick
Six months ago, I would have said: Kaggle, Jupyter, Python, figure things out.
I've since discovered Microsoft's ML Studio. It lets you start out with drag
and drop (no code to learn) and, most importantly, you can visually inspect
the output of your experiments. For example, if you run a binary decision tree
algorithm, you can actually look at images of the 1000 trees it created and
see what their nodes contain. Not important for practical work in the real
world, but I like it a lot as a learning tool.

~~~
shostack
Does this actually teach you much, though? Or are you more likely to end up
toggling a bunch of things, seeing an output, and having no better
understanding of what led to that output or why a given approach works better?

Not that there isn't value in immediate results for building excitement and
interest -- I just want to have proper expectations before I check it out, as
I'm in a similar state to the parent in terms of where my math is and wanting
to dive in.

~~~
true_religion
When learning, I like to continually build a mental model of what will
happen, then check whether I'm correct. It's like doing problem sets in math,
then plugging the problem into MATLAB to see the result.

I've never used this Microsoft product, but if it lets you take educated
guesses at what will work, and gives you some insight into the intermediate
steps, then it's useful as a check that your mental model of machine learning
is becoming more coherent and useful.

Plus, if you slot something in and it gives a better output, you can go back
to your studies with a new target: finding out why parameter X changed things.

~~~
shostack
The particular concern that sparked this for me is overfitting to the data
set. I don't know enough about ML to know how much of a risk that might be,
but with a tool like this I wonder whether overfitting becomes obvious, or
whether you risk taking away false lessons just because you saw the output you
hoped for, even though the model was horribly overfit.

Again, that's just one example, and the instant visual feedback is awesome
(I'm a visual learner, so that's huge). But at the end of the day, I know
there is a lot of math and code under the pretty graphics, and at some point
I'll need to tackle it to make sure I am actually learning this and not just
making assumptions based on what I can eyeball from some visualizations.
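For what it's worth, overfitting is easy to demonstrate even without a fancy tool: fit polynomials of increasing degree to noisy data and compare the error on the fitted points with the error on held-out points. This is a hypothetical toy example, not anything from ML Studio; the data and degrees are invented.

```python
import numpy as np

# Noisy samples of sin(3x); half for fitting, half held out.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=40)
y = np.sin(3 * x) + 0.2 * rng.normal(size=40)
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, held-out MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for d in (3, 15):
    tr, te = fit_and_score(d)
    print(f"degree {d:2d}: train MSE {tr:.3f}, held-out MSE {te:.3f}")
```

The degree-15 fit always scores better on the points it was fit to, while doing far worse on the held-out points -- exactly the "output you hoped for" trap: judging a model only on its training data.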

~~~
Joof
Learn to multiply matrices (you can probably Google this). Note that AB != BA
in matrix math. Learn derivatives and how to do them with a lookup table.
Learn what log() means (the inverse of raising a number to a power).

That's enough to implement and understand neural networks. You'll fumble
around more than you have to, but you can figure it out.

Honestly, you could probably fight your way through Ng's class with just
matrix multiplication, which you can learn in less than an hour fairly easily.
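Each of those three facts can be checked in a few lines of numpy (the particular values here are arbitrary, chosen just to illustrate):

```python
import numpy as np

# Matrix multiplication is not commutative: AB != BA in general.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
print(A @ B)
print(B @ A)

# log is the inverse of exponentiation: log_b(b**x) == x.
print(np.log2(2 ** 10))   # 10.0

# A derivative straight from the lookup table: d/dx sigmoid(x) = s * (1 - s).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.5
numeric = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6  # finite difference
analytic = sigmoid(x) * (1 - sigmoid(x))                  # table formula
print(abs(numeric - analytic) < 1e-6)
```

The sigmoid derivative is the one you use constantly when backpropagating through a neural net, which is why it's worth having in your head.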

------
matt_wulfeck
And my anecdotal experience is that it's working extremely well. Take the
Google Photos app that does automatic image recognition and tagging. The other
day I was looking for a picture we took of our cat the first night we brought
him home. I remembered we left him with a blanket in the bathroom but couldn't
remember much else.

"kitten bathroom 2013"

And there was a picture of the cat sitting in the tub on a blanket. Simply
amazing.

~~~
cxseven
Strangely, half the time I try to use Google Now on my phone, it doesn't
understand basic queries that worked two years ago. And in the meantime,
features and APIs that used to allow more reliable and explicit control (e.g.
in Picasa) are being shut down. I guess someone at Google figured that
imitating Apple is worth sacrificing what remained of their power-user appeal.

~~~
dzhiurgis
Just yesterday I was amazed that I was unable to Google 'what is the smallest
website possible' or 'what can you fit on a 32kb website', or find out whether
an HTML demoscene exists at all. And sometimes the results seem like complete
spam: instead of showing me answers about Xcode, it was showing some heavily
SEO-ized Apple blogs.

~~~
superuser2
Pretty sure "HTTP/1.0 200 OK" is the smallest possible website.

------
hoodoof
I seem to recall Google focusing the entire company on social/Google+. Is this
saying the company is now being focused on machine learning in the same way?

Reminds me of the Ballmer/Gates strategy of everything must be Windows, which
seemed flawed to me.

~~~
Bjorkbat
That's an interesting way to look at it.

I would argue that Google+ didn't work out because Google was trying to play
catch-up in a field that it just lacked knowledge in (social networks).

Whereas with machine learning, they're not playing catch-up, everyone else is.
Of all the other tech titans out there, they're the ones really leading the
pack.

That remark aside, though, I agree with you. An attempt to go hard on machine
learning and apply it everywhere will probably work out pretty badly. As
fascinating as ML is, I just haven't bothered to learn it yet because I
haven't the slightest idea what new and novel problem I'd solve with it that
doesn't have a better solution through a more straightforward approach.

~~~
Harimwakairi
"An attempt to go hard on machine learning and apply it everywhere will
probably work out pretty badly. I haven't the slightest idea what new and
novel problem I'd solve with it that doesn't have a better solution through a
more straight-forward approach."

Assuming they have the money, isn't this exactly the kind of reason Google
should train up a wide spectrum of engineers from different teams and then see
how they apply machine learning to their respective domains? It would be
foolish for Google's management to think they can divine a priori all the best
possible uses of ML in their various lines of business. Why not tool up a
bunch of smart people, set them loose, and see what works?

------
xg15
I was kind of surprised this article leads with that relatively small "Ninja"
workshop. My impression so far was that Google more or less _created_ the
whole machine learning movement (out of necessity, from its two core fields,
search and ads/analytics) and employs several authorities in the field.

After Google Now, DeepDream, and all the self-driving-car hype, reading that
this workshop was the start of the big transformation seems strange.

~~~
Houshalter
In 2008, Peter Norvig was quoted as saying there was little if any machine
learning in Search. They found it unreliable.

~~~
a_imho
I would have thought 8 years is a long time, yet ML only now feels like it is
becoming mainstream.

Interestingly, Google Trends shows a steady climb for 'machine learning',
while searches for 'neural networks' have been declining since 2004:

[https://www.google.com/trends/explore#q="machine%20learning"...](https://www.google.com/trends/explore#q="machine%20learning"%2C"neural%20networks")

------
glx1441
Peter Domingos? Really? Did they mean Pedro?

Sigh. Another instance of pop science getting most everything wrong (and I
haven't even bothered to write anything about the technical content in the
article).

~~~
apsec112
Could you say more? What do you think are the technical inaccuracies?

~~~
argonaut
A few I noted: neural nets don't emulate the brain. NIPS is not an obscure
conference; it's been the top ML conference for decades (sure, it's obscure to
laymen, but so is pretty much every scientific conference).

~~~
a7x11
Agreed with this guy. Back when I started grad school (2012), NIPS was already
so big that they moved it to Vegas, but the casino venue didn't fly so well,
so it moved to Montreal. NIPS was obscure maybe in the early 2000s, but
definitely NOT in the last 5-6 years.

------
z92
That's a good change from the "social first" push of a few years back. Google
was never a social company to start with. Remember Orkut?

AI is Google's leverage. It should keep pushing down that path.

------
Dowwie
I find this article alarming.

Jeff Dean said, "The more people who think about solving problems in this way,
the better we'll be." I sincerely hope that Sundar emphasizes the thoughtful
application of ML and does not allow black-box algorithms to take too central
a role.

This kind of hubris swept through Wall Street banks during the
structured-products boom, ultimately leading to products such as synthetic
collateralized debt obligations. Taking Jeff Dean's opinion on whether machine
learning is a good thing is like asking the creator of synthetic CDOs whether
they were a good thing. The authors and evangelists are blinded by optimism
and opportunity.

Is Sundar Pichai swept away by the opportunities of machine learning and too
biased to be aware of the risks? Is Sundar acting like Stan O'Neal did when he
pulled out all the stops at Merrill Lynch and went all-in on CDOs? I hope he
isn't. It does not seem to be the case, since he mentions thoughtful use of
ML.

Nonetheless, caution should be taken.

------
nborwankar
Bit of a self-plug here - LearnDataScience
[http://learnds.com](http://learnds.com) has been well received as a starting
point for newcomers. It's a set of Jupyter notebooks with a lot of hand
holding. Git repo has data sets included so you can clone and go. All Python.

------
DrNuke
Not sure where it is going at all: evolutionary leaps often come from outliers
and sometimes from serendipity. What about this reinforced confirmation bias?

~~~
rhizome
At first blush, my sense is that a translation could go something like "we're
prioritizing the analytics API over the results API." Not analytics in the
webserver sense, but the OLAP/DW one. So, e.g., ad-targeting fidelity over
results-presentation algorithms. Backend business vs. frontend.

------
entee
This is a really great idea, especially if done right. The difficulty with
machine learning and AI is understanding the pitfalls inherent in selecting
data and training systems. You can fool yourself pretty easily into thinking
you've got something that works when you really don't. That said, it sounds
like they're doing things well. I have no doubt this will have a positive
impact in demystifying the "magic" of ML/AI and making all those Google
products I use better!

~~~
wlamond
"And then (this is hard for coders) trusting the systems to do the work."

Like you say, it can be easy to think that something works when it really
doesn't. I hope that the above quote isn't meant to be interpreted as "believe
the results are correct." Evaluation is paramount when working on these
systems to avoid making such mistakes. I assume Google is including evaluation
in their machine learning training, but it would have been nice to see that
pointed out in the article for folks who may have an interest in machine
learning but don't know what's important to focus on.

~~~
tjl
One big problem with ML is that results depend heavily on your training set.
A few papers in computational linguistics discuss how poorly ML-based
sentiment analysis performs when you apply a model to domains outside its
training set. For instance, if you train the sentiment model on movie reviews
(a data set commonly used for that purpose) and apply it to Twitter or the
open Web, the results are terrible. But people keep trying it.

------
tdkl
I guess now we know who's responsible for the asinine UI decisions lately (the
YouTube apps, Material's wasted-space design). /s

------
jdeisenberg
The article says that Mr. Giannandrea is no longer head of the machine
learning division; out of curiosity, who has taken that position? It's not
clear from the article.

~~~
shoyer
He's still in charge of research -- he's just in charge of search now, too.

------
ycosynot
Maybe I'm talking nonsense, but the term "machine learning" could be
detrimental to learning it, because it feels so machinesque... It's a cool
term, but also very vague and mystical, and through its anthropomorphism it
kind of implies the engineer is a teacher, or a translator. You're not even
started, and you're already confused.

Surely it would be better to talk of training deep neural nets, and such
things. Or maybe "machine training" would be less intimidating. But I guess
we're stuck with the term, and it's not so bad.

------
srtjstjsj
When will they move past the "Slogan First" magpie direction-switching?

------
StevePerkins
Great article, but I can't help but CRINGE at the "ninja" references. I think
that's already played out within the industry... and although pop-tech writers
tend to lag a few years behind, it will sound extremely dated in the
mainstream within a few years.

~~~
lugg
> “The tagline is, Do you want to be a machine learning ninja?”

I don't really like the word, but I don't really give a flop either.

I'm not sure how it's better or worse than guru, rockstar, or any other lame
word recruiters like to use to make us feel like the special snowflakes we
are.

Which word would you like to see in place of 'ninja'?

~~~
StevePerkins
I'd rather see all of those juvenile testosterone labels discarded in general.

Sheesh... " _Do you want to make the world a better place?_ ", with a photo of
Gavin Belson holding an animal, would make me more inspired.

~~~
aoki
if it makes you feel better, i don't think you were supposed to be inspired.
"ML Ninja" is just the name of the rotation program. if your team sends you,
it's because they need someone to get the training, not because the program
name makes it sound cool. i doubt the PM thought it would be public when she
named it.

------
holografix
Reading shit like this makes me wanna drop everything and start a Maths degree
and get seriously into Machine Learning. Can you imagine being picked at work
to study something AWESOME while being paid for it?!? She must be a genius.

