
Machine learning without centralized training data - nealmueller
https://research.googleblog.com/2017/04/federated-learning-collaborative.html
======
nostrademons
This is one of those announcements that seems unremarkable on read-through but
could be industry-changing in a decade. The driving force behind
consolidation & monopoly in the tech industry is that bigger firms with more
data have an advantage over smaller firms because they can deliver features
(often using machine-learning) that users want and small startups or
individuals simply cannot implement. This, in theory, provides a way for users
to maintain control of their data while granting permission for machine-
learning algorithms to inspect it and "phone home" with an improved model,
_without revealing the individual data_. Couple it with a P2P protocol and a
good on-device UI platform and you could in theory construct something similar
to the WWW, with data stored locally, but with all the convenience features of
centralized cloud-based servers.

~~~
brianorwhatever
Which is why I am surprised to see this come from Google. Everyone has already
accepted sending all of their data to Google, which benefits the company
greatly.

~~~
nintendo1889
Google's internal training emphasizes doing the right thing and competing
fairly%, going so far as to avoid terms in PR or even internal email such as
'crush the competition', 'dominate', or 'destroy', and always doing what's
good for the user, rather than bad for the competition.

% and often mentions competition/monopoly laws

~~~
lern_too_spel
There's nothing altruistic about that. Emails with those words will cause
problems for the legal department when they come up in discovery during
antitrust litigation.

~~~
brain5ide
Wittgenstein disagrees.

~~~
lern_too_spel
Google forcing their employees to go through training to avoid bribery, sexual
harassment, and antitrust problems for the company is not due to anything
other than saving the company money. To be pedantic, the disagreement with GGP
is not whether the actions are altruistic but whether the actions were done
out of altruism.

------
whym
Their papers mentioned in the article:

Federated Learning: Strategies for Improving Communication Efficiency (2016)
[https://arxiv.org/abs/1610.05492](https://arxiv.org/abs/1610.05492)

Federated Optimization: Distributed Machine Learning for On-Device
Intelligence (2016)
[https://arxiv.org/abs/1610.02527](https://arxiv.org/abs/1610.02527)

Communication-Efficient Learning of Deep Networks from Decentralized Data
(2017) [https://arxiv.org/abs/1602.05629](https://arxiv.org/abs/1602.05629)

Practical Secure Aggregation for Privacy Preserving Machine Learning (2017)
[http://eprint.iacr.org/2017/281](http://eprint.iacr.org/2017/281)

------
binalpatel
Reminds me of a talk I saw by Stephen Boyd from Stanford a few years ago:
[https://www.youtube.com/watch?v=wqy-og_7SLs](https://www.youtube.com/watch?v=wqy-og_7SLs)

(Slides only here: [https://www.slideshare.net/0xdata/h2o-world-consensus-optimi...](https://www.slideshare.net/0xdata/h2o-world-consensus-optimization-and-machine-learning-stephen-boyd))

At that time I was working at a healthcare startup, and the ramifications of
consensus algorithms blew my mind, especially given the constraints of HIPAA.
This could be massive within the medical space, being able to train an
algorithm with data from everyone, while still preserving privacy.

~~~
gabrielgoh
I think the distinction here between "handing over your data" and "letting a
model train on the data on your device" may be more subtle than you might
think. There is still no guarantee of privacy - it is trivial to construct
objective functions which probe data from your device.
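
As a toy illustration of that point (a hypothetical linear bag-of-words
classifier, not anything Google ships): for a linear layer over one-hot
inputs, the gradient is nonzero exactly in the columns of the words the user
typed, so an unprotected update leaks the input directly.

```python
import numpy as np

# Hypothetical linear classifier over a 10-word vocabulary: logits = W @ x.
# For a one-hot/bag-of-words input, the gradient dL/dW = (p - y) x^T is
# nonzero only in the columns of the words actually typed.
rng = np.random.default_rng(0)
vocab_size, n_classes = 10, 3
W = rng.normal(size=(n_classes, vocab_size))

x = np.zeros(vocab_size)
x[[2, 7]] = 1.0                      # the user "typed" words 2 and 7
y = np.zeros(n_classes)
y[1] = 1.0                           # true label

logits = W @ x
p = np.exp(logits) / np.exp(logits).sum()    # softmax probabilities
grad = np.outer(p - y, x)                    # the update a device would send

# The server can read the input straight off the unprotected update:
leaked = np.nonzero(np.abs(grad).sum(axis=0) > 1e-12)[0]
print(leaked)  # -> [2 7]
```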

~~~
pepve
I just skimmed their secure aggregation paper (linked in the post), and while
I'm no expert, I believe they can actually guarantee privacy. At least for the
strong version they describe (there's also a weak one which requires trust in
the server).

Edit, link to the paper:
[http://eprint.iacr.org/2017/281](http://eprint.iacr.org/2017/281)
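
The core trick (sketched here with plain pairwise masks; the real protocol
adds key agreement and dropout recovery on top) is that clients add random
masks that cancel only in the server's sum:

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_clients = 4, 3
updates = [rng.normal(size=dim) for _ in range(n_clients)]

# One shared random mask per client pair (i, j), i < j.
pair_masks = {(i, j): rng.normal(size=dim)
              for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked_update(i):
    # client i adds the mask where it is the lower index, subtracts otherwise
    out = updates[i].copy()
    for (a, b), mask in pair_masks.items():
        if a == i:
            out += mask
        elif b == i:
            out -= mask
    return out

# The server only ever sees masked updates...
server_view = [masked_update(i) for i in range(n_clients)]
# ...but the masks cancel pairwise in the sum, so the aggregate is exact.
aggregate = sum(server_view)
print(np.allclose(aggregate, sum(updates)))  # True
```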

------
andreyk
The paper:
[https://arxiv.org/pdf/1602.05629.pdf](https://arxiv.org/pdf/1602.05629.pdf)

The key algorithmic detail: it seems they have each device perform multiple
batch updates to the model, and then average all the multi-batch updates.
"That is, each client locally takes one step of gradient descent on the
current model using its local data, and the server then takes a weighted
average of the resulting models. Once the algorithm is written this way, we
can add more computation to each client by iterating the local update. "

They do some sensible things with model initialization to make sure weight
update averaging works, and show in practice this way of doing things requires
less communication and gets to the goal faster than a more naive approach. It
seems like a fairly straightforward idea from the baseline SGD, so the
contribution is mostly in actually doing it.
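
A minimal sketch of that averaging scheme (toy linear-regression clients and
illustrative hyperparameters, not the paper's exact setup):

```python
import numpy as np

# FederatedAveraging, schematically: every client starts from the same
# global weights, runs a few local gradient steps on its own data, and the
# server takes a data-size-weighted average of the resulting models.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])

def local_sgd(w, X, y, steps=5, lr=0.1):
    # plain gradient steps on a local least-squares objective
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

clients = []
for _ in range(3):
    X = rng.normal(size=(20, 2))
    y = X @ true_w + 0.01 * rng.normal(size=20)
    clients.append((X, y))

w_global = np.zeros(2)
for _ in range(50):  # communication rounds
    local_models = [local_sgd(w_global, X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    w_global = np.average(local_models, axis=0, weights=sizes)

print(np.round(w_global, 2))  # recovers something close to [1.0, -2.0]
```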

~~~
fudged71
I was told a long time ago that a cluster of raspberry pis would be useless
for machine learning due to processing power and I/O constraints.

This paper seems to suggest that this parallelization might actually be
feasible. Would you agree?

------
itchyjunk
"Federated Learning enables mobile phones to collaboratively learn a shared
prediction model while keeping all the training data on device, decoupling the
ability to do machine learning from the need to store the data in the cloud."

So I assume this would help with privacy in a sense that you can train model
on user data without transmitting it to the server. Is this in any way similar
to something Apple calls 'Differential Privacy' [0] ?

"The key idea is to use the powerful processors in modern mobile devices to
compute higher quality updates than simple gradient steps."

"Careful scheduling ensures training happens only when the device is idle,
plugged in, and on a free wireless connection, so there is no impact on the
phone's performance."

It's crazy what the phones of near future will be doing while 'idle'.

------------------------

[0] [https://www.wired.com/2016/06/apples-differential-privacy-co...](https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/)

~~~
jd20
While I think you can definitely draw some parallels, differential privacy
seems more targeted at metric collection. You have to be able to mutate the
data in a way that it becomes non-identifying, without corrupting the answer
in aggregate. Apple would still do all their training in the cloud.

In contrast, what Google's proposing is more like distributed training. In
regular SGD, you'd iterate over a bunch of tiny batches, sequentially through
your whole training set. Sounds like Google's saying each device becomes its
own mini-batch, beams up the result, and Google averages them all out in a
smart way (I didn't read the paper, but this was the gist I got from
the article).

Both ideas are in the same spirit, just the implementations are very
different.
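
To make the metric-collection flavor concrete, classic randomized response is
a minimal sketch of the idea (this mechanism is illustrative; Apple's actual
scheme is more elaborate):

```python
import random

# Classic randomized response: with prob 1/2 report the truth, otherwise
# report a fair coin flip. Any single report is plausibly deniable, yet
# the population rate can still be estimated without bias.
random.seed(1)
true_answers = [random.random() < 0.3 for _ in range(100_000)]  # 30% "yes"

def report(truth):
    if random.random() < 0.5:
        return truth                    # honest answer
    return random.random() < 0.5        # random answer

reports = [report(t) for t in true_answers]
p_yes = sum(reports) / len(reports)
# E[p_yes] = 0.5 * true_rate + 0.25, so invert to recover the rate:
estimate = (p_yes - 0.25) / 0.5
print(estimate)  # lands close to 0.30
```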

~~~
Eridrus
Differential Privacy is much more than what Apple's PR department says;
differentially private SGD is already a thing.

~~~
jd20
Well, forget Apple for a moment (that was just an example, since parent asked
about them specifically): my point was what Google's describing is separate
from differential privacy. There's no controlled noise or randomness being
applied.

They even say at the end of the paper: "While federated learning offers many
practical privacy benefits, providing stronger guarantees via differential
privacy, secure multi-party computation, or their combination is an
interesting direction for future work." So, the "practical privacy benefits"
here is referring to the dimensionality reduction from running the raw data
through the LSTM.

------
sixdimensional
This is fascinating, and makes a lot of sense. There aren't too many companies
in the world that could pull something like this off... amazing work.

Counterpoint: perhaps they don't need your data if they already have the model
that describes you!

If the data is like oil, but the algorithm is like gold... then they can still
extract the gold without extracting the oil. You're still giving it away in
exchange for the use of their service.

For that matter, run the model in reverse, and while you might not get the
exact data... we've seen that machine learning has the ability to generate
something that simulates the original input...

~~~
justonepost
Laugh out loud. This was the premise of what we were doing 24/7 last year. I
really really doubt we were the only ones doing this. Anytime you have highly
valuable data that you can't share specifically but want to share the
aggregated ML results of, this is how you do it.

~~~
sixdimensional
I see. My understanding from the article was that it was a novel approach to
do the ML on the device but then efficiently transmit and somehow combine the
results together into a larger model.

I'm not an expert on ML (clearly). What format does the model actually take
when you share it? Is it raw data (like weights for neurons or something) +
the configuration of the algorithms?

I understand anonymizing data and sharing aggregated results, but I thought
actually sharing that data by essentially encoding it into an algorithm and
then sharing that algorithm is quite different.

~~~
Joof
Pretty much. It's batch learning; they train small batches on the phones and
upload the differences. It looks like they have a fancy way of keeping the
data compressed, training during idle times, and encrypting the updates until
they get averaged into the model.

I'm not sure this is ready for decentralized P2P yet, but I would love to see
someone working towards that.
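
A minimal sketch of what such a shared model and update could look like
(hypothetical layer names, local training simulated as a perturbation):

```python
import numpy as np

# A "model" on the wire is just named parameter arrays; a federated
# update is the element-wise difference from the global copy.
rng = np.random.default_rng(0)
global_model = {
    "layer1/weights": rng.normal(size=(4, 8)),
    "layer1/bias": np.zeros(8),
}

# Local training simulated here as a small perturbation of the weights...
local_model = {name: w + 0.01 * rng.normal(size=w.shape)
               for name, w in global_model.items()}

# ...and only this delta leaves the device, never the training data.
update = {name: local_model[name] - global_model[name] for name in global_model}
print({name: delta.shape for name, delta in update.items()})
# -> {'layer1/weights': (4, 8), 'layer1/bias': (8,)}
```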

------
azinman2
This is quite amazing, beyond the homomorphic privacy implications being
executed at scale in production -- they're also finding a way to harness
billions of phones to do training on all kinds of data. They don't need to pay
for huge data centers when they can get users to do it for them. They also can
get data that might otherwise have never left the phone in light of encryption
trends.

~~~
willvarfar
I'm not understanding this as homomorphic privacy.

They take pains to say:

> your device downloads the current model, improves it by learning from data
> on your phone, and then summarizes the changes as a small focused update.
> Only this update to the model is sent to the cloud, using encrypted
> communication, where it is immediately averaged with other user updates to
> improve the shared model. All the training data remains on your device, and
> no individual updates are _stored_ in the cloud.

(my emphasis on the word _stored_ )

Now there are lots of scholarly articles on reverse-engineering and rule-
extraction from neural nets.

So Google, having the diff can actually get some idea what it is you are
trying to teach the net.

They just promise not to.

~~~
fauigerzigerk
> _They just promise not to._

Google makes the OS and the keyboard. If they wanted to run a keylogger on
every device against the express wish of users they could.

So I think the more important question is if someone else could steal or
"legally" request that data from Google and recover my keystrokes.

------
argonaut
This is speculative, but it seems like the privacy aspect is oversold, as it
may be possible to reverse engineer the input data from the model updates. The
point is that the model updates themselves are specific to each user.

~~~
ehsankia
Well, you obviously can't fully reverse engineer it, since whatever model
update is being sent is far, far smaller than the overall data. Now, could you
theoretically extract "some" data? Maybe, but it is still strictly better than
sending all of the data.

~~~
sdenton4
The 'sentiment neuron' two posts over gives an indication of what this could
look like. 'Oh, I see you're giving a strong positive update to the porn
neuron...'

More generally, there's the notion that a sufficiently complex model encodes
the training data. There's been work on extracting training data (or highly
deformed versions of it) from a fully trained neural network. We should be
under no illusions that networks offer cryptographically strong protections
to their memories. It's simply not a design goal.

~~~
justonepost
Yep. Basically, if it's not done right, you can just reverse the network and
predict the source.

------
TY
This is an amazing development. Google is in a unique position to run this on
truly massive scale.

Reading this, I couldn't shake the feeling that I heard all of this somewhere
before in a work of fiction.

Then I remembered - here's the relevant clip from "Ex Machina":

[https://youtu.be/39MdwJhp4Xc](https://youtu.be/39MdwJhp4Xc)

------
siliconc0w
While a neat architectural improvement, the cynic in me thinks this is a fig
leaf for the voracious inhalation of your digital life they're already doing.

~~~
jd20
Particularly relevant: "Federated Learning allows for smarter models ... all
while ensuring privacy". Reading the paper, Google would still receive model
updates, so this statement seems based on the assumption that you can't learn
anything meaningful about me based on those higher level features (which are
far reduced in dimensionality from the raw data). I'm curious how they back up
that argument.

------
emcq
Even if this only allowed device-based training and offered no privacy
advantages, it would be exciting as a form of compression. Rather than sucking
up device upload bandwidth, you keep the data local and send the tiny model
weight delta!

~~~
visarga
It would probably train while the phone is charging and upload while using
WiFi, so, no problem.

------
sandGorgon
Tangentially related to this - numerai is a crowdsourced hedge fund that uses
structure-preserving encryption to be able to distribute its data, while at
the same time ensuring that it can be mined.

[https://medium.com/numerai/encrypted-data-for-efficient-mark...](https://medium.com/numerai/encrypted-data-for-efficient-markets-fffbe9743ba8)

Why did they not build something like this ? I'm kind of concerned that my
private keyboard data is being distributed without security. The secure
aggregation protocol doesn't seem to be doing anything like this.

------
muzakthings
This is literally non-stochastic gradient descent where the batch update
simply comes from a single node and a correlated set of examples. Nothing
mind-blowing about it.

~~~
wnoise
For the bits described in
[https://arxiv.org/abs/1610.02527](https://arxiv.org/abs/1610.02527) you're
essentially correct. Though it's still stochastic, and you can have mini-
batching on each node.

The interesting technical bits are in
[https://arxiv.org/abs/1610.05492](https://arxiv.org/abs/1610.05492)

To save on update bandwidth, they either restrict the gradient to a lower
dimensional space, or compress by quantizing the full gradient (which should
effectively add zero-mean noise) before sending it back. (In theory they could
do both of these, but they didn't actually test that.)
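
A rough sketch of the quantization idea (stochastic rounding to a small grid;
the grid size here is illustrative, not the paper's): rounding up with
probability equal to the fractional grid position keeps the quantized
gradient unbiased, i.e. true gradient plus zero-mean noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(g, levels=4):
    # Map each value onto a uniform grid between min and max, rounding up
    # with probability equal to its fractional position on the grid.
    lo, hi = g.min(), g.max()
    step = (hi - lo) / (levels - 1)
    pos = (g - lo) / step
    floor = np.floor(pos)
    round_up = rng.random(g.shape) < (pos - floor)
    return lo + (floor + round_up) * step   # only `levels` distinct values

g = rng.normal(size=100_000)                # stand-in for a full gradient
q = stochastic_quantize(g)
print(float(np.mean(q - g)))                # the added noise averages out near zero
```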

~~~
muzakthings
Just because you shuffle the examples on a single phone/user doesn't make it
stochastic.

The entire point of using stochasticity (i.e., random shuffling) is to prevent
a run of similar and/or same-ordered examples from redirecting the hill
climbing in a globally non-optimal direction all at once.

A single user's examples will be very similar, so you can shuffle all the
examples from one user you want - that doesn't make it truly stochastic in the
context of gradient descent optimization.

The quantization / compression part is pretty cool though. I suppose that
could obfuscate slightly what the original example was for privacy purposes?
Seems like you'd lose on accuracy though.

------
legulere
Where is the security model in this? What stops malicious attackers from
uploading updates that are constructed to destroy the model?

~~~
willvarfar
No different from a centralized approach?

When the rail company in Sweden first offered voice bookings people would hoax
it by calling and saying "I want to book a ticket from Göteborg to Stockholm",
and a robotic voice would reply "Do you want to book a ticket from Göteborg to
Stockholm?"; at this point the hoaxer says "No, I want to book a ticket from
Stockholm to Göteborg". And back and forth for hours and hours until the
system could no longer distinguish between Stockholm and Göteborg and the
system was basically useless.

Systems that try to learn corrections, e.g. the keyboard example in the
article, are all vulnerable to poisoning, as anyone who plays at poisoning
Google search results knows.

~~~
anonymousDan
That's hilarious! Do you have a link to an article about it by any chance?

~~~
willvarfar
Heard it first-hand from a dev working on the project. Railway company is
called SJ.

My Google-fu is weak.

------
yeukhon
To be honest, I have thought about this for a long time in the context of
distributed computing. If we have a problem that takes a lot of time to
compute but can be computed in small pieces and then combined, why can't we
pay users to subscribe for the computation? This is a major step toward the
big goal.

------
holografix
I don't work with ML for my day job but find it exhilaratingly interesting.
(true story!)

When I first read this I was thinking: surely we can already do distributed
learning - isn't that what, for example, SparkML does?

Is the benefit of this in the outsourcing of training of a large model to a
bunch of weak devices?

~~~
nudpiedo
Just that a few teams at Google need more CPU power and cannot get more budget
than their peer teams... perhaps even the publication of a paper like this is
rewarded internally. In spite of the encrypted communication between the two
parties, I am not sure how they will trust clients, especially since there has
recently been a public call to generate (and now train on) faked data on the
user's client.

Perhaps statistically random tests to ensure that the client's code has not
been tampered with.

On the other hand, no one speaks about the energy/battery consumption of the
clients (you've got 8 cores and a GPU in your pocket, right? Finally there is
an application apart from video games that will take advantage of them).

------
alex_hirner
I think the implications go even beyond privacy and efficiency. One could
estimate each user's contribution to the fidelity gains of the model, at least
as an average within a batch. I imagine such attribution could be rewarded
with money or credibility in the future.

------
nudpiedo
Where is the difference between this and distributed computing? Apart from the
specific usage for ML I don't see many differences; seti@home was an actual
revolution made of actual volunteers (I don't know how many Google users will
be aware of that).

------
orph
Huge implications for distributed self-driving car training and improvement.

------
jonbaer
[https://en.wikipedia.org/wiki/Swarm_intelligence](https://en.wikipedia.org/wiki/Swarm_intelligence)

------
nialv7
I had exactly this idea about a year ago!

I know ideas without execution aren't worth anything, but I'm just happy to
see my vision is headed in the right direction.

~~~
Dim25
are you working in related area now?

------
Joof
Could we build this into a P2P-like model where there are some supernodes that
do the actual aggregation?

------
mehlman
I would argue there is no such thing. After the update, the model will
incorporate your training data as a seen example; clever use of optimization
would enable you to partly reconstruct the example.

------
hefeweizen
How similar is this to multi-task learning?

------
yk
Google is building a Google cloud in reverse, that is, they try to use the
hardware of other people, instead of other people using Google's hardware.

~~~
kornish
This appears to be mostly about privacy concerns and on-device performance.
Google hardly needs the computational power of even millions of phones versus
the behemoths that are their data centers.

(and the phones are only part-time, at that!)

~~~
Jabbles
Consider that it's not Google as a whole, it's the keyboard team, who
obviously have only a small fraction of Google's compute power.

Also, gboard is installed on between 500 million and 1 billion phones.

[https://play.google.com/store/apps/details?id=com.google.and...](https://play.google.com/store/apps/details?id=com.google.android.inputmethod.latin&hl=en_GB)

So I can't confidently dismiss the computational cost away just yet.

~~~
kornish
Ah, this is really interesting. And I suppose phones will only get more and
more powerful going forward. Thanks to you and nudpiedo for the respectful
counter!

------
exit
i wonder whether this can be used as a blockchain proof of work

------
Svexar
So it's Google Wave for machine learning?

