
Federated Learning - dedalus
https://federated.withgoogle.com/
======
grantlmiller
First, I've loved that Google open sourced TensorFlow Federated as a way to
encourage the rest of the world to adopt this method of decentralized machine
learning.

Second, I was a bit disheartened that this concept had to be explained with a
comic strip to make it accessible because I hoped the benefits were clear to
everyone.

Third, I read the comic strip, learned new things (secure aggregation
protocol, wtf, amazing!), kicked myself for being smug and appreciated the
huge amount of effort that someone invested to communicate this.

~~~
pm90
Popular culture is a tool for the education of the masses. Even for people who
may be technically inclined, it's not always evident what certain technologies
_really_ do.

I am a software engineer but mostly work on DevOps-y stuff. This was a very
accessible, low-investment way for me to understand exactly what "Federated
Learning" really meant.

Some of the best teachers at university had a way of explaining things in
simple terms. This comic strip captures that experience in a more permanent
form, a lot better than a textbook would.

~~~
dmix
Universities have a captive audience; they can take the time to walk you
through incrementally.

Most websites and online communication don't have that luxury. It's
interesting how the comic works well in these situations, while still pushing
a long-read format. Google did the long-form comic thing with Chrome too, and
I remember reading it page-to-page back then.

But at the same time, is it a good idea as your primary website homepage, as
it is here? That would be unusual if there were anything more to the site,
like documentation, code, etc. Right now this website is clearly in
educate-the-public mode only, which is how they can get away with the comic
being the primary content.

------
walterbell
Google mentioned at I/O that speech recognition will soon (this summer?) be
performed locally on Android devices, with no voice data being sent to Google,
because they have been able to reduce the size of the model dramatically. Is
that related to federated learning?

Paper: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)

~~~
ma2rten
No, federated learning is about training on the device, not running prediction
on the device. Training a speech model on the device would be hard, because
there is no labeled data: we don't know what the user said.

~~~
strin
I can imagine the world relying more and more on unsupervised pre-training
approaches, such as BERT and GPT-2. Then we'll just need a few labeled
examples to generalize.

------
ahelwer
All right, I'm cynical as all heck about ad companies and privacy, but this
has me optimistic. Somebody disillusion me, why shouldn't I be optimistic?

~~~
cavisne
Well, the cynical view would be:

1) This still lets you have personalized models, just trained on more than 1
user; that's fine at Google's scale anyway

2) Their competitors (FB, AMZN) don't have the edge compute (Android) to do
this, and to a lesser degree don't have the ML stack (however Android
implements this at the API level will be very TensorFlow-focused)

3) Now Google can push for privacy regulations that prevent FB and AMZN from
storing your raw data

4) Profit

That said, there's nothing stopping FB from doing federated learning within
their app on mobile, I just don't think they have the privacy background to
bother.

~~~
defen
> Their competitors (FB, AMZN) dont have the edge compute (Android) to do this

It's not Android scale but Amazon has sold 100 million Alexas.

~~~
hatsunearu
Doubt Alexas have enough juice to do on-device training...

------
ivan_ah
This is very interesting for many reasons. First we have the privacy stance,
which is a tremendous step for big G. Whoever managed to push this through in
the "machine" of internal office politics deserves applause. The very fact of
acknowledging that users might want to control their data locally rather than
rsync everything all the time is a big step—it takes us off the "give me all
your data" train that we have been on for some time.

Talking about specific applications of your users' data makes a lot more
sense: "If you share X with us, you're helping to build a better model Y that
helps you with Z." Then the prompt "Do you want to share X?" makes a lot more
sense than the current generic prompt "App V wants to access all your data
W," which doesn't tell you anything.

The anonymisation-by-aggregation aspect is interesting on its own, since it
provides a practical approach we can use today without having to wait for
homomorphic encryption. There will probably still be "data leakage", but I can
see how aggregation can be fundamentally better than trying to share
anonymized data by fuzzing identifiers, randomization, and binning, which are
notoriously hard to pull off and suffer from de-anonymisation attacks by
cross-linking with other datasets.

Research-wise this could be a whole new field. Let's revisit all the ML
algorithms and look at the ones that lend themselves to federated updates.
Perhaps certain ML algorithms have been overlooked historically because they
are not "cutting edge" but lend themselves better to distributed model
updates? (I bet this is already a thing...)
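
To make the idea concrete, here's a toy sketch of a federated-averaging-style
update (my own illustration, not Google's code; the least-squares "training"
step is just a stand-in for whatever real on-device training would be):

    # Toy federated averaging sketch; illustrative only.
    import numpy as np

    def client_update(w, data, lr=0.1):
        # One local gradient step on a least-squares loss; a stand-in
        # for real on-device training.
        X, y = data
        grad = X.T @ (X @ w - y) / len(y)
        return w - lr * grad

    def server_round(w, clients):
        # Weighted average of client models: all the server ever needs.
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        updates = np.stack([client_update(w, c) for c in clients])
        return (sizes[:, None] * updates).sum(axis=0) / sizes.sum()

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for _ in range(5):
        X = rng.normal(size=(20, 2))
        clients.append((X, X @ true_w + 0.01 * rng.normal(size=20)))

    w = np.zeros(2)
    for _ in range(200):
        w = server_round(w, clients)
    print(w)  # converges toward [2, -1]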

The communication-complexity aspects are also very interesting, since they
force us to think about the bandwidth needed to communicate model updates and
about training batching. For high-bandwidth settings we could consider
training a model from scratch; for medium bandwidth you can send model updates
regularly; but it would be particularly interesting to see async and VERY
low-bandwidth updates, like just a few MB exchanged once in a while when
connectivity is available.

~~~
pm90
> "If you share X with us, you're helping to build a better model Y that helps
> you with Z." Then the prompt "Do you want to share X?" makes a lot more
> sense than the current generic prompts "App V wants to access all your data
> W?" which doesn't tell you anything.

That would lead to way too many notifications. Just like with ToS, people
would say yes or no blindly.

~~~
krick
Well, I really would like to say "no" to all of them, but somehow I don't
expect to be given an option.

------
archgoon
So, correct me if I'm wrong, but this basically only works when you've already
done your data exploration phase, you've committed to a particular topology,
and now you just want to optimize your weights?

It seems that this won't work so great if you don't have any initial data to
bootstrap yourself with. So, perhaps the idea is you bootstrap with a few
people, do your explorations, and then scale it out with federation?

------
ximeng
Linked paper on using this for Google Keyboard
([https://arxiv.org/pdf/1903.10635.pdf](https://arxiv.org/pdf/1903.10635.pdf))
highlights that there are nevertheless privacy issues with this approach:

> _While Federated Learning removes the need to upload raw user material — here
> OOV words — to the server, the privacy risk of unintended memorization still
> exists (as demonstrated in (Carlini et al., 2018)). Such risk can be
> mitigated, usually with some accuracy cost, using techniques including
> differential privacy (McMahan et al., 2018). Exploring these trade-offs is
> beyond the scope of this paper._

~~~
CyanTas
You’re not wrong, but it does say those privacy risks can be mitigated with
differential privacy. That McMahan et al. paper (which is also from Google)
makes the accuracy cost seem low.

[https://arxiv.org/abs/1710.06963](https://arxiv.org/abs/1710.06963)
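
Roughly, the mitigation in that paper boils down to clipping each client's
update and adding Gaussian noise before averaging. A toy sketch of that
clip-and-noise step (my own illustration; the parameter values are made up):

    # Toy clip-and-noise aggregation in the spirit of DP federated
    # averaging; illustrative only.
    import numpy as np

    def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1,
                     rng=np.random.default_rng()):
        # Clip each client update to a maximum L2 norm...
        clipped = [u * min(1.0, clip_norm / max(np.linalg.norm(u), 1e-12))
                   for u in updates]
        # ...then add Gaussian noise scaled to the clipping bound, so no
        # single client's contribution stands out in the average.
        total = np.sum(clipped, axis=0)
        noise = rng.normal(scale=noise_multiplier * clip_norm,
                           size=total.shape)
        return (total + noise) / len(updates)

    updates = [np.array([0.3, -0.1]), np.array([5.0, 5.0]),  # one outlier
               np.array([0.2, 0.0])]
    print(dp_aggregate(updates))  # outlier's influence is bounded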

------
wybiral
How do they assure you that the training algorithm isn't just exfiltrating
your data?

Edit: By that I mean... What's stopping the model from being as simple as
"learn my personal information"?

~~~
AlexCoventry
It looks from the comic like they aggregate data with linear structure
(probably derivatives on the model parameters.) Each device adds a mask to
their part of the data, and somehow the masks are coordinated across devices
so that when the data are summed on the central training server, the masks
cancel out.

It's unclear from the comic how the masks are coordinated, or how they
compensate for the risk that a participating device drops out (which will make
all the other data from that iteration useless, if you set the masks up in a
naive way.)
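
A toy sketch of the cancellation idea as I understand it (not the real
protocol; in the paper the masks come from PRGs seeded via Diffie-Hellman key
agreement, with secret sharing to recover from dropouts):

    # Toy pairwise-mask cancellation for secure aggregation;
    # illustrative only.
    import numpy as np

    rng = np.random.default_rng(42)
    n_users, dim = 4, 3
    true_updates = rng.normal(size=(n_users, dim))  # private per-device

    # Each pair (i, j) with i < j shares a mask; i adds it, j subtracts.
    masked = true_updates.copy()
    for i in range(n_users):
        for j in range(i + 1, n_users):
            mask = rng.normal(size=dim)  # stand-in for PRG(shared seed)
            masked[i] += mask
            masked[j] -= mask

    # The server sees only the masked vectors, which look like noise,
    # but their sum equals the sum of the true updates (masks cancel).
    print(np.allclose(masked.sum(axis=0), true_updates.sum(axis=0)))  # True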

~~~
wybiral
> It looks from the comic like they aggregate data with linear structure
> (probably derivatives on the model parameters.)

Thanks, that's the part I must have glossed over. It looks like they're using
secret sharing to distribute the masks as shares that all need to be brought
together to reassemble [1].

[1] [https://storage.googleapis.com/pub-tools-public-publication-data/pdf/ae87385258d90b9e48377ed49d83d467b45d5776.pdf](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/ae87385258d90b9e48377ed49d83d467b45d5776.pdf)

~~~
AlexCoventry
Thanks, just came back to share that link. :)

From the introduction, it looks like they group participants into smaller
clusters, coordinate between those via the centralized server in a star
topology, use Diffie-Hellman to share secrets between the participants in a
cluster, and construct the canceling noise within that cluster.

The Shamir secret sharing squicks me a bit. It looks like if the adversary can
control cluster membership (and Google is the adversary, here), they can
recover the gradients.

> _To prevent the server from simulating an arbitrary number of clients (in
> the active-adversary model), we require the support of a public key
> infrastructure that allows clients to register identities, and sign messages
> using their identity, such that other clients can verify this signature, but
> cannot impersonate them. In this model, each party u will register to a
> public bulletin board during the setup phase. The bulletin board will only
> allow parties to register keys for themselves, so it will not be possible
> for the attacking parties to impersonate honest parties._

------
pas
What happens with the zero-sum cancelling out phase if one device disappears
during the process?

~~~
tylerhou
From the paper: [https://storage.googleapis.com/pub-tools-public-publication-data/pdf/ae87385258d90b9e48377ed49d83d467b45d5776.pdf](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/ae87385258d90b9e48377ed49d83d467b45d5776.pdf)

> We rely on Shamir’s t-out-of-n Secret Sharing [50], which allows a user to
> split a secret s into n shares, such that any t shares can be used to
> reconstruct s, but any set of at most t − 1 shares gives no information
> about s.
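
So if a device disappears, the server can gather t shares of that device's
mask seed from the survivors and cancel its mask out. A toy sketch of
t-out-of-n Shamir sharing over a prime field (my own illustration, not the
paper's implementation):

    # Toy Shamir t-out-of-n secret sharing; illustrative only.
    import random

    P = 2**61 - 1  # a Mersenne prime; fine for a demo field

    def split(secret, t, n):
        # Random degree-(t-1) polynomial with f(0) == secret.
        coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
        def f(x):
            return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        # Lagrange interpolation at x = 0.
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, -1, P)) % P
        return secret

    shares = split(123456789, t=3, n=5)
    print(reconstruct(shares[:3]))  # any 3 shares recover 123456789
    print(reconstruct(shares[:2]))  # below threshold: a useless value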

------
gok
Federated learning is a potentially really great idea, but it's important to
be upfront about its limitations. Just because I can't prove that a piece of
data came from your device doesn't mean that a machine learned model trained
on that data isn't violating your privacy.

For example, say we deployed federated learning to train a predictive language
model and allowed it to learn from emails inside Google. Looking at what the
model predicts when you type "Here at Google our next secret project is..."
could very likely reveal something they wouldn't want widely revealed.

------
jonathanhd
I'm genuinely still unsure if this is a parody or not. The first half of the
comic just describes Google's business model and the second seems to be trying
to outsource the cost of G/TPUs to the end user. Then at the end they go
bankrupt and (presumably) sell their control over the data to a vulture fund.

None of this addresses the fundamental problem of advertising companies: once
people learn what they're doing, they just want them to feck off and leave
them alone, without any regard for future promises.

~~~
AlexCoventry
It's no joke.
[https://federated.withgoogle.com/#learn](https://federated.withgoogle.com/#learn)

------
im3w1l
My gut feeling tells me not to believe their promises that it's impossible to
deduce the data from the model updates. That there should be attacks.

My stylistic criticism is that they portray white men in a demeaning way that
they would never dare do to any other group.

edited to make a weaker claim

~~~
CyanTas
Do you have a specific technical criticism of the secure aggregation protocol?
That’s what’s supposed to make it impossible to deduce the data from model
updates. Or is your concern something else?

~~~
im3w1l
I hadn't really read it at that point. It just seemed like too big an
achievement for me to believe that anyone could solve it.

One issue with their approach I found while casually browsing is:

"for the proof against active adversaries, we assume that there exists a
public-key infras-tructure (PKI), which guarantees to users that messages they
receive came from other users (and not the server). Without this assumption,
the server can perform a Sybil attack on the users in RoundShareKeys"

Basically, assume there is some trustworthy entity that prevents Sybil
attacks. I don't think such an entity exists, so the question is how they
solve that in practice.

------
arthurcolle
How can the data be sent in an encrypted manner that can then be useful
without the server having a copy of the private keys used to encrypt the data
itself?

~~~
ddtaylor
Homomorphic Encryption

~~~
arthurcolle
Does this actually work today? I was tangentially involved in some 'zero
protocol'/zcash-related projects a few years back and the lack of ability to
communicate and transfer information while being able to perform computation
on it was a major drawback to most of the interesting ideas in the space.

Are they actually using this in this intended federated learning plan? If so
that's a truly major innovation.

~~~
CyanTas
Fully homomorphic encryption, in which you can do arbitrary computation on
encrypted data, is still quite slow. But partially homomorphic encryption, in
which you can add encrypted values together but not multiply (or vice versa),
is quite efficient. And since the secure aggregation protocol only needs to
add together encrypted values to get an average, it only needs partially
homomorphic encryption properties.

~~~
ddtaylor
I believe there is also a proof that says any partially homomorphic system can
be reworked into an FHE.

~~~
CyanTas
You're thinking of "somewhat homomorphic encryption", which is homomorphic
encryption that can support both addition/OR and multiplication/AND, but only
in circuits of a limited depth. The original FHE paper did indeed prove that
you can rework any "somewhat homomorphic" system into a fully homomorphic one.

Partially homomorphic encryption is different because it really only enables
one of those two types of operations. For example, Paillier encryption has the
property that Enc(A) · Enc(B) = Enc(A+B), but there's no way to go from Enc(A)
and Enc(B) to Enc(A×B).
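
To make that concrete, here's a toy Paillier implementation showing the
additive property (tiny, completely insecure parameters; my own sketch):

    # Toy Paillier encryption: Enc(A) * Enc(B) decrypts to A + B.
    # Tiny, insecure parameters; illustrative only.
    import math, random

    p, q = 17, 19
    n = p * q                     # public modulus
    n2 = n * n
    g = n + 1                     # standard choice of generator
    lam = math.lcm(p - 1, q - 1)  # private key
    mu = pow(lam, -1, n)          # valid since gcd(lam, n) == 1 here

    def encrypt(m):
        r = random.randrange(2, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(2, n)
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def decrypt(c):
        L = (pow(c, lam, n2) - 1) // n
        return (L * mu) % n

    a, b = 42, 99
    c = (encrypt(a) * encrypt(b)) % n2  # multiply the ciphertexts...
    print(decrypt(c))                   # ...get 141 == (a + b) mod n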

~~~
ddtaylor
Thanks for the clarification.

------
satokema
Renting my phone out to process data gives me a bad feeling. The airplane mode
guy is now just straight up turning off the phone and battery.

~~~
anonytrary
This is reminiscent of bitcoin mining, except the thing being mined here is an
AI's "intelligence", and consumers' data is the key to it. The benefit is that
the consumer doesn't have to give up their data, just their compute power.
Obviously, this should be an opt-in service, and people should be getting paid
for the compute time they loan out.

------
arkades
call me jaded but:

If you’re paying for PR firms to produce cartoons about how good you are for
privacy, you’re probably terrible for privacy.

This feels like Google’s Joe Camel moment.

~~~
Thorrez
Google Chrome launched with a comic by Scott McCloud in 2008. It looks like
Scott McCloud helped on this Federated Learning comic as well.

[https://www.google.com/googlebooks/chrome/big_00.html](https://www.google.com/googlebooks/chrome/big_00.html)

------
unreal37
So instead of sending the data to Google encrypted for them to analyze, it
analyzes the data on your device and sends that data to Google encrypted for
them to combine the results.

But your data still gets sent to Google. I don't see the difference. It's just
another layer on top.

~~~
carlosdp
Well no, if you read the part about Secure Aggregation, Google has no way of
knowing which piece of training results comes from which device; they can only
see the aggregated results of a batch.

So sure, technically the training results based on your data are still sent to
Google, but that's not really the concern they're addressing. They're
addressing Google having a record in a database of every shop you visited in
the last week and such, and that data getting into the wrong hands (or being
misused by them). What if they could benefit from training on that sort of
data, without ever actually storing it themselves?

