```
Federated learning (also known as collaborative learning) is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them.
```
Basically, you can train ML models collaboratively without any party ever seeing the others' datasets. One example would be multiple hospitals jointly training a model to detect breast cancer without needing to exchange any patient data.
I'm interested in whether federated learning can bring ML to situations where you don't want to pool all the data in one spot for privacy reasons. Say you run a B2B SaaS business with separate tenancies, and each tenant contains sensitive information about that client (e.g. about their own clients). Could you run a federated learning model so that it learns within each tenant and improves the overall model, but doesn't share any of the sensitive information between tenants?
This depends on the details of the ML model. There is a mathematical field devoted to this specific question, differential privacy, and its techniques are in production at scale at Google, Apple, and the US Census Bureau.
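To make the idea concrete, here is a minimal sketch of the classic Laplace mechanism, the simplest differentially private primitive. The function name, the toy dataset, and the epsilon value are all illustrative, not from any particular library: a counting query has sensitivity 1 (adding or removing one record changes the count by at most 1), so adding Laplace noise with scale 1/epsilon yields an epsilon-differentially private release.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    A count has sensitivity 1, so Laplace noise with
    scale = sensitivity / epsilon = 1 / epsilon suffices.
    """
    true_count = sum(1 for x in data if predicate(x))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(42)
ages = [34, 51, 29, 62, 45]
# True count of records with age >= 40 is 3; the released value
# is 3 plus Laplace(0, 1) noise, so individual records are masked.
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
```

The "cost" mentioned above is visible here: smaller epsilon means stronger privacy but larger noise, so accuracy degrades as the privacy guarantee tightens.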
Yes indeed. Using federated learning with Flower makes training over multiple disconnected partitions possible. Additionally, there are privacy-enhancing techniques such as differential privacy, but they come with a cost and are not always necessary.
You can train models over multiple silos, devices, users, and many other kinds of partitioning where, for some reason, you can't aggregate the dataset centrally.
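The core aggregation step behind this is federated averaging (FedAvg): each silo fits a model on its own data and sends only the model parameters to a server, which averages them weighted by sample count. A toy sketch (the "model" here is just a local mean, and the tenant data is made up; frameworks like Flower orchestrate the same pattern for real neural networks):

```python
import numpy as np

def local_update(data):
    # Each tenant trains on its own data and reports only the
    # resulting parameters plus its sample count; raw records
    # never leave the silo.
    return data.mean(), len(data)

def fed_avg(updates):
    # The server aggregates parameters weighted by sample count
    # (the FedAvg rule); it never sees the underlying data.
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

tenant_a = np.array([1.0, 2.0, 3.0])
tenant_b = np.array([10.0, 20.0])
global_model = fed_avg([local_update(tenant_a), local_update(tenant_b)])
# For this linear "model" the result equals the mean over the
# pooled data (7.2), computed without ever pooling it.
```

Note that FedAvg alone keeps raw data local but does not by itself guarantee privacy; model updates can still leak information, which is where the differential privacy techniques mentioned above come in.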
FL has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling machine learning from the need to store the data in the cloud. However, FL is difficult to realistically implement due to scale and system heterogeneity. Although there are several research frameworks for simulating FL algorithms, none of them support the study of scalable FL workloads on heterogeneous edge devices.