
Machine Unlearning - Tomte
https://arxiv.org/abs/1912.03817
======
jmmcd
I object to this paper title. The idea of unlearning is already well-known,
given there are benchmarks and previous results mentioned in the abstract.
This paper introduces a new model and gets good performance but doesn't
deserve to be named after the whole field.

------
mar77i
I wonder: is it the usual case that users' behaviour or other data is used as
input for ML projects without their consent? Do they always have to opt out,
rather than the data being prepared in a way that prevents it from violating
users' privacy in the first place? Or, assuming that testing for a relevant
outcome invariably transgresses on users' privacy, I wonder if this kind of ML
work isn't a bit unethical as a whole?

~~~
mlthoughts2018
In many cases, the terms of service of a company include provisions that usage
data of the products will be analyzed for research and experimentation
purposes, which are often considered necessary for the health of the business
(and thus even meet GDPR requirements for this data capture & use).

For example, a company couldn’t remain competitive, serve customers, or
continue existing if it couldn’t perform A/B testing on new features or
changes, or look at descriptive statistics about which types of customers use
which
products. Creating statistical models to answer these questions or to have
aspects of a product that personalize based on these data is a routine matter
of business operation. Rightly or wrongly, the terms of service are usually
enough to allow the business to use data this way, and often label it as
critical for the operation of the business.

~~~
mar77i
I don't see a problem with creating ML models to improve the same company's
services, which I can't really imagine requiring much explicit customer
consent. If it's for the same company, clearance would go to researchers the
same way it went to statisticians doing evaluations for companies in the past.
And if it's about optimizing your business, do you even need ML? Asking
questions about statistical data has been done since long before the current
age of big-data statistical self-betrayal.

The way I see it, people are starting to build their businesses around the
ideas of ML, basically ML as a service. The catch is that the ordering
businesses' data, which is really their customers' data, ends up in a big mess
of aggregated, weakly correlated data, from which they then try to derive the
models that are supposed to make their money. At no point there can I, as a
customer of company A, be sure whether I'm correctly or incorrectly being
correlated in those models. The need to delete me from these evaluations
arises from my wish not just to protect my individuality from Brazil-like
misinterpretations, but also to protect the companies asking the questions for
their businesses.

I don't know about you, but to me this casts doubt on the utility of
non-specific ML: an arbitrary interpretation of unspecific data that is as
useless to me as it is to my competitors amounts to jack shit, really. You
want to solve a problem? Solve it by bringing the consumer and the producer
closer together, which goes for any business out there, especially insurance
and policy, and stop ramming another PC-driven layer of middle-management ML
between them.

------
tw1010
I have such trouble figuring out which of the many algorithms being released
will end up having a significant impact on a 5-year horizon. But I have a rare
hunch about this one: it could be quite significant (at least the direction in
which it's trying to push).

------
throwaway29303
But if the point of machine learning is to generalise from a given dataset,
wouldn't a particular pattern (the one someone wishes to forget) be
(unintentionally) recovered from other unrelated and/or similar patterns?
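
To make that concrete, here's a toy sketch (made-up data; I'm assuming
scikit-learn purely for brevity): remove one record, retrain from scratch, and
the model still predicts that record correctly because near-identical records
remain in the training set.

    # Toy sketch: "unlearning" one record does not stop the model from
    # inferring the same pattern from the similar records that remain.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)     # a simple, learnable rule

    # Pick a record that sits clearly on one side of the boundary.
    forgotten_idx = int(np.argmax(X.sum(axis=1)))
    forgotten_x, forgotten_y = X[forgotten_idx], y[forgotten_idx]

    # Retrain without the forgotten record.
    keep = np.arange(len(X)) != forgotten_idx
    model = LogisticRegression().fit(X[keep], y[keep])

    # The retrained model still classifies the forgotten record correctly,
    # because it generalised the same rule from the remaining records.
    print(model.predict(forgotten_x.reshape(1, -1))[0], forgotten_y)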

------
seek3r00
I know it’s probably a stupid question but: why not just anonymising the data?

~~~
JorgeGT
Anonymising data is surprisingly difficult. I'm wondering if there is a
"falsehoods programmers believe about ..." list for this, as there are for
topics such as names, addresses, time, etc.?
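
As a rough illustration of why stripping names isn't enough (a toy sketch with
invented records and column names, using pandas): the quasi-identifiers that
survive "anonymisation" can often be joined against some public dataset to put
the names right back.

    # Toy linkage attack: join "anonymised" records to a public dataset
    # on quasi-identifiers (zip code, birth date, sex). All data invented.
    import pandas as pd

    anonymised = pd.DataFrame({      # names removed, supposedly anonymous
        "zip": ["94105", "10001"],
        "birth_date": ["1980-03-02", "1975-07-19"],
        "sex": ["F", "M"],
        "diagnosis": ["diabetes", "asthma"],
    })

    public = pd.DataFrame({          # e.g. a voter roll with names attached
        "name": ["Alice Doe", "Bob Roe"],
        "zip": ["94105", "10001"],
        "birth_date": ["1980-03-02", "1975-07-19"],
        "sex": ["F", "M"],
    })

    # Merging on the quasi-identifiers re-attaches names to diagnoses.
    reidentified = anonymised.merge(public, on=["zip", "birth_date", "sex"])
    print(reidentified[["name", "diagnosis"]])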

~~~
amelius
In support of this:

https://techcrunch.com/2019/07/24/researchers-spotlight-the-lie-of-anonymous-data/

------
mlthoughts2018
> “Machine learning (ML) exacerbates this problem because any model trained
> with said data may have memorized it,”

Yikes, that’s a really draconian, scare-tactic way of framing it. It’s clearly
meant to exacerbate misunderstandings of how statistical modeling actually
works.

