“The people whose photos were used were not asked for their consent; their images were scraped off the web from search engines and videos under the terms of the Creative Commons license that allows academic reuse of photos.”
Is it ethically acceptable to republish someone's CC-licensed photo of their face, annotated for the purpose of training recognition algorithms?
Is the dataset inclusive as a composite whole? Does it have biases such as “primarily white-male” or “majority white”?
Should facial recognition datasets and training studies of people’s faces be held to the same ethical review practices that psychology and sociology studies of human subjects must follow?
If I were Microsoft, I would red-flag every one of these questions as reason enough to take the dataset down immediately until they were answered.
That person posted the image/information willingly, knowing they would lose all control over how it is used and who sees it.
This seems like a dangerous precedent. There have been cases, for example, where images of recognisable people that were made available under some liberal licence were then used in marketing for deeply offensive campaigns or illegal activities.
I don't think it's reasonable to say that anyone who volunteered to let others use their image for general purposes should automatically accept the kind of portrayal that would result in a defamation lawsuit in other contexts. You can call them naive for not anticipating nasty people doing that with their image, but naivety isn't a crime. Meanwhile, being portrayed deliberately and without warning as a child abuser or a supporter of a highly unpopular politician or a drug addict or a terrorism suspect could have profound and immediate consequences for the subject, who obviously didn't intend to consent to that and may have no idea it has been done until the reality catches up to them.
Misrepresentation of the images is something Microsoft has no more power over than Google Image Search does. At the very least, their dataset here didn't include names/locations/etc. I don't really see how this is any different from Google using its own data in projects like DeepMind. At least Microsoft admitted the project didn't go as planned, and they're shuttering it and cleaning up their data.
To narrow this further to the science-fiction ethical issue that deepfakes and facial recognition are both forcing to the foreground, try this on for size:
“All humanity has the inalienable right to control how their likeness is transformed by others. Consent must be given freely by either the human or their delegated representative, and no discrimination against refusal to permit transformation, whether by default or by declaration, shall be permissible under law.”
I’m not asking whether they licensed away their photos. They did. I’m asking, for example, whether it’s ethically appropriate for Microsoft to publish annotated CC-licensed photos without requiring a human ethical review for each use of their dataset. If I wanted to perform a sociological study on that dataset, I’d have to get a review board’s approval. Why is performing a statistical study (literally, machine learning) somehow exempt from that ethical concern?
Or to avoid privacy PR backlash. Or to gain advantage with it internally while denying that benefit to those who haven't already copied it. Or to monetize access to it differently.
What in the ACTUAL fuck, Microsoft.