
Microsoft quietly deletes largest public face recognition data set - Turukawa
https://www.ft.com/content/7d3e0d6a-87a0-11e9-a028-86cea8523dc2
======
josefresco
"The site was intended for academic purposes. It was run by an employee that
is no longer with Microsoft and has since been removed."

What in the ACTUAL fuck, Microsoft.

~~~
floatingatoll
“The people whose photos were used were not asked for their consent, their
images were scraped off the web from search engines and videos under the terms
of the Creative Commons license that allows academic reuse of photos.”

Is it ethically acceptable to republish someone's CC-licensed face annotated
for the purpose of training recognition algorithms?

Is the dataset inclusive as a composite whole? Does it have biases such as
“primarily white-male” or “majority white”?

Should facial recognition data and training studies of people’s faces be
required to adhere to the same ethical review practices that psychology and
sociology studies of people are required to adhere to?

If I were Microsoft, I would red flag every one of these questions as reason
enough to take it down immediately until answered.

~~~
DarkStar851
I don't personally see the ethics issue if they were using publicly available
pictures. That would be like someone scraping public FB profiles.

That person posted the image/information willingly knowing they lose all
control over how it is used and who it is seen by.

Could definitely cause some potential bias though if your input set isn't
filtered for some kind of diversity.

~~~
Silhouette
_That person posted the image/information willingly knowing they lose all
control over how it is used and who it is seen by._

This seems like a dangerous precedent. There have been cases where images of
recognisable people that were made available with some liberal licence were
then used as part of marketing for deeply offensive campaigns or illegal
activities, for example.

I don't think it's reasonable to say that anyone who volunteered to let others
use their image for general purposes should automatically accept the kind of
portrayal that would result in a defamation lawsuit in other contexts. You can
call them naive for not anticipating nasty people doing that with their image,
but naivety isn't a crime. Meanwhile, being portrayed deliberately and without
warning as a child abuser or a supporter of a highly unpopular politician or a
drug addict or a terrorism suspect could have profound and immediate
consequences for the subject, who obviously didn't intend to consent to that
and may have no idea it has been done until the reality catches up to them.

~~~
DarkStar851
Misrepresentation of the images is something Microsoft has no more power over
than Google Image Search. At the very least, their dataset here didn't include
names, locations, etc. I don't really see how this is any different from
Google using their own data in projects like DeepMind. At least Microsoft
admitted the project didn't go as planned, and they're shuttering it and
cleaning up their data.

------
lohszvu
Anyone have a bittorrent hash for the data set?

