RTFA - "The company said the users who were affected chose the option in Facebook’s Messenger app to have their voice chats transcribed. The contractors were checking whether Facebook’s artificial intelligence correctly interpreted the messages, which were anonymized."

This sounds totally reasonable to me? A low-quality machine learning algorithm needs human labellers.

But why sample real-world data from non-employees?

Huh? Because that's the product you're trying to improve!

