Facebook Paid Hundreds of Contractors to Transcribe Users’ Audio (bloomberg.com)
149 points by minimaxir 7 days ago | 19 comments




> The company said the users who were affected chose the option in Facebook’s Messenger app to have their voice chats transcribed. The contractors were checking whether Facebook’s artificial intelligence correctly interpreted the messages, which were anonymized.

Where exactly is this setting? I've looked through Facebook's settings and Messenger's settings, but this option is rarer than a cheap white truffle. Does anyone know?



They chose the option to have their voice chats transcribed by a machine. Not by a person.

Unlike previous articles about tech-companies-listening-to-user-audio, this is over voice transcription rather than smart speaker QA.

Facebook does have a smart speaker (Portal) with voice commands (https://portal.facebook.com/help/2149102838698668/) but that isn't mentioned in the article.


Paywall Workaround: https://outline.com/pyzYAB

Am I the only one who sees a problem with them actively working to convert a non-indexable data source into an indexable and searchable one?

Is it from WhatsApp audio conversations? Or just random recordings through the apps - which Zuckerberg denied before?

RTFA - "The company said the users who were affected chose the option in Facebook’s Messenger app to have their voice chats transcribed. The contractors were checking whether Facebook’s artificial intelligence correctly interpreted the messages, which were anonymized."

This sounds totally reasonable to me? Low quality machine learning algorithm needs human labellers?

But why sample real-world data from non-employees?

Huh? Because that's the product you're trying to improve!

Is it fair to wonder why they are using people when automagical AI transcription should do this for them, like the man from Google/DeepMind/Amazon/IBM/Microsoft said? Or is FAIR really not up to much?

Not exactly sure what you're asking, but all tech companies hire people to transcribe audio precisely to gather data to train ML models to do transcription.

Is there a reason that these ML models are being hoarded as "secret sauce" when, for these companies, all the rivals they're concerned about also have all the resources required to build one that's nearly as good? It feels strange that we've got six different tech giants that have all independently spent tons of capital building up the training data required to sell people smart speakers/mobile speech control/etc. with these ML models, without any of them entering into cross-licensing agreements.

It seems like it'd make a lot more sense for Apple, Google, Amazon, Facebook, etc. to all pool their training data in an "industry working group" to build and license out one "best" model, the way that IWGs are formed to build and license out e.g. AV codecs.


> "to all pool their training data"

The press would skewer them alive and politicians would have a field day about tech companies violating privacy and sharing data.


It's bad enough that one BigCorp has my data. I'd rather not have them also give it out to every other BigCorp.

ML is an extremely competitive field right now and everyone's trying to get an advantage over everyone else. Not too ripe for cooperation right now.

It's the same reason car makers don't all use the same platform. Everyone is hoping to get a slight edge over the others to perform better in the market.

Hang on, everyone's been at this for years now - are we seriously saying that Facebook et al. don't have large training sets for speech transcription? Why are they still labelling this? Why do they need hundreds of contractors?

I can see that a couple of folks might be engaged in carefully reviewing low-confidence transcription events, but hundreds?



