
Apple employees are listening to Siri conversations - tomlagier
https://www.reddit.com/r/worldnews/comments/cbxdaw/comment/etjj59i
======
tomlagier
I posted this not to bash Apple, but to highlight that this is not just a
Google issue - any voice recognition system will be trained on labeled data,
and that labeled data will largely come from users.

This has happened for years, across anyone with a voice recognition model, and
will continue to happen until the market or regulations say otherwise.

We can, and should, debate the ethics of voice recognition models in general
and the practices around collection and handling of that data. I just want to
move the discussion beyond "Google is evil" and towards "this is a cornerstone
of building voice recognition software - how do we deal with that?"

~~~
m-p-3
> "this is a cornerstone of building voice recognition software - how do we
> deal with that?"

Maybe by using paid employees or volunteers who explicitly agree to have
their voices processed for machine learning and voice recognition improvement.

Something like Common Voice¹ by Mozilla?

¹[https://voice.mozilla.org/](https://voice.mozilla.org/)

~~~
tomlagier
That's a really cool project, props to Mozilla for spearheading it. I'm
especially a fan of the clear callout about what profile data is public.

My thoughts on why a public dataset like this doesn't seem to have been
adopted by the other BigCos:

\- $$$: having a better voice recognition model is a moat. I think this is the
most obvious answer, but there are likely other factors as well.

\- Representation: perhaps users who contribute voice recordings to an open
project aren't representative of the people who actually use voice recognition
software, leading to tough-to-bridge gaps (especially across languages - it
seems this project is English-only)

\- Size: I'm not an ML person, so I don't know how much data is needed to
train a good voice recognition model, but I'd guess that 2.3k hours of
validated content isn't enough to really do the trick. Maybe it's difficult to
get people to volunteer in the quantities necessary to build a good model.

\- Quality: Getting representative data might be harder when it's via explicit
donation rather than identified error cases.

\- Privacy: I actually think that having the training data available to the
public presents an interesting set of privacy challenges. It's available
through explicit consent, so that's a big benefit. However, it hugely
amplifies the possible exposure of a person's voice which means that you need
to have strict controls around the content of the recordings. You also need to
be careful about rendering any particular donator vulnerable (could you hijack
someone's voice and use it to brute-force other voice-matched assistants, for
example?)

I'm curious whether, at the end of the day, it's practical to build a high-
quality model from only explicitly donated clips.

Here's something else I'm wondering about now. In many US states, it's legal
to record people without their consent in public places. Could there be some
sort of public voice data collection effort, sort of like Street View? That
would be completely anonymous, and might give more representative data, though
I'm sure background noise would be a big challenge.

