If you want to improve your voice recognition, then you need to have a ground truth. To continuously improve, you must validate, isolate and update the corner cases.
This means listening to the failed speech-to-text events, the ones with low confidence, and a sampling of high-confidence events.
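Concretely, the sampling policy might look something like this (a minimal sketch; the event fields and thresholds are invented for illustration, not anything the vendors have published):

```python
import random

CONFIDENCE_FLOOR = 0.6        # hypothetical threshold: below this, always review
HIGH_CONF_SAMPLE_RATE = 0.01  # review ~1% of high-confidence events as a control group

def select_for_human_review(event):
    """Decide whether a speech-to-text event should be queued for human labeling.

    `event` is assumed to be a dict like:
    {"audio_id": "...", "transcript": "...", "confidence": 0.87, "failed": False}
    """
    if event["failed"]:
        return True                       # outright failures are always corner cases
    if event["confidence"] < CONFIDENCE_FLOOR:
        return True                       # low confidence: likely mis-recognition
    return random.random() < HIGH_CONF_SAMPLE_RATE  # spot-check "good" results too

# The sampled events become the ground truth: humans transcribe the audio,
# and the corrected transcripts are fed back into the training set.
```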
This is literally what Google Voice was for: it was a way for Google to get lots and lots of samples of people talking into a phone's microphone.
Almost certainly Google/Apple/Amazon will be sampling photos, emails, anything that goes through an ML pipeline, and humans will see some of it.
Now, of course, Apple has positioned itself as a privacy-first company, so this is a big no-no. They should have re-asked for permission (I'm pretty sure they talked about this in the terms and conditions; there was some noise about it when Siri first came out).
Personally, I remember Google telling me that they "may use recordings to improve the voice recognition quality" or something like that. What I did not realize is that that meant humans would be listening. Sure, any ML expert will say "Well obviously! A machine can't train itself!", and in retrospect I wonder why this was not obvious to me. But here I am, a software engineer who worked for Google for eight years, and it was not obvious to me, somehow.
These services need to state more clearly how data is being used. When you sign up, it needs to say, clearly: "Human employees of [company] may listen to your recordings for the purpose of improving voice recognition."
I think the reason we are not presented with such clear language is because the people building these products know they'll lose a lot of users. But that's no excuse to mislead people.
In some cases it can. That's what unsupervised learning is.
Isn't that unsupervised learning?
With voice recognition, that's a lot more vague and human supervision is helpful.
Agree, not a surprise, and already on the record. Here, from 2017:
> Siri records your queries too, but she doesn’t catalog them or provide access to the running list of requests. You can’t listen to your history of Siri interactions in Apple’s app universe.
> While Apple logs and stores Siri queries, they’re tied to a random string of numbers for each user instead of an Apple ID or email address. Apple deletes the association between those queries and those numerical codes after six months. Your Amazon and Google histories, on the other hand, stay there until you decide to delete them.
From Wired, “Apple finally reveals how long Siri keeps your data”, in 2013:
> Once the voice recording is six months old, Apple "disassociates" your user number from the clip, deleting the number from the voice file. But it keeps these disassociated files for up to 18 more months for testing and product improvement purposes.
> "Apple may keep anonymized Siri data for up to two years," Muller says. "If a user turns Siri off, both identifiers are deleted immediately along with any associated data."
In general, customers are used to this pattern: “This call may be recorded for quality assurance purposes” clearly means someone will be listening to it later. The same “product improvement purposes” language for voice assist would naturally mean the same thing.
This is true, but that doesn't automatically make it okay for them to offload this expensive resource to their customers without making it transparent that they were doing so.
> This is literally what Google Voice was for
Who cares? It's completely expected from Google, they use their customer data to do everything. I expect more from Apple. They have a (so far) great track record of protecting their customers' privacy. I fully expect Google, Facebook, et al to be doing ML on my photos, and that's why I'm bought into the Apple ecosystem instead of Google's. I value that my data is private more than being bombarded with stupid "collections" that the Google Assistant puts together with AI.
I really hope this doesn't indicate a new direction for Apple.
I expect them to improve their voice recognition, which is way behind essentially everyone else's, especially Google's. There's nothing wrong with improving the tech in this manner as long as users aren't surprised by it.
The fact that people are surprised by it is a problem though.
Given the way this is being debated, it's almost certainly undesired by customers and even potentially problematic under the GDPR.
but, given their recent drive, they should have asked for permission
They haven't done anything with your data, though, that shows they don't 100% value your privacy. None of these recordings are identifiable, and they're scrapped once they're audited. If you turn off Siri functionality, they're all automatically deleted whether they've been audited or not.
This is all addressed on the Siri screen right before you train Siri to recognize "Hey Siri".
Had they said it beforehand, I am pretty sure 99.9% of Apple's users would trust Apple and just click yes.
The problem is they didn't ask, nor did they tell. What privacy entails, in Steve Jobs's own words:
> Privacy means people know what they’re signing up for, in plain English, and repeatedly. I’m an optimist; I believe people are smart, and some people want to share more data than other people do. Ask them. Ask them every time.
> Make them tell you to stop asking them if they get tired of your asking them. Let them know precisely what you’re going to do with their data.
They did and they do.
Edit: In fact, I went to that link they provided in the billboard, and then followed it to here.
Nothing under the Siri and Dictation section says anything about letting contractors listen to your audio. Please show me where it says on these pages, in plain English, that third parties would be listening to your Siri audio.
Now the goal posts have moved a little. The reason this is in the news is because a third-party contractor did this. This is not Apple's policy and I expect that the contractor has voided their contract with Apple.
You're not going to be able to go in and hand-fix the model so that it doesn't cause these errors. This is one of the biggest ongoing issues with convolutional neural networks - you cannot debug them to determine why something went wrong.
I suspect that when you are dictating, and you correct a blue squiggle, it raises a flag and sends that back as a failed event.
It also makes me think about predictive text: does that call back home as well?
The difference is that predicting the next word in a text message can be validated without direct human intervention. It takes a human to tell whether Siri interpreted the voice command correctly. It just takes more data to evaluate next-word predictive systems.
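As a rough sketch of why the text case is cheaper (the `model.predict` interface here is hypothetical, just standing in for whatever predictor is under test): the word the user actually typed next is a free label, so accuracy can be scored automatically over any corpus of real messages.

```python
def next_word_accuracy(model, typed_messages):
    """Score a next-word predictor against what users actually typed.

    No human labeler is needed: the user's own next word is the ground truth.
    `model.predict(context)` is a stand-in for the predictor being evaluated.
    """
    correct = total = 0
    for message in typed_messages:
        words = message.split()
        for i in range(1, len(words)):
            context, actual = words[:i], words[i]
            if model.predict(context) == actual:
                correct += 1
            total += 1
    return correct / total if total else 0.0
```

There is no analogous free label for "did Siri hear me correctly?", which is where the humans come in.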
> I went to that link they provided in the billboard, and then followed it to here.
> Nothing under the Siri and Dictation section says anything about letting contractors listen to your audio.
The kicker here isn't that Apple was listening to these recordings. The issue is they advertised something else, and then on top of that, Apple sent recordings to 3rd party contractors.
I don't think this is a "no no" based on Apple's privacy stance. The terms do say that they can send the recording in order to perform additional analysis and there's no user data, user ID info, or anything else that would give up a user's privacy included in what's sent over. Unless someone managed to recognize a specific voice in a recording, there's literally no way to tie a specific user to that recording because Apple anonymizes all of it on-device before it gets sent out.
Plus, the latter actually warns you about it every time you are recorded, not somewhere beforehand buried in a TOS. And the experience of "just talking to a computer" (not a person) makes people likely to share all sorts of private stuff.
Users likely didn't know about it because, like most people, they didn't read the full TOS. Apple would've been smart to include a conspicuous statement about employees/contractors listening to recordings (if only to preempt this kind of PR debacle)... in addition to the legalese.
If someone presses the home button to deliberately activate Siri, they are purposefully sending messages to Apple and shouldn’t be surprised that the data is being processed and maybe even listened to by humans.
However, I could see someone being slightly more concerned about accidental activations causing data to be sent to Apple.
It would also be easy for Apple not to allow humans to listen to recordings triggered by Hey Siri.
That's what GOOG411 was for as well.
Of course it makes sense to do this if you want to improve your training set.
I agree that it shouldn't be surprising, but what does surprise me is that this isn't a massive legal problem for Apple at the criminal level. Every time someone uses Siri in a one-party consent state, isn't that a violation?
Seems like everything could have been fixed with a EULA clause saying that Siri queries and all other communications with Apple are "recorded for quality purposes."
The fact that we aren't talking criminal consequences suggests that there is, in fact, such a clause in there somewhere, and that this entire story has been grossly sensationalized.
The key sentence is toward the end:
> By using Siri or Dictation, you agree and consent to Apple’s and its subsidiaries’ and agents’ transmission, collection, maintenance, processing, and use of this information, including your voice input and User Data, to provide and improve Siri, Dictation, and other Apple products and services.
So both the data collection and the fact that contractors may be involved are mentioned in the EULA (and have been mentioned for many years, going back to at least iOS 5: http://images.apple.com/legal/sla/docs/ios51.pdf).
Whether people feel they've been adequately briefed on this is a separate question.
Just because you want to improve your product, that doesn't mean you can do anything you want. No one should have to assume that the products they're buying are Trojan horses that are secretly spying on them.
Now sure, they could spell out every individual thing they do with this data. But Apple makes it clear they collect and use this information to improve Siri. What else do people think this means?
The fact that you're linking an investigative Wired article shows that most people aren't going to know this information. Siri users aren't all 2013 Wired readers, nor are they waiting around to hear what an Apple spokesperson is going to clarify.
Frankly, Apple should at least prompt people, even if just to be the good guy when it comes to privacy fallout from all these services.
Granted, that might not seem like enough, but it was clear enough IMO.
It would drive the point home if Siri just said, every N queries, "anything you say to me can and will be sent to Apple's servers for analysis purposes".
As to regular pop ups or messages, that’s simply a worse user experience for zero benefit.
There is a reason I don't have an Echo/Home/$other. There is a reason I don't use cloud stuff for my personal life (where practical). I don't have web-enabled CCTV, I don't have smart locks, and I generally don't have stuff with cameras in it.
There is evidence suggesting that the anonymisation Apple uses is quite good. Unlike with Alexa, I don't think Siri recordings have been used in court cases. This means that Apple is fairly sure it can't recover people's recordings, on pain of contempt of court.
Why is that important? Because it means it's most likely GDPR compliant. If you are an EU citizen, I fully recommend doing a subject access request to see what data they hold on you. You are within your rights to have that data deleted or amended, or, if you so wish, to have none of your details ever processed again.
Now, sadly for the USA, your protections are frankly shite. I suspect you'll have to lobby your local politicians.
I use Android phones, and there are many times I wish there was a human listening, usually when I am cursing at the OK Google lady for being so stupid. (yesterday: "no you idiot, I'm not going to tap one of the options for gas stations, isn't it f*ing obvious I'm driving a car??!!!")
Google does the exact same thing to train the “OK Google” functionality. Apple wasn’t mechanical turk-ing Siri, this is about humans tagging prerecorded data for ML training datasets.
That's actually what I am hoping for -- I'm not trying to make a minimum wage employee miserable, I would like the actual software to get smarter. (and maybe make the AI Google Lady a bit unhappy, if that is possible)
But I guess you are right, in that if they have the transcripts, they don't need to actually listen to recordings. Then again they might need to listen to recordings if they are trying to gauge user frustration.
You probably speak English with a common American, Canadian, or British accent. Quality for common-accent English is fairly high, but is much lower for people with thick accents or who are speaking languages with smaller available training datasets. These sorts of initiatives are mostly about bringing quality up for those demographics.
I don't know if anyone is actually listening, or if it's just being thrown into a huge dataset somewhere, but it's something.
Recently GHome broke reminders for me. I'm still upset about it.
I went there for a meeting but was not allowed on the floor because it was not pre-approved by security.
If the Apple centers where Siri errors are reviewed are like that, it would be hard to sneak recordings out.
At some level, you have to accept that things like Siri operate just like talking to human customer service representatives, who these days always begin conversations with a rote "This call may be recorded for quality and customer service purposes."
What it comes down to is whether you trust the company to be unreasonably diligent in controlling what happens to these recordings. If there is a criticism to be levelled here, I think it is around the question of how well Apple and everyone else in this space allow consumers to make informed choices.
Fine print in a voluminous user agreement is not nearly the same standard of disclosure as "Your words may be recorded for quality and customer service purposes" every time you say "Hey Siri." The latter may be impractical, of course, but perhaps there is a middle ground. It's not a dichotomy.
Then why isn't it mandatory for Siri to start every conversation like that too?
They took this very seriously.
Now when it comes to a deliberately bad actor, well, nothing is 100% perfect, but there were many other security things going on that I am not going to describe here, plus I know for a fact that there were security measures they did not disclose to me.
But let's face it: Somebody, somewhere, can train themselves to memorize a screen full of information. They could memorize something, go for a smoke break, and upload what they memorized. Lather, rinse, repeat.
The point I made, and am still making, is that some companies care enough to do everything reasonably possible to keep customer data secure, while other companies do not. The company I described here cares. I believe Apple cares too.
I suspect it will always be possible for someone to pull a small data heist, but extraordinarily difficult to set up a regular pipeline to exfiltrate data. The weak point is probably the digital systems. Most attackers would want everything, and the way to get everything is with a vulnerability.
I'd also be concerned with mass amounts of records leaking at once due to a security lapse on servers, where it's a bit easier for grossly inadequate protections to lurk unnoticed.
I think I've read that Google doesn't even store the original recordings, but distorts them randomly to make them unrecognizable but still intelligible by their models. That seems like a pretty reasonable way to protect people, provided they're informed and there's no link back to the original account.
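For a feel of what that kind of distortion might involve (purely a guess at the technique, not Google's actual pipeline), a random pitch shift keeps the words intelligible while making the voice much harder to match to a speaker:

```python
import random
import librosa
import soundfile as sf

def obscure_voice(in_path, out_path):
    """Randomly pitch-shift a recording so the speaker is harder to recognize
    while the words stay intelligible. Illustrative only; the real pipeline
    (if it exists) is unknown."""
    audio, sr = librosa.load(in_path, sr=16000)
    n_steps = random.uniform(-4, 4)  # random shift, in semitones
    shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=n_steps)
    sf.write(out_path, shifted, sr)
```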
I guess the voice scrambling thing must have been specific to their human-supervised learning program. I expect the unaltered voice data is treated with some special care on Google's backend... but I'm still creeped out.
We live in an age where everyone wanders around with a device for broadcasting video to the world in their pocket.
So it's not about how long until it happens. It's already happening.
If you would like to really take on Siri and Alexa, I would recommend using beefier hardware.
Mycroft is hooked into some sort of Mozilla cloud thing (DeepSpeech), I gather.
So the plan is to roll a baseline and then see if I can locally host a DeepSpeech server.
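For anyone wondering what that looks like in practice, the DeepSpeech Python bindings can run inference fully offline; here's a minimal sketch (the model/scorer filenames are just those of a released checkpoint, swap in whichever you download):

```python
import wave
import numpy as np
import deepspeech  # pip install deepspeech

# Released model files downloaded locally; adjust paths to wherever you keep them.
model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")

def transcribe(wav_path):
    """Transcribe a 16 kHz mono WAV file entirely on the local machine."""
    with wave.open(wav_path, "rb") as w:
        frames = w.readframes(w.getnframes())
    audio = np.frombuffer(frames, dtype=np.int16)
    return model.stt(audio)

print(transcribe("test.wav"))
```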
> I would recommend using beefier hardware
The plan is to ultimately deploy a GPU-accelerated TensorFlow integration, so the beefiness of the hardware has been considered.
Went with the 4 because of true gigabit Ethernet and USB 3.
One aspect I've worked out is how to make a balanced contact mic using two piezo discs stuck back to back, with a rigid layer in between acting as the pickup. The trick is that as the two discs are back to back, they effectively produce a balanced signal, and that alone helps eliminate so much of the static/noise that contact mics, due to impedance, are susceptible to.
Mic array... pretty sure the tech I ordered (ReSpeaker 6-mic) is towards the top end of DIY mic arrays, so I'm hopeful that I'll get good real-life results.
Also very keen to work out whether I can use TensorFlow to filter out baseline background noise BEFORE feeding the audio stream into Mycroft.
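For what it's worth, the simplest version of that idea doesn't even need TensorFlow: record a few seconds of room tone, estimate its spectrum, and subtract it before the audio hits the recognizer. A minimal spectral-subtraction sketch with numpy/scipy (same principle, much cruder than a learned model):

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_gate(audio, sr, noise_sample, over_subtract=1.5):
    """Subtract an estimate of the steady background noise from a recording.

    `noise_sample` is a short clip of room tone (mic running, nobody speaking).
    The baseline noise spectrum is estimated from it and subtracted from the
    recording's magnitude spectrum; the cleaned audio then goes to the
    wake-word/recognition stage.
    """
    _, _, noise_spec = stft(noise_sample, fs=sr, nperseg=512)
    noise_profile = np.mean(np.abs(noise_spec), axis=1, keepdims=True)

    _, _, spec = stft(audio, fs=sr, nperseg=512)
    magnitude, phase = np.abs(spec), np.angle(spec)
    cleaned = np.maximum(magnitude - over_subtract * noise_profile, 0.0)

    _, out = istft(cleaned * np.exp(1j * phase), fs=sr, nperseg=512)
    return out
```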
Planning on launching a new blog to document my experiments and failures on this ( https://kaizen.today - not yet live cause still waiting for said tech...but SEO links and all that)
https://www.matrix.one/products/voice Just read about that today, no idea if better than the respeaker option - software/support/features..etc, but may be worth having a look in too.
To my untrained eye the specs seem superior (higher SNR & sensitivity).
ReSpeaker appealed to me due to it being fairly mainstream. I've learned not to stray too far from the pack on SBC things; community support & info falls off a cliff.
Thanks for the helpful link
But only a tiny fraction of the billion users are tech savvy enough to realize this. Most people are kept in the dark. And we are trusting Apple to be benevolent to not do evil with these recordings.
We, the ones who truly understand both sides of this, must stand in favor of the ones who don't.
Had they just not doubled down and come out with this statement when the other companies were getting grilled for listening, I think I’d have a better opinion of them over all this.
We’re they grading Siri’s quality responding? Translating the voice to text to help Siri get smarter? Both? Something else?
It is just cheaper for them to do this.
On your iOS device, visit the link above to download the “Prevent server-side logging of Siri commands.mobileconfig” profile.
Switch to the Raw view, tap Allow, then download the profile.
Complete the profile installation in Settings by reviewing it and tapping Install.
Whenever you call a call center, they usually say the call may be recorded.
Hidden wiretapping of conversations, which some might argue this legally amounts to, is not legal in many countries. Also, transfer of personal records to third parties is not legal.
If someone says “I like political party X” and that is recorded, that might count as a registration of political opinion.
Yeah, we have stuff like GDPR to mitigate the damage, fine people when bad stuff happens.
But ultimately it's _gone_. What actually happens to it is almost immaterial because most of the time, it won't be publicised, and you won't even know. But it's still there, on someone else's server.
Or we're already at that point, and the general populace has chosen perceived convenience above all else?
But I expect the outrage will be very similar to the outrage we see now towards industrialism:
Yes, the costs are devastating, and a small minority perceive that devastation acutely (including me). But the vast majority of people think: oh well, would you rather be starving on a farm in the dustbowl?
So I don’t expect any kind of major political unrest regarding privacy ever. Privacy Within Society was a blip in history. After that it was either/or and privacy was a thing you had to construct through separation.
I only see a problem with it all being controlled by a few corporations, with all the proprietary algorithms underneath closed up.
They know all about us, and we know almost nothing about them. Or, in a dystopian future, even "they" no longer understand how all the proprietary libraries, training data, algorithms, etc. of all the smart homes and self-driving cars and autonomous killer robots work anymore, and then some conflicting virus gets out of control and... boom.