The important question here is "What happens to the audio data streamed to the cloud, a few seconds AFTER the query has been identified (or not identified) and the results sent back to the Alexa device?"
The naive assumption is "all audio data is deleted after processing". The reality is that the data is still valuable to Amazon for a variety of uses, such as
(1) further training their voice recognition software
(2) advertising data mining [how many people are in the room, things they are talking about --- note, Facebook's mobile app infamously does this]
If they just store the query text, that is 'best case' from a privacy perspective. If they store only query text and query audio, that is less than ideal, but not too bad. If they store all audio, ever recorded, for an indefinite period of time... that is what this police request could reveal: audio stored without a keyword trigger, and kept for days or weeks after the fact.
That doesn't solve a murder.
I wonder if they've received a lot of these requests and/or if they have an emergency team on standby.
There is no data for the police to have, because beyond requests, there is no data.
Unless someone knows more about this than Amazon is telling us?
Storing a year's worth of 96kbps audio takes about 380GB. If you don't record silence and you assume the people around an Alexa are only speaking for at most 4 hours a day on average (one sixth of the day), that goes down to about 63GB a year.
So if you then assume 5m Alexas are active at any given point in time, that works out to about 315PB. That's a lot to keep around for everyone.
However, if you then layer on a flagging system, where only certain users' full record is stored, or only "suspicious incidents" are stored, and you get this down to only flagging 0.1% of all data, you arrive at about 315TB of storage.
Amazon Glacier costs about $88,000 a year per PB, but there's a profit margin included in that, so I'll assume it costs Amazon just $75k a year.
In conclusion, it would cost Amazon about $24m a year to store everything, or only about $24k a year with the flagging system. Either is certainly within the realm of possibility and of what LE/SIGINT clients would pay; I assume the NSA would gladly pay that sum x100 for that capability. Sounds like it'd be booming business for Amazon.
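For anyone who wants to redo that arithmetic or play with the assumptions, here's a rough sketch. Every input is just one of the guesses from this comment (96 kbps, 4 hours of speech a day, 5 million devices, 0.1% flagged, $75k per PB-year), decimal units are assumed (1 PB = 1e15 bytes), and the function names are made up for illustration.

```python
# Back-of-envelope storage estimate. All inputs are guesses from the thread,
# not known facts about how Amazon stores anything.
DAYS_PER_YEAR = 365


def yearly_bytes_per_device(bitrate_bps, hours_per_day):
    """Bytes recorded per device per year at a constant bitrate."""
    return bitrate_bps / 8 * hours_per_day * 3600 * DAYS_PER_YEAR


def fleet_petabytes(bytes_per_device, devices, stored_fraction=1.0):
    """Total storage in decimal petabytes, keeping only `stored_fraction`."""
    return bytes_per_device * devices * stored_fraction / 1e15


def yearly_cost_usd(petabytes, usd_per_pb_year):
    """Yearly storage cost at a flat price per PB-year."""
    return petabytes * usd_per_pb_year


always_on = yearly_bytes_per_device(96_000, 24)   # recording around the clock
speech_only = yearly_bytes_per_device(96_000, 4)  # skipping silence
fleet_pb = fleet_petabytes(speech_only, 5_000_000)
flagged_pb = fleet_petabytes(speech_only, 5_000_000, stored_fraction=0.001)

print(f"always on:   {always_on / 1e9:.1f} GB per device per year")
print(f"speech only: {speech_only / 1e9:.1f} GB per device per year")
print(f"whole fleet: {fleet_pb:.1f} PB, ${yearly_cost_usd(fleet_pb, 75_000):,.0f}/year")
print(f"flagged:     {flagged_pb * 1000:.1f} TB, ${yearly_cost_usd(flagged_pb, 75_000):,.0f}/year")
```

Changing the bitrate matters most here: speech codecs can go far below 96 kbps, which shrinks every number proportionally.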
It is also the case that a consumer-level service like Glacier presumably has more redundancy than is needed for best-effort storage of these recordings, where losing some fraction of them wouldn't really be a problem.
It depends. First of all, storing compressed audio isn't that space-expensive, especially in long-term data storage like S3. Additionally, they could be storing only the transcriptions, and not the voice behind them, which would be a lot less data.
We don't know as Amazon hasn't been very forthcoming about the privacy aspects of Alexa. I personally suspect they are keeping some voice information so they can use it to improve their NLP. I hope they are doing so in a way that is detached from accounts / IDs, but you never know.
Additionally, you can indeed delete a record of the query from the app, but who knows if the voice data or even the query itself is still stored after deletion, just not visible to us end users.
Almost definitely yes. I've never known a tech company that truly deletes anything.
Sometimes deleted stuff is archived offline or in slow warehouse databases that are not live, etc.
Speaking of which I wonder what the net traffic usage of the Echo is?
Questions for anyone in the field: how much is preserved? Is there an intermediate form, smaller than audio but richer than text, that allows for iterative testing? Maybe the output of a first-pass phoneme decoder? If so, what kind of space requirements?
Would it be possible to test this? Check the battery life of the Dot in a completely silent room vs the battery life of a Dot listening to an audiobook played on repeat. If it is actually listening and transcribing it should have a higher power consumption and thus die faster - right?
I could imagine some sort of log data being used to refute an alibi, but what the article implies by omission, that the device could serve as an after-the-fact witness, is not really feasible.
About $2.55 / year [0][1][2].
Average man uses 7,000 words/day, woman uses 20,000. Let's say average household has two people and half those words are spoken at home, 27,000/2 = 13,500.
Let's say 8 bytes per word on average: 108 KB/day. That's a little under 40 MB/year. Not too expensive.
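The same estimate as a few lines of Python, in case anyone wants to swap in their own numbers. The word counts and the 8 bytes per word are the guesses from this comment, not measurements, and the variable names are only illustrative.

```python
# Transcript-only storage for one household, using the guesses above.
WORDS_PER_DAY_MAN = 7_000
WORDS_PER_DAY_WOMAN = 20_000
FRACTION_SPOKEN_AT_HOME = 0.5  # assumption: half of all words are said at home
BYTES_PER_WORD = 8             # assumption: average word plus separator

words_at_home = (WORDS_PER_DAY_MAN + WORDS_PER_DAY_WOMAN) * FRACTION_SPOKEN_AT_HOME
bytes_per_day = words_at_home * BYTES_PER_WORD
bytes_per_year = bytes_per_day * 365

print(f"{words_at_home:,.0f} words/day at home")
print(f"{bytes_per_day / 1e3:.0f} KB/day, {bytes_per_year / 1e6:.2f} MB/year")
```

Text compresses well, so the stored size would likely be smaller still.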
I've started smart-devicing my home and I'm a little wary about that, not from a privacy perspective, but from a hacking perspective. I can almost imagine a sci-fi/horror movie in there somewhere.
