
This is not a technological issue. This is a policy issue.

The total size of Wikipedia's text is about 12 GB. A state-sized chunk of maps.me is under a GB. iPhone storage ranges from 32 GB to 256 GB. So I can store the entirety of Wikipedia plus the geographical data of my state and its neighbouring states on my phone, and have a Star Trek-esque voice-activated computer interaction without ever uploading a single byte to the cloud. It won't have pretty pictures. I'm happy with that.



Good luck asking Wikipedia what the weather is today, what your first meeting in the morning is, or which route for your commute has the least traffic at the moment.


I don't ask for those things anyway, precisely because I don't want Google to have that much information about me. I'd love to have a local-only, offline map system.


You can pretty much get that right now by buying any one of the fine Garmin navigational products on the market, right?


Well, then it's not a "Star Trek-esque voice-activated computer interaction", which is the whole point.


Asking for the weather is an API issue; it does not require uploading your voice at all.


It requires uploading where you are. And it requires uploading your voice for processing if you want it to distinguish "what's the weather today" from "tonight", "tomorrow", "this weekend at <place X>", or "back at home".


In theory, the voice processing could be done locally, and then you'd only have to upload coordinates and/or a place name, just as every non-voice weather app does.
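
In code, the split might look something like this; a minimal sketch in Python, using Open-Meteo purely as an example of a keyless, coordinates-only weather API (only latitude/longitude ever leave the device, never audio):

    # Sketch: local voice parsing yields an intent plus coordinates; only
    # the coordinates are sent upstream -- the audio stays on the device.
    import requests

    def fetch_forecast(lat: float, lon: float) -> dict:
        # Open-Meteo is a free, keyless weather API; any provider works.
        resp = requests.get(
            "https://api.open-meteo.com/v1/forecast",
            params={
                "latitude": lat,
                "longitude": lon,
                "daily": "temperature_2m_max,precipitation_sum",
                "timezone": "auto",
            },
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()

    print(fetch_forecast(52.52, 13.41)["daily"])  # e.g. Berlin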


You could easily anonymize the location, coarsening it to, say, a 50×50-mile rectangle. Or you could state in your EULA: "we don't keep the location your device sends to our servers for more than 30 minutes."
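
A sketch of what that client-side coarsening could look like (the grid size and rounding are arbitrary illustrative choices):

    # Snap a coordinate to the centre of a coarse grid cell before it
    # leaves the device, so the server only ever sees the cell.
    GRID_DEG = 50 / 69.0  # ~50 miles expressed in degrees of latitude

    def anonymize(lat: float, lon: float) -> tuple[float, float]:
        # Caveat: a degree of longitude shrinks with latitude, so cells
        # are only roughly 50x50 miles away from the equator.
        snap = lambda x: (int(x // GRID_DEG) + 0.5) * GRID_DEG
        return round(snap(lat), 3), round(snap(lon), 3)

    print(anonymize(47.6062, -122.3321))  # Seattle -> its cell centre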

> It is difficult to get a man to understand something, when his salary depends on his not understanding it.


Neither of those changes would leave the data in much of a state to improve the product. For example, fine-grained location data is fed back into flow analysis to determine whether there's a traffic jam on roads that aren't monitored by other systems (like highway traffic counters).
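
A toy sketch of that kind of flow analysis (not Google's actual pipeline; the data shape and threshold are invented for illustration):

    # Infer congestion on a road segment from timestamped probe positions
    # (metres along the segment) reported by one vehicle.
    from statistics import median

    def segment_speed_kmh(pings):
        # pings: list of (unix_time_s, metres_along_segment) tuples
        speeds = [
            (d2 - d1) / (t2 - t1) * 3.6  # m/s -> km/h
            for (t1, d1), (t2, d2) in zip(pings, pings[1:])
            if t2 > t1
        ]
        return median(speeds) if speeds else float("nan")

    # ~100 m per minute is a ~6 km/h crawl: flag a likely jam.
    probe = [(0, 0), (60, 100), (120, 195), (180, 310)]
    print(segment_speed_kmh(probe) < 20)  # True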

https://www.ncta.com/platform/broadband-internet/how-google-...

This sort of data cross-pollination happens all over the Google software ecosystem. It's not that it's easier to collect the data than not; it's that Google's software literally wouldn't be as effective without the massive amounts of data it can analyze across use cases.


It is a technological issue. Without the real-world audio samples to use as training data, the system cannot be improved enough to work reliably.


I built an offline voice recognition system ten years ago using the Microsoft Speech API, and I guarantee you it worked quite well. There's nothing in basic voice recognition that requires uploading everything to the cloud. You only have to be willing to read out some text for a few minutes, once, to train the model.
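
For a present-day equivalent, here's a sketch using CMU PocketSphinx through Python's speech_recognition package (a stand-in for illustration, not the Microsoft SAPI code described above):

    # Fully offline recognition: no bytes leave the machine.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    try:
        # recognize_sphinx runs entirely on-device via PocketSphinx.
        print(recognizer.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("could not understand audio")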


> You only have to be willing to read out some text for a few minutes, once, to train the model.

I guess that's what kills it as practical in the real world.


Laziness is a competitive advantage. There are costs to it, though; in this case, we pay for it with our data. Small bits of our souls, if you will.


I don't expect these technologies to appeal to people who think of the data their behavior generates as "small bits of their souls."

For everyone else though, there are some neat families of features in all the major tech stacks.


"We need real-world audio samples" does not entail "let's collect audio [and behavioral] data from one billion users". There are less intrusive ways to collect data. How about paying specific individuals for their data. For example, there are about 5000 Nielsen families, http://entertainment.howstuffworks.com/question433.htm.

For a company that supposedly is employing the best minds in statistics, the inability to effectively use statistical samples is, uhm, intriguing.
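
For the simple estimate-a-proportion case, the numbers bear this out. A back-of-envelope calculation of the worst-case margin of error at 95% confidence:

    from math import sqrt

    def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
        # Worst case is p = 0.5; z = 1.96 gives a 95% confidence interval.
        return z * sqrt(p * (1 - p) / n)

    print(f"{margin_of_error(5_000):.3%}")          # ~1.386% with 5,000 panelists
    print(f"{margin_of_error(1_000_000_000):.5%}")  # ~0.00310% with a billion

A billion users buys precision far beyond what most product decisions could ever need.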


With the advent of digital cable boxes, many cable providers no longer need the Nielsen data, because the cable box tells the provider what channel it's tuned to (a technical necessity if the box is using switched digital video). It's much easier (and safer for the end user) to passively collect usage data via the application you provide than to trust some third party to (a) collect an unbiased statistical sample and (b) collect it securely while protecting user privacy.

Even if Google is using statistical sampling, it's still something they'd want to collect directly through the app, not via a third-party.

As for paying individuals for their data, Google does that too: https://play.google.com/store/apps/details?id=com.google.and.... But if your argument is "they should be paying everyone they collect passive samples from during use of their products", that's an argument over price point, not over whether the collection itself is right.

The rightness question seems to me to look a lot more like "If you can collect data from a billion users without doing any harm to the users, and that data is going to be more useful than a statistical subset of that data, why should you not collect it?"


> The rightness question seems to me to look a lot more like "If you can collect data from a billion users without doing any harm to the users, and that data is going to be more useful than a statistical subset of that data, why should you not collect it?"

In this particular case (speech recognition), one harm done to users is tying the product to the Internet, and thus requiring what should be closed-loop tasks to go through the vendor's servers.


"Should be closed-loop" is an interesting assumption that I'm not convinced aligns with the reality of the technology. To what extent has speech recognition been improved by being able to feed it through a constantly-updated set of open-loop ML infrastructure?


Probably a lot; open-ended voice recognition with no prior training is hard (though I'm not convinced it can't be made to work offline now that the models exist). Still, a lot of devices don't need open-ended voice recognition (structured grammars greatly simplify the problem; see the sketch below), and if you allow for users pre-training their devices for a few minutes, offline processing becomes easier.
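
As an illustration of how much a structured grammar constrains the problem, here's a sketch assuming the speech_recognition package's PocketSphinx backend (its recognize_sphinx call accepts a path to a JSGF grammar file):

    import speech_recognition as sr

    # A tiny command grammar (JSGF): the recognizer only has to choose
    # among these phrases instead of transcribing open-ended speech.
    with open("commands.gram", "w") as f:
        f.write("#JSGF V1.0;\n"
                "grammar commands;\n"
                "public <command> = (turn | switch) (on | off)"
                " the (lights | heating | radio);\n")

    r = sr.Recognizer()
    with sr.AudioFile("command.wav") as source:  # any short recording
        audio = r.record(source)
    print(r.recognize_sphinx(audio, grammar="commands.gram"))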

My impression is that the main driver behind cloud-first voice processing is business, not technology.



