
This is not a technological issue. This is a policy issue.

The total size of Wikipedia's text is about 12 GB. A state-sized chunk of maps.me is under a GB. iPhone storage ranges from 32 GB to 256 GB. So I can store the entirety of Wikipedia plus the geographical data of my state and its neighbouring states on my phone, and have a Star Trek-esque voice-activated computer interaction without ever uploading a single byte to the cloud. It won't have pretty pictures. I'm happy with that.



Good luck asking Wikipedia what the weather is today, what your first meeting in the morning is, or which route for your commute has the least traffic at the moment.


I don't ask for those things anyway, precisely because I don't want Google to have that much information about me. I'd love to have a local-only, offline map system.


You can pretty much get that right now by buying any one of the fine Garmin navigational products on the market, right?


Well, then it's not a "Star Trek-esque voice-activated computer interaction", which is the whole point.


Asking for the weather is an API issue; it does not require uploading your voice at all.


It requires uploading where you are. And it requires uploading your voice for processing if you want it to distinguish "what's the weather today" from "tonight", "tomorrow", "this weekend at <place X>", or "back at home".


In theory, the voice processing could be done locally, and then you'd only have to upload coordinates and/or a place name, just as every non-voice weather app does.
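
In code, the split might look something like this; a minimal sketch in Python, using Open-Meteo purely as an example of a keyless, coordinates-only weather API (only latitude/longitude ever leave the device, never audio):

    # Sketch: local voice parsing yields an intent plus coordinates; only
    # the coordinates are sent upstream -- the audio stays on the device.
    import requests

    def fetch_forecast(lat: float, lon: float) -> dict:
        # Open-Meteo is a free, keyless weather API; any provider works.
        resp = requests.get(
            "https://api.open-meteo.com/v1/forecast",
            params={
                "latitude": lat,
                "longitude": lon,
                "daily": "temperature_2m_max,precipitation_sum",
                "timezone": "auto",
            },
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()

    print(fetch_forecast(52.52, 13.41)["daily"])  # e.g. Berlin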


You could easily anonymize the location, coarsening it to, say, a 50×50-mile rectangle. Or you could state in your EULA: "we don't keep the location your device sends to our servers for more than 30 minutes."
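
A sketch of what that client-side coarsening could look like (the grid size and rounding are arbitrary illustrative choices):

    # Snap a coordinate to the centre of a coarse grid cell before it
    # leaves the device, so the server only ever sees the cell.
    GRID_DEG = 50 / 69.0  # ~50 miles expressed in degrees of latitude

    def anonymize(lat: float, lon: float) -> tuple[float, float]:
        # Caveat: a degree of longitude shrinks with latitude, so cells
        # are only roughly 50x50 miles away from the equator.
        snap = lambda x: (int(x // GRID_DEG) + 0.5) * GRID_DEG
        return round(snap(lat), 3), round(snap(lon), 3)

    print(anonymize(47.6062, -122.3321))  # Seattle -> its cell centre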

> It is difficult to get a man to understand something, when his salary depends on his not understanding it.


Neither of those changes would leave the data in much of a state to improve the product. For example, fine-grained location data is fed back into flow analysis to determine whether there's a traffic jam on roads that aren't monitored by other systems (like highway traffic counters).
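
A toy sketch of that kind of flow analysis (not Google's actual pipeline; the data shape and threshold are invented for illustration):

    # Infer congestion on a road segment from timestamped probe positions
    # (metres along the segment) reported by one vehicle.
    from statistics import median

    def segment_speed_kmh(pings):
        # pings: list of (unix_time_s, metres_along_segment) tuples
        speeds = [
            (d2 - d1) / (t2 - t1) * 3.6  # m/s -> km/h
            for (t1, d1), (t2, d2) in zip(pings, pings[1:])
            if t2 > t1
        ]
        return median(speeds) if speeds else float("nan")

    # ~100 m per minute is a ~6 km/h crawl: flag a likely jam.
    probe = [(0, 0), (60, 100), (120, 195), (180, 310)]
    print(segment_speed_kmh(probe) < 20)  # True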

https://www.ncta.com/platform/broadband-internet/how-google-...

This sort of data cross-pollination happens all over the Google software ecosystem. It's not that it's easier to collect the data than not; it's that Google's software literally wouldn't be as effective without the massive amounts of data it can analyze across use cases.


It is a technological issue. Without the real-world audio samples to use as training data, the system cannot be improved enough to work reliably.


I built an offline voice recognition system ten years ago using the Microsoft Speech API, and I guarantee you it worked quite well. There's nothing in basic voice recognition that requires uploading everything to the cloud. You only have to be willing to read out some text for a few minutes, once, to train the model.
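
For a present-day equivalent, here's a sketch using CMU PocketSphinx through Python's speech_recognition package (a stand-in for illustration, not the Microsoft SAPI code described above):

    # Fully offline recognition: no bytes leave the machine.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    try:
        # recognize_sphinx runs entirely on-device via PocketSphinx.
        print(recognizer.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("could not understand audio")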


> You only have to be willing to read out some text for a few minutes, once, to train the model.

I guess that's what kills it as practical in the real world.


Laziness is a competitive advantage. There are costs to it, though; in this case, we pay for it with our data. Small bits of our souls, if you will.


I don't expect these technologies to appeal to people who think of the data their behavior generates as "small bits of their souls."

For everyone else though, there are some neat families of features in all the major tech stacks.


"We need real-world audio samples" does not entail "let's collect audio [and behavioral] data from one billion users". There are less intrusive ways to collect data. How about paying specific individuals for their data. For example, there are about 5000 Nielsen families, http://entertainment.howstuffworks.com/question433.htm.

For a company that supposedly is employing the best minds in statistics, the inability to effectively use statistical samples is, uhm, intriguing.
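
For the simple estimate-a-proportion case, the numbers bear this out. A back-of-envelope calculation of the worst-case margin of error at 95% confidence:

    from math import sqrt

    def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
        # Worst case is p = 0.5; z = 1.96 gives a 95% confidence interval.
        return z * sqrt(p * (1 - p) / n)

    print(f"{margin_of_error(5_000):.3%}")          # ~1.386% with 5,000 panelists
    print(f"{margin_of_error(1_000_000_000):.5%}")  # ~0.00310% with a billion

A billion users buys precision far beyond what most product decisions could ever need.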


With the advent of digital cable boxes, many cable providers no longer need the Nielsen data, because the cable box tells the provider what channel it's tuned to (a technical necessity if the box is using switched digital video). It's much easier (and safer for the end user) to passively collect usage data via the application you provide than to trust some third party to (a) collect an unbiased statistical sample and (b) collect it securely while protecting user privacy.

Even if Google is using statistical sampling, it's still something they'd want to collect directly through the app, not via a third-party.

As for paying individuals for their data, Google does that too: https://play.google.com/store/apps/details?id=com.google.and.... But if your argument is "they should be paying everyone they collect passive samples from during use of their products", that's an argument over price point, not over whether the collection itself is right.

The rightness question seems to me to look a lot more like "If you can collect data from a billion users without doing any harm to the users, and that data is going to be more useful than a statistical subset of that data, why should you not collect it?"


> The rightness question seems to me to look a lot more like "If you can collect data from a billion users without doing any harm to the users, and that data is going to be more useful than a statistical subset of that data, why should you not collect it?"

In this particular case (speech recognition), one harm done to users is tying the product to the Internet, and thus requiring what should be closed-loop tasks to go through the vendor's servers.


"Should be closed-loop" is an interesting assumption that I'm not convinced aligns with the reality of the technology. To what extent has speech recognition been improved by being able to feed it through a constantly-updated set of open-loop ML infrastructure?


Probably a lot; open-ended voice recognition with no prior training is hard (though I'm not convinced it can't be made to work offline now that the models exist). Still, a lot of devices don't need open-ended voice recognition (structured grammars greatly simplify the problem; see the sketch below), and if you allow for users pre-training their devices for a few minutes, offline processing becomes easier.
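
As an illustration of how much a structured grammar constrains the problem, here's a sketch assuming the speech_recognition package's PocketSphinx backend (its recognize_sphinx call accepts a path to a JSGF grammar file):

    import speech_recognition as sr

    # A tiny command grammar (JSGF): the recognizer only has to choose
    # among these phrases instead of transcribing open-ended speech.
    with open("commands.gram", "w") as f:
        f.write("#JSGF V1.0;\n"
                "grammar commands;\n"
                "public <command> = (turn | switch) (on | off)"
                " the (lights | heating | radio);\n")

    r = sr.Recognizer()
    with sr.AudioFile("command.wav") as source:  # any short recording
        audio = r.record(source)
    print(r.recognize_sphinx(audio, grammar="commands.gram"))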

My impression is that the main driver behind cloud-first voice processing is business, not technology.



