I criticize Google and Microsoft a lot, and I will continue to do so when necessary, but I applaud their efforts toward this goal, especially since there's so little financial incentive to do so. Even if this product is as limited in practice as I suspect, it's heartening to see such crucial technology mature.
Isn't that prohibited? It amounts to eavesdropping on unsuspecting citizens.
I mean, it's not just temporarily processed and immediately deleted afterwards. That's Google going after WiFi SSIDs; satellite, street, indoor, face, and object pictures; voices; real-time geolocation of consumers; and whatnot. It's getting out of hand.
I was shocked when I first discovered that my "ok, Google" conversations were stored in the cloud indefinitely with a badly designed and hidden Web-UI that allows "deletion" (whatever that legally means with Google-speak).
Isn't cloud connectivity almost a necessity for things to work as well as they do?
In particular, Google's game is machine learning, and it can't do that without the data it collects (e.g. Google Now won't be able to connect the dots between my booking a hotel room and my buying plane tickets, conveniently having directions to the hotel ready just when I get done at the airport, etc.).
Consider that languages are always evolving. I'm barely 30, and I've seen idioms and sayings rise and fall within the last decade. Certainly the best translation of some 20-word sentence would be different if it were done 10, or maybe even 5, years apart.
And of course it's a matter of fact that we all talk about the same things. Say some shooting occurred at some place recently; people will be talking about it and looking it up. We might say the name of a place or a hotel that we haven't said before, and the translating service will be better off if it has some awareness of that, biasing its voice transcription with cultural awareness; it'll work better that way. For Google to do its best job when it comes to translation, it needs to be connected to all of us; it has to be with us in the cloud.
Apple does a lot of on-device work, and anonymises what does need to go to their backend (e.g. map directions).
The problem is, Google can't adopt that pattern even if it wanted to (which it doesn't), because its whole push for years has been toward very thin (software-wise) clients talking to a largely web-based set of Google services.
The more important question is whether a third party can use that data to cause you harm, and if that's a concern, I'm afraid the only real solution is to limit your ability to be included in that data.
No, of course not. Ignoring for a moment that Translate already works perfectly fine offline, this would only ever transmit anything after you hit the button. It's not passively transmitting everything it hears to the cloud. Even if you ignore the privacy implications, that would drain the battery crazy fast.
"And this year’s Pixel will take advantage of the phone’s always-on microphones to listen for music (not just the phrase “OK Google”) and display what you’re listening to on the screen, even if it’s something on the radio."
Pretty sure it can't do that without at least transmitting audio fingerprints to the cloud passively...
Idea: a device that sniffs wifi and bluetooth MAC addresses and warns you when there's a Google Pixel2 in earshot...
(Note: That's the Pixel2, not the earbuds, but for the "magic" to work, the earbud wearers will have one of those in their pockets...)
Using a not particularly efficient but reliable fingerprinting algorithm such as http://www.ismir2002.ismir.net/proceedings/02-FP04-2.pdf, you need 2.6 kbits/sec for the fingerprint. Popular songs tend to be short, so let's say 3:30 per song on average. That works out to about 70K per song, or 70MB per 1000 songs. But compression and/or other encoding cleverness could massively reduce this number. Bottom line: it wouldn't be hard to store thousands of fingerprints given a few hundred megabytes of storage (out of 64GB or more).
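As a back-of-the-envelope sanity check of those numbers (taking the ~2.6 kbit/s rate from the linked ISMIR paper and the assumed 3:30 average song length as given):

```python
# Rough check of the fingerprint storage estimate above.
# The 2.6 kbit/s rate and 3:30 song length are the assumptions stated in the comment.
fingerprint_rate_bits = 2.6e3      # fingerprint bits per second of audio
song_seconds = 3 * 60 + 30         # 3:30 average song length

bits_per_song = fingerprint_rate_bits * song_seconds
kb_per_song = bits_per_song / 8 / 1024          # ~67 KB, i.e. roughly "70K"
mb_per_1000 = kb_per_song * 1000 / 1024         # ~65 MB per 1000 songs

print(f"{kb_per_song:.0f} KB per song")
print(f"{mb_per_1000:.0f} MB per 1000 songs")
```

So a few hundred megabytes does indeed buy you thousands of uncompressed fingerprints, before any of the compression tricks mentioned above.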
I couldn't find anything which said how big their db is, but you could do some artist/song popularity smarts to figure out the tracks a user is most likely to listen to.
But to be constantly searching through fingerprints for every sound? I'd be interested to see what Google's solution to this may be.
No way you can fit their fingerprints and the metadata into 64GB, compressed or not.
Shazam has 40M fingerprints in their DB, and they definitely don't have everything. Their software runs on beefy servers.
That won't satisfy the jazz nerd's needs, but they probably know their entire catalog by heart anyway.
This is just passive recognition which can have much less coverage since you aren't relying on it.
I remember this stood out to me, because that was a large concern of mine when I first heard about the feature. I respected that they went out of their way to make this point.
She said that it is based on machine learning and something like a database of "audio samples". It uses the samples to figure it out, I guess.
Use a recording app to start recording the mic on your phone, put it in your pocket, and then talk at normal conversation volume. It's not gonna pick up much, to say nothing of a person on the other side of the room.
I would assume the privacy expectations of public/private spaces used for photography also apply here. If you happen to be in the frame when I take a photo, am I prohibited from uploading the image to Facebook or Google Photos?
In Germany there's a law giving you rights over photos of yourself; it's difficult in the USA and less difficult in other countries.
Voice recordings made in public without consent are practically illegal in Germany, and in the USA as well.
Name one current conflict in which language and the inability to communicate is a barrier.
While the sentiment is nice and fluffy it’s also without any substance.
This is a toy for tourists and expats not a tool for world peace.
For the few cases when this isn’t the case those earbuds won’t save you from beheading.
We see it from neo-Nazis as well as antifa: they actually hate people with mainstream views, and, for the most part, we're the same culture, writ large.
>If Joe Sixpack and Mohammed can communicate with each other they won't sit idly by while the fear-mongers use the other to justify terrorism/enforced liberty importation
is easily disproven by the fact that this still happens all the time among people who speak the same language. Speaking a different language doesn't help communication obviously, but speaking the same language doesn't magically remove prejudice and hatred.
What they were saying was that "if Joe Sixpack and Mohammed can communicate with each other, [more often] they won't sit idly by while the fear-mongers [...]"
The counter-example of single-language conflict ignores the larger forest of language being a key leveller of cultural differences (i.e. necessary but not sufficient), as well as gp's obviously statistical rather than absolute example.
It's pretty clear he or she wasn't saying that 'as long as two people speak the same language, there will never be conflict between them.'
It’s funny that everyone brings up Muslims while one is likely to have harder times finding English speakers in many parts of Europe than in many parts of the Middle East.
But somehow you don’t dehumanize French or Italian ruralists?
It'd be easier to list the wars in which language wasn't a barrier. Excluding the category of civil wars, there aren't many wars where opposing sides largely speak a common language.
Well, that's a pretty big one, wouldn't you say? Some of the biggest, hairiest and most bloody conflicts have been civil.
The Syrian government and the rebels speak the same language, and so do most of IS. The Hutus and Tutsis in Rwanda speak the same languages. North and South Korea speak the same language. Croatia/Serbia/Bosnia speak (almost) the same language. The Jews in Nazi Germany spoke German.
If you're trying to get to a theory that conflicts are primarily or even significantly fueled by an inability to communicate, you can't just hand-wave civil wars away.
In-group/out-group conflict is a powerful force, and it cuts a lot deeper than language.
Oh, ferchrissakes, you're really scraping the bottom of the barrel now. The OP did a 'name one' gambit, and I pointed out that there were plenty.
Now, I'm not sure if you know this, but in WWII, the Germans fought more than just their own German-speaking Jewish population. They fought the French (don't speak German), the British (don't speak German), the Americans (don't speak German), the Polish (don't speak German), the Belgians (don't speak German), the Soviets (a collection of people, none of whom speak German), the Danish (don't speak German), the Norwegians (don't speak German), the Canadians (don't speak German), the Australians (don't speak German), the Dutch (DO speak German... but their assisting colonies didn't), the Indians (don't speak German), the Kiwis (don't speak German), the South Africans (don't speak German), the Greeks (don't speak German), the Chinese a little bit (don't speak German), the Rhodesians (don't speak German), some Yugoslavians (don't speak German), the Kenyans (don't speak German), an assortment of smaller, lesser-involved nations such as Costa Rica (also don't speak German) and, towards the end of the war, Italy... which doesn't speak German.
Not to mention that on the eve of the war, there were only about 200,000 Jews left in Germany; most had emigrated in the half-decade beforehand to distant places like the US. The 6M Jews who were killed in the Holocaust were drawn from all over Europe, about half of them from Poland. It wasn't 6M German-speaking Jews that died in the camps.
WWII claimed the lives of over 60M people, and 200k people is a rounding error on this number. It's drawing a ridiculously long bow to claim it as a civil war because a few of the victims lived in-country.
> you can't just hand-wave civil wars away.
Again, I wasn't. I was responding to the OP's 'name one'. It wasn't even a 'name one where it's a cause', just 'name one where it's a barrier'.
War is a blip.
This device isn't going to translate mosquito (for now; I'm sure at some stage it will, and the possibilities are exciting for apps that analyze the sounds around you).
The issues are around trade and such.
This is a very rosy sentiment, but it's unrealistic, not only because of the existing cultural differences and bias, but also because of the premise of the multiplayer aspect: play DOTA, BF, or HALO without muting other users and tell me how it goes.
I'm pretty sure most middle easterners including Israelis don't need Google's magical earbuds to understand the meaning of "ya ibn sharmuta"...
Google Translate still fails at very basic grammatical constructs, despite all the deep learning machinery they recently upgraded to (one of my favorite examples being the extremely simplistic "my cousin and her wife", which a 5 year old would properly parse, and yet gets erroneously translated to e.g. "mon cousin et sa femme" in French, when it should be "ma cousine et sa femme" (inferred from the her)). For corpuses in languages that are closely related (e.g. Swedish and English which they showed on stage), you generally get fairly usable results, which are acceptable for getting the general meaning of a news article but would get frustrating quickly for a real life conversation. For languages that are not closely related, the tech is just plain unusable.
People who need to work with multiple languages professionally will never use this - too many nuances would be lost, and too many mistakes would be made. The UN is not getting rid of its army of translators any time soon.
On the other end of the spectrum, people who only occasionally need to use a foreign language (e.g. because they are visiting a country for a few weeks) don't have that much of a pain point. When my French parents visit foreign countries, they communicate with the locals in broken English and by pointing and gesturing. Sure, there might be a bit of friction there, but there's nothing this product would get them that this approach doesn't. Not to mention that speaking to someone while wearing earbuds looks plain rude, and that you can only really have a two-way conversation if they are also wearing these earbuds, which 99.9% of the world won't be.
It certainly makes for a very cool demo, but as a product it doesn't really have much of a place.
There absolutely is. I'm fluent in English, and in a visit to Japan a couple months ago I found myself in a restaurant with a waiter that didn't speak English, and an issue with my credit card that she couldn't explain to me. After several attempts at communicating in other ways, I just opened Google Translate on the phone and we were able to have a spoken conversation much like the one you saw in the presentation.
If you need real translations done and use a translation company there is usually a way to provide context to their human translators.
I watched the earbud demo though, it seems very impressive.
The point I am making that you are referring to is that while you may be able to understand a piece of text in a foreign language translated visually by Google Translate (because your eyes can jump forward/backwards in the text, reread a confusing sentence, you can pause for a second and try to make sense of it, etc.), if the exact same text were translated in audio form, as part of a live, 1-1 conversation, you wouldn't be able to make sense of a lot of it because you can't do all the things I just listed which help deal with imperfect translations.
It's the same thing as if I read you a very complicated scientific paper you're not familiar with out loud, versus you reading it yourself. You will understand more in the latter situation.
(imagine if we had this back and forth in 2 different languages using Google earbuds, instead of over a textual medium :-)
(These considerations don't address the fact that there are most likely many other people who do speak Spanish and are willing to volunteer, and assistance organizations would probably prefer those to someone with Google's latest gadget)
Google Translate does have a feature where you can take a photo, OCR the text and translate it - https://support.google.com/translate/answer/6142483
The translation overlay on camera feed feature of Google Translate is great, and it does not suffer from most of the problems I outlined in my original post. It is for a very different kind of use case though (it's more about solving the problem of "what does this menu/street sign say?", rather than trying to offer full conversation live translation)
Yes, clearly actually knowing the language would be better.
Commenting online isn't the same as hearing someone's voice while standing in front of them. That makes a huge difference in how someone responds, and it stops a lot of hate and trolling.
In the past, the people who needed to understand the difference between private and public speech were those who spoke for a living: orators, salesmen, politicians. Yet the power granted by the internet has made this knowledge crucial for everyone, but nowhere is it really taught.
And even though this dynamic is enabled by the internet, the internet is not limited to it. One-on-one communication is quite possible. Video chat becomes more accessible every year. Even sites like ChatRoulette and Omegle were once great ways to have actual, honest conversations with strangers (still are, if you can wade through the sea of literal wankers). This conversation has given me a lot of food for thought of what it would look like to make one-on-one video chat a core feature of public forums like Reddit.
As for video chat: it also diminishes the emotional experience (trust), because of the lack of eye contact. But once you offer full eye contact and body language, e.g. via telepresence, this improves dramatically, probably getting very close to a real-life conversation.
So how do we create such a system fit for consumers? And since text etc. is more convenient and addictive, can we design something that consumers will love to use?
I mean, Google is selling this product - they're not giving them out for free.
These giant tech companies definitely do a lot of shady and ambiguous stuff, but I still think we should commend them when they put resources into these other techs that aren't necessarily money makers but provide a net plus for humanity. It's clear that these projects are funded by money made in other parts of Google, such as its ad business.
It's similar to how BuzzFeed is mostly a trashy website, but once in a while they will come up with well researched thorough articles. It's clear that their shitty front page is a fund for them to achieve greater things.
were the mooncakes good last night? -> google translate -> 月饼好吗昨晚好吗？
月饼好吗昨晚好吗？ -> google translate -> Would you like me last night?
I ended up forming my own question, which was slightly wrong - my colleague asked 你是要说好吃吗? ("did you mean to say 'tasty'?") which I threw into el goog for a laugh, and got in return: "Are you going to be delicious?"
So either google translate is a bit iffy (still amazing, but iffy) or mooncakes are some sort of aphrodisiac and I'm not understanding that :)
> It's hard to see the financial incentive to provide unlimited translations to anyone in the world for free.
More communication = more trade, more market penetration = more money floating around. It's not a direct incentive, but it's there.
See the last sentence:
¹The Google Assistant on Google Pixel Buds is only available on Android and requires an Assistant-enabled Android device and data connection. Data rates may apply. For available Assistant languages and minimum requirements go to g.co/pixelbuds/help. Requires a Google Account for full access to features. Google Translate on Google Pixel Buds is only available on Pixel.
We thought about listening for AVRCP key events to detect when a certain button was pressed and released -- probably play/pause, which seems to be the most prominent button on most headsets. It would have been hacky, though, and we ran into several problems. For one thing, a lot of headsets power off if the play/pause button is held down for several seconds.
We also had concerns about audio quality with third party headsets, especially those which didn't support modern versions of SCO (which introduced new codecs with 16khz support and other improvements), or with poor antennas leading to high packet loss (SCO is a lossy protocol, so we still get some speech and attempt to translate it, but accuracy suffers). We were concerned that all accuracy problems would make Google Translate look bad, even if the headset was to blame.
What about an app that you interact with rather than something physical on the headphones? Considering the phone is required, anyway...
The problem that the Pixel Buds integration aims to solve is that it gets cumbersome for two people to hand a phone back and forth. With the new UX, one person can use the phone while the other uses Pixel Buds for the duration of the conversation.
"For phone to access the Google Assistant:
Android 6.0 Marshmallow or higher and other requirements"
You can have friends in other countries and still happily go to war against those barbarians/heretics/whatever-your-justification-is, and keep telling yourself that you are doing it just as much for them as for yourself, even if they might currently disagree.
Very recently I went to a resort in the Dominican Republic, where the main language is Spanish. Most waiting staff spoke good English, so everything was fine, until one of the people refilling the mini bar came in with a present and started ranting in Spanish. My first instinct was to grab google translate, do speech to text and then translate. Together, speech to text and translate were quite useless, probably because of the speed at which he spoke and possibly the accent.
I completely agree that this is a nice step. I doubt, however, that it will be anything more than a cool gimmick (in most cases), because of my recent experience.
It’s a good start but typical Google: try to release a hardware product with half-baked tech.
1) English is spoken in lots of places (don't forget songs and movies are great propaganda).
2) The USD is recognized in a lot of places.
In fact, you can try it right now. Open the Google Translate app on android, put on a pair of headphones with a mic, and tap the audio translation. As you hear spoken word in another language, it'll translate in realtime back to your non-google headphones.
*note - The trick is you need to press the button so it enables translation across both languages, as that seems to be the only way to keep translation on even after the first phrase.
That's to say, I don't believe the Pixel Buds are necessary at all for real time translation. But I think they're presented with the real time translation to give people some positive and cool imagery to associate with the earbuds. It's advertising as far as I can tell.
(Other than their choosing it to be so, of course. ;)
e.g, "Don't eat that!"
"We are friendly, don't shoot"
"There is food in the kitchen"
"Where is the nearest bathroom?"...That will all work relatively well.
Punishing others with Vogon poetry in their own native tongue... never gonna happen.
The other factor is that complex use of natural language depends on, and requires, ambiguity, layered meaning, and a host of other factors that are really hard to handle.
A fairly trivial example is a pun. Translating highly idiomatic things of this nature turns out to be extraordinarily hard, and just throwing more data at a DNN is not going to get you too far down that road.
You can absolutely make a "direct" word-for-word translation of a pun from English to, say, Russian. It's just not a pun any more when you're done. Often there are no "pun equivalents" with totally different words, because puns usually hinge on culturally specific references that also don't translate well.
Basically none of this matters when what you're interested in is subway directions or ordering food or whatever, but it becomes intractable really fast whenever you're interested in talking about something more meaningful.
Translate words and it's gibberish. Pairs of words and you start to get slang. Triples and you can distinguish word that are different parts of speech in different contexts. Quads and grammar is mostly in the bag. 5-grams and most puns are handled. 6-grams and you've taken care of all simple sentences. Etc.
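The idea above can be sketched as a toy phrase-table lookup: match the longest known n-gram first, and an idiom handled as a whole 3-gram survives translation, while word-by-word fallback produces the "gibberish" case. The phrase table and example phrases here are entirely made up for illustration; real statistical MT systems were far more elaborate.

```python
# Toy illustration of n-gram translation: longest-match-first lookup in a
# (completely made-up) English-to-French phrase table, no semantics involved.
phrase_table = {
    ("kick", "the", "bucket"): "casser sa pipe",  # idiom caught as a 3-gram
    ("kick",): "donner un coup de pied a",
    ("the",): "le",
    ("bucket",): "seau",
}

def translate(words, max_n=3):
    """Greedy left-to-right translation, preferring longer n-grams."""
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 0, -1):
            gram = tuple(words[i:i + n])
            if gram in phrase_table:
                out.append(phrase_table[gram])
                i += n
                break
        else:
            out.append(words[i])  # unknown word: pass it through untouched
            i += 1
    return " ".join(out)

print(translate("kick the bucket".split()))  # idiom matched whole
print(translate("kick the".split()))         # falls back to word-by-word gibberish
```

With only unigrams available the idiom decomposes into nonsense, which is exactly the "translate words and it's gibberish" end of the spectrum.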
No need for semantics when n-gram counts do just as well.
With enough people talking, we'll eventually have taught Google all the translations for all possible sentences. (joking, but only halfway)
you can't do this anyway.
>No need for semantics when n-gram counts do just as well.
They don't. Neither do bag-of-words, word2vec, or whatever.
Simple imperative language? Absolutely, this all works pretty well.
Anything else? Ha.
Google's branding department, as usual, leaves much to be desired.
Would you say that Apple's marketing department leaves much to be desired?
The problem is that the processor probably isn't powerful enough and the wireless won't connect to the internet.
That’s like saying AirPods make new music to your taste because you can get an app on your phone that does that.
It’s sketchy marketing.
Neat idea, but sketchy sounding marketing.
I'm not sure about the Google Assistant integration's Internet requirements, though. Given the impact of reducing lag, I would suppose they'd want to predownload as much as possible.
At least the last time I tried, from Greek to English.
I didn't want to trash it, it looks like an amazing concept. It would be interesting to see it in action.
I believe European rules go in effect this year to make roaming affordable as well.
To say nothing of the availability of prepaid SIMs in most countries.
I understand they're trying to create an air of exclusivity but it really just comes off kind of greedy and dickish when in reality, any earbuds with a microphone should be capable of doing the same thing through Google Translate.
Exclusivity means they have a limited hardware profile and control over quality out of the gate. Rolling it out for use on all, e.g., Android 6 devices is sure to be a support nightmare.
It also may expand to other (Android|iOS) devices in future.
The "killer feature" of realtime-ish translation only works with the Pixel phones.
To use as a headset, you need a Bluetooth® enabled companion device running:
Android 5.0 or higher
iOS 10.0 or higher
To use with Google Assistant you need:
An Assistant enabled Android device
Android 6.0 or higher
A Google Account
A data connection
To use with Google Translate you need:
Pixel or Pixel 2
The Google Translate app (A list of supported languages can be found here.)
Sure the technology isn't perfect, but it actually works surprisingly well (at least the translation part). I have been learning Mandarin Chinese recently, and the translations provided by Google Translate are pretty darn good (if not exact, they are close enough to understand).
We have been hitting sci-fi goals for a while now, and it is awesome!
It's unclear to me though; will these work with any phone?
As wireless earbuds these are somewhat interesting; as a translation device they are fascinating. Why can't Google sell more translation stuff?
> The two way functionality works with only one pair of buds - the other user holds the phone
So just keep the ear wax encrusted earbud for yourself. :-)
I've seen it used (and used it myself) in Japan, and it was pretty good. It was also used by an old guy in a ryokan in rural Japan. He didn't speak English, so we just went to and fro speaking into his tablet, checking that the audio was correctly recognized (it converts it to text before translating) and then handing it to the other person, who would read/hear the sentence in his own language.
speech_to_text --language=english | google_translate --from=english --to=swedish | text_to_speech --language=swedish
Bing Translate, even though it's been used by default on both Facebook and Twitter, will probably never give you good enough results to have a conversation with a stranger. Google Translate probably will.
I mean, if there's anyone that can combine those parts, it's Google and no one else.
(OTOH, it may also mess with code-switching.)
But this is super dope. Great job by the team at Google.
Can't wait until this becomes commonplace.
In states that require two-party consent in their wiretapping statutes, it probably does.
At least Apple tried to solve the problem. Google appears to be acting like Samsung here, adding gimmicks while releasing a lesser product.
(I only say it this way for the sake of argument, not as some type of Google or Apple "fan" statement.)
I personally don't deal with frequent pairing and unpairing of bluetooth headphones. I don't even own a pair (just some cheap earbuds and some pretty decent Sennheisers). As such I didn't even realize there was a problem being solved by Apple putting out another model of BT earbuds.
But I am in a relationship with someone for whom English is a second language. She's a great English speaker and I am frequently impressed by her command of the language, given how different our two native languages are. I've been learning some of the basics but when I try to talk to her family, it's essentially smiles and nods.
The Translate phone app is pretty amazing for taking turns and having it translate back and forth, but I would love to have something like these new phones the next time we visit with her family. It may be borderline gimmicky and the appeal may be narrow (compared to just an easier to use pair of wireless earbuds) but it's a lot more interesting to me in terms of existing tech combined in a new way that I'd hoped to see for a while.
I'd use it lots... and lots. I'm an immigrant, and though I do alright with the local language (Norwegian; I'm a native English speaker), conversations are sometimes difficult. Neil Gaiman in Norwegian is a difficult read for me, but I'm past kids' books. Some folks would talk to me more, since lots of people understand some English but aren't so comfortable speaking it. Google Translate isn't perfect, but it continues to get better.
I can see retail employees having access to a pair at work. Restaurants. Travel. And so on.
And, oh man, the "Tech Specs" section of their site is a textbook example of how NOT to lay out text for easy parsing/readability.