Google Pixel Buds are wireless earbuds that translate conversations in real-time (arstechnica.com)
325 points by Sujan on Oct 4, 2017 | 230 comments

If you're a technologist looking for ways to save the world, enabling communication between people of different countries is about as important a task as you can find. The web was a good first step, but language is still a barrier. Being unable to communicate with someone dehumanizes them. Automatic translation isn't going to automatically bring about world peace (and it isn't going to automatically enable us to transcend cultural barriers), but it's an enormously important next step.

I criticize Google and Microsoft a lot, and I will continue to do so when necessary, but I applaud their efforts toward this goal, especially since there's so little financial incentive to do so. Even if this product is as limited in practice as I suspect, it's heartening to see such crucial technology mature.

Is all voice in a room with a person wearing Google Buds transmitted to the cloud??

Isn't that prohibited and eavesdropping onto unsuspecting citizens?

I mean it's not just temporarily processed and immediately deleted afterwards. That's Google trying to get WiFi SSIDs, (Satellite, Street, Indoor, Face, Object) Pictures, Voices, real-time Geo-Location of consumers and whatnot... it's getting out of hand.

I was shocked when I first discovered that my "ok, Google" conversations were stored in the cloud indefinitely with a badly designed and hidden Web-UI that allows "deletion" (whatever that legally means with Google-speak).

I know that this is a point that is raised a lot of times, but if we think about it ---

isn't cloud connectivity almost a necessity for things to work as well as they do?

Particularly, Google's game is machine learning... it cannot do it if it doesn't have the data that it does (e.g. Google Now won't be able to connect the dots between my buying a hotel room and my buying plane tickets... conveniently having location directions to hotel just when I get done at the airport, etc. etc.).

Consider that languages are an always-evolving entity. I'm barely 30 and I have seen idioms and sayings fall and rise within the last decade. Certainly it is true that the best translation of some 20-word sentence would be different if it is done 10 or maybe even 5 years apart.

And of course it is a matter of fact that we all talk about the same things... e.g. some shooting occurred at some place recently, which means people will be talking about it and looking it up. It means we might say the name of a place or a hotel that we haven't said before... the translating service will be better off if it has some awareness of that and a bias toward culturally aware voice transcription. For Google to do its best job when it comes to translation, it needs to be connected to all of us, it has to be with us in the Cloud.

It’s absolutely possible to do machine learning based tasks without sending all data to an ad network that builds a profile of your entire life to serve you ads.

Apple does a lot of on-device work, and anonymises what does need to talk to their backend (e.g. map directions).

The problem is, google can’t adopt that pattern even if they want to (which they don’t) because their whole push for years has been about very thin (software wise) clients talking to a largely web based set of google services.

The question is not does it need cloud, the question is, is it worth it. When you send an audio stream to Google, you are imposing your answer to that question on everybody else in the room.

Or let any third party [view a photo | talk to you on the phone | listen to a voice memo ] with everybody else in the room. We already have answers to those "worth it" questions, largely.

The more important question is whether a third party can use that data to cause you harm, and if that's a concern, I'm afraid the only real solution is to limit your ability to be included in that data.

Not necessarily… all life works without a cloud doesn’t it? You can certainly realize your example with a completely local AI. You may need the cloud for training (at the moment) but not for day to day execution!

> Is all voice in a room with a person wearing Google Buds transmitted to the cloud??

No, of course not. Ignoring for a moment that Translate already works perfectly fine offline, this would only ever transmit anything after you hit the button. It's not passively transmitting everything it hears to the cloud. Even if you ignore the privacy implications of that, it'd drain the battery crazy fast.

Are you sure?

From: https://www.theverge.com/circuitbreaker/2017/10/4/16408962/n...

"And this year’s Pixel will take advantage of the phone’s always-on microphones to listen for music (not just the phrase “OK Google”) and display what you’re listening to on the screen, even if it’s something on the radio."

Pretty sure it can't do that without at least transmitting audio fingerprints to the cloud passively...

Idea: a device that sniffs wifi and bluetooth MAC addresses and warns you when there's a Google Pixel2 in earshot...

(Note: That's the Pixel2, not the earbuds, but for the "magic" to work, the earbud wearers will have one of those in their pockets...)

The feature you linked is also handled on-device.

Really? I'd love to know how they do that? Can you _really_ cram a useable sized db of popular music fingerprints into something small enough to store locally on a phone these days?

I work for google but have no clue what underlying tech is being used here. But I have some familiarity with audio fingerprinting, so I thought I'd comment.

Using a not particularly efficient but reliable fingerprinting algorithm such as http://www.ismir2002.ismir.net/proceedings/02-FP04-2.pdf, you need 2.6kbits/sec for the fingerprint. Popular songs tend to be short, so let's say 3:30 per song on average. That works out to about 70KB per song, or 70MB per 1000 songs. But compression and/or other sorts of encoding cleverness could massively reduce this number. Bottom line is that it wouldn't be hard to store thousands of fingerprints given a few hundred megabytes of storage (out of 64GB or more).
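For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the 2.6 kbit/s rate and 3:30 average song length quoted above, decimal units (1 kbit = 1000 bits), and no compression; the function and constant names are made up for illustration and say nothing about how the Pixel feature actually works.

```python
def fingerprint_bytes(bitrate_bits_per_sec: float, duration_sec: float) -> float:
    """Uncompressed fingerprint size, in bytes, for a single track."""
    return bitrate_bits_per_sec * duration_sec / 8


# Figures taken from the comment above (assumed, not measured).
BITRATE_BITS_PER_SEC = 2600      # 2.6 kbit/s
SONG_SECONDS = 3 * 60 + 30       # 3:30 average song length

per_song = fingerprint_bytes(BITRATE_BITS_PER_SEC, SONG_SECONDS)
per_thousand = per_song * 1000

print(f"{per_song / 1e3:.2f} kB per song")
print(f"{per_thousand / 1e6:.2f} MB per 1000 songs")
```

That lands slightly under the rounded "about 70K per song" figure, before any compression.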

You don't need the entire song for a fingerprint, and the MusicBrainz (MetaBrainz) project has long been supported by Google.

I couldn't find anything which said how big their db is, but you could do some artist/song popularity smarts to figure out the tracks a user is most likely to listen to.

But to constantly be searching through fingerprints for every sound? I'd be interested to see what Google's solution to this may be.

There were 75K albums (not just songs) released in 2010 alone. The total number of songs is in hundreds of millions.

No way you can fit their fingerprints and the metadata into 64GB, compressed or not.

Shazam has 40M fingerprints in their DB, and they definitely don't have everything. Their software runs on beefy servers.

The claim in the presentation was "the 10k most popular songs", updated weekly.
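To put the catalogue sizes in this subthread side by side, here is a rough sketch. It reuses the 2.6 kbit/s and 3:30 estimate from upthread and assumes uncompressed fingerprints, no metadata or index overhead, decimal gigabytes, and 100 million as a stand-in for "hundreds of millions" of songs. All names are hypothetical, and a real system would likely store far less per track.

```python
# Per-song size from the upthread estimate: 2600 bit/s * 210 s / 8 bits per byte.
BYTES_PER_SONG = 2600 * 210 // 8
PHONE_STORAGE_BYTES = 64 * 10**9   # the 64GB phone mentioned in the thread


def catalogue_bytes(num_songs: int) -> int:
    """Total uncompressed fingerprint storage for a catalogue of num_songs."""
    return num_songs * BYTES_PER_SONG


catalogues = {
    "top 10k songs": 10_000,
    "Shazam-sized (40M)": 40_000_000,
    "100M songs": 100_000_000,
}

for label, count in catalogues.items():
    size = catalogue_bytes(count)
    verdict = "fits" if size <= PHONE_STORAGE_BYTES else "does not fit"
    print(f"{label}: {size / 1e9:.2f} GB, {verdict} in 64 GB")
```

Under these assumptions both sides of the argument hold: a full catalogue is far too large for a phone, while a weekly-updated top 10k stays well under a gigabyte.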

That won't satisfy the jazz nerd's needs, but they probably know their entire catalog by heart anyway.

I mean, would that satisfy a majority of needs, even? When I use Shazam, it's usually to identify a weird instrumental or otherwise obscure song that I've never heard before.

If you do the explicit ask for what song is playing then it starts recording and sends it to the cloud just like Shazam. Presumably that recognizes far more than just 10k most popular songs.

This is just passive recognition which can have much less coverage since you aren't relying on it.

It depends on what the goal is. Is it to identify the song à la Shazam or to display lyrics, concert tickets info, ... ?


I'm a bit late, but I watched the announcement for this feature and they explicitly said that nothing was sent to the cloud to determine what song it was.

I remember this stood out to me, because that was a large concern of mine when I first heard about the feature. I respected that they went out of their way to make this point.

She said that it is based on machine learning, and something like a database of "audio samples". It uses the samples to figure it out, I guess.

> the earbud wearers will have one of those in their pockets

Use a recording app to start recording from the mic on your phone, put it in your pocket, and then talk at normal conversation volumes. It's not gonna pick up much, to say nothing of a person on the other side of the room.

You don't know this at all, and should not be speculating.

> Isn't that prohibited and eavesdropping onto unsuspecting citizens?

I would assume the privacy expectations of public/private spaces used for photography also apply here. If you happen to be in the frame when I take a photo, am I prohibited from uploading the image to Facebook or Google Photos?

Depends on whom you take a photo of, where, and from where. It's a complicated matter legally, but the moral is simple. Don't take a photograph in which someone is visible without their consent. If you do, at least don't upload it anywhere public or online.

In Germany there's a law giving you the right to your own image[1]; it's difficult in the USA and less difficult in other countries[2].


Voice recordings made in public without consent are practically illegal in Germany[3] and also in the USA [4][5].


[1] https://de.wikipedia.org/wiki/Recht_am_eigenen_Bild_(Deutsch...

[2] https://en.wikipedia.org/wiki/Photography_and_the_law

[3] https://dejure.org/gesetze/StGB/201.html

[4] https://en.wikipedia.org/wiki/Legality_of_recording_by_civil...

[5] https://www.rcfp.org/browse-media-law-resources/digital-jour...

From [2]: "Should the subjects not attempt to conceal their private affairs, their actions immediately become public to a photographer using normal photographic equipment."

Not sure photography matches audio recording, but for photography here's a good reference


Well, I think the obvious way to solve this is simply to make it obvious the translation is occurring with a third party (who may do anything with the data). I think this is still fantastic technology even if it is limited to public conversations.

People are quite able to communicate, willing is another issue.

Name one current conflict in which language and the inability to communicate is a barrier.

While the sentiment is nice and fluffy it’s also without any substance.

This is a toy for tourists and expats not a tool for world peace.

If you believe in Mark Twain's famous line "travel is fatal to prejudice, bigotry and narrow-mindedness", and if these earbuds promote travel by lowering the discomfort of communication, then you can expect that the earbuds would help reduce things like racial conflict.

There seems to be far less narrow mindedness across large distances than in local sectarian conflicts.

For the few cases when this isn’t the case those earbuds won’t save you from beheading.

Local sectarian conflicts are definitely a problem in lots of places, but a bigger problem in the US right now is that lots of folks never meet anyone outside their own culture. I'm not sure making it easier to communicate will help (lots of people don't even have passports...), but maybe it will, and I don't see how it could hurt.

You don't think the language barrier played any part in dehumanizing communists during the cold war?

No, a communication barrier did. They didn’t call it the iron curtain for nothing.

I don't think so. We treated American communists with pretty much the same contempt. Not that we were wrong to condemn the ideology, which as we know leads to massive deaths (gulags, cultural revolutions, etc.) in pursuit of "purity", but still, it's evident that speaking a common language does not prevent us from treating others ill.

We see it from neo-Nazis as well as antifa: they actually hate people with mainstream views and, for the most part, we're the same culture, writ large.

If Joe Sixpack and Mohammed can communicate with each other they won't sit idly by while the fear-mongers use the other to justify terrorism/enforced liberty importation. At least in theory.

This would seem easily disproven by any country that has sectarian conflicts where both sides speak the same language. Or for instance, racial division in America.

I'm not sure what the argument here is. That the US would be better off if Republicans and Democrats spoke mutually unintelligible languages? By what logic does it make sense to say "group A and group B are at odds; group A and group B speak the same language; therefore we disprove that speaking the same language will ever resolve anything"?

The argument is that

>If Joe Sixpack and Mohammed can communicate with each other they won't sit idly by while the fear-mongers use the other to justify terrorism/enforced liberty importation

is easily disproven by the fact that this still happens all the time among people who speak the same language. Speaking a different language doesn't help communication obviously, but speaking the same language doesn't magically remove prejudice and hatred.

Saying that example disproves the grandparent is being pedantic.

What they were saying was that "if Joe Sixpack and Mohammed can communicate with each other, [more often] they won't sit idly by while the fear-mongers [...]"

The counter-example of single-language conflict ignores the larger forest of language being a key leveller of cultural differences (i.e. necessary but not sufficient), as well as gp's obviously statistical rather than absolute example.

It's pretty clear he or she wasn't saying that 'as long as two people speak the same language, there will never be conflict between them.'

There are more English proficient Muslims in the world than there are Americans.

And the world is a far better place for the ability of a portion of Muslims and Americans to communicate. What's your point?

That this is some wishful thinking that the “differences” between incompatible parts of cultures are just language barriers.

It’s funny that everyone brings up Muslims while one is likely to have a harder time finding English speakers in many parts of Europe than in many parts of the Middle East.

But somehow you don’t dehumanize French or Italian ruralists?

> Name one current conflict in which language and the inability to communicate is a barrier.

It'd be easier to list the wars in which language wasn't a barrier. Excluding the category of civil wars, there aren't many wars where opposing sides largely speak a common language.

> Excluding the category of civil wars

Well, that's a pretty big one, wouldn't you say? Some of the biggest, hairiest and most bloody conflicts have been civil.

The Syrian government and the rebels speak the same language, and so do most of IS. The Hutus and Tutsis in Rwanda speak the same languages. North and South Korea speak the same language. Croatia/Serbia/Bosnia speak (almost) the same language. The jews in Nazi Germany spoke german.

If you're trying to get to a theory that conflicts are primarily or even significantly fueled by an inability to communicate, you can't just hand-wave civil wars away.

In-group/out-group conflict is a powerful force and it cuts a lot deeper than language.

> The jews in Nazi Germany spoke german.

Oh, ferchrissakes, you're really scraping the bottom of the barrel now. The OP did a 'name one' gambit, and I pointed out that there were plenty.

Now, I'm not sure if you know this, but in WWII, the Germans fought more than just their own German-speaking Jewish population. They fought the French (don't speak German), the British (don't speak German), the Americans (don't speak German), the Polish (don't speak German), the Belgians (don't speak German), the Soviets (a collection of people, none of whom speak German), the Danish (don't speak German), the Norwegians (don't speak German), the Canadians (don't speak German), the Australians (don't speak German), the Dutch (DO speak German... but their assisting colonies didn't), the Indians (don't speak German), the Kiwis (don't speak German), the South Africans (don't speak German), the Greeks (don't speak German), the Chinese a little bit (don't speak German), the Rhodesians (don't speak German), some Yugoslavians (don't speak German), the Kenyans (don't speak German), an assortment of smaller, lesser-involved nations such as Costa Rica (also don't speak German) and, towards the end of the war, Italy... which doesn't speak German.

Not to mention that on the eve of the war, there were only about 200,000 jews left in Germany - most emigrated in the half-decade beforehand to distant places like the US. The 6M jews that were killed in the Holocaust were drawn from all over Europe - about half of them from Poland. It wasn't 6M German-speaking jews that died in the camps.

WWII claimed the lives of over 60M people, and 200k people is a rounding error on this number. It's drawing a ridiculously long bow to claim it as a civil war because a few of the victims lived in-country.

> you can't just hand-wave civil wars away.

Again, I wasn't. I was responding to the OP's 'name one'. It wasn't even a 'name one where it's a cause', just 'name one where it's a barrier'.

Agreed. I do think the tech is 'cool', though. I was recently in a location where nobody spoke English, outside of a few words like 'hello', etc. We communicated a lot nonverbally, quite easily. We laughed together, attempted to speak each other's language (which led to more laughing), did things together, ate together, etc. Sure, we couldn't sit around and discuss politics or philosophy, but oddly I found that situation more endearing.

Communication enables social inclusivity which brings people together. I went to a large university and many exchange students would gravitate towards their own groups during social situations. Picture a group of individuals from $X country huddled together in a corner of a bar for a couple hours not able to talk to anyone else.

That’s not because of language. That’s because of commonality in behavior and cultural background. Translating earbuds don’t solve that.

As far as I read, in no way, shape or form were they talking about 'conflict', i.e. war, which is what you seem to be implying.

War is a blip.

This device isn't going to translate mosquito (for now, I'm sure at some stage it will, the possibilities are exciting for apps that analyze sounds around you)

The issues are around trade and such.

Yeah. In Israel we have an NGO that selects the most inflammatory stuff that gets said on Arab news media about Israel and the Jews, and translates it into Hebrew and English. Honestly, it would be far better for Arab-Israeli relations if that shit was left untranslated.

Imagine if instead of that, Israeli kids all had Arab friends they chatted with online every day while playing video games. That's the sort of thing this technology could be used for.

There are plenty of Arabs living in Israel, and last time I checked all Israeli kids study Arabic in school as a third language.

This is a very rosy sentiment, but it's unrealistic not only because of the existing cultural differences and bias, but also because of the premise of the multiplayer aspect. Play DOTA, BF or HALO without muting other users and tell me how it goes.

I'm pretty sure most middle easterners including Israelis don't need Google's magical earbuds to understand the meaning of "ya ibn sharmuta"...

I used to play lots of Gears of War without muting the microphone and made several friends. All of them spoke English. I've learned a few other languages at school, but none of them well enough to make friends while playing a game online.

I guess you're talking about MEMRI, a pure propaganda/PR operation for Israel, founded by former Mossad agents, that often purposefully mistranslates the quotes to put Arabs in the worst possible light:


In practice, it's doubtful that this feature of the product will be that useful.

Google Translate still fails at very basic grammatical constructs, despite all the deep learning machinery they recently upgraded to (one of my favorite examples being the extremely simplistic "my cousin and her wife", which a 5 year old would properly parse, and yet gets erroneously translated to e.g. "mon cousin et sa femme" in French, when it should be "ma cousine et sa femme" (inferred from the her)). For corpuses in languages that are closely related (e.g. Swedish and English which they showed on stage), you generally get fairly usable results, which are acceptable for getting the general meaning of a news article but would get frustrating quickly for a real life conversation. For languages that are not closely related, the tech is just plain unusable.

People who need to work with multiple languages professionally will never use this - too many nuances would be lost, and too many mistakes would be made. The UN is not getting rid of its army of translators any time soon.

On the other end of the spectrum, people who only occasionally need to use a foreign language (e.g. because they are visiting a country for a few weeks) don't have that much of a pain point. When my French parents visit foreign countries, they communicate with the locals in broken English and by pointing and gesturing. Sure, there might be a bit of friction there, but there's nothing that this approach doesn't get them that this product would. Not to mention that speaking to someone while wearing earbuds looks plain rude, and that you can only really have a two-way conversation if they are also wearing these earbuds, which 99.9% of the world won't.

It certainly makes for a very cool demo, but as a product it doesn't really have much of a place.

> there might be a bit of friction there, but there's nothing that this approach doesn't get them that this product would

There absolutely is. I'm fluent in English, and in a visit to Japan a couple months ago I found myself in a restaurant with a waiter that didn't speak English, and an issue with my credit card that she couldn't explain to me. After several attempts at communicating in other ways, I just opened Google Translate on the phone and we were able to have a spoken conversation much like the one you saw in the presentation.

Except Google's Japanese to English translation is horrifically inaccurate and likely to mislead you by orders of magnitude [1]

[1] https://translate.google.com/#ja/en/11%E4%B8%87500%E5%86%86

I guess I was very lucky then. But I was definitely better off than without Google Translate.

Genuinely curious then: are you interested in this product? What would it let you do that your phone doesn't let you do? Do you feel like the interaction would have been better if mediated by audio-only earbuds instead of a phone with a display?

While on vacation in New Orleans I became friends with two Brazilians staying at my hostel. Only one of them is confident in their English, the other speaks it haltingly, and I speak no Portuguese. They've invited me to come spend time with them next year, which is little time for me to become a fluent speaker. Something like this looks like it might allow me a much greater chance of understanding and being understood by all of their friends.

The demo shown included using both the earbuds and the phone, so the display would still be there.

I use Facebook's translation of comments a fair bit and it enables me to usually get at least the gist of what's being said - and sometimes a clear view of it. If that level of accuracy could be applied to in-person audio it could be a lot more useful than not having it.

You can understand it, but it's usually not a good translation.

If you need real translations done and use a translation company there is usually a way to provide context to their human translators.

I watched the earbud demo though, it seems very impressive.

Sure. Translating written text and translating live conversations are two very different things though.

But your argument, which my comment concerns, was based on text translation: "Google Translate still fails at very basic grammatical constructs..."

The point is that an imperfect translation engine can be adequate for making sense of written text in an asynchronous way (if one sentence is confusing your eyes can jump back to the previous one to try to guess the connection, etc.), but be near-unusable for on the fly translation of a real world conversation. Context matters a lot.

Then your argument is actually that they won't be able to do real time audio translation to a similar level of accuracy to Google Translate. The comment I replied to did not make that argument.

No, there is a misunderstanding.

The point I am making that you are referring to is that while you may be able to understand a piece of text in a foreign language translated visually by Google Translate (because your eyes can jump forward/backwards in the text, reread a confusing sentence, you can pause for a second and try to make sense of it, etc.), if the exact same text were translated in audio form, as part of a live, 1-1 conversation, you wouldn't be able to make sense of a lot of it because you can't do all the things I just listed which help deal with imperfect translations.

It's the same thing as if I read you a very complicated scientific paper you're not familiar with out loud, versus you reading it yourself. You will understand more in the latter situation.

(imagine if we had this back and forth in 2 different languages using Google earbuds, instead of over a textual medium :-)

I don't agree. No-one knowing you're using such translation is going to talk to you in a normal conversational manner. They'll talk to you in a simplified, slowed-down fashion. And having such translation ability in such a situation would still be better than the alternatives.

As someone interested in helping with disaster scenarios, the ability to handle rudimentary Spanish is quite interesting.

If you're interested in helping with disaster scenarios in Spanish-speaking countries, you would most likely be better served by a few months of Duolingo (or other language learning course/app). It'd be cheaper, more useful for you in general (e.g. you would also be able to make sense of written text, which this product alone does not cover), and you wouldn't be reliant on a charged phone + charged earbuds + cellular connection, which might be a hard trio of requirements to fulfill in disaster scenarios.

(These considerations don't address the fact that there are most likely many other people who do speak Spanish and are willing to volunteer, and assistance organizations would probably prefer those to someone with Google's latest gadget)

> you would also be able to make sense of written text, which this product alone does not cover

Google Translate does have a feature where you can take a photo, OCR the text and translate it - https://support.google.com/translate/answer/6142483

"this product" = earbuds

The translation overlay on camera feed feature of Google Translate is great, and it does not suffer from most of the problems I outlined in my original post. It is for a very different kind of use case though (it's more about solving the problem of "what does this menu/street sign say?", rather than trying to offer full conversation live translation)

I don't have to leave the US to find disaster situations where knowing Spanish would be useful.

Yes, clearly actually knowing the language would be better.

This is absurd. There are Spanish (and Tagalog, and Chinese, and...)-speaking people all over the world who don't speak the local language. There is no army of translators available at every location in the world all the time. There are plenty of disasters that don't destroy wireless networking connectivity.

The two way functionality works with only one pair of buds - the other user holds the phone

I would add that, to a lesser degree, being unable to communicate with someone in person dehumanizes them.

Commenting online isn't the same as hearing their voice and standing in front of them. That makes a huge difference in how someone responds and stops a lot of hate and trolling.

Sure, but I'd argue that it's societal unfamiliarity with the ramifications of mass communication (anonymous or otherwise) that causes this effect. The interface to this site may imply that this comment here is a response to you, and it even feels that way in my head sometimes, but in reality all of our comments are publicly broadcasted to the thousands and thousands of people on this site (hi, everyone!). That's a bizarre effect of the web that even veterans often allow themselves to forget.

In the past, the people who needed to understand the difference between private and public speech were those who spoke for a living: orators, salesmen, politicians. Yet the power granted by the internet has made this knowledge crucial for everyone, but nowhere is it really taught.

And even though this dynamic is enabled by the internet, the internet is not limited to it. One-on-one communication is quite possible. Video chat becomes more accessible every year. Even sites like ChatRoulette and Omegle were once great ways to have actual, honest conversations with strangers (still are, if you can wade through the sea of literal wankers). This conversation has given me a lot of food for thought of what it would look like to make one-on-one video chat a core feature of public forums like Reddit.

I think there was research on the emotional value of instant messaging (close to zero emotional value), a phone call (some emotional value), and face-to-face conversation (full value).

As for video chat, it also diminishes the emotional experience (trust) because of the lack of eye contact. But once you offer full eye contact and body language, e.g. via telepresence, this improves dramatically, probably coming very close to real-life conversation.

So how do we create such a system fit for consumers? And since text etc. is more convenient and addictive, can we design something that consumers will love to use?

please can we stop making addictive code?

> especially since there's so little financial incentive to do so.

I mean, Google is selling this product - they're not giving it away for free.

While this specific feature, at least for now, is Pixel exclusive, Google Translate itself still is completely free, and usable even without an account by anyone anywhere. It's hard to see the financial incentive to provide unlimited translations to anyone in the world for free.

These giant tech companies definitely do a lot of shady and ambiguous stuff, but I still think we should commend them when they put resources into these other techs that aren't necessarily money makers, but provide a net plus for humanity. It's clear that these projects are being funded by money made in other parts of Google, such as their ad business.

It's similar to how BuzzFeed is mostly a trashy website, but once in a while they will come up with well researched thorough articles. It's clear that their shitty front page is a fund for them to achieve greater things.

Google Translate is a great product, but not monetizing it directly doesn't mean they do it out of the goodness of their heart. It is in their best interest to have access to as much information as possible no matter the language, and in that context a powerful translator is a priority, there is no way around it, and that is a massive financial incentive. That it also provides a great experience to keep its users loyal to their ecosystem is also a great financial incentive.

Also, don't underestimate taking away revenue sources from your competitors. If your competitor competes with you at A, but makes money at B, then taking away that B revenue stream affects their ability to compete with you at A as well.

I understand the comment, but how does that apply to Google Translate specifically?

There's Yandex which operates Russia's largest search engine (65% market share) which also runs Yandex Translate [0]. There's also Microsoft Translator [1], and Microsoft owns Bing, though MS Translator is a little different than Google Translate / Yandex Translate.

[0] https://translate.yandex.com/

[1] https://www.microsoft.com/en-us/translator/home.aspx

Google Translate being free doesn't "take revenue away" from another free service, though. Plus I bet Google's precedes those other two.

I find the free Google Translate, on iOS, to be one of my absolute favorite tools when traveling. The camera OCR translate feature is almost like magic.

I used it on my Pixel in a recent trip to Japan and it was a life saver. From translating signs and bus schedules, to communicating with a waiter that didn't speak English about an issue with my credit card.

Google translate is... iffy. Here's an example from this morning, where I asked a colleague about how the mooncakes were at last night's mid-Autumn festival:

were the mooncakes good last night? -> google translate -> 月饼好吗昨晚好吗?

月饼好吗昨晚好吗? -> google translate -> Would you like me last night?

I ended up forming my own question, which was slightly wrong - my colleague asked 你是要说好吃吗? ("did you mean to say 'tasty'?") which I threw into el goog for a laugh, and got in return: "Are you going to be delicious?"

So either google translate is a bit iffy (still amazing, but iffy) or mooncakes are some sort of aphrodisiac and I'm not understanding that :)

> It's hard to see the financial incentive to provide unlimited translations to anyone in the world for free.

More communication = more trade, more market penetration = more money floating around. It's not a direct incentive, but it's there.

I worked on this feature and just want to clarify that it isn't Pixel exclusive; it works on any Android phone running M or newer. It is exclusive to the Pixel Buds, though.

If you're talking about the real time translation, https://support.google.com/googlepixelbuds/answer/7545575?hl... says "Google Translate on Google Pixel Buds is only available on Pixel."

Sorry, my information was out of date since I switched projects a while ago. Supporting all M+ phones was the original plan, but after observing some severe bluetooth audio bugs on certain phones, the team decided to support Pixel only for now to derisk the launch.

Then why does this footnote exist on the google store?

See the last sentence:

¹The Google Assistant on Google Pixel Buds is only available on Android and requires an Assistant-enabled Android device and data connection. Data rates may apply. For available Assistant languages and minimum requirements go to g.co/pixelbuds/help. Requires a Google Account for full access to features. Google Translate on Google Pixel Buds is only available on Pixel.


Is there a reason it's exclusive to Pixel Buds?

We did explore using standard protocols in order to support third party headsets. The most natural solution would be to listen for AT+BVRA (voice recognition command), which most headsets generate after some button is held down for a couple seconds. It didn't fit with our desired UX, though. We wanted a hold-while-talking UX, rather than hold for a couple seconds, wait for a beep, then release and start talking.

We thought about listening for AVRCP key events to detect when a certain button was pressed and released -- probably play/pause, which seems to be the most prominent button on most headsets. It would have been hacky, though, and we ran into several problems. For one thing, a lot of headsets power off if the play/pause button is held down for several seconds.

We also had concerns about audio quality with third party headsets, especially those which didn't support modern versions of SCO (which introduced new codecs with 16 kHz support and other improvements), or with poor antennas leading to high packet loss (SCO is a lossy protocol, so we still get some speech and attempt to translate it, but accuracy suffers). We were concerned that all accuracy problems would make Google Translate look bad, even if the headset was to blame.
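To make the hold-while-talking idea above concrete, here is a minimal Python sketch of turning button press/release events into capture windows. The `KeyEvent` type and the event names are hypothetical and only for illustration; they are not AVRCP, AT command, or Android APIs, and this is not how the Pixel Buds integration is actually implemented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class KeyEvent:
    """Hypothetical button event (not a real AVRCP or Android type)."""
    kind: str      # "press" or "release"
    time_ms: int   # timestamp in milliseconds


def talk_windows(events: Iterable[KeyEvent]) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans during which the button was held.

    Audio captured inside each span would be sent for translation.
    Repeated presses or stray releases are ignored, and a press that
    is never released produces no window.
    """
    windows: List[Tuple[int, int]] = []
    pressed_at: Optional[int] = None
    for ev in events:
        if ev.kind == "press" and pressed_at is None:
            pressed_at = ev.time_ms
        elif ev.kind == "release" and pressed_at is not None:
            windows.append((pressed_at, ev.time_ms))
            pressed_at = None
    return windows
```

Contrast this with the AT+BVRA flow described above, where the user holds, waits for a beep, and then releases before talking: there the start of speech has to be detected some other way, whereas here the press and release bracket the utterance directly.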

Interesting, thanks for the reply.

What about an app that you interact with rather than something physical on the headphones? Considering the phone is required, anyway...

If I'm understanding you correctly, it's already supported. The Google Translate app has a "conversation mode" where two people can take turns speaking into the phone - https://support.google.com/translate/answer/6142474?co=GENIE...

The problem that the Pixel Buds integration aims to solve is that it gets cumbersome for two people to hand a phone back and forth. With the new UX, one person can use the phone while the other uses Pixel Buds for the duration of the conversation.

I think what he was saying is just move the "button" on the headphones to a button on the phone, because they are already holding it. Then the headphones can just be in a normal "audio call" mode, and the phone button triggers the translation engine to stream data...


"For phone to access the Google Assistant: Android 6.0 Marshmallow or higher and other requirements"

Other requirements: https://support.google.com/assistant/answer/7172657?co=GENIE...

One imagines there will eventually be free + for-pay tiers.

I don't think this refutes the point; Google's core competency is selling ads, not selling babelfish, and 0% of their executives believe that it will ever be so. Remember that Google shut down Google Reader for having only a few million users and no path towards profitability. Back around the Google Plus initiative it was reported that Google Translate was a hair's width from the chopping block due to providing no financial value to the company, saved only by extreme defensive reaction from internal users (thanks, whoever you are!).

Culture is much stronger than language, just look at US politics to see how divided people get while sharing language and geography.

Maybe an advanced version of the translator could translate ideological positions as well as literal language. A conservative person conversing with a liberal person could have their "I don't believe it's my responsibility to subsidize others' healthcare" instantly translated to "I think healthcare should be federally funded from tax revenue".

This is terribly inaccurate - we have to remember to step outside our tech bubble and understand that in-person interactions are drastically different than comments on facebook or the web. I can despise people's views and like them individually.

Right, which is why promoting communication between individuals from different cultures via translation services won't necessarily noticeably improve relations between those cultures as a whole.

You can have friends in other countries and still happily go to war against those barbarians/heretics/whatever-your-justification-is, and keep telling yourself that you are doing it just as much for them as for yourself, even if they might currently disagree.

I have a very close friend; if you bring up the topic of abortion and don't drop it, that could be the end of it very quickly. Some people are very identified with their political positions.

Nobody here is saying that the ability to communicate is a sufficient condition for interpersonal understanding; it is merely a necessary one.

There is an even greater divide between the US and cultures who don't speak the same language: Mexico, Russia, China, etc.

Perhaps you think this because the divide between US groups is more legible to you.

I hope this actually works, but I'm very skeptical.

Very recently I went to a resort in the Dominican Republic, where the main language is Spanish. Most waiting staff spoke good English, so everything was fine, until one of the people refilling the mini bar came in with a present and started ranting in Spanish. My first instinct was to grab google translate, do speech to text and then translate. Together, speech to text and translate were quite useless, probably because of the speed at which he spoke and possibly the accent.

I completely agree that this is a nice step. I doubt, however, that it will be anything more than a cool gimmick (in most cases), because of my recent experience.

I like that tech very much, but I also think we ought to learn how to understand others' POV, and not assume too much about their personality, experience, reactions...

Open source it if they are so noble

Google home doesn’t work 50% of the time in English; how are they going to get other languages working reliably?

It’s a good start but typical Google: try to release a hardware product with half-baked tech.

I point to 2 things that put the US as leader of the pack in the world.

1) English is spoken in lots of places (don't forget songs and movies are great propaganda). 2) The USD is recognized in a lot of places.

What is the intended insight?

The "translate conversations in real-time" part seems to have nothing to do with the Google Pixel Buds - it's already live in the Google Translate app.

In fact, you can try it right now. Open the Google Translate app on android, put on a pair of headphones with a mic, and tap the audio translation. As you hear spoken word in another language, it'll translate in realtime back to your non-google headphones.

*note - The trick is you need to press the button so it enables translation across both languages, as that seems to be the only way to keep translation on even after the first phrase.

The translation stuff seems to all be handled by your phone, so what is it about these specifically that makes them required for this type of functionality? I'm having trouble seeing why this shouldn't work with any set of earbuds...

I worked on this feature and commented on why we didn't support other headsets here: https://news.ycombinator.com/item?id=15404918

I think what completes the puzzle is that the Pixel 2 phone omits a headphone jack. Without a headphone jack wireless earbuds are required. Normally wireless earbuds would be superfluous, so they're introduced with a use case that highlights their advantages.

That's to say, I don't believe the Pixel Buds are necessary at all for real time translation. But I think they're presented with the real time translation to give people some positive and cool imagery to associate with the earbuds. It's advertising as far as I can tell.

Why is this Google Translate an advantage that's exclusive to Bluetooth headsets?

(Other than their choosing it to be so, of course. ;)

So any old bluetooth headset will work, or have they artificially limited it to pixel buds?

Only Pixel Buds will work. You can read this comment chain by someone who worked on the project to learn more about why they made that decision: https://news.ycombinator.com/item?id=15404395

Does anyone else see the irony in the Pixel name? Google became big by tracking people online using tracking pixels; now they're tracking people in the real world using physical pixels.

When I think pixel, I don't think tracking pixel. So no.

I use Bose Soundsport Wireless headphones currently and the mic is terrible at filtering background noise. Given this is about the same price range it seems like a worthy competitor even as an iOS user.

Is this a Babel fish?

In my experience as a user of and developer for speech recognition systems, I have concluded that the best any machine translation system is going to be able to do is translate really basic, imperative communications.

e.g, "Don't eat that!" "We are friendly, don't shoot" "There is food in the kitchen" "Where is the nearest bathroom?"...That will all work relatively well.

Punishing others with Vogon poetry in their own native tongue... never gonna happen.

Google's remarkably good at translating anything that reads like news. Because there are so many news articles written about the same topic in different languages, that data is great. Google is not so good at translating things like love letters, because it's hard to get someone to write one in two languages and publish both.

Certainly the relative paucity of some subject matter to train on is a factor.

The other factor is that complex use of natural languages depends on and requires ambiguity and layered meaning and hosts of other factors that are really hard to handle.

I'm not sure love letters are more complex than the news. It often turns out we're not as complex as we think. Or that doing simple calculations on very large amounts of data captures all those nuances. That was the result described in "The Unreasonable Effectiveness of Data".

there's (far) more information in-band in a snippet of language than there is in the "plain" "meaning" of that snippet. All of it's relevant for translation, and none of that is easily tractable.

A fairly trivial example is a pun. Translating highly idiomatic things of this nature turns out to be extraordinarily hard, and just throwing more data at a DNN is not going to get you too far down that road.

That depends on how often that pun appears in the corpus. If you observe the translation enough times, it'll be easy for the computer.

this misses the point. There do exist untranslatable idioms, and puns are an easy example. They rely on language-specific features like rhymes (or sight-rhyme) or homophones that are not preserved across corpora.

You can absolutely make a "direct" word-for-word translation of a pun in english to, say, russian. It's just not a pun any more when you're done. Often there are no "pun equivalents" with totally different words, because usually they hinge on culturally specific references that also don't translate well.

Basically none of this matters when what you're interested in is subway directions or ordering food or whatever, but it becomes intractable really fast whenever you're interested in talking about something more meaningful.

Au contraire, it doesn't matter if the pun is translated directly. Heck, Google might be doing whole paragraph translation for all we know. It's certainly not at the level of individual words.

Translate words and it's gibberish. Pairs of words and you start to get slang. Triples and you can distinguish words that are different parts of speech in different contexts. Quads and grammar is mostly in the bag. 5-grams and most puns are handled. 6-grams and you've taken care of all simple sentences. Etc.

No need for semantics when n-gram counts does just as well.

With enough people talking, we'll eventually have taught Google all the translations for all possible sentences. (joking, but only halfway)
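For readers unfamiliar with the n-gram counting mentioned above, here is a minimal Python sketch. It only counts n-grams in a toy corpus; it is not a translation system and not a description of how Google Translate works. The function names and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentences: Iterable[str], n: int) -> Counter:
    """Count n-grams over a corpus, using naive lowercase whitespace tokens."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


corpus = ["Where is the bathroom", "Where is the kitchen"]
bigram_counts = ngram_counts(corpus, 2)
# ("where", "is") and ("is", "the") each occur twice in this corpus;
# ("the", "bathroom") and ("the", "kitchen") occur once each.
```

Larger n captures more context but makes each specific n-gram rarer in the corpus, which is the data-sparsity tension the replies in this thread argue about.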

>it doesn't matter if the pun is translated directly.

you can't do this anyway.

>No need for semantics when n-gram counts does just as well.

They don't. neither do bag of words, word2vec, or whatever.

Simple imperative language? Absolutely, this all works pretty well. Anything else? Ha.

Love letters rely more on emotion, allegory, and subtext than the news, especially modern-era news.

Does that make translation more random? If not, then the real trouble is lack of data.

I wonder if there will be references in the packaging to this. I mean, the original Chromecast referenced HG2G.

It is such a bizarrely improbable coincidence that something so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the non-existence of God.

Too bad that name was already taken. Did Douglas Adams (or his estate) license it out to AltaVista in the first place? Wait, no, it looks like the name is biblical.

I think you mean Douglas Adams.

Yes, well that's embarrassing. Fixed.

At least we have Douglas Adams to thank for the fact they can't patent this!

Seems like.

a google-fish

This would be invaluable for patient encounters. If anyone at Google is reading this -- is there any plan to make a set like this available with a BAA / HIPAA compliance?

My in-laws don't speak English. I have to try this with them around the holidays.

Be sure to have them agree to Google's privacy policy first, because otherwise you'll be agreeing to it on their behalf.

They already have Google accounts so they should be covered from privacy perspective, but thank you for your concern.

Are they aware that everything they say to you now falls under that privacy policy?

I can understand using these for situations where you're only going to be around a language for a limited amount of time (business trip, vacation, etc). But presumably your in-laws are going to be around for a while, so what's the rationale for not learning to speak their language?

Not the OP but trying to learn another language so that you can speak it with people you see maybe twice a year seems a little silly. I would agree with you if OP lived with their inlaws or something, though.

I see my inlaws twice a year or less, still learning the language (or at least trying).

Because we have limited hours in the day?

I’ve been trying for 10 years and still haven’t made any meaningful progress; I’ve always struggled with language, even in immersive environments for several months. TBH I do prioritize spending the few hours I have during evenings/weekends with my children.

Pixel Buds? What does a pixel have to do with an audio device?

Google's branding department, as usual, leaves much to be desired.

Alphabet's Google Chrome Pixel Buds

Sounds like that old Microsoft iPod parody, the Microsoft iPod Pro 2005 XP Human Ear Professional Edition.

Ahh Classic, thanks for that, made me choke on my snack :)

They are a small part of something big. Connect them all together and you get the big picture. Pixels.

If you can credibly explain why a really large number would have more to do with audio we might take your opinions about branding seriously...

Of course, it should be called 'Google Babel Fish'.

What does an Apple have to do with electronics?

Would you say that Apple's marketing department leaves much to be desired?

Wait til you find out they don't even have a Googol of anything!

It's cool that you can use the earbud case to recharge your phone (or vice versa). However, as a lefty I think it'd be annoying to be forced to use my right hand whenever I want to control my headphones.

* if you have a smartphone connected and a decent connection to the internet to use google translate.

Did you expect the earbuds to have a wireless connection and processor inside of them? (and if so, wouldn't you have complained about privacy issues anyways?)

It seems we need better words. They do have a wireless connection and processor inside of them.

The problem is that the processor probably isn't powerful enough and the wireless won't connect to the internet.

Well they’re not ‘headphones that translate’ are they?

That’s like saying AirPods make new music to your taste because you can get an app on your phone that does that.

It’s sketchy marketing.

Neat idea, but sketchy sounding marketing.

Google Translate does not require an active connection to the Internet. You can predownload translation "matrices?" into the app.

I'm not sure about the Google Assistant integration's Internet requirements though, however, given the impact of reducing lag, I would suppose they would want to be able to predownload as much as possible.

Yes, but that is for dictionary only. If you need more processing power, for example for the image recognition, you need the cloud.

At least the last time I tried, from Greek to English.

Unfortunately for greek<->english, the offline image recognition isn't available. It does work for other languages, even ones like Chinese. There is a little eye icon in the bottom right of the app that lets you download the language for offline instant translation.

Seems like a modest requirement.

It is if you are in a developed country and not tied to roaming / have access to fast wifi.

I didn't want to trash it, it looks like an amazing concept. It would be interesting to see it in action.

Understood, but just for perspective, we humans now have devices that seamlessly translate (more or less) foreign languages right to your ears. That's magical, sci-fi stuff. To nitpick it just because it's not readily available to all 7B+ people in the world is a tad harsh, though I know that wasn't your intention.

Don't get me wrong, this is fantastic. I really, really want to try them and I really want this technology to catch on.

Relatively affordable international roaming has finally made it to the US with the $10/day ATT and $5/day Verizon plans.

I believe European rules go in effect this year to make roaming affordable as well.

To say nothing of the availability of prepaid SIMs in most countries.

Right now, it's pretty infeasible (without NN chips) to make this work without something else handling the large model. Depending on how long folks have been working on stuff, in a few years (i.e. 2018-2020), it'll be feasible.

Right? It's not magic

You can download and offline store language dictionaries

Pardon if this was asked but it's not clear to me, since they are Bluetooth, wouldn't they work with any device that also runs Google Translate?

I have an answer... of course they'll work but the language detection isn't automatic. You have to run the app first but if you have a Pixel 2 phone, it will run the app for you when it detects a different language. Or so the ad reads...

I have Google Assistant on my OnePlus 3. Would these work with it the same as with the Pixel 2? Would these work with a Pixel 1 the same way?

Yes, sounds like you just need Google Assistant.


It seems these only work in this fashion with Pixel devices.

I understand they're trying to create an air of exclusivity but it really just comes off kind of greedy and dickish when in reality, any earbuds with a microphone should be capable of doing the same thing through Google Translate.

As a big Apple fan, I have no idea why Google should take flak for making this an exclusive (other than their previous "we're more open and open==good" hypocrisy).

Exclusivity means they have a limited hardware profile and control the quality out of the gate. Rolling it for use on all e.g. Android6 devices is sure to be a support nightmare.

It also may expand to other (Android|iOS) devices in future.

I may have poor reading comprehension but per https://support.google.com/googlepixelbuds/answer/7544332?hl...

The "killer feature" of realtime-ish translation only works with the Pixel phones.

To use as a headset, you need a Bluetooth® enabled companion device running:

Android 5.0 or higher
iOS 10.0 or higher

To use with Google Assistant you need:

An Assistant enabled Android device
Android 6.0 or higher
A Google Account
A data connection

To use with Google Translate you need:

Pixel or Pixel 2
The Google Translate app (A list of supported languages can be found here.)
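The three requirement lists quoted above can be read as a small decision table. Here is a rough Python sketch of that reading. The `Phone` type and its field names are hypothetical, and the rules simply restate the quoted support page as of this thread; they may not reflect later changes.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass
class Phone:
    """Hypothetical description of a companion device."""
    os: str                      # "android" or "ios"
    os_version: Tuple[int, int]  # (major, minor), e.g. (6, 0)
    is_pixel: bool = False       # Pixel or Pixel 2
    has_assistant: bool = False
    has_google_account: bool = False
    has_data: bool = False
    has_translate_app: bool = False


def supported_features(p: Phone) -> Set[str]:
    """Apply the quoted requirements and return the usable feature names."""
    features: Set[str] = set()
    if (p.os == "android" and p.os_version >= (5, 0)) or (
        p.os == "ios" and p.os_version >= (10, 0)
    ):
        features.add("headset")
    if (
        p.os == "android"
        and p.os_version >= (6, 0)
        and p.has_assistant
        and p.has_google_account
        and p.has_data
    ):
        features.add("assistant")
    if p.is_pixel and p.has_translate_app:
        features.add("translate")
    return features
```

Under this reading, a non-Pixel Android phone with Assistant gets the headset and Assistant features but not translation, which matches the "only available on Pixel" footnote quoted elsewhere in the thread.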

It seems like we're getting to the level that people talked about in Science Fiction not that long ago. Even when Google was started in 1998, I'm not sure that people would believe you if you said that in 20 years Google would be putting out a product that automatically translates spoken conversations in realtime.

Sure the technology isn't perfect, but it actually works surprisingly well (at least the translation part). I have been learning Mandarin Chinese recently, and the translations provided by Google Translate are pretty darn good (if not exact, they are close enough to understand).

You mean the level where we can contact almost anyone in the world at any time? We have full access to the world's knowledge at any time? On a device that fits into your pocket?

We have been hitting sci-fi goals for a while now, and it is awesome!

Performing translation on the fly with technology is a HUGE win for humanity. I'm glad to see others besides Hear One taking this seriously for your average consumer.

It's unclear to me though; will these work with any phone?

Very neat, but I wish it came in an over-the-ear version. I'd like to travel to Japan some day soon, I'd like to buy some rare vintage stuff, and I can't imagine handing an ear wax encrusted earbud to some elderly farmer.

As wireless ear buds these are somewhat interesting, as a translation device these are fascinating, why can't google sell more translation stuff?

As pointed out by ss17n:

> The two way functionality works with only one pair of buds - the other user holds the phone

So just keep the ear wax encrusted earbud for yourself. :-)

Maybe you know this already, but I suggest you look into an app called Jspeak. Take into account that it requires a SIM in your device (so it would work on a smartphone but not on a plain tablet).

I've seen it used (and used it myself) in Japan and it was pretty good. Also, it was used by an old guy in a Ryokan in rural Japan. He didn't speak English so we just went to and fro speaking into his tablet, checking that the audio was correctly recognized (it converts it to text before translating) and then handing it to the other person who would read/hear the sentence in his own language.

It integrates into your phone's speaker. You don't hand anyone a hearing device.

It took me a long time to find, but here are the 40 languages supported: https://support.google.com/googlepixelbuds/answer/7547959?hl...

Really cool feature, curious how it works in practice. This kind of thing could sway me from iOS.

It's really not that remarkable, given what parts already exist. In pseudo command line:

   speech_to_text --language=english | google_translate --from=english --to=swedish | text_to_speech --language=swedish

All this product does is hook up a few pieces that Google had already developed.
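The pseudo command line above is a three-stage composition. Here is a minimal Python sketch of the same shape, with toy stand-ins for every stage. None of these functions call a real speech or translation service: the `fake_*` names and the two-word dictionary are assumptions for illustration, and a real system would also have to handle streaming audio and latency.

```python
from typing import Callable

# English -> Swedish, a toy lookup table rather than a real translator.
TOY_DICTIONARY = {"hello": "hej", "friend": "vän"}


def fake_speech_to_text(audio: bytes) -> str:
    """Stand-in recognizer: pretends the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    """Stand-in translator: word-by-word lookup, unknown words pass through."""
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.lower().split())


def fake_text_to_speech(text: str) -> bytes:
    """Stand-in synthesizer: pretends UTF-8 bytes are audio."""
    return text.encode("utf-8")


def make_pipeline(
    speech_to_text: Callable[[bytes], str],
    translate: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain the three stages, like the pipes in the pseudo command line."""
    def run(audio: bytes) -> bytes:
        return text_to_speech(translate(speech_to_text(audio)))
    return run


pipeline = make_pipeline(fake_speech_to_text, fake_translate, fake_text_to_speech)
result = pipeline(b"Hello friend")  # UTF-8 bytes of "hej vän"
```

Passing the stages in as callables keeps the composition itself trivial, which is the original point; the engineering effort the replies mention lives inside the stages and in making them run with low latency.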

If you seriously believe that it took minimal engineering effort to integrate those parts into a low latency real time system I don't know what to tell you. These kinds of dismissive comments on HN never cease to amaze me.

Well, yeah, but without that google_translate part, they wouldn't get anywhere.

Bing Translate, even though it's been used by default on both Facebook and Twitter, will probably never give you good enough results to have a conversation with a stranger. Google Translate probably will.

I mean, if there's anyone that can combine those couple of parts, it's Google and no one else.

It is remarkable because it happens in realtime.

Spoken language is actually very slow compared to what computers can handle. Google Translate would have no trouble keeping up.

You may be able to hear, but how does it help you speak back to them in a different language?

It's, IME, very common for immigrants (especially those that immigrated in middle age) to be intelligible to native speakers when speaking the local language, but have more trouble understanding native speakers. So one-way translation may be useful.

(OTOH, it may also mess with code-switching.)

That's the reverse of the common case -- it's much easier to learn receptive language than expressive language.

Really? I have found the exact opposite. You can pick up a lot of clues from words, expression, tone, without knowing the entire sentence. But that doesn't help in talking to them.

you use your phone's speaker. this is shown in the demo.

I would answer in my own language, and let my interlocutor's device translate for me instead.

Buy two pairs :)

I remember a Kickstarter project which demoed a small device that you can keep in front of you and it will act as your very own real-time translator.

But this is super dope. Great job by the team at Google.

This is just an early iteration for the future. Imagine the possibilities while travelling, attending conferences, eating at foreign restaurants.

Can't wait until this becomes commonplace.

I feel like this already should be commonplace. I used Google Translate back in 2014 on a trip to Japan. Really came in useful when a hotel didn't have my reservation. Since then they've added the conversational mode that effectively makes Translate work identically to what was demonstrated with these earbuds.

That's my beef with the feature. Since the earbuds already require a phone with Assistant, why wouldn't I use the conversation mode in the Translate app instead, which is superior in every way?

Linus Tech Tips had a review of such a device that they found on Kickstarter a month ago. Is Google adding something to the concept?

Does anyone have a link to the demo video?

Sorry for the short comment but won't this be the next "Google glasses"?

Are those conversations also stored on Google's servers? I know I shouldn't even have to ask this question, but who knows these days with Google.

Does it matter? They pass through Google's servers; results of analyzing them are stored on Google's servers; the full audio is probably stored on the NSA's servers; if Google says they don't store the full text, does that meaningfully alter the level of privacy you can expect?

>Does it matter?

In states that require two-party consent in their wiretapping statutes, it probably does.

With Google Translate, you can use the service offline by downloading language packs. Now, I'd assume that if you use it offline, Google will cache your conversations and upload them the next time your device has connectivity. But who knows.

If they do, they'll likely surface it here and let you delete it: https://myactivity.google.com

Not only will your conversation be recorded; you won't even be able to have conversations about forbidden subjects. Not necessarily illegal subjects; just forbidden subjects.

Translating conversations in real time is cool, but I need that feature how often? I switch my earbuds between my phone and laptop multiple times a day and it's a pain for every pair of bluetooth earbuds ever.

At least Apple tried to solve the problem. Google appears to be acting like Samsung here, adding gimmicks while releasing a lesser product.

Another way of looking at this is that Apple just made another set of wireless headphones whereas Google put out the first iteration of something quite useful that has larger long-term potential.

(I only say it this way for the sake of argument, not as some type of Google or Apple "fan" statement.)

I personally don't deal with frequent pairing and unpairing of bluetooth headphones. I don't even own a pair (just some cheap earbuds and some pretty decent Sennheisers). As such I didn't even realize there was a problem being solved by Apple putting out another model of BT earbuds.

But I am in a relationship with someone for whom English is a second language. She's a great English speaker and I am frequently impressed by her command of the language, given how different our two native languages are. I've been learning some of the basics but when I try to talk to her family, it's essentially smiles and nods.

The Translate phone app is pretty amazing for taking turns and having it translate back and forth, but I would love to have something like these new phones the next time we visit with her family. It may be borderline gimmicky and the appeal may be narrow (compared to just an easier to use pair of wireless earbuds) but it's a lot more interesting to me in terms of existing tech combined in a new way that I'd hoped to see for a while.

I think this is more a problem of you not being the target audience.

I'd use it lots... and lots. I'm an immigrant, and though I do alright with the local language (Norwegian, and I'm a native English speaker), conversations are sometimes difficult. Neil Gaiman in Norwegian is a difficult read for me, but I'm past kids' books. Some folks would talk to me more, since lots of people understand some English but aren't so comfortable speaking it. Google Translate isn't perfect, but it continues to get better.

I can see retail employees having access to a pair at work. Restaurants. Travel. And so on.

I understand the appeal. I've lived places where I can't even order off a menu. My in-laws don't speak any English. If/when this works flawlessly it has the potential to change the world. It just amazes me that there's a company that can ship star trek level tech yet nobody can make wireless earbuds easy to use!

My Bose QC35 connect to two devices at once, so it's possible. I don't see anything about these that indicates that they're able to do so, though.

And, oh man, the "Tech Specs" section of their site is a textbook example of how NOT to lay out text for easy parsing/readability.

