Hacker News

With this capability, how close are y'all to it being able to listen to my pronunciation of a new language (e.g. Italian) and give specific feedback about how to pronounce it like a local?

Seems like these would be similar.




It completely botched teaching someone to say “hello” in Chinese - it used the wrong tones and then incorrectly told them their pronunciation was good.


As for the Mandarin tones, the model might have mixed it up with the tones from a dialect like Cantonese. It’s interesting to discover how much difference a more specific prompt could make.


I don't know if my iOS app is using GPT-4o, but asking it to translate to Cantonese gives you gibberish. It gave me the correct characters, but the Jyutping was completely unrelated. Funny thing is that the model pronounced the incorrect Jyutping plus said the numbers (for the tones) out loud.


Not that different at all.


I think there is too much focus on tones in beginning Chinese. Yes, you should get them right, but no, you'll get better as long as you speak more, even if your tones are wrong at first. So rather than remember how to say fewer words with the right tones, you'll get farther if you can say more words with whatever tones you feel like applying. That "feeling" will just get better over time. Until then, you'll talk as well as a farmer coming in from the countryside whose first language isn't Mandarin.


I couldn’t disagree more. Everyone can understand some common tourist phrases without tones - and you will probably get a lot of positive feedback from Chinese people. It’s common to view a foreigner making an attempt at Mandarin (even a bad one) as a sign of respect.

But for conversation, you can’t speak Mandarin without using proper tones because you simply won’t be understood.


That really isn't true, or at least it isn't true with some practice. You don't have to consciously think about or learn tones, but you will eventually pick them up anyway (tones are learned unconsciously via lots of practice trying to speak and be understood).

You can be perfectly understood even if you don't speak broadcast Chinese. There are plenty of heavy accents to deal with anyways. Like Beijing 儿化 or the inability of southerners to pronounce sh very differently from s.


It was good of them to put in example failures.


[flagged]


In my experience, when someone says a project was programmed by "white men from the west coast", it was actually made by Chinese or Indian immigrants.

(Siri's original speech recognition was a combination of Swiss-Germans and people from Boston.)

And it certainly wouldn't be tested by them either way. Companies know how to hire QA contractors.


People always say tech workers are all white guys -- it's such a bizarre delusion, because if you've ever actually seen software engineers at most companies, a majority of them are not white. Not to mention that product/project managers, designers, and QA are all intimately involved in these projects, and in my experience those departments tend to have a much higher ratio of women.

Even beside that though -- it's patently ridiculous to suggest that these devices would perform worse with an Asian man who speaks fluent English and was born in California. Or a white woman from the Bay Area. Or a white man from Massachusetts.

You kind of have a point about tech being the product of the culture in which it was produced, but the needless exaggerated references to gender and race undermine it.


An interesting point. I tend to have better outcomes using my heavily accented ESL English than my native pronunciation of my mother tongue. I'm guessing it's partly the tech workforce being a bit more multicultural than initially thought, or English just being easier to test with.

It's a shame, because that means I can use stuff that I can't recommend to people around me.

Multilingual UX is an interesting pain point. I had to change the language of my account to English so I could use some early Bard version, even though it was perfectly able to understand and answer in Spanish.


You also get the synchronicity / four minute mile effect egging on other people to excel with specialized models, like Falcon or Qwen did in the wake of the original ChatGPT/Llama excitement.


What? Did it seriously work worse for women? Source?

(accents sure)


I don't think that'd work without a dedicated startup behind it.

The first (and imo the main) hurdle is not reproduction, but just learning to hear the correct sounds. If you don't speak Hindi and are a native English speaker, this [1] is a good example. You can only work on nailing those consonants when they become as distinct to your ear as cUp and cAp are in English.

We can get by by falling back to context (it's unlikely someone would ask for a "shit of paper"!), but it's impossible to confidently reproduce the sounds unless they are already completely distinct in our heads/ears.

That's because we think we hear things as they are, but it's an illusion. The cup/cap distinction is as subtle to an Eastern European as Hindi consonants or Mandarin tones are to English speakers, because the set of meaningful sound distinctions differs between languages. Relearning the phonetic system requires dedicated work (minimal pairs are one option) and learning enough phonetics to have the vocabulary to discuss sounds as they are. It's not enough to just give feedback.

[1]: https://www.youtube.com/watch?v=-I7iUUp-cX8


> but it's impossible to confidently reproduce the sounds unless they are already completely distinct in our heads/ears

interestingly, i think this isn't always true -- i was able to coach my native-spanish-speaking wife to correctly pronounce "v" vs "b" (both are just "b" in spanish, or at least her dialect) before she could hear the difference; later on she developed the ability to hear it.


I had a similar experience learning Mandarin as a native English speaker in my late 30s. I learned to pronounce the ü sound (which doesn't exist in English) by getting feedback and instruction from a teacher about what mouth shape to use. And then I just memorized which words used it. It was maybe a year later before I started to be able to actually hear it as a distinct sound rather than perceiving it as some other vowel.


After watching the demo, my question isn't about how close it is to helping me learn a language, but about how close it is to being me in another language.

Even styles of thought might be different in other languages, so I don't say that lightly... (stay strong, Sapir-Whorf, stay strong ;)


I was conversing with it in Hinglish (a combination of Hindi and English), which folks in urban India use, and it was pretty on point apart from some use of esoteric Hindi words, but I think with the right prompting we can fix that.


In the "Point and learn Spanish" video, when shown an Apple and a Banana, the AI said they were a Manzana (Apple) and a Pantalón (Pants).


No, I just watched it closely and it definitely said "un plátano".


I rewatched it a few times to ensure it said plátano before posting, and it honestly doesn't sound like it to me.


I'm a Spaniard and to my ears it clearly sounds like "Es una manzana y un plátano".

What's strange to me is that, as far as I know, "plátano" is only commonly used in Spain, but the accent of the AI voice didn't sound like it's from Spain. It sounds more like an American who speaks Spanish as a second language, and those folks typically speak some Mexican dialect of Spanish.


> "plátano" is only commonly used in Spain

The wiktionary page for "plátano" has a map illustrating how various Spanish-speaking countries refer to the banana.

https://en.wiktionary.org/wiki/pl%C3%A1tano#/media/File:Porp...

My principal association with plátano is plantain, personally, but I am not a Spanish speaker.


I was about to comment the same thing about the accent. Even to my gringo ears, it sounds like an American speaking Spanish.

Plátano is commonly used for banana in Mexico, just bought some at a Soriana this weekend.


Interesting, I was reading some comments from Japanese users and they said the Japanese voice sounds like a (very good N1 level) foreigner speaking Japanese.


I thought "plátano" is only used for plantains in Latin America, and Cavendish is typically called "banana" instead. I'm likely wrong, though.


At least IME, and there may be regional or other variations I’m missing, people in México tend to use “plátano” for bananas and “plátano macho” for plantains.


In Spain, it's like that. In Latin America, it was always "plátano," but in the last ten years, I've seen a new "global Latin American Spanish" emerging that uses "banana" for Cavendish, some Mexican slang, etc. I suspect it's because of YouTube and Twitch.


In Spain, plátano is used for Cavendish and plantains are rarely consumed. I am a Spaniard.


I'm from Colombia and mostly say "plátano".


Good to know. I thought Colombians said "banano". That's what a Colombian friend of mine says.


Plátano is used in several Spanish-speaking countries, such as Mexico and Chile.


The Italian output in the demo was really bad.


I'm a native Italian speaker, it wasn't too bad.


The content was correct but the pronunciation was awful. Now, good enough? For sure, but I would not be able to stand something talking like that all the time.


Do you not have to work with non-native speakers of whatever language you use at work?


Most people don't: you either speak with native speakers or you speak English, because international teams use English rather than any member's native language, even if nobody speaks English natively. So it is rare to hear a broken language other than English.

And note that understanding broken language is a skill you have to train. If you aren't used to it then it is impossible to understand what they say. You might not have been in that situation if you are an English speaker since you are so used to broken English, but it happens a lot for others.


Why would you say "really bad"?


It doesn't have hands.


"I Have No Hands But I Must Scream" -Italian Ellison


This was the best joke I’ve heard this year.


So good!


Joke of the day right there :-)


Which video title is this?


Found it in a reel, I’m guessing it’s in the keynote: https://www.instagram.com/reel/C662vlHsGyx/

The Italian sounded good to me.


It sounds like a generic Eastern European who has learned some Italian. The girl in the clip did not sound native Italian either (or she has an accent that I have never heard in my life).


also wondering.


Shared in a reply to my comment.



