Hacker News

> But the pronunciation (/ˈsʌbtaɪtəls/ vs. /sub'titulos/) is very different (Levenshtein distance = 5).

Levenshtein distance between conventionalized IPA orthographic representation is not a useful metric for the perceptual difference between two spoken sound sequences.

> Written language is reasonably standard. While there can be some regional variation (color/colour, lift/elevator and so on); you get rid of all the diversity of accents which can be a huge deal for a non-native speaker.

This problem actually comes up in writing too; I knew some Chinese students who were really offended by English typos. They had a point - lacking the pre-existing knowledge of English that lets me know what a typo was meant to say, they just had no way of finding out what a misspelled word was supposed to mean. You can't look up words that don't exist.



"You can't look up words that don't exist."

Erm, just google them and take the recommended result?

In general I think searching with misspelled words is largely a solved problem.


> Levenshtein distance between conventionalized IPA orthographic representation is not a useful metric for the perceptual difference between two spoken sound sequences.

I know. I just wanted to provide some metric for the quantitatively minded, because the example is hard to follow if you don't speak Spanish or read IPA, and I don't know of a better quantitative metric. But qualitatively speaking, I can tell you that inferring the meaning of that English word from its written form is obvious even to a Spanish person with zero knowledge of English, while inferring it from the pronunciation is very difficult. It's just one example, but the same thing happens with many words.
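For anyone who wants to check the figure, here is a minimal sketch of the standard dynamic-programming Levenshtein distance, counting one edit per inserted, deleted, or substituted character. The function name is mine, and stripping the stress marks before comparing is an assumption about how the 5 was counted; with the marks left in, the count comes out higher.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the prefix of a processed
    # so far and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


# Transcriptions from the comment, with the stress marks removed
# (an assumption, see above).
english = "sʌbtaɪtəls"
spanish = "subtitulos"
print(levenshtein(english, spanish))  # 5
```

This treats every IPA symbol as equally different from every other, which is exactly why it says little about how different the two words sound.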

> This problem actually comes up in writing too; I knew some Chinese students who were really offended by English typos. They had a point - lacking the pre-existing knowledge of English that lets me know what a typo was meant to say, they just had no way of finding out what a misspelled word was supposed to mean. You can't look up words that don't exist.

It can come up, but the normal frequency of typos is still much, much lower than the frequency of accent-specific pronunciation variants, which typically change several words per sentence. And with Google you can look up many minor typos and it will suggest the correct form (although not for things like using "of" instead of "have", of course).



