
Cutlet: A Japanese to Romaji Converter in Python - polm23
https://www.dampfkraft.com/nlp/cutlet-python-romaji-converter.html
======
Hamuko
> _One of my main motivations for making this library was dealing with the
> frequent case where using Japanese text isn't an option for technical
> reasons, or it is an option but comes with downsides._

The annoying thing is that the Japanese are equally guilty of this. I live in
Finland so Ä/Ö are used; I have never ever found a single Japanese online
store that accepts Ä/Ö in input (not even Amazon.co.jp). Add a name and
address and you're suddenly given a "Please enter only English characters"
error.

You'd think the Japanese of all people would know how annoying it is when
systems accept only a very limited number of character sets.

~~~
claudeganon
“English characters” are a part of the Japanese written language at this
point, so it makes sense from that perspective. They’re consistent in
supporting their own written language, far less so any others.

~~~
Hamuko
Not really. I'm using Amazon.co.jp in English and I can't enter my address in
katakana. Full-width ASCII is also out. Only A-Z here.

It's kinda strange since Amazon.co.jp is really easy and handy for
international shopping since you can even use it in English. They even
precollect the Finnish VAT and DHL ships stuff from there super fast (package
leaves Japan on a Friday, is at my door on Monday).
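
For a form that really does accept only A-Z, the usual workaround is to fold the input before submitting it. The following is a minimal sketch using only Python's standard `unicodedata` module; the function name `fold_to_ascii` is made up for illustration, and whether dropping the dots from Ä/Ö is acceptable for a given name or address is an assumption each user has to judge.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Best-effort fold of Latin text to plain ASCII (hypothetical helper)."""
    # NFKC maps full-width forms such as "Ａ" to their ordinary ASCII letters.
    text = unicodedata.normalize("NFKC", text)
    # NFKD splits "Ä" into "A" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base letters.
    # Non-Latin text (kana, kanji) is not handled here; the caller can
    # detect leftovers with str.isascii().
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_to_ascii("Mäkelänkatu"))  # Finnish street name with Ä
print(fold_to_ascii("Ｈａｍｕｋｏ"))      # full-width ASCII
```

This loses information (Ä and A become the same letter), which is the same trade-off romaji makes for Japanese.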

~~~
claudeganon
I agree that it’s silly Amazon Japan doesn’t support this, but was speaking to
the broader point that “Japanese, of all people, should know...”

Failing to provide support or consideration to out-groups is a quite common
feature of Japanese society. Difficulties arising from those situations tend
to be blamed on those on the receiving end of them for being “other.”

------
wodenokoto
I just want to chime in and say that the author knows what he is talking about
when it comes to Japanese and NLP.

Like many other commenters, I am also not sure why you’d want “katsu”
transliterated to “cutlet”, but the author didn’t choose to have a tool to do
this out of ignorance of the Japanese language.

~~~
ihadfun
Because katsu means cutlet, a thin slice of meat. Tonkatsu for example is pork
cutlet.

~~~
wodenokoto
Transliteration is not translation.

~~~
sheepdestroyer
From Wikipedia, that's technically and etymologically correct:

"The cutlet was introduced to Japan during the Meiji period, in a Western
cuisine restaurant in the fashionable Ginza district of Tokyo. The Japanese
pronunciation of cutlet is katsuretsu.

In Japanese cuisine, katsuretsu or shorter katsu is actually the name for a
Japanese version of the Wiener schnitzel, a breaded cutlet. Dishes with katsu
include tonkatsu and katsudon."

~~~
wodenokoto
TIL, but in my defense, an abbreviation of a transliteration, which has come
to mean something slightly different should probably be considered a new word
and not be transliterated back into its etymological ancestor.

------
aikinai
This is amazing! Years ago I wrote a script to convert my music library to
romaji for my car that would show non-English characters as ?????. So many
titles were really just English titles in katakana that come out as gibberish.
I had my own hacky exception override dictionary and the script ended up
being like 70% exceptions.

If Cutlet had been around, this would have been exactly what I needed.
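
The "exception override dictionary" approach described above can be sketched roughly as follows. This is not Cutlet's implementation or API: the `OVERRIDES` and `KANA` tables and both function names are hypothetical, the kana table covers only the demo words, and the input is assumed to be split into words already (a real script would need a tokenizer for that).

```python
# Hypothetical override table: katakana loanwords mapped to source spellings.
OVERRIDES = {
    "カツ": "cutlet",
    "カレー": "curry",
}

# Deliberately tiny kana table, just enough for the demo words below.
KANA = {
    "カ": "ka", "ツ": "tsu", "レ": "re", "ト": "to", "ン": "n",
}


def kana_to_romaji(word: str) -> str:
    """Romanize one katakana word character by character."""
    out = []
    for ch in word:
        if ch == "ー" and out:
            # Long-vowel mark: repeat the vowel of the previous syllable.
            out.append(out[-1][-1])
        else:
            # Unknown characters become "?", like the car stereo did.
            out.append(KANA.get(ch, "?"))
    return "".join(out)


def romanize(tokens: list[str], use_overrides: bool = True) -> str:
    """Romanize pre-split words, preferring an override when one exists."""
    parts = []
    for tok in tokens:
        if use_overrides and tok in OVERRIDES:
            parts.append(OVERRIDES[tok])
        else:
            parts.append(kana_to_romaji(tok))
    return " ".join(parts)


print(romanize(["カツ", "カレー"]))                       # with overrides
print(romanize(["カツ", "カレー"], use_overrides=False))  # plain romaji
```

Keeping the overrides behind a flag lets the same script produce either the source-language spelling or the plain romaji reading.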

------
asutekku
While I’m not a huge fan of romaji itself, I disagree with converting foreign
words from their Japanese spelling to their original spelling.

It might be easier for a foreigner to understand the word, but at the same
time no Japanese person will ever understand if you talk about cutlet curry
instead of katsukaree. And that’s something I’d guess this would be used for:
reading something out loud you don’t know how to read.

~~~
hakka-nyu-su
I also wonder if the author had non-English origins of words in mind.

For example, how many people would find it useful to convert アルバイト from
"arubaito" to "Arbeit", especially when the Japanese word has a different
connotation from the German (part-time work vs. occupation)?

~~~
knolax
Or if it's a word where the source language doesn't use the Latin Alphabet. I
have low confidence in the accuracy of a Tamil ---> Katakana ---> English
conversion.

------
KayL
I've recently been working on a tool and found it difficult to get `fugashi`
(the engine behind this tool) to work with `deplacy`.

~~~
polm23
Hello, thanks for trying fugashi. I'm the developer. Note that the most recent
version of spaCy, 2.3, uses sudachipy instead of fugashi, so maybe that's the
source of your problem. If you are having trouble with fugashi, always feel
free to open an issue (and note it's fine to write in Japanese).

------
woodandsteel
I'm an American and a few years ago I started getting into Japanese musicians
(Senri Kawaguchi, Kanade Sato, Yoyoka Soma, Rie Suzaku, Juna Serita and
others). This got me into reading a little about how the Japanese write and
it's crazy.

They have four different systems. There is Kanji which is from Chinese. There
are two different systems that are phonetic but based on syllables rather than
single sounds. And finally they use the Roman script like we do. It must be
really hard on the elementary school kids who are trying to learn this all.

------
junar
Obviously, language learners should never use this. They should take a week or
two to learn kana, and use a furigana tool instead.

I'm also pretty unconvinced of the article's stated purpose of "readable" URL
text. I spot checked a few Japanese news sites (Yomiuri, Asahi, Mainichi), and
none of them try to do this. The URLs just have random alphanumeric article
IDs. I don't think romaji is valuable for most readers, who probably find it
more effort to decipher it than to simply read the Japanese text in the title.

~~~
Zarel
> _I'm also pretty unconvinced of the article's stated purpose of "readable"
> URL text._

Here's an example:

[https://magazine.jp.square-enix.com/biggangan/](https://magazine.jp.square-enix.com/biggangan/)

(Notice: "big gangan" rather than "biggu gangan".)

While it's true that Japan tends to use numbers for dynamic content URLs, this
is more about (and using CMSes that require them) than users actually
preferring them.

Japanese URLs do frequently tend to be in English, though.

~~~
junar
I'm not very familiar with CMSs, but what kind of use case would actually
benefit from the tool?

Your example actually requires human decision-making, since Big Gangan is a
proper noun with a canonical English name. I expect most Japanese devs to be
as familiar with romanization as English devs are with spelling and grammar,
so a tool shouldn't be needed unless you're dealing with a large amount of
text and can allow for errors.

------
reedwolf
Related:

What Python would look like in Japanese:

[https://www.reddit.com/r/ProgrammingLanguages/comments/g9iu8...](https://www.reddit.com/r/ProgrammingLanguages/comments/g9iu8x/concept_art_what_might_python_look_like_in/)

------
rootsudo
Not the best because many kanji are transliterated the same way into romaji
and lose their meaning.

For web dev, I think it's probably fine.
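
To make the homophone problem concrete, here is a small sketch that groups a few words by their romaji reading. The `READINGS` table is a hand-picked illustration (common readings only, no pitch accent or vowel length), not output from Cutlet or any dictionary.

```python
from collections import defaultdict

# Hand-picked examples: word -> one common reading in romaji.
READINGS = {
    "橋": "hashi",  # bridge
    "箸": "hashi",  # chopsticks
    "端": "hashi",  # edge
    "雨": "ame",    # rain
    "飴": "ame",    # candy
    "猫": "neko",   # cat
}


def collisions(readings: dict[str, str]) -> dict[str, list[str]]:
    """Return the romaji spellings shared by more than one word."""
    by_romaji = defaultdict(list)
    for word, romaji in readings.items():
        by_romaji[romaji].append(word)
    return {r: words for r, words in by_romaji.items() if len(words) > 1}


for romaji, words in collisions(READINGS).items():
    print(romaji, "<-", ", ".join(words))
```

Once the text is reduced to "hashi" or "ame", only context can tell the reader which word was meant.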

------
ekianjo
That's not a converter if you mix both translation and 'kanji to romaji' at
the same time in the same word. That's like mixing two sets of tools in one
when only one was needed.

