Non-English-based programming languages (wikipedia.org)
77 points by steveklabnik on Sept 2, 2013 | 75 comments

Creating a non-English-based programming language is very different from translating an English one. Different languages have different ways to express time, state, etc. If one translates an English programming language, the result is a translated English-based programming language (not a Non-English base...)

I'm a native Portuguese speaker, for example. In Portuguese, we split the verb "to be" into two forms: "ser" denotes an immutable state, as in "The sun is hot", and "estar" denotes a transitory state, as in "It is hot today".

My guess is that, if programming languages had been built from the beginning on a language like that, the coding world would be a more pleasant place today. Things like differentiating mutable from immutable state would be much more natural.

So it's even possible to create a Non-English based programming language IN English. Russian, Japanese, etc. have their own peculiarities that could shape coding very differently from what it is today.

Ruby is a nice example of it. I just googled the subject and found this material (http://blog.new-bamboo.co.uk/2010/12/17/learning-japanese-th...)

Any Japanese-speaking Rubyist who would like to share some thoughts on the subject?
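
To make the ser/estar idea a bit more concrete, here is a minimal Python sketch of an environment with two binding verbs. Everything in it (the `Scope` class, the `ser` and `estar` method names, the error messages) is made up for illustration and is not taken from any existing language; it only assumes that "ser" means "bind once" and "estar" means "bind and rebind freely".

```python
class Scope:
    """Toy environment with two kinds of binding, named after the two verbs."""

    def __init__(self):
        self._values = {}
        self._permanent = set()

    def ser(self, name, value):
        # "ser": a permanent fact; it may be stated once and never changed.
        if name in self._values:
            raise ValueError(f"{name!r} is already bound")
        self._values[name] = value
        self._permanent.add(name)

    def estar(self, name, value):
        # "estar": a transitory state; it may be updated freely.
        if name in self._permanent:
            raise ValueError(f"{name!r} was bound with ser and cannot change")
        self._values[name] = value

    def get(self, name):
        return self._values[name]


scope = Scope()
scope.ser("sun", "hot")        # The sun is hot (always)
scope.estar("today", "hot")    # It is hot today
scope.estar("today", "cold")   # fine: a transitory state may change

try:
    scope.estar("sun", "cold")  # rejected: "sun" was bound with ser
    changed = True
except ValueError:
    changed = False
```

The point is only that the choice of verb carries the mutability information, so no separate `const` or `final` marker is needed.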

Interesting article, maybe a bit far-fetched towards the end. Correct link:


As another Portuguese speaker, I don't think that "ser" is all that immutable; it describes plenty of non-intrinsic and/or mutable properties. In fact, the verb is used extremely often in the past tense, which shows that in Portuguese objects often change what they "are".

I ended up on the (Chinese-language) page for 丙正正¹, a Chinese C++ variant. There was a code example, which I ran through Google Translate to see what would happen. The result is surprisingly readable (and obviously a C-family language)²:

  Empty chess file :: set comments (character * s, integer n)
         If (n> = maximum number of comments)
         For (; maximum number of annotations <= n; maximum number of comments + +)
                 Comment [Maximum number of annotations] = NONE;
         If (s == NULL or the string length (s) == 0)
         If (annotation [n]! = NONE)
                  Delete Comment [n];
         Comment [n] = new character [string length (s) +1];
         String Copy (annotation [n], s);

¹ http://zh.wikipedia.org/wiki/%E4%B8%99%E6%AD%A3%E6%AD%A3

² empty where one might expect void, character, for, string, &c.

In Korean:

  체스::리플달기 (캐릭터 * ㅅ, 숫자 ㄴ)
     면 (ㄴ >= 최고리플번수)
     용 (; 최고리플번수 <= ㄴ;)
     면 (ㅅ == 무 아니면 문자 길이 (ㅅ) == 0)
     면 (리플 [ㄴ]! = 없음)
        리플 지우기 [ㄴ];
     문자 복사 (리플 [ㄴ], ㅅ);
What's interesting about the Korean alphabet is that you could read and write the code above within a few days of memorizing it. Even a non-native speaker can read the above example: about 70% of it is readable, since most words are phonetic spellings of English words (Chess = 체스), and you could write it without speaking a word of Korean if you knew the consonants and vowels. There are very few actual Korean words in that example.

Good luck writing it in Chinese (memorize all 4000 characters) or Japanese (Kanji, Katakana, Hiragana: a clusterfck). If you're going to build an Asian programming language, the Korean alphabet's flexibility makes it easier, if not more efficient, to express developer intent.

The majority of the English words have been typed phonetically in Korean; there's very little semantically Korean content.

Now that's just fucking horrifying. D:

I have a translation of "Design of the Unix Operating System" in Chinese, and take great comfort in the fact that I can still get the gist of all the source code listings--even despite the comments in Chinese.

I admire the simplicity of the grammar in Chinese (from the year I took of it in college), but honestly I find logographic languages are kind of gross.


Fine, fine, I admit it: a language with millennia of cruft is totally reasonable to use as the way of persisting the cruftiest programming language in the world.

Indeed, the same years of hard study that are required to write and read Chinese literately should be added onto the same years of study required to write and read C++ reliably.

This is such a comically bad and obtuse idea I think we should propose it as the next draft standard.


Look, explain your downvotes (in the language of your choice!). My opinion is simply that alphabetic languages (here exemplified by English) are superior to logographic languages--mostly because they require knowing fewer characters.

I may be grossly misunderstanding Chinese here; as claimed, my schooling in it is limited.

Chinese characters (and similar writing systems) do have certain advantages versus alphabetic systems though. The smallest unit of writing has more information embedded in it. You've got a decent chance of guessing the meaning of a compound word if you know some of the characters in it, though not necessarily the pronunciation. With alphabets, it's reversed, if you know all of the pieces of a word, pronunciation is usually simple, but meaning not necessarily so (especially in English). As a result, the written language is more dense, and can actually be read more quickly.

Finally, with an appropriate method, it doesn't have to take years to learn enough characters to write and read literately, I can read Japanese at a high school level, and I've been at it for a year and a half.

So, Japanese is a uniquely bad example here, right?

As I understand it, there are three alphabets: kanji (lots of Chinese characters), katakana, and hiragana. The latter two are used to spell out the syllables of words, in some sense acting like an alphabetic language. Kanji seems to have a thousand or two characters in use, whereas katakana and hiragana have around fifty.

Beyond looking up stroke numbers and radicals, I found dictionary usage for Chinese characters somewhat hard--English lets you basically do a very easy binary search on a word (start at most significant character, find section, move to next most significant character, etc.).

Bad in what way exactly? Hiragana and Katakana are easier to pick up, but then more burden is placed on learning each individual word. With the baseline investment to learn the Kanji in place, each new word is just a composition of characters and their associated ideas that you already know.

You can do the same thing with dictionaries in Japanese or Chinese, though it works best if you use a dictionary that lets you handwrite in the characters (it helps a lot if you learn your radicals and stroke orders well, so that you can easily write characters you don't know).

Oh, so, my point was that using Japanese was a bad example, precisely because two of the three alphabets are used nonlogographically. It is also my understanding that new words and loanwords are spelled out phonetically in those alphabets, instead of grafting some new character into the kanji.

"if you use a dictionary that lets you handwrite in the character"

I'm unfamiliar with any paper dictionary with that capability.

I mostly use the dictionary on my phone. I can get by with a paper one, but it's a bit slower.

Technology is very handy nowadays. :)

"Chinese characters (and similar writing systems) do have certain advantages versus alphabetic systems though."

You end up with a much lower literacy rate than countries with alphabetic systems, e.g. China at 92% vs Korea at 99%.

Hangul was created 600 years ago to counter the difficulties ordinary citizens faced in memorizing Chinese characters. Hangul is also by far one of the more exotic alphabet systems. This video sums it up nicely:


Hangul is great, but Japan has a 99% literacy rate as well. With a good method, e.g. Heisig, learning the characters is not very difficult at all.

geraffes are so dumb

> Perl – While Perl's keywords and function names are generally in English, it allows modification of its parser to modify the input language, such as in Damian Conway's Lingua::Romana::Perligata module, which allows programs to be written in Latin or his Lingua::tlhInganHol::yIghun Perl language in Klingon. They do not just change the keywords but also the grammar to match the language.

That's impressive.

As an example just how far they go, the statement:

clavis hashus nominamentum da.

Is equivalent to [1]:

@keys = keys %hash;

And in Klingon Perl [2]:

De'pu'wI' bIH yInob!

Is equivalent to:

%data = @_;

[1] https://metacpan.org/module/DCONWAY/Lingua-Romana-Perligata-...

[2] https://metacpan.org/module/Lingua::tlhInganHol::yIghun

I think it's an awful idea. Let's take Cyrillic languages, Ukrainian/Russian as an example.

* Alphabet has 33+ letters, so there is no space left for programmer's favorite symbols on layout: @#$^&{}[]|~`<>. Yep.

* In Math variables' names traditionally are Latin/Greek letters, Cyrillic is used only when teaching kids. Again: switching layout all the time? No, thanks.

* Words are simply much longer. And I mean much. In English even short words tend to become shorter (variable -> var). In Russian no such pattern exists. `var-set` -> `установить-переменную`. `var-get` -> `получить-переменную`.

* Words are morphologically stable in English, e.g. `last-msg-delivered` - `last-msgS-delivered` -> `последнЕЕ-сообщениЕ-доставленО` - `последнИЕ-сообщениЯ-доставленЫ` - if you change gender or go from singular to plural etc., you have to change several words in a row.

Seriously, it seems to me like trying to make things more complicated.

I have 104 keys on my keyboard and 4 of them are modifiers. The English alphabet isn't the Latin/Greek alphabet either. And I am absolutely astounded that Russians apparently have never invented the concept of an abbreviation or contraction.

One does wonder how soviets ever managed to program their computers.

As an aside, if you (the reader, not Avshalom) haven't read much Russian literature, Russian is the language where you take the first syllable of every word in a long phrase and you make a new word:

Министерство здравоохранения => Минздрав (Ministry of Health => Minheal)

Министерство юстиции => Минюст (Ministry of Justice => Minjust)

etc, etc. As a foreigner, I found that quite peculiar though it doesn't seem to be as common now as it was before but it's still noticeable in everyday life.

Ah, so that's where that came from in 1984. It all makes sense.

Yes, it is indeed inspired by that.

...and nowadays in the US: HomeSec

Don't worry it seems peculiar even for native like me :-) It's a bit less common now as it feels like Soviet era attribute (think Orwell's новояз (newspeak))

> I have 104 keys on my keyboard and 4 of them are modifiers.

Yep, that's one way. But it makes things harder. If you set up Alt as the English-layout modifier, then to type {} you need to press two modifiers: Alt + Shift. However, I agree it's not the main issue.

> The English alphabet isn't the Latin/Greek alphabet either.

At least it's Latin.

> And I am absolutely astounded that Russians apparently have never invented the concept of an abbreviation or contraction.

Of course such things do exist... they just don't work very well. English's short words are a relatively distinctive feature, which it developed by being a mix of Latin/French/Germanic. Russian words are usually complex. Var - переменная - prefix пере- + root мен. Yes, you can contract it to `перем.` But that also reads like a contraction of `перемещенная`, which is `moved`, or `перемещать`, which is `move`, or `переменный`, which is `variable (meaning: alternating)`, and so forth.

Sorry, but the thing about the layout is BS. I code with a German layout. You have to use the Alt-Gr modifier a lot, but in the end it's not harder to use than a shift key.

You may call my opinion BS but it is harder due to the fact that additional letters take space off the symbols. Examples (assuming you use AltGr)

    Symbol   English      Russian
    ,        ,            Shift + .
    [        [            AltGr + х
    {        Shift + [    AltGr + Shift + х
See pattern? Though I agree with you - it's not a dealbreaker.

OK, two modifiers are probably more annoying. Is the Alt-Gr plane not big enough? I have several symbols there that I never need, like ðæł¶ŧ←↓→øħħ̣ĸ . Are there essential Cyrillic characters?

No, AltGr is basically unused, though I do use it to type Russian letters on top of the Ukrainian layout, which differs in just a few letters: іы їъ єэ ґ :-D

I could set up a lot of symbols through AltGr, but they mostly wouldn't match the English layout. And I don't need to, because the languages I program in are English-based.

If I knew German, I would exclusively use the German layout, as you apparently do. But I actually need four: En, Eo (not much, but still), Uk, Ru - the last two every day. So I contracted them to two: Latin/Cyrillic. :)

There's a world of difference between German layout (with only a few extra letters) and Cyrillic layout (which completely replaces Latin letters).

And yet there's https://en.wikipedia.org/wiki/1C:Enterprise, a leading software suite in Russia. Its source code terrifies me every time.

Indeed. The one thing worse than programming with Excel formulas.

My rather simplistic view: as a native Portuguese speaker, I strongly reject the idea of programming languages (and even coding) in a language other than English. Besides the obvious reasons (globalized world, outsourcing, multinational corporations, etc), English is much less expressive than the Romance languages [1], which makes it a better formal language.

1 - http://en.wikipedia.org/wiki/Romance_languages

I think you'd have a very difficult time convincing a serious linguist that English is less expressive than the Romance languages.

Especially because when English is less expressive than a Romance language, we steal those words from the Romance language until we're at least as expressive.

But it's much easier to create new words in English, like turning a verb into a noun, or an adjective, etc.

I'm a native Portuguese speaker, and I think that Romance languages can be more "emotional" than English. But it's amazing how easily a new idea can be expressed in English.

I'd contend the opposite, being fluent in German, English and having strong knowledge of Latin.

German and Latin can create new words or ideas much more easily than English can. English creates them 'easier' by importing them wholesale. Take the concept of 'karma': there is no word for this in English. In German this word is Schicksal. This concept doesn't exist in English at all other than the Hindi import.

English does like to import words wholesale, and I agree that German conjunctions and the fact you can often create new words by combining two different words (my favourite: "scheinheilig"="apparently+holy"=hypocrite) makes it easy to create new words in German (and other Germanic languages).

That said, your example is not a very good one. "Schicksal" can be translated as "fate", "fortune", "destiny" and "lot". Not just "karma".

I bet that'd be hard, ha!

As someone who is a native English speaker, but close to native level in Portuguese (lived in Brazil for 8 years), I'm curious what you mean by English being less expressive -- I've never heard that before.

Both languages have their expressive poets, authors, etc. It's easier to rhyme in Portuguese (it's almost like cheating, since the verbs all end in the same syllables), it's easier to modify the grammar classes of words in English (turning nouns into verbs and vice-versa), they both have rich vocabulary (although English seems to give more multiple meanings to individual words, and have more synonyms too). But in the end, I'd never call either of them more expressive than the other...

As a matter of fact, I know that English, as a natural language, is much more expressive in the sense that you can communicate many ideas using the same expressions.

Portuguese, for example, is much more expressive in the sense that you can communicate the same idea in many different ways, which may have different meanings.

There is also the contractions issue and the omission of the subject of an expression ("eu estava" == "eu tava" == "tava" == "I was"), which complicates it even more. And all of this is actually valid, depending on the linguist you "follow". ;)

Well, I did not make myself very clear, but this is what I meant when I talked about expressiveness.

PS: I wanted to write something about linguistic relativity too, but I can't remember, lol.

How many cases does English have? Apart from some artifacts like "whom" and "I" vs "me", about two.

It seems to me like using symbols instead of words makes the most sense. It wouldn't be particularly harder to use for English speakers, and it might actually make code easier to read, since there would be less cognitive load in distinguishing keywords from identifiers.

Yeah, the problems there are that the symbols don't have easy to remember names, and that they can't be typed in a normal environment. Multi-character ascii symbols would be a lot more accessible.

J seemed to be a good step in that direction.

Agreed. I'm also a native Portuguese speaker. Another point is that you get used to English and begin to "think" in it. I guess that, besides the whole globalized world thing, I would find a Portuguese programming language very confusing.

When I write code, it's code. I don't read it like a human readable text. Because of that, I don't really care what it reads when I try to read it as a human readable text.

For example I don't even think about what it would mean in English if I write

     while(true){ i++; if(i>100) break; }
I'm curious whether native English speakers sometimes look at code as real English text. It should be funny, because when I translate my code into my native language (word by word) it's just meaningless and funny.

For things like "while", "do", "return", "public", "private" I definitely think of it as "real English text".

For things like "for", "wend" (while-end), "class", "switch", "main", they're divorced enough from any real English meaning that I just think of them as arbitrary coding words.

So, it's kind of both, at least for me.

esac anyone? :P


Maybe it depends on the language? I could see something like this in python:

  if not callable(something): make_callable(something)
Which can (sort of) be read off in English.

I don't really understand what you mean with "real English text". I can't separate keywords from their true meaning, and it's much easier for me to understand the purpose of a keyword if its purpose matches its name.

Every statement has a meaning, and every block is a story. If it wasn't readable we could just as well use BANCstar [1].

(Btw, I absolutely hate that Microsoft translates Excel formulas (and so on) in localized versions. I simply cannot program in my native language: I basically translate back to English mentally.)

[1] https://news.ycombinator.com/item?id=6311717

An ambition of mine is a programming language where the program structure is decoupled from its visual/textual appearance (model-view separation). One nice result of this scheme is, for example, you could work on a program in English and another person could collaborate on the exact same source in French.

Although I suspect that if this did get written and translated around a lot, many non-English speakers would still pick the English surface syntax due to network effects...

(Edited for clarity)
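
A rough Python sketch of what that model-view separation could look like: the program is stored as a keyword-free tree, and each natural language is just a table used when rendering it. The node kinds, the `SURFACES` table and the French keyword spellings are all my own assumptions for illustration, not an existing system.

```python
# Program structure: nested tuples. No keyword text is stored in the model.
PROGRAM = ("if", ("gt", "x", 0), ("print", "x"), ("print", 0))

# One "view" per natural language (hypothetical keyword choices).
SURFACES = {
    "en": {"if": "if", "then": "then", "else": "else", "print": "print"},
    "fr": {"if": "si", "then": "alors", "else": "sinon", "print": "afficher"},
}


def render(node, lang):
    """Turn the language-neutral tree into text using one keyword table."""
    words = SURFACES[lang]
    if not isinstance(node, tuple):
        return str(node)  # identifiers and literals are shown as-is
    kind = node[0]
    if kind == "if":
        _, cond, yes, no = node
        return (f"{words['if']} {render(cond, lang)} {words['then']} "
                f"{render(yes, lang)} {words['else']} {render(no, lang)}")
    if kind == "gt":
        return f"{render(node[1], lang)} > {render(node[2], lang)}"
    if kind == "print":
        return f"{words['print']} {render(node[1], lang)}"
    raise ValueError(f"unknown node kind: {kind}")


english = render(PROGRAM, "en")  # "if x > 0 then print x else print 0"
french = render(PROGRAM, "fr")   # "si x > 0 alors afficher x sinon afficher 0"
```

Note that only the keywords change between the two views; user-chosen names like `x` stay whatever the original author typed.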

AppleScript works like this. When you save a script it's compiled down to a system representation, and when you reopen it it's transformed back into the English script language. This means:

* Formatting isn't preserved and the file is retabbed each time you save it.

* Earlier versions had French and Japanese compilers/decompilers

* Each application can add AppleScript syntax (like VBA/OLE or whatever). If a script uses commands for an application you don't have installed, you see the raw FourCC codes in the code instead.

It's actually a quite interesting language. http://en.wikipedia.org/wiki/AppleScript

TI-Basic 68k (the one used on the TI-89, TI-92, and Voyage 200 series calculators) also does this, except formatting is preserved.

It also uses the same file type for tokenized and untokenized programs: untokenized programs are tokenized when run, and tokenized programs are untokenized when opened with the built-in text editor. Short of opening the file with some external program like a hex editor, there's no way to tell whether the program is tokenized or not. Which, of course, can cause problems if you send an untokenized program to a calculator set to a different language.

I believe that you described lisp :)

I get the joke :) Although, taken literally, this is not the case. The existence of a variable in Lisp is not decoupled from its human-string-name, as Lisp source is textual. If you were to rename a variable, anything that referred to it (binding sites) would be broken. The human names in Lisp source code are crucial to program execution, unlike, for example, compiled native binaries without dynamic linking.

Symbol macros.

How would that work for human-defined names? The second programmer would see a jumble of words in different languages (e.g., keywords in native language, variables and function names in the original). Even if they could be translated, that would still require the programmer to perfectly understand them in the first place, and then, why bother wasting the time to translate?

Frankly, as a non-native speaker, I absolutely dread localized source.

But I think there is only one programming environment that supports multiple natural languages: turtleacademy.com, by Lucio (a friend of mine).

The turtle academy is a programming environment with lessons for teaching kids Logo.

For professional programming, a non-English programming language is just a curiosity; none of them has ever gained global traction, nor probably will.

For kids, for learning, supporting multiple languages is a must.

English: http://turtleacademy.com/lang/en

Russian: http://turtleacademy.com/lang/ru

Hebrew : http://turtleacademy.com/lang/he

Spanish: http://turtleacademy.com/lang/es

Chinese: http://turtleacademy.com/lang/zh

And we are looking for volunteers to translate to further languages.

I have a dual-language database publishing system, with English and German as source languages. Despite the obvious advantage of using English as a lingua franca, I tend to use the version in my mother tongue (German). It feels a bit more natural to me and it's really nice to use in presentations to non-technical people. It seems as if the barrier to using the language is much lower than when the programming language is English.

http://speedata.github.io/publisher/manual/index.html - Switch to the other language in the footer

I would add my friend's hpy, which lets you write in Hebrew (compiles into Python): http://nirs.freeshell.org/hpy/

This is great. I am working on a simple Arabic (actually Moroccan Arabic) programming language to help popularize programming in some remote schools I work with. Mind helping me reach out to your friend about how this has been used and what challenges came up in general?

It's in the list. More Hebrew: T_PAAMAYIM_NEKUDOTAYIM

I think that English words tend to be very short, so in English one has to type less. Having concise commands is an important feature for a programming language.

I don't know if it is an urban legend, but I read that words in English are shorter than in Japanese, so in WW2 the English language was better for shouting out orders: less time for communication means more time for action.

So disappointed in no Varaq[1].

[1] http://freecode.com/projects/varaq

It's on Wikipedia; you can add it yourself.

Not anymore, since they require accounts now.

With most languages you can already use localised identifiers, nowadays even in non-Latin alphabets thanks to UTF-8. And for the keywords, you can most often use cpp or another preprocessor to substitute them.
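
Both halves of that can be tried in Python, which accepts non-Latin identifiers out of the box. Below is a minimal sketch that uses the standard `tokenize` module as the preprocessor instead of cpp, so only real name tokens are rewritten and string contents are left alone. The Russian keyword spellings in `KEYWORDS` and the sample function are my own picks for illustration, not any established dialect.

```python
import io
import tokenize

# Hypothetical Russian spellings for a few Python keywords.
KEYWORDS = {
    "функция": "def",
    "пока": "while",
    "если": "if",
    "иначе": "else",
    "вернуть": "return",
}


def translate(source):
    """Replace localized keywords with Python's own; keep identifiers as-is."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME and text in KEYWORDS:
            text = KEYWORDS[text]
        tokens.append((tok.type, text))
    # With (type, string) pairs, untokenize rebuilds valid source,
    # although the spacing differs from the input.
    return tokenize.untokenize(tokens)


SOURCE = (
    "функция сумма_до(предел):\n"
    "    итог = 0\n"
    "    шаг = 1\n"
    "    пока шаг <= предел:\n"
    "        итог = итог + шаг\n"
    "        шаг = шаг + 1\n"
    "    вернуть итог\n"
)

namespace = {}
exec(translate(SOURCE), namespace)
result = namespace["сумма_до"](4)  # 1 + 2 + 3 + 4
```

This only swaps keywords one-for-one; it does nothing about word order or grammar.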

But the summum is reached with perligata.

I wonder how programming language based on fusional natural languages with cases could work.

Some ideas (warning - long and useless read):

- everything is an expression

- expressions have Cases. Default case (without postfixes) is Source Case

- cases are like roles for parts of expression in given complex expression context (allow us to change order of arguments in expressions without named parameters for example, and to overload functions/macros depending on Cases of the supplied arguments)

- you can define pre-, -post, and -in- fixes that work like macros/functions

- you can use pre/post fixes to tag expression with cases in the context of an expression

- you can use 1-arg functions/macros as pre and postfixes, or as regular functions

- you can use 2-arg functions/macros as infixes or as regular functions

- more than 2arg functions can only be used as regular functions

- you can overload functions/macros depending on case of args

- you can define your own cases


    % %1 %2 etc are unnamed arguments in
            macro/function definitions,
            like in lambdas in clojure
    "-" is used in pre/in/post fixed macros
        definitions to mark where the base
        identifier goes

    %-! is 1arg macro that defines variable of name %. 



    %-a is postfix macro that tags variable %
        with Target case
    %-i tags with Helper case
    %-p tags with PositiveConditional case
    %-n tags with NegativeConditional case

    %1a=%2 is infix 2arg macro that assigns to
           variable %1 (must be in Target case)
           value %2 (in Source case).
Can be used as regular macro:

    = %1a %2

 or equivalently

    = %2 %1a
We can also overload = to be equality operator if both args are in Source case


 works with regular macro too:

    = %1 %2


    = %2 %1
We can overload binary operators with 3rd and 4th args in PositiveConditional and NegativeConditional cases and we don't need IFs :)

This also allows us to write if with only else clause in the form:

    > x 0 (OUTPUTa="X SMALLER THAN 0!!!")n

    %1-;-%2 is infix macro that evaluates in sequence.
            It's overloaded for all cases, especially
            for %1a-;-%2a

    # is python-style comment

 Example code to calculate real roots of quadratic equation:

   a! b! c! delta!           # definitions of variables
   = INPUT aa ba ca          # overloaded = macro with
                             # many targets (a b c in Target case)
                             # and one source
                             # to easily fetch many items from input
   deltaa=((b^2)-4*a*c)      # precedence will be a problem
                             # to define generally

    = delta 0                # overloaded = equality test with
                             # 3rd arg being positive conditional case
                             # and 4th being negative conditional case
        = OUTPUTa "x0=" (-b/(2*a)) # overloaded = macro with one target
                                   # and many sources - will join the sources
      (> delta 0
           = OUTPUTa "x0=" (-b-sqrt(delta))/(2*a);
           = OUTPUTa "x1=" (-b+sqrt(delta))/(2*a)
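
The core trick here, picking an overload from the cases of the arguments rather than from their order, can be mimicked in a few lines of Python. This is only a sketch under my own assumptions: the `Arg` wrapper, the `target` helper and the `equals` function are invented names, and only the Target and Source cases of `=` are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arg:
    value: object
    case: str = "source"  # untagged expressions default to Source case


def target(name):
    """Tag a variable name with Target case (the '-a' postfix above)."""
    return Arg(name, "target")


def equals(env, *args):
    """One '=' with two meanings, chosen by the cases of its arguments."""
    targets = [a.value for a in args if a.case == "target"]
    sources = [a.value for a in args if a.case == "source"]
    if len(targets) == 1 and len(sources) == 1:
        env[targets[0]] = sources[0]     # assignment: "= xa 5" or "= 5 xa"
        return sources[0]
    if not targets and len(sources) == 2:
        return sources[0] == sources[1]  # equality test: "= 5 5"
    raise TypeError("no overload of '=' for these cases")


env = {}
equals(env, target("x"), Arg(5))
equals(env, Arg(7), target("y"))  # argument order does not matter
same = equals(env, Arg(env["x"]), Arg(5))
```

The conditional cases would need arguments that are evaluated lazily, which this sketch leaves out.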

Love it.

I know a bit of Finnish, and I can imagine that a programming language built on the principles of Finnish grammar would be beautifully expressive as well as extremely terse.

For example, maybe '-ss' denotes 'class'. Functions and methods take '-ef', with an optional return type '-int', '-ing' (string), '-oat' (float). Start a loop iterator by adding '-oop' to an iterable? Modified conditionals change their behaviour, e.g. 'ifret' (if/return), etc.

         ifret i == someValue
Looks strange, but if you're used to the concept I wonder if it would be more efficient.

I don't understand the urge to create a new language to make a statement instead of trying to improve on what already exists.

Creating a language to make a statement is probably a bad idea, but there are plenty of other valid reasons to create something new:

• You want to learn how languages work.

• You have a problem which is difficult to solve in other languages but easy in yours.

• You want to combine features that have never been combined before, to see if their interaction creates something of value.

I've been thinking of making a scripting language in the Korean alphabet, Hangul. The properties of Hangul allow you to compress multiple consonants and vowels into one block. For example, system.out.println("Hello"); would be 시.웃.플("안녕하세요"); in Korean. The Korean syllables themselves have no coherent meaning, but that is how you could distinguish the programming language from semantic Korean.
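
For anyone curious how that compression works mechanically: Unicode lays out the precomposed Hangul syllables arithmetically, so a block can be computed from its letters. A small Python sketch follows; the `compose` helper and the table names are mine, while the 19/21/28 layout starting at U+AC00 is the standard Unicode arrangement.

```python
# Letter orders as used by the Unicode Hangul syllable block.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, "" = none


def compose(lead, vowel, tail=""):
    """Pack an initial consonant, a vowel and an optional final into one syllable."""
    index = (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28 + TAILS.index(tail)
    return chr(0xAC00 + index)


first = compose("ㅅ", "ㅣ")         # two letters, one character
second = compose("ㅇ", "ㅜ", "ㅅ")  # three letters, still one character
```

These two calls rebuild the first two syllables of the 시.웃.플 example from their individual letters.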
