
Non-English-based programming languages - steveklabnik
http://en.wikipedia.org/wiki/Non-English-based_programming_languages
======
lucasrp
Creating a non-English-based programming language is very different from
translating an English programming language. Different languages have different
ways to express time, state, etc. If one translates an English programming
language, the result is a translated English-based programming language (not a
Non-English base...)

I'm a native Portuguese speaker, for example. In Portuguese, we "break" the
"to be" verb into two forms: the "ser" verb to denote an immutable state,
like "The sun is hot", and the "estar" verb to denote a transitory state, like
"It is hot today".

My guess is that if programming languages had been built from the beginning on
a language like that, the coding world would be a more pleasant place today.
Things like differentiating mutable from immutable state would be much more
natural.

So it's even possible to create a non-English-based programming language IN
English. Russian, Japanese, and other languages have their peculiarities, which
could shape coding very differently from what it is today.

Ruby is a nice example of this. I just googled the subject and found this
material ([http://blog.new-bamboo.co.uk/2010/12/17/learning-japanese-
th...](http://blog.new-bamboo.co.uk/2010/12/17/learning-japanese-th...))

Is there any Japanese-speaking Rubyist who would like to give some thoughts on
the subject?

~~~
rett12
Interesting article, maybe a bit far-fetched towards the end. Correct link:

[http://blog.new-bamboo.co.uk/2010/12/17/learning-japanese-
th...](http://blog.new-bamboo.co.uk/2010/12/17/learning-japanese-the-rubyist-
way)

------
quarterto
I ended up on the (Chinese-language) page for 丙正正¹, a Chinese C++ variant.
There was a code example, which I ran through Google Translate to see what
would happen. The result is surprisingly readable (and obviously a C-family
language)²:

    
    
      Empty chess file :: set comments (character * s, integer n)
        {
             If (n> = maximum number of comments)
             For (; maximum number of annotations <= n; maximum number of comments + +)
                     Comment [Maximum number of annotations] = NONE;
             If (s == NULL or the string length (s) == 0)
                      Returns;
             If (annotation [n]! = NONE)
                      Delete Comment [n];
             Comment [n] = new character [string length (s) +1];
             String Copy (annotation [n], s);
        }
    
    

¹
[http://zh.wikipedia.org/wiki/%E4%B8%99%E6%AD%A3%E6%AD%A3](http://zh.wikipedia.org/wiki/%E4%B8%99%E6%AD%A3%E6%AD%A3)

² empty where one might expect void, character, for, string, &c.

~~~
angersock
Now that's just fucking horrifying. D:

I have a translation of "Design of the Unix Operating System" in Chinese, and
take great comfort in the fact that I can still get the gist of all the source
code listings--even despite the comments in Chinese.

I admire the simplicity of the grammar in Chinese (from the year I took of it
in college), but honestly I find logographic languages are kind of gross.

EDIT:

Fine, fine, I admit it: a language with millennia of cruft is totally
reasonable to use as the way of persisting the cruftiest programming language
in the world.

Indeed, the same years of hard study that are required to write and read
Chinese literately should be added onto the same years of study required to
write and read C++ reliably.

This is such a comically bad and obtuse idea I think we should propose it as
the next draft standard.

EDIT2:

Look, explain your downvotes (in the language of your choice!). My opinion is
simply that alphabetic languages (here exemplified by English) are superior to
logographic languages--mostly because they require knowing fewer characters.

I may be grossly misunderstanding Chinese here; as claimed, my schooling in it
is limited.

~~~
ehal256
Chinese characters (and similar writing systems) do have certain advantages
versus alphabetic systems though. The smallest unit of writing has more
information embedded in it. You've got a decent chance of guessing the meaning
of a compound word if you know some of the characters in it, though not
necessarily the pronunciation. With alphabets, it's reversed: if you know all
of the pieces of a word, pronunciation is usually simple, but meaning not
necessarily so (especially in English). As a result, the written language is
more dense, and can actually be read more quickly.

Finally, with an appropriate method, it doesn't have to take years to learn
enough characters to write and read literately: I can read Japanese at a high
school level, and I've been at it for a year and a half.

~~~
angersock
So, Japanese is a uniquely bad example here, right?

As I understand it, there are three alphabets: kanji (lots of Chinese
characters), katakana, and hiragana. The latter two are used to spell out the
syllables of words, in some sense acting like an alphabetic language. Kanji
seems to have a thousand or two characters in use, whereas katakana and
hiragana have around fifty.

Beyond looking up stroke numbers and radicals, I found dictionary usage for
Chinese characters somewhat hard--English lets you basically do a very easy
binary search on a word (start at most significant character, find section,
move to next most significant character, etc.).
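
The alphabetical lookup described above is essentially a binary search over a sorted word list. A minimal sketch, assuming a small made-up word list in place of a real dictionary (`WORDS` and `lookup` are illustrative names, not from any real library):

```python
from bisect import bisect_left

# Hypothetical, alphabetically sorted word list standing in for a paper dictionary.
WORDS = ["apple", "banana", "cherry", "grape", "lemon", "mango", "peach"]


def lookup(word, words=WORDS):
    """Return the index of `word` in the sorted list, or -1 if it is absent."""
    i = bisect_left(words, word)  # halves the search range at every step
    return i if i < len(words) and words[i] == word else -1


print(lookup("grape"))
print(lookup("kiwi"))
```

This only works because the entries have a total order that every reader already knows; radical and stroke-count indexes need an extra classification step before any such search can start.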

~~~
ehal256
Bad in what way exactly? Hiragana and Katakana are easier to pick up, but
that places more of the burden on learning each individual word. With the
baseline investment to learn the Kanji in place, each new word is just a
composition of characters and their associated ideas that you already know.

You can do the same thing with dictionaries in Japanese or Chinese, though it
works best if you use a dictionary that lets you handwrite in the characters
(it helps a lot if you learn your radicals and stroke orders well, so that you
can easily write characters you don't know).

~~~
angersock
Oh, so, my point was that using Japanese was a bad example, precisely because
two of the three alphabets are used nonlogographically. It is also my
understanding that new words and loanwords are spelled out phonetically in
those alphabets, instead of grafting some new character into the kanji.

_"if you use a dictionary that lets you handwrite in the characters"_

I'm unfamiliar with any paper dictionary with that capability.

~~~
ehal256
I mostly use the dictionary on my phone. I can get by with a paper one, but
it's a bit slower.

~~~
angersock
Technology is very handy nowadays. :)

------
rett12
> Perl – While Perl's keywords and function names are generally in English, it
> allows modification of its parser to modify the input language, such as in
> Damian Conway's Lingua::Romana::Perligata module, which allows programs to
> be written in Latin or his Lingua::tlhInganHol::yIghun Perl language in
> Klingon. They do not just change the keywords but also the grammar to match
> the language.

That's impressive.

~~~
eCa
As an example of just how far they go, the statement:

clavis hashus nominamentum da.

Is equivalent to [1]:

@keys = keys %hash;

And in Klingon Perl [2]:

De'pu'wI' bIH yInob!

Is equivalent to:

%data = @_;

[1] [https://metacpan.org/module/DCONWAY/Lingua-Romana-
Perligata-...](https://metacpan.org/module/DCONWAY/Lingua-Romana-
Perligata-0.50/lib/Lingua/Romana/Perligata.pm)

[2]
[https://metacpan.org/module/Lingua::tlhInganHol::yIghun](https://metacpan.org/module/Lingua::tlhInganHol::yIghun)

------
lcedp
I think it's an awful idea. Let's take Cyrillic languages, Ukrainian/Russian
as an example.

* The alphabet has 33+ letters, so there is no space left on the keyboard layout for programmers' favorite symbols: @#$^&{}[]|~`<>. Yep.

* In math, variable names are traditionally Latin/Greek letters; Cyrillic is used only when teaching kids. Again: switching layout all the time? No, thanks.

* Words are simply much longer. And I mean much. In English even short words tend to become shorter (variable -> var). In Russian no such pattern exists: `var-set` -> `установить-переменную`, `var-get` -> `получить-переменную`.

* Words are morphologically stable in English, e.g. `last-msg-delivered` -> `last-msgS-delivered`. In Russian, `последнЕЕ-сообщениЕ-доставленО` -> `последнИЕ-сообщениЯ-доставленЫ`: if you change gender, or go from singular to plural, etc., you have to change several words in a row.

Seriously, it seems to me like trying to make things more complicated.

~~~
Avshalom
I have 104 keys on my keyboard and 4 of them are modifiers. The English
alphabet isn't the Latin/Greek alphabet either. And I am absolutely astounded
that Russians apparently have never invented the concept of an abbreviation or
contraction.

One does wonder how the Soviets ever managed to program their computers.

~~~
archivator
As an aside, if you (the reader, not Avshalom) haven't read much Russian
literature, Russian is the language where you take the first syllable of every
word in a long phrase and you make a new word:

Министерство здравоохранения => Минздрав (Ministry of Health => Minheal)

Министерство юстиции => Минюст (Ministry of Justice => Minjust)

etc., etc. As a foreigner, I found that quite peculiar. It doesn't seem
to be as common now as it was before, but it's still noticeable in everyday
life.

~~~
mistercow
Ah, so that's where that came from in 1984. It all makes sense.

~~~
lcedp
Yes, it is indeed inspired by that.

------
guhemama2
My rather simplistic view: as a native Portuguese speaker, I strongly reject
the idea of programming languages (and even coding) in a language other than
English. Besides the obvious reasons (globalized world, outsourcing,
multinational corporations, etc.), English is much less expressive than the
Romance languages [1], which makes it a better formal language.

1 -
[http://en.wikipedia.org/wiki/Romance_languages](http://en.wikipedia.org/wiki/Romance_languages)

~~~
maw
I think you'd have a very difficult time convincing a serious linguist that
English is less expressive than the Romance languages.

~~~
reginaldjcooper
Especially because when English is less expressive than a Romance language, we
steal those words from the Romance language until we're at least as
expressive.

~~~
lucasrp
But it's much easier to create new words in English, like turning a verb into
a noun or an adjective, etc.

I'm a native Portuguese speaker, and I think that Romance languages can be more
"emotional" than English. But it's amazing how a new idea can be expressed so
easily in English.

~~~
BuckRogers
I'd contend the opposite, being fluent in German, English and having strong
knowledge of Latin.

German and Latin can create new words or ideas much more easily than English
can. English creates them 'easier' by importing them wholesale. Take the
concept of 'karma': there is no word for this in English. In German this word
is Schicksal. This concept doesn't exist in English at all other than the Hindi
import.

~~~
jcbrand
English does like to import words wholesale, and I agree that German
conjunctions and the fact that you can often create new words by combining two
different words (my favourite: "scheinheilig"="apparently+holy"=hypocritical)
make it easy to create new words in German (and other Germanic languages).

That said, your example is not a very good one. "Schicksal" can be translated
as "fate", "fortune", "destiny" and "lot". Not just "karma".

------
msoad
When I write code, it's code. I don't read it like a human readable text.
Because of that, I don't really care what it reads when I try to read it as a
human readable text.

For example I don't even think about what it would mean in English if I write

    
    
         while(true){ i++; if(i>100) break; }
    

I'm curious whether native English speakers sometimes look at code as real
English text. It must be funny, because when I translate my code into my native
language (word by word) it's just meaningless and funny.

~~~
crazygringo
For things like "while", "do", "return", "public", "private" I definitely
think of it as "real English text".

For things like "for", "wend" (while-end), "class", "switch", "main", they're
divorced enough from any real English meaning that I just think of them as
arbitrary coding words.

So, it's kind of both, at least for me.

~~~
angersock
esac anyone? :P

~~~
icebraining
fi

------
pshc
An ambition of mine is a programming language where the program structure is
decoupled from its visual/textual appearance (model-view separation). One nice
result of this scheme is, for example, you could work on a program in English
and another person could collaborate on the exact same source in French.

Although I suspect that if this did get written and translated around a lot,
many non-English speakers would still pick the English surface syntax due to
network effects...

(Edited for clarity)
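
The model-view separation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tuple-based tree, the `KEYWORDS` tables, and the French spellings are all invented for the example and do not come from any existing language.

```python
# Hypothetical surface vocabularies; the stored program uses only neutral tags.
KEYWORDS = {
    "en": {"if": "if", "print": "print"},
    "fr": {"if": "si", "print": "afficher"},
}

# The "model": a language-neutral tree of (tag, *children) tuples.
PROGRAM = ("if", ("gt", "x", 0), ("print", "x"))


def render(node, lang):
    """Render the neutral tree as text using one language's keywords (the "view")."""
    if not isinstance(node, tuple):
        return str(node)  # leaf: an identifier or a literal
    tag, *args = node
    kw = KEYWORDS[lang]
    if tag == "if":
        return f"{kw['if']} {render(args[0], lang)}: {render(args[1], lang)}"
    if tag == "gt":
        return f"{render(args[0], lang)} > {render(args[1], lang)}"
    if tag == "print":
        return f"{kw['print']}({render(args[0], lang)})"
    raise ValueError(f"unknown node tag: {tag}")


print(render(PROGRAM, "en"))
print(render(PROGRAM, "fr"))
```

Both collaborators would edit the same `PROGRAM` tree; only the rendering differs. Identifiers such as `x` stay untranslated in this sketch, which is one of the harder parts of the idea.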

~~~
kalleboo
AppleScript works like this. When you save a script it's compiled down to a
system representation, and when you reopen it it's transformed back into the
English script language. This means:

* Formatting isn't preserved and the file is retabbed each time you save it.

* Earlier versions had French and Japanese compilers/decompilers

* Each application can add AppleScript syntax (like VBA/OLE or whatever). If a script uses commands for an application you don't have installed, you see the raw FourCC codes in the code instead.

It's actually a quite interesting language.
[http://en.wikipedia.org/wiki/AppleScript](http://en.wikipedia.org/wiki/AppleScript)

~~~
Zarel
TI-Basic 68k (the one used on the TI-89, TI-92, and Voyage 200 series
calculators) also does this, except formatting is preserved.

It also uses the same file type for tokenized and untokenized programs:
untokenized programs are tokenized when run, and tokenized programs are
untokenized when opened with the built-in text editor. Short of opening the
file with some external program like a hex editor, there's no way to tell
whether the program is tokenized or not. Which, of course, can cause problems
if you send an untokenized program to a calculator set to a different
language.
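
A rough sketch of why that goes wrong, assuming a made-up token table (the numeric codes and the German spellings below are invented for illustration and are not the calculator's real tokens):

```python
# Hypothetical token table: one numeric code per keyword, spelled per language.
SPELLINGS = {
    "en": {1: "If", 2: "Then", 3: "EndIf"},
    "de": {1: "Wenn", 2: "Dann", 3: "EndeWenn"},
}


def to_tokens(text, lang):
    """Replace keywords spelled in `lang` with their numeric tokens."""
    codes = {word: code for code, word in SPELLINGS[lang].items()}
    return [codes.get(word, word) for word in text.split()]


def from_tokens(tokens, lang):
    """Spell numeric tokens in `lang`; anything else passes through unchanged."""
    names = SPELLINGS[lang]
    return " ".join(names[t] if isinstance(t, int) else t for t in tokens)


english_source = "If x>0 Then EndIf"
tokens = to_tokens(english_source, "en")
print(from_tokens(tokens, "de"))        # tokenized programs travel between languages
print(to_tokens(english_source, "de"))  # untokenized text does not: no keyword is recognized
```

A tokenized program is language-neutral, so it displays correctly under any setting. An untokenized one is just text in its author's language, so a device set to another language finds no keywords in it.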

------
highwise
But I think there is only one programming environment that supports multiple
natural languages: turtleacademy.com, by Lucio (a friend of mine).

The Turtle Academy is a programming environment with lessons for teaching kids
Logo.

For professional programming, a non-English programming language is just a
curiosity; none of them has ever gained global traction, nor probably ever will.

For kids, for learning, supporting multiple languages is a must.

English: [http://turtleacademy.com/lang/en](http://turtleacademy.com/lang/en)

Russian: [http://turtleacademy.com/lang/ru](http://turtleacademy.com/lang/ru)

Hebrew : [http://turtleacademy.com/lang/he](http://turtleacademy.com/lang/he)

Spanish: [http://turtleacademy.com/lang/es](http://turtleacademy.com/lang/es)

Chinese: [http://turtleacademy.com/lang/zh](http://turtleacademy.com/lang/zh)

And we are looking for volunteers to translate to further languages.

------
patrickg
I have a dual-language database publishing system, and I use English and German
as source languages. Despite the obvious advantage of using English as a lingua
franca, I tend to use the system in my mother tongue (German). It feels a bit
more natural to me, and it's really nice to use in presentations to
non-technical people. It seems as if the barrier to using the language is much
lower than it is when the programming language is English.

[http://speedata.github.io/publisher/manual/index.html](http://speedata.github.io/publisher/manual/index.html)
- Switch to the other language in the footer

------
tzury
I would add my friend's hpy, which lets you write in Hebrew (compiles into
Python): [http://nirs.freeshell.org/hpy/](http://nirs.freeshell.org/hpy/)

~~~
why-el
This is great. I am working on a simple Arabic (actually Moroccan Arabic)
programming language to help popularize programming in some remote schools I
work with. Would you mind helping me reach out to your friend about how this
has been used and what challenges it faced in general?

------
MichaelMoser123
I think that English words tend to be very short, so in English one has to
type less. Having concise commands is an important feature for a programming
language.

I don't know if it is an urban legend, but I read that words in English are
shorter than in Japanese, so in WW2 the English language was better
for shouting out orders: less time for communication means more time for
action.

------
merlincorey
So disappointed in no Varaq[1].

[1] [http://freecode.com/projects/varaq](http://freecode.com/projects/varaq)

~~~
_kst_
It's on Wikipedia; you can add it yourself.

~~~
merlincorey
Not anymore, since they require accounts now.

------
informatimago
With most languages you can already use localised identifiers, nowadays even
for languages with a non-Latin alphabet, thanks to UTF-8. And for the keywords,
you can most often use cpp or another preprocessor to substitute them.

But the pinnacle is reached with Perligata.
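
As a rough illustration of both points in Python (rather than cpp), here is a minimal sketch. Python 3 accepts non-ASCII identifiers natively, and the standard `tokenize` module can play the role of the keyword-substituting preprocessor. The Portuguese keyword spellings in `KEYWORDS` and the `translate` helper are assumptions made up for this example.

```python
import io
import tokenize

# Hypothetical Portuguese spellings for two Python keywords.
KEYWORDS = {"defina": "def", "retorne": "return"}


def translate(source):
    """Rewrite NAME tokens found in KEYWORDS; strings and comments are left alone."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME:
            text = KEYWORDS.get(text, text)
        out.append((tok.type, text))
    # With (type, string) pairs, untokenize emits valid code with loose spacing.
    return tokenize.untokenize(out)


# "número" is a localised identifier; only the keywords need translating.
SOURCE = "defina dobro(número):\n    retorne número * 2\n"

namespace = {}
exec(translate(SOURCE), namespace)
print(namespace["dobro"](21))
```

Working on tokens rather than raw text avoids rewriting words that merely appear inside string literals, which a plain textual substitution would not guarantee.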

------
ajuc
I wonder how a programming language based on fusional natural languages with
cases could work.

Some ideas (warning - long and useless read):

- everything is an expression

- expressions have Cases. Default case (without postfixes) is Source Case

- cases are like roles for parts of expression in given complex expression
context (allow us to change order of arguments in expressions without named
parameters for example, and to overload functions/macros depending on Cases of
the supplied arguments)

- you can define pre-, -post, and -in- fixes that work like macros/functions

- you can use pre/post fixes to tag expression with cases in the context of
an expression

- you can use 1-arg functions/macros as pre and postfixes, or as regular
functions

- you can use 2-arg functions/macros as infixes or as regular functions

- more than 2arg functions can only be used as regular functions

- you can overload functions/macros depending on case of args

- you can define your own cases

Example:

    
    
        % %1 %2 etc are unnamed arguments in
                macro/function definitions,
                like in lambdas in clojure
        
        "-" is used in pre/in/post fixed macros
            definitions to mark where the base
            identifier goes
    
        %-! is 1arg macro that defines variable of name %. 
    
    

Example:

    
    
        a!
        foobar!
    
        %-a is postfix macro that tags variable %
            with Target case
        %-i tags with Helper case
        %-p tags with PositiveConditional case
        %-n tags with NegativeConditional case
    
        %1a=%2 is infix 2arg macro that assigns to
               variable %1 (must be in Target case)
               value %2 (in Source case).
    

Can be used as regular macro:

    
    
        = %1a %2
    
     or equivalently
    
        = %2 %1a
    

We can also overload = to be equality operator if both args are in Source case

    
    
        %1=%2
    
     works with regular macro too:
    
        = %1 %2
    
     or
    
        = %2 %1
    

We can overload binary operators with 3rd and 4th args in PositiveConditional
and NegativeConditional cases and we don't need IFs :)

This also allows us to write if with only else clause in the form:

    
    
        > x 0 (OUTPUTa="X SMALLER THAN 0!!!")n
    
        %1-;-%2 is infix macro that evaluates in sequence.
                It's overloaded for all cases, especially
                for %1a-;-%2a
    
        # is python-style comment
    
     Example code to calculate real roots of quadratic equation:
    
       a! b! c! delta!           # definitions of variables
       = INPUT aa ba ca          # overloaded = macro with
                                 # many targets (a b c in Target case)
                                 # and one source
                                 # to easily fetch many items from input
       deltaa=((b^2)-4*a*c)      # precedence will be a problem
                                 # to define generally
    
        = delta 0                # overloaded = equality test with
                                 # 3rd arg being positive conditional case
                                 # and 4th being negative conditional case
          (
            = OUTPUTa "x0=" (-b/(2*a)) # overloaded = macro with one target
                                       # and many sources - will join the sources
          )p                           
          (> delta 0
             (
               = OUTPUTa "x0=" (-b-sqrt(delta))/(2*a);
               = OUTPUTa "x1=" (-b+sqrt(delta))/(2*a)
             )p
          )n
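
The core of the idea, dispatching on the case of each argument instead of its position, can be approximated in an existing language. A minimal sketch, assuming only the Target and Source cases from the list above (the `Tagged` class and the `target`, `source`, and `eq` helpers are hypothetical names invented for this example):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value marked with a grammatical case (its role in the expression)."""
    case: str
    value: object


def target(name):
    return Tagged("target", name)


def source(value):
    return Tagged("source", value)


def eq(env, a, b):
    """The overloaded `=`: behaviour depends on argument cases, not on order."""
    if {a.case, b.case} == {"target", "source"}:
        # One Target and one Source: assignment, whichever came first.
        tgt, src = (a, b) if a.case == "target" else (b, a)
        env[tgt.value] = src.value
        return src.value
    if a.case == b.case == "source":
        # Two Sources: equality test.
        return a.value == b.value
    raise TypeError("no overload of = for these cases")


env = {}
eq(env, target("x"), source(5))   # like `= xa 5`
eq(env, source(7), target("y"))   # like `= 7 ya`, same meaning with the order swapped
print(env, eq(env, source(1), source(1)))
```

The conditional cases (`p` and `n`) would need lazily evaluated arguments, which this sketch leaves out.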

~~~
polemic
Love it.

I know a bit of Finnish, and I can imagine that a programming language built
on the principles of Finnish grammar would be beautifully expressive as well
as extremely terse.

For example, maybe '-ss' denotes 'class'. Functions and methods with '-ef'
with an optional return type '-int', '-ing' (string), '-oat' (float). Start a
loop iterator by adding '-oop' to an iterable? Modify conditionals to change
their behaviour, e.g. 'ifret' (if/return), etc.

    
    
       Fooss:
         barefint:
           thingoop(i)
             ifret i == someValue
    

Looks strange, but if you're used to the concept I wonder if it would be
more efficient.

------
Sagat
I don't understand the urge to create a new language to make a statement
instead of trying to improve on what already exists.

~~~
evincarofautumn
Creating a language to make a statement is probably a bad idea, but there are
plenty of other valid reasons to create something new:

• You want to learn how languages work.

• You have a problem which is difficult to solve in other languages but easy
in yours.

• You want to combine features that have never been combined before, to see if
their interaction creates something of value.

------
volokoumphetico
I've been thinking of making a scripting language in the Korean alphabet,
Hangul. The properties of the Hangul alphabet allow you to compress multiple
consonants and vowels into a single syllable block. For example,
system.out.println("Hello"); would be, in Korean, 시.웃.플("안녕하세요"); the
Korean syllables themselves have no coherent meaning, but this is how you could
distinguish the programming language from the natural Korean language.
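
The "compression" mentioned above is built into Unicode: each precomposed Hangul syllable is computed from the indices of its leading consonant, vowel, and optional trailing consonant. A minimal sketch of that arithmetic (the `compose` helper and table names are illustrative, and compatibility jamo are used as inputs for readability):

```python
# Jamo in Unicode's canonical order: 19 leads, 21 vowels, 27 tails plus "no tail".
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]


def compose(lead, vowel, tail=""):
    """Pack two or three jamo into one precomposed syllable block."""
    l = LEADS.index(lead)
    v = VOWELS.index(vowel)
    t = TAILS.index(tail)
    # Syllables start at U+AC00 and are laid out lead-major, then vowel, then tail.
    return chr(0xAC00 + (l * 21 + v) * 28 + t)


print(compose("ㅅ", "ㅣ"))        # the first block of the example above
print(compose("ㅎ", "ㅏ", "ㄴ"))  # three letters in one character cell
```

So a three-letter sequence occupies a single character, which is what would make identifiers built this way so compact.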

