Non-English-based programming languages (wikipedia.org)
262 points by ducaale on June 30, 2019 | 222 comments

Seeing a programming language in your native language after working with English ones, you discover that it looks and sounds pretty ridiculous. My belief is that the native language is ingrained on a low level in the brain, so it's decoded ‘in the hardware’ and directly invokes meanings and associations, whereas a second language goes through at least some level of conscious processing. The very same way, I'm physically unable to shut off the brain's frantic search for meaning when I hear some inane song lyrics or advertisement in my native language—which causes suffering whenever I get near television. Meanwhile, I'm completely comfortable reading or hearing total drivel in English until I consciously try to interpret it. Similarly, I can't skim in English as fast as I do in my language.

This all brings me to the thought that native-English speakers probably have a rather different experience with programming languages, since the keywords must constantly invoke deeply ingrained associations but that's in turn modified by regularly seeing and thinking of them in the programming context.


French here, that's my experience as well. I want to emphasize that the huge majority of the non-native-English-speaking programmers I know are pretty opposed to translating programming languages.

However, my experience teaching Japanese people has been that what is crucial if you want to make a programming language accessible to different nationalities is to translate error messages correctly. Learning IF, FOR, RETURN is not hard. There are 20 words max to learn. You could learn that many kanji in one day if you needed to. However, when you are not good at English, understanding "Access error: Array index is out of bounds" requires another level of understanding and vocabulary.

Cannot agree on error messages; I had the opposite experience with them. I always require all team members to install English versions of runtimes because it is easy to search for non-trivial errors on the internet. Many translations are automatic or done by not very technical people, and they sound weird in the local language, e.g. many terms in Russian, even ones as simple as "heap allocation", are really weird. An English error that you see for the first time is usually solved via a quick search, with the first result usually linking to StackOverflow with the correct fix, but with the local language it's often difficult to understand where to start. (But basic English knowledge is assumed)

Maybe error messages should just point to pre-prepared StackOverflow questions*

(* which would probably get closed by s.o. mods)

> But basic English knowledge is assumed

Let's be clear: it is my opinion that today, basic English is a prerequisite for any programmer.

What I am saying is that people making "native" programming languages seem to want to change that, and that they are misguided if they think the language is the crucial part. The crucial part is the error messages, which indeed are often very poorly translated.

Yes, heap/stack would look weird in French as well, but only as weird as they look in English the first time you encounter them. However, if you translate the error message correctly, a novice programmer will understand that it is a memory problem and may try to hunt the error more efficiently.

I learnt my basic English while programming. Definitely not a prerequisite !

I guess this is where error codes come in handy.

Not actually unless it's a native OS exception with a well-defined hex code. I was talking in .NET context, and when you get e.g. some SocketException related to setting up UDP multicast parameters only textual description helps for highly relevant search with double quotes. Often there are just no results at all in local language for highly specific complex conditions.

I speak English natively and Japanese fluently, and in my experience error messages (and other messages) that were translated into Japanese are often of rather questionable quality.

But you're right, people who don't speak English very well (and aren't too experienced) often don't even notice non-fatal errors.

I still think that showing translated error messages isn't very good. 1) Messages are often translated poorly (I won't go into detail here) and sometimes don't even make sense 2) Googling for a translated message won't turn up (m)any results 3) Poor translations might eventually be fixed, but IMO error messages should be considered part of the API

I think the best recourse is to display the original message _and_ a translated message, or _maybe_ an (international) error code and a translated message.

The great thing about the Japanese language is that with kanji you can make messages extremely concise, e.g.:

"Access error: Array index is out of bounds" --> "エラー: 境界外読取/書込"

which would make it acceptable IMO to display messages side-by-side. (It's unfortunate that "error" became "エラー" in Japanese)
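A minimal sketch of that side-by-side idea, assuming a made-up error code `E0042` and a hand-written translation table (neither comes from any real runtime). The original English text stays searchable, the code is language-independent, and the translation is shown next to it:

```python
# Hypothetical catalogue: stable error code -> original English message.
ORIGINAL = {
    "E0042": "Access error: Array index is out of bounds",
}

# Hypothetical translations keyed by (language, code).
TRANSLATED = {
    ("ja", "E0042"): "エラー: 境界外読取/書込",
}


def format_error(code: str, lang: str) -> str:
    """Return the code, the original message and, if available, a translation."""
    original = ORIGINAL[code]
    translated = TRANSLATED.get((lang, code))
    if translated is None:
        # No translation on file: show the original alone rather than guessing.
        return f"[{code}] {original}"
    return f"[{code}] {original} / {translated}"


print(format_error("E0042", "ja"))
print(format_error("E0042", "fr"))
```

Keeping the untranslated text in the output is the design choice that matters here: it is the part people paste into a search engine.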

I don't quite get the reasoning where you talk about the quality of Japanese error messages, then talk about making it even harder to decipher. Sure you can compress meaning into short strings of kanji, but readability definitely takes a hit when you do that, even to a native speaker. It's not like we're displaying this on paper; there could be better ways to show errors to non-English speakers, eg. more descriptive messages like "エラー: 配列の範囲外へアクセスしています" can be shown by default and perhaps the original error message can show up when you mouseover on it or whatever.

The "searchability on Stack Overflow" argument may vary depending on language, by the way - Japanese is one of the languages where technical information written in the language is fairly abundant (considering JP natives tend to avoid writing in English themselves and rather stick to writing Japanese), but this may not be true for other languages. Another factor is how likely the speaker of the language is to be English-bilingual.

> Japanese is one of the languages where technical information written in the language is fairly abundant

At least with respect to error messages, that has not been my experience. I've almost never had success googling Japanese error messages. That (and the insanity of remembering how to get Java to display English error messages, since it doesn't respect LANG/LC_ALL) is 100% the reason why I gave up on using Japanese on my development system.

No way. I hate localized error messages. At least in modern times as software vendors stopped putting error numbers into the message.

It's very hard to find solutions online when you only get a localized error message.

That's a great point. And that combined with the cryptic nature of many error messages could easily make debugging a nightmare for people who don't speak English...

> Learning IF, FOR, RETURN is not hard. There are 20 words max to learn.

That was true in the old days with C or Pascal, but it's not much the case any more. Modern C++ has about 100 keywords [1]. Even Swift (Apple: "easy to learn") has about 100 reserved words [2].

[1]: https://en.cppreference.com/w/cpp/keyword
[2]: https://www.quora.com/What-are-the-reserved-Swift-keywords

Programming uses the same words as regular English, but it is a completely different linguistic register, so there is never any conflict. It’s like working in any field with its own jargon: you use the meaning that is relevant to the context. In linguistics this is called the Relevance theory of meaning and interpretation, which explains why we do this contextual interpretation easily and mostly without thinking about it.

I am a native English speaker and former ESL teacher with a master's degree in applied linguistics, as well as a veteran developer.

> Programming uses the same words as regular English, but it is a completely different linguistic register, so there is never any conflict.

Moreover, when you cross over those jargon terms into their regular meaning, hilarity ensues.

For example, you can easily prove that oaks are infinitely tall in winter. Start by observing that in winter, oaks don't have leaves. Then consider the graph-theory definition of "tree" and "leaf".

There's the old joke about the programmer who was told to go to the store for bread, and while they were there, to get eggs. Still at the store, getting eggs.

Alternatively, oaks do not have any nodes at all in winter

An oak without nodes? Pretty sure that's just a seed.

It depends, do they have any branches?

> Programming uses the same words as regular English, but it is a completely different linguistic register, so there is never any conflict.

That's debatable. It's impossible for me to say "if" in any context without thinking about programming now. I can't help it.

That's a good point: "never any conflict" describes the final state rather than the growth process.

When I was a younger programmer (as in, during the first few years), I discovered that assuming boolean / formal logic meanings for {if, or, else, and} in communication with my spouse (art + sales, and very smart, though not a programmer or trained in symbolic logic) resulted in, shall we say ... miscommunication. So I had some unintended help in building the separate registers of meaning / communication. :-)

Building separate meaning registers for communication does take some time.

By the way... has or in English a different meaning than in code?

I cringe every time I hear the and/or thing translated to Spanish, usually several times in a row.

Not sure if it's the usual mistranslation or it's also idiotic in the original text.

Yes, "or" in English often is more similar to xor in code. For example if someone says "I want pizza or a burger", that person probably does not want both pizza and a burger for the same meal, whereas in programming an equivalent expression would be satisfied if both are true simultaneously.

But "or" in English doesn't always mean xor in code. Sometimes it is the same as or in code.

English "or" is ambiguous between inclusive and exclusive meanings. Most of the time, the intended meaning is inferred based on context, but one can make it explicit by using phrases such as "but not both", "exactly one of", etc.

There exist languages which lack this ambiguity. Latin has two different words for "or" – "vel" (inclusive) and "aut" (exclusive).
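In code the two readings have to be spelled differently. A small sketch in Python, where the function names are just labels chosen for this example:

```python
def inclusive_or(a: bool, b: bool) -> bool:
    # Python's `or` is inclusive: true when either or both operands are true.
    return a or b


def exclusive_or(a: bool, b: bool) -> bool:
    # For booleans, `!=` is true exactly when the two operands differ.
    return a != b


# Print the full truth table: the two only disagree when both are true.
for pizza in (False, True):
    for burger in (False, True):
        print(pizza, burger, inclusive_or(pizza, burger), exclusive_or(pizza, burger))
```

The only row where the two columns disagree is the "both pizza and a burger" case, which is exactly the ambiguity in the English sentence.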

There's another way that or_English is different, as illustrated by this frustrating "joke":

1: do you think it's A or do you think it's not A? 2: yes (assigning a truth value to the or expression rather than indicating which or operand is true.)

It depends on the intonation: / = rising, \ = falling

"Do you want /pizza or a \burger\?" = XOR

"Do you want /pizza or a /burger/?" = OR (as in programming)

It can make a good (though well-worn) joke to answer the XOR version with "Yes."

> has or in English a different meaning than in code?

“Or” in English can mean either exclusive-or or inclusive-or (usually it means the latter), and “and/or” is a means to specify inclusive-or without ambiguity (the parallel construct for exclusive-or is “X or Y, but not both”), used mostly in contexts where misconstruction can be expensive to the drafter of the text, such as contracts and formal policy documents.

In code, the meaning is already unambiguous; in all languages I am aware of, “or” or the symbol referred to in speech with that name (e.g., “||”) is strictly inclusive, and if there is an exclusive-or operator, it is different.

In most programming languages the boolean operation 'or' is actually 'xor'.

But in type system the disjunction concept represents 'either', the concepts are more like how people refer to 'or'.

> In most programming languages the boolean operation 'or' is actually 'xor'.

I can't think of any where that's true; logical “and” and “or” are usually exactly what they claim to be, and many languages implement both as short-circuiting operators, whereas XOR can't short circuit because you can never know the result without considering both arguments.
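The short-circuit difference is easy to see with a sketch that records which operands actually get evaluated. `check` is a hypothetical helper written for this example, and `!=` stands in for XOR on booleans:

```python
calls = []


def check(name: str, value: bool) -> bool:
    # Record that this operand was actually evaluated, then return its value.
    calls.append(name)
    return value


# Inclusive `or` short-circuits: the right operand is skipped once the left is true.
calls.clear()
result_or = check("left", True) or check("right", True)
or_calls = list(calls)

# XOR cannot be decided from one operand, so both are always evaluated.
calls.clear()
result_xor = check("left", True) != check("right", True)
xor_calls = list(calls)

print(result_or, or_calls)    # True ['left']
print(result_xor, xor_calls)  # False ['left', 'right']
```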

The old classic:

Programmer's spouse: Honey, go to the shops and buy 1 bottle of milk, if they have eggs buy 6.

Programmer brings home 6 bottles of milk.

> Programming uses the same words as regular English

Let us be clear that this is US English and not British, but I agree with your sentiment: that the keywords are more a linguistic jargon than anything.

Although the main topic here is about other languages, there are also nuances between the two main streams of English that are a little weird sometimes.

For me, I spell color with a U: ie. colour, but when coding, it's always been the US variant. I've typed this word so much that when I need to type it in my variant, that colour just looks wrong, even though it's the correct spelling in (my area).

Occasionally, when skimming British code, I've come across colour variables and it looks wrong. Similarly, I've looked at French code and there are similar strange things that leak out. The same for Russian and Japanese code I've encountered, eg:

> function <insert_diacritic_or_kanji_here>(char p)...*

The purist in me really wants to state: "everyone should type US English", but the more accepting part of me concedes that maybe keywords should be internationalised (/internationalized) too. Maybe a preprocessor system would suffice (eg somecode.c.jp -> somecode.c), but as you state: it doesn't seem right because of the 'jargon' nature of it.
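A rough sketch of what such a preprocessor pass might look like, assuming an invented keyword table (the French spellings below are illustrative and not taken from any real language). It is a naive token-level substitution and does not skip string literals or comments, which a real tool would have to do:

```python
import re

# Hypothetical mapping from localized keywords to the real ones.
KEYWORDS = {
    "si": "if",
    "sinon": "else",
    "retourner": "return",
}

# Match whole identifiers only, so "si" inside "taille_si" is left alone.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def translate_keywords(source: str) -> str:
    """Replace localized keywords with their English equivalents."""
    return IDENTIFIER.sub(lambda m: KEYWORDS.get(m.group(0), m.group(0)), source)


localized = "si (x > 0) { retourner x; } sinon { retourner -x; }"
print(translate_keywords(localized))
# if (x > 0) { return x; } else { return -x; }
```

Note that this only touches keywords. Library names, error messages and everyone's identifiers stay whatever they were, which is most of what a reader actually sees.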

The good news is that machine code and punch cards have no language, so maybe we could revert to using that and then we can neatly skip over this issue entirely ;-)

I think what this question exposes is that the primary consumers of code are humans. Code is only incidentally written for computers.

So really, this is exactly the same problem human language exists to solve. There's no getting away from it.

> The good news is that machine code and punch cards have no language

Aren't most instruction names English abbreviations?

Maybe we can internationalize those too. Just to be consistent of course.

> Aren't most instruction names English abbreviations?

I think you're thinking of boring old assembly...? eg (a move instruction):

> MOV EAX,15

Yes, you're right about the abbreviated English, but that's why I didn't suggest it as an i18n solution.

yeah, indeed i was

Yep. Virtually no word has a concrete meaning without a context. For example the word “guys” which may mean a group of males or just a generic collection of people irrespective of gender depending on the context in which it is used.

It can also mean ropes that are anchoring an object. "When camping, I enjoy pegging the guys" for example.

"When camping, I enjoy pegging the guys" can also have a wildly different (sexual) meaning.

Our friend @inopinatus was certainly making that joke.

Not sure what you mean. In this example, I’m just pitching a tent.

During nice weather, I usually like to open the fly.

While some things are arguably vaguely close in meaning, e.g. `loop`, there are plenty of things which are essentially meaningless (or only coincidentally are words in English) like `car`, `cdr`, `cons`.

I remember the English associations of programming words being pretty confusing, to be honest. Generic, for instance, means a standard or nondescript object. Abstract, for another, means something without concrete existence. Static means something that doesn't move, or electrostatic charge. Void is a pretty archaic/unusual word for a space. All these things are a little bit like how they are used in programming languages, but I imagine it would have been easier for me if I'd started out knowing I don't know, rather than trying to figure out how stillness, motion, concrete and abstract relate to functions.

As English, I think programming language mostly comes somewhere between willful butchery and deep elegance. Some words are like a weird riff off newspeak (grep, troff, etc), some words are obvious cultural artefacts of people trying to show how pragmatic and unpretentious they are (Factory, Object, Bash, etc), while others are trying to show some kind of inculcation, like maths terms, or stuff from electronics, but all of it has this great feeling of cultural-technical history. Some are these weird mysteries, like, why did early teletype use allcaps? Isn't lower case more legible? Did the entire early history of programming get written in shouts because the earliest users were military guys that liked shouting?

These are all somewhat sloppy versions of the definitions of these words in common use.

“Generic”, “general”, “generalize” and “specific”, “special”, “specify”, etc. all come from the relation between “species” and “genus”, where genus is a larger category containing a species. The use of these in a programming context is not too far removed from the same usage in non-technical conversation or in (non-biology) technical contexts like philosophy or mathematics.

Similarly for “abstract” (meaning idealized or separated from particular cases) and “concrete”, which have been used in logic/philosophy for a long time.

Static means unchanging.

Void means “empty”, or in a legal context invalid.

> why did early teletype use allcaps

Because it replaced humans listening to Morse code, which has no lower case. Also, the first primitive keyboards from the mid 19th century used something like piano keys, which take up a ton of space, and even still up through the 1960s data transmission was expensive so people wanted to save every possible bit.

You can read about https://en.wikipedia.org/wiki/Teleprinter, https://en.wikipedia.org/wiki/Telex, https://en.wikipedia.org/wiki/Teletype_Model_33, etc.

> Morse code, which has no lower case.

Sure, but Morse code doesn't have upper case either. It doesn't have case! It just has one set of letters and numbers and punctuation marks, it doesn't specify what case the letters should be represented in when they are not written in Morse code.

In my own case (pun intended) I used to copy Morse code by writing it down in my own weird mix of lowercase letters and semi-cursive writing.
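To put that in code: a decoder has to pick some case when it writes letters down, but nothing in the signal tells it which. A toy sketch with only four symbols from International Morse code, where the `upper` flag is an assumption of this example:

```python
# A small subset of International Morse code; the code itself carries no case.
MORSE = {
    "...": "s",
    "---": "o",
    ".": "e",
    "-": "t",
}


def decode(signal: str, upper: bool = True) -> str:
    """Decode space-separated Morse symbols; `upper` only affects presentation."""
    text = "".join(MORSE[symbol] for symbol in signal.split())
    return text.upper() if upper else text


print(decode("... --- ..."))               # SOS
print(decode("... --- ...", upper=False))  # sos
```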

I’m not sure if the early TTY codes, punch card codes, etc. should be considered to have a case either.

It was just the convention to transcribe them using all upper case.

>which has no lower case

This is exactly what bothers me. Why no lower case? Why not no upper case? You'd need the same number of keys, and it would be more legible. It would use the same amount of space.

The explanation I've always heard was that writing "god" with a lowercase G would have been unacceptable in the 1850s. To a lesser extent, writing personal names without a capital is still seen as a bit odd and potentially an insult.

All caps means no ascenders or descenders, so actually less vertical space per line.

This is a good point, actually. I'm still not sure it works, though - at the lower bounds of visual resolution, are capitals easier to distinguish than cursive letters? Because otherwise, any size capital can be compared to an equivalent box cursive, and I think the capital would be less legible. Perhaps the typically smaller eyes of cursive letters would swing the pendulum the other way, with smeary ink? I think with simply low resolution, cursive would always win out.

Take it up with Émile Baudot? https://en.wikipedia.org/wiki/Baudot_code

I feel when something becomes so ubiquitous, even though it's obviously inferior, it takes a bit more push than one patent - especially if the patent is mainly about encoding, and barely incidentally about typography. And I don't think it's the case that nobody knew capitals were hard on the eyes. The whole reason why lower case letters exist is because roman allcapswithnospaces are horrible.

The people doing professional book design or handwriting long letters and the people designing systems for transmitting occasional short messages long distances had different priorities.

A telegram is a short enough message that being hard to read isn’t really a bottleneck. If it costs a day’s wages to send a couple sentences, the recipient is going to be able to spend a couple minutes on figuring out what it says. People weren’t sending novels around by telegraph.

> And I don't think it's the case that nobody knew capitals were hard on the eyes. The whole reason why lower case letters exist is because roman allcapswithnospaces are horrible.

As I responded to you elsewhere, the idea that capitals are hard on the eyes is just a myth. All caps with no spaces is bad, but that's because of the missing spaces.

Capital letters didn't mean shouting until after the internet - or at least texting, BBSs and such - became popular.

I think the same for programming languages. For example, all old LISP programs were capitalized. Nowadays we call it lisp and everything is primarily lowercase (Not Even CamelCase). It's pretty rare to see capitalized lisp code these days.

Here's a bit of parody code I wrote on the subject. https://github.com/ksaj/Capitalize.Lisp

(protip: It doesn't do anything the comments say it does, even though the results appear to.)

Older versions of FORTRAN were upper case, though you could use mixed case for Hollerith statements (used for mixed-case output).

It was considered a bit flash to use this but for interactive programs input prompts looked a lot nicer.

I used to program in assembler. You can't imagine the glee that erupted within me the day I discovered you didn't have to write all caps. It changed my coding aesthetics immediately.

Nowadays I only use ALLCAPS to visually mark code that I want to refactor, or where later attention is needed, such as for code that might open security vulnerabilities if handled incorrectly.

This article has examples of SHOUTING going back to the 1800s


> Isn't lower case more legible?

No, it isn't.

Lower-case words have a wider variety of shape, as so many letters have markings extending above and below the horizontal lines that bound the letter o. UPPER-CASE WORDS ARE BASICALLY ALL RECTANGLES OF VARYING WIDTH AND FIXED HEIGHT. BUT THEY USE MUCH LESS WHITESPACE, WHICH IS WHAT LOWER-CASE WORDS USE TO CARVE OUT SHAPE.

I believe the result is that lower-case words are easier to read at sufficiently close distance but legibility drops off sooner as distance increases. I also believe that while upper-case words may be less legible, individual letters are more legible as upper-case, and therefore upper-case words may be easier for very inexperienced readers who haven’t learned words by shape and still piece them together by letter.

Word shape is not a well-supported theory of reading. Very skilled and practiced readers are still going letter by letter.

"The result" is that if you pull people off the street and ask them to read capitalized or lower-case text, they're slower at reading the capitalized text. People like pasabagi want to leap to the conclusion that that means reading capitalized text is harder. That conclusion is unjustified; the rest of the result is that the difference in reading speed disappears after a small amount of practice. Lowercase text is more common. But obviously that can't justify the choice of a writing system; any writing system will be common if it's in common use.

I second the neighbor's request for references, since your claim goes against the accepted ‘wisdom.’

Note also that your claims don't actually refute pasabagi's original conclusion. If lowercase text is easier to read because of its ubiquity, this still means we should use lowercase text—the same way that we use the right-hand rule for screwing things in and out. Books, and lowercase, were around longer than telegraph.

> I second the neighbor's request for references


> since your claim goes against the accepted ‘wisdom.’

This is overly generous as a description of a collection of myths that have been known false for decades.

> Note also that your claims don't actually refute pasabagi's original conclusion. If lowercase text is easier to read because of its ubiquity, this still means we should use lowercase text

No, it doesn't, because if we ignore your advice and use capitalized text, capitalized text will be common enough that the advantage of lowercase text disappears.

> if we ignore your advice and use capitalized text, capitalized text will be common enough that the advantage of lowercase text disappears

I see you're talking about some alternative universe where you convince significant portion of publishers (if not a majority) to switch to uppercase. The practical question is, did teletype or early programming languages or road signs flip us over to that universe? Doesn't seem so. In the world where I am, lowercase text is more legible because it's ubiquitous.

If we ignore your advice and use capitalized text only for programming in, while books are still published in mostly lowercase, capitals will be so common for programmers that lowercase text will have no advantage over capitalized text.

The styles don't compete with each other for space in your mind. It's just a question of whether you're used to them.

Your ideas seem a bit counter-intuitive, so I'll assume they're coming from some literature, and I guess I can't really comment on literature I haven't read. All the stuff I've found (partially what motivated my interest in the first place) were studies showing the opposite, written in the days when ALLCAPS was the normal form of written communication.

Note: I'm not a native speaker.

English doesn't have genders and cases [1] so grammatical constructs such as if/then/else, case of, unless etc. suffer much less than when you have gender and case. Oh, god, and plural forms of those.

For a Russian speaker using a PL in Russian is constant pain as your brain tries to add all the missing parts to the words:

   if(count(letters<must have a plural ending, accusative case>) > limit<must have singular ending, genitive case>){
       words.append(word<must have singular ending, accusative case>)

[1] Well, it does have those, but not in the same capacity as other languages.

https://en.wikipedia.org/wiki/Analytic_language https://en.wikipedia.org/wiki/Isolating_language

I buy your thought, and your prediction means that Mandarin speakers might have an easier time with a Mandarin programming language than a Latin or Slavic language speaker would have with a Latin- or Slavic-based programming language.

Interestingly, Perligata (a module that allows writing Perl programs in Latin) attempts to use case endings in variable names to denote their roles in expressions:


Many years ago, in elementary/primary school I went to a computer camp when I was living in Belgium. We used Logo in French on Macs. I don't remember much, but the keyword for random was 'hazard'.

My experience was actually not bad; I didn't have any trouble associating the words with the abstract concepts involved in sending the turtle around the screen. I was doing fairly simplistic stuff though, I'll concede.

I wonder if instead of decoding words when programming "in the hardware", we form new associations for the (relatively) limited set of keywords languages provide? I mean, in most code, the words provided by the language make up maybe 5%-10% of the code. The names we give things seem much more important.

To your point of not being able to tune out inane heard content in my native language I agree. I'm not proficient enough in any others to have tried it, sadly.

Probably it was the false friend "hasard" which, in French, means: chance, randomness.

It might well have been "hasard". It's been, uh, 20-something years? Thanks for the French lesson!

Interesting, I hadn't thought about "random" much, but hazard kinda makes sense even in English. Like "allow me to hazard a guess."

hasard is in fact the French word for random chance.

Hazard in English comes from that, via the sense of gambling: https://www.etymonline.com/word/hazard

A couple of times I saw programs written in Russian versions of Basic and some other language, I forget which one. Never mind programming, even trying to understand it blew my mind. With Russian being my native language, I could never imagine writing software in it. I also have the same feelings about the songs as you just described. So all in all I completely agree with you.

Seeing anything in computers using my native language feels wrong. I think it’s because I learned how to use computers before I learned English well.

For example I learned that to print a line in Pascal you use writeln() at 12 or so. It didn’t occur to me until 10+ years later that writeln is shorthand for “write line”. To me it’s just a symbol with no meaning outside that particular context.

I'm a native English speaker, and so the reason for names like "if", "while", etc. were clear to me when learning. On the other hand, your experience learning programming was similar to mine learning to use a Unix-like OS. Although I do know their etymology as a bit of useless trivia, for the most part "cat", "grep", "vi", "diff", "sed", "dd", "df", "tar", and so on are just arbitrary symbols to me.

Same here. I still remember feeling pretty dumb the day I realized that "if" and "else" were also English words.

> the thought that native-English speakers probably have a rather different experience with programming languages

I know what you mean, but I think you get over it quickly. There's a switch in mental context as you start to code and the fact that certain keywords mean something in your native language just doesn't matter.

As a native-English speaker, when I compare e.g. python's "x and not y" to a more formulaic "x && !y", it really doesn't seem to make much of a difference. I'm curious if you were using a syntax highlighter, that may affect your experience.

On the other hand, python gives us 'x is not y' which is equivalent to 'not(x is y)' and not 'x is (not y)'. Which can surprise even native English speakers

I tend to think of "is not" as a binary operator, and my assumptions based on this hold true. I prefer to think of it as entirely separate from "not". I don't know if the compiler rewrites it to not(x is y) behind the scenes.
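For what it's worth, here is a quick sketch of the three spellings side by side. The Python language reference describes `x is not y` as yielding the inverse truth value of `x is y`; the variable names below are made up for this example, and it says nothing about what the compiler does internally:

```python
x = [1]
y = [2]

# `is not` as a single identity operator.
single_operator = x is not y   # True: x and y are different objects

# The same test written out as a negated `is`.
spelled_out = not (x is y)     # True

# The "English" parse: `not y` is evaluated first and gives the bool False,
# then x is compared by identity against that bool.
other_parse = x is (not y)     # False: the list x is not the object False

print(single_operator, spelled_out, other_parse)
```

Note that with two plain booleans the parses happen to agree, so it takes non-boolean operands like these lists to see the difference.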

But it’s easy to forget how concepts you understand so well they feel natural and are burned into your muscle memory are foreign and difficult to others, beginners in particular. A way to measure that is to teach. When you try to explain simple programming concepts, you often realise what mileage you have behind you.

I can't imagine programming with ESL and having to deal with code where some English speaker thought they were super clever and named various classes and variables after things that are super referential or punny. I see a lot of classes at my current job named after 80s TV shows or inside jokes. Maybe having ESL would make that type of thing easier to deal with actually, since it drives me up the wall.

Classes named for inside jokes are an example of horrible programming practices which should never get past code review

Well that would assume such practices were around when that code was created.

Programming as an English speaker in code mainly written by foreigners is strange too. Misspelling words doesn't matter to them. Having several classes using the same name template ("GhostFactory" / "GhoulFactory" / "ZombieFactory") is fine... and it's also fine if one of them happens to be spelled differently ("PumpkinFatcory"). They're just arbitrary letter sequences that you type via autocomplete.

I remember having to look up what foobar meant back when I started learning programming. "What do you mean it doesn't mean anything? It has to mean something, otherwise what does this code do!?"

Oh boy. Phabricator was _terrible_ with this. Native English speakers thought ridiculous button UI names like "Clowncopterize" were so funny. Fricking no.

I think you're completely right, though I see from the comments that others have different experiences. But to me, the "language" of programming can drive me crazy.

Currently in JavaScript the various forms of "var" and "for" make me cringe involuntarily when I'm mentally parsing code. "for let foo of bars"?? Gah!

One thing that's felt weird to me is Mockito & friends, and some badly-implemented builder APIs. They're trying to read like natural language, but when the order matters and it's unclear what the order is really supposed to be, the natural language metaphor breaks down and you're better off having it made clear what 'parts of speech' each call is and what the required order is.

I would imagine stuff like that makes a language harder to translate because instead of words, you now have an ordering that might not work. I was surprised to see so few languages on that list where it was essentially a direct keyword translation. There's very little English in C-like languages until you actually start naming everything in the libraries.

Native English speaker, very interested because I find my brain caught when I hear background noise in other (human) languages.

These days, I am hearing more Russian and Dutch. Along with Spanish, German and Swedish, impossible for me to speak but hovering at the edge of understanding, my brain cannot ignore when I hear them.

Programming languages, after 40 years of it I'm not aware of any particular effect of the English words. With "Wolfram Language", for instance, I think it would help if the words were based on Esperanto or Swahili, as I can never anticipate the magic invocations needed.

Language is weird.


"That Japanese Man Yuta" on YouTube mentioned in one of his videos that Japanese-y/Chinese-y roman alphabet fonts completely break his brain when he sees them, and he can't see them as English, just horrible nonsense.

Well, can confirm. I prefer pure English to romanized pinyin ANY time.

Pinyin for Chinese isn't self-contained; you have to guess what it really means. If I saw pinyin code, then unless it is absolutely necessary (unambiguous, and with no proper English translation), it won't pass CR if I am reviewing it.

"Seeing a programming language in your native language after working with English ones"

That's probably the reason. I'd bet after working for the same time with 1C, you (or I, for that matter) would completely ignore keywords and names being in the Russian language. You'd die from the boredom much earlier though, but that's a different matter)

Btw, END IF is as weird as КОНЕЦ ЕСЛИ.

Not saying you're wrong about the hardware analogy. But I think it's no different than learning to drive or play a sport. Eventually it just becomes automatic.

Been reading and writing English almost exclusively, every day, for the past ~six years, out of ~twenty years of learning it—still waiting for it to become automatic.

Childhood does things to your brain. And language recognition doesn't work the same way as motor skills.

This is irrelevant to your point, but mad props. Your written English is superb; I would absolutely assume you’re a native speaker.

I used to think about this when I was a student. I could totally make a programming language in Thai.

Surprisingly, one of the main issues that I thought of was a cultural one. Thais culturally don't use a single word to encapsulate complex meaning. We use a combination of words to capture that kind of meaning.

Just to give an example: a taxi driver is 'human-drives-taxi' in Thai. A barber is 'tradeperson-cuts-hair'. So, the programming language would be really verbose.

Also, since programming originates from the western world, we never really have Thai words for many concepts in programming. I've been programming for years, and I have no idea how to say 'software', 'hardware', 'class', 'inheritance', 'encapsulation', 'abstraction', 'refactoring' in Thai.

>I've been programming for years, and I have no idea how to say 'software', 'hardware', 'class', 'inheritance', 'encapsulation', 'abstraction', 'refactoring' in Thai.

I'd argue that if you said most of these words to a non-programmer native English speaker, they'd have the wrong connotations anyway.

Programming has become (is becoming?) an international language of its own (albeit with outsized influence from english). I think eventually it will be much like the latin used in science and medicine today.

You're right. Native English speakers must have some sort of hard time with English words being repurposed.

I remember at work a colleague asked "how do I kill chubby slave?", which didn't sound strange at the time, since Chubby was an authentication system (IIRC).

But, when thinking about it, that's probably a very offensive sentence in English.

There has been a move recently away from the "master/slave" analogy in a lot of projects for a number of reasons including that one, which personally I think is a sensible choice.

Of course, you get plenty of people saying that we should all not care about the origin or other meanings of words in programming, and yet they tend to care deeply about maintaining the status-quo.

>and yet they tend to care deeply about maintaining the status-quo.

Because changing is more work than not. It shouldn’t be a surprise when people get upset because you want to change fundamental terminology in some software’s architecture because a group is triggered by the originally chosen words.

E.g. Somehow master and slave are more offensive than killing parents and leaving orphans?

The use of "triggered" to dismiss issues people have is, frankly, very silly. Its origin is in sufferers of PTSD who, when triggered, face very real and serious mental harm. Yes, we absolutely should try to be considerate to people who are triggered by certain things if we possibly can.

Now, assuming that you are just using the term as a trite implication that people bothered by these terms are just easily upset, I would question that. The transatlantic slave trade was a tragedy and its effects still haunt a huge number of people to this day.

Would you be OK with using holocaust analogies in your code? I would hope you would say no, and I imagine most people would agree. So we have established that there are references that are not suitable to make, and it is a question of degrees.

Changing these things is some work, yes, but it makes working with those systems nicer for a large number of people, for whom those analogies are a negative thing.

I'm definitely not advocating that every other term in computer science is defensible either. If you have an issue with certain ones, I would suggest bringing them up.

No one is suggesting that it should be illegal or anything, just some projects make those changes to improve the quality of those projects. And yes, how nice it is for people to work with a thing is absolutely a part of its quality.

I guess Russian is in a better position because it doesn't use compounds so much—but it made the borrowed programming terms into Russian words by freely inflecting them, which is a very Russian trait. Whenever an imported word doesn't sound quite right for Russian, it gets adjusted fast, similarly to how this occurs in Japanese.

We now have a set of programming terms that are a mix of: borrowings which already existed in other contexts (possibly of Latin or similar roots), Russian words instead of English ones, and straight up transliterated calques. This all coexists completely naturally because the words now serve as roots for further derivation and forming a layer of slang on top of the formal terms. Notably, young people are quick to appropriate foreign words—and they tend to be the ones who get into new tech.

There were (joking?) proposals for using Russian-root words for computing terms, and by now they sound like bringing an Orthodox priest speaking Old Church Slavonic into a datacenter.

Interestingly, Slovak and Czech have no problems whatsoever using native Slavic words for technical terms, e.g. súbor for file (literally a collection of data, think сбор/собор in Russian), snímač for sensor (lit. taking off something, similar to the Russian съёмник), obrazovka for screen (place for images; compare to the Russian образ (image) and экран (screen, from the French écran)). There are more but I can't think of any right now :)

It is very typical for Russians that we often feel awkward using old words in a new meaning to denote new concepts. Somehow it feels silly and imprecise, and we prefer to borrow the word from the language where a concept originated.

At the same time, in colloquial language we happen to replace English words with unrelated but similar-sounding Russian words. For example, saying мыло (soap) instead of 'mail' or 'поймать лося' (catch a moose) for an activated stop-loss order.

Not sure how ‘súbor’ is properly pronounced, but it seems Russian also has a computer-related connotation for it: https://i.imgur.com/JdgltUj.jpg

Including my favorite variant, the “Russian-English Computer Learning Set”: http://sannata.org/articles/subor.shtml

Though the manufacturer appears to be wholly Chinese.

Arguably none of these words existed in the modern programming sense 50 years ago. Even 'computer' was originally a person employed for calculating <https://en.wikipedia.org/wiki/Human_computer>.

I struggle to imagine that Thai can't do similar extensions - inheritance maps directly onto the personal concept of ownership down (human) lineages, surely you have that?

Serious but weird suggestion, talk to a poet.

>> 'software', 'hardware', 'class', 'inheritance', 'encapsulation', 'abstraction', 'refactoring' in Thai.

> Arguably none of these words existed in the modern programming sense 50 years ago.

They did; that's why they were repurposed in computing.

"Ware" means goods, especially something for sale. Hardware means the wares used in construction: screws, hinges, brackets and so on, or any equipment. Extending that to computer hardware is straightforward.

"Software" is new for computing, by analogy to hardware.

"Inheritance" has the meaning taken from biology or reproduction. "You've inherited your mother's good looks!"

"Encapsulation" means to wrap something in a capsule; online dictionaries say it was first used in 1872, and give an undated example of a pilot encapsulated in a cockpit. I don't have the subscription to see the 1872 usage.

"Class" means a group of things sharing some characteristic. Remember that the code you wrote is a class definition; "class" has its normal, English meaning.

"Refactor" may be novel in computing, I'm not sure. You could also apply the word to a document.

Ok, let me put it another way. Instead of 50 years ago (1970), when computers were starting to be publicly available along with this developing terminology, let's say it was 70 years ago, so 1950.

Consider that a correction on what I originally wrote.

Now take a time machine back to then and meet with The Man On The Clapham Omnibus, a term used in English law to denote the everyman, the utter Mr. Average. Use those terms on him:




"ah yes, as purchased at the ironmonger's"


"oh yes, like in prep school?"

err, no.

"well then like the labourers and shipmen, and the aristocracy? That kind of class I take it"

etc. etc. The terms are easy to transmute but it hadn't happened then. Agreed about encapsulation, abstraction, refactoring etc though.

Disagree with me at your peril, I got downvote rights yesterday - so does one feel luck, punk?

Well, does one?


I don't think the man on the Clapham omnibus would have trouble understanding "hardware" to mean tools, fittings or fixtures used for a particular trade or purpose. That's what the word has meant for over 500 years.

"Class": yes, like in school, or social class. The thing in common in 1950 might be "all boys, all age 13-14, all learning chemistry". A Java EarlyTeenChemistryBoy. (Isn't the first example when learning OO programming something like the class Student, a subclass of Person?)

In any case, my lawyer will argue we don't want the man on the Clapham omnibus, but the better-educated man in the first-class carriage of the 8:15 express to Birmingham.

(You will find you can't downvote this reply, and I think it's poor etiquette to pick some other random comment of mine and downvote that, although that does happen.)

Let's agree to agree. Now, 1) why can't I downvote your answer? and 2) I thought the smiley made it clear I had no intention of doing so - I replied to your point because it was a good one. The signoff was a bit of fun. I made that as clear as I could.

The OED has citations for "encapsulation" (including variant spellings) going back to 1860:

    1860   F. W. Farrar Ess. Origin Lang. viii. 172   Every subordinate clause being inserted in the main one by a species of incapsulation.

I've always assumed refactor was related to factoring a mathematical equation.

Back in the 80's, when C++ first became a thing, we tried to find Japanese words for the concepts. Destructors became "Death Tractors", which I always thought was awesome.

From your 'death tractors' example, now I understand why people are debating changing 'master-slaves' to 'leader-followers'.

Thais wouldn't want to call their machines (or anything or anyone) 'slave' with serious tone. It sounds offensive.

Thinking about it, native English speakers might have internal struggle about many programming words.

As a non-native english speaker, rude/strange/crude/offensive words don't really cause me emotional impact.

Explicitly translating master/slave to my native Dutch just makes me chuckle. It's like reading a rather upfront ad in the classifieds for folk who are into bondage and discipline.

I remember many years ago reading about a user being offended by a command named "abort".

Isn't that just a straight up transliteration? Destructor -> デストラクター desutorakutaa -> "death tractor" デス トラクター desu torakutaa.

This can happen in English too, eg Experts Exchange <-> Expert Sex Change.

As far as I know it is now transliterated [1]. I do think that a Korean translation ("소멸자"---lit. extinguisher, but the destruction in programming is commonly associated to that translation) is more commonly used than a Japanese translation ("消去子").

[1] https://ja.wikipedia.org/wiki/%E3%83%87%E3%82%B9%E3%83%88%E3...

>Just to give an example: a taxi driver is 'human-drives-taxi' in Thai. A barber is 'tradeperson-cuts-hair'. So, the programming language would be really verbose.

Just like German. A fridge is a cold-cupboard, a car is a driving-thing, etc.

But While, if, else, for are concise words.

Most non-german speakers will find these pretty funny:

- lighter is fire stuff

- plane is fly stuff

- tools are work stuff

The term "Zeug" usually translates to "stuff" colloquially, as "thing" would be a better fit for the literal "Ding".

You say "Western" while in fact these all are English artifacts. AFAIK, in French there is no direct translation of "software", which is called "logiciel" - a piece of things that deals with logic. The examples are countless. Don't make the same mistake as we do, labeling together all languages from your region as Asian, without distinguishing.

Maybe look at a different programming paradigm then? :)

I remember trying to learn APL and then watching https://www.youtube.com/watch?v=v7Mt0GYHU9A and was amused that there is this world where programming is just symbols :D

So maybe look at, e.g., something APL-like, or maybe ML or Haskell or Erlang, where there are so many things you can do with functions and composing them, or even Prolog, where everything could be thought of as a set of rules/constraints to be solved (I think? haven't programmed in Prolog since uni :) )

APL is great.

If you haven't seen them, the late John Scholes did a number of absolutely beautiful demonstrations programming in APL[1], [2], [3].

[1]: https://www.youtube.com/watch?v=DsZdfnlh_d0

[2]: https://www.youtube.com/watch?v=DmT80OseAGs

[3]: https://www.youtube.com/watch?v=a9xAKttWgP4

For anyone looking to get into APL, Adám at Dyalog posts a lot of answers on the CodeGolf and Programming Puzzles StackExchange site often with expanded/explained code, and he hangs in the "APL Orchard" chatroom on SO. His profile says "always excited to [talk about] APL" and that seems to be genuine. - https://codegolf.meta.stackexchange.com/users/43319/ad%C3%A1...

The most public APL dialect using the classic symbols (not J/K/Q/etc) seems to be Dyalog APL - they offer one of their older versions 14 free for non-commercial use (Windows), and they offer their current version 17 free for non-commercial use if you give them your full details (maybe Linux as well?). There is a book "Mastering Dyalog APL" by Bernard Legrand from ~2008 which is available as a PDF with companion files from: https://www.dyalog.com/mastering-dyalog-apl.htm although they have added to the language since that book was released.

There are other APL interpreters - GNU APL, NARS2000, ngn/apl, dzaima/apl (Java/Android) in various states of development and license conditions, and the developers of the last two are also sometimes in the APL Orchard chatroom on StackExchange.

Online there's https://tryapl.org/ run by Dyalog, and https://tio.run/# which has three APL interpreters.

The classic 1970s book "APL\360 an Interactive Approach" by Gilman and Rose is online as a PDF here: http://www.softwarepreservation.org/projects/apl/Books/ with other books as well. That's before the {} syntax for anonymous functions was added to the language, and other developments.


Jay Foad while he was at Dyalog, talking through some Advent of Code puzzles with APL: https://dyalog.tv/Webinar/?v=Q_vgSN6rza0 (might want to skip the introduction of explaining the Advent of Code puzzle format).

Aaron Hsu with Dhaval Dalal and Morten Kromberg, talking through some problems using APL https://www.youtube.com/watch?v=Gsj_7tFtODk

And anything that you get from YouTube searching for Aaron Hsu, he has several talks / presentations from high level ideas to working with trees represented using arrays.

You should just make Thai Objective-C. Their method style is to write things verbosely like “didFinishLoading” or “willAppearInBackground”.

“class” and “inheritance” are just abstractions of the very traditional meanings of these two words. I’m sure Thai has to have equivalents for those?

That's a bit specific since OO is just used for reasons of history and inertia. I'm sure Thai has true/false and other mathematical terms, that's probably all that's needed.

The Wikipedia list does not mention Tampio, an OO language where programs are written in proper Finnish: https://github.com/fergusq/tampio

As an example, here is kertoma.itp (an example routine to calculate a factorial):

  Pienen luvun kertoma on
      riippuen siitä, onko se pienempi tai yhtä suuri kuin yksi,
      joko yksi
      tai pieni luku kerrottuna pienen luvun edeltäjän kertomalla.
  Luvun edeltäjä on se vähennettynä yhdellä.
  Olkoon pieni muuttuja uusi muuttuja, jonka arvo on nolla.
  Kun nykyinen sivu avautuu,
      pieneen muuttujaan luetaan luku
      ja nykyinen sivu näyttää pienen muuttujan arvon kertoman.

It sounds basically like a small part of a mathematics lecture.

The article also does not mention that not only Excel Formulas are localized, but VBA was localized in Office 95 as well [1].

This is German VBA:

    Funktion VorherigerGeschaeftstag(dt Als Datum) Als Datum
        Dim wd Als Integer
        wd = Wochentag(dt) ' Wochentag liefert 1 für Sonntag, 2 für Montag usw.

        Prüfe Fall wd
            Fall 1
                ' Auf Sonntag wird Datum vom letzten Freitag zurückgegeben
                VorherigerGeschaeftstag = dt - 2
            Fall 2
                ' Auf Montag wird Datum vom letzten Freitag zurückgegeben
                VorherigerGeschaeftstag = dt - 3
            Fall Sonst
                ' Andere Tage: vorheriges Datum wird zurückgegeben
                VorherigerGeschaeftstag = dt - 1
        Ende Prüfe
    Ende Funktion

[1] https://de.wikipedia.org/wiki/Visual_Basic_for_Applications#...
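For readers who don't read German, here is roughly the same logic as a Python sketch. The function name is my own translation of `VorherigerGeschaeftstag`, and like the original it ignores public holidays:

```python
from datetime import date, timedelta

def previous_business_day(dt: date) -> date:
    """Return the last weekday (Mon-Fri) strictly before dt, ignoring holidays."""
    # date.weekday() numbers Monday as 0 and Sunday as 6,
    # unlike the VBA Wochentag call above (1 = Sunday, 2 = Monday).
    wd = dt.weekday()
    if wd == 6:   # Sunday -> previous Friday
        return dt - timedelta(days=2)
    if wd == 0:   # Monday -> previous Friday
        return dt - timedelta(days=3)
    return dt - timedelta(days=1)   # any other day -> the day before

print(previous_business_day(date(2019, 7, 1)))
```

The structure maps one-to-one onto the `Prüfe Fall` / `Fall` / `Fall Sonst` branches; only the weekday numbering differs.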

The localisation of Excel goes even further. If I send a normal CSV file (that is, one containing values separated with actual commas and newlines) to an American, their Excel can open it.

If I send the same file to a fellow Dutchman, their Excel can't.

Excel, for some obscure reason lost in time, decreed that the Dutch do not, in fact, separate their comma-separated values with commas. We use semicolons. No one seems to know why Microsoft thinks we apparently do this. That means that a normal bog-standard CSV file won't work by just double-clicking it or opening it in Excel.

That's right: a Dutch comma-separated values file must have semi-colons according to Excel.

LibreOffice meanwhile works with anything you throw at it, in any language, of course. It'll just ask you about the separators, defaulting to commas.
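If you control the reading side, guessing the separator is not hard. A minimal sketch using Python's standard `csv.Sniffer`, restricted to the two candidates discussed here; the sample data and the `read_rows` helper are made up for illustration. (My understanding, which I have not verified against Excel itself, is that it follows the regional "list separator" setting, and locales with a decimal comma tend to set that to a semicolon.)

```python
import csv
import io

def read_rows(text: str) -> list[list[str]]:
    """Parse delimited text, guessing between ',' and ';' as the separator."""
    # Sniffer inspects the sample and picks whichever candidate delimiter
    # occurs consistently on each line.
    dialect = csv.Sniffer().sniff(text, delimiters=",;")
    return list(csv.reader(io.StringIO(text), dialect))

comma_file = "name,city,age\nAnna,Utrecht,31\nBram,Delft,27\n"
semicolon_file = "name;city;age\nAnna;Utrecht;31\nBram;Delft;27\n"

# Both variants parse to the same rows.
print(read_rows(comma_file) == read_rows(semicolon_file))
```

This only works cleanly when the fields themselves contain neither candidate; a column of decimal-comma numbers would make the guess ambiguous.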

It does now, so I guess someone has added it. :)

Which reminds me of 10+ years ago when I worked for a consultancy and often on site with one of its main clients. Both were originally Finnish but by then very large multinational enterprises.

Occasionally whilst doing some archaeology investigation of some internal libraries or old systems you encounter an ancient Java (or worse) library using Finnish class and methods names.

It usually confused the heck out of all of us, as there never was a Finnish speaker on any of the projects I was on, and I was there for 6 years. But, much as with a decompiled Java program, you can guess what it does, though not what it was intended to do...

Using non-English in programming may have made a little sense when they were a smaller Finnish-only company, but 10+ years later, after many mergers and acquisitions etc., it really did not make sense any more :)

Considering that in the offices I worked at for them in Oslo, Stockholm and Copenhagen the development teams were all a mix of nationalities (Scandinavians, English, Belgian, Italian, Polish, Indian), using anything but English would have been silly by then.

Among natural-looking programming languages, my favorite currently is ArnoldC: https://github.com/lhartikk/ArnoldC


This is absolutely amusing. Seems that the creator of the Tampio language has created several esoteric languages, one of them being domain specific language Retki for writing interactive fiction. It is also based on Finnish language. https://www.kaivos.org/projektit.html

This is great. Google translate even turns it into meaningful English. And just as you observed, it comes across as someone explaining the process.

One of my favorite online April Fools' gags from recent years was when it was announced Scala would become Skala and all of the keywords would switch to German [0]. `val` `var` and `def` would all keep their uniform length by becoming `unveränderliche`, `opportunistisch` and `verfahrensweise` :-)

[0] https://www.scala-lang.org/blog/2017/04/01/announcing-skala....

I worked with a German version of VBA for a little while. MS soon gave up on this but it was a really weird experience.

Having learned Excel functions in English, having to use them at work in German always makes me nervous.

I'm never able to get a function to work as I don't even know their equivalents in German.

Now with Sharepoint and Excel Online, we've discovered that function names follow the base language of the Workspace that the sheet is stored in. It's essentially rendered Excel Online useless.

I thought VBA was always in English (syntax & APIs). Only Excel formulas are displayed translated in the regular Excel UI.

So I just so happen to have a PowerBook 540c here with a Swedish install of Office 4.2.1, and had to check my memory, since I remember it being localized.

VBA on this machine has English keywords (Sub/Dim/If/While), but Swedish application APIs (search in Word is Sök for instance). Screenshot of one of the sample macros: https://i.imgur.com/vDWQcWk.png

that was years ago. I don't remember when. It also didn't last long.

uwä (unveränderliche), swa (schwank), and vfw (verfahrensweise) ?

Wert, Variable, Definition would probably be a more normal choice.

wer, var, def

One really annoying thing about using an English-based UTF-8-capable programming language with some types of non-English content (e.g. if it's in Serbian) is that you have to switch between keyboard layouts constantly. Sooner or later you end up forgetting to switch back and you start writing your code in, say, Cyrillic. And let me tell you, Cyrillic letter A looks exactly the same as Latin alphabet letter A, but will happily break your JavaScript in the craziest possible ways, like 2 vars not being the same even though they look exactly the same.

Consider configuring your editor to highlight characters above 127.
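If your editor can't do that, a few lines of script can. A rough sketch in Python (the `non_ascii_chars` helper and the JavaScript-looking snippet are invented for illustration) that lists every character above 127 with its position and Unicode name:

```python
import unicodedata

latin_a = "A"           # U+0041
cyrillic_a = "\u0410"   # looks identical in most fonts

def non_ascii_chars(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for every character above 127."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Two identical-looking identifiers that are in fact different strings.
snippet = "var total" + latin_a + " = 1;\nvar total" + cyrillic_a + " = 2;\n"

print(latin_a == cyrillic_a)
print(non_ascii_chars(snippet))
```

Run over a source file, this reports the stray Cyrillic letter on line 2, which is exactly the kind of thing that is invisible in a diff.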

It is a fun exercise to take a non-english language, look at its idiosyncrasies, and design a programming language with that.

My native language is German. Some ideas:

* We can take nouns and combine them to longer nouns. A "list of objects" is "Objektliste". Parsing will be a challenge but it makes the language more terse and auto-completion is faster.

* We also have gender specific articles. It is "das Objekt" and "die Liste". We could use this to have three possible namespaces, so "der Foo", "die Foo", and "das Foo" would refer to three different things.

* All nouns start with an uppercase letter, so your loop index must be "I" or "J". We could have user-defined qualifiers (like const) which are distinct from variables because they start lowercase. So where you have to use @ in Python, we just use lowercase.

Oddly, all three examples have analogues in real programming languages already:

- in many C-derived languages, a “list of objects” is spelt “object[]” as a type, with the “list” (array) specifier postfixed. (Functions declared to return arrays take this syntax to quite an interesting extreme.)

- in Perl and some derivative languages, sigils serve as a prefix to denote context of use or type; $var is different than %var or @var, for example.

- In Ruby, a leading uppercase character defines a constant, and some aspects of the language are sensitive to the case of the identifiers used.

Of course, it’d be super interesting to put many of these ideas into a single language, as it could well be quite different (and refreshing!) than many of the languages we use today.

> All nouns start with an uppercase letter, so your loop index must be "I" or "J". We could have user-defined qualifiers (like const) which are distinct from variables because they start lowercase. So where you have to use @ in Python, we just use lowercase.

Yeah, but i and j come from math. I've never seen a math paper where people summed a variable using a capital subscript.

I think it might be more interesting to think about how you write a German version of a language that is meant to be English-like. Something like SQL or Cobol.

Also, a loop index is a placeholder for a numeral, like "eins", "zwei", "drei", hence

  Für i = 1 bis 2 ...

However, using articles,

  Für das I = 1 bis 2 ...

And we may want to use a "Doppelpunkt" and an exclamation mark, since this is still a command:

  Für i: 1 bis 2!

Extending on the importance of commands in the German language (and culture) – these are a serious matter and not to be fuzzed with –, a computer featuring an explicit fast-mode, like the Sinclair ZX81, may lend itself particularly well to a German language implementation:

  Für i: 1 bis 10, aber schnell!

> We could use this to have three possible namespaces, so "der Foo", "die Foo", and "das Foo" would refer to three different things.

You're reminding me of perl, where $foo, %foo, and @foo can coexist.

A somewhat related thing I've had on my mind:

I remember at my Dutch university we would get intro to programming courses in Java. We would often discuss things in Dutch, but found it awkward to talk about `null` (the "pointer") and `0` (the integer, which in Dutch is "nul").

In fact, I remember a distinct bug that happened from informal communication of an API. The conversation went (in Dutch):

    Student 1 - "What does Function A return if the input is invalid?"
    Student 2 - "Nul"
    Student 1 - "Null?"
    Student 2 - "Yes".

And the program went on to crash on a divide-by-zero exception, because it ended up only checking the return value for Null, rather than 0.
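The shape of that bug is easy to reproduce. A small sketch in Python rather than Java, with made-up function names standing in for "Function A" and its caller:

```python
from typing import Optional

def parse_ratio_denominator(raw: str) -> int:
    """Hypothetical 'Function A': returns 0 (Dutch "nul") for invalid input."""
    return int(raw) if raw.isdigit() else 0

def share_per_part(total: int, raw: str) -> Optional[float]:
    denominator = parse_ratio_denominator(raw)
    # The caller heard "null" and guards against None only...
    if denominator is None:
        return None
    # ...so an invalid input slips through as 0 and the division fails.
    return total / denominator

try:
    share_per_part(10, "not a number")
    outcome = "no error"
except ZeroDivisionError:
    outcome = "ZeroDivisionError"

print(outcome)
```

The `None` check looks defensive but can never fire, which is why a spoken "nul" versus "null" mix-up survives until runtime.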

We ended up saying "null" as "naL", with heavy emphasis on the L and kept 0 as "nul", to somewhat mitigate these problems. The awkwardness never really went away.

I wonder if Guido van Rossum, when designing Python, intentionally bypassed this issue by naming the null pointer "None". This is the only language I know of that uses None; some use "nil" AFAIK, but the language issue (in Dutch) is not avoided in these cases.

I also wonder if other non-native English speakers who wrote English-based programming languages have applied similar considerations to avoid possible confusion between the native language of the speaker and English.

Ackchually, the proper pronunciation for English ‘null’ is supposedly /nʌl/: https://en.wiktionary.org/wiki/null

Interesting that we gravitated towards a similar pronunciation due to a conflict in the aforementioned terms. It's unfortunate that proper English pronunciation does not always cross the North Sea :P. There are actually plenty of words in English that I have only written and read, and have never heard or spoken out loud... and I consider myself a decent speaker of English.

> There are actually plenty of words in English that I have only written and read, and have never heard or spoken out loud

… and I'm a well-educated native speaker.

English proudly possesses a prodigious panoply¹ of… um… phrase particles.

¹ I have never heard this word spoken.

Hah, even simple words can be confusing. My English friends seem to end the word `idea` with a, to my ears, clearly audible `r`, but look at me funny when I mention it to them. Whereas my American friends seem to pronounce it in a more straightforward way..

I also love listening to people speak the "pronunciation poem" out loud. It's very interesting to hear how the pronunciation of each second word is changed, just to make it rhyme with the first (mispronounced) word.


Oh boy. If you want a bag full of surprises, take up listening to audiobooks in English.

> This is the only language I know of that uses None

In the ML family, the empty alternative of the `Option` type is called `None`; in Haskell, the corresponding `Maybe` type has `Nothing`.

Cool! Good to know. I don't think None is totally problem free either, as it is easy to confuse with `NaN`. `Nothing` definitely bypasses all the mentioned issues and seems like a better choice.
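In Python at least, the two are easy to tell apart once you know how each has to be tested. A tiny sketch using only built-ins and the standard `math` module:

```python
import math

missing = None
not_a_number = float("nan")

checks = (
    missing is None,                   # None is a singleton: test with `is`
    not_a_number == not_a_number,      # NaN never compares equal, even to itself
    math.isnan(not_a_number),          # so NaN needs an explicit isnan() test
    isinstance(not_a_number, float),   # NaN is still a float, not "no value"
)
print(checks)
```

So the risk of confusion is mostly in how the names look and sound, not in how the values behave.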

So the posting is right on time

Perl is so cool from a linguistic standpoint. I wish there were more written about that aspect of it. Also Conway is amazing.

I mean:

> While in graduate school at the University of California, Berkeley, Larry Wall and his wife were studying linguistics with the intention of finding an unwritten language, perhaps in Africa, and creating a writing system for it. They would then use this new writing system to translate various texts into the language, among them the Bible.

And Perl was explicitly designed with a slant towards natural-language traits. Wall has said that he didn't know as much about creating programming languages, and that this likely affected Perl. So he might've written about the linguistic aspects instead.

> Wall's training as a linguist is apparent in his books, interviews, and lectures. He often compares Perl to a natural language and explains his decisions in Perl's design with linguistic rationale. He also often uses linguistic terms for Perl language constructs, so instead of traditional terms such as "variable", "function", and "accessor" he sometimes says "noun", "verb", and "topicalizer".

For a linguist, he made a horrifically unreadable language.

And Klingon too. Also, the perl parser can be changed to support not just keyword differences, but grammar too.

Many programming keywords derive from Latin anyway, like function, object, static, constant, variable.

There was, of course, PHP's notorious Hebrew error message T_PAAMAYIM_NEKUDOTAYIM:


Programming languages and human languages are really not the same thing at all, unless maybe you consider some extreme outliers like AppleScript[1]. It seems like the choice of character set would make a bigger difference. Most of today's popular languages would seem to be strongly impacted by making them compatible with code that is entirely in 7-bit ASCII. I don't know enough about other character encodings to try to imagine how they might make for a very different programming language design. What would a Chinese-influenced programming language look like if its designer(s) had never had any exposure to alphabetic or syllabic writing systems?

[1] https://en.wikipedia.org/wiki/AppleScript

> What would a Chinese-influenced programming language look like if its designer(s) had never had any exposure to alphabetic or syllabic writing systems?

The first thought that came to mind as I was writing this was: some kind of logographic APL. I don't actually know what such a thing would look like, but I'm sure I wouldn't be able to read it.

I was a skeptic, but Asad Memon's "UrduScript" (which could just as easily be called "HindiScript", since it uses the Roman alphabet) made me realize that this would have been a game changer for me in my freshman year, or for anyone from the Indian subcontinent just starting to program. For Hindi/Urdu speakers, the "why (kyun?)" section at https://asadmemon.com/urduscript/ is a fun read. Although I always studied in English-medium schools, I distinctly remember being confused by the simple term "invoke". It was just not part of my regular vocabulary, and I felt it was something more complicated than it actually was. If I had had something like UrduScript, I would have been more confident in my early days. Having said that, I think such languages are most useful as a teaching aid rather than as serious production languages.

Many languages these days support Unicode, so you could technically write them in whatever you want. I think it's great that the world has settled on a popular, relatively easy-to-learn language. It helps immensely with sharing open source libraries worldwide. However, it would be nice if doc formats à la Javadoc had provisions for translation.
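As a rough illustration of the Unicode point, here is a small Python 3 sketch. Python 3 accepts non-ASCII letters in identifiers; the Swedish names (`skriv`, `första`, `utdata`) are invented for the example and not taken from any real library:

```python
def skriv(text: str) -> str:
    """Hypothetical Swedish-named wrapper: print the text and return it."""
    print(text)
    return text


första = "hej"          # non-ASCII letter in a variable name
utdata = skriv(första)

# str.isidentifier() reports whether a string is a legal identifier.
print("första".isidentifier())
print("1första".isidentifier())
```

The keywords (`def`, `return`) stay English, of course, which is exactly the split being discussed here: names can be localized, the core language usually is not.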

Having something like a global standard, inter-intelligibility between languages etc. is an unmitigated Good Thing, and may it never change. Apparently, my English-as-a-second-language compatriots Bjarne Stroustrup, Anders Hejlsberg, and Rasmus Lerdorf agree with me.

And it's not even about English, but about a subset of ASCII on a QWERTY layout - symbols that are inputtable and explicitly marked on pretty much any keyboard. Whether most keywords in a programming language are in English or not is not even relevant. I mean if all keyboards had Greek symbols on them we would have an article about non-Greek-based programming languages.

It's a good thing, I agree. There is no reason to worry either, as uncommon, hard-to-input Unicode symbols in the core language are such a huge usability failure that those languages simply have no chance of gaining any significant mind share.

Totally agree. Using non-Latin characters would be a non-starter for most of the world.

Possibly relevant here: a story about Donald Davies inventing the word "packet" (in the networking sense) from Katie Hafner's book Where Wizards Stay Up Late:

Davies' choice of the word "packet" was very deliberate. "I thought it was important to have a new word for one of the short pieces of data which traveled separately," he explained. "This would make it easier to talk about them." There were plenty of other possibilities — block, unit, section, segment, frame. "I hit on the word packet," he said, "in the sense of small package." Before settling on the word, he asked two linguists from a research team in his lab to confirm that there were cognates in other languages. When they reported back that it was a good choice, he fixed on it. Packet-switching. It was precise, economic, and very British. And it was far easier on the ear than Baran's "distributed adaptive message block switching."

I wish that much thought went into other technical naming. (Looking at you, grep.)

I have a friend with a German shorthair pointer. Whenever I hear him refer to the dog as a pointer, I briefly wonder if he’s referring to his dog, or if his dog actually represents the location of another dog.

Since we're discussing this, I'd like to share my work with this community: a programming by demonstration system, which doesn't use text [1]. One of its advantages is that people of different non-English backgrounds can be part of (say) the same introductory programming workshop, as long as one person can learn and orally convey how to use it [2]. [1] www.blockstudio.app [2] https://dl.acm.org/citation.cfm?id=3174196

English's largely non-inflected, positional syntax makes it a good fit for programming languages. Chinese would be better if not for the writing system.

I'd rather see a language with more math-like operator symbols (e.g. for-all, element-of, union, is-proper-subset, etc) and an accompanying keyboard.

Dyalog will sell you a keyboard to go with their APL: https://www.dyalog.com/uploads/images/Business/products/dk_r...

∪ - dyadic downshoe is union ( https://tryapl.org/?a=%27ab%27%20%27cde%27%20%27fg%27%20%u22... )

proper-subset isn't builtin; this might do it, but there are probably neater ways

    'ab' 'cde' 'fg' {((≢⊆⍺)>≢⊆⍵)∧(≢⊆⍵)=+/⍺∊⊆⍵} 'cde' 'ab'
"If the count of items of the left vector is greater than the count of the right vector (right side is smaller, it is proper), and the count of the right vector is equal to the number of elements in the right vector which are in the left vector (i.e. all of them are found, it is a subset)". https://tryapl.org/?a=%27ab%27%20%27cde%27%20%27fg%27%20%7B%...

Not sure it needs "for-all" because functions work on all elements by default. It can have for: loops, but they aren't math-like.
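For comparison, and only as a sketch in a different language rather than a translation of the APL above: Python's built-in `set` type has these as operators, though in plain ASCII rather than the math symbols asked for. `<` is proper subset, `|` is union, `in` is element-of, and `all()` covers for-all. The variable names are just for illustration:

```python
left = {"ab", "cde", "fg"}
right = {"cde", "ab"}

is_proper_subset = right < left           # subset of left, and not equal to it
union = left | {"hi"}                     # set union
is_element = "fg" in left                 # element-of
for_all = all(len(s) >= 2 for s in left)  # for-all over the elements

print(is_proper_subset, sorted(union), is_element, for_all)
```

Note that `left < left` is false, which is the "proper" part of proper subset.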

That would be APL.

i think Agda has a whole bunch of Unicode operators, and you can define your own. you can do that in Haskell as well, but it's pretty uncommon. Perl 6 has some Unicode stuff too

I don't get all the explanation going on in "Prevalence of English-based programming languages". English is the lingua franca of programming so origin is almost irrelevant. I'm Swedish but if I would develop a programming language I would do it in English, even if it was only for myself so I don't believe the "used English to appeal to an international audience" argument.

>...I would do it in English...

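Men du hade inte säga varför du skulle skriver detta på engelska. Till exampel, är engelska mycket bättre än svenska för som programmerarspråk? (In English: But you didn't say why you would write this in English. For example, is English much better than Swedish as a programming language?)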

I've seen a version of LOGO with all instructions translated into Polish, used for education in the 80s/90s. It was pretty OK, but it wasn't really based on Polish. It didn't use any features of the Polish language; it just translated the English commands to make things slightly easier for the kids.

To make a programming language based on Polish that captures the core of the language would be pretty strange - variable names would need to change depending on the role of the variable in the expression, and the order of the subexpressions in an expression shouldn't matter.

Function names should change too, depending on which subject they are called on, and most of the time the subject name should be skipped :)

Similarly though, I don't think any programming language is really "based on English" or "captures the core of the [English] language", in any meaningful way. So isn't the situation basically the same between English and Polish?

Just like Polish noun cases, I don't see how any of the interesting distinctive features of English are captured by programming languages.

Well, English is an analytic language, so the syntax of if, while, and positional arguments of functions reflects that.

Good point.

we used that at school! i can't remember much though – i've used english Logo afterwards and it overwrote most of the polish commands in my memory. the only one i can remember is `np`, the polish equivalent of `fw` ("naprzód" and "forwards" respectively)

Until we program in natural language, I don't see why we must even bother with this. It's fun but useless. For programming as it is today, it's just a bunch of English words that we have to learn. We can always comment and document in our native languages if need be. I say this as an Indian, so obviously not a native English speaker. (To preempt the "colonial benefit of English" retort: I would say the same thing had the USSR won the Cold War and everything had to be programmed in Russian in Cyrillic. We would just have had to learn a few words even then, assuming Soviet tech hadn't made NL-based programming a reality.)


    use Lingua::tlhInganHol::yIghun;
    <<'u' nuqneH!\n>> tIghItlh!
        wa' yIQong!
        Dotlh 'oH yIHoH yInob 
                qoj <mIw Sambe'> 'oH yIHegh jay'!
        <Qapla'!\n> yIghItlh!
    } jaghmey tIqel!
Perl makes me smile more than any other language I've poked at. There's an absurd amount of flexibility available.

D has support for Unicode characters in identifiers, pretty much the same as C does. But now I think that was probably a mistake. Some people use it, but nobody who wants their code to be editable by the larger community.

JavaScript has virtually always supported Unicode variable names (well, the basic multilingual plane).


Sure, but do people in general use them?

If you don't know English (a lot of people) and you are writing a script for a web page, you would use your own language.

Although compression tools replace the majority of symbols with short ASCII symbols, so it may not show in the final packed code.

It looks like the Closure Compiler emits ASCII only, using escapes for Unicode symbols that are not transformed.

In computer programming I will use American language, even if the documentation is Canadian (I am Canadian, so I write Canadian documentation even though the commands in the program itself will be American).

However, you can also have programming languages with abbreviated keywords or no keywords, which is also sometimes suitable. If there are keywords, I will always do it in American.

(And when writing music, I will write all of the notation in Italian, even though I do not speak Italian. If I don't know the Italian word for something, I will ask someone who does know, and write that.)

I didn't know about Enkelt[0,1] but I guess this changes things:


Enkelt >> var första = "Nej! Jag ville inte att skriver detta!"

Enkelt >> skriv($första)

Nej! Jag ville inte att skriver detta!


Pro-tip: You can use LLVM for non-ASCII naming, if you don't want to stick to English-only letters.

[0] - https://enkelt.ml/index.html

[1] - https://trinket.io/embed/python/10bb0ea708?outputOnly=true&r...

There are only a few reserved (English) words, most of which are shared among different programming languages. Why is this any more of an issue than all the Italian words used in musical notation?

It's not, it's just fun to think about. Likewise the French dominance of culinary terms, etc.

What I find more interesting is actually how removed from spoken English most programming language reserved words are.

Taking C and derivatives: "if" is pretty close, and "while" captures the meaning of the word but not its typical usage. The meaning of "return" is correct, but jargon (yes, you're "going back" to where you were called from, but that's not what people mean when they use the word normally). But from there it gets weird fast: "else" reflects a somewhat odd sense of the word that sounds archaic and stilted in human communication; "for" and "break" have little to no connection to spoken language at all.

As far as type names: "int" makes sense if you took high school math, and "char" abbreviates a real word that no one uses ("letter" is the one we get taught in school). But no amount of literacy is going to tell you what "short", "long", "double" or (weirdest of all) "float" mean, those are all terms of art you need to learn from scratch regardless of what language you speak.

"int" doesn't really make sense from a mathematical perspective, since ints in e.g. C are a finite set that is not closed under addition or multiplication.

So I think it's actually a good example of your broader point, that the meaning of English words as programming keywords is often very different from their meaning in other contexts.

> finite set that is not closed under

The discussion was about programming languages and English. What on earth is this about?

You're actually compounding the issue here, by invoking jargon from a different field. That's true of the definition of integers you'll find in college level math textbooks, but the word "integer" as understood by normal people (even computer programmers) means "whole number", which is why the type is named that way.

My point was that 10^1000000000 is a valid integer according to the high school (or college) definition, but not in most implementations of C++.
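To make the "not closed under addition" bit concrete, here is a minimal Python sketch. It assumes a 32-bit two's-complement `int` and models overflow as wraparound, which is what typical hardware does; in C and C++ signed overflow is actually undefined behavior, so treat this as an illustration only. `wrap_int32` is a made-up helper:

```python
INT32_MAX = 2**31 - 1


def wrap_int32(n: int) -> int:
    """Hypothetical helper: reduce n to the 32-bit two's-complement range."""
    return (n + 2**31) % 2**32 - 2**31


# Adding 1 to the largest 32-bit value leaves the representable range,
# so the fixed-width result is no longer the mathematical sum.
print(wrap_int32(INT32_MAX + 1))

# Python's own int is arbitrary precision, closer to the textbook integer.
print(10**100)
```

Both operands are perfectly good "ints", but their sum is not, and that is all "not closed under addition" means here.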

(non-native speaker, learned programming before English) I think this is a bad idea for a few reasons:

- More languages = more barriers. Gosh, we have enough problems with silos within the software dev community, and languages in real life are already one of the main hindrances to mobility.
- Learning how to program is a great opportunity to refresh your mind and adopt a new way of thinking, so why carry along the burden of your flawed human language?

Today is a good day to code.


Shameless plug here, but I am creating a programming language that tries to let everyone code in their native language and also 'translates': https://citrine-lang.org/ Might be interesting for people reading this topic. Nobody uses it of course but I like to work on it. ;-)

I vaguely remember reading about a fuckup where the translation department of a company (I believe it was Microsoft) accidentally translated the entire programming language (I believe it was PostScript) into German. Of course you were no longer able to print using the German version of PostScript.

Does anyone else know what I'm referring to? It was probably in the late 90s.

Wasn't it possible to program in different languages in VB/A? At least some co-workers told me so.

Something about VB in German.

Yep, VBA had localized keywords. The standard functions still were named in English though, so the weirdness didn't even buy significant localization.

Almost 20 years ago I wrote some Excel macros on an even then ancient Windows 3.1 computer with German Excel. I tried for Hungarian notation (as I said, almost 20 years ago) and used d for date. So the end date of some range was dEnde. D is also the first letter of Datei, which is German for file. So when I brought the Document to the boss's newer computer the program got translated but all the variables still had German names except for dEnde, which turned into EOF.

Lamdu supports switching human language on-the-fly now, good luck doing that with a text-based programming language!


In my elementary school, we used a French version of Logo Writer along with a French language special purpose library called Caméléon to learn robotics programming.

It says non-English but more specifically the spellings are American for the most part.

At least the ones I've used, apart from BBC BASIC and ZX BASIC.

What features in a human language would make it the optimal basis for a computer language?

The degree to which words are inflected by context. In English, it's almost nil. In Chinese, it's nil. Lingua Romana Perligata is an amusing example of how to use a heavily inflected language for programming, but it shows why you don't really want to do that.

I've heard that Perligata has actually been used in production. (It was similar to their native language.)
