The language of programming (temochka.com)
164 points by turingbook on July 7, 2017 | 58 comments

Back in the 80s, Infocom [1] created a database program named Cornerstone. Because it was an integrated environment, identifiers were abstracted. A line of code displayed on the screen as:

    xn = (A * y) * x * (1 - x)
was actually stored as:

    id17 = (id1 * id11) * id10 * (1 - id10)
and there was a mapping of internal IDs to visible names:

    id1  A
    id2  B
    id10 x
    id11 y
    id17 xn
The upshot---you could change the name of a variable anywhere in the editor and the code would still work because the "name" picked by the programmer was not the actual "name" used by the system. I often wonder if that can't even extend to keywords in a language as well, translating this [2]:

    medan not_done
      för x:= 1 till 5 gör
        om person^.age = 120 så
        om person^.age > 130 så
          gåtill person_should_be_dead;
into this:

    while not_done
      for x := 1 to 5 do
        if person^.age = 120 then
        if person^.age > 130 then
          return person_should_be_dead;
Or heck, while we're at it:

      for (x = 1 ; x <= 5 ; x++)
        if (person->age == 120)
        if (person->age > 130)
          return person_should_be_dead;
But I'm not holding my breath on this.

[1] Yes, the company that made all the text adventure games.

[2] http://boston.conman.org/2008/01/04.1
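
A minimal sketch of the ID-indirection idea, assuming the stored form is plain text with tokens like id10 and the name table is a dictionary. Both are assumptions made for illustration; nothing here reflects how Cornerstone actually stored its code.

```python
import re

# Name table from the example above: internal ID -> visible name.
names = {"id1": "A", "id2": "B", "id10": "x", "id11": "y", "id17": "xn"}

# What is actually stored; it never mentions a programmer-chosen name.
stored = "id17 = (id1 * id11) * id10 * (1 - id10)"

def render(stored_line, names):
    """Replace every internal ID with its current visible name."""
    return re.sub(r"\bid\d+\b", lambda m: names[m.group(0)], stored_line)

shown = render(stored, names)

# Renaming a variable touches only the table, never the stored code.
names["id10"] = "population"
renamed = render(stored, names)
```

Here shown is the line as first displayed, and renamed shows the same stored line after x has been renamed everywhere with a single table update.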

I realize that Perl is no longer all that popular of a programming language, but it had a lot of features along these lines -- you could even program it in Latin!

#! /usr/local/bin/perl -w

        use Lingua::Romana::Perligata;

        adnota Illud Cribrum Eratothenis

        maximum tum val inquementum tum biguttam tum stadium egresso scribe.
        vestibulo perlegementum da meo maximo .
        maximum tum novumversum egresso scribe.
        da II tum maximum conscribementa meis listis.
        dum damentum nexto listis decapitamentum fac sic
               lista sic hoc tum nextum recidementum cis vannementa da listis.
               next tum biguttam tum stadium tum nextum tum novumversum
                        scribe egresso.

Perl might not be that popular precisely because you could even program it in Latin :D

But this solves the problem only for languages that are similar to English, because they share the same grammatical structure and differ mainly in the keywords. That is trivial to fix with the approach you mentioned. For the rest of the language families it requires more change (sometimes a change to the whole programming model).

For example, in Kannada (or any other Dravidian language) sentence formation itself is different. In English you can read an 'if' statement as a proper sentence ("if a equals zero"), but in Kannada the conjunction appears at the end, so an 'if' sentence is more like "a equals zero 'then'". Changes of this kind require non-trivial changes to programming languages. As an example, I tried to 'alias' basic forms (if, for, define, etc.) to Kannada in Lisp (because it's easy in Lisp!), but such changes break the S-expression format because the position of the function in the list changes.

And typing in these languages on a typical en-US keyboard is a whole new problem in itself!

But I believe that some problems can be expressed better in languages with a different structure, or at least expressed better for their native speakers.

Regarding Lisp's ordering, this has come up in the article "What if Lisp was invented by the Japanese?"[1]. Note also that in the comments to that article[2], I mentioned that "Japanese Lisp" is something I find easier to read and would prefer to write, even as a native English speaker.

1. http://lispnyc.org/blog/euske/what-if-lisp-was-invented-by-t...

2. https://news.ycombinator.com/item?id=2213012

If Lisp had been invented in Japan, maybe the main connective of an expression would have ended up as the last item visually: (1 1 +). Maybe.

However, it would still be the first element of the list in the abstract data structure: i.e. the equivalent of (car '(1 1 +)) -> +. (Or, rather, of course, ((1 1 +)' car) -> +. Let's use the fantasy read syntax.)

The main point is not that + is leftmost or rightmost, but that it's at a fixed position in the data structure, and that it is the first/most accessible position.
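
A small sketch of that point, assuming a toy language with only a few arithmetic operators. The reader and evaluator below are hypothetical and written in Python for illustration: the "fantasy" reader accepts forms with the head written last, but stores it first, so the evaluator never needs to know which visual order was used.

```python
import functools
import operator

# Hypothetical operator table for the toy language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def read_head_last(form):
    """Fantasy reader: the head is written last but stored first."""
    if not isinstance(form, list):
        return form
    *args, head = form
    return [head] + [read_head_last(arg) for arg in args]

def evaluate(form):
    """Evaluator over the internal form; the head is always element 0."""
    if not isinstance(form, list):
        return form
    head, *args = form
    return functools.reduce(OPS[head], (evaluate(arg) for arg in args))

simple = evaluate(read_head_last([1, 1, "+"]))               # written (1 1 +)
nested = evaluate(read_head_last([[2, 3, "*"], 4, "+"]))     # written ((2 3 *) 4 +)
```

Swapping in a head-first reader would change only read_head_last; evaluate stays the same because it depends on the position in the data structure, not on the surface syntax.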

Well, grammar doesn't have to map strictly one to one.

Microsoft's LINQ is a good example of this. Instead of using SQL's natural-language order, "select foo from bar", they reversed it into "from bar select foo", just so that autocompletion works better (you don't know which columns are available until you've selected a table). This doesn't make it any harder to understand than SQL, except maybe the very first time you read a query, but you'll get over it in 5 minutes.

Most programming languages also require you to write seconds(10) instead of "10 seconds". It's no major problem.

Sure, but why?

I mean, even without the "internal IDs" you're basically talking about creating a Pascal code generator and hooking it up to an early stage in the existing Håstad compiler. A first year CS student could do that as a course project. It's not that complicated.

But there are a few reasons we don't do that.

Like the Sean Conner article argues, the specific keyword symbols we use barely matter. Sure, you can come up with pathological cases with obviously bad choices of keywords, but all reasonable choices can be learned in a matter of minutes and then you're good. As long as a for loop behaves like a for loop it really doesn't matter if we write "for", "för", "meanwhile", or "සදහා". We'll learn to recognise the symbol, whatever its etymology.

Because of the above, when we "transpile" (which is apparently what the kids call compilation these days) we tend to want to do it to a language that is sufficiently different (i.e. not a language whose only difference is that the keywords are made up of different symbols). And when you do that, you get code that uses idioms from one language but syntax from the other. And in your attempt to go from 100 people being able to read your code to 100,000 people, you accidentally went to 0 people.

ALGOL 68 was published in multiple languages, but if you meant that the interpreter could switch between languages, Perl and some others have you covered:


> For better or worse, we must agree that English has won the world’s common tongue competition.

I too labor under the yoke of a foreign tongue not sung to me at my cradle.

But something to keep in mind about English is that she gives everyone a hard time. No-one gets a free pass. Absolutely no-one.

In a few months, POTUS Trump will visit the Queen of English. Fo' shizzle they too will have problems with the language. She'll have bigly problems, and he'll have all the rest.

So the thing I do is to have something worthy to say, find people who'll appreciate it, and improvise a shared tongue for that special moment.

People have always gladly met me halfway.

I know this comment is tongue in cheek, but honestly, it's something I've come to really enjoy about English. It's so pervasive that it changes constantly, and oftentimes you can guess where someone is from by how they construct it. As an American, I only speak English fluently but have a passing knowledge of other languages. I wonder if any other language has similar phenomena within a single country (excluding regional dialects, which also exist here of course).

I'm sorry to disappoint you, but this happens in every language :)

A cultured speaker (he doesn't even have to be native) will recognize where other speakers are from based on either their accent or their phrasing, in most languages.

I come from a small country (Portugal), yet there are many recognizable differences between regions, not only in pronunciation but also in the vocabulary used for common things (like a cup of coffee).

The fact that we were quite poor until not that long ago might have helped to maintain those differences, since most people didn't have TVs nor could afford to move to a city to attend college. Then again, the poverty also led to major emigration from rural areas, which killed a lot of smaller communities.

Russian demonstrates this to some extent, and German, prominently.

I am currently in Moscow and I can tell anybody from Moscow just by their first sentence, based on pronunciation, but almost only on that. That is because Russian has a governing body that dictates what "real" Russian is. English has no such body.

In English, words mean what people understand by them, not what they are supposed to mean.

French has a governing body (l'Académie française) but, at least in some parts of the world, spoken French is very different from what the Académie prescribes. The accents are also very diverse: here in Québec we are 7 million people (not all of them French speakers), and the way we speak varies as much from region to region as it does in the U.S., for example. Funny thing: when we visit France, the French sometimes answer us in English, because our accent is so thick that they think French is not our native language.

I thought that English is the world's first language because it's very easy to learn. I would guess almost anyone can get up to basic conversational skills and learn the terms for programming in a few months, and that's it. No need for a Shakespearean vocabulary to get things done. Imagine Mandarin in the same place for comparison, with a beautiful but very impractical writing system to begin with.

I suspect English is the world's first language right now because of the British Empire, and the Pax Americana that followed (and continues still).

>functions with names like reify or transduce are commonplace.

Neither 'reify' nor 'transduce' is particular to English. They're both Latin.

I'm reminded of Feynman's experiences lecturing in Brazil. He got complimented on his rapid grasp of Portuguese, when in fact he was exploiting the fact that all the big fancy Portuguese technical words were either Latin or Greek. He was perfectly able to talk about Physics, but he was unable to order himself a sandwich.

Yes, but they are familiar - or at least, not entirely unfamiliar - to English speakers because English has inherited a lot of vocabulary from Latin. Other European languages have, too, so maybe names like "reify" or "transduce" might be somewhat approachable to a Portuguese speaker (although I wouldn't bet on it); but are they approachable to a Mandarin speaker?

I disagree that most native English speakers would be able to give you a reasonable definition of 'transduce', let alone 'educe'. I love Clojure, but I find the naming cult a bit absurd from time to time. I suspect that most Clojure programmers just learn them as abstract words, much as a non-native speaker would and as the OP describes.

I agree with you on the definition, but ...

It'd be easier for a native speaker to attach a somewhat correct meaning to transduce because of familiarity with words like translate, transfer, transform, induce, reduce and deduce.

So while most people couldn't define them, they are familiar with both parts of the word, and that would make it easier to remember and to understand when seen in context.

At least part of what I was getting at is that this statement is true not only for English, but also for French, German, Italian, Spanish, Portuguese, etc.

While it's less true for Russian, it's true to some extent because Russian and Greek share many roots, especially for technical language, and loan-words are prominent features of the language.

Did you just claim that "reify" and "transduce" are "at least not entirely unfamiliar" to typical English speakers? Really?

Every source code file is a story. A story with characters, relationships between them, and how the characters behave towards one another. Humans are hardwired to understand the world (and by extension computers) in terms of stories.

The way that programming languages allow us to write these stories is by giving us the opportunity to name things. Naming transforms code from a dumb sequence of instructions, to something that can be understood.

I don't believe non-textual names have any place in programming. I have used Haskell and it is an unmitigated disaster.

> I don't believe non-textual names have any place in programming. I have used Haskell and it is an unmitigated disaster.

If you are referring to the use of non-alphanumeric symbols, then I wonder whether you believe that calling the monadic bind bind instead of >>= would really make things clearer?

Sure, one of them looks like a word that already exists in English, so it might be marginally easier to remember; but it has nothing to do with the binding you do with a rope, so its familiar appearance is actually misleading.

Personally, I believe you should use names that are already part of your vocabulary (even symbolic ones like +), or combine existing vocabulary items into a description (even mixing English and symbols, as in number->string), or make up your own arbitrary words or symbols, so long as the end result is internally consistent.

I think it’s more likely they may have been referring to how it’s considered idiomatic to use extremely terse variable and function names in Haskell (and indeed in most functional languages).

This is a habit borrowed from math. It often makes a lot of sense, however. Giving a variable a long name often unnecessarily ascribes intent to the variable. When a variable is generic enough to represent almost anything, giving it a specific name is potentially misleading.

>I don't believe non-textual names have any place in programming. I have used Haskell and it is an unmitigated disaster.

And yet millions of programmers use && and not and, || and not or, { } and not begin, end, down to things like => to mean bound-closure in JS, with no issue at all.

This is a bad example, these are symbols. Like +,-,*,/,e,π,µ, ... in maths. These symbols can represent a relation between the named things he referred to. They are widely used and have more or less the same meaning across different languages.

Not sure of the distinction.

If symbols "like in math" are OK, then math also uses symbols x, y, z etc. for the items in an equation/function etc., not just for the operators.

Of course, every time I type '&&' I still say 'and also' in my head.

I'm not a native speaker either. But firstly, if you don't know English in computer science you can't follow any research or documentation, and worse, you can't write code that is readable like a book. Basically you are stuck like Robinson Crusoe, putting together a roof and a bed from banana leaves by yourself, while the whole world is on another level of civilisation. I personally hate it when people start using their own language to code; it's just not how the field evolved, and not how it should be done on a professional level. If you want to play music you need to learn to read the notes first. Call me a traditionalist.

Context matters. If it's your in-office code, use whatever language you want. But if it's going to be open source, English please.

I've found Princeton's WordNet extremely useful for naming things in programs with a fitting English word. It lets you traverse the hierarchy while proposing hyponyms, holonyms and synonyms. Entry-level speakers will find the example sentences useful. By avoiding invented fantasy names, code becomes more readable and thus more maintainable.

Just resist naming anything an 'entity'. ;)


> Just resist naming anything an 'entity'. ;)

Unless, of course, it has semantic meaning in your domain, such as when creating an Entity Component System.

Still, I'd advise against using it, because 'entity' is the top hypernym in WordNet. I'd also argue that if everything in a system is called an entity, all meaning is lost, and you should pick something from the lower levels of WordNet like 'thing' or 'stuff'. I never said WordNet helps to sell your product; that's an entirely different task.

What about a format for code localization for code repos? A bit like a `.pot` file for string translation, but instead it applies to names.

For reading code in another language, you apply the .pot file to the code base in a separate branch and it does search and replace in the source files so you have the same code base localized (assuming utf-8 everything).

When running code with localized identifiers, any single word, underscore_connected, and CamelCase identifier is "looked up" and "symlinked" to the correct identifier in the default locale (English). Think "search and replace" at the AST level of the language.
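
A rough sketch of the "search and replace at the AST level" idea, using Python's ast module (ast.unparse needs Python 3.9 or later). The catalog below is a hypothetical stand-in for the .pot-style file, mapping Spanish identifiers to the default English ones. Only plain names, function names and parameters are handled; attributes, keyword arguments and imports would need more cases.

```python
import ast

# Hypothetical catalog: localized identifier -> default-locale identifier.
CATALOG = {
    "superficie": "area",
    "largo": "length",
    "ancho": "width",
}

class Delocalize(ast.NodeTransformer):
    """Rewrite localized identifiers to their default-locale names."""

    def __init__(self, catalog):
        self.catalog = catalog

    def _translate(self, name):
        return self.catalog.get(name, name)

    def visit_Name(self, node):
        node.id = self._translate(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._translate(node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self._translate(node.name)
        self.generic_visit(node)  # also rewrite parameters and body
        return node

def delocalize(source, catalog=CATALOG):
    tree = Delocalize(catalog).visit(ast.parse(source))
    return ast.unparse(tree)

LOCALIZED = """
def superficie(largo, ancho):
    return largo * ancho
"""

english = delocalize(LOCALIZED)

# The rewritten source runs under its English names.
namespace = {}
exec(english, namespace)
result = namespace["area"](3, 4)
```

Because the rewrite works on the syntax tree rather than on raw text, string literals and comments that happen to contain a catalog word are left alone (comments are dropped by ast.unparse, which is one reason a real tool would want a concrete-syntax-tree library instead).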

I'm not completely sure how well that would work in practice, as many times programmers don't use actual words for names, but abbreviations (len, dict, etc.). That being said, I think the example the author gave about Excel worked well because there was an already-enumerated set of functions that needed to be localized. I could totally see it being feasible, if tricky with regards to parsing, for (programming) languages to provide a compiler/interpreter flag to instead use a different translation/localization for keywords and standard library identifiers, which although presumably more numerous than the number of functions in Excel, are still an already-enumerated set. Given how most languages rely on community contributions for translating documentation, they could presumably ask the same contributors to help with localization of a language's built-in and standard library identifiers.

The topic of learning to program as a non-English speaker has come up a few times with friends and coworkers, despite all of us being native English speakers.

I've proposed that languages like Golang and Swift that support Unicode should go a step further, and come with a translation file for their keywords, which is then fed into the lexer/parser.

It doesn't solve the problem of an international codebase, or let you switch languages as shown in the Excel example, but it would at least make it significantly easier to learn and write code in one's native tongue.

And it could be available today, without significant changes in tooling on either the compiler vendor's side (except the translation tooling of course) or the developer's side.
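
To make the keyword-file idea concrete, here is a sketch that uses Python rather than Go or Swift, because its standard tokenize module makes the lexer step easy to show. The Swedish keyword table is a hypothetical example, not an existing translation file. Localized keywords are swapped for the real ones at the token level, so string literals are not touched.

```python
import io
import tokenize

# Hypothetical translation file contents: localized keyword -> Python keyword.
KEYWORDS = {"medan": "while", "om": "if", "annars": "else"}

def translate_keywords(source, table=KEYWORDS):
    """Replace localized keywords in NAME tokens; leave other tokens alone."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    translated = [
        tok._replace(string=table[tok.string])
        if tok.type == tokenize.NAME and tok.string in table
        else tok
        for tok in tokens
    ]
    return tokenize.untokenize(translated)

LOCALIZED = '''\
n = 5
total = 0
medan n > 0:
    om n % 2 == 0:
        total = total + n
    annars:
        total = total - 1
    n = n - 1
label = "om"
'''

python_source = translate_keywords(LOCALIZED)
namespace = {}
exec(python_source, namespace)
```

One caveat of this naive version: an ordinary variable that happens to be named om would be rewritten too, so a real implementation would have to reserve the localized words just as the language reserves its English keywords.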

Translating the tens of keywords in a given typical language won't get you very far though - you need translations for all the terms in all the commonly used libraries, and documentation of how to use them.

That is true, bare minimum you'd need to translate the stdlib, and at that point you're translating identifiers, which means you might as well go for translating everything as shown in the Excel example.

The author's command of English as a second language is very impressive.

Absolutely, I know plenty of native speakers who wouldn't be able to write anything as correct as that without serious effort, if at all.

Growing up (Generation Commodore 64, later having an Amiga) taught me enough computing lingo that I one day started to read English books. Because at some point the really interesting books didn't have German translations anymore.

So there I was struggling with my English, always being happy when there were enough technical terms which I understood. (And worst thing of all - English speaking authors dare to use puns and humour in technical books. Which I didn't understand, because of the vocabulary and because we Germans would never do anything like this, we're much too serious for that.)

"I won’t lie, this looks outrageous even to me. But not to my Dad, who is a civil engineer and doesn’t speak a word of English. He is dangerously fluent in Excel’s formulas, which he uses extensively in those hundred-sheet documents bristling with filters, conditionals, and pivot tables. Then the roads and bridges are getting built based on those calculations."

A cheer or two for the humble spreadsheet - the longest lasting and most widely used end-user 'programming' metaphor I've heard of, despite the estimated 5% error rate in formulas.

A git for spreadsheets would be ace!

I'd like to know more about those 5%, could you give more details?


References from 10 years ago but enough to start chasing more recent stuff.

The idea of learning programming commands by rote with no concern as to their English meaning reminds me of how a lot of people working in science learn scientific terms as simple arbitrary identifiers rather than looking at the Latin and Greek roots of the term and understanding the term from those.

Names should be automatically generated. Alpha-equivalence should be trivial. English shouldn't be required, but you're not going to get away from requiring something of equivalent expressiveness at some level.
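
For anyone unfamiliar with the term, a minimal sketch of an alpha-equivalence check for lambda terms, done by converting names to de Bruijn indices. The tuple encoding of terms is an assumption made for this example.

```python
# Terms (hypothetical encoding):
#   ("var", name) | ("lam", name, body) | ("app", function, argument)

def to_de_bruijn(term, bound=()):
    """Replace bound names with their distance to the binding lambda."""
    kind = term[0]
    if kind == "var":
        name = term[1]
        if name in bound:
            # bound lists the innermost binder first, so index() finds
            # the nearest enclosing lambda and shadowing works.
            return ("bound", bound.index(name))
        return ("free", name)
    if kind == "lam":
        _, name, body = term
        return ("lam", to_de_bruijn(body, (name,) + bound))
    if kind == "app":
        _, function, argument = term
        return ("app", to_de_bruijn(function, bound), to_de_bruijn(argument, bound))
    raise ValueError(f"unknown term kind: {kind!r}")

def alpha_equivalent(left, right):
    """Two terms are alpha-equivalent if they differ only in bound names."""
    return to_de_bruijn(left) == to_de_bruijn(right)

identity_x = ("lam", "x", ("var", "x"))
identity_y = ("lam", "y", ("var", "y"))
const_first = ("lam", "a", ("lam", "b", ("var", "a")))
const_second = ("lam", "a", ("lam", "b", ("var", "b")))
```

Once bound names are erased this way, the names a programmer sees really are just a display concern, which is the sense in which they could be generated automatically.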

Ruby allows UTF-8 identifiers, which is used successfully with Japanese. Using symbols instead of a few English keywords would completely solve the problem, maybe with configurable automatic string replacement to get the symbols, comparable to IDEs for Coq.

Given that the set of language keywords is typically small, learning them as opaque strings is hardly a tragedy. Nobody complains that they have to learn Italian terms like "da capo al fine" to play music.

With standard libraries becoming more important, non-keyword terms like 'vector' and 'set' are also important, but they tend to be common across languages. Good luck renaming them on the fly without a serious refactoring tool.

Maybe there should be a Google Translate-like thingy for programming languages. At least the main keywords would get translated. It would make lives easier. (Python would be a great language to have such a feature, as it has more verbose identifiers than most other languages like C or C++.)

If I have to choose a language for programming, I would suggest Esperanto. An artificial language that is easy to learn, and more consistent than English.

Amazing article. I wonder just how much ties to English have held computer science / software back?

I can't stand the chromatic aberration theme of the site. Please stop giving me a headache! On my phone I had to copy and paste the contents into a note in order to read it.
