Why Does “=” Mean Assignment? (hillelwayne.com)
674 points by panic on Apr 10, 2018 | 354 comments

As the article points out, the first big programming languages, FORTRAN, COBOL, LISP, and Algol, each had their own way of doing variable assignment. FORTRAN used "=" and Algol used ":="; the other two used commands or functions to set or bind the values of variables.

In the mid-1960s, when I started programming, most programs had to be keypunched. (There was paper tape entry, and Dartmouth's new timesharing system using BASIC, but these weren't in very widespread use.) Until 1964, the keypunch machine in use (the IBM 026) had a very limited character set. This is the reason that FORTRAN, COBOL, and LISP programs were written in upper case. Even the fastest "super" computer of the time, the CDC 6600, was limited to 6-bit characters and didn't have lower case letters.

Naturally, symbols like "←" or "⇐" weren't available, but even the characters ":" and "<" were not on the 026 keypunch keyboard and so ":=" or "<-" were much harder to punch on a card.

These early hardware limitations influenced the design of the early languages, and the early programmers all became accustomed to using "=" (or rarely ":=" or SET) as assignment even though "⇐" might have been more logical. The designers of subsequent programming languages were themselves programmers of earlier languages so most simply continued the tradition.

> Naturally, symbols like "←" or "⇐" weren't available, but even the characters ":" and "<" were not on the 026 keypunch keyboard and so ":=" or "<-" were much harder to punch on a card.

This explains why early ALGOL implementations (e.g. Burroughs, the only dwarf who went all in on the language) and dialects (ALGO, JOVIAL, MAD, NELIAC) actually used ‘=’ for assignment. This was perfectly legal according to the language definition, which distinguished between the ‘reference language’, which used ‘:=’, and its ‘hardware representation’, which could be anything. (Naturally there was no portability.)

I feel like this is the best answer. People have to work with what they have. Much in the same way that SNES games look the way they do, even though the designers back then were just as smart and talented as the ones today. That's simply what they could do with the medium at the time.

The limitations define the style. And then dogma perpetuates it, as the new generation continues to live with the results of once-great but now outdated thinking.

*The limitations define the style. And then dogma perpetuates it, as the new generation continues to live with the once-great but now outdated results of someone else's thinking.

Another interesting way to think about it is as "match". That is, try to match the stuff on the right with the stuff on the left.

Take Erlang for example:

    1> X = 1.
    1
    2> X = 2.
    ** exception error: no match of right hand side value 2
Notice that variables are immutable (not just the values themselves). Once X becomes 1, it can only match 1 after that. You might think this is silly or annoying: why not just allow reassignment instead of having to sprinkle X1, X2 everywhere (see the sketch at the end of this comment)? But it turns out it can be nice, because it makes state updates very explicit. In complicated applications, that helps you understand what is happening. And it behaves like you'd expect in math, in the sense that X = X + 1 doesn't make sense here either:

    1> X = 1.
    1
    2> X = X + 1.
    ** exception error: no match of right hand side value 2
It does pattern matching very well too, that is, it matches based on the shape of data:

    1> {X,Y} = {1,2}.
    {1,2}
    2> X.
    1
    3> Y.
    2
In other languages we might say we have assignment and destructuring, but here it's simpler: it's all just pattern matching.
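To make the X1, X2 style concrete, here is a minimal sketch of explicit state threading (init/0, handle_event/2, and commit/1 are hypothetical):

    %% Each '=' creates a new binding rather than mutating an old one,
    %% so every state transition is spelled out in the code.
    run(Event) ->
        State0 = init(),
        State1 = handle_event(Event, State0),
        commit(State1).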

I'm sure I've mentioned it here before, but in my favorite Erlang talk I design the language with the equal sign as my core construct.

Since the equal sign is an assertion of truth, and since assertions cause a crash on failure, you can pretty much derive the rest of the language/BEAM characteristics from that.
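As a minimal sketch of that idea (file:read_file/1 is the real standard-library call; the surrounding function is illustrative):

    read_config(Path) ->
        %% '=' asserts success: if read_file returns {error, Reason},
        %% the match fails and the process crashes, to be restarted
        %% by a supervisor ("let it crash").
        {ok, Bin} = file:read_file(Path),
        Bin.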

Thanks for sharing. Enjoyed watching (from the link below). Well done!

I like the '=' as an assertion idea. That is, it's like running code with assertions turned on. Never thought about it that way but it makes good sense.

Also stealing your idea of "rebootable code" :-)

I'm sure I stole the idea of rebootable code from someone else, but I'll be happy to take credit for it.

I almost submitted the talk to That Conference this year. I should start shopping it around more.

I don't think I've heard of this before; I'd love to watch it if you have a link!

I'd also want to see the link, if available!

This is from a few years ago, my first time giving it, and I've never had the courage to actually watch it, so ymmv.


Dude, this was great. I've been looking to get into Erlang, but the "unfamiliarity" always scared me away. Really nice talk.

Thanks, glad you liked it. I love teaching Erlang.

Also valid in Erlang:

    2 = X.

    {ok, Y} = X = foo().
Oh, and here's a fun one:

    1> {X, X} = {1, 1}.
    {1,1}
    2> {Y, Y} = {1, 2}.
    ** exception error: no match of right hand side value {1,2}
If you're wondering what the use of that is, here's an Erlang "drop all occurrences of an element X from a list" function. (Prerequisite knowledge for this: Erlang functions have several clause-heads; which one is executed on each call depends on which one is able to successfully bind to—i.e. = with—the arguments.)

    drop(X, List) -> drop(X, List, []).

    drop(_, [], Acc) -> lists:reverse(Acc);
    drop(X, [X|Rest], Acc) -> drop(X, Rest, Acc);
    drop(X, [Y|Rest], Acc) -> drop(X, Rest, [Y|Acc]).
In other words—first, set up an accumulator. Then, go through the list, and if X can bind in both positions, skip the element; otherwise (i.e. if Y != X) then shift Y into the accumulator. At the end, reverse the accumulator (because you were shifting.)
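For instance, a shell session might look like this (hypothetical, assuming the function is exported from a module mymod):

    1> mymod:drop(2, [1, 2, 3, 2, 4]).
    [1,3,4]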

To illustrate how close Erlang syntax is to Prolog, here is drop/3 in Prolog:

    drop(X, List, Result) :- drop(X, List, [], Result).

    drop(_, [], Acc, Result) :- reverse(Acc, Result).
    drop(X, [X|Rest], Acc, Result) :- drop(X, Rest, Acc, Result).
    drop(X, [Y|Rest], Acc, Result) :- dif(X, Y), drop(X, Rest, [Y|Acc], Result).
I only had to make a few syntactic changes to the original program to obtain a Prolog predicate from it. The most significant change is that I use an additional argument to hold the original function's return value. An important consequence of the resulting relational nature is that this can answer quite general queries.

For example, we can ask: What are possible solutions Ls that arise when we drop X from the single-element list [A] ?

    ?- drop(X, [A], Ls).
    X = A,     Ls = [] ;
    dif(X, A), Ls = [A].
We see that there are two possible answers: One where X is A, and hence the result is the empty list []. And the other where X is different from A, as indicated by dif(X, A).

Even more generally, we can systematically enumerate all conceivable solutions for all list lengths:

    ?- length(Ls0, _), drop(X, Ls0, Ls).
    Ls0 = Ls, Ls = [] ;
    Ls0 = [X],
    Ls = [] ;
    Ls0 = Ls, Ls = [_586],
    dif(X, _586) ;
    Ls0 = [X, X],
    Ls = [] ;
    Ls0 = [X, _598],
    Ls = [_598],
    dif(X, _598) ;
This uses iterative deepening to generate all possible answers.

You're also not stuck using only an unbound result. Could we not ask, "given Ls0 and Ls, which elements were dropped?"

Yes, that's also possible with this definition! For example: Given Ls0 = [a,b,c] and Ls = [a,c], which element (if any) was dropped:

    ?- drop(X, [a,b,c], [a,c]).
    X = b ;
As another example, if both Ls0 and Ls are [a,b,c]:

    ?- drop(X, [a,b,c], [a,b,c]).
    dif(X, c),
    dif(X, b),
    dif(X, a).
This means that X must be different from each of these elements.

More generally, which elements can be dropped at all from the list [a,b,c]:

    ?- drop(X, [a,b,c], _).
    X = a ;
    X = b ;
    X = c ;
    dif(X, c),
    dif(X, b),
    dif(X, a).
Interestingly, the predicate definition does not even use "=" in its clauses. In Prolog, (=)/2 is a built-in predicate that means unification. In the definition above, we use implicit unification instead of explicit unification.

I have mixed feelings on this. On one hand, I did want this notation while learning Haskell, partly because there are a bunch of places in Haskell's type system where repeating a variable implicitly gives equality constraints.

On the other hand something like

    drop _ [] = []
    drop x (y:xs)
        | x == y = drop x xs
        | otherwise = y : drop x xs
is often more readable at a glance to me since I don't have to parse variable names to see control flow structures.

Also, in statically typed languages, what happens if equality doesn't work for a value?

Prolog's unification:

  X = Y,
  X = 5,
  %% Y = 5 now as well

  ?- append([1,2], Y, [1,2,3]).
  Y = [3].

Yeah, Prolog's unification operator really blows your mind the first few times you experience it. You left the best part out of that last example:

    ?- append(X, Y, [1,2,3]).
    X = [],
    Y = [1, 2, 3] ;

    X = [1],
    Y = [2, 3] ;

    X = [1, 2],
    Y = [3] ;

    X = [1, 2, 3],
    Y = [] ;

(a few newlines added for clarity)

Some (all?) compilers transform your code into something like this internally. It's called static single assignment form. https://en.wikipedia.org/wiki/Static_single_assignment_form

Though it's true that compilers usually turn assignments into SSA, that's not really the same as what parent was referring to. Static single assignment has the same semantics as normal C-style assignments. The parent was referring to assignments that have (Erlang's) pattern match semantics.

Also, there's a simple counterexample. You can have static single assignment that has dynamic multiple assignment (e.g. within a loop construct). Under the pattern-matching semantics, evaluating the assignment (or rather the pattern match) a second time could fail, whereas an assignment always succeeds.

>"Though it's true that compilers usually turn assignments into SSA"

What is SSA here?

Static single assignment (see Wikipedia link in grandparent).

Elixir has borrowed the pattern matching approach from Erlang, but allows rebinds which makes code refactoring easier.

    iex(1)> x = 1
    1
    iex(2)> x = 2
    2
    iex(3)> ^x = 3
    ** (MatchError) no match of right hand side value: 3

Notice the pin operator (^) to achieve the same effect.
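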

I've used Erlang but not Elixir. Having an additional optional "pin" operator sounds like a bad idea, just piling on complexity. You can't mess up with the Erlang syntax.

The creator of elixir had a blog on this: http://blog.plataformatec.com.br/2016/01/comparing-elixir-an...

Only the case argument there is any good. When it gets to numbered vars, you know they're grasping for anything.

Ya, +1. I've been working with Elixir lately, and the `=` operator in Erlang/Elixir is the first thing that came to mind when I saw this.

>"In other languages we might say we have assignment and destructuring ..."

I'm not familiar with this term "destructuring." Can you elaborate on the concept? Might you have an example of a destructuring operation and a language where it's used?


Basically it means that you can assign several variables at once, by assigning a complex value to an appropriately structured literal with variables as placeholders.

In pseudojavascript, let's say you got this from some API:

    var someList = [1, 2, 3, 4];
    var someMap = { name: "John Smith",
                    age: 20,
                    cars: [{ id: "AAA-1111", model: "Ford T"},
                           { id: "AAA-1112", model: "Mercedes Benz"}] }
And you want to extract useful data for your code quickly. In languages with destructuring you can do something like this:

    var [x, y & rest] = someList; //x==1, y==2, rest == [3, 4]
    var { "name": name,
          "age": age,
          "cars": [ {"id": firstCarId, "model": firstCarModel}
                  & restOfTheCars ]
        } = someMap;
And the language will put the correct values in your variables name, age, firstCarId, firstCarModel, restOfTheCars.

JavaScript itself supports destructuring fine; here's some JavaScript straight from the Firefox console:

    > let a = [1,2,3]
    > [x,y,z] = a
    Array(3) [ 1, 2, 3 ]
    > x
    1
    > y
    2
    > z
    3

The other responses captured the essence of what it is, but I thought it might be interesting to briefly discuss how Erlang uses it. As someone who hasn't been able to use Erlang in his day job for a year now, I sorely miss the language.

In nearly every other language, when you call a function, the top of the function typically involves a fair bit of analysis of the arguments to decide what needs to be done.

(That's being generous: typically the entire body of the function revolves around making decisions about what to do, such that you don't know what's going to come out of the function until you've traversed multiple branch options.)

In well-written Erlang, the function itself branches at the top, taking advantage of pattern matching, destructuring, and function heads.

Let's say you're passing around arbitrary data in the form of a tuple that you need to display. Or not, depending on the value of some debug flag.

In Python you might see code like this, assuming that an integer is stored in your program like ('int', 3) and a string might have some arbitrary label, so it's ('str', 'label', 'real string'):

  if (debug):
      display(x)          # calling code checks the flag

  def display(x):
      if (x[0] == 'int'):
          display_int(x[1])    # display_int assumed, by analogy
      elif (x[0] == 'str'):
          display_string(x[1], x[2])
In Erlang, you can branch and destructure at the function head. You don't check the debug flag in the calling code, just pass it along for the function to decide what to do with it; if the flag is false, the function just returns false. (Variables in Erlang are capitalized; the lower-case "strings" in this code are atoms, also called symbols in some languages.)

  display(_, false) ->
      false;
  display({int, N}, _) ->
      display_int(N);    %% assumed, by analogy with display_string
  display({str, Label, String}, _) ->
      display_string(Label, String).
Since you can choose the key data elements in your function arguments and branch accordingly in the function head, it becomes immediately obvious looking at the code how many different ways the code can evaluate, and the destructuring in the function heads means you don't have to waste code chunking the data apart before working with it.

(Updated: forgot to include the debug flag in the 2nd and 3rd function heads. In all 3 heads, the _ pseudo-variable basically says "I don't care what value we have here". Another option would be to use _Debug in the 2nd and 3rd heads to indicate to the reader what the flag is, or use true since we're expecting that value.)
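For example, a call with the flag off might look like this (a hypothetical session, assuming the clauses above live in a module mymod):

    1> mymod:display({int, 3}, false).
    false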

(Incidentally, Python used to allow tuples to be destructured in the function head, but they dropped it in Python 3 because no one knew about it or used it. A sad day.)

I could imagine by destructuring he means e.g. unapply in Scala: https://docs.scala-lang.org/tour/extractor-objects.html

Ah OK its a functional construct, this make sense. I have seen this in my brief exposure to Clojure. Thanks for the examples and links.

Recent JS versions have it too:

    > let {a,b} = {a: 1, b: 2}
    > a
    1
    > b
    2

I think they are forgetting that a lot of languages take cues from old math textbooks, where you would see function definitions written as "f(x) = nx + b" or "y = nx + b" and constant assignment with a "c = <number>" notation.

If you want to calculate the output of a function into a data table (which is what early computers were often doing), iterating a variable x over f(x) = nx + b (with n and b being fixed constants) is exactly how you would do it on paper. So it's probably a case of computers emulating applied math rather than simply being pure theory machines.

That is, explicitly, the origin of ‘=’ for assignment in Fortran. The section describing assignment is titled ‘Arithmetic Formulas’:

A. An arithmetic formula is a variable (subscripted, or not), followed by an equals sign, followed by an expression.

B. It should be noted that the equals sign in an arithmetic formula has the significance of “replace”. In effect, therefore, the meaning of an arithmetic formula is as follows: Evaluate the expression on the right and substitute this value as the value of the variable on the left.

This works for initialization, but not for reassignment. Most notably, it is absurd for statements like x = x + 1. Meanwhile, such self-referential updates are very common in imperative programming, so anyone designing the language would come across this.

You have something similar in mathematics: recurrence relations. Since we are dealing with a sort of time difference inside a computer, something like "x = x + 1" can be interpreted as "the value of x at the next time unit is the value of x at the previous time unit plus one". That is "x[n+1] = x[n] + 1", and this is a recurrence relation. My guess is that early programmers were deeply aware of this time difference, so "x = x + 1" made perfect sense.

The way these recurrence relations are taught at university is to distinguish the new value from the old value by using a ' pronounced prime.

Here, you would write x' = x + 1 to give the recurrence relation x[n+1] = x[n] + 1. Or, more generally x' = f(x) for x[n+1] = f(x[n]). The reason for this is because in mathematics x = x + 1 is absurd (ignoring modulo arithmetic).

In high school, we used numeric subscripts for that. Ticks (apostrophes) were kept for derivatives.

If only there were enough notation that every notation could be unique.

Though it's neither a tick nor an apostrophe, but, as the parent said, a prime: https://en.wikipedia.org/wiki/Prime_(symbol)

What you have there is a sequence. Recursively defined or not, taking the indices out of a sequence takes away the only thing that makes it a sequence (the mapping from the natural numbers) and would be a horrible abuse of notation.

I think language designers were certainly consciously aware that they were designing something well defined, and chose to use this kind of syntax because it's simpler, rather than as an implicit mapping to the natural numbers via something like CPU cycles.

There is a really simple mapping between recurrent sequences and functions. Take a function f: A → A, that is, a function whose output comes from the same set as its input.

We then have the sequence x_{n+1} = f(x_n). Often, when dealing with incremental algorithms, the notation x' = f(x) is used. Here x' (pronounced "x prime") stands for "the next value of x". It's a nice balance between the correctness of using indices and the conciseness of leaving them out.

Going to a higher level, the sequence x' = f(x) is essentially trying to find a fixed point of the function f. To look at this in an actual for or while loop, you need to consider the stopping condition of the loop as part of the function.

> What you have there is a sequence.

See Lucid[1]

>takes away the only thing that makes it a sequence

Unless everything is a sequence[1]

[1] http://www.cse.unsw.edu.au/~plaice/archive/WWW/1985/B-AP85-L...

Yes, I remember noticing how similar BASIC was to what you'd see in a math textbook. You'd expect to see a variable defined like "let x equal (whatever)", and BASIC just mimicked that: "LET X = 9".

_old_ math textbooks? Have they changed this in new ones?

Saying something is old implies that it has been the case for a long time, not that it has necessarily changed recently.

A coworker of mine once referred to := as the 'Zoidberg operator', and the name has stuck with me since.

At first I thought you meant "why not colon-equals?" but now I see it (literally).

Cannot unsee.

How about :≡ ?

that's the logically equivalent Zoidberg operator

hehe, you are right, I can only see it now tho

Here's what Niklaus Wirth (Pascal, Modula-2, Oberon) said about using the equal sign for assignment:

> A notorious example for a bad idea was the choice of the equal sign to denote assignment. It goes back to Fortran in 1957 and has blindly been copied by armies of language designers. Why is it a bad idea? Because it overthrows a century old tradition to let “=” denote a comparison for equality, a predicate which is either true or false. But Fortran made it to mean assignment, the enforcing of equality. In this case, the operands are on unequal footing: The left operand (a variable) is to be made equal to the right operand (an expression). x = y does not mean the same thing as y = x. Algol corrected this mistake by the simple solution: Let assignment be denoted by “:=”.

> Perhaps this may appear as nitpicking to programmers who got used to the equal sign meaning assignment. But mixing up assignment and comparison is a truly bad idea, because it requires that another symbol be used for what traditionally was expressed by the equal sign. Comparison for equality became denoted by the two characters “==” (first in C). This is a consequence of the ugly kind, and it gave rise to similar bad ideas using “++”, “--”, “&&” etc.

From Good ideas, through the Looking Glass by N. Wirth:


At one point in my career, I was afflicted with a Wirth-designed language. His opinion on what constitutes good language design does not carry much weight with me. In particular, I can tell you that the extra typing of Pascal over C really does matter over a couple of years. And that := for assignment is a major pain when your left pinky finger is out of action for weeks, and you still need to hit shift for every assignment.

But then we get to this line:

> Because it overthrows a century old tradition to let “=” denote a comparison for equality

In what sense is "=" for comparison a century old? Hasn't it been used for "things that are equal" for multiple centuries? If so, and if that's distinct from "for comparison", then isn't for comparison also a new, non-standard use?

Does anyone understand what he's on about in this quote?

In math, "=" is an equivalence relation.

> it overthrows a century old tradition to let “=” denote a comparison for equality

It also, in science and engineering, denotes a method for obtaining the lhs when you have values for the terms on the rhs. That is the point of converting y = ax² + bx + c to x = (−b ± √(b² − 4ac)) / (2a). Early languages like Fortran explicitly followed that usage.

Not a surprise that the gripes, coming from the guy who birthed Pascal, seem odd.

> Because it overthrows a century old tradition

And if the goal of mathematicians and programmers were the same, this might be a compelling argument. In no sense do I see my programs as giant algorithms, they're machines, and as such require different syntax to construct.

> But mixing up assignment and comparison is a truly bad idea

Then := is also a bad decision. You should use the fully historically supported form of 'let <variable> = <value>'. Both := and == suffer from the fact that if you omit a single character, you change assignment into comparison or vice versa.

It strikes me as a nitpicky argument for its own sake.

You're missing the point - languages like Pascal have a := assignment operator because the = operator is for comparison only. If you accidentally drop the :, then you have a simple Boolean expression. If this Boolean expression is used as a statement, by itself, then the result is a compiler error.

The '=' sign is used because after that statement the variable will indeed have equality with the r-value.

Coincidentally, in hardware description languages like Verilog there is another assignment operator, '<=' (nonblocking assignment), which means the variable won't take the new value until the next clock cycle (or more explicitly, the next evaluation of the process block). '=' exists also and has the same meaning as in traditional languages.

By that logic,

    a <= 1
Should set a to the minimum of a and 1. I.e. after that statement the variable will indeed be less than or equal to the RHS.

Languages borrowing the syntax don't need to follow its logical consistency. I just thought it was an interesting anecdote.

What about `a = a + 1`?

The r-value is ephemeral; it doesn't exist once the statement ends.

> Since assignment is about twice as frequent as equality testing in typical programs, it’s appropriate that the operator be half as long.

Ken Thompson on why '=' is assignment and '==' the equality check.

It's this kind of mindset that puts me off Go. But I can totally see that many who want a better C getting into Go exactly for reasons like this.

But := is not "assignment" per se in Go; it is 'declare and assign', something like auto x = 2; in modern C++ (or var x = 2; in C#).

If you declare a variable, then x = 2; works in Go.

I'm not sure what you find disquieting about the idea. "More frequently used operators should be less verbose" seems reasonable to me. What I find a bit strange about Go is that if they have that mindset then it's still a fairly verbose language, syntactically -- at least compared to some other modern languages. It seems to hang on to the familiarity of C syntax with a few optimisations. Which, of course, is not necessarily a bad thing at all. It's just not doing everything it can to reduce verbosity.

> "More frequently used operators should be less verbose" seems reasonable to me.

Syntax that looks pretty in isolation is a poor guideline for programming language design. It ignores the problems of writing and debugging the code. I don't know how many bugs went unnoticed because of the "if (a = b)" mistake, but they certainly weren't few. Sure, nowadays the compiler warns you about that, but that took surprisingly long to be implemented.

The argument for saving a few keystrokes for potential longer debugging sessions is not a good argument. Code is much more often read than it is written.

But I didn't criticize Go's verbosity (after all, they "fixed" the assignment thing) but the mindset of doing or not doing things for specific reasons that look quite backwards today (or, let me be frank, just plain stupid) but which the community then fights for vigorously. There are examples of this in the design of the language, but the most egregious example is the package manager. This went from "we don't need this" to "do it in this problematic way" to "let's do it like everybody else" to now "no, we are special, we need to do it completely differently".

IMO, Go would have been a fantastic language to have in the 90s. But looking at it from today's perspective, it looks outdated in many places. But compared to C which is a language of the 70s it is still great and therefore I understand its appeal for programmers who haven't found another language to replace C with (going from my own experience, most programmers have replaced C with multiple languages instead of just one).

The length of a symbol should be roughly negatively correlated with its frequency.

Since, in a procedural language, assignment is a much more common operation than equality checking, it is reasonable to favor "=" over ":=" or even "<-".

What prevents : from becoming the assignment operator of a language? It's short, and quite "explicit".

x: 2

Self and Newspeak sort of do it this way, adding implicit self to Smalltalk's keyword syntax and accessor convention.

These 'accessors' also work on local variables; there is no special syntax for assignment. So when you write x, it means "send the message x", which starts its lookup in the local environment and works its way outwards. When you write x: 2, it sends the message x: with argument 2, also starting in the local environment.

The accessors are automatically generated in pairs for slots. It sorta works, but it seems a bit too convoluted just to say "hey, we can do everything with just messaging".

The Rebol family of languages works like that.

Except that it doesn't. ":" in "x:" is not an assignment operator and "x" is not a variable.

Yes yes, it's a set-word! type in the "do" dialect. That's not how it's learned when first starting, though, and not usually the thing you're thinking about when using it moment-to-moment.

Shift key?

Also, there's no hand balance on either of the composite symbols: := is my right pinky; <- is different fingers, but bottom and top row. Neither exactly flows from the fingertips.

I don't think that is a big issue. Just bind it to something else. In RStudio by default <- is bound to alt and -.

But that is still 2 keystrokes. That is why in R they came to the compromise to use both <- and =.

Well, it also adds a space before and after so you're saving one keystroke vs space = space

a byte of RAM or disk is cheaper than a human neuron. Brains are more important than fingers.

Well, I suppose one could add a `⟵` key to the keyboard... :-P

So you have a key with the assignment operator on.

I actually rationalized the '=' symbol in assignment into the following statement, "Let 'left hand side' be equal to 'right hand side'". Using this wording resolves some dissonance around its overloaded usage.


Fun fact: various dialects of BASIC let you optionally put LET before your assignment statements. So the following are equivalent:

    X = 10
    LET X = 10

Wait...LET was optional? Auigh! Whole months of my childhood, wasted!

Depends on the dialect. It was mandatory on the Sinclair BASICs, for example (ZX81/ZX Spectrum), but optional on the Amstrad (Locomotive/Mallard) and BBC BASICs. A quick search suggests it was optional as early as "MicroSoft"'s Altair BASIC: http://swtpc.com/mholley/Altair/BasicLanguage.pdf

As someone who learned first on a ZX81, I'd always assumed that "LET X=5" was the canonical form, and that "X=5" for assignment was just a shortening for convenience.

Likewise - but of course in Sinclair BASIC it was mandatory, because every command had to begin with a keyword, and the input system enforced that: the cursor started out as a flashing [K], and in that mode every key on the keyboard was mapped to directly enter a keyword. LET was on the 'L' key. You literally couldn't type "a = 1" into a Spectrum/ZX81 - you couldn't type an 'a' unless you were at an [L] cursor prompt, which would only appear after you had chosen a keyword. Typing an 'a' at a [K] cursor would get you the keyword NEW.

I wonder if you, like me, were also completely thrown when you first encountered a programming language where you didn't also have to enter line numbers...

> I wonder if you, like me, were also completely thrown when you first encountered a programming language where you didn't also have to enter line numbers...

Haha, yes, totally! For me this was AMOS on the Amiga (a Basic variant). I was so thrown I started by using them anyway as they were supported by the interpreter - it just treated them as labels.

I soon stopped though when I discovered it didn't reorder based on number, so you ended up with

  10 print "world"
  5 print "hello"
  20 goto 5

Not too thrown - I went from BASIC to Z80 assembler, and that just had labels...!

Only tangentially related: an (unsuccessful) search for a genealogy of the Basic programming language led me to https://en.wikipedia.org/wiki/Basic_Using_Reverse_Polish, which led me to http://vintagecomputers.site90.net/comp80/, which has scans of articles giving examples of that language, for example (from http://vintagecomputers.site90.net/comp80/Part_4_Jul_79.pdf):

   018 LET P=P K * B -

   135 LET Q=1 F G / 1 - REC 0.00001 * - Q *
but also the sort-of traditional

   016 FOR X=1 STEP 1 UNTIL T
I guess they had a Basic that only allowed expressions with a single operator or a single function call (some early systems with very limited memory did that) to the right of =, and didn’t have the memory for an infix parser.

Depended on the dialect. Whatever we had on my first PC as a kid, it was mandatory. Later, in HS, when we used BASIC for a first CS course, it was optional.

BASIC on PCs. How new-fangled. I forgot about LET. You probably needed to use it on the HP 2000 minicomputer I first learned BASIC on. :-)

Visual Basic, notably, made 'Let' optional, but made 'Set' mandatory when assigning object references.

I just intuitively understood that symbols can have different meanings in different contexts, so never felt any dissonance or had any trouble understanding it. Like "read" has different pronunciations, or "plane" has different meanings, "=" is just multiple things under different contexts.

I agree in that this doesn't really take conscious thought anymore. However, the article had the following:

> How can a = a + 1? That’s like saying 1 = 2.

It looks like a lot of us don't have (never had?) trouble accepting a statement like `a = a + 1` and, upon stepping back, thought maybe it had to do with our internal dialog (which is now probably intuition as we've been processing statements like this for so long).

I've been programming professionally for 25 years using mostly C inspired languages, and I will still regularly write "if(foo = bar)" on a daily basis and not notice until there's an error. It's easily the most common syntax error I write, followed closely by using commas in my for-loop as apparently it looks like a function to my fingers: "for(x=0, x<10, x++)"

There is a Ruby idiom[0] that I find quite useful (esp. combined with a linter such as Rubocop), leveraging optional parens as a marker of intentional assignment:

  # bad
  if v = array.grep(/foo/)
    # some code
  end

  # good
  if (v = array.grep(/foo/))
    # some code
  end
Caught a number of nasty bugs that way.

[0]: https://github.com/bbatsov/ruby-style-guide#safe-assignment-...

You can catch this in C using -Wall.

This is why I'm a big fan of yoda notation where applicable

if (2 == x)

Or simply enable compiler warnings for that...

Don't compilers warn about assignment inside of conditionals now?

In C++17 not only can you do assignment in the conditional, you can declare new variables which have the scope of the then and else clauses.

But yes, most compilers will catch this case.

The compiler warns unless you add an extra () around the assignment to signal that you mean it.

It's used often enough in loops like this:

    while (v = next()) { ... }
or, perhaps more usefully:

    while ((v = next()) != ERROR) { ... }
EDIT: just did a quick check, and LLVM did in fact warn me about this:

    if (x = 4) { ... }
To silence the error, you have to double up on the parenthesis, which I guess is a reasonable enough solution:

    if ((x = 4)) { ... }

Same here. I have flycheck running a linter, so I catch it a bit quicker, but single = is my single most common typo.

Which IDE do you use? Even Vim has plugins to warn on assignment in an if() statement.

Using = for equality checking in SQL doesn't help with the occasional mistakes...

My first programming teacher insisted we call the single = sign the 'gets' operator. So

    int x = 10
    int y = x

would read in English as x gets 10, y gets x.

Shortly after that class I took a break from any coding. When I went to another school I saw sophomore- and junior-level programmers still struggling with this. I retained the habit of using 'gets' and never had a problem with this.

I've heard the same but with 'becomes' rather than 'gets'. I think becomes works better for people with a mathematical background, where the concept of variables is totally natural. For people without this, 'gets' might be better because it helps anthropomorphize the variable. The coder 'gives' X a value to hold.

People with a mathematical background should be thoroughly familiar with words having overloaded and jargoned up meanings. Fiber Bundle, group, set, relation, etc... They all mean something different when you're not talking about mathematics, and '=' is no different.

If you're speed reading some code, 'becomes' is a bit of a mouthful.

Do you mouth the name of operators when you read code? Like most people I do make the words sound in my head when reading text but only the variable names make sounds when reading code.

I'm a bit of a mutterer but it depends on my mood I think.

Something I've noticed is that I don't tend to subvocalise when reading code, but if I do, I generally read "x = 10" as "x e. 10" rather than "x equals 10" (so basically just the first syllable). I will read "if x = 10" as "if x equals 10".

Not sure why/how I picked it up, but similar principle, I suppose.

You'll have to pry my BCPL heffalumps out of my cold, dead hands.

  The operator " = >" (pronounced "heffalump") is 
  convenient for referencing structures that are 
  accessed indirectly. The expression 
  a=>s.x is equivalent to the expression (@a)»s.x.
Source: https://archive.org/stream/bitsavers_xeroxaltobualSep79_5187...

I don't understand the LISP line at all. I would have said "LET", "SET", and "EQUAL" (since it's talking about numbers). Is there something I'm missing?

EDIT: It's since been changed to "let", "set", "equal" (but still lower-case).

I don't understand it either. I suspect the author does not know Lisp, and badly misunderstood an explanation from someone who does.

I think that Lisp doesn't even fit well into that table, since putting LET in the first column would implicitly bring in declaration as well, much of the time SET in the second column would be misleading since mutation is so often done implicitly by a looping construct like DO, DOLIST, DOTIMES or whatever (no SET in sight), and there are all sorts of equality tests one could use.

Edit: ok, I see the author has both updated it, and is specifically talking about Lisp 1.5 which does not have the looping constructs. On the other hand, it at least has a bunch of alternatives to SET that probably should be listed as well, like RPLACA, RPLACD, ATTRIB, and so on. I think that even in Lisp 1.5 variables could be declared in different ways to using LET (like rebinding a variable passed in to a function, so the assignment was in invocation).

It's not a perfect correspondence, but I think the point is simply to show that back in the 1950's, there was no consensus yet on how assignment ought to look.

It just happens that there was no consensus on how it should work yet, either!

Then again, neither of the last 2 (modern) programming languages I've used have had FORTRAN/ALGOL-style assignment semantics, so I think the jury's still out.

The article's been updated.

During the public comment period for the original ANSI C standard, we had at least one request to add ":=" as an alternate assignment operator. We declined, but I personally would have supported that. The use of = and == is one of my least favorite bits of C syntax.

These days, I suppose we could add the Unicode character for :=

    ≔ (U+2254 COLON EQUALS)

and there is also

    ⩵ (U+2A75 TWO CONSECUTIVE EQUALS SIGNS)

if you find typing two = characters fatiguing...

Another question is why the assignment is from right to left when the natural direction would rather be left to right, like 2+2 => x to store the value 4 in x.

I was told once by a math professor that it is a habit inherited because we use Arabic numbers/maths which were really meant to be read from right to left. Don’t know if the theory has any merit.

In conventional mathematical notation, as used in science and engineering, formulas usually have the form ν = ε, which is both a statement of equality and a method for calculating ν from the terms in ε. For example, V = I R. If you have V and I and want R, you start by rewriting this as R = V / I. Computer languages explicitly followed this.

English has Subject-Verb-Object word order, not Object-Verb-Subject. That's why x is the first word in x = 2+2.

> English has Subject-verb-object word order, not object-verb-subject

Imperative programming corresponds to the imperative mood in English, where English normally has verb-object word order with the subject (the entity being commanded) omitted. The subject of the command to set the value of x to the result of the addition of two and two is the computer/runtime running the code, not the variable x, which is the direct object.

English has SVO order for declarative sentences, which correspond to declarative programming, which tends to feature definition or binding rather than mutating assignment.

>A common FP critique of imperative programming goes like this: “How can a = a + 1? That’s like saying 1 = 2. Mutable assignment makes no sense.” This is a notation mismatch: “equals” should mean “equality”, when it really means “assign”.

I sort of disagree with this. Many functional languages pull heavily from lambda calculus and other forms of mathematics. In math, "a = a + 1" isn't the same as "1 = 2". The issue isn't equality, it's that you're trying to rebind a bound variable, which isn't possible.

In other words, rebinding a bound variable is not the same as "1 = 2".

> In math, "a = a + 1" isn't the same as "1 = 2". The issue isn't equality, it's that you're trying to rebind a bound variable, which isn't possible.

"=" means equality in math; a = a + 1 is the same as 1 = 2 because if you subtract a from both sides and add 1 to both sides you get 1 = 2.

Lambda calculus has the concept of binding variables, but it doesn't use "=" for that, it uses application of lambda forms. It's the same idea that's applied in some variants of Lisp, where LET is a macro such that (let ((x 1) (y 2)) ...) expands to ((lambda (x y) ...) 1 2).

The way it plays out is that rebinding is perfectly fine, because it's not really any different from binding in the first place. The same way that

    (let ((x 10))
      (f x)
      (setf x (1+ x))
      (g x))
can be re-written as

    ((lambda (x)
       (f x)
       (setf x (1+ x))
       (g x))
     10)
That can itself be re-written as:

    ((lambda (x)
       (f x)
       ((lambda (x)
          (g x))
        (1+ x)))
     10)
If you'd like to read more about this sort of thing, Sussman and Steele's "Lambda: The Ultimate Imperative" is a good starter: http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-...

Rebinding is essential for recursion, which is a concept in mathematics. Given a fib(x) function defined in terms of itself, when we evaluate fib(10), the parameter x is simultaneously bound to a number of different arguments through the recursion. We just understand those to be different x's in different instances of the fib scope.

Rebinding is also needed for simple composition of operations. Given some f(x), the formula f(3) + f(4) involves x being simultaneously bound to 3 and 4 in different instances of the scope inside f.

Of course you can rebind a bound variable; by extending into a new scope.

Functions cannot "work" in math if arguments cannot simultaneously be bound to different values.

Recursion is not possible, and so defining computation in terms of recursion goes out the window.
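To make that concrete, here is the standard recursive definition in Erlang (the language used elsewhere in this thread):

    fib(0) -> 0;
    fib(1) -> 1;
    fib(N) -> fib(N - 1) + fib(N - 2).

Evaluating fib(10) binds N to 10, 9, 8, and so on at the same time, one binding per live instance of the function's scope.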

> rebinding a bound variable is not the same as "1 = 2".

Therefore using "=" to mean "bind variable" is a notation mismatch, which is the whole point of what you quoted, no?

Is it really worth allowing assignment in places so weird that you also need ==?

Wouldn't it be easier to use = for everything and get rid of those weird edge cases?

Yeah, I was pretty surprised when I found out that assignment and equality can be totally syntactically separate and unambiguous with minimal language-design effort. Just get rid of the silly idea of allowing expressions as statements.

Although even then it's nice to use different symbols, because they have different meanings. I don't like it when a word has different meanings depending on context.

> Just get rid of the silly idea of allowing expressions as statements.

I don't think that's enough. Take the two following Python lines:

  a = b = c == d

  a = b == c == d
No expression is being used as a statement, yet you can't syntactically separate assignment from equality.

I would write these as

    a = b = (c == d)

    a = (b == c == d)
respectively. And I think it would make sense to have a syntax rule that strictly enforces parentheses around truthy expressions used in such contexts.

Right, but then you're using the parentheses to distinguish the assignment from the comparison, it's not just a matter of not allowing expressions as statements.

I think GP might have meant that to go the other way around, i.e. don't allow statements to also be expressions, and more specifically, make assignment a statement.

Well, first I was super intrigued by the example.

But I agree with your conclusion, that statements should not also be expressions, so a = b = c shouldn't work (at least not the way we're used to, that the b = c is an assignment "statement" that also produces an expression value).

But in the end, all this does is allow "a = b = c" to be an unambiguous statement meaning, in more familiar notation, "a = b == c". Not exactly clear! So even though a language could use the same symbol for both assignment and equality checks, I don't recommend it! Although I still like to keep statements and expressions as strictly non-interchangeable constructs.

Just noting that I didn't mean to express any particular preference for syntax, just addressing whether it's technically possible. Personally I'm fine with = vs == being prevalent, but might prefer something like := and = respectively.

What I've not seen is examples of needing to distinguish between assign and compare that outweigh the added complexity.

    if x=y    # obviously compare
    x=y       # obviously assignment
    x=y=z     # is that really needed? If so, brackets

No, Python assignments are not expressions. "a = b = c" is just an assignment statement with multiple parts, it's not equivalent to "a = (b = c)", which in fact will throw a SyntaxError.

Implementing this typically means making substantial changes to a language that doesn't already have it, or implementing a new language. In such a language, having a special multiple assignment syntax based on trains of = and variable names would obviously be impractical.

It does lead to some pretty neat and idiomatic C; if only they had picked a different symbol for assignment.

It is extremely useful for all statements to be able to be used as expressions, to chain and nest values. It's purely syntactic clumsiness on the part of those language families that assignment and comparison are so close to the notion of "equals".

Using = for everything still has the problem that "x is always and forever this value" looks exactly the same as "y is this value for now but will be a different value in the future" or "give z, which previously had a different value, this new value". These are three different things and should have different syntaxes; = is appropriate for the first (and using it for that is not incompatible with using it for equality comparison) but not for the second or third.

I think using arrows would be nicer, especially for complex destructuring / pattern matching, where arrows are incredibly clear.

A complex destructuring case would be something like this:

    {cashMoney -> foo, stuff: {powerLevel -> bar}} <- {cashMoney: 5000, stuff: {powerLevel: 8999}}

I think the arrows nicely indicate where data is flowing, and it's unambiguous what is a variable name and what is a field name.

Or, you know, use different operators entirely.

I mean it has plenty of utility.

You want PL/1

> Ken Thompson [...] personally, aesthetically, wanted to minimize the number of characters in source code.

Funnily, when Thompson was asked what he would do differently if he were doing it over again, he said, "I'd spell creat with an e."


I always liked DHH's take on these sorts of arguments (paraphrasing): who the hell cares? Once you know the purpose of the '=' how often do you make mistakes reading or writing code?

Whereas Java is all about protecting developers from themselves, Ruby (for example) lets you get away without variable type declarations because, at the end of the day, how often do you not know whether a particular variable is a string or an integer? Often enough to justify enforced declaration of everything at the outset, or receiving errors throughout your code?

That's not to say that those enforcements never make sense. I think that structured languages like Java are better for larger teams, where enforcing standards matters more than on a smaller team. But there are other ways to do that, and a lot of the time the rules make development a horrible experience.

My issue with the larger teams vs. smaller teams argument is that the price of success is finding yourself with the wrong tools because successful projects usually grow large teams. The counterargument is that the price of choosing the right tools for a large team may be failure when those tools don't let you execute quickly enough. I don't really buy this trade-off. Maybe it is true for the specific language of Java for a small team (I'm not so sure it is) or Ruby for a large team (I have lived this and found it to be anecdotally true, so I do buy this one), but I don't believe you have to make this choice; I believe there can be languages that are productive for small teams while avoiding foot-guns for large teams.

> how often do you not know whether a particular variable is a string or an integer?

In someone else's code? 100% of the time. I was once a diehard Ruby believer, and I still use it for small programs. But having had the singular displeasure of working on a huge Ruby codebase (pre-JVM Twitter), I have learned, in the hardest possible way, that untyped languages are absolute nightmares when more than one person is involved.

As someone who does Rails development as his day job, man do I hate convention over configuration. The argument, "Once you know that X, it's very easy to understand" sounds great, but there are a lot of Xs! This is fine if Rails (or whatever) is your life and you are going to be Rails-boi or Rails-grl until the industry moves on and you pick up your next career at MacD's.

From a language, library, or framework, I want to be in and out as fast as possible. I want unambiguous usage that is easily discoverable. I want to avoid having to load a million things into my memory, not least because I work with a huge pile of legacy code that all uses different technology. Rails is not the worst system I've ever used for this, but it's edging up there. Ruby, as a language, I don't mind at all, but the coding conventions of people who primarily use Rails are something that I think can be improved dramatically.

Of course, IMHO ;-)

I prefer to take the middle ground: implicit typing in C#/Scala with var. Benefits of both worlds.

> Once you know the purpose of the '=' how often do you make mistakes reading or writing code?

A nontrivial proportion of people who start trying to learn to program never get past that point, so it's worth taking them into account.

> at the end of the day, how often do you not know whether a particular variable is a string or an integer?

Strawman. Types are not about the difference between a string and an integer, they're about the difference between a user id and an order id, or a non-empty list and a possibly-empty list, or...

>"Strawman. Types are not about the difference between a string and an integer, they're about the difference between a user id and an order id, or a non-empty list and a possibly-empty list, or..."

Can you explain: how is the difference between a string and an integer different from the difference between a user id and an order id? You're calling the OP's comment a straw man, but I'm not understanding how your examples are not the same thing, given a type system that understood all four of those.

Many Ruby users don't realise that the type system can understand all these differences. How often do you not know whether a particular variable is a string or an integer? Not very often. How often do you not know whether a particular variable is a user id or an order id? Much more often. So if you think the only kind of difference a type system can tell you about is the difference between a string and an integer, you vastly underestimate the number of bugs that a type system could help you avoid.
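Even a dynamically typed language can encode part of that distinction, though only as a runtime check; a hedged Erlang sketch using tagged tuples (the tags and db_fetch/2 are hypothetical):

    %% A user id can never be passed where an order id is expected:
    %% the pattern match in the function head simply fails.
    lookup_order({order_id, Id}) -> db_fetch(orders, Id).

    %% lookup_order({user_id, 7}) crashes with a function_clause error.

A static type system catches the same mistake at compile time instead.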

> because at the end of the day, how often do you not know whether a particular variable is a string or an integer?

I'm refactoring some data-pasta to more clearly declare types, because we just have dicts of lists of whatever. So: often enough that a random reader chimed in after 20 minutes.

Simple, programming is like using Old Speech, the language of Dragons. We cannot lie in that language and therefore when we make a statement it becomes true. (In case I'm being too obtuse.. Earthsea)

But really, this is such a pedantic discussion, trying really hard to not get sucked in.

In R, it is actually distinguished that way, example:

a <- a + 1

a + 1 -> a # also works, but REALLY bad practice

But, so does

a = a + 1

Granted, there are a bunch of R haters (especially among people with formal CS educations), but I think this convention makes a lot of sense. While most will disagree about the '<-', I like it from a code-reading sense in that you know it is an assignment right away. Coming from a mathematical perspective before learning to code, this makes a lot more sense in the 'assignment' fashion.

In case you are wondering, the difference between <- and = in R is in scoping. For example, in the following function call:

foo(x = 'value')

x is declared in the scope of the function, whereas:

foo(x <- 'value')

x is now declared in the user environment. Granted, that is not good practice, but that is why there is a difference.

> Granted, there are a bunch of R haters (especially from people with formal CS educations), I think this convention makes a lot of sense.

To be clear, the assignment syntax is great, it's other things that are the target of R haters' hate! Like a high-level language without hash tables.

You can use environments for hash tables.

`->` is a bad practice? I like to use it at the end of a long `%>%` pipe. Reading the whole thing feels a lot more natural.

The first time I ever saw that you could do that was in a blog post about assigning at the end of a series of pipes. I thought it was pretty neat (and actually used it a couple of times), then had problems when I couldn't figure out why my code was messing up.

For example, this assigns a ggplot to 'plot':

    df %>% na.omit() %>% ggplot(aes(x=x, y=y)) + geom_line() -> plot

That is really confusing, in that the way most people would read it is that it is something to be plotted. However, the assignment does occur and is masked. Having 'plot <- df %>%' as the first line makes it clear that a new object is being created.

We actually had to modify our style guide to prevent the '->'

It is bad practice because you are hiding the side effect.

most of this article is covered in the Wikipedia page:


but Wikipedia says: "The reason for all this being unknown. [footnote] Although Dennis Ritchie has suggested that this may have had to do with "economy of typing" as updates of variables may be more frequent than comparisons in certain types of programs"

while the OP says: "As Thompson put it:

Since assignment is about twice as frequent as equality testing in typical programs, it’s appropriate that the operator be half as long."

Thompson and Ritchie designed C together.

Economy of typing definitely shouldn't be the #1 concern when designing a language. Look at APL.

> Prolog came out the same year as C and popularized logic programming, and you can sort of fake assignment with cuts.

Can anyone explain to me what this is supposed to mean? (I know Prolog, I understand cuts very well. But I don't know what the author means here.)

It means I don't understand Prolog very well and really shouldn't have namedropped it :P

Prolog is a good example of a language where "=" doesn't mean assignment. In Prolog, "=" means unification: You can read X = Y (which is =(X, Y) in functional notation) as: "True iff X and Y are unifiable". You can think of unification as a generalization of pattern matching. For example, f(X, b) is unifiable with f(a, Y) by the substitution {X → a, Y → b}. I think this aspect would be a valuable addition to the article.

Fun fact: If (=)/2 were not available as a built-in predicate in Prolog, you could define it by a single fact:

    X = X.
Thus, you do not even need this predicate as a predefined part of the language.

BASIC also came out with the "=" character for assignment in 1964, five years before B. I think it also contributed to the adoption, considering how popular it was in the '80s.

Because ASCII had already replaced the ← with the _, something which made PIP somewhat harder to use, as this:

    PIP destination←source /switches
became this:

    PIP destination=source
or this:

    PIP destination_source

I've always had the notion that if I made a language I'd make the operators := and == so it'd be harder to accidentally use the wrong one.

Wouldn't : and = make it equally (heh!) hard to use the wrong operator notation?

Using == and := would guarantee you never accidentally use = by itself (out of habit from other languages) because that's not a valid operator.

I always read it with an implied "let" in front, so it has always sounded completely natural. It's also ergonomic to type and easy to read. You can confuse := and = on account of : being a small glyph even in monospace fonts - you can mistake it for empty space if you're just skimming - but == and = are harder to confuse, since one is twice as long as the other.

I like = for assignment, but I don't like it for configuration. HCL and TOML chose it, to their detriment, I think.

When it reads as "let x equal y" it makes sense. I think it makes good sense for imperative programming. It makes slightly less sense for functional programming. For declarative configuration, I think a colon makes much more sense.

The way I explain this to beginner programmers is that there are three types of equals in math and programming:

The math "=" means "these are equal".

The programming "=" means "make these equal".

The programming "==" means "are these equal?"
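In R, which this thread keeps returning to, a minimal illustration of the last two:

    x <- 3    # "make these equal": assignment
    x == 3    # "are these equal?": comparison, evaluates to TRUE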

"I don’t know if this adds anything to the conversation. I just like software history."

This topic has piqued my curiosity as well. What is a good place to start reading/understanding more about this?

From my understanding, software development is a complex field that blew up in multiple places at the same time.

I value clarity and explicitness a lot. And I'm someone who generally likes to think twice about the proper naming of a variable, to make it easier for future readers of the code.

At the same time when reading

> How can a = a + 1? That’s like saying 1 = 2.

I think: Well, depends how you interpret it.

If the translation is "we state that, from now on, a is the previous value of a plus 1", it's totally OK.

Not a big difference from saying

a = b + c

So I'm not sure the usefulness of changing all languages to := is high enough that it's worth thinking about changing current languages - and even when inventing a new language, I'm not sure it's helpful.

I also wonder if reassignment (besides counters that need to be changed with each iteration or appearance of the event being counted) is a thing that should generally be avoided if possible. OK, maybe, in general, everything where data is transformed in iterations...

TIL: "BCPL later adding a floating point keyword. And when I say “later”, I mean in 2018. "

That one really gave me a chuckle.

Because programming languages are typically designed for the tiny group of existing programmers rather than the much larger group of future programmers. The same reason unnecessary tokens exist, and 'drop' means delete in databases. It's entirely cultural.

Technically, DROP is used because DELETE is taken by something that deletes rows and using DELETE for tables and rows is a bit scary so you get DROP. I suspect quite a lot of naming happens that way.

I'd have thought, not being a programmer, that drop was used because whilst the access paths and relations involving the dropped table are updated, the table isn't necessarily deleted.

Drop updates the logical structure without necessarily acting on the physical storage in a way commensurate with "deletion". You can thus drop a million-tuple table in a millisecond (less, I expect) whilst deletion would take far, far longer (of the order of a million times longer).


> 'drop' means delete in databases

DROP doesn't mean DELETE.

DROP <object-type> <object-name> is sort of like a macro for, loosely:

    DELETE FROM <catalog-relvar-for-object-type> WHERE name = <object-name>

Except that real RDBMSs don't usually have DDL that is really equivalent to DML against system tables, and in particular (especially historically, but still in many DBs today), DDL has a different relation to transaction processing than DML, so it's very good for clarity and developer intuition not to overload DML keywords for DDL operations, despite the loose similarity between CREATE/DROP in DDL and INSERT/DELETE in DML.

It does according to your next sentence.

>Because programming languages are typically designed for the tiny group of existing programmers...

So this is a dilemma I have while working on a new language. I'd like to go with `:=`, but `=` is absurdly popular, and I'm trying to keep the language as approachable as possible.

I don't think the clarity of `:=` is so compelling that it outweighs the `ew, why are there colons in there` reaction that I think most novice coders would have.


The key isn't whether you use := or =, it's whether you allow assignment in expressions.

My advice: don't allow assignment in expressions. To me, it's like the case-sensitive issue: the language designers think it's a useful feature, but it actually works against most developers.

I definitely agree that assignments should be statement level operations.

I don't think case-folding identifiers is helpful. The language has decreed fooBar is the same as foobar, and that handles the error where you spelled the same idea two different ways, but it fails silently on the error where you spelled two different things a similar way. Worse, there are some people who are very sensitive to case and will be confused, while others will happily type their entire code in all caps.

I think a linter is the best way to catch these issues, and those subjective rules are precisely the sort of thing that need to develop more rapidly than the core parser.

Yes, but again, the issue is whether most developers will be hindered or helped by case-sensitivity in a language. Based upon my experience, identifier case-sensitivity is simply making things harder than they need to be on the developer.

Conceptually, what is the difference between these two identifiers:

  myObjectInstance vs MyObjectInstance

And the key here is the reason for the difference: if it's a typo, then a case-insensitive language design will allow it and no-harm, no-foul. If it's not a typo, then who wants to work on a codebase littered with identifiers whose only difference is case? :-)

> Conceptually, what is the difference

In Haskell, one is a variable, the other is a type, and that's enforced by the language. It's the same, albeit by convention, in Java. There are a lot of cases where you want to describe a type and a thing, so apple = new Apple() is pretty reasonable.

When I think of case-insensitive languages, I'm thinking of Basic, LISP, SQL, and those don't have a lot of type declarations.

And consider two counter-examples:

  my_instance vs myinstance

  things vs THINGS
The first shows case-folding is only a partial answer to ambiguous identifiers. The second shows that differences in case can be very obvious to the reader.

Those are motivators to me for pushing this off to the linter: there are a lot of subjective judgements in what should and shouldn't be the same, and having the language keep its rules for identifiers as simple as possible seems like a good separation of concerns.

My final concern is metaprogramming and interoperability. In SQL, for instance, there are bizarre rules to work around case-insensitive identifiers. If another system asks you for "myObjectInstance" and "MyObjectInstance", it has to know your case folding rules to know those two identifiers are the same.

> If it's not a typo, then who wants to work on a codebase littered with identifiers whose only difference is case ? :-)

Ever worked on a Python project that interacts with Javascript, so it's snake and camel case?

I generally agree, I'd just prefer a gofmt-style utility that would just automatically resolve those and tidy everything up. I completely agree that just chucking error messages is a poor answer.

Finally, here's a challenge, if identifiers are going to be folded by the compiler: what locale should be used? In particular, how do you handle I, İ, i and ı?

No, in my example they're both references to an object instance - they're simply identifiers. Languages that are case-insensitive tend to force one to use identifiers that are also descriptive as to their usage, which is very helpful when reading code as you can tell a type from a variable from a....

Re: languages: Pascal/Object Pascal is case-insensitive, and is statically-typed.

Re: SQL: all implementations that I'm aware of use case-insensitive identifiers for all of the reasons that I've outlined. Any that don't are problematic, at best.

Re: locales: the way that this is typically handled is by a) restricting the allowed characters to the English alphabet (older), or b) by using Unicode (UTF-16) encoding for source files (newer).

Why not 'is'?

Because K&R had terrible keyboards, they abbreviated everything as much as possible.

Traditionally := was used for assignment, which makes sense since it is an asymmetric symbol for an asymmetric operation.

Granted, they had some bad keyboards; there's an oft-reproduced photo of Thompson & Ritchie at Teletype 33s, but those had '63 ASCII, including ←.

Probably more important is that the Unix/C developers came from Multics, written in PL/I, which (following Fortran) used ‘=’ for assignment. And Fortran was still important; Unix had a Fortran compiler at least as far back as Second Edition, when C was being born. Kernighan & Plauger's Elements of Programming Style used Fortran and PL/I for its examples, and Software Tools used Ratfor. ‘=’ is simply what they were used to.

The use of = for assignment goes back to Heinz Rutishauser's Superplan from 1951, and it was adopted by Fortran.

Rutishauser might have gotten it from Konrad Zuse's Plankalkül, which used the right double arrow ⇒, which looks a little like =.

Plankalkül had the order and the assignment reversed relative to C, so to increment a variable Z1, you'd write:

    | Z + 1 ⇒ Z
  V | 1       1
  S | 1·n 1·n 1·n
(The first line is the 'main line'; Z stands for Zwischenwert, i.e. "intermediate value". The second line is the 'value line', which contains the indices of the variables -- to store Z1 + 1 in a new variable Z2, you'd replace the second 1 with 2. The third line contains Struktur-Indizes.)

The discussion here shows how much computer languages are humans' work, with all humans' little approximations, taste, creativity, etc. Great read :-)

What are Strukturindizes?

The translation would be structure indexes.

Yeah, I think that would be clear even if I didn't speak German, which I do. I'm curious what their semantics are / what they're for.

Bauer and Woessner [1] weren't very clear on their semantics, but according to Bruines [2], they're type annotations.

[1] https://web.archive.org/web/20090220012346/http://delivery.a...

[2] http://www.cs.ru.nl/bachelorscripties/2010/Bram_Bruines___02...

There's also <-, most commonly seen in R, but it draws its heritage from the APL keyboard.

Fun fact, R also has -> (assign to the RHS instead of LHS), and "super-assignment" <<- and ->>, and also can use = sometimes too (and I think has different semantics). Yay R.
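A quick sketch of those variants (f is a hypothetical function, included to show the enclosing-scope behavior of <<-):

    x <- 1                    # ordinary leftward assignment
    2 -> y                    # rightward assignment: y is now 2
    f <- function() x <<- 10  # super-assignment: rebinds the existing x
    f()                       # in an enclosing environment (here, global)
    x                         # 10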

> also can use = sometimes too (and I think has different semantics)

That’s a common misconception (even the official documentation is misleading).

In reality, assignment `<-` and assignment `=` have the exact same semantics (except for precedence, and unless you redefine them, which you can). The confusion comes from the fact that the `=` symbol is syntactically overloaded: it is also used for named argument passing (`foo(x = 1)`). This leads people to falsely claim that, if you wanted to perform assignment in a function call (which occasionally makes sense in R), then you’d have to use `<-`. But this is false. You merely need to disambiguate the usage, e.g. with extra parentheses: `foo((x = 1))` works.
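A sketch of that disambiguation, with a hypothetical one-parameter function foo:

    foo <- function(x) x
    foo(x = 1)    # named-argument passing: no x appears in the caller
    foo((x = 1))  # extra parentheses force assignment: x is created in
                  # the calling environment, and 1 is passed to foo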

That's a fantastic clarification, thanks for that!

I write R daily and have always used = for convenience. I can count on 0 hands how many times that has affected me. <<- has its uses though!

FWIW, that's what Ross Ihaka does as well (or at least did in a presentation that I watched - and pointed out that you can use = for assignment). Personally, I think = for assignment is a disaster, and wish I could use <- and -> in every language.

I've used R since it was in beta form, and one of the most interesting things to me (aside from the evolution of type hierarchy) has been changes in use of the assignment operator.

When I started, = was the norm for assignment, with == for evaluating equality. Later I started noticing that people were recommending <- based on scope concerns, but it was kind of subjective preference. Now I see articles like this saying that the "preferred use" is <-, and some people don't even know about =.

I agree = versus == can lead to tricky errors in code, but I still prefer = for various reasons in general in languages (although I get the scope arguments).

The reason is that = is shorter, and assignment is a definitional equality, which to my impression is the bulk of what programming usually consists of. As others have suggested here, between = and ==, = is by far the more commonly used operator, so to me it makes sense to use = for succinctness.

In math, there is an "equal by definition" operator, with three lines, so I could see that, but keyboards don't have that, so it's more steps. := is also a kind of standard "by definition" usage as discussed in the article, but again, it's more steps. I'd still prefer that over <- in R.

Interesting. I started using it in 2000 and was working directly with members of R-core at the time. My experience was that the only time I ever saw = used was by DTL.

It really wasn’t until Google published their style guide and then later when Hadley published his that I started seeing = in heavy usage.

To your point about = being shorter: I wonder if the difference was that a lot of the folks I interacted with were using Emacs with ESS, which would interpolate _ to <-, so they wouldn't have noticed? Just a theory.

Either way, I was taught by some of the R creators that = was evil and to be avoided. It wasn't until I stopped using R regularly that I switched; it became too mentally taxing to change assignment operators when I bounced between languages.

I teach R, and most of the students have never programmed before, at least not anything other than having been exposed to a configuration file or writing a little html.

x = x+1 is really confusing because of the way kids learn math. The notation x <- x+1 at least conveys the idea of taking a value and storing it somewhere.

I find -> to be useful when poking around in the REPL but IMO it should never be used in real code.

I spent some time as a package author; that's where I found <<- to be the most useful.

In the original Smalltalk implementation, the character corresponding to _ in ASCII was a left facing arrow, which Smalltalk used for assignment at the time.

I'm not following. The corresponding ASCII code for _ is 95, no?


"In the original Parc Place image, the glyph of the underscore character (_) appeared as a left-facing arrow (like in the 1963 version of the ASCII code). Smalltalk originally accepted this left-arrow as the only assignment operator. Some modern code still contains what appear to be underscores acting as assignments, hearkening back to this original usage. Most modern Smalltalk implementations accept either the underscore or the colon-equals syntax."


Pre-1967 ASCII had slightly different printable characters and somewhat different control characters. See e.g. https://en.wikipedia.org/wiki/ASCII

And now you know where the left-arrow on Commodore keyboards came from (same ASCII value): ASCII-1963, which became PETSCII instead of ASCII-1967.

Common in pre-Python pseudocode and algorithms.

Are you making an argument that Python has caused us all to start writing pseudo-code using `=` for assignment?

I think he's making the argument that when we write "pseudo-code" these days we're just writing python

When I write pseudocode I still use <- for assignment (or a \leftarrow in LaTeX), despite being fluent in Python now. I guess old habits die hard.

The point is that ':=' evolved in the ALGOL family to solve the ambiguity inherent in '='. C is simply from a less thoughtful, more primitive language family, and modern languages still seem to be copying C's horrible syntax. It's also probably why technical papers are written in Algol-like pseudocode, using the '<-' symbol in LaTeX for assignment.

The syntax of C is a thing of beauty. The decision to use = for assignment and == for equality was natural, since assignment is more common than equality testing; similarly, articles are usually short words in natural languages. Algol 68 and Pascal got this wrong, and where are they now?

Pascal got a new name and career as PL/SQL ;)

Technical papers are written using '<-' in pseudocode because the papers likely use '=' elsewhere to assert equality in a mathematical statement and they want to avoid confusion.

> avoid confusion

You’re kind of making the parent’s point.

Yes, although my intention was to further support their last statement which was left open to a degree.

There is a slight distinction. If you are creating a programming language, you can come up with whatever syntax you'd like for assignment and the user has to learn it. Some choices are better than others if you want your language to be used, but there is a specification for the language that you have written down, either as a human readable document or as the compiler/interpreter. With pseudocode in technical documents, the author typically lacks the space, time, and interest to generate such a specification and leans on mathematical notation to keep things precise.

"<-" is used for assignment in F#, though variable binding is "=", so:

    let mutable x = 4   // x is equal to 4
    x <- 7              // x is equal to 7

I really like this about Pascal.

We read it out loud as "becomes the same as", this being what was actually happening. I've carried the habit over to C despite the bare =, it helps me reason and seems to aid in preventing the =/== error.

The shorthand I use for "becomes the same as" is "gets".

My CS professor (~ola) taught it as "gets", and to this day when I talk out code, people get confused. I'll say "x gets 3" and they look at me funny. But I find it helpful for differentiating between assignment and equality.

I never liked the 'gets':

"x gets 7" is fine but

"x becomes the same as y" seems better than "x gets y"

That's wrong, though. x becomes a copy of y, which is only transiently the same.

What is "transiently the same"?

Genuinely asking - my first thoughts are:

"transiently the same" is not the same as "the same", hence it is not "the same", and therefore "not the same".

No jokes on the meta-level please :o)

Depends on the language, and the types of x and y. See e.g. Python.

I was taught to read it as “takes the value of”.

Traditionally? I don't think there was a "tradition". ":=" was used in Algol. "=" was used in Fortran. COBOL didn't use either "=" or ":=".

FORTRAN was far from the first high-level language.

Plankalkül, for example, was out a decade earlier and used →:

  P1 max3 (V0[:8.0],V1[:8.0],V2[:8.0]) → R0[:8.0]

So, there really were a lot of languages out there, but := was fairly common because it was easy to parse and type.

Plankalkül had the advantage of not having to run on a real machine.

As far as := is concerned, having to use the shift key for an assignment is less than ideal, but there really aren't any better options on a modern keyboard. I think C's use of = is a bit braindamaged, but it is nice and quick to type.

On an ASR33, = is a character you have to press shift for; : is unshifted.

Many early video terminals and microcomputers had similar layouts, because they were easy to implement. https://en.wikipedia.org/wiki/Bit-paired_keyboard

It didn't actually contribute much to the evolution of other languages though, right? Had anyone even heard of it when Fortran/ALGOL/etc were being designed?

Zuse seems to have been under the impression that some of the ALGOL designers were aware of it: https://books.google.com/books?id=AsvSBQAAQBAJ&pg=PA522&lpg=...

Rutishauser was, at least.

> "=" was used in Fortran.

And most BASICs, though some (most?) required the keyword LET to introduce an assignment, as well.

LET became optional quickly, it seems.

But most BASIC versions would accept 'let a = 1'

Personally, I don't see why the big fuss. If your assignment and comparison operators are the same, the only thing you lose is expressions like x = (y == 2); since you couldn't disambiguate them from conditional expressions.

Now, assignments are done more frequently than comparisons, hence it makes sense for assignment to be the simpler/shorter one.

You wouldn't even lose that if you can accept that assignments can't be expressions, which seems like a bit less of a loss.

> And most BASICs, though some (most?) required the keyword LET to introduce an assignment, as well.

LET was virtually always optional; an expression by itself on a line (like 'X <> 10') is a syntax error anyway, so removing LET doesn't introduce any ambiguity.

From the article:

> Looking at this as a whole, = was never “the natural choice” the assignment operator. Pretty much everybody used := for assignment instead, possibly because = was so associated with equality. Nowadays most languages use = entirely because C uses it, and we can trace C using it to CPL being such a clusterfuck.

And I dispute the "pretty much everybody used" part. In fact, I think the article contradicts that opinion. Algol used it, and so did Pascal. But Pascal was only a year old [Edit: at the time the choice was made for C] - it [Pascal] hadn't set the world on fire yet (to the degree that it ever did). That leaves Algol using ":=", and FORTRAN, COBOL, and Lisp not using it. That's not "pretty much everybody used".

Yeah, it's probably more accurate to say that "everybody from the Algol line used := for assignment", with C being the first language of that line to stop.

> From the article:

Articles, like desserts, aren't always right. I am sorely tempted to pull out Sammet's book tonight and start counting.

Too late to edit, but I've gone and done it: scanned through Sammet's Programming Languages: History and Fundamentals for languages with identifiable assignment statements. I skip assembly-style, English-like (COBOL etc.), and functional-style (LISP etc.) syntax, and for summary purposes ignore accompanying keywords like BASIC's LET.



ν ← ε     4 (APL, DIALOG, IT, LISP2)

ε → ν     2 (MADCAP, NELIAC)

ν := ε     1 (ALGOL)

ε * ν     1 (BACIAC)

So pretty much everybody used ‘=’. ALGOL was an outlier, though admittedly more influential than BACIAC.

> Because K&R had terrible keyboards

There's no K in Thompson & Ritchie.

Kernighan seems to get inserted all over the place regarding Thompson's work. It's a bit disrespectful to the legend himself.

Or maybe because often-used constructs are expressed by shorter symbols, because that makes them easier to write down?

That, I believe, is only Algol; the more widely used first-gen languages FORTRAN and COBOL used =, as did PL1/G.

Pascal as well, and its descendants like Modula-2.

I think the simple reason is that two characters are harder to type than one character, especially if both require a key combo. And it keeps the syntax cleaner.

Yeah, but this introduced an easy path for errors. You could mean to write `==` and instead write `=`, and it'll work fine in C in practically all locations (even as the condition of an if-statement). Using `:=` would likely have alleviated this error to a noticeable degree, if not entirely.

Chefs would rather use a sharp knife than a blunt one, even if they cut themselves from time to time. If safety is a priority, there are specially designed languages like Ada that use :=.

But... sharp knives are safer than blunt knives, specifically because they reduce the likelihood of being cut. That's why they're preferred.

I see your point, though. I was just bringing up an alternative consideration in the debate regarding equality.

The article is much more interesting than this comment.
