I know there are much earlier examples, but the earliest warning about this behavior I could find in 60 seconds of searching is from the comp.lang.lisp FAQ, posted more than 30 years ago, in 1992:
Mar 21, 1992, 1:00:47 AM
Last-Modified: Tue Feb 25 17:34:30 1992 by Mark Kantrowitz
;;; ****************************************************************
;;; Answers to Frequently Asked Questions about Lisp ***************
;;; ****************************************************************
;;; Written by Mark Kantrowitz and Barry Margolin
;;; lisp-faq-3.text -- 16886 bytes
[...]
----------------------------------------------------------------
[3-9] Closures don't seem to work properly when referring to the
iteration variable in DOLIST, DOTIMES and DO.
DOTIMES, DOLIST, and DO all use assignment instead of binding to
update the value of the iteration variables. So something like
(dotimes (n 10)
  (push #'(lambda () (incf n))
        *counters*))
will produce 10 closures over the same value of the variable N.
----------------------------------------------------------------
The standard does not specify whether such loops mutate or rebind the variable, so if you capture it you have to assume it is not rebound. I do think, however, that once you learn how it works it stops being a problem (in any case, I can select the form, macroexpand it, and see how it's implemented).
In theory, sure. In practice it's easy enough to make this mistake mindlessly. It happened to me just this year, after many years of practice (in an elaborate extended LOOP form, which has the same semantics).
The C# language team encountered this as well. After lambda expressions arrived in C# 3.0, it quickly became apparent that capturing the foreach loop variable was a footgun: users almost always used the loop variable incorrectly, and C# 5.0 made the breaking change.
I had a bit of trouble finding the original C# 5 announcement; hopefully it hasn't been lost in the (several?) blog migrations on the Microsoft domain since 2012.
Given how much of an uproar there was over changing the string type in the Python 2 -> 3 transition, I can't imagine this change would ever end up in Python before a 4.0.
Cue someone arguing about how bad Python is because it won't fix these things, and then arguing about how bad Python is because their scripts from 2003 stopped working...
It's worth noting that it's much less of a problem in Python due to the lack of ergonomic closures/lambdas. You have to construct rather esoteric-looking code for it to be a problem.
add_n = []
for n in range(10):
    add_n.append(lambda x: x + n)
add_n[9](10) # 19
add_n[0](10) # 19
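One detail worth knowing about the snippet above: at module level `n` is a global, so each lambda simply looks it up when it is called. Wrapping the same loop in a function makes the capture visible. In this sketch (`make_adders` is a made-up name), all ten lambdas hold the very same closure cell for `n`, which is why they all see its final value.

```python
def make_adders(count):
    adders = []
    for n in range(count):
        adders.append(lambda x: x + n)  # captures the variable n, not its current value
    return adders

adders = make_adders(10)
print(adders[0](10), adders[9](10))  # 19 19

# Every lambda refers to one shared closure cell for n.
shared_cell = adders[0].__closure__[0]
print(all(f.__closure__[0] is shared_cell for f in adders))  # True
print(shared_cell.cell_contents)  # 9
```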
This isn't to say it's *not* a footgun (and it has bit me in Python before), but it's much worse in Go due to the idiomatic use of goroutines in a loop:
for i := 0; i < 10; i++ {
	go func() { fmt.Printf("num: %d\n", i) }()
}
In Python you are much more likely to hit that problem not with closures constructed with an explicit 'lambda', but with generator expressions:
(((i, j) for i in "abc") for j in range(3))
The values produced by the above depend on the order in which you evaluate the whole thing.
(Do take what I wrote with a grain of salt. Either the above is already a problem, or perhaps you also need to mix in list comprehensions to surface the bug.)
gs1 = (((i, j) for i in "abc") for j in range(3))
gs2 = [((i, j) for i in "abc") for j in range(3)]
print(list(map(list, gs1)))
print(list(map(list, gs2)))
That's a nice "wat" right there. I believe the explanation is that in gs2 the range() is iterated through immediately, so j is already set to 2 before you have a chance to access any of the inner generators. In gs1, by contrast, the range() is still being iterated over as you access each inner generator, so when you access the first generator j=0, then j=1, and so on.
Equivalents:
def make_gs1():
    for j in range(3):
        yield ((i, j) for i in "abc")

def make_gs2():
    gs = []
    for j in range(3):
        gs.append(((i, j) for i in "abc"))
    return gs
Late binding applies in both cases, of course, but it only matters in the second case.
I think early binding would produce the same result in both cases.
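That claim can be checked with a small sketch. Early binding is simulated here by passing `j` as a function argument, which fixes its value at call time; `pair_gen` and `materialize` are made-up helper names, and the two late-binding cases are the gs1/gs2 expressions from earlier in the thread.

```python
def pair_gen(j):
    # j is a parameter, so its value is fixed when pair_gen() is called
    return ((i, j) for i in "abc")

def materialize(gens):
    # Drain each inner generator as soon as the outer iterable hands it over.
    return [list(g) for g in gens]

cases = {
    "late_lazy": (((i, j) for i in "abc") for j in range(3)),   # gs1
    "late_eager": [((i, j) for i in "abc") for j in range(3)],  # gs2
    "early_lazy": (pair_gen(j) for j in range(3)),
    "early_eager": [pair_gen(j) for j in range(3)],
}
results = {name: materialize(gens) for name, gens in cases.items()}

for name, rows in results.items():
    print(name, [[j for _, j in row] for row in rows])
# late_lazy [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
# late_eager [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
# early_lazy [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
# early_eager [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
```

Only the late-binding, eagerly built list differs; both early-binding variants agree with each other regardless of evaluation order.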
Right, creating generators in a loop is not usually something you want to do, but it's meant to demonstrate the complexity that arises from late binding rather than demonstrate something you would actually want to do in a real program.
Unless you're talking philosophically about how classes and closures are isomorphic, then no, it doesn't. None of the variables in the outer scope are captured in the class instance.
Of the two comprehension syntaxes in Haskell, Python picked the wrong one. Do notation (or, equivalently, Scala-style for/yield) feels much more consistent and easy to use - in particular the clauses are in the same order as a regular for loop, rather than the middle-endian order used by list comprehensions.
> Haskell has both do-notation and list comprehension.
Right, and do-notation is the one everyone uses, because it's better. Python picked the wrong one.
> Comprehensions in both Python and Haskell (for both lists and other structures) use the same order, as far as I remember.
It may be the same order as Haskell, but it's a terribly confusing order. In particular, if you want to go from a nested list comprehension to a flat one (or vice versa), you have to completely rearrange the order it's written in, whereas if you go from nested do-blocks to flat do-blocks, it all makes sense.
I see what you mean, but I don't find the order that confusing in either Haskell or Python.
However, I can imagine a feature that we could add to Python to fix this: make it possible for statements to have a value. Perhaps something like this:
my_generator = \
    for i in "abc":
        for b in range(3):
            print("foo")
            yield (i, b)
or perhaps have the last statement in block be its value (just like Rust or Ruby or Haskell do with the last statement in a block), and make the value of a for-loop be a generator of the individual values:
my_list = list(
    for i in "abc":
        for b in range(3):
            (i, b))
Though there's a bit of confusion here about whether the latter examples should produce a flat structure or a nested one. You could probably use a mechanism similar to the existing 'yield from' to explicitly ask for the flat version, and otherwise get the nested one:
my_list = list(
    for i in "abc":
        yield from for b in range(3):
            (i, b))
Making Python statements have values looks to me like a more generally useful change than tweaking comprehensions. You probably wouldn't need comprehensions at all in that case, especially since you can already write a loop header and body on a single line.
The limited whitespace-based syntax limits the potential for fun inline statement things, but it also completely dodges the question of what any particular statement should evaluate to when used as an expression.
Yes, I guess something like that. That was just meant as an example of how existing Python allows you to write loops on one line. It's not a good example for a meaningful comprehension in our alternative made-up Python dialect.
> The limited whitespace-based syntax limits the potential for fun inline statement things, [...]
Python already mostly allows you to use parens to override the indentation. They would just need to generalise that a bit. Btw, Haskell already does that:
Officially, Haskell has a syntax with curly braces and semicolons, and the indentation-based syntax is defined as syntactic sugar that desugars to ; and {}. But almost everyone uses the indentation-based syntax. The exceptions are perhaps code generators, and posting on websites that mess with indentation.
(And, because it's Haskell, the {}; syntax is just another layer of syntactic sugar over 'weird-operator'-based syntax like >>=.)
When I was starting in Python years ago I had to turn my brain inside out to learn how to write list comprehensions. Sometimes I wonder what it's like to be a normal person with a normal non-programmer brain, having forgotten it entirely these last many years.
But Python doesn't have any concept of a monad, so what would do-notation even be in Python? And who is the "everyone" using do-notation? I don't see any analogous syntax in Lua, Javascript, Ruby, or Perl.
In Python there is a nice tower of abstractions for iteration, but nothing more general than that, so it makes perfect sense IMO to use the syntax that directly evokes iteration.
The existing syntax is meant to mirror the syntax of a nested for loop. I agree that maybe it's confusing, but if you want to go from a multi-for comprehension to an actual nested for loop, then you don't have to invert the order.
> But Python doesn't have any concept of a monad, so what would do-notation even be in Python?
It could work on the same things that Python's current list comprehensions work on. I'm just suggesting a different syntax. Comprehensions in Haskell originally worked for all monads too.
> And who is the "everyone" using do-notation? I don't see any analogous syntax in Lua, Javascript, Ruby, or Perl.
I meant that within Haskell, everyone uses do notation rather than comprehensions.
> The existing syntax is meant to mirror the syntax of a nested for loop. I agree that maybe it's confusing, but if you want to go from a multi-for comprehension to an actual nested for loop, then you don't have to invert the order.
You have to invert half of it, which I find more confusing than having to completely invert it. do-notation style syntax (e.g. Scala-style for/yield) would keep the order completely aligned.
I’m not sure what they mean by list comprehensions, either, but for completeness’s sake, I must point out that this is solvable by adding `n` as a keyword argument defaulting to `n`:
add_n = [lambda x, n=n: x + n for n in range(10)]
add_n[9](10) # 19
add_n[0](10) # 10
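Another way to get early binding, offered here as an alternative rather than something from the thread, is `functools.partial`: the current value of `n` is stored as a bound positional argument, so there is no keyword parameter a caller could accidentally override.

```python
import functools
import operator

# Each partial object stores the value n had when it was created.
add_n = [functools.partial(operator.add, n) for n in range(10)]

print(add_n[9](10))  # 19
print(add_n[0](10))  # 10
```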
I don't think anyone is puzzled by the Go snippet being wrong.
The bigger problem in Go is the for-range loop:
pointersToV := make([]*val, len(values))
for i, v := range values {
	go func() { fmt.Printf("num: %v\n", v) }() // race condition
	pointersToV[i] = &v // will contain len(values) copies of a pointer to the last item in values
}
This is the one they are changing.
Edit: it looks like they're actually changing both of these, which is more unexpected to me. I think the C# behavior makes more sense, where only the foreach loop has a new binding of the variable in each iteration, but the normal for loop has a single lifetime.
It's actually worse in Python, since variables can't be scoped to a block within a function, so the `v2` workaround is still broken. (The default-argument workaround "works", but it is scary.)
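A minimal sketch of the difference, assuming the `v2` workaround means copying the loop variable into a second variable inside the loop body (as with Go's `v2 := v`); the function names are made up.

```python
def with_copy(values):
    printers = []
    for v in values:
        v2 = v  # looks like a per-iteration copy, but v2 is one function-level variable
        printers.append(lambda: v2)
    return [p() for p in printers]

def with_default(values):
    printers = []
    for v in values:
        printers.append(lambda v=v: v)  # default is evaluated when the lambda is created
    return [p() for p in printers]

print(with_copy([1, 2, 3]))     # [3, 3, 3]
print(with_default([1, 2, 3]))  # [1, 2, 3]
```

The default-argument version is "scary" because the captured value is an ordinary parameter: calling one of those lambdas with an argument silently replaces it.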
This makes it clear: the underlying problem is NOT about for loops - it's closures that are broken.
> Tools have been written to identify these mistakes, but it is hard to analyze whether references to a variable outlive its iteration or not. These tools must choose between false negatives and false positives. The loopclosure analyzer used by go vet and gopls opts for false negatives, only reporting when it is sure there is a problem but missing others.
So it will warn in certain situations, but not all of them
Why would it? It's perfectly correct code, it's just not doing what you'd expect.
It might complain about the race condition, to be fair, but the same issue can be reproduced without goroutines and it would be completely correct code per the semantics.
In many languages "if x = 3" is perfectly valid code, but almost certainly not what the person intended "if x == 3". It's very smart to warn someone in a scenario like this.
It's a common enough idiom from "stone age" bare bones K&R C, absolutely.
It's also one of the great foot-guns of C programming, as there are so many other almost-but-not-quite idioms, and it's never clear on casual inspection whether the result of an assignment was meant to be examined or the result of a comparison.
With the evolution of C and of C sanity tools that rightly flag such statements for double-checking, and the desire to avoid spurious warnings, it's more common in later C code to see (say)
if ((err = someFunction()) != NOERROR) { errorHandle(err); }
That optimises down to the same intermediate code when NOERROR is 0, sure, but it makes it very clear what is going on: an intended assignment, then an intended comparison.
As with all idioms, the general practice in the larger codebase and the house coding standard apply; there are other ways of doing similar things.
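For comparison, Python draws the same line in its syntax: a plain `=` inside an `if` condition is a SyntaxError, and assignment inside an expression needs the explicit `:=` operator (Python 3.8+), which supports the same assign-then-compare idiom. `some_function` and `NOERROR` are hypothetical stand-ins for the C names above.

```python
NOERROR = 0

def some_function(code):
    # Hypothetical stand-in for someFunction(); just returns the code it is given.
    return code

def check(code):
    # Explicit assignment expression, then an explicit comparison.
    if (err := some_function(code)) != NOERROR:
        return f"error {err}"
    return "ok"

def plain_assignment_rejected():
    try:
        compile("if x = 3: pass", "<demo>", "exec")
    except SyntaxError:
        return True
    return False

print(check(0), check(5))           # ok error 5
print(plain_assignment_rejected())  # True
```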
> To ensure backwards compatibility with existing code, the new semantics will only apply in packages contained in modules that declare go 1.22 or later in their go.mod files.
Python could very easily have a similar mechanism. Hell even CMake manages to do this right, and they got "if" wrong.
The Python devs sometimes seem stubbornly attached to bugs. Another one: to reliably get Python 3 on Linux and Mac you have to run `python3`. But on Windows there's no `python3.exe`.
Will they add one? Hell no. It might confuse people or something.
Except... if you install Python from the Microsoft Store it does have `python3.exe`.
I’ve not run “python3” in years on my Mac, and I’m almost certain I never type it into Linux machines either; either I’m losing my mind, or there are some ludicrous takes in this thread.
python => -bash: python: command not found
python3 => Python 3.7.5 (default, Apr 19 2020, 20:18:17)
On an Ubuntu 20.10 server:
python => -bash: python: command not found
python3 => Python 3.8.10 (default, Jun 2 2021, 10:49:15)
I no longer have access to some RHEL7 and RHEL8 machines used for work recently, but if I recall correctly they do this by default:
Red Hat Enterprise Linux 7:
python => Some version of Python 2
python3 => Some version of Python 3
Red Hat Enterprise Linux 8:
python => -bash: python: command not found # (use "python2" for Python 2)
python3 => Some version of Python 3
You can change the default behaviour of unversioned "python" to version 2 or 3 on all the above systems, I think, so if you're running a Linux distro where "python" gets you Python 3, that configuration might have been done already.
MacOS 10.15 (Catalina) does something interesting:
python => WARNING: Python 2.7 is not recommended.
This version is included in macOS for compatibility with legacy software.
Future versions of macOS will not include Python 2.7.
Instead, it is recommended that you transition to using 'python3' from within Terminal.
Python 2.7.16 (default, Jun 5 2020, 22:59:21)
python3 => Python 3.8.2 (default, Jul 14 2020, 05:39:05)
To be fair, few of these would qualify as "modern". Ubuntu 19.04 and 20.10, macOS 10.15 are all out of support, and RHEL 7 is almost ten years old and nearing the end of its support.
I suspect my confusion stemmed from mostly invoking `ipython` which doesn't include the 3 suffix (ok, part of the confusion may've been pub-related too :D).
Depending on the package manager / distribution, 'python' might be symlinked to either Python 2 or Python 3. If you don't have Python 3 installed, it might very well point to Python 2. These days it will almost certainly prefer Python 3, but I am also in the habit of actually typing 'python3' instead of 'python' because of what I assume are issues I've had in the past.
Well, no, not python's fault -- clearly the distros', and they probably should be blamed. But a PEP saying python2 and python3 should invoke the correct interpreter would help motivate the distributions.
(This is isomorphic to the usual victim-blaming discussion. Fault and blame vs some ability to make a difference; it's a shame that correctly pointing out a better strategy is both used to attack victims and attacked for attacking victims in the cases when that wasn't intended.)
Right, and Go has the luxury of being a compiler that generates reasonably portable binaries, while Python requires the presence of an interpreter on the system at run time.
> Python requires the presence of an interpreter on the system at run time.
A runtime interpreter does not prevent Perl from doing similar things via `use v5.13`.
Python has `from __future__` with similar abilities; it would absolutely be possible to do the same as Perl and Go and fix what needs to be fixed without breaking old code. One could design an `import 3.22` and a `from 3.22 import unbroken_for` and achieve the same thing.
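The existing `__future__` mechanism already works per compilation unit. A small sketch, using the PEP 563 `annotations` feature purely as an example: the `compiler_flag` attribute of a `__future__` feature requests it for code compiled at runtime, and only that code is affected. Without the flag, Python 3.13 and earlier would evaluate the undefined annotation when the `def` runs and raise NameError.

```python
import __future__

# A tiny "module" whose annotation names something that does not exist.
SOURCE = "def f(x: UndefinedName) -> int:\n    return x\n"

def load_with_future_annotations():
    # Opt this one compilation unit into postponed (string) annotations.
    flags = __future__.annotations.compiler_flag
    code = compile(SOURCE, "<demo>", "exec", flags=flags, dont_inherit=True)
    namespace = {}
    exec(code, namespace)
    return namespace["f"]

f = load_with_future_annotations()
print(f.__annotations__)  # {'x': 'UndefinedName', 'return': 'int'}
```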
The same trick would work with python just as well. There’s nothing about Python’s status as an interpreter which would stop them from adding a python semantic version somewhere in each python program - either in a comment at the top of each source file or in an adjacent config file. The comment could specify the version of python’s semantics to use, which would allow people to opt in to new syntax (or opt out of changes which may break the code in the future).
Yeah, it would just mean that the interpreter - just like the Go compiler - would need to have the equivalent of "if version > 3.4 do this, else do that". Which is fine for a while, but I can imagine it adds a lot of complexity and edge cases to the interpreter / compiler.
Which makes me think that a Go 2.0 or Python 4 will mainly be about removing branches and edge cases from the compiler more than making backwards-incompatible language changes.
This is the direction multiple languages are moving in. Go and Rust both have something like this (in Rust they're called "editions"). I think it's inevitable that compilers get larger over time. I hope most of the time they aren't too onerous to maintain, since there aren't that many spec-breaking changes between versions. But if need be, I could also imagine the compiler eventually deprecating old versions if it becomes too much of an issue.
Arguably C & C++ compilers do the same thing via -std=c99 and similar flags at compile time.
Anyway, nothing about this is special or different with python. I bet the transition to python 3 would have been much smoother if scripts could have opted in (or opted out) of the new syntax without breaking compatibility.
By letting you specify a language version requirement? Not exactly backwards compatible (because it is explicitly not, as per the article).
Python doesn’t make breaking changes in non-major versions, so as mentioned by the upthread comment the appropriate place for this change would be in Python 4.
Given the above, I’m really not sure what point you think you’re making in that final paragraph.
Really? I find that surprising. I don’t write as much code as I used to but I’ve been writing Python for a long time and the only standard library breakages that come to mind were during the infamous 2 -> 3 days.
What sort of problems have you faced upgrading minor versions?
The docs are full of remarks like "removed in 3.0 and reintroduced in 3.4" or "deprecated in 3.10", etc. A big one is the removal of the loop parameter in asyncio, but a lot of asyncio internals are (still?) undergoing significant changes, as getting the shutdown behavior correct is surprisingly difficult. Personally, it's never caused me any issues; I'm always on board with the changes.
Asyncio was explicitly marked as provisional for years and most of the incompatible changes happened during that time. Same goes for typing. The rest of the language is very very stable.
There's a second motivation in my opinion. Code might work today without the change, but it could be because the author originally wrote buggy code, caught it in testing, and had to waste time tracking it down and understanding nuances that don't need to be there. Once they figured that out, they implemented an ugly workaround (adding an extra function parameter to a goroutine or shadowing the loop variable with n := n).
Good language designers want to avoid both wasting developers' time and requiring ugly workarounds. Making a change that avoids both, especially if it doesn't break old code, is great imo.
Whichever way you implement the semantics of the loop variable, the developer has to understand the nuances, don't you think? And those nuances have to be there; all you can do is replace them with other nuances.
If a fresh variable i is bound for each iteration, then an i++ statement in the body will not have the intended effect of generating an extra increment that skips an iteration.
If you want the other semantics, whichever one that is, the workaround is ugly.
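Python's for loop happens to behave like the fresh-binding case described above: the name is rebound from the iterator on every pass, so incrementing it in the body does not skip anything. A minimal sketch (function names are made up) contrasting it with a while loop that mutates a single counter:

```python
def for_loop_visits(n):
    seen = []
    for i in range(n):
        seen.append(i)
        if i == 2:
            i += 1  # rebinds the name only; the next pass takes i from range() again
    return seen

def while_loop_visits(n):
    seen = []
    i = 0
    while i < n:
        seen.append(i)
        if i == 2:
            i += 1  # mutates the one counter, so 3 is skipped
        i += 1
    return seen

print(for_loop_visits(5))    # [0, 1, 2, 3, 4]
print(while_loop_visits(5))  # [0, 1, 2, 4]
```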
New code written today will use the new version and have the correct behavior from day 1.
Old code that is maintained will eventually be upgraded, which yes does come with work sometimes where you realize your code works on version X but not version X+10 and you do a combination of tests and reading patch notes to see what changed.
There is no "correct" behavior here; either one is a valid choice that can be documented and that programs can rely on and exploit.
Code doesn't care about when it's written, only what you run it on, and with what compatibility options.
E.g. one possibility is that ten-year-old code that wrongly assumed the opposite behavior, and has a bug, will start to work correctly on the altered implementation.
Yeah, block scoping is one of those "weird CS ideas" that I'm sure at some point early in Python's design was deemed too complicated for the intended audience, but is also quite a natural way to prevent some human errors. JavaScript made the same mistake and later fixed it (let/const).
I'm not a computer scientist, so I can't rule on whether function scope was a mistake, and I can't see how block scoping would be considered too complicated; I personally think it fits much better with my mental model. Then again, Python doesn't have blocks in the traditional sense of the word IIRC, whereas in C-style languages the curly braces are a pretty clear delimiter.
Parts of my previous job were terrible because it had JS functions thousands of lines of code long where variables were constantly reused (and often had to be unset before another block of code). That said, that wasn't the fault of function scope per se, but of a bad but very productive developer.
TBF, you can have block scoping in an indentation-based language, though it probably helps to merge the two, as in Haskell: `let…in` defines variables in the `let` clause, and those variables are only accessible in the `in` clause (similarly `case…of`).
Python does actually have a single instance of sub-function scope: when you write `try: ... except Exception as e: ...`, the `e` variable is deleted at the end of the `except` clause. I think this is because the exception object, via the traceback, refers to the handling function's invocation record, which in turn contains a map of all the function's local variables. So if the variable worked like a normal Python variable, it would create a reference cycle and make the Python GC sad. So if you need the exception after the clause, you have to assign it to a new name [0].
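A minimal sketch of that behavior (function names are made up): the name bound by `except ... as e` is unbound once the clause ends, while a second name assigned inside the clause survives.

```python
def exception_name_after_clause():
    try:
        1 / 0
    except ZeroDivisionError as e:
        pass
    try:
        return e  # e was implicitly deleted at the end of the except clause
    except NameError:  # UnboundLocalError is a subclass of NameError
        return None

def exception_saved_under_new_name():
    saved = None
    try:
        1 / 0
    except ZeroDivisionError as e:
        saved = e  # a second name keeps the exception alive after the clause
    return saved

print(exception_name_after_clause())           # None
print(type(exception_saved_under_new_name()))  # <class 'ZeroDivisionError'>
```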
Is it a bug? I've always depended on late-binding closures, and I think even recently in a for loop, not that I'm going to go digging. You can do neat things with multiple functions sharing the same closure. If you don't want the behavior, bind the variable to a new name in a new scope. From the post I get the sense that this is more problematic for languages with pointers.
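A minimal sketch of the "multiple functions sharing the same closure" pattern (`make_counter` is a made-up name): both inner functions close over the same `count` variable, so an update made through one is visible through the other.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebinds the shared variable, not a private copy
        count += 1

    def current():
        return count

    return increment, current

increment, current = make_counter()
increment()
increment()
print(current())  # 2
```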
The scope is lexical, the lookup is dynamic. What you want is for each loop iteration to create a new scope, which I would categorize as "not lexical".
By that argument a recursive function shouldn't create a new scope every time it recurses, and a language that fails Knuth's 1964 benchmark of reasonable scoping (the "man or boy test") would be fine. The loop body is lexically a block and like any other block it should have its own scope every time it runs.
If the loop "variable" (and IMO thinking of it as a variable is halfway to making the mistake) is in a single scope whose lifetime is all passes through the loop body, that's literally non-lexical; there is no block in the program text that corresponds to that scope. Lexically there's the containing function and the loop body, there's no intermediate scope nestled between them.
> and IMO thinking of it as a variable is halfway to making the mistake
I used plural for a reason.
> there is no block in the program text that corresponds to that scope.
The scope starts at the for. There is a bunch of state that is tied to the loop, and if you rewrote it as a less magic kind of loop you'd need to explicitly mark a scope here.
What's non-lexical about it? You could replace "for" with "{ for" to see that a scope of "all passes through the loop body" does not require anything dynamic.
And surely whether a scope is implicit or explicit doesn't change whether a scope is lexical. In C++ I can write "if (1) int x=2;" and that x is scoped to an implicit block that ends at the semicolon.
Would you say an if with a declaration in it is non-lexical, because both the true block and the else block can access the variable? I would just say the if has a scope, and there are two scopes inside it, all lexical. And the same of a for loop having an outer and inner scope.
The problem isn't with closures, the closure semantics are perfectly fine.
The problem is in the implementation of for-range loops, where the clear expectation is that the loop variable is scoped to each loop iteration, not to the whole loop (in other words, that the loop variable is re-bound to a new value in each loop iteration). The mental model approximately everyone has for a loop like this:
for _, v := range values {
	//do stuff with v
}
is that it is equivalent to the following loop:
for i := range values {
	v := values[i]
	//do stuff with v
}
In Go 1.22 and later, that is exactly what the semantics will be.
In Go 1.21 or earlier, the semantics are closer to this, with a single v (of some element type T) shared by every iteration:
var v T
for i := 0; i < len(values); i++ {
	v = values[i]
	//do stuff with v
}
And note that this mis-design has appeared in virtually every language that has loops and closures, and it has either been fixed (C# 5.0, Go 1.22) or remains a footgun that people complain about (Python, Common Lisp, C++).
I don't know, my feeling is that the issue really is with how closure capture was interpreted when imperative languages started implementing lambdas. What was happening in Go seems to amount either to default capture by reference rather than by value, or to the loop counters in question being unmarked reference types. The former strikes me as unintuitive given that before lambdas, reference-taking in imperative languages was universally marked (e.g. &a); the latter strikes me as unintuitive because, with some ugly exceptions (Java), reference types should be marked in usage (e.g. *a + *b instead of a + b). Compare with C++ lambdas, where reference captures must be announced in the [] preamble with the & sigil associated with reference-taking.
(In functional languages, this problem did not arise, since most variables are immutable and those that are not are syntactically marked and decidedly second-class. In particular, you would probably not implement a loop using a mutable counter or iterator.)
Even if Go allowed both capture-by-value and capture-by-reference, this issue would have arisen when using capture-by-reference.
For example, in the following C++:
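#include <functional>
#include <iostream>
#include <vector>

int main() {
    auto v = std::vector<int>{1, 2, 3};
    auto prints = std::vector<std::function<void()>>();
    auto incrs = std::vector<std::function<void()>>();
    for (auto x : v) {
        prints.push_back([&x]() -> void { std::cout << x << ", "; });
        incrs.push_back([&x]() -> void { ++x; });
    }
    for (auto& f : incrs) {
        f();
    }
    for (auto& f : prints) {
        f();
    }
    // expected to print 2, 3, 4; in practice it may print 6, 6, 6. Strictly,
    // x is destroyed at the end of each iteration, so these calls go through
    // dangling references (undefined behavior).
}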
I would also note that this problem very much arises in functional languages - it exists in the same way in Common Lisp and Scheme, and I believe it very much applies to OCaml as well (though I'm not sure how their loops work).
Tried it out, OCaml does the expected thing:
open List
let funs = ref [ ] ;;
for i = 1 to 3 do
  funs := (fun () -> print_int i) :: !funs
done ;;
List.iter (fun f -> f ()) !funs ;; (* prints 321 *)
> this issue would have arisen when using capture-by-reference
I understand - but in those languages capture-by-reference has to be an explicit choice (by writing the &) rather than the default, which highlights the actual behaviour. The problem with the old Go solution was that it would apparently behave as capture by reference without any explicit syntactic marker that it is so, and without a more natural alternative that captures by value, in a context where from other languages you would expect that the capture would happen by value.
> Common Lisp and Scheme
I have to admit I haven't worked in either outside of a tutorial setting, but my understanding is that they are quite well-known for having design choices in variable scoping that are unusual and frowned upon in modern language design
> Ocaml
Your example shows that it captures by value as I said, right? For it to work as the old Go examples, i would have to be a ref cell whose contents are updated between iterations, which is not how the semantics of for work. If it did, you'd have to use the loop counter as !i.
In Go 1.22 as well, closures still capture by reference. The change is that there is now a new loop variable in each loop iteration, just like in OCaml. But two closures that refer to the same loop variable (that is, closures created in the same iteration) will still see the changes each makes to that variable.
And what I was trying to show with my example was that this kind of behavior would be observable in OCaml as well, if it were to be implemented like that.
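Python has no per-iteration scope either, but every function call creates one, so the Go 1.22 behavior described above (a fresh variable per iteration that closures still capture by reference) can be sketched by routing each iteration through a helper (`per_iteration_pair` is a made-up name). It mirrors the C++ example above and produces the expected 2, 3, 4.

```python
def per_iteration_pair(x):
    # Each call creates a fresh x; both closures returned from this call share it.
    def incr():
        nonlocal x
        x += 1

    def show():
        return x

    return incr, show

incrs, shows = [], []
for value in [1, 2, 3]:
    incr, show = per_iteration_pair(value)
    incrs.append(incr)
    shows.append(show)

for f in incrs:
    f()
print([f() for f in shows])  # [2, 3, 4]
```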
I think that's a C-centric assumption which is moot as Python's "for" does not create any new scopes. Just reading Knuth's man-or-boy test I was struck by the alien nature of the ALGOL 60 execution model, even though to Python it can be considered a distant ancestor.
I think it played a large part in helping get past the default-deny that any language change proposal should have. The other big one for me was the scan done over the open source code base and the balance of bugs fixed versus created.
Java also had this problem with anonymous classes. The solution is usually to introduce a functor. Being pass-by-value, it captures the state of the variables at its call time, which helps remove some ambiguity in your code.
If you try to do something weird with variable capture, then any collections you accumulate data into (eg, for turning an array into a map), will behave differently than any declared variables.
Go is trying to thread the needle by only having loop counters work this way. But that still means that some variables act weird (it's just a variable that tends to act weird anyway). And I wonder what happens when you define multiple loop variables, which people often do when there will be custom scanning of the inputs.
Java has never had this problem with variables (either in a for loop or free-floating ones), since Java has never had support for closures.
There is one somewhat similar problem in Java that you're maybe thinking of: anonymous classes that reference fields of the current object. I don't think that behavior is surprising, and there are very important use cases for it.
What Go is doing is perfectly sensible. The ability to capture variables is extremely powerful, and often desired. It's just the unexpected scoping of loop variables that introduces a problem. The following code is doing exactly what most people would expect, for example:
You pass your counter into the function, and it returns a function that remembers the original value, not the value as it keeps iterating later on in the caller.
It's unintuitive to users of the language, but it's very intuitive from the perspective of those implementing the language. Everybody seems to make this mistake. Lua 5.0 (2003) made this mistake, but they fixed it in Lua 5.1 (2006). (Lua 5.0 was the first version with full lexical scoping.)
To the degree that the implementers are also users, they carry their implementer understanding into their use. Dogfooding doesn't help when your understanding doesn't match that of your users.
The problem is that the error conditions are relatively rare. Most of the time it doesn’t break anything. So even with dogfooding you can miss it or not see it as a problem early on. But after 10 years of evidence that it was a mistake, that it’s almost never intended, and the fix won’t break much if anything, it’s time to fix it.
I'm doing no such thing, I'm pointing out that your reasoning is faulty: dogfooding does not help when the behaviour is logical and obvious to the designer-cum-user, and same with reviewing.
It's not crazy. It's just the difference between a pointer and a value, which is like comp sci 101.
I think the main things that make it such a trap are that the variable's type is implicit, so the fact that it's a pointer becomes a bit hidden, and that easy concurrency means the value is evaluated outside of the loop execution more often.
> It's just the difference between a pointer and a value, which is like comp sci 101.
That might be the case, but my comp sci 101 was 15-odd years ago now, and since then I have _never_ had to think about pointers vs values until I started a Go project a few years ago. But even that was more comprehensible than the pointer wizardry we had to do in C/C++ back then.
I don't want to have to think about managing my application's memory, I much prefer being in the code, thinking of variable scope and maintainability which in a lot of languages automatically translates to healthy memory usage.
I don't know much about Go but the design seems very intuitive to me. You're doing something like (let ((i variable)) (loop (setf i ...) ...body)), which if there is a closure in the loop body will capture the variable i and also subsequent mutations.
The fix is to make a new variable for each iteration, which is less obvious implementation wise but as per the post works better if you're enclosing over the loop variable.
While I agree that the design is very clear for the cases illustrated here, and I am a bit puzzled as to why the Go designers chose to change these as well, the design is not at all clear for the other case:
var funcs []func()
for i, v := range []int{1, 2, 3} {
	funcs = append(funcs, func() { fmt.Printf("%v:%v, ", i, v) })
}
for _, fun := range funcs {
	fun()
} // prints 2:3, 2:3, 2:3 (before Go 1.22)
The reason why this happens is clear. But it's not what people expect from the syntax, not at all. And it's also a behavior that is never useful. There is zero reason to capture the loop variables, as evidenced by the fact that none of the languages that started out like this and took a breaking change to switch to the expected behavior have found even a single bug caused by that change.
False. There are cases where it is useful to have the loop variable available directly. For example, you can add one to the loop variable to skip an iteration, which would not work with an iteration-local loop variable.
In a for-range loop, assigning to the iteration variables inside the body does not change which element comes next, so there is no way to skip an iteration this way.
I do agree that there are reasons to modify the iteration variable in a C-style for loop, so I am surprised that those loops are being modified as well. C#, which went through a similar change, did NOT apply such a change for those for loops.
The loop semantics do not have anything to do with arrays. The point of confusion is whether a new slot for data is created before each iteration, or whether the same slot is used for every iteration. It turns out that the same slot is used. The Go code itself is clear: in `for i := 0; i < 10; i++`, `i := 0` is where you declare i. Nothing else implies that space is allocated for each iteration; the first clause of the (;;) statement runs only once, before the loop. So you're using the same i for every iteration. This is surprising despite how explicit it is; programmers expect a new slot to be allocated to store each value of i, so they take its address and are surprised when it's the same address every iteration. (Sugar like `for i, x := range xs` is even more confusing. The := strongly implies creating a new i and x, but it's not doing that!)
Basically, here are two pseudocode implementations. This is what currently happens:
i = malloc(sizeof(int))
*i = 0
loop:
    <code>
    *i = *i + 1
    goto loop if *i < 10
You can see that they are not crazy for picking the first implementation; it's fewer instructions and less code, AND the for loop is pretty much exactly implementing what you're typing in. It's just so easy to forget what you're actually saying that most languages are choosing to do something like the second example (though no doubt without allocating 8 bytes of memory for each iteration).
Remember, simple cases work:
for i := 0; i < 10; i++ {
	fmt.Println(i) // 0 1 2 3 4 ...
}
It's the tricky cases that are tricky:
var is []*int
for i := 0; i < 10; i++ {
	is = append(is, &i)
}
for _, i := range is {
	fmt.Println(*i) // 10 10 10 10 10 ... (before Go 1.22)
}
If you really think about it, the second example is exactly what you're asking for. You declared i into existence once. Of course its address isn't going to change every iteration.
It would be hard to trigger it in Java. All references are pass-by-value, so you would have to do something like creating an array, passing that array and then replacing an element in it in on every loop iteration. Unless I got something wrong, it would be hard to do this by mistake IMO.
If you do an asynchronous callback in an inner loop that tries to log the loop counter and a calculated value at the same time, you will find that the loop counter has incremented underneath you and you'll get for instance '20' for all of the logs. That was my introduction to this sort of problem.
The solution as I said elsewhere is to pop out the inner block to a separate function, where the value of the counter is captured when the outer function is called, not when the inner one runs.
I don't think you remember well how you triggered this error, since Java just doesn't allow you to reference a non-final variable from an inner function. It sounds like you're talking about code like this, but this just doesn't compile:
for (int i = 0; i < n; i++) {
    callbacks.add(new Callback(){
        public void Call() {
            System.out.println(i); //compiler error: local variables referenced from an inner class must be final or effectively final
        }
    });
}
for (var f : callbacks) {
    f.Call();
}
Note that code like this works, and does the expected thing:
for (int i : new int[]{0, 1, 2}) {
    callbacks.add(new Callback(){
        public void Call() {
            System.out.println(i);
        }
    });
}
for (var f : callbacks) {
    f.Call();
} //prints 0 1 2
Interestingly, the reason the old i := i trick works is not at all what I thought!
The trick, for reference:
for i := 0; i < 5; i++ {
	i := i // the trick
	go func() {
		print(i)
	}()
}
What I assumed happened:
- The escape analyzer sees that the new `i` is passed to a goroutine, so it is marked as escaping its lexical scope
- Because it's escaping its lexical scope, the allocator allocates it on the heap
- One heap allocation is done per loop iteration, and even though the new `i` is captured by reference, each goro holds a unique reference to a unique memory location
What actually happens:
- Go's compiler has heuristics to decide whether to capture by reference or by value. One component of the heuristic is that values that aren't updated after initialization are captured by value
- The new i is scoped to the for loop body, and is not updated by the for loop itself. Therefore it's identified as a value that isn't updated after initialization
- As a result, the Go compiler generates code that captures `i` by value instead of by reference. No heap allocations or anything like that are done.
I recognize that the latter behavior is better, but if anyone with intimate knowledge of Go knows why the former doesn't (also) happen (is that even how Go works?) I would love to find out!
Yup. The linked article is a little confused. It thinks that an optimization to pass by value is affecting the behavior. In reality it only passes by value when it is indistinguishable from passing by reference (and it thinks it would be cheaper).
There is no “trick”. It’s the language spec! That go func can take arguments. Just add the argument for clarity; the “trick” merely saves you from putting the declaration in the go func’s signature: `go func(i0 int) { .... }(i)`
> To ensure backwards compatibility with existing code, the new semantics will only apply in packages contained in modules that declare go 1.22 or later in their go.mod files. ... It is also possible to use //go:build lines to control the decision on a per-file basis.
Doesn't that mean that all code written so far can't take up newer versions of the Go compiler for any other reason like new features/bugfixes/optimizations/etc without a full audit of codepaths involving for loops?
No, the version declared in go.mod is different than the version of the toolchain used to compile the project. If you declare an older version even new toolchains will act like the previous versions.
Without //go:build tags, you can just declare 1.21 as your Go version in go.mod to opt out of new features while still getting the other benefits of the new compiler.
No I don't think so; any old working code will be using the x := x workaround, which will keep working when going to this version with the changed loop mechanics. What may happen is a form of... some adage, I forgot the name, where code accidentally relies on the old behaviour and breaks when that old behaviour is no longer there.
(that same adage applies to e.g. browser manufacturers having to implement bugs to not break certain websites)
I don't know why you're being downvoted, but it is actually breaking the Go 1 compat promise, which says:
It is intended that programs written to the Go 1 specification will continue to compile and run correctly, unchanged, over the lifetime of that specification. At some indefinite point, a Go 2 specification may arise, but until that time, Go programs that work today should continue to work even as future "point" releases of Go 1 arise (Go 1.1, Go 1.2, etc.).
I upvoted the question to offset one of the downs because I agree it's a fair question. However I would guess the downvotes are because TFA addressed this issue directly and comprehensively, so it's a clear "I didn't read the article" indicator :-) Possibly also because the downvoters can't imagine a scenario where this would be desirable behavior (i.e. it's always a bug)
Yeah, it's fair; I didn't closely read that section. Although I'm not entirely convinced the approach is safe, maybe it's worth it to fix such a common pitfall.
> The end of the document warns, “[It] is impossible to guarantee that no future change will break any program.” Then it lays out a number of reasons why programs might still break.
> For example, it makes sense that if your program depends on a buggy behavior and we fix the bug, your program will break. But we try very hard to break as little as possible and keep Go boring.
> In a previous blog post they basically said they will never make a Go 2
No, they didn't say that, they said it wouldn't be backwards-incompatible with Go 1. Relevant quote:
> [...] when should we expect the Go 2 specification that breaks old Go 1 programs?
> The answer is never. Go 2, in the sense of breaking with the past and no longer compiling old programs, is never going to happen. Go 2 in the sense of being the major revision of Go 1 we started toward in 2017 has already happened.
This has all been addressed in the proposal. The research was done and this change will impact so few projects that it’s worth making a technical exception to the compatibility promise to fix a real design flaw.
I assume that if compiling with 1.22 or later, you still get all the benefits from that version like other new features, bug fixes or perf improvements, just not this particular change.
No, it doesn't break the promise; "Go programs that work today should continue to work even as future "point" releases of Go 1 arise (Go 1.1, Go 1.2, etc.)."
You can install Go 1.22 and your program will compile and run as-is. That's the promise. If however you opt-in to the changed for loop behaviour by adjusting your go.mod, the onus is on you to update your program accordingly.
It's only a backwards incompatible change if the developer makes a backwards incompatible change by updating the configured target version.
(I'm aware I'm probably being pedantic here, I understand the language used seems to imply you can just set it to v1.22 and it works but it's a bit more specific)
To add to the other comments, in the run-up to go1.21 they talked about how they’d analysed a very large corpus of Go code to see what would be affected, and it was a very very small number.
I remember thinking that the number of people who have created inadvertent bugs due to this design (myself included) would be significantly greater than the number of people affected by the fix.
The original proposal for this change went into great detail about the research they did into existing uses of this syntax. In my memory, they found vanishingly few cases in the Google codebase or GitHub code where the change would violate the expected behavior. The decision to break the backwards compatibility here came only after determining how few codebases would be affected and developing a mechanism in Go itself (the version specification in go.mod) to require actively modifying the code to build with the new behavior.
Python has the same problem (to the extent that it's actually a problem, which you might or might not agree with), and this is the #1 reason they won't change it.
Yeah I don't think it's so much "we explicitly rely on this behavior, how dare you change this" as "somewhere in our mountains of maintenance-mode code that haven't seen the sun shine through an editor window in years, this behavior cancels out another bug that we never noticed". Tooling should be able to detect when code relies on this, but it's still gonna cost some non-zero amount of developer effort to touch ancient code and safely roll out a new version if it needs to be actively addressed.
If you have tests and they break with GOEXPERIMENT=loopvar, then there is a new tool that will tell you exactly which loop is causing the breakage. That's a post for a few weeks from now.
Neither can I, but there may be cases of code accidentally relying on it - there's an adage that I forgot the name of that says just that, and I think compiler manufacturers are the most aware of that adage.
They do. Go has avoided most of the pitfalls that other language ecosystems have fallen into over the years (backwards compatibility issues, soft forks masquerading as language improvements, rebooting the whole language under the same name, aggressively pushing down on other languages, etc.). They've done remarkably well in those respects and deserve huge credit for it.
Yes, the loopvar change will break some programs, and hence the compatibility promise. But the Go team argues that the change will fix many more programs than it will break [1].
This makes me wonder, though, what guarantees that a similar breaking change won't ever happen again in the future? If any change with #(programs fixed) >> #(programs broken) is accepted, we might as well remove the compatibility promise page [2].
GPT-4 says: The behavior you're observing is due to the late binding nature of closures in Python. When you use a lambda inside a list comprehension (or any loop), it captures a reference to the variable x, not its current value. By the time you call funcs[0](), x has already been set to the last value in the range, which is 2.
To get the desired behavior, you can pass x as a default argument to the lambda:
funcs = [(lambda x=x: x) for x in range(3)]
funcs[0]() # outputs 0
I've written a tiny bit of Go and am aware of the general problem this solves. I don't get their more subtle examples (the letsencrypt one, or "range c.informerMap" vs "range alarms").
When you do "for k, v := range someMap", is "v" of the map's value type (and one binding for the whole loop, copied before each iteration)? This would explain the problem, but I would have expected "v" to be a reference into the map, and I couldn't find the answer in a quick skim of the "For statements with range clause" in the spec. I'm probably looking in the wrong place because I touch Go rarely...
Go doesn’t support pointers to map keys or values. It does support pointers to array slots, but for-range copies each slot rather than giving you a pointer to it.
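A short sketch of what that copy means in practice (the `item` type and helper names are hypothetical). Assigning through the range value changes only the copy, while indexing the slice reaches the real element. Maps have no equivalent of `&xs[i]`, since `&m[k]` does not compile.

```go
package main

import "fmt"

type item struct{ n int }

// scaleCopies changes only v, which is a copy of each element.
func scaleCopies(xs []item) {
	for _, v := range xs {
		v.n *= 10
	}
}

// scaleInPlace indexes the slice, so it modifies the elements themselves.
func scaleInPlace(xs []item) {
	for i := range xs {
		xs[i].n *= 10
	}
}

func main() {
	xs := []item{{1}, {2}, {3}}
	scaleCopies(xs)
	fmt.Println(xs) // [{1} {2} {3}]
	scaleInPlace(xs)
	fmt.Println(xs) // [{10} {20} {30}]
}
```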
I suppose that makes sense when I think about it for a bit. My recent expectations come from work in Rust. There the language prevents you from mutating a map while holding a reference into it. Go doesn't have a mechanism to prevent that (except the one you said, simply not supporting those references at all). If you had a reference into a map that was resized because of a subsequent mutation, your reference would have to keep the whole previous map alive and point to different memory than a reference acquired since then. Both seem undesirable.
With array slots, the same issue is present but is a bit more explicit because those resizes happen with `mySlice = append(mySlice, ...)`.
I think the slice append semantics are very error-prone, and it would have been better if a slice was a shareable reference to a single mutable thing, like a map (or a list from Python or Java or …)
That's silly. Language constructs and APIs are always made with expectations for how they're used, stated or not. You can write code that compiles without understanding and matching those expectations but it probably won't be good code.
I'm asking because I think if it were expected that folks used large/expensive-to-copy map values, this construct would return a reference instead of copying. In Rust for example, the std library's "normal" [1] iterators return references.
The peer comments along the lines of "the expectation is it does what it does" are not so helpful from the perspective of learning to write code that is in harmony with the language philosophy.
They're asking that, if the programmer wants the map to store expensivetocopyvalue semantically but also doesn't want to have iteration generate expensive copies, does the programmer have to change the map to store *expensivetocopyvalue instead?
Anyway, I believe the answer is that expensivetocopyvalue is not a type that exists in golang, because golang's "copy" operation is always a simple bitwise copy à la C struct copy / Rust's Copy trait, not like a C++ copy constructor / Rust's Clone trait, which can be arbitrarily expensive.
In Go, `expensivetocopyvalue` can still be achieved via an enormous (e.g. multi-KB/MB) structure (which is most literally expensive to copy) or something containing a lot of pointers (which is not really expensive to copy but will start to pressure the GC).
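A sketch of those two cases (the types are hypothetical). Assigning a struct that holds a large array copies every byte, while a struct holding a slice copies only the slice header, so the copy shares its backing array with the original.

```go
package main

import "fmt"

// bigValue holds 64 KiB inline; every assignment copies all of it.
type bigValue struct{ data [1 << 16]byte }

// sliceValue holds only a slice header (pointer, length, capacity).
type sliceValue struct{ data []byte }

// arrayCopyIsIndependent reports whether writing to a copy leaves the
// original bigValue untouched.
func arrayCopyIsIndependent() bool {
	b1 := bigValue{}
	b2 := b1 // copies the whole array
	b2.data[0] = 1
	return b1.data[0] == 0
}

// sliceCopySharesData reports whether writing through a copied sliceValue
// is visible through the original.
func sliceCopySharesData() bool {
	s1 := sliceValue{data: make([]byte, 1<<16)}
	s2 := s1 // copies the header; both point at the same bytes
	s2.data[0] = 1
	return s1.data[0] == 1
}

func main() {
	fmt.Println(arrayCopyIsIndependent(), sliceCopySharesData()) // true true
}
```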
> as a consequence of our forward compatibility work, Go 1.21 will not attempt to compile code that declares go 1.22 or later. We included a special case with the same effect in the point releases Go 1.20.8 and Go 1.19.13, so when Go 1.22 is released, code written depending on the new semantics will never be compiled with the old semantics, unless people are using very old, unsupported Go versions
How does this work? If I pull in a package that decided to pin 1.22 (as they should) and I compile with 1.18, would it compile or error that I need to use the 1.22 compiler?
They did something sneaky. In go 1.21, they changed the version number format in the `go.mod` files. So trying to build with go 1.18 will result in:
go: errors parsing go.mod:
go.mod:3: invalid go version '1.21.0': must match format 1.23
However, this will only happen if you use the `go mod init` to create your module. If you manually specify `go 1.21` in your `go.mod`, it will build without complaining.
Interestingly, though, if you use Go 1.21 and a module declares a later version of Go, the default behavior is actually to go fetch a newer toolchain and use it instead[1]. It's a pretty cool feature, but I am a bit on the fence due to the fact that it is surprising and phones home to Google-controlled servers to fetch the binaries. That and the module proxy are for sure two of the most conflicting features in Go and I'd feel a lot better about them if Go was controlled by a foundation that Google merely had a stake in. Alas.
edit: Actually, though, I just realized what I am talking about is different than what you and your quote is talking about, which is what happens when you have a dependency that declares a different version, not the current module. Oops.
"But as go is not fixing security bugs in old releases and std library, I think it is dangerous to use them anyway."
A bit of a harsh way to phrase that. In my experience, the backwards compatibility promises have been very good, and the way you stay up-to-date with security fixes and bugs in the standard library is to upgrade Go.
I know that may strike terror in the hearts of developers used to the nightmare that major version upgrades can be in other languages, where a major version upgrade gets a multi-week task added into the task tracker, but it's completely routine for me to upgrade across Go major versions just to get some particular fix or to play with a new feature. I expect it to be a roughly five minute task, routinely.
The only thing that has bitten me about it is arguably not even Go's fault, which is its continuing advances in TLS security and the increasing fussiness with which it treats things connecting with old-style certificates. I can't even necessarily disagree... I would also like to upgrade them but while it's my server, the clients connecting to it are using certs that are not mine and it's out of my control.
> A bit of a harsh way to phrase that. In my experience, the backwards compatibility promises have been very good, and the way you stay up-to-date with security fixes and bugs in the standard library is to upgrade Go.
I don’t think we disagree? There is no reason to use an old version of Go.
I was talking about the grandparent comment, which wanted to keep running go1.18. It is not a good idea to still run go1.18, as it doesn’t get security updates.
for _, informer := range c.informerMap {
	informer := informer
	go informer.Run(stopCh)
}
for _, a := range alarms {
	a := a
	go a.Monitor(b)
}
Not sure what the difference could be, but let me take a guess. In one case, the loop variable is a pointer, and in the other case a value. The method call uses a pointer receiver, so in the value case the compiler automatically inserts a reference to the receiver?
The difference is that in one case, informer is an interface, so the method call resolves informer.Run immediately and there's no issue. In the other case, a is a struct Alarm, and gets copied by value, and the Monitor method takes a pointer receiver. So my original intuition was right, the compiler is essentially translating
go a.Monitor(b)
into
go (&a).Monitor(b)
Which has a reference to the loop variable, and creates an issue.
Since iterating over maps in Go always copies the value, I'd guess the first piece of code does what is expected because of that, and the second does not because `a` will only ever have the value of the last element of `alarms` (the original problem described in the article).
I'm looking at the naming, the top one is a map, the bottom one is a slice; that's where my internal knowledge ends though. I know a slice will have a backing array on the heap so there's some pointers / references involved.
foo, err := getFoo()
if err != nil ...
bar, err := getBar()
fmt.Println(bar)
This misses an error check on getBar. And because of Go's scoping rules, writing `if foo, err := getFoo(); err != nil { ... }` instead limits foo to that if/else statement, which gets untenable with nesting fairly quickly.
It also introduces invalid states - what does getFoo return if it returns an error too. Do we change the API to return a pointer and return nil, or do we have a partially constructed object that is in an invalid state?
Unfortunately, the distinction is sometimes (though rarely) useful and would be a far more disruptive change than this one. I think you would actually need a Go v2 to change it.
Most commonly no impact. It can require an additional heap allocation per iteration if taking the address or capturing in a closure, but even in those cases escape analysis may be able to determine that the value can remain on the stack because it will not remain referenced longer than the current loop iteration. If that happens then this change has no impact.
I'm not sure how thorough Go's escape analysis is, but nearly all programs that capture the loop variable in a closure and are not buggy right now could be shown, by a sufficiently thorough escape analysis, to have that closure not escape. On the other hand, for existing buggy programs, the perf hit is the same as assigning a variable and capturing that (the normal fix for the bug).
Google saw no statistically significant change in their benchmarks or internal applications.
Can somebody please explain to me why this doesn’t constitute a major version due to a breaking change? Maybe I didn’t read precisely enough but it sure sounds like a breaking semantic, esp. with the fact that “this will only work for versions 1.22 and later.” Sounds like a version upgrade trap to me? What am I missing?
Or is it just because it’s Golang and they’re “we’ll never release a go v2 even if we actually do release go v2 and call it v1.x”
Seems like a pragmatic decision where the breakiness of the change is mitigated by the module versioning thing. Old code gets the old behavior; code written in newly created or updated modules gets the new behavior. Everybody is happy, compared to the alternative where this ships in a mythical Go v2 which nobody uses while this sort of bug keeps sneaking into people's actual work.
In practice it won't break anything, unless there is code that accidentally relies on this behaviour. Most of the code affected by this will already have a workaround - the `x := x` mentioned - which can be removed after applying this change.
It's debatable if anything non-automatic counts as a "fix".
Nim has for-loop macros that should make it possible to instead say `for i in captured(0..5): ...` or maybe `for closed(i) in 0..5: ...`. That might be a bit nicer, but would still be non-automatic. So, unnecessary qualifications can exist / persist.
Speaking of persistence, sometimes loop bodies evolve and the x:=x magic used to be needed but is no longer. Version control history might help decide if the extra step was unnecessary or vestigial, though it would surely slow things down (and perhaps run into intermediate uncompilable states of the code). No idea if David Chase and rsc looked at that aspect in their analysis.
I’m new to Go for a significant project this year and like a lot of it, but there are a few things about its philosophy that just don’t click for me, and the versioning policy is one of them.
To me, this clearly should be considered a breaking change, which I normally would expect to look out for when the major version number changes. I get that checking module definitions means unchanged code won’t break, but it might break code in ways I as a programmer would not expect when upgrading minor versions. It might be technically correct according to some definition, but lacks practicality, which has been a recurring feeling for me as I get into the language.
For migrating, I wonder if there are any tools that could, let's say, go through your codebase and add a "// TODO - check" to every place that might be affected.
I know for previous code changes they had a tool that would just do a find & replace for you.
But you're describing a linter, which just outputs a line on your terminal with a warning; I wouldn't want a tool like that to add churn and tasks to my codebase (even though I'm guilty of adding TODOs myself and leaving them for years because ultimately they're not important enough)
Is this a common way to fix problems in language syntax? It seems unintuitive to me. Now you need to know what version is declared in one file to understand behavior in another file. I understand they want to fix this but I did not know this way was allowed.
I don't think it is common. But probably the best option in this case.
The next best alternative is introducing a new construct for this version. But then you either risk people still using the old one or you need to break lots of fine code by removing the old construct. So in this case the "in place" upgrade made the most sense.
Tying it to the declared compiler version is much like Rust's edition system or Perl's versioning, except tacked onto an existing identifier rather than a separate variable. (The downside being that you are forced to make this upgrade at some point if you want to raise your minimum toolchain version.)
I hope this experiment fails. It is one cherry-picked do-what-I-mean feature that muddies the Go 1 Promise and will almost always be covering up a subtle logic error. It also adds a bit more historical knowledge you have to remember: In the early part of the second thousand twenty fourth year of our Lord, in version 1.22.0, a subtle change was made that may be ignored on a per file basis or per module basis as has been done previously in the future as you might recall in 1.20.8 and 1.19.13.
If it doesn't fail I have a couple more ideas, if the compiler can prove my double only ever interacts with ints then... just do what I mean it's provably correct.
package main
// YUM! I WANT MOAR DWIMMY.... AND SIGILS AND BLESS AND UNLESS
func easy(one int, won float64) {
	print(one + won)
}
JavaScript had a very similar problem. If the loop variable is declared with the old `var`, it is function-scoped, so every closure created in the loop captures the same variable. “New-style” variables declared with `let` get a fresh binding for each iteration. That said, I have to point out that JS started talking about making this change almost 20 years ago. As a JS developer, it’s surprising to me to see Go having to make this change now.
Go continues to be my favorite language and experience to build and maintain in.
They got so much right from the start, then have managed to make consistent well reasoned, meaningful and safe improvements to the language over the years.
It’s not perfect, nothing is, but the “cost” of maintaining old code is so much lower compared to pretty much every other language I have used.
Go is such a productive language to work with, it's absolutely mind blowing how little adoption it has around where I live. Well I guess Lunar went from node to java to go, and harvested insane benefits from it, but a lot of places have issues moving into new languages. Not that I think that you should necessarily swap to a new hipster tech, I really don't, but Go is really the first language we've worked with that competes with Python as far as productivity goes. At least in my experience.
We'll likely continue using Typescript as our main language for a while since we're a small team and it lets us share resources better, but we're definitely keeping an eye on Go.
I typically develop in Python, C++, and Typescript, and recently had to implement some code in Go. So far I've found it a pretty unpleasant language to use. It feels pedantic when I don't need it to be, and yet I have to deal with `interface{}` all over the place. Simple things that would be a one-liner Python or TS (or even just an std::algorithm and a lambda in C++) feel like pulling teeth to me in Go.
I'd love to hear of any resources that can help me understand the Zen of Go, because so far I just don't get it.
I write Go every day, and can count the number of times per year I have to involve an `interface{}` literal on one hand. Unless you're doing JSON wrong or working with an API that simply doesn't care about returning consistently structured data, I can't fathom why you'd be using it "all over the place."
Me too. We have around a dozen go services and I have maybe used or seen interface{} once or twice for a hack. Especially after generics. I think the parent comment is suffering from poor quality go code. It’s like complaining about typescript because things in your codebase don’t have types
Applications should basically never need to write custom Scanner/Valuer functions that deal in interface{}; if you find yourself doing that, it's a red flag.
You discovered the Zen of Go. There are no magic one liners. It's boring, explicit and procedural.
Proponents argue that this forced simplicity enhances productivity on a larger organisational scale when you take things such as onboarding into account.
I'm not sure if that is true. I also think a senior Python / Java / etc resource is going to be more productive than a senior Go resource.
Yes, pretty much. It's a pain to write, but easy to read. On a larger scale, the average engineer likely spends more time reading code than writing it.
I don't find go that easy to read. It is so verbose that the actual business logic ends up buried in a lot of boilerplate code. Maybe I'm bad at reading code, but it ends up being a lot of text to read for very little information.
Like a one-line list comprehension to transform a collection is suddenly four lines of Go: allocation, loop iteration, and append (don't even get me started on the append function). I don't care about those housekeeping details. Let me read the business logic.
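For comparison, here is a sketch of both styles in Go, assuming Go 1.18+ generics: the explicit loop the comment describes (allocation, iteration, append), and a hypothetical generic `Map` helper (not part of the standard library) that moves the housekeeping out of the call site. Note that the output element type is inferred from the function argument.

```go
package main

import (
	"fmt"
	"strconv"
)

// squareAll is the explicit version: allocate, iterate, append.
func squareAll(xs []int) []int {
	out := make([]int, 0, len(xs)) // preallocate capacity to avoid regrowth
	for _, x := range xs {
		out = append(out, x*x)
	}
	return out
}

// Map is a hypothetical helper; U is inferred from the type of f.
func Map[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3}))          // [1 4 9]
	fmt.Println(Map([]int{1, 2, 3}, strconv.Itoa)) // [1 2 3] (elements are strings)
}
```

The loop inside `Map` is exactly the four-line version; whether hiding it behind a helper helps readability is the tradeoff the rest of the thread argues about.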
It's a tradeoff; I too find one-liner list comprehensions like simple transforms or filters easier to read than the for loop equivalent.
However, it's a dangerous tool that some people just can't be trusted with. Also, if you go full FP style, then you can't just hire a Go developer; they need additional training to become productive.
There was another great resource that explains why functional programming in Go is a Bad Idea: one reason is function syntax (there's no shorthand, yet?), another is performance (no tail-call optimization), and a third is that Go's formatter makes it very convoluted. I think it was this one: https://www.jerf.org/iri/post/2955/
This is the first time I've heard that list comprehension is a dangerous tool. The way Python implements it is awkward, I'll give you that, but Java and C# implement it with a lot of success, for example. Golang just chose the easy and overly verbose way out; it's a theme of theirs that is visible in the rest of the language.
Go offers a programming interface at a lower level of abstraction than languages like Python or Ruby. What you call boilerplate or housekeeping, I consider to be mechanical sympathy.
Modulo extremes like Java, the bottleneck for programmers understanding code is about semantics, not syntax -- effectively never the literal SLoC in source files. It's not as if
for i := range x {
	x[i] = fn(x[i])
}
is any slower to read, or more difficult to parse, or whatever, than e.g.
You don't need to go the Python or Ruby route to get such benefits. I write Rust daily, which has a pretty comprehensive iterator system while still putting the nitty-gritty in your hands. As another commenter put it, `x.iter().map(function).collect()` is mentally translated to "apply function to the collection x" at a glance.
between
var y []int
for _, x := range x {
	y = append(y, function(x))
}
and
let y: Vec<_> = x.iter().map(function).collect();
I'll take the second form any day. You express the flow of information, and think about transformations to your collections.
So my 2¢ as someone who's just been skimming this thread: I read the second example faster. I mean it's like 2 seconds vs 5 seconds, but in the first I have to actually read your loop to see what it's doing, whereas in the latter I can just go "oh apply fn over x".
It's definitely less clear though, in that it involves an if statement with an assignment, three temporary variable declarations, etc. Also, type inference won't detect the type of the output automatically from the transform function type, and this of course assumes you wanted to collect into a slice, but it could be a set, or a list.
For some operations, the Go style of explicit, mostly in-place mutations produces more complicated code. Whether that's balanced out by the code being "simpler" is not clear to me, but I haven't worked with Go.
I see it as unambiguously more clear, because it makes explicit what the machine will be doing when executing the code. Whether map/filter copy values, or mutate in-place, or etc. etc. is non-obvious. I mean I'm not saying my way is the only way and I appreciate that other people have different perspectives but I just want to make it clear that "clear" in code isn't an objective measure, that's all.
I agree that there is no objective measure. I guess it's just different expectations.
I would not say it's obvious what the machine is doing in the Go example, though. For example, it wasn't clear to me that append() mostly doesn't copy the full vector, but does a copy of the slice pointer. I had to look it up in a blog post, because the source for append() is gnarly.
> For example it wasn't clear to me that append() mostly doesn't copy the full vector, but does a copy of the slice pointer.
Well, I guess you do have to grok the language spec and semantics in order to understand how builtins like append behave; I'm not sure that's avoidable.
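A small sketch of the behaviour being discussed: a slice is a small header (pointer, length, capacity) that is copied by value, and the spec says `append` reuses the underlying array when there is spare capacity and allocates a new one otherwise. That reuse is why two appends to the same base slice can alias each other. The function names are made up for illustration.

```go
package main

import "fmt"

// sharedAppend appends twice to a base slice that has spare capacity,
// so both results share a's backing array and the second append
// overwrites the element the first one wrote.
func sharedAppend() (b, c []int) {
	a := make([]int, 3, 10)
	b = append(a, 4)
	c = append(a, 5) // writes the same slot that b[3] refers to
	return b, c
}

// grownAppend appends to a full slice, which forces a new backing array,
// so writes through the result do not affect the original.
func grownAppend() (d, e []int) {
	d = make([]int, 3) // len 3, cap 3: no spare capacity
	e = append(d, 4)
	e[0] = 99
	return d, e
}

func main() {
	b, c := sharedAppend()
	fmt.Println(b[3], c[3]) // 5 5
	d, e := grownAppend()
	fmt.Println(d[0], e[0]) // 0 99
}
```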
It's fine that you judge it that way, but it's not like that judgment is any kind of objective truth. I find it superior to the FP version because it is less ambiguous.
Now add filtering and groupBy and watch that loop become several dozen lines. I worked on one of the largest golang codebases in existence, and it's definitely harder to see the underlying logic compared to something like Java or C#.
Mechanical sympathy in Go? When you think you've seen it all...
Go is not a high-performance language; it made a lot of decisions that make it a poor fit for scenarios where people want C or Rust. However, with the hype around it, management continues to decide, to everyone's detriment, to use Go in performance-sensitive infrastructure code that one could write in Rust or C# and achieve much higher performance.
Go is all about mechanical sympathy, and is absolutely a high performance language. I guess it all depends on your context, though. If you're used to writing assembly or C, things may look different.
(A "for" loop expresses much more mechanical sympathy than a list comprehension, as an example.)
But at least in the context of application services -- programs that run on servers and respond to requests, typically over HTTP -- Go is the language to beat. I've yet to see an example of a program where the Rust implementation is meaningfully more performant than the Go implementation, and I've got plenty of examples where the Go implementation is much better.
This quickly becomes more difficult to read, especially if you want to chain some operations this way.
But this applies to other languages as well, in JS (which has a function shorthand) I prefer to extract the predicates and give them a meaningful name.
I don't write much Golang -- mostly using it for my own needs because it allows quick iteration but haven't made a career out of it -- but for any such cases I just extract out the function. I deeply despise such inline declarations, they are a sign of somebody trying to be too clever and that inevitably ends up inconveniencing everyone else, their future selves included.
This is partially correct. It was made to address Google's internal politics and the hubris of yet another graduate with a CS degree and an itch to justify the hours he or she invested in practicing Leetcode (which, in turn, is the skill of writing mediocre stdlib code for languages that have an inadequate stdlib).
One of the big things that I’ve found helped is to “stop being an architect”. Basically defer abstraction more.
People, esp from a Java-esque class based world want class inheritance and generics and all that jazz. I’ve found at work like 50% of methods and logic that has some sort of generic/superclass/OOP style abstraction feature only ever has 1 implemented type. Just use that type and when the second one shows up… then try to make some sort of abstraction.
For context, I can’t remember the last time that I actually used “interface{}”. Actual interfaces are cheap in go, so you can define the interface at use-time and pretty cheaply add the methods (or a wrapper) if needed.
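A sketch of what "define the interface at use-time" can look like, with hypothetical names throughout: the consuming function declares the one-method interface it needs, and a concrete type satisfies it implicitly, with no `implements` declaration.

```go
package main

import (
	"fmt"
	"strings"
)

// Notifier is declared next to the code that consumes it, not by the
// type that implements it.
type Notifier interface {
	Notify(msg string) error
}

// EmailClient is a concrete type; it satisfies Notifier implicitly.
type EmailClient struct {
	sent []string
}

func (e *EmailClient) Notify(msg string) error {
	e.sent = append(e.sent, msg)
	return nil
}

// alertAll depends only on the single method it actually calls.
func alertAll(n Notifier, msgs []string) error {
	for _, m := range msgs {
		if err := n.Notify(strings.ToUpper(m)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &EmailClient{}
	if err := alertAll(c, []string{"disk full", "cpu hot"}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(c.sent) // [DISK FULL CPU HOT]
}
```

If a second implementation shows up later (a fake for tests, say), it only has to provide `Notify`; nothing in `EmailClient` has to change.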
If you’re actually doing abstract algorithms and stuff every day at work… you’re in the minority so I don’t know but all the CRUD type services are pretty ergonomic when you realize YAGNI when it comes to those extra abstractions.
Edit: also f** one liners. Make it 2 or three lines. It’s ok.
If I asked you to carve wood, would you prefer a carving knife or a Victorinox multipurpose tool? I get that it’s a bit of a cheesy analogy, but it’s basically why I liked Go. To me it’s the language that Python would have been if Python hadn’t been designed so long ago and weren’t now caught in its myriad of opinions. I certainly get why you wouldn’t like an opinionated language, I really do. It’s just that after more than a decade, often spent cleaning up code for businesses that needed something to work better, I’ve really come to appreciate it when things are very simple and maintainable, and Go does that.
Similarly, I’m not sure you would like working with TypeScript on my team. Our linter is extremely pedantic and will sometimes force you to write multiple lines of code for what could probably have been a one-liner. Not always, mind you, but for the things we know will cause problems for some new hire down the line (or for yourself, if you’re like me and can’t remember what you ate for breakfast). The smaller the responsibility, the less abstraction, and the cleaner your code, the easier it’ll be to work with in 6+ months. Now, our linter is a total fascist, but it’s a group effort. We each contribute, and we alter it so it makes sense for us as a team, and that’s frankly great. It’s nice that doing this, and building in-house packages, is so easy in the Node ecosystem, but it’s still a lot of work that Go basically does for you.
So the zen is in relinquishing your freedom to architect the “linguistics” of your code and simply work on what really matters.
Boring is good when you want to build things that are maintainable by 100s of devs.
Something we have experienced over and over is that devs moving from languages like C# or Java just love how easy and straightforward developing in Go is. They pick it up in a week or two, the tool chain is just so simple, and there's no arguing over which language features we can and can't use.
Almost everyone I've spoken to finds it incredibly productive. These people want to be delivering features and products, and it makes it easy for them to do so.
Maybe at 100 devs Go is fine, but it gets to be a nightmare as you scale beyond that.
Language abstractions exist to prevent having developers build their own ad-hoc abstractions, and you find this time and time again in languages like Go. You can read the Kubernetes code and see what I mean, they go out of their way to work around some of the missing language features.
Yeah, and that nightmare gets even worse in other languages; one motivation for creating Go was the use of C/C++ by thousands of developers at Google.
Can you link to some of these workarounds? I'm curious to see whether they actually make a lot of difference. In theory (and I have no experience with any software project with more than ten developers working on it), they only made it more difficult by adding cleverness.
> They got so much right from the start, then have managed to make consistent well reasoned, meaningful and safe improvements to the language over the years
In which universe? They have to constantly patch the language up and go back on previous assumptions.
Fast compiler, simple tooling, baked in fmt, simple cross platform compilation, decent standard library, a tendency towards good enough performance if doing things the Go way, async without function coloring. They got some things right and some things wrong. When tossing out orthodoxy, you’ll tend to get some things wrong. I think a lack of sum types is my biggest gripe.
The std library is a big part of the magic. It’s so shocking to go to JS land and see that there are 10 different 3rd party libraries to make http requests, all with wildly different ergonomics all within one code base due to cross dependency heck.
In Go there’s pretty much only the http package, and any 3rd party packages extend it and have the same ergonomics.
For a while my biggest gripe was package management but it’s a dream where we are now.
That's the first language change that can in theory break programs (in practice, it won't). Everything else was just additions to the existing language with full backwards compatibility. That's the opposite of constantly patching the language up.
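The change referred to here is presumably Go 1.22's per-iteration loop variables, the same fix C# 5.0 made. The difference can be shown without depending on the toolchain version by writing both behaviours out by hand: one variable shared by every iteration (what `for i := 0; ...` meant before Go 1.22) versus a fresh variable per iteration (what it means in modules declaring `go 1.22` or later, and what the old `i := i` idiom emulated). Function names are illustrative.

```go
package main

import "fmt"

// sharedVariable mimics the pre-Go 1.22 semantics: every closure captures
// the same variable, so all of them see its final value.
func sharedVariable(n int) []func() int {
	var fns []func() int
	var i int // declared once, outside the loop
	for i = 0; i < n; i++ {
		fns = append(fns, func() int { return i })
	}
	return fns
}

// perIteration mimics the Go 1.22 semantics: each iteration has its own
// variable, so each closure keeps the value from its own iteration.
func perIteration(n int) []func() int {
	var fns []func() int
	for i := 0; i < n; i++ {
		i := i // explicit per-iteration copy; redundant on Go 1.22+
		fns = append(fns, func() int { return i })
	}
	return fns
}

// collect calls every closure and gathers the results.
func collect(fns []func() int) []int {
	out := make([]int, 0, len(fns))
	for _, f := range fns {
		out = append(out, f())
	}
	return out
}

func main() {
	fmt.Println(collect(sharedVariable(3))) // [3 3 3]
	fmt.Println(collect(perIteration(3)))   // [0 1 2]
}
```

The only programs that change meaning are those that relied on the first behaviour, which is why the change is described as breaking only in theory.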
Robert Griesemer, Rob Pike, and Ken Thompson are objectively smart men and pioneers and have learned from lots of mistakes both they and the industry made.
Go embodied a lot of those learnings out of the gate.
If the bar is to be perfect out of the gate that’s impossible and I can’t think of any language that could pretend to be so.
Go was very good out of the gate and has slowly but surely evolved to be great.
That's the embodiment of Go though; they didn't rush to implement generics because they didn't want to repeat Java's mistakes (just have a look at http://www.angelikalanger.com/GenericsFAQ/FAQSections/TypePa...). They took their time with a module / dependency system to avoid the mistakes that NodeJS made. Every decision in Go is deliberate.
Sure, it may not be perfect, or as elegant as other languages, but it's consistent and predictable.
Generics made Java’s type system unsound due to an unanticipated edge case. The Go team got a bunch of top-class type theorists (e.g. Phil Wadler) to make sure that Go’s generics implementation doesn’t introduce a similar problem.
>just an academically interesting property - had zero real life impact.
Only because Java runs on the JVM, which preserves enough type information that an exception will be thrown if you try to treat a list of String as a list of Integer. In a language that compiles to native code with full type erasure, that could give rise to serious security problems.
You can just imagine what Go critics would be saying on HN if Go's generics implementation allowed that kind of thing to happen!
No, you need to go way way out of your way to make a real world example for that - otherwise it would have been discovered by a bug, not by academics. It’s similar to Java’s generics being Turing complete - cool, but you can never make use of that/write it accidentally.
I am not as inclined as you to think that accidentally creating an unsound type system is no big deal. Soundness is exactly the property that makes type systems useful. Note the adverb at the beginning of the paper’s abstract:
> Fortunately, parametric polymorphism was not integrated into the Java Virtual Machine (JVM), so these examples do not demonstrate any unsoundness of the JVM.
It's somewhat amusing to see Go rediscover old ideas in programming language theory, given the stance against PLT that the Go developers took in the early years of the language.
The entire story of go seems to be learning through repeating the same mistakes as other languages, one at a time.
Nil being another big one. Even more impressively they doubled down on this mistake with typed nils. Even if you explicitly do a comparison with nil you can still shoot yourself in the foot because it was a different nil than the one you compared against.
And yet, they started off with a language that fixes the mistakes of the main language it tried to replace - c / c++. Mistakes like pointer arithmetic and manual memory management, not zeroing out memory before use, compile times, the list goes on.
And generics is an example of a language feature they took their time for, to avoid making the same mistakes as e.g. Java did, where generics ended up taking up half the language spec and compiler / runtime implementation.
func foo() *bar {
	// ...
	if somethingWrong {
		return nil
	}
	return &bar{}
}
var x interface{}
x = foo()
if x != nil {
	// Dereference x
}
This will crash if `foo()` returns nil, because `x != nil` compares `x` against `interface{}(nil)`, and an interface holding `(*bar)(nil)` is not equal to that. What you wanted to check was whether `x == (*bar)(nil)`, or the nil value of one of the other types that implement the interface, which in general must be done with `reflect.ValueOf(x).IsNil()`.
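A runnable sketch of the gotcha, reusing the comment's `foo`/`bar` names with a hypothetical `fail` parameter added to force the nil path: an interface value holding a nil `*bar` has a non-nil dynamic type, so it is not equal to `nil`. When the concrete type is known, a type assertion recovers the pointer without reflection.

```go
package main

import "fmt"

type bar struct{ n int }

// foo returns a nil *bar on failure, as in the snippet above.
func foo(fail bool) *bar {
	if fail {
		return nil
	}
	return &bar{n: 1}
}

// ifaceIsNil reports whether the interface value itself is nil.
func ifaceIsNil(fail bool) bool {
	var x interface{} = foo(fail)
	return x == nil // false even when foo returned nil: x holds (*bar)(nil)
}

// ptrIsNil uses a type assertion to inspect the pointer stored in x.
func ptrIsNil(fail bool) bool {
	var x interface{} = foo(fail)
	p, ok := x.(*bar)
	return ok && p == nil
}

func main() {
	fmt.Println(ifaceIsNil(true), ptrIsNil(true)) // false true
}
```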
This is a great example of code that compiles, but would never pass code review at any decent organization. Specifically, you'd never assign a concrete return value like *bar to an interface{} and expect `x != nil` to behave like this code would imply.
Yes, it’s a contrived example. But it’s not like this is some obscure thing that no go programmer has ever run into in practice. It’s something I’d wager almost everyone has encountered if they’ve used it longer than a year.
If Dave Cheney says it hits every go programmer at least once, it caused hours of consternation for his coworkers, and it even has its own entry in the language FAQ, I don’t know what else to tell you.
Yes, it's a not-uncommon gotcha or foot-gun. No argument there. But, like many other gotchas and foot-guns, they are not too difficult to spot in code review.
...if you have experienced golang developers who have scars on their feet. If you have someone who's an experienced developer but only used golang for a few months, they might not catch it, which means a hard-to-find bug that got into your code.
Furthermore, even for experienced developers, there's a limit to how much context / rules / whatever your brain can keep. This footgun takes up space and intellectual energy that could be used for something else.
All things being equal, a language that doesn't have this kind of footgun is better than one that does: less experienced reviewers will let fewer bugs slip through, and more experienced reviewers will either spend less effort reviewing (meaning the mental energy can be used somewhere else) or will have more review capacity (meaning they'll find more bugs / improve the code more).
This is the actual code that caused me to write the ticket above (be warned, I wouldn't consider it amazing code; my first foray into writing a web app as a side project, just trying to get something that works):
Basically, I have several pages I'm rendering, which have common prerequisites regarding checks, and common handling processes (passing some sanitized data to a template). The *GetDisplay() functions take a structure from the "database" layer and sanitize it / process it for handing to the templates. The two *GetDisplay() functions return pointers to two different types, appropriate for the template to which they will be passed; and return nil if there's an issue.
So I have a map, `data` of type `map[string]interface{}` that I pass into the templates; and two different paths set `data["Display"]`; then at the end I want to check if either of the `*GetDisplay()` functions returned `nil`. So naturally, the first version of the code checked `data["Display"] == nil`, which was always false, since it was implicitly checking `data["Display"] == interface{}(nil)`, but the value in case of an error would be either `(*DiscussionDisplay)(nil)` or `(*UserDisplay)(nil)`.
I mean, sure, there are other ways to structure this; I could return an error or a boolean in addition to returning nil. But 1) the only reason to do that is to work around this language limitation 2) it's a "foot gun" that it's easy to fall into.
And sure, a golang developer who'd shot themselves in the foot a few times with this would catch it during review; but I don't think a bunch of newer developers would catch it, even if they had extensive experience in other languages.
So this is the problem, basically. Go isn't a dynamically typed language, and doesn't really let you create an arbitrary map of keys to objects like e.g. Javascript or Python does. Any time you see `map[something]interface{}` that's a huge red flag that something is fucky. In your case you want to define `data` as a struct type with a Display field (and whatever else).
if ... reflect.ValueOf(display).IsNil() {
Any use of `package reflect` in application code is a similarly huge red flag. 99 times out of 100 it's a design error that ought to be fixed.
> In your case you want to define `data` as a struct type with a Display field (and whatever else).
So first of all, the reason things are defined that way is to interact with the golang templating libraries. Secondly, your suggestion wouldn't really solve the issue in this case: content of "Display" is different for each web page, and so the only way to assign both types to the same value is to make it an interface.
> Any use of `package reflect` in application code is a similarly huge red flag. 99 times out of 100 it's a design error that ought to be fixed.
I'm not using reflect for fun; there is literally no other way to check for nil with interfaces (other than manually checking nil for all possible types).
At any rate, I wrote code in a way that's intuitive, at least to a C programmer (using 'nil' value as an indicator that there was an error); the code had a bug. Sure I could have rearchitected the whole function, and if this were a commercial product I may have. But it's a simple webapp to help scheduling discussions at my project's conferences; a quick fix that robustly works around Golang's deficiencies is perfectly reasonable.
EDIT: And honestly, there are exactly three ways of addressing this:
1. Separating the two page paths, duplicating all the logic which is common to the two. This makes it less DRY, which risks checks becoming inconsistent, increasing the chance that there will be a security issue.
2. Make the *GetDisplay() functions return a second value to indicate failure. This is honestly kind of a dumb thing to do to work around a language deficiency.
3. Continue to use 'nil' to indicate failure, and fix the check to be able to properly check for nil. This can be done by listing out the various possible values of 'nil', which is ugly, annoying, and fragile (since it would silently break if we added a third type); or it can be done using reflection.
#3 is obviously the most reasonable thing to do here.
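If option 3 goes the reflection route, a general helper needs a guard: `reflect.Value.IsNil` panics for kinds that can never be nil, and also for the zero `Value` that `reflect.ValueOf(nil)` returns. A sketch, with a hypothetical `isNil` name and stand-in display types:

```go
package main

import (
	"fmt"
	"reflect"
)

type DiscussionDisplay struct{ Title string }
type UserDisplay struct{ Name string }

// isNil reports whether v is a nil interface, or an interface holding a
// nil pointer, map, slice, channel, or function.
func isNil(v interface{}) bool {
	if v == nil {
		return true // reflect.ValueOf(nil).IsNil() would panic
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Chan, reflect.Func, reflect.Map, reflect.Pointer, reflect.Slice:
		return rv.IsNil()
	}
	return false // structs, ints, strings, ... can never be nil
}

func main() {
	data := map[string]interface{}{}
	var d *DiscussionDisplay // nil, as if a *GetDisplay() call had failed
	data["Display"] = d
	fmt.Println(data["Display"] == nil, isNil(data["Display"])) // false true
}
```

Unlike listing every display type by hand, this keeps working if a third display type is added, at the cost of a reflection call.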
I mean, even ignoring all of the interface{} design errors, the simple fix here is just
if _, ok := data["Display"]; !ok {
	<code>
}
To reiterate, you should almost never need to interact with `interface{}` values in application code. If you find yourself trying to use, inspect, check, or otherwise program against `interface{}` values in application code, it almost always means that you're fighting the language, and that you need to change your approach to your problem.
Programming languages are an exercise in compromise, not pure application of theory. Well, except maybe Haskell and its ilk, but this should be your expectation of most languages and generally not a surprise.
I think OP’s point is that, given the Go devs’ casual disregard for every development in PL theory and design over the last 30 years, it’s amusing to watch them rediscover half the issues from scratch.
If they're aware of how other languages either handle these issues or suffer the consequences of not handling them, then it sure is odd that they consciously decided to introduce the same issues into their own language, only to fix them down the road.
Given Go’s success, it seems like fixing certain footguns much later actually worked out pretty well for them? That doesn’t necessarily mean it was right, but it was perhaps not as big a deal as some people assume.
Sum/Product types. Generics/type-parameters. Bizarre handling of nil in places. Error handling that’s like some deliberately crippled version of a Result<T,E>. The absolutely unhinged decision about zero-value-defaults for types. I’m sure other people can think of some more, but that’s the ones I can think of off the top of my head.
Non PL theory but related: incorrect implementation of monotonic clocks, and then refusing to fix it because “just use google smear time bro”.
But they’re like, genuinely wild decisions? Unhinged is a great description!
The error design can return the value, or an error, or both, for some inexplicable reason, and you can check it, or not. If your function returned some indication of a severe error and you happen not to check it, you can totes just continue your program in who knows what invalid state. Oh, and also, because the devs are apparently deathly allergic to abstractions of any kind, you’ve got to do this janky little `if err != nil` check at. every. single. point. Which buries your fundamentally important logic in pointless line noise, with zero opportunity for streamlining or improving the logic or guarantees.
There's no need for compromise in this case. This issue is something I learned about in the late 90s / early 2000s when I started reading about programming language theory. It's found in introductory textbooks.
It's not making any claims about the quality of design in JS here. I'm literally not talking about them in the same sentence for this reason. I'm, instead, merely noting that it had the same issue. Looks like it was fixed in 2014: https://nullprogram.com/blog/2014/06/06/ C# also had the same issue, as noted elsewhere in the comments.
It simply seems disingenuous to talk about how this was an issue in JS and then go on to say that Go devs have a "stance against PLT". JS is not renowned for being an exemplar of PLT, so why would the Go developers use it as a reference point for their own design? JavaScript can't even take the address of a variable - which is the underlying problem here.
In any case - I seem to remember that they discussed the rationale for the original decision in the release notes for go 1.21, along with additional context.
For whatever it's worth, I don't see any evidence that Go is specifically antagonistic to programming language theory at all - the existence of first-class constructs like channels and closures suggests otherwise. There are always costs and tradeoffs involved in adopting certain theoretical paradigms, and PLT is subject to fashion as much as any other endeavour.
Go focusses on simplicity, and when talking about simplicity I really like this quote from Dijkstra:
"Simplicity requires hard work to be obtained and education for its appreciation, and complexity sells much better.” [0]
I think Go works hard to be simple, and sometimes that comes across as being simplistic. Indeed, I was sceptical of Go when I set out to learn it, but having spent enough time with it to consider myself a professional Go developer, I also find that I enjoy coding more than I have for many years, `err != nil` notwithstanding.
Before let/const, scopes could only be introduced at function level, so @babel/plugin-transform-block-scoping transpiles it using an extra function:
var _loop = function (i) {
  a.push(() => i);
};
for (var i = 0; i < 3; i++) {
  _loop(i);
}
The key is that the scoping happens for each iteration, not around the entire loop. That detail is nonobvious, given how many other languages have gotten it wrong, but I wouldn’t say it’s wild.
(If you’re curious how Babel deals with the more complicated cases of break/continue, labelled break/continue, and return, try it out at https://babeljs.io/repl.)
Right, the wild thing for me is when you mutate `i` in the loop body. So at the same time `i` is scoped to the iteration so that you can capture it, but also mutating it affects the loop-scoped `i` that is incremented between iterations and checked for the termination condition. The iteration-scoped `i` is assigned back to the loop-scoped `i` at the end of the loop body. So if you have a closure close over `i` in the loop body and mutate it, whether that mutation affects the actual loop variable depends on whether the closure is called during the iteration it was created in or during a later iteration. Kinda spooky, but sure, less of a footgun than the original behavior.
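The semantics described above can be written out by hand. Below is a sketch in Go (the language used elsewhere in this thread) of that desugaring: a loop-scoped counter that is tested and incremented, a fresh iteration-scoped copy that closures capture, and a copy-back at the end of the body. To my understanding Go 1.22 specifies its three-clause loops the same way (the next iteration's variable is initialized from the previous one just before the post statement), but this sketch does not rely on that; the function and parameter names are hypothetical.

```go
package main

import "fmt"

// emulatedLoop runs `for i := 0; i < n; i++` by hand with per-iteration
// variables. When i == skip, the body mutates i, which carries over.
func emulatedLoop(n, skip int) (seen []int, getters []func() int, bumpers []func()) {
	counter := 0 // loop-scoped: tested and incremented between iterations
	for counter < n {
		i := counter // iteration-scoped copy; closures capture this one
		getters = append(getters, func() int { return i })
		bumpers = append(bumpers, func() { i += 100 })
		seen = append(seen, i)
		if i == skip {
			i++ // a mutation in the body...
		}
		counter = i // ...is copied back to the loop-scoped variable here
		counter++
	}
	return seen, getters, bumpers
}

func main() {
	seen, getters, bumpers := emulatedLoop(5, 1)
	fmt.Println(seen) // [0 1 3 4]: the body's i++ skipped 2
	bumpers[0]()      // mutating i after its iteration has ended...
	fmt.Println(getters[0](), getters[2]()) // 100 3: ...changes only that iteration's copy
}
```

A closure called before the copy-back (during its own iteration) would affect the loop; one called later, as here, would not, which is the "spooky" part.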
"It must be familiar, roughly C-like. Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means that the language cannot be too radical."
And not including sum types despite having a sum-type-shaped hole in the language (`if err != nil`).
And some of the discussion about "why no generics" seemed kind of divorced from existing PL knowledge on the topic.
These were all intentional tradeoffs though, not any ignorance of theory. Also, it's pretty rich for someone to be complaining about Go while referencing Javascript of all languages. Javascript's design flaws are legendary. And I mean no disrespect to the creators of Javascript, they had to deal with some crazy last minute change requests to the language.
Ehhh, I see absolutely no evidence that the Go developers were particularly aware of theory. It really feels more like they just were used to thinking in terms of C, and built a language which is kind of like C.
Go also has some really weird stuff in it, such as named return values.
Frankly, the lack of sum types hurts the most. The language would just be a lot better with a unifying Result type in the library. And don't give me any of that "oh, they tried to keep the language simple!" stuff.
Intuitively, sum types are laughably simple. Everyone understands "It's one of these possible values, so you need to check which one it is and then handle that situation." They are simpler than enums on a conceptual level! Sum types are just not how C programmers think about the world.
As it happens, we considered sum types quite seriously in the early days. In the end we decided that they were too similar to interface types, and that it would not help the language to have two concepts that were very similar but not quite the same. https://groups.google.com/g/golang-nuts/c/-94Fmnz9L6k/m/4BUx...
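For readers unfamiliar with the interface-based alternative mentioned here, a common Go idiom approximates a closed sum type with an interface carrying an unexported marker method, plus a type switch. This is a sketch with hypothetical names; unlike a real sum type, the compiler does not check that the switch is exhaustive, and the interface's zero value is `nil`, which the code has to handle.

```go
package main

import (
	"fmt"
	"math"
)

// Shape is "sealed": only types in this package can declare the
// unexported method, so the set of variants is closed in practice.
type Shape interface {
	isShape()
}

type Circle struct{ R float64 }
type Rect struct{ W, H float64 }

func (Circle) isShape() {}
func (Rect) isShape()   {}

// Area handles each variant with a type switch. Nothing forces every
// variant to be listed, hence the default case.
func Area(s Shape) float64 {
	switch v := s.(type) {
	case Circle:
		return math.Pi * v.R * v.R
	case Rect:
		return v.W * v.H
	default: // also reached for a nil Shape
		return 0
	}
}

func main() {
	fmt.Println(Area(Rect{W: 2, H: 3})) // 6
	fmt.Println(Area(nil))              // 0
}
```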
When I did research on this topic ages ago I read both of these links, and I'm very familiar with the arguments.
I also vehemently disagree with them, and I think that the way code is factually written in practice is on my side: People who propose sum types commonly refer to the Option<T> or Result<S,E> types in Rust. These are types which are almost exclusively used as return types.
Interface types are the opposite. They're used as input types, and almost never to distinguish between a concrete, finite list of alternatives. They are used to describe a contract, to ensure that an input type fulfills a minimal set of requirements, while keeping the API open enough that other people can define their own instances.
The fact that Go in fact does not use interface types for its error handling is a pretty good argument in favor of that, I'd say.
The thing is, at this point it doesn't matter, sadly. Adding sum-types to the language now would be unsatisfying. You would really need proper integration as the primary tool for error handling in the standard library (and the ecosystem), and that's unlikely to happen, even less likely than a Go 2.0.
EDIT: Just to make it clear, I think not wanting to add sum types to the language is understandable at this point. The real shame is that they were not in the language from the beginning.
Go made a deliberate decision to use multiple results rather than Option<T> or Result<S, E>. It's of course entirely reasonable to disagree with that decision, but it's not the case that the Go language designers were unaware of programming language theory or unaware of sum types. Although you suggest that you want sum types as the primary tool for error handling, Go made a different decision, not due to accident or lack of knowledge, but intentionally.
(Separately, I'm not entirely sure what you mean when you say that Go doesn't use interface types for its error handling, since the language-defined error type is in fact an interface type.)
Fwiw, I didn't mean to imply that they don't know any language theory, just that the language doesn't seem to reflect it. I don't think this itself should be a controversial statement, by the way: Go aims to be a simple language, and the last thing it needs is monads.
Frankly, I'm just the type of person who doesn't understand why it is possible to silently drop error values in Go (even accidentally), while the language is simultaneously very strict about, e.g., unused imports.
It seems like a pretty severe flaw for a language that takes pride in its explicit error handling, and a deep dive into why this flaw was acceptable to the creators would be really interesting.
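To make the asymmetry concrete, a minimal sketch (mightFail is a hypothetical function): calling an error-returning function as a bare statement compiles without complaint, whereas an unused import in the same file would stop the build. Catching the first case is left to third-party linters such as errcheck.

```go
package main

import (
	"errors"
	"fmt"
)

// mightFail is a hypothetical function that always reports an error.
func mightFail() error { return errors.New("boom") }

func main() {
	// Compiles: the returned error is discarded without any diagnostic.
	mightFail()

	// Compiles: the error is discarded, but visibly.
	_ = mightFail()

	// The conventional explicit check.
	if err := mightFail(); err != nil {
		fmt.Println("handled:", err)
	}

	// By contrast, adding an unused import such as "os" to this file
	// would be a compile error.
}
```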
For now though, instead of sum types we ended up with features such as named returns (???). I imagine some of the complexity here was about not wanting to introduce an explicit tuple-type, since a Result<S, E>-type doesn't compose with multiple returns. (I feel like there should be some workaround here. Maybe anonymous structs + some sort of struct/tuple unpacking, but I could see it getting gnarly.)
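As a rough illustration of the composition problem, here is a sketch of a hypothetical generic Result type. It requires Go 1.18+ generics, and nothing like it exists in the standard library. It works, but it needs an Unwrap method to get back to the (value, error) convention that existing APIs use, and a function with two results plus an error has to introduce a named struct, since Go has no tuple type.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Result is a hypothetical generic type; nothing like it exists in the
// standard library.
type Result[T any] struct {
	val T
	err error
}

func Ok[T any](v T) Result[T] { return Result[T]{val: v} }

func Err[T any](err error) Result[T] { return Result[T]{err: err} }

// Unwrap converts back to Go's (value, error) convention.
func (r Result[T]) Unwrap() (T, error) { return r.val, r.err }

// parsePositive is a hypothetical example function returning a Result.
func parsePositive(s string) Result[int] {
	n, err := strconv.Atoi(s)
	if err != nil {
		return Err[int](err)
	}
	if n <= 0 {
		return Err[int](errors.New("not positive"))
	}
	return Ok(n)
}

// Two results plus an error need a named struct to fit into Result,
// because Go has no tuple type.
type pair struct {
	Name string
	Age  int
}

func main() {
	if n, err := parsePositive("42").Unwrap(); err == nil {
		fmt.Println(n)
	}
	_, err := parsePositive("-1").Unwrap()
	fmt.Println(err)

	p := Ok(pair{Name: "gopher", Age: 14})
	fmt.Println(p.val.Name)
}
```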
> I'm not entirely sure what you mean when you say that Go doesn't use interface types for its error handling, since the language-defined error type is in fact an interface type.
What I meant is that this specific use case of sum types (i.e. error unwrapping) is not something that interfaces in Go are used for. Error handling in Go is done via multiple return values. This goes against the common claim that "sum types and interfaces are too similar/have the same uses", and should count for something, considering that explicit error handling is a big component of Go.
I'm not going to claim that Go has the ideal approach to whether an error can be ignored. In general, in Go, some errors can be ignored, and some can't. For example, fmt.Fprintf to a strings.Builder can never result in a non-nil error. It's fine to ignore the error returned by fmt.Fprintf in that case. On the other hand, fmt.Fprintf to an os.File can return a meaningful error, and for some programs it's appropriate to check for that error. (Though the issue is complicated by the fact that complex programs probably use bufio, which does its own error handling.)
I'm not personally concerned about examples like os.Open, where the result must be used. Sure, you can ignore the error. But a failure will show up very quickly. I'm not saying that this is not an issue at all, but I believe it's a less important one.
Part of the reason for Go's behavior is the idea that "errors are values" (https://go.dev/blog/errors-are-values). And part of it is that the simple fmt.Println("Hello, world") doesn't require any error handling. And part of it is the desire to make error handling clear and explicit in the program, even if the result is verbose.
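The linked post's central example is a writer wrapper that remembers the first error, so that a sequence of writes needs only one check at the end. Below is a sketch along those lines. The limitedWriter type is a hypothetical writer that fails once a byte budget is exceeded, added only so that the error path can be exercised.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// errWriter records the first error and turns later writes into no-ops,
// so the caller checks the error once at the end.
type errWriter struct {
	w   io.Writer
	err error
}

func (ew *errWriter) write(buf []byte) {
	if ew.err != nil {
		return
	}
	_, ew.err = ew.w.Write(buf)
}

// limitedWriter is a hypothetical writer that fails once more than max
// bytes would have been written.
type limitedWriter struct {
	buf bytes.Buffer
	max int
}

func (lw *limitedWriter) Write(p []byte) (int, error) {
	if lw.buf.Len()+len(p) > lw.max {
		return 0, errors.New("limit exceeded")
	}
	return lw.buf.Write(p)
}

func main() {
	lw := &limitedWriter{max: 8}
	ew := &errWriter{w: lw}
	ew.write([]byte("abc"))
	ew.write([]byte("defgh")) // 8 bytes total: still fine
	ew.write([]byte("ijk"))   // over the limit: error recorded
	ew.write([]byte("lmn"))   // skipped because an error is already recorded
	if ew.err != nil {
		fmt.Println("error:", ew.err)
	}
	fmt.Println(lw.buf.String())
}
```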
Having worked on large golang codebases, I've found it becomes clear quite quickly how bad the error handling is: the example you mentioned is one problem, and there are others that neither the compiler nor the linter detects. Using multiple return values to model errors is fundamentally broken, as are other things they chose for the language, either deliberately or not.
The claim wasn't that they are unaware, the claim was that it doesn't seem to show in the design of Go.
I think Go is a perfectly fine language, and I respect the goal to stay clear of complexity, but when looking at any particular Go feature, then it's easier to explain the decision with "They are used to thinking in terms of C-idioms." than with any particular brilliance or awareness of PL-theory.
Thompson turns 80 this year. At what age does Thompson become old enough that you start to entertain the possibility that not everything he says or does is the result of having learned and understood all the possibly-relevant work, including the recent work?
I misspent a few thousand hours of my life on the 9fans mailing list long ago when Pike was very active on it, and my non-joking assessment is that ever since he finished his PhD or shortly after, Pike has probably felt he knows all he needs to learn about programming-language design except for the things he and the people in his immediate social environment invent.
Bell Labs was never good at designing programming languages. Did you know that in the Bourne shell (and possibly in all the other shells) you can have a statement of the form $foo = bar which will assign bar to the variable whose name is the value of foo? (Emacs Lisp, an old language, has the same functionality in the form of a function named "set", but most Emacs Lisp programmers know to avoid it.) Well, I found that statement in a shell script written by one of the Bell Labs guys, and the shell script was not doing anything fancy like interpreting a programming language or defining a new PL feature (not that it is sane to do either of those things in a Unix shell).
None of what I say is more than a wisp of a reason not to choose Golang IMHO.
Have you looked for any evidence? There is plenty.
Sum types have been discussed since before the initial open source release of the language, at least according to some of the issue threads such as https://github.com/golang/go/issues/19412
If they're laughably simple, then please contribute a proposal for how to add them to the language. You'll find no one is really fighting against the concept of sum types.
Go development began circa 2007. I do not believe sum types were commonplace at the time, nor do I believe the average developer was aware of them. I'm not saying you're wrong, but given that even Scala didn't add support until 2010, they do not seem like something so obvious that it should have been there from the beginning. In hindsight, sure.
With all the crazy warts that JS has, it is at least a Lisp-like, very dynamic language if you squint (a lot) at it. Its greatest fault is probably leaving out integers (I can’t even fathom why they decided on that; floats can’t properly represent all integers).
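JavaScript's Number is an IEEE 754 double, so the limitation can be shown with Go's float64, which uses the same format. A double has a 53-bit significand, so every integer up to 2^53 in magnitude is exact, but past that point consecutive integers start to collapse. A minimal sketch:

```go
package main

import "fmt"

func main() {
	// float64 has a 53-bit significand, so consecutive integers are
	// exactly representable only up to 2^53.
	var f float64 = 1 << 53
	var i int64 = 1 << 53

	fmt.Println(f+1 == f) // true: 2^53 + 1 rounds back to 2^53
	fmt.Println(i+1 == i) // false: int64 represents 2^53 + 1 exactly
}
```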
Go is simply badly designed, relying on hard-coded functionality a lot.
Which is better, hard-coded functionality or magic? I don't believe it's badly designed, since every design decision is deliberate; bad design would be accidental.
Is this loop variable problem really a theory issue (if so, does it have a label)? Or, primarily a practical one? Is there a database of known and historical programming language problems that are nicely tagged with "theory problem", "frequent foot-gun", etc?
Could an LLM coupled with a constraint solver create the perfect language (for particular domains, e.g., performance)? Or, just use Rust ;-)?
In contrast to the view of many, programming language theory is very closely intertwined with programming practice. It both drives practice and reacts to practice.
In this particular case I imagine the issue was uncovered some time in the 1970s, which is when lexical scoping came to the fore in Scheme (in the US) and ML (in Europe). It's a fairly natural problem to run into if you're either thinking deeply about the interaction between closures (which capture environments) and loop constructs, or if you're programming in a language that supports these features.
I have such a love-hate relationship with this language. I use it professionally every single day, and every single day there are moments when I think to myself "this could be solved much more elegantly in language X" or "I wish Go had this feature."
Then again I also can't deny that the lack of ""advanced"" features forces you to keep your code simple, which makes reading easier. So while I hate writing Go, I like reading unfamiliar Go code due to a distinct lack of magic. Go code always clearly spells out what it does.
“Love-hate relationship” were the exact words that I used when I used go professionally every day.
I could complain all day about things the language does obviously wrong, often in the name of simplicity. But after all my complaints I still admit it’s a very good choice for certain kinds of software and software companies.
I’m in the same boat. Every once in a while, I go back and look at my old Haskell, OCaml, and Go code, and I remember why I like Go. Generally, I can hop back into my old code easily. That’s not true with more advanced languages. I just can’t resist the urge to be clever when writing them. OCaml is still pretty nice, though. Not gonna lie.
Long time software engineer, just coming off 4 years of Kotlin into Go. Love-hate describes it for me. It's just not as much fun and feels sterile. I get the whole "just write the damn code" argument, but unfortunately for me I get fulfillment out of writing code, and Go isn't doing it for me. I've been around for a long time and experienced all the various language philosophies. The Go dogma is especially frustrating. Everything I say is met with some automatic, recycled response. No thanks.
The first point cannot bother you after you've correctly realized your second point. The more empathy you have for your future-self or your peers, the clearer it becomes.
Another commenter described it quite succinctly; to paraphrase, Go isn't made for you, it's for all the other developers that will have to work with your code - including future you.
I'll be the first to admit I know almost nothing about go, but it surprises me to find we're still inventing languages with bobby traps like this, especially bobby traps that were well known and understood in other languages at the time.
Actually it surprises me we're still inventing languages where local variables can be mutated, which seems to be at the root of the problem here
Go has a long list of booby traps like this and prides itself on them. From outside of the Go team it looks like a small cultural shift might slowly be happening, cleaning up some of the obvious mistakes everyone's been telling them about since the beginning. Rob Pike retiring and giving up some formal power with that probably helps.
Speaking as the person Rob Pike handed the formal power to (8 years ago now), I don't think that change has much to do with it.
We've known about the problem for a long time. I have notes from the run up to Go 1 (circa 2011) where we considered making this change, but it didn't seem like a huge problem, and we were concerned about breaking old code, so on balance it didn't seem worth it.
Two things moved the needle on this particular change:
1. A few years ago David Chase took the time to make the change in the compiler and inventory what it broke in a large code base (Google's, but any code base would have worked for that purpose). Seeing that real-world data made it clear that the problem was more serious than we realized and needed to be addressed. That is, it made clear that the positive side of the balance was heavier than we thought it was back in 2011.
2. The design of Go modules added a go version line, which we can key the change off. That completely avoids breaking any old code. That zeroed out the negative side of the balance.
What about, let's say, 5 years from now, when someone digs up a Go project from 2022 and decides to get it up to speed for 2028 by updating the version line? Is there something that would remind them to check for breaking changes, including this one? Perhaps the go project initializer could add a comment above the version line with a URL with a list of such changes. Though that wouldn't help for this change.
I think the key difference here is to consider toleration vs adoption. Old code is able to tolerate the changes and still work in new ecosystems. There is still work on maintainers if they want to actually adopt the features themselves. Allowing these two concepts to work together is what allows iteratively updating the world, rather than requiring big bang introduction of features.
As for validating your software, the answer is the same as it's always been… tests, tests and more tests.
Huh? Where’s the list? Off the top of my head, I think this is the only thing that repeatedly bit me, although I’m very aware of the behavior of for loop scoping. Linters save me nowadays at least.
Are there other things like that in the language that deserve a fix? Maybe things to do with json un/marshaling?
It has been a while, but yes, there were a lot of them and I forget most. It made it kinda pointless to me that the language was "easy" when the code felt so brittle (Null pointers...really?).
One weird thing that always goofed me up was that slices are passed by value but maps by reference. Always made it confusing how to pass them for serialization/deserialization. The compiler didn't complain it just panicked. Seemed like something the type system should catch.
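A small sketch of the slice/map distinction the comment describes (the function names are made up for illustration). A slice argument is a copied header that still points at the caller's backing array, so element writes are visible to the caller but an append is not. A map argument refers to the same underlying map. For what it's worth, encoding/json's Unmarshal is documented to return an InvalidUnmarshalError when its target is nil or not a pointer, and go vet includes a check for non-pointer arguments to it.

```go
package main

import "fmt"

// appendItem receives a copy of the slice header; the assignment only
// updates that local copy, so the caller's slice length is unchanged.
func appendItem(s []int) { s = append(s, 4) }

// setFirst writes through the copied header into the shared backing array.
func setFirst(s []int) { s[0] = 99 }

// setKey writes into the same map the caller holds.
func setKey(m map[string]int) { m["b"] = 2 }

func main() {
	s := []int{1, 2, 3}
	appendItem(s)
	fmt.Println(s) // [1 2 3]
	setFirst(s)
	fmt.Println(s) // [99 2 3]

	m := map[string]int{"a": 1}
	setKey(m)
	fmt.Println(m) // map[a:1 b:2]
}
```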
> we're still inventing languages where local variables can be mutated
Local mutability is probably one of the most common uses of mutability. A lot of it is using local state to build up a more complicated structure, and then getting rid of that state. Getting rid of that use-case is just giving up performance.
There is a recurring joke about Go's language design ignoring many bits of the general language design knowledge collectively acquired through decades of writing new languages. This change is an example of why this joke exists.
Completely unrelated to the point you’re making, but the phrase is “booby trap”; I believe it originates from pranks played on younger schoolboys in 1600s England (the etymology of booby being the Spanish “bobo”).
Russ Cox and the Go team learned that the loop variable capture semantics are flawed not by reflecting about how their language works, but through user feedback.
This could have been prevented by having one person on the team with actual language design experience, who could point this issue out in the design process.
In this case, after 10 or so years, and thousands of production bugs, they backpedaled. How many other badly designed features exist in the language, and are simply not being acknowledged?
If you point it out, and you're right, will you be heard if you don't have a flashy metric to point to, like a billion dollars lost?
What if the flaw is more subtle, and explaining why it's bad is harder than in this very simple case, that can be illustrated with 5 lines of code? What if the link between it and its consequences isn't that clear, but the consequences are just as grave? Will it ever get fixed?
Others have suggested that Rob Pike and Ken Thompson have some language design experience, to state it mildly. I also want to point out...
> Russ Cox and the Go team learned that the loop variable capture semantics are flawed not by reflecting about how their language works, but through user feedback.
I think "user feedback" isn't the whole story. It's not just the Go team passively listening as users point out obvious flaws. I've noticed in other changes (e.g. the monotonic time change [1]) the Go team has done a pretty disciplined study of user code in Google's monorepo and on github. That's mentioned in this case too. This is a good practice, not evidence of failure.
I'm sure they could come up with a list of language decisions they disagree with.
I'm equally sure that if you asked kubb, kaba0, and three other strongly opinionated folks for a list of good language designers, each of the <5 lists you get back would be very short, and there'd be no overlap between them.
Many languages have made this mistake, despite having engineers and teams with many decades or centuries of total experience working on programming languages. Almost all languages have the loop variable semantics Go chose: C/C++, Java, C# (until 5.0), JavaScript (when using `var`), Python. Honestly: are there any C-like, imperative languages with for loops, that _don't_ behave like this?
That decision only becomes painful when capturing variables by reference becomes cheap and common; that is, when languages introduce lightweight closures (aka lambdas, anonymous functions, ...). Then the semantics of a for loop subtly change. Language designers have frequently implemented lightweight closures before realizing the risk, and then must make a difficult choice of whether to take a painful breaking change.
The Go team can be persuaded, it's just a tall order. And give them credit where credit is due: this is genuinely a significant, breaking change. It's the right change, but it's not an easy decision to make more than a decade into a language's usage.
That said, there may be a kernel of truth to what you're alluding to: that the Go team can be hard to persuade and has taken some principled (I would argue, wrong) positions. I'm tracking several Go bugs myself where I believe the Go standard library behaves incorrectly. But I don't think this situation is the right one to make this argument.
This isn't a bug in Java. Java has the idea of "effectively final" variables, and only final or effectively final values are allowed to be passed into lambdas, seemingly to avoid this specific defect. Ironically, I just had a review the other day that touched on this Go "interaction".
The outcome of this Go code in Java would be as you'd expect: each lambda generated uses a unique copy of the loop reference value.
Oh, today I learned. I think this was an issue in Scala (with `var`), but this seems like a great compromise for Java core.
I suppose Java had many years after C#'s introduction of closures to reflect on what went well and what did not. Go, created in 2007, predates both languages having lightweight closures. Not surprising that they made the decision they did.
Your comment inspired me to ask what Rust does in this situation. Of course, they've opted for a different "for" loop construct, and even if they hadn't, the borrow checker enforces a requirement similar to Java's effectively-final lambda limitation.
Newcomers to Java usually dislike the "Variable used in lambda expression should be final or effectively final" compiler error, but once you understand why that restriction is in place and what happens in other languages when there's no such restriction, you start to love the subtle genius in how Java did it this way.
Go, designed between 2007 and 2009, certainly had the opportunity to look at their introduction in C# 2.0, released 2005, or its syntactic sugar added in C# 3.0, released 2007.
I think that's an ahistorical reading of events. They did have the opportunity, but there were very few languages doing what Go was at the time it was designed. My recollection of the C# 3 to 5 and .NET 3 to 4.5 is a bit muddled, but it looks like the spec supports a different reading:
C# 3.0 in 2007 introduced arrow syntax. I believe this was primarily to support LINQ, and so users were typically creating closures as arguments to IEnumerable methods, not in a loop.
C# 4.0 in 2010 introduced Task<T> (by virtue of .NET 4), and with this it became much more likely users would create a closure in a loop. That's how users would add tasks to the task pool, after all, from a for loop.
C# 5.0 in 2012 fixes loop variable behavior.
I think the thesis I have is sound: language designers did not predict how loops and lightweight closures would interact to create error-prone code until (by and large) users encountered these issues.
This bug appears to be because Go captures loop variables by reference, but C++ captures are by copy[1] unless user explicitly asked for reference (`&variable`). It seems like the same bug would be visually more obvious in C++.
The change in JavaScript doesn’t have anything to do with for…of, it’s the difference between `var` and `let`. And JS made the decision to move to `let` because the semantics made more sense before Go was even created (although code and browsers didn’t update for another several years). That’s why Go is being held to a higher standard, because it’s 10+ years newer than the other languages you mentioned.
This places it nearly 10 years after the creation of Go. And with the exception of Safari, arrow functions were available for months to years prior to let and const.
This is somewhat weak evidence for the thesis though; these features were part of the same specification (ES6/ES2015), but to understand the origin of "let" we also need to look at the proliferation of alternative languages such as Coffeescript. A fuller history of the JavaScript feature, and maybe some of the TC39 meeting minutes, might help us understand the order of operations here.
(I'd be remiss not to observe that this is almost an accident of "let" as well; there's no intrinsic reason it must behave like this in a loop, and some browsers chose to make "var" behave like "let". Let and const were originally introduced, I believe, to implement lexical scoping, not to change loop variable hoisting.)
C# made the mistake not when they introduced loops, but when they introduced closures, and it didn't become evident until other features came along that propelled adoption of closures. Go had closures from the beginning and they were always central to the language design. C# fixed that mistake before the 1.0 release of Go. But the Go team didn't learn from it.
I hate to be that guy, but this would not be possible with Rust, as the captured reference could not escape the loop scope. Either copy the value, or get yelled at by the compiler about the lifetime of the reference.
This is one of the things the language was designed to fix, by people that looked at the past 50 years or so of programming languages, and decided to fix the sorest pain points.
I would argue that var is an entirely different issue. If variables last the entire function then it's far less confusing to see closures using the final value. After exiting the loop the final value is right there, still assigned to the variable. You can print it directly with no closures needed.
I think the parent was trying to imply that Ken Thompson had no experience in designing a programming language :-)
Seriously though, "having experience" and "getting things right" are two different things, although Golang got a lot of things right, and the parent is being unnecessarily harsh.
I can’t find it now, but I remember some joke about “it’s an interesting language, but why did you ignore the last 50 years of programming language design?”
I find Go quite frustrating in how it decries how over-complicated some features are, and slowly comes around to realize that oh, maybe people designed them for a reason (who woulda thunk it?).
> This could have been prevented by having one person on the team with actual language design experience, who could point this issue out in the design process.
Instead of making a mistake, they could have simply not.
> Russ Cox and the Go team learned that the loop variable capture semantics are flawed not by reflecting about how their language works, but through user feedback.
Since "Go 1" was deemed complete and the "Go 2" project began in 2018, the direction of the language was given to the community. It is no longer the Go team's place to shove whatever they think into the language. It has to be what the community wants, and that feedback showed it is what they want.
The Go team never shoved anything into the language without good reason, and they will not allow the community to shove anything into the language; that's how we got half-baked classes in Javascript and half-baked functional programming in Java, or the overall trend of languages taking features from other languages because community members say "this language would be better if it had features from this other language" often enough.
The "Go 1" project was centred around the Go Team. They built what they wanted and needed with little regard for the rest of the world. If they felt loop variable capturing was important, they would have added it. Of course, they didn't find it necessary, so it wasn't added.
When "Go 1" reached its natural stopping point and closed down, the "Go 2" project emerged to continue development of Go under the wants and needs of the community. That capture is being added now because the community has shown a desire to have it. The Go Team may use their expertise to guide the community in the right direction, but we are here because the "Go 2" project is community driven.
The original commenter seems unaware that the project changed hands.
> How many other badly designed features exist in the language, and are simply not being acknowledged?
Very few.
> If you point it out, and you're right, will you be heard if you don't have a flashy metric to point to, like a billion dollars lost?
If you're right yet don't have a better idea then what do you expect to occur?
> What if the link between it and its consequences isn't that clear, but the consequences are just as grave?
The consequence is your developers must be careful with loop variables or they will introduce bugs. That's not particularly "grave" nor even especially novel.
I'll admit, it's not a good ivory tower language, but then again, that's probably why I use it so often. It gets the job done and it doesn't waste my time with useless hypothetical features.
> If you point it out, and you're right, will you be heard if you don't have a flashy metric to point to, like a billion dollars lost?
Is this a subtle nod to the billion dollar mistake?
Because they deliberately included the billion dollar mistake as part of the language.
Even if they knew better than to include the billion dollar mistake, they were probably aware that they couldn't make a popular language without including it.
So what's your point?
Old ideas never die?
Language design is not the same as a language's purpose and goals?
They made a mistake creating Go?
Refusing to find something suitable, or just breaking compatibility?
I find Rust syntax challenging to grasp in a specific way. Rust employs numerous symbols and expressions to convey statements, which makes reading Rust code a process of constantly navigating between different keywords, left and right. I have to create a mental map of what certain statements are accomplishing before I can truly comprehend the code.
In contrast, I find Go code relatively straightforward, especially for those familiar with C-like programming languages. This clarity is due to the deliberate verbosity of the language, which I personally appreciate, as well as the use of early return statements.
But don’t get me wrong. I enjoy programming in both Rust and Go when they are suitable for the task at hand, but I usually spend more time grappling with Rust’s syntax than with Go’s, because I often invest more time in understanding the structure and logic of Rust programs compared to their Go counterparts.
At least part of the issue for me was that many keywords/syntax rules don’t match anything I’m familiar with, even considering “C-like” languages.
I have similar issues with Rust actually. There’s a lot of sugar used that you have to grok and that takes some time.
On the other hand Python, C#, Java all stick with a set of fairly familiar conventions. In terms of syntax (and only syntax), the learning curve is more intense with Go; perhaps similar to the initial alienation provided by JavaScript.
My experience has been that once you are being paid to learn a language these problems mostly disappear. Alas, no one ever paid me to learn Go.
I guess it depends on the way the brain works. I have very bad memory, but I prefer the expressiveness of Rust to the verbosity of Go. I value having the whole context on the screen much more than navigating countless words of boilerplate code. I do agree that it takes a bit of getting used to, but I find it easier to recognize by eye.
All you really need in a language is the former: closures provide encapsulation (instance variables become bound variables) and polymorphism (over closures with the same function signature) all the same.
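A minimal Go sketch of both claims (newCounter and applyTwice are illustrative names). The counter's state is reachable only through the returned closures, much like a private field. Any closure with the signature func(int) int can be passed where that type is expected, with no interface or class declared.

```go
package main

import "fmt"

// newCounter encapsulates count: the variable is reachable only through
// the returned closures, much like a private instance field.
func newCounter() (incr func() int, get func() int) {
	count := 0
	incr = func() int {
		count++
		return count
	}
	get = func() int {
		return count
	}
	return incr, get
}

// applyTwice accepts any closure of type func(int) int: polymorphism over
// the function signature alone.
func applyTwice(f func(int) int, x int) int { return f(f(x)) }

func main() {
	incr, get := newCounter()
	incr()
	incr()
	fmt.Println(get()) // 2

	double := func(x int) int { return 2 * x }
	offset := 10
	addOffset := func(x int) int { return x + offset }
	fmt.Println(applyTwice(double, 3), applyTwice(addOffset, 3)) // 12 23
}
```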
This nonsense again? It is controlled per module, effectively making it impossible to review code for correctness without checking those controls. This is an anti-feature. If you can't be bothered to make a local copy or reference the correct variable, the problem is the developer. Not the language.