
Go: A Surprising Edge Case Concerning Append and Slice Aliasing - tosh
http://www.jjinux.com/2015/05/go-surprising-edge-case-concerning.html
======
stygiansonic
The Go Blog has an article about slice internals. [0]

After reading that, it is a bit easier to see why this happens. Everything is
a value in Go, whether it's an int, a struct type or a reference type like a
pointer/map/slice. [1]

Since everything is a value, this is not really "aliasing"

    
    
        // b "aliases" a
        b := a
    

It only appears to be by coincidence: The entire slice data structure gets
copied, and one part of that is a pointer to the underlying array, which does
not get copied. This is why updating the underlying array (via `a`) reflects
in `b`.

Since `append()` either modifies the underlying original array, or allocates a
new one if the underlying array is not long enough (classic dynamic array
behaviour) [2], the use of this function will only impact `b` if a new array
is not allocated.
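
A minimal sketch of the two outcomes described above. The helper name `sharesAfterAppend` is made up for illustration, and it assumes a capacity of at least 2:

```go
package main

import "fmt"

// sharesAfterAppend reports whether b still observes writes made through a
// after one append to a. The slice a starts with length 2 and the given
// capacity (assumed >= 2); b is a plain copy of the slice header.
func sharesAfterAppend(capacity int) bool {
	a := make([]int, 2, capacity)
	b := a           // copies pointer, len and cap; the array is shared
	a = append(a, 1) // allocates a new array only if capacity == 2
	a[0] = 42
	return b[0] == 42
}

func main() {
	fmt.Println(sharesAfterAppend(4)) // true: spare capacity, array reused
	fmt.Println(sharesAfterAppend(2)) // false: append allocated a new array
}
```

The only difference between the two calls is the capacity, which is not visible at the point where `b` is read.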

To be fair, the article mentions all of this, but sort of in passing and at
the end. ("They're slices which are a value type that have a pointer within
them.") I feel that this is sort of the key point.

0\. [http://blog.golang.org/go-slices-usage-and-internals](http://blog.golang.org/go-slices-usage-and-internals)

1\. [https://blog.golang.org/go-maps-in-action](https://blog.golang.org/go-maps-in-action)

2\. [http://golang.org/pkg/builtin/#append](http://golang.org/pkg/builtin/#append)

~~~
amelius
Copying a pointer is by definition equal to aliasing.

~~~
barsonme
A slice is:

    
    
    type sliceStruct struct {
        array unsafe.Pointer
        len   int
        cap   int
    }
    

I think you both _mean_ the same thing, just the previous poster is focusing
on the entire struct itself, and you're focusing on the 'array' member of the
struct.

Or I'm just babbling.

~~~
amelius
My point was that hiding a pointer in a struct does not magically make the
aliasing go away.

------
white-flame
"Immutable data structures (like lists in Lisp) don't have these issues."

Lists in Lisp are mutable (in almost all dialects, from the original 1950s
LISP to Common Lisp). You will also have bad issues if you mutate the cells of
lists in Lisp, when there are other references starting at different cells
inside the list. List-mutating forms in the standard library directly warn
about the inputs being destroyed, hence to use the newly returned head instead
of holding on to old pointers.

(It's befuddling how often Lisp gets mentioned on HN, and how often it's
wrong.)

~~~
agumonkey
Maybe it comes from academia. Lisps are used to teach recursive programming and
to step into purely functional idioms. I wasn't introduced to the effectful APIs
or libraries, only car, cdr and recursion as primitive means. Also, history
rarely reports Lisp being imperative. McCarthy's first paper shows a functional
structural symbolic derivator, and he was said to be a proponent of recursion
when nobody cared. I think Richard Gabriel said the `progn` form was introduced
a posteriori to satisfy FORTRAN programmers. I've read only once that Lisp was
planned to have imperative traits from the start. So it's not unnatural to
think of Lisp as a mostly pure recursive language that caved in to allow side-
effects (just like ML has Ref types when it makes sense). It's a bit Edisonian
in how simplified the story has been.

------
mcherm
Clear mutable semantics or clear immutable semantics but please, don't give us
a mushy combination of the two.

In other words, the operation "append to an alias for this slice" should
either consistently modify this slice or consistently not modify this slice.
Behavior of "sometimes modifies this slice and sometimes doesn't" is, in my
opinion, a serious API/language design failure.

~~~
nicksardo
The use of `append` is absolutely clear. It always returns a new slice; the
compiler forces you to take the value returned.

What scares you is the possibility that the underlying array may be copied to
a newer, larger array giving enough space for the appended items. If there
exists enough space, why bother with a new allocation? I want `append` to
handle this for me. If you want different behavior, you can easily set up your
own design with `copy`.
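
One way such a `copy`-based design might look, as a sketch only. The helper `appendCopy` is hypothetical and not part of the standard library:

```go
package main

import "fmt"

// appendCopy is a hypothetical helper: it never writes into s's backing
// array, at the cost of one allocation per call.
func appendCopy(s []int, vals ...int) []int {
	out := make([]int, len(s), len(s)+len(vals))
	copy(out, s)
	return append(out, vals...)
}

func main() {
	a := make([]int, 2, 8) // plenty of spare capacity
	b := appendCopy(a, 3)
	b[0] = 99
	fmt.Println(a, b) // [0 0] [99 0 3]
}
```

This trades the amortized cost of the built-in `append` for predictable non-sharing.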

Regarding this blog post, if you have multiple slices to the same array and
are arbitrarily appending to any of those slices, then your design is wrong to
begin with.

This isn't an edge-case but a lack of slice understanding.

~~~
tshadwell
Your understanding is incorrect, too. append() only returns a new slice when
"the capacity of s is not large enough to fit the additional values", in which
case "append allocates a new, sufficiently large underlying array that fits
both the existing slice elements and the additional values. Otherwise, append
re-uses the underlying array."

[-]
[http://golang.org/ref/spec#Appending_and_copying_slices](http://golang.org/ref/spec#Appending_and_copying_slices)

~~~
skj
No, nicksardo's understanding was quite correct. Slices are values, and it is
impossible to return the same one, because value semantics don't work like
that.

The slice returned only points to a _new array_ when the capacity is not large
enough to fit additional values.

------
Animats
This is why Rust has a borrow checker.

A slice is a mutable borrow. Rust only allows you one mutable borrow at a time
per object. So does Go, as someone just found out the hard way. Rust checks
this, and Go doesn't.

I've remarked in the past that borrow checking isn't fundamentally restricted
to Rust. Now that we have a usable theory of borrowing, it belongs in more
languages.

~~~
SamReidHughes
You don't need a borrow checker; at most, what you'd need from that is const
annotations. But you don't even need that: all you really need is a wrapper
type that prohibits append, or just the ability to shrink the capacity of a
slice reference.
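
For what it's worth, Go 1.2 and later have a full slice expression, `s[low:high:max]`, that can cap the capacity of a slice value. A sketch, where `clip` is a hypothetical helper name:

```go
package main

import "fmt"

// clip is a hypothetical helper that returns s with its capacity cut down to
// its length, so that any append to the result must allocate a new array.
func clip(s []int) []int {
	return s[:len(s):len(s)]
}

func main() {
	a := make([]int, 2, 8)
	b := append(clip(a), 7) // forced to reallocate despite a's spare capacity
	b[0] = 99
	fmt.Println(a[0], cap(clip(a))) // 0 2
}
```

Passing `clip(a)` instead of `a` to other code means an `append` on the receiving side cannot write into the caller's spare capacity. Writes to existing elements are still shared.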

But really you don't need any of that, if you're passing a slice to something
and some knucklehead is calling append on it, take him out back and shoot him.

~~~
Animats
It's very easy to accidentally borrow in Go, because Go has implicit reference
types. If you send a slice across a channel, you're sending a reference, not a
copy.
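
A minimal sketch of that point. The function name `roundTrip` is illustrative, and the `done` channel is there only so the final read does not race with the write:

```go
package main

import "fmt"

// roundTrip sends a slice over a channel and lets the receiver modify it.
// It returns the sender's view of element 0 afterwards.
func roundTrip() int {
	ch := make(chan []int)
	done := make(chan struct{})
	go func() {
		s := <-ch
		s[0] = 42 // writes into the sender's backing array
		close(done)
	}()
	data := []int{1, 2, 3}
	ch <- data
	<-done // wait for the receiver before reading
	return data[0]
}

func main() {
	fmt.Println(roundTrip()) // 42
}
```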

~~~
SamReidHughes
People have been making slices in Go and sending them across channels without
unique ownership and borrow checking for years, and it hasn't been a problem.

~~~
Animats
Are you _sure_ there's not a problem? That's a potential race condition. Are
you sure there's no problem after a few years of changes to the code?

(Claims that there's no need for programmer checking systems are usually
wrong. Go read CERT advisories or go to DefCon. Those are just the
_exploitable_ bugs. I once spent four years debugging other people's code
using mainframe OS crash dumps. Most programmers are not as good as they think
they are.)

~~~
tomjen3
Can you come up with a reasonable way this could turn into an exploitable
security issue? I can see crashes and maybe live locking but Go doesn't have
pointer arithmetic, so I can't see how this could be used to bust the stack.

~~~
SamReidHughes
One user's private information could get sent to another user, or you could
have a subslice that gets "validated" and then overwritten.

Edit: Of course, this isn't exclusive to data races, this can happen with
anything that decides to save the slice for a while, without, say, defensively
copying it, when some other code decides to reuse it.
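
A sketch of the "save it for a while" case, with and without a defensive copy. The `store` type and its methods are hypothetical names used only for this illustration:

```go
package main

import "fmt"

// store is a hypothetical type that keeps a payload for later use.
type store struct{ saved []byte }

// keepShared retains the caller's slice as-is.
func (s *store) keepShared(p []byte) { s.saved = p }

// keepCopy retains a private copy of the caller's slice.
func (s *store) keepCopy(p []byte) { s.saved = append([]byte(nil), p...) }

func main() {
	buf := []byte("alice")
	var shared, private store
	shared.keepShared(buf)
	private.keepCopy(buf)

	copy(buf, "mallo") // the caller reuses its buffer

	fmt.Println(string(shared.saved), string(private.saved)) // mallo alice
}
```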

------
iand
The main mistake in this article is the assumption that b := a makes 'b' an
"alias" of 'a'. It doesn't. It creates a new slice that starts off pointing to
the same array as 'a'. Also slices are immutable so you can't change 'a'. If
you create a new slice by manipulating 'a' then the copy 'b' won't see
anything outside of its bounds.
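
A short sketch of the bounds point, assuming spare capacity so that `append` reuses the array. The function name `hiddenElement` is illustrative:

```go
package main

import "fmt"

// hiddenElement appends through a, whose header was copied into b beforehand,
// and returns both lengths plus the element that now sits just past b's end.
func hiddenElement() (lenA, lenB, hidden int) {
	a := make([]int, 2, 4)
	b := a
	a = append(a, 7) // fits in the shared array; only a's length grows
	return len(a), len(b), b[:3][2]
}

func main() {
	la, lb, h := hiddenElement()
	fmt.Println(la, lb, h) // 3 2 7
}
```

`b` keeps its length of 2, but the appended element is in the shared array and becomes reachable by reslicing `b` within its capacity.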

------
tshadwell
This isn't about aliasing, as the specification notes "If the capacity of s is
not large enough to fit the additional values, append allocates a new,
sufficiently large underlying array that fits both the existing slice elements
and the additional values. Otherwise, append re-uses the underlying array."
[1].

I encourage anyone who writes Go to read the specification. I found it very
digestible and my code has improved a lot as a result. In this case, for
example, I can pre-allocate a slice of a given capacity and then append to
it several times without re-allocations, which I sometimes find more semantic
when dealing with binary data -- and the reference to the slice doesn't need
to change.

[1]
[http://golang.org/ref/spec#Appending_and_copying_slices](http://golang.org/ref/spec#Appending_and_copying_slices)
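
A sketch of the pre-allocation pattern, assuming `n` is at least 1. The function name `fill` is made up for illustration:

```go
package main

import "fmt"

// fill appends n bytes to a buffer pre-allocated with capacity n and reports
// whether the backing array stayed the same throughout. Assumes n >= 1.
func fill(n int) bool {
	buf := make([]byte, 0, n)
	start := &buf[:1][0] // address of the first slot of the backing array
	for i := 0; i < n; i++ {
		buf = append(buf, byte(i)) // never exceeds capacity, so no reallocation
	}
	return &buf[0] == start
}

func main() {
	fmt.Println(fill(16)) // true
}
```

The result of `append` is still assigned back to `buf`, because the length stored in the slice value changes even when the array does not.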

~~~
jeffdavis
I used to think like that: learn all of the quirks and the internals behind
them, and you always have an explanation.

With postgres, I always had an explanation why it was doing so much work on
every update, and why replication was so hard, and why checksums were
impossible to implement efficiently, and why index only scans wouldn't work,
and...

After seeing that all of these things were fixed by people who didn't think
that way, I changed my mind.

This Go behavior is surprising and non-deterministic. Changing a
constant can have some bizarre action at a distance that breaks your code. It
affects ordinary code even if the author doesn't care about these details. I
really don't see anything good about it even if you do understand it.

------
brianolson
The contract for append() has _always_ been that it might return the original
pointer or it might copy-and-extend into a new pointer. That you can keep a
reference of the previous version is programmer error for not having clear
ownership of that pointer. There should be one canonical context for it, being
a member of a struct or a local variable on the stack. Then you don't have to
worry about having two copies of it. If you need multiple threads accessing
the pointer, then you should have put a mutex on it.

~~~
twic
The point is more that it's a bad contract, isn't it?

If a slice is something which can't safely be copied, then why can it be
copied?

~~~
SamReidHughes
So that you can pass your slice to functions and actually make use of them.

That somebody could go along and start modifying data structures they aren't
supposed to is a problem in Java, C#, ... many languages.

------
TheLoneWolfling
Everyone here is saying that it's in the specification.

If that's the case, and it does seem to be, then the problem is with the
specification. Inconsistent behavior is nasty at the best of times, and this
is no exception. I can easily see bugs coming out of this.

It's like C and undefined behavior. (Null checks in the linux kernel, anyone?)
Even the best developers get bitten by it occasionally. It's one more subtle
thing to remember, and everyone can only remember so much.

~~~
SamReidHughes
But this isn't surprising behavior and it's not edge case behavior either.
It's exactly what the docs say will happen, and you don't even need to read
the docs, it's exactly what you'd expect, given what the interface is and what
that implies about how the feature could possibly be implemented.

~~~
VanillaCafe
That something is "expected" (for some definition of expected) doesn't mean
the code is easy to reason about.

Initially, modifying "a" also modifies "b". Later, modifying "a" does not
modify "b". I can't imagine a piece of code that uses "a" or "b" that wouldn't
care which of those two states the system is in.

And, knowing which of those two states code will be operating under will not
be clear by simply looking at an arbitrary piece of code. The fact that it is
well documented that transition may occur does not resolve this ambiguity.

~~~
SamReidHughes
The code in this example is very easy to reason about. Because it's a single
function.

If it's hard to reason about two different handles of shared data where one of
them is used to modify the shared data... don't do that. append is for
operating on a slice that your code is using exclusively. Like in this
example, where it's all in the same function and the behavior's predictable.

------
mmf
What is everyone ranting about?

Append is not a method of the type slice, it is a function that returns a
slice. What I read is people writing

    a := make([]int, 2, 2)
    b := a
    a = f()

And then "mommyyyy a is not equal to b anymoreeeee".

Just sayin...

------
aksx
This is not an edge case.

effective go[0] clearly states "Slices hold references to an underlying array,
and if you assign one slice to another, both refer to the same array."

[0]
[https://golang.org/doc/effective_go.html#slices](https://golang.org/doc/effective_go.html#slices)

------
kylequest
Go does have immutable data structures... They are called strings :-) It's not
like Lisp or Scala, of course. Simply convert your slice to a string and you
are set.
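
A minimal sketch of that conversion, for byte slices. The helper name `freeze` is hypothetical:

```go
package main

import "fmt"

// freeze is a hypothetical helper: the conversion yields an immutable string
// holding a copy of the bytes, so later writes to b do not affect the result.
func freeze(b []byte) string {
	return string(b)
}

func main() {
	b := []byte("abc")
	s := freeze(b)
	b[0] = 'x'
	fmt.Println(s, string(b)) // abc xbc
}
```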

The slice behavior can be a bit surprising to new Go devs if they just jump in
trying to write code. It's pretty common because Go looks simple and very
familiar.

You can find more traps for new Go devs in this post:
[http://devs.cloudimmunity.com/gotchas-and-common-mistakes-
in...](http://devs.cloudimmunity.com/gotchas-and-common-mistakes-in-go-
golang/)

------
wyager
Wasn't the whole point of Go that everything was simple and "just works"? I've
heard a lot of people here on HN say that they love go because you don't need
to keep track of weird quirks or implementation details. This seems to fly in
the face of that philosophy.

~~~
peferron
Go's philosophy strikes me more as "when many solutions are available, pick
the most practical one".

In this case, an alternative solution (among others) would be to always
allocate a new array when calling `append`. This solution would offer higher
predictability but lower performance.

Apparently the Go team decided that the current solution was the most
practical - good performance and a relatively low risk of biting developers.

That doesn't mean it's a perfect solution. But has anyone in this thread
suggested any solution that would be unequivocally better? Rust's approach for
example is much more solid but at the cost of increased complexity (at least
at first glance - I'm just a beginner in Rust but the language already seems
much bigger than Go).

------
aric
If you wanted to do that:

    
    
       b := &a
       fmt.Println((*b)[0])
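
A fuller sketch of the pointer version. The function name `viaPointer` is illustrative. Because `b` points at the variable `a` rather than holding a copy of its slice header, it follows `a` even across a reallocation:

```go
package main

import "fmt"

// viaPointer appends through a and returns the length and first element seen
// through b, a pointer to the slice variable a.
func viaPointer() (int, int) {
	a := make([]int, 2, 2)
	b := &a
	a = append(a, 5) // reallocates, but b still points at the variable a
	a[0] = 9
	return len(*b), (*b)[0]
}

func main() {
	n, first := viaPointer()
	fmt.Println(n, first) // 3 9
}
```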

------
tete
I really don't want to talk badly about another language or so, but append
always returning a new slice and that being a simple statement about a
function seems way more reasonable than, for example, Python 2 having 3/2
resulting in 1.

Both of these decisions were made for simplicity.

It's maybe also comparable with "new Number()" in JavaScript not being
primitive, but an Object which as a side effect is also call by reference.

I disagree with the statement that "it's in the spec, but that's bad" mainly
because a huge number of bugs happen out of a lack of understanding of a
language, and "unintuitive" usually means that language X doesn't work like the
first language you really learned.

The whole concept of Go is to have a small, simple spec that you can actually
know. Compare Go's specification with other languages (maybe other than
Scheme).

Complex specifications often root in way more complex problems. A common
problem is, for example, C++ code, which is already complex, being ported to
Java, where you end up having side effects from stuff like NUL-delimited
strings suddenly not being delimited and so on.

I relatively frequently stumble across code that shows a complete
misunderstanding of a language. It's really common and often leads to strange
workarounds that manifest the view of programmers not completely understanding
the language they are using. Removing hundreds of lines of workaround code
and pointing out a mistake made usually looks quite impressive.

However I do not want to deny that specifications should try to avoid such
behavior. I do not deny the fact that this isn't optimal, but neither are so
many other things (see float). What I think is important is to actually
acknowledge that things are not optimal and therefore there should be ways to
get around them. append is really nice, because with that one bit of knowledge
you end up with one single simple rule on how to program.

The problem that sometimes occurs is that you have rules about a language that
go like "If you stumble across this problem the way to solve it depends on
...". So you actually have more side cases.

Also that simple append is maybe something that probably should be warned
about cause it is really, really likely an error. This is another reason I
think simple, general statements that have such side effects are good, or at
least better than needing to know a lot about a language.

What I want to point out with that is that I think that one should judge a
language after how it works out for one after one year of seriously working
with it on a day by day basis. I really don't think this will end up as a
practical issue. Finding something that is different in any language and might
be confusing for people not confident in a language is really easy, but can
also give you a completely wrong impression.

This is true for Python, Java, JavaScript, C++, ... On the other hand it
mostly happens when there is a hype about a language, which totally makes
sense. It also has to be pointed out that this new language is not the
golden shot either, because that stops especially inexperienced programmers
from following every new hype coming up.

