Hacker News new | past | comments | ask | show | jobs | submit login
Go(lang): Robust generic functions on slices (go.dev)
148 points by signa11 11 months ago | hide | past | favorite | 80 comments



I feel like I’m taking crazy pills. How are these APIs even remotely defensible?

    slices.Sort(s)          // fine
    slices.Compact(s)       // broken
    s  = slices.Compact(s)  // fine
    s := slices.Compact(s)  // broken (!!!)
    slices.Delete(s, …)     // broken
    s = slices.Delete(s, …) // fine
How is one intended to remember which functions require overwriting (due to invalidating) their input and which don’t? Why does the language make it so easy to render function parameters unusable upon return but impossible to enforce that they aren’t used afterward?

How on earth did it take twelve years for this “simple” language to make a function to delete elements from an array, with `s = append(s[:start], s[end:]...)` having to suffice until then? How on earth does the “better” API twelve years later have such a gaping footgun? This is a core type that quite literally every program uses. How have things gone so far off the rails that “setting the obsolete pointers to nil” is such an intractable problem for end users they had to add a new keyword to the language?

For other languages I see posts where weird language corner cases bring up challenging issues that really reinforce the idea that language design is hard. Rust—for example—has unsoundness issues in corner cases of the type system. But for go, it feels like there’s a constant stream of own goals on core areas of the language where the design should have knocked it out of the park. “Simple manipulation of arrays” just should not have this many footguns.


It's pretty simple really: Go's slice API was compromised from the start by making it the unholy hybrid of a list/vector and an actual slice, it's one of the original sins of the language. And it's not really fixable.

It must be said that even splitting them properly you'd have issues as long as you mix slices and mutability (e.g. take a full slice out of a vector then delete() on the vector, the slice will see the hole). Though the issues would be less prominent.


The way it is listed, it definitely looks problematic.

I had to dig a little and in fact, once we remember that a slice is a view into an array and that "some" of these methods return slices, it's actually fine.

The only issue is perhaps s:=slices.Compact() But that's not even because of the API. It's because of the short variable assignment that may allow shadowing, unfortunately.

The issue is probably overblown here.

To be even more thorough, I've had the pleasure (lol) to implement a wrapper to have some form of immutable slices so I can say that it is mitigable. Not pleasant but mitigable. (I needed to compare values of a slice stored in a map, value before retrieval from map vs. value when it had been inserted back into the map, so having aliasing was problematic since the value in the map would be updated at the same time the value retrieved was (obviously, duh!) , had to implement a copy-on-write variant).

:)


> once we remember that a slice is a view into an array and that "some" of these methods return slices, it's actually fine.

Only in the meme sense, it does not actually solve or help solve the problem in any way.


If the problem is that slices are somewhat complex, allowing aliasing sure.

But then it's a problem of understanding what slices are so it does help in practice.

I am more concerned by the difficulty of mutating a slice (deleting and appending element) while iterating over it for instance. Done it and that's more difficult, ceremonious.


> If the problem is that slices are somewhat complex, allowing aliasing sure.

No, the problem is what I put at the top, that slices are hopelessly confused, they have two completely different and incompatible purposes and Go does not let you differentiate between the two.

Understanding what slices are does not help, neither in theory nor in practice.


The way I see it, this is just lower level for mechanical sympathy and one is still free to implement a copy-on-write wrapper. Been there, done that.

The trick in understanding what they are is to understand that these are not vectors if I try to get closer to your semantics. Once it is viewed purely as a kind of reference type, a view in a backing array, it has only one meaning.

It's easier for a traditional functional language to implement lists and vectors perhaps because they operate on the premise of immutability first. Beware of the memory footprint though.

I admit that it might be easier to think in term of vectors. That's kind of what was done designing the wrapper.

Still, as I understand, slices are just lower-level. Immutability is a higher, language level concept.


Without defending this API, the easiest way to go about avoiding bugs when working with slice mutating functions is to consider all those "fine" scenarios as not fine.

Always assume that only the return value of slice mutating functions are the valid reference and the old one always invalid. This is not always completely accurate, but it is very useful in that, it is also never "wrong".


> Without defending this API, the easiest way to go about avoiding bugs when working with slice mutating functions is to consider all those "fine" scenarios as not fine. Always assume that only the return value of slice mutating functions are the valid reference and the old one always invalid.

The first "fine" scenario is fine because `slices.Sort` works in place, it doesn't return a value at all.

And the other "fine" versions do essentially what you advocate, by overwriting the invalid reference with the new one.


The nil return assignment is a compile time error, so easy to spot. It is not supposed to be the defence of the API or a silver bullet, just a practical rule that can save you some hustle.


By returning nil, the function makes it clear that it doesn't move the input reference


Yes, but now I have to keep track of which functions invalidate-and-return and which return nil. If I forget that `slices.Delete` returns a new slice instead of mutating in-place, the language doesn’t help me.


It's golang, it has a long and proud tradition of being footguns all the way down.


It’s the C tradition, continued.


C versus the systems languages since 1958 (JOVIAL, NEWP, ALGOL, PL/I...), Go versus the languages since Fortran until 2009.


Go would have undoubtedly been called D had the name not already been taken.


It is not like Go wasn't already taken.

https://github.com/golang/go/issues/9


To reiterate, absolutely.


Possibly that's mostly out of familiarity with the language? The only thing in your example that does things in-place (and thus looks out-of-place) is Sort(), but that's the way I'd at least expect it to work? If you take that away from the list all of them behave similarly to each other and return the modified slice:

    slices.Compact(s)       // modified slice in the return value is ignored
    s  = slices.Compact(s)  // s now points to the new changed slice
    s := slices.Compact(s)  // s already exists, this is not valid Go syntax.
    slices.Delete(s, …)     // modified slice in the return value is ignored
    s = slices.Delete(s, …) // s now points to the new changed slice

EDIT: Would prefer people not to downvote actual discussion. In this case there were was indeed good argument made on the reply that these also modify the underlying slice, but it's not like I was being rude in the comment.


> The only thing in your example that does things in-place (and thus looks out-of-place)

That is not really correct, and is much of the issue: Compact and Delete operate in-place on the backing buffer while copying the slice metadata. This is the same issue as the append() builtin and why it's so fraught (before you get in all the liveness stuff).

> s := slices.Compact(s) // s already exists, this is not valid Go syntax.

    s := []int{}
    if true {
        s := slices.Compact(s)
        _ = s
    }


Ah, my bad for not reading the article before the comments :)

As that's the case it's indeed hard to defend it. Data structure-wise it still kind of makes sense since as you mentioned the slice metadata is changed, but it basically making the old slice invalid is rather annoying.

For the := example sure, it's a bit far fetched example and likely would not pass any code review, but there are cases where shadowing is indeed valid. So is the `s := slices.Compact(s)` in this example not working as expected then?

EDIT: looking at another reply to the parent the := being broken likely is trying to point that using := also modifies the slice and thus the original slice is broken when one tries to use it in the original scope. That's really bad practice, but true indeed.


I recently wanted to split and iterate on a String (`char *`) in the Git project. I was not feeling good that it had come to this point since I'm not a C programmer. ChatGPT told me that I wanted `strtok`. So I tried that but then the compiler complained that it was banned (via a macro).[1]

Looking at the commit log I found out that `string_list.h` in the project was something that I could use. There's functions and initializers for duplicating and not duplicating the string. So I chose `STRING_LIST_INIT_DUP` and used `string_list_split` on `\n`.[2] Then I could use a regular for-loop since the struct has `nr` which is the count. And then it just worked.

I was pleasantly surprised that there was such a nice library for "list of strings" in C.[2] And the price you pay compared to more modern languages is an extra allocation (or one per sub-string?). And although I have only used it for that one thing I didn't run into any footguns.

Languages designed way after C (like in this century) should keep in mind what they want to give the user the ability to do: just make simple immutable types (like `String` in Java (of course there's `StringBuilder)), give immutable/mutable pairs, and also if they want to give access to no-allocation slices (like Rust). And if they go all the way with this they really need to give a large enough API surface so that all of these different concerns don't get mixed up. And so that users don't have to think about the underlying arrays all the time in order to not do something unintentional.

[1] Apparently `strtok` is not nice to use because it can change the string you pass to it or something. I didn't read a lot about it.

[2] The documentation tells you when you should be using dup/no-dup. And it makes sense to duplicate/allocate in this case, I think. Intuitively to me it seems that I'm paying for an allocation in order to not have to mess up per-char iteration at all. And that was fine for my use of it.

[3] And that the macro stopped me from going down the wrong path with `strtok`!


I haven’t used strtok in a long time but my recollection is that it mutates the original string by placing a NULL value at the next delimiter so “hello world” would become “hello<0x00>world” if splitting on spaces. This lets you loop with the same string passed to strtok until it is done.

It’s ugly. Would not recommend. 0/10


> slices.Compact(s) // broken

I don't quite get what you mean by 'broken' here. You know that the length of a slice can't be altered by passing it to a function. So clearly s will still contain the same number of elements as it did before the call to Compact. Similarly for Delete.

You can ignore the return value of the function and that might introduce a bug in your code. But that's something that could happen with all kinds of functions.


Don't you understand how slices work?


perhaps you should invoke bufio.ScanWords?


The problems with the API they point out are almost all things that rust's ownership system was built to solve.

Things like:

    slices.Sort(s) // correct
    slices.Compact(s) // incorrect
    slices.Delete(s, ...) // incorrect
    s := slices.Delete(s, ...) // incorrect if 's' is referenced again in the outer scope
    s = slices.Delete(s, ...) // correct
All of those are solved by having functions like 'slices.Sort' take a '&mut' reference in rust speak, and having 'slices.Compact' and 'Delete' take an owned slice, and return a new owned slice.


IMHO, all these comes from the million dollar mistake that Go made from the very beginning, with slices being a bastard between both an owned and non-owned data container. Indeed, any function accepting a slice may simply use it as an immutable view on whatever data it needs, or modify it, potentially re-alloc it, and wreak havoc by invalidating all previous references to it and/or its elements. And God forbid you passed a subslice to such a function, you're in for a nasty surprise.

Even without going the whole 9 yards of the Rust memory model, something like the views in C++, or the java {Array,Linked}Lists containing references to objects and thus being resistant to re-alloc, or even plainly immutable list like in OCaml are all working and simple enough solutions to the issue.

I still can't wrap my mind around how supposedly experienced people like Pike & co. may have designed slices, then went ‶yep, that makes perfect sense for a robust, basic, widely used data structure to create our language around″, outside of a few toy programs.


my feeling with slices is that go wanted to really make programmers mindful of the cost of allocating new space on array operations.

By having this weird API on slices, they force you to be explicit on allocating new memory.


None of that makes any sense. Slices don't force you to be explicit on allocating new memory in any way. you can literally do this:

    s := []int{}
    for i := range 29754 {
        s = append(s, i)
    }
do you see explicit allocation of new memory? As far as I'm concerned that's not in any way more explicit than e.g.

    var s = new List();
    foreach(var i in Enumerable.Range(0, 29754)) {
        s.Add(i);
    }


s = append(s,i) makes you realize something happened to the memory behind s. Otherwise you would simply call s.append(i)


The something that happens is that slices are not heap collections (unlike hashmap which say even less about allocations, but I'm sure you'll find an other excuse), so you can't even increment their length without returning a new copy.

I also fail to see how this would translate to

> make programmers mindful of the cost of allocating new space on array operations.

anyway. `append` is not taking in an allocator, reporting allocation failures, or reporting reallocations here. Like proper vectors, it's silently reallocating (and over-allocating) as needed.


> they force you to be explicit on allocating new memory.

Not at all. With e.g. `append` being a “maybe I'll allocate, maybe not, take a guess” kind of method, it's basically just like Java's `List.add` with extra steps and much worse egonomics.


You don't even need a notion of ownership. A distinction between an immutable and mutable slice should be enough, because an immutable slice can never be changed which implies that its excess capacity (if any) can't be exploited for optimization.


None of these functions would apply to an immutable slice, so how is it related?


If immutable and mutable slices are differently typed [2], it is natural to define two functions (say, `slices.Compact` vs. `slices.Compacted`) to handle each type, like Python `list.sort` vs. `sorted`. It should be natural to expect `slices.Compacted` to never alter its input, and any attempt to use a mutable version will be very explicit except for slicing [1].

[1] Especially given that the capacity is preserved by default, which contributes to the current confusion. See my older comment: https://news.ycombinator.com/item?id=39112735

[2] Originally "...are different" but edited for clarity.


That would be nice, sure, but it still doesn't fix the problem for mutable slices. Immutable slices are a distraction - this discussion is explicitly about how the mutable version is supposed to work. Unless you want to argue that mutable arrays/slices shouldn't exist at all, of course.


This would not allow the previous errors to be checked by the compiler since the main thing you're relying on is the name. Nothing prevents you to call deleted(mutable) and discard the result apart from the name.


Indeed. Though, Rust does have a way to mark a function such that any caller that implicitly discards its return value gets a compiler warning. This feature largely solves the problem you’re talking about. But it’s orthogonal to Rust’s borrowing system or mutable versus immutable distinction.

That said, I’d also point out that, while you can more or less replicate the Go example with Rust slices, in Rust it would be more idiomatic to pass around a Vec (or a mutable reference to a Vec) if a callee needs to do something like change the length. And you can’t resize a Vec if there are other references to its contents.


`#[must_use]`. I really don't know why Rust made that error.


I think it is notable that both `try!` (`?` today) and `#[must_use]` (originally restricted to `Result`, then made available in general later) appeared in the same release (0.10). In the other words, `#[must_use]` was strongly tied to `Result` back then. While we can put `#[must_use]` to any type now, the set of types that absolutely have to be `#[must_use]` remains relatively small, with a major addition being iterators and futures. Once they have been covered, any additional value from adding `#[must_use]` is not large enough to make it default, I think.


> While we can put `#[must_use]` to any type now, the set of types that absolutely have to be `#[must_use]` remains relatively small

Agreed, although conversely if we could start Rust development all over again I'd probably argue that must_use on functions (as opposed to on types) should probably be the default behavior. (These days it basically is the default behavior for functions in the standard library.) Though Rust didn't gain the ability to annotate functions with must_use until 1.27. Switching the default could perhaps be a good candidate for an Edition.


It's too late to propose new features for Edition 2024, and you would first need to get some attribute agreed which has the opposite effect, then write up how your proposal works and see if it's generally agreed. I doubt that but you should certainly try.


> Nothing prevents you to call deleted(mutable) and discard the result

The Go compiler generates an error when you are (silently) ignoring the return value of any function. Or, to put it in other words, every compiler which does allow to (silently) ignore the return value of a function, should not be used at all (C++ has at least `[[nodiscard]]` since 17 and C with C23 - which is "too little and too late", as always).


> The Go compiler generates an error when you are (silently) ignoring the return value of any function.

It does not. You can actually infer that from TFA listing cases as problematic which would be caught by such a compilation error, and confirming it by just compiling them:

    a := []int{}
    // compiler says nothing
    slices.Delete(a, 0, 0)
The builtins are special cased to error if their return value is ignored (except for copy and recover).

> C++ has at least `[[nodiscard]]` since 17 and C with C23 - which is "too little and too late", as always

You can't even mark your own functions or types as nodiscard in Go. You need third-party tooling even just to ensure you're not unwittingly ignoring error results:

    f, err := os.Create("/tmp/filename")
    if err == nil {
        return
    }
    // compiler doesn't say anything even though f.Close returns error
    f.Close()


Sorry, yes, you are of course right. That's the linter which complains, not the compiler.

I have to say that I don't understand the rationale of a compiler that errors on unused variables but lets the user silently ignore function return values. As a solution to explicitly ignore return values already exists in the language.


While that's a valid concern, it is an orthogoal issue as it can be similarly replicated in Rust as well. Rust references always track mutability but we can sidestep that by using `std::borrow::Cow`:

    fn compacted<T: ...>(input: Cow<'_, [T]>) -> Cow<'_, [T]> { ... }
Then it is clear that, for example, `compacted(vec![...].into());` as a statement will exhibit the same behavior because `Cow` doesn't have `#[must_use]`. Rust avoids this issue mainly by encouraging explicitly mutable or immutable values by default, and at this point the language should have substantially altered that Go can do the same.


must_use doesn’t have to be on a type, it can be applied to a function


This sounds awful in practice having to memorize different functions for the same thing based on mutability of the thing.


In practice it's not that bad


Programming is hard and one can not destroy complexity, only move it; other news at 12.


You'd need three types: immutable slice of an immutable array, mutable slice of a mutable array (ie basically a reference to a mutable array), and an immutable slice of a mutable array.


Long before that, the problems with the API are solved by having a separate heap type for length manipulations rather than confusing the interfaces.

If Delete or Compact are only available there and it’s modified in place, the problems don’t arise in the first place.


A missing tidbit that may help contextualize this post: One of the things about Go that surprised me is that if you have a slice which does not represent the full capacity of the underlying array, you can go ahead and reslice it up to that full capacity even though it's a panic to access the things you're reslicing directly: https://go.dev/play/p/oThz2bNFwgr

Consequently, the GC has to assume that anything forward of any given slice into the underlying array may become accessible in the future as there is legal Go that can access it. It's still memory safe, but it surprised me.

I had some code that was using my incorrect belief that slices could not be resliced up in size to implement some light security boundaries. Fortunately it was still legal, because the code in question simply didn't slice things larger and it's not like I was allowing arbitrary user-supplied code to run, so it was still correct in what it was doing. But I was expecting the runtime to scream if I did somehow screw it up when in fact it may not, depending on the exact capacity and what happened when.

It's also asymmetric, as far as I know; you can slice forward into the array if there is capacity, but if you've got a slice that starts after index 0 in the backing array you can't use that slice to walk back into the underlying array. That is, with

     s := []int{11, 12, 13, 14, 15}
     s = s[2:]
as far as I know, 11 and 12 are no longer accessible to any legal Go code (not using "unsafe") after that second line executes.

Corrections (again not involving "unsafe", it's obvious those two values are still accessible through "unsafe") welcome. I was wrong once, it's easy to believe I could be wrong again.


One detail in the latest 1.22 release is a change in the "slices" library [1] (the library is based on generics, introduced in 1.21) that helps avoid such cases:

> Functions that shrink the size of a slice (Delete, DeleteFunc, Compact, CompactFunc, and Replace) now zero the elements between the new length and the old length.

[1]: https://tip.golang.org/doc/go1.22#slices

[2]: https://pkg.go.dev/slices


You can prevent later longer reslicing by add an additional cap() element to the slicing to shrink the capacity.

  s = s[:3:3]
from your first example link.


In your playground example, if you print the capacity and the length before and after “re-extension”, it becomes clear what happened. In fact, accessing item 5 after reduction gives a size panic, where as accessing item 6 after re-extension gives you a capacity panic.

Understanding rsc’s “Go Slices” blog is very helpful here. Coming from Java or something, this exposure of underlying storage could be jarring, but coming from C, Go slices are basically built in fat arrays, and this behavior doesn’t surprise me. Maybe it was a design mistake to expose so much of the underlying machinery. Ymmv.


It is reinforcing my belief that language design is hard. Go is supposed to be a simple language, one may write it for long time, but there will be some trap you discover once in a while.


Language design is hard, but... Jesus Christ, Go really just threw away 50 years of CS language research/experience in the name of “simplicity”.


it's got all the latest features from the 1980s built right in


> ... the GC has to assume that anything forward of any given slice into the underlying array may become accessible in the future as there is legal Go that can access it.

This property complicates the semantics of slices.Delete() in an interesting way:

  It is out of scope of the current proposal to write anything to the original tail
https://github.com/golang/go/issues/63393


interesting catch, thanks for sharing. my poor memory is already racing thinking about cases where i may have left traps with the same assumption


> func Index[S ~[]E, E comparable](s S, v E) int {

After seeing this signature, I think that Go is giving up on it's simpleness principle.


Generics can be a bit of an eye sore, but go already had reflection & I recently was mucking around bigquery's internals full of `reflect` having to read through that, & it doesn't even get backed by a type checker

`func Index(s interface{}, v interface{}) int` both has to deal with incompatible types, & the function body is going to be anything but simple

(sure, without generics most people wouldn't even write that `func Index`, but ultimately there's plenty of libraries deep in reflect that'd greatly benefit from generics)

I've also been dealing with temporal.io using interface{} everywhere, got to a point where I wrote typed wrappers using generics to have type checking for signals


I was quite upset that they introduced generics. It is slow marching into C++ like look and feel and all the eyesore that it entails.


In C# I feel you needed generics because there was no resizable array without casting from object (the Pareto 80-20 use case) and later async and Task<T> but I don’t think these problems apply to Go so it could have done without it (maybe!). Non userspace generics may be where it is at to ease suffering in places.

As a language design though Elm remains remarkably simple with parametric polymorphism (aka Generics) but it needs other design choices to do so. Elm is the Go of Haskell :-)


what does the ~ do here anyway?


type Something []string

ensure that the underlying type is a slice


Why do you need a tilda for `S ~[]E`, but not for `E comparable`?

Both are type-constraining annotations, so why two syntaxes?


Because when you define a "type Foo []int" you're definig a new type Foo that is not the same type as []int.

So if the type parameter said "[]E" where E is a comparable then "Foo" wouldn't be allowed because Foo is not the same type as "[]int" (for that you'd need a type alias i.e "type Foo = []int". This is by design since it allows you to define new types you don't want to get accidentally used where you don't want them to.

When defining library functions that are truly generic you may want instead to let them be used on the "underlying type". For example, Sort library authors decided that generally you want it to just work on the underlying type and that's why they added ~ in front of the generic type. Otherwise every invocation of Sort you'd need to convert the type e.g. "slices.Sort([]int(myfoo))".

The other type parameter "E comparable" doesn't need the tilde because "comparable" is not an actual type but a type constraint.

A type constraint is an interface that defines the set of permissible type arguments for the respective type parameter and controls the operations supported by values of that type parameter.

You don't need a tilde because any type that implements that constraint will be accepted. Also any type implementing a constraint that embeds that constraint also works


It is like S: where S = []E where E: comperable.

I like the way Rust has it a lot more but it is still imperfect and feels arbitrary in ways.

I find myself unsure where to put the templatization parameters and it doesn't feel perfectly symmetrical.

Unsure if impl MyType<T> where T: X

is comperable to impl MyType<T: X> Sometimes it needs impl<T> MyType<T> where T: X


Agreed, the Rust syntax is no golden standard either - but at least it's regular; this one really weirds me out.


Guessing that E is the element, so is not a slice (or more accurately we don’t care if it is a slice or not!)


They shot themselves in the foot when they forced themselves to add generics based on bogus surveys.


Anything reusable must be generic; that’s a cost of static typing.


Why were the surveys bogus?


Because they disagreed and thus everyone else is wrong, apparently.


From the article:

> Several new functions (Insert, Replace, Delete, etc.) modify the slice. To understand how they work, and how to properly use them, we need to examine the underlying structure of slices. [My emphasis]

What this is telling us, although perhaps the author doesn't appreciate, is that this isn't an abstraction. If it was an abstraction we don't need to "examine the underlying structure of slices" in order to properly use them, we can just treat them as opaque. But in Go that's not an option, intimate knowledge of the inner workings is crucial to correct programming in Go.

And that's not necessarily a problem for its main authors, who have anyway got such an understanding, but it should have given pause to somebody who thought about how you can effectively teach others to use this language. For Go in particular that's a huge problem because it's intended as a language for software engineering and for that purpose you do want to rely heavily on abstraction.


I loved C++ for years, but the more I learned about it the less I wanted to deal with it anymore. The past year or two, Go increasingly tries to repeat that experience. Should have just stuck with Common Lisp I guess.


The ideal toolkit for building accidentally quadratic code.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: