Hacker News
On Swift’s array semantics (yinwang0.wordpress.com)
89 points by joeyespo on July 13, 2014 | 63 comments



According to the Swift language guide, Strings, Arrays, and Dictionaries are all copied on assignment to a variable. So wouldn't the author's primary assumption (that let a and var b refer to the same array) be false, making this whole issue moot?

https://developer.apple.com/library/prerelease/mac/documenta...

Assignment and Copy Behavior for Strings, Arrays, and Dictionaries

Swift’s String, Array, and Dictionary types are implemented as structures. This means that strings, arrays, and dictionaries are copied when they are assigned to a new constant or variable, or when they are passed to a function or method.

This behavior is different from NSString, NSArray, and NSDictionary in Foundation, which are implemented as classes, not structures. NSString, NSArray, and NSDictionary instances are always assigned and passed around as a reference to an existing instance, rather than as a copy.

NOTE The description above refers to the “copying” of strings, arrays, and dictionaries. The behavior you see in your code will always be as if a copy took place. However, Swift only performs an actual copy behind the scenes when it is absolutely necessary to do so. Swift manages all value copying to ensure optimal performance, and you should not avoid assignment to try to preempt this optimization.
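To make the documented behavior concrete, here is a short sketch in current Swift (not the 2014 betas); `other` and `appendingZero` are illustrative names, not anything from the guide. Assignment, argument passing, and returning all act as if a copy were made, even though the runtime may defer the real copy until a write.

```swift
// Assignment: `other` behaves as an independent array.
let original = [1, 2, 3]
var other = original
other[0] = 100      // only `other` changes; `original` is untouched

// Passing and returning: the parameter is a constant, so editing
// requires a local `var`, and the result is handed back as a value.
func appendingZero(_ values: [Int]) -> [Int] {
    var local = values
    local.append(0)
    return local
}

let extended = appendingZero(original)

print(original)   // [1, 2, 3]
print(other)      // [100, 2, 3]
print(extended)   // [1, 2, 3, 0]
```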


> According to the Swift language guide Strings, Arrays, and Dictionaries are all copied on assignment to a variable.

This might've been a recent change in docs. Quoting part of the same section from my copy of the Swift book downloaded a while ago (emphasis mine):

‘The assignment and copy behavior for Swift’s Array type is more complex than for its Dictionary type…

‘If you assign an Array instance to a constant or variable, or pass an Array instance as an argument to a function or method call, the contents of the array are not copied at the point that the assignment or call takes place. Instead, both arrays share the same sequence of element values. When you modify an element value through one array, the result is observable through the other.

‘For arrays, copying only takes place when you perform an action that has the potential to modify the length of the array.’

As an aside, it would be nice if Apple offered some kind of change log or a “what's changed” page for Swift docs (though to my knowledge they have made no promises regarding spec stability yet).
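The older semantics quoted here no longer exist, so they cannot be run directly. As a rough illustration only, the hypothetical `Storage` and `SharedElementArray` types below model what the quoted text describes: element writes are shared between aliases, and an operation that changes the length copies first. This is a sketch of the described behavior, not how the betas were actually implemented.

```swift
// Hypothetical model (not a real Swift API) of the semantics described
// in the older book text.
final class Storage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct SharedElementArray {
    private var storage: Storage

    init(_ elements: [Int]) { storage = Storage(elements) }

    var elements: [Int] { return storage.elements }

    // Element writes go straight to the shared storage, so every alias sees them.
    subscript(index: Int) -> Int {
        get { return storage.elements[index] }
        set { storage.elements[index] = newValue }
    }

    // A length-changing operation moves this alias onto fresh storage first.
    mutating func append(_ value: Int) {
        storage = Storage(storage.elements + [value])
    }
}

let a = SharedElementArray([1, 2, 3])
var b = a          // no copy: both share one Storage
b[0] = 100         // visible through `a` as well
b.append(4)        // `b` now gets its own storage
b[1] = 200         // no longer visible through `a`

print(a.elements)  // [100, 2, 3]
print(b.elements)  // [100, 200, 3, 4]
```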


Revision history for the main Swift book:

https://developer.apple.com/library/prerelease/ios/documenta...

Note that you have to delete the book in iBooks and download it again to get the new version.


Agree. Note that the quoted text is from the docs that were released with beta3 and is a big (and welcome) change from the original docs.


No. As stated in the article, the array was not copied when it was assigned to var b, as changes made to it via b also showed up in a.


Which was a bug in beta 2 that was corrected in beta 3. So the behavior in beta 3 matches the design quoted above. Hence kipple is correct.


It was a design bug, not an implementation bug.

Beta 2 behaviour matched the docs at that time but there was plenty of pushback about the behaviour in radars, Apple dev forums[0] and on the wider web[1].

Chris Lattner announced[0] the planned change on the forum on 19th June.

[0] https://devforums.apple.com/message/989944#989944

[1] http://blog.human-friendly.com/swift-arrays-too-swift-and-fl...


Thank you for the clarification.


I wish I could upvote this a hundred times; rarely have I seen so well-laid out a post on such a subtle topic. Just as I would form a thought like, "the author needs to explicitly break this down as an issue of variables vs values", they did so.

Part of the difficulty here (and part of why others are having a hard time understanding it) is that immutable arrays and mutable arrays are really two different types, and unlike many other types (which are inherently one or the other), you really want both. A Java analogy is that there are two different string types: String, which is immutable, and StringBuilder, which is mutable[0]; but this is a property of the value, not of the variable. (Other languages have similar string classes, and Swift might well also.) For C++ folks, the distinction between mutability of variable and mutability of value is shown in the placement of "const", i.e. "const foo *" vs "foo const *" (which of course leads us to "const foo const *", and the whole thing is a syntactic mess, but anyway), and object values which are mutable can behave differently from values which aren't even when the underlying type is the same. In your own code in any language, of course, you can design a class so that its objects are immutable simply by not giving the user any access to set instance variables; and if you wanted both mutable and immutable versions, even without particular language support, you could make a MutableFoo and an ImmutableFoo that are copy-constructible from each other.

What may confuse the issue here is the idea that we might want brief syntax for both kinds of array. But whatever the language designers decide, the OP is exactly right to point out that they really, really need to think this through and divorce mutability of values from mutability of variables.

[0] Technically three, because StringBuffer is also there, but that's not relevant to this discussion.
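In Swift terms, the two axes can be sketched with a pair of hypothetical classes, `ImmutableFoo` and `MutableFoo`, that are copy-constructible from each other as suggested. `let`/`var` controls whether the binding can be reassigned; the class design controls whether the object itself can be changed.

```swift
// Hypothetical immutable type: no settable state after initialization.
final class ImmutableFoo {
    let items: [Int]
    init(items: [Int]) { self.items = items }
    convenience init(_ other: MutableFoo) { self.init(items: other.items) }
}

// Hypothetical mutable type with the same contents.
final class MutableFoo {
    var items: [Int]
    init(items: [Int]) { self.items = items }
    convenience init(_ other: ImmutableFoo) { self.init(items: other.items) }
}

// Fixed binding, mutable value.
let letMutable = MutableFoo(items: [1, 2])
letMutable.items.append(3)                // OK: the object itself is mutable
// letMutable = MutableFoo(items: [])     // would not compile: `let` binding

// Reassignable binding, immutable value.
var varImmutable = ImmutableFoo(items: [1, 2])
// varImmutable.items.append(3)           // would not compile: `items` is a `let`
varImmutable = ImmutableFoo(letMutable)   // OK: rebinding a `var`

print(letMutable.items, varImmutable.items)   // [1, 2, 3] [1, 2, 3]
```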


Regarding c++, const foo * is the same type as foo const * .

For consistency, one should postfix the const to the thing you want to be immutable, so foo const * is a pointer to an immutable foo, foo * const is an immutable pointer to a mutable foo, and foo const * const is an immutable pointer to an immutable foo. It works pretty well, as far as I can see.


Argh, yes, you're exactly right, and now I'm too late to edit my post. I hate it when I get that backwards; sorry about that.


> rarely have I seen so well-laid out a post on such a subtle topic

I'll have to disagree. I thought the author was quite muddled in their thinking on values vs variables. One of the biggest misunderstandings in the article, and the reason why the proposed "solution" is worthless, is that all the examples discussed are of the form:

   let a = …
   var b = a
while in reality a much more common "assignment" situation is that an array gets passed as a procedure argument, which has consistent behavior in both the older and the newer Swift array semantics, but would turn into a mess with the proposed "fix".
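For comparison, here is the argument-passing case in current Swift, as a sketch with hypothetical function names: a parameter acts like a `let` copy no matter how the caller declared its array, and caller-visible mutation has to be requested explicitly with `inout`.

```swift
func sumAfterDoubling(_ values: [Int]) -> Int {
    // values[0] = 0        // would not compile: parameters are constants
    var doubled = values    // a local copy to work on
    for i in doubled.indices {
        doubled[i] *= 2
    }
    return doubled.reduce(0, +)
}

// Opting in to caller-visible mutation takes `inout` plus `&` at the call site.
func appendSentinel(_ values: inout [Int]) {
    values.append(-1)
}

let constantArray = [1, 2, 3]
var variableArray = [4, 5, 6]

let s1 = sumAfterDoubling(constantArray)   // 12
let s2 = sumAfterDoubling(variableArray)   // 30
appendSentinel(&variableArray)

print(s1, s2, constantArray, variableArray)   // 12 30 [1, 2, 3] [4, 5, 6, -1]
```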


> ...but would turn into a mess with the proposed "fix".

Can you explain why?


If we have, using the proposed syntax:

   var a = [1,2,3]
   var b = [: 2,3,4 :]

   func arrayConsumer(arg: [Int]) {}
Under the proposed semantics, arrayConsumer(a) would make arg an immutable variable referencing a mutable array (So assigning to arg[x] would be permitted), while arrayConsumer(b) would make arg an immutable variable referencing an immutable array (So assigning to arg[x] would be forbidden).


The proposed solution is not just a matter of syntax but of types: a and b would not be compatible. So a would have type [Int] and b would have type [:Int:]. You simply couldn't pass b into arrayConsumer without explicitly noting the change:

    arrayConsumer(b.immutableCopy())
Similarly, if you had a function that expected a mutable array, you would not be able to pass a in without making a mutable copy of it.

    func mutableArrayConsumer(arg : [:Int:]) { ... }
    var mutableA = a.copyMutable()
    mutableArrayConsumer(mutableA)
(You want to keep a reference to the mutable copy since a function that explicitly takes in a mutable array probably does something relevant to it.)

This solves the problem neatly and statically without any implicit copying or conversions to trip you up.
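Without new literal syntax, the two-type approach can be approximated in today's Swift. In this sketch, `FrozenInts` (a struct with no mutating API) and `MutableInts` (a class) are hypothetical wrapper types with explicit conversion methods, so passing one where the other is expected is rejected at compile time rather than silently converted.

```swift
// Immutable: a struct exposing no mutating operations.
struct FrozenInts {
    private let storage: [Int]
    init(_ storage: [Int]) { self.storage = storage }
    var elements: [Int] { return storage }
    // Explicit conversion: the caller gets an independent mutable object.
    func mutableCopy() -> MutableInts { return MutableInts(storage) }
}

// Mutable: a class, so every holder of the reference sees changes.
final class MutableInts {
    private(set) var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
    func set(_ index: Int, to value: Int) { elements[index] = value }
    func frozenCopy() -> FrozenInts { return FrozenInts(elements) }
}

func frozenConsumer(_ arg: FrozenInts) -> Int { return arg.elements.count }
func mutableConsumer(_ arg: MutableInts) { arg.set(0, to: 99) }

let frozen = FrozenInts([1, 2, 3])
// mutableConsumer(frozen)          // would not compile: type mismatch
let working = frozen.mutableCopy()
mutableConsumer(working)
let count = frozenConsumer(working.frozenCopy())

print(frozen.elements, working.elements, count)   // [1, 2, 3] [99, 2, 3] 3
```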


Yes, you could propagate the mutability to argument types (your syntax is the opposite of the one proposed by the OP, not that this matters to your argument), but that’s a lot of syntactic epicycles, all to deal with the top level of one collection type. Why is it so important to have this for Arrays, and not for Hashes, or for objects, for that matter?

The Swift view, as I understand it, is that there are some types (“struct” instances and scalars) passed with value semantics, and some types (“class” instances) passed with reference semantics, and Arrays were reclassified from being the latter to being the former. This seems to me a sensible way of targeting the common case (passing arrays in Cocoa is quite common, but passing mutable arrays is extremely rare) without burdening the language with a full set of C++ style const designations.
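A minimal sketch of that split, using two hypothetical types of identical shape. Swift's `Array` is itself a struct, and since beta 3 it behaves like `PointValue` here.

```swift
// Same shape, different semantics.
struct PointValue { var x = 0 }      // value type: copied on assignment
final class PointRef { var x = 0 }   // reference type: shared on assignment

let v1 = PointValue()
var v2 = v1        // independent copy
v2.x = 10

let r1 = PointRef()
let r2 = r1        // second reference to the same object
r2.x = 10

print(v1.x, v2.x)  // 0 10
print(r1.x, r2.x)  // 10 10
```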


This is an excellent post.

A meta comment (not on Swift): the number of new languages with C-like syntax and nothing revolutionary semantically is baffling. Do we really need to master yet another C-like syntax to do X where a library (or an existing language) would have sufficed? Is this the new way to create technological walls?


Why does the syntax matter at all? What would be the difference if Swift had exactly the same features and semantics but a different syntax?


Are you kidding? Swift has a TON of revolutionary stuff over C, including algebraic data types, optionals, destructuring pattern matching, and a lot more. Rust too. There are a ton of new "C-like" languages that are light years beyond C99.

Edit: Not to mention automated memory management!


I meant C-like syntax. The laundry list of features you mentioned is present in lots of existing mature languages (not C).

I don't understand the need to create something rickety without significant innovation over any relevant language (e.g. SML, Haskell, Racket), not just C.


Just to expand on one language, Rust, it possesses semantics that most other languages don't have. The presented problem with Swift would be impossible. As soon as you make the mutable variable, ownership of the object is transferred, and it becomes a compiler error to use the original variable after that. This is backed by the type system, encompassing both mutability and lifetimes, so it isn't a runtime enforcing the lifetime rules.

The more I learn about Rust, the more I realise how similar it is to Haskell. Even the error handling and IO uses monad-like constructions.


You complained about new C-like languages (C-like in syntax) not being any better than C. I mentioned some ways that new C-like languages (C-like in syntax) are much better than C. Now you're complaining that other languages already had those features. I don't contest that. Doesn't change the fact that your first comment was misguided.


You misunderstood my first comment. I meant new C-like-in-syntax have nothing revolutionary semantically across the board. This is a double blow as each new C-like syntax is difficult to master (compared to Lispy languages) and contributes nothing semantically new.


> You misunderstood my first comment. I meant new C-like-in-syntax have nothing revolutionary semantically across the board.

Are you saying that the new features in Swift/Rust/C#/etc. aren't revolutionary? Or that they're not "across the board"? Or that they're not semantic? Which part do you disagree with?

> This is a double blow as each new C-like syntax ... contributes nothing semantically new.

This is demonstrably false. Do you mean that you don't think the new semantics are revolutionary enough? Or "across the board" enough?

Either way I have to disagree with you. Automated memory management is a semantic game changer, and applies nearly everywhere. Same goes for type inference, and optionals. ADTs are huge too. If your argument is that these are not important semantic improvements over C, then IMO you could not be more wrong.


Why do you seem so bent on misunderstanding me? I am not talking about semantic improvements over just C!

E.g,

> Automated memory management is a semantic game changer

That has been there in other languages for decades.

By "across the board," I meant across all languages not just C-like languages.


Ah! That wasn't clear to me from your wording. I think I was especially confused because that means your argument seemingly boils down to "only the first language to have a feature is notable." (And I guess also that C-like languages are hard to learn?) But all languages stand on the soldiers of giants. I don't look down on Haskell simply because other languages had its features first.


Agreed. I am not looking down on Swift. It seems like all the human capital could go into making existing languages more awesome.


Haha...*shoulders


Because programmers in general don't like non-C syntax.


I agree.

Python, for example, has a list type (mutable), and a tuple type (immutable). Swift could simply adopt the same thing and use `(1, 2, 3)` for immutable arrays and `[1, 2, 3]` for mutable arrays. It would break backwards compatibility but it is a beta.

Every other language (Ruby, Python, PHP, JavaScript, for example) that uses `[]` as syntax for arrays always uses it as a mutable array type. I don't see why Swift shouldn't do the same, and provide a separate type and possibly new syntax for the immutable version.


The silly thing is they already have NSArray (immutable) and NSMutableArray (mutable) on the Objective-C side. So you'd expect anybody working on putting this stuff together to be familiar with this. I guess the four possibilities in Objective-C are:

    NSArray *const p;
    NSMutableArray *const p;
    NSArray *p;
    NSMutableArray *p;
("const NSArray *p" and so on makes no sense, as I understand it - the Objective side of Objective-C being not big on const.)
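A rough Swift analogue of those four declarations, assuming Foundation is available: `let`/`var` decides whether the name can be rebound, and the choice of `NSArray` or `NSMutableArray` decides whether the object can be changed. The correspondence is approximate, since Swift has no pointer-level const.

```swift
import Foundation

let p = NSArray(array: [1, 2])          // like NSArray *const p
let q = NSMutableArray(array: [1, 2])   // like NSMutableArray *const p
var r = NSArray(array: [1, 2])          // like NSArray *p
var s = NSMutableArray(array: [1, 2])   // like NSMutableArray *p

q.add(3)                   // OK: the object is mutable, even through a `let`
// p.add(3)                // would not compile: NSArray has no add(_:)
// q = NSMutableArray()    // would not compile: `let` binding
r = NSArray(array: [7])    // OK: rebinding a `var`
s.add(4)
s = q                      // OK: `s` and `q` now name the same object
s.add(5)                   // so this change is visible through `q` too

print(p.count, q.count, r.count, s.count)   // 2 4 1 4
```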


It may handle some of the subtlety of assignment and references, but there is a lack of understanding of Swift. If it is a general debate about the meaning of assignment, it isn't just about arrays but structs and dictionaries too.

If you want to opt in to a shareable mutable array it is easy to implement it by wrapping it in an object (class based) and passing that around.
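A minimal version of that wrapper, with `ArrayBox` and `appendAll` as hypothetical names: because `ArrayBox` is a class, every holder of the reference sees the same array.

```swift
// Hypothetical opt-in reference wrapper around a value-type array.
final class ArrayBox<Element> {
    var values: [Element]
    init(_ values: [Element]) { self.values = values }
}

func appendAll(_ box: ArrayBox<Int>, _ extra: [Int]) {
    box.values.append(contentsOf: extra)   // mutates the shared array in place
}

let shared = ArrayBox([1, 2, 3])
let alias = shared             // same object, not a copy
appendAll(alias, [4, 5])

print(shared.values)           // [1, 2, 3, 4, 5]
```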


I'd probably prefer that var b = a would just generate an error. You would have to explicitly copy or clone an immutable object in order to create the variable.


I'm glad I'm not the only one thinking this. What's the point of Swift being "safer" if this isn't an error (or at least a warning)? The standard library even includes -[mutableCopy], so it's an established convention.


agreed; this would be a nice exception and I'm sure in a few places the compiler could pick this up to add warnings. Alternatively get it to throw an exception if the array is modified. Much better than some [[]] notation which just makes me think of multidimensional arrays


This is certainly my favorite too. Implicitness and sugar are only worth their weight when their semantics are obviously illusory. That's not the case with this copy-on-assign business.


I don't understand: what semantics are illusory? Swift semantics for value types (including structs, arrays, dictionaries, and strings) are copy on assignment (and on return and argument passing). The fact that most of the copies are optimised out doesn't actually change the semantics.


If that's always made clear then it's probably fine. I haven't played around in Swift enough to know it to be true.

I'm used to the memory-free semantics where equality equivalates a mathematical variable with a value. Assignment doesn't have a place here and it works very well with immutable values. Only mutability needs the notion of assignment so that you have boxes to subject to mutation.


If I understand you correctly Swift does that. Only objects (defined as classes) are mutable reference types as far as semantics go.

Declaring constants with let has the effect you are talking about for equality. If you want to you can stick to that.

Where you declare a variable to contain a struct, dictionary or array with var, you can mutate it, but it only affects your copy, so it is equivalent to creating a new instance with the updated value and assigning it back to the same variable.

With the copy on write behaviour if there are no other variables or constants pointing to the original instance it can skip the copy for efficiency without affecting the semantics.
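The copy-on-write behaviour described here can be written by hand. `Buffer` and `CowInts` are hypothetical types (the standard library's `Array` does something similar internally, but this is not its implementation); `isKnownUniquelyReferenced(_:)` is the standard-library check for whether a class instance has a single strong reference.

```swift
// Hypothetical copy-on-write wrapper: share storage on assignment,
// copy only when writing to storage that someone else also holds.
final class Buffer {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct CowInts {
    private var buffer: Buffer
    private(set) var copiesMade = 0     // instrumentation for the demo only

    init(_ elements: [Int]) { buffer = Buffer(elements) }

    var elements: [Int] { return buffer.elements }

    mutating func set(_ index: Int, to value: Int) {
        // Copy only if another variable or constant still references the buffer.
        if !isKnownUniquelyReferenced(&buffer) {
            buffer = Buffer(buffer.elements)
            copiesMade += 1
        }
        buffer.elements[index] = value
    }
}

var first = CowInts([1, 2, 3])
first.set(0, to: 10)       // unique: written in place, no copy
var second = first         // shares the buffer
second.set(1, to: 20)      // shared: copies before writing

print(first.elements, second.elements)        // [10, 2, 3] [10, 20, 3]
print(first.copiesMade, second.copiesMade)    // 0 1
```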


Yeah that makes perfect sense. The notion I'm getting at is where you say "declare a variable to contain". If you have only immutable objects then you can dispense with the idea of containment and use only equivalence. As an equivalent semantics you can have containment and copy-on-write everywhere. Under immutability they are the same.

The trouble comes when you try to have "name-equivalence", "containment", and mutable things all at once. At that point copy-on-assign almost works uniformly, except when something like

    var a = x
    var b = a
does not refer to three separate containers but at most two, if x were immutable. At that point I feel like explicit copy-on-assign might be better.


I'm not sure how the behaviour isn't uniform (unless you are getting on to classes).

Taking your example, and assuming x has been declared as let x = [1,2,3], the code you show will behave exactly as if the array had been copied, but underneath the arrays would actually share a single bit of memory, with the copies skipped or postponed.

If the next line was b[0] = 5 then a copy would be made so x and a would still be [1,2,3] and b would be a separate array [5,2,3].

The behaviour will be exactly the same as copy on assign. You might be able to identify that it hasn't copied with the identity operator === but why would you need to know?
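The same scenario in current Swift, as a quick check of the values described:

```swift
let x = [1, 2, 3]
var a = x        // behaves as a copy (storage may be shared until a write)
var b = a
b[0] = 5         // the write gives `b` its own elements

print(x, a, b)   // [1, 2, 3] [1, 2, 3] [5, 2, 3]
```

One difference from 2014: in current Swift `===` only accepts class instances, so the identity check mentioned here no longer compiles for arrays.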


I'm assuming this is true

    var a = [1,2,3]
    var b = a
    b[0] = 10

    > a
    [10,2,3]
and so copy-on-assign only applies when assigning an immutable reference to a mutable one.


No, a still has the value [1,2,3] in that scenario. Copy-on-assign semantics apply to both constants and mutable variables.

Just tested in the playground:

  var a = [1,2,3]
  var b = a
  a == b        // true
  a === b       // true
  b[0] = 10
  a             // [1,2,3]
  b             // [10,2,3]
  b[0] = 1
  a == b        // true
  a === b       // false
  a             // [1,2,3]
  b             // [1,2,3]


Ah! Cool! My original point was always supposing that held. Being that it doesn't then I agree with Swift being consistent here.

Observing sharing using (===) is still dangerous, but presumably that can be saved by wrapping it in sufficient warning.


I wouldn't recommend using === in production (except for objects of classes) either but for this demonstration it let me show the copy on write without going to assembly.

Glad we've reached a common understanding, I wasn't quite sure that I hadn't missed something.


I don't see anything wrong with the beta3 semantics. I see it as problematic that (=) is so casually overloaded between "assign" and "equivalate".

> Now aliasing an array will cause the array to be copied. Isn’t that weird?

But assignment to a mutable reference is just that, assignment not equivalation. It is not merely aliasing. We do not desire the ability to freely replace "b" with "a" so why assume it holds?

Sure, this might differ from programmers' mental models, but past a certain point that is the programmer's fault. I expect all hammers to work roughly the same, yes, but I expect a laser cutter to work under very different principles. I must learn them.


I disagree with this post. The semantics are clear and simple: if you assign,maps to a function or return from a function any array you can treat it as a copy of the array. As a performance optimisation the copy usually won't happen as it won't be needed. If you are dealing with large enough arrays that it matters, wrap the array in an object that can be passed by reference.

I strongly opposed the original semantics and blogged about it. I am happy with the replacement.


Much too late to edit. I should have said "...assign, pass to a function..."


Then what is the point of having let vs var at all? I think the solution they came up with is the correct one given the design of swift and its use of let and var keywords.


let vs. var has to do with mutable bindings; value mutability within the bindings should be a separate concern.

Rust makes that exact difference[-1]: `mut` can be used either at the binding level

    let mut a: &int;
or at the type level:

    let b: &mut int = &mut 1;
Both are references (~pointers) to an int, but `a` is a mutable reference, it can be changed to point to a new value[0]:

    a = &2;
because b is an immutable reference it can't point to a new value and

    b = &mut 2;
is a compilation error:

     error: re-assignment of immutable variable `b`
however b's value can be modified in place:

    *b += 1;
whereas trying that with a:

    *a += 1;
is an error:

    test.rs:5:5: 5:7 error: cannot assign to immutable dereference of `&`-pointer `*a`
`a` is a mutable reference to an immutable value, `b` is an immutable reference to a mutable value.

[-1] and so does C, though it has the opposite defaults: you can have an

    int const *
— a mutable pointer to an immutable int (you can reassign the pointer but not change the pointee) — and an

    int * const
— an immutable pointer to a mutable int (you can't reassign the pointer but you can alter the pointee)

[0] the code does not work as-is because lifetimes, but close enough


I didn't appreciate that Rust makes that distinction, that really is quite elegant.

By way of comparison (to Rust and the original article), here is Scala:

    scala> val a = scala.collection.immutable.Set("apple", "orange")
    a: scala.collection.immutable.Set[String] = Set(apple, orange)

    scala> a = Set()
    <console>:8: error: reassignment to val
           a = Set()
             ^

    scala> val b = scala.collection.mutable.Set("cherry", "plum")
    b: scala.collection.mutable.Set[String] = Set(cherry, plum)

    scala> var c = b
    c: scala.collection.mutable.Set[String] = Set(cherry, plum)

    scala> b += "pear"
    res6: b.type = Set(pear, cherry, plum)

    scala> c
    res7: scala.collection.mutable.Set[String] = Set(pear, cherry, plum)

    scala> c = a
    <console>:10: error: type mismatch;
     found   : scala.collection.immutable.Set[String]
     required: scala.collection.mutable.Set[String]
           c = a
               ^
In other words: var versus val has to do with mutable bindings; mutable bindings allow you to mutate the value but not the type; object mutability within the bindings is governed by the object's own type (as the original author recommended for Swift).


The different types of mutability are also quite important for the concept of "borrowing" in Rust, which is simply taking a reference. The compiler ensures that ownership of the variable isn't transferred while any such reference is active, and only one mutable reference (of a mutable type) can exist at any one time. This means that bugs like that shown for Swift are impossible in Rust, because the language ensures no two things can ever modify it at the same time.


let/var solve a different problem than the question of whether the array itself is mutable. Look back at the OP, and in particular at the last example section, which lays out the behaviour of various combinations of let/var with mutable/immutable arrays. Each of these behaviours is desirable in different circumstances, and more importantly, by separating the concerns here it will make the language semantics easier to understand and apply to new examples.


I rather like the copy-on-write semantics of Beta3. And if the OP wants the Swift team to consider his/her comments, they should file a Radar like everyone else.


The syntax proposed has problems. Many languages allow you to declare nested data structures and a natural way to express an array of arrays in swift would be the following:

  let a = [[1,2,3],[6,7,8]]


Can you give an example for which the proposed syntax would be problematic? Your example would be written as:

    let a = [: [: 1, 2, 3 :], [: 6, 7, 8 :] :]
or

    let a = [ [: 1, 2, 3 :], [: 6, 7, 8 :] ]
etc., depending on what mutability you want.


This is not a good solution. This adds complexity (syntactically and conceptually) without adding any feature. The immutability is already implied by the 'let' and 'var'.

Additionally, specifying that the RValue "[1,2]" has mutability is meaningless. The expression "[1,2]" is represented as a constant until it is stored in a variable (it should be represented in the Mach-O constant data section). It is the variable which has the concept of mutability.

The proposal is similar to proposing the following declaration in C:

"int i = const 1;"

Which is meaningless, because the '1' can never be not-const


> The immutability is already implied by the 'let' and 'var'.

The immutability of the binding and that of the value are different concerns. let and var are about the binding's mutability, trying to munge them together only ends up in tears.

> The proposal is similar to proposing the following declaration in C:

No. The proposal is similar to the following C:

    int const * i;


The proposed syntax creates ambiguity between immutable arrays and mutable arrays containing only a single mutable array.


A solvable ambiguity (nested C++ template parameters, anyone?) and besides, it was just an example. I don't think the author really meant to "propose new syntax", but rather say "this is a real problem that needs solving and I'm sure the Swift team can come up with syntax for it if they want it to be succinct".


> A similar but more subtle problem happened to Java’s arrays regarding aliasing and subtyping a long time ago, which could crash the JVM.

Really? I don't remember that, and I was writing Java a long time ago. I'd be interested to be reminded about this bug.


The actual article title is currently: "Design mistakes in Swift language’s array"


We didn't change it. I assume the rewording was joeyespo's attempt to avoid misleading or linkbait content in the title. As the HN guidelines point out, that's fine.



