

Ruby: shallow copy surprise! - aarongough
http://thingsaaronmade.com/blog/ruby-shallow-copy-surprise.html

======
tptacek
This is a _lot_ of text and boilerplate for a very, very simple idea:

    
    
        >> x = [1,2,3]
        => [1, 2, 3]
        >> def go(z);   z[1] = 0;   end
        => nil
        >> go(x)
        => 0
        >> x
        => [1, 0, 3]
    

Mutable objects in Ruby are passed by reference, not value.

~~~
aarongough
It ended up being longer than I had planned. The main reason is that I
wanted to demonstrate that Object.clone doesn't work as I would have
anticipated, and to show a possible solution.

Greg Brown was kind enough to steer me onto the right track though, this is
mainly a design problem. I should be avoiding copying wherever possible.

------
phaedrus
The Io language has a similar gotcha - when I cloned one of my prototypes that
contained a List member, I was surprised to find that all the clones shared
the same List! It actually makes sense - it was the reference to the list that
was copied, but all the references pointed to the same list.

The problem with always defaulting to deep copy in a language where all object
slots are by reference is "where do you stop?" Do you copy all the objects
known to that object? If the object holds a data file or a resource, do you
deep copy that too? What about object graphs? What about two objects mutually
holding references to each other? What if the object holds a reference to the
global application object?
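
One way a class can answer those questions for itself is Ruby's `initialize_copy` hook, which `dup` and `clone` call on the new object. The sketch below is only an illustration: `Playlist`, `tracks` and `owner` are made-up names, and the choice of which member counts as "owned" is an assumption.

```ruby
# Hypothetical class for illustration; none of these names come from the thread.
class Playlist
  attr_reader :tracks, :owner

  def initialize(owner, tracks)
    @owner  = owner   # a reference to something the playlist does not own
    @tracks = tracks  # a list that is semantically part of the playlist
  end

  # Ruby calls initialize_copy on the new object during dup and clone.
  # Only the owned member is copied; @owner stays shared on purpose.
  def initialize_copy(source)
    super
    @tracks = source.tracks.dup
  end
end

owner    = "alice"
original = Playlist.new(owner, ["intro", "outro"])
copy     = original.dup

copy.tracks << "bonus"

p original.tracks                    # the original list is untouched
p copy.tracks                        # the copy has its own list
p copy.owner.equal?(original.owner)  # the owner object is still shared
```

Note that `tracks.dup` is itself shallow: the track strings inside the two lists are still the same objects, so the class author has to decide how far down to go.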

So the most common way to do it is default to only a shallow copy. It's up to
the user to define a deep copy if they need it because only the user knows
what members are semantically "part of" the parent object and what are
"pointers" to unowned objects.

------
tjarratt
I ran into the exact same problem when trying to solve some (not very
complicated in retrospect) errors in a body of text. Shallow copying a string
that I was applying permutations to and storing in an array meant that every
single string in the array was the same, so I could never have more than one
permutation without pulling some hacks like this.

I would be very interested in a cleaner variant on the marshal load hack for
non-primitives, or even some interesting doc/writeup on how this works.
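
For reference, the marshal round trip can be sketched roughly like this. `marshal_copy` is a hypothetical helper name; `Marshal.dump` serializes the object graph to a byte string and `Marshal.load` rebuilds a fresh graph from it, which is why nothing is shared afterwards. It only works for objects Marshal can serialize.

```ruby
# Hypothetical helper: deep copy by serializing and deserializing.
def marshal_copy(object)
  Marshal.load(Marshal.dump(object))
end

nested  = { "names" => [String.new("foo"), String.new("bar")] }
shallow = nested.dup            # new Hash, same inner Array and Strings
deep    = marshal_copy(nested)  # new Hash, new Array, new Strings

nested["names"].first.upcase!   # mutate a string in place

p shallow["names"].first        # sees the change, the string is shared
p deep["names"].first           # unaffected

# Not everything can be marshalled; Procs, for example, raise TypeError.
begin
  marshal_copy(-> { 1 })
rescue TypeError => e
  puts "cannot marshal a Proc: #{e.class}"
end
```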

------
swivelmaster
Are you kidding me? The "solution" is to serialize and deserialize? That's an
incredible waste.

edit: I thought it was not uncommon for higher level languages to pass arrays
and objects by reference, so this post wasn't particularly new or interesting.
Unless you're coming from PHP, which is, IMO, a nightmare because everything
is passed by value (by default) except objects.

~~~
aarongough
I have actually come to Ruby from PHP, maybe that's where I got my incorrect
assumptions from, it definitely wouldn't be the first bad habit PHP has left
me with.

------
aarongough
I'm actually looking for feedback on this. I know there are quite a few Ruby
hackers hanging round HN.

Is there something I'm missing here? Some idiom that lets you side-step this
problem? I'm questioning it because this problem/solution seems very much at
odds with the elegance and thoroughness of the rest of the language.

~~~
jcoglan
You're missing a few details, especially around "primitive" data types.
There's not really any such thing in Ruby - everything is an object, but some
objects - numbers, booleans, nil - are immutable.

When you copy an object, all you do is copy its set of instance variables,
which are just references to other objects. For an array, the instance
variables are its set of indexes, which again are just references. Copying an
array just means making a new list of references, but the objects they point
to remain unmodified and uncopied.

Consider:

    array = ["foo"]
    copy = array.dup

array and copy are independently mutable - modifying the index in one does not
affect the indexes in the other - but they still both contain references to
the single string "foo". Thus:

    copy.first.gsub!(/foo/, "bar")

modifies the string referenced by copy, which is the same string referenced by
array. So array becomes ["bar"].

If you want a true deep copy, do something like this:

    def deep_copy(object)
      case object
      when Array
        object.map { |item| deep_copy(item) }
      when Hash
        object.inject({}) do |hash, (key, value)|
          hash[deep_copy(key)] = deep_copy(value)
          hash
        end
      # handle other data structures if need be
      else
        object.respond_to?(:dup) ? object.dup : object
      end
    end

~~~
jcoglan
Apologies for the formatting screw-up. The deep_copy function is here:

<http://gist.github.com/407741>

~~~
aarongough
No worries, thanks for taking the time to write it out. I'm glad that I wrote
the post (and that I'm getting hammered a little for my assumptions) because
making mistakes is probably the only way I'm going to get a deeper
understanding of the language...

One thing that confused the issue a little for me is the fact that some
objects in Ruby are actually only really 'pretend objects', i.e.:

    
    
      >> test = 4
      => 4
      >> test2 = 4
      => 4
      >> test.object_id
      => 9
      >> test2.object_id
      => 9
    

I don't know enough about the deeper parts of the language to know what else
there is that's like this though...
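
As a rough illustration of what else behaves this way: small integers, symbols, `nil`, `true` and `false` always resolve to the same object, while two separately built strings do not. The variable names below are made up for the example.

```ruby
small_a = 4
small_b = 4
p small_a.equal?(small_b)   # small integers: one shared object
p :name.equal?(:name)       # symbols: one object per name
p nil.equal?(nil)           # nil, true and false are singletons

# Strings built separately are equal in value but distinct objects.
str_a = String.new("four")
str_b = String.new("four")
p str_a == str_b            # value equality
p str_a.equal?(str_b)       # identity
```

Since none of the shared values above can be modified in place, the sharing is not observable the way it is with strings, arrays and hashes.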

~~~
bschaefer
This is a performance optimization for common (read: integer) numbers:
<http://ruby-doc.org/core/classes/Fixnum.html>

    
    
      >> ((1 << 30) - 1).class
      => Fixnum
      >> ((1 << 30)).class
      => Bignum
    
      >> ((1 << 30) - 1).object_id
      => 2147483647
      >> ((1 << 30) - 1).object_id
      => 2147483647
    
      >> (1<<30).object_id
      => 166070
      >> (1<<30).object_id
      => 161200

------
mr_justin
See Object#clone and Object#dup

<http://ruby-doc.org/core/classes/Object.html#M000351>
<http://ruby-doc.org/core/classes/Object.html#M000352>

~~~
aarongough
The thing is that neither of those actually solve the problem. They were the
first thing that I tried...

~~~
mr_justin
Sorry, I wasn't trying to say they solved the problem. They say exactly what
they do and don't do.

------
klochner
He referenced not being alone in his assumptions, but this is basic ruby:

    
    
       >> a = b = [1,2]
       => [1, 2]
       >> c = a.dup
       => [1, 2]
       >> a==b
       => true
       >> a==c
       => true
       >> a.equal?(b)
       => true
       >> a.equal?(c)
       => false
    

Here's a better post on the topic:
<http://kentreis.wordpress.com/2007/02/08/identity-and-equality-in-ruby-and-smalltalk/>

~~~
aarongough
Well in fairness, I did mention that the realization made me feel stupid :-p

It's not necessarily obvious if you're coming from other languages that don't
behave this way. That being said I'm surprised that I had never run into this
problem before. I think that most of the time I had the right idea with not
copying objects, but in this case I had memoized a method call and the Hash
'cache' was getting corrupted which was what brought it to my attention... A
slightly more unusual situation.
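
A minimal sketch of how a memoized Hash can get corrupted this way, assuming the method hands the cached object straight back to the caller. `Lookup`, `shared` and `copied` are hypothetical names, not code from the blog post.

```ruby
# Hypothetical memoizing class; the names are illustrative only.
class Lookup
  def initialize
    @cache = {}
  end

  # Hands back the cached Hash itself, so every caller shares it.
  def shared(key)
    @cache[key] ||= { "tags" => ["default"] }
  end

  # Hands back a Marshal round-trip copy, leaving the cached value private.
  def copied(key)
    Marshal.load(Marshal.dump(shared(key)))
  end
end

lookup = Lookup.new

lookup.shared(:a)["tags"] << "oops"   # mutates the cached value in place
p lookup.shared(:a)                    # the cache now contains "oops"

lookup.copied(:b)["tags"] << "local"  # mutates only the caller's copy
p lookup.shared(:b)                    # the cache is unchanged
```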

------
crazydiamond
This is not a surprise. I have the same issues with Strings too. A String is
passed around your app and someone changes it - capitalizes/chomps etc. The
String is changed throughout the app! You have to dup() it, if you want to
ensure no one changes it.

This means if you have classes returning Strings, such as first_name,
last_name, address etc, your getter should return a dup() if you want to
ensure no accidental change to it. That sucks, if you ask me.

~~~
kunley
capitalize, chomp are non-destructive contrary to their "bang" counterparts:
capitalize!, chomp!

Lots of methods are non-destructive and it's cleaner to use them instead of
artificially calling dup().
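
To make the difference concrete, here is a short sketch with made-up values. `String.new` is used only so the strings are mutable whatever the frozen-literal setting is.

```ruby
original = String.new("ada lovelace\n")
cleaned  = original.chomp.capitalize   # each call returns a new String
p original                             # still "ada lovelace\n"
p cleaned                              # "Ada lovelace"

shared = String.new("grace hopper\n")
other  = shared                        # second reference to the same String
other.chomp!                           # bang methods modify the receiver
other.capitalize!
p shared                               # "Grace hopper"

# One more gotcha: bang methods return nil when nothing changed.
p String.new("abc").chomp!             # nil
```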

~~~
crazydiamond
I cannot remember the exact cases, but the point is that you have an API on
one hand, and the user of an API on the other.

The API returns strings to you, the user at some point needs to (say) perform
multiple operations on that String. Say, multiple gsubs. So rather than create
a new string with each, he uses a gsub!.

I've actually once had a discussion about this on ruby-forum when I faced this
issue. We talked of a copy-on-write string. But I did not want to change my
entire application.

It is inefficient for the API to keep returning dup()'ed strings. otoh, if the
user accidentally changes the string (which she can), your API can throw an
error or malfunction.
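
One possible middle ground, which is not something proposed in the thread, is to freeze the string once instead of dup()'ing it on every call. The sketch below assumes a hypothetical `Person` class: an accidental in-place edit then fails loudly, and callers who need a modified string use the non-bang methods.

```ruby
# Hypothetical API class for illustration.
class Person
  attr_reader :first_name

  def initialize(first_name)
    @first_name = first_name.dup.freeze  # one private, frozen copy
  end
end

person   = Person.new("ada")
rejected = false

begin
  person.first_name.gsub!(/a/, "A")      # in-place edit on a frozen String
rescue => e
  rejected = true
  puts "in-place edit rejected: #{e.class}"
end

edited = person.first_name.gsub(/a/, "A")  # non-bang call returns a new String
p edited
p person.first_name
```

The getter returns the same frozen object every time, so there is no per-call copying cost; the trade-off is that callers who relied on gsub! have to switch to gsub.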

------
jheriko
Just to reinforce the other comments - this is pretty much expected behaviour
in high level languages.

Always read the docs.

------
aarongough
I actually just had a chat via email with Greg Brown (author of Ruby Best
Practices). I updated the blog post to get to the point quicker and I included
his take on the matter...

------
swannodette
As a side note, this is one of those fundamental language problems that Clojure
solves without sacrificing performance.

------
jcapote
Great read, although I've never been bitten by this particular "bug".

~~~
bad_user
It's not a bug ... the references are copied by value (don't know what the big
deal is ... it's the same thing happening in Java), the cloning done on the
basic collections is shallow (again, same thing happening in many other
languages) and the basic types like Fixnum are immutable.
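
A small sketch of what "references copied by value" means in practice; the method names `reassign` and `mutate` are made up for the example. Rebinding the parameter inside a method does not affect the caller, while mutating the object it refers to does.

```ruby
def reassign(list)
  list = [9, 9, 9]   # rebinds the local name only
  list
end

def mutate(list)
  list[1] = 0        # changes the object the caller also refers to
  list
end

x = [1, 2, 3]

reassign(x)
after_reassign = x.dup
p after_reassign     # [1, 2, 3]

mutate(x)
p x                  # [1, 0, 3]
```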

When learning a new language, after playing around with code-snippets I then
usually read the language's reference.

