
Python: copying a list the right way - joeyespo
http://henry.precheur.org/python/copy_list
======
nikcub
Before you jump into your code, grep it, and change every instance of [:] into
list or copy, know that it isn't that easy. Most Python projects will use all
four common variations of copying a list or object. Here they are with
benchmark times[1]:

    
    
        b = a[:]           0.039ms
        b = list(a)        0.085ms
        b = copy(a)        0.187ms
        b = deepcopy(a)   10.592ms 
    

Use the first method for short lists, e.g. function or system args, where you
know you have a list. The Python manual suggests this method as the
fastest/best way to copy a sequence[2].

The type constructor list will convert any sequence into a list and will
preserve order. If you pass it a list, all it does is return the sequence
using the slice operator anyway[3]. It is slower because of the type checking,
but it is implemented in C. So you can think of list() as just [:] with a type
cast - no need to call it again if you know you have a list.
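
A minimal sketch of the difference, assuming Python 3 (the variable names are only illustrative): both forms make a shallow copy of a list, but a slice keeps the type of whatever it is given, while list() always hands back a list.

```python
a = [1, 2, 3]
t = (1, 2, 3)

b = a[:]        # slice of a list is a new list
c = list(a)     # list() of a list is also a new list

# Both are copies, not aliases.
assert b == a and b is not a
assert c == a and c is not a

# A slice preserves the input type; list() always returns a list.
assert type(t[:]) is tuple
assert type(list(t)) is list

# list() also accepts iterables that cannot be sliced at all.
squares = list(x * x for x in range(4))
```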

copy and deepcopy are implemented in python, and are generic functions that
attempt to sniff the type of the object to be copied. They will use the
__copy__ magic[4] of the object if it exists, so you can override it in your
objects with return self[:]. You need to use these if you have a generator, a
list of non-basic types (such as lists of lists, or lists of tuples, or lists
of objects). Both functions use a module-level cache and deepcopy will iterate
and apply copy.
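
As a rough illustration of the __copy__ hook mentioned above (Python 3; the Playlist class and its tracks attribute are made up for this sketch), copy.copy() will call __copy__ when an object defines it:

```python
import copy

class Playlist:
    """Hypothetical container used only to illustrate __copy__."""

    def __init__(self, tracks):
        self.tracks = tracks

    def __copy__(self):
        # Copy the underlying list so the two playlists do not share it.
        return Playlist(self.tracks[:])

original = Playlist(["intro", "verse", "outro"])
clone = copy.copy(original)   # dispatches to Playlist.__copy__
clone.tracks.append("encore")
```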

There is very little performance degradation by aliasing copy to deepcopy and
using it everywhere, although it could save you time by catching bugs. (Edit:
scratch that, I got my benchmark wrong - deepcopy will still be slow even if
you pass it a shallow list, see comment below, thanks tedunangst)

Read the source of copy and deepcopy so you can understand them and can
implement your own custom version for more advanced types. Find the file:

    
    
        >>> import copy
        >>> copy.__file__
        '/usr/local/python/2.6/lib/copy.pyc'
    

Each of these methods has its own use case. If you grep through a well
implemented project such as Werkzeug[5] you can find how each is used
efficiently. For example, [:] is used when you know you have a list, such as
template variables. list() is used to force a value into a list, e.g. before
these vars get to other objects, and copy() is used on custom data types: for
making a copy of environ (which can contain almost anything) and for copying
the routing table (which cannot be trusted to be a list).

[1] Benchmark times taken from:
<http://stackoverflow.com/questions/2612802/how-to-clone-a-list-in-python/2612990#2612990>,
which I had bookmarked as a reference

[2] <http://docs.python.org/faq/programming.html#how-do-i-copy-an-object-in-python>

[3] <http://docs.python.org/faq/programming.html#how-do-i-convert-between-tuples-and-lists>

[4] <http://www.brpreiss.com/books/opus7/html/page85.html>

[5] <https://github.com/mitsuhiko/werkzeug>

~~~
tedunangst
I'm puzzled by your comment that there's very little degradation using
deepcopy everywhere. Your numbers demonstrate quite the opposite.

~~~
nikcub
Thanks for noticing - I got my numbers completely off. When I ran the
benchmark on my machine it turns out it was still using the original copy.

One way to catch deepcopy bugs might be to create an autocopy function which
can detect if it is a 'shallow' object and use copy, or if not use deepcopy.

I am going to try and write an implementation that doesn't slow it down too
much. It might be worthwhile since copy bugs are so common in Python projects.
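
One possible shape for such a function, as a sketch only (Python 3; the name autocopy and the set of types treated as "atomic" are assumptions, not an existing API): use copy.copy() when every element is an immutable scalar, and fall back to copy.deepcopy() otherwise.

```python
import copy

# Types assumed safe to share between copies because they are immutable.
_ATOMIC = (int, float, complex, bool, str, bytes, type(None))

def autocopy(obj):
    """Shallow-copy flat containers, deep-copy everything else."""
    if isinstance(obj, (list, tuple, set, frozenset)):
        if all(isinstance(item, _ATOMIC) for item in obj):
            return copy.copy(obj)
    return copy.deepcopy(obj)

flat = [1, 2, 3]
nested = [[1, 2], [3, 4]]

flat_copy = autocopy(flat)       # takes the cheap copy.copy() path
nested_copy = autocopy(nested)   # falls back to copy.deepcopy()
nested_copy[0].append(99)        # does not touch nested
```

The scan over the elements is itself linear in the size of the container, so this would only pay off where deepcopy is much slower than the check.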

~~~
lubutu
I wonder whether it would be possible to optimise the Python interpreter to
make deep copies copy-on-write. I suppose that would involve a lot of work for
relatively little gain.

~~~
nikcub
I remember that being mentioned in a PEP somewhere but it never got
implemented. It might be worth implementing copy in C with copy-on-write to
bring some of those benchmark numbers down.

------
mgrouchy
If you are copying many lists, or copying many large lists, you still want to
use the slice method ( new_list = old_list[:] ) as it is faster than list().
It is also the fastest method of copying lists if you consider copy.copy() and
copy.deepcopy() as well.

The only caveat here is if you are copying a list of lists, you have to use
copy.deepcopy() if you want the lists inside your lists to actually be copied.
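
A small sketch of that caveat (Python 3, illustrative names): a slice copies only the outer list, so the inner lists are still shared until copy.deepcopy() is used.

```python
import copy

grid = [[0, 0], [0, 0]]

shallow = grid[:]              # new outer list, same inner lists
deep = copy.deepcopy(grid)     # new outer list and new inner lists

grid[0][0] = 1                 # mutate an inner list through the original

shallow_sees_change = shallow[0][0] == 1   # True: inner list is shared
deep_sees_change = deep[0][0] == 1         # False: inner list was copied
```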

~~~
mrmekon
Tested copy(), [:], and list():

    
    
      >>> t1 = timeit.Timer('copy.copy(orig)','import copy;import random;orig = [random.randint(0,255) for r in xrange(100000)];')
      >>> t2 = timeit.Timer('orig[:]','import copy;import random;orig = [random.randint(0,255) for r in xrange(100000)];')
      >>> t3 = timeit.Timer('list(orig)','import copy;import random;orig = [random.randint(0,255) for r in xrange(100000)];')
    
      >>> print t1.timeit(10000)/10000
      0.00036607401371
      >>> print t2.timeit(10000)/10000
      0.000416543197632
      >>> print t3.timeit(10000)/10000
      0.000372415995598
    

Probably not the world's most ideal testing here. I have no idea if/how Python
caches, for instance. But slice notation continually comes out the slowest in
this crummy example.

~~~
mgrouchy
Not at my computer right now to whip up a simple benchmark (I'm on my phone),
but there is a pretty exhaustive benchmark here:
<http://stackoverflow.com/questions/2612802/how-to-clone-a-list-in-python>

~~~
mrmekon
With smaller lists, the results in my test change:

    
    
      >>> t1 = timeit.Timer('copy.copy(orig)','import copy;import random;orig = [random.randint(0,255) for r in xrange(10)];')
      >>> t2 = timeit.Timer('orig[:]','import copy;import random;orig = [random.randint(0,255) for r in xrange(10)];')
      >>> t3 = timeit.Timer('list(orig)','import copy;import random;orig = [random.randint(0,255) for r in xrange(10)];')
    
      >>> print t1.timeit(10000)
      0.0183310508728
      >>> print t2.timeit(10000)
      0.00397896766663
      >>> print t3.timeit(10000)
      0.00760293006897
    

Which is similar to those results.

This implies more setup cost for copy() and list(), but after that they're
faster.
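
For anyone re-running this on Python 3, where xrange and the print statement are gone, a comparable sketch might look like the following (the helper name time_copies and the chosen sizes are arbitrary). The absolute numbers depend heavily on the machine and interpreter version, so none are shown here.

```python
import timeit

def time_copies(size, repeat=1000):
    """Return seconds per call for three ways of copying a list of ints."""
    setup = (
        "import copy, random; "
        f"orig = [random.randint(0, 255) for _ in range({size})]"
    )
    statements = {
        "copy.copy(orig)": "copy.copy(orig)",
        "orig[:]": "orig[:]",
        "list(orig)": "list(orig)",
    }
    return {
        label: timeit.timeit(stmt, setup=setup, number=repeat) / repeat
        for label, stmt in statements.items()
    }

if __name__ == "__main__":
    # Small and large lists, as in the two tests above.
    for size in (10, 100000):
        print(size, time_copies(size, repeat=200))
```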

~~~
hagy
If not for noise, list() should always outperform copy() as copy() just calls
list() internally (specifically type(l)(l)), and also incurs the cost of
several interpreted function calls.

Also, the minor differences in slice vs. list() for large lists are likely
platform dependent and highly sensitive to the details of branch prediction
and cache.

------
jrockway
_a[:] feels a bit too much like Perl._

The irony being that in Perl, a list copy looks like this:

    
    
       @dest = @src

~~~
jonathansizz
Yes. And why is there this widely-held obsession that people who don't know
your language should be able to tell what your code does (i.e.,
'readability')? When will this ever be relevant in the real world?

I don't know many non-Python programmers who like to sit down of an evening,
fire up their e-reader, and peruse a few hundred lines of beautiful Python
code.

~~~
dextorious
"""And why is there this widely-held obsession that people who don't know your
language should be able to tell what your code does (i.e., 'readability')?
When will this ever be relevant in the real world?"""

Em, you got it wrong. Readability is not about people "not knowing your
language". It's about people knowing your language and having to read your
code at a later point.

The problem with a language with poor readability is that it is hard to read
even _your own_ code written in it, because the syntax is ambiguous and funky
and it involves a large mental overhead.

~~~
bartl
Well I don't think that in

    
    
        b = list(a)
    

it's clear in any way that the purpose of "list" is to make a copy.

~~~
dextorious
No, but it's clear that it returns a list out of "a".

~~~
philh
My guess upon seeing list(a) for the first time would have been that it
returns [a], which is worse than not knowing what it does.

Possibly this is because I've known lisp longer than python.

~~~
lubutu
I agree. I write Python all the time, and I would have expected something like
list(*a).

------
raymondh
I also like list(s) for its clarity and universality (the same technique works
with dict(m) and deque(d) for example).

The s[:] gets called faster (builtin syntax dispatches directly) than list(s),
which requires a global lookup. Once called, though, they both run the same
underlying code and are therefore equally fast when it comes to the actual
copying.

In Python 3.3, we're adding list.copy() and list.clear() because so many
people were having issues with the [:] notation for copying and clearing.
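
To illustrate both points (a sketch assuming Python 3.3 or later, where list.copy() and list.clear() exist): the constructor idiom works the same way across the built-in containers, and the new methods spell out the intent of the [:] forms.

```python
from collections import deque

s = [1, 2, 3]
m = {"a": 1}
d = deque([1, 2, 3])

# The constructor idiom is uniform across container types.
s2 = list(s)
m2 = dict(m)
d2 = deque(d)

# list.copy() is a named equivalent of s[:].
s3 = s.copy()

# list.clear() empties the list in place, like del s[:].
alias = s
s.clear()
```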

------
phugoid
Unless you actually want to turn some other type of iterator into a list, why
wouldn't you use python's generic copy?

    
    
      from copy import copy
      b = copy(a)
    

This has the advantage of being advertised as a shallow copy, so the intent of
the operation and its effects are clear and well-documented.

I've been working with Python for a year now, and I don't claim to have swum
its depths.

~~~
reacocard
Because copy won't necessarily work the same way on things that are not lists,
which removes some of the flexibility afforded by python's duck typing. Unless
you have good reason to require a list specifically, why should you?

~~~
tuxcanfly
But don't the alternatives also depend on list slicing? Heck, the title is
`copying a list`. If you have a different datatype, it is the business of that
type to specify how to copy using __copy__, isn't it?

~~~
reacocard
Imagine a case like this:

    
    
      def example(input_data):
        l = list(input_data)
        # code that uses list-specific stuff and returns a result
    

The function 'example' doesn't want to touch the original input data, so it
needs to make a copy of it. It also contains code that assumes operation on a
list, so the copied value needs to support list-like operators. If you assume
input_data is a list, you can use [:] or copy() to copy just fine, but if
input_data is NOT a list then you cannot feed example a generator or some
other list-like object or iterable and know for sure that it is going to work.
By explicitly converting to list, you can take anything that implements
__iter__, and then safely assume that the rest of your code will be working
with lists. This adds a good bit of extra flexibility to the function and
can make it much easier and/or cleaner to use.

Obviously as with anything the choice of list copy method is situation-
dependent. Using [:] makes sense if you can guarantee the input is a list and
you need maximal speed. Using copy() makes sense if you just want a copy of
the input object and don't specifically care that the copy is itself a list.
Using list() makes sense if you want to be able to take in all kinds of input
values and be assured that the copy is a list. Use what is best for the
situation at hand.
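
A fleshed-out version of that function might look like this (a sketch in Python 3; the sorting body is just a stand-in for "list-specific stuff"):

```python
def example(input_data):
    """Return the items of input_data sorted, without touching the input."""
    items = list(input_data)   # copy and convert in one step
    items.sort()               # list-only, in-place operation
    return items

original = [3, 1, 2]
from_list = example(original)                 # input is a list
from_tuple = example((3, 1, 2))               # input is a tuple
from_gen = example(n for n in (3, 1, 2))      # input is a generator
```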

------
perlgeek
As somebody who doesn't know python, I find a = list(b) to be not very
intuitive. b is a list already, why call list() on it?

So I don't think you gain any readability over the slicing syntax.

~~~
ajanuary
Copy constructors are a much more common idiom in programming than taking a
slice of an entire list.

~~~
ajanuary
For instance, Java, .NET, ruby, C++ STL and Objective C. Most languages use
them.

<http://download.oracle.com/javase/1.4.2/docs/api/java/util/ArrayList.html#ArrayList(java.util.Collection)>

<http://msdn.microsoft.com/en-us/library/fkbw11z0.aspx>

<http://www.ruby-doc.org/core-1.9.3/Array.html>

<http://www.cplusplus.com/reference/stl/list/list/>

<http://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSArray_Class/NSArray.html>

------
pmiller2
So, his argument is that copying using

    
    
        a = b[:]
    

is cryptic? I'd say it follows directly from how slice notation works. If you
omit the first index, the slice goes to the beginning of the list. If you omit
the second index, the slice goes to the end of the list. So, if you omit both
indices, everything works as you'd expect it to _if you understand slice
notation_. Anyone working with lists in Python should understand slice
notation, so I fail to see the problem here.
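
Spelled out as code, those three cases might look like this (a sketch; the list contents are arbitrary):

```python
letters = ["a", "b", "c", "d", "e"]

head = letters[:2]        # first index omitted: start from the beginning
tail = letters[2:]        # second index omitted: run to the end
everything = letters[:]   # both omitted: the whole list, as a new object
```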

~~~
DrewG
His argument is that Python beginners won't be familiar with slice syntax, and
if they encounter it in someone else's code they won't understand.

~~~
pmiller2
They need to learn it, then. Honestly, anyone who read and comprehended my
first post now understands slice notation (with the exception of the more
rarely used stride argument), so it's not like it's a ton of effort. Unless
the post author is claiming that slice notation itself is cryptic, I don't
even understand where he's coming from.

~~~
ceol
I happen to agree that it's more intuitive to use list() instead of slice
notation, but back when I had no idea about slice notation and first saw it,
it caused me to wonder what else could be done with it. After some reading, I
learned a lot more about different slice tricks than I would have if I were
just presented with list().

------
zobzu
I'm always wondering why people have to write 3 pages instead of getting to
the point. You know like, just a list.

- a = b <= doesn't copy! it's just a reference

- a = list(b) <= works!

- <other methods if you like>

------
sibsibsib
using list() in this manner feels a bit weird to me.

If you want to be truly explicit, why not use the built in copy module?

eg:

    
    
        from copy import copy
        b = copy(a)   #or deepcopy(), depending on your needs
    
    

*edit: I should note that the docs for copy suggest using the slice operator.

------
irahul

        b = a[:]
    

This happens to be the fastest, and the language-recommended, way to copy lists.

I really don't buy the argument about languages being intuitive to people who
don't know the language. Languages should be optimized for people who know the
language. I think Matz has mentioned it somewhere regarding "principle of
least surprise" and Ruby.

~~~
jablan
Knowledge of a language is not a Boolean value. I use Ruby on a daily basis, and
occasionally read (and to a lesser extent write) Python, and I knew what
"list(a)" does instantly, and I could only guess about "a[:]". Fortunately,
both Python and Ruby have very few such gotchas, that's why I love them both.

------
swanson
This technique is also useful for iterating over a list and conditionally
removing items. If you don't slice it, you are changing the list as you
iterate over it and you get unexpected results (like skipping items).

Or you could just use list comprehensions :)
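
A sketch of all three variants (Python 3; the function names are made up for illustration), showing how mutating the list being iterated skips items while a slice copy or a comprehension does not:

```python
def drop_evens_buggy(nums):
    """Mutates nums while iterating over it; some items get skipped."""
    for n in nums:
        if n % 2 == 0:
            nums.remove(n)
    return nums

def drop_evens_with_copy(nums):
    """Iterates over a slice copy, so removals cannot disturb the loop."""
    for n in nums[:]:
        if n % 2 == 0:
            nums.remove(n)
    return nums

def drop_evens_comprehension(nums):
    """Builds a new list instead of mutating the old one."""
    return [n for n in nums if n % 2 != 0]

data = [1, 2, 2, 3, 4, 4, 5]
buggy = drop_evens_buggy(list(data))        # [1, 2, 3, 4, 5]
fixed = drop_evens_with_copy(list(data))    # [1, 3, 5]
rebuilt = drop_evens_comprehension(data)    # [1, 3, 5]
```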

------
sygeek
Despite being a non-programmer, I found this article really user-friendly.

------
Luyt
I went through my code and discovered that I seldom copy lists. I _do_ often
make selections from lists in a new list, i.e:

    
    
        b = [x for x in a if somecondition(x)]

------
ricardobeat
This is analogous to javascript:

    
    
        b = a.slice()
    

vs

    
    
        b = Array.apply(null, a)
    

(the second is fastest and arguably clearer)

------
dangoldin
Something else that's interesting is reversing a list but not in place:
x[::-1]

The built-in reverse() method will do it in place.

~~~
sdevlin
There is also the top level reversed() function that takes some sequence and
returns a reversed iterator.
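
Side by side, the three approaches behave roughly like this (a sketch in Python 3 with arbitrary values):

```python
x = [1, 2, 3]

backwards = x[::-1]          # new reversed list; x is unchanged
lazy = reversed(x)           # iterator over x in reverse, no copy yet
from_iterator = list(lazy)   # materialize the iterator into a list

y = x[:]
result = y.reverse()         # reverses y in place and returns None
```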

------
chrisledet
What's wrong with `b = a + []`

~~~
keeperofdakeys
You are depending on an undefined side-effect, specifically that adding two
lists returns a new list. This is a very bad habit, especially if you move to
a different language that doesn't have such behaviour. When you are
programming, you should be writing what you _want_ to do, not depending on
side-effects to do it for you. People reading your code (including yourself in
the future) would have to spend a lot more time working out what the code does
otherwise. The only real exception to this is C on embedded hardware, where
you really need to use lots of these tricks.

What if you wrote `b = b + []`, some languages might just append to b, and not
create a new list (python seems to create a new list). Slices can still be
seen as having the same problem. Really you should be using the copy module or
the list constructor, which have the implicit guarantee of a new list.

~~~
jrockway
This argument is silly. Python is not some other programming language, so when
you're writing Python, it doesn't matter what other programming languages do.
If you use the same idioms in every programming language that you use, you're
writing bad code in every programming language that you use. So don't do that.

The reason why list(x) is better than x + [] is because list(x) works
regardless of what type of iterable x is. x + [] only works on lists.
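
A quick sketch of that difference (Python 3; the helper names are hypothetical): concatenating with an empty list fails for anything that is not a list, while list() accepts any iterable.

```python
def copy_with_plus(x):
    return x + []      # only works when x is a list

def copy_with_list(x):
    return list(x)     # works for any iterable

as_tuple = (1, 2, 3)

try:
    copy_with_plus(as_tuple)
    plus_worked = True
except TypeError:
    # tuple + list is not defined
    plus_worked = False

from_tuple = copy_with_list(as_tuple)
from_range = copy_with_list(range(3))
```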

~~~
keeperofdakeys
x + [] or x[:] still has a readability issue, to anyone who hasn't done much
python before, it isn't immediately obvious that you want to create a _new_
list. For personal scripts, this may not matter too much, but for code that is
seen by other people, you may be inhibiting their understanding.

A good analogy is probably assuming that pointer sizes are the same as int
sizes in C. This assumption was safe for many years, but broke when 64-bit
came along. Slices and adding lists will probably always return new lists in
python, but it is still good not to depend on such behaviour.

~~~
jrockway
It's silly for you to dumb down code for people new to the language. If you
treat newbies like they're dumb, they'll never learn how to write "real" code.
The only way they'll do that is by reading real idiomatic code.

 _Slices and adding lists will probably always return new lists in python, but
it is still good not to depend on such behaviour._

Not buying this. The behavior is documented and Python has a deprecation cycle
for changes in documented behavior.

The reason to write list(x) is because that's the generic way to turn anything
into a list. It's not for being future proof or being easy for newbies to
understand. It's because that's the right way to do it.

------
kzrdude
A much more interesting line of python is

b[:] = a # copy a into b, while b keeps its identity.
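
A short sketch of what "keeps its identity" means here (Python 3; alias is just an extra name added for illustration):

```python
a = [1, 2, 3]
b = [9, 9]
alias = b          # a second name for the same list object as b

b[:] = a           # replace b's contents in place; b is still the same object
same_object = alias is b       # True
independent = b is not a       # True: b holds a copy of a's items

b = a              # by contrast, plain assignment rebinds the name b
rebound = alias is b           # False: alias still names the old object
```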

------
hwiechers
list(x) is also used for converting a sequence x to a list. x[:] is actually
more explicit because it only means 'copy x'.

------
MostAwesomeDude
So, uh, why are people doing so much list copying? It's not something that
occurs often in idiomatic code. I understand that it's something to be aware
of, but it's just not something that is required often.

~~~
xxbondsxx
Calling methods like pop(), insert(), and remove() on a list actually affects
the contents of a list rather than returning a _copy_ of the list with
everything removed. For example:

        all = range(10)
        allbut2 = all.remove(2)

Actually removes 2 from all as well. Hence, you have to copy lists a lot if
you are doing a lot of list creation or change from a master list.

~~~
d0mine
list.remove() returns None. So it is a mistake to bind its return value to
allbut2.

It follows the convention that methods that modify their object in place should
return None. list.pop() is an obvious exception.

You create a new list via list comprehension instead of copying and then
removing:

    
    
      even = [i for i in L if i % 2 == 0] # remove odd numbers

~~~
tedunangst
I believe he meant:

    
    
        all = range(10)
        allbut2 = all
        allbut2.remove(2)
    

which is something that catches people all the time.
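
The same trap next to its fix, as a sketch for Python 3 (where range() no longer returns a list, hence the list() call; the name everything is used instead of all so the builtin is not shadowed):

```python
everything = list(range(10))

allbut2 = everything           # alias, not a copy
allbut2.remove(2)              # also removes 2 from everything

fresh = list(range(10))
allbut2_copy = fresh[:]        # an actual copy
removed = allbut2_copy.remove(2)   # remove() returns None, so do not bind its result
```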

------
guruparan18
I knew white space means a lot in Python (disclaimer: newbie, still learning
Python!). Just found this, and thought someone out there might be able to throw
some more light.

    
    
      Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
      Type "copyright", "credits" or "license()" for more information.
      >>> a=[1,2,3]
      >>> b=a
      >>> b
      [1, 2, 3]
      >>> a=[1]
      >>> b
      [1, 2, 3]
      >>> id(a)
      18704984
      >>> id(b)
      18721728
      >>> c = [4, 5, 6]
      >>> d = c
      >>> id(c)
      10755928
      >>> id(d)
      10755928
      >>> print c, d
      [4, 5, 6] [4, 5, 6]
      >>> c = [7]
      >>> print c, d
      [7] [4, 5, 6]
      >>> e = c
      >>> c.append(8)
      >>> print c, d, e
      [7, 8] [4, 5, 6] [7, 8]
      >>>

~~~
keeperofdakeys
c = [4,5,6] is creating a new list, and assigning it to c; not modifying the
list that c pointed to. The '=' operator simply associates an object reference
with a variable. Writing d=c copies the reference to an object from one
variable to another.
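
The session above can be condensed into a sketch that checks identity with `is` instead of comparing id() values (Python 3; the shared_before and shared_after names are added for illustration):

```python
c = [4, 5, 6]
d = c                    # d and c name the same list
shared_before = d is c   # True

c = [7]                  # rebinding: c now names a new list, d is untouched
shared_after = d is c    # False

e = c                    # e and c name the same (new) list
c.append(8)              # mutation: visible through every name for that list
```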

