
Python Lists vs. Tuples - joshbaptiste
http://nedbatchelder.com//blog/201608/lists_vs_tuples.html
======
colanderman
So, I've long since argued the importance of what Ned terms the "cultural
difference", or the difference between dynamic homogeneous structures
(lists/relational columns), and static heterogeneous structures
(tuples/relational rows). There are important performance ramifications
regarding the use of one or the other, and conflating the two (i.e. dynamic
heterogeneous) is a strong design smell. Some languages "get it", others
don't.

Python is _not_ a language that "gets it", despite the article's examples.
Python chooses the much-less-useful distinction between "immutable sequence"
and "mutable sequence"; the lack of any other distinction is borne out by the
standard library (as dozzie points out here [1]). Furthermore, the issue of
immutable vs. mutable is _totally orthogonal_ to that of dynamic/static and
homogeneous/heterogeneous. There's no reason that "cultural" lists need be
mutable, or "cultural" tuples need be immutable. So Python's choice doesn't
make sense from this angle either.

For example, OCaml has both immutable lists and immutable tuples; its type
system enforces exactly the "cultural difference". Similarly with C, which has
both mutable lists (arrays) and mutable tuples (structs), both of which can be
rendered immutable (const). Why Python chose lists (and only lists!) to have
both immutable and mutable flavors, with otherwise identical semantics, is
beyond me.

Erlang used to "get it", but they lost the thread when they came out with
heterogeneous syntax support for a homogeneous structure (map). TLA+ totally
conflates tuples with lists, but its type system is rich enough to enforce the
difference (and as a bonus, allows for the elusive static homogeneous
structure).
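
To make the distinction concrete in Python terms, here is a minimal sketch using type hints. The names `Readings`, `Point`, `FrozenReadings`, `mean`, and `describe` are made up for illustration, and Python does not enforce these annotations at runtime; an external checker would be needed for that.

```python
from typing import List, Tuple

# "Cultural" list: homogeneous, length not known in advance.
Readings = List[float]

# "Cultural" tuple: fixed length, each position has its own meaning and type.
Point = Tuple[str, float, float]  # (label, x, y)

# Immutable, yet still homogeneous and variable-length.
FrozenReadings = Tuple[float, ...]


def mean(values: Readings) -> float:
    return sum(values) / len(values)


def describe(point: Point) -> str:
    label, x, y = point
    return "{} at ({}, {})".format(label, x, y)


print(mean([1.0, 2.0, 3.0]))
print(describe(("origin", 0.0, 0.0)))
```

The third alias shows that mutability is a separate axis: a `Tuple[float, ...]` is immutable but plays the "cultural" role of a list.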

[1]
[https://news.ycombinator.com/item?id=12323849](https://news.ycombinator.com/item?id=12323849)

~~~
autokad
"There's no reason that "cultural" lists need be mutable, or "cultural" tuples
need be immutable."

er, yes there is. if a list is not mutable, doing list.append(something) in a
loop would re-initialize the list every loop iteration. the same reason you
shouldn't add to a string in a loop, as strings are immutable.

to me, an immutable list is not a list.
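
The cost being described can be seen in a small sketch (the variable names are illustrative only). Appending to a list changes the existing object, while "appending" to a tuple builds a new one each time, so growing a tuple element by element in a loop copies the earlier elements again on every pass.

```python
items = []
alias = items
items.append(1)            # mutates the existing list in place
print(alias is items)      # True

frozen = ()
snapshot = frozen
frozen += (1,)             # builds a new tuple and rebinds the name
print(snapshot is frozen)  # False
print(snapshot, frozen)    # () (1,)
```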

~~~
colanderman
Have you used a functional language? (Lisp, OCaml, Erlang, etc.) Lists are
immutable, yet adding an item to the front is O(1). That is achievable with
even a basic singly-linked list implementation.
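
A minimal sketch of the idea in Python, assuming a cons cell is modelled as a 2-tuple and `None` stands for the empty list (`EMPTY`, `cons`, and `to_python_list` are hypothetical helper names). Note that the O(1) operation is adding at the front; adding at the far end of an immutable singly-linked list would mean rebuilding the chain.

```python
# A cons cell is modelled as a 2-tuple (head, tail); None is the empty list.
EMPTY = None


def cons(head, tail):
    """Return a new list with head in front of tail, in O(1)."""
    return (head, tail)


def to_python_list(cell):
    """Walk the chain and collect the items, front to back."""
    items = []
    while cell is not EMPTY:
        head, cell = cell
        items.append(head)
    return items


base = cons(2, cons(3, EMPTY))
longer = cons(1, base)         # new list; base is untouched

print(to_python_list(base))    # [2, 3]
print(to_python_list(longer))  # [1, 2, 3]
print(longer[1] is base)       # True: the tail is shared, not copied
```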

~~~
catnaroek
Lists aren't immutable in Lisp, though. More precisely, Lisp doesn't have
lists: it has chains of mutable cons cells.

~~~
ScottBurson
They're not _technically_ immutable, but _culturally_ Lisp programmers are not
in the habit of mutating lists unless they know where and for what purpose the
list was created. Mutating some list whose provenance you don't know -- maybe
you got it by calling some library function that doesn't explicitly promise to
return a freshly consed list -- is a bad idea, and experienced Lisp
programmers know this instinctively.

So in practice you can often get away with pretending the lists you create are
immutable, if that's what you want to do.

~~~
catnaroek
Culture isn't something I can take into account when reasoning about what a
piece of code means. The actual semantics of the programming language in
question is all I really have.

~~~
ScottBurson
I doubt that. I think it would take too long to work out the meaning of every
piece of code you see from first principles. I think any competent programmer
has a large collection of unconscious heuristics they use to understand code.
Learning to write code in such a way that the reader's heuristics will not
mislead them is, I think, a big part of becoming a software engineer.

I was just trying to describe the state of practice in the Lisp world. I do
agree that Lisp benefits from more functional data structures -- that's why I
wrote FSet [0]! I would point out, though, that since Lisp lacks any kind of
access control, there's nothing besides common sense and convention that
prevents FSet client code from mucking about inside the guts of the FSet data
structures. It would be a stupid thing to do, and everybody knows that, but
the language doesn't prevent it. So the situation is not that different with
FSet collections than it is with lists of unknown provenance.

Even in languages that nominally support access control, like C++ and Java,
you can usually still get around it with casting or reflection. Most people
know that they should play games like that only in very specific
circumstances.

[0] [https://github.com/slburson/fset](https://github.com/slburson/fset)

~~~
catnaroek
> I think it would take too long to work out the meaning of every piece of
> code you see from first principles.

Just as in mathematics, once you've proven something, you can reuse the result
as often as you want. But “the community always does this” isn't a proof.

> I do agree that Lisp benefits from more functional data structures (...)
> there's nothing besides common sense and convention that prevents FSet
> client code from mucking about inside the guts of the FSet data structures.

Then Lisp doesn't have functional data structures. That's no surprise, Lisp
has no more compound values than Python or Ruby do - that is, none.

> Even in languages that nominally support access control, like C++ and Java,
> you can usually still get around it with casting or reflection.

Standard ML actually has type abstraction, and, once you've made something
abstract, there's no way clients can inspect it. No downcasts, no reflection,
no anything - abstract means abstract.

~~~
ScottBurson
> Then Lisp doesn't have functional data structures. That's no surprise, Lisp
> has no more compound values than Python or Ruby do - that is, none.

Ah, the arrogance of youth.

~~~
catnaroek
It isn't arrogance, it's just the semantics of the programming languages in
question. Programming languages being such fundamental tools, you would think
every programmer should understand them well, but that is far from being the
case. It is tempting to blame programmers, but the real culprit is the
subtlety and depth of the topic. Two great resources to get started are
[https://www.amazon.com/dp/0262201755/](https://www.amazon.com/dp/0262201755/)
and
[https://www.amazon.com/dp/0262690764/](https://www.amazon.com/dp/0262690764/)

Anyway. A value is something you can bind to a variable. Unfortunately, you
can't bind _the_ list [1,2,3] (or whatever syntax you might prefer) to a
variable in either Python or Lisp, because those languages simply don't have
lists (or any other kind of compound value).

~~~
ScottBurson
I read back through your recent comments. I notice you get downvoted a lot.
Does this bother you?

~~~
catnaroek
I don't particularly mind.

------
pixelmonkey
Great article!

One more important difference between tuple and list that's worth knowing
about: syntax. Ned makes it seem like tuples have the syntax "(x, y)", but
they actually have the syntax "x, y".

I find that a lot of Python programmers, even long-time ones, don't realize
this important difference. It's what allows one to write code constructions
like "x, y = 1, 2" or "return 1, 2".

One needs the parentheses only to "clarify" this syntax in certain cases; for
example, as dict keys, the parens are necessary -- "x = {(1, 2): 2}". Although
interestingly, for dict lookup, they are not -- "x[1, 2] == 2".

One way I remind myself of this rule is to realize that "x = 1," is the same
as "x = (1,)", but "x = (1)" is actually the same as "x = 1". This is quite
different from lists, which are syntactically defined by the "[]" brackets, and
thus you require them, or a "list()" call, whenever you make lists, making the
list just that much more heavyweight as a data structure.
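
The rules above can be checked directly; this is a small illustrative sketch, with throwaway variable names:

```python
a = 1, 2          # the comma builds the tuple
b = (1, 2)        # parentheses are only grouping here
print(a == b)     # True

single = 1,       # one-element tuple
grouped = (1)     # just the integer 1
print(type(single).__name__, type(grouped).__name__)  # tuple int

x, y = 1, 2       # packing on the right, unpacking on the left

d = {(1, 2): "value"}   # parentheses are required inside the dict literal
print(d[1, 2])          # the subscript accepts the bare form
```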

~~~
thatswrong0
When I used to write Python, "x = 1," bit me _way_ more than it should have.
And I don't see why you wouldn't want to be explicit about your tuple
construction every time.

~~~
thansharp
I usually use x = tuple((1,)) so that non-Python programmers on the team know
what they're dealing with.

------
ferrari8608
It's much simpler to me. Tuples are used when you need a sequence but don't
need to modify the contents, for instance returning multiple objects from a
function call. Lists are used when you want to add things to it.

Namedtuples are wonderful! It's like a class with only attributes, except you
don't have to define a new class and get slapped across the back of your head
by more experienced programmers for creating a useless class. Access objects
by index, by name, or by iteration. It's very versatile.
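
A short sketch of the three access styles, using a hypothetical `Point` record:

```python
from collections import namedtuple

# Hypothetical record type, for illustration only.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=3, y=4)
print(p.x, p.y)       # access by name
print(p[0], p[1])     # access by index
for coordinate in p:  # access by iteration
    print(coordinate)

moved = p._replace(x=10)  # returns a new Point; p itself is unchanged
print(p, moved)
```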

~~~
pablobaz
If you like namedtuples, you'll love attrs.
[https://pypi.python.org/pypi/attrs/16.0.0](https://pypi.python.org/pypi/attrs/16.0.0)

~~~
rch
I like namedtuples, but attrs looks _so_ wrong to me.

------
dozzie
> The Cultural Difference is about how lists and tuples are actually used:
> lists are used where you have a homogenous sequence of unknown length;
> tuples are used where you know the number of elements in advance because the
> position of the element is semantically significant.

Oh, if only it were so simple! Python itself doesn't understand what the heck
tuples are, in that the built-in functions often treat tuples as simply
"immutable lists". See isinstance() and issubclass() functions, which expect
what's meant to be a list of classes, but you need to pass a tuple. *args is
another such example, as the article mentions.

If what Python calls tuples were the thing with structure instead of order,
they wouldn't be iterable.
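
Both behaviours are easy to reproduce. In this sketch, `collect` is a made-up helper that only reports the type of `args`:

```python
print(isinstance(3, (int, float)))   # True: a tuple of classes is accepted

try:
    isinstance(3, [int, float])      # a list of classes is rejected
except TypeError as exc:
    print("TypeError:", exc)


def collect(*args):
    return type(args).__name__


print(collect(1, 2, 3))              # tuple
```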

------
jackweirdy
There is an additional technical difference: Tuples are hashable and can be
used as keys in a dictionary.

    
    
        grid[(1,1)] = True

~~~
vbit
The article mentions this.

Also you can write this as `grid[1, 1] = True`
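
A quick sketch of both spellings, plus the corresponding failure for lists. One caveat worth knowing: a tuple is only hashable if every element inside it is hashable too.

```python
grid = {}
grid[1, 1] = True            # same key as grid[(1, 1)]
print(grid[(1, 1)])          # True

try:
    grid[[1, 1]] = True      # lists are not hashable
except TypeError as exc:
    print("TypeError:", exc)

# A tuple containing a list cannot be hashed either.
try:
    hash((1, [2, 3]))
except TypeError as exc:
    print("TypeError:", exc)
```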

------
ratsbane
Interesting article. I've always heard that about lists and tuples, but this
article made me wonder _how much_ less efficient with storage is a list.

The following trivial example shows that, at least for very small lists and
tuples, except for the fixed overhead of 16 bytes for a tuple vs. 72 bytes for
a list they're the same [edit: no it doesn't. I wasn't looking at the diff
between a 1-element and 2-element tuple]:

Perhaps there's more to it?

    
    
      >>> from sys import getsizeof
      >>> list = [94]
      >>> getsizeof(list)
      80
      >>> list = [94, 37]
      >>> getsizeof(list)
      88
      >>> list = [94, 37, 21]
      >>> getsizeof(list)
      96
      >>> list = [94, 37, 21, 19]
      >>> getsizeof(list)
      104
      >>> tuple = (94)  # (94) is an int, not a tuple; a 1-tuple is written (94,)
      >>> getsizeof(tuple)
      24
      >>> tuple = (94, 37)
      >>> getsizeof(tuple)
      72
      >>> tuple = (94, 37, 21)
      >>> getsizeof(tuple)
      80
      >>> tuple = (94, 37, 21, 19)
      >>> getsizeof(tuple)
      88
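
A tidier way to run the same comparison, as a sketch for CPython 3 only: it builds each list and tuple from the same elements and avoids shadowing the built-in names `list` and `tuple`. The exact byte counts depend on the interpreter version and platform, so none are shown here.

```python
from sys import getsizeof

for n in range(1, 5):
    as_list = list(range(n))
    as_tuple = tuple(as_list)
    print(n, getsizeof(as_list), getsizeof(as_tuple))

# (94) is just a parenthesized int; the trailing comma makes the tuple.
print(type((94)).__name__, type((94,)).__name__)
```

Note that `getsizeof` reports only the container itself, not the objects it refers to.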

~~~
ratsbane
Another (admittedly trivial) experiment with speed and storage: Iterating
through a 10,000 element list was about 2% slower than iterating through a
10,000 element tuple, and the list was 16 bytes bigger than the tuple:

    
    
      #!/usr/bin/python
    
      from sys import getsizeof
      from random import sample
      from time import time
    
      l = sample(xrange(10000), 10000)
      t = tuple(l)
    
      print "The tuple's size is " + str(getsizeof(t)) + " and the list's size is " + str(getsizeof(l))
    
      s = time()
      z = 0
      for x in l:
        z += x
    
      print "Iterating through a 10,000 element list took " + str(time() - s) + " seconds."
    
      s = time()
      z = 0
      for x in t:
        z += x
    
      print "Iterating through a 10,000 element tuple took " + str(time() - s) + " seconds."
    
    
      doug@supermicro:~$ ./x
      The tuple's size is 80056 and the list's size is 80072 
      Iterating through a 10,000 element list took 0.00589299201965 seconds.
      Iterating through a 10,000 element tuple took 0.00538778305054 seconds.
    
      doug@supermicro:~$ ./x
      The tuple's size is 80056 and the list's size is 80072
      Iterating through a 10,000 element list took 0.00553607940674 seconds.
      Iterating through a 10,000 element tuple took 0.00536203384399 seconds.

~~~
psyklic
This code seems inconclusive -- my first three runs showed that lists are the
same, slower, AND faster! (updated for Python 3.5.2 on Windows)

    
    
      C:\Temp>python listtuple.py
      The tuple's size is 80048 and the list's size is 80064
      Iterating through a 10,000 element list took 0.0005006790161132812 seconds.
      Iterating through a 10,000 element tuple took 0.0005006790161132812 seconds.
    
      C:\Temp>python listtuple.py
      The tuple's size is 80048 and the list's size is 80064
      Iterating through a 10,000 element list took 0.0020074844360351562 seconds.
      Iterating through a 10,000 element tuple took 0.001031637191772461 seconds.
    
      C:\Temp>python listtuple.py
      The tuple's size is 80048 and the list's size is 80064
      Iterating through a 10,000 element list took 0.0 seconds.
      Iterating through a 10,000 element tuple took 0.001001596450805664 seconds.
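
A single pass timed with `time()` is probably too short for the clock resolution, which would explain readings like 0.0 seconds. One way to get steadier numbers is the standard `timeit` module; the sketch below (Python 3, with a made-up helper `total` and arbitrary repeat counts) runs each loop many times and keeps the best measurement.

```python
from random import sample
from timeit import repeat

data_list = sample(range(10000), 10000)
data_tuple = tuple(data_list)


def total(seq):
    z = 0
    for x in seq:
        z += x
    return z


# Run each loop 50 times per measurement, take 5 measurements, keep the best.
list_time = min(repeat(lambda: total(data_list), number=50, repeat=5))
tuple_time = min(repeat(lambda: total(data_tuple), number=50, repeat=5))

print("list :", list_time / 50, "seconds per pass")
print("tuple:", tuple_time / 50, "seconds per pass")
```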

~~~
ratsbane
Interesting! The previous example I posted was Python 2.7.12 on Ubuntu
16.04.01. Here is OSX 10.11.5 w/Python 2.7.10 (also inconclusive with respect
to speed, but OSX was half the memory):

    
    
      mb:~ doug$ ./listtuple 
      The tuple's size is 40028 and the list's size is 40036
      Iterating through a 10,000 element list took 0.00113916397095 seconds.
      Iterating through a 10,000 element tuple took 0.00111603736877 seconds.
      mb:~ doug$ ./listtuple 
      The tuple's size is 40028 and the list's size is 40036
      Iterating through a 10,000 element list took 0.00126791000366 seconds.
      Iterating through a 10,000 element tuple took 0.00133585929871 seconds.
      mb:~ doug$ ./listtuple 
      The tuple's size is 40028 and the list's size is 40036
      Iterating through a 10,000 element list took 0.00114607810974 seconds.
      Iterating through a 10,000 element tuple took 0.00117492675781 seconds.

------
euske
When I taught Python, the way I explained it was that a tuple is basically one
value, shrinkwrapped. It's one value, so you can put it anywhere that expects
a value (such as a return value or a dictionary key). Also, it's one value,
so you cannot change it.

The way to look at a statement like (a,b) = (2,3) is that it is a normal
assignment, where you imagine one value flowing through the equals sign
from right to left. That value just happens to contain two values within its wrap.

------
ScottBurson
It's really too bad that functional data structures weren't much of a thing
yet when Guido started developing Python. The language really cries out for
them -- particularly with the "mutable initial value" problem and the
'set'/'frozenset' distinction.

~~~
jholman
Can you elaborate on how either of those, uh, "features" would be helped by
functional "persistent" data structures?

In the case of the mutable initial value problem, that's just a choice of
semantics (in my opinion, a very bad choice). Without having looked at the
implementation, I would assume that if you have no backwards-compatibility
concerns, a fix would take a few minutes.

In the case of sets/frozensets, most Python programmers _want_ their sets to
be mutable, in a syntax-sugar-y way that's incompatible with using functional
persistent data structures.

~~~
ScottBurson
Isn't it obvious? If the initial value is an empty _functional_ list, set, or
map (dict), there's no possibility of it being modified. The problem
disappears.
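
For readers who have not hit the problem, a minimal sketch follows. The function names are hypothetical, and a plain tuple stands in for a functional collection here; it is immutable but has none of the structural sharing a real persistent data structure would offer.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined.
    bucket.append(item)
    return bucket


print(append_shared(1))   # [1]
print(append_shared(2))   # [1, 2]  <- the same list, carried over


def append_fresh(item, bucket=()):
    # A tuple default cannot be modified; each call builds a new tuple.
    return bucket + (item,)


print(append_fresh(1))    # (1,)
print(append_fresh(2))    # (2,)
```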

> most Python programmers _want_ their sets to be mutable

They're _accustomed_ to mutable sets because those are all they've known. But
ask a Clojure programmer how they feel about functional collections, and
you'll be told they're fine. Functional collections are actually _easier_ to
use because you don't have to worry about unintended aliasing.

And functional behavior is _not strange at all_. No programming language I
have ever used (and it's a long list) has used shared, mutable semantics for
numbers. Consider, in C:

    
    
      int a = 3;
      int b = a;
      b++;
    

After this code sequence, 'a' is still 3, of course! _Nobody_ with programming
experience expects 'a' to be incremented at the same time 'b' is; it's not
even something you stop to think about -- it's completely automatic. This is
true even though I wrote 'b++' rather than 'b = b + 1' -- 'b++' could more
easily be misread as a mutating operation on a shared integer object, like
's.add(x)'. However, there's no language I'm aware of in which that's what it
means. (You can introduce aliasing intentionally in C++ with a reference, but
this requires an explicit '&' in the code.)
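
The same contrast can be shown in Python, as a sketch: a mutable `set` shares changes across every name bound to it, while a `frozenset` behaves the way the integers above do.

```python
a = {1, 2}
b = a
b.add(3)              # mutates the one set both names refer to
print(sorted(a))      # [1, 2, 3]

c = frozenset({1, 2})
d = c
d = d | {3}           # builds a new frozenset and rebinds d only
print(sorted(c))      # [1, 2]
print(sorted(d))      # [1, 2, 3]
```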

------
justanotherbody
The explanation is alright, but would have really benefited from a brief
overview of tuple use in the C API.

~~~
nedbat
The C API seems like a completely separate topic.

~~~
justanotherbody
The C and Python APIs of CPython are really just different faces of the same
thing. Due to the internal use of tuples some of the Python behavior is
governed by related C behavior.

For example:

> Another conflict between the Technical and the Cultural: there are places in
> Python itself where a tuple is used when a list makes more sense. When you
> define a function with *args, args is passed to you as a tuple, even though
> the position of the values isn't significant, at least as far as Python
> knows

This is the C behavior of tuple usage being exposed to the Python side. If one
explores the C API it becomes clear that an implementation detail of C is
being exposed to Python.

> The Cultural Difference is about how lists and tuples are actually used:
> lists are used where you have a homogenous sequence of unknown length;
> tuples are used where you know the number of elements in advance because the
> position of the element is semantically significant.

Here the Culture has moved away from the Technical roots of Python and now
they're at odds. On the C side tuples are _not_ just used when the length is
known ahead of time - tuples are not always immutable on the C side.

The context is important if you really want to understand.

------
pmiller2
I was asked this in an interview. This is essentially the answer I gave. :)

