
Redesigning Python's named tuples - signa11
https://lwn.net/Articles/731423/
======
Too
It seems like the Python maintainers always try to complicate things for
themselves to cater for some obscure use case that nobody uses. I mean really,
when was the last time you ever accessed a namedtuple by index?? (For vectors
I do see a benefit: you can throw one right into a matrix multiplication that
is designed for arrays. But really, you can do that just as easily in other
ways.) And constructing them on the fly doesn't make much sense either, as it
defeats the benefit of a shared type across the code base.

All people ever wanted was just a way to write in one line `Point3D = struct(x, y, z)`
that translates into the _exact equivalent_ of:

    
    
        class Point3D: 
            def __init__(self, x, y, z):
                self.x = x
                self.y = y
                self.z = z
    

What would be the startup performance of parsing the above 5-line class
compared to a namedtuple? Surely it should be faster to create it with a
shorthand form built into the language, if the functionality is equivalent?
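One rough way to probe that question empirically is to time type creation directly. The sketch below compares executing the class statement (parse and compile included), calling `collections.namedtuple`, and a hypothetical one-line `struct()` helper built on `type()`. `struct` is an assumption for illustration, not a Python builtin, and it takes field names as strings because bare `x, y, z` would be unbound names. Timings depend heavily on machine and Python version, so no numbers are claimed here.

```python
import timeit
from collections import namedtuple

CLASS_SOURCE = """
class Point3D:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
"""


def struct(name, *fields):
    """Hypothetical one-line helper (not a builtin): a plain class with one attribute per field."""
    def __init__(self, *args):
        if len(args) != len(fields):
            raise TypeError(f"{name}() takes {len(fields)} arguments ({len(args)} given)")
        for field, value in zip(fields, args):
            setattr(self, field, value)

    def __repr__(self):
        values = ", ".join(f"{field}={getattr(self, field)!r}" for field in fields)
        return f"{name}({values})"

    # Built at runtime with type(), just like a class statement would be.
    return type(name, (), {"__init__": __init__, "__repr__": __repr__})


def time_creation(number=1000):
    """Seconds spent creating a Point3D type `number` times, three different ways."""
    return {
        # Parse, compile and execute the class statement on every iteration.
        "class statement": timeit.timeit(
            lambda: exec(compile(CLASS_SOURCE, "<point3d>", "exec"), {}), number=number),
        "namedtuple()": timeit.timeit(
            lambda: namedtuple("Point3D", "x y z"), number=number),
        "struct()": timeit.timeit(
            lambda: struct("Point3D", "x", "y", "z"), number=number),
    }


if __name__ == "__main__":
    print(struct("Point3D", "x", "y", "z")(1, 2, 3))  # Point3D(x=1, y=2, z=3)
    for way, seconds in time_creation().items():
        print(f"{way:>16}: {seconds:.4f} s")
```

Adding `"__slots__": fields` to the `type()` namespace would drop the per-instance `__dict__`, which is closer to a C struct but no longer the exact equivalent of the hand-written class.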

~~~
coldtea
> _It seems like the python maintainers always try to complicate things for
> themselves to cater for some obscure use case that nobody use. I mean
> really, when was the last time you ever accessed a namedtuple by index??_

Notice how when someone asks something like that, some bizarro practitioners
of such methods would always appear and say how they are invaluable for
them...
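For reference, index access comes for free because a namedtuple class is a subclass of `tuple`: the field names are aliases for positions. A minimal sketch with an illustrative `Point` type:

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
p = Point(1, 2)

# Attribute names are aliases for positions: both reach the same stored values.
assert p.x == p[0] == 1
assert p.y == p[1] == 2

# Being a tuple, it also unpacks and slices like one (a slice is a plain tuple).
x, y = p
print(x, y, p[:1])  # 1 2 (1,)
```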

~~~
adambyrtek
It's called a _tuple_ so why would accessing its elements by index be
"bizarre" in any way?

~~~
emmelaich
Because order should be unimportant for a tuple.

~~~
xapata
Are you joking or serious? I can't read your tone without emoji.

~~~
emmelaich
Serious; see coldtea's reply for a fuller explanation.

Or consider (firstname="John", lastname="Doe") vs (lastname="Doe",
firstname="John").

I think you'd agree that they're the same.
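Whether those two are "the same" depends on the structure holding them. A short sketch with `collections.namedtuple` (the `NameFL` and `NameLF` types are illustrative): namedtuple equality is inherited from `tuple`, so it is positional and ignores field names, while dicts compare by key.

```python
from collections import namedtuple

# Two record types with the same fields, declared in different orders.
NameFL = namedtuple("NameFL", "firstname lastname")
NameLF = namedtuple("NameLF", "lastname firstname")

a = NameFL(firstname="John", lastname="Doe")
b = NameLF(lastname="Doe", firstname="John")

# namedtuple equality is plain tuple equality: positional, field names ignored.
print(a == b)                # False: ("John", "Doe") vs ("Doe", "John")
print(a == ("John", "Doe"))  # True

# Converted to dicts, they compare by key, so key order no longer matters.
print(dict(a._asdict()) == dict(b._asdict()))  # True
```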

~~~
rraval
They are the "same" by equality, but that's not the only relevant comparison.
Ordering is important if you want lexicographic less than.

Without ordering, you wouldn't be able to sanely run `sorted` on an iterable of
namedtuples.
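A small illustration of that point, using an illustrative `Person` type: `sorted` falls back on tuple comparison, so the declared field order decides which field is compared first.

```python
from collections import namedtuple

Person = namedtuple("Person", "lastname firstname")

people = [Person("Doe", "John"), Person("Bloggs", "Joe"), Person("Doe", "Jane")]

# Tuples compare lexicographically: lastname first, then firstname breaks ties.
for person in sorted(people):
    print(person)
# Person(lastname='Bloggs', firstname='Joe')
# Person(lastname='Doe', firstname='Jane')
# Person(lastname='Doe', firstname='John')
```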

~~~
tomrod
Dumb question from a non-developer: wouldn't using a dict object make more
sense if someone wanted an unordered named tuple?

~~~
coldtea
It's not about the storage (besides, Python also has ordered dicts).

Namedtuples are for the convenience of creating a record type (though indeed
as others say, treating it as tuple where things expect a tuple is also a
benefit).
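A sketch of both sides of that, using only `collections` (the `Point` type is illustrative): the record helpers generated for a namedtuple class, and the same instance being accepted where a plain tuple is expected. An `OrderedDict` is shown for contrast.

```python
from collections import OrderedDict, namedtuple

Point = namedtuple("Point", "x y")
p = Point(x=3, y=4)

# Record-style helpers generated for the class.
q = p._replace(y=5)          # returns a new instance; p itself is immutable
as_dict = dict(p._asdict())  # {'x': 3, 'y': 4}

# Still a tuple, so it is accepted wherever a tuple is expected.
print("(%d, %d)" % p)        # (3, 4)

# An OrderedDict also keeps its key order, but it is mutable and not a tuple.
od = OrderedDict([("x", 3), ("y", 4)])
print(list(od), isinstance(od, tuple))  # ['x', 'y'] False
```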

------
asragab
This article is ostensibly about how the current implementation of namedtuples
has had serious consequences for the startup time of Python, because
namedtuples are used in the compilation of classes (roughly). However, somehow
this buries the lede...the most "interesting" discussion is kicked off by the
first comment:

    
    
> Issues around the performance of Python and programs written in it have far
> wider consequences than startup time. During all the time any Python program
> is running, its host machine is consuming power that typically depends on
> pumping CO2 into the atmosphere. If most of that power is wasted, the effects
> go far beyond extra money to buy it, or to operate extra servers, or users who
> wait a little longer. The carbon footprint of a Python program that runs
> throughout a data center, or many data centers, adds up.
    

There was an article earlier on HN about the energy consumption pattern of
Bitcoin/Ethereum and presumably any blockchain that implements a proof-of-work
protocol/scheme, and between that article and this comment - I've started to
notice a growing unease (I am probably waaay behind on the uptake) about the
"world-eating" capacity of software.

I wonder how one could quantify implementation decisions like the ones
exhibited by namedtuples in Python, which one might argue are an unfortunate,
accidental side effect as far as energy consumption goes, versus ones like
proof-of-work, which I would argue are explicitly designed to be expensive.

And what should come of that quantification? Namely, does optimizing code
really become a moral imperative, and if so, are there usability and
refactorability metrics, often held in high regard, that we ought to consider
abandoning in the name of "energy-efficient" software?

Obviously, this isn't a simple tradeoff: software that is difficult to write
because it is highly optimized is also difficult to maintain, and it might be
the case that the energy saved through better performance is outweighed by the
energy cost of maintenance (literally, the energy cost of debugging and
testing).

~~~
gmueckl
I do believe that we should consider it a moral obligation to optimize the
sh$$ out of widely used software. Inefficient algorithms running in millions
or billions of instances (cloud data centers, smartphones, home routers, smart
TVs...) quickly create a considerable environmental footprint. This is
especially true as long as we are powering them with non-renewable energy
sources.

The amount of effort that can be spent on optimizing this kind of widely used
software is enormous before the balance becomes negative. An optimization that
shaves 1 second of CPU time per year off each of a million devices will result
in a net reduction in energy usage within the first year, even if it takes 30
eight-hour work days to develop.
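Worked through with the numbers from that comment. This is a sketch that assumes, as the comment implicitly does, that an hour of development work and an hour of device CPU time cost comparable energy; that equivalence is the big simplification here.

```python
SECONDS_PER_HOUR = 3600

devices = 1_000_000               # devices running the optimized code
cpu_seconds_saved_per_device = 1  # per year, from the comment above

dev_days = 30
hours_per_dev_day = 8

cpu_hours_saved_per_year = devices * cpu_seconds_saved_per_device / SECONDS_PER_HOUR
dev_hours_spent = dev_days * hours_per_dev_day

print(f"CPU hours saved per year: {cpu_hours_saved_per_year:.1f}")  # 277.8
print(f"Developer hours spent:    {dev_hours_spent}")               # 240
```

About 277.8 saved hours against 240 spent hours, so the claim holds only if a development hour draws no more than roughly 1.16 times the energy of one device CPU-hour.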

This train of thought becomes particularly nasty when you realize that all the
computational overhead introduced by the widespread usage of encryption (https
etc.) must necessarily lead to environmental damage. To put it in the most
provocative way I can think of right now: which is more important: the
security of your personal data now or the safety and wellbeing of future
generations of humankind?

~~~
carapace
Systems of elements that can trust each other are more efficient than systems
of elements that must expend energy to check each other.

------
hasenj
Why don't languages like Python have the concept of a C-like struct? Seems
like it should be straightforward to have, with no downsides that I can think
of.

~~~
chubot
I had this question for a while too, but I realized after hacking on the Python
interpreter that it breaks the execution model:

- Python compiles a single module to bytecode at a time (.py to .pyc)

- It interleaves execution and compilation -- imports are dynamic, you can
put arbitrary computation between them, and conditionally import, etc.

If you had structs, then you would need to resolve names to offsets at compile
time -- e.g. p.x is offset 0, p.y is offset 4. That's a form of type
information. Types can be defined in one module and used in another, and
Python really has no infrastructure for that.

Nor do any other dynamic languages like Ruby, Perl, JavaScript, Lua, or PHP,
as far as I can tell. They are all based on hash tables because deferring name
lookup until runtime means that I can start executing "now" rather than
looking all over the program for types. It probably helps a bit with the REPL
too.

The need for type information in a translation unit is also what gives you C's
header file mess, so it's not a trivial problem.
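The standard library's `ctypes` does offer C-layout structs, and it illustrates the point above: the offsets exist, but they are computed at runtime when the class statement executes, not when the module is compiled, and `p.x` is still resolved by name through a descriptor on the class. A sketch with an illustrative `Point` type; `c_int32` is used so the 4-byte offset in the comment holds on every platform.

```python
import ctypes


class Point(ctypes.Structure):
    # A C-like struct: each field gets a fixed type and byte offset.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


p = Point(1, 2)

# ctypes computes these offsets when the class statement runs (at runtime).
print(Point.x.offset, Point.y.offset)  # 0 4
print(ctypes.sizeof(Point))            # 8

# Attribute access still looks the field up by name on the class at runtime.
print(p.x, p.y)                        # 1 2
```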

~~~
pjmlp
Common Lisp, Dylan and Julia do offer such support.

It is a matter of caring about it, and having JIT/AOT support as part of the
standard tooling.

C header file mess is caused by its designers ignoring the work outside AT&T
and not bothering to implement a module system.

~~~
flavio81
> C header file mess is caused by its designers ignoring the work outside AT&T
> and not bothering to implement a module system.

To be fair to them, they wanted to create a language whose compiler would fit
into the limited hardware they had.

It's not their fault that UNIX had so much success and the world largely went
there ignoring most of the progress made in programming tools and languages
during the 70s.

~~~
pjmlp
That success was mostly caused by AT&T not being able to sell UNIX (at least
during the first years of UNIX), so they just made the source code available
to everyone willing to pay a nominal license fee, which was a tiny fraction
of what vendors were charging for their own systems.

An OS and systems programming language available almost for free, including
source code, versus what a commercial mainframe, its OS and SDK would cost
without source code included: a sure recipe for success.

------
vanni
Previous LWN article on the subject: "Reducing Python's startup time" (Aug 16,
2017)

[https://lwn.net/Articles/730915/](https://lwn.net/Articles/730915/)

Related HN discussion:

[https://news.ycombinator.com/item?id=15131981](https://news.ycombinator.com/item?id=15131981)

------
tveita
I'd be happy to see it optimized - namedtuple looks like a convenient way to
quick-and-dirtily define a data structure, but several times I've ended up
changing back to plain tuples because using namedtuple was much slower,
especially when pickling.

Maybe they could add a variant that isn't a tuple as well.
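One way to check that slowdown on your own machine; a sketch with an illustrative `Point` type, with no particular results claimed. Note that unpickling a namedtuple requires its class to be importable by module and name, so the type has to be defined at module level.

```python
import pickle
import timeit
from collections import namedtuple

Point = namedtuple("Point", "x y z")

plain = [(i, i + 1, i + 2) for i in range(10_000)]
named = [Point(*t) for t in plain]


def time_pickle(data, number=20):
    # Round-trip through pickle so both dumping and loading are measured.
    return timeit.timeit(lambda: pickle.loads(pickle.dumps(data)), number=number)


if __name__ == "__main__":
    print(f"plain tuples: {time_pickle(plain):.3f} s")
    print(f"namedtuples:  {time_pickle(named):.3f} s")
    print(f"pickled size: {len(pickle.dumps(plain))} vs {len(pickle.dumps(named))} bytes")
```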

------
zimablue
The subtext is jealousy of JavaScript object creation/destructuring; it's the
thing I like most about JavaScript, I think, and it really makes your code feel
fluid. I couldn't tell you whether they preserve order, though. There are even
precompiler extensions to destructure immutable.js Maps.

~~~
alexchamberlain
Could you expand on the syntax you're referring to?

~~~
zimablue
Not well, but here is a link; see the object destructuring section:
[https://developer.mozilla.org/en/docs/Web/JavaScript/Referen...](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment)
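Python has no object-destructuring syntax like JavaScript's `const {x, y} = p`; the nearest standard-library equivalents for a namedtuple are positional unpacking and `operator.attrgetter`. A rough sketch with an illustrative `Point` type:

```python
from collections import namedtuple
from operator import attrgetter

Point = namedtuple("Point", "x y z")
p = Point(x=1, y=2, z=3)

# Positional unpacking: relies on the declared field order.
x, y, z = p

# Closest analogue to `const {z, x} = p`: pick fields by name, in any order.
z2, x2 = attrgetter("z", "x")(p)

# Starred unpacking plays the role of a rest element, like `const [first, ...rest] = arr`.
first, *rest = p
print(x, y, z, z2, x2, rest)  # 1 2 3 3 1 [2, 3]
```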

------
ryanx435
I have yet to encounter a case where a list of 2 items didn't suffice in place
of a tuple. I'm sure applications exist, as tuples are immutable, but I have
never encountered the need for them.
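One concrete difference, for what it's worth: because tuples are immutable, they are hashable (as long as their items are), so they can serve as dict keys or set members, which lists cannot. A minimal sketch:

```python
point_names = {}

# Tuples are immutable and hashable (when their items are), so they work as dict keys.
point_names[(0, 0)] = "origin"

# Lists are mutable, so they define no hash and cannot be used as keys.
try:
    point_names[[1, 1]] = "diagonal"
except TypeError as exc:
    print(exc)  # unhashable type: 'list'
```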

------
alphaalpha101
> Either way (or both) would be implemented in C for speed. It would allow
> named tuples to be created without having to describe them up front, as is
> done now. But it would also remove one of the principles that guided the
> design of named tuples, as Tim Peters said:

> > How do you propose that the resulting object T know that T.x is 1. T.y is
> > 0, and T.z doesn't make sense? Declaring a namedtuple up front allows the
> > _class_ to know that all of its instances map attribute "x" to index 0 and
> > attribute "y" to index 1. The instances know nothing about that on their
> > own, and consume no more memory than a plain tuple. If your `ntuple()`
> > returns an object implementing its own mapping, it loses a primary
> > advantage (0 memory overhead) of namedtuples.

> Post-decree, Ethan Furman moved the discussion to python-ideas and suggested
> looking at his aenum module as a possible source for a new named tuple. But
> that implementation uses metaclasses, which could lead to problems when
> subclassing as Van Rossum pointed out.

> Jim Jewett's suggestion to make named tuples simply be a view into a
> dictionary ran aground on too many incompatibilities with the existing
> implementation. Python dictionaries are now ordered by default and are
> optimized for speed, so they might be a reasonable choice, Jewett said. As
> Greg Ewing and others noted, though, that would lose many of the attributes
> that are valued for named tuples, including low memory overhead, access by
> index, and being a subclass of tuple.

> Rodolà revived his proposal for named tuples without a declaration, but
> there are a number of problems with that approach. One of the main stumbling
> blocks is the type of these on-the-fly named tuples—effectively each one
> created would have its own type even if it had the same names in the same
> order. That is wasteful of memory, as is having each instance know about the
> mapping from indexes to names; the current implementation puts that in the
> class, which can be reused. There might be ways to cache these on-the-fly
> named tuple types to avoid some of the wasted memory, however. Those
> problems and concern that it would be abused led Van Rossum to declare the
> "bare" syntax (e.g. (x=1, y=0)) proposal as dead.
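Tim Peters' memory argument can be checked directly: the name-to-index mapping lives on the class (`_fields`), and namedtuple classes declare `__slots__ = ()`, so instances carry no per-instance `__dict__`. A sketch with an illustrative `Point` type; it is CPython-specific, since it relies on `sys.getsizeof`.

```python
import sys
from collections import namedtuple

Point = namedtuple("Point", "x y")

plain = (1, 0)
named = Point(1, 0)

# The field-name-to-index mapping is stored once, on the class.
print(Point._fields)                                 # ('x', 'y')

# Instances have no __dict__, so they are no bigger than a plain tuple.
print(hasattr(named, "__dict__"))                    # False
print(sys.getsizeof(named) == sys.getsizeof(plain))  # True
```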

From what I've read, V8's implementation of objects in JavaScript goes
basically like this: when you call a constructor function and assign
properties to your object, it makes up struct types (V8 calls them hidden
classes) and ties the object to that type or something.

Like this:

    
    
        function Point2D(x, y) {
            this.x = x;
            this.y = y;
        }
        let p = new Point2D(1.0, 2.0);
    

Initially you'll have an empty object, which will be an empty struct. 'this.x
= x' will change the type to the 'X' struct, and 'this.y = y' will change the
type to the 'XY' struct. If you do this again with another object, _they'll
share these underlying structs_.
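A toy Python model of the shape-sharing idea described above. `Shape` and `Obj` are hypothetical names, and this is only a sketch of the concept, not how V8 actually implements it (real engines also deal with property attributes, deletion, inline caches and more).

```python
class Shape:
    """Toy hidden class: the ordered property names seen so far."""

    def __init__(self, names=()):
        self.names = names
        self.transitions = {}  # property name -> next Shape

    def add(self, name):
        # Reuse an existing transition so objects built the same way share shapes.
        if name not in self.transitions:
            self.transitions[name] = Shape(self.names + (name,))
        return self.transitions[name]


EMPTY = Shape()


class Obj:
    def __init__(self):
        self.shape = EMPTY
        self.values = []  # slot storage; values[i] belongs to shape.names[i]

    def set_property(self, name, value):
        if name in self.shape.names:
            self.values[self.shape.names.index(name)] = value
        else:
            self.shape = self.shape.add(name)
            self.values.append(value)


p, q, r = Obj(), Obj(), Obj()
for obj, order in ((p, "xy"), (q, "xy"), (r, "yx")):
    for name in order:
        obj.set_property(name, 0)

print(p.shape is q.shape, p.shape is r.shape)  # True False
print(p.shape.names, r.shape.names)            # ('x', 'y') ('y', 'x')
```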

Now this is perhaps easier with a JIT, and perhaps not. But it bears thinking
about. Why not just make it so that (x: 1, y: 0) - which would be the best
syntax IMO as it fills out the {set, dict; tuple, ???} square - creates an
object that shares its class with every other namedtuple that has exactly the
x and y properties in exactly that order?

It _really_ frustrates me when I read 'Those problems and concern that it
would be abused led Van Rossum to declare the "bare" syntax (e.g. (x=1, y=0))
proposal as dead.' I mean come on, I know it's a different environment in
Python than in V8, but seriously, this is a solved problem. Those problems? A
solution to them was already proposed in the thread. Just do that.

> He elaborated on the ordering problem by giving an example of a named tuple
> that stored the attributes of elementary particles (e.g. flavor, spin,
> charge) which do not have an automatic ordering. That argument seemed to
> resonate with several thread participants.

I don't want to be too harsh, but this is nonsensical rubbish. Dictionaries
preserve order in Python. This ship sailed a long time ago. Namedtuples also
already preserve order. Tuples preserve order. Lists preserve order.
Dictionaries preserve order.

What _doesn't_ preserve order? Like, I get that it's not strictly defined
that dictionaries preserve order, but they do, and people do rely on that, and
so it's never going to actually be changed.

> This is exactly why I scream at relational databases. If you can't tell the
> difference between a set and a list, and especially if you want to store a
> list in a set-based paradigm, you are going to have ALL SORTS of grief ...

Unrelated but I found this comment funny. This guy has heard of an index,
right?

~~~
kevin_thibedeau
> Dictionaries preserve order in Python. This ship sailed a long time ago.

That ship only sailed last December with the release of 3.6. It is only an
implementation detail for CPython. Other implementations can and will use the
old behavior. Nobody should be writing code that depends on this behavior.
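For code that must not depend on plain-dict ordering (it only became a language guarantee later, in Python 3.7), `collections.OrderedDict` makes the order part of the contract. A small sketch with illustrative keys:

```python
from collections import OrderedDict

# Insertion order is part of OrderedDict's documented behavior.
settings = OrderedDict()
settings["host"] = "localhost"
settings["port"] = 8080
settings["debug"] = False

settings.move_to_end("host")  # OrderedDict-only API
print(list(settings))         # ['port', 'debug', 'host']

# OrderedDict-to-OrderedDict equality is order-sensitive; plain dict equality is not.
original = OrderedDict([("host", "localhost"), ("port", 8080), ("debug", False)])
print(settings == original)              # False
print(dict(settings) == dict(original))  # True
```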

------
lngnmn
Obviously, practicality should beat purity, and the very clever and reasonable
syntax

    
    
       (x=1, y=2)
    

which could be generalized to procedure arguments, together with an efficient
implementation in C (we have C-based arrays (lists) and hash tables, so why
not generalized records/tuples?), should be accepted.

The problem with a "pure democracy" is that the majority cannot be smarter
than the top 5% of individuals, so really good ideas almost never get through
the bulk of the bell curve.

~~~
joejev
Why is (x=1, y=2) more practical than nt(x=1, y=2)? Also, we have a very
efficient implementation for tuples. collections.namedtuple is quite fast and
the article shows cnamedtuple (which I am the author of) which is a C
implementation that is even faster. None of this needs a change to the
language itself.

~~~
rplnt
> Why is (x=1, y=2) more practical than nt(x=1, y=2)?

It was outlined in the linked article: the order of named arguments should not
matter, so new syntax would be better.
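Two small illustrations of that point, with hypothetical names `srecord` and `_sorted_record_type`: a declared namedtuple already ignores keyword order at the call site, and an on-the-fly variant can get the same property by sorting the field names before looking up a cached type. Sorting is just one possible design choice, not something the article proposes; it makes the field order, and therefore tuple comparison, alphabetical.

```python
from collections import namedtuple
from functools import lru_cache

# A declared namedtuple already ignores the order of keyword arguments at the call site.
Point = namedtuple("Point", "x y")
print(Point(y=2, x=1) == Point(1, 2))  # True


@lru_cache(maxsize=None)
def _sorted_record_type(fields):
    # Hypothetical helper: one cached class per sorted tuple of field names.
    return namedtuple("SortedRecord", fields)


def srecord(**kwargs):
    """Hypothetical on-the-fly record whose field order is canonicalized by sorting."""
    return _sorted_record_type(tuple(sorted(kwargs)))(**kwargs)


a = srecord(x=1, y=2)
b = srecord(y=2, x=1)
print(a == b, type(a) is type(b))  # True True
print(a)                           # SortedRecord(x=1, y=2)
```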

