Hacker News new | past | comments | ask | show | jobs | submit login

Who would use a dictionary for a point? If it's a 2D grid, you are only going to have two axes and by convention x-axis is always first, and y-axis is always second. Perfect use case for a tuple.

   point = (0, 0)

It's just an example. There are many cases where you need more than what you can safely stuff into a tuple, which means the choice becomes 'naked dictionaries vs. classes', it's a tradeoff I've had to make many times writing python code.

Of course there are many cases where using tuples is the best way to represent some data item, (x, y) point data is an obvious example. But when your data structures become more complex, need to be mutable, need to have operations defined on them, have relations to other objects, etc, tuples are a bad choice. You don't want to end up with code that uses tuples of varying lengths, composed of elements of varying types, consumed and produced by functions that are hard-coded to rely on the exact length, ordering and element type of their their tuple inputs. Named tuples are one step in the direction of dictionaries, but they are still nowhere as flexible as mutable dictionaries or proper classes.

If you can't use a tuple, it's not like you could have really used a C-struct either. So the point is valid - the code should be using tuples and not dictionaries or classes.

Again, I assume the article only served as an example of how in Python you usually end up using dictionaries or classes, which have fundamentally different performance characteristics compared to C-structs. Yes, you could use tuples instead, in some cases, and get performance characteristics comparable to C structs. Often this is not an option though, and you would still have to use classes or dictionaries. There are many scenarios you could conveniently implement in C using structs, that would end up as an ungodly mess in Python if you would implement them using nothing but tuples.

Just as a thought exercise, try to come up with a way to represent and manipulate an hierarchical, mutable C-structure with 20 fields, using nothing but tuples in Python.

Like... named tuples? That's the nice thing about python, you're not restricted to using one data type. If you have a data structure that doesn't change, and speed matters, you can use something better suited to the task at hand.

Of course, you're not restricted to using one datatype in C either, but the constant comparison this article makes between structs and dictionaries is misleading. Comparing hash map implementations with dictionaries would be much more apt.

named tuples are classes, but they use __slots__ so they are sealed to extension thus using less memory. Take a look at the `namedtuple` source, it is a fun little bit of metaprogramming.



I am wrong. namedtuple does NOT using slots. weird, wonder why?

Because it directly subclasses `tuple`

point = namedtuple("Point","x y z",verbose=True)


Or named tuples, if you want fancy . notation. Although I haven't actually looked at how the interpreter handles them, I assume they are faster than dicts since the fields are pre-defined.

> So the point is valid - the code should be using tuples and not dictionaries or classes.

Yes if this was "production code" but for the sake of comparison in this example it should be a Python class vs. C struct.

And as the text describes, a dict behaves similarly to a class in Python when it comes to runtime perf.

It's not "just an example", it's the core of their argument. They say "Pythonic code uses a hash table-like data structure" - but that's completely wrong.

His other argument, about summing points, makes no sense either. He doesn't seem to have even profiled his "slow" point summing code or tried basic Python optimisations like using a list comprehension or generator to sum the points. And if that doesn't work well enough, and point summing is a critical part of your program, then you'd use a library like NumPy to handle that, rather than using raw python.

Perfect use case for a tuple.

Or a namedtuple, if you wanted to have access to the x and y fields as named attributes.

The article actually does mention this, but only in a footnote, and the footnote gets the facts wrong:

"One would be right to argue that collections.namedtuple() is even closer to C struct than this. Let’s not overcomplicate things, however. The point I’m trying to make is valid regardless."

No, it isn't, because a namedtuple does not work the same under the hood as the straightforward class definition he gives. If p = Point(0, 0) is a namedtuple instance, and you do p.x or p.y, what actually happens is that the "attribute" is looked up by index, which is equivalent to a C struct field access.

Is that faster now? Back in Python 2.4 or so (last time I looked under the hood of CPython many moons ago) all member access required a hashtable operation. For __slots__-enabled objects (the namedtuple of their day) this meant hitting the class hashtable to get the __slots__ index before using that index to fetch the actual member from the object. So __slots__ was mostly about space efficiency, not performance.

Has that changed?

Attribute access of namedtuples aren't any quicker as you've surmised, it requires a hash lookup and an index lookup. Index lookup is as quick as normal tuple index lookups. The main reasons I reach for namedtuples is memory efficiency and much quicker creation times vs objects.

Memory efficiency? The last time I checked namedtuple dynamically generates code for a class with the specified fields, and evals the code. So it would be exactly as efficient as a normal class. For memory efficiency you want to have multiple points in a single array, to avoid the overhead of objects, but you'd need Numpy or Cython for that.

Not quite. The namedtuple will not have a per-instance attribute dictionary, so there is a significant memery saving when creating loads of them.

Right. But you can fit 2 floats in 8 bytes so if it's memory saving and efficiency you're after, you would use an array to avoid object creation overhead. Namedtuples are nice for their self-documenting properties though, but it's misleading to advertise them as an optimization.

Attribute access of namedtuples aren't any quicker as you've surmised, it requires a hash lookup and an index lookup.

Yes, you're right, I was mistaken about how namedtuples work under the hood.

You're right, a point was a bit contrived example and some more practical use case should have been the example. But the bottom line still stands.

If you have a class in Python, the member variable lookup will essentially be a hash table lookup (requires at least a little pointer traversal). By contrast, looking up a member variable is a compile time constant offset to a known pointer value. It's O(1) vs. O(1) but a huge difference in the constant factor.

Yep, using a tuple would be a little better if you were actually writing Python code but it would be comparing apples to oranges when it comes to this example case.

edit note: a dict and class are equal only in CPython, see this video linked in the OP http://vimeo.com/61044810 - it discusses various optimizations regarding classes vs. dicts in PyPy etc

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact