Hacker News new | past | comments | ask | show | jobs | submit login
Python 3.6 dict becomes compact and keywords become ordered (python.org)
297 points by Buetol on Sept 9, 2016 | hide | past | favorite | 157 comments

This doesn't feel right.. it's not actually a nice side effect at all.

1. it's not in the spec

2. you shouldn't rely on it

3. python can't figure out if you're relying on it, so no error will be raised

4. subtle bugs are sure to be introduced by people who "know" this "feature" exists and use it.

Regardless of the cool implementation details, this post shouldn't advertise that "keywords become ordered" and "A nice "side effect" of compact dict is that the dictionary now preserves the insertion order".

Ergo... we built this awesome thing that you'd love to use but you can't. Don't use it or you'll be a bad programmer who doesn't read specs!

(Just use addict or OrderedDict.)

For what it's worth, the JS ecosystem has tried for years to get users to not rely on key iteration order. It was always unspecified but no matter what, users still expected them to iterate in insertion order.

V8 fought against it for years. Look at this bug horror show:


I think Python is doing the right thing. The only reason to not provide a deterministic iteration order for unsorted maps is to give more room for optimizers. But it seems like there is a clean implementation that is faster while also iterating over keys in insertion order.

Sure, maybe someone will come up with an even better optimization in the future that would break this, but at some point, you have to say, "OK, what we have is good enough, and it gives users a more predictable system."

If you are going to declare insertion order is non-deterministic, you really should make it non-deterministic by doing something like shuffling it. Otherwise, users will just inadvertently rely on whatever deterministic-but-unspecified-order the current implementation happens to provide.

For once, C++ got some usability right and created std::map and std::unordered_map which have explicit key ordering behaviour.

Now, back to waiting for my project to compile.....

If you think offering every variation of data structures is doing usability right, then C++ has done a lot of usability right. So much that its usability has suffered.

Java as well. Need sorted? TreeMap. High-performance but unordered? HashMap. Preserve insertion order? LinkedHashMap.

I would guess it probably is the same way in c++ but I've never spent more than 5 seconds considering which collection to use in Java. They have a decent amount of them but once you are aware of their existence they are incredibly intuitive

FWIW Python has had OrderedDict for years.

> make it non-deterministic

In Python v3.1 - v3.5 the order is deterministic within an execution, but non-deterministic across executions. This was implemented to avoid the possibility that a hacker could slow down dict key lookup/insertion by creating a large number of hash table collisions.


IMHO, that doesn't seem to be a big vulnerability.

It never seemed to catch on with the bad guys, but that doesn't mean it wouldn't have happened if major languages hadn't implemented counter measures.

DDoS using DNS amplification was possible for decades but never happened. Then one day a bad guy got bored, wrote a proof-of-concept, and the rest is history.

At least with respect to ordered kwargs, the spec is changing:


I believe it's an implementation detail that in cpython 3.6, dict and odict are essentially equivalent now, but it's an implementation detail which is there to support an official feature of the language.

If your project can sustain a minimum version of (the as yet unreleased) Python 3.6, I don't see why you shouldn't begin to rely on this feature straightaway.

Ugh. This is all bad.

This just isn't the sort of thing that benefits from fucking with.

This is imposing a slight but not-insignificant open-ended technical debt on the whole of the Python ecosystem, for what seems to me to be very obscure benefits.

> obscure benefits

1. No need for metaclasses simply to make the class construction locals dict an ordered dict. ORM library devs will be happy.

2. No need to pass a list of tuples when you wanted to pass keyword arguments. That's a big win for usability.

> technical debt

The language spec will say that class construction locals and kwargs are insertion-ordered, but other dicts are ordered according to the whims of the interpreter. What is causing technical debt?

> This just isn't the sort of thing that benefits from [change].

The core devs say that status quo is the sane default. They're not just mucking about for the amusement of code churn.

> The language spec will say that class construction locals and kwargs are insertion-ordered, but other dicts are ordered according to the whims of the interpreter. What is causing technical debt?

The debt will accrue in the wider ecosystem. The spec doesn't guarantee dicts are ordered, but people will rely on it. Then, when a new dict implementation comes along, all code written based on a (then) correct assumption (but not based on the spec) suddenly stops working.

If that happens -- people begin to rely on dict order -- the core devs will consider that when deciding whether or not to change the implementation.

Or they could just flush this now and save all the trouble.

Dicts with integer keys have always had a very predictable order. Do you consider that technical debt?

Only if someone writes (and shares widely, to be used) code that depends on it.

I've [ab]used the stability of dict order when you haven't modified the dict in the meantime, in one-off scripts.

But I wouldn't write code that depended on predicting dict key order though, that seems a bit too much to me.


For me, I feel that a Python syntax text should run unmodified on as many interpreters as possible, cPython, Pypy, Jython, Brython et. al., embedded, etc. There are lots of Python and Pythonish runtimes out there, beyond just what core dev produces.

If I can write code that works on most of those systems out-of-the-box that's a huge win from my point of view.

It should be clear that changing the semantics of the interpreter, even in subtle ways, puts a burden onto anyone hoping to maintain parity in their own runtime, yes?

> It should be clear that changing the [behavior] of the interpreter, even in subtle ways, puts a burden onto anyone hoping to maintain parity in their own runtime, yes?

No, that's not clear. Any changes to the language specification put a burden on the maintainers of interpreters/compilers. Any extra changes to the CPython interpreter do not. You argued that behavior of CPython becomes its own spec, beyond that of Python itself, but this does not seem to be the case. There are a bunch of optimizations that CPython does, like caching small ints, that programmers do not tend to rely on and do not form a sort of shadow-spec for other interpreters.

You might say that dict keys staying in insertion order is more useful than the other CPython idiosyncrasies and therefore will become a shadow-spec, unlike the others. That's possible, but let's consider the risk.

Pythonistas have done just fine without this behavior for a while. There's OrderedDict, but it was added in 2008, relatively late. If there had been lots of demand/usage, I'd have expected it to be added earlier. PEP 372 [0] indicates that use in a metaclass was one of the big motivations, which was only enabled the year before.

When a new Pythonista considers this issue, they are likely to search the internet for discussion and will find commentary on OrderedDict. I expect most will use OrderedDict when appropriate.

The riskiest group would be folks porting code from PHP and Ruby, or other languages that have a core mapping type that keeps insertion order according to the language spec. They'd just expect a Python dict to behave the same way and won't bother to read about it. To evaluate the impact of this change, we should ask how much porting from PHP/Ruby/etc. do we expect to occur, how likely is it that dict implementation will drop insertion ordering in the future, and how much benefit does this implementation provide.

Based on the explanation of the memory and compute savings, this new dict implementation sounds like a good idea.

Also, Ruby made the change in 2009 to the language spec. I'm not an expert Rubyist, but AFAICT it hasn't caused any problems for their community.

[0] https://www.python.org/dev/peps/pep-0372/

> No need for metaclasses simply to make the class construction locals dict an ordered dict. ORM library devs will be happy.

understatement with that. Saw someone trying to avoid using a metaclass for that, "had" to use descriptor objects which all used the same global counter to work out their order

Hey, guess what? Just don't use metaclasses. At all, for anything. Done.

GvR called it "The Killing Joke" for a reason.

This is a clear case of "negative productivity": People are working "harder than a cat trying to bury a turd on a marble floor" to pee in my soup. It makes me grumpy.

what, not using metaclasses in that case caused more "issues" than would of been caused by using metaclasses

multiple global values to track state, since can't bind the state to the class' creation, since the conversion to the final form of the class happened after its creation, vs being able to bind the state to the class' creation, since the conversion happens as the class is created

Knowing nothing more about this codebase than the above, I would fire everyone and make the CEO's nephew write it. It could hardly turn out worse and it would be a lot cheaper.

(I'm joking, but only barely.)

The reduced usecase (story?) is the following

The customer requests the following

    class somename (?):
        an_attr_a = ADescriptor(somestuff)
        an_attr_1 = ADescriptor(somestuff)
        an_attr_a1 = ADescriptor(somestuff)
One of their expectations is that the attributes have their definition order stored somewhere. You may either use the decorator method, or the metaclass method

I don't get your concern. You can always treat an ordered collection as unordered without harm, aside from maybe not picking the most optimized solution had you considered order. Existing code will assume it's unordered, code written for 3.6+ can assume it's ordered.

> assume it's ordered

That's discouraged for general dict usage, as non-CPython interpreters may choose a different implementation. Only kwargs and class creation dicts will be specified as ordered.

Absolutely, I just meant in the specific cases where it's guaranteed by the updated spec.

I'm imagining a scenario where I want to back-port Python 3 code to work with Python 2 and now must take account of it relying on ordering (that's not guaranteed by Python 2 semantics.)

Also it's just bad design. Ordering is an additional constraint so I feel that it should require additional code ("OrderedDict" vs "dict"). It's a deep semantic change for what seems to me to be bullshit superficial justification.

I tend to agree that this is something that could be confusing to rely on, especially if running tests against multiple version of Python (such as a previous version where this isn't the behavior).

    Explicit is better than implicit.

    There should be one-- and preferably only one --obvious way to do it.

In python 2 the ordering was deterministic. Cue lots of weird bugs when upgrading to python 3. Of course, it should never have been relied on... but it had been.

I'd prefer either a completely random order or a completely deterministic one - but please no take backs (people will rely on it).

I'm not sure anybody should be relying on code that relies upon the order of an unordered dict.

These reliances usually aren't explicit, they just happen to work that way and it can be very hard to tease out the dependency unless you have a way to easily screw with iteration order.

Pretty much any unspecified but (because of implementation) reliable ordering will eventually end up with code relying on it, usually implicitly, e.g. some Go code broke in 1.5 (and the release notes noted it explicitly) when scheduling order changed[0] despite it always having been undefined. In fact, specifically to catch this kind of hidden dependencies the Go developers added limited scheduling randomisation to the race detection mode.

[0] from definition-ordering to last-definition-bias e.g. if you launched 3 goroutines 1, 2 and 3, in 1.4 the scheduler would just run them in that order, in 1.5 it would run 3 first followed by 1 and 2.

That's true, but to rely on these behaviors is to introduce bugs into your code, and we should be detecting and correcting them, not forgoing semantically clean optimizations because someone's buggy code will break.

And so we get a 3rd way to interpolate strings in 3.6.

Yeah, this is not a dict that has ordered keys. It's a dict that has that side effect.

I can't think of any reason why a set's members, or a dict's keys (which are just set members with associated values, conceptually) should be unordered. Yes, set membership is traditionally unordered, but we have ordered sets and dicts, so they must be desirable, and they're certainly coherent.

I just hope that if sets and dicts become ordered by spec, that some day someone doesn't feel compelled to implement unordered versions.

> I just hope that if sets and dicts become ordered by spec, that some day someone doesn't feel compelled to implement unordered versions.

Unspecified-order minimizes constraints on future implementations. If someone finds a useful time or space optimization (for particular workloads, not necessarily in big-O asymptotic terms) that isn't consistent with the specified ordering, that might be a compelling reason to implement a version without the ordering constraint. (But, if its not the best choice for general use, that's true even if the default implementation isn't ordered, since its possible that the optimization will still be unsuitable as the default.)

And, obviously, even if the base case has one ordering, there's all kinds of reasons you might need to implement a version with a different ordering.

"Ordered sets" are coherent in the sense that you can make a data structure like a set and then define an order over its members, and call it an ordered set, but "ordered sets" are not sets.

There are certain purposes sets are natural for. When you introduce ordering, you will naturally be asked to add methods for querying or perhaps modifying that ordering in various ways. Many methods later, it's worth asking why you are calling this data structure a set, and what its actual purpose is supposed to be.

I interpret it differently: the ordering is a side effect, and the positive aspect is that it's easier to debug conveniently, not a feature you should rely on version to version.

I am not very sure this is tat bad. If code breaks because dict is no longer ordered, people should have used an ordered one from the start. It's a happy accident it is faster and ordered and not bad because whoever relies on dicts bein unordered (when sometimes they aren't) deserves whatever breakage they get.

This is very similar issue that made migration to Python 3 so painful.

A lot of the code had broken unicode from the start. The difference was that it worked in python 2 most of the time and failed on special cases, while python 3 was stricter and mixing bytes with string always caused it to crash.

Once dict() is ordered and people will use that property (even unknowingly) if the behavior changes again in the future, a lot of their code will be broken.

It may be similar in that it allows for relying on consistent behavior when one shouldn't, but fixing it seems a lot easier than decades of not thinking about Unicode.

A language that makes it easier to write shitty code is worse than one that doesn't

I thought Lisp was the best language though, you're telling me it's the worst? :-p

It isn't right, but popular personalities like the creator of Flask have been demanding it for years so here we are. Welcome to Open Source.

Here's what Guido wrote about some of the issues that have been raised here:


I've been asked about this. Here's my opinion on the letter of the law in 3.6:

- keyword args are ordered

- the namespace passed to a metaclass is ordered by definition order

- ditto for the class __dict__

A compliant implementation may ensure the above three requirements either by making all dicts ordered, or by providing a custom dict subclass (e.g. OrderedDict) in those three cases.

I'd like to handwave on the ordering of all other dicts. Yes, in CPython 3.6 and in PyPy they are all ordered, but it's an implementation detail. I don't want to force all other implementations to follow suit. I also don't want too many people start depending on this, since their code will break in 3.5. (Code that needs to depend on the ordering of keyword args or class attributes should be relatively uncommon; but people will start to depend on the ordering of all dicts all too easily. I want to remind them that they are taking a risk, and their code won't be backwards compatible.)



Put another way, "We're trying out a smaller, faster dict implementation that has the side-effect of being ordered. We would like to be able to change our mind in the future, except for the three cases listed above where we really want to guarantee ordering."

The original proposal was only about improving the implementation https://mail.python.org/pipermail/python-dev/2012-December/1...

Also, Guido wants people to write Python3 code that doesn't rely on dict ordering so that their code run on both Python3.5 and Python3.6.

this is fine, but with PYTHONHASHSEED no longer impacting dictionary ordering, we need a new way to test our applications to ensure they aren't relying on dictionary ordering, without having to replace all dict / {} instances with "myapplication.utils.patchable_dict". What solution has been arrived at for this use case? (there IS a solution, right?)

I'm not quite sure what you're worried about. Are you worried someone might accidentally change the interpreter when pushing to production? If so, I'd prefer to fix the testing policy to ensure that all production environments are used as test environments.

If you're worried about making sure your library/application is correct for older versions? If so, test it under Python v3.5 as well.

Or you could shuffle your test data to change the insertion order and run the test again.

relying upon the ordering of a dictionary is wrong, even with py3.6. Lots of people test for that by using an explicitly random dictionary ordering per test run via PYTHONHASHSEED (way more feasible than "shuffling test data" since not all dictionary use is that simplistic. sys.modules is a dict, for example, how do you "shuffle" that?) That apparently accidental feature is being removed.

> wrong, even with py3.6

If it works, it works. I wouldn't worry until changing the interpreter. Who knows, you might be relying on some bug for correct functionality.

it means code will suddenly break when you move it to an interpreter that does not have this implementation detail. This kind of breakage is really easy in the area class instrumentation libraries where order of class attributes affects something. Every non-cPython will implement this anyway, unwitting reliance upon it will be widespread, and there really won't be much of a "problem", other than they really should make this behavior official someday.

I'm one of those weirdos who wants to continue to use Python 2 for the foreseeable future, so to me this is just introducing a way to break backwards compatibility all the way back to 2 for the benefit of a similar but ever-more-incompatible language that I'll never use.

I'm not trying to be negative, I just want to let you know that people like me are out there.

What a sound and reasonable opinion, in my opinion.

The optimization is awesome but IMHO the key ordering "benefit" in the implementation but not the spec is a so-so move: It can cause some future bugs in the code assuming it is there. Some languages like Go added some key ordering randomization to maps to be sure to avoid people counting on any specific key order.

It's not so-so, it's a terrible idea. At a minimum, the implementation should cough up keys in (say) reverse order, or some cheap to implement different order than insertion order.

Two years down the line, the amount of code that depends on insertion order will make it impossible to do further optimization.

[Many years ago I wrote a platform with an unintended ordering scheme in its dictionary. The documentation stated in bold that order was not deterministic and could change. Users started reporting "bugs" when I changed the order. That sucked.]

It seems strange that that's what you'd be worried about. Your users found bugs in their own code. That's GOOD, isn't it? Does it matter that they blame you?

Like if I'm using pointers after freeing them and just happen to get usable data back... then you go an change it so that memory is reallocated and my code breaks. Yeah, it is a bug. In my code. It's pretty clear that I should not have been doing that. If I'm to stubborn or stupid to figure that out, then I'm really just a shit programmer.

Ordered kwargs is in a PEP[0] that is in "Final" status, meaning it's been accepted and implemented and is making its way into the language proper.

[0]: https://www.python.org/dev/peps/pep-0468/

EDIT: On rereading, I see that the parent and GP aren't specifically talking about the kwarg-ordering benefit, but the general dict-ordering benefit. I would be surprised if dict did not become a synonym for OrderedDict in the spec in the near future.

The only potential disadvantage of dict==odict is performance, as anyone relying on the current arbitrarily-ordered dict behavior by definition isn't worried about the underlying implementation moving to a deterministic-by-insertion-order dictionary.

Still a terrible idea. Speaking from hard experience.

I've never run into a case where the difference in performance between Dict and OrderedDict was important.

I have dealt with legacy code (especially code that needs to interact with JavaScript libraries) where strict-ordering was critical and it was great that at the time I was using Ruby 1.9 in which dicts are explicitly ordered.

I agree that waffling about it is bad, but I think ordered is a better default than unordered. All this stuff about deliberately trying to shuffle order to catch bugs just says to me that most code is a lot simpler to get right if dicts are ordered-by-default.

I've seen some python 2 code "made use" of the deterministic ordering of dict. Migrating to python 3 sucked, it took a while to realize this was the reason for some failing tests.

As long as the language specification is updated to specify ordered keys in dicts then this is good. The big complaint about key ordering I remember hearing is that it was basically a side effect of the CPython implementation (back in the 2.X days), but not part of the spec.

If the spec won't be updated to specify that dict keys must be ordered, then I agree with you.

EDIT: ganduG clarified below that this is just an implementation change. No bueno.

Many, many people won't read the spec. And their code will break when the implementation changes.

And what's worse, this kind of issue is hard to detect. At least with unicode the code was crashing most of the time.

If the ordering of the keys is not in the spec, a programmer should not assume the ordering of the keys. If none of the training material says or implies that the keys will always be ordered, why would a programmer?

The new and improved algorithm happens to preserve ordering. Future algorithms might not. The dictionary data structure does not specify order preservation so that, in the future, a better algorithm that doesn't have this same side effect could be used.

> a programmer should not assume the ordering of the keys

The Python community has many inexperienced programmers, even non-technical people. Particularly, it's so beginner friendly by making it very, very easy to get started with the REPL, etc. A lot of these programmers aren't even going to look at the docs and even if they do, might not understand them.

> If none of the training material says or implies that the keys will always be ordered, why would a programmer?

Because people make mistakes.

Perhaps "should" is the better word. That is, a programmer should not assume the keys will be ordered.

Programmers should not try to access memory beyond the end of an array. They should not attempt to add ints and strings. They should not give a function a float when it expects a struct.

But they do. So we try to design languages in such a way as to minimise the likelihood of these mistakes going unnoticed.

Designing a language to minimize the likelihood of these mistakes is a motivation for language design, but usually far from the primary.

"Programmers should not try to access memory beyond the end of an array" ^ Right, and when they do, it's called a logical bug, something to be fixed. A programmer relying on the incidental but unspecified behavior of a function would be making a similar mistake.

I agree.

But, we learned from C that the possibility of trying to access arbitrary parts of memory means that programs will end up trying to access parts they shouldn't. So, higher level languages like Python solve the problem by having memory-safe types like lists, which preclude the possibility of making this mistake.

Similarly, if we randomise the ordering of keys in the dict then it's impossible for a programmer to make the mistake (consciously or unconsciously) of relying on ordering.

At the cost of an additional processing step to scramble the keys. The balance between safeguarding ignorant or overactive programmers against the performance of a critical builtin type is not going to tip in that direction.

The whole reason we can even have this discussion is because a new algorithm was developed to give us the increased performance in the first place.

> Programmers should not try to access memory beyond the end of an array. They should not attempt to add ints and strings.

In Python, both of these raise an error to make it very explicit and obvious to the programmer, while this implicitly ordered keys bit can't.

Given how much Python I learned just by opening the interpreter and seeing what happened, I can definitely go along with the idea that this is bad.

why would a programmer? Because the programmer would just run the code and check what different methods return as result. Then if it looks like the ordered keys, said programmer will rely on this behavior.

For instance, I'm writing some javascript recently. I try a little array in the console, if Array.keys() returns the orders keys, I will assume it is, and would check the docs (where?) only if I had a problem. Something unstable should not look stable. I agree with the Golang way, make it look like what it is by the specs.

I have trouble understanding a programmer who takes a couple samples of a method's behavior and feels confident enough to trust that as the actual behavior of the function. At least guard you're flippant assumption with a check or something!

The problem is not to add some checks (asserts? No, just kidding), the problem is that you're quickly writing a little thing for a demo out an experiment, and faster than you'd expect these ten lines of innocuous code have grown in a gigantic ball of mud upon which many businesses are built.

To me the less worse way to handle this is a softened TDD, where a consequent part of the code is covered (described, checked, structured) by a suite of reasonably atomic automated tests.

"try a little array in the console, if Array.keys() returns the orders keys, I will assume it is"

That may work for some toy script you write for your own use, but must be avoided if you intend to engineer a program.

Certainly, that kind of thinking will lead to very bad results if you ever do low-level multi-threaded programming or try to write portable code.

For multi-threaded programming, "is this call thread-safe?" is not something answered in a test, and certainly not in a quick test.

For portable programming, you simply cannot run the test on all possible systems.

In general, when you program like that, you will not only rely on implementation details that may change (for example, this may change for arrays with lots of entries) but also forget to handle error conditions.

> check the docs (where?)

Thankfully that's more obvious in Python.

It is very easy to inadvertently add dependencies on key order. For example, if you are generating HTML, you might have something like:

    print("<a " + " ".join(k+"=\""+v+"\"" for (k,v) in d.items()) + "/>")
And d would be a dict of attributes for the anchor element. The code will produce different output depending on computer which is not what you want because it can cause subtle problems for other systems expecting a certain output.

Several compiler bugs have been of this kind so it is not only newbie programmers that make these mistakes.

How does HTML attribute order change output?

The code will produce a string which will be different depending on unpredictable factors. Such as the memory address of the 'd' object.

But how will that affect output? I'm not aware of any scenario where HTML attribute order matters.

It shouldn't affect the way a browser renders the HTML. But there are many possible scenarios in which the order is relevant for some other reason.

Suppose you have another system monitoring the web page generated by the above code. Like a cdn or something. That system would think the web page is constantly updated as the code of the HTML page would be different each time it accessed it, leading to lots of redundant work if it has to update caches, check outbound links and so on.

If a caching thing doesn't understand that order of HTML attributes doesn't matter, I don't want to be using that thing.

> why would a programmer

How many programmers have you worked with? (People will assume all sorts of things that are not true if it makes their job slightly easier, or even harder)

This was my thought as well.

I haven't really followed the discussion around the decision, so perhaps I am missing something, but my initial feeling is that either ordering should be required by the spec and implemented, or the ordering should be randomised.

But, that is the outcome of about 1 minute's thought :)

It needs to be realized that this dict implementation landed literally yesterday. There are space benefits and it simplifies the language in three places where we are adding ordered mapping guarantees (kwargs, namespace passed to metaclasses, and cls.__dict__, all of which were planned to be ordered prior to this dict implementation coming into the language).

Before we bake the requirement of dictionaries being ordered into the language spec and require all current and future Python implementations to support it for Python 3.6 onwards -- remember, Python is 26 years old -- we want to live with the dict implementation for a few releases first.

Does this mean all dicts are ordered in Python 3.6 onwards?

If so I can see scope for subtle bugs as code written and tested on Python 3.6 will potentially fail on earlier versions due to dict order.

But I guess that's why you test using tox...

That's an argument against ever adding ANYTHING to a new language!

"Oh, we have this neat feature in mind, but because code that uses it wont work on older versions of the compiler/interpreter, we can't use it!"

Python code that will depend on the ordering will either have to specify Python 3.6 as the minimum version, or just use OrderedDict, which is presumably isn't going anywhere.

Also, it's rare that anything "depends" on the ordering of dictionary keys, but it's frequently "nice to have". For instance, it means that if you output JSON, all the keys are in a sensible order. This matters not at all for other computers reading the JSON (as the JSON spec explicitly says that ordering of keys is irrelevant), but it's lovely for developers debugging the program.

> That's an argument against ever adding ANYTHING to a new language!

Not exactly, because no error will be raised from Python and subtle bugs could be introduced if someone relied on this behavior.

I am trying to imagine a bug that would happen with an ordered dictionary which would not happen on an unordered dictionary. An unordered dictionary may return it ordered by chance.

Parent said for bugs happening in the other direction -- written for 3.6, run on earlier versions, so it would be the inverse happening (ordered assumption not holding).

I suspect that any procedure relying on dictionary order can experience all the classes of bugs associated with relying on mutated state. For example, if procedure P1 relies on dictionary order and calls another procedure P2 on a dictionary, might get a new dictionary back in a case where P2 is written in a functional rather than imperative style.

Essentially, relying on ordering requires tracking identities rather than values. Which suggests a second bug. What does it mean now mean for two dictionaries to be equal?

Really? It's in the freaking name. Code is written that depends on dictionary being ordered. Works in CPython 3.6, fails on earlier CPython and other implementations of Python.

Sorry I wrote that back to front. Fixed now.

chance my friend is what hides many bugs

If you delete a key, and add one, the order can be lost.

But it is correct in the sense that the insertion order has already changed.

Here is a fun little thing I didn't realize before running into it:

    OrderedDict(a=1, b=2).keys()
Is not guaranteed to return ['a', 'b'].

Of course this makes sense, but is annoying nonetheless.

In 3.6, this will be ordered. Python creates a dict out of the keywords arguments, so before 3.6, this creates the old hash-ordered dict which is covered to an OrderedDict.

But, as the other commenter mentioned, a list of 2-tuples is the safe bet.

I think the standard answer is to use

> OrderedDict([('a', 1), ('b', 2)])

I find myself doing that and similar things (objects with `key` and `value` attributes) to deal with JSON non-ordering a lot. I can definitely see value in preserving keyword argument order

[for k in OrderedDict("a"=1, "b"=2)] should give you what you want, IIRC

The problem isn't that .keys() isn't returning them in order. The problem is that the kwargs (a=1, b=2) aren't passed to OrderedDict() in order, so the constructor has no way of knowing which one should be first.

  SyntaxError: keyword can't be an expression
You want to pass a list of tuples:

  from collections import OrderedDict
  assert list(OrderedDict([('a', 1), ('b', 2)]).keys()) == ['a', 'b']

That's not even valid Python.


  [k for k in OrderedDict(a=1, b=2)]
not that different in valid Python

But as pointed out elsewhere, kwargs aren't guaranteed to be in order, so by the time the OrderedDict is built it's already wrong. It seems the change we're discussing was primarily motivated by a desire to fix that specific problem.

The point OP is making is that none of these statements will guarantee the order of insertion, since OrderedDict must take the variadic kwargs input as a dict, which is inherently unordered. You have to use a list of tuples.

Yes, agree, just wanted to give the correct Python for the (still) ill attempt at traversing OrderedDict keys in order.

Python should have a "grumpy mode" which causes dict-keyword order to be randomized, and other deliberate attempts to break code where people are doing the wrong thing :P

How will they reconcile this?

  In [10]: OrderedDict((('a', 1), ('b', 2))) == OrderedDict((('b', 2), ('a', 1)))
  Out[10]: False
  In [11]: dict((('a', 1), ('b', 2))) == dict((('b', 2), ('a', 1)))
  Out[11]: True

The equality test for dict objects will be unchanged. Note that it's not as if they said "Oh gee let's just alias dict() to OrderedDict() and call it a day".

The entire test suite still passes. What do that tell you?

With this given, why wouldn't collections.OrderedDict be deprecated as of 3.6?

Because this is an implementation detail. The language spec doesn't enforce this.

The real reason they did this was because of the performance gains from the approach - the ordering is just a nice side effect. Its an idea originally from PyPy afaik.

`OrderedDict` is now just a thin wrapper around `dict`.

i.e. if you want your code to be portable among different Python implementations then you should still use `OrderedDict`.

> Its an idea originally from PyPy afaik.

No, the idea is from Raymond Hettinger on the Python-Dev ML back in 2012: https://mail.python.org/pipermail/python-dev/2012-December/1...

PyPy were the first to bother actually implementing it.

Ah okay, good to know. I knew PyPy did this so I assumed it came from there.

Do you have a link to info about why it's faster? My initial reaction was that the extra indirection would slow it down, but of course that's based on nothing but general knowledge and two seconds' thought.

> My initial reaction was that the extra indirection would slow it down, but of course that's based on nothing but general knowledge and two seconds' thought.

There are already indirections in the current implementation.

The new dict has much better memory locality: the "actual data" (24 bytes on 64b platforms) is stored in a dense array instead of the former sparse array, and the sparse array only stores indices so it can be made much more compact (especially as it can switch the index size depending on the size of the compact array, for small dicts it'll store byte indices). So not only is the new dict smaller, the way the data is laid out is better fetchable.

The original point[0] was actually iteration speed: since the data is stored in a dense array, that can be iterated directly and efficiencly instead of having to iterate a sparse array and skip the random empty entries (leading to awful branch predictability, whether a given entry of the sparse array is empty is essentially random if the hash is any good).

[0] https://mail.python.org/pipermail/python-dev/2012-December/1...

Great, thank you. That's a cool technique and I'll have to remember it.

Thanks for this - I was also wondering about how this would effect cache usage.

> `OrderedDict` is now just a thin wrapper around `dict`.

It is not. odictobject.c has not been touched at all yet. Also, OrderedDict defines several methods that ordinary dict does not (and unlikely ever will).

I am sure there is a bit of code out there still calling the method...

Deprecating it won't affect that code since it's more of a flag to indicate that future use should be limited to ensure forward compatibility.

It is. Like most programming language deprecations, it's implementation was removed and the name remains for backwards compatibility.

edit: it seems like this is exactly what OrderedDict does, I presumed it's ordered by key, like std::map (red-black tree) vs. std::unordered_map (hash table) is in C++.

No, this new dict implementation preserves insertion order, it's not ordered by value of the key.

The motivation is to preserve the order of kwargs.

That's exactly the way OrderedDict works too:


> An OrderedDict is a dict that remembers the order that keys were first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged. Deleting an entry and reinserting it will move it to the end.

Oh thanks, I was mistaken. Added an edit to GP post.

Any benchmarks yet? I'm curious about the cost of the indirection and the 2nd array vs the smaller sparse array (easier cachable), and thinking of doing the same for cperl hashes.

splits also need to realloc now twice, which might have some costs. still the run-time advantage should beat all new costs.

What's the point of the compact dict? Does having all the keys together outweigh the extra indirection? That would surprise me. Can someone help explain why that would be the case?

It all depends on the underlying technology. Yes, an indirection costs more. But in the same time, the first level array can be much smaller and it even can be that both arrays together are smaller than the old array before, because it had to had gaps. The new compact array does not need to have gaps.

It of course depends on many factors, like how is the filling factor.

But, preserving memory can potentially also preserve transfers between processor and main memory. In old processors, pure clock cycles and the complexity of the opcodes where the main factor when it came to speed. Today, the complexity of opcodes (for example indirection) are less and less interesting. The amount of cache misses and how much memory must be transferred between processor and memory chips are deciding.

> "Preserving the order of kwargs in a function"

"What are you trying to achieve?"

There is a motivation segment in the PEP: https://www.python.org/dev/peps/pep-0468/#motivation

I don't quite get it though.

Suppose you want to make something like an XML generator helper function:

   def start(tag_, **kwargs):
      write("<" + escape(tag_))
      if kwargs:
          for k, v in kwargs.item():
             write(" " + escape(k) + "=" + quote_escape(v))
(Apply hand-wavying to get the correct code.) This might be called as:

    start("abc", x="1.0", y="2.0")
but generate the output

    <abc y="2.0" x="1.0">
when you want it to be:

    <abc x="1.0" y="2.0">
The output order depends on the Python hash implementation, which (in modern Pythons) is randomly selected during startup. For an API which preserves order, you must currently either pass in the pairs in iterable order, like:

    start("abc", (("x", "1.0"), ("y", "2.0")))
or switch the API to pass in a dictionary-like object instead of kwargs, then switch to an OrderedDict (which must also be initialized with pairs in iterable order).

In 3.6, there's no need for that -- kwargs will preserve the keyword parameter order.

    start("abc", x="1.0", y="2.0")
    <abc y="2.0" x="1.0">
Ok, I can imagine people wanting this. Especially people coming form Javascript.

However `<abc y="2.0" x="1.0">` and `<abc x="1.0" y="2.0">` are semantically equivalent in XML, aren't they?

They are equivalent. That doesn't mean that all tools will use XML semantics to test for equivalence.

A testing tool might require that the output is byte-for-byte equivalent to a known good output. Python's pseudorandomly determined hash function won't preserve that order across multiple runs.

XML itself doesn't no, humans may, and some crappy crummy tools sadly do as well[0].

Round-tripping ordering is also convenient for automated processing tools as well (e.g. add/remove parameters without introducing extraneous unnecessary changes)

[0] some also do care about namespace aliases/prefixes, which is a bit of a shock the first time they choke on your perfectly valid and namespaced XML.

This is why it will be defacto stabilized quickly (because it's so useful).

I'm sure the python developers have considered that, since they explicitly chose to preserve order instead of other small non-order preserving optimizations that could have been done with the new representation.

It's a side-effect.

Though it's something which has been requested in the past. Also preserving the ordering of class definitions.

This news has spread and seems to cause a great deal of confusion. I wish the headline had been "Python dict faster [because of some subtle details ... don't look behind the curtain]"

    dict_keys(['c', 'd', 'e', 'b', 'a'])   # random order
> random

Maybe I'm nit picky but I find the misuse of that word annoying. My experience of CPython is that dicts are unordered but deterministic. Not random.

If your hash function is random (seeded), the order will be random over the seed.

Across processes perhaps? But hash(a) == hash(a) and hash(b) == hash(b) so a dictionary that with a and b inserted will yield the same order every time.

    d1 = dict()
    d1['a'] = 1
    d1['b'] = 1

    d2 = dict()
    d2['a'] = 1
    d2['b'] = 1

The order of the keys in a dictionary was merely guaranteed to be consistent for a given instance: that is

    d1 is d2 ⇒ list(d1) == list(d2)
But the order is not guaranteed to be consistent between two equal dictionaries, in that

    d1 == d2 ⇏ list(d1) == list(d2)
This is because the history of the dictionary can affect its order, not just its contents. For example,

    x = {-1: (), -2: ()}
    y = {-2: (), -1: ()}
    list(x) != list(y)
This remains true, but now the particular order produced from a given list of insert and delete operations is well-defined, rather than arbitrary.

It's worth noting that previously the order of two dictionaries with the same history was also the same (but may vary between program runs), but that was not guaranteed.

It's not actually clear if this is still regarded as an implementation detail, but I expect it will be effectively impossible to stop this becoming a key assumption of many libraries, so would expect any attempt to require OrderedDict (except for pre-3.6 compatibility) will fail.

Yes across processes. You don't want to change (reseed) the hash function after the dictionary is created. (At least then you need to rehash everything.)

Seeded hashing was introduced to python 3.2.3: https://docs.python.org/3.3/using/cmdline.html#cmdoption-R

    $ python3 --version
    Python 3.5.2

    $ python3 -c "print(hash('a'))"
    $ python3 -c "print(hash('a'))"

    $ python3 -c "print(set('abcdef'))"
    {'d', 'f', 'e', 'b', 'c', 'a'}
    $ python3 -c "print(set('abcdef'))"
    {'e', 'f', 'd', 'a', 'b', 'c'}
Note that there is no guarantee that this ordering is uniformly random. (Though it probably is.)

They should say "arbitrary order"

How about unspecified order?

Stable, but not in insertion order or any other order I'd call deterministic.

Deterministic means it is completely determined by the parameters, so it doesn't change from one call to another. This seems to be exactly what you are naming "stable" here. If it outputed the keys in a known order, it would be predictable.

Yet, the GP is just complaining about two different and widespread definitions of "random", so, whatever. It would be great to have a contexts dictionary we could use to resolve such things.

So for larger dicts the memory size of the dict would be 5/3 of what it is now? That seems like a big regression unless I'm missing something.

If you've disabled the StartCom CA due to concerns about lack of transparency[0] and are therefore unable to view pages like this one, you can always click the "web" link above and then view the cached page on Google.

For convenience, that link is:


0. https://news.ycombinator.com/item?id=12411870

Edit: To be clear to the downvoters, this has nothing to do with Python, other than they're using the StartCom certs. Not a criticism of Python.

The python docs were actually the sole reason I've re-added this cert to my trust store :(

(I didn't want to train myself to click-through the warnings)


If you care enough to disable one specific CA, then you likely know how caches work and how to find the alternative. Did you plan to post a comment like this on every single HN article which uses StartCom's cert?

OK, so I was also raising awareness of the issue, guilty. And no, this was the only time I've mentioned this on HN, and I'll never mention it again here, you have my word. Had no fucking idea it would draw so much ire (most downvoted comment ever here in half a decade). Was trying to be helpful. Sorry for pissing you all off so much.

Don't take it personally. I downvoted it, because I think it's irrelevant. I doubt anyone got angry or had any emotions at all about that post.

Dude, Python is trying to become PHP. Except with a complicated implementation.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact