
Python 3.6 dict becomes compact and keywords become ordered - Buetol
https://mail.python.org/pipermail/python-dev/2016-September/146327.html
======
_hyn3
This doesn't feel right.. it's _not_ actually a nice side effect at all.

1\. it's not in the spec

2\. you shouldn't rely on it

3\. python can't figure out if you're relying on it, so _no error will be
raised_

4\. subtle bugs are sure to be introduced by people who "know" this "feature"
exists and use it.

Regardless of the cool implementation details, this post shouldn't advertise
that "keywords become ordered" and "A nice "side effect" of compact dict is
that the dictionary now preserves the insertion order".

Ergo... we built this awesome thing that you'd love to use but you can't.
Don't use it or you'll be a _bad programmer_ who doesn't read specs!

(Just use addict or OrderedDict.)

~~~
munificent
For what it's worth, the JS ecosystem has tried for years to get users to not
rely on key iteration order. It was _always_ unspecified but no matter what,
users still expected them to iterate in insertion order.

V8 fought against it for _years_. Look at this bug horror show:

[https://bugs.chromium.org/p/v8/issues/detail?id=164](https://bugs.chromium.org/p/v8/issues/detail?id=164)

I think Python is doing the right thing. The only reason to not provide a
deterministic iteration order for unsorted maps is to give more room for
optimizers. But it seems like there is a clean implementation that is _faster_
while also iterating over keys in insertion order.

Sure, maybe someone will come up with an even better optimization in the
future that would break this, but at some point, you have to say, "OK, what we
have is good enough, and it gives users a more predictable system."

If you _are_ going to declare insertion order is non-deterministic, you really
_should_ make it non-deterministic by doing something like shuffling it.
Otherwise, users will just inadvertently rely on whatever deterministic-but-
unspecified-order the current implementation happens to provide.

~~~
danpat
For once, C++ got some usability right and created std::map and
std::unordered_map which have explicit key ordering behaviour.

Now, back to waiting for my project to compile.....

~~~
Sharlin
Java as well. Need sorted? TreeMap. High-performance but unordered? HashMap.
Preserve insertion order? LinkedHashMap.

~~~
tdb7893
I would guess it probably is the same way in c++ but I've never spent more
than 5 seconds considering which collection to use in Java. They have a decent
amount of them but once you are aware of their existence they are incredibly
intuitive

------
fermigier
Here's what Guido wrote about some of the issues that have been raised here:

"""

I've been asked about this. Here's my opinion on the letter of the law in 3.6:

\- keyword args are ordered

\- the namespace passed to a metaclass is ordered by definition order

\- ditto for the class __dict__

A compliant implementation may ensure the above three requirements either by
making all dicts ordered, or by providing a custom dict subclass (e.g.
OrderedDict) in those three cases.

I'd like to handwave on the ordering of all other dicts. Yes, in CPython 3.6
and in PyPy they are all ordered, but it's an implementation detail. I don't
want to _force_ all other implementations to follow suit. I also don't want
too many people start depending on this, since their code will break in 3.5.
(Code that needs to depend on the ordering of keyword args or class attributes
should be relatively uncommon; but people will start to depend on the ordering
of all dicts all too easily. I want to remind them that they are taking a
risk, and their code won't be backwards compatible.)

"""

[https://mail.python.org/pipermail/python-dev/2016-September/...](https://mail.python.org/pipermail/python-dev/2016-September/146348.html)
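
A minimal sketch of the three guarantees listed above, assuming Python 3.6 or later. The names `show_kwargs`, `Meta`, and `Point` are made up for illustration and do not come from the mailing-list post:

```python
def show_kwargs(**kwargs):
    # 1. Keyword arguments arrive in the order the caller wrote them.
    return list(kwargs)


class Meta(type):
    last_namespace_order = None

    def __new__(mcls, name, bases, namespace):
        # 2. The namespace handed to the metaclass follows definition order.
        mcls.last_namespace_order = [k for k in namespace if not k.startswith("__")]
        return super().__new__(mcls, name, bases, namespace)


class Point(metaclass=Meta):
    x = 1
    y = 2
    z = 3


kw_order = show_kwargs(b=1, a=2, c=3)
ns_order = Meta.last_namespace_order
# 3. The class __dict__ keeps definition order as well.
dict_order = [k for k in vars(Point) if not k.startswith("__")]
```

On 3.5 and earlier the same code runs, but none of the three orders is guaranteed.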

~~~
raymondh
Put another way, "We're trying out a smaller, faster dict implementation that
has the side-effect of being ordered. We would like to be able to change our
mind in the future, except for the three cases listed above where we really
want to guarantee ordering."

The original proposal was only about improving the implementation
[https://mail.python.org/pipermail/python-dev/2012-December/1...](https://mail.python.org/pipermail/python-dev/2012-December/123028.html)

Also, Guido wants people to write Python 3 code that doesn't rely on dict
ordering so that their code runs on both Python 3.5 and Python 3.6.

~~~
zzzeek
this is fine, but with PYTHONHASHSEED no longer impacting dictionary ordering,
we need a new way to test our applications to ensure they aren't relying on
dictionary ordering, without having to replace all dict / {} instances with
"myapplication.utils.patchable_dict". What solution has been arrived at for
this use case? (there IS a solution, right?)

~~~
xapata
I'm not quite sure what you're worried about. Are you worried someone might
accidentally change the interpreter when pushing to production? If so, I'd
prefer to fix the testing policy to ensure that all production environments
are used as test environments.

Or are you worried about making sure your library/application is correct for
older versions? If so, test it under Python v3.5 as well.

Or you could shuffle your test data to change the insertion order and run the
test again.
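
A rough sketch of that shuffling idea, assuming the function under test takes a plain dict. The `shuffled` helper and `total_price` are hypothetical names used only for illustration:

```python
import random


def shuffled(mapping, rng=random):
    """Return a copy of mapping holding the same items in a random insertion order."""
    items = list(mapping.items())
    rng.shuffle(items)
    return dict(items)


def total_price(prices):
    # Hypothetical function under test: its result must not depend on key order.
    return sum(prices.values())


prices = {"apple": 3, "pear": 5, "plum": 7}

# Run the function against many insertion orders and collect the distinct results.
results = {total_price(shuffled(prices)) for _ in range(20)}
```

If `results` ever holds more than one value, the code depends on ordering. This only covers dicts the test itself builds; it does nothing for dicts created deep inside the application.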

~~~
zzzeek
relying upon the ordering of a dictionary is wrong, even with py3.6. Lots of
people test for that by using an explicitly random dictionary ordering per
test run via PYTHONHASHSEED (way more feasible than "shuffling test data"
since not all dictionary use is that simplistic. sys.modules is a dict, for
example, how do you "shuffle" that?) That apparently accidental feature is
being removed.

~~~
xapata
> wrong, even with py3.6

If it works, it works. I wouldn't worry until changing the interpreter. Who
knows, you might be relying on some bug for correct functionality.

~~~
zzzeek
it means code will suddenly break when you move it to an interpreter that does
not have this implementation detail. This kind of breakage is really easy in
the area of class instrumentation libraries, where the order of class
attributes affects something. Every non-CPython implementation will implement
this anyway, unwitting reliance upon it will be widespread, and there really
won't be much of a "problem", other than they really should make this behavior
official someday.

------
gbin
The optimization is awesome, but IMHO having the key ordering "benefit" in the
implementation but not in the spec is a so-so move: it can cause future bugs
in code that assumes it is there. Some languages, like Go, added key ordering
randomization to maps to keep people from counting on any specific key order.

~~~
noobiemcfoob
If the ordering of the keys is not in the spec, a programmer should not assume
the ordering of the keys. If none of the training material says or implies
that the keys will always be ordered, why would a programmer?

The new and improved algorithm happens to preserve ordering. Future algorithms
might not. The dictionary data structure does not specify order preservation
so that, in the future, a better algorithm that doesn't have this same side
effect could be used.

~~~
gbog
why would a programmer? Because the programmer would just run the code and
check what different methods return as result. Then if it looks like the
ordered keys, said programmer will rely on this behavior.

For instance, I've been writing some JavaScript recently. I try a little array
in the console; if Array.keys() returns the keys in order, I will assume it
always does, and would check the docs (where?) only if I had a problem.
Something unstable should not look stable. I agree with the Golang way: make
it look like what it is by the specs.

~~~
noobiemcfoob
I have trouble understanding a programmer who takes a couple of samples of a
method's behavior and feels confident enough to trust that as the actual
behavior of the function. At least guard your flippant assumption with a
check or something!

~~~
gbog
The problem is not to add some checks (asserts? No, just kidding), the problem
is that you're quickly writing a little thing for a demo or an experiment,
and faster than you'd expect these ten lines of innocuous code have grown into
a gigantic ball of mud upon which many businesses are built.

To me the least bad way to handle this is a softened TDD, where a substantial
part of the code is covered (described, checked, structured) by a suite of
reasonably atomic automated tests.

------
brettcannon
It needs to be realized that this dict implementation landed literally
yesterday. There are space benefits and it simplifies the language in three
places where we are adding ordered mapping guarantees (\*\*kwargs, namespace
passed to metaclasses, and cls.__dict__, all of which were planned to be
ordered prior to this dict implementation coming into the language).

Before we bake the requirement of dictionaries being ordered into the language
spec and require all current and future Python implementations to support it
for Python 3.6 onwards -- remember, Python is 26 years old -- we want to live
with the dict implementation for a few releases first.

------
andybak
Does this mean all dicts are ordered in Python 3.6 onwards?

If so I can see scope for subtle bugs as code written and tested on Python 3.6
will potentially fail on earlier versions due to dict order.

But I guess that's why you test using tox...

~~~
collyw
I am trying to imagine a bug that would happen with an ordered dictionary
which would not happen on an unordered dictionary. An unordered dictionary may
return it ordered by chance.

~~~
njharman
Really? It's in the freaking name. Code is written that depends on dictionary
being ordered. Works in CPython 3.6, fails on earlier CPython and other
implementations of Python.

~~~
collyw
Sorry I wrote that back to front. Fixed now.

------
IgorPartola
Here is a fun little thing I didn't realize before running into it:

    
    
        OrderedDict(a=1, b=2).keys()
    

Is not guaranteed to return ['a', 'b'].

Of course this makes sense, but is annoying nonetheless.

~~~
toyg
[for k in OrderedDict("a"=1, "b"=2)] should give you what you want, IIRC

~~~
the_mitsuhiko
That's not even valid Python.

~~~
coldtea
Well,

    
    
      [k for k in OrderedDict(a=1, b=2)]
    

not that different in valid Python

~~~
ak217
The point OP is making is that none of these statements will guarantee the
order of insertion, since OrderedDict must take the variadic kwargs input as a
dict, which is inherently unordered. You have to use a list of tuples.
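
A minimal sketch of the list-of-tuples form, which keeps the written order on any Python version that ships `OrderedDict`:

```python
from collections import OrderedDict

# The pairs are consumed in sequence, so insertion order is the order written here.
od = OrderedDict([("a", 1), ("b", 2)])
keys = list(od.keys())  # ['a', 'b']
```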

~~~
coldtea
Yes, agreed, I just wanted to give the correct Python for the (still) flawed
attempt at traversing OrderedDict keys in order.

------
Terr_
Python should have a "grumpy mode" which causes dict-keyword order to be
randomized, and other deliberate attempts to break code where people are doing
the wrong thing :P

------
fucking_tragedy
How will they reconcile this?

    
    
      In [10]: OrderedDict((('a', 1), ('b', 2))) == OrderedDict((('b', 2), ('a', 1)))
      Out[10]: False
      In [11]: dict((('a', 1), ('b', 2))) == dict((('b', 2), ('a', 1)))
      Out[11]: True

~~~
wyldfire
The equality test for dict objects will be unchanged. Note that it's not as if
they said "Oh gee let's just alias dict() to OrderedDict() and call it a day".

------
Dowwie
With this given, why wouldn't collections.OrderedDict be deprecated as of 3.6?

~~~
ganduG
Because this is an implementation detail. The language spec doesn't enforce
this.

The real reason they did this was because of the performance gains from the
approach - the ordering is just a nice side effect. It's an idea originally
from PyPy afaik.

`OrderedDict` is now just a thin wrapper around `dict`.

i.e. if you want your code to be portable among different Python
implementations then you should still use `OrderedDict`.

~~~
masklinn
> It's an idea originally from PyPy afaik.

No, the idea is from Raymond Hettinger on the Python-Dev ML back in 2012:
[https://mail.python.org/pipermail/python-dev/2012-December/1...](https://mail.python.org/pipermail/python-dev/2012-December/123028.html)

PyPy were the first to bother actually implementing it.

~~~
ganduG
Ah okay, good to know. I knew PyPy did this so I assumed it came from there.

------
rurban
Any benchmarks yet? I'm curious about the cost of the indirection and the 2nd
array vs. the smaller sparse array (more easily cacheable), and am thinking of
doing the same for cperl hashes.

Splits also need to realloc twice now, which might have some costs. Still, the
run-time advantage should beat all new costs.

------
imh
What's the point of the compact dict? Does having all the keys together
outweigh the extra indirection? That would surprise me. Can someone help
explain why that would be the case?

~~~
PythonicAlpha
It all depends on the underlying technology. Yes, an indirection costs more.
But at the same time, the first-level array can be much smaller, and it can
even be that both arrays together are smaller than the old array was, because
the old array had to have gaps. The new compact array does not need gaps.

It of course depends on many factors, such as the fill factor.

But saving memory can potentially also save transfers between processor and
main memory. In old processors, pure clock cycles and the complexity of the
opcodes were the main factors when it came to speed. Today, the complexity of
opcodes (for example, indirection) is less and less important. The number of
cache misses and how much memory must be transferred between processor and
memory chips are decisive.
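
To make the two-array layout concrete, here is a toy sketch of the idea: a sparse table of small integers that points into a dense, gap-free list of entries. It is an illustration only, not CPython's implementation. The class name `CompactDict`, the linear probing, and the resize threshold are all simplifying assumptions, and deletion is left out.

```python
class CompactDict:
    """Toy insertion-ordered hash map: sparse index table plus dense entry list."""

    FREE = -1

    def __init__(self, size=8):
        # Sparse part: only small integers, so the gaps are cheap.
        self.indices = [self.FREE] * size
        # Dense part: (hash, key, value) tuples with no gaps, in insertion order.
        self.entries = []

    def _probe(self, key, h):
        """Return the slot in self.indices where key lives or should be placed."""
        mask = len(self.indices) - 1  # table size is always a power of two
        slot = h & mask
        while True:
            idx = self.indices[slot]
            if idx == self.FREE:
                return slot
            entry_hash, entry_key, _ = self.entries[idx]
            if entry_hash == h and entry_key == key:
                return slot
            slot = (slot + 1) & mask  # linear probing, chosen here for simplicity

    def __setitem__(self, key, value):
        h = hash(key)
        slot = self._probe(key, h)
        idx = self.indices[slot]
        if idx == self.FREE:
            # New key: append to the dense list and record its position.
            self.indices[slot] = len(self.entries)
            self.entries.append((h, key, value))
            if len(self.entries) * 3 >= len(self.indices) * 2:
                self._resize()
        else:
            # Existing key: overwrite in place, keeping its original position.
            self.entries[idx] = (h, key, value)

    def __getitem__(self, key):
        idx = self.indices[self._probe(key, hash(key))]
        if idx == self.FREE:
            raise KeyError(key)
        return self.entries[idx][2]

    def _resize(self):
        # Only the index table is rebuilt; the entries themselves do not move.
        self.indices = [self.FREE] * (len(self.indices) * 2)
        for i, (h, key, _) in enumerate(self.entries):
            self.indices[self._probe(key, h)] = i

    def __iter__(self):
        # Iterating the dense list yields keys in insertion order for free.
        return (key for _, key, _ in self.entries)


d = CompactDict()
d["timmy"] = "red"
d["barry"] = "green"
d["guido"] = "blue"
order = list(d)
```

Every lookup pays one extra hop from `indices` to `entries`, but the wide (hash, key, value) records are packed together, and the insertion order falls out of the dense list as a side effect.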

------
smegel
> "Preserving the order of \*\*kwargs in a function"

"What are you trying to achieve?"

~~~
thomasahle
There is a motivation segment in the PEP:
[https://www.python.org/dev/peps/pep-0468/#motivation](https://www.python.org/dev/peps/pep-0468/#motivation)

I don't quite get it though.

~~~
dalke
Suppose you want to make something like an XML generator helper function:

    
    
       def start(tag_, **kwargs):
           # write, escape and quote_escape are assumed helpers, not shown here
           write("<" + escape(tag_))
           if kwargs:
               for k, v in kwargs.items():
                   write(" " + escape(k) + "=" + quote_escape(v))
           write(">")
    

(Apply hand-waving to get the correct code.) This might be called as:

    
    
        start("abc", x="1.0", y="2.0")
    

but generate the output

    
    
        <abc y="2.0" x="1.0">
    

when you want it to be:

    
    
        <abc x="1.0" y="2.0">
    

The output order depends on the Python hash implementation, which (in modern
Pythons) is randomly selected during startup. For an API which preserves
order, you must currently either pass in the pairs in iterable order, like:

    
    
        start("abc", (("x", "1.0"), ("y", "2.0")))
    

or switch the API to pass in a dictionary-like object instead of kwargs, then
switch to an OrderedDict (which must also be initialized with pairs in
iterable order).

In 3.6, there's no need for that -- \*\*kwargs will preserve the keyword
parameter order.
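
For reference, a runnable variant of the helper, assuming Python 3.6 or later. It returns a string instead of calling `write`, and the name `start_tag` and the use of `xml.sax.saxutils.quoteattr` are substitutions made for this sketch. Tag and attribute names are assumed to be valid XML names already and are not escaped.

```python
from xml.sax.saxutils import quoteattr


def start_tag(tag_, **attrs):
    """Build an XML start tag whose attributes follow keyword-argument order."""
    parts = ["<" + tag_]
    for name, value in attrs.items():
        # quoteattr escapes the value and wraps it in quotes.
        parts.append(" " + name + "=" + quoteattr(value))
    parts.append(">")
    return "".join(parts)


tag = start_tag("abc", x="1.0", y="2.0")
```

Swapping the keyword arguments at the call site swaps the attributes in the output, which is the behavior the PEP is after.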

~~~
thomasahle

        start("abc", x="1.0", y="2.0")
        <abc y="2.0" x="1.0">
    

Ok, I can imagine people wanting this. Especially people coming from
JavaScript.

However `<abc y="2.0" x="1.0">` and `<abc x="1.0" y="2.0">` are semantically
equivalent in XML, aren't they?

~~~
dalke
They are equivalent. That doesn't mean that all tools will use XML semantics
to test for equivalence.

A testing tool might require that the output is byte-for-byte equivalent to a
known good output. Python's pseudorandomly determined hash function won't
preserve that order across multiple runs.

------
wyldfire
This news has spread and seems to cause a great deal of confusion. I wish the
headline had been "Python dict faster [because of some subtle details ...
don't look behind the curtain]"

------
Pirate-of-SV

        dict_keys(['c', 'd', 'e', 'b', 'a'])   # random order
    

> random

Maybe I'm nit picky but I find the misuse of that word annoying. My experience
of CPython is that dicts are unordered but deterministic. Not random.

~~~
thomasahle
If your hash function is random (seeded), the order will be random over the
seed.

~~~
Pirate-of-SV
Across processes perhaps? But hash(a) == hash(a) and hash(b) == hash(b), so a
dictionary with a and b inserted will yield the same order every time.

    
    
        d1 = dict()
        d1['a'] = 1
        d1['b'] = 1
        print(d1.keys())
    
        d2 = dict()
        d2['a'] = 1
        d2['b'] = 1
        print(d2.keys())
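
A small sketch of the "across processes" distinction, assuming a CPython with string hash randomization (3.3 or later). It starts fresh interpreters with a chosen `PYTHONHASHSEED`; the helper name `str_hash` is made up for illustration:

```python
import os
import subprocess
import sys


def str_hash(seed):
    """Return hash('a') as computed by a fresh interpreter started with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('a'))"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode().strip()


# Inside one process the hash never changes, so order is deterministic there.
within_process = hash("a") == hash("a")

# A fixed seed is reproducible across processes.
same_seed = str_hash(1) == str_hash(1)
```

Comparing `str_hash(1)` with `str_hash(2)` will normally show different values, which is where the run-to-run variation in pre-3.6 dict order came from.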

~~~
Veedrac
The order of the keys in a dictionary was merely guaranteed to be consistent
for a given instance: that is

    
    
        d1 is d2 ⇒ list(d1) == list(d2)
    

But the order is not guaranteed to be consistent between two equal
dictionaries, in that

    
    
        d1 == d2 ⇏ list(d1) == list(d2)
    

This is because the _history_ of the dictionary can affect its order, not just
its contents. For example,

    
    
        x = {-1: (), -2: ()}
        y = {-2: (), -1: ()}
        list(x) != list(y)
    

This remains true, but now the particular order produced from a given list of
insert and delete operations is well-defined, rather than arbitrary.
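
A short sketch of that point, assuming CPython 3.6 or later, where the order follows each dict's insertion history:

```python
x = {}
x["a"] = 1
x["b"] = 2

y = {}
y["b"] = 2
y["a"] = 1

equal = (x == y)                    # same contents, so the dicts compare equal
orders_differ = list(x) != list(y)  # but each one iterates in its own insertion order

# Deleting a key and inserting it again moves it to the end.
del x["a"]
x["a"] = 1
after_reinsert = list(x)
```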

It's worth noting that previously the order of two dictionaries with the same
history was also the same (but may vary between program runs), but that was
not guaranteed.

It's not actually clear if this is still regarded as an implementation detail,
but I expect it will be effectively impossible to stop this becoming a key
assumption of many libraries, so I would expect any attempt to require
OrderedDict (except for pre-3.6 compatibility) to fail.

------
lqdc13
So for larger dicts the memory size of the dict would be 5/3 of what it is
now? That seems like a big regression unless I'm missing something.

------
jnbiche
If you've disabled the StartCom CA due to concerns about lack of
transparency[0] and are therefore unable to view pages like this one, you can
always click the "web" link above and then view the cached page on Google.

For convenience, that link is:

[https://webcache.googleusercontent.com/search?q=cache:9Tj0HS...](https://webcache.googleusercontent.com/search?q=cache:9Tj0HSwUl2YJ:https://mail.python.org/pipermail/python-dev/2016-September/146327.html+&cd=1&hl=en&ct=clnk&gl=us)

0\.
[https://news.ycombinator.com/item?id=12411870](https://news.ycombinator.com/item?id=12411870)

Edit: To be clear to the downvoters, this has nothing to do with Python, other
than they're using the StartCom certs. Not a criticism of Python.

~~~
jnbiche
...

~~~
viraptor
If you care enough to disable one specific CA, then you likely know how caches
work and how to find the alternative. Did you plan to post a comment like this
on every single HN article which uses StartCom's cert?

~~~
jnbiche
OK, so I was also raising awareness of the issue, guilty. And no, this was the
only time I've mentioned this on HN, and I'll never mention it again here, you
have my word. Had no fucking idea it would draw so much ire (most downvoted
comment ever here in half a decade). Was trying to be helpful. Sorry for
pissing you all off so much.

~~~
viraptor
Don't take it personally. I downvoted it, because I think it's irrelevant. I
doubt anyone got angry or had any emotions at all about that post.

------
kragen
Dude, Python is trying to become PHP. Except with a complicated
implementation.

