Hacker News new | past | comments | ask | show | jobs | submit login
Python dicts are now ordered (softwaremaniacs.org)
734 points by signa11 on Feb 7, 2020 | hide | past | favorite | 436 comments



I live in the Python bubble, so I haven't realized until this post that so few people knew about that.

This post is massively popular despite talking about a feature we had since Python 3.6, in 2016, that was posted on HN at the time and that is featured in most popular tutorials.

A good reminder that most of the world doesn't revolve about my favorite language. And that information is not that fast to spread.


> about a feature we had since Python 3.6, in 2016

No, you had that feature in one implementation. Now it's in the language specification. That's vastly different because only now you can rely on it without fearing it can go away with the next release.


First, cPython is 99% of Python deployments. So much that if a script works on it but not on another implementation, people often consider the later broken.

As this post proves that most people don't even know about this feature, you can be pretty sure the vast majority of people don't know about pypy, micropython, etc.

Secondly, even if you want to nit pick, Python 3.7 made it official more than one year ago. We are currently in the 3.9 alpha.

In fact, 3.7 didn't touch the implementation, just merely declared "yep, good idea, let's keep it that way".


There's also the indication that this is official, and not just an implementation quirk. I still used OrderedDict in 3.6 because there was no guarantee 3.7 (or later) wouldn't silently break ordering on the standard dict. At least now it's unlikely to change because it will be considered a breaking change.


BDFL declared Python dict to be ordered in 2017 — It can't be more official than that for Python https://mail.python.org/pipermail/python-dev/2017-December/1... i.e., Python 3.7 (whatever implementation must keep the insertion order). CPython keeps the order since Python 3.6. Pypy even before that.


A barely legible email dump with 50 concurrent answers is supposed to be the clear official statement?

Not to mention that developers just begin migrating to python 3 in 2017 and code still had to work on 2.7 that doesn't order.


The official release notes for the following release is certainly a clear official statement.

https://www.python.org/downloads/release/python-370/

“The insertion-order preservation nature of dict objects is now an official part of the Python language spec.” (June 27, 2018)


I knew about it, because everyone was talking about it. The origination of the actual information is irrelevant. It was basically interesting gossip for a few weeks.


The ruling is the very first and only line from the link:

> Make it so. "Dict keeps insertion order" is the ruling. Thanks!


There's a great talk by David Beazley[0], which I don't remember its title right now, where he bashes all previous python versions except the latest at the time[1], which was 3.6. And he says "since I'm not a core developer, I can tell you this: rely on dicts being ordered so much that eventually they'll make it official". Guess he "won" in the end ;)

[0] Well, he usually gives great talks anyway

[1] Alright, he literally started many talks doing that


Raymond Hettinger does or did a few talks on dictionaries too. He covers the history of dictionaries in python and how they evolved. It was mentioned in the article's comment section.

https://www.youtube.com/watch?v=p33CVV29OG8


Yes, David Beazley's talks are always wonderful!


Whoa, calm your horses angry boy.


That still was announced a while ago. When reading the title I though it was something new (maybe now Python decided to keep the keys sorted? ;)


> rely on it without fearing it can go away with the next release

This is Python we're talking about.


There is no language spec for Python, the CPython implementation defines the language.


The language reference is a specification for the language. The library reference is a specification for the standard library.

https://docs.python.org/3/reference/index.html


I have lived in Python ~80% of my career (~50% now) currently), although a large chunk of my career was writing 2.7 compatible with 2.6 (thanks RHEL). For the last year and a half I have written 3.6+ exclusively and I didn't realize that regular dicts were now ordered.

I don't use dicts if I require ordering, but if I did I would probably still reach for collections.OrderedDict...

I wonder what the implications are of dict ordering with the "new" (new for me...) function args and *kwargs...from the hip it sounds pretty handy.


The implications are that now you get to see the order in which keyword args were specified by the caller. And in classes, you get to see the order in which attributes were defined. Both can come in very handy.


Dict ordering started from the need to have class attributes, args and kwargs ordered if I recall.


No, that was a happy coincidence. The goal of the dict implementation was efficiency. Ordering was a side-effect.


Most of the python I've written and maintain (which, honestly, isn't too much) was 2.7, so it is news to me too. Nice!


This is awesome, because the ordered map is the best data structure out there for easy & predictable programming, possibly only barring the array.

There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem). PHP really got this one right (and immediately messed it up by mixing ordered maps with arrays into a big soup, but hey, PHP). And so did, JS, sorta-kinda-by-accident.¹

When I went from PHP to Ruby back in the day, Hashes not being ordered definitely was my biggest gotcha. Everything about Ruby (except the deployment) was nicer, but the hashes... I've spent serious amounts of extra thinking just to make code work that would've worked out of the box in PHP.

Yay for ordered maps! Every language should have them, they're the best! There's just so much ergonomics packed in such a simple API.

1) JS objects are ordered maps, iff the keys are strings that cannot be parsed as an integer (really) (yeah it's nuts) (but still better than unordered maps!!)


> JS objects are ordered maps

So currently (now that there are Symbols in JS), the order is: array-indexed properties first (integers), then string-key properties, in insertion order, then Symbol-keyed properties in insertion order.

The reason for the funny behavior in treating integer keys differently is that property keys are always treated as strings, so obj["3"] and obj[3] can't be distinguished, and arrays are also ordinary objects, so setting obj[3] was made to do the same thing on any object rather than special-casing arrays and non-array plain objects...

Today, JavaScript is a small, reasonably elegant scripting language with reasonable semantics, buried in a medium-sized, reasonably expressive scripting language with reasonable but different semantics, added 20 years later. The kitchen-sink disease is far enough along that it'll probably never recover the appeal it once had as a beginners' language. It still has the advantage of backwards compatibility, and it's become more practical as a compilation target.


> The reason for the funny behavior in treating integer keys differently is that property keys are always treated as strings, so obj["3"] and obj[3] can't be distinguished, and arrays are also ordinary objects, so setting obj[3] was made to do the same thing on any object rather than special-casing arrays and non-array plain objects...

Note that special-casing arrays is precisely what implementations used to do (i.e., for non-Array objects, the order in most VMs was simply "properties in insertion order").

V8 was the first JS VM to stop special-casing Array (and this was done before Chrome went public, and was the behaviour in the first Chrome beta). It turned out that virtually no websites relied on "array index" properties appearing in enumeration order (there was some breakage, but it was relatively few and far between, and believed to be worthwhile for the gains in cache-hits when accessing properties), and this also allowed the more compact array representations to be used for them.


I forgot about that when writing my comment but yes, V8 made that change along with "shadow classes" or whatever the object specialization stuff was called. Heady days for JS performance.

In retrospect if something was going to be standardized I'd have preferred it to be the older behavior which was simpler to explain, but so it goes.


SpiderMonkey actually had inline caches before V8 shipped if I'm not mistaken; the motivating factor for the behaviour change in V8 was just to improve the cache-hit ratio.


JavaScript is a small, reasonably elegant scripting language consisting entirely of edge cases.

/s


I did not even know that integer is a good key for an object. Is this being used anywhere outside arrays?


You can use it anywhere you want. Whether it's a good idea or not, like most anything in JS, is a matter of opinion.


> PHP really got this one right (and immediately messed it up by mixing it with arrays, but hey, PHP).

No, it didnt. What PHP calls "array" is actually an Ordered Map, as you alluded to:

https://yaml.org/type/omap

it literally says that in the documentation, paragraph one, sentence one:

> An array in PHP is actually an ordered map.

https://php.net/types.array

The issue, if one exists, is that PHP doesnt have a sequence type:

https://yaml.org/type/seq

but one is provided as a package:

https://php.net/class.ds-sequence

https://github.com/php-ds/extension


Sure, PHP arrays are actually ordered maps, but it's super confusing when used as arrays. I mean, look at code like this:

    $arr = [];
    $arr[2] = "two";
    $arr[0] = "zero";
    $arr[1] = "one";
    echo join($arr, ", ");
    // two, zero, one
In any other language, the result of similar code would be "zero, one, two". You'd need to `ksort` this thing to get it to behave like a normal bog-standard boring array.

Everybody's writing PHP code pretending that it has arrays when really, it does not. The gotchas are way bigger than the gotchas in Python <3.7's unordered maps. It's a mess.


Why wouldn‘t you just push/append to the end of the array with '$arr[]'? There is no need for explicit keys in PHP. The gotcha is writing code in PHP pretending you are using a different language.


I literally just learned "bog-standard" 10 minutes ago and now I see it here. And no, don't Baader-Meinhof me, I would have noticed this weird term in random text in the past.


Would you though? Appears at least once a week: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


>I would have noticed this weird term in random text in the past.

Well, no so weird, I'm not a native english speaker, and I've seen the term hundreds of times over the years...

It's also quite common on HN:

https://www.google.com/search?q=site%3Anews.ycombinator.com+...


I'm aware of the terms "bog" and "bog-standard", but not being from the UK, I'm not sure what is so standard about bogs.


Baader-Meinhof denial!


Baader-Meinhof Dunning-Kruger?


I have to disagree also.

The order inserted makes most sense in most cases. Hand crafting integer based array indexes out of order is rare. And as you say a simple ksort call guarantees sort order.

    ksort($arr);
    echo join($arr, ", ");
    // one, two, three


Exactly. Of you hand craft an array you probably need it in that order.


Then why not put it in order while you're hand crafting it?


I disagree. The behavior you exhibit makes more sense than the behavior you imagine.

You start with an _empty_ array, then (somehow) set the third element of that array? That makes very little sense.

Had you _initialized_ your example as an array with a length of 3, your desired behavior would, indeed, manifest.


The problem is PHP calling an ordered map an array. Array in pretty much every other language means a sequence indexed by integers.

Ideally the maintainers of PHP would rename it and deprecate the use of `array()` over a long period of time.


The PHP documentation refers to them as "associative arrays", which is technically correct, but I agree I'm not sure how they could have conflated these concepts so badly.


I believe the "associative array" terminology came from Perl. Though Perl's associative arrays don't have any guaranteed order. And Perl also has normal arrays.


>You start with an _empty_ array, then (somehow) set the third element of that array? That makes very little sense.

Well, there's no reason a language could not initialize an empty array to some initial capacity, either on declaration or when you add an indexed element.


I guess my reasoning is that:

$arr = []

doesn't appear to be initializing to any capacity. So that after:

$arr[2] = "two"

I would expect:

count($arr) === 1

And honestly, sure. A language could be designed to pad the array with empty values if you specify an index greater than the upper bound (or lesser than the lower bound). I believe a sibling mentioned matlab did something like that.

But is that really better behavior or just different behavior? I don't really see an important difference. My point was that the criticism was fairly weak. PHP, indeed, has a very useful abstraction in its concept of "array". Most of the hate should be directed at the name, not the construct.


In that situation, Matlab allocates an array of length 3, fills it with “empty” values (depending on the type), and then sets the third element to the value. That’s what I’d have expected to happen here too...


Would you expect it to do that if somebody wrote the following?

    $id = 12835151;
    $arr[$id] = Get_thing_with_id( $id );


If $arr were an array I would, though I'd cringe at the wasted space (unless it were a sparse array).

Obviously, I'd agree that a map/associative array/dict/etc is a better choice here; I'm just annoyed about the name.


In my bog standard PHP I would do $thing = Get_thing_with_id( $id ); if(!empty($thing)){process($thing);}


yes.


So how can you access, say, the third element of your ordered map in PHP ?


    array_values($arr)[2];


I mean youd probably have to iterate. Either that or use a package with proper array support.


That's messed up


In practice, it's almost never an issue. The standard lib has push/pop/shift/pad methods available.

So example, if you had numeric indices that you wanted to guarantee that index order is equal to insertion order, you'd create an array using $my_array = array_pad([], 20, null); If it's an array that you didn't create, the ksort(...) method will order the array for you as expected.

Edit: after reading through the documentation on array_pad, it looks like it will even correct key order in existing arrays. https://www.php.net/manual/en/function.array-pad.php#87735


>There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem)'

Don't know if this could actually be the case in practice, but theoretically the ordering could allow for a timing attack to glean some bit of information when performing a linear scan of the map (size of the map, relative location of the data, etc).

Just a contrarian thought given the definitive statement of "none where it's a problem". I'm generally of the same opinion as you though; mostly if not entirely harmless, potentially helpful in certain applications.


>Don't know if this could actually be the case in practice, but theoretically the ordering could allow for a timing attack to glean some bit of information when performing a linear scan of the map (size of the map, relative location of the data, etc).

An attack where the attacker has access to your ...program code and can run instructions there? In that case, leaks from a "timing attack" would be the least of your worries...

If they just provide an input to your program somehow externally, then whether you put that input into a map or an ordered map or not is an implementation detail. You could make your program rid of the "timing attack" in 100s of ways... (or have one, in 100s of ways). That doesn't make an ordered map more unsafe than any of the 1000s of ways to have a timing attack.


I think it's also worth noting that requiring an ordering is preventing a performance optimization

This doesn't impact most high level python code, and an OrderedDict is a very reasonable default. But there's a reason why Google's c++ map intentionally randomizes iteration order

(Hint: it allows the hash map and hash function to be extremely high performance while allowing themselves the flexibility to change the hash function)


Python dicts are not ordered maps, they are dicts preserving insertion order.


The title of the post is misleading: it means "ordered by insertion order" (not sorted).

Previously it was unpredictable, and that's why collections have OrderedDict; which makes sense instead of trusting an implementation detail of CPython from 3.6.

Is not mentioned in the docs: https://docs.python.org/3/library/stdtypes.html#mapping-type...

EDIT: but the behaviour is documented in OrderedDict itself! https://docs.python.org/3/library/collections.html?highlight...

Interesting!


I wonder why OrderedDict isn't outright deprecated (or reimplemented as a trivial wrapper over dict).

You can get the first/last key of a 3.8 dict with dict.keys() and reversed(dict.keys()) which you can then delete to reimplement OrderedDict.popitem.

Deleting a key and reinserting it will let you reimplement move_to_end.

The only cool thing that OrderedDict's implementation might be useful for is to move to front (or anywhere else not the back) but it doesn't expose that api.


When OrderedDict is compared with another OrderedDict, it also checks whether the order of items is the same (this way also violating LSP, but that's another story), dict instances don't do that.


Thanks, I just learned something.


...thats the same thing. Maybe youre thinking about "sorted maps"?


It's not the same thing. Compare two OrderedDicts that weren't constructed in the same order and the comparison will fail. Compare two dicts that weren't constructed in the same order and the comparison will pass.

The difference is whether order is part of its identity. A dict is still just a set of pairs, not a list of them. It just happens that it also guarantees now that if you _iterate_ over the set you'll walk the keys in insertion order.


Good point, but close enough! My points about how it makes the data structure more awesome stands.

FWIW I don't think people agree on whether "element order is part of the value's identity" is part of the definition of "ordered map". There's plenty other languages & libs with ordered maps where the order isn't part of its identity but they still use the term "ordered map" in the name or the docs. Eg PHP's arrays or ImmutableJS's OrderedMap.


Could you explain some examples of what this is useful for? What sort of algorithms or operations do you have in mind where you both want insertion order, and also key-based lookup in the same data structure?

I've made heavy use of all kinds of maps, and of queues and channels and arrays, but I don't recall ever noticing a situation where I wanted the properties of both mixed into the same data structure.

I'd love to learn more about useful tools to add to my toolbox!


For "algorithms", I mean, some really do care about insertion order. Like idk, if you have a priority queue, then you generally want FIFO ordering when the priority is the same? It like a pretty obvious desire in most cases... imagine a thread scheduler or a packet scheduler or what have you.

But generally it's about determinism and avoiding loss of information, not just whether a particular algorithm needs it. For example, you'd want serialize(obj) and serialize(deserialize(serialize(obj))) to produce the same output, otherwise you e.g. might not be able to cache stuf. But for a data structure like a hashtable, it's pretty tough (not logically impossible, but rather pointlessly difficult) to make that happen without preserving insertion order.

As another example, it's incredibly handy for a user to see items in the order in which they were inserted. Like say you're parsing command-line arguments, and the command is ./foo --x=y --w. If the user sees {x: y, w: None} then that tells them w was passed after x. That can be extremely useful for debugging; e.g. maybe you expected the caller to specify w earlier, in a different context and for an entirely different reason. Seeing that it came afterward immediately tells you something is wrong. But when such information is lost it's harder to debug code.


If --w can be specified in two places for "an entirely different reason", then an unordered data structure is simply inappropriate, full stop. That's not a question of debugging, that's a question of correctness.


  console.log(object)

  {
    keys
    in
    headache
    less
    order
  }
I find this use case severely undervalued in this thread and have no idea why. It helps so much in logging and/or debugging.


I wrote a program that read in a proprietary data format, ran a few transformations, then outputted CSVs. Using an ordered map was super-useful, because it gave me deterministic output between runs. Otherwise I ended up with CSVs with differently ordered rows...


I think there is a lot of counters you might want both.

For example, supposed you have a list of ids accessing some system, you may want to count the raw number, but then also have some transformations thereof that are still ordered by count.

Example

  l = getListOfAccessIds()
  counts = Counter(l)
  total = np.sum(counts.values())
  fractions = {k:v/total for k,v in  counts.most_common()}

  #use the fact new dictionary is still ordered by most common
  plt.semilogy(fractions.values())
  plt.plot(np.cumsum(fractions.values()))

  #still works like dict
  print(fractions[id_of_interest])


Very few algorithms really care, rather, it's when you're composing them that maintaining insertion order avoids a lot of boilerplate in having to cart that information around.

For the same reason, though, it's potentially a bad idea. All these algorithms that generate dicts generally won't promise to maintain insertion order, they just happen to by chance. Then consumers come to depend on it and be surprised when it inevitably changes.


I use ordered maps to keep track of state snapshots/patches of redux/mobx-state-tree stores.

It allows me to both apply them sequentially as generated, or to "jump to" a particular point in time.


Say, a simple LRU cache:

* Inserting into the cache is just normal insertion.

* To delete excessive items, simply iterate to get the first few items' keys, then delete.

* To lookup, simply do a key-based lookup, then delete and re-insert.


Working with GraphQL.


Point of order: Ruby’s hashes do preserve insertion order. Python dicts are now the same.


Good for Ruby! Nice to hear.

I'm pretty sure they didn't in Ruby ~1.8 though. It's been awhile :-) Can't remember exactly which version, but it bit me more than once so I'm pretty sure :)


Yup, I think Ruby 1.9 or so made hashes ordered.


Awesome! Thanks for the heads-up both of you :-)


Ruby hashes are currently ordered and have been for quite a while now. Ruby 1.9 maybe?

But back in the day they indeed were not, for appropriate values of back in the day. :)

it did make a LOT of things more convenient and less buggy when they made ruby hashes ordered, I recall. I didn't expect it would matter much, by found myself loving it. I believe the ruby maintainers investigated and determined they could do it with very little performance hit. It turns out that having a repeatable and predictable order, that also matches insertion order, is what a lot of people end up assuming whether they realize it or not, just makes everything smoother when it is.


> There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem)

Of course there’s a trade off. There has to be.

In this case it prioritizes usage patterns oriented around insertion and enumeration over usage patterns involving repeated removal Or modification of items.

as items are removed, the insertion-ordered index will either fragment, require removal operations to not be constant time, or be implemented using a data structure that has per-item storage overhead (such as a linked list).

also there are two possible, reasonable things for an ordered list to do when you write a new value to an existing key - leave the key in original insertion position, or move it to the end. Whichever one the implementation chooses, people who want the other will need to implement their own key ordering state alongside that which the collection is doing for them.


> and immediately messed it up by mixing ordered maps with arrays into a big soup, but hey, PHP

And here I thought Lua was the only language insane enough to do something like that.

Just realized Lua tables aren't ordered, but it still mixes arrays and maps / tables into a single data structure.


In JS's defense, it really tried as well. Fortunately though they failed harder at it, which is why arrays and objects are probably more different from one another than Eich had intended there in that mad 2 week language design frenzy. Good for us!

Back when JS got popular and the majority of web programmers knew PHP, there were dozens of tutorials out there trying to convince you not to do stuff like this:

    var a = new Array();
    a["moo"] = "five";
    a["foo"] = "six";
After all, well, it works! Like in PHP! So what's the problem? :-) Most people felt the same way about these tutorials as people feel about monad tutorials nowadays.

Good times!


Also don’t forget about JS Map which is ordered

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...



That's not about JS Map objects, that's about regular objects which don't keep their properties in insertion order.


You missed the point.



Am I the only one that thinks this is a stupid decision? This will silently break code that starts to rely on this behaviour that gets executed on Python3.5 and lower. I would consider changing how a builtin works to be a major breaking change. It would have been fine if this was a change between 2 and 3 but on a minor version? Thats insane.


I'm not clear on how this break existing code.

Code that assumed it was arbitrary, would expect to handle any arbitrary order, including a happens-to-be sorted order.

Code that assumed it was random, like actually inserted by random(), was already broken, because that simply isn't the case.

Code that assumed the order would stay constant was relying on implementation-specific behavior, and could potentially break on any version update; as with any reliance on implementation-specific behavior, you'd break if the dictionary code ever got touched -- even if it were for a bugfix.

Code that ordered the dictionary keys before iterating are now slightly innefficient due to extra work of sorting a sorted list.


It doesn’t break existing code. Code written for Python 3.7 might break on older versions of Python


What you describe is forward compatibility, and Python (and most other programming language) doesn't have it.


Usually languages don’t have it in a very explicit way though. If I try to use f-strings in python 3.5 I get a very explicit syntax error. If I rely on ordered insertions in Python3.5 I get a potentially difficult to diagnose bug.


Why does it matter if it's explicit or not. If python doesn't support forward compatibility, you should know that if you write code for 3.7, it's not gonna work in 3.5. Doesn't seem like a big deal to me.


If you’re the only one writing it and you’re the only one running it, it’s probably fine. But if I’m putting a file out there that will only work in 3.7, it’d be nice if any potential users of that file would get a good error message if they try to run it on 3.5, rather than wrong results.

I could potentially assert a version, but do I really want to do that each time I wrote something that might be used somewhere else?

And yes of course I could add a line of documentation, but there is a 100% chance I’d still get bug reports from people on 3.5.


Setuptools solves this, add a python version specifier to your setup.py or pyproject.toml file.

If you are just distributing raw python files then congratulations you’ve just realised why packaging is valuable.


If I rely on a python version and I expect other people to use it, I add a version if statement on top. I hate those packaging tools that insist on installing stuff in your system and create a frankendebian when really all I want to do is run a single py file standalone once. Often have to do chenanigans like "python3 -c 'from sometool import __app__'".

If you want to install it, go ahead and copy or symlink it in your ~/bin or whatever you fancy (that's your personal preference anyway unless I'd specifically package it for some OS like Debian). I don't want to have to use some setup.py that I have no clue where in my OS it installs things.


Yes, a failure to understand how your tools work or how to use them effectively does indeed make things harder.


Well I know how my tools work, I don't know how this custom file works that is duplicated and delivered with each project.


> I don't know how this custom file works that is duplicated and delivered with each project.

It's not duplicated and in most cases it's not even delivered as part of the installation.

> If you want to install it, go ahead and copy or symlink it in your ~/bin or whatever you fancy

That's exactly what pip will do if invoked with `--user`.

> I don't want to have to use some setup.py that I have no clue where in my OS it installs things.

It installs it to a single place. Run `python3 -m site` and look at `USER_BASE`.

To avoid a lot of this, use pipx[1] to keep things even more isolated.

> Often have to do chenanigans like "python3 -c 'from sometool import __app__'".

You're doing things wrong because you don't know the tooling. You'd also typically just do `python3 -m sometool`.

Things that are distributed as a single file are either so simplistic and have no other dependencies that you can just make do, or written by someone who doesn't know what they are doing and so you're going to have a bad time.

1. https://github.com/pipxproject/pipx


Most projects use setuptools for package management, which ensures that the environment is as expected.


If you write code that doesn't work in 3.5, you should check the version at startup and exit


Yes, that would be a good idea. But if you ever run scripts you didn’t write, there’s the potential people didn’t do this, and you have the potential for hard to discover bugs. The language should be designed such that bugs are difficult to encounter, this is an instance where it wasn’t.


#!/usr/bin/env python3.7


so now you have to install a specific python version for your script to work?


Only if you're using version specific features.


What happens when python 3.8 comes out? Everybody needs to go into your script to change the hashbang every time a new release comes?


you can just hashbang to python3 which is a symlink to whichever python3.X the user has.

If your code won’t work for older versions you can make an explicit check that the version of python is greater than whatever you need.


Major version changes are for API compatibility. If there is a change which makes all other 3.* incompatible, then it should be a major version increase.


You're describing something like semver, but Python doesn't do semver.


And this change would be fine even if Python did.


Who told you that? I'm looking at PEP 606, it says nothing like what you claim. https://www.python.org/dev/peps/pep-0606/


ECMAScript recently got stable array sorting. That can cause precisely the same kind of backwards-incompatible but difficult to diagnose bug.


TIL

I suppose Python doesn't even have backwards compatibility within the same major release as we saw with the addition of the async keyword in Python3.5. Many older Python 3 packages broke because they expected that to be a legal identifier for a variable name.


tensorflow didn't work on 3.7 for a solid 8 months because some people at google very unwisely decided that `async` and `await` were great choices for variable names, despite PEP492 landing in 2015.


that's because tensorflow is advertisement for Google and while it's technically open-source, it doesn't stand for any kind of community-project, it's all there to show off (and ingrain in its users) the way Google wants things to Go (just look at the byzantine Bazel-build-processes - tensorflow taking hours to build and pytorch about 10minutes...).


my torch builds also take hours.

facebook is just as capable of writing hot garbage, sadly.


Yeah, still it's an order of magnitudes faster...


Python has never tried to maintain backwards compatibility within a major release


You're right, I read the gp too quickly.

But in the case of downgrading, I'm fairly sure there's a number of other breaking changes that can't trivially downgrade minor versions. Like f-strings were only introduced in python3.6 as I recall. Async keyword only exists as of 3.4 as well I think?


I think the argument is that if you run code with f-strings and walruses on Python 3.5, the code will break noisily. Whereas if your code implicitly relies on ordered dicts, it could break silently. Syntax errors rarely cause subtle, hard to track bugs.


Introducing things is different than changing things.


Sure, but you can't safely take everything from a higher version to a lower version in any case; if insertion order became gauranteed due to a bugfix, and wasn't backported, you'd be in the same boat.

The only way to consistently code cross-version is to start with the lowest you plan to support (assuming the higher versions are actually backwards-compatible).

Does any language gaurantee that code is both backwards and forwards compatible?


Issue seems to be silent incorrect behavior, what happens if you attempt to run python code containing f-strings using an older python version. Does it raise an exception? That's good! What happens now if you write code for 3.7 which takes advantage of the new ordering and someone grabs it from your repo and runs it using 3.2, it would happily give incorrect results and noone is the wiser.


If you expect this situation you can assert the language version.


But the whole point is that some developer won’t expect that someone would run their code on an older Python, isn’t it?


Both of these would be syntax errors if you tried to execute them in earlier python versions. This change might break software completely silently.


If you know you're supporting old code, use OrderedDict.

you arguably ought to anyway, for explicitness.


OrderedDict is slow and expensive though: it maintains ordering through a doubly linked list.

It has useful features for manipulating ordering but while I've regularly needed had use for maintaining insertion ordering I can't remember ever needing to move items around within a map.


If memory serves me correctly, ever since dicts became ordered the OrderedDict simply became a subclass of dict, so it will have exactly the same performance characteristics.


OrderedDict was always a subclass of dict maintaining additional information (which is not free, it has to store and manipulate two pointers per entry).

It remains so today, ordereddict is not an alias or trivial facade to the normal dict because it has to maintain its doubly linked list and implement a bunch of additional operations based on that e.g. pop first or move to end.


Yup.

    >>> isinstance(OrderedDict(), dict)
    True


That was already the case in python 2.7 or 3.1.


The trouble is I publish a (new) code that advertises itself as working on 3.x and then it turns out it is being used by a person who only had the version prior to this change.

That said, Go made a similar change (from insertion-order to explicitly-randomized) and world didn’t end. So there’s that.


If your code relies on a minimum python version, you can add `python_requires=">=3.5"` to your setup.py [https://packaging.python.org/guides/distributing-packages-us...] to ensure it's not installed on older releases.

That field itself is kinda new; but if needing to block users with older versions, that shouldn't be an issue.


personally, i just drop a f-string in setup.py and that'll filter out any python for which this issue pertains.


This might not work if you’re distributing a wheel.


Python 3.4 is EOL anyway so there's no need to do this. Anybody running 3.4 is already unsupported.


and 3.5 dies in september. hurray!


I thought Go made the change from undefined behaviour with an underlying implementation that was insertion order in a map with 8 or fewer entries, to similarly undefined behaviour with an implementation that randomised lookups in those cases. Any code that has ever relied on any kind of ordering in Go maps will almost certainly be wrong, even random ordering, because the distribution of the "random" ordering is biased.

See https://medium.com/i0exception/map-iteration-in-go-275abb76f...


FWIW, when Go made that change, it was a much less-widely-used language (smaller blast radius).


Yeah I think this is probably my main issue. I don't think it's reasonable to ask users of your code to always use 3.7+ instead of 3.6 if they are usually expected to be compatible. And it's also unnecessary to break such compatibility for something like preferring dict over OrderedDict anyways. At least I would try to avoid any such issues by still using OrderedDict.

That said, I have no idea about the internals of dict. I assume no performance was sacrificed for this change.


It actually improves performance. Or at least, it comes along with a set of performance improvements that give you ordering for free. Raymond Hettinger has a great talk on it: https://www.youtube.com/watch?v=npw4s1QTmPg&t=1s


Using OrderedDict is actually nice in this case, even if the default dict has the same ordering. That way you're explicitly saying you rely on that behaviour and it makes reading the code easier.


right, so making it implicit is bad design


It's part of the zen of Python: Explicit is better than implicit.


Python is nothing like its guiding principles. That's why it's Zen -- the principles are a collection of contradictory statements, given what you will encounter in real world Python. You're meant to be confused and meditate on it.


Well... full support for 3.6 ended in December of 2018 (now it only has security fixes), the older versions are already unsupported.

Also, this change was implemented 3.6, but in 3.7 they officially documented it as a language feature (i.e. that all other Python implementations also need to preserve the order).


Furthermore, there's a lot more stuff that are not backward compatible after 3.6 or 3.7, and if you're writing a library that targets other versions, I would hope that you have tests for all said versions.


> Code written for Python 3.7 might break on older versions of Python

That's a truism. For all versions of Python. If you use feature of python ver X, you should not be surprised that it doesn't run on versions less than X that lack that feature!!!

If you write a Python library and use feature of Python ver X and don't mark library as only >= Python ver X, you are doing it wrong and a horrible person.


What versions of Python didn't have this behaviour? It was there but just not guaranteed.

I don't see how anything could break (unless there are alternative implementations with different behaviour I'm not aware of?)


> It was there but just not guaranteed.

Dicts have been effectively ordered since 3.6. Iteration order was literally randomised (at process start) before 3.6. I'm also not sure whether the behaviour under deletion was changed between 3.6 and 3.7 so it's possible that there are subtle differences there.


CPython is mot the only Python. Portability is an issue also.


This ja why it is now a language feature. Every python implementation that claims python 3.7 compatability must implement this.

The other why around is also true, code that relies on it must specify that it requires python >= 3.7


Portability isn't affected; if they claim compatibility with python3.7, then they claim their dicts have insertion-ordered keys.

If they claim compatibility with only up to python3.6, they can have whatever order they choose.

The only issue with portability is that I think the main reason it was made a gaurantee is that cpython found the new, presumably optimized, implementation came with insertion order for free, so they went ahead and gauranteed it. But that might not be an optimal strategy in other areas, but they're forced to follow along anyways.

But actually moving cpython code to say ironpython should not be impacted, unless ironpython lies about it's compatibility


That also goes both way: Pypy defaulted to ordered dicts a few years before cpython did.


The insertion order dict implementation actually comes from pypy


The insertion order dict implementation comes from Raymond Hettinger who is amongst other things a core CPython developer. pypy pulled the trigger on using it first (and probably has optimisations CPython doesn't). PHP also used it before CPython did, IIRC. And possibly Rust (as the third-party indexmap).


Do you mean to say the future should be constrained by the past?

I get the whole principal of least surprise, but not at the expense of progress.


> Code that ordered the dictionary keys before iterating are now slightly innefficient due to extra work of sorting a sorted list.

Dicts are ordered, specifically insertion-ordered, not sorted.

Sometimes they'll be sorted:

  D = {i:i for i in range(10)}
... specifically, when you insert them in sorted order. But then you can break the sortedness on your next insert:

  D[-1] = -1
What it allows is parallel iteration:

  zip(D.keys(), D.values())
now being synonymous with

  D.items()
This enables, nay, encourages people to write code that is very subtly broken in 3.5 and below.


The property you mention at the end has been true of dicts since long before 3.5, at least since 2.7 if not forever. See https://docs.python.org/2/library/stdtypes.html#dict.items :

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): pairs = zip(d.values(), d.keys()).


Sorry, yes, I was hasty. Ordered dicts supports such a thing with intervening modifications:

  oldkeys = list(D.keys())
  D[z] = foo(z)
now if you take

  zip(oldkeys, D.values())
then you're guaranteed to iterate over oldkeys, with the proper values associated with those keys -- if z was an oldkey, its value got updated; otherwise, it comes after oldkeys and gets dropped out of zip.

The subtlety of this is what I, and perhaps others, find the most jarring.


Dicts are ordered in certain cases in python 2, and it was relied upon by some users (I saw it in the wild).

Baking this as a language guarantee is the only protection against Hyrum's Law.


This won't break existing code. It will break new, good code that gets back-ported from eg. a 3.7 tutorial to a 3.5 environment, without any syntax errors.

> Changed in version 3.7: LIFO order is now guaranteed. In prior versions, `popitem()` would return an arbitrary key/value pair. [1]

If they added a new `popordereditem()` method, good 3.7 code would use that, and an attempt to run that code on 3.5 would throw a reasonable, useful error message. If they wanted to play it safe like Rust or Go, they'd add the ordered method and make popitem() deprecated or make it artificially use random/arbitrary order so you can't accidentally depend on a new implementation detail and have a test case work.

Also, your case 4 does deserve some protection because while bad code it's hard to test for bad but working code. Implementation-specific or undefined behavior that works is the worst kind of problem, because it's hardest to test against. Compile-time syntax errors are the easiest, runtime errors are testable with a solid test harness, some possible can be identified as warnings with linters, but sloppy code requires manual inspection to detect.

Actually, no, I take that back: Implementation-specific or undefined behavior that works sometimes! is the worst kind of problem. That's what this code enables; if your test case is `dict({'one': True, 'two': True})` and you have a unit test where you popitem() to assert that you get 'two' and then 'one' it will pass the test harness on 3.7, it will pass on 3.6 because of implementation-specific behavior, and it will pass on 3.5 because the hash map arbitrarily puts those particular items in order. But it will silently get it wrong when you pass in user-supplied data. Shudder.

[1]: https://docs.python.org/3/library/stdtypes.html#dict.popitem


> Code that ordered the dictionary keys before iterating are now slightly innefficient due to extra work of sorting a sorted list.

I think TimSort is extremely efficient for presorted lists, so even that isn't a major impediment.


I'm of two minds. In principle, no, I don't love the idea of a change like this happening on a minor version.

But, from a pragmatic angle, it strikes me as a genius move. As the article said, this was a natural by-product of a performance enhancement that was made in 3.6. In principle, that was fine, because there was officially no predictable ordering, so a change to how it was being ordered in practice shouldn't materially affect anyone. And nobody really paid much notice to it then.

And that's where the danger lies - there's risk there, because, as you said, someone who's used to Python 3.6 might have problems switching to 3.5. And that's probably a greater risk if they left it out of the spec, because people would have continued to not pay it much notice. By making it official in 3.7, though, they've sent out a message that the Internet hate machine has ensured that everyone will hear. That probably, in practice, actually reduces the risk.

And, while many understandably see this as violating the spirit of semver, it unambiguously does not break the letter: Minor changes are only supposed to avoid breaking the software's compatibility with code written against an old version. There was never any rule saying that old versions have to be compatible with code written against the new version.


Python (the language) doesn't follow semver anyway.


Go purposely "randomizes" order of a map when it's iterated over, which you can see running this demo:

https://play.golang.org/p/DISpyv0Zuq_j

HN discussed this a little 6 years ago: https://news.ycombinator.com/item?id=7655948

As crawshaw pointed in that discussion, it helps catch people inadvertently relying on map order in tests or other places in their code.


More important (in my opinion), adding a random seed to your hash prevents collision attacks, where an attacker tries to ruin your performance by sending many values (e.g. HTTP params) that would hash to the same bucket.


Many (most?) programming language runtimes now do this, especially the web-focused ones. But it's completely orthogonal to insertion-ordered iteration (which is implemented by a list).


That surprised me, when I first saw it. If someone takes a minute to read the docs (e.g. Python), one would never build something knowing that order is not guaranteed (e.g. order in dictionaries). But seemingly people do.


Had some code break while porting from 2 to 3 because it assumed dict order and it was part of the official Python build system, the old spark parser for the AST module.


My main issue with it is that if someone figures out an even better way to do dictionaries but it doesn't preserve insertion order, then that technique can't be used. But I certainly wouldn't call it a stupid decision, the problem is that once a behaviour is observable then people absolutely will start relying on it.


All they have to do is give it a different name, such as unordered_dict.


I might have preferred odict be added and dict left alone, so that you didn't have to import OrderedDict when really you just want the new dict. If dict and odict mapped to the same thing for a while, fine, but dict could diverge again if desired.


The problem is that when you behave a certain predictable and deterministic way users will leverage that even if it's not specified.

That is actually why the Python maintainers decided to make this behaviour official: the ordering was the consequence of changes in implementation details, but over 3.6's lifecycle they feared it would cause compatibility issues as users would start relying on the ordering properties of CPython and that would be an issue for alternate implementations (though pypy had switched to the same implementation even earlier so was also insertion-ordered even when running in Python 2.7 mode).

Providing stronger guarantees was considered useful and unlikely to be severely detrimental in the long run, so the project decided to make it part of the language spec.


"users will leverage that even if it's not specified" -- reminds me of the classic xkcd, "Workflow": https://xkcd.com/1172


An xkcd in the thousands is now considered "classic"? Man I'm getting old.


Which means no one gets it for free when it is added to the language.


You could add an ordered dict by a different name and also have dict be the more efficient ordered version so that people get the efficiency gain for free but don't have to worry and forward compatibility.


I think this is a key point. Since they advertised this, it became a commitment for all future versions. Sure they can break again in 4.0


Maintaining insertion order is strictly a product of adding an additional list to whatever more efficient map you implement. Add something to a dict, push it onto the end of the list. When iterating over the keys, use the list instead of the dict structure.

It does come at a cost, but I think Python aims for ease of use over runtime speed and memory efficiency, so it seems perfect for them.


> I would consider changing how a builtin works to be a major breaking change.

This one is unusual because it won't break old code being brought forward, only the other way around, and theoretically there are lots of things going the other direction that would break (though most of them are explicit, not silent).


In other words, it breaks backwards compatibility, but not forward compatibility.


That's not what people call backward compatibility.

Backward compatibility is the ability to run old code with new interpreters. This is not broken here.

What's broken is the ability to run new code on old interpreters. But this is already broken at every python update (new operators, methods, syntactic sugar..). We could call it reverse backward compatibility.


I suppose the meaning of the term depends on what we consider to be the subject of the compatibility.

- backwards compatible code: new code can run on an old interpreter

- backwards compatible interpreter: old code can run on a new interpreter

EDIT: After some thought, you're right. The second description is the reasonable interpretation.

"python 3.7 is backwards compatible with python 3.6"


This confusion reminds me of a scene from the 2002 film of Dave Barry's "Big Trouble".

(Driving to an airport) "Okay, we gotta pick a road. Arrivals or departures? We're arriving, but then we're departing."

https://en.wikiquote.org/wiki/Big_Trouble

https://www.youtube.com/watch?v=EjHHzZAUxbM&t=300


No, this change is backward compatible but not forward compatible. Old code (written for old interpreter) works in new interpreter; but new code (written for new interpreter) is broken in old interpreter.


That is an original line of thinking in the world of software, “Let’s not add a feature to a new version because it wouldn’t work if used in an older version”.


It would silently introduce bugs, that's the whole issue.


Right. In fact, if it explicitly broke something there'd be less of an issue. Instead, it will appear to work/interpret but you'll get unreliable results. That makes the bug more nebulous.


I think you are giving it bigger weight than it deserves. Writing Python code I would not want to depend on internal order of dict elements without a very good reason. If I am writing some low level code that could benefit from it, maybe I would use this feature, but I would then encapsulate it and insert a version check to complain.

This is like many evaluation guarantees in latest C++-NN standards. Good to know, might be useful for performance tuning of automatic code generators, etc. , but irrelevant most of the time. My 2c.


You can't walk Python code backwards anyway without already considering the minor version differences. eg. f'strings'.


At least f"strings" are an explicit failure case and are obvious when you run unit tests / coverage


Not only f strings, but any change really.. New operators, new methods on standard library objects... Python never guaranteed compatibility of new code with old interpreters.


If you rely on this feature, surely you know it’s new to 3.7 and document your project accordingly...? It’s not like code magically appears, somebody has to write it.


3.5 is EOL soon, and it's already uncommon to write code using a new interpreter that targets an older interpreter - there's more than just ordered dicts that you'll be missing if you do so.

People will just do what they've always done - they'll be aware of the differences, or they'll test.

There's nowhere I have seen that mentions dicts are ordered without mentioning in the same paragraph the version since which this has been the case, so anyone who knows they're ordered will be aware.

Tldr its fine


Code that assumes no order cannot be broken by a specific order unless it was setup to assume everything but that specific order in which case it was already broken, but only randomly showing it’s broken nature.


It's not breaking, it was implementation defined, now it's guaranteed order.

If a new guarantee or behaviour were a breaking change, every non-patch release of a semversioned tool (which python isn't) would be major.


> It's not breaking, it was implementation defined

And randomised at interpreter start, since Python 3.3.


> It's not breaking

And that's sort of the problem. If it was broken, you'd know it and you'd fix/rearchitect it. Instead, it will appear to work.


Sort of. You'd only know based on the interpreter you were using. You wouldn't know unless you tested other Python interpreters.

For example, if you wrote something in pypy it would be ordered versus cpython.


Actualy, dict are already ordered since a long time(since python 3 I think), but it wasn't certified. It was just the result of a new implementation of dict structure.

I can't imagine how it can break something. In which case can you have an advantage to have an unordered list? Biggest downside can be about perf, but I don't think it's the case here.


python 3.6 is when they reworked the dictionary and had the side effect of being ordered.


For anyone who came to Unit Testing with the XP era and happened to use Java, Java 5 introduced changes to hash table ordering that somehow resulted in Junit 3.0 running some tests in a different order.

It's important to discover coupling between your tests, this just isn't the way most people want to do it.


I don't see exactly how one would rely on dicts not being sorted, as it was already answered by another comments.

And beyond this, even if maintaining backward compatibility is important, I don't think it should prevent software from being improved, and dicts being ordered is pretty awesome. I'm pretty sure relying on unsorted dicts might involve pretty hacky code.

I recently stopped writing a z-order test application because I remembered that dicts are not sorted, so it made my goal a little pointless. Meanwhile C++'s std::map is sorted. I think there was already sorted dict in python, but I don't remember.

EDIT: I'm wrong, I'm mixing "ordered" and "sorted"


I agree, but more from the point of view on what if a new unordered implement if discovered 10 years from now that’s better? Would a new type have to be added?


I think you bring up a good point. If someone hands me a python script and told me it was 3.7 I would only hear 3 and expect it to run or fail, not to run with differing outputs!

Dot-upgrades do allow for breaking changes, but this one can break silently and I’m not sure where such breakages neither belongs in python nor where they should belong.

Personally I’m happy to get ordered ducts sooner rather than later.


I think if you are a library developer targeting different versions of Python, you generally know what features you can and can't use from new versions of Python (ie not many), and ideally you are also testing with each version in CI.


No, I agree, it's stupid. To me it feels like a kind of conceptual backwash from JS. The whole P3 fiasco looks really silly from far enough away.

It's like, you have a great marriage with everything working out and some fine children who are doing great too, but you get a divorce anyway to pursue your dreams of being a conceptual artist.

(BTW, in case anyone is keeping track, I was going to maintain P2 but got caught up in Prolog, of all things, and now Python looks just as crude as everything else and I don't have the requisite love anymore to shoulder the work. I might make a Python 2 interpreter in Prolog though! That would be fun.)


By that logic, any change would be considered "breaking" because people will expect that the change also exists in older versions.


This was my initial thought as well. Seems like people are celebrating, but this is going to be a source of difficult to figure out bugs.


PHP has had this undocumented feature since forever and so did all mainstream Javascript engines.

Big whoop


> Javascript

Javascript doesn't specify performance characteristics of objects and arrays, so even with implementations, one object could be a hashtable, another a balanced tree.


The implementation doesn't matter to EGreg's point, which is that ES standardized that iterating over object keys are generally required to return keys in insertion order.

In the past (from before this standardization) Chrome had in fact changed the object iteration order due to an optimization, and had to revert it after lots of complaining on their bug tracker.

(To be precise, the spec still requires array index properties to be returned ahead of other properties regardless of insertion order. The behavior that Chrome "reverted" to is this new one, so not exactly the same as the original behavior.)


This is an amazing contribution to the language. A mixture of speed and convenience, probably made by volunteers. As for people criticizing a change to what use to be a non-deterministic ordering of a dict iteration; I don't know what to say to them, other than, are you serious? There are people out there who are working for us, they work for free and they did some heavy lifting to give us this. They might read what you wrote and think, "Why bother? Maybe I should spend my weekends playing with my kids instead."


> As for people criticizing a change to what use to be a non-deterministic ordering of a dict iteration; I don't know what to say to them, other than, are you serious? There are people out there who are working for us, they work for free and they did some heavy lifting to give us this. They might read what you wrote and think, "Why bother? Maybe I should spend my weekends playing with my kids instead."

I don't agree with the complaints about ordering ducts, but this is a ludicrous response. Appreciation of volunteers' work on Python isn't diminished by criticism of language features or changes. In fact, it enhances it: if people have opinions about the project you work on, it's a good sign that it's important, significant work. I've contributed to OSS projects before, and the ones in use by more than just my friends and I were naturally the ones that felt the most meaningful.


As it’s widely known, more often than not “criticism” of open source software quickly devolves into hate and toxicity. Helpful criticism is great but be careful with defending the “criticism culture” around foss, it’s often angry unhappy people that want it all for free on a golden platter, and yesterday.


Sure, hate and toxicity themselves should be called out. But that's not what's happening here, and not what GP comment was referring to. It doesn't make any sense to throw out the baby of legitimate criticism with the bathwater of toxicity.


That does not matter, nothing is above criticism.


It does matter. If people get hate for doing work on open source they will stop doing work on open source. Yes you can be critical of everything, but the hate that some people spew toward certain open projects is not only nauseating but actively damaging to the community.


As far as I know, it was implemented by Raymond Hettinger. This is a very interesting talk about the new tech:

https://www.youtube.com/watch?v=p33CVV29OG8&t=1202s


Indeed, here is the mail from Raymond Hettinger explaining the new design on the python-dev list: https://mail.python.org/pipermail/python-dev/2012-December/1... Quite amazing that it is faster, more memory efficient and convenient for the end user.


Raymond came up with the idea, PyPy implemented it, and then INADA Naoki implemented it for CPython.


If I remember correctly his talk. He went to CPython first and it was rejected, but PyPy welcomed it.


It was proposed by Raymond Hettinger, but was actually first implemented by PHP and Pypy.


Other languages don't generally have special order guarantees about standard maps. This seems very idiosyncratic.


For almost all intents and purposes, object keys have been ordered in JavaScript since ES2015. Map has always been ordered.

https://www.stefanjudis.com/today-i-learned/property-order-i...


Users of JSON, probably the most common data interchange format on the planet, frequently have implicit requirements about key ordering. It is highly convenient to be able to parse a JSON string into a native Python data structure, add a field, emit it back, and preserve the ordering.


in what reasonable use case would the order of the properties on an object matter? I can't think of one


when you are diffing the serialized output?


From what I recall of the JSON standard itself, there's no guarantee about key ordering being significant. If you're diffing serialized output to compare two JSON objects you need to be serializing it in a consistent format, otherwise even whitespace is going to throw you off.


It's significant to a human that wants to know what has changed.


If a human is inspecting serialized JSON using pen and paper, the human is presumably clever enough to match up key for key regardless of ordering.

If the human is using a computer to compare two JSON payloads (as the use of a diffing algorithm suggests), the human and computer should be clever enough as a team to realize that they could just deserialize and reserialize each JSON payload such that the keys were lexicographically sorted and the data was pretty-printed in the exact same way before running it through the diffing algorithm. `jq -Sc` would do the trick.


Most diff programs don't have what you describe. And in a lot of cases you don't have the easy ability to "do stuff" before running the input through a diffing algorithm.


> Most diff programs don't have what you describe.

That’s why pipes were invented.

> And in a lot of cases you don't have the easy ability to "do stuff" before running the input through a diffing algorithm.

Well, in that case you’re not going to be able to meaningfully compare two JSON payloads because neither order nor white space have any semantic meaning in JSON. I’m really curious what you’re talking about though, since if you’re working on the shell you can easily use jq to do as I describe.


So you can sort the keys at serialization time rather than paying a performance penalty all the time?


I tracked down a bug once where someone was hashing state that happened to include Python dicts serialized to JSON, which caused different hash values for the same entity. I know, I know, insert tears of blood emoji, and any engineer worth his salt wouldn't be so sloppy. But you don't always get to choose to work only with top talent that understands why you should design things elegantly as a baseline[1]. In these cases, removing implicit gotchas like "the same code has the same behavior" (ie iteration over the same dict) is valuable.

[1] Well you _can_ choose to do so, and I recently have, but it's not without its costs.


I can't speak for all "other" languages but Ruby made the change years ago from 1.8 to 1.9. Ruby calls them hashes but they're standard maps. Obviously Ruby and Python are very similar but I think the point stands.


Golang map iteration is returned in random order specifically to make sure that people don't rely on the order.

I think this feature says a lot about the philosophy of Python vs Go.


Sounds like a fast and idiomatic way to shuffle a deck of cards is then to convert to a map and back.


Convert a deck of cards ([]Card?) to a map (map[Card]bool?) and back just to shuffle? That's unlikely to be faster or more idiomatic than a straightforward implementation of the Fisher-Yates shuffle[1]. Try writing the code to do it both ways and compare.

[1]: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle


I think "idiomatic" would be using rand.Perm, which implements the Fisher%E2%80%93Yates shuffle. But aside from whether it's idiomatic, converting to a map isn't random enough.


With Golang 1.11, I get orderings like this. S is spades, H is hearts, D is diamonds, and C is clubs, because HN eats the Unicode characters I was actually using.

     S9  SJ  SK  HQ  D9  S4  S5 S10  HJ  S8  H6  DA  D2
     D4  DQ  C6  C8  CJ  H3  H9  DK  C3  CQ  SA  S2  S3
     S6  SQ  H4 H10  D8 D10  C4  CK  S7  H7  D5  D6  D7
     C7  H2  HK  D3  DJ  C2  C5  C9  HA  H5  H8  CA C10
On average you would expect about one card in a shuffled deck to be (cyclically) followed in the deck by its succeeding card, and on about one out of 50 shuffles, that card would be followed by its successor. Here we have S4 S5, DA D2, SA S2 S3, and D5 D6 D7.

This is very strong evidence of bias.

I'm interested to hear if my Golang can be made more idiomatic, or if there are bugs in it: http://canonical.org/~kragen/sw/dev3/mapshuffle.go

In particular it seems like there ought to be a less verbose way to express the equivalent of the Python list({card: True for card in deck}) in Golang.


No, it isn't random enough for that.


IIRC it used to just start iteration at a pseudorandom index and then iterate normally. I looked at it couple years ago, don't know if they changed it.


It seems to do that, but I think it also tweaks the hash function for each newly created map, because in the code linked from https://news.ycombinator.com/item?id=22278753 I'm not getting rotations of the same iteration order when I generate two maps. There doesn't seem to be any randomness in mapiternext() itself.

You could imagine that the hash function itself might do an adequate job of randomizing the order of the cards, though, especially if salted with a per-map salt. SipHash, for example, would probably not have any detectable biases in the distribution of the permutations thus produced. But whatever hash function Golang is using for my structs has an easily visible bias, as described in that comment.


What does it say? That Python is willing to spend more computation time for simplicity, without waiting for the programmer to ask for it? That Python will make semantic changes in minor version bumps of the language?


The new-ish ordered dictionary is actually faster. It is also completely backward compatible for any but the most contrived example.

Now wasting computation on randomizing... that is a problem.


The ordering property is a side effect of the new and more cpu/memory efficient data structure. It would be very surprising to say no to a 2x performance jump to avoid the (sometimes useful) ordering.


Tcl's dicts are ordered.


Java added LinkedHashMap a while ago. I tend to default to it except for small temporary use cases where order absolutely won't matter or have an outside impact...


LinkedHashMap is not Java's "standard maps" though. If you tend to use it by default it's really a personal idiosyncrasy, especially as LLHM tend to be larger and slower than regular hashmaps due to having to maintain a doubly linked list.

Python has had one such in the standard library for a decade or so.


LHM is not slower for iteration (it's faster actually).

LHM indeed pays 2 references per node but they are well worth as it has deterministic ordering/iteration and I have witnessed numerous cases with HashMap that show up in production only due to iteration order (esp after rehashing). The code is broken but the cases did not reproduce during testing...

Now if the two extra references are an issue, consider that HashMap is quite space inefficient with having a dedicated node per each entry - that's the main price. The (simple) node memory footprint is 36bytes on heaps less than 32GB, i.e. compact pointers. The extra references add another 8 bytes for having an insert or access order.

If the goal is getting a compact low memory footprint, HashMap is not fit for purpose. Overall it's the jack of all trades and even got reworked (java8) to support tri-based collision resolution for keys that implement Comparable.

Couple years back I wrote CompactHashMap (under CC0) that has an average cost of 10bytes per entry and it's extremely compact for small sizes with having only 2 fields on its right own, so even small/empty maps are tiny. In microbenchmarks (same used in openjdk) it handily (2x) beats java.util.HashMap on "Traverse key or value", get/put have similar performance, and "Search Absent" is worse.

The point is: LHM should be the go-to hashmap for java as being node based hashmap is justified (unlike java.util.HashMap)


Yep. Also, LinkedHashMap maintains ordering based on insertion order. So, iff you insert your data ordered how you want it, it's behaving as an ordered map. If you want true, self-reordering map, what you want is a TreeMap. Beauty with that one is that you have full control over defining the custom ordering function, because we're not always indexing a map by primitives or autoboxed type. I try to use it sparingly though as you're paying log(n) on pretty much all map operations.

While we're here, another tidbit that's often overlooked in the Java collections: If you reallly care about iteration performance, your data is without nulls, your data is already ordered how you like it or you don't care about ordering, your qty items >= 10, and you don't need random access, then ArrayDeque is gonna be your horse because of how much better it co-locates its contents in memory and how much less overhead is required to maintain it during each operation compared to all the other List implementations, including ArrayList and LinkedList.


In regards to sibling replies, does it feel to anyone else like everything (except golang) is converging towards PHP's array() ?


Interestingly enough here it was kinda but kinda not the other way around: historically PHP used a closed-addressing hash map and threaded a doubly linked list through it to maintain the insertion order.

But the dict TFA talks about doesn't use a doubly linked list, or closed addressing, its ordering is a side-effect of its implementation but not originally a core goal (memory saving and iteration speed were). It'd probably been proposed by others before but it came to wider attention after Raymond Hettinger (a core dev) proposed it a few years back[0]. PHP actually released it first[1], closely followed by pypy[2]. CPython only got around to it some time later[3]

[0] https://mail.python.org/pipermail/python-dev/2012-December/1...

[1] https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implem...

[2] https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

[3] https://docs.python.org/3/whatsnew/3.6.html?highlight=3.6#wh...


JavaScript Maps are iterated over in insertion order.


std::map in C++ stores keys in order. You have to use std::unordered_map to not get that behavior.


Yes, but that’s different kind of order. Python dicts order is insertion order, while std::map is key order.


People who bother to complain are those who actually care about your thing. People who do not care simply leave without ever telling you why. Your complainers are often your most dedicated and invested users.


Though in this specific case, my sense has been that your complainers are often your most dedicated and invested users of a version of the language that was officially retired 38 days ago.

Python is just bizarrely political. I suspect that it's now cursed for all time to have every future PEP become a battle in a proxy war over PEP 3000.


Sometimes they're just concern trolls though.


Especially with the (lack of) change to sets I'm interested to benchmark some regular things before and after the change.

    set(dict.keys())
    set(dict.values())
    thing1 = dict(...)
    thing2 = copy.deepcopy(thing1)
I feel like there could be some other testcases. I'm wholly in support of the change (regardless of benchmarks) but depending on the results I could see some arguments against.


1. sets don't use the new dict's implementation

2. IIRC the new dicts are no(t significantly) slower than the old dicts, however they use less memory, and iterate faster

The iteration order is actually a side-effect of implementation details, the original goals were a more compact representation and a faster iteration: https://mail.python.org/pipermail/python-dev/2012-December/1...


> 1. sets don't use the new dict's implementation

Yes, of course. That's why I'm wondering if turning dict keys into a set is slower now.


I was under the impression python dict ordering was deterministic, just not in an order recognisable by humans, i.e. ordered by hashes. Was this not the case?


From the docs: "CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions."

So I would claim that equates to non-deterministic


I read that as: if two dicts are constructed in the same way and use the same python implementation then they'd be the same. If you change the way the dicts were generated or the underlying implementation then they're not the same.

To me, nondeterministic is when the input does not change but the output does. The docs essentially say to not rely on the order, not that the result is not reproducible.


CPython hashes are intentionally non-deterministic, so even the same exact Python install will produce different hash orderings on the same code, if you run it twice (although you can make it deterministic by using PYTHONHASHSEED).


I’d call undependable but maybe it’s the same thing in practice?


That's generally what people mean when they say "nondeterministic" in the context of computing. Yeah, in philosophy it generally means something like "the future is not completely determined by the past," but in computing it means something closer to "the programmer cannot reasonably determine the behavior and thus should not depend on specific behavior."


In computing it means a given set of inputs lead to a given set of outputs. It has nothing to do with how difficult it is for a programmer to reason about. Deterministic builds, deterministic tests, etc.


But what counts as "input" will vary based on who you ask and under what context.


> Was this not the case?

From the origin to Python 3.2, iteration order was arbitrary but deterministic (in CPython, it was not guaranteed to be determinstic in other implementations)

From Python 3.3 to Python 3.5, iteration order was non-determinstic (across runs, it was deterministic per-process instance)

In Python 3.6 it's unspecified but deterministic and follows insertion order (in CPython)

In Python 3.7, Python 3.6's behaviour was specified and documented


I don't have any criticism if the performance of this new dict stays the same as earlier and that's a huge if. Considering python3 is already a crawling language compared to peers like java and php, I think devs should focus more on that before handing out features.


> I don't have any criticism if the performance of this new dict stays the same as earlier and that's a huge if.

1. it's been there since 2016, the new dict was released as part of Python 3.6 (though the ordering was only an implementation detail until 3.7)

2. the new dict is much more compact (20~25%) and should have much faster iteration speed, those were actually the original goals, ordered iteration was a side-effect

3. the new dict should not benchmark significantly slower across the board, it wouldn't have been merged if it were a regression


Applications are open for YC Summer 2021

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: