Hacker News

I'm not clear on how this breaks existing code.

Code that assumed the order was arbitrary would expect to handle any arbitrary order, including a happens-to-be-sorted one.

Code that assumed the order was actually random, as if shuffled by random(), was already broken, because that simply isn't the case.

Code that assumed the order would stay constant was relying on implementation-specific behavior, and could potentially break on any version update; as with any reliance on implementation-specific behavior, you'd break if the dictionary code ever got touched -- even if it were for a bugfix.

Code that sorted the dictionary keys before iterating is now slightly inefficient, due to the extra work of sorting an already-sorted list.
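A quick sketch of the distinction the cases above draw (assumes CPython 3.7+, where insertion order is guaranteed; the dict and its keys are made up for illustration):

```python
# Insertion order is preserved (guaranteed as of 3.7), so iteration
# follows the order keys were added, not sorted order or hash order.
d = {}
for k in ["b", "a", "c"]:
    d[k] = len(k)

insertion_order = list(d)   # ['b', 'a', 'c']
sorted_order = sorted(d)    # ['a', 'b', 'c'] -- a redundant sort if the
                            # keys were already inserted in sorted order
print(insertion_order, sorted_order)
```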

It doesn’t break existing code. Code written for Python 3.7 might break on older versions of Python.

What you describe is forward compatibility, and Python (like most other programming languages) doesn't have it.

Usually languages don’t have it in a very explicit way, though. If I try to use f-strings in Python 3.5 I get a very explicit syntax error. If I rely on insertion ordering in Python 3.5 I get a potentially difficult-to-diagnose bug.

Why does it matter if it's explicit or not? If Python doesn't support forward compatibility, you should know that if you write code for 3.7, it's not gonna work in 3.5. Doesn't seem like a big deal to me.

If you’re the only one writing it and you’re the only one running it, it’s probably fine. But if I’m putting a file out there that will only work in 3.7, it’d be nice if any potential users of that file would get a good error message if they try to run it on 3.5, rather than wrong results.

I could potentially assert a version, but do I really want to do that each time I write something that might be used somewhere else?
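For what it's worth, such an assertion is only a couple of lines; a minimal sketch (the message text is made up):

```python
import sys

# Fail fast with a clear message, instead of silently wrong results
# on interpreters that predate the ordering guarantee.
if sys.version_info < (3, 7):
    sys.exit("this script requires Python 3.7+ (relies on dict insertion order)")
print("version check passed")
```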

And yes of course I could add a line of documentation, but there is a 100% chance I’d still get bug reports from people on 3.5.

Setuptools solves this, add a python version specifier to your setup.py or pyproject.toml file.

If you are just distributing raw Python files, then congratulations, you’ve just realised why packaging is valuable.
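A minimal sketch of what that looks like in setup.py (the names are hypothetical; the key line is `python_requires`, which makes pip refuse to install on older interpreters):

```python
# setup.py fragment (config; names hypothetical). The key line is
# python_requires, which makes pip reject installs on Python < 3.7.
from setuptools import setup

setup(
    name="sometool",
    version="0.1",
    py_modules=["sometool"],
    python_requires=">=3.7",
)
```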

If I rely on a Python version and I expect other people to use the code, I add a version if-statement at the top. I hate those packaging tools that insist on installing stuff into your system and creating a FrankenDebian when really all I want to do is run a single .py file standalone, once. Often I have to do shenanigans like "python3 -c 'from sometool import __app__'".

If you want to install it, go ahead and copy or symlink it in your ~/bin or whatever you fancy (that's your personal preference anyway unless I'd specifically package it for some OS like Debian). I don't want to have to use some setup.py that I have no clue where in my OS it installs things.

Yes, a failure to understand how your tools work or how to use them effectively does indeed make things harder.

Well I know how my tools work, I don't know how this custom file works that is duplicated and delivered with each project.

> I don't know how this custom file works that is duplicated and delivered with each project.

It's not duplicated and in most cases it's not even delivered as part of the installation.

> If you want to install it, go ahead and copy or symlink it in your ~/bin or whatever you fancy

That's exactly what pip will do if invoked with `--user`.

> I don't want to have to use some setup.py that I have no clue where in my OS it installs things.

It installs it to a single place. Run `python3 -m site` and look at `USER_BASE`.
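You can also see those paths from Python directly (the values differ per OS and user):

```python
import site

# The single root used by `pip install --user`: scripts land under
# USER_BASE (bin/ or Scripts/), modules under the user site-packages.
print("USER_BASE:", site.getuserbase())
print("USER_SITE:", site.getusersitepackages())
```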

To avoid a lot of this, use pipx[1] to keep things even more isolated.

> Often have to do chenanigans like "python3 -c 'from sometool import __app__'".

You're doing things wrong because you don't know the tooling; typically you'd just do `python3 -m sometool`.
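The `-m` form works for any module with a main guard; a hypothetical single-file `sometool.py` sketch:

```python
# sometool.py -- hypothetical single-file tool, runnable both as
# `python3 sometool.py` and as `python3 -m sometool` once on sys.path.
def main():
    print("sometool running")

if __name__ == "__main__":
    main()
```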

Things that are distributed as a single file are either so simple, with no other dependencies, that you can just make do, or written by someone who doesn't know what they are doing, in which case you're going to have a bad time.

1. https://github.com/pipxproject/pipx

Most projects use setuptools for package management, which ensures that the environment is as expected.

If you write code that doesn't work in 3.5, you should check the version at startup and exit

Yes, that would be a good idea. But if you ever run scripts you didn’t write, there’s the potential people didn’t do this, and you have the potential for hard to discover bugs. The language should be designed such that bugs are difficult to encounter, this is an instance where it wasn’t.

#!/usr/bin/env python3.7

so now you have to install a specific python version for your script to work?

Only if you're using version specific features.

What happens when python 3.8 comes out? Everybody needs to go into your script to change the hashbang every time a new release comes?

you can just hashbang to python3 which is a symlink to whichever python3.X the user has.

If your code won’t work for older versions you can make an explicit check that the version of python is greater than whatever you need.

Major version changes are for API compatibility. If there is a change which makes all other 3.* incompatible, then it should be a major version increase.

You're describing something like semver, but Python doesn't do semver.

And this change would be fine even if Python did.

Who told you that? I'm looking at PEP 606, it says nothing like what you claim. https://www.python.org/dev/peps/pep-0606/

ECMAScript recently got stable array sorting. That can cause precisely the same kind of backwards-incompatible but difficult to diagnose bug.


I suppose Python doesn't even have backwards compatibility within the same major release, as we saw with the addition of the async keyword in Python 3.5. Many older Python 3 packages broke because they expected that to be a legal identifier for a variable name.

tensorflow didn't work on 3.7 for a solid 8 months because some people at google very unwisely decided that `async` and `await` were great choices for variable names, despite PEP492 landing in 2015.

that's because tensorflow is an advertisement for Google, and while it's technically open source, it isn't any kind of community project; it's all there to show off (and ingrain in its users) the way Google wants things done (just look at the byzantine Bazel build processes: tensorflow takes hours to build, pytorch about 10 minutes...).

my torch builds also take hours.

facebook is just as capable of writing hot garbage, sadly.

Yeah, but still, it's an order of magnitude faster...

Python has never tried to maintain backwards compatibility within a major release.

You're right, I read the gp too quickly.

But in the case of downgrading, I'm fairly sure there are a number of other breaking changes that can't trivially be downgraded across minor versions. Like f-strings were only introduced in Python 3.6, as I recall, and the async keyword only as of 3.5, I think?

I think the argument is that if you run code with f-strings and walruses on Python 3.5, the code will break noisily. Whereas if your code implicitly relies on ordered dicts, it could break silently. Syntax errors rarely cause subtle, hard to track bugs.

Introducing things is different than changing things.

Sure, but you can't safely take everything from a higher version to a lower version in any case; if insertion order became guaranteed due to a bugfix, and wasn't backported, you'd be in the same boat.

The only way to consistently code cross-version is to start with the lowest you plan to support (assuming the higher versions are actually backwards-compatible).

Does any language guarantee that code is both backwards and forwards compatible?

The issue seems to be silent incorrect behavior. What happens if you attempt to run Python code containing f-strings using an older Python version? It raises an exception? That's good! Now what happens if you write code for 3.7 that takes advantage of the new ordering, and someone grabs it from your repo and runs it using 3.2? It would happily give incorrect results and no one is the wiser.

If you expect this situation you can assert the language version.

But the whole point is that some developer won’t expect that someone would run their code on an older Python, isn’t it?

Both of these would be syntax errors if you tried to execute them in earlier python versions. This change might break software completely silently.

If you know you're supporting old code, use OrderedDict.

you arguably ought to anyway, for explicitness.

OrderedDict is slow and expensive though: it maintains ordering through a doubly linked list.

It has useful features for manipulating ordering, but while I've regularly had use for maintaining insertion ordering, I can't remember ever needing to move items around within a map.

If memory serves me correctly, ever since dicts became ordered the OrderedDict simply became a subclass of dict, so it will have exactly the same performance characteristics.

OrderedDict was always a subclass of dict maintaining additional information (which is not free, it has to store and manipulate two pointers per entry).

It remains so today; OrderedDict is not an alias or trivial facade over the normal dict, because it has to maintain its doubly linked list and implement a bunch of additional operations based on it, e.g. pop-first or move-to-end.
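Those extra operations are the ones a plain dict lacks; a quick sketch:

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
od.move_to_end("a")               # reorder in place: b, c, a
first = od.popitem(last=False)    # pop from the front: ('b', 2)
print(list(od), first)            # plain dicts have no equivalents
```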


    >>> isinstance(OrderedDict(), dict)
    True

That was already the case in python 2.7 or 3.1.

The trouble is: I publish new code that advertises itself as working on 3.x, and then it turns out it is being used by a person who only has the version prior to this change.

That said, Go made a similar change (from insertion order to explicitly randomized) and the world didn’t end. So there’s that.

If your code relies on a minimum python version, you can add `python_requires=">=3.5"` to your setup.py [https://packaging.python.org/guides/distributing-packages-us...] to ensure it's not installed on older releases.

That field itself is kinda new, but if you need to block users with older versions, that shouldn't be an issue.

personally, i just drop an f-string in setup.py and that'll filter out any python for which this issue pertains.

This might not work if you’re distributing a wheel.

Python 3.4 is EOL anyway so there's no need to do this. Anybody running 3.4 is already unsupported.

and 3.5 dies in september. hurray!

I thought Go made the change from undefined behaviour, with an underlying implementation that happened to be insertion order for maps with 8 or fewer entries, to similarly undefined behaviour with an implementation that randomises iteration in those cases. Any code that has ever relied on any kind of ordering in Go maps will almost certainly be wrong, even code relying on random ordering, because the distribution of the "random" ordering is biased.

See https://medium.com/i0exception/map-iteration-in-go-275abb76f...

FWIW, when Go made that change, it was a much less-widely-used language (smaller blast radius).

Yeah I think this is probably my main issue. I don't think it's reasonable to ask users of your code to always use 3.7+ instead of 3.6 if they are usually expected to be compatible. And it's also unnecessary to break such compatibility for something like preferring dict over OrderedDict anyways. At least I would try to avoid any such issues by still using OrderedDict.

That said, I have no idea about the internals of dict. I assume no performance was sacrificed for this change.

It actually improves performance. Or at least, it comes along with a set of performance improvements that give you ordering for free. Raymond Hettinger has a great talk on it: https://www.youtube.com/watch?v=npw4s1QTmPg&t=1s

Using OrderedDict is actually nice in this case, even if the default dict has the same ordering. That way you're explicitly saying you rely on that behaviour and it makes reading the code easier.

right, so making it implicit is bad design

It's part of the zen of Python: Explicit is better than implicit.

Python is nothing like its guiding principles. That's why it's Zen -- the principles are a collection of contradictory statements, given what you will encounter in real world Python. You're meant to be confused and meditate on it.

Well... full support for 3.6 ended in December of 2018 (now it only has security fixes), the older versions are already unsupported.

Also, this change was implemented in 3.6, but in 3.7 they officially documented it as a language feature (i.e. all other Python implementations also need to preserve the order).

Furthermore, there's a lot more stuff that isn't backward compatible after 3.6 or 3.7, and if you're writing a library that targets other versions, I would hope that you have tests for all of them.

> Code written for Python 3.7 might break on older versions of Python

That's a truism, for all versions of Python. If you use a feature of Python version X, you should not be surprised that it doesn't run on versions less than X that lack that feature!

If you write a Python library, use a feature of Python version X, and don't mark the library as requiring >= X, you are doing it wrong and a horrible person.

What versions of Python didn't have this behaviour? It was there but just not guaranteed.

I don't see how anything could break (unless there are alternative implementations with different behaviour I'm not aware of?)

> It was there but just not guaranteed.

Dicts have been effectively ordered since 3.6. Iteration order was literally randomised (at process start) before 3.6. I'm also not sure whether the behaviour under deletion was changed between 3.6 and 3.7 so it's possible that there are subtle differences there.

Do you mean to say the future should be constrained by the past?

I get the whole principle of least surprise, but not at the expense of progress.

CPython is not the only Python. Portability is an issue also.

This is why it is now a language feature. Every Python implementation that claims Python 3.7 compatibility must implement this.

The other way around is also true: code that relies on it must specify that it requires Python >= 3.7.

Portability isn't affected; if they claim compatibility with python3.7, then they claim their dicts have insertion-ordered keys.

If they claim compatibility with only up to python3.6, they can have whatever order they choose.

The only issue with portability is that, I think, the main reason it was made a guarantee is that CPython found that the new, presumably optimized, implementation came with insertion order for free, so they went ahead and guaranteed it. That might not be an optimal strategy for other implementations, but they're forced to follow along anyway.

But actually moving CPython code to, say, IronPython should not be impacted, unless IronPython lies about its compatibility.

That also goes both ways: PyPy defaulted to ordered dicts a few years before CPython did.

The insertion order dict implementation actually comes from pypy

The insertion order dict implementation comes from Raymond Hettinger who is amongst other things a core CPython developer. pypy pulled the trigger on using it first (and probably has optimisations CPython doesn't). PHP also used it before CPython did, IIRC. And possibly Rust (as the third-party indexmap).

> Code that sorted the dictionary keys before iterating is now slightly inefficient, due to the extra work of sorting an already-sorted list.

Dicts are ordered, specifically insertion-ordered, not sorted.

Sometimes they'll be sorted:

  D = {i:i for i in range(10)}
... specifically, when you insert them in sorted order. But then you can break the sortedness on your next insert:

  D[-1] = -1
What it allows is parallel iteration:

  zip(D.keys(), D.values())
now being synonymous with

  D.items()

This enables, nay, encourages people to write code that is very subtly broken in 3.5 and below.

The property you mention at the end has been true of dicts since long before 3.5, at least since 2.7 if not forever. See https://docs.python.org/2/library/stdtypes.html#dict.items :

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): pairs = zip(d.values(), d.keys()).

Sorry, yes, I was hasty. Ordered dicts support such a thing even with intervening modifications:

  oldkeys = list(D.keys())
  D[z] = foo(z)
now if you take

  zip(oldkeys, D.values())
then you're guaranteed to iterate over oldkeys, with the proper values associated with those keys -- if z was an oldkey, its value got updated; otherwise, it comes after oldkeys and gets dropped out of zip.
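Concretely (Python 3.7+ semantics; the key names are made up):

```python
D = {"x": 1, "y": 2}
oldkeys = list(D.keys())
D["z"] = 3       # a new key is appended at the end
D["x"] = 10      # updating an existing key keeps its position
# Zipping the old keys with the current values still pairs each old
# key with its (possibly updated) value; the new key falls off the zip.
pairs = list(zip(oldkeys, D.values()))
print(pairs)     # [('x', 10), ('y', 2)]
```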

The subtlety of this is what I, and perhaps others, find the most jarring.

Dicts are ordered in certain cases in python 2, and it was relied upon by some users (I saw it in the wild).

Baking this as a language guarantee is the only protection against Hyrum's Law.

This won't break existing code. It will break new, good code that gets back-ported from eg. a 3.7 tutorial to a 3.5 environment, without any syntax errors.

> Changed in version 3.7: LIFO order is now guaranteed. In prior versions, `popitem()` would return an arbitrary key/value pair. [1]

If they added a new `popordereditem()` method, good 3.7 code would use that, and an attempt to run that code on 3.5 would throw a reasonable, useful error message. If they wanted to play it safe like Rust or Go, they'd add the ordered method and make popitem() deprecated or make it artificially use random/arbitrary order so you can't accidentally depend on a new implementation detail and have a test case work.

Also, your case 4 does deserve some protection, because while it's bad code, it's hard to test for bad-but-working code. Implementation-specific or undefined behavior that works is the worst kind of problem, because it's the hardest to test against. Compile-time syntax errors are the easiest; runtime errors are testable with a solid test harness; some problems can be identified as warnings by linters; but sloppy code requires manual inspection to detect.

Actually, no, I take that back: implementation-specific or undefined behavior that works sometimes(!) is the worst kind of problem. That's what this change enables: if your test case is `dict({'one': True, 'two': True})` and you have a unit test where you popitem() to assert that you get 'two' and then 'one', it will pass the test harness on 3.7, it will pass on 3.6 because of implementation-specific behavior, and it will pass on 3.5 because the hash map arbitrarily puts those particular items in order. But it will silently get it wrong when you pass in user-supplied data. Shudder.

[1]: https://docs.python.org/3/library/stdtypes.html#dict.popitem
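To make that concrete, a sketch of the kind of test that passes everywhere by accident (guaranteed on 3.7+, hash-layout luck on earlier versions):

```python
d = {"one": True, "two": True}
# LIFO is guaranteed on 3.7+; on 3.5 the same assertions may pass
# for these particular keys, then fail for other user-supplied keys.
assert d.popitem() == ("two", True)
assert d.popitem() == ("one", True)
print("passed -- but only 3.7+ guarantees this for arbitrary keys")
```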

> Code that sorted the dictionary keys before iterating is now slightly inefficient, due to the extra work of sorting an already-sorted list.

I think TimSort is extremely efficient for presorted lists, so even that isn't a major impediment.
