Dictionary union (PEP 584) is merged (github.com/python)
143 points by whalesalad on Feb 25, 2020 | hide | past | favorite | 84 comments



  {**d1, **d2}
is very natural if you also write javascript where their spread operator looks like:

  {...d1, ...d2}


This is reason enough to upgrade from Python 2.7 if you are still on it. I use this convenience almost daily.


No it isn’t.


One of the issues with Python 3 is that there isn’t really one killer feature, it’s countless little ones. Many believe 3.6 was the first release where they added up to enough of a benefit (though f-strings are a big help for many projects).

Regardless it no longer matters, the Python 2 ecosystem is now rotting as packages drop support. Every week I have to make one or two hot fixes somewhere to forcibly pin to an old version to fix something.


For me, proper handling and distinction of Unicode vs. binary data was a game changer. I don't know if that's related to my first language being non-English, but I remember it being really important to me and a strong reason why I made the switch years ago.


I used to work for a huge SEM business working with keywords in all languages of the world. No-brainer.


I would definitely mention integer division as a massive change in py3 too. In my field, the @ multiplication operator is also useful.


> I would definitely mention integer division as a massive change in py3 too.

To be fair, you can get that in Python 2 too:

  from __future__ import division
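For comparison, a minimal sketch of how the two division operators behave on Python 3, where true division is the default and the `__future__` import is no longer needed. The variable names are only illustrative.

```python
# Python 3: "/" is always true division, "//" is floor division.
true_quotient = 7 / 2       # float result
floor_quotient = 7 // 2     # integer result, rounded down
negative_floor = -7 // 2    # floors toward negative infinity, not toward zero
```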


Even more natural if you write Ruby, which uses an identical syntax.


So PEP 584 is a JavaScript inspired feature request?


No, it adds a new merge (|) and update (|=) operator to avoid using

    {**d1, **d2} 
because it apparently looks ugly[1].

[1]: https://www.python.org/dev/peps/pep-0584/#d1-d2
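A quick sketch of the two new operators next to the older unpacking spelling, assuming an interpreter that already ships PEP 584 (Python 3.9 or later). The dict names are made up for illustration.

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 3}

# Merge: builds a new dict and leaves both operands untouched.
merged = d1 | d2

# The older spelling that the PEP wants to make less necessary.
unpacked = {**d1, **d2}

# Update: modifies the left-hand dict in place, like dict.update().
d3 = dict(d1)
d3 |= d2
```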


And in doing so creates multiple ways to do the same thing which is SUPER SUPER annoying.

There has been a recent effort to add all these new operators that don't actually let you do anything you couldn't but now you can confuse everyone by doing it in other ways.


Python doesn't use formal interfaces. This makes dicts implement unions in a duck typeable way - the operators.


instead of actually useful features like dict deconstruction


The bigger reasons are discoverability (as noted in the section you linked) and the ever-vague notion of Pythonic-ness.

{**d1, **d2} is not intuitive to a primarily-Python developer, and looks nothing like typical Python. The dict unpacking operator it uses is almost never seen outside function arguments.


The PEP document describing the feature: https://www.python.org/dev/peps/pep-0584/


> > Dict union will violate the Only One Way koan from the Zen.

> There is no such koan. "Only One Way" is a calumny about Python originating long ago from the Perl community.


For what it is worth, from https://www.python.org/dev/peps/pep-0020/:

> There should be one-- and preferably only one --obvious way to do it.

Personally, I’ve always thought that Python missed its Zen, both here and on “explicit is better than implicit.”


That ship had long sailed with string formatting anyways.


F-strings are the one obvious way to do string formatting. There may be other ways, for legacy backwards compatibility reasons, but f-strings are the way to do string formatting.


I think you are right. Just left a string of legacy (%-formatting) and dead end (.format()) solutions on the way there.


I think F-strings are bad for i18n. It sucks to use them with a database of string localizations because the variable name is now embedded in dozens of translated F-strings and basically becomes immutable.


Yes, unless a template is needed later. That's why the others continue to exist.
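A small sketch of the distinction, with made-up strings: an f-string is evaluated on the spot, while a plain template can be stored (for example in a translation catalogue) and filled in later with `str.format`.

```python
name = "Ada"

# Evaluated immediately; the variable name is baked into the source text.
greeting_now = f"Hello, {name}!"

# A plain string can be stored, translated, and filled in later.
template = "Hello, {name}!"
greeting_later = template.format(name="Grace")
```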


I disagree. In logging for example, ‘%s’ with the value as an argument to the logger is preferred because the formatting can be ignored if the log level is not sufficient to print.
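A rough illustration of that point using the standard `logging` module. `Expensive` and the logger name are hypothetical helpers that only exist to count how often the value is turned into a string.

```python
import logging


class Expensive:
    """Hypothetical value that counts how often it is rendered as text."""

    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"


logger = logging.getLogger("pep584_demo")
logger.setLevel(logging.WARNING)  # DEBUG records are dropped by this logger

value = Expensive()

# %-style arguments: the record is discarded before any formatting happens.
logger.debug("value is %s", value)
calls_after_lazy = Expensive.calls

# f-string: the text is built before logger.debug() is even called.
logger.debug(f"value is {value}")
calls_after_eager = Expensive.calls
```

Note that only the string conversion is skipped; any expression passed as an argument is still evaluated before the call.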


Except log formatting isn’t the same as actually executing the expression passed to the logging function.


Log formatting is a different problem than string formatting. With log formatting you pass the formatting arguments as function parameters, which is completely different from any other way you format strings.


I think you're getting downvoted because logging is basically doing:

    def log(fmt, *args):
        print(fmt % args)
Hardly a huge change.


It sailed long before that.


Go back to 1.0 and we still had two ways to write strings ("abc" and 'abc'), and two ways to write not equal ("!=" and "<>").

The last died with 3.0.

But my point is that the Zen of Python must be seen as a post hoc description overlaid onto whatever the actual Python philosophy is. Aligned, certainly, but at times only roughly aligned.

So I don't see it as having sailed (with string formatting, or any other specific event) but as never having been there in the first place. More like sailing in the same waters.


    The Tao that can be told
    is not the eternal Tao


    Can the Tao be found
    where there is no Tao?


It sailed with Turing completeness! At least, the popular interpretation of the phrase that ignores the word “obvious” did.


It sailed when they added the for loop - everyone was happy using while loops, things worked and it was simple.

Now there are TWO ways of running a block of code repeatedly? How confusing for new users. Python really went downhill since then.


I think Go dropped one of these (for vs. while) to keep to just one approach to loops, so Python providing lots of ways to do the same thing is something other languages targeting entry-level folks are seeking to avoid.


Ah, so that's what the migration to Python 3 was all about!


Fantastic. Especially glad they went with | over +, that’s always felt like the natural way I’ve wanted to do this. Looking forward to more set-like operators in the future!


Thank goodness sanity prevailed on the operator!

We had a whole discussion on HN last time[1] about this, where I argued that dicts are logically subclasses of sets and therefore should share operators.

When I saw this headline I accepted my fate of typing the "wrong" operator from now on and liking Python just a tiny bit less for the inconsistency. So glad they reconsidered.

[1] https://news.ycombinator.com/item?id=19314646


Guido stated his preference for | and the PEP was changed.


The PEP has this section:

> The new operators will have the same relationship to the dict.update method as the list concatenate (+) and extend (+=) operators have to list.extend. Note that this is somewhat different from the relationship that |/|= have with set.update; the authors have determined that allowing the in-place operator to accept a wider range of types (as list does) is a more useful design, and that restricting the types of the binary operator's operands (again, as list does) will help avoid silent errors caused by complicated implicit type casting on both sides.

Would someone please explain what they mean with regard to being different from set.update, and what could lead to silent errors?
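One way to read that paragraph, sketched on Python 3.9+: the in-place operator takes anything `dict.update` would take, the binary operator only takes dicts, and sets are stricter than both because even `|=` wants a set.

```python
# In-place |= behaves like dict.update: any iterable of key/value pairs works.
d = {"a": 1}
d |= [("b", 2)]

# The binary | insists on a dict on both sides.
dict_binary_rejected = False
try:
    d | [("c", 3)]
except TypeError:
    dict_binary_rejected = True

# Lists draw the same line between += and +.
items = [1]
items += (2, 3)
list_binary_rejected = False
try:
    items + (4,)
except TypeError:
    list_binary_rejected = True

# Sets are stricter: even the in-place operator wants another set,
# although the set.update() method takes any iterable.
s = {1}
set_inplace_rejected = False
try:
    s |= [2]
except TypeError:
    set_inplace_rejected = True
s.update([2])
```

A plausible reading of "silent errors" is that a permissive binary operator would quietly coerce an accidental list or string operand instead of raising, but that is an interpretation rather than something the quoted text spells out.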


I wouldn't have thought about dict unpacking as a solution either but once suggested it seems satisfactory and I don't see how adding a new operator is more discoverable or natural than just putting this method in a more prominent place in the documentation.


Guido himself said he had forgotten about this trick and since it's syntactic sugar, it does not respect dict subclasses or other mappings.


imo

  defaultdict(callback,{**a,**b}) 
is more readable than a | b without knowing that a or b are defaultdicts and having to reason about which default callback will be used
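To make the ambiguity concrete, here is a sketch of what CPython 3.9+ does when both operands are defaultdicts with different factories: as far as I can tell, the left operand's factory is the one that survives.

```python
from collections import defaultdict

a = defaultdict(int, {"x": 1})
b = defaultdict(list, {"y": [2]})

# With the operator, the result is a defaultdict carrying a's factory.
merged = a | b

# With the explicit constructor, the factory is spelled out at the call site.
explicit = defaultdict(list, {**a, **b})
```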


It’s also really expensive. This new operator works with any Mapping type without needing silly hacks.


Note that a | b is already a disaster if you try to use subclasses of set. In python 2.7, iirc it would return a set of a's type but without calling the constructor. In python 3 it seems to return a set (not the type of a or the type of b).
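A minimal check of the Python 3 half of that claim, run against CPython. `TaggedSet` is a hypothetical subclass with no behaviour of its own.

```python
class TaggedSet(set):
    """Hypothetical set subclass used only for illustration."""


a = TaggedSet({1, 2})
b = TaggedSet({2, 3})

union = a | b
result_type = type(union)  # plain set on CPython 3, not TaggedSet
```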


Not sure about the set operator, but this operator is meant to handle subclasses better than the status quo.


Wow, one more point for python 3.

Just override ‘__and__’ in your whatever class to replace default return.

Pretty explicit in my book.


the easy to forget justification is surprising to me. especially when most modern languages have the concept of unpacking, rest, spread or etc.

making the trick work with other mapping types and making it faster is totally understandable though.


It's pretty obvious with the context of other languages, but wildly outside the norm for Python. I rarely see dict unpacking outside function signatures.


It mentions in the PEP discussion that the {**a, **b} trick only works for string keys. So it isn't applicable in as many cases as the new operator.


actually, only

  dict(d1, **d2)
has that problem. it works fine if you unpack into a dict literal:

  >>> d1 = {1: 'a'}
  >>> d2 = {2: 'b'}
  >>> {**d1, **d2}
  {1: 'a', 2: 'b'}
iirc the pep mostly just says that it's suboptimal because it's syntactically heavy/noisy, non-obvious and can't be overloaded in dict subclasses

---

i was curious why the two double-stars behave differently despite syntactic similarity. so i went and checked the bytecode, and it turns out they compile down to different opcodes! `{**d1, **d2}` yields a BUILD_MAP_UNPACK, while `dict(d1, **d2)` yields a CALL_FUNCTION_EX/CALL_FUNCTION_KW (depending on the CPython version)


This seems pretty straightforward. When doing

`dict(d1, **d2)`

You are calling the dict function and using the normal syntax for unpacking a dictionary into kwargs. In this case, the names of the kwargs must be strings.


yeah, that side of the (in)equation was pretty obvious, i was mostly interested in the `{...}` one. i admit that a bytecode listing probably isn't the best exposition, i just like digging into VM stuff :)


  dict(d1, **d2)
only works with string keys

  {**d1,**d2} or dict({**d1,**d2}) 
works with all key types it seems
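The difference is easy to reproduce. A short sketch with integer keys, where the variable names are only illustrative:

```python
d1 = {1: "a"}
d2 = {2: "b"}

# Unpacking into a dict literal accepts any hashable key.
literal = {**d1, **d2}

# Unpacking into a call turns keys into keyword names, so they must be strings.
call_error = None
try:
    dict(d1, **d2)
except TypeError as exc:
    call_error = exc

# With string keys, both spellings agree.
string_keys = dict({"a": 1}, **{"b": 2})
```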


I wish there was a union operator for typing as well to replace `Union[str, int]` with just `[str|int]`.


PEP 604 (draft) proposes this:

  def f(list: List[int | str], param: int | None) -> float | str:
      pass

  f([1, "abc"], None)
https://www.python.org/dev/peps/pep-0604/


Naively, couldn’t you just overload `or` to make that work?


Did you mean `str | int`?


Yay! I was wishing for this feature just a few days ago. It's somewhat analogous to how sorted (since Python 2.4) frees us from having to tediously make copies of lists to sort them in place.


You can already do

   {**d1, **d2}
today for the same effect.


Here I was wondering what a dictionary union was, having already stumbled upon it and used it.

Note that the method you show is slightly different for cases of dict subclasses.

The PEP notes the difference: https://www.python.org/dev/peps/pep-0584/#d1-d2.


I haven’t read the entire bug tracking thread, but it seems like people were mostly against it, and have been many times in the past.

What made decision makers change their mind and accept this change?


Most of the bug tracking thread was just about whether `somedictsubclass() | somedictsubclass()` should be `dict()` or `somedictsubclass()`

The latter (returns `somedictsubclass`) would cause the `|` operator to rely on the `copy()` method from `dict` which would be the only case where an operator relies on a non-double-underscores method. Based on that, two core devs were against it. The core devs prevailed, and the behaviour will be the former (returns `dict`).
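A sketch of how that outcome looks on CPython 3.9+, plus one way a subclass could opt in to keeping its own type. `MyDict` and `KeepType` are hypothetical classes, not anything from the PEP.

```python
class MyDict(dict):
    """Hypothetical subclass that does not override the operator."""


class KeepType(dict):
    """Hypothetical subclass that overrides __or__ to preserve its type."""

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(self)
        new.update(other)
        return new


plain_result = MyDict(a=1) | MyDict(b=2)
kept_result = KeepType(a=1) | {"b": 2}
```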


It seems that just using + as the operator was rejected because it's: "Too specialised to be used as the default behavior."

What does that mean? It works for lists, obviously lists don't need to worry about duplicated values, but it's kind of non-intuitive that + won't work for dicts. I think many people view dicts and lists as the same general type of data structure.


>> I think many people view dicts and lists as the same general type of data structure.

Is that a joke or are you from PHP world?


Python 2 Community: We are in hell, we have to stop working on everything to upgrade to Python 3, there is no straightforward way to upgrade, many of our python 2 libraries haven't been updated, and there are tons of little bugs that are hard to fix.

Python 3 Community: Look at this cool dictionary merging thingy!


What does this do on duplicate keys? Keep one? Take a predicate?


It keeps the one on the right. They explain it in the PEP: https://www.python.org/dev/peps/pep-0584/


AKA Last one wins.
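A small illustration on Python 3.9+, using made-up config dicts: the value comes from the right-hand operand, while the key keeps the position it had in the left-hand one.

```python
defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 9000, "debug": True}

# Right-hand values win on duplicate keys.
merged = defaults | overrides

# Swapping the operands swaps the winner.
reversed_merge = overrides | defaults

# Key order follows first appearance, scanning left to right.
key_order = list(merged)
```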


Can't wait for Python 4k where we TOOWTDI all the old stuff.


The problem with sprinkling operator overloading all over the place in non-numerical use is that you as the reader don't get the context hints provided by method names. I think this change is bad in the overall balance.


The best way to do dictionary union is already symbolic:

    {**d1, **d2}
This provides a clearer symbolic notation for dictionaries analogous to what's already available with sets. FWIW the pep discusses what this would look like as a method vs an operator:

https://www.python.org/dev/peps/pep-0584/#use-a-method


The best way, in my view, is .union(); the new syntax additions are too cryptic.


Not sure what you're referring to because there is no "union" method/function. There is currently no non-symbolic built-in way to combine dictionaries in an expression.

You may be interested to read PEP 584's list of examples of all the real-world code the existence of this operator makes clearer:

https://www.python.org/dev/peps/pep-0584/#examples


I agree. Back in the day when I used Ruby, I remember one of the arguments for Python being their belief that there should be one way to do things. Found one reference:

> There should be one-- and preferably only one --obvious way to do it.

https://www.python.org/dev/peps/pep-0020/


There should be one _obvious_ way to do things. Not _one_ way to do things.

The operator is better than the dict unpacking-repacking trick, and will become the obvious way to do it.


we already have `set1 | set2` for set unions

and dict keys are basically a set
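For what it's worth, the key views already support the set operators today. A short sketch with illustrative dicts:

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# dict.keys() returns a view that supports set operators and yields sets.
shared = d1.keys() & d2.keys()
all_keys = d1.keys() | d2.keys()
only_in_d1 = d1.keys() - d2.keys()
```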


I think that's cryptic too for most people, intuitive just for people who are conversant in bitwise operations.


these are standard set operations, not bitwise


The choice of the "|" operator for set union comes from bitwise operations: bitwise OR works as a union operator if you are using integers as bit vectors to represent sets of boolean attributes. And it was a common idiom back in the day when people used to program in C/assembly, using words as bit vectors was a common way to save memory.

Hence "|" as set union is intuitive for people who are familiar with this application of bit vectors.
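A sketch of the bit-vector idiom being described, assuming three made-up permission flags. `FLAGS` and `to_set` are hypothetical helpers for decoding the integer back into names.

```python
FLAGS = {"read": 0b001, "write": 0b010, "execute": 0b100}


def to_set(bits):
    """Decode an integer bit vector into the set of flag names it contains."""
    return {name for name, flag in FLAGS.items() if bits & flag}


user = FLAGS["read"] | FLAGS["write"]
group = FLAGS["read"] | FLAGS["execute"]

# Bitwise OR on the integers matches set union on the decoded names.
combined = user | group
```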


I think the OR operator comes from set theory and has nothing to do with the low-level boolean flag fieldsets.


I looked up set theory in a couple of places (WP[1] and Britannica[2]) and didn't find any references to the OR operator in this context. Do you have a link?

[1] https://en.wikipedia.org/wiki/Algebra_of_sets

[2] https://www.britannica.com/science/set-theory/Operations-on-...


The analogue of the OR operator in set theory is the union operator. People think of them as basically the same thing because of the correspondence between a property and the set of things with that property. If A is the property of being either B or C, then the set of the things that are A is the union of the set of things that are B and the set of things that are C.
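The correspondence can be checked directly on a small universe. The two predicates below are arbitrary examples chosen for illustration.

```python
universe = range(20)


def is_even(n):
    return n % 2 == 0


def is_multiple_of_three(n):
    return n % 3 == 0


# Things that satisfy "B or C" as a single property...
either = {n for n in universe if is_even(n) or is_multiple_of_three(n)}

# ...are exactly the union of the things that are B and the things that are C.
evens = {n for n in universe if is_even(n)}
threes = {n for n in universe if is_multiple_of_three(n)}
union = evens | threes
```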


Nice


FINALLY!



