PEP 584 – Add + and – operators to the built-in dict class (python.org)
164 points by Ivoah 49 days ago | 144 comments



> An alternative to the + operator is the pipe | operator, which is used for set union. This suggestion did not receive much support on Python-Ideas.

That's disappointing. It's always been on my Python wish list that dicts would subclass sets, as dicts are essentially sets with values attached. Pretty much everywhere you can use a set you can use a dict and it acts like the set of its keys. For example:

    >>> s = {'a','b','c'}
    >>> d = {i: i.upper() for i in s}
    >>> list(d) == list(s)
    True
Dictionaries have been moving in this more ergonomic direction for a while. Originally, to union two dictionaries you had to say:

    >>> d2 = {'d': 'D'}
    >>>
    >>> d3 = d.copy()
    >>> d3.update(d2)
    >>> d3
    {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}
Nowadays, as the PEP points out, you can just say:

    >>> {**d, **d2}
    {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}
There's no reason you shouldn't have always been able to say d | d2, same as sets. Now that I finally get my wish that dictionaries will behave more similarly to sets, they use the wrong set of operators.


The most compelling reason to not do this is that (I claim) it’s not super obvious what to do when the keys are equal. In:

  { 'a' : 1 } | { 'a' : 2 }
Should the result be:

  { 'a' : 1 }
(prioritise the left hand side), or

  { 'a' : 2 }
(prioritise the right hand side), or should it raise an error? Maybe a fourth option would be to downgrade to sets of keys and give:

  { 'a' }
A fifth option is to magically merge values:

  { 'a' : 3 } or { 'a' : (1,2) }
For the first two choices one loses commutativity, which means that code suddenly has to care about argument order (or it will do the wrong thing) even though order didn't previously matter, and one is always potentially losing data. The third choice is safe but could cause unforeseen problems later if shared keys only happen rarely. The fourth choice also forgets a bunch of information held in the dict.
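The first three choices can be sketched as plain functions (hypothetical names, not part of any proposal):

```python
# Hypothetical merge helpers illustrating the choices above.

def merge_left(d1, d2):
    """Prioritise the left-hand side on key collisions."""
    return {**d2, **d1}

def merge_right(d1, d2):
    """Prioritise the right-hand side on key collisions."""
    return {**d1, **d2}

def merge_strict(d1, d2):
    """Raise on any key collision."""
    common = d1.keys() & d2.keys()
    if common:
        raise ValueError('duplicate keys: %r' % common)
    return {**d1, **d2}

print(merge_left({'a': 1}, {'a': 2}))   # {'a': 1}
print(merge_right({'a': 1}, {'a': 2}))  # {'a': 2}
```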

In a language like Haskell, one can use traits to specify how to merge values (Monoid) but without traits (and a way to choose which trait to use) I think some kind of magic merge is not great.

I claim the operations one should really want with dicts are not set operations but rather more relational ones, ie {inner,outer,left,right} joins on the keys followed by some mapping to decide how to merge values.


While I agree with you, I will note that even set union in Python is not commutative. a | b should equal b | a in the sense of __eq__, but the actual objects in the result set depend on the order of the arguments (and in the opposite way from dict + dict). This happens with objects that are distinct but compare/hash equally (x is not y and x == y). Whether that actually matters for any useful program is another story...

Dumb program to illustrate this point:

    class Dummy:
        def __init__(self, value):  self.value = value
        def __repr__(self):         return 'Dummy(%s)' % self.value
        def __hash__(self):         return 0
        def __eq__(self, other):    return True

    a = {Dummy(0)}
    b = {Dummy(1)}
    print(a | b)
    print(b | a)
    print(a | b == b | a)


Unfortunately, even in Haskell Data.Map.Map's monoid instance is left-biased. There is the monoidal-containers package which newtype-wraps Data.Map.Map to have instance Monoid m => Monoid (MonoidalMap k m), which I think is much more sensible.


I think I wasn’t even sure that Haskell had a Monoid instance for Data.Map, I knew it wasn’t the interface which I would naturally expect though. I agree that the interface for MonoidalMap is more natural.


Besides, anytime somebody compares Python to Haskell, the battle is over. They have completely different use cases and philosophies. If you want something in Haskell, you probably want the opposite in Python.


It's not clear what you're saying here.

The comparison was to say, "this decision is difficult everywhere" -- which lang seems beside the point.


Great post. Sets have nice properties that dictionaries don't have; making them act similarly seems like a trap.


> it’s not super obvious what to do when the keys are equal

    d1 | d2 | d3 | ...
is equivalent to:

    {**d1, **d2, **d3, ...}


Now read the above but instead of “it’s not super obvious what

  d1 | d2
should be because losing information/desirable properties/weird errors”, read “it’s not super obvious what

  {**d1, **d2}
should be because losing information/desirable properties/weird errors”.

Except I guess one could throw in something about TOOWTDI too.


I actually think it is obvious what a dictionary merge should do (overwrite keys on the left with keys on the right), but this is beside the point because it's already been determined for

    {**d1, **d2}
In other words, there are no new semantics to discuss here. I'm just saying the two syntaxes should be equivalent.


> For the first two choices one loses commutativity which means that code then suddenly has to have previously cared about it (or it will do the wrong thing)

Since this is a new operator, that shouldn’t be an issue.

I think losing commutativity is okay. After all, d1.update(d2) != d2.update(d1) if keys conflict.
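(Strictly, update() mutates in place and returns None, so both sides of that comparison evaluate to None; the point holds for the resulting dicts, as a quick sketch shows:)

```python
d1 = {'a': 1}
d2 = {'a': 2}

x = dict(d1)
x.update(d2)   # right-hand operand wins: x == {'a': 2}

y = dict(d2)
y.update(d1)   # right-hand operand wins: y == {'a': 1}

print(x, y)
```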


What you have written doesn't look at all symmetrical, but d1 | d2 looks very symmetrical. Operators being symmetrical around a vertical axis tends to imply being commutative, although there are many exceptions: the divide symbol (but note fractions aren't symmetrical), the minus sign, ^ for exponentiation (but superscripting is not symmetrical), or matrix multiplication (but maybe one could argue this is an abbreviation of function application).

Secondly, I claim that the issue with using | is that it is not a new operator. It is a new, incompatible meaning for an old operator. Old code might not bother checking that its argument is a set, because if it weren't a set then | or in would fail. New programmers might see dicts as being basically sets and wrongly assume that functions written for sets would work correctly on dicts.


In case the keys collide you could supply a collision callback to define what to do, e.g. to add the values,

  d1 = {'a': 1}
  d2 = {'a': 2}

  d3 = {**d1, **d2, add_func}

  def add_func(a, b):
      return a+b
Or something along those lines
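Since no such literal syntax exists, the same idea can be sketched today as a plain function (merge_with is a made-up name, loosely modelled on Haskell's unionWith):

```python
def merge_with(combine, d1, d2):
    """Merge two dicts, calling combine(v1, v2) on key collisions."""
    out = dict(d1)
    for k, v in d2.items():
        out[k] = combine(out[k], v) if k in out else v
    return out

print(merge_with(lambda a, b: a + b, {'a': 1}, {'a': 2, 'b': 5}))
# {'a': 3, 'b': 5}
```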


Why not raise a ValueError and let the programmer figure out what The Right Thing To Do is when you add two dicts that have the same key with a different value?

I assume the same key with the same value would be OK, but I'm not really sure it's a good idea for it to be OK.


You can't do value comparison without making dict item comparison a passed-in function or making dict values immutable. If you're doing something that really looks like a mathematical union but raises if there's any overlap, then it's a really confusing abuse of notation. I don't think there's a way out.


That is one thing you could do to merge dicts. To expand on my last paragraph above, I think I would imagine the following operations (stupid syntax):

  a & b = { k: (a[k], b[k]) for k in a.keys() | b.keys() }
  a | b = { k: (a.get(k, None), b.get(k, None)) for k in a.keys() | b.keys() }
  a |& b = { k: (a.get(k,None), v) for k, v in b.items() }
  a &| b = { k: (v, b.get(k,None)) for k, v in a.items() }
  a |_| b = { k: only(a,b,k) for k in a.keys() | b.keys() }
  def only(a, b, k):
    if k in a and k in b:
      raise DuplicateKey(a, b, k)
    elif k not in a and k not in b:
      assert False
    elif k in a:
      return a[k]
    else:
      return b[k]
This doesn’t work well if values can be None so maybe instead of pairs there should be objects Left(x), Right(y), and Both(x,y)


That syntax doesn't make sense. The

  {**d1, **d2}
idiom is just a clever mashup of Python's dictionary construction literal {} and ** unpacking. (Note it is the dict(**d1, **d2) spelling that only works with string-valued keys, which is a major limitation of that form; the literal form accepts any hashable keys.)

Adding a third item to the dictionary literal would require special-casing the {} dictionary construction literal.


  >>> { 'a' : 1 } | { 'a' : 2 }
ISTM the most logical result would be:

  { 'a' : { 1, 2 } }
...but I could certainly understand throwing an exception.


While I see your point, I don't think this makes sense historically. Dictionaries never supported such behavior before so you'd be introducing a new core concept to a dictionary. But moreover, you'd be changing the type of the value only on duplicated keys, and what about if you were to add another value of 2 to a? Are you making this a set, and why? I think it would come with too many caveats and assumptions in the PEP.

I'm not saying you have a bad idea/logic here, just that I'm not sure it's the best thing for the dict.


Note that this forgets the order of the arguments, which may not be desirable


If the property we want to achieve is "a | b == b | a" we necessarily have to forget the order of the arguments.


"dicts would subclass sets, as dicts are essentially sets with values attached"

Such a derivation would violate the Liskov substitution principle. Consider the following with set:

  x = {"one", "two"}
  y = set()
  y.update(x)
  y
It results in y being {'two', 'one'}.

Now, do the same with dict:

  y = dict()
  y.update(x)
This gives the exception: "ValueError: dictionary update sequence element #0 has length 3; 2 is required"

This means that dict cannot be used anywhere that a set can be used, which means it violates the Liskov substitution principle (see https://en.wikipedia.org/wiki/Liskov_substitution_principle ) which means that if covariant methods are needed for good design then dict cannot be a subclass of set.


If dicts did subclass sets, then sets would be dicts whose values are all None. In other words, your last example would be defined as:

    >>> s = {'one', 'two'}
    >>> d = {}
    >>> d.update(s)
    >>> d
    {'two': None, 'one': None}


If sets are dicts with values of None, then they're dicts, not a superclass of dict.


Would s["one"] = 1 raise an exception? Or convert the set into a dict? Or change the sentinel value for all the set elements?

None of these seems like a good design, since each means either that the instance changes its class on the fly (which Python does support) or that a dict does not act like its parent set object, breaking the is-a relationship most people expect from an OO design.

It seems like the circle/ellipse problem, and the current implementation is the "drop all inheritance relationships" solution to that problem. https://en.wikipedia.org/wiki/Circle-ellipse_problem#Drop_al...


> Would s["one"] = 1 raise an exception?

Sets don't support indexing, so it would still raise an exception. Dicts do, which is an example of them supporting more operations than sets, which is an example of why (if there is to be any subclass relation) dicts are subclasses of sets.

Edit: I suppose there's some confusion about my language above. "then sets would be dicts whose values are all None" could more helpfully read "sets would be equivalent to dicts whose...".


Liskov substitution and meaningful method-mutability forbid any kind of sub-typing relationship.


> as dicts are essentially sets with values attached.

Interestingly enough some languages actually do the opposite. In Rust for example a set is literally just a dictionary with unit as the value[0] and unit is essentially a way of expressing the absence of a value (it takes up no space in memory, and you just can't do anything with it).

[0]: https://doc.rust-lang.org/stable/src/std/collections/hash/se...

For posterity, the above link shows:

  pub struct HashSet<T, S = RandomState> {
      map: HashMap<T, (), S>,
  }


Same as Go, where everybody just uses map[Key]struct{} as sets.


To be fair, this is because the language does not support a type safe set type. I would use one frequently if it did


map[T]struct{} is a type-safe set type. It's just not an ergonomic one.

FWIW, I'm usually using map[T]bool and only ever inserting `true` values. It uses a bit more space, but membership checks read like

  if set[key] {
instead of

  if _, ok := set[key]; ok {


Before Python grew a set type, it was common to implement them in a similar way, i.e. as a dict with some kind of default value (0, 1, None, etc).
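A minimal sketch of that old idiom:

```python
# Pre-set-type Python: a "set" is just a dict whose values are ignored.
members = {}
for word in ['spam', 'eggs', 'spam']:
    members[word] = 1          # any sentinel value works

print('spam' in members)       # membership test: True
print(sorted(members))         # the "elements": ['eggs', 'spam']
```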


Python sets are internally just a dict hash table for the keys with no associated values.


I think internally C++ STL set<T> is similarly just map<T, void>


> Interestingly enough some languages actually do the opposite.

Of course you can represent sets as dictionaries with empty values (ask anyone who programmed Perl). You're supporting my point that dicts logically subclass sets, because they can represent sets where the values can be other things as well.

You're also getting at what the behavior should be if you union a dictionary and a set. Hypothetical Python:

    >>> s = {'a','b','c'}
    >>> d = {'d': 'D'}
    >>> d | s
    >>> {'d': 'D', 'a': None, 'b': None, 'c': None}
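That hypothetical union can be emulated today with dict.fromkeys, which builds the None-valued dict from the set (keeping the semantics sketched above, where the dict's values win):

```python
def union_dict_set(d, s):
    # dict.fromkeys(s) gives {elem: None for elem in s}; existing
    # entries of d then overwrite the None placeholders.
    return {**dict.fromkeys(s), **d}

print(union_dict_set({'d': 'D'}, {'a', 'b', 'c'}))
```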


I think he's flipping your point: sets are a subclass of dicts/maps, not vice versa. Thinking of a map as a set whose values map to something else sounds backwards, because values in sets don't map to values arbitrarily (or at all, in some cases); maps are maps.


> It's always been on my Python wish list that dicts would subclass sets, as dicts are essentially sets with values attached.

> There's no reason you shouldn't have always been able to say d | d2, same as sets.

I don't agree with this view, mainly because merging dicts is not commutative, while set union is.

The actual operation for "a + b" is "add everything from b to a", and + more closely resembles that than |.


dict.keys() pretty much does just that:

    >>> a = {"foo": 1}
    >>> b = {"bar"}
    >>> a.keys() | b
    {'bar', 'foo'}
As an aside, I like the plus operator. Being able to merge two dictionaries in one line and have the result be a new dict is something I've needed often enough.

    {**d, **d2}
works, but is pretty recent and still feels weird to me (not coming from a language that makes use of destructuring a lot, like Javascript).


> dict.keys() pretty much does just that... a.keys() | b

You're illustrating my point:

    >>> set(a) | b
    {'bar', 'foo'}


Yeah, I largely agree. Is that unusual these days? :D

And I'd be sort of okay if dicts implemented set operators, although I don't think using `set()` or `.keys()` is a big ask. But using the pipe operator for the operation in this PEP would be a bad idea IMO: For sets, a | b == b | a. For dicts, not necessarily. So if they used the pipe operator, that could lead to surprising or unintuitive results.

On the plus side, not using pipe still leaves it open for future usage, so you might get your wish yet.


Your exact point that for sets, union operator is commutative while for dictionaries it wouldn't be is one of the main objections I foresaw if I ever wrote this up as a PEP. Counterpoint: for numbers, + is commutative but for lists it's not, so it's normal for the same operator to have different commutativity depending on the type of the operands. IMO it's worth using the set operators because of the subclass relation of sets and dicts despite some small (but really, predictable) changes in behavior, but I can see how someone could have a different opinion.


The extension operator still resembles a set union more than an addition. The fact that it is not commutative is not, imo, an argument, since sum is even more frequently used with commutative semantics.


”that dicts would subclass sets, as dicts are essentially sets with values attached.”

I think a variant of https://en.wikipedia.org/wiki/Composition_over_inheritance applies here. Inheritance, in general, is only a good idea if there is a strong is-a relation. A dictionary isn't a set of keys; it has a set of keys.

If you want to see a dictionary as a set, I think the better view would be to see it as a set of (key,value) pairs where equality of pairs is defined as equality of the key parts, ignoring the ‘value’ parts.

I think it makes sense to require that such a set should behave identically to a dictionary, and that providing a 'real' dictionary is just an optimization, plus the addition of convenience functions, e.g. to get the set of keys.

If one sees things that way, one could even define the dictionary interface as taking an equality operation on the keys and a ‘value combiner’ function that combines values, and will be used in the cases you outline (that function could add integers, concatenate strings, keep the larger value, or whatever the programmer specifies)
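The "set of (key, value) pairs where equality ignores the value" view can be sketched with a small wrapper class (Entry is a made-up name):

```python
class Entry:
    """A (key, value) pair that hashes and compares on the key only."""
    def __init__(self, key, value):
        self.key, self.value = key, value
    def __hash__(self):
        return hash(self.key)
    def __eq__(self, other):
        return self.key == other.key

s = {Entry('a', 1), Entry('b', 2)}
s.add(Entry('a', 99))   # no-op: the set already "contains" key 'a'

print(sorted((e.key, e.value) for e in s))  # [('a', 1), ('b', 2)]
```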


If you treat dictionaries as sets of tuples, union doesn't work as expected: {('a', 1)} | {('a', 2)} = {('a', 1), ('a', 2)}. The same key maps to two values.


That's not what the parent says though.

He says that sets are like dictionaries without values, e.g. a set is akin to the dictionary keys.

So, in your example the ('a', 1) and ('a', 2) are the keys (in how the parent argues about) -- it's not 'a' that's the key.

Same like you can do today:

  d = {}
  d[('a', 1)] = 6
  d[('a', 2)] = 89
We can still express that dictionary as a set of tuples, it's just:

  {(('a', 1), 6), (('a', 2), 89)}


Meant to respond to top level parent who said:

> dicts would subclass sets

I was trying to make the case as to why treating dictionaries as a child of <generic collection> with an extend/merge operation using '+', rather than as a child of set with a union operation '|', makes more sense (to me).

Changing the behaviour of a well defined operation like union seems bad - although my case is somewhat undermined by python’s overloading of ‘+’ to mean extend.


Yes, it's not so bad for the + case, but the - case seems non-obvious at best.


Practically, what would your wish accomplish? Would it make most people more productive? Produce fewer bugs? Help them learn faster?

Most Python coders don't even use sets more than once a year. Hell, I use collections.deque more than sets.

But dicts? We use them all the time. In fact, {} + {} failing is a recurring disappointment in all my classrooms.

Plus, in PHP and JS, arrays/objects are the "do-it-all" data structure. And it's horrible. You see the same data structure everywhere. You have to read in detail what something is, and what it's for.

It's very nice that dict and set are very distinct, and that they have distinct sets of operators. This way it's super easy to scan the code and know what the data structure is, and what it's used for. That's why I always teach set([1, 2]) and not {1, 2} first. It helps people make a clear distinction in their mind.


I use sets a lot, sample size of one. I do data science/engineering stuff, sets of columns/sets of keys. I'm very sceptical of your claim that most developers don't use sets.


Actually, a lot of Python users don't even know sets exist. Or they forgot, and use them haphazardly after they google "remove duplicates". Even among set users, a lot don't know you can use ^, | and ~ with them.

Sample size: a few hundred students and colleagues over 15 years.

It makes sense: for columns, it's quite common to use dicts or pandas dataframes. The set of keys is just the dict keys view.

A web dev rarely needs sets. Neither does a GUI coder, a sysadmin, or a geographer.

It's not that sets are not useful; it's just that among the huge number of things you need to do in programming, across all the fields that Python caters to, they are pretty niche.


> It very nice that dict and set are very distincts, and that they have a distint set of operators. This way it's super easy to scan the code and know what the data structure is, and what it's used for.

This is exactly why I want explicit typing in Python.


You do, with type annotations.

But the type declaration may not be in the viewport.

Or you may be reading a script or snippet, a kind of code that won't bother with typing.


> But the type declaration may not be in the viewport.

Type declaration doesn't have to be in the view; the very point of type annotations is to support static type systems, which you can query about any type. For example, if you use mypy, you can add `reveal_type( any_kind_of_expr )` anywhere in your code and you'll get the most precise type available, be it inferred or declared manually, for that expression at given point in code.


len(dict1 + dict2) does not equal len(dict1) + len(dict2) so using the + operator is nonsense.

The operators should be |, &, and -, exactly as for sets, and the behaviour defined with just three rules:

1. The keys of dict1 [op] dict2 are the elements of dict1.keys() [op] dict2.keys().

2. The values of dict2 overwrite the values of dict1.

3. When either operand is a set, it is treated as a dict whose values are None.

This yields many useful operations and is simple to explain.

merge and update:

    {'a': 1, 'b': 2} | {'b': 3, 'c': 4} => {'a': 1, 'b': 3, 'c': 4}
pick some items:

    {'a': 1, 'b': 2} & {'b': 3, 'c': 4} => {'b': 3}
remove some items:

    {'a': 1, 'b': 2} - {'b': 3, 'c': 4} => {'a': 1}
reset values of some keys:

    {'a': 1, 'b': 2} | {'b', 'c'} => {'a': 1, 'b': None, 'c': None}
ensure all keys are present:

    {'b', 'c'} | {'a': 1, 'b': 2} => {'a': 1, 'b': 2, 'c': None}
pick some items:

    {'b', 'c'} & {'a': 1, 'b': 2} => {'b': 2}
remove some items:

    {'a': 1, 'b': 2} - {'b', 'c'} => {'a': 1}
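A sketch of the three rules as plain functions (hypothetical names, since dict defines none of these operators today):

```python
def as_dict(x):
    # Rule 3: a set operand is treated as a dict whose values are None.
    return x if isinstance(x, dict) else dict.fromkeys(x)

def dict_or(a, b):
    a, b = as_dict(a), as_dict(b)
    return {**a, **b}                                # rule 2: b's values win

def dict_and(a, b):
    a, b = as_dict(a), as_dict(b)
    return {k: b[k] for k in a.keys() & b.keys()}    # rule 1 + rule 2

def dict_sub(a, b):
    keys = as_dict(b).keys()
    return {k: v for k, v in as_dict(a).items() if k not in keys}

print(dict_or({'a': 1, 'b': 2}, {'b': 3, 'c': 4}))  # {'a': 1, 'b': 3, 'c': 4}
print(dict_sub({'a': 1, 'b': 2}, {'b', 'c'}))       # {'a': 1}
```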


>The operators should be |, &, and -, exactly as for sets, and the behaviour defined with just three rules:

len(a) - len(b) != len(a-b) either.

I'm not really sure why you think that length should be a linear map for dictionaries. Their length is their least interesting property.


Python uses different operators for combining lists and for combining sets. So, the question is: is this operation more like list + or more like set |?

Choosing len(a) + len(b) == len(a + b) as a criterion is interesting because it is what distinguishes an addition from a merge.

Addition: put together two collections and keep everything

Merge: put together two collections and keep some things

This is a merge.

The dictionary merge operator should be | for the same reason that the set merge operator is |.


Yes again, for two sets:

    a = {1,2,3}
    b = {1,2,3,6,7,8}
    len(a - b) == 0
    len(a) != len(b)
  
Subtraction on sets does not work like you're suggesting it should. I don't understand why you're using that to defend broken notation. The only reason why we have a|b as union is because a∪b is too hard to type on most keyboards, and we still are trying to catch up to APL when it comes to math source code.


I'm not suggesting anything about subtraction on sets. I'm suggesting that + is not appropriate for dictionary merge because it doesn't behave like + on other Python containers.

The context is Python, not other languages. In a language that used + for set union, + would also be fine for dictionary merge.

But Python isn't like that. Strings, tuples, and lists have + and do not have -, and for all of them the + operator does add lengths. Sets have | and -, and those operators do not add or subtract lengths.


The length (i.e. magnitude) of the sum of two algebraic vectors is also not the sum of the lengths of the two original vectors.

Would you not use + to represent vector sum?


Why should that constraint hold? It's not even true for simple vectors under the euclidean norm:

||<1,0>|| + ||<0,1>|| != ||<1,0> + <0,1>||


    > {'a': 1, 'b': 2} - {'b': 3, 'c': 4} => {'a': 1}
I think that removing the key from d1 would be a bad idea if the value is not the same on both dicts. If you think the dict is a vector of named dimensions, should 'c' be -4 in the result?

I'd totally support it resulting in:

    {'a': 1, 'b': -1, 'c': -4}


So much better! Overloading addition for something that behaves differently is not good.


Addition as defined in mathematics behaves differently depending on context.


Unlike some of the other commenters, I'm fine with the + specification. + hasn't been commutative in Python for a long time.

But the - bothers me, and nobody else seems to have mentioned this. {"a": 1} - {"a": 1} = {}, sure, but it is way less obvious to me that {"a": 1} - {"a": 2} = {}, and not {"a": 1}. If you consider dictionaries as an unordered list of tuples (key, value) where keys happen to be unique and as a result of that you get nice O()-factors on access, that doesn't make sense. You went to remove ("a", 2), but saw ("a", 1) and thought, "eh, close enough". But it's not the same thing.

If you think of a dict as a set that happens to have associated values, the specification makes more sense, but if you dig into that line of thought, that turns out to be a rather weird way of thinking of them. Values really shouldn't be thought of as second-class citizens of a dict. If you are going to go this route though, {"a": 1} - {"a"} = {} (where the right-hand side is a set) actually makes more sense, without the spurious value on the right-hand side.

I'd actually rather conceive of the - operation as a "dict minus an iterable that will yield keys to remove". This has the advantage of recovering the original {"a": 1} - {"a": 2} = {} semantics that probably is what people want in practice, just via a different method. But locking the right-hand side to a dict makes it weird.
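That reading is a one-liner today; a sketch (without is a made-up name):

```python
def without(d, keys):
    """Return a copy of d with every key from the iterable removed."""
    keys = set(keys)
    return {k: v for k, v in d.items() if k not in keys}

print(without({'a': 1, 'b': 2}, ['a']))   # {'b': 2}
print(without({'a': 1}, {'a': 2}))        # {} - iterating a dict yields its keys
```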


> Values really shouldn't be thought of as second-class citizens of a dict.

Aren't they? If I do `d["a"] = 1; d["a"] = 2` the first assignment is completely gone after the second, I don't get a set with superimposed values on the "a" key.


It's consistent with iteration over a dict:

  for k in my_dict:
      print(my_dict[k])
In this example it is implied that unless you specify .items(), you are only considering keys in the iteration. This would apply to the + and - operations too, as I understand it.


Using - to mean "here's a dict and a seq -- remove all the seq's keys from the dict" would be useful and consistent, but they specifically prohibit that. They require the rhs to be a dict, too, even though the values are never used. Why?


Good point, although you'd need to make sure your seq only has unique values. Other than that I don't see why you should have to write

  {k: v for k, v in d.items() if k not in seq}


> Analogously with list addition, the operator version is more restrictive, and requires that both arguments are dicts, while the augmented assignment version allows anything the update method allows, such as iterables of key/value pairs.

    >>> d + [('spam', 999)]
    Traceback (most recent call last):
      ...
    TypeError: can only merge dict (not "list") to dict
    >>> d += [('spam', 999)]
    >>> print(d)
    {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}


While I get the "Because this is what lists do" argument, I am still wondering why there is a difference in the types allowed for `+` and `+=`.


The difference is because + in Python is intended to be a reflexive, declarative operator while += is a directional, imperative operator.

If the + has heterogeneous operands, there should be a promotion process to the most specific type that generalizes the operand types so that the addition works the same regardless of order, as exists for numeric types. But for general types (and collections particularly) the concept of most specific generalized type including two other collection types is not always sensible, so requiring homogenous operands makes more sense.

With += there is no intended symmetry between operands, the left side is the receiver into which the right side is added.


I don't find this convincing at all.

> so that the addition works the same regardless of order, as exists for numeric types

+ is not commutative for lists, tuples, or dicts. So the promotion process need not be commutative either. There is no good reason why list + tuple should be forbidden, or dict + items should be forbidden.

a [op]= b is commonly and easily explained as "a = a [op] b, where a is mutated in place". Python should not break that explanation with mysterious inconsistencies.
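Lists already exhibit exactly this inconsistency, for what it's worth:

```python
l = [1, 2]
try:
    l + (3,)                 # the operator form is strict about types
except TypeError as e:
    print(e)

l += (3,)                    # the augmented form accepts any iterable
print(l)                     # [1, 2, 3]
```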


> + is not commutative for lists, tuples, or dicts

Yeah, that's a good point. I think I was thinking on the type level rather than the value level, but I am unsure that makes a convincing argument for the behavior here even if it is otherwise true (which I’m not sure it is always, even at the type level.)


> With += there is no intended symmetry between operands, the left side is the receiver into which the right side is added.

I'm not sure I buy that, consider

a += b versus b += a

For integers, the result will be the same, it will just be placed in a different place. For lists, there are differences, but lists are more clearly directional in themselves. For dictionaries, the results can be entirely different.


Yeah, agreed that this feels really bad. I also sympathize with the "this is how lists work" argument, but that tells me the list functionality here was a mistake.

In my mind `a += b` is syntactic sugar for `a = a + b`, nothing more. It certainly shouldn't have different semantics.


For sets I can understand what + and - mean: you can add or subtract the sets (not add or remove an element directly). This should be like lists, e.g.

    [10,20] + [30]
But what + and - would mean in the case of dicts is obscure. Better to just use full method names imho.


All my students disagree with you. They all try addition, and all expect a resulting dict with keys from both dicts. The fact that keys from one side are prioritized is something they will learn once, just like with dict.update().


That's something a student may learn. But there will be plenty of people reading Python code that don't know how "+" works on dicts; and it is difficult to find out what it does because you can't easily Google/grep for the function name. Doing experiments with dicts is not something you want to be doing while you are reading other people's code.


The + operator looks great – I've personally experienced the papercut this solves multiple times, where it would be most natural to have "combine two dicts" operator:

    return {'a': 'b'} + other_dict
but instead I had to assign to a variable and mutate with .update(), which is much more verbose:

    x = {'a': 'b'}
    x.update(other_dict)
    return x
However, I was working in Python 2; Python 3 has

    {'a': 'b', **other_dict}
and even

    {**one_dict, **other_dict}
though the PEP mentions that the latter doesn't work in all circumstances. Still, it will be nice to have a more general operator; I personally don't really care whether it's called + or |.

On the other hand, the - operator seems... strange, in that it only considers the keys of its right-hand argument, and ignores the values. Seems like a footgun.


I think overloading + so that a + b != b + a is problematic.

I know this is the case for strings and lists, but those cases are very well established.


It intuitively makes sense for lists and strings, as those have an order that matters.

I agree with you (and disagree with other commenters) that this particular case is more problematic


I would think that adding two dictionaries will make people think of the established cases for other collections, rather than the less related case of numbers.


That is already the case for + on sequences though (e.g. string and list). In my experience it never causes confusion in practice in those circumstances.


And a = 1 is not equality.

Welcome to the world of programming, where we don't all try to match mathematical conventions because many of us suck at maths and are practical.


There is no rule of math stating that + must always be commutative. It is commutative for real numbers, yes, but there are plenty of use cases mentioned elsewhere in this discussion where + is not commutative.


Reminds me of Scala Maps[0].

Edit: after reading more carefully,...

> Analogously with list addition, the operator version is more restrictive, and requires that both arguments are dicts, while the augmented assignment version allows anything the update method allows, such as iterables of key/value pairs.

But why? Consistency in API behavior is important, and as a user I don't want to have to read that I can add lists of pairs only with assignments. I hope the draft gets fixed.

[0]: https://docs.scala-lang.org/overviews/collections/maps.html


You can always allow it later, but deprecating such a "feature" is a pain. And subtle errors/outright abuse can happen with some of these automatic coercions, so Python tends to be a bit more conservative than other dynamic languages. The most (in)famous example being comparison (edit: not addition) of an integer and `None`. Allowed in Python 2, non-intuitive IMO, and responsible for a few bugs in its time. Disallowed in Python 3:

> TypeError: '<' not supported between instances of 'int' and 'NoneType'


Do you mean comparison with None?

  Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> 1 + None
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
  >>> 1 < None
  False
vs

  Python 3.5.2 (default, Nov 23 2017, 16:37:01)
  [GCC 5.4.0 20160609] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> 1 + None
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
  >>> 1 < None
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unorderable types: int() < NoneType()


doh, you're absolutely right, cheers! edited the parent for posterity, too.


Being able to compare all values was useful. I miss it.

None being less than everything is particularly useful.


It was also a huge source of bugs.

I often give an exercise in my Python course for beginners: prompt for a number with input(), then compare it to another number. Most students don't call int() before comparing. In Python 3, they get an error, and learn to do so. In Python 2, it seems to work sometimes, and if I don't catch it, they will get in trouble one day.

Just make your code explicit, it's the sane thing to do anyway.


It's consistent with `[] + ()` not working, while `[*[], *()]` works.


Python continues to introduce more non-intuitive semantics that may be a small boon to the expert class of programmers, but come at the expense of ease of adoption for beginners. It started by making everything a generator, which is not very easy to master, and for which there were plenty of perfectly good substitutes (e.g., xrange, iteritems). And now you "add" sets of items (which you can't do in math) when the update function worked well.

Python 3 is such a sad mess.


In my teaching of python to newcomers (mostly coming from matlab/R or no programming background) they often try to do dict_a + dict_b, and are confused as to why that doesn’t work when list_a + list_b works fine.

I think it’s an extreme stretch to claim it’s non-intuitive.


If dict addition was purely insertion, I would agree with you, but there is no way the following is intuitive:

  {1 : 1} + {1 : 2} ==> {1 : 2}
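For what it's worth, that behaviour matches the existing double-star merge, where the rightmost occurrence of a key wins:

```python
a = {1: 1}
b = {1: 2}
merged = {**a, **b}
assert merged == {1: 2}       # b's value wins
assert {**b, **a} == {1: 1}   # swap the operands and you get the other answer
```

So whatever one thinks of the spelling, the right-priority semantics aren't new.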


I couldn't disagree more. Python 2 was a mess. range vs. xrange, items vs. iteritems, keys vs. iterkeys, input vs. raw_input, strings vs. Unicode strings, integer vs. float division were a mess, and were especially confusing and inconsistent for beginners.

Teaching Python 2 to beginners was always annoying for them: "ok so there's this function called input() but NEVER use it, always use raw_input(), unless you like RCE", "although all the tutorials say `for i in range()`, you should really get in the habit of using xrange() because...". Generators don't need to be explained in detail or understood by a beginner; all that really needs to be taught is the concept of iterators, and eventually, at an intermediate stage, the idea that some iterators are lazily-evaluated.

A simple dict "copy + merge" addition operator is a perfectly reasonable idea that will help beginners, not hurt them.


> Python 2 was a mess. range vs. xrange, items vs. iteritems, keys vs. iterkeys

Generators execute lazily: you have to keep track of what they are up to and where they are in their iteration process. This can cause all sorts of problems. Consider the following:

  for x in pull_from_database():
    do_something_with_disk_or_network(x)
If pull_from_database returns a list, the code can be relatively easily understood. If it's a generator, this can be an incredibly confusing piece of code because do_something_with_disk_or_network can alter the generation of pull_from_database.

The same logic applies to iterating over dictionaries or other items. With python 3, I'm sure we'll start seeing many bugs of the following nature that can be pretty difficult to debug:

  d = get_dictionary()
  for k, v in d.items():
    do_something_and_possibly_mutate(d)
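To be fair, CPython guards against the most obvious form of this: resizing the dict mid-iteration raises instead of silently corrupting the loop.

```python
d = {'a': 1, 'b': 2}
raised = False
try:
    for k, v in d.items():
        d['c'] = 3          # adding a key while iterating over a view
except RuntimeError:
    raised = True           # "dictionary changed size during iteration"
assert raised
```

Mutating only the *values* of existing keys goes undetected, though, which is closer to the scenario above.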


To me, the + operator for merging lists seems very intuitive.


The update function does not work well. It is very cumbersome to have to do an in-place update. A frequent bug I see is

  def my_func(d1, d2):
      """Returns a merged dict"""
      d1.update(d2)
      return d1
The problem here is that now the d1 you have passed in has been modified to contain all the keys of d2, overriding any keys that appear in both with d2's value. Having a first-class operation that does a merge without mutating the inputs will make the language easier, not harder.
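The non-mutating fix today is an explicit copy (or a double-star merge):

```python
def my_func(d1, d2):
    """Returns a merged dict without mutating either argument."""
    merged = dict(d1)      # shallow copy of d1
    merged.update(d2)      # d2's values win on shared keys
    return merged
    # or simply: return {**d1, **d2}

a = {'x': 1}
b = {'x': 2, 'y': 3}
assert my_func(a, b) == {'x': 2, 'y': 3}
assert a == {'x': 1}       # caller's dict untouched
```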


I agree that the update function can be cumbersome, but the "+" operator implies semantics that do not apply well to dictionaries. For example, the following is illegal in python:

  {1, 2, 3} + {3, 4, 5}
Similarly, addition with dicts should not be allowed. The pipe operator would be a closer fit, but even that has problems because with sets it's commutative and with dicts it's not.
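The commutativity point can be shown directly:

```python
# Set union is commutative...
assert ({1, 2, 3} | {3, 4, 5}) == ({3, 4, 5} | {1, 2, 3})

# ...but any dict merge that keeps values can't be, once keys collide:
a, b = {1: 'a'}, {1: 'b'}
assert {**a, **b} != {**b, **a}
```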


Yeah, I hate this. Now dict.items() is no longer thread safe, just because of iterators. It could crash at any time just because you modified the dict in another thread while iteration is in progress.


I'm pretty sure that the situation you're describing was not thread safe in Python 2 either.

Sure, once you're in the body of the for loop, the dictionary must have been copied to the list so you're safe. But while d.items() is being evaluated at the start of the for loop, there is an internal iteration that could be preempted by the other thread. The GIL doesn't save you because Python operations aren't guaranteed to be atomic, and I doubt something that complex would be (it would be a serious problem if iterating over a large dictionary in one thread held up all other threads for an arbitrarily long time). Even if it is GIL-atomic, you're risking breakage if you move to another implementation (e.g. pypy) or if Python changes its atomicity in future.

In general, if you want to modify an object in one thread and read it in another thread, you should add locking to prevent this happening simultaneously.

It is however true that the Python 2 items() method allows you to modify the dictionary in the body of the same for loop. But this is a surprising exception compared to iterating over a list or other container, so it makes sense overall to demand you explicitly make a copy if that's what you want.


In Python 2, items() returned a list, and access to the dictionary was blocked by the GIL, so while the array was being prepared, the dict couldn't be modified. So it is thread safe in Python 2. In Python 3 you need to lock, but that's not always obvious until it bites you. You may think you only need threads for parallel processing, and that it's easy and managed, but there are much more common cases where you may use threads – UI or third-party toolkits like Qt, which often run callbacks in their own threads. And there is no way to protect items() except locking; even if you tried to prepare an array out of the iterator to make it faster, any parallel thread could break it by modification.

For myself I found only one good solution: subclass dictionary and create a thread-safe version of it, with locks around all critical operations, modifications and reads alike. If you want to make it more efficient, you need separate read and write locks.
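A minimal sketch of that approach (class and method names are my own; a full version would wrap every mutating and reading method, and possibly use a read/write lock):

```python
import threading

class LockedDict(dict):
    """Sketch of a dict subclass that serialises access with one lock."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._lock = threading.Lock()

    def __setitem__(self, key, value):
        with self._lock:
            super().__setitem__(key, value)

    def snapshot_items(self):
        # Take the lock, copy, then let callers iterate the copy freely --
        # this restores the Python 2 items()-as-list behaviour.
        with self._lock:
            return list(super().items())

ld = LockedDict({'a': 1})
ld['b'] = 2
assert sorted(ld.snapshot_items()) == [('a', 1), ('b', 2)]
```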


> while array is prepared, dict couldn't be modified

I mentioned this exact situation in my comment. In fact that's what most of my comment is about.

To repeat:

* I don't believe it actually is atomic (but I haven't checked ... have you?)

* Even if it is, it wouldn't be guaranteed to be atomic in future versions of Python (ignoring the fact that future versions of Python no longer have items() with the same semantics).

* It won't be safe in other implementations of Python e.g. pypy

* It doesn't match other collections that you can iterate over that don't need an items() e.g. list

* (This one is new) It won't be safe in user-defined dict-like classes that define their own items() method, even if that method is supposed to have the same semantics.

Modifying an object in one thread while reading it in another is a bug, even if it seems to work for now. Don't blame Python for making it slightly more likely to break. Just use a flipping mutex!


Let's assume that evaluating a Python 2 items() call isn't atomic, and that it would break some multi-threaded code. Even then, there is a huge difference with iterators, which can be passed around left and right, and be executed far after they are generated.

Using a non-threadsafe list, race conditions and other problems will likely crop up in CPU-bound applications. However, with iterators, which may get lazily executed far after they are created, race conditions are far more likely to occur.

As an example, consider the following program:

  for value in x.items():
    do_shared_network_or_disk_call(value)
If "x" is a list, there is definitely the possibility of race conditions cropping up. But if "x" is an iterator, the possibility of that increases dramatically. In a multi-threaded/processed environment, both are bad, but why would Python 3 try to make the situation worse?


It's atomic in CPython and protected by the GIL. It will be safe in user-defined dict classes if you make it safe and take care of this. And everything above is a matter of implementation. What you write is pure and correct in the common sense, but it's not practical. If you have a thread-safe data structure that takes care of its own state consistency, why not use it without locks and make things simpler? I'm not talking about syncing the state of several data structures, etc. I'm talking about very simple use cases where it becomes very handy.


What do you mean by "threadsafe" here? Could dict.items() actually break in Python 2? I've never seen that happen.


As I admitted in my comment, I'm not 100% sure that it's not protected by the GIL. If it's not, I wouldn't expect a hard crash if you mutate from another thread while iterating, but more like e.g. an item not appearing in the result even though a different one had been removed by the other thread. But as I said in my comment, even if it does happen to be protected by the GIL, I think it's unsafe and fragile to rely on it.


Unexpected/undefined behavior?


There are a lot of cases where you don't need strict consistency and the current state is enough for processing. For example, you want to save request stats from web servers. Would you stop all operations while counting and writing to the DB, just to be precise? Of course not. Some current number that you have is good enough. Of course, you need to be aware of the side effects.


Wouldn't you otherwise risk a dangerous race condition?

Many languages I know do that, for example C#.


Does anyone know a large Python application that is iterators all the way down which is not subtly broken?

I have never seen one.


dict.merge(d, ...) and dict.diff(d, ...) are more expressive and have a cleaner semantic.

Overloading arithmetic ops for string, list or dict operations might only look elegant at first sight, but the discrimination needs to be done at runtime, slowing down the most important arithmetic ops, and it does not help the casual code reader much. It also cannot be used in normal Python code, as older Pythons will fail; only in special internal code.

Normal method names can be provided by external modules, so they are backwards compatible and will find wider adoption.


Teacher here.

All my students eventually try {} + {}.

I'll bet on it to be the most intuitive.


Students are supposed to set things on fire and find various nifty features for learning's sake. But with experience - and the discovery that various such features largely overlap (e.g. + and .extend()) - one tends to grow out of that stance and become wiser.


It's intuitive to try, but it's not obvious what it actually does when keys overlap.


It’s obvious what you want done, and thankfully that is what’s going to be done.


The problem with overloading add is that add results in more than a merge: a merge is add minus the duplicate keys. Students shouldn't learn wrong semantics.

  {1: a, 2: b} + {1: c, 3: d} ==> {1: c, 2: b, 3: d}


There is a counter-proposal from another core developer to use a classmethod like this.


With an object method you could return the changed dict (reference semantics, much faster), and with a classmethod call you would return a copy, as with +.

d.merge(d1) vs dict.merge(d, d1)

So have the best of both worlds, and backwards compatible.
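Sketching the two spellings with made-up names (`merge_into`/`merged` are mine, not the counter-proposal's):

```python
class MergeDict(dict):
    def merge_into(self, other):
        """Mutating form: update self in place, return self for chaining."""
        self.update(other)
        return self

    @classmethod
    def merged(cls, d, other):
        """Copying form: behaves like the proposed d + other."""
        new = cls(d)        # shallow copy into a new instance
        new.update(other)
        return new

m = MergeDict({'a': 1})
assert m.merge_into({'b': 2}) is m            # same object back
assert MergeDict.merged({'a': 1}, {'a': 2}) == {'a': 2}
```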


It's weird to use + for a non-commutative operation, right?


+ is already non-commutative for lists and tuples, for example.


And for strings.


Wow, I had no idea you could add tuples.


Big fan of .update(), .extend() and similar methods here. I do use

  { **d }
but I don't like using features just because they're there.

Let's try to write code that can be read by people without having to wonder about commutativity and such semantic details, especially when you're trying to build something that's supposed to be predictable.


They already use it for string concatenation. Multiplication would be more natural though (or, better, just a concat function).


> The implementation will be in C. (The author of this PEP would like to make it known that he is not able to write the implementation.)

I hope this is for a reason other than the author being unfamiliar with C. Otherwise the author is cheating themselves, because adding functionality to an existing code base is probably my favorite motivator for learning a new language.


C isn’t the hard part; rather, it's the CPython API and the dev workflow.


I’ve contributed code to a handful of open source projects. Learning how to do that is also a worthwhile (and sometimes humbling) experience. I haven’t contributed to Python but it has an entire guide for how to do so, which already puts it above many projects:

http://devguide.python.org/

The CPython API is as straight-forward as any I’ve seen.

So my original comment still stands. :-)


I agree it is quite worthwhile. However, it will take longer than expected for a new person, and most of us have obligations.


I don't like the idea, because a + b should produce a new dict c without modifying either operand, which is not memory-optimal, and the cost of it is not obvious. Also, dict is used a lot for subclasses, and this could break a lot of existing functionality with potentially no benefit for most developers. I don't think merging is a very common operation for dicts, and even so it can be done with one or two update() calls, which at least makes the intent obvious, while a '+' in the depths of duck-typed code is not. Also, the absence of a '+' operation for dicts is a kind of guard for type validation, in case someone passed a dict instead of an integer – which is pretty common when you parse some JSON from a client.


It does produce a new list/dict, etc. That's what you want, to avoid side effects, in many cases. Modifying in place is actually the exception. Handy to have, but not common. Even numpy arrays recreate everything every time.


One of the long-awaited features, rejected by Guido many times, and finally accepted. Maybe we'll get list.get(), functools.partial() as a C-written builtin, pathlib.Path() as a primitive, or inline try/except, one day.


I don't think this has been accepted yet?


No but Guido said yes https://bugs.python.org/issue36144#msg336848 and even if he is not BDFL anymore, it usually means it will be done.


    def __add__(self, other):
        if isinstance(other, dict):
            new = type(self)()  # May be a subclass of dict.
            new.update(self)
            new.update(other)
            return new
        return NotImplemented  # Let the other operand's __radd__ try.
Is there something I’m missing? To me it would be cleaner and more memory/time performant to just `self.update(other)` rather than having a third dict instance at operation time. But that would really only apply if you have truly massive dicts.


Yes, Python supports in place operators to avoid this:

    def __iadd__(self, other):
        if isinstance(other, dict):
            self.update(other)
            return self
        return NotImplemented
This would get called for `d += other`.


My follow up question would be why not do this for both union functions? I don’t see why they would want to have two functions that do the exact same thing (called differently, though) be written in two different ways.


Is there a real-world demand for the dictionary difference operator or is it just being proposed for completeness? I'm racking my brain to think of reasons to use it that would be more expressive than simply giving a list of keys to delete.
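For comparison, the status quo is a short comprehension (the helper name `without_keys` is my own):

```python
def without_keys(d, keys):
    """Drop the given keys, if present, without mutating d."""
    return {k: v for k, v in d.items() if k not in keys}

d = {'a': 1, 'b': 2, 'c': 3}
assert without_keys(d, ['b', 'c']) == {'a': 1}
assert without_keys(d, ['missing']) == d   # absent keys are silently ignored
```

Hard to see `d - e` being much clearer than that, especially since e's values would be ignored.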


I like this. The current kwargs syntax is very confusing since it behaves very differently from funcall kwargs syntax.


What exactly differs?


For example, multiple values for the same keyword argument are not allowed in a function call, so you can't "update" default arguments with a dict of arguments that need to be changed.
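Concretely (`f`, `defaults` and `overrides` are made-up names for illustration):

```python
def f(x=0, y=0):
    return (x, y)

defaults = {'x': 1, 'y': 2}
overrides = {'y': 9}

# In a dict display, duplicate keys are fine and the rightmost wins:
assert {**defaults, **overrides} == {'x': 1, 'y': 9}

# In a call, the same duplication is rejected:
duplicate_rejected = False
try:
    f(**defaults, **overrides)
except TypeError:
    duplicate_rejected = True  # "got multiple values for keyword argument 'y'"
assert duplicate_rejected
```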


Thanks, I did not know that.


This is what you get when you try to implement general mathematical concepts in a language that is horrible at expressing them. What a clusterfuck python is going to be in a few years.



