PEP 572: Python assignment expressions has been accepted (groups.google.com)
317 points by est on July 3, 2018 | 351 comments



Link to the email on the Python-Dev mailing list: https://mail.python.org/pipermail/python-dev/2018-July/15423...


With every PEP it seems Python gets closer to Perl...


It was done right the first time. Why not just use Perl.


I'm a full time "perl developer" actually :)


I have missed that feature on multiple occasions in list comprehensions. I hacked around it with `for x in [expr]`, but it never felt right.


Don't anaphoric macros [1] provide an alternative solution to this problem? Though they seem cool I can see how introducing a magic variable can result in unreadable code...

- [1] https://en.wikipedia.org/wiki/Anaphoric_macro


I would like to see adding a comprehension-like filtering clause to for-statements:

    for n in range(100) if n%2:
        print(f'{n} is odd number')
Does anyone know if there is a PEP covering that?


Historically there has been much resistance to proposals like this which only save a line. The existing code is, after all:

    for n in range(100):
        if n%2:
            print(f'{n} is odd number')
Your proposal also leads to a more ambiguous grammar because the following is currently allowed:

    for n in range(100) if n%2 else range(n):
The ambiguity can be extended with multiple if's, compare:

    for x in range(10) if n%2 if n else range(n):
    for x in range(10) if n%2 if n else range(n) else n**2:
A work-around would be to raise something akin to the "SyntaxError: Generator expression must be parenthesized if not sole argument" that occurs with expressions like "f(b, a for a in range(3))", but that's a lot of work just to save a newline, two indents, and ":", isn't it?
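
To make the grammar point concrete, here is a minimal sketch using the standard `ast` module. The name `flag` is made up for illustration; the source is only parsed, never executed, so nothing has to be defined.

```python
import ast

# `flag` is a hypothetical name; the source below is parsed but never run.
source = "for n in range(100) if flag else range(10):\n    pass\n"

loop = ast.parse(source).body[0]

# The whole `A if C else B` is the loop's iterable: a conditional expression.
print(type(loop).__name__)       # For
print(type(loop.iter).__name__)  # IfExp
```

So today's grammar already gives `if ... else` a meaning after `in`, which is what a filtering `if` would collide with.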


How is that ambiguity currently handled in comprehensions?

My point is that it would be nice to have a consistent syntax for all for-loops, either being a part of a comprehension or standing on their own.

EDIT:

> Your proposal also leads to a more ambiguous grammar because the following is currently allowed:

    for n in range(100) if n%2 else range(n):
Not really, it gives me "NameError: name 'n' is not defined". Unless it is an 'n' defined in the outer scope, of course.


"How is that ambiguity currently handled in comprehensions?"

A bit poorly. Compare:

  >>> f(1, 2 for x in )
    File "<stdin>", line 1
      f(1, 2 for x in )
                      ^
  SyntaxError: invalid syntax
  >>> f(1, 2 for x in r)
    File "<stdin>", line 1
  SyntaxError: Generator expression must be parenthesized if not sole argument
See how the first one gives the location of the error while the second does not? As I recall, this is because the first can be generated during parsing, while the second is done after the AST is generated, when the position information is no longer present.

That's why the following:

  >>> f(2 for x in X) + g(1, 2 for y in Y) + h(z**2 for z in Z)
    File "<stdin>", line 1
  SyntaxError: Generator expression must be parenthesized if not sole argument
doesn't tell you which generator expression has the problem.

Yes, I meant that if 'n' is defined in an outer scope. The expression I gave is not a syntax error but a run-time error.


This does not answer my question, so I checked -- comprehensions simply do not accept the `else` clause:

    >>> [a for a in range(10) if True else range(2)]
      File "<stdin>", line 1
        [a for a in range(10) if True else range(2)]
                                         ^
    SyntaxError: invalid syntax
And this is the argument why I can't have my wish, because the standard `for` loops have always accepted `if else`, so it would be a backward incompatible change.

That said, I have another idea: an update to the comprehension syntax that would avoid duplicating variables, using a new "for in" construct. For example, this line:

    (x for x in range(100) if x%2)
...could be written as:

    (x for in range(100) if x%2)
Just an idea... :D


Agreed, these are a little verbose, but they get the job done, no?

    def is_odd(n):
        return n % 2 == 1

    for n in filter(is_odd, range(100)):
        print(f'{n} is odd number')

    for n in (i for i in range(100) if i % 2):
        print(f'{n} is odd number')
Are there any points against these solutions other than verbosity?


Yes, that's what I've been using so far, especially filter, which works quite well with lambda. But if you have a separate function anyway it's better to make it into a generator:

    def odd_range(count):
        return (x for x in range(count) if x%2)
        
    for n in odd_range(100):
        ...
As for the second one, I'm just not too happy with the implied two loops (even if it amounts to only one in practice).


Does this save anything? The canonical way to do this is

    for n in range(100):
        if n%2:
            print(f'{n} is odd number')
Only two more indents. What is the point of your proposed syntax?


I write that all the time and when python complains I usually rewrite it to something like:

    for odd_numbers in [n for n in range(100) if n%2]:

after a quick "stupid python" comment.


  for n in [i for i in range(100) if i%2 == 0]:
    print(n)
Will work (if a bit repetitive looking)


It's not just repetitive; this particular example actually creates a list before starting the external loop -- imagine it with range(100000000) or something. It is better if you replace [] with (), which creates a generator.
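
A rough sketch of the difference, assuming CPython and an arbitrary size `N` picked for illustration. `sys.getsizeof` only measures the container itself, but that is enough to show the list growing with the input while the generator does not.

```python
import sys

N = 100_000  # arbitrary size chosen for illustration

as_list = [i for i in range(N) if i % 2 == 0]  # materialized up front
as_gen = (i for i in range(N) if i % 2 == 0)   # produces values on demand

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes > gen_bytes)  # True

# Both yield the same values; the generator can only be consumed once.
total = sum(as_gen)
```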


`range()` also takes an optional `step` argument which would help here.
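
For the odd-numbers case discussed above, that would look something like this:

```python
odds = range(1, 100, 2)  # start at 1, stop before 100, step by 2

print(list(odds)[:5])  # [1, 3, 5, 7, 9]
print(len(odds))       # 50
```

This only covers filters that happen to be arithmetic progressions, so it is not a general replacement for an `if` clause.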


That's nice.

Give me the `|>` operator plz


What is it? The link points to a discussion deeper than I’m willing to read.


Basically it's about adding := as an "assignment expression operator", that does assignment and returns the value as an expression. That is, take this regex example:

    match1 = re1.match(text)

    if match1 is not None:
        do_stuff()
    else:
        match2 = re2.match(text)

        if match2 is not None:
            do_other_stuff()
This is a bit clunky: you only want to evaluate match2 if match1 fails, but that means a new level of nesting. Instead, with this proposal, you could do this:

    if (match1 := re1.match(text)) is not None:
        do_stuff()
    elif (match2 := re2.match(text)) is not None:
        do_other_stuff()
Evaluate and assign in the if-statement itself. This is not dissimilar to the `=` assignment operator in C. In C, you would frequently find loops like `while ((c = read()) != EOF) { ... }`. This would presumably allow a similar pattern in Python as well.

More information can be found in PEP-572: https://www.python.org/dev/peps/pep-0572/
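
A self-contained version of the flattened form, as a sketch only. It needs Python 3.8 or later; the two patterns and the `classify` function are made up for illustration and stand in for `re1`, `re2`, `do_stuff()` and `do_other_stuff()`.

```python
import re

# Hypothetical patterns standing in for re1 and re2.
RE_INT = re.compile(r"\d+")
RE_WORD = re.compile(r"[a-z]+")

def classify(text):
    """Try each pattern in turn; the second runs only if the first fails."""
    if (m := RE_INT.fullmatch(text)) is not None:
        return ("int", int(m.group()))
    elif (m := RE_WORD.fullmatch(text)) is not None:
        return ("word", m.group())
    return ("unknown", text)

print(classify("42"))   # ('int', 42)
print(classify("abc"))  # ('word', 'abc')
```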


Hehe. More chances for C-style bugs like:

    if (a = b) /* Oooops, meant a == b! */


Presumably that's why they've gone with the far more sensible ":=" syntax.

The use of "=" for assignment has long been a pet peeve of mine. It was a mistake when C did it, and it's been a mistake for so many subsequent languages to copy it.

"=" shouldn't be an operator at all, it makes a lot more sense to use ":=" and "==".

Pascal's use of ":=" for assignment and "=" for equality strikes me as almost as clear.

Still, at least C makes consistent use of '=' for assignment, unlike that god-forsaken trainwreck of a language, VB.Net, which uses it for both assignment and for equality depending on context.


It's not a problem in C anymore, as modern compilers warn about it, so you have to add extra parentheses to make the intent clear.

I like the C way of assignment being an expression. I think having a separate assignment statement and then an assignment expression is a mess. It's still useful, though, as Python was missing a feature like Haskell's `where` keyword, which is necessary to avoid duplicating computation in list comprehensions.


Except it's more likely you're accidentally inserting a character twice than inserting another extra character (':')


Difference is bigger, C is `if (a = b)` vs `if (a == b)`. Python is `if (a := b)` vs `if a == b`


It's a controversial PEP https://www.python.org/dev/peps/pep-0572/ which allows you to write Python like this:

    def foo():
        if n := randint(0, 3):
            return n ** 2
        return 1337


    [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
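
A runnable take on that last line, under assumptions: `f` and `input_data` are placeholders invented here, and the `calls` list exists only to show that `f` runs once per element even though its result is used in both the filter and the output. Needs Python 3.8 or later.

```python
calls = []

def f(x):
    """Stand-in for an expensive computation; records each invocation."""
    calls.append(x)
    return x - 2

input_data = [1, 3, 4, 6]  # sample values for illustration

result = [(x, y, x / y) for x in input_data if (y := f(x)) > 0]

print(result)      # [(3, 1, 3.0), (4, 2, 2.0), (6, 4, 1.5)]
print(len(calls))  # 4
```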


It also seems to include a special case for if/while that lets you do:

    def foo():
        if randint(0, 3) as n:
            return n ** 2
        return 1337
which looks a bit better to me.


I think that's a rejected alternative proposal, not part of this PEP.



This is horrible. It looks like ":=" is a comparison operator. The last line is dangerously close to Erlang list comprehensions:

    [ {X, Y, X/Y} || X <- Some_Function (), Y <- Some_Other_Function () ]

And people bitch about Erlang syntax.

Edit: "/" is the division operator


This immediately looks useful for things like:

    if foo := bar.get(baz):
        bar[baz] += 1
        return foo
    else:
        bar[baz] = 1
        return 0
Where bar is a dict keeping track of multiple things, and a non-existing key (baz) is never an error but rather the start of a new count. Faster and more readable than

    if baz in list(bar.keys()):
    ....
Similar to Swift’s ‘if let’, it seems.


The place I see using it is in (quoting Python's "python.exe-gdb.py"):

        m = re.match(r'\s*(\d+)\s*', args)
        if m:
            start = int(m.group(0))
            end = start + 10

        m = re.match(r'\s*(\d+)\s*,\s*(\d+)\s*', args)
        if m:
            start, end = map(int, m.groups())
With the new syntax this becomes:

        if m := re.match(r'\s*(\d+)\s*', args):
            start = int(m.group(0))
            end = start + 10

        if m := re.match(r'\s*(\d+)\s*,\s*(\d+)\s*', args):
            start, end = map(int, m.groups())
This pattern occurs just often enough to be a nuisance. For another example drawn from the standard library, here's modified code from "platform.py":

    # Parse the first line
    if (m := _lsb_release_version.match(firstline)) is not None:
        # LSB format: "distro release x.x (codename)"
        return tuple(m.groups())

    # Pre-LSB format: "distro x.x (codename)"
    if (m := _release_version.match(firstline)) is not None:
        return tuple(m.groups())

    # Unknown format... take the first two words
    if l := firstline.strip().split():
        version = l[0]
        if len(l) > 1:
            id = l[1]


It's a problem with the re module, really.

re.match should return a match object no matter what, and .group() should return strings, or an empty string if none were matched.


I don't see how that would improve things. Could you sketch a solution based around your ideas?


Don't wait for 3.8, and don't bother with defaultdict.

collections.Counter is what you want for the counting case.

dict.get() + dict.setdefault() for the general case.

defaultdict is only useful if the factory is expensive to call.
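
A small sketch of both points, with made-up sample data and a hypothetical `make_default` factory. It shows `Counter` handling the counting case, and why the cost of the factory matters: the argument to `setdefault()` is evaluated on every call, while `defaultdict` only calls its factory on a miss.

```python
from collections import Counter, defaultdict

words = ["spam", "eggs", "spam"]  # sample data for illustration

counts = Counter(words)
print(counts["spam"], counts["missing"])  # 2 0

built = []

def make_default():
    """Hypothetical factory that records each time it runs."""
    built.append(1)
    return []

plain = {}
plain.setdefault("k", make_default()).append(1)
plain.setdefault("k", make_default()).append(2)  # factory still runs
setdefault_runs = len(built)  # 2

built.clear()
lazy = defaultdict(make_default)
lazy["k"].append(1)
lazy["k"].append(2)  # key exists, factory is not called again
defaultdict_runs = len(built)  # 1
```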


As pointed out, you can use either a defaultdict or, simply and [more pythonically](https://blogs.msdn.microsoft.com/pythonengineering/2016/06/2...):

    try:
      bar[baz] += 1
    except KeyError:
      bar[baz] = 1
Also you can check if a key is in a dict simply by doing "if baz in bar" no need for "list(bar.keys())", which will be slow (temp object + linear scan) vs O(1) hashmap lookup.


The error-catching method seemed too drastic to me before, but the article explains the LBYL vs. EAFP argument quite well. Thanks!

I should find a way to get more code reviews, I really enjoy learning these small nuggets of info.


Alternatively

`bar[baz] = bar.get(baz, 0) + 1`

One line and no error checking.

But the OP was probably just illustrating a basic example where you might have some more intense logic


It also saves time, since the hash lookup needs to be done at most once. GP does two lookups in the hash table.


For stuff like that I'd just use `defaultdict`. That if/else tree then reduces to 2 lines total.
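
Roughly like this, as a sketch. The names `bar` and `baz` follow the example upthread; the `bump` wrapper is hypothetical and only there to keep the "return the previous count" behavior.

```python
from collections import defaultdict

bar = defaultdict(int)  # missing keys start at 0

def bump(baz):
    """Return the count seen so far for `baz`, then increment it."""
    previous = bar[baz]
    bar[baz] += 1
    return previous

print(bump("a"))  # 0
print(bump("a"))  # 1
print(bar["a"])   # 2
```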


That’s a good tip, thanks!


Would making regular assignment an expression have broken too much existing code?


It's a voluntary design choice since the beginning of Python to avoid the very common mistake of doing:

    while continue = "yes":
instead of:

    while continue == "yes":
Those mistakes introduce bugs that are hard to spot because they don't cause an immediate error; linters can hardly help with them, and even a senior developer can make them when tired.


I don't know about linters, but GCC warns me about that every time I make that typo. They could just require parentheses when an assignment's value is used as a boolean.


Probably not, since expressions can already be statements. But that would allow dangerous code like "if a = 3", which I don't think the Python devs would want to allow.


Reminds me of the kind of hacks you would find in an old-school K&R book.


Can somebody comment on why is this PEP controversial?


I don't think the controversy here is with the feature itself, more with the implementation. Many, me included, would have preferred to see a different implementation of solutions to the same problems.

Code starts becoming a lot harder to reason about when more than one state is mutated on the same line. The good design of Python makes this harder than in say C and I think this is a step in the wrong direction in that regard.

The two real things this solves are checking for truthiness in an `if` and reusing values in a filtering comprehension. Instead of the syntax we have now, which can be used anywhere, adds a whole new concept, and feels kind of out of place, I would have much preferred a solution that can only be used in vetted places, doesn't add a new thing people need to learn, and follows the style of the language.

For example, my preferred solution for `if` would have been:

    if thing() as t:
        print(t)
Usage of `as` is already established by the `with` block

    [value for x in y
     if value
     where value = x * 2]
The order is unfortunately a bit weird here, but there is no need to add the whole concept of a different type of assignment, and this syntax will feel instantly recognizable to people familiar with mathematical notation, which is where the existing list comprehension syntax comes from, so it is already established as well.


I wanted "as" too. But the accepted operator has the benefit of integrating perfectly with type hints.


Many people (including me) learned Python with the view that combining assignment with a condition, like `if x=2` in languages such as C, is an error-prone anti-pattern.

This PEP solves a very small problem and saves a few characters of code, but it makes code harder to read.


It makes list expressions and some other things more powerful, but some feel the potential to create difficult-to-understand constructs with it is too high and the current ways of writing such code are clear enough.


Ick.


I've come around to it purely based on the application in list comprehensions.


The proposal: https://www.python.org/dev/peps/pep-0572/

Short version.

(x := y) is an expression that:

1. assigns the value y to the variable x

2. has the value y.

So `print((x := 1) + 1)` prints '2', and sets x=1.

A ton of languages [eg: c, js] have '=' work this way. And a ton of style guides for those languages tell you to avoid using it like that, because it's confusing. So this is a bit controversial.
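
The one-liner above, spelled out so both effects are visible (Python 3.8 or later; `total` is just a name picked for this sketch):

```python
total = (x := 1) + 1  # the parenthesized part evaluates to 1 and binds x

print(total)  # 2
print(x)      # 1
```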


You're allowed to do assignments inside of expressions

E.g.

    if(x:=f() is not None):
        print(x)
You can read more about it here: https://www.python.org/dev/peps/pep-0572/


I'm immediately skeptical after seeing this example because I'm not sure if the first line parses as:

  if (x := f()) is not None:
or as:

  if x := (f() is not None):


That's why parentheses are mandatory.


:= binds more loosely than everything except a comma, so it's the latter. Still, I agree it's potentially confusing.
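
One way to check the parse for yourself is the standard `ast` module (Python 3.8 or later). This is only a sketch: `f` is never called, because the source is parsed and not executed.

```python
import ast

# `f` is a placeholder; the source is only parsed, never run.
test = ast.parse("if (x := f() is not None):\n    pass\n").body[0].test

# The walrus is the outermost node and the comparison is its value,
# i.e. the line means x := (f() is not None).
print(type(test).__name__)        # NamedExpr
print(type(test.value).__name__)  # Compare
```

So `x` ends up holding `True` or `False` here, not the result of `f()`.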


High-level overview: it's an assignment operator that returns its value, similar to C's assignment operator.

The choice of := is to avoid accidentally using assignment where comparison is expected.
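
The C-style read loop is probably the most familiar use. A minimal sketch, assuming Python 3.8 or later, with an in-memory `io.BytesIO` standing in for a real file or socket:

```python
import io

stream = io.BytesIO(b"abcdefghij")  # in-memory stand-in for a file or socket
chunks = []

# Keep reading until read() returns an empty (falsy) bytes object.
while chunk := stream.read(4):
    chunks.append(chunk)

print(chunks)  # [b'abcd', b'efgh', b'ij']
```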


I feel the colon is unnecessary, especially considering how C deals with this. A plain '=' inside a conditional is already invalid syntax in Python.


And it's a very well-known source of bugs in C, since it's too close to "==". I don't think new languages adopting that is a good idea.


Sure. But if fidelity to C style was not a concern then I don't see why the '==' syntax was adopted in the first place.


== is an incredibly common syntax for equality and, on its own, not a problem. It only becomes a risk if you also allow = in expressions. (Well, you could theoretically write == by accident in a normal assignment, but that kind of error is caught more easily.)


No, it's necessary.


How so? Syntactically, or from a pragmatic point of view?


Yeah, but there is already a solution for that in C: put parentheses around the assignment when using its value as a bool. Compilers warn if you don't, so this error can only happen in C if you don't enable warnings.



Having just spent the last few weeks writing Python, this comment will come off as bitter, but - really? Out of all the shitty syntax things, this sort of thing is what they're willing to fix?


It would be more constructive if you added explanations of what other (in your opinion) "shitty syntax" things you prefer to see addressed and why.

EDIT: also, I'm mentioning "in your opinion" because adding that to your own statements indicates that you're open to discussion. It's also good to use it as a reminder to yourself (speaking as a former physics student, and most people who know physics students will agree how "absolute" and unintentionally arrogant they tend to be in their claims until they learn otherwise).

I'm sure your coding experience was frustrating, and I'm sorry to hear that, and I understand that we all need to vent sometimes, but trying to stay open to other viewpoints is better for your own sanity, wisdom, and social connections in the long run.


Not the OP, but the inability to write

  sorted(enumerate([('a', 1), ('b', 2)]), key=lambda (i, (k, v)): (k, i, v))
in Python 3 drives me nuts. They could fix this. It worked in Python 2.


Yeah, PEP 3113 was quite weak. It didn't even explore alternatives (like fixing the introspection without breaking the syntax).


Seems that the goal here is to sort via the letters 'a', 'b', etc combined with capturing the original ordering?

You could do this, although it's admittedly uglier than your example:

  In [1]: sorted(enumerate([('b', 1), ('c', 3), ('a', 2)]), key=lambda x: (x[1][0], x[0], x[1][1]))
  Out[1]: [(2, ('a', 2)), (0, ('b', 1)), (1, ('c', 3))]
However, if you're flexible about the ordering of the resulting tuples, this seems clearer and reasonably painless:

  In [1]: sorted((x, i) for i, x in enumerate([('b', 1), ('c', 3), ('a', 2)]))
  Out[1]: [(('a', 2), 2), (('b', 1), 0), (('c', 3), 1)]
I know that doesn't address your underlying complaint. This is mainly to note that the flexibility of Python tends to allow a variety of approaches and that sometimes finding the clearest one takes some effort. ("There should be one obvious way to do it..." often does not hold, IMHO.)


> this seems clearer and reasonably painless

but gives the wrong answer, as you'll see if you try to sort [('a', 2), ('a', 1)]
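
To spell that out, a sketch in Python 3. The helper name `by_letter_then_index` is made up here; it does the unpacking that the Python 2 lambda did in its parameter list.

```python
data = [('a', 2), ('a', 1)]

def by_letter_then_index(pair):
    """Python 3 replacement for `lambda (i, (k, v)): (k, i, v)`."""
    i, (k, v) = pair
    return (k, i, v)

# Ties on the letter are broken by original position.
stable = sorted(enumerate(data), key=by_letter_then_index)
print(stable)  # [(0, ('a', 2)), (1, ('a', 1))]

# Sorting (item, index) tuples breaks ties by the number instead.
swapped = sorted((x, i) for i, x in enumerate(data))
print(swapped)  # [(('a', 1), 1), (('a', 2), 0)]
```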


Sure; it wasn't clear to me whether the letter fields could be repeated.


Why would I have included the indices for sorting if the letters were guaranteed to be distinct...


I love writing python, but the lack of decent map/reduce/etc. (that are really good in Javascript - the other language I mainly write) hurts. Stuff like your example really feels burdensome and inelegant in comparison.


Python list comprehensions _are_ map/reduce.


I think you mean map/filter.


Yes, thanks for correcting me.


Sure, but they compose really badly.


Why? Just assign to a variable, then do another list comprehension on the next line, no?


And that is a pretty sad state of affairs compared to ways that do compose well, like the one in Ruby which you can easily chain.


You're right, it does come off as bitter, but worse, it's not a good conversation starter. People who like Python and forced occasional Python developers tend to have different ideas about what is a "shitty syntax thing" and what is a core feature of the language.

Is there a better example of something that is generally agreed to be "shitty" yet could be fixed in a clean way, without breaking backwards compatibility?


OK you're all right, I should've included some examples, here's just some from the last few days (I guess some of these are arguably not purely 'syntax', but to me they mostly come down to that, and I guess most can be explained away with 'different philosophy', and I'm sure someone will come out and say 'oh but if 'only' you had just done this and this', but still...) :

- Instantiating an object from a name in a string. Like instantiate a 'Foo' when you have a string variable that contains 'Foo'. I can't remember the syntax even though I looked it up two days ago, and I never will because it's such a shit show. Not to use PHP here as an example of a great language, but there at least the intuitive '$obj = new $var' works as you expect it. Or, in C++ you have to do it manually, which is also fine - at least be consistent.

- The weird sort-of typing of variables. Variables have types, but you can assign a different value of a different type to them, and the actual type usually doesn't matter except when it does. So you do print "Hey " + var but now you need to know what type var is because you might need to str() it.

- The whitespace-is-important-except-when-it-isn't. OK so braces are the devil's work, but when it's inconvenient, we're not that strict on white space (when initializing lists, when having expressions that span several lines, ...) so now everything can still look out of whack.

- .iteritems(). Really?

- super(ClassName, self).__init__(argument). Wut? Yes when I parse it token by token I understand, but why? Maybe the other magic methods are in this category too, but probably to a lesser degree.

- (I had some other things here about primitive OO capabilities, shitty package system/versioning, and some more, but those were all so far away from 'syntactic sugar' that they didn't fit this list no matter how hard I twisted the argument)

Look, I do understand why they are this way. For each of them, there is a reason that any reasonable person would say 'yeah that makes sense' to, possibly after some explanation of the history or context or whatever. But then at least be honest and stop promoting the language as so 'intuitive' or 'beginner-friendly' or 'much more clean than other languages'. Sure, it's not as bad as R, but it's still just like any other 20+ year old language in wide spread use - crufty, idiosyncratic in many respects, and in general requiring a bunch of frustrating head butting before you can be productive in it.

And to tie it to the OP - it seems this new syntax is promoted as being for 'beginners' or to make it 'easier to teach'. Well good luck with that, I say.


"Instantiating an object from a name in a string. Like instantiate a 'Foo' when you have a string variable that contains 'Foo'. I can't remember the syntax even though I looked it up two days ago, .."

Python doesn't have specific syntax for that. It can be as simple as:

  obj = globals()["Foo"]()
That assumes "Foo" is in your global namespace. If you don't care about security then you can do:

  >>> import math
  >>> s = "math.cos"
  >>> eval(s)(3)
  -0.98999249660044542
If you care about security then you might not want to allow arbitrary objects, like "os.unlink" to be referenced. There are third-party packages which provide different models of how to get objects, like Django's "import_string" at https://docs.djangoproject.com/en/2.0/ref/utils/#django.util... .

"The weird sort-of typing of variables. Variables have types,"

Variables have only one type, "reference to Python Object". An expression like 'var = "Hey " + var' may change the type of the value that var references, just like how 'var = var * 234.567' may change the type of the value that var references from an integer/long to a float, or 'var = var * "Hey"', if var == 2, causes var to be a reference to the string "HeyHey".

".iteritems(). Really"

This was for a transition phase. It no longer exists in Python 3, where items() returns a view object instead of a list.

"super(ClassName, self).__init__(argument). Wut?"

In Python 3 this is: "super().__init__(argument)", as in:

  >>> class A:
  ...   def __init__(self, s):
  ...     print("A says", s)
  ...
  >>> class B(A):
  ...   def __init__(self, t):
  ...     super().__init__(t*2)
  ...
  >>> B("hello?")
  A says hello?hello?
  <__main__.B object at 0x10b201630>
"but it's still just like any other 20+ year old language in wide spread use - crufty, idiosyncratic in many respects"

A reason for the oft-bemoaned backwards-incompatible changes to Python 3 was to remove some of the crufty, idiosyncratic language features that you rightly pointed out. You are still using Python 2.7, so cannot take advantage of those changes.


Ok the 'global' syntax thing makes a lot more sense than the one I was told about (which involved the 'inspect' module iirc). The typing - yes I understand all of that, my point still stands that it's all 'you don't need to know all of this! Oh, at least, until you do'. Wrt python3 - fair enough, things probably do get better. But that just reinforces my point about cruft building up in any practical language, and Python being just as susceptible to it as other languages.


Referring to a class by its name is the wrong approach in Python, because classes are themselves objects.

In PHP, a class is uniquely identified by its name, everywhere. If you define a class Foo, then new $var will resolve correctly either everywhere or nowhere (the name needs to be fully qualified, to avoid namespace headaches).

That's not the case in Python. A class Foo has the same status as any other object. That means you can't rely on its name - it could be replaced by another value. But it does mean you can pass the class itself around instead of its name. Instead of putting the string 'Foo' or the literal Foo::class into a data structure or an argument list, you can just put Foo in there, and call it later.

I think the Python approach is cleaner, but then again, it's what I already knew when I first learned how PHP did it.

Python doesn't need to instantiate classes based on string values, so it doesn't provide an easy way to do that.

Python almost allowing you to take the PHP approach is a bit of a pattern. Python is dynamic enough to let you do a lot of things you shouldn't be doing. Ugly syntax like globals()[var] is usually (but not always) a sign that you're looking for the wrong kind of solution.
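
If a string really has to be the input (config files, user commands), one common arrangement is an explicit mapping from names to class objects. A minimal sketch; `Foo`, `Bar`, `REGISTRY` and `make` are all hypothetical names for illustration:

```python
class Foo:
    pass

class Bar:
    pass

# Explicit whitelist: only classes listed here can be instantiated by name.
REGISTRY = {cls.__name__: cls for cls in (Foo, Bar)}

def make(name):
    """Instantiate a registered class by name; unknown names raise KeyError."""
    return REGISTRY[name]()

obj = make("Foo")
print(type(obj).__name__)  # Foo
```

Unlike `globals()[var]` or `eval`, a name such as "os.unlink" simply is not in the mapping, so it cannot be reached.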


Like what? Python generally has pretty good syntax.


This comment will also come off as bitter - but why don't you just go and write your code in whatever you have been writing it in before?

Just leave it to those who have used Python for a while now and actually know what it's missing.


Can you elaborate what other syntax decisions you find shitty? In general I find the syntax of python to be very clean and easy to understand


Just curious, which pep would you prefer they do first? Or do you have a grudge with something not addressed in existing peps?


Oh but I can find grudges in anything :)

(as to your first point, if you'll allow me to be even more snarky and cynical than I already have been in this thread (might as well go all out now), the fictional 'pep' I would like to see most is 'method and apparatus to instill some realism and humility in the average Python advocate's conception and description of the language'. But here too I will freely admit that I'm probably susceptible to significant observation bias and/or bad luck, and that others could have radically different experiences from myself.)



