WTFPython: Exploring and understanding Python through surprising snippets (github.com/satwikkansal)
460 points by gilad on May 31, 2022 | 139 comments



I have a mildly surprising snippet which I believe they don't have. It's about the interaction of class and instance attributes with the behavior of "X += Y". Python expands it to "X = X + Y", but only if X implements "+" but not "+=".

    class A:
        a = (1,)  # tuples are immutable and don't have "+="
        b = [1,]  # lists have "+="
   
    obj = A()
    obj.a += (2,)  # this creates obj.a
    obj.b += (2,)  # this modifies A.b

    print(A.a, A.b)
The snippet prints "(1,) [1, 2]".
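A quick way to see both effects at once is to check the class attributes and the instance `__dict__` directly (a minimal sketch of the snippet above):

```python
class A:
    a = (1,)
    b = [1]

obj = A()
obj.a += (2,)   # tuple has no "+=": falls back to obj.a = A.a + (2,), creating an instance attribute
obj.b += [2]    # list has "+=": A.b is mutated in place

assert A.a == (1,)                              # class tuple untouched
assert A.b == [1, 2]                            # class list mutated
assert vars(obj) == {'a': (1, 2), 'b': [1, 2]}  # both instance attributes now exist
assert obj.b is A.b                             # obj.b was created too, but aliases A.b
```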


Oh, that's a wtf for sure. More to be scared of and more paranoid I'm going to appear to junior devs while reviewing merge requests. If I'm not mistaken there's a chapter in Effective Python about this, where they recommend never using lists or dicts as Class attributes or default values for parameters and instead use factories or use None and then initialize them with default values when a new instance is created.


> If I'm not mistaken there's a chapter in Effective Python about this, where they recommend never using lists or dicts as Class attributes or default values for parameters and instead use factories or use None and then initialize them with default values when a new instance is created.

Ya, this is because the default object is created once, when the function is defined, and is not re-created every time the function runs.

    def functy(dicty={}):
        print('dicty=', dicty)
        return dicty
   
    output = functy()
    # dicty = {}

    output.update({'key': 'value'})

    functy()
    # dicty = {'key': 'value'}
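The usual fix, per the Effective Python advice mentioned upthread, is the None-sentinel pattern; a minimal sketch:

```python
def functy(dicty=None):
    if dicty is None:
        dicty = {}   # a fresh dict is built on every call, so callers can't share state
    print('dicty=', dicty)
    return dicty

output = functy()                  # dicty= {}
output.update({'key': 'value'})
functy()                           # dicty= {} -- still empty, unlike the shared-default version
```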


Thanks for bringing this up, I didn't make the connection to default args.

I think this behavior is actually "default args" but reversed. In case of default args it is the mutability that causes the nasty surprise (ooops, all calls share the same object, which is mutable). In case of the "+=" it is the _immutability_ that causes the surprise - I would not expect "obj.a += ..." to create an instance attribute that shadows the class one.

I think the practical conclusion here is "don't call += on objects that don't support it, because it may work and this is not what you want!"


Maybe we are looking at it differently. To me, the behavior of "obj.a += (2, )" is the obvious and expected one i.e. I expect a tuple "(1, 2)" to be created and stored in "obj.a" and the class attribute "A.a" is untouched and remains "(1,)". The behavior of the "obj.b += [2,]" for the list is the real gotcha here where it correctly alters the value of "obj.b" but wtf'ingly also alters the value of the class attribute "A.b" which completely alters the behavior of A.

I'm curious if you see it the same way?


> The behavior of the "obj.b += [2,]" for the list is the real gotcha here where it correctly alters the value of "obj.b"

It seems to, but, actually, it doesn't.

> but wtf'ingly also alters the value of the class attribute "A.b" which completely alters the behavior of A.

Actually, all it does is call the __iadd__ method on the list that is the value of A.b (which modifies that list in-place).

The value of A.b can, incidentally, be accessed via obj.b because the instance attribute obj.b does not exist and the lookup path for instance attributes includes the class.

There are two things going on here that are potential sources of confusion, because they behave differently in different circumstances: object member access (which can get members from the object or, if they don't exist on the object, from the class) and the += operator, which calls the __iadd__ method on the left value, or acts as add-and-then-assign to the name on the left if that name doesn't reference a value that supports __iadd__.
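Those two dispatch paths can be teased apart with a toy class for each case (hypothetical names, just for illustration):

```python
class InPlace:
    def __init__(self):
        self.items = []
    def __iadd__(self, other):
        self.items.append(other)   # mutate in place...
        return self                # ...and return self, per the in-place protocol

class AddOnly:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        return AddOnly(self.val + other)   # always build a fresh object

x = InPlace(); alias_x = x
x += 1                        # calls __iadd__; x and alias_x are still the same object
assert x is alias_x

y = AddOnly(1); alias_y = y
y += 1                        # no __iadd__: falls back to y = y + 1, rebinding y
assert y is not alias_y
assert alias_y.val == 1 and y.val == 2
```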


> The value of A.b can, incidentally be accessed via obj.b because obj.b does not exist and the lookup path for instance attributes includes the class. [...] the += operator, which calls the __iadd__ method on the left value or acts as add and then assign to the name on the left, if that name doesn't reference a value that supports __iadd__.

That is not correct. += will always perform the assignment, hence the classic oddball concatenation:

    >>> v = ([],)
    >>> v
    ([],)
    >>> v[0] += [1]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    >>> v
    ([1],)
So `obj.b += [2]` desugars to something along the lines of:

    l = obj.b
    l.extend([2])
    obj.b = l
Hence if you call vars() on the instance or access its __dict__, you will see that the instance does have an intrinsic "b" attribute:

    >>> class A: b = [1]
    ... 
    >>> a = A()
    >>> a.b += (2,)
    >>> vars(a), a.__dict__
    ({'b': [1, 2]}, {'b': [1, 2]})
And if you re-set the attribute on the class, it won't affect the instance, which it would if the instance delegated to the class:

    >>> A.b = [5]
    >>> a.b
    [1, 2]
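One more way to confirm an instance attribute really was created: delete it, and attribute lookup falls back to the class again.

```python
class A:
    b = [1]

a = A()
a.b += [2]          # mutates A.b in place AND stores 'b' in the instance dict
del a.b             # drop the shadowing instance attribute
assert 'b' not in vars(a)
assert a.b is A.b   # lookup falls through to the class once more
assert a.b == [1, 2]
```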


My intuitions come from C and C++, so "+=" should always modify a single object (in C lingo it would be "lvalue") in place. Storing the result somewhere else than the left side of "+=" is counterintuitive to me.

Side note: the object can be immutable, but then the (variable or attribute) left side should be replaced with a new immutable object.


> My intuitions come from C and C++, so "+=" should always modify a single object (in C lingo it would be "lvalue") in place. Storing the result somewhere else than the left side of "+=" is counterintuitive to me.

I thought of this as though the list A.b was marked static - then any modification of b through any instance of A modifies the shared static member

  >>> class A:
  ...   b = [1,]
  ...
  >>> obj = A()
  >>> obj.b.append(2)
  >>> A.b
  [1, 2]
  >>> foo = A()
  >>> foo.b
  [1, 2]
  >>>


Gotcha, I see where we differ. I've exclusively programmed in Python and since int, str and tuple are immutable in Python, I've always assumed "+=" operators create copies and don't update the object.


> int is immutable

But what does that have to do with assignment?

  >>> a = 3
  >>> a += 4
  >>> print(a)
  7
3 and 7 are immutable, yet a takes on those objects.

Obviously, this is inspired by the C += operator, and behaves accordingly, when it's not being weird.


If you go back to the original comment from the user praptak, you will notice in his example the assignments "obj.a += (2,)" and "obj.b += [2,]". These assignments work exactly as expected no matter which school of thought you come from.

The difference is in how the Class attributes are affected i.e. the values of "A.a" and "A.b".

  # obj.a += (2,) has no impact on A.a since tuples 
  # are immutable and copies are made.
  # This is unexpected from a C/C++ intuition.

  >>> print(A.a)
  (1,)    
   
  # obj.b += [2,] has an indirect impact on A.b
  # since lists are mutable and updates are made in place.
  # This is unexpected for beginners in Python since it
  # deviates from how int, str and tuples behave as "+="
  # can be used for int, str, tuples and lists.
  
  >>> print(A.b)
  [1, 2]



> These assignments work exactly as expected no matter which school of thought you come

No, they don't.

obj.b does not exist, and no assignment to it takes place, and there is no “indirect effect”.

A.b exists, and because class attributes can be accessed as if they were members on instances, and because A.b has an implementation of in-place addition, that implementation is called on A.b instead of an assignment to obj.b. That is the only thing that happens as a result of the statement that looks like it might be an assignment to obj.b. The effect on A.b is the only action, not an indirect effect of an assignment.


> The behavior of the "obj.b += [2,]" for the list is the real gotcha here where it correctly alters the value of "obj.b" but wtf'ingly also alters the value of the class attribute "A.b" which completely alters the behavior of A.

It necessarily alters the value of A.b, since `+=` is specifically overridden to update the list in-place. With lists, `a += b` is specifically not an alias for `a = a + b`, instead its behaviour is closer to

    a.extend(b)
    a = a
The issue here is the first part, I absolutely hate this override, and sadly the core team has not learned a thing there as `|=` was overridden the exact same way on dicts when `dict | dict` was added to the language.
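For the record, the dict behaviour works the same way (requires Python 3.9+, where `|` on dicts was added):

```python
d = {'a': 1}
alias = d
d |= {'b': 2}           # dict.__ior__ mutates the dict in place
assert alias == {'a': 1, 'b': 2}
assert d is alias

d2 = {'a': 1}
alias2 = d2
d2 = d2 | {'b': 2}      # builds a brand-new dict instead
assert alias2 == {'a': 1}
assert d2 is not alias2
```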


Yeah it is a little tricky, but using them as class attributes can actually be handy in some circumstances.

Note to readers unfamiliar with Python: class attributes are a special thing, different from instance attributes.

Definitely one of the trickier things with Python for sure.


There's a similar surprising += 'problem' in JS. If you have the following code:

  async function totalSize(fol) {
    const files = await fol.getFiles();
    let totalSize = 0;
    await Promise.all(files.map(async file => {
      totalSize += await file.getSize();
    }));
    // totalSize is now way too small
    return totalSize;
  }
You get an overly low totalSize. It's caused by 'a += b' expanding to 'a = a + b', and the double-mention of 'a' creating a concurrency issue. If '+=' were a single operation with the right-hand-side being calculated first, it wouldn't be an issue.
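The same left-to-right race can be reproduced in Python's asyncio, for anyone who wants to poke at it (get_size here is a stand-in for the JS getSize; the left operand of += is read before the await suspends):

```python
import asyncio

async def get_size(n):
    await asyncio.sleep(0)   # yield to the event loop, like an I/O call would
    return n

async def total_size(sizes):
    total = 0
    async def add(n):
        nonlocal total
        total += await get_size(n)   # total is read BEFORE the await suspends
    await asyncio.gather(*(add(n) for n in sizes))
    return total

print(asyncio.run(total_size([1, 2, 3])))  # 3 on CPython, not the expected 6
```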


I don't think this is a concurrency problem. It's a closure capture problem.

I guess the problem in python is different though, it's about having some data structures that are immutable (like tuple) and others that are mutable (like list)


The problem here is the closure on totalSize and where you're awaiting: totalSize is going to be 0 for all files, and then you await the result of getSize, so in the end totalSize is going to be equal to the size of the last file that completes the getSize call.

To fix this, simply await the result of getSize first, and then add the result to totalSize using +=


> and where you're awaiting

That is the sole source of the issue, and exactly what the comment you're replying to talks about.

The problem is that `a += b` desugars to `a = a + b`, so if `b` is an await you get

    totalSize = totalSize + await file.getSize();
Since javascript evaluates left to right, it first evaluates `totalSize`, gets zero, then `file.getSize()`, then suspends waiting for the result... at which point the handler for the next file can run, doing the exact same thing, repeat for all files.


stupid question incomming: isn't that a '+= await' problem?


Wow, that is surprising.

That must have been a fun one to track down!


I think this example is quite deceptive. When setting obj.a and obj.b, what is your intention? If you want to set the instance attributes, you should have created self.a and self.b in the class constructor.

  class A:
    a = (1,)
    b = [1,]
    def __init__(self):
      self.a = (1,)
      self.b = [1,]

  A.a # this is the class attribute
    (1,)
  A.b # class attribute
    [1]
  obj = A()
  obj.a # instance attribute
    (1,)
  obj.b
    [1]
  obj.a += (2,)
  obj.b += [2,]
  A.a 
    (1,) # still the same class attribute
  A.b
    [1]
  obj.a
    (1, 2) # instance attribute appended
  obj.b
    [1, 2]


I think that's the point: it does seem rather ambiguous as to the intent. If someone is new to programming entirely, what are they thinking is happening with an operation on obj.a or obj.b? If they know other languages and are just starting out learning Python, what expectations are they bringing from those languages?

If I had had magic wand, I'd make operations on class attributes from an instance a syntax error, and only allow it from the class name, and perhaps with special syntax like in C++, e.g.:

    A::a = (1,)
    A::b = [1,]


Edit: I understand it now. See comment below.

> # tuples are immutable and don't have "+="

What do you mean? Because it looks to me tuples have "+=":

    a = (1,)
    a += (3,)
    # a = (1, 3)
I get that tuple is immutable so you're actually creating a new tuple and shove it back into variable a, but it does not conflict with "having += operation".
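Right, and the distinction only shows up once a second name points at the same object:

```python
t = (1,)
t_alias = t
t += (3,)               # tuple has no __iadd__: t = t + (3,)
assert t == (1, 3)
assert t_alias == (1,)  # a new tuple was created; the alias still sees the old one
assert t is not t_alias

l = [1]
l_alias = l
l += [3]                # list.__iadd__ extends in place
assert l_alias == [1, 3]
assert l is l_alias
```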


> shove it back into variable a

Python is more of a lisp. The assignment semantics are different to C. Actually, you're letting the name "a" point to the newly created object.

In C, a variable is a memory position you can fill and point to.

In Python, a variable is a name that points to a memory position.


Other than in regard to the way type works, and lexical capture, mainstream Lisps have variable semantics that work more or less like C. The names of lexical variables disappear at compile time; a given variable may be just a slot in some environment frame (perhaps a stack frame), a machine register, or disappear entirely via constant folding; or any of these three in different areas of the code. In any case, symbols do not "point" at memory positions.

Similarity to C is no accident here, since the lexical scoping concepts in Lisps like Scheme and CL, as well as in C, both trace back to Algol.

Classic Lisp global/dynamic variables may store a global value in a "value cell" which is closely tied to the symbol itself (perhaps stored in it).


I would agree that the compilation to machine code might do something different than the high-level mental model. Though the machine code or intermediary code might be interesting to some, the semantics and mental model on a higher level are usually what we reason about. If we don't do it this way, then Lisp doesn't exist and functional programming doesn't exist, because we don't have a machine model that behaves like it.

And in that sense, C assignment and Lisp assignment are different beasts. Mainly in that - again in the semantics of the high-level description - "memory reservation" in C happens when we declare the left-hand side of an assignment and in Lisp it happens when we construct the right-hand side.

Though if you can show me a mainstream Lisp where the assignment works like setting a memory cell and not like a bind, I'm happy to be proven wrong.


If you follow the abstract mental model that the binding construct such as let reserves the memory cells, which assignment and initialization just fill in, you will not misunderstand your programs, except for some bugs that interact with optimization where you have to know when to let go of the model.

The right hand side reserves memory only in the sense that some heap object is allocated (if that is the case), but that is secondary to the assignment; and that is the same as C also, as in:

  f = fopen(...); // right hand allocates stream; pointer moves into variable.
> if you can show me a mainstream Lisp where the assignment works like setting a memory cell and not like a bind, I'm happy to be proven wrong.

The mainstream Lisps clearly separate binding from assignment.

   (let (x)        ;; x refers to a freshly allocated cell (nil-initialized, in Common Lisp)
     ...
     (setq x 42)   ;; cell is clobbered, replacing nil with 42.
     ...)
Some of the terminology used in the specifications is a bit confused. For instance, Common Lisp says that:

1. A variable is a "binding in the variable namespace" (Glossary)

2. A binding is an association between a name and a value (Glossary)

3. Yet, the description of SETQ uses language like: "First form1 is evaluated and the result is stored in the variable var1".

4. LET is described like this "let and let* create new variable bindings and execute a series of forms that use these bindings". No binding creation semantics is mentioned for SETQ.

Results can only be stored in storage places; nothing can be stored in an "association between a name and a value", unless that association is actually set up through a memory cell: the name refers to a location and the location holds a value. SETQ isn't described in terms of breaking an old binding and setting up a new one.

In the case of dynamic and global variables, there is a term "value cell" which the Glossary defines like this: "The place which holds the value, if any, of the dynamic variable named by that symbol, and which is accessed by symbol-value."

Let's turn our attention to Scheme. R7RS says in 3.1. Variables, syntactic keywords, and regions this:

"An identifier can name either a type of syntax or a location where a value can be stored."

and:

"An identifier that names a location is called a variable and is said to be bound to that location."

"The value stored in the location to which a variable is bound is called the variable’s value."

The next sentence is a kicker, and can be regarded as a criticism of the ANSI CL definition of binding:

"By abuse of terminology, the variable is sometimes said to name the value or to be bound to the value."

If you think that Lisp variables are bindings to values, then according to the Scheme maintainers, you've fallen victim to abuse of terminology. :)


I would read that as not implementing `__iadd__` https://docs.python.org/3/reference/datamodel.html#object.__...


Ah, I got it. So it would fall back to __add__ if __iadd__ doesn't exist.

Thanks!
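Exactly, and you can check which built-ins define the in-place hook directly:

```python
# Which built-ins actually implement __iadd__?
assert not hasattr(tuple, '__iadd__')  # falls back to __add__, i.e. a new tuple
assert not hasattr(int, '__iadd__')
assert not hasattr(str, '__iadd__')
assert hasattr(list, '__iadd__')       # mutates in place
```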


Tuples don't have an in-place addition method (__iadd__), and if the left operand of += is a name that does not contain a value that supports __iadd__, then l += r is a fancy way of writing l = l + r.

But if l does support __iadd__, then l += r becomes l.__iadd__(r).


That's true, though it's kind of an implementation detail. Conceptually x += y is always x = x.__iadd__(y), and x can have its own semantics for what iadd does. Conventionally, mutable objects modify themselves in place and return themselves, while immutable objects create a new object and return that.

This behavior for list and tuple is familiar from regular local variables. Together with the fact that obj.x = ... always means "create or rebind an instance variable" never "rebind the A.x class variable", I think the example is less surprising.


obj.b += (2,) creates obj.b too, it's just that obj.b still points to the same object as A.b.


In Python 3.8 the tuple is in fact changed:

  >>> print(obj.a, obj.b)
  (1, 2) [1, 2]


What happens to `obj.a`/`obj.b` is not the point of the example, you should be looking at `A.a`/`A.b`.


Is that like a static member? Truly, dynamic languages are God's curse upon the Earth.


It refers to the class, not the instance.


I think this is more or less covered by "▶ Class attributes and instance attributes"


Hi, maintainer of the repo here, thanks for sharing :)

If anyone's willing to go through those examples with an interpreter on the side, you can check out https://www.wtfpython.xyz/

It's built with Pyodide; the only limitation is you may not get correct results for the examples that are version-dependent. The UX might not be great, especially on mobile (happy to hear ideas on how to improve it), but it does the job for now :)


Looks great but one quibble -- the "dark mode" simply flips the black and white of normal webpage and code blocks ... which is 50-50

Can we use a different shade of grey for code blocks in dark mode?


Right, I agree with this. Reducing some contrast by using shades of grey will look more pleasant. I'll get this adjusted, thanks for the tip!


Thank you for this. Even as a senior Python developer I didn't know many of these.


Glad you found it informative.

I've found senior developers to have polarized opinions about the usefulness of the collection, so always good to see a review in favorable direction :)


> in favorable direction

Speaking of directions, how about the bidirectional use of `yield` and `yield from` ?

https://stackoverflow.com/questions/9708902/in-practice-what...

# Sending data to a generator (coroutine) using yield from - Part 1


This is beautiful! I've been known to hate on stuff like that walrus operator being added, but this I love. So clean, simple and powerful. Thanks for sharing.


I also learned a bunch of weird things about Python from this, like the really gritty details of assignments and such. It's surprisingly messy once you leave the beaten path.

I also learned that there is python code which causes a rather violent "Don't you ever get that near a code base I maintain"-reaction, which I much rather associated with perl.


Edge cases can keep me awake at night. Reminding oneself that there are no simple things in this enterprise of ours is a good reminder.


Walrus operator IMO is a huge mistake. It saves fuck all time/space, but introduces an entire new operator that probably added more complexity, edge cases and quirks than the benefits it provides. Terrible idea.

Worse, no one uses it. I've yet to come across anyone that advocates it or remembers it.


I actually just used it in what I would say was a perfect example of where it legitimately improved code readability while maintaining pythonic constructs:

    values = [
        value
        for line in buffer.readlines()
        if (value := line.strip())
    ]
Previously, I would have needed to either duplicate effort like:

    values = [
        line.strip()
        for line in buffer.readlines()
        if line.strip()
    ]
Or used a sub-generator:

    values = [
        value
        for value in (
            line.strip() for line in buffer.readlines()
        )
        if value
    ]
Or rewritten it altogether using a (slower) for loop calling append each time:

    values = []
    for line in buffer.readlines():
        line = line.strip()
        if line:
            values.append(line)
The assignment expression is perfect for this sort of use case, and is a clear win over the alternatives IMO.

Edit: fixed initial example


I might have gone with the following. Yes, some characters are repeated, but I'm not playing code golf.

  stripped_lines = (line.strip() for line in buffer.readlines())
  non_empty_stripped_lines = [line for line in stripped_lines if line]


Way better IMO. Clear variable names. No golfing.

Breaking down things in clear steps is underrated I think.


You can write it this way:

    values = [
        value
        for line in buffer.readlines()
        for value in (line.strip(),)
        if value
    ]


FWIW, that version ends up being slower, because you're constructing and iterating over a tuple for every iteration, which incurs a similar cost to running `.strip()` twice. The sub-generator example I gave is better because you're only constructing the generator expression once, and requires less overhead for each iteration.


It is unlikely (unless there is a benchmark that says otherwise). Before the walrus operator, the single item loop could have been used:

  nonblank = [value for line in file for value in [line.strip()] if value]


To me the second version seems not only clearer/more direct to follow but also is a few characters shorter anyways.


But the second version needs to run `.strip` twice. It might not make much of a difference for `strip` -- but it still hurts my eyes, and could be an actual performance issue for other operations.


Running strip twice makes it more explicit and readable IMHO - it's then abundantly clear that it's being run as a check and as a way of populating the list.


values = list(filter(None, [line.strip() for line in buffer.readlines()]))


Or, in a different language

  open("file", "r").readlines.
    map{|line| line.strip}.
    filter{|line| line != ""}
or some smarter but less readable ways.

I prefer the left-to-right transformations style to Python's list comprehension and inside-to-outside function composition. The reason is that it reminds me of how data flow into *nix pipelines. I spent decades working with them and I've been working with Ruby for the last half of that time. With Python in the last quarter of my career.

It's a matter of the choices and preferences of the original designer of the language. Both ways work.


What does `filter` do with `None`? Would it not be an error? This seems not so readable, possibly relying on weird behavior of `filter`. If I had to guess, I would say: Maybe filter with `None` will give the empty list or the whole list, because either no list item matches a `None` condition, or all match it, since there is no condition. But in both cases the usage does not seem to make any sense. So it must be something else. Maybe when the argument is not a lambda, it will compare directly with the given argument? But then we would get only `None`. Nah, still makes no sense. I am at a loss, what this `filter` call is doing, without trying it out.


> What does `filter` do with `None`?

  filter(None, xs)
is equivalent to:

  filter(lambda x: x, xs)
That is, it will return an iterator over the truthy elements of the passed iterable.
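Concretely:

```python
xs = [0, 1, '', 'hi', None, [], [2]]
# filter(None, ...) keeps only the truthy elements...
assert list(filter(None, xs)) == [1, 'hi', [2]]
# ...exactly like the explicit truthiness comprehension
assert list(filter(None, xs)) == [x for x in xs if x]
```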


I suspect the question was rhetorical. The point is, every reader is going to have that question pop into their head and have to look it up. Better to use code that doesn't raise any questions, even if it's a few more characters.


> Better to use code that doesn't raise any questions, even if it's a few more characters.

Certainly, I agree; I would usually use:

  (x for x in xs if x)
Or, if I know more about the kind of falsy values xs actually needs removed, something more explicit like:

  (x for x in xs if x is not None)
Because Python's multiplicity of falsy values can also be something of a footgun (particularly when dealing with something like a collection of Optionals where the substantive type has a falsy value like 0 or [] included.)

Instead of:

  filter(None, xs)
Which is terse but potentially opaque.

Though it's additional syntax, I kind of wish genexp/list/set comprehensions could use something like “x from” as shorthand for “x for x in”, which would be particularly nice for filtering comprehensions.
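The Optionals footgun in miniature, with a made-up sizes list where 0 is a legitimate value and None means "unknown":

```python
sizes = [0, 4, None, 7]
# truthiness-based filtering silently drops the real 0 along with the None:
assert list(filter(None, sizes)) == [4, 7]
# the explicit check keeps it:
assert [s for s in sizes if s is not None] == [0, 4, 7]
```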


From the docs:

If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.

So it just removes false-y values.

Very handy; I've used it a ton.


Not sure if this was your intention or not, but to me that proves the usefulness of the walrus operator: the first snippet in the parent comment seems far clearer to me, even though I'm fairly familiar with the functional operators.


It's a very useful pattern once you know what it does, sort of like the walrus operator


I was careful to pre-empt this exact response in my original comment: I do know what it does. The fact remains that it's less readable (IMO) because of the density of line noise and the lack of common structural elements (like if and for; I suppose filter and map fulfill this, but their parameters separate out elements that ought to be next to each other). I do think that my preference, however slight, would remain no matter how much time I spent with the functional versions.


I'm glad you know what it does but you weren't born knowing what the walrus operator does either


I use it for simple cases:

    >>> if m := re.search(r'(.*)s', 'oh!'):
    ...     print(m[1])
    ... 
    >>> if m := re.search(r'(.*)s', 'awesome'):
    ...     print(m[1])
    ... 
    awe


The question one has to ask is whether it is worth the additional complexity and a dedicated operator. I am sure it is ever so slightly useful, but I am not convinced it is worth the trouble.

Feature creep is a programming language's worst enemy after a certain maturity level.

I absolutely love Go in this matter. They took forever to add Generics and generally side with stability over features.


I'm surprised no one's mentioned a while loop yet:

    start = 0
    while (end := my_str.find("x", start)) != -1:
        print(my_str[start:end])
        start = end + 1
vs

    start = 0
    while True:
        end = my_str.find("x", start)
        if end == -1:
            break
        print(my_str[start:end])
        start = end + 1
I'm still on the fence myself so I sympathise with your view, but the first version is certainly a bit tidier in this case.


I never use the walrus operator but your second example is pretty typical and it does indeed look a lot cleaner with the walrus operator.


Sure, it's "tidier" if by that you mean smaller. Someone who doesn't work in Python all the time and isn't aware of these kinds of operators is going to have to spend a decent amount of time unpacking what the hell that all means, whereas someone can take one look at the standard while loop, see the logic laid out plainly, understand what's happening, and make changes, if necessary, fairly easily. Unless there's a performance benefit to an operator like this, I'll forgo "tidy" for clear any day of the week.

Then again, I'm just a senior dev whose only professional experience with Python was maintaining other people's Python projects, projects they never had to touch again after writing them, and who used Python for things Python should not have been used for just because it's "easy" to write.


Funny, I think the walrus operator makes code cleaner and easier to understand.

Many of my code were like this:

    foo = one_or_none()
    if foo:
        do_stuff(foo)
Now I have the following:

    if foo := one_or_none():
        do_stuff(foo)
This kind of code happens quite frequently, looks nicer with walrus operator to me.


I haven't yet fully adjusted to the walrus operator, but to me the choice would depend on what happens _after_ the "if" statement.

In both cases, "foo" continues to exist after the "if" even though the second example makes it look like "foo" is scoped to the "if".

So to my eye, the following would look super weird (assume do_more_stuff can take None):

    if foo := one_or_none():
        do_stuff(foo)
    do_more_stuff(foo)
whereas the following would look fine:

    foo = one_or_none()
    if foo:
        do_stuff(foo)
    do_more_stuff(foo)


Honestly, for this specific case, I prefer one_or_none() to return an iterable with zero or one items, and then just doing:

  for foo in one_or_none():
    do_stuff(foo)
If you don't control one_or_none, but it returns an Optional, you can wrap it with something like:

  def optional_to_tuple(opt_val: Optional[T]) -> Tuple[T, ...]:
    return (opt_val,) if opt_val is not None else ()


Would have been more Pythonic as:

    if one_or_none() as foo:
        do_stuff(foo)


I didn't recognize this use of "as" as valid Python, but tried it in 3.9 just to be sure. Got a syntax error (as expected).

I am not fully up to speed with 3.10, but quickly checked the docs and it doesn't appear to have been added in 3.10 either.

Let me know if I'm missing something.


Oh, "would have" meaning python-dev chose a different spelling; not that it would have been better if the example were written this way.


Oh, I see now. Thanks for the clarification.


Walrus avoids stuttering in constructs like comprehensions/genexps, which greatly improves readability. (And avoids reaching for less readable constructs, like explicit imperative loops, just to avoid the visual noise of stuttering.)


The let expression syntax in OCaml is a nice alternative:

  let some_func () = 5
  
  let a = some_func () in if a < 5 then "<5" else ">=5"


I have to write Python for the job right now and this language has pitfalls everywhere. Glad I avoided this language so far.

I don't want to use Python for software longer than a few dozen lines or involving multiple developers. This language seems designed to let you unintentionally shoot yourself in the foot in so many ways.


Counterpoint: I've been programming in Python in a web development context for almost a decade now and I don't think I've been bitten by any of these ever (at least not strongly enough to remember), and been involved in many large-scale projects with many developers and we had no issues.

The "worst" that I can remember was actually a Django quirk - ORM result sets (or QuerySets as they call them) execute their query at object evaluation time rather than at instantiation, so unless you cast your resultset to a list it won't actually talk to the DB just yet. Now attach a GUI debugger such as PyCharm/IntelliJ - it will internally evaluate every expression immediately (to populate its GUI) and cause the behavior to diverge (it will "fix" the code and behave as you'd expect when run under the debugger).

I guess there could be certain contexts (system programming? low-level libraries? etc) where these are going to be an issue but when it comes to web/API or business logic development I haven't encountered these.
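The trap can be sketched without Django at all. Here FakeQuerySet is a made-up stand-in for a lazy QuerySet: nothing talks to the "database" until the object is iterated, which is exactly what a GUI debugger does behind your back.

```python
class FakeQuerySet:
    """Made-up analogy for a lazy ORM result set (not Django's real API)."""

    def __init__(self, query):
        self.query = query
        self.hit_db = False

    def __iter__(self):
        self.hit_db = True  # the actual DB round-trip happens here, not in __init__
        yield from [("row1",), ("row2",)]

qs = FakeQuerySet("SELECT * FROM users")
print(qs.hit_db)   # False - instantiating ran no query
rows = list(qs)    # forcing evaluation, as a debugger's expression viewer would
print(qs.hit_db)   # True
```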


I agree in that it's not these quirks that are the problem with Python. My take on Python is that it has poor support for tools that make it easy(-ier) to reason about code that you don't know very well.

"Where is this called from?", "Where is x being set/modified/read?" - such questions are hard to answer with a large Python code base and I'm not even talking of code that abuses the dynamic nature of Python.


Although this didn't use to be the case just a few years ago, with modern tools and well-type-hinted libraries, most of the browsing and refactoring functions I was used to from the Java world are available in Python.

But yes, the moment someone tries to be too clever (metaclasses, dynamic **kwargs parsing...), all of that falls apart and you're back to reading the docs.


I don't love python, but I generally think if you color between the lines and don't try to be too clever then you'll usually end up fine and not hit any clangers.

However, I do think that the pythonic way leads people to write terse clever code that's needlessly more complex.


I imagine that many of these aren't common ones to run into with a team that comes from many languages other than Python, or that has people with enough experience to realize these kinds of unclear, apparently inconsistent coding practices are not a good idea in code you want to be maintainable a year or more from now. The glance I took at a few of them shows me the smack of syntactic sugar that is all aesthetics without any real performance benefit, and thus provides no real benefit beyond maybe saving a few keystrokes. Keystrokes are cheap; days spent by a junior dev trying to find a bug in a pile of syntactic-sugar-filled Python are not.


Mutable default arguments is the one I see most likely to come up regularly, it's gotten us a few times: https://github.com/satwikkansal/wtfpython#-beware-of-default...
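The usual defensive spelling is the None-sentinel idiom, roughly (append_to is a made-up example function):

```python
def append_to(item, target=None):
    # None sentinel: build a fresh list on every call instead of
    # sharing one default list across all calls
    if target is None:
        target = []
    target.append(item)
    return target

print(append_to(1))  # [1]
print(append_to(2))  # [2] - no state shared between calls
```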


My experience with Python is that a mid sized code base does not survive a full staff turnaround. So somewhere around the time the last original author is gone, the codebase becomes a haunted graveyard - scary to everyone. Also, unit tests are not a substitute for static typing - they obviously help a lot but don't prevent the degradation.

This is based on 3 different companies, of which all had decent developers who did follow good practices.


> My experience with Python is that a mid sized code base does not survive a full staff turnaround. So somewhere around the time the last original author is gone, the codebase becomes a haunted graveyard - scary to everyone.

This is 100% true for medium to large codebases. Also, because of the dynamic typing, just looking at the codebase it is very hard to understand the "shape" of variables and data. Of course this is improving now because of type hints, but it is still comparatively hard.


> it is very hard to understand the "shape" of variables and data

And is it any better in, let's say, Java, with several inheritance levels and very opaque types? It's one thing I struggle with.

Sure, you know a TypeA has a method do_stuff() that returns a TypeB, but what is actually happening? Beats me. Then you chase down TypeA and find out the actual implementation is spread across TypeA0 and TypeBaseA, and you can't make sense of anything.


Yes, the ecosystem encourages complex hierarchies and abuse of "patterns" but at least the standard tooling has no problems with things like "find me all places this method is called from" or "find me all implementations of this method". YMMV, but my experience with plain Java is better than with Python.

That was about plain Java. Where it stops working is some "dependency injection" frameworks which jumped the shark and stopped being about dependency injection (cough Guice cough). A fracking argument to a method comes from god-knows-where because The Framework injects it based on a combination of its class and an annotation? Yes, now it is as bad as Python or maybe even worse :)


Finding references isn't really that much harder in Python; it tends to work almost always with e.g. PyCharm and has for a long time. There's of course always the shotgun approach of just searching for the identifier, which turns up all but the most opaque ways to do things (e.g. concatenating a string together, then doing getattr - which you can also do in Java with reflection).

Ye olde metaprogramming tricks in Python would often elide inspectors like that of PyCharm, which is why it had e.g. special support for the Django or SQLAlchemy ORMs. Personally, I've never been fond of those, and find the Java-zope-style code borderline unreadable (heavy reliance on magic, non-local effects and "oh that setuptools entry over there refers to this class here and that's how we load it" kind of tricks), and stuff like Django is (or was) not much better in some areas. More modern metaprogramming approaches can often be typed in Python, which resolves a lot of the issues with it.

All that being said, the by far most common cause of bugs in Python is None in my experience, not any of the shenanigans the language allows you to do.

A mid-sized codebase (5k < loc < 100k?) can absolutely survive a complete "hostile takeover" by new developers, with all the same potential (but not mandatory) issues and pitfalls that pretty much any such codebase invites, in particular in dynamically typed languages.
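On the None point: this is the kind of bug that type hints plus a checker catch. A minimal sketch (find_user and its lookup table are made up):

```python
from typing import Optional

def find_user(uid: int) -> Optional[str]:
    # made-up lookup; .get() returns None for a missing key
    users = {1: "alice"}
    return users.get(uid)

name = find_user(2)
if name is not None:        # guard before use; mypy flags unguarded .upper()
    print(name.upper())
else:
    print("no such user")
```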


The problem here seems to me that apparently none of the remaining devs know how to use Python, not that Python itself is hard to read. Or that for some reason the original devs had no style conventions or good practices whatsoever, which you really can't blame on the language.


True, but that’s where type hints come in - this way you get the best of both worlds.


Oh man, I love Python but what you say does sound truthful to me. Do you think that other programming languages are more resilient to this? (I'm trying to figure out whether there are some intrinsics of Python that make it more susceptible, and if so whether that could be mitigated)


I think Go is currently the best replacement for Python. Static type system seems to help a lot, also Go is slightly boring to write. One would say it's a bad thing but all those "if err != nil {" give something pretty valuable - the visibility of all the error-handling execution paths to the reader of the code.

Also, being boring to write means you seek ways not to write so much, which is also good :)


I'm in a similar position. I've worked with several languages during my 20+ year career. Now I am in a position where I need to do Python. Fortunately it is just smallish scripts (data mangling). I've done some Python in the past, and every time I approach it I hate it. It hasn't improved a bit since the last time I touched it (back when the 2.4/6 vs 3.?? debate was all the rage).

At least now the easy_install vs pip contention is dead.


Every language has quirks. We should have a WTF for all the major programming languages instead of new users randomly stumbling upon them in the worst possible way.


I always thought I was dumb. Everyone I know says "python just clicked for me", and I have been writing python on and off since 2003 and I have hated every moment of it.

About 75% of the bugs I write seem to be about mutable state, and python has nothing but mutable state.

It doesn't even get scoping right.



I picked up Python again a couple of months ago, and I am reexperiencing my love/hate relationship with the language.

Reading through the documentation in WTFPython I get some kind of affirmation of my bias against python. I hate it, but I also love it, but I hate it. It is just so "je ne sais quoi"--

The docs would benefit from some explanations on whys.


How I stopped worrying and learned to love Python.


> The Walrus operator (:=) was introduced in Python 3.8, it can be useful in situations where you'd want to assign values to variables within an expression.

Ok that is infuriating. They dragged their feet through the dirt on how the switch statement is useless and pointless, but then they add some random bullshit like that that nobody ever asked for.


Related, they also added a match statement with footguns. Python has "jumped the shark" in its recent envy of other languages. :-/

The existing "as" would have worked instead of walrus. I've never seen anyone use the additional flexibility that walrus affords, because the code becomes too complicated; at that point it's better to just use two lines.
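For comparison, the walrus one-liner next to the two-line spelling the parent prefers, in a small (made-up) re.match example:

```python
import re

# walrus spelling: bind and test in one expression
if (m := re.match(r"(\d+)", "42abc")):
    print(m.group(1))  # 42

# the plain two-line spelling, same effect
m = re.match(r"(\d+)", "42abc")
if m:
    print(m.group(1))  # 42
```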


I learned a couple of things there; it seems like a very organized way of learning. I just wish more of this approach were adopted by others.

Also, the discussions in the comments make me wonder if HN should implement a syntax highlighting feature.


There are some genuinely surprising ones, but is it just me, or are these 80% either well-known beginner's mistakes (== vs. is, underscored names) or just begging for problems with operator precedence (by omitting parentheses) and variable scoping (by shadowing variables)?

Random example I'm currently reading: x, y = (0, 1) if True else None, None. The interpreter considers the last None to be outside the conditional expression. Gee, so you thought you needed parentheses for the first pair but not the second, and now it's a pitfall that parentheses are missing? The readme goes so far as to say "I haven't met even a single experience Pythonist till date who has not come across one or more of the following scenarios". Uh huh. Not me.

Add parentheses when in doubt, and when not in doubt, add parentheses for readability. In any language. Not excessively, like everyone knows what "if x == 2:" does without them in python, but for anything less obvious or more complicated than a=2*2+3, just do it.
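The precedence trap in that example can be checked directly: the conditional expression binds tighter than the comma, so the right-hand side is ((0, 1) if True else None), None.

```python
x, y = (0, 1) if True else None, None
print(x, y)   # (0, 1) None - the trailing None is a separate tuple element

# Parenthesizing gives what was probably intended:
x, y = (0, 1) if True else (None, None)
print(x, y)   # 0 1
```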


I've seen these before and every single time, it's mostly based on a misunderstanding of certain language features that only a beginner coder would have (for example, not knowing that Python does not do block-level variable scoping, so loop variables leak) or an abuse of an implementation detail (for example, small integers from -5 to 256 being cached as singletons).

"is" is not the same as "==". Confusing the two is a rookie mistake, yet is treated like a "gotcha".


> not knowing that Python does not do block-level variable scoping, so loop variables leak

For what it's worth, it took me a long time to internalize this (more than ten years after first learning Python), I think partly because the way most people teach Python is to say what scopes Python has instead of saying what scopes it doesn't have. While it's bad style to abuse this, I would definitely consider this a part of Python that a lot of working developers probably don't understand well.
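A quick demonstration of the scopes Python doesn't have:

```python
for i in range(3):
    pass
print(i)  # 2 - the loop variable outlives the loop

if True:
    x = "set inside the if block"
print(x)  # also visible: if/for blocks don't create scopes;
          # only functions, classes, comprehensions and modules do
```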


Yes. Scopes are complicated e.g.,

    class Foo:
        a = [1, 2, 3]
        b = [4, 5, 6]
        c = [x * y for x in a for y in b]
that leads to NameError: name 'b' is not defined https://bugs.python.org/issue3692


In Python, I've always run into the problem of expression-vs-statement. For example, in my mind I want to do this:

    (a++ if a==b else b++) + 1
Or something like this (borrowed from https://stackoverflow.com/questions/65024477/walrus-operator...):

    from datetime import datetime

    timestamps = ['30:02:17:36', '26:07:44:25',
                  '25:19:30:38','25:07:40:47']
    timestamps_dt = [
        datetime(days=day,hours=hour,minutes=mins,seconds=sec) 
        for i in timestamps
        day,hour,mins,sec := i.split(':')] 
But this type of stuff is always greeted with Python's inflexible syntax. Recently, I learnt about Racket's idea that "everything is an expression" (https://beautifulracket.com/appendix/why-racket-why-lisp.htm....) and was blown away to realize this is valid Racket code:

    ((if (< 1 0) + *) 42 100)
or

    (+ 42 (if (< 1 0) 100 200))
The second one has an equivalent in Python:

    42 + (100 if 1 < 0 else 200)
But the first one does not:

    42 (+ if 1 < 0 else *) 100
> SYNTAX ERROR.

Maybe it's time for me to get my hands dirty with Lisp.


> (a++ if a==b else b++) + 1

Please rewrite this as

  if a == b:
    a += 1
  else:
    b += 1
and don't leave the other developers (or yourself two months from now) scratching their head about what that code does. You have more important things to do than trying to understand your code again and again.

Furthermore, does that + 1 at the end run before or after the ++?

> (a++ if a==b else b++) + 1

I had to try it out (with a syntax that can run)

  >>> a = 1
  >>> b = 2
  >>> (a+1 if a == b else b+1) + 1
  4 # the expression evaluates to 4, wtf!
  >>> a
  1
  >>> b
  2
But this is not a += 1, which would be a syntax error inside the expression - and that was the point of your post.


The first example can be written like this in Python >= 3.8:

    a = 1
    b = 2
    ((a := a + 1) if a == b else (b := b + 1)) + 1
although that's a lot to cram into a single line IMO.

The closest I could come up with for the second example is this (noting that the datetime constructor requires both the year and month):

    from datetime import datetime
    timestamps = [
        "2022:04:30:02:17:36",
        "2022:04:26:07:44:25",
        "2022:04:25:19:30:38",
        "2022:04:25:07:40:47",
    ]
    datetimes = [
        datetime(*map(int, timestamp.split(":")))
        for timestamp in timestamps
    ]
That said, it'd probably be better to use datetime.fromisoformat() or datetime.strptime() rather than manual parsing.

And you could maybe use the operator module[1] for the last example, like this:

    import operator as op
    (op.add if 1 < 0 else op.mul)(42, 100)
[1] https://docs.python.org/3/library/operator.html


There’s a rather important semantic difference in what you wrote: a++ produces the value before incrementing, whereas (a := a + 1) is like ++a and produces the value after incrementing.

Presuming this is indeed what behnamoh meant by a++, `(a++ if a==b else b++) + 1` would be better written `++a if a==b else ++b`, which would then become the clearer `(a := a + 1) if a == b else (b := b + 1)`.
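The pre-increment semantics of the walrus are easy to check (Python 3.8+):

```python
a = 1
print((a := a + 1))  # 2 - like ++a, yields the value *after* incrementing
print(a)             # 2 - and the binding sticks
```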


> For example, in my mind I want to do this:

> (a++ if a==b else b++) + 1

Why? This is absurdly unreadable. Why not create a function, give it a name, and use simple if/else constructs. Help the next developer out so they don't need to figure out your tricky expression.


> But the first one does not:

>

> 42 (+ if 1 < 0 else *) 100

Of course it does:

    from operator import add, mul
    (add if 1 < 0 else mul)(42, 100)


Okay, but not really: in lispy languages everything is an expression. And it's homoiconic.



This is a fun and well written set of examples, which shows and explains the cause of the counter intuitive behavior.


The wtfs about id and is are not deserved. These operations are not used in normal code and are explicitly implementation-defined. I suppose they should have been named __is and __id. I don't see why anyone not hacking the interpreter would care about these, though.


I agree about id() but `is` is absolutely used (and needed) in normal code. But generally not on strings or integers, which are the types that have weird results with it.

Simple example: any sort of graph algorithm that needs to know if two nodes are the same node.
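A sketch of that: a made-up Node type where == means "equal value" but `is` distinguishes the actual nodes.

```python
class Node:
    """Made-up graph node: equal by value, distinct by identity."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Node) and self.value == other.value

n1, n2 = Node(1), Node(1)
print(n1 == n2)  # True  - same value
print(n1 is n2)  # False - still two distinct nodes in the graph
print(n1 is n1)  # True  - the same node
```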


You should really use a nodeid in that case and compare nodeids. So that you can serialize your graph


Interesting. I would be interested with a WTFRuby. As I lately explored Ruby deeper, I was surprised that Complex would remove #% as well as most Comparable operators.

https://stackoverflow.com/questions/72314247/how-can-ruby-co...

https://stackoverflow.com/questions/71439853/why-ruby-doesn-...


I've been thinking it would be useful to create a sub-language of Python with only essential features (removing hard-to-explain features such as 'is', ':=', and a plethora of non-essentials) - that is, fully backward-compatible with Python. This would also make restricted interpreters easier to write. I know MicroPython exists, but its goal is more minimal size than simplicity and ease of learning.


That Python uses the exact same string object when two strings contain the same immutable content can probably be a bit confusing. But if you want truly confusing behavior, look at how Java treats numbers wrapped in their object type:

     public class Main{
 
         public static void main(String[] args){
 
             Integer a = 300;
             Integer b = 300;
 
             System.out.printf(" %s == %s: %s\n", a, b, (a == b));
 
             a = 100;
             b = 100;
 
             System.out.printf(" %s == %s: %s\n", a, b, (a == b));
         }
     }


     // output

     300 == 300: false
     100 == 100: true
When combined with Java's auto-boxing, this can create some serious confusion


CPython does this too (`is` is roughly the same as Java's `==`):

    >>> a = 300
    >>> b = 300
    >>> a is b
    False
    >>> a = 100
    >>> b = 100
    >>> a is b
    True
    >>>
CPython has a small set of canonical objects for integers.
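This can be poked at without the compiler constant-folding equal literals in the same file: construct the ints at runtime instead. (CPython-specific behavior; the small-int cache covers -5 through 256.)

```python
# int("...") builds the value at runtime, so no compile-time folding applies
a, b = int("100"), int("100")
print(a is b)  # True in CPython - both resolve to the cached small int

a, b = int("300"), int("300")
print(a is b)  # False in CPython - outside the cache, two fresh objects
```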


Luckily you don't typically compare numbers in Python with 'is'. In Java you should not compare Integer objects with '==' either, of course, but '==' is typically used with numbers, and the auto-boxing features of Java make it very easy to shoot yourself in the foot with this stuff.

I once ran into a bug which had been there for years, since some Integer values had never crept over that magic limit, and auto-boxing allowed the numbers to be treated as int's all over, so it really took me some time to figure out what was going on...


Wait this is confusing for anyone beyond a fresh-faced junior dev? An Integer is an object while 100 is a primitive. Of course two different objects aren't going to be equal since they're literally not the same object. Two primitives of the same value will be though because they are the same primitive. It's exactly the same behavior across many languages.

This right here is why it's critical to have developers of all levels across a team. Those of us who have been doing this long enough don't remember what caused us issues as juniors. We need the mid-levels to translate and remind us of these things lol


It's confusing because using 100 works fine; both are seen as the same object. But using a number over 256 causes it to fail to see both as the same object. Of course, as an experienced dev, I would never use "is" for comparing integers, only for comparing things I know are objects that may otherwise be equal in value.


> An Integer is an object while 100 is a primitive

They are both objects; they only differ in size.


This is only true for string literals. Strings constructed at runtime may (and often will) end up as separate objects:

    >>> a = "this is a long sentence in a string literal"
    >>> b = "this is a long" + " sentence in a " + "string literal"
    >>> a is b
    False
    >>> id(a)
    4386376272
    >>> id(b)
    4387721264
    >>> a == b
    True


Beware that your method of construction is risky for your demonstration: the Python compiler does do some limited optimisations, and trivial constant folding is one of them.

If you test at the shell it apparently misses it (though it works for smaller strings e.g. "abc" / "a" + "b" + "c"), but if you put the same thing in a file and run it, it'll tell you the two strings are identical.


Those two strings aren't equal though, missing some spaces in b :-)


Oops! Thanks for catching that. I fixed the example.


IMHO this is one of the worst ways to improve your Python skills. Instead, focus on the normal, robust, beautiful, clear, and idiomatic solutions to problems.

There is an enormous amount of quality training material available as well as excellent source code to major projects.

Most people would be better off just spending an hour or so with the tutorial at docs.python.org.

I get people coming to me asking for a job teaching Python when they don't know some of the material in the tutorial. Most have never read the FAQs and aren't aware of common solutions to common problems. To me, if you want to build your skills, start there. Get in the habit of reading docs, even boring ones, and occasionally read some of the source code from your favorite libraries.

That said, if you're already a very strong Python programmer, one possible use for these snippets is to help you root out any last misconceptions about minor details.

But you should resist the urge to show off or use almost any of these "skills". There are a lot of cute things we can do with chained comparisons, but the only sensible ones are: lo <= x <= hi or f(x) == g(x) == h(x). Anything else is too weird for communicating with human beings. Likewise, you should only use "is" for None tests until you have a strong understanding of identity guarantees.
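A quick illustration of why anything beyond those two forms gets weird:

```python
x = 5
print(1 <= x <= 10)  # True - the sensible range check

# Anything fancier chains in surprising ways:
print(False == False in [False])
# True: parsed as (False == False) and (False in [False])
```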


I feel that it’s a decent approach for anyone already familiar with Python, since it makes you think about how the platform works underneath the surface vs. just understanding it from the shallow language-syntax level.


In the normal course of events, absolutely. This is hacker news though :-)


Fair enough :-)



