
The PEPs of Python 3.9 - zdw
https://lwn.net/Articles/819853/
======
tln
Sometimes I wish that python strings weren't directly iterable...

this article sums it up better than I ever could
[https://www.xanthir.com/b4wJ1](https://www.xanthir.com/b4wJ1)

...then str.strip and variants could be cleanly and logically extended to
allow this functionality, because passing a string and a sequence of strings
would be distinguishable.

Alas, clean and logical function design can be hard to do late in a language's
life.

PEP 593 and PEP 585 are clean and logical... glad to see that :)

~~~
BiteCode_dev
You can easily distinguish them:

    
    
        if isinstance(msg, str): ...
    

So I don't think that's a good argument against accepting iterables of strings
in str methods. Things like replace() would benefit a lot, and it's not that
hard to do; you can even accept regexes optionally: [https://wonderful-wrappers.readthedocs.io/en/latest/string_wrapper.html](https://wonderful-wrappers.readthedocs.io/en/latest/string_wrapper.html)
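To illustrate, here's a minimal sketch of an iterable-accepting replace as a standalone helper (`multi_replace` is a hypothetical name, not a str method, and the isinstance() check is exactly the distinguishing trick described above):

```python
def multi_replace(text, olds, new):
    """Replace every occurrence of each string in `olds` with `new`.

    Accepts either a single string or an iterable of strings, using
    isinstance() to tell the two cases apart.
    """
    if isinstance(olds, str):
        olds = [olds]
    for old in olds:
        text = text.replace(old, new)
    return text

print(multi_replace("spam and eggs", ["spam", "eggs"], "tofu"))
# tofu and tofu
```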

I agree that iterating on string is not proper design however. It's not very
useful in practice, and the O(1) access has other performance consequences for
more important things.

Swift did it right IMO, but it's a much younger language.

I also wish we had stolen the file API concepts from Swift, and that open()
would return a file-like object that always gives you bytes. No "b" mode. If
you want text, you call open().as_text() and get a decoding wrapper.

The idea that there are text files and binary files has been toxic for a whole
generation of coders.

~~~
diarrhea
The issue is that

    
    
        if isinstance(msg, str): ...
    

will clutter code that is otherwise clean. A single type has to be specially
handled, which sticks out like a sore thumb.

As a second point, do you have more on your last sentence? ("The idea that
there are text files and binary files has been toxic for a whole generation of
coders."). I have been _thoroughly_ confused about text vs. bytes when
learning Python/programming.

The two types are treated as siblings, when text files are really a child of
binary files. Binary files are simply regular files, sitting as the single
parent, with no parent of their own, at the root of the tree. Text files are
just one of the many children that happen to yield text when their byte
patterns are interpreted using the correct encoding (or, in the spirit of
Python, decoding when going from bytes to text), like UTF-8. This is just
like, say, audio files yielding audio when interpreted with the correct
encoding (say, MP3).

Is this a valid way of seeing it? I have to ask very carefully because I have
never seen it explained this way, so that is just what I put together as a
mental model over time. In opposition to that model, resources like books
always treat binary and text files as polar opposites/siblings.

This leads me to the initial question of whether you know of resources that
would support the above model (assuming it is correct)?

~~~
BiteCode_dev
The open() API is inherited from the C way of doing things, where the world is
divided between text files and binary files. So you open a file in "text" mode
or "binary" mode, "text" being the default behavior.

This is, of course, utterly BS.

All files are binary files.

Some contain sound data, some image data, some zip data, some pdf data, and
some raw encoded text data.

But we don't have a "jpg" mode for open(). We do have higher-level APIs we pass
file objects to in order to decode their content as jpg, which is what we
should be doing for text. Text is not an exceptional case.

VSCode does a lot of work to turn those bytes into pretty words, just like VLC
does to turn them into videos. They are not like that in the file. It's all a
representation for human consumption.

The reasoning for this confusing API is that reading text from a file is a
common use case, which is true, especially on Unix, where C comes from. But a
"mode" is the wrong abstraction for offering it.

In fact, Python 3 does it partially right. It has an io.FileIO object that
just takes care of opening the file, and an io.BufferedReader that wraps
FileIO to offer practical methods for accessing its content.

This is what open(mode="b") returns.

If you do open(mode="t"), which is the default, it wraps the BufferedReader in
an io.TextIOWrapper that does the decoding transparently for you, and returns
that.
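That layering is easy to inspect directly (a small demo; the temp file is just scaffolding):

```python
import io
import tempfile

# Write a small file, then inspect the object stack open() hands back.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    f.write("hello")
    path = f.name

raw = open(path, "rb")
print(type(raw))         # the buffered binary layer: io.BufferedReader
print(type(raw.raw))     # the raw I/O layer underneath: io.FileIO
raw.close()

txt = open(path, "r")
print(type(txt))         # the decoding layer: io.TextIOWrapper
print(type(txt.buffer))  # the BufferedReader it wraps
txt.close()
```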

There is a great explanation of this by the always excellent David Beazley:
[http://www.dabeaz.com/python3io_2010/MasteringIO.pdf](http://www.dabeaz.com/python3io_2010/MasteringIO.pdf)

What it should do is offer something like this:

    
    
        with open('text.txt').as_text():
    

open() would always return a BufferedReader, and as_text() would always return
a TextIOWrapper.

This completely separates I/O from decoding, removing confusion in the minds
of all those coders who would otherwise live by the illusory binary/text
model. It also makes the API much less error prone: you can easily see where
the file-related arguments go (in open()) and where the text-related arguments
go (in as_text()).

You can keep the mode, but only for "read", "write" and "append", removing the
weird mix with "text" and "bytes", which really belong to a different set of
operations.
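The proposed separation can be approximated today with the existing io machinery; `as_text()` below is the hypothetical wrapper, implemented here as a thin shim over io.TextIOWrapper:

```python
import io
import tempfile

def as_text(binary_file, encoding="utf-8"):
    """Hypothetical wrapper: turn a bytes-producing file object into a
    decoding text stream, keeping I/O and decoding concerns separate."""
    return io.TextIOWrapper(binary_file, encoding=encoding)

# Scaffolding: write some UTF-8 bytes to a temp file.
with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    f.write("héllo".encode("utf-8"))
    path = f.name

# open() in binary mode gives bytes; as_text() adds the decoding layer.
with as_text(open(path, "rb")) as text_file:
    print(text_file.read())  # héllo
```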

~~~
zb
Let’s be clear here that the fault is not with Python but with Windows.

Python uses text mode by default to avoid surprising beginners on Windows. If
you only use Unix-like OSs you will never have this problem.

~~~
BiteCode_dev
The problem is not "text mode by default". The problem is that the API offers
a text mode at all.

Opening a file should return an object that gives you bytes, and that's it.

This "mode" thing is idiotic, and leaks a low-level API that makes no sense in
a high-level language with a strong abstraction for text like Python.

Text should be decoded via a wrapping object. See my other comments.

------
OJFord
> Eric Fahlgren amusingly summed up the name fight this way:

> > I think name choice is easier if you write the documentation first:

> > cutprefix - Removes the specified prefix.

> > trimprefix - Removes the specified prefix.

> > stripprefix - Removes the specified prefix.

> > removeprefix - Removes the specified prefix. Duh. :)

I actually don't agree that it's so obvious, since it returns the prefix-
removed string rather than modifying in-place. I think Fahlgren's argument
would work better for `withoutprefix`.
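For reference, removeprefix() (Python 3.9+) does behave like every other str method, returning a new string and leaving the original untouched:

```python
s = "prefix_value"
stripped = s.removeprefix("prefix_")
print(stripped)  # value
print(s)         # prefix_value  (unchanged; strings are immutable)

# If the prefix is absent, the string comes back as-is.
print("value".removeprefix("prefix_"))  # value
```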

~~~
kbd
I would have preferred 'stripprefix' for unity with 'strip', 'rstrip', and
'lstrip'.

~~~
pdonis
That's discussed in the article: the "strip" methods don't interpret strings
of multiple characters as a single prefix or suffix to be removed, so it was
felt to be too confusing to use "strip" type names for methods that _do_
interpret strings that way.
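The difference in semantics is easy to demonstrate (the strip() example is the classic one from the Python docs):

```python
# strip() treats its argument as a *set of characters* and removes any
# run of them from both ends, so it can eat more than the intended affix.
print("www.example.com".strip("cmowz."))  # example

# removeprefix() (3.9+) treats its argument as a single literal prefix.
print("www.example.com".removeprefix("www."))  # example.com
```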

------
nemetroid
> Another kind of clean up comes in PEP 585 ("Type Hinting Generics In
> Standard Collections"). It will allow the removal of a parallel set of type
> aliases maintained in the typing module in order to support generic types.
> For example, the typing.List type will no longer be needed to support
> annotations like "dict[str, list[int]]" (i.e., a dictionary with string keys
> and values that are lists of integers).

I think this will go a long way toward making type annotations feel less like
a tacked-on feature.
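For example, on 3.9+ the builtin container types can be parameterized directly in annotations, with no typing import (a minimal sketch):

```python
# No "from typing import Dict, List" needed on Python 3.9+.
def count_lengths(words: list[str]) -> dict[str, int]:
    """Map each word to its length."""
    return {word: len(word) for word in words}

print(count_lengths(["spam", "eggs"]))  # {'spam': 4, 'eggs': 4}
```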

~~~
diarrhea
Looking "back" now, it's odd that importing List when there is list never
struck me as particularly strange. Now it sticks out like a sore thumb. Very
glad this change is happening.

~~~
throwlaplace
That's because we're conditioned to think of constructors as functions rather
than as types. I think that's not that odd honestly but I do see how
counterintuitive it is for people that don't work much in typed languages. I'm
not a Haskellite but there you can clearly see the distinction when
defining/instantiating sum types (where the type and data constructor live in
different namespaces).

------
heavyset_go
Am I the only one who wants multi-lined anonymous functions in Python? I find
myself really wanting to reach for arrow functions sometimes while writing
Python, and end up disappointed that they aren't available.

~~~
quietbritishjim
What's wrong with using a nested named function instead?

You may already be aware, but not everyone is: they capture variables from
outer scopes in exactly the same way that lambdas do.

~~~
netheril96
It's wrong because naming is hard. When writing inline, not having a name
doesn't necessarily hurt readability. When defining the function out of line,
naming it casually may confuse readers.

~~~
bhargav
When we use lambdas, don't we usually end up assigning them to variables? In
those cases, your point about naming still holds!

~~~
silveraxe93
Lambdas are passed as function arguments/parameters. Pretty sure assigning a
lambda to a variable is an anti-pattern.
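PEP 8 does in fact recommend a def statement over binding a lambda to a name; one practical reason is that the def gets a real __name__ for tracebacks and debugging:

```python
# Discouraged by PEP 8: the function's __name__ is just '<lambda>'.
square = lambda x: x * x

# Preferred: a def gives the object a useful name.
def square_def(x):
    return x * x

print(square.__name__)      # <lambda>
print(square_def.__name__)  # square_def
```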

------
theandrewbailey
No PEP 554 (subinterpreters). That's been moved to 3.10:
[https://www.python.org/dev/peps/pep-0554/](https://www.python.org/dev/peps/pep-0554/)

~~~
BiteCode_dev
Given how heated the debate about those was, it's good we didn't try to go too
fast with it.

I'm full of hope for this feature, but it's going to be slow, hard work, and
we'll only reap the benefits in the long run. So there's no use rushing it.

I feel like we rushed asyncio and type hints, and it took years to make them
usable after they were introduced.

------
trashburger
I welcome the terser type hints for generics, though I was wishing for
something even terser, like:

    
    
        {str: [int]}
    

being equivalent to what is currently dict[str, list[int]] in the PEP, but I
guess it will have to do.

~~~
st1ck
`{key: val}` does look nice indeed, but then it takes more effort to replace
e.g. `dict` with `Mapping`. Everything would look much nicer with Haskell-like
syntax: `dict key val` or `list val`. Or maybe even prefer `{} key val` and
`[] val` (no that doesn't look good, I agree).

------
gorgoiler
I was quite surprised — only a few days ago in fact — to discover that the
standard Python library has no support for Olson (as in _tzdata_) timezones.
Time arithmetic is impossible without them.

The ipaddress library also has no support for calculating subnets. It is quite
hard to go from 2a00:aaaa:bbbb::/48 to 2a00:aaaa:bbbb:cccc::/64. It would be
less weird if the essence of the documentation didn't make it sound like the
library was otherwise very thorough in the coverage of its implementation.

Can anyone write a PEP? Maybe I should get off my behind and actually submit a
patch for proper IP calculations? Or maybe I missed it in the documentation
(which, as an aside, I wish weren't written with such GNU-info-style
formality).

~~~
chc
Unless I misunderstand what you're looking for, I think that functionality is
in there.

    
    
        from ipaddress import ip_network

        original_net_48 = ip_network("2a00:aaaa:bbbb::/48")
        desired_subnet = ip_network("2a00:aaaa:bbbb:cccc::/64")
        subnets_64 = original_net_48.subnets(prefixlen_diff=16)
        print(f"{desired_subnet} is one of the computed subnets: {desired_subnet in subnets_64}")
        #=> 2a00:aaaa:bbbb:cccc::/64 is one of the computed subnets: True

~~~
gorgoiler
Thanks, but your second line kind of has the answer in it already. It’s more
like...

    
    
      site = ip_network('2a00:aaaa:bbbb::/48')
      subnet = f(site, 64, 0xcccc)
    

...and I don’t think _f()_ is in the standard library. But maybe I just index
the calculated subnets, from your example? I’ll give it a go!

Edit: yes! But it’s a bit slow...

[https://repl.it/repls/PoshPapayawhipNaturaldocs#main.py](https://repl.it/repls/PoshPapayawhipNaturaldocs#main.py)

~~~
xxpor
Well, you can speed it up slightly by using the iterator directly instead of
forcing the whole generator into a list.

But on the other hand, you can make it WAY faster by doing it with bit
manipulation instead like you're writing C:

[https://repl.it/repls/ClearcutElaborateEmbeds](https://repl.it/repls/ClearcutElaborateEmbeds)
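A sketch of that idea using only the stdlib's ipaddress types: compute the desired /64 from the /48's integer network address instead of iterating all 65536 subnets (the 0xcccc index matches the earlier example; this is an illustrative approach, not the code in the linked repl):

```python
from ipaddress import ip_network

site = ip_network("2a00:aaaa:bbbb::/48")
index = 0xCCCC  # which /64 inside the /48 we want

# A /64 inside a /48 is selected by 16 bits; shift the index into
# position and OR it onto the site's integer network address.
shift = 128 - 64  # bits to the right of the /64 boundary
subnet_addr = int(site.network_address) | (index << shift)
subnet = ip_network((subnet_addr, 64))

print(subnet)  # 2a00:aaaa:bbbb:cccc::/64
```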

------
jefft255
Could someone explain to me what kind of new language features the new parser
will allow? I'm curious and very incompetent when it comes to understanding
what moving beyond an LL(1) grammar would imply for the end user (the Python
programmer, like me).

~~~
MHordecki
The linked LWN article[1] mentions context-sensitive keywords, i.e. a way to
treat certain words as language keywords only in specific contexts. For
example, a new match statement wouldn't require reserving `match` as a
language keyword, which would be a breaking change and would break all
existing code that uses `match` as a variable name.

Such a feature requires support from the parser.

[1]: [https://lwn.net/Articles/816922/](https://lwn.net/Articles/816922/)

~~~
kalenx
One good example (for those who do not want to read the full article) is the
async keyword. Introducing it as a keyword broke a few libraries that were
already using it as a kwarg in some functions (e.g. pytorch).
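The breakage is easy to reproduce: since async became a full keyword in Python 3.7, a signature using it as a parameter name no longer even parses (shown via compile() so the error can be caught):

```python
# `async` became a full keyword in Python 3.7, so code like
# `f(async=True)` stopped parsing entirely.
source = "def download(url, async=True): pass"

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except SyntaxError:
    outcome = "SyntaxError"

print(outcome)  # SyntaxError
```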

------
traes
Is the next release still planned to be called Python 4? I seem to recall GvR
saying that at one point, though I could be mistaken.

~~~
downerending
And very importantly, would Python 4 be a new language, or compatible with
Python 3? (compare Python 3 vs Python 2)

~~~
BiteCode_dev
There is no Python 4 planned for now.

Python broke compat once in 25 years and gave 13 years to migrate.

It's a very conservative language.

~~~
downerending
I'm struggling to think of any other language that has done something like
this.

It might seem like a quibble, but it seems better to describe Python 3 as a
different language versus Python 2. Newbies seem to get that.

(Or, alternatively, "How many Python 2 scripts will run on a Python 3
interpreter?" Answer: "None of them.")

~~~
cesarb
> Or, alternatively, "How many Python 2 scripts will run on a Python 3
> interpreter?" Answer: "None of them."

That's obviously untrue. For example, consider the following Python 2 script:

    
    
        with open('a.txt', 'r') as a, open('b.txt', 'w') as b:
            for line in a:
                b.write(line)
    

It works identically on a Python 3 interpreter, and it doesn't even use "from
__future__ import ...".

~~~
downerending
Yes, I should have said, "a vanishingly small proportion, and even then mostly
by accident".

My guess is that if you sweep GitHub for Python2 code and push it into
Python3, that proportion would be under one percent.

~~~
tuco86
How will that number look when you first autoconvert via 2to3?

I did two migrations of >500k LOC projects in an afternoon each, plus
admittedly some days of testing to gain confidence, since there were few unit
tests. But I found it to be very smooth sailing.

I was very familiar with both projects, so that helped a lot.

EDIT: I also want to add that I did this using Python 3.5, when the ecosystem
seemed to be at a sweet spot, with most dependencies supporting both 2 and 3.
I guess if one has been waiting until now, the divide between library versions
will be a lot bigger.

------
slightwinder
> Eventually, removeprefix() and removesuffix() seemed to gain the upper hand,
> which is what Sweeney eventually switched to.

Great naming... they missed their chance to make the functionality of
strip/lstrip/rstrip clearer by naming the new methods
stripword/lstripword/rstripword. Which would also have had the benefit of
consistency.

~~~
masklinn
stripwords could (/would) imply that it acts on words, as in
whitespace-separated things.

------
tartrate
re.sub?
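Indeed, re.sub with an alternation already covers much of the multi-replace use case discussed upthread (a minimal sketch):

```python
import re

# Replace several literal substrings in one pass by joining them into
# an alternation; re.escape guards against regex metacharacters.
targets = ["spam", "eggs"]
pattern = "|".join(re.escape(t) for t in targets)

print(re.sub(pattern, "tofu", "spam and eggs"))  # tofu and tofu
```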

