As a developer who has primarily developed applications in Python for my entire professional career, I can't say I'm especially excited about any of the "headlining" features of 3.8.
The "walrus operator" will occasionally be useful, but I doubt I will find many effective uses for it. Same with the forced positional/keyword arguments and the "self-documenting" f-string expressions. Even when they have a use, it's usually just to save one line of code or a few extra characters.
The labeled breaks and continues proposed in PEP-3136 [0] also wouldn't be used very frequently, but they would at least eliminate multiple lines of code and reduce complexity.
PEP-3136 was rejected because "code so complicated to require this feature is very rare". I can understand a stance like that. Overcomplicating a language with rarely-used features can definitely create problems. I just don't see why the three "headline" features I mentioned are any different.
> The "walrus operator" will occasionally be useful, but I doubt I will find many effective uses for it.
The primary one I want is
if m := re.match(...):
    print(m.group(1))
and
while s := network_service.read():
    process(s)
both of which are clearer and less error-prone than their non-walrus variants.
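For concreteness, here is a runnable sketch of the first idiom next to its pre-3.8 equivalent; the pattern and the input line are made-up placeholders, not from the comment above.

import re

line = "id=42"

# Python 3.8+: bind and test in one place.
if m := re.match(r"id=(\d+)", line):
    print(m.group(1))

# Pre-3.8: the assignment needs its own statement before the test.
m = re.match(r"id=(\d+)", line)
if m:
    print(m.group(1))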
The other one that I would have found useful an hour ago is in interactive exploration with comprehensions. I frequently take a look at some data with [i for i in data if i[something].something...], and being able to quickly give a name to something in my conditional, as in {i: x for i in data if (x := i[something])} (the parentheses are required there), helps maintain the focus on data and not syntax. Obviously it will get rewritten to have clearer variable names, at the least, when it gets committed as real code, and almost certainly rewritten to be a normal non-comprehension for loop.
Like comprehensions, I expect the walrus operator to be valuable when used occasionally, and annoying if overused. There's no real language-level solution to bad taste. In Python 3 you can now do [print(i) for i in...], and I occasionally do at the REPL, but you shouldn't do it in real code and that's not an argument against the language supporting comprehensions.
Coming from Perl, I used to want this badly, but then I thought that there's absolutely nothing wrong with
m = re.match(...)
if m is not None:
    pass
Now, I wonder what you meant by saying that the single-line version is less error-prone, because I don't think it is. I believe they're exactly the same in this regard, except for the bizarre case where someone bastardizes the code by putting irrelevant lines between the assignment and the comparison, obfuscating the logic.
Also, as I tend to use `if m is not None:` rather than just `if m:` (because I typically don't want to run all that subtle `__nonzero__` magic), it would - subjectively - look less pretty in the one-line version: `if (m := re.match(...)) is not None:`.
Using it in generator expressions to create aliases/shortcuts and avoid repetition is great, though. I love this use case.
I literally wrote your last example twice last week to hash files in 4k chunks. Those 4 lines would be reduced to 1 with the walrus operator. I welcome it, as well.
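For what it's worth, a minimal sketch of that chunked-hashing idiom with the walrus operator; the function name, the sha256 choice, and the 4 KiB default are my own assumptions, not taken from the comment above.

import hashlib

def hash_file(path, chunk_size=4096):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read() returns b"" at EOF, which is falsy, so the loop ends there.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()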
You're right. The [changelog] for Python 3.7 states:
> [bpo-32670]: Enforce [PEP 479] for all code. This means that manually raising a StopIteration exception from a generator is prohibited for all code, regardless of whether ‘from __future__ import generator_stop’ was used or not.
The last version clearly exposes that you have an infinite loop. This is something hidden by the `while(evaluate)` expression.
Also, as far as pure form goes, I would have preferred `while (evaluate) as x:`, to establish a parallel with existing syntax, but that's not very important.
On the contrary: the loop isn’t fundamentally infinite, it’s just an artefact of the language that you have to write it that way. The walrus lets you put the termination where it belongs.
I don't see how the walrus has any significance with regard to `while` loop termination. It's the nature of the evaluated expression that makes the difference; more precisely, whether it returns something that evaluates to False.
In that sense, while loops over expressions always require knowledge of the expression to understand the semantic. On the other hand, something like:
while True:
    with x as y:
        print(y)
would not require understanding of x to understand the intent. Therefore it is not semantically equivalent to:
while x as y:
    print(y)
Note that in both cases, `:=` is strictly equivalent to `as` as it is defined for `with`. That is a matter of personal preference.
for line in f:
    if (m := pat1.search(line)) is not None:
        ... do stuff ...
    elif (m := pat2.search(line)) is not None:
        ... do other stuff ...
    elif (m := pat3.search(line)) is not None:
        ... do something else ...
In older Python that's:
for line in f:
    m = pat1.search(line)
    if m is not None:
        ... do stuff ...
    else:
        m = pat2.search(line)
        if m is not None:
            ... do other stuff ...
        else:
            m = pat3.search(line)
            if m is not None:
                ... do something else ...
I think the newer version makes it clear that it's supposed to be a simple elif chain, with all branches following the same structure.
There are other ways to structure it, but the alternatives I can think of also have their own cumbersome complexities.
If actions are different, I tend to put them in functions (independent testable pieces, yay!) and iterate over a mapping:
LINE_ACTIONS = (
    (re.compile("pattern 1"), do_stuff),
    (re.compile("pattern 2"), do_other_stuff),
    (re.compile("pattern 3"), do_something_else),
)
...
for pattern, action in LINE_ACTIONS:
    m = pattern.search(line)
    if m is not None:
        action(m)
        break
Or even use a method registry pattern that would auto-populate LINE_ACTIONS by just declaring the functions:
@action("pattern 1")
def do_stuff(m):
    ...
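A minimal sketch of how such a registry decorator could look; `action` and the list-based LINE_ACTIONS are illustrative assumptions, not code from the parent comment.

import re

LINE_ACTIONS = []

def action(pattern):
    def register(func):
        # Compile the pattern once and remember the handler alongside it.
        LINE_ACTIONS.append((re.compile(pattern), func))
        return func
    return register

@action("pattern 1")
def do_stuff(m):
    ...

@action("pattern 2")
def do_other_stuff(m):
    ...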
Alternatively, I might just break the processing into a function:
def _process(line):
    m = pat1.search(line)
    if m is not None:
        ... do stuff ...
        return
    m = pat2.search(line)
    if m is not None:
        ... do stuff ...
        return
    m = pat3.search(line)
    if m is not None:
        ... do stuff ...
        return

_process(line)
Of course, this depends on the purpose. Could be completely inadequate in some situations.
I've found that I don't like using it. As you write, it's "inadequate in some situations", which I consider part of the cumbersome complexities I mentioned.
For example, do_stuff() and do_other_stuff() may need to share variables, and do_something_else() might need the line number to report an error while the others don't.
This can be handled with shared state/nonlocal, and by passing in more parameters to the generic handler API, but these add complexity.
Or, different parts of the file may have different line dispatch processing (e.g., a header block followed by a data block followed by a footer block), where one of the handlers must indicate the transition to a different processor.
Also, function dispatch in CPython is slow. While the regex tests are also slow, it can still be important to consider the (hypothetically) 5% overhead of dispatching compared to inline code.
If you only have one or two regexes like that then there is no difference in readability. If you have ten in a row, though, then suddenly your function no longer fits on the screen and it's more difficult to understand what's going on.
IMHO the whole `re.match` API is a design flaw in the stdlib.
re.match() should always return a match object; on a failed match, .group(1) would then return None. That way we could write one-liners more easily, without the walrus operator.
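A hedged sketch of that idea as a user-level wrapper, since the stdlib doesn't work this way; `search_or_null` and `_NullMatch` are made-up names.

import re

class _NullMatch:
    def group(self, *args):
        return None          # every group "matches" nothing
    def __bool__(self):
        return False

def search_or_null(pattern, text):
    return re.search(pattern, text) or _NullMatch()

# One-liner without the walrus operator:
first_number = search_or_null(r"(\d+)", "abc 42").group(1)   # '42', or None on no match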
I think there are a lot of use-cases for some kind of dedicated "if-with-outputs" syntax, where you have some extra variables available inside the if-block if the condition matched. Such a syntax could cover a lot of the problems of double execution but also prevent hard to understand code like the walrus operator.
... I've got no idea how the syntax could look though.
How would this hypothetical syntax differ from the walrus operator? I.e. what is hard to understand with the walrus operator that wouldn't be with this new hypothetical syntax and why?
Ah ok, so you'd essentially allow it in fewer contexts? I would definitely have supported that, although I do think combining walrus with other boolean expressions in if and while is super useful and not particularly hard to understand.
It comes up somewhat rarely but it is useful at times. But of those rare times, it is exceedingly rare to require any more complexity than you've shown above, such as having two assignments in one statement.
That was the reasoning given for using ":=" instead of "as", to allow more complexity. I still think it was a mistake.
Watch out with this one, though. In this case it's good, but if you receive numbers from such a function, it will break the loop not just on None but on 0 as well. Easy to forget.
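To illustrate the pitfall: with a bare truthiness test the loop stops at 0, not only at None. The iterator below stands in for "such a function".

values = iter([3, 0, 7])

# while v := next(values, None):                 # would stop as soon as it sees 0
while (v := next(values, None)) is not None:     # stops only when the source is exhausted
    print(v)   # prints 3, 0, 7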
Even the examples in the spec show how unintuitive and "unpythonic" this is. Explicit is better than implicit.
IMHO, adding features to the language to save one line of code in the 10% of cases where you need it (I agree that there's the occasional case where the walrus will save you more than one line) is just bloat.
I am not a big proponent of Go, because it has its own flaws, though the language is indeed very simple and its creators try to keep it simple.
IMO Python was very readable, super simple, and intuitive, and it should stay that way, though recent releases show that Python is giving in to feature bloat.
EDIT:
> Try to limit use of the walrus operator to clean cases that reduce complexity and improve readability.
I've been using Python since 1.5 and I feel like the language itself has been feature-complete for some time. In the 2.0 era, the big pain points were async and unicode; as demonstrated by, well, how painful Twisted's unicode was. ;)
But now, if our big pain point is assigning a variable to len() and then, two lines later, evaluating that variable and printing it out, Python is just trying to find stuff to add.
Yeah. I don't like this obsession that so many developers have with adding features without end. I think it's mostly so that they can have something to do; don't think about the bigger problems, just have something to do so that they can fill out their timesheets and submit it at the end of the week.
I vehemently disagree - these walrus vars are still in scope outside of the condition block. And on top of that, now there are special conditions for these walrus vars which are completely non-obvious.
I understand that nobody will force me to use this feature, but as I said before, even the spec and the write-up tell you how confusing and complex this feature is.
I know, and I believe there is a reason it comes right after that line. Writing only simple code is not enough: if you write everything in a simple manner, your code will most likely end up complicated. That's exactly why the next line is "complex is better than complicated".
I see this feature the same way as comprehensions: scary and complex at first, but once you learn it, you wish everyone would use it.
> As a developer who has primarily developed applications in Python for his entire professional career, I can't say I'm especially excited about any of the "headlining" features of 3.8.
Python is a fairly old, mature language.
What features would you have been especially excited about?
The f-strings from 3.6 are a (relatively) recent feature that I have absolutely loved. I'd go so far as to say they are my favorite feature introduced by Python 3.
I'm also looking forward to PEP-554 [0], which allows for "subinterpreters" for running concurrent code without removing the GIL or incurring the overhead of subprocesses.
Sounds like PEP-554 is actually included in 3.8 on a "provisional" basis, which I guess just means they reserve the right to change the API. That's a killer feature for me, thanks for pointing it out!
As someone familiar with the Python C extension API, I somewhat doubt that it'll make 3.9 either.
Disclaimer: I'm only familiar with CPython as a user of the extension API; I have no idea on how the Python team plans to address the challenges I'm mentioning here, and what their current progress is.
The current extension module API encourages global state, e.g. types (and objects too) are allocated statically in a global C variable.
For example, there is a global C variable `_Py_NoneStruct` that is the Python `None` value, and extension modules are accessing this variable directly.
Every use of this object needs to adjust its reference count, and that reference count is directly stored within the C global variable.
`_Py_NoneStruct` is currently even exposed in the PEP 384 stable ABI. Existing extension module binaries are commonly directly touching `_Py_NoneStruct.ob_refcnt` without any synchronization. Breaking the PEP 384 compatibility promise is fundamentally unavoidable here.
One of two things must happen:
Alternative one: All refcount operations must be made atomic for thread-safety. These are really really common in the Python interpreter, but atomic operations are expensive on modern CPUs (especially if there's contention). But multiple threads using the value `None` would be quite common in Python code, so I doubt you'd gain any speed even at today's core counts -- in fact I'd expect the constant inter-CPU-communication for the refcounts to make everything slower than just using a single core with today's Python!
So alternative two, ensure no Python objects are shared between the subinterpreters. That's the plan.
But that also means it's a breaking change for extension modules. And it's not just an ABI change (which would be handled by merely recompiling against the new headers). Any extension modules that do not yet support PEP 489 are already incompatible with subinterpreters, so that will take quite a bit of work until the ecosystem is upgraded.
But there will probably also be some other breaking API changes. I think type objects are currently shared across subinterpreters, and those are frequently defined as a `PyTypeObject` global variable in extension module code. Also, if every subinterpreter has its own GIL, extension modules calling `PyGILState_Ensure()` will have to specify which subinterpreter they will be using, so that the appropriate lock can be acquired.
My prediction: 3.9 may have the basic functionality, but it still won't be able to run on multiple cores concurrently.
That will take a bunch of more work and breaking changes, and will likely be released as Python 4.0.
There will be another slow upgrade process ("my dependencies must upgrade before I can") until the Python ecosystem is multi-subinterpreter-compatible. But at least this one only affects extension modules.
f-strings are great. Much nicer than ".format". I hope in a future iteration of the language all strings will be f-strings by default, avoiding the need to prefix them with a silly "f".
I meant for inline strings (i.e., typed by the programmer). This is an inoffensive change. The only possible "accident" is when the programmer wants to write "{x}" literally instead of the value of x. This is such an exceptional case that it may be best handled by forcing the programmer to escape the curly brackets.
If anything, user-input strings must be treated as tainted whatever the case.
Then you would run into the same problem Kotlin has with its raw strings – the brackets need to be escaped, but raw strings disable escaping, so you'd have to write r"{}" as r"{'\{\}'}" which is quite ugly and not very raw at all.
That would break any existing strings that use curly braces, including most legacy use of str.format and docstrings that include code examples with dictionaries. It would be a big backward compatibility issue.
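For reference, how braces behave today: plain strings keep them literal, f-strings interpolate, and doubled braces escape inside an f-string.

x = 42
print("{x}")     # {x}
print(f"{x}")    # 42
print(f"{{x}}")  # {x}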
I wouldn't mind CPython performance improvements over language features. Specifically startup time. Python CLI tools can easily take up to 300-600ms to start. Now, try to use them for scripting: a single script doing a bunch of calls to Python CLIs already carries a several-second runtime penalty.
Though not a language feature per se, I would love it if the language took on the packaging ecosystem and delivered a ground-up approach that isn't just a kludge on a kludge on a kludge.
It's not great, but it also isn't terrible. The problem with Python packaging is that there were many ways to do it and people are confused. What's worse, there are tons of articles (including from the PyPA) that provide bad information.
If you use setuptools and place all your configuration declaratively in setup.cfg it is not that bad.
I've heard people recommend poetry on here as something that is good for package management, but I haven't tried it. Does anyone know of something like that, but also can build your application into a Docker container as well?
Into a docker container as well? The package manager isn't really related to docker though?
Does your container expose ports or need volumes? Does it need gunicorn or uwsgi in front of it? What about system packages? None of those (except maybe the last one) are really in the scope of the package manager.
The main use case is I have a project with a setup.py that has the normal stuff in it, and there's a main entry point in one of the files. I then make a dockerfile that installs that package in a container and runs the main file as the entry point. Ignoring things like ports, it would be nice to emit a standard dockerfile like that, since it's very common.
You might like portage. We have USE flags (which are ./configure --stuff), slots (so you can install multiple versions in a clean way), and any kind of dependency tree you can think of. When I am thinking of upgrading the system python, I can enable the next version and let things simmer for testing (building for py3.7 and 3.8 for example) before I actually throw the switch (eselect python set N).
I think that feature has exactly zero chances of getting adopted, but that's probably one of my top Python annoyances too.
If it's any consolation, the scope of a variable isn't determined by heuristics, it's just that the rules are (IMO) kinda bad. Basically, if a variable is assigned to (in addition to `=` and friends, `import`, `class` and `def` are also assignments in disguise) in what Python calls a "block" (which is not like a C-style block; rather a `class` body, `def` body, or the top level), it is considered to belong to that block (with the notable exception of the name for a caught exception in an `except Exception as e: ...`, see [1]). If you want it to refer to a variable outside the block, you need to use `global` or `nonlocal` as appropriate. The full gory details are in [2]
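A small illustration of that rule (my own example): assignment anywhere in a function body makes the name local to that function unless you declare otherwise.

def counter():
    count = 0
    def bump():
        nonlocal count   # without this, `count += 1` raises UnboundLocalError
        count += 1
        return count
    return bump

b = counter()
print(b(), b())   # 1 2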
Yes, I meant the rules are bad. I called them heuristics, because I assume they came up with the rules as a heuristic to what would be the most useful behaviour.
Python didn't use to have lexical scoping, so the rules are retrofitting around that.
Yeah, it all seems a bit hacky. Especially the bit where a caught exception bound to a variable is cleared at the end of the `except` clause. I'm pretty sure it's because if you didn't you'd get a reference cycle (exception object -> stack trace -> function invocation record -> function locals -> exception object), which makes CPython's GC sad.
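A quick demonstration of that clearing behavior in Python 3:

try:
    1 / 0
except ZeroDivisionError as e:
    print(type(e).__name__)   # ZeroDivisionError
print("e" in vars())          # False: the binding was removed when the except block ended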
Correctly done multi-line lambdas/expressions could bring Python's expressivity to where ECMAScript is nowadays, where you can write either foo = function(...) or function foo().
I would honestly like an option to "compile" Python to check for typing inconsistencies with the type hinting introduced in Python 3. Also, the multithreading story is kinda shitty.
The suffix is just "p" in Common Lisp[1], so I think the "predicate" explanation is more likely to be correct than the visual analogy to the question mark.
There have been many times when I've wished python had something like the walrus operator.
In particular, without the walrus operator you can't write an if-elif chain of regex comparisons. The best way to do that that I've been able to find is with a bunch of nested if-else statements.
(source code for Eveem.org, which is arguably the best decompiler for Ethereum smart contracts out there. You can see a lot of pattern matching in pano/simplify.py, and I found no way to do it without extending the language/walrus while maintaining readability)
I went to your repo, and it wasn't clear to me what Panoramix actually is or does. So I checked out Eveem.org, and even on the about page, it wasn't clear to me what Eveem was.
But with that context, coming back to your GitHub page, I was able to get the gist of it.
It might be helpful to have a section above the installation section that gives a little context / tells about the project.
And in your example section -- again, I literally don't know anything about decompilation so maybe I'm not your target demographic... but if I understand the gist of it, kitties is an example function or binary. It might be better to use something more concrete (and provide the context).
After a quick read through, I personally find this syntax much harder to read using operator overloads than it would have been using a series of comparison functions. Tilde is a very obtuse operator and I don't know that this really buys you anything.
Feels like a case of preferring cleverness over readability/usability/maintainability.
I actually had a series of comparison functions initially, but those weren't that readable.
To give you an example, let's look at this statement:
if exp ~ ('mask_shl', int:size, int:offset, -offset, :e):
    ...
It is the same as writing:
if type(exp) == tuple and len(exp) == 5 and exp[0] == 'mask_shl' and type(exp[1]) == int and type(exp[2]) == int and exp[3] == -exp[2]:
    size, offset, e = exp[1], exp[2], exp[4]
    ...
If you can figure out a syntax that makes this sort of match more readable, I'll gladly use it :)
if exp ~ ('mask_shl', int:size, int:offset, -offset, :e): ...
vs
if type(exp) == tuple and len(exp) == 5 and exp[0] == 'mask_shl' and type(exp[1]) == int and type(exp[2]) == int and exp[3] == -exp[2]:
    size, offset, e = exp[1], exp[2], exp[4]
    ...
There have been dozens of working C++ competitors over the years, each one technically as good as C++.
They all failed because their authors are good at writing languages, but very bad at knowing how languages are used in the real world.
The C++ committee really knows what it's doing, you're not going to do anything useful in this space without actually understanding the problem domain.
TL;DR - yes, C++ is complex, but that is because programming is complex. Your attitude of "programming is hard, let's go (language) shopping" will not displace C++.
C++ made a number of... unfortunate decisions early in its development, which resulted in a much worse language than it could have been. (Hindsight is 20/20, of course.)
I totally agree that the C++ of the 21st century is way better than the C++ from 1998, and it is becoming better. The problem is that maintaining backwards compatibility requires keeping a number of footguns in place, and some of them are cornerstones of the language :(
I have high hopes for Rust becoming the more sane competitor in the space traditionally occupied by C++. Its authors seem to understand the problem domain from a very practical standpoint, and are proficient in C++, to begin with. They also strive to apply reasonable design principles, and some actual math hopefully will keep the core of the language from logical pitfalls.
Another example of hitting a sweet spot is Go. I dislike many of Go's language decisions. (Some are definitely great, though, e.g. the whole OOP approach lifted from Oberon.) But its authors definitely understand both the problem domain and the target audience.
This is what I mean by talking about success despite the flaws. C++ has both flaws and merits, and merits outweighed the flaws. But the flaws are still painful.
There is a similar trick that you can use without the walrus operator. I wrote the code many years ago and I am reproducing this from memory, so the code below may be a little off. Define this class:
class Any:
    def __eq__(self, other):
        self.value = other
        return True
Then you can use an instance of it as the hole in a pattern. My use case was performing peephole optimizations in a simple 6502 assembler. For example, if the LDA operation (load value from memory location into accumulator) occurs twice in a row, the first load is redundant. So I did this test:
x, y = Any(), Any()
if L[i:i+2] == [('LDA', x), ('LDA', y)]:
    L[i:i+2] = [('LDA', y.value)]
(Note that the last line changes the length of the list, so it may be slow on large lists.)
I was going to write a blog post about this back then, but I never did. If anyone is interested, I can write it now.
This is super interesting! I've never seen codecs used to introduce language features like this... Do you have any good references for getting started with using codecs?
Also, I wonder if there's an easy way to implement this in a jupyter notebook, without fiddling with the kernel...
(issues of whether it's a good idea aside, definitely seems useful to know about.. )
Thanks! The idea for using codecs is not mine - found it on StackOverflow somewhere. Googling for "how to add custom language statements python" should give you some results, this included :)
Also, in my case I had to use codecs because a new operator causes the AST parser to fail. But if you have some kind of syntax that is compatible with the Python AST, then you can implement it even more easily.
You can google Python AST to get a lot of useful information :)
The reason assignment expressions were initially not allowed in Python, and why they had to introduce the "walrus", is that languages that use a single equals sign for assignment make it easy to create bugs by typing "=" instead of "==".
I find it strange to sacrifice syntactical simplicity for bug prevention in a language that needs deep testing anyway because it's neither nil- nor type-safe. But maybe I'm just too used to inline "=".
Every time I come across a use case when the "for else" construct would be applicable, I decide against using it just because the "else" keyword is just very unintuitive to people not familiar with this rarely used construct ... so yeah, I too wish it was a different keyword :)
PEP-3136 was proposed and discussed more than 10 years ago in the 2->3 process. The Python community was very different then. As an example, Python 3 was also the language that removed function argument tuple unpacking, to which my reaction is basically WTF.
> Over complicating a language with rarely-used features can definitely create problems.
Agreed.
There's a number of (full) languages that compile to the Python AST [0] (I'm especially fond of http://hylang.org), but they're very different from Python. It would be interesting to see smaller variations of standard Python implemented in an interoperable way like these languages do.
Not all releases have to be groundbreaking, especially in a mature and stable language. Cleaning things up and putting other things in place for major releases is commendable too.
I did not follow all of the conversation around the typing system, but isn't the whole point of it to propose optimizations in the future? I find it exciting.
> Same with the forced positional/keyword arguments
I also thought "why bother?" at first, until I learned this is already a feature in Python, but only for C functions. So of course it makes sense to level the playing field.
The audit hooks are going to allow some really interesting things, especially around security controls on executing Python code. There is hope for sandboxed Python execution!
Rock and a hard place, no? Either you keep adding to a language and people keep using it, or you stop adding to a language and people call it a dead language and stop using it.
Most devs don't understand that software can be done and still be useful.
Actually, Guido approved the walrus operator. (In response to the justified backlash, instead of canceling the feature, he resigned as BDFL, which is not what anyone wanted)
Wow: IMHO that is a very ugly syntax for defining a function with 6 parameters, although it looks to be a path dependency on previous decisions (the existing `*` marker, and CPython's `/` convention), and clearly there was much discussion of the need for the functionality and the compromises: https://www.python.org/dev/peps/pep-0570/
It amazes me to see how certain features make it into languages via community/committee (octal numbers in JavaScript - arrgh!).
One thing I find really difficult to deal with in languages is the overloading of different syntactic (edit: semantic) usages of different symbols, which itself is a result of limiting ourselves to the ASCII symbols that can be typed. I don't recall written mathematics having the issue as badly (although I haven't had to write any for a long time!).
Yes, "path dependence" is a good way to describe it. (And Python has been my favorite language for 16+ years now)
For https://www.oilshell.org/ , which has Python/JS-like functions, I chose to use Julia's function signature design, which is as expressive as Python's, but significantly simpler in both syntax and implementation:
Basically they make positional vs. named and required vs. optional ORTHOGONAL dimensions. Python originally conflated the two things, and now they're teasing them apart with keyword-only and positional-only params.
A semicolon in the signature separates positional and named arguments. So you can have:
func f(p1, p2=0, ...args ; n1, n2=0, ...kwargs)
So p2 is an optional positional argument, while n1 is a required named argument.
And then you don't need * and ** -- you can just use ... for both kinds of "splats". At the call site you only need ; if you're using a named arg / kwargs splat.
----
Aside from all the numeric use cases, which look great, Julia has a bunch of good ideas in dynamic language design.
The multiline strings in Julia appear to strip leading space in a way that's better than both Python multiline strings and here docs in shell.
Also Julia has had shell-like "f-strings" since day one (or at least version 1 which was pretty recent). Python has had at least 4 versions of string interpolation before they realized that shell got it right 40 years ago :)
Yes it can't, unless of course you just make two variables (one positional and one named) and inside the function you choose one.
Because of Julia's multiple dispatch paradigm, positional arguments are special compared to named arguments, because they decide which method to dispatch to (in Julia, f(x) is equivalent to x.f() in an object-oriented language, and this extends to all positional arguments). That means that with f(a, b=nothing, c=nothing), calling f(1, c=1) would dispatch to f(Int64, Nothing, Int64), while with f(a; b=nothing, c=nothing) the same call would dispatch to f(Int64). In Julia, named arguments are effectively a way to pass more arguments without complicating the dispatch rules, and since there is only one way to call a function (outside of optional arguments, which appear to the end user as another implementation of a function), there is no ambiguity about where it dispatches to.
So basically, every language has its own quirks, which the syntax decisions usually reflect, and Julia's scenario is fundamentally different from Python's.
Yeah that's true, but I consider it a feature and not a bug.
There was a rule in Google's style guide that said to only pass optional arguments with names, and required args without names. It basically enforced this separation while being slightly less flexible, because you can't have optional positional args or required named args.
Tens of millions of lines of code were written by thousands of programmers in that style, and it caused no problems at all. On the contrary, it made code more consistent and readable.
I’d argue that written maths is worse with its overloading of the Greek alphabet. Although in the case of maths I’d argue that it’s a result of how slowly we write, with the programming equivalent usually being a variable name (autocompleteable).
Yes, math is a much bigger culprit. Including for operator overloading.
Their saving grace is that humans are more intelligent interpreters.
If you ever try to translate a human-readable proof, even a fairly formal and rigorous one, into a computer proof assistant like Agda or Coq, you can see all the little ways humans cut corners.
There are programming fonts that use ligatures to convert >= to ≥, or -> to →, so the source code remains in ASCII but symbol sequences show as unique Unicode characters.
A next step could be for common dev environments to actually convert symbol/key sequences to operators (how do APL programmers do it?)
Avoiding ambiguity and semantic overloading of ASCII symbols would surely help beginners (if also given a UI that clearly exposes ways to enter the new symbols). I always find one-letter operators extremely strange too, like u"word", s/foo/bar/, etc.
It seems a shame we can type in hundreds of Unicode symbols on a mobile virtual keyboard, but not readily on a physical keyboard.
JavaScript has supported Unicode for a long time, but the core language doesn't use it at all.
> There are programming fonts that use ligatures to convert >= to ≥
It's a neat hack, but that's the opposite of what I want. It means I can't look at source code and know what characters make it up, which is the primary reason I'm still putting up with plain text files (and mostly ASCII) for source code at all.
Out of habit, I occasionally type some control-letter combination which is a valid command in Emacs, but in Xcode (and other native Mac apps) inserts an invisible character, which happens to be invalid in Swift. 10 minutes later, I get a compilation error pointing at a line of code that looks perfectly valid. Frustrating!
That's like visual puns, and is limited to whatever symbols seemed important to the developer at the time.
How do you discover how to type ß, °, «, ‡ etc?
Android's gboard uses phonetics (long press [s] key for ß), symbolic similarity (long press [*] key for ‡) and visual similarity (long press [<] key for «) which is guessable for some symbols, but isn't discoverable for others (you can search for emoji by name, but not symbols).
The Mac has had, as a standard feature for decades, an on-screen keyboard to allow you to explore the results of different key combinations. It’s a great tool.
I should note that, alongside it, is a useful search tool that allows you to find the character you’re looking for if you’d rather just search and don’t care to learn the key combination (if there is one).
In Mojave, and it’s pretty similar across versions IIRC: open keyboard preferences in system settings and check the “Show keyboard and emoji viewers in menu bar” box.
That will replace the language flag icon in your top bar with an odd icon with the command key embedded.
The second option in that menu is the keyboard viewer. You can show that and drag on a corner to make it as large as you want. Dynamically changes the keyboard as you hold down modifier keys.
It's easy on the Mac to have a custom or purpose-built keyboard layout that makes sense for the user or context. It's a little harder with X (mostly due to uncoöperative DEs) but still doable.
Self-reply for Kwpolska: Adding layouts to /usr/share/X11/xkb/symbols/ is straightforward (but is necessarily system-wide, and requires root), and I do that. However, for the popular desktop environments, it's like pulling teeth to have that layout treated as first-class with respect to switching and settings.
(On MacOS, putting a .layout file in ~/Library/Keyboard\ Layouts is enough.)
X has Compose, which originated slightly earlier (the key, that is, not the X11/xkb implementation), and is a tolerable alternative. e.g. [Compose / =] ↦ [≠]
Putting double S on option-s just makes option-w for Sigma more grating. If they were going by shape, it'd be on option-b, which is equally stupid. But since it's on S, we can conclude that... someone at Apple knows German, but nobody can even name a Greek letter? That Germans are right and Greeks are wrong? What?
I don't think it's that extreme, but that German has higher precedence than Greek. If you need to include both characters, you have to pick one or the other, and for whatever reason this configuration happened. We might never know the true reason behind the decision, but stupidity is unlikely to be the answer. Most seemingly stupid decisions (technical or otherwise) make at least a certain amount of sense within their proper context, and it's easy to talk down to one when you don't run the risk of being judged by that decision.
The concept of variations on a common letter wasn't exactly new at the time. If you're supporting German and Greek, it seems safe to say you're also supporting French.
Windows has a keyboard called US-International with extra symbols. Also, note that those key combinations might not exist in other keyboard layouts (replaced with local keys).
Many languages allow that. Scala allows characters in the math symbols (Sm) and other symbols (So) Unicode categories as identifiers for functions and variables, and APL has a rich notation that relies on many single character operators.
I’ve been looking at making up a second keyboard with a bunch of useful characters on it (similar to Tom Scott’s unicode keyboard) and learning Perl 6 to make use of it. I’m thinking that’s going to be my Winter Project/New Year’s resolution this year.
They borrow syntax from LaTeX math-mode to allow symbol entry, such that you can type "\ne" and as soon as you hit the space after the "e", you get ≠ instead.
The language plugin for Visual Studio Code does the same thing.
Lean Prover's not exactly a programming language, but :
def foo (a b : ℕ) : ℕ → ℕ → ℕ :=
    λ a b, a + b
I think it has to be all or nothing though, since if it's optional the odds of getting decent editor support for it is low.
What bothers me is that the / is separated by commas, which makes it seem like the signature has one more parameter than it actually does (even if that is not the case technically). I haven't looked too carefully here though.
This got missed from the release announcement, but now there's `functools.singledispatchmethod`,[1] as the class method sibling to `functools.singledispatch`.[2] This allows you to overload the implementation of a function (and now a method) based on the type of its first argument. This saves you writing code like:
def foo(bar):
    if isinstance(bar, Quux):
        ...  # Treat bar as a Quux
    elif isinstance(bar, Xyzzy):
        ...  # Treat bar as an Xyzzy
    # etc.
I understand runtime type checking like that is considered a bit of a Python antipattern. With `singledispatch`, you can do this instead:
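A sketch of what "this instead" looks like with functools.singledispatch; Quux and Xyzzy are the same placeholder types as in the isinstance example, defined here only so the snippet is self-contained.

from functools import singledispatch

class Quux: ...
class Xyzzy: ...

@singledispatch
def foo(bar):
    raise TypeError(f"unsupported type: {type(bar).__name__}")

@foo.register
def _(bar: Quux):
    return "treated as a Quux"

@foo.register
def _(bar: Xyzzy):
    return "treated as an Xyzzy"

print(foo(Quux()))   # treated as a Quux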
With `singledispatchmethod`, you can now also do this with methods, where the type of the first non-self/cls argument is used to choose the implementation, based on its annotation (or on the argument to its `register` method). You could mimic this behaviour using `singledispatch` in your constructor, but this syntax is much nicer.
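And a small sketch of the 3.8 method form; `Negator` is a made-up example along the lines of the functools docs, not something from this thread.

from functools import singledispatchmethod

class Negator:
    @singledispatchmethod
    def neg(self, arg):
        raise NotImplementedError("cannot negate this type")

    @neg.register
    def _(self, arg: int):
        return -arg

    @neg.register
    def _(self, arg: bool):
        return not arg

print(Negator().neg(5))      # -5
print(Negator().neg(True))   # False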
The issue with this and `singledispatch` is that they no longer support pseudo-types from the `typing` module [1] so you can't use them with containers of type `x`, e.g. `List[str]`, or protocols like `Sequence`.
I don't understand why anyone would want singledispatch. Instead of having the function defined in one place where you can look it up, now the function is potentially scattered all over the place. (I'm not talking hypothetically. I've had the 'pleasure' of working on a codebase where different singledispatch cases of the same function were defined in different files!)
Because if you need to branch based on type more than a few times in your function it can get pretty hard to read. Am I right that the basis of your complaint is that it's now harder to find all the members of (what you'd call in C++) the "overload set" for a particular function? If so I can see your point.
Yes, that's my point. And it doesn't really simplify the function itself, it just rearranges it, in the same way you can break up a complex painting into jigsaw pieces and say, "Look, each piece is simple!"
I think the answer to the question "should I apply X refactoring technique to this function Y?" obviously has to depend on both X and Y. There's clearly a trade-off here. If I have a free function foo(x) that I want to work differently depending if x is type A or type B, splitting that means breaking up foo, sure, but because I can only ever call foo with an x of type A or type B (unless I'm converting some As to Bs halfway through foo or something awful like that) it might be useful to see all of the "A" logic in one place and all of the "B" logic in another place.
I can definitely imagine some places where this replaces type-checking, but it still seems like a bit of an unfortunate anti-pattern to me, since it's really a sort of C/C++ style function prototype match.
My immediate thought is that it's going to be hard for PyCharm to reliably point me to a function definition.
It can also do dispatch on multiple arguments and on equality and has dispatch cache implemented as C extension (from my rudimentary measurements it seems that dispatch with cache hit is actually slightly faster than normal CPython method call).
I saw that and was like "Oh, that'd be a handy feature for a lot of other programming languages too." But the more I think about it, the more I'd rather have a feature that takes a list of expressions and converts it into a dict where the keys are the expression text and the values are the values of those expressions. Basically, syntactic sugar for this:
{ k: eval(k) for k in ('theta', 'delta.days', 'cos(radians(theta))') }
This would trivially subsume the f'{user=}' syntax for the example given: just print out the dictionary. But it'd also be useful for: filling template dictionaries; printing out status pages for HTTP webservers; returning multiple variables from a function; flattening out complex data structures; creating dispatch tables out of local functions.
You could even have a syntax like locals('theta', 'delta.days') and keep it familiar.
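For reference, this is the 3.8 self-documenting f-string form being discussed, reusing the names from the example above:

from math import cos, radians

user = "alice"
theta = 30
print(f"{user=} {theta=} {cos(radians(theta))=:.3f}")
# user='alice' theta=30 cos(radians(theta))=0.866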
Pretty trivial to do with proper macros in a language. Julia has macros for printing variables and values, like `@debug(var)`.
There's an Elixir library with macros to make a map (dictionary) using the variable names passed in [1]:
iex> import ShorterMaps
...> name = "Chris"
...> id = 6
...> ~M{name, id}
%{name: "Chris", id: 6}
Though whether it's a good idea or not is another question. ;) If you want to do the type of programming you're talking about, you should try Elixir/Julia/Clojure(/Rust?)... or any number of other languages with macros.
With logging.info("%s %s", name, country), the string isn't expanded if the logging level is set to warn. With your example, the string is expanded and then discarded.
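To make the difference concrete (a small sketch; the names are placeholders): %-style arguments defer formatting until the record is actually emitted, while an f-string is built before logging even decides to drop the message.

import logging
logging.basicConfig(level=logging.WARNING)

name, country = "Ada", "UK"
logging.info("%s %s", name, country)   # filtered out; the args are never formatted
logging.info(f"{name} {country}")      # formatted eagerly, then discarded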
Why didn’t Python ship with the opposite functionality as well? Parsing instead of formatting. Given a string and a format string, return a list of variables (or a dictionary).
For example, to checkpoint a model, I would save it as “ckpt-{epoch_number}-{val_loss}”. Given this file name and the original format string, I would like to recover the epoch number and validation loss variables back.
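A hedged stdlib-only sketch of that "reverse format" idea using a named-group regex (the third-party `parse` package does this more generically); the file name here is a made-up example.

import re

name = "ckpt-12-0.3456"   # produced by "ckpt-{epoch_number}-{val_loss}"

if m := re.fullmatch(r"ckpt-(?P<epoch_number>\d+)-(?P<val_loss>[\d.]+)", name):
    epoch_number = int(m["epoch_number"])
    val_loss = float(m["val_loss"])
    print(epoch_number, val_loss)   # 12 0.3456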
The arguments were based around also supporting complex multiple assignments and being able to do them anywhere, not just in if… and while…, list comps, etc.
Of course, neither of those things has been seen in the wild or in production code since. Some folks complain that even the simple case above is less readable.
"as" was the Pythonic choice, rather than the C/Pascalic one.
I think I remember seeing something about the "as" syntax. Wish I could find what I read, IIRC some arguments convinced me that the walrus operator was an improvement.
> I'm still looking forward to using assignment expressions for testing my re.match() objects.
The fact that this is the go-to example that everybody is using in justifying the introduction of the assignment expression convinces me that the real problem lies with the re module's API.
(To be clear: I will also be using assignment expressions for this case, but I don't think assignment expressions are really in line with the overall design of Python.)
Here the "better" solution (imo) would be support for pattern matching:
match my_map.get(key):
    case None:
        # do thing (or nothing, i.e., pass)
    case Some(val):
        # do other thing
Scala seems to have figured out how to get pattern matching over arbitrary data (i.e., not statically-defined algebraic datatypes). I'd like to see this come to Python.
(Normally I'd fight for pattern matching to always provide static type safety guarantees, but in Python it seems completely reasonable to omit such checks.)
EDIT: But you're absolutely right that this is a place where assignment expressions will be used regularly, so thank you for pointing that out!
I don't write much python so there's probably something obvious I'm missing, but I don't see why they didn't use "=". Is there some significant difference between assignment expressions and assignment statements that makes it worth having distinct syntax?
It has been the source of many errors in many languages. Forcing assignment to differ from the equality operator by more than a one-character change prevents this.
It depends on the definition of edit distance used; they both have an unweighted Levenshtein distance of 1 from “==” (# of inserts, deletions, or substitutions), but “:=” has an LCS distance of 2 from “==”, vs 1 for “=”.
Perhaps more importantly, “=” and “==” are more visually similar than “==” and “:=” and also easier to mistakenly type for each other.
I should have said single-character additions or deletions. "==" and "=" have a Levenshtein distance of 1; "==" and ":=" have a distance of 2. You can't simply make a typo and change an equality check into an assignment.
So, the wiki defines Levenshtein distance as including substitution, but I was taught, or at least remember, it only including additions and deletions?
It is, but many intro programmers (and non-intro programmers!) often forget to use == instead of = for comparison, because in the real world, = more often implies equality, rather than assignment. So a novice might mistakenly type `if x=1:` but they would be unlikely to accidentally type `if x:=1:`.
> Is there some significant difference between assignment expressions and assignment statements that makes it worth having distinct syntax?
Given that Python is statement-oriented, yes, having statements visually distinct from similar expressions is important.
It's also important to avoid making the equality operator and the assignment operator visually similar or easy to typo one for the other, which is arguably the bigger need for “:=” vs “=”, since “=” and “==” are quite similar and easy to accidentally mistype for each other.
Yeah, it's super unfortunate how the existence of py27 security patches magically stops all py3 users from doing any work. It's sort of like how the existence of C prevents anyone from using Rust.
Sorry, not good enough. The decision was made, good or bad, 11 years ago. Everyone had plenty of time to come to grips with that reality, anyone still lingering on Python2 is bad at their job.
The bad decision now is to allow the split to continue, and RedHat is allowing that to happen due to greed and ignorance.
It’s a business play. If companies don’t want to move off 2.x and are willing to pay a Software vendor to backport security fixes so that they can CYA, so be it.
Easy. Don't buy products that are obviously still using python2. Problem should solve itself in some time. Hard to tell sometimes, but looking at dependencies and plugin languages is a good way.
The only software I've written in Python that's been sold was written against 2.5 and has long been out of my control. We sold the source to the sole client (was a financial services migration tool for a very specific domain during a joint venture; I can't divulge anymore).
I no longer work at the company, but I'd wager that the client wouldn't have been willing to pay $500/hr (the rate my company billed me out at for support, features, etc.), 10 years ago, to have me port the app and the handful of 3rd-party dependencies to Python 3.
Moving from 2 to 3 took a while. I moved once the libs I used moved. For an "app" developer, the migration was easy once my dependencies were ready. Most of the changes were straightforward. The print statement becoming a function was easy. str becoming Unicode and introducing bytes has been a headache. I still have issues from time to time with text encoding, especially with encoded text coming from SQL Server (looking at the default CP-1252 for US English). Another one that still trips me up is needing a "newline=''" argument when opening a CSV file.
That said, I love Python 3. It's mostly about forgetting/relearning old syntax, which is going to happen with anything you've been using for decades that undergoes a significant change. I started with Python over 15 years ago, at version 2.2, for reference.
I think it's less that someone "can't get over Python 2.7" and more "why should I?". Nothing in 3 except bytes/string handling is compelling. Lots of 2 libraries haven't been ported. There is no legal reason to move and no other time pressure.
I would argue that people have been "getting shit done" with C for decades, despite newer shinier flavors of the month popping up, so your argument doesn't hold water to me.
If you want other people to make a change then it's on you to make a convincing argument for why the new thing is an improvement, not just go on tirades about how people should get with the times.
Personally, I lost faith in the Python core team because of the Py3 migration. Yes, 3.x now has a bunch of nice features that 2.x didn't, but almost none of them actually depend on 3.0's breakage (as proven by Tauthon).
If you want people to follow you through a break-the-world migration then you need to motivate why it is needed and why it couldn't be done incrementally, not try tempt them with a bunch of unrelated carrots that are bundled together with the breaking change.
> If you want people to follow you through a break-the-world migration then you need to motivate why it is needed and why it couldn't be done incrementally
They did. I remember looking through this, and I remain convinced that the core developers were correct and there was no way to fix Unicode handling incrementally.
Add the u-sigil for unicode (as they did), add the b-sigil for bytestrings (as they eventually did), and then go through a regular deprecation cycle for sigil-less strings (rather than releasing a 3.0 where the u-sigils were removed completely). Maybe at some point re-add sigil-less strings as an alias for u-strings, but I'd rather have old stuff break with a clear message than have a bunch of weird side bugs.
I've done that; this isn't the first time I've talked about this problem. HN is not a safe space, there is no room to convince anyone of anything on here, so tirades are all that's left.
We're well past your argument. It's been 11 years, it is no longer reasonable to hold your particular grudge. Get on board with the modern Python or get the hell out of the conversation.
the emotional content outweighs the objective situation here; also humorous since Luddites are commonly misunderstood per https://en.wikipedia.org/wiki/Luddite
A luddite, generally, is one who is anti-technology. A Luddite, specifically, is a member of a 19th century movement to prevent automation from taking their job.
It's insanely pedantic to try and point out the difference, given the ease with which one can find the generic definition.
> “Final” variables, functions, methods and classes. See PEP 591, typing.Final and typing.final(). The final qualifier instructs a static type checker to restrict subclassing, overriding, or reassignment:
> pi: Final[float] = 3.1415926536
As I understand it, this means Python now has a way of marking variables as constant (though it doesn't propagate into the underlying values as in the case of C++'s `const`).
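What that looks like in practice (note that CPython itself does not enforce it at runtime; only static checkers such as mypy do):

from typing import Final

pi: Final[float] = 3.1415926536
pi = 3.0   # flagged by a type checker; runs without complaint in CPython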
I don't see the point of `final` without an optimizing compiler. Name mangling is sufficient for stashing references to avoid accidental side-effects of overriding.
> The typing module incorporates several new features:
> A dictionary type with per-key types.
Ah, I've been waiting for this. I've been able to use Python's optional types pretty much everywhere except for dictionaries that are used as pseudo-objects, which is a fairly common pattern in Python. This should patch that hole nicely.
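A short sketch of the per-key-typed dict (TypedDict, PEP 589, new in 3.8); `Movie` is just an illustrative name.

from typing import TypedDict

class Movie(TypedDict):
    title: str
    year: int

m: Movie = {"title": "Blade Runner", "year": 1982}
m["year"] = "1982"   # a static checker flags this; at runtime it is still a plain dict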
Agreed--if you're going to go through the trouble of adding detailed type hints to describe a dict, you're like 90% of the way to a dataclass with better usability.
One of the features I'm looking forward to using is the kwargs support in dataclasses. Honestly, I find it so much more useful and intuitive than the 3.7 positional-only.
I am actually pretty sad that TypedDict has made it out of typing_extensions in its current state.
It was a major missed opportunity to provide a properly duck-typed Dict type, which by definition would allow untyped key-value pairs to be added to a "partially-typed" dictionary.
Gradual typing in general is a massive win for many kinds of real-world problem solving, but when you make it as hard as Python has to introduce partial types to a plain data object, you're leaving a lot of developers out in the cold.
I love MyPy and the static type hints since 3.6, but structural subtyping is superior and so obviously more Pythonic than nominal, yet support for structural subtyping keeps lagging behind.
Arguably one of the most hotly debated features is assignment-as-expression via the := or walrus operator.
Quite happy with the new SyntaxWarning for identity comparison on literals and missing commas :)
Especially neat is the new `python -m asyncio` shell, which allows you to run top-level awaits in your REPL instead of needing to start a new event loop each time!
I like the additions of the f-strings and the walrus operator, but I find myself wishing for a breaking release that removes the old features the new ones cover.
Python's philosophy was to have one way to do something, but the current situation of Python is very inconsistent.
Python 3 has like 4~5 ways to format strings, and due to the addition of the walrus operator, we have (I understand the differences between := and =, but still) two different syntaxes for variable declaration/assignment.
I understand that Python can't break all kinds of code (as the 2->3 conversion is still a pain), but I still imagine a Python-esque language without all the warts that Python has from its 'organic' growth.
That's a good use case for a linter. Ban outdated constructs in your code, but still allow you to depend on things that use them. Beats another 2->3 split again.
It is: "There should be one-- and preferably only one --obvious way to do it."
The core of that statement is "There should be one obvious way to do it" - there's no "only" there, it just says that when you need to do something, there should be an/some obvious way to do it. Then, preferably, that should be the only obvious way — though of course that doesn't preclude there being many other less-obvious ways.
With string interpolation we've certainly now got multiple ways to do it; that doesn't violate the principle. I agree that at least two of those are "obvious" (f-strings and .format()), and you could argue that %-interpolation is obvious too — but none of that is in violation of the principle that there should be an obvious way to do it, just of the preference that there's only one.
Finally it's worth remembering where this idea came from: it was in contrast to the perl mantra of "there's more than one way to do it", and a reaction to the resultant confusion frequently experienced when reading someone else's perl code — this was python saying "we're not perl, we value clarity and comprehension". I think that much of that problem was bound up in perl's syntax choices, and that in the cases in python where there's more than one way to do it (say, string formatting, dataclasses/attrs/namedtuples/etc.), it's usually pretty obvious what machinery is actually being used. When was the last time you looked at a line of python and said "I have no idea what the fuck is going on here?" That was a frequent experience with perl in the heady days of the late 1990s.
That's a pretty good idea, deprecating & warning + providing automatic conversion utilities + using 'future' to make them errors...
But that won't happen :-(
And then import from past to remove warnings if you really want to use those constructs without warnings. In the next release, the feature would be removed, or would emit a deprecation warning even when used with from past.
Some deprecation warnings have recently been enabled for a developer's main source files, though not in third-party libraries, where they would be frustrating to the end user.
I try PyPy every 3 months. It has improved greatly, and for some periods I migrated to it. But for this particular project, most of the time is spent inside lxml, pandas, scikit-learn and other extensions. CPython is actually faster than PyPy for this project.
Maybe GraalVM / GraalPython can improve on this use-case.
Ugh, I hate assignment expressions. I liked that they were missing from Python.
I've been coding in algolesque languages for 20 years and hiding assignments inside of expressions instead of putting them on the left like a statement has always tripped me up.
But, but, but... now you don’t need to write that extra line of code! It’s going to make everything sooooo much better, code will practically write itself now.
I’ll just leave this here:
“There should be one—and preferably only one—obvious way to do it.”
At this point it's about as true as G not being evil.
Which is fine by me, it was a silly idea to begin with. What you really want is separated concerns that compose well, obvious here doesn't mean anything over there.
> The list constructor does not overallocate the internal item buffer if the input iterable has a known length (the input implements __len__). This makes the created list 12% smaller on average. (Contributed by Raymond Hettinger and Pablo Galindo in bpo-33234.)
Wow. I believe it's a stated goal of CPython to prefer simple maintainable code over maximum performance. But relatively low-hanging fruit like this makes me wonder how much overall CPython perf could be improved by specialising for a few more of these.
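A quick, unscientific way to see the effect (exact numbers vary by version and platform):

import sys

exact = list(range(1000))                 # len() is known up front: exact allocation
grown = list(x for x in range(1000))      # generator: the list has to grow and overallocate

print(sys.getsizeof(exact), sys.getsizeof(grown))
# The first prints smaller on 3.8; the precise byte counts depend on the build.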
It really is not supposed to be used unless you are wrapping a C library. Or at least that’s the original reasoning, I think. Let’s hope it doesn’t get abused.
Is shared memory one of those things that I should try to access/use immediately, or wait for someone to write a wrapper library around, due to the large number of edge cases/strangeness that might occur?
Why not both? You get a great feel for what the edge cases are and how they occur, and thus a better understanding of why decisions in "wrapper libraries" were made, when these edge cases bite you.
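If you want to poke at it directly, a minimal sketch with the new multiprocessing.shared_memory module (error handling and locking omitted; normally the attach happens in a second process):

from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

other = shared_memory.SharedMemory(name=shm.name)   # attach by name, usually from another process
print(bytes(other.buf[:5]))                         # b'hello'

other.close()
shm.close()
shm.unlink()                                        # release the block once everyone has closed it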
I really like the walrus operator, but I didn't realize how many "you shouldn't do this" and "this is too hard already" cases exist where they are discouraging using the walrus operator.
The new assignment expression is great, and I hope a safe-navigation operator (`?.`) and the Elvis operator (`?:`) make it into Python one day, too. I would love to write:
v = obj?.prop1?.prop2 ?: "default"
instead of long if conditions:
v = obj.prop1.prop2 if obj and obj.prop1 and obj.prop1.prop2 else "default"
Yes, but your example is less efficient than the new code.
Remember: every variable assignment is an insertion into a dict, and every name lookup a query into a dict. Reducing name lookups and assignments can yield a good speedup in tight loops. For instance, caching os.path.join and os.path.split into local names can significantly speed up tight loops iterating over a filesystem. os.path.split is potentially 5 dictionary lookups: checking locals, nonlocals and globals for os, then another to find path, and a final one for split. And this happens at runtime, for every invocation.
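The caching trick looks roughly like this (split_all is a made-up helper; the win comes from hoisting the repeated attribute lookups out of the loop):

import os.path

def split_all(paths):
    split = os.path.split                  # resolve os -> path -> split once, up front
    return [split(p) for p in paths]       # inside the loop it's a single local lookup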
Local variables aren’t stored in a dict, likewise when a class has defined __slots__. Globals, modules and usual classes do use dicts internally, but locals do not. So from the efficiency standpoint it’s (almost?) the same. I haven’t checked the bytecode, maybe there is some slight difference.
I checked the bytecode, in the walrus version you get [...DUP_TOP, STORE_FAST, ...] in the non-walrus you get [...STORE_FAST, LOAD_FAST, ...]. Besides that identical. I imagine DUP_TOP is faster than LOAD_FAST, but I feel either way this is useless micro optimization at its finest.
So looks like either way, you have an array access of something definitely in cache, a Py_INCREF, a PUSH, and a FAST_DISPATCH. The walrus operator saves you a null-check, but that check is probably skipped right over by the branch predictor, as it always throws. I'd bet the performance is indistinguishable, but I'd be interested to see for real.
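Easy to reproduce for anyone curious (function names made up; exact opcodes vary a bit between CPython versions):

import dis

def with_walrus(f, g):
    if m := f():
        g(m)

def without_walrus(f, g):
    m = f()
    if m:
        g(m)

dis.dis(with_walrus)       # ... CALL_FUNCTION, DUP_TOP, STORE_FAST (m), POP_JUMP_IF_FALSE ...
dis.dis(without_walrus)    # ... CALL_FUNCTION, STORE_FAST (m), LOAD_FAST (m), POP_JUMP_IF_FALSE ...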
It has been a while since I've looked at the implementation details. How does locals() work, then? It does return a dict. Slots are definitely an edge case I did not address. I honestly don't know how name lookup works in any version of Python.
Edit: I realize that local name lookup doesn't have to go through the dict that locals() returns.
I'm not a huge fan of the := operator in Python, for clarity and "one obvious way" reasons, but the draw is the saved line of code; that is the "help" here.
I feel the same way as you do. For me, which version of an interpreter I'm using should be the kind of issue I only need to worry about when solving extremely specific, deep-level problems. Python 3+ breaks this pact too often for my taste.
Considering this f-string example taken from another announcement:
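(The snippet itself isn't quoted here, but any hypothetical line in this spirit, combining both features, makes the same point:)

data = [2, 4, 6]
print(f"{(n := len(data))} items, mean {sum(data) / n:.1f}")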
This is valid Python 3.8, but it's not valid in Python 3.7 (no walrus operator). And removing the walrus operator still doesn't work in Python 3.5 (no f-strings). On top of that, other comments already mention how f-strings have lots of weird corner cases anyway.
The entire point of Python in my circle of friends was that it made programming easy. Instead, I feel more and more in need of those "It works in my machine!" stickers. And good luck solving these issues if you are not a full-time programmer...
I'm not sure why you frame this as an issue with Python 3. This has always been the case, even in the 2.x days every release added new features, and if you ran code using them in an older version it wouldn't work.
The minor releases are always backward-compatible, so just run the latest version and everything will work.
The issue is that the ecosystem as a whole tends to follow the tip of the version chain. This means that any human-oriented Python stuff (e.g. documentation/tutorials/code-review/etc...) requires you stay up to date with the language changes.
Asyncio was the worst culprit, as it essentially introduces an inner-platform with its own dataflow semantics, but at least there the upsides were large and tangible.
I don't understand why the walrus operator is needed at all. Why not allow this to work:
if m = whatever():
    do_something
Why create a new assignment operator? For all the talk about making code not confusing, etc., some of these decisions sure seem nonsensical.
Oh, OK, it confuses passing arguments into a function by name? Really? C'mon.
Also, I can't understand the nearly religious rejection of pre and post increment/decrement (++/--) and, for the love of Picard, the switch() statement.
I enjoy using Python but some of these things are just silly. Just my opinion, of course. What do I know anyhow? I've only been writing software for over thirty years while using over a dozen languages ranging from machine language (as in op codes) to APL and every new fad and modern language in between.
As I watch languages evolve what I see is various levels of ridiculous reinvention of the wheel for very little in the way of real gains in productivity, code quality, bug eradication, expressiveness, etc. Python, Objective-C, Javascript, PHP, C# and a bunch of other mutants are just C and C++ that behave differently. Sure, OK, not strictly true at a technical level, but I'll be damned if it all doesn't end-up with machine code that does pretty much the same darn thing.
The world did exist before all of these "advanced" languages were around and we wrote excellent software (and crappy software too, just like today).
What's worse is that some of these languages waste a tremendous amount of resources and clock cycles to do the same thing we used to do in "lower" languages without any issues whatsoever. Mission critical, failure tolerant, complex software existed way before someone decided that the switch() statement was an abomination and that pre and post increment/decrement are somehow confusing or unrefined. Kind of makes you wonder what mental image they have of a programmer, doesn't it? Really. In my 30+ years in the industry I have yet to meet someone who is laid to waste, curled-up into a fetal position confused about pre and post increment/decrement, switch statements and other things deemed too complex and inelegant in some of these languages and pedantic circles.
This is fair criticism. I've worked with Python for many years now, and one of the things that attracted me was that there was always a fairly static "Pythonic" way to do things. The language has now grown so much that 80% of the people only use 20% of the language, but not always the same 20%, making it difficult to read other people's code. And all the syntactic changes contribute to fragmentation (not every project can have its Python interpreter upgraded regularly).
There's something to be said for a more stable, if less elegant, language.
It is somewhat strange that Python does not have a binary tree in the standard library. I couldn't find any discussion on the topic either. It might be a nice contribution.
I suspect it's because you rarely need them in Python because the existing built-ins (list, tuple, dict, set) usually work well enough for a given job, or you're already using e.g. Pandas or something.
In Haskell we use sorted maps as persistent data structures. Persistent in this context means that the insert-method returns a new map and the old map is still around, if you need it.
It's basically copy-on-write. The old and new map share all but O(log n) data.
You could probably do something like that with an unsorted map, but it's a good fit for a sorted one.
Sure, but fix your comparison metric as "timestamp of insertion" (and forget for a moment that this isn't technically pure, you can fiddle to make it pure), and you still get all the CoW-niceness, but this is internal to the mapping type, my keys and values don't need to be comparable or orderable, only hashable. I'm given an ordering, the ordering is arbitrary, but I don't care, because CoW/persistent hash maps are mostly an implementation detail or optimization.
So one way to think of a hash table/map is as a set of <key, value> tuples. Imagine we extend that to <insertion timestamp, key, value>. All of your conventional mapping methods (get/has/put) work on the key and value and ignore the timestamp, but anything that relies on iteration takes advantage of the timestamp[1] and iterates in timestamp order. In fact, the way this is normally done is that you have a sparse array/map of <key, pointer> and a dense array of <timestamp, value>. Whenever you insert a new key/value, you append the value to the end of the dense array (so that array stays dense), and the key is hashed as normal, but what the hash table stores is a pointer (index) into the dense array where the timestamp/value pair lives. So internally, get is
idx = sparse[hash(key) % table_size];   /* the slot holds an index into the dense array */
return dense[idx].value;
or approximately that (I haven't written C in a while); collision handling is elided.
Then iteration over objects in the array is consistent: you just iterate over the dense array. Removal from the dense array is done by tombstoning, and possibly eventual compaction. This is essentially how python's current hash table implementation works.
IIRC, if your table can assume only insertions, this actually becomes really, really nice as a persistent data structure, since you can replace the backing vector with a backing linked list, and you only really lose out on iteration speed. Then you further extend that by swapping the linked list to a tree, and you share the entire backing structure.
[1]: And you can use an increment-only mutation counter instead of a timestamp to make this pure.
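For concreteness, here is a toy sketch of the sparse-index + dense-entries layout described above (not CPython's actual implementation; the class name, fixed table size, lack of resizing/deletion, and linear probing are all simplifications):

class CompactDict:
    def __init__(self, size=8):
        self.sparse = [None] * size      # hash slot -> index into self.dense
        self.dense = []                  # (key, value) pairs in insertion order

    def _slot(self, key):
        i = hash(key) % len(self.sparse)
        while self.sparse[i] is not None and self.dense[self.sparse[i]][0] != key:
            i = (i + 1) % len(self.sparse)   # linear probing on collision
        return i

    def __setitem__(self, key, value):
        i = self._slot(key)
        if self.sparse[i] is None:           # new key: append to the dense array
            self.sparse[i] = len(self.dense)
            self.dense.append((key, value))
        else:                                # existing key: overwrite in place
            self.dense[self.sparse[i]] = (key, value)

    def __getitem__(self, key):
        i = self._slot(key)
        if self.sparse[i] is None:
            raise KeyError(key)
        return self.dense[self.sparse[i]][1]

    def __iter__(self):
        return (k for k, _ in self.dense)    # iteration order == insertion order

d = CompactDict()
d["b"] = 2
d["a"] = 1
print(list(d), d["a"])                       # ['b', 'a'] 1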
Correct me if I am wrong: take everything you described, remove the (logical) timestamp, and all you'd be losing out on would be iteration in insertion order?
So how does the timestamp have any impact on persistence?
(We agree on basically everything you write.)
> [1]: And you can use an increment-only mutation counter instead of a timestamp to make this pure.
Agreed. That's what I'd call a logical timestamp or a logical clock.
> IIRC, if your table can assume only insertions, this actually becomes really, really nice as a persistent data structure, since you can replace the backing vector with a backing linked list, and you only really lose out on iteration speed. Then you further extend that by swapping the linked list to a tree, and you share the entire backing structure.
That still doesn't tell you anything at all about how you make the hashtable itself persistent.
One simple way would be to just introduce an arbitrary order on your hashes, eg compare them as if they were ints, and then stick them in an ordered map container. Of course, that's just a round-about way to impose an arbitrary order on your keys. Works perfectly well, it's just not too interesting.
To come back to your original question:
> What's the value of a sorted map over an ordered map?
> I don't think I've ever cared to iterate over map values based on the alphanumeric ordering of their keys.
In the cases we talked about, making the keys comparable is only a means to an end, and any arbitrary order will do. That's the most common use case. Iterating over the keys in insertion order is also often useful.
In practice, I did come across some use cases that make genuine use of the order of keys. Eg when I was interested in (automatically) keeping track of the highest or lowest key or quantiles.
You can use a sorted map as a priority queue easily. A min heap would give you O(1) access to the minimum item, but if you actually want to pop it, it's O(log n) anyway. A sorted map as tree gives you O(log n) for basically all operations. For most uses that's either good enough, or doesn't even make a difference at all compared to a heap. Using your sorted map as a priority queue gives you arbitrary priority updates in O(log n) for no extra implementation complexity.
Sorted maps also came in handy when I was implementing geometric algorithms where I wanted to sweep a scanline over points. Or for divide and conquer algorithms over space.
A sorted map is also useful as a basic datastructure to build step functions on top of. (https://en.wikipedia.org/wiki/Step_function) A simple application is solving the infamous skyline problem.
When you want to store your data structure, having similar keys close together can help with compression. Google's sorted string tables (SSTables) show that principle quite well.
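As a rough illustration of the priority-queue point above, in Python you'd need a third-party sorted map such as `sortedcontainers` (the standard library has heapq, but no sorted map); something along these lines:

from sortedcontainers import SortedDict   # third-party: pip install sortedcontainers

pq = SortedDict()                 # priority -> task; one task per priority, to keep the sketch short
pq[5] = "low"
pq[1] = "urgent"
pq[3] = "normal"

priority, task = pq.popitem(0)    # pop the minimum key in O(log n)
print(priority, task)             # 1 urgent

del pq[3]                         # arbitrary priority update: remove + reinsert, both O(log n)
pq[2] = "normal, bumped"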
But it does have a method to sort containers in its standard library, `sorted`, that C++ doesn't have. And it's trivial to use it to sort lists, sets, and dicts...
C++ also has the `sort()` function that allows you to sort any unsorted container. But that's not a replacement for a sorted container like `set` or `map` though. Because `set` or `map` allows you to insert elements at O(log n) runtime. If you have to sort every time you insert using the `sort()` or `sorted()` functions, the run time becomes O(n log n).
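Python's closest standard-library answer is `bisect`, which maintains a sorted list; it's not a tree, so the search is O(log n) but each insertion still shifts elements, i.e. O(n):

import bisect

items = []
for x in [5, 1, 4, 2, 3]:
    bisect.insort(items, x)       # keeps `items` sorted without re-sorting the whole list
print(items)                      # [1, 2, 3, 4, 5]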
All of this looks like syntactic sugar. I would expect more work on the internals. Python has a lot of awkward edge cases in the standard lib; in a lot of cases None is a valid output for not-done states. Also, I hate multithreaded programming in Python 3: it has all the drawbacks of C with the addition of the GIL.
Does the python community care about concurrency at all? I haven't seen anything new in terms of concurrency in a while.
I might be wrong about this, but walrus operator seems like a gateway to writing obfuscated perlesque code.
Not sure about most excited, but I am looking forward to setting PYTHONPYCACHEPREFIX to a location that isn't in a volume mounted into my Docker container for development.
Subinterpreters are coming in 3.9, which should fill most of the same roles as multiprocessing but without the complexities and edge cases of multiple real OS processes.
The "walrus operator" will occasionally be useful, but I doubt I will find many effective uses for it. Same with the forced positional/keyword arguments and the "self-documenting" f-string expressions. Even when they have a use, it's usually just to save one line of code or a few extra characters.
The labeled breaks and continues proposed in PEP-3136 [0] also wouldn't be used very frequently, but they would at least eliminate multiple lines of code and reduce complexity.
PEP-3136 was rejected because "code so complicated to require this feature is very rare". I can understand a stance like that. Over complicating a language with rarely-used features can definitely create problems. I just don't see why the three "headline" features I mentioned are any different.
[0]: https://www.python.org/dev/peps/pep-3136/