This is a neat article, but it does have some errors.
One subtle point that the post gets wrong:
> So where does that come from? The answer is that Python stores everything inside dictionaries associated with each local scope. Which means that every piece of code has its own defined “local scope” which is accessed using locals() inside that code, that contains the values corresponding to each variable name.
The dictionary returned by `locals()` is not literally a function's local namespace, it's a copy of that namespace. The actual local namespace is an array that is part of the frame object; in this way, references to local variables may happen much more quickly than would be the case if it had to look each variable up in a dictionary every time.
One consequence of this is that you can't mutate the dict returned by `locals()` in order to change the value of a function-local variable.
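A minimal sketch of that behaviour (in CPython; as of 3.13, `locals()` returns an independent snapshot, with the same visible result here):

```python
def f():
    x = 1
    locals()["x"] = 99   # writes into a copy/snapshot, not the real namespace
    return x

assert f() == 1          # the actual local variable is untouched
```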
Another, less-subtle error in the post is this:
> int is another widely-used, fundamental primitive data type. It’s also the lowest common denominator of 2 other data types: float and complex. complex is a supertype of float, which, in turn, is a supertype of int.
> What this means is that all ints are valid as a float as well as a complex, but not the other way around. Similarly, all floats are also valid as a complex.
Oh, no no no. Python integers are arbitrary-precision integers. Floats are IEEE 754 double-precision binary floating-point values, and as such only support full integer precision up to 2^53. The int type can represent values beyond that range which the float type cannot.
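A quick demonstration of where float precision runs out:

```python
# A double has a 53-bit significand, so exact integer precision ends at 2**53:
assert float(2**53) == float(2**53 + 1)   # indistinguishable as floats
assert 2**53 + 1 - 2**53 == 1             # ints keep exact precision
assert int(float(2**53 + 1)) == 2**53     # round-tripping loses the +1
```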
And while it is true that the complex type is just two floats stuck together, I would very much not call it a supertype. It performs distinct operations.
> Accessing an attribute with obj.x calls the __getattr__ method underneath. Similarly setting a new attribute and deleting an attribute calls __setattr__ and __delattr__ respectively.
Attribute lookup in Python is way more complex than this. It's an enormous tar pit, too much so to detail in this comment, but __getattr__ is most often not involved, and the `object` type doesn't even have a __getattr__ method.
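A small sketch of when `__getattr__` actually runs (only after normal lookup has already failed):

```python
class A:
    x = 1
    def __getattr__(self, name):
        # Only invoked when the normal lookup machinery fails
        return f"fallback:{name}"

a = A()
assert a.x == 1                            # found normally; __getattr__ not called
assert a.y == "fallback:y"                 # lookup failed, so __getattr__ ran
assert "__getattr__" not in vars(object)   # object itself defines no __getattr__
```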
> Attribute lookup in Python is [...] an enormous tar pit
Spot on. Python is widely described as a simple language, but the complexity of attribute lookup is one thing that shows that's not true at all.
Many things in Python are easy, such as adding `@property` above a method definition to turn it into a getter. But `@property` is far from simple - the way it actually works is very complex (for example, properties have to be data descriptors, because non-data descriptors cannot override object attributes of the same name).
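A short sketch of why the data-descriptor detail matters: because property defines `__set__`, it takes priority over the instance `__dict__`.

```python
class C:
    @property
    def value(self):
        return 42

c = C()
c.__dict__["value"] = 0          # try to shadow it via the instance dict
assert c.value == 42             # the data descriptor still wins
assert hasattr(property, "__set__")   # which is what makes it a data descriptor
```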
> Python is widely described as a simple language, but the complexity
> of attribute lookup is one thing that shows that's not true at all.
Python is a simple language to _learn_. My children learned the basics of Python before their seventh birthdays. But Python is not a simple language to _implement_.
It’s an easy language to learn the basics of, sure. But the complexity of things like attribute lookup doesn’t only affect language implementors.
The complexity is all exposed to Python programmers, which makes the language hard to master and, unlike truly simple languages, too large for anyone to understand completely.
There was a time (during Python 2 days) when one could master the language - completely. Today's 3.10 is indeed a very large language; fortunately, one doesn't have to exercise every new feature. And it's pretty easy to find examples of all features that one might stumble across while reading other people's code.
The really cool thing about this, how descriptors have their __get__ called, is that methods are implemented this way. So when you access instance.method(), it’s a normal lookup for the attribute named “method”, which is (normally) itself a descriptor, so the __get__ magic is called and this binds the method to the instance at the moment it’s needed! Then you can just call it like a normal function. It’s incredibly elegant but extremely obscure. And vital to understand if you want to dive into monkey patching, which is an incredible skill to have!
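A small sketch of that binding step, done by hand via the descriptor protocol:

```python
class Greeter:
    def hello(self):
        return "hi"

g = Greeter()
func = Greeter.__dict__["hello"]     # a plain function object on the class
bound = func.__get__(g, Greeter)     # functions are descriptors; __get__ binds
assert bound() == "hi"
assert bound.__self__ is g           # the bound method carries the instance
assert g.hello() == "hi"             # normal attribute access does the same dance
```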
Good article, thanks. It's worth pointing out that the summary (class hierarchy data descriptor > instance `__dict__` > class hierarchy other) only applies when looking up a normal attribute on a normal object:
* Special-method lookup (e.g. of `__add__` when you do `a + b`) works differently because it doesn't look at the instance `__dict__`, only the class hierarchy.
* Lookup on a class works differently because as well as looking at `__dict__`, it has to consider superclasses.
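A minimal illustration of the first point, using `__len__`:

```python
class N:
    def __len__(self):
        return 3

n = N()
n.__len__ = lambda: 99      # attach a special method to the instance
assert len(n) == 3          # len() consults the type, skipping the instance dict
assert n.__len__() == 99    # plain attribute access does see the instance version
```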
Much of the complexity relates to the different ways of handling the instance `__dict__`. By contrast, Ruby is able to have much simpler lookup rules because it never considers the instance, only ever the class hierarchy.
> Ruby is able to have much simpler lookup rules because it never considers the instance, only ever the class hierarchy.
“But!” I can hear people say, “I heard you could attach methods to instances in Ruby! How is this possible if Ruby never considers the instance in resolving a method call?”
Well, that’s tricky. The thing is, Ruby instances have no public members; instance variables are private (for direct access, though not in any strict way, because there are public methods called instance_variable_get and instance_variable_set on Object which… do exactly what the names say).
But every Ruby instance conceptually has (though it doesn’t concretely have one unless you add something to it) a unique class for the instance that is the first thing in its class hierarchy. And you can add methods to that class (the instance “metaclass”, which is different from a Python metaclass) for the effect of attaching them uniquely to the instance itself.
It's a fun dream, but without the ability for a variable to carry along with it a notion of, say, how it implements slice notation, or the ability to avoid verbose names like foo_to_repr(foo) and bar_to_repr(bar) so that names don't collide... arguably we never would have seen the rise of scientific Python! Object oriented programming is an incredibly good abstraction for a lot of real-world scenarios.
Yeah, calling float and complex "supertypes" probably wasn't the best idea, but I couldn't think of a better explanation that wouldn't take too long to explain. I'll ponder about that one.
the getattr thing seems like a huge rabbit hole, I'm totally going to look into this. Thank you :)
Yeah, but we're not talking about pure mathematics. We're talking about floats, and I find that it's very important to be clear about the limitations. It's easy to get some nasty bugs if you start assuming that you can cram just any int into a float.
And I have no objections to the article's description of the bool type.
In addition to this, I highly recommend just reading the codebase. I haven't written C since college and it's remarkably readable.
I once tried to catalogue all the stdlib operations which release the GIL, meaning that if you use only those (well, only for the "heavy" bits; you can still use other small blocking glue bits), you can do "real" multithreading.
There's a really nice (although old now) walkthrough of the CPython code base on YouTube. I watched it on a long 24-hour flight between Canada and Sweden a couple of years back.
Me too. When I used to write Ruby I read a lot of CRuby source code. I achieved a much deeper understanding of the language that way. Even answered some really fun stackoverflow questions.
Now the first thing I do when I see a new language is read its source code.
Yep, when I was writing PHP I found myself digging into php-src on a regular basis to see what exactly was going on. PHP's documentation is good but the code is much more explicit.
Cute fact about __debug__: it is one of the only ways to get compile-time conditionals in Python. Guarding code with `if __debug__:` will output bytecode for the ensuing statement if and only if the interpreter is in debug mode - notably, in `-O` mode, it will not even generate a load of __debug__ and a conditional jump, and acts as if the statement didn’t exist at all.
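This can be observed without relaunching the interpreter, via `compile()`'s `optimize` parameter (where `optimize=1` mimics `-O`):

```python
src = "if __debug__:\n    x = 1\n"

normal = {}
exec(compile(src, "<demo>", "exec"), normal)
assert normal["x"] == 1          # branch kept; __debug__ is True

optimized = {}
exec(compile(src, "<demo>", "exec", optimize=1), optimized)
assert "x" not in optimized      # the whole branch was compiled away
```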
Most likely they just `assert` error conditions and let that blow up as error reporting, rather than write a full conditional and raise a custom exception.
I’ve been guilty of that in my own code when I just want to tell myself about an error.
With asserts enabled this is fine. With asserts disabled you might start sending nukes while the president is still outside or the doors to the bunker are still open.
"-o" is mostly "--remove-assertions" already, so that wouldn't help.
While today's use case for assert is unit testing, the actual killer feature of the keyword is that it's removed only from prod.
The idea is that you can write things like function contracts, that are expensive, but only exist in dev.
Now, if one of your dependencies uses assert for something they expect to still be in prod, which is the problem we are talking about in the first place, "-O" or "--remove-assertions" will strip their asserts too, breaking their code, and hence yours, since it depends on it.
Before this thread I had no idea assert was debug-only in Python.
Another solution would be to accept the current role of assert as quick error checking and add in debug_assert to indicate a conditional error check. The biggest issue with that approach is that a majority will suddenly ask "Wait, python has a debug mode?"
I still do not understand what the problem is and what users need to know. So assert throws an exception in debug mode and is 'commented out' in production, right?
So the problem is some vendors ship libs that have failing asserts so cannot run in debug mode?
Nicely written article! - Slightly off topic: I love seeing and reading Python code. Used to see many flaws in the language (things like `.append` changing the object, returning `None` instead of creating a copy and returning that), but after ten years of working with Python I really appreciate its versatility, its ubiquitous availability, the large number of libraries and the community. There’s nothing I can’t solve with it. It’s the swiss army knife in my pocket, available whenever I need to solve something by coding.
> (things like `.append` changing the object, returning `None` instead of creating a copy and returning that)
This would be horrendously inefficient without immutable data structures like Clojure's. Very few languages have that, so it's a strange assumption to make, especially for a language as old as Python.
It should also be worth noting that Clojure sacrificed quite a bit to make this as efficient as possible.
“Persistent vectors” are certainly an interesting data structure that strikes a compromise between fast indexing and being able to relatively quickly create a copy where only one element changes, but it's a compromise, and indexing is made slower to allow for the latter. They also take up more memory on their own, but are allowed to share memory with their copies.
I will say that my ideal language contains them in the standard library alongside standard vectors that index in constant time.
Further, it should be noted that much of the performance talk is on the assumption that accessing from memory is truly random access; with the existence of CPU caches that assumption is not entirely accurate, and accessing from contiguous rather than scattered memory is in practice considerably cheaper, so one also pays a price for the data being scattered more in memory.
Random access into a clojure vector is going to need more memory lookups than conventional sequential buffer array (I don't recall the constants used in the implementation, I think it's either 4 or 8 lookups).
But when you're indexing into the vector sequentially, the memory layout plays rather well with memory caching behavior, and most lookups are going to be in L1 cache, just like they would be in a conventional array.
So lookups are a bit more expensive, but not as much more expensive as one might imagine.
The actual data of a persistent vector is not in contiguous memory but scattered however the JVM wills it, and on top of that, in order to find which address to retrieve, an algorithm that runs in logarithmic time with respect to the length of the vector must be used, as opposed to a constant-time one.
How can most lookups end up in L1 cache if an element that is 32 indices removed is statistically likely to be arbitrarily far removed in memory?
Of course, all of that is not that material given that most elements will be pointers anyway, so the actual objects to which they point will already be arbitrarily scattered, and it simply adds one more pointer indirection; but for unboxed types such as integers it does play a factor.
So I don't recall what size of chunks Clojure's implementation uses, but I'll assume it uses 64-bit indices with 16-word chunks, because I want to use numbers.
Assuming no part of my-vector is cached at first, the first iteration needs to make a full 16 round trips to main memory — quite bad. But on the next iteration, all of that is cached, and we don't hit main memory at all, and the same until i=16, which requires one round trip. Then when i=16², we need to hit main memory twice, etc.
No doubt this is quite a bit worse than having everything nicely laid out sequentially in memory, but it's not as bad as you're describing.
> Of course, all of that is not that material to begin with given that most elements will be pointers to begin with
I guess this is sort of true. If you're doing random lookups, then using a persistent vector instead of an array list means 17 trips to main memory instead of just 1, so it's not totally inconsequential.
But I think (hope?) that modern JVMs can optimize collections of small immutable objects so that they're not represented as pointers to the heap. Surely ArrayList<Integer> x gets represented as int x[], and not int *x[], at least with the most optimizing JIT level.
I think you are talking about arbitrary stride sequential access, and the comment you were responding to meant stride-1 access. But conventional arrays also fare poorly with arbitrary stride access, though probably they have an advantage with small strides close to the Clojure persistent vector chunk size.
Things like `list.append` modifying in-place might feel like a flaw to some, but I think Python is really consistent when it comes to its behaviour. If you ask a person who comes from an object-oriented world, they'll say it only makes sense for a method on an object to modify that object's data directly.
There's always ways to do things the other way, for example you can use x = [*x, item] to append and create a new copy, while being quite a bit more explicit that a new list is being created.
One related area where Python is not consistent is operators like +=.
In pretty much all other languages that have them, the expected behavior of A += B is exactly the same as A = A + B, except that A is only evaluated once. Now let's look at lists in Python:
xs = [1, 2]
ys = xs
ys = ys + [3]
print(xs, ys)
This prints [1, 2] [1, 2, 3], because the third line created a new list, and made ys reference that. On the other hand, this:
xs = [1, 2]
ys = xs
ys += [3]
print(xs, ys)
prints [1, 2, 3] [1, 2, 3], because += changes the list itself, and both xs and ys refer to that same list.
(Note that this is not the same as C++, because in the latter, the variables store values directly, while in Python, all variables are references to values.)
The worst part of it is that Python isn't even self-consistent here. If you only define __add__ in your custom class, you can use both + and += with its instances, with the latter behaving normally. But if you define __iadd__, as list does, then you can do whatever you want - and the idiomatic behavior is to modify the instance!
For comparison, C# lets you overload + but not +=, and automatically synthesizes the latter from the former to enforce the correct behavior.
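A minimal sketch of the asymmetry, using a made-up Box class that defines only `__add__`:

```python
class Box:
    def __init__(self, items):
        self.items = items
    def __add__(self, other):
        return Box(self.items + other.items)

a = Box([1])
b = a
b += Box([2])                    # no __iadd__, so += falls back to __add__
assert a.items == [1]            # a is untouched; b was rebound to a new Box
assert b.items == [1, 2]

xs = [1]
ys = xs
ys += [2]                        # list defines __iadd__ and mutates in place
assert xs == [1, 2] and ys is xs
```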
You'll get an UnboundLocalError saying that local variable xs was referenced before assignment - precisely because the assignment in += makes xs a local variable inside foo.
But that's good, because a string just needs to know about the iterable protocol to perform that operation, whereas every iterable would need to implement its own join if you had it the other way around.
It may be necessary in Python, but in general, a language could allow you to define a join method on the iterable superclass/trait/interface that iterates over the elements, converting each to a string and inserting the separator between each of them.
Yet Ruby and JS manage to do it somehow. To me it seems natural that join should be a method on the iterable, and I always have to pause to remember Python is different.
I don't think it should be a method at all. It's just a function: join(iterable, separator). It can also be implemented with reduce naturally: `reduce(lambda x, y: x + separator + y, iterable)`.
Oh yeah, it's horrendous, my point was just that it's functionally equivalent and makes more sense as a function than a method on either object. You can actually call it like this if you want, though: `str.join(separator, iterable)`.
The way it's managed in JS, digging the function out of the prototype to apply it, can be done in Python as well. But unlike JS you won't normally have to, thanks to join being defined on str and accepting any iterable, rather than only on one specific type of iterable.
Ruby does it by having a mixin (Enumerable) that anything meeting a basic contract (roughly equivalent to the Python iterable protocol) can include to get an enormous block of functionality; Python doesn’t have (or at least idiomatically use as freely; ISTR that there is a way to do it) mixins like Ruby does.
' '.join() makes more sense to me, and it's more universal too if done right and you accept anything resembling a "sequence" (which python does) and individual objects of the sequence have a sensible str(). And as language maintainer, you only have to maintain one such implementation, not one per collection type.
Javascript, on the other hand, kinda does it worst, at least of the languages I regularly use... .join() is an instance method on Arrays and TypedArrays. But they forgot to add any kind of join for Sets, for example.
(["a", "b", "c"]).join("")
// "abc" - alright
(new Set(["a", "b", "c"])).join("")
// Uncaught TypeError: (intermediate value).join is not a function
([...new Set(["a", "b", "c"])]).join("")
// "abc" - grmpf, have to materialize it into an array first.
That illustrates the drawback: if you make it a method on the concrete sequence types you got, you better not forget some and make sure the different APIs are consistent, too. If Javascript had a String.join(sep, <anything implementing the iterator protocol>) this wouldn't have been an issue.
Python isn't alone either, by the way. C# has the static string.Join(...) that accepts "enumerables" (IEnumerable<T>), but no array.Join() or list.Join() or dictionary.Join(). Combined with LINQ, especially .Select, that becomes quite handy. There have been plenty of times I did print-debugging by adding a one-liner along the lines of
In the cases you give, the original list is not being mutated; a new object (a string, not a list) is being created. So it does make sense not to have it be a method call on the list.
Huh, I never even thought we would need to create a copy of an object when adding a new item to it (like a new item to a list, for example). Is there any drawback to doing that in the standard Pythonic way? I actually learned to program using Python and it was my first language. Since then I've only used JS. In both I like using functions a lot and rarely dabble in OOP, since it is more convenient to me.
You often lose performance in traditional imperative languages when aiming for persistence.
When you have immutability guarantees (like in many functional programming languages like ML or Haskell) you can avoid making copies by sharing the parts of the data structure that don't change.
If this kind of thing interests you, you should check out Chris Okasaki's book "Purely Functional Data Structures".
Whether mutating data is better than creating a new copy for everything is a really long debate about immutability and functional programming, with good points on either side, but that's really beside the point here.
In my opinion, you should use whichever method makes your code easy to read and understand for your usecase.
> things like `.append` changing the object, returning `None` instead of creating a copy and returning that
The obvious question is why it can't return a reference to the list instead of returning None. I feel like if I've been using the language on an almost daily basis for ten years now and I still get burned by that all the time, then it's just a poorly designed feature.
The advantage of mutating operations always returning None is that you can easily tell whether a mutation is happening by looking at the code. If you see y = f(x) that means x is unchanged, whereas if you see just f(x) on a line that means something stateful is happening.
Agreed. JavaScript's Array.sort is an example of this. Most of JavaScript's other array methods return a new array and people get used to chaining them, but sort mutates the array and also returns a reference to it. You can actually get pretty far before being bitten by this so long as you're sorting already-copied arrays. But then one day you hit a bizarre bug caused by behavior that's been sneaking past your radar the whole time.
I'll just point out that this is originally from Scheme (I think... Maybe Scheme got it from a previous Lisp) but borrowed by Ruby. Neither Scheme nor Ruby do a perfect job with sticking to the naming convention, at least if we include popular libraries, but it is very handy and intuitive.
Python has a naming convention as well: `sorted(arr)` returns a sorted copy and `arr.sort()` works in-place (and returns None). However, I've always thought it's a bit odd that one is a function and the other is a method.
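A quick illustration of the convention:

```python
xs = [3, 1, 2]
assert sorted(xs) == [1, 2, 3]   # returns a sorted copy...
assert xs == [3, 1, 2]           # ...leaving the original untouched
assert xs.sort() is None         # sorts in place and returns None
assert xs == [1, 2, 3]
```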
The ! convention is ok but I don't think it's optimal because, in the presence of higher-order functions and related concepts, it's often not clear if a function should be marked as !.
For example if I have a map function that applies a function f to a sequence, should I call it map! because I might pass in a function f that mutates the input? If so then it seems like any function that takes a function as input, or any function that might call a method on an object, should get marked with ! just in case. But if I don't mark it that way then the ! marking is not as informative: I might end up with a line consisting only of non-! functions which still mutates the input.
Note that “!” in Ruby doesn’t conventionally mean “modifies the receiver”, it means “does something that is similar but less likely to be safe than a method with the same name without ‘!’”.
A very common case of this is mutating versions of non-mutating methods, but (1) mutating methods (in stdlib or other idiomatic code bases) that have no non-mutating equivalent are not named with a “!”, and (2) methods are sometimes named with “!” because they do dangerous things compared to a base method that are not mutating the receiver.
map! would mean a function that performs a map in-place on the array by replacing the values. So it would depend on if the callback was encouraged to mutate the array or discouraged from doing so.
Array#map! actually exists in Ruby, too. It mutates the array item by item. Enumerable#each doesn't have a bang, because it doesn't change the enumerable itself, even though it can mutate the objects contained in the enumerable. This is overall consistent -- there's a distinction between mutating the receiving object and mutating objects contained in or referred to by the receiving object.
random.shuffle() has bitten me that way a few times too:
array = random.shuffle(array)
because I expected it to return a copy or reference, instead making my array None.
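A sketch of the in-place behaviour, plus one way to get a shuffled copy instead (`random.sample` with `k` equal to the full length):

```python
import random

xs = [1, 2, 3, 4]
assert random.shuffle(xs) is None        # shuffles in place, returns None
ys = random.sample(xs, k=len(xs))        # a shuffled copy instead
assert sorted(ys) == sorted(xs) and ys is not xs
```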
It would also enable chaining operations:
array = array.append(A).append(B).sort()
In-place vs immutable copy is a language design choice with tradeoffs on both sides, but there's no reason that I can see to not return a reference to the list.
Perhaps recognizing this is really the job of an external linter. Sometimes I wonder if the future of enforcing canonical formatting on save like "gofmt" or "black" will extend to auto-correcting certain goofy errors on each save.
mypy would yell at you about this, but afaik type-checked python still isn't the norm.
In Python, a function that makes and returns a copy would be idiomatically named shuffled(). Consider sorting:
xs.sort() # in-place
ys = sorted(xs) # copy
As for functions returning the object - I think it's a hack around the absence of direct support for such repetition in the language itself. E.g. in Object Pascal, you'd write:
with array do
begin
append(A);
append(B);
sort;
end;
> Used to see many flaws in the language (things like `.append` changing the object, returning `None` instead of creating a copy and returning that)
I think it's pretty off-base to call this a "flaw". Immutable structures have their place and can be very helpful where appropriate, but making its core primitives work this way is far outside the scope or the philosophy of Python. If you want otherwise, you're really wanting an entirely different language. And there's nothing wrong with that! But I think it would be a "flaw" for Python to make these operations immutable, even though I love immutability personally.
And also like a Swiss army knife, it's not particularly great at anything, can be awkward even when functional, and there's always a better tool for any specific job.
> List comprehensions are basically a more Pythonic, more readable way to write these exact same things
More Pythonic maybe, but you can't have more than a single expression in a list comprehension without it becoming completely unintelligible. I also often miss other standard list features: reduce, flatmap, indexed versions, utils like first-of-predicate, split, filterNonNull, etc.
Anything remotely interesting like that is dumped in itertools.
Python's creator, Guido van Rossum, doesn't like functional/functional-ish programming a lot. That's well-known.
Guido: "I value readability and usefulness for real code. There are some places where map() and filter() make sense, and for other places Python has list comprehensions. I ended up hating reduce() because it was almost exclusively used (a) to implement sum(), or (b) to write unreadable code. So we added built-in sum() at the same time we demoted reduce() from a built-in to something in functools (which is a dumping ground for stuff I don't really care about :-)."
> "I value readability and usefulness for real code"
Amen.
There are plenty of languages where your code ends up looking like an entry in an obfuscation competition without even trying. If you're using Python, and working for me, I expect the code to be readable by anyone.
And, no, I don't give a toss whether the code is three times the length it might have been if it was dangerously, and expensively, obscure.
Quite often, chains of map/filter/reduce/whatever are more readable because you can see the flow of data, like you were looking at a factory production line. List comprehensions and traditional prefix functions (e.g. map(function, iterable)) completely break the visual chain that makes basic functional code so readable.
Like, which of these make more sense?
strList.filter(isNumeric).map(parseInt).filter(x => x != 0)
[ x for x in [ parseInt(s) for s in strList if isNumeric(s) ] if x != 0]
filter(lambda x: x != 0, map(parseInt, filter(isNumeric, strList)))
And it's not like Python doesn't have the language features to implement the first pattern. Map,reduce,filter,etc. could simply be added to the iterable base class and be automatically usable for all lists, generators and more.
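As a sketch of that idea, a minimal fluent wrapper over any iterable (the Chain class and its method names are hypothetical, not stdlib):

```python
class Chain:
    """Hypothetical fluent wrapper sketching the chained style."""
    def __init__(self, it):
        self._it = it
    def map(self, f):
        return Chain(map(f, self._it))
    def filter(self, f):
        return Chain(filter(f, self._it))
    def to_list(self):
        return list(self._it)

result = (Chain(["1", "a", "0", "2"])
          .filter(str.isnumeric)
          .map(int)
          .filter(lambda x: x != 0)
          .to_list())
assert result == [1, 2]
```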
If you come from an imperative background, and everyone is used to working in imperative languages... I guess that's anyone.
Someone coming from, say, Ruby or JavaScript would find list comprehensions jarring. You can't compose them and you basically have to rewrite them when it's time to extend them.
Yes, lots of them are available. But I also would like to be able to call .sum() on my iterable at the end of a chain, instead of having to mentally unwrap sum(map(filter(filter(map(...)))))
But you only ever need to unwrap one sum(). And with sequence comprehensions, what you get instead is sum(... for ... if ... for ... if ...) - I don't think that's improved by rewriting it as (...).sum().
FWIW Python was fourteen years old when GVR joined the Borg. That doesn't address how popular it was but I think it's reasonable to say it was well established.
Guido van Rossum doesn't appreciate FP and he really genuinely doesn't understand it. That's not to say he's dumb or not a nice guy or anything. It's just not his area. And this is reflected in the language.
His attitude of you have to be "really smart" to understand FP is a mistake.
Also definitely learn about using sum on non-numbers, and the key argument to min and max. They can be incredibly handy, but I hardly see them used. Have a contrived example:
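The original example isn't reproduced here, but a small illustration of both features, with made-up data:

```python
words = ["pear", "fig", "banana"]
assert max(words, key=len) == "banana"   # key= picks what to compare by
assert min(words, key=len) == "fig"

# sum() works on anything supporting +, given a matching start value:
parts = [[1, 2], [3], [4, 5]]
assert sum(parts, []) == [1, 2, 3, 4, 5]
```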
The list comparison here is also true when the first list is a prefix of the other:
    class list:
        def __eq__(self, other):
            return all(x == y for x, y in zip(self, other))
            # Can also be written as:
            # return all(self[i] == other[i] for i in range(len(self)))
Run that with `[1,2,3]` and `[1,2,3,4]` and it'll return True, because zip stops at the shorter list, so it only checks up to the 3.
It's probably simplest to compare `len(self) == len(other)` first.
Similarly, the set comparison will also be true if the first is a subset of the other.
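A sketch of the suggested fix, using a hypothetical subclass name:

```python
class PrefixSafeList(list):
    """Hypothetical fix: compare lengths before element-wise comparison."""
    def __eq__(self, other):
        return len(self) == len(other) and all(
            x == y for x, y in zip(self, other)
        )

assert PrefixSafeList([1, 2, 3]) == [1, 2, 3]
assert not (PrefixSafeList([1, 2, 3]) == [1, 2, 3, 4])   # prefix no longer "equal"
```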
No idea what their actual reasoning is, but here's how I think about it:
This is better for novices, because otherwise you create a whole bunch of land mines for people who are desperately trying to get something done. If they aren't aware of the built-in then they aren't trying to use it. Insisting that they become aware of something they don't want right then will be frustrating.
It's also better for experts, in that they're generally aware they're overriding a built-in and are doing it on purpose, and if not they'll have an IDE or linter reminding them.
To me, I see tooling as a spectrum from supportive to controlling. Python is very much on the supportive end. It feels controlling when I get interrupted because some programmer who has never met me programmed a tool to insist I do things their way. That would very much include insisting I respect a bunch of names they decided long ago to put in the global namespace.
Python itself doesn’t disallow this because there are quite a lot of builtins with useful names - for example, `type`, `id`, and `hash`, to name a few. Disallowing setting these would be tantamount to adding a bunch of new keywords to the language, which they’ve been quite loath to do in general.
A good linter will catch these, so in production environments you usually don’t run into issues. I agree that it can be a beginner trap though!
I think linters are really effective to figure out such issues in professional code.
For students, though, I'll have to agree. Maybe having a flag or environment variable that teachers can set up for it would be a nice idea. You should start a thread on the python-ideas mailing list about this, and it might get somewhere :)
Are there any good books that deal with writing pythonic code? As well as being focused on more intermediate or advanced features like this? If the book is project focused that's a bonus. Performance trade-offs another bonus.
I can personally recommend Fluent Python (its 2nd edition is about to come out in a couple months) for learning these intermediate/advanced concepts, and Python Cookbook for code examples using many of these features.
I don't know any books for projects per se, maybe HN will know!
To me it looks like a lot of this knowledge is spread out over many different excellent technical blogs like yours. While the content is good, it's hard to get something that resembles a more complete picture compared to just another piece of a big puzzle.
Fluent Python is definitely a good book, I knew a whole bunch of the stuff in this article because of it. I only got to Chapter 9, but it legitimately made my Python much, much better.
The solutions presented typically include both a "basic" approach, a "as pythonic as possible" approach, and a brief discussion of the trade-offs between elegance and readability, etc.
I would recommend "Robust Python" by Patrick Viafore. It teaches you a lot about type annotations (among other things) and gave me personally a whole new way of looking at the code that I write.
Author of Robust Python here: I definitely recommend Fluent Python as well once the 2nd edition is available. I wrote Robust Python to focus very much on how to write Python in a long-lived codebase and how to do trade-offs for readability/maintainability/testability/etc. It also covers a lot of things outside of standard built-ins (such as acceptance testing, mutation testing, pydantic, type checkers, etc.). I find Fluent Python to be more focused on more of the built-ins, and I think it might cover some of the performance trade-offs you might be looking for.
Long story short: I think both have a lot of value (but beware I'm quite biased on Robust Python)
In a way, learning Python is harder for people experienced with another language because so much of the content you find is for first-time programmers.
That said, I think I had good luck with Writing Idiomatic Python.
When I learned Kotlin, I just read through the docs, and then knew of basically all the different concepts in the language.
For Python, the docs were comparably very bad. For instance, decorators aren't mentioned even once in "The Python Tutorial". In "The Python Language Reference" (if one even bothers to read such a dry document) they're barely mentioned in passing. How should a new user know it's a concept and how to apply it? And the language reference links only to a glossary item, and none of them specify how parameters in a decorator are supposed to work.
Pretty frustrating experience, put me a bit off the language from the get-go.
If you read the docs then you basically would not know about them. My experience mirrors the people upthread. I also found that the Python community is very hostile to the idea that the docs might be insufficient, in that people suggesting this in threads were frequently belittled, or it was often suggested that the person asking should just learn it like they did, or that it wasn't really all that hard.
Thank you for writing the post. Newbie question about “nonlocal” from your example:
  def outer_function():
      x = 11
      def inner_function():
          nonlocal x
          x = 22
          print('Inner x:', x)
      inner_function()
      print('Outer x:', x)
I get how the example works, but don’t see the point of the declaration? If I just left out the “nonlocal x” line, wouldn’t the example still work the same?
Python assumes that all assignments assign to the current scope. So by default, when it reaches "x = 22", it would create a new variable called "x" in the inner_function() scope which shadows the variable "x" in the outer_function() scope. So when you print "Inner x" you would only be printing the inner_function() version of x, not the outer_function() version, which would remain at 11.
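A minimal side-by-side comparison of the two behaviors (function names are made up for the example):

```python
def without_nonlocal():
    x = 11
    def inner():
        x = 22  # creates a brand-new local x inside inner()
    inner()
    return x    # the outer x was never touched

def with_nonlocal():
    x = 11
    def inner():
        nonlocal x
        x = 22  # rebinds the x from the enclosing scope
    inner()
    return x

print(without_nonlocal())  # 11
print(with_nonlocal())     # 22
```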
Thanks, got a better understanding of the Python "philosophy" because of this article, easy to follow even if you haven’t written a single line of Python like me.
I am not even halfway through it and I now understand how Python actually works under the hood. This is great for understanding how a lot of interpreted languages work
> Python has exactly 6 primitive data types (well, actually just 5, but we’ll get to that). 4 of these are numerical in nature, and the other 2 are text-based. Let’s talk about the text-based first, because that’s going to be much simpler.
What is your definition of a primitive data type? All of these have object as a superclass, so I wouldn't call them primitive data types in Python.
Maybe there is just 1 primitive type: type? Or none at all?
I'm not sure about that - the link is still `<a href="mypy-guide">type annotations</a>`, which goes to https://sadh.life/post/builtins/mypy-guide and then redirects to your homepage.
Your article is really easy to get through and digest. I'd like to keep it as a reference going forward. Please consider adding a floating TOC to the page.
It's actually customizable via the PYTHONBREAKPOINT environment variable and sys.breakpointhook(). The default does pdb.set_trace(), which gives you a built-in debugger prompt (not a REPL) at that location. But it can be set to execute arbitrary code, and most Python IDEs make it behave like "normal" breakpoints.
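A quick sketch of the hook mechanism (the `record_hook` name is made up for the example): `breakpoint()` simply forwards its arguments to whatever `sys.breakpointhook` currently is, which is how IDEs and debuggers swap in their own behavior.

```python
import sys

hits = []

def record_hook(*args, **kwargs):
    # stands in for pdb.set_trace(); could launch any debugger
    hits.append((args, kwargs))

sys.breakpointhook = record_hook
breakpoint()          # calls record_hook() instead of dropping into pdb
breakpoint("extra")   # arguments pass straight through to the hook

print(hits)  # [((), {}), (('extra',), {})]
sys.breakpointhook = sys.__breakpointhook__  # restore the default
```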
One subtle point that the post gets wrong:
> So where does that come from? The answer is that Python stores everything inside dictionaries associated with each local scope. Which means that every piece of code has its own defined “local scope” which is accessed using locals() inside that code, that contains the values corresponding to each variable name.
The dictionary returned by `locals()` is not literally a function's local namespace, it's a copy of that namespace. The actual local namespace is an array that is part of the frame object; in this way, references to local variables may happen much more quickly than would be the case if it had to look each variable up in a dictionary every time.
One consequence of this is that you can't mutate the dict returned by `locals()` in order to change the value of a function-local variable.
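A quick demonstration of that consequence (the `demo` function is just for illustration): writes to the dict returned by `locals()` never reach the real local variable.

```python
def demo():
    x = 1
    snapshot = locals()
    snapshot['x'] = 99  # mutates the snapshot dict only
    return x, snapshot['x']

print(demo())  # (1, 99) — the actual local x is untouched
```

Contrast this with module level, where `locals()` is `globals()` and mutating it does work.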
Another, less-subtle error in the post is this:
> int is another widely-used, fundamental primitive data type. It’s also the lowest common denominator of 2 other data types: , float and complex. complex is a supertype of float, which, in turn, is a supertype of int.
> What this means is that all ints are valid as a float as well as a complex, but not the other way around. Similarly, all floats are also valid as a complex.
Oh, no no no. Python integers are arbitrary-precision integers. Floats are IEEE 754 double-precision binary floating-point values, and as such only support full integer precision up to 2^53. The int type can represent values beyond that range which the float type cannot.
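This is easy to demonstrate: above 2^53, consecutive integers collapse onto the same float, while int keeps exact values indefinitely.

```python
n = 2 ** 53
print(n)                          # 9007199254740992
print(float(n) == float(n + 1))   # True — n + 1 has no exact float representation
print(int(float(n + 1)))          # 9007199254740992, not 9007199254740993
print((10 ** 100).bit_length())   # 333 bits — far past float's 53-bit mantissa
```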
And while it is true that the complex type is just two floats stuck together, I would very much not call it a supertype. It performs distinct operations.
> Accessing an attribute with obj.x calls the __getattr__ method underneath. Similarly setting a new attribute and deleting an attribute calls __setattr__ and __detattr__ respectively.
Attribute lookup in Python is way more complex than this. It's an enormous tar pit, too much so to detail in this comment, but __getattr__ is most often not involved, and the `object` type doesn't even have a __getattr__ method.
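The part that's easy to verify without falling into the tar pit (class and attribute names here are made up): `__getattr__` is only a fallback, invoked when ordinary lookup fails, and `object` itself doesn't define it — ordinary lookup goes through `__getattribute__`.

```python
class Demo:
    def __init__(self):
        self.x = 1

    def __getattr__(self, name):
        # only reached when normal lookup (instance dict, class, MRO) fails
        return f"fallback:{name}"

d = Demo()
print(d.x)        # 1 — found in the instance dict; __getattr__ never runs
print(d.missing)  # fallback:missing — normal lookup failed, fallback fired
print(hasattr(object, '__getattr__'))      # False
print(hasattr(object, '__getattribute__'))  # True
```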