Hacker News
Python built-ins worth learning (treyhunner.com)
288 points by th 29 days ago | 68 comments

Sets are my #1 favourite Python built-in. I used to have these tedious imperative functions for doing change detection on collections. Then I learned sets and it turned into obvious code like:

    added = b - a
    removed = a - b
    repeated = a & b


Of course this is set theory and not Python specific. But still. Learn sets!
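Here is a minimal sketch of that change-detection idea, wrapped in a hypothetical `detect_changes` helper (the helper name and the sample IDs are illustrative, not from the comment). It assumes the items are hashable; duplicates are ignored because everything is converted to sets.

```python
def detect_changes(a, b):
    """Compare an old collection `a` with a new collection `b`."""
    a, b = set(a), set(b)
    added = b - a      # only in the new collection
    removed = a - b    # only in the old collection
    repeated = a & b   # present in both
    return added, removed, repeated


old_ids = ["u1", "u2", "u3"]
new_ids = ["u2", "u3", "u4", "u5"]
added, removed, repeated = detect_changes(old_ids, new_ids)
# added == {"u4", "u5"}, removed == {"u1"}, repeated == {"u2", "u3"}
```

If you only need "everything that changed in either direction", the symmetric difference `a ^ b` gives `added | removed` in one step.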

PyCon 2019 talk about how to use sets more in your everyday code: https://www.youtube.com/watch?v=tGAngdU_8D8

frozenset is a really nice one for enforcing immutability :)

Frozensets can also be keys in a dictionary.
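A small sketch of both points above: because a frozenset is immutable it is hashable, so it works as a dict key, while a regular set does not. The undirected-graph edge weights and the `can_be_dict_key` helper are made-up illustrations.

```python
# An undirected edge A-B is stored once, regardless of endpoint order.
edge_weights = {
    frozenset({"A", "B"}): 5,
    frozenset({"B", "C"}): 2,
}
weight_ba = edge_weights[frozenset({"B", "A"})]  # 5: order doesn't matter


def can_be_dict_key(value):
    """Return True if `value` is hashable (a requirement for dict keys)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


frozen_ok = can_be_dict_key(frozenset({1, 2}))  # True
plain_ok = can_be_dict_key({1, 2})              # False: set is unhashable

# frozenset also has no mutating methods such as add() or discard().
has_add = hasattr(frozenset(), "add")           # False
```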

A recent(-ish) addition to Python 3 I also think is worth mentioning is the statistics module: https://docs.python.org/3/library/statistics.html

Every codebase eventually needs an averaging function; now you don't have to re-implement it yourself every time. Plus it's a nice toolbox of a few different utilities that you would otherwise need to install numpy/scipy/etc. to get.
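For a quick taste of the module, here are a few of its functions on a small, made-up sample (the variable names are illustrative):

```python
import statistics

response_times_ms = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data

avg = statistics.mean(response_times_ms)          # 5
mid = statistics.median(response_times_ms)        # 4.5 (mean of 4 and 5)
most_common = statistics.mode(response_times_ms)  # 4
spread = statistics.pstdev(response_times_ms)     # 2.0 (population std dev)
```

`statistics.stdev` is the sample (n - 1) counterpart of `pstdev`, so it gives a slightly larger value on the same data.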

Thanks, hadn't seen this before!

I wonder why they didn't have just one median function and control its behaviour with keyword arguments instead of creating 4 different median functions.
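For reference, the four median functions only disagree when there is no single middle value, as in an even-length sample like the illustrative one below. `median_low` and `median_high` always return an actual data point instead of averaging.

```python
import statistics

data = [1, 3, 5, 7]  # even length: two middle values, 3 and 5

m = statistics.median(data)        # 4.0: mean of the two middle values
lo = statistics.median_low(data)   # 3: the lower middle value
hi = statistics.median_high(data)  # 5: the higher middle value

# median_grouped() treats the values as midpoints of continuous class
# intervals and interpolates within the median class.
grouped = statistics.median_grouped(data)
```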

Didn't know about this, thanks!

I disagree with only one thing in this article. Frozenset is super useful. You should prefer frozenset over set by default imo. Immutable objects are safer to work with, often easier to work with (frozenset is hashable, you can key a dict with it), and sets come up all the time in a lot of work. You can get by with a sorted list, but sets are nicer and faster.

Great point. I do agree that those who use sets heavily should probably know about frozenset. :)

I've very rarely seen these used in production code, though. I'm also not as strongly biased toward them because set objects in my own code are often pretty short-lived, so their mutability rarely matters.

I found myself using them more and more often as I started to adopt type annotations.

As you do that, you start to realize that you don't need a MutableMapping, let alone a Dict; just a Mapping will do. Same with Set.
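A minimal sketch of annotating with the read-only abstract types, assuming Python 3.9+ (where `collections.abc` classes can be subscripted). The `total_price` function and the menu data are hypothetical.

```python
from collections.abc import Mapping, Set


def total_price(prices: Mapping[str, float], wanted: Set[str]) -> float:
    """Sum the prices of the wanted items that have a known price.

    Mapping and Set declare no mutating methods, so a type checker such as
    mypy would reject `prices[name] = 0.0` or `wanted.add(name)` in here;
    only MutableMapping and MutableSet provide those.
    """
    return sum(prices[name] for name in wanted if name in prices)


menu = {"tea": 2.5, "cake": 4.0}
# A plain dict/set and a frozenset both satisfy the read-only annotations.
both = total_price(menu, {"tea", "cake"})                # 6.5
just_tea = total_price(menu, frozenset({"tea", "scone"}))  # 2.5
```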

Trey, if you are reading this: please lower your output a bit. I like your stuff a lot but I had to start filtering out your emails because they were relentless.

> I disagree with only one thing in this article. Frozenset is super useful.

Many of the "don't" callables are useful and I'd say I've used about half (`format` is especially nice as a more flexible version of `str`).
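To illustrate the point about `format`: it applies the same format-spec mini-language that f-strings and `str.format()` use, to a single value, which `str` cannot do. The values below are arbitrary examples.

```python
plain = str(1234567.891)             # '1234567.891'
money = format(1234567.891, ",.2f")  # '1,234,567.89'
hex_digits = format(255, "x")        # 'ff'
padded_bits = format(5, "08b")       # '00000101'
percent = format(0.256, ".1%")       # '25.6%'
```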

But Python's tendency toward mutability and the `set` interface make frozenset verbose to use, and it has higher overhead than set [0]. Usually code is terser, and just as clear, using sets without mutating them. That's a shame, as of course you lose the safety of enforced immutability (at runtime, but still).

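Then again, I guess if you're using a type checker it matters less: it shouldn't let you mutate a `set` if you've typed it as the abstract `Set` (`collections.abc.Set` or `typing.AbstractSet`, not `typing.Set`, which is just an alias for the mutable built-in) rather than `MutableSet`.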

[0] IIRC there's only one operation which is specifically optimised for frozensets: copy
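A quick way to see the copy optimisation on CPython (this is an implementation detail, not a language guarantee, and the tag values are made up): since a frozenset can never change, copying it can simply return the same object, whereas a set copy must be a new, independent object.

```python
tags = frozenset({"python", "sets"})
mutable_tags = set(tags)

same = tags.copy() is tags                        # True on CPython
different = mutable_tags.copy() is mutable_tags  # False: a real copy
```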

I completely agree. Working on complex enterprise Python applications, a pretty significant suite of bugs comes from long-lived, mutable objects. We’ve generally had good luck with preferring immutability wherever possible, both using builtins (tuples rather than lists when possible, frozenset) and custom classes. I just wish frozendict were part of the stdlib!
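The closest thing in the stdlib is probably `types.MappingProxyType`, sketched below with a made-up settings dict. Note that it is not a true frozendict: it is a read-only *view*, so changes made through the underlying dict still show through.

```python
from types import MappingProxyType

_settings = {"debug": False, "retries": 3}
settings = MappingProxyType(_settings)  # read-only view of the dict

retries = settings["retries"]  # 3: reads work as usual

try:
    settings["debug"] = True
    write_rejected = False
except TypeError:
    write_rejected = True  # item assignment on the view is not allowed

# A view, not a frozen copy: changes to the underlying dict show through.
_settings["debug"] = True
debug_now = settings["debug"]  # True
```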

Ha! That was the thing I was most excited to learn about. I’ve never come across it in any codebases I’ve worked on, but will definitely begin training myself to use it by default over a normal set.

Relatedly, I highly recommend Dave Beazley's "Built-In Superheroes" talk: https://www.youtube.com/watch?v=j6VSAsKAj98

I rewatch it every 6 months or so and usually learn a new trick or two.

Any idea what tool he's using for the "slides"?

Just watched his PyCon 2019 session where he derives lambda calculus from scratch in Python, and he had similar shell magics going on there too. Great tutorial if you've never seen lambda calculus before; definitely a brain twister.


believe it or not, that's iTerm2 and the ipython REPL.

To borrow a phrase from Avatar: The Last Airbender, dabeaz is, uh, a mad genius.

I was really asking about the keywords he typed that showed the "slides" in the terminal. For instance, when he typed in "builtins" at the beginning, it cleared his screen and then showed a quick ASCII art image. I suppose he could have just built these functions, but if there is a tool that he might be using, that would be pretty handy.

no no, that's what i mean: he hacked the REPL up to hell and gone, and is actually using it to run his presentation. It's nuts. See ~04:20 where he briefly alludes to that in the talk.

I was at PyCon when Beaz programmed a set of raw socket server/client scripts running Fibonacci sequences LIVE from scratch in front of 1000 people using nothing but a text editor. The man is truly a mad genius.


I always recommend that intermediate/advanced programmers take a look at the `functools` and `collections` packages: they are filled with amazing things you didn't even know you needed!

My favourites are `functools.partial`, `collections.Counter` and `collections.defaultdict`, which are used very often by people who are aware of their existence.
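A compact sketch of those three favourites; the `parse_hex` name and the sample sentence are just illustrations.

```python
from collections import Counter, defaultdict
from functools import partial

# partial: pre-fill some arguments of an existing callable.
parse_hex = partial(int, base=16)
value = parse_hex("ff")  # 255

# Counter: count hashable items.
words = "the cat and the hat".split()
counts = Counter(words)
top = counts.most_common(1)  # [('the', 2)]

# defaultdict: missing keys are created on first access via the factory.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
# by_letter["t"] == ["the", "the"], by_letter["h"] == ["hat"]
```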

I use defaultdict so damn often, it honestly should even be a builtin in my opinion. You also forgot `itertools`, which imo is even more useful than `functools`. I use `chain` and `groupby` quite often. There's also `collections.deque` which is a quick linked list implementation if you need either a stack or a queue.
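A short sketch of `chain` and `groupby` (sample data made up). The main gotcha with `groupby` is that it only groups *consecutive* items with equal keys, so the input is usually sorted by the same key first.

```python
from itertools import chain, groupby
from operator import itemgetter

# chain: iterate over several iterables as if they were one.
merged = list(chain([1, 2], (3, 4), range(5, 7)))  # [1, 2, 3, 4, 5, 6]

# groupby: sort by the grouping key first, then group.
records = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear")]
records.sort(key=itemgetter(0))
by_kind = {
    kind: [name for _, name in group]
    for kind, group in groupby(records, key=itemgetter(0))
}
# {'fruit': ['apple', 'pear'], 'veg': ['kale']}
```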

Forgive me, but doesn’t deque mean double-ended queue? Also, why can’t a simple array behave like a list?

Yes it does mean double ended queue. I'll assume you mean a python list when you say simple array. One drawback of lists is that you cannot add a new element to the start of a list in constant time (with a python list it requires O(n) time), whereas you can with a deque.

I had a script a while back that created really long lists at runtime by continually appending data as it came over the wire. The lists would quickly get so long that I needed to remove items from the beginning to conserve memory, though (because reasons) I could only remove them one at a time. Long story short, converting those lists to collections.deque and using popleft() rather than "del l[0]" improved performance considerably.
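A minimal sketch of that pattern, assuming items arrive one by one from some iterable (the `keep_latest` helper is hypothetical). `popleft()` on a deque is O(1), while `del items[0]` on a list shifts every remaining element, which is O(n) per removal.

```python
from collections import deque


def keep_latest(stream, limit):
    """Collect items from `stream`, discarding the oldest beyond `limit`."""
    buffer = deque()
    for item in stream:
        buffer.append(item)
        while len(buffer) > limit:
            buffer.popleft()  # drop the oldest item, one at a time
    return buffer


latest = keep_latest(range(10), 3)  # deque([7, 8, 9])
```

If the only goal is a bounded buffer, `deque(maxlen=limit)` does the discarding automatically on each append.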

Normal lists have a backing array. In theory, every time you resize it, it needs to allocate new memory and copy everything over, which is very slow. In practice Python does a lot of optimization behind the scenes by over-allocating memory, which gives appends much better amortized speed.
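You can watch the over-allocation happen on CPython, assuming (as is the case there) that `sys.getsizeof` on a list reports its allocated capacity rather than its length. The reported size only changes when the backing array is reallocated, and that happens far less often than once per append. The `count_resizes` helper is illustrative.

```python
import sys


def count_resizes(n):
    """Append n items and count how often the list's reported size changed."""
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # the backing array was reallocated
            resizes += 1
            last_size = size
    return resizes


resizes_for_1000 = count_resizes(1000)  # far fewer than 1000 on CPython
```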

Linked lists are slow at accessing data in the middle, but you can very quickly add and remove stuff from the ends, hence double-ended queue. A stack is basically a deque where you only use the tip, so deque is very versatile like that.
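A short illustration of that versatility, using throwaway values: the same `deque` type serves as a stack (push and pop at one end) or a queue (push at one end, pop at the other).

```python
from collections import deque

# Stack (LIFO): append and pop at the right end.
stack = deque()
stack.append("a")
stack.append("b")
stack.append("c")
top = stack.pop()        # 'c'

# Queue (FIFO): append at the right, popleft at the left.
queue = deque(["a", "b", "c"])
first = queue.popleft()  # 'a'

# Adding at the left end is O(1) too, unlike list.insert(0, ...).
queue.appendleft("z")    # deque(['z', 'b', 'c'])
```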

I have been using defaultdict and Counter a lot lately, not much partial though. Also sets are another nice thing about python.

Also `itertools.product` is very useful to avoid nested for-loops.
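A quick sketch of how `product` replaces nested loops (the sizes and colors are made-up data): it yields the same tuples in the same order as the equivalent nested `for` loops.

```python
from itertools import product

sizes = ["S", "M"]
colors = ["red", "blue"]

# Nested loops...
nested = []
for size in sizes:
    for color in colors:
        nested.append((size, color))

# ...produce the same pairs, in the same order, as one product() call.
flat = list(product(sizes, colors))
# [('S', 'red'), ('S', 'blue'), ('M', 'red'), ('M', 'blue')]

# repeat= covers "the same loop k times", e.g. every 3-bit pattern.
bits = list(product([0, 1], repeat=3))  # 8 tuples, (0, 0, 0) to (1, 1, 1)
```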

If you are reading this: the orange bar taking up the bottom 5% of my screen is hugely distracting. Please allow us to close it.

Thanks for noting this. I rarely look at my site on mobile and hadn't noticed how big that widget was. I just spent a few minutes making it less visually obtrusive. I may figure out more ways to improve it later.

Thank you!

I think it's because it overlaps a little of the content. I guess it makes some of us want to tidy it away.

http://archive.is/lwklL has it closed.

I notice that "globals" is on the "you won't use this much" pile. I am wondering, when using Python interactively from Emacs, I often do this kind of thing:

    if 'myvar' not in globals():
        myvar = ...
That way I can just execute the whole buffer instead of having to carefully be picky-choosy about what I send to the interpreter. It's especially helpful if the line involves loading a large file, calculating something that takes a while, or downloading something. Is there a better way to handle this in an interactive context? Like an automatic globals cache of some kind?

Note: it's also helpful to be able to dirty the variable when necessary, simply by running:

    del myvar
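One possible tidy-up, offered only as a sketch: fold the check into a small hypothetical helper, `once(name, factory)`, that binds a global only when it is missing. Re-running the whole buffer then reuses the existing value, and `del myvar` still forces a recompute. The helper has to live in the same buffer/module you execute, because `globals()` refers to the namespace of the module where `once` is defined; `load_dataset` is a stand-in for the slow step.

```python
def once(name, factory):
    """Bind global `name` to factory() only if it isn't already bound."""
    namespace = globals()
    if name not in namespace:
        namespace[name] = factory()
    return namespace[name]


def load_dataset():
    # Stand-in for something slow: reading a large file, a download, etc.
    return {"rows": list(range(5))}


myvar = once("myvar", load_dataset)  # computed on the first run only
```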


Amazing post Trey, as usual. This is a very comprehensive list. I can't think of any other important function missing. My students will love this :)

Thanks Santiago! :)

This article covers a lot of good material and I like the breakdown of categories from essential > maybe you'll look this up some day.

My one minor suggestion would be to cover getattr and related functions sooner. I stumbled across that one pretty early on as I was learning programming and it was a serious "Ah ha!" moment. Dynamic lookups with a sensible default value are useful in countless contexts.
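Two common shapes of that dynamic lookup, sketched with hypothetical `Config` and `Handler` classes: reading an optional attribute with a fallback, and dispatching to a method chosen by name at runtime.

```python
class Config:
    debug = True


cfg = Config()
debug = getattr(cfg, "debug", False)      # True: the attribute exists
verbose = getattr(cfg, "verbose", False)  # False: falls back to the default


class Handler:
    def handle_get(self):
        return "got it"

    def handle_default(self):
        return "not supported"

    def dispatch(self, method):
        # Pick handle_<method> if it exists, otherwise the fallback.
        handler = getattr(self, f"handle_{method.lower()}", self.handle_default)
        return handler()


get_result = Handler().dispatch("GET")        # 'got it'
delete_result = Handler().dispatch("DELETE")  # 'not supported'
```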

> Many Python users rarely create classes. Creating classes isn’t an essential part of Python, though many types of programming require it.

Huh. Okay.

It totally depends on the kind of work you're doing. My first few years with Python I never defined a class. Like most people, I imagine, I was writing code to get from A to B. There's a lot of that kind of work to be had. Only later when I started writing modules and libraries did I need to.

I also spent years writing Python scripts without a single class. I didn't really understand the point of objects at all. I used them because they were part of library interfaces, but I never created them. The scripts didn't have to manage much state, so writing everything in a pure-functional style worked fine. Nobody could really explain to me what objects were all about - all the arguments seemed to apply equally to functional programming ("re-use", "encapsulation" etc).

Then I had to write a piece of software that heavily interacted with various bits of physical hardware, each with masses of state (serial communication and so forth). Suddenly the need for isolating state became terribly obvious. After seeing how some libraries did it, I wrote a class for every bit of hardware that abstracted the state away and provided a functional interface. It all worked terribly well, and it was a real lightbulb moment for me.

You could have still kept the massive state in a dict or a list or a tuple and passed that dict around from one function to another, could you not? Why did it become necessary to implement classes?

Because I don't want to "pass around" the state - I want to hide it. Yes, of course it's possible to mingle the state in all the rest of the program, just like it's possible to scatter gotos everywhere instead of using structured control flow. But what I really want is to call EnablePowerSupply(), rapidly followed by SetVoltage(30), and have all the messy business of statefully talking over a serial port (and not having commands stomp on each other) neatly abstracted away. EnablePowerSupply and SetVoltage need to share state to do that. That could indeed be done by passing an extra parameter - EnablePowerSupply(blobOfState) and SetVoltage(blobOfState, 30) - but that's basically exactly what objects are syntactic sugar for in Python. Only blobOfState is more usually called "self".
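To make the "blobOfState is called self" point concrete, here is a minimal sketch with a hypothetical `PowerSupply` class. A real driver would hold a serial port; this one only records the commands it would have sent, so it runs anywhere. The last two calls show that `psu.enable()` is sugar for calling the class's function with the state passed explicitly.

```python
class PowerSupply:
    """Hypothetical stand-in for a serial-controlled power supply."""

    def __init__(self, name):
        self.name = name
        self.enabled = False
        self.voltage = 0
        self.sent = []  # commands that would have gone over the wire

    def enable(self):
        self.sent.append("enable")
        self.enabled = True

    def set_voltage(self, volts):
        # Shared state lets one method guard against misuse of another.
        if not self.enabled:
            raise RuntimeError(f"{self.name}: enable() before set_voltage()")
        self.sent.append(f"set_voltage {volts}")
        self.voltage = volts


psu = PowerSupply("bench-1")
psu.enable()
psu.set_voltage(30)

# Same thing in "pass the blob around" style: the blob is just `self`.
blob = PowerSupply("bench-2")
PowerSupply.enable(blob)
PowerSupply.set_voltage(blob, 30)
```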

Oh, and of course there's not one, but half a dozen power supplies. You could pass around the blobs of state separately of course, but now you have to manage their scope independently from the functions that operate on them - a useless decoupling that adds overhead. What I ended up with was something like:

[psu.enable() for psu in psus]

Which you can always do, whenever psus is within scope. Hard to get terser and more idiomatic than that.

I know it's a contrived example, but it's worth pointing out that using comprehensions to cause side effects is considered by some to be at least unPythonic and at worst an abuse of the construct.

I know this because I wanted to do the same thing and have looked all over for a justification for it. In this case most people seem to agree that the bog standard for loop is the way to go:

    for psu in psus:
        psu.enable()

Some people (my boss) will even put it on a single line, so you don't lose much terseness.

I think the rule of thumb is don't use a list comprehension unless you're going to use the list afterwards, else you're wasting an allocation.
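A small sketch of the difference, using a hypothetical `PSU` class: both forms make the same calls, but the comprehension also builds a throwaway list of the return values (all `None` here).

```python
class PSU:
    """Hypothetical stand-in for a power supply."""

    def __init__(self):
        self.enabled = False

    def enable(self):
        self.enabled = True  # implicitly returns None


psus = [PSU() for _ in range(3)]

# Plain loop, even on one line: runs the side effects, builds nothing.
for psu in psus: psu.enable()

# Comprehension: same calls, plus an allocated list nobody uses.
throwaway = [psu.enable() for psu in psus]  # [None, None, None]
```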

Thanks for the heads up. I'm not totally convinced about "Pythonic" as a figure of merit for anything (often it seems to merely mean "clunky"), but it's a good point about the wasted allocation [1]. And the one-liner for-loop is almost identical to the comprehension anyway.

I think the reason I instinctively write it this way is because in my mind, I basically think of list comprehensions as sugar for map + lambda. I'm "really" trying to write map(enable, psus). But of course, it's a method, so you need the instance[2] - map(lambda psu: psu.enable(), psus). The reason I prefer a map over a for loop is because it's a habit borne of the principle of least power - map provides a guarantee that no "funny business" (data dependency) is going on between the elements of the list you're iterating over. I scrupulously avoid for loops on principle, unless I need that kind of funny business. Of course in this case the for loop is so short as to make no difference, but like I say - it's a habit. In my code, "for" means "funny business here".

[1] not that it matters in this case - you're not toggling power supplies in a tight loop.

[2] Technically, in Python, map(type(psus[0]).enable, psus) would work if psus was not empty. Or you could name the class directly: map(PSU.enable, psus). But ugh, talk about defeating the purpose.
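One caveat worth adding for Python 3: `map()` is lazy, so `map(lambda psu: psu.enable(), psus)` on its own calls nothing until the result is consumed. The sketch below (with a hypothetical `PSU` class) shows that, plus `operator.methodcaller`, which calls a method by name without needing an instance or the class up front.

```python
from operator import methodcaller


class PSU:
    """Hypothetical power-supply stand-in that just records its state."""

    def __init__(self):
        self.enabled = False

    def enable(self):
        self.enabled = True


psus = [PSU(), PSU()]

pending = map(methodcaller("enable"), psus)  # nothing has run yet
before = [psu.enabled for psu in psus]       # [False, False]

list(pending)                                # consuming it makes the calls
after = [psu.enabled for psu in psus]        # [True, True]

# Referencing the function through the class works as well.
list(map(PSU.enable, psus))
```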

That is one way of writing programs. Python is very OO friendly. If any of the main features of OO (inheritance, polymorphism, encapsulation, overloading) make your code more maintainable, then use them. A dict you pass around is an object too, but it's often nice to keep the methods for interacting with that object alongside the data (for the reasons mentioned above).

I agree.

To be sure, classes have a place and can be really useful, but so many people create classes when they add no benefit.

Clear code always wins imho

I spent many years with Visual Basic before learning Python. Some of the earlier versions of VB were object based, meaning that you could use pre-written objects (standard and third party), but you had to buy a special kit to create your own objects.

This was quite useful for those of us who were bewildered by OOP.

When I learned Python, I was comfortable enough with using objects, that I had no problem creating classes.

So I think it's fair to say that creating classes is something that beginning Python programmers can put off.

Since I introduced Python to my workplace, I get to watch some fairly neophyte programmers develop their skills. These are typically engineers doing scientific programming, not commercial software developers. There's a point where I get to say: "You could put that stuff in a class." And later on, "You're creating too many classes." ;-)

It's really true. You can do a lot with data objects (dicts, lists) and stateless functions.

Rich Hickey has some nice talks about why separating the data from the methods is really beneficial.

There’s a great Python talk out there titled “Stop writing classes”. It has several great demonstrations of Python code getting shorter, simpler and faster by converting code to just use the built-in containers.

I did like that talk very much - and here was my response:

> The Python Datamodel: When and how to write objects - https://www.youtube.com/watch?v=iGfggZqXmB0

Which part are you reacting to? I can't speak for "many" users, but all three parts you quoted apply to my usage of Python.

Data scientists for example.

Huh? Every data scientist I have worked with who writes production code uses classes.

While Python is object oriented in the sense that everything is an object, it doesn't insist on doing everything via message passing. It's quite happy with basic data types and simple functions (plus decorators and context managers).

OK what? Both statements are objectively true.

what was fascinating to me was how I fitted into the MUST SHOULD MAY MAYBE-NOT DONT matrix (in pseudo-normative form)

I used all but one of the MUSTS.

I used a few of the SHOULD

I used a similar few of the MAY

I only used one of the MAYBE-NOT (pow, and I am doing base arithmetic but not crypto)
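For anyone curious what `pow` adds over `**`: the three-argument form computes `(base ** exp) % mod` efficiently without building the intermediate power, and since Python 3.8 a negative exponent with a modulus gives the modular inverse. The digits helper at the end is an illustrative base-conversion use.

```python
r1 = pow(3, 4, 5)                              # 1, since 81 % 5 == 1
check = pow(2, 100, 97) == (2 ** 100) % 97     # True
inverse = pow(3, -1, 7)                        # 5, since 3 * 5 == 15 ≡ 1 (mod 7)


def digits_to_int(digits, base):
    """Hypothetical helper: [1, 0, 1] in base 2 -> 5."""
    return sum(d * pow(base, i) for i, d in enumerate(reversed(digits)))


five = digits_to_int([1, 0, 1], 2)             # 5
```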

I don't think I use any of the DONT

I'm interested in moving up on the SHOULD and MAY

I am honestly a bit surprised divmod and complex are so high, I have only used them a couple of times myself - but I think they're almost always mentioned in "Intro to Python" books so most people should at least be familiar with them right?
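For a sense of where they show up: `divmod` returns the floor quotient and the remainder in one call, which is handy for unit conversions like the hypothetical `to_hms` below, and `complex(re, im)` builds the same number as the literal `re + im*1j`.

```python
def to_hms(total_seconds):
    """Split a number of seconds into (hours, minutes, seconds)."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds


hms = to_hms(3725)         # (1, 2, 5)
negative = divmod(-7, 2)   # (-4, 1): floor division semantics

z = complex(3, 4)          # same as 3 + 4j
magnitude = abs(z)         # 5.0
```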

I’ve been writing Python for a little over a year mostly as glue code and scripting around AWS. I always had a sneaking suspicion that there was more I needed to learn and that I wasn’t doing things the “Pythonic” way.

This article confirmed my suspicions. Great post.

For a few moments I thought of all these years during which I could have just typed "breakpoint" instead of "import pdb; pdb.set_trace()".

Luckily it's been added only recently (3.7) :)

Wow I had the same reaction.

And that's despite the fact that I read an article a few weeks ago about this new feature! If you haven't read it, I highly recommend it, because there's more to the breakpoint feature than mentioned here (basically, you can set an env variable to choose which callable is activated on breakpoint, meaning you can turn on and off debugging on production systems, or even set prod systems to open a web port for debugging): https://hackernoon.com/python-3-7s-new-builtin-breakpoint-a-...
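A minimal sketch of the mechanism: `breakpoint()` just calls `sys.breakpointhook()`, and the default hook is what consults `PYTHONBREAKPOINT` (e.g. `PYTHONBREAKPOINT=0` in the environment turns every `breakpoint()` into a no-op). Below, a hypothetical hook that only records the hit replaces the default, then the original is restored from `sys.__breakpointhook__`.

```python
import sys

calls = []


def record_instead_of_debugging(*args, **kwargs):
    # Hypothetical hook: note the breakpoint instead of opening pdb.
    calls.append("breakpoint hit")


def compute(x):
    breakpoint()  # dispatches to whatever sys.breakpointhook currently is
    return x * 2


sys.breakpointhook = record_instead_of_debugging
result = compute(21)
sys.breakpointhook = sys.__breakpointhook__  # restore the default hook
```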

I miss the "map" function. Sadly it was removed in the transition from Python2 to Python3. In Python3, "map" is a class.
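For anyone tripped up by this: in Python 3, `map` is indeed a class, but calling it still works like the Python 2 function did. The practical difference is that you get back a lazy, single-use iterator instead of a list, as this small sketch shows.

```python
map_is_class = isinstance(map, type)   # True

squares = map(lambda n: n * n, [1, 2, 3])
is_map_object = type(squares) is map   # True: a lazy map object, not a list

first_pass = list(squares)             # [1, 4, 9]
second_pass = list(squares)            # []: the iterator is exhausted
```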

i cannot remember the last time i used str() instead of a string format.

Wish Python would get 'study' from Perl.

It sorta does, now that study() is a no-op in Perl: https://perldoc.perl.org/functions/study.html

What is the use case?

Stop using python, let it die for god's sake!
