
Specific ways to write better Python (2017) - guptarohit
https://github.com/SigmaQuan/Better-Python-59-Ways
======
dd82
The author has a second edition out with 90 examples at
[https://effectivepython.com/](https://effectivepython.com/), and the
corresponding github repo at
[https://github.com/bslatkin/effectivepython](https://github.com/bslatkin/effectivepython)

It's been updated to be exclusive to Python 3.x, including 3.8 samples.

~~~
guiambros
The author also has a really nice Safari training video course[1]. Always
worth reminding folks that Safari is included with a $99/yr ACM membership,
which is a great value.

[1] [https://learning.oreilly.com/videos/effective-python/9780134...](https://learning.oreilly.com/videos/effective-python/9780134175249)

------
softwaredoug
On “return generators instead of lists”, I think it’s also important, when you
do this, to use a context manager to manage the generator’s lifetime.
Otherwise you might be surprised when the underlying file or source is closed
later in your code.

I blogged about this:
[https://opensourceconnections.com/blog/2020/06/10/python-gen...](https://opensourceconnections.com/blog/2020/06/10/python-generators-and-context-managers/)
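
If I follow, the pattern from the post is roughly this (my own sketch with
hypothetical names, not copied from the blog):

      from contextlib import contextmanager

      @contextmanager
      def read_lines(path):
          # Tie the generator's lifetime to the file's: both live exactly
          # as long as the caller's "with" block.
          with open(path) as f:
              yield (line.strip() for line in f)

      with read_lines("data.txt") as lines:
          for line in lines:
              ...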

~~~
xapata
Maybe I misunderstood, but if you're suggesting the generator function
open/close the resource, then I disagree. The caller has the responsibility of
safely opening and closing. The generator function should not care what type
of iterable it receives.

~~~
softwaredoug
Then the caller must know they are receiving a generator dependent on the
file’s lifetime. Pretty brittle IMO. Receiving a list wouldn’t depend on the
file’s lifetime. This makes client code pretty error prone: you can very
easily think you’re saving off a list built from a file, but in reality you
hold a generator. Client code closes the file, the generator is used much
later, and blammo, crash! NOT using a context manager forces clients to work
harder at guessing lifetimes.

So you can’t just replace lists with generators.

This breaks the uniform access principle. Client code IMO shouldn’t need to
think about whether they got a generator or an iterable container.

~~~
xapata
> Then the caller must know they are receiving a generator dependent on the
> file's lifetime

How would they not? The caller provides the file as an argument to the
generator function. Further, the caller decides when to close the file,
presumably via leaving a context and after the generator is consumed.

~~~
Groxx
it's not too hard to write code like:

    
    
      with open(...) as f:
        lines = process(f)
    
      # do stuff with lines
    

^ if `process` returns a generator, the file will be closed before anything
has actually been read.
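
For the curious, a minimal reproduction of that failure (`process` here is my
own stand-in, not from any library):

      def process(f):
          return (line.strip() for line in f)  # lazy; nothing is read yet

      with open("data.txt") as f:
          lines = process(f)

      # The file is already closed, so consuming the generator fails with
      # "ValueError: I/O operation on closed file."
      for line in lines:
          print(line)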

~~~
xapata
Put `do stuff with lines` inside the context. Where it belongs.

~~~
Groxx
and hold open a resource longer than necessary? or when looping over lines
multiple times? it doesn't unambiguously belong in there. often, yes, but not
always.

~~~
hansvm
As the caller you can always choose to write `lines = list(process(f))` if
your use case requires it. If process(f) returned a list though, you wouldn't
be able to avoid the potentially large memory allocation if you only needed to
iterate over the results.

~~~
Groxx
Sure. It's entirely possible.

Do you `list(...)` _literally everywhere_ you want a list, whether or not it's
currently a list? Extreme-defensive-programming like that tends to be
extremely rare in my experience. Not nonexistent, but I'd be willing to bet
that if you took a random stack overflow or github line of code, it wouldn't
do this.

~~~
hansvm
Defensive programming wasn't meant to be the takeaway.

Rather, if you need a list and the otherwise perfect library method `foo`
returns an iterator then you as the caller have almost no downside in writing
`list(foo())` instead.

On the flip-side, if you need an iterator (e.g. because of memory concerns)
and the otherwise perfect library method `foo` returns a list then you as the
caller have no options other than re-implementing `foo` or finding another
tool to solve your problem.

If your project usually requires lists rather than iterators then the
syntactic burden of wrapping everything in `list(...)` could be annoying, and
I wouldn't be at all surprised to find that performance-critical code couldn't
tolerate the iterator overhead (though I'd posit that most of the time just
using a list instead probably wouldn't fix it), but using iterators rather
than lists seems like a good default if you don't have a good reason not to.
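
To make the trade-off concrete (a toy sketch, not anyone's real API):

      def read_records(path):
          # Yields lazily; memory use stays flat regardless of file size.
          with open(path) as f:
              for line in f:
                  yield line.strip()

      # Caller that needs a list: one wrapping call, essentially no downside.
      records = list(read_records("data.txt"))

      # Caller with memory concerns: just iterate. If read_records returned
      # a list instead, this caller couldn't avoid the allocation.
      for record in read_records("data.txt"):
          pass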

~~~
Groxx
Isn't that just the opposite tradeoff for code you call though?

E.g. lists are strictly more flexible than iterators since they are multi-pass
and can index. If you take a return from lib func A and pass it to lib funcs B
and C (possibly in a different lib), if A returns an iterator then you need to
know what B and C are going to do with it. If A returns a list, you do not - B
and C cannot affect each other in this way unless they directly mutate the
argument, which is _relatively_ rare and usually has very strong documentation
and/or naming patterns to make it not surprising. And B and C and all their
calls can change without you needing to change your code between them and A.

You can of course `list(A())` before passing it to B and C, but that's
arguably defensive programming unless you know A returns an iterator.

tbh I'd say I see multi-iteration and indexing several times more often than I
see iterator-only use in most code, and almost never see memory issues except
in large or embedded systems. Gigabytes of RAM are standard now, and dumping a
few thousand files into memory before processing will quite often out-perform
a more memory-efficient streaming approach. In the rare cases where you _do_
exceed those memory bounds, you're likely in a niche area (gigabytes of files,
tiny memory space, significant computation cost) and are likely using niche-
oriented libraries that benefit more from coordinating tightly than from being
perfectly general.

~~~
xapata
> never see memory issues

In contrast, I see them all the time (damn you, Pandas). Beyond out-of-memory
errors, generators can be vastly more compute-efficient because of the CPU
cache.

~~~
Groxx
Pandas I'd definitely categorize as being niche :) A moderately-sized one, but
still a niche. And there you use whatever performs the way you need it, yeah -
there are special constraints on the system, and pandas is built around
dealing with them.

------
cs702
Great suggestions for general-purpose code, but I would NOT recommend
following them in 'mathematically dense' code (e.g., deep learning models),
for which being able to fit as much logic as possible into a single editing
screen becomes increasingly important as we increase the complexity of the
code.

Taken to the extreme, the "fit as much logic as possible into a single screen"
approach leads naturally to code that looks a lot like something written in
APL or one of its descendants (J, K, etc.). I recognize this is not for everyone.

Personally, I think Jeremy Howard and the folks at fast.ai have struck a
pretty good balance of readability and succinctness with their coding style
for ML/AI code:

[https://docs.fast.ai/dev/style.html](https://docs.fast.ai/dev/style.html)

~~~
sweezyjeezy
I work in deep learning, and I'm very dubious of the benefit of terse code. In
my experience 90% of novel deep learning models can be coded with base layer
classes from tf/pytorch/keras + maybe a few hundred lines of 'new' code. For
this code I feel like you will avoid mistakes and make things so much easier
for the reader (probably yourself in 6 months), if you modularise, write it as
clearly as possible and explain what each part is doing.

I have used code from the fast.ai codebase and personally I find it horrendous
to work with - I find the way they structure and name classes confusing, they
use wildcard imports everywhere, they use tiny variable names for everything -
it's all extremely reader-unfriendly, which for an educational tool seems
completely bizarre.

------
carapace
> Assigning to a list slice will replace that range in the original sequence
> with what's referenced even if their lengths are different.

Not only that, but you can assign to a slice with a stride:

    
    
        In [1]: r = list(range(10))                                                                                           
    
        In [2]: r                                                                                                             
        Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    
        In [3]: r[1::3] = 10, 20, 30                                                                                          
    
        In [4]: r                                                                                                             
        Out[4]: [0, 10, 2, 3, 20, 5, 6, 30, 8, 9]
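
One caveat worth adding: with an extended slice (step other than 1), the
replacement must have exactly the same length as the slice, unlike the
basic-slice case quoted above:

        In [5]: r[1::3] = 10, 20
        ValueError: attempt to assign sequence of size 2 to extended slice of size 3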

------
nickysielicki
Items 7 and 8 seem to be at odds with each other to me, don't use reduce and
map because it's ugly and unreadable, but if you use the alternative for
reduce and map then you can't make a deeper pipe because it becomes too
complicated to read? Get real!

Is it even true that comprehensions are faster than reduce and map? I write a
lot of Python with reduce and map, and I know it's not considered a best
practice for perf reasons, but whenever I go ahead and rewrite it in the "more
pythonic" way, I can't help but think that it ends up a lot less readable. I
feel like this divide makes functional programming in Python more of a hassle
than it should be.

~~~
dragonwriter
I think #7 would be clearer if it said “Use list comprehensions instead of map
or filter where the latter would use lambda expressions, and in place of the
combination of map and filter”. Map and filter where you are applying a
function (including a function-returning expression) that doesn't require a
lambda are, IMO, cleaner, and avoiding an extra lambda or map/filter combo is
explicitly the basis of #7 (see #7.i, #7.ii). I don't think this is bad
writing, the author expects you to read the subpoints, but if you were to
present the major points independently...
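
A rough illustration of that distinction (example mine, not the author's):

      names = ["ada", "grace", "barbara"]

      # Fine as map: applying an existing function, no lambda needed.
      upper = list(map(str.upper, names))

      # Clearer as a comprehension: the alternative needs a lambda plus a
      # map/filter combination.
      short_upper = [n.upper() for n in names if len(n) < 6]
      # vs. list(map(lambda n: n.upper(), filter(lambda n: len(n) < 6, names)))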

#8 is a little unclear as to which expressions it is counting (particularly,
whether it counts the return expression, which I don't think it intends to). I
think it's referring to two total “for” and “if” clauses, so either two “for”s
or one of each as the preferred limit, which seems to me to be a sensible
guideline.
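
Concretely, on my reading these would both be within that limit:

      matrix = [[1, 2], [3, 4]]
      xs = range(10)

      flat = [x for row in matrix for x in row]   # two "for" clauses
      evens = [x for x in xs if x % 2 == 0]       # one "for", one "if"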

I think both #7 and #8, even with the additional clarification, need to be
considered in combination with #4: together they set the rules as to what
constitutes a complicated expression—either with comprehensions or
map/filter/etc.—that calls for factoring part of it out into a helper function
or named subexpression to avoid overly complex, code-golfy one-liners.

> Is it even true that comprehensions are faster than reduce and map?

While I'd love to see “accumulator comprehension” syntax* added to Python, in
real current Python, comprehensions aren't generally an alternative to reduce.

* something like:
    
    
      (compute x from 0 as x+n for n in ns)
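
In current Python the closest thing is functools.reduce (illustration mine):

      from functools import reduce

      ns = [1, 2, 3, 4]
      total = reduce(lambda x, n: x + n, ns, 0)  # -> 10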

------
tumidpandora
I couldn't help but notice that the latest commit on the repo was over 3 years
ago.

