
Python Tips and Traps - ryan_sb
https://www.airpair.com/python/posts/python-tips-and-traps
======
dalke
The namedtuple example is wrong. The constructor requires all of the
parameters, and an attribute cannot be set:

    
    
        >>> from collections import namedtuple
        >>> LightObject = namedtuple('LightObject', ['shortname', 'otherprop'])
        >>> m = LightObject()
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: __new__() takes exactly 3 arguments (1 given)
        >>> m = LightObject("first", "second")
        >>> m.shortname
        'first'
        >>> m.shortname = 'athing'
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        AttributeError: can't set attribute
    

Also, bare try/excepts as in:

    
    
        try:
          # get API data
          data = db.find(id='foo') # may raise exception
        except:
          # log the failure and bail out
          log.warn("Could not retrieve FOO")
          return
    

are really bad. The failure might be caused by a ^C or MemoryError, or even a
SystemExit, should db.find() desire to do that.

Instead, qualify it by catching Exception:

    
    
        try:
          # get API data
          data = db.find(id='foo') # may raise exception
        except Exception:
          # log the failure and bail out
          log.warn("Could not retrieve FOO")
          return
    
    

It's also poor form to "return True" in the __exit__ method of the context
manager. If there is no exception then that's not needed at all, and if there
is an exception ... well, that code will swallow AttributeError and NameError
and ZeroDivisionError, and leave people confused as to the source of the
error.
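
A minimal sketch of that trap (the class name here is made up):

```python
# Hypothetical context manager showing the problem: returning True from
# __exit__ tells Python to suppress whatever exception was raised in the
# with block, no matter what it was.
class SwallowsEverything(object):
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return True  # suppresses AttributeError, NameError, ZeroDivisionError, ...

with SwallowsEverything():
    1 / 0  # the ZeroDivisionError silently vanishes

print("still running")  # reached, with no hint that anything went wrong
```

Returning None (or simply falling off the end of __exit__) lets exceptions
propagate normally.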

~~~
ryan_sb
Thanks for correcting me, I'll get those updated ASAP.

~~~
pdonis
A good rule of thumb: if you show actual code, run it before you post it!

------
bdevine
Always glad to see tips, tricks, and otherwise for Python. But for anyone
checking out Python who sees how useful the defaultdict construct is but
doesn't necessarily need nested attributes, the Counter class[0] has been
available for some time now. If you just want to keep track of, well, counts,
it's very handy and versatile.

[0]
[https://docs.python.org/3.1/library/collections.html#collect...](https://docs.python.org/3.1/library/collections.html#collections.Counter)
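
For the curious, a quick taste (all standard library, nothing hypothetical):

```python
from collections import Counter

# Tally hashable items without any defaultdict(int) boilerplate.
counts = Counter('mississippi')
print(counts['s'])   # 4
print(counts['z'])   # 0 -- missing keys just count as zero
print(counts.most_common(1))  # the most frequent letter with its count
```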

~~~
ryan_sb
Totally true, I'll add that in

------
kasabali
If you want to read hundreds of these look no further than Python Cookbook
[0].

[0]
[http://chimera.labs.oreilly.com/books/1230000000393/](http://chimera.labs.oreilly.com/books/1230000000393/)

------
task_queue
Use contextlib to write your context managers. I thought the exception
handling section was going to catch the all-encompassing use of 'except:',
but nope. Don't do that.
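
For anyone unfamiliar, a minimal sketch of the contextlib style (the tag
example is made up; the decorator is the real API):

```python
from contextlib import contextmanager

# Everything before the yield runs on entry; the finally clause runs on
# exit, even if the body of the with block raises.
@contextmanager
def tag(name):
    print('<%s>' % name)
    try:
        yield
    finally:
        print('</%s>' % name)

with tag('p'):
    print('hello')
```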

~~~
ryan_sb
I don't know how I've missed out on contextlib all this time. I'll update that
section.

------
jamiesonbecker
Excellent article. Loved the specific examples for collection types, esp
defaultdict and namedtuples!

------
rhapsodyv
I think I have bad karma with Python. In every Python project I've needed to
touch, I've lost a lot of time dealing with mixed spaces and tabs scattered
throughout every source file. I must just be unlucky; this cannot be normal.

~~~
dalke
Don't use tabs. See [https://www.python.org/dev/peps/pep-0008/#tabs-or-
spaces](https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces) which says
to prefer spaces.

You can start by converting all tabs into 8 spaces. This can be tricky if
some string literals contain tabs; that's a bad idea in the first place, so
use "\t" instead.

Don't mix tabs and spaces to get the same indentation level. Python 3
prohibits it. With Python 2, use "-t" or "-tt", which respectively warn and
raise an exception if both spaces and tabs are used in the same block.
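
A minimal sketch of the conversion using str.expandtabs(), which expands to
8-column tab stops. Note it rewrites every tab, including any inside string
literals, which is the tricky case mentioned above:

```python
# expandtabs(8) replaces each tab with spaces up to the next 8-column
# tab stop -- leading indentation tabs and mid-line tabs alike.
source_line = '\tif ready:\t# tab-indented, with a tab before the comment'
converted = source_line.expandtabs(8)
print(repr(converted))
```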

~~~
rhapsodyv
Yeah, I ended up converting the whole project to spaces.

------
gknoy
Thanks for the reminder about integer division changing! I had seen some code
in our (2.7) codebase that used (a/b), and I had wondered if I should be
explicit about using math.floor, but this is even better.
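
For reference, a quick sketch of the difference (the __future__ import only
changes behavior on Python 2):

```python
from __future__ import division  # on Python 2, makes / mean true division

print(7 / 2)    # 3.5 -- true division, the Python 3 behavior
print(7 // 2)   # 3   -- explicit floor division, same on 2 and 3
print(-7 // 2)  # -4  -- floors toward negative infinity, unlike C
```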

------
3JPLW
How would you create that hypothetical recursive defaultdict whose defaults
are themselves defaultdicts? Is such a construct possible without creating a
new defaultdefaultdict class?

~~~
birken

      >>> nested_dd = collections.defaultdict(lambda: nested_dd)
      >>> nested_dd['a']['b']['c']['d'] = 'hello'

~~~
bdr
I don't think that's what they wanted:

    
    
        >>> nested_dd['d']
        'hello'

~~~
choochootrain
try nested_dd = lambda: defaultdict(nested_dd)
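
A quick sketch of that variant in action:

```python
from collections import defaultdict

# Each *missing* key now produces a fresh nested defaultdict, so deep
# assignment works and top-level keys don't leak assigned values.
nested_dd = lambda: defaultdict(nested_dd)

tree = nested_dd()
tree['a']['b']['c']['d'] = 'hello'
print(tree['a']['b']['c']['d'])  # hello
print(len(tree['d']))            # 0 -- a new empty dict, not 'hello'
```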

~~~
jamiesonbecker
Just. Wow.

------
tekromancr
I have used python for years, how did I never encounter python's collections
module? I have implemented functionally partial versions of some of these.
Looks like I gotta brush up on the batteries included!

~~~
pjmlp
While you are at it, check out itertools and functools as well, in case you
don't know them either.
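
A couple of small functools tastes for anyone in the same boat (standard
library only):

```python
import functools

# partial() pre-fills arguments of an existing callable.
parse_binary = functools.partial(int, base=2)
print(parse_binary('1010'))  # 10

# lru_cache() memoizes a pure function (Python 3.2+).
@functools.lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040 -- linear time instead of exponential
```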

~~~
tekromancr
itertools I know and love, functools is a mystery I should look into.

------
settrans
set() is a great way to deduplicate small lists, but it's important to note
that it requires O(n) extra space (in-place sorting can avoid this overhead,
but is more complex).

~~~
raymondh
Sorting is almost always the wrong way to do it. (Wordy and slow).

It is fragile design to write code that depends on the data being 1) so
large that you don't have room for a set(), BUT 2) small enough for an in-
memory sort. (IOW, the almost-out-of-memory case invariably degrades over
time to flat-out-of-memory.)

Another thought: people seem to worry too much about the size of the various
data structures rather than thinking about the data itself. Python
containers don't contain anything; they just hold references. (Usually, the
weight of a bucket of water is mostly the water, not the bucket itself.)

Finally, if your task is to dedup a lot of data, it doesn't make sense to read
it all into memory in the first place (which you would need for a sorting
approach). It is better to dedup it using a set as you read in the data:

    
    
        # Only the unique lines are ever kept in memory
        with open('hugefile.txt') as f:
            uniq_lines = set(f)
    

Dude, sorry to go off like this, but the advice you gave is almost always the
wrong way to do it.

~~~
meowface
You're still reading all of the lines into memory in that example as soon as
you call `set(f)` (which is basically equivalent to `set(f.readlines())`),
though, which may not necessarily be what you want.

~~~
jzwinck
No, that's not at all what's happening. f.readlines() creates and returns a
full list of all lines, loaded into memory. But set(f) uses f as an iterator,
which reads chunks of the file and yields one line at a time, which can then
be inserted into the set, de-duping on the fly. Your parent is correct (and
clever).

~~~
meowface
It was incorrect of me to say it was equivalent, but if all lines in the file
are unique (ignoring the parent comment's assumption that there will be lots
of duplicates) then you're still creating a large object in memory. I'm not
sure of the space overhead of sets/hash tables vs. lists, but I imagine
they'd be pretty close.

But of course that will be unavoidable if you want fast deduplication. You
could do fully lazy deduplication using a generator, but you'd have to avoid
using sets.

~~~
DasIch
If all lines in the file are identical, memory usage stays constant after the
first line is read.

A set by definition contains only unique items, that is all items in a set are
different from each other.

~~~
meowface
If you're going from generator object (or any other lazy iterator, like the
file object in this case) -> set object, there will be a memory usage increase
with each additional line read.

What I meant is you could in theory process a generator and omit duplicates
without any real memory usage (even with a file of millions of lines) by
chaining generators together. This would be slower than a set but much more
memory efficient.

------
raymondh
Nice article. It well expresses the delight that comes from seeing the
expressiveness of just a handful of tools that fit well together.

