
Let me introduce: __slots__ - trueduke
https://www.chrisbarra.xyz/posts/let-me-introduce-slots/
======
insertnickname
Please excuse this off-topic comment, but the webpage is severely
dysfunctional. Unless JavaScript is enabled in the browser, the page will just
display a loading screen. All the content is right there in the HTML and it
would be perfectly readable if it wasn't deliberately obscured by the
"loading" screen.[1] I'm sure the idea is that the page will load perfectly
all at once, there won't be flashes of unstyled text and so on, but for me it
just means that the real content won't load at all. Curiously, mobile users
(or other users with small screens) are spared the loading page.[2] That's
nice for mobile users, I guess, but as a desktop user it's just salt in the
wound. If the plain page is good enough for mobile users, why isn't it good
enough for me?

Please, web developers, stop doing this. It's not just the minority of people
who browse with JavaScript disabled who are bothered by this; think of all the
people on slow Internet connections who have to wait for your megabyte+
JavaScript program to download and execute before they can read the content of
the 50 KB HTML page.

I understand that it is not feasible to accommodate no-JS browsers for Single
Page Applications because JavaScript is essential to their functionality, but
this page is not an application and JS is obviously not essential to it. Use
JavaScript to enhance webpages, not degrade them.

\- [1] Yes, Reader View in Firefox (and similar tools) is able to render the page
properly. It's a wonderful utility, but I shouldn't have to strip away your
broken web design to read your content.

\- [2] [https://streamable.com/b390d](https://streamable.com/b390d) (no JS
mirror:
[https://cgt.name/files/fortheloveofgod.ogv](https://cgt.name/files/fortheloveofgod.ogv))

~~~
make3
let's be honest here, no company is going to make a change for the 0.001% who
happen to disable javascript

~~~
snakeanus
I would assume that more than 0.001% of the people use the Tor browser.

~~~
infinite8s
The number of people using Tor is probably less than the number of people who
disable JavaScript.

------
gipp
For simple classes, you can also just use `NamedTuple` like:

    
    
      from typing import NamedTuple
      class Foo(NamedTuple):
          bar: int = 0
          baz: str = 'default'
      foo = Foo(1, baz='not default')
    

This provides the same performance benefits, and is very similar in a lot of
ways to a Scala case class -- aside from removing the boilerplate of stating
each attribute twice in __init__ and again in __slots__, it also provides nice
default implementations for __repr__, __str__ (something like `Foo(bar=1,
baz='quux')`), __hash__, and __eq__ (using member equality rather than
identity).

The downsides are: A) you can't inherit from any other class (it's
syntactically valid, but silently ignored (!), though it will be caught by
linters), and B) `isinstance(foo, tuple)` returns `True`, which messes with a
lot of reflection code (e.g. the `json` module will not serialize them the way
you expect).
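A quick way to see both the generated defaults and downside B, reusing the `Foo` definition from above. This is a minimal sketch; the commented outputs are what CPython 3 prints for this exact code:

```python
import json
from typing import NamedTuple


class Foo(NamedTuple):
    bar: int = 0
    baz: str = 'default'


foo = Foo(1, baz='not default')

print(repr(foo))                     # Foo(bar=1, baz='not default')
print(foo == Foo(1, 'not default'))  # True: compared field by field
print(isinstance(foo, tuple))        # True
print(json.dumps(foo))               # [1, "not default"] (a JSON array)
print(json.dumps(foo._asdict()))     # {"bar": 1, "baz": "not default"}
```

`_asdict()` is the usual escape hatch when a JSON object with field names is what you actually want.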

~~~
gshulegaard
I believe this is just a typing shorthand for `collections.namedtuple`:

[https://docs.python.org/3/library/typing.html?highlight=name...](https://docs.python.org/3/library/typing.html?highlight=namedtuple)

Last I checked, `collections.namedtuple` had some infamous performance
implications (although I vaguely recall some discussion about changing its
implementation):

[https://stackoverflow.com/questions/2646157/what-is-the-fast...](https://stackoverflow.com/questions/2646157/what-is-the-fastest-to-access-struct-like-object-in-python)

I believe that `collections.namedtuple` is beneficial for memory consumption
[1], but you pay a cost when accessing members by attribute name.

In general, judging by timings of the StackOverflow above, I think it is
better to use `__slots__` in most cases than `namedtuple`.

[1] [http://blog.explainmydata.com/2012/07/expensive-lessons-in-p...](http://blog.explainmydata.com/2012/07/expensive-lessons-in-python-performance.html)

Edit:

Also, a fun fact: namedtuple does a clever bit of metaprogramming where
a string class definition is formatted and passed to `exec`:

[https://hg.python.org/cpython/file/b14308524cff/Lib/collecti...](https://hg.python.org/cpython/file/b14308524cff/Lib/collections/__init__.py#l232)

~~~
gipp
That post is from 2010, before even 2.7 was released. It is extremely out of
date.

Re-running the same tests I get:

    
    
        In [1]: from typing import NamedTuple
    
        In [2]: class A(NamedTuple):
           ...:     a = 1
           ...:     b = 2
           ...:     c = 3
           ...: 
    
        In [3]: a = A()
    
        In [4]: class B:
           ...:     __slots__ = ('a', 'b', 'c')
           ...:     def __init__(self, a=1, b=2, c=3):
           ...:         self.a = a
           ...:         self.b = b
           ...:         self.c = c
           ...: 
    
        In [5]: b = B()
    
        In [6]: c = dict(a=1, b=2, c=3)
    
        In [7]: d = (1,2,3)
    
        In [8]: e = [1,2,3]
    
        In [9]: key = 2
    
        In [10]: %timeit z = a.c
        38.2 ns ± 0.07 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    
        In [11]: %timeit z = b.c
        48.1 ns ± 0.0461 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    
        In [12]: %timeit z = c['c']
        38 ns ± 0.062 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    
        In [13]: %timeit z = d[2]
        38.8 ns ± 0.0425 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    
        In [14]: %timeit z = e[2]
        39.8 ns ± 0.0641 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    
        In [15]: %timeit z = d[key]
        48.6 ns ± 0.0713 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    
        In [16]: %timeit z = e[key]
        49.4 ns ± 0.0369 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    

Namedtuple looks _faster_ than slots here. And for comparison's sake:

    
    
        In [17]: class F:
            ...:     def __init__(self, a=1, b=2, c=3):
            ...:         self.a = a
            ...:         self.b = b
            ...:         self.c = c
            ...: 
    
        In [18]: f = F()
    
        In [19]: %timeit z = f.c
        53.7 ns ± 0.265 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
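One detail worth checking when reusing a class-syntax `NamedTuple` like `A` above: only *annotated* names become tuple fields. A plain `a = 1` in the class body creates an ordinary class attribute, so the tuple itself has no fields and attribute access falls through to the class. A minimal check (class names made up for illustration):

```python
from typing import NamedTuple


class Unannotated(NamedTuple):
    a = 1  # no annotation: ordinary class attribute, NOT a tuple field


class Annotated(NamedTuple):
    a: int = 1  # annotation: a real field with a default value


print(Unannotated._fields, len(Unannotated()))  # () 0
print(Annotated._fields, len(Annotated()))      # ('a',) 1
print(Unannotated().a)                          # 1, found on the class
```

So a field-access benchmark needs the `name: type = default` form to measure what it intends to measure.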

~~~
gshulegaard
The question was asked in 2010, but the benchmark post I was referencing is
from 2014.

It doesn't appear as if `namedtuple` has changed much:

[https://hg.python.org/cpython/file/tip/Lib/collections/__ini...](https://hg.python.org/cpython/file/tip/Lib/collections/__init__.py#l302)

Running the same tests from the post, I get similar results:

    
    
        from collections import namedtuple
        
        STest = namedtuple("TEST", "a b c")
        a = STest(a=1,b=2,c=3)
    
        class Test(object):
            __slots__ = ["a","b","c"]
    
            a=1
            b=2
            c=3
        
        b = Test()
        
        c = {'a':1, 'b':2, 'c':3}
        
        d = (1,2,3)
        e = [1,2,3]
        f = (1,2,3)
        g = [1,2,3]
        key = 2
        
        if __name__ == '__main__':
            from timeit import timeit
        
            print("Named tuple with a, b, c:")
            print(timeit("z = a.c", "from __main__ import a"))
        
            print("Named tuple, using index:")
            print(timeit("z = a[2]", "from __main__ import a"))
        
            print("Class using __slots__, with a, b, c:")
            print(timeit("z = b.c", "from __main__ import b"))
        
            print("Dictionary with keys a, b, c:")
            print(timeit("z = c['c']", "from __main__ import c"))
        
            print("Tuple with three values, using a constant key:")
            print(timeit("z = d[2]", "from __main__ import d"))
        
            print("List with three values, using a constant key:")
            print(timeit("z = e[2]", "from __main__ import e"))
        
            print("Tuple with three values, using a local key:")
            print(timeit("z = d[key]", "from __main__ import d, key"))
        
            print("List with three values, using a local key:")
            print(timeit("z = e[key]", "from __main__ import e, key"))
    

Output:

    
    
        Named tuple with a, b, c:
        0.147349834442
        Named tuple, using index:
        0.0364408493042
        Class using __slots__, with a, b, c:
        0.0334758758545
        Dictionary with keys a, b, c:
        0.0339260101318
        Tuple with three values, using a constant key:
        0.0425179004669
        List with three values, using a constant key:
        0.0301761627197
        Tuple with three values, using a local key:
        0.0436310768127
        List with three values, using a local key:
        0.0309669971466

~~~
aaronchall
I just noticed your slotted object uses class variables. That is completely
wrong - so your analysis is probably wrong.
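For reference, Python 3 refuses this combination outright: a class variable with the same name as a slot raises `ValueError` when the class is created. The conventional slotted class assigns its values per instance instead. A minimal sketch with hypothetical class names:

```python
conflict_error = None
try:
    class Conflicting:
        __slots__ = ('a',)
        a = 1  # class variable with the same name as the slot
except ValueError as exc:
    conflict_error = str(exc)

print(conflict_error)  # ... in __slots__ conflicts with class variable


class Slotted:
    __slots__ = ('a', 'b', 'c')

    def __init__(self, a=1, b=2, c=3):
        # Values are stored in each instance's slots, not on the class.
        self.a = a
        self.b = b
        self.c = c


s = Slotted()
s.c = 30  # per-instance slots are writable
```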

In Python, I think it's important to do what's directly semantically correct.

However, we can compare apples and oranges on what they have in common, so
here's a bit of my own analysis:

    
    
      >>> from timeit import repeat
      >>> from collections import namedtuple
      >>> class Slotted: __slots__ = 'a', 'b', 'c', 'd', 'e'
      ... 
      >>> NT = namedtuple('NT', 'a b c d e')
      >>> 
      >>> nt = NT(1,2,3,4,5)
      >>> s = Slotted()
      >>> s.a, s.b, s.c, s.d, s.e = 1,2,3,4,5
      >>> min(repeat(lambda: (s.a, s.b, s.c, s.d, s.e)))
      0.2850431999977445
      >>> min(repeat(lambda: (nt.a, nt.b, nt.c, nt.d, nt.e)))
      0.418955763001577
     

I'm showing the namedtuple to be a little slower, but I think the thing to
remember is to do the semantically correct thing first. If you've met your
technical requirements, you're done.

Only if you're too slow, _then_ you look for ways to optimize for bottlenecks.

~~~
gshulegaard
I was simply copying the benchmark I had quoted, as a sanity check while
validating my references.

I also feel compelled to point out that __slots__ changes the way objects are
initialized [1]:

> By default, instances of both old and new-style classes have a dictionary
> for attribute storage. This wastes space for objects having very few
> instance variables. The space consumption can become acute when creating
> large numbers of instances.
>
> The default can be overridden by defining
> __slots__ in a new-style class definition. The __slots__ declaration takes a
> sequence of instance variables and reserves just enough space in each
> instance to hold a value for each variable. Space is saved because __dict__
> is not created for each instance.

Which means that, in actuality, simply defining __slots__ changes the nature
of the object.

A quick trial demonstrates this:

    
    
        >>> class mySlottedObj(object):
        ...     __slots__ = ('a', 'b')
        ...     c = 1
        ...     
        >>> x = mySlottedObj()
        >>> x
        <mySlottedObj object at 0x10e6a3510>
        >>> x.c
        1
        >>> x.__dict__
        Traceback (most recent call last):
          File "<input>", line 1, in <module>
        AttributeError: 'mySlottedObj' object has no attribute '__dict__'
        >>> x.__weakref__
        Traceback (most recent call last):
          File "<input>", line 1, in <module>
        AttributeError: 'mySlottedObj' object has no attribute '__weakref__'
        >>> x.a = 1
        >>> x.c = 2
        Traceback (most recent call last):
          File "<input>", line 1, in <module>
        AttributeError: 'mySlottedObj' object attribute 'c' is read-only
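Related to the missing `__weakref__` above: a slotted class that needs to support weak references can list `'__weakref__'` in `__slots__` explicitly. A small sketch with hypothetical class names:

```python
import weakref


class NoWeakref:
    __slots__ = ('a',)


class WithWeakref:
    __slots__ = ('a', '__weakref__')  # explicitly re-enable weak references


try:
    weakref.ref(NoWeakref())
    plain_slots_weakrefable = True
except TypeError:
    plain_slots_weakrefable = False  # no __weakref__ slot to hold the reference

obj = WithWeakref()
ref = weakref.ref(obj)
print(plain_slots_weakrefable, ref() is obj)  # False True
```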
    

The benefit of namedtuples, in the context of Python, is reducing the object
footprint to no more than that of a tuple. The performance hit, which I am
trying to demonstrate, is in lookup, and it is visible in the source (linked above).

    
    
        _field_template = '''\
            {name} = _property(_itemgetter({index:d}), doc='Alias for field number {index:d}')
        '''
    

As I understand it, this means that when you access a field by name, a named
tuple first goes through a property, which then calls `__getitem__` with the
aliased index on the underlying `tuple` (a `namedtuple` class is itself a `tuple` subclass).
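To make that mechanism concrete, here is a hand-written imitation of the template: a hypothetical `Point` tuple subclass whose fields are `property(itemgetter(i))`. It is a sketch of the idea, not the class `namedtuple` actually generates; newer CPython releases may implement the field accessors differently for speed.

```python
from operator import itemgetter


class Point(tuple):
    """Hand-written imitation of the namedtuple field template (illustration only)."""

    __slots__ = ()  # like namedtuple: no per-instance __dict__

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    # Each field is a read-only property wrapping itemgetter(i), which in turn
    # indexes into the tuple itself.
    x = property(itemgetter(0), doc='Alias for field number 0')
    y = property(itemgetter(1), doc='Alias for field number 1')


p = Point(3, 4)
print(p.x, p[0], p.y)        # 3 3 4
print(isinstance(p, tuple))  # True
```

Every `p.x` therefore costs a descriptor call plus an index operation, which is the extra layer compared with a plain `p[0]`.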

But at any rate, the benchmark script was not mine, and refining the test so that
both cases follow the same code path as closely as possible is welcome.

And I wholeheartedly agree:

> Only if you're too slow, then you look for ways to optimize for bottlenecks.

I was merely responding to the original assertion:

> This provides the same performance benefits, and is very similar in a lot of
> ways to a Scala case class -- aside from removing the boilerplate of stating
> each attribute twice in __init__ and again in __slots__,

And I think at this point it is quite clear that namedtuple has a few more
caveats than just "removing the boilerplate" of __slots__.

------
Animats
Guido used to be hostile to __slots__, but apparently he's lightened up.

If you're not using "setattr" to dynamically add fields to your object,
CPython's underlying machinery is overkill. Everything is a dict, and all
accesses take a lookup. With __slots__, a Python object is more like a struct
in other languages.
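A minimal sketch of that difference (class names hypothetical): the regular instance keeps its values in a per-instance `__dict__`, while the slotted one has no `__dict__` and its class holds a member descriptor for each slot, much like a field offset in a struct.

```python
import types


class Dynamic:
    def __init__(self, x):
        self.x = x  # stored as a key/value pair in the instance's __dict__


class Struct:
    __slots__ = ('x',)

    def __init__(self, x):
        self.x = x  # stored in a fixed slot inside the instance


d, s = Dynamic(1), Struct(1)
print(d.__dict__)              # {'x': 1}
print(hasattr(s, '__dict__'))  # False
# The class holds one member descriptor per slot, which knows the slot's offset.
print(isinstance(Struct.__dict__['x'], types.MemberDescriptorType))  # True
```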

Slotted objects should be the default. You should have to inherit from
"DynamicObject" or something like it to enable setattr on an object. Actually,
dynamic objects stopped being that important once Python let you inherit from
"dict". Then, if you wanted "foo[ix]", it was easy to do it.

There are some packages, such as BeautifulSoup, which use dynamic attributes
heavily. BeautifulSoup uses it to map HTML tags to attributes. This is more
trouble than it's worth, because when an HTML tag name clashes with a Python
builtin name, workarounds are necessary.
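For what it's worth, Python already has a per-class switch in the spirit of the "DynamicObject" idea, just inverted: a slotted class can list `'__dict__'` in `__slots__` to get dynamic attributes back on top of its fixed slots. A small sketch (the `Hybrid` name is made up):

```python
class Hybrid:
    # Fixed slot for 'x', plus '__dict__' to allow arbitrary new attributes.
    __slots__ = ('x', '__dict__')


h = Hybrid()
h.x = 1             # goes into the slot
h.extra = 2         # goes into the instance __dict__
setattr(h, 'tag', 'div')  # setattr works too, for the same reason
print(h.__dict__)   # {'extra': 2, 'tag': 'div'}
```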

~~~
make3
> "should"

this would make python a different language completely

~~~
Spivak
I mean it would certainly be a breaking change to existing code, and we know
how well that goes over in the Python community, but it wouldn't be a
completely different language.

The argument is that these objects are a sane default that would prevent a lot
of accidental programming mistakes and I agree.

------
trjordan
Since I don't think it was explicitly mentioned: this works because __slots__ is
effectively a named tuple. The size reduction comes from dropping the keys in
the dictionary (they're stored once on the class, not on every
instance), and the speedup comes from it being an array lookup instead
of a dict lookup.

It's also a great way to introduce some static typing. No more silently setting
misspelled instance attributes!
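A quick illustration of the typo-catching effect, using a hypothetical `Account` class. Strictly speaking the check happens at run time rather than statically: assigning to a name that isn't in `__slots__` raises `AttributeError` instead of quietly creating a new attribute.

```python
class Account:
    __slots__ = ('balance',)

    def __init__(self):
        self.balance = 0


acct = Account()
try:
    acct.balence = 100  # typo: 'balence' is not a declared slot
    typo_caught = False
except AttributeError:
    typo_caught = True

print(typo_caught, acct.balance)  # True 0
```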

------
vidarh
I've opted for a similar approach "behind the scenes" for my (wildly
incomplete, though now close to compiling itself) Ruby compiler: While you can
usually statically determine most likely instance variables for Ruby classes,
and optimize that by creating the equivalent of slots, you also need to be
able to dynamically create more, so I allocate "slots" in the instances for
any variables that I can see, and fall back on the equivalent of __dict__ for
anything that's dynamically added.
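The layout described above can be sketched in Python terms. `HybridStore`, its `layout` argument, and the example names are all hypothetical; in a compiler the name-to-index step would be resolved ahead of time, so only the fallback would pay for a hash lookup, whereas here the `layout` dict lookup merely stands in for that.

```python
class HybridStore:
    """Hypothetical sketch: fixed slots for statically known names, dict fallback.

    `layout` maps each name the compiler could see to a fixed index. It would be
    built once per class and shared by every instance of that class.
    """

    def __init__(self, layout):
        self._layout = layout
        self._slots = [None] * len(layout)  # one cell per known variable
        self._overflow = None               # created only when first needed

    def get(self, name):
        index = self._layout.get(name)
        if index is not None:
            return self._slots[index]       # known name: indexed access
        if self._overflow is not None and name in self._overflow:
            return self._overflow[name]     # dynamic name: hash-table lookup
        raise AttributeError(name)

    def set(self, name, value):
        index = self._layout.get(name)
        if index is not None:
            self._slots[index] = value
            return
        if self._overflow is None:
            self._overflow = {}
        self._overflow[name] = value


# Hypothetical class whose body visibly assigns 'x' and 'y'.
point_layout = {name: i for i, name in enumerate(('x', 'y'))}
obj = HybridStore(point_layout)
obj.set('x', 1)
obj.set('label', 'origin')  # not seen statically, so it lands in the overflow dict
```

Objects that never acquire dynamic variables never allocate the overflow dict at all.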

I've always found it quite curious that Python made this explicit rather than
an implementation detail (I get that there are slight semantic differences
here in Python too) because it seems like such a useful optimization. And a
similar approach can be used for method lookup too.

I wonder (and I don't have any measurements on this myself) what the relative
benefits of explicitly picking one over the other vs. automating it are.
The downside of doing it automatically is that some types of objects might
very well be "sparse", in that it's not a given that all instance
variables are used by all objects (this is more applicable to vtables,
especially in languages with single-rooted object hierarchies).

My intuition is that the overhead of a map/dict/hash table to store them is
likely to usually outweigh the cost of quite a lot of unused instance
variables, and so that inferring instance variables is generally likely to be
an improvement.

~~~
pmontra
Do you have a link to your Ruby compiler? EDIT found it in your profile
[http://www.hokstad.com/compiler](http://www.hokstad.com/compiler) and
submitted to HN. Great series of posts.

About Python, it's a little older than Ruby (its design started in 1989) and
Guido worked on the ABC language at the beginning of the 80s. IMHO that
difference shows in many places, for example in having to explicitly pass self
as argument in method definitions. Newer languages do without it and
programmers don't get confused at all. All those double underscore methods and
the general explicitness of the language smell of the 80s and of C. Not
necessarily a bad thing, but reading some Python code is like getting into a time
machine.

~~~
vidarh
Thanks. It's long overdue for some updates, as I've done quite a lot of work since
(though still moving slowly), and it's pretty close to being able to fully
compile itself now.

What you say makes sense, and I understand that it takes a lot of effort to
clean those type of things up without hurting backwards compatibility.

This specific distinction, though, seems like one where it'd still be possible
to get most of the benefit by changing the implementation of __dict__ to make
it do something similar to __slots__ under the hood, but fall back to a dict
for dynamic properties.

------
aaronchall
I have updated the documentation on __slots__, and those changes have made it
into the dev version of the Python docs.[0]

I have also written up __slots__ in great detail on Stack Overflow.[1]

I'm at a meetup right now, but I can try to answer any quick questions here
tonight.

[0]
[https://docs.python.org/3.7/reference/datamodel.html#object....](https://docs.python.org/3.7/reference/datamodel.html#object.__slots__)

[1]
[https://stackoverflow.com/q/472000/541136](https://stackoverflow.com/q/472000/541136)

------
pfranz
I see __slots__ come up quite often (I think more often than it should). It's
great to be aware of, but shouldn't be used unless necessary. Like any
optimization, it can make code maintenance more difficult. Obviously, you have
to update slots if you add more attributes to the class, but you have to be
aware of slots when dealing with class inheritance.
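A sketch of the inheritance pitfalls (all class names hypothetical): a subclass that doesn't declare its own `__slots__` quietly gets a `__dict__` again, a slotted subclass should list only its new names, and two bases that each add non-empty slots can't be combined.

```python
class Base:
    __slots__ = ('a',)


class ChildNoSlots(Base):
    pass  # no __slots__ here, so instances get a __dict__ again


class ChildWithSlots(Base):
    __slots__ = ('b',)  # list only the new names; 'a' comes from Base


class Other:
    __slots__ = ('c',)


try:
    class Both(Base, Other):  # two bases that each add non-empty slots
        pass
    layout_conflict = False
except TypeError:
    layout_conflict = True  # CPython reports an instance lay-out conflict

print(hasattr(ChildNoSlots(), '__dict__'))    # True
print(hasattr(ChildWithSlots(), '__dict__'))  # False
print(layout_conflict)                        # True
```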

------
coconutrandom
A few years ago I had read about Python performance gains and some built-in
attributes, but couldn't find the reference when I looked. Well, it was
`__slots__`; thanks for posting!

------
abhirag
I just discovered
attrs ([http://www.attrs.org/en/stable/](http://www.attrs.org/en/stable/)). On
paper it looks great, seems to reduce boilerplate, has support for
immutability and slots too and claims to have no runtime overhead. If anybody
here has experience using it, would love to hear from you. Is it as good as it
looks on paper? Do you recommend using it?

~~~
ericfrederich
I've been using it here and there in place of named tuples and data-only
classes. It is very convenient.

I had a project where I had to connect to a data source and serialize some
data. When I needed to add an attribute all I had to do was add it to two
places: the class itself, and the @classmethod constructor. So in this case I
got the serialization for free, but that's all I was using. In reality I also
got __repr__, __cmp__, etc, etc for free too, I just wasn't using it.
Serialization was free because attr.asdict() knows which attributes are
attr.ib() attributes.

    
    
      import json
      
      import attr
      
      
      @attr.s()
      class Foo:
          bar = attr.ib()
          spam = attr.ib()
          eggs = attr.ib()  # added this line
      
          @classmethod
          def from_something_else(cls, x):
              return cls(
                  bar=x.bar.name,
                  spam=x.spam,
                  eggs=x.get_eggs().blah(),  # and this line
              )
      
      o = Foo(bar='b', spam='s', eggs='e')  # in practice: Foo.from_something_else(...)
      with open('blah.json', 'w') as fout:
          json.dump(attr.asdict(o), fout)  # got serialization for free

------
amelius
This is all nice, but of course such language hacks (as I would call
them) make for a less elegant language with a higher barrier to entry. At some
point it would make sense, from a software-engineering standpoint, to switch to
a cleaner, lower-level language.

By the way, I like the approach taken in JavaScript engines such as V8, which
determine the "slots" dynamically.

------
spraak
I just started learning Python, what does this syntax mean?

    
    
        **json.loads(my_json))
    

from

    
    
        with_slots = [get_size(MyUserWithSlots(**json.loads(my_json))) for _ in range(NUM_INSTANCES)]
    

Edit: I mean, I get what it's doing, but specifically I don't understand the
double * (somehow it's not rendering in this comment)

~~~
deathanatos
A function in Python takes two kinds of arguments: positional arguments, and
keyword arguments.

    
    
        def foo(a, b, c):
            pass
    
        # Arguments passed positionally:
        foo(1, 2, 3)
        # or passed by keyword:
        foo(a=1, b=2, c=3)
    

Now, say you want to make that second call somewhat dynamically, and you have
the dict:

    
    
        a_dict = {'a': 1, 'b': 2, 'c': 3}
    

You can make that second foo call:

    
    
        foo(**a_dict)
        # because of the value of a_dict, expands to:
        # foo(a=1, b=2, c=3)
    

Functions themselves can take only positional arguments, only keyword
arguments, or really, any combination of them. Here is a decent SO question
and answer[1] that might help, a bit about it in the tutorial[2], and last,
the formal documentation[3].
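A short sketch tying this back to the line from the article. `describe` and `User` are hypothetical stand-ins (the article's `MyUserWithSlots` isn't shown here), assuming a class whose `__init__` takes `name` and `age` keywords:

```python
import json


def describe(a, *args, b=0, **kwargs):
    # a: positional or keyword; args: extra positional arguments;
    # b: keyword-only (it comes after *args); kwargs: any other keywords.
    return a, args, b, kwargs


print(describe(1, 2, 3, b=4, c=5))   # (1, (2, 3), 4, {'c': 5})
print(describe(**{'a': 1, 'b': 2}))  # (1, (), 2, {})


class User:
    __slots__ = ('name', 'age')

    def __init__(self, name, age):
        self.name = name
        self.age = age


my_json = '{"name": "Ada", "age": 36}'
user = User(**json.loads(my_json))  # same as User(name='Ada', age=36)
```

`**` in a call unpacks a dict into keyword arguments; `**kwargs` in a definition does the reverse and collects leftover keywords into a dict.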

[1]: [https://stackoverflow.com/questions/1419046/python-normal-ar...](https://stackoverflow.com/questions/1419046/python-normal-arguments-vs-keyword-arguments)

[2]:
[https://docs.python.org/2/tutorial/controlflow.html#keyword-...](https://docs.python.org/2/tutorial/controlflow.html#keyword-arguments)

[3]:
[https://docs.python.org/2/reference/expressions.html#grammar...](https://docs.python.org/2/reference/expressions.html#grammar-token-call)

~~~
spraak
Awesome, thank you for the detailed explanation!

------
wutbrodo
I don't think it's necessarily a bad thing, but it's kind of odd to see an
article on the front page of HN that amounts to a less concise version of the
documentation for a very simple-to-understand language feature (the doc page
has four or five sentences and is just as easy to understand).

------
ntrepid8
If you want to create a large number of objects (like rows in a data set),
using __slots__ saves a lot of memory compared with the standard __dict__.
I've always used named tuples for this in the past, but this is a nice way to
do it.
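A rough way to check the memory claim on your own interpreter, assuming CPython and the standard-library `tracemalloc` module. The class names are hypothetical, and the byte counts depend on the Python version, so no numbers are given here:

```python
import tracemalloc


class WithDict:
    def __init__(self, a, b):
        self.a = a
        self.b = b


class WithSlots:
    __slots__ = ('a', 'b')

    def __init__(self, a, b):
        self.a = a
        self.b = b


def allocated_bytes(cls, n=10_000):
    """Bytes allocated while building n instances of cls (rough, CPython-specific)."""
    tracemalloc.start()
    before = tracemalloc.get_traced_memory()[0]
    rows = [cls(i, i) for i in range(n)]  # kept alive until after the measurement
    after = tracemalloc.get_traced_memory()[0]
    tracemalloc.stop()
    return after - before


print('regular:', allocated_bytes(WithDict))
print('slots:  ', allocated_bytes(WithSlots))
```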

------
plainOldText
One caveat though, object creation with `__slots__` is a bit slower than
"normal" objects, if I remember my past Python benchmarks correctly.

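To check this on your own interpreter, a minimal `timeit` sketch comparing construction time for two hypothetical classes (best of five runs). Results vary by Python version and machine, so treat them as indicative only:

```python
from timeit import repeat


class WithDict:
    def __init__(self, a, b):
        self.a = a
        self.b = b


class WithSlots:
    __slots__ = ('a', 'b')

    def __init__(self, a, b):
        self.a = a
        self.b = b


def best_creation_time(cls, number=100_000):
    """Best of 5 runs: seconds to create `number` instances of cls."""
    return min(repeat(lambda: cls(1, 2), number=number, repeat=5))


t_dict = best_creation_time(WithDict)
t_slots = best_creation_time(WithSlots)
print(f'regular: {t_dict:.4f}s  slots: {t_slots:.4f}s')
```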
