Saving 9 GB of RAM with Python's __slots__ (oyster.com)
135 points by benhoyt on Nov 17, 2013 | 43 comments

Worth pointing out that on PyPy you effectively get this for free: http://morepypy.blogspot.co.uk/2010/11/efficiently-implement...

PyPy is not an option everywhere. As far as I know, many, many enhancements and libraries are available only for plain old CPython, not for PyPy.

PyPy may be an interesting project with a lot of potential, but it seems to me there is a long way to go before it can be a drop-in replacement for CPython.

If your goal is to save memory though, PyPy is not at all the right answer (yet).

That's the cost of being a dynamic language. Since Python objects can be extended dynamically from anywhere (from subclasses and even from outside the class), each one needs a dictionary. But dictionaries can be very memory-inefficient, especially on modern 64-bit hardware. One dict can easily take 1-2 KB for very few stored attributes (the size can even depend on the actual names used, because of the nature of dicts). So when it comes to millions of object instances, it is better to use __slots__. But that comes at a cost: those objects are no longer extensible. You have to know all attributes of the objects in advance, so you should only use __slots__ on objects that exist in very large numbers or are really simple.
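A minimal sketch of that trade-off (class names invented for illustration):

```python
import sys

class Plain:
    def __init__(self):
        self.x, self.y = 1, 2

class Slotted:
    __slots__ = ('x', 'y')
    def __init__(self):
        self.x, self.y = 1, 2

p, s = Plain(), Slotted()

# The per-instance __dict__ is where the memory goes; the slotted
# instance simply does not have one.
print(sys.getsizeof(p.__dict__))   # the dict alone costs dozens of bytes
print(hasattr(s, '__dict__'))      # False

# The price: slotted instances are no longer extensible.
try:
    s.z = 3
except AttributeError as e:
    print("cannot add new attribute:", e)
```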

> That's the cost of being a dynamic language.

No, this is a cost of this particular style of dynamic object model. Not all dynamic languages are dynamic in this way.

> Those objects are not enhance-able any more.

I don't see any reason why Python can't do what Clojure's defrecord does: provide fixed fields for pre-declared slots, while still using a dictionary for extensions. It has been a while since I've used Python, but I'm almost certain there is some __special__ magic that can make this work with relative ease.
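For what it's worth, CPython already supports a hybrid along those lines: naming '__dict__' itself inside __slots__ gives fixed storage for the declared fields while keeping a dict for ad-hoc extensions. A sketch:

```python
class Hybrid:
    # Fixed slots for the pre-declared fields, plus a __dict__
    # for anything added later.
    __slots__ = ('x', 'y', '__dict__')

h = Hybrid()
h.x = 1        # stored in a fixed slot (via a descriptor, not the dict)
h.extra = 2    # falls through to the per-instance __dict__
print(h.__dict__)   # only the ad-hoc attribute lands in the dict
```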

It's also worth pointing out that most modern JITs, like V8 or PyPy, can automatically detect "hidden classes" like this and optimize these objects to pack such static fields.

"modern JITs" aren't so modern at all. All that work was originally done on Smalltalk in the 1980s — it's also entirely tangential to JITing compilers, as it can easily be done with interpreters too, so it's not even a cost of this particular style of dynamic object model — it's a cost of this implementation strategy of this particular style of dynamic object model. The fact that PyPy manages fine shows it is not the language, or any model to which it subscribes, that is at fault.

I did not say that every dynamic language has to implement it this way, but Python does. And Python was not intended to be a language for building up huge amounts of data in memory, though it does not do badly in most cases (leaving out the ones mentioned).

Python's style has some advantages, though. Simplicity and a great deal of flexibility are two of them.

(Traditional) Python is not normally a truly compiled language -- there is just a rather simple precompilation step that makes life easier for the interpreter. When you compile or JIT the code, you have more options: you see the whole program. Python's precompiler does not! It only sees the local module, so it cannot find all the classes that might extend a base class.

If you have ideas for how to implement a better language, why don't you implement your own? It's up to you! (I guess creating new, more or less useful, programming languages is a hobby of computer scientists anyway.)

No, this is an implementation issue. Not all Python implementations suffer from this.

While language semantics do influence the quality of what is possible implementation wise, they are not the same thing.

It's an implementation issue.

I posted this as a comment to the article:

-- I'm working on a Ruby compiler and have taken pretty much [the PyPy] approach [of automatically using slots when possible]: Any instance variable names I can statically determine are candidates for the equivalent treatment (allocating a slot at a fixed offset in the object structure).

Anything else will still go in a dictionary. In practice my experience is that a huge proportion of objects will have a fairly static set of attributes, and the dynamic set is often small enough that having pointers to them included in every instance is still often cheaper than using dictionaries. ---

In a static language, your options are generally to either statically allocate slots, or explicitly use a dictionary anyway.

You are right. But Python is normally used in circumstances where you are not under such pressure anyway, and where flexibility and ease of use matter more than maximum speed or minimal memory requirements. And of course it is specifically the CPython implementation that I described... but CPython is still the most widely used implementation, and the one for which most enhancements, libraries and so on exist. Other implementations may do better in many ways, but what made Python great is still available in its original form, the CPython environment.

That's CPython specific though, PyPy doesn't need it.

It's also true that you only gain benefits from this trick when you've got lots of instances.

This is usually true when, say, analyzing some data. If your design pattern is good, those objects would be immutable anyway.

I did something similar for a batch log processing system I wrote in Python some time ago. All the log messages could be classified as representing one of a few dozen 'packet' types, each represented by an object instance (so I could do some additional processing later), so predefining each type's fixed sets of fields using slots noticeably decreased memory usage. Of course, it was the first time I had ever done anything like that in Python, so I may have been doing it wrong...

Anyways, definitely a good short read, thanks for posting!

Mmm, I don't quite get why sys.getsizeof is reporting a bigger size for the slotted class; it should be the other way around according to that post.

Test code at http://codepad.org/wlb53BLf -- not sure if I'm missing something...

Per the Python 3 docs (for some reason not in the Python 2 docs, but the same holds): "Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to."

Most of the space for the NonSlotted version is in the __dict__, and if you print the size of ni.__dict__ you'll probably get a couple of hundred bytes.
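To see the difference, you can compare the shallow size with the size of the instance dict (the NonSlotted name is borrowed from the linked snippet; exact byte counts vary by Python version):

```python
import sys

class NonSlotted:
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

ni = NonSlotted()
shallow = sys.getsizeof(ni)              # object header + __dict__ pointer only
dict_size = sys.getsizeof(ni.__dict__)   # the dict itself, counted separately
print(shallow, dict_size)
```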

There are better, recursive ways to get the real size of a Python object in memory, for example see: http://pythonhosted.org/Pympler/asizeof.html#asizeof

Nice, just used pympler.asizeof and it reported that the slotted version has about 22% of the size of the non-slotted version.

I tried with this:


Try commenting and un-commenting the __slots__ line. You can measure the program's memory footprint using

pmap -x <PID>

where <PID> is obtained using

pgrep python
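The pasted script is missing above; a plausible reconstruction (class and field names are guesses) would allocate many instances and then keep the process alive so it can be inspected:

```python
import time

class Point:
    __slots__ = ('x', 'y', 'z')   # comment this line out to compare footprints

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

points = [Point(i, i + 1, i + 2) for i in range(100_000)]
print(len(points), "instances allocated")
# time.sleep(600)   # un-comment to hold the process open for pmap -x <PID>
```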

...more than one python process?

cat /proc/<PID>/cmdline | xargs -0 echo

Useful tip. Anecdotally this helped me save 40% of memory on some data I need to store in memory for analysis: Used to be about 1KB per object, after adding __slots__ it came down to 590 bytes.

Using __slots__ is not really the same as using a namedtuple, because namedtuples are immutable.
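The difference is easy to demonstrate (names invented for the example):

```python
from collections import namedtuple

PointNT = namedtuple('PointNT', ['x', 'y'])

class PointSlots:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y

nt = PointNT(1, 2)
ps = PointSlots(1, 2)

ps.x = 10          # slotted attributes can be reassigned in place
try:
    nt.x = 10      # namedtuple fields cannot
except AttributeError:
    print("namedtuple fields are read-only")

nt2 = nt._replace(x=10)  # the immutable route: build a new tuple
print(ps.x, nt2.x)
```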

The OP didn't say it was the same. It said they were similar. And they certainly are.

Although, there may be performance differences between `namedtuple` and `__slots__`. Particularly access time. This SO post elaborates.[1]

[1] - http://stackoverflow.com/questions/2646157/what-is-the-faste...

Thanks, great article. I've used Python for years, but this was a remaining dark corner I hadn't got to yet. Now, off to the next.

Basically, going back to Smalltalk's memory model. It also becomes much easier to JIT optimized machine code for such objects.

Or CLOS. It's a similar problem with keyword arguments passed in hash tables, I think. The space occupied is less of an issue (unless in deep recursion), but it's slower than constructing a list of pairs, and the order of parameters is lost.

PyPy does the same thing as using __slots__ on CPython automatically, no need to use __slots__ to take advantage of the JIT.

I didn't say that __slots__ makes JIT possible. However, it does make writing one easier. (Also makes writing a faster one easier.)

EDIT: Is this the new modus operandi on HN: if a statement isn't seemingly 100% in support of your pet language, automatically read it in the dimmest and narrowest way possible?

Just think of it as there was some confusion and possible ambiguity, and your clarification has cleared it up for anyone interested in the subject but not yet knowledgeable enough. Someone can skim through and have their mental model corrected slightly now - a very nice thing!

Probably missing a lot of context here, but wondering why you wouldn't use something like nginx or squid for serving static content, as they are designed for this kind of use case.

Good question -- however, it's not completely static content. The hotel reviews and photos are more or less static (updated only on deployment), however a fair number of the features of the site are dynamic: user accounts, real-time pricing, search, recently-viewed hotels, etc.

See also my comment on reddit about design decisions: http://www.reddit.com/r/programming/comments/1qu5ai/saving_9...

Have you considered using something like Memcached or Redis then? There'd be some overhead sending data over a local TCP connection, but I think it would be a lot more memory-efficient.

Extra-nice thing about this feature: it can be enabled and disabled for a class with very little effort. So you can check correctness first, and optimize later.
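Since the class body is just executable code, one (hypothetical) way to make the switch a one-line flag:

```python
USE_SLOTS = True   # flip to False while checking correctness

class Record:
    if USE_SLOTS:
        __slots__ = ('a', 'b')

    def __init__(self, a, b):
        self.a, self.b = a, b

r = Record(1, 2)
print(hasattr(r, '__dict__'))   # False when slots are enabled
```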

Does that fuck up? Rinse and repeat.

Does anyone know if there's any similar ability in Perl 5?

Yup. It's called 'fields': http://perldoc.perl.org/fields.html

I echo what they said in that post, though: don't prematurely optimize. If you find you have tons of objects and need the RAM or you're actually paying a premium for hash accesses, then fields can save you some effort... but if you've a small use case, don't bother.

Perl is better at sharing memory for hash keys by default, so this particular problem might not show up enough in your use cases.

Does anyone know what the code is "compiled" into, if not a hash table?

I'm not entirely sure, but based on my experience with OO in Perl, I guess it simply uses an array in a special attribute, instead of putting the various attributes into dict keys on the actual object. Possibly it even uses some kind of inside-out implementation where the arrays are stored via closure in some other scope and are only visible to accessor methods.

I believe CPython just allocates slightly more memory than the structure describing the object requires and stores the attributes at fixed locations immediately after it. It's basically the same way that it handles attributes of built-in types, except that those also have a C struct describing the attribute layout.
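That matches what sys.getsizeof reports: each declared slot reserves one pointer directly in the instance, so the shallow size grows with the slot count (exact numbers vary by build):

```python
import sys

class One:
    __slots__ = ('a',)

class Two:
    __slots__ = ('a', 'b')

# The extra slot adds one pointer's worth of storage to every instance.
print(sys.getsizeof(One()), sys.getsizeof(Two()))
```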

Why run Python on Windows?

Rather, why not?

Works perfectly well, and empowers one to escape from it at a later date if necessary.

How is that at all relevant to the OP?

If your OS is shit, your environment is shit, at least your language doesn't have to be shit.

Because you are one of those rare HN creatures who enjoy using it, having used most consumer OSes since the early '80s, and need it for coding/system administration.

I am one of those.

Because you want an agile tool. Are you trying to dance tango while wearing medieval armor?

They wanted an OS that's good at wasting RAM.
