
Python consumes a lot of memory – how to reduce the size of objects? - atomlib
https://habr.com/en/post/458518/
======
drej
The article doesn't mention a hidden gem in Python's standard library - typed
arrays!

These live in the `array` module and are a very barebones version of numpy's
ndarray - just one (explicit) dimension and no overloaded operators. But if
you just want to keep a bunch of numbers in a contiguous array, they can save
you tons of memory.

(I know the purpose of the article is to describe a more complex data
structure, but arrays can still get you very far.)
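
A minimal sketch of the savings (exact sizes vary by CPython version; the
figures assume a 64-bit build):

    import array
    import sys

    n = 1_000_000

    # A list stores n pointers (8 bytes each), and each pointer targets
    # a separate int object of ~28 bytes on the heap.
    lst = list(range(n))
    print(sys.getsizeof(lst))    # ~8 MB, not counting the int objects

    # array('q') packs the same values as raw signed 64-bit integers
    # into one contiguous buffer: no per-item object headers at all.
    arr = array.array('q', range(n))
    print(sys.getsizeof(arr))    # ~8 MB total, and that's everything

    # A narrower type code shrinks the buffer further.
    arr32 = array.array('i', range(n))   # 4-byte signed ints
    print(sys.getsizeof(arr32))          # ~4 MB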

~~~
vbezhenar
That's how you deal with this problem in Java as well: arrays of primitives.
If you need to operate on a million points, you can either use Point[], which
will incur something like 16 MB of additional memory, or use two int[] arrays,
which won't incur any extra overhead beyond a few bytes. Your code won't be
pretty, but it'll be fast, and you can always hide the weirdness behind a
pretty API.
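
The same two-parallel-arrays idea in Python, as a minimal sketch (the `Points`
class and its method names are illustrative):

    import array

    class Points:
        """Struct of Arrays: one contiguous buffer per coordinate,
        instead of one heap-allocated object per point."""

        def __init__(self):
            self.xs = array.array('i')
            self.ys = array.array('i')

        def add(self, x, y):
            self.xs.append(x)
            self.ys.append(y)

        def get(self, i):
            # Materialise a point only on demand; the tuple is transient.
            return self.xs[i], self.ys[i]

    pts = Points()
    pts.add(3, 4)
    print(pts.get(0))   # (3, 4)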

~~~
cesarb
Also known as the "Struct of Arrays" approach.

~~~
wtallis
"Struct of Arrays", to be contrasted with "Array of Structs", because in Java
it's actually "Array of pointers to Struct". In languages where "Array of
Structs" is actually possible, the decision of which to use is less clear-cut
and depends on how big the Struct is, what the access patterns are like, and
whether you're trying to perform SIMD operations on multiple Structs at once.

~~~
vbezhenar
That's not only about pointers. Every Java object incurs an overhead of around
16 bytes (architecture and JVM dependent, of course, but it's there).

------
Deimorz
It's not always an option, but simply using PyPy can massively reduce memory
usage without needing to change your code at all.

The PyPy site links to this blog post (from 10 years ago) with some info:
[https://morepypy.blogspot.com/2009/10/gc-improvements.html](https://morepypy.blogspot.com/2009/10/gc-improvements.html)

And a quick search found this relatively recent post that does some
measurement of it:
[https://dev.nextthought.com/blog/2018/08/cpython-vs-pypy-mem...](https://dev.nextthought.com/blog/2018/08/cpython-vs-pypy-memory-usage.html)

~~~
intellimath
The article considers an approach using `dataobject` from the `recordclass`
library as the base in the class definition. This seems to use even less
memory than PyPy.
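
For reference, the pattern from the article looks roughly like this (it
assumes the `recordclass` package is installed; exact sizes depend on version
and build):

    import sys
    from recordclass import dataobject

    class Point(dataobject):
        x: int
        y: int
        z: int

    p = Point(1, 2, 3)
    # No __dict__, no __weakref__ and no GC header on the instance,
    # which is what makes it smaller than even a __slots__ class.
    print(sys.getsizeof(p))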

------
alkonaut
Any type that you have "a lot" of is a bad candidate for a class/object. Your
OO program should typically never instantiate thousands of heap-allocated
objects at once. A triangle mesh is a good object candidate; one triangle or
vertex is not. A device, a render-system API handle or an image is a good
candidate; one pixel is not, and so on.

So don't use a lot of objects. In C# you use a struct and arrays of them to
avoid creating an object on the heap per array entry. In Java you have to
resort to SoA; in Python it's the same. Just because you have objects doesn't
mean everything should be an object. Even languages that follow an "everything
is an object" design usually have an escape hatch such as plain value arrays.

~~~
lacampbell
It's not about using objects or not; it's how you compose them. A C# array of
integers is an object. Each integer in that array is also an object (though
granted, it's a special case). The same goes for numpy arrays and JavaScript
typed arrays.

~~~
alkonaut
Primitives and value types are not objects in C#. The array is one object,
that’s it.

~~~
lacampbell
It has methods and fields, it implements interfaces, and descends from the
'Object' type.

The fact that it's unboxed is an implementation detail.

[https://docs.microsoft.com/en-us/dotnet/api/system.int32?vie...](https://docs.microsoft.com/en-us/dotnet/api/system.int32?view=netframework-4.8)

~~~
alkonaut
What’s relevant for performance is whether an array of 1000 “things” requires
1 or 1001 allocated objects, whether accessing thing N in the array requires
dereferencing one or two pointers, and whether each item occupies 32 bits of
storage without header overhead. Which ones to call “objects” is semantics.

For efficient access, the array must be a consecutive array of primitives.
This is the case in both C# and Java for integers. In C# it’s also the case
for an array of Vector2 with 2 primitives each, which isn’t the case in Java.

My point is this: avoid heap-allocating many things in collections. The items
must be raw (primitive, consecutive) data without per-instance overhead and of
course without heap-alloc/GC cost.

------
syn0byte
While everyone haggles about the internals, I have an interesting anecdote
about the cost of tools and libraries.

A small service that landed in my lap needed to read (only) a data source that
was roughly 10k lines of YAML. No way that was going to be in any way
efficient, so I asked for suggestions. All the workaday devs (I am not one)
instantly said the same thing without a single real thought about it: make it
a database, duh!

Long story slightly less long: loading up the libraries to interface with a
database ate between 2 and 3 times the memory (depending on the DB and lib)
that simply loading the entire 10k-line YAML file ate, offered slower
performance, and required more code.

SQLite was pretty darn close, but in the interest of saving developers from
themselves vis-a-vis parameterized queries (or the need for queries at all,
for that matter) it increased the required code for zero benefit.

The service still hums along with a 10k-line YAML file in memory. "Worse is
better" indeed.

~~~
novok
You could have improved it with a protobuf or a JSON file, but at the size of
your data set it shouldn't really matter what you're using.

It can be hard to beat an in-memory data structure when your data set is small
enough, true.

------
erdewit
In the pure Python cases the 8 bytes per attribute are just pointers. The x, y
and z values are themselves full-blown objects, with all the extra memory
overhead that comes with them, and this is not counted in the article. For
example, an int object uses 28 bytes, so three of them already use up more
than each of the described container objects.

The Cython and Numpy cases store the actual data directly, and this has the
larger effect on reducing memory.
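
A quick way to see that per-object overhead in CPython (sizes are for a 64-bit
build and can vary by version):

    import sys

    print(sys.getsizeof(0))      # 24: even zero carries the full object header
    print(sys.getsizeof(1))      # 28: header plus one 30-bit digit
    print(sys.getsizeof(2**30))  # 32: a second digit is needed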

~~~
spott
On the other hand, the small integers (-5 through 256) are cached as
singletons in Python, so they aren't duplicated.
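
Easy to observe in CPython (int() sidesteps the compile-time constant folding
that can otherwise merge equal literals):

    x, y = int("256"), int("256")
    print(x is y)   # True: both names point at the one cached 256 object

    x, y = int("257"), int("257")
    print(x is y)   # False: 257 is outside the cache, so two objects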

~~~
mswtk
Slightly pedantic correction: this is a performance optimization in CPython. I
wouldn't be surprised if other implementations have something similar, but to
the best of my knowledge, this behaviour isn't part of the standard.

~~~
merlincorey
> ... but to the best of my knowledge, this behaviour isn't part of the
> standard.

To the best of my knowledge, Python isn't a language with a standard, is it?

~~~
ben509
The standard is documented in PEPs[1]. CPython is the flagship implementation,
and many PEPs discuss it directly, but if you're looking to write another
Python, that's where you'd go.

[https://www.python.org/dev/peps/](https://www.python.org/dev/peps/)

------
miohtama
Many workloads do not require more than 4 GB of addressable RAM per process.
Linux offered a 32-bit user space with the 64-bit instruction set:

[https://en.m.wikipedia.org/wiki/X32_ABI](https://en.m.wikipedia.org/wiki/X32_ABI)

Since many Python workloads are effectively object-oriented business logic,
and objects are mostly pointers, setting up an x32 user space "halved" the
memory usage. It also made execution faster because of better CPU cache
utilisation.

Sadly, x32 was "very custom" and very hard to support. Last I heard, x32 is
being phased out of the Linux kernel.

~~~
chrisseaton
Another option is to use the standard 64 bit ABI, but store object pointers
compressed into 32 bits when on the heap. This lets you address perhaps 32 GB
in a 32 bit value.

~~~
srean
Could you elaborate on how the 'compress' part works? Quite curious. I can
imagine working with a base pointer and 32-bit offsets.

~~~
miohtama
Also curious about this.

Does the 64-bit instruction set provide some segments or functionality for
this? How about "native" pointers coming from glib and such?

If there has to be a base + offset translation on every pointer access, it is
way too slow.

I would also assume JavaScript VMs in browsers would already be utilising
this, as web page workloads are not gigabytes (hopefully).

~~~
chrisseaton
> If there has to be base + offset translation on every pointer access it is
> way too slow.

It does do this, but it's not too slow - the overhead of the translation is
lower than the benefit of reduced memory transfer, increased cache space, etc.
Obviously - otherwise people wouldn't be doing it.
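
A toy sketch of that base-plus-scaled-offset scheme (the constants are
illustrative, not from any particular VM): with all objects aligned to 8
bytes, the low 3 bits of an offset are always zero, so a 32-bit field can
reach 2^32 * 8 bytes = 32 GiB of heap.

    ALIGN_SHIFT = 3             # objects aligned to 8 bytes
    HEAP_BASE = 0x7f0000000000  # hypothetical heap base address

    def compress(addr):
        # Store the 8-byte-aligned offset from the heap base in 32 bits.
        return (addr - HEAP_BASE) >> ALIGN_SHIFT

    def decompress(c):
        # One shift and one add per dereference; the cost is usually
        # outweighed by the cache and bandwidth savings described above.
        return HEAP_BASE + (c << ALIGN_SHIFT)

    addr = HEAP_BASE + 0x48
    c = compress(addr)
    assert c < 2**32 and decompress(c) == addr
    # Maximum reachable offset: (2**32 - 1) << 3, just under 32 GiB.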

------
mmrezaie
I am mostly a C/C++ programmer. Is there good documentation out there on
Python development for when you care about performance and memory footprint, a
book or something that anyone can recommend? It was fun reading this, but I'd
like to know more.

~~~
njharman
I'm a Python "snob", going on 24 years, and if I ever "care about performance
and memory footprint" I don't use Python.

Python is good enough 90% of the time. You get faster code, 2x, 10x, etc., by
picking better algorithms or solutions. In Python you should not be chasing a
10% or 30% speed improvement; it's not worth it, and it's not Python's
strength.

When you need faster, you go to C-based libs (the Python libs that need to go
fast are already in C),
[https://en.wikipedia.org/wiki/Cython](https://en.wikipedia.org/wiki/Cython)
[https://en.wikipedia.org/wiki/Numba](https://en.wikipedia.org/wiki/Numba)
etc.

~~~
jerven
Or go for [1] PyPy or [2] GraalPython. I know that equivalent code on GraalVM
EE Java produces assembly/performance equivalent to C compiled with current
GCC, which really shows that switching to a JIT can pay off. The Python in
GraalVM is at an early stage, but it shows that the old lesson of "just write
the hot spot in C" is no longer true. Chris Seaton has shown that for Ruby the
Ruby-interpreter/C boundary is expensive, and I think the same is true for
Python interpreter/C switches. That is something GraalVM with Python or Ruby
can optimize out.

[1] [https://www.pypy.org](https://www.pypy.org) [2]
[https://www.graalvm.org/docs/reference-manual/languages/pyth...](https://www.graalvm.org/docs/reference-manual/languages/python/)

~~~
jashmatthews
LuaJIT has great performance through the C FFI. It’s more that traditional
JITs can’t optimize across FFI boundaries. Hopefully Chris will correct me if
I’m full of shit.

There are other approaches which can work too, like compiling C to LLVM IR
using Clang so a function can be inlined by an LLVM based JIT at runtime.

------
wil421
>A significant reduction in the size of a class instance in RAM is achieved by
eliminating __dict__ and __weakref__. This is possible with the help of a
"trick" with __slots__:

What is the downside to the __slots__ trick? In what cases do you need the
dict and weakref?
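
For context, a minimal illustration of the trick and its main downside (exact
sizes vary by CPython version; a 64-bit build is assumed):

    import sys

    class Plain:
        def __init__(self, x, y, z):
            self.x, self.y, self.z = x, y, z

    class Slotted:
        __slots__ = ('x', 'y', 'z')
        def __init__(self, x, y, z):
            self.x, self.y, self.z = x, y, z

    p, s = Plain(1, 2, 3), Slotted(1, 2, 3)
    print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # instance + its dict
    print(sys.getsizeof(s))                              # no __dict__ at all

    # The downsides: no adding new attributes at runtime, and no weak
    # references unless '__weakref__' is put back into __slots__.
    # s.w = 4  -> AttributeError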

~~~
weberc2
If you are writing code that adds or removes properties at runtime. I.e., when
you want to write shoddy code.

~~~
hermitdev
I have written code like that in the past and would consider doing so again in
the future. When it's appropriate, it can be hugely beneficial.

When I last did it, I was wrapping a C++ API that I needed to compare against
another dataset for a merge. The C++ API (which, admittedly, I also wrote)
didn't have equality or hash operators defined on the Python objects. So I
monkey-patched them in for the keys I needed. It was actually the most elegant
solution I could come up with, as I could then naturally use the objects in
sets and easily get the differences to update the appropriate dataset.

As an aside, when I wrote the Python wrapper around my C++ API, I purposely
didn't define equality and hash operators for the objects, despite having full
information to do so on the natural keys, because I wanted the flexibility to
do the monkey-patching and override how the objects were compared depending on
the circumstances.

~~~
weberc2
Pretty sure you can still monkeypatch the class instead of each instance, but
I’m too jet lagged to think it through at the moment.
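
That works, and for comparison operators it is in fact required: Python looks
up dunder methods like __eq__ on the class, not on the instance. A sketch of
the idea (`Wrapped` and its `key` field are hypothetical stand-ins for the
C++-backed objects):

    class Wrapped:
        """Stand-in for a wrapper class with a natural key."""
        def __init__(self, key, payload):
            self.key = key
            self.payload = payload

    # Patch the class once; every instance, existing or future, picks up
    # the new comparison behaviour.
    Wrapped.__eq__ = lambda self, other: self.key == other.key
    Wrapped.__hash__ = lambda self: hash(self.key)

    a, b = Wrapped(1, "old"), Wrapped(1, "new")
    print(a == b)       # True: compared by natural key
    print(len({a, b}))  # 1: they collide in a set, handy for diffing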

------
kazinator
TXR Lisp, 64 bit:

    
    
      1> (defstruct blank ())
      #<struct-type blank>
      2> (pprof (new blank))
      malloc bytes:            16
      gc heap bytes:           32
      total:                   48
      milliseconds:             0
      #S(blank)
    

32 bit:

    
    
      2> (pprof (new blank))
      malloc bytes:             8
      gc heap bytes:           16
      total:                   24
      milliseconds:             0
      #S(blank)
    

The structure instance has a pointer to its type, followed by a numeric ID
(which is also in the type, but is forwarded to the instance for faster
access). The ID is combined with a slot symbol to perform a cache lookup to
get the offset of a slot. The numeric ID is a fixnum, which leaves a few spare
bits for a couple of flags:

    
    
      struct struct_inst {
        struct struct_type *type;
        cnum id : sizeof (cnum) * CHAR_BIT - TAG_SHIFT;
        unsigned lazy : 1;
        unsigned dirty : 1;
        val slot[1];
      };
    

If someone wanted to shrink this, they could patch the code so that the
dirty-flag support and the lazy instantiation of structs are made optional (as
in, compiled out), along with the forwarding of inst->type->id to inst->id.
This struct type is not known outside of struct.c, which is only some 1700
lines long. If you take the id member out of struct_inst, the C compiler will
find all the places that have to be fixed up; literally a 15-minute job.

I can also think of a more substantial refactoring that would eliminate the
type pointer as well. There is no room in the heap object handle to store it
directly: heap handles have four words in them; the COBJ ones used for structs
have the type field, a class symbol, a pointer to an operations structure, and
a pointer to some associated object (in this case struct_inst). All structures
share the same operations structure. However, if we dynamically allocated that
operations structure for each struct _type_ , we could stick the type pointer
in there, taking it out of the instance. Thus an instance's size could
literally be just sizeof(pointer) * number-of-instance-slots.

------
worik
I am paying US$5 a month for Python's memory hogging.

Running Mailman: it will not run in 1 GB of RAM (the US$5 VPS), so I had to
give it 2 GB. 2 GB? For a mailing list server?

I had the same problem running motioneye. It will not run on a Raspberry Pi
Zero, though it is advertised to run on a Zero (apparently I could squeeze it
on by doing some Python magic...).

What total crap is Python! What waste, what hubris, what technical failure!

I suspect the problem is actually Django (what hubris! NIH!): lighttpd is
running sweetly, with some Rust templating, in an acceptable fraction of the
memory.

~~~
ggm
But moving up your stack of desire: what alternative to Mailman would you live
with? Five a month is $60 a year. Assuming you value your own time at
professional wages, this job is worth an hour of your time at most before the
opportunity cost of complaint is cheaper than replacement.

~~~
ip26
_opportunity cost of complaint_

No, what he actually bought for his $60/year was license to complain about
mailman & python :)

~~~
ggm
The python complaint licence fee was the best deal I ever made.

------
skykooler
Seems to be an error in editing:

> ...which received a rating of [stackoverflow]
> ([https://stackoverflow.com/questions/29290359/existence-of-mu...](https://stackoverflow.com/questions/29290359/existence-of-mutable-named-tuple-in)
> -python / 29419745). In additioobjects liken, it can be used to reduce the
> size of objects in RAM...

------
ggm
Is this trading space for speed, or is there actually a speedup as well in
some cases? With interpreters it's possible to be small and fast, if you
become perhaps more obscure or more rigid in your structure definitions.

------
necovek
This is a pretty neat and useful comparison of how much memory different
structures use in Python to achieve roughly the same goals.

------
mlthoughts2018
I always hate how these things are phrased. The idea of “using a lot of
memory” doesn’t exist in a vacuum. “A lot” relative to what? Do you require
run-time dynamic typing features? Do you want to leverage the Python data
model? If yes, then this is the memory cost of your desires.

Advice in many of the comments about other languages sounds so tone deaf. The
question is not about using less memory. It’s about getting _exactly_ the
feature set of Python while using less memory.

------
kbirkeland
I feel like the last two are cheating a bit by explicitly using 32 bit
integers where the other examples seemed to use 64 bit.

~~~
auscompgeek
No, the fields that take up 8 bytes are pointers to PyObject. (I guess this
article assumes a 64-bit memory model.)

------
floatingatoll
OP, if you're reading this, your article has damaged syntax at the word
`additioobjects`.

~~~
intellimath
The author has fixed this.

------
omnimkar69
You will see this in the case of Java as well. The problem arises when a large
number of objects are active in RAM during the execution of a program,
especially if there are restrictions on the total amount of available memory.

