

How big are PHP arrays (and values) really?
http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html

======
maratd
As the author mentions, if you run into a use-case where you need to store
100000 integers in memory, then you should use one of the many alternative
structures available. Some of them were explicitly designed to store integers
in an efficient manner. Arrays weren't designed to store integers efficiently
or anything else for that matter. They were designed to be fast and easy to
use.

~~~
mbell
I wouldn't call using 56 bytes to store 8 bytes of actual data efficient.

~~~
dangrossman
That'd be why he said "arrays weren't designed to store integers efficiently".

~~~
ktr
From the comment:

> then you should use one of the many alternative structures available. Some
> of them were explicitly designed to store integers in an efficient manner.

From the article:

> But if you do want to save memory you could consider using an SplFixedArray
> for large, static arrays. ... It basically does the same thing, but if you
> run it, you’ll notice that it uses “only” 5600640 bytes. That’s 56 bytes per
> element ...

EDIT: formatting.
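Below is a minimal sketch of that kind of measurement. It uses only PHP's built-in memory_get_usage(), range() and SplFixedArray. The helpers measureArray() and measureFixed() are hypothetical names chosen for illustration; this is not the article's own script. The absolute numbers depend on the PHP version and on whether the build is 32- or 64-bit. Newer PHP versions also changed the array and value layout, so the 5.x figures quoted in this thread will not match.

```php
<?php
// Sketch: compare memory used by a plain array and an SplFixedArray
// holding the same integers. measureArray()/measureFixed() are
// illustrative helpers, not part of PHP.

const ELEMENTS = 100000;

function measureArray($n)
{
    $before = memory_get_usage();
    $arr = range(1, $n);                  // ordinary PHP array (hash table)
    $used = memory_get_usage() - $before;
    unset($arr);
    return $used;
}

function measureFixed($n)
{
    $before = memory_get_usage();
    $fixed = new SplFixedArray($n);       // fixed-size, integer-indexed storage
    for ($i = 0; $i < $n; $i++) {
        $fixed[$i] = $i + 1;
    }
    $used = memory_get_usage() - $before;
    unset($fixed);
    return $used;
}

$arrayBytes = measureArray(ELEMENTS);
$fixedBytes = measureFixed(ELEMENTS);

printf("array:         %d bytes (%.1f per element)\n", $arrayBytes, $arrayBytes / ELEMENTS);
printf("SplFixedArray: %d bytes (%.1f per element)\n", $fixedBytes, $fixedBytes / ELEMENTS);
```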

~~~
maratd
That is only one of the alternatives and in my opinion, not a very good one. I
forget the exact details, but there is an extension by the guy who wrote
igbinary that is specifically designed for this use-case.

------
pacmon
Did anyone else even try running his suggested code? I think maybe he has a
problem with his setup. I don't get those numbers.

    
    
      My numbers for PHP 5.3.8:
      Windows 7 -
      8524568 bytes (using range)
      3600584 bytes (using SplFixedArray)
      
      Fedora 14 -
      7724600 bytes (using range)
      3200568 bytes (using SplFixedArray)
    

*edit - Added numbers for SplFixedArray

~~~
nikic
Yep, I already got some comments on that. You are either using a 32 bit system
or a 32 bit binary (at least I think that the binaries PHP distributes for
Windows are compiled for 32 bit, so even if you are on a 64 bit Windows you'll
still get 32 bit numbers).

The Windows number is still 8 bytes per element larger than the number I wrote
(76 per element). This might have various reasons; one could be that it was
compiled with heap protection :)
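To check which kind of build you are running, you can use the built-in PHP_INT_SIZE constant. It is 4 on builds with 32-bit integers and 8 on builds with 64-bit integers. Be aware that this reports integer width, not pointer width. The two usually match, but the PHP 5 Windows x64 builds kept 32-bit integers, so on those builds this check is only indicative.

```php
<?php
// PHP_INT_SIZE is the size of a PHP integer in bytes: 4 or 8.
$intBits = PHP_INT_SIZE * 8;

printf("PHP %s, %d-bit integers (PHP_INT_MAX = %d)\n", PHP_VERSION, $intBits, PHP_INT_MAX);
```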

~~~
pacmon
Ah. Yes. I think you're right. I think PHP is still compiled for 32-bit on
Windows. That makes more sense.

------
legooolas
> A union is a means to make some value accessible as various types. For
> example if you do a zvalue_value->lval you’ll get the value interpreted as
> an integer. If you use zvalue_value->ht on the other hand the value will be
> interpreted as a pointer to a hashtable (aka array).

This is not valid C usage of unions. They are _only_ for use as a method to
save space, not for conversion between types, despite it being a very common
usage of unions.

This can cause all manner of problems when compiler optimizations such as
type-based alias analysis are used.

EDIT: Turns out I'm completely wrong on this and it's fine from C99 onwards.

~~~
chadaustin
Actually, the upcoming C1X standard makes type punning via union legal. In
C99, alias analysis behaves exactly as you said, but using unions to
convert between floats and ints is so common that it's legal in C1X, VC++, and
I believe gcc.

~~~
cygx
Type-punning through unions is already legal in C99, but there's a known error
in Annex J, listing it incorrectly as unspecified behaviour.

See <http://stackoverflow.com/a/8513748/48015>

~~~
legooolas
I stand corrected :/

~~~
cygx
Don't feel bad about it - the C standard can be quite subtle, and I've been
known to spread lies about it as well.

Things about which I have stumbled somewhat recently:

* restrict-qualified pointer-to-const parameters do not guarantee that the pointed-to object won't be modified, as restrict only applies if the pointer is actually used to access the object, which calling code can't know (i.e. restrict only enables optimizations in the called code and not reordering in calling code)

* functions with differently qualified, but otherwise compatible parameter types have compatible type (which is only mentioned in the last, parenthesized sentence of section 6.7.5.3)

------
wvenable
Has anyone run similar tests on the soon-to-be-released PHP 5.4? From what I
understand, one of the changes in that release is reduced memory consumption.
Andi Gutmans has said that PHP 5.4 could lower PHP's memory footprint by as
much as 35%.

~~~
nikic
I actually did most of my tests with PHP 5.4 and trunk binaries, but also
tested PHP 5.3 and the numbers didn't change. PHP 5.2 used 8 bytes less,
because the circular GC was introduced only in PHP 5.3.

By the way, you can test that yourself too. The codepad I posted the code on
has a switch for PHP 5.2, 5.3 and 5.4, so you can easily see for yourself :)

------
nestlequ1k
Interesting. Wonder how this compares to python and ruby memory handling.

~~~
fhars
Since PHP arrays are within a factor of two of the theoretical optimum for a
dynamically typed language that unifies arrays and hashes, all these languages
(add JavaScript to the list) should have roughly comparable behaviour unless
they do some ugly special casing for hashes where all keys are integers.
[edit: see the comment by InfernalH for a measurement that indicates that Ruby
seems to do this.] You have to look that up in the relevant documentation
and/or source code. Perl, on the other hand, should fare better, as it
distinguishes arrays and hashes. If you want performance, use a real
programming language.

~~~
samdk
Python and Ruby both have separate arrays and hashes. They're completely
different data structures in both cases.

Python's _dict_ type is a hash table like you'd expect. Python's _list_ is a
pointer-array-backed list (it may inline ints or similar things; I don't
remember if CPython does, and the exact details are implementation-dependent),
and raw arrays are in the standard library if you need them.

From a very quick check, a list of 100k ints in CPython is ~1.5mb, and a dict
of 100k ints -> other ints is ~6mb.

Ruby's _hash_ and _array_ implementations are similar, I think, although I
don't know Ruby as well, so I don't know the specifics.

~~~
jeltz
Good info about Python, but you are incorrect about Ruby. Hashes and arrays
are totally different in the Ruby implementations I know of.

EDIT: Sorry, I misread you. Ruby MRI and CPython are indeed similar.

At least in Ruby MRI (the mainline) arrays are implemented as a struct with a
size and a pointer to a normal C array which contains the object references
(references in MRI are pointers to object structs which use the lower bits to
inline integers of <= 31 or 63 bits, true, false, nil and symbols).

I have not looked into hashes in MRI as much, but I believe they are hash
tables which, in Ruby 1.9, retain insertion order using pointers like a singly
linked list.

~~~
masklinn
> Good info about Python, but you are incorrect about Ruby. Hashes and arrays
> are totally different in the Ruby implementations I know of.

From its context, I'm guessing samdk is saying:

> Ruby's hash and array implementations are similar [to Python's dict and
> list]

not that they're similar to one another, which would make absolutely no sense
considering his comment starts with:

> Python and Ruby both have separate arrays and hashes.

so I'd say you agree with him and misread his comment.

------
noselasd
What does memory_get_usage() actually do? Does it report the "heap" size
assigned to the process, or does it use PHP's internal counters for the
allocated user data/variables? A C malloc subsystem will assign a whole lot
of virtual memory, in steps of pages, or more if it decides to attach a piece
using mmap().

In order to make this test case relevant, I'd say one has to know what
memory_get_usage() does. At the very least, it would be meaningless for
determining the overhead of an array if, for whatever reason, creating the
first PHP array in a program also initializes "big" memory pools that count
towards the memory usage.

~~~
ezyang
This is easy to check in the source. memory_get_usage() calls
zend_memory_usage(), which reads the size field of the global mm_heap
structure. That field is updated by PHP's own memory allocator (e.g. in
`_zend_mm_alloc_int`).
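The following sketch compares the figures PHP exposes, using only the built-in memory_get_usage() and memory_get_peak_usage(). The report() helper is a hypothetical name for illustration. Passing true to memory_get_usage() returns the memory the Zend memory manager has reserved from the operating system. Without it, you get the amount currently handed out to PHP values, and the two can diverge noticeably. Memory that a library allocates with plain malloc(), outside the Zend memory manager, is counted by neither.

```php
<?php
// memory_get_usage()      -> bytes currently handed out by PHP's memory manager
// memory_get_usage(true)  -> bytes the manager has reserved from the OS (coarser)
// memory_get_peak_usage() -> highest value of the first figure so far
// Memory allocated outside the Zend memory manager is not counted here.

function report($label)
{
    $used = memory_get_usage();
    $reserved = memory_get_usage(true);
    printf("%-7s used=%10d reserved=%10d peak=%10d\n",
        $label, $used, $reserved, memory_get_peak_usage());
    return array($used, $reserved);
}

list($usedStart, $reservedStart) = report('start');
$data = range(1, 200000);
list($usedFilled, $reservedFilled) = report('filled');
unset($data);
list($usedFreed, $reservedFreed) = report('freed');
```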

~~~
hardtke
I was debugging a PHP memory issue yesterday and noticed that at some point
my memory_get_usage() value became much, much smaller than the memory
footprint reported by top (20 MB from memory_get_usage(), 500 MB RES in top).
According to top, it was a sudden jump during a loop execution. Does top give
accurate memory estimates for PHP scripts?

------
iampims
A coworker just pointed me to: <http://us.php.net/SplFixedArray>

------
lang
PHP is not unique in having a big memory footprint. AFAIC Python and Ruby are
also memory hogs. An interesting question is how memory efficient a dynamic PL
can be. Given that in modern computers memory access (cache misses) is fairly
expensive, it probably makes sense to trade instructions for memory.

~~~
masklinn
> AFAIC Python and Ruby are also memory hogs.

Even more so in some areas, for instance a Python `int` is not a machine
integer but a full-blown object.

~~~
lvh
It's only _always_ a full-blown object in CPython. Smarter implementations,
notably PyPy, will do escape analysis, and never actually end up allocating
those objects.

(My point is that not having unboxed types does not imply being a memory hog.
You just need a smarter implementation.)

~~~
masklinn
> Smarter implementations, notably PyPy, will do escape analysis, and never
> actually end up allocating those objects.

Objects in a collection (which is what we're talking about here) escape kind-
of by default.

Until type-specialized collections are merged in PyPy (if they are not yet),
it'll have the same issue as CPython.

------
rorrr
PHP arrays are not really arrays; they are sort of hash maps.

You can do things like

    
    
        $arr = array(1 => 10, "1" => 11);
    

Or even

    
    
        $arr = array('他妈的我的生活' => 5);
    

But at the same time you can treat them as regular zero-based arrays.

    
    
        $arr = array();
        $arr[] = 1;
        $arr[] = 2;
        $arr[] = 3;
        $arr[] = 'dog';
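The first snippet does not actually create two entries. Integer-like string keys such as "1" are cast to integers, so 1 and "1" name the same slot and the later value (11) wins. The sketch below demonstrates that, along with how [] picks the next key. It uses only core array syntax, and the variable names are illustrative.

```php
<?php
// "1" is a canonical integer string, so it is stored under the integer key 1;
// the second entry overwrites the first.
$arr = array(1 => 10, "1" => 11);

// [] appends using one more than the largest integer key used so far.
$list = array();
$list[] = 1;       // key 0
$list[] = 2;       // key 1
$list[5] = 'x';    // explicit key 5
$list[] = 3;       // key 6

// "01" is not a canonical integer string, so it stays a string key.
$mixed = array("01" => 'a', 1 => 'b');

var_dump($arr, $list, $mixed);
```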

~~~
Wilya
It's funny. I don't have much experience with PHP, and I had to actually run
your last example to know that it appended the values (instead of recreating
the array with a single value each time).

Is there an append operator, or something more explicit? '+=' seems to do
something strange.

~~~
ars
[] IS the append operator. And += only works if both sides are arrays; if
either side is not an array, PHP stops with an "Unsupported operand types"
error. += is really intended for hashmaps, not integer indexes; for those use [].

~~~
sapphirecat
+= does union-by-key: any keys present in the right array, and not in the
left, are appended to the left.

`array_merge` will append _all_ numeric keys from the right array to the left,
under new key values, as if you used [] one-by-one; for string keys, _all_
keys from the right are copied into the left, possibly overwriting what was
there.

In real code, I either have all-numeric or all-string keys, and `array_merge`
does what I want in bulk operations: it's essentially equal to Python
list.extend and dict.update, respectively. `+` on arrays is basically useless
for me.
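Here is a small sketch contrasting the two, using only the built-in + operator and array_merge(). The variable names are illustrative. The sketch shows union-by-key against renumbering for integer keys, and which side wins for string keys.

```php
<?php
$left  = array('a', 'b');             // keys 0, 1
$right = array('x', 'y', 'z');        // keys 0, 1, 2

// + is union by key: keys already in $left are kept, only key 2 is added.
$union = $left + $right;              // array('a', 'b', 'z')

// array_merge renumbers integer keys, so every element of $right is appended.
$merged = array_merge($left, $right); // array('a', 'b', 'x', 'y', 'z')

// With string keys, array_merge lets the right-hand side win...
$defaults = array('host' => 'localhost', 'port' => 80);
$options  = array('port' => 8080, 'user' => 'root');
$config   = array_merge($defaults, $options); // port => 8080

// ...while + keeps the left-hand value.
$kept = $defaults + $options;                 // port => 80

var_dump($union, $merged, $config, $kept);
```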

------
baby
I'm kinda hijacking this thread because I always wonder how to handle vars in
PHP. Should we use short var names like $a and $b? Should we avoid reusing the
same var and changing its type (e.g. $a = 30; $a = "thing";)?

~~~
ars
Someone downmodded you for being offtopic, but I'll answer you anyway.

The name of the var doesn't matter at all. And you can change the type of a
var at will. There is nothing at all in PHP that will be better if you avoid
changing the type, so just do what is clearest for your program.

