
How PHP's foreach works - vlucas
http://stackoverflow.com/questions/10057671/how-foreach-actually-works/
======
Wilduck
As far as I can tell from my reading, the strangeness stems from the fact
that:

> Arrays in PHP are ordered hashtables (i.e. the hash buckets are part of a
> doubly linked list)

And that iteration is done using an "internal array pointer":

> This pointer is part of the HashTable structure and is basically just a
> pointer to the current hashtable Bucket. The internal array pointer is safe
> against modification, i.e. if the current Bucket is removed, then the
> internal array pointer will be updated to point to the next bucket.

Together, these require some complex copying rules to allow simple things
like iterating over the same array in nested loops.
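
To make the quoted "safe against modification" behaviour concrete, here is a rough C sketch of a doubly linked list with a cursor that moves to the next node when the node it points at is removed. This is not the Zend HashTable; `node`, `list`, `cursor` and the `list_*` functions are made-up names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hashtable Bucket in a doubly linked list. */
typedef struct node {
    int value;
    struct node *prev, *next;
} node;

typedef struct {
    node *head, *tail;
    node *cursor;   /* stand-in for the "internal array pointer" */
} list;

/* Append a caller-owned node at the tail. */
static void list_append(list *l, node *n) {
    n->prev = l->tail;
    n->next = NULL;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
}

/* Point the cursor back at the first node. */
static void list_rewind(list *l) { l->cursor = l->head; }

/* Unlink a node. If the cursor was on it, move the cursor to the
   next node first so it never dangles. */
static void list_remove(list *l, node *n) {
    if (l->cursor == n) l->cursor = n->next;
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}
```

With a single shared cursor like this, two nested loops over the same list would fight over it, which is presumably part of why copying rules are needed.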

I'm not very familiar with the implementation details of many other languages
with these constructs, but in Python a `for` loop (which operates similarly to
the described `foreach` loop in PHP) simply operates over an iterator, which
has a well-defined implementation [1]. I don't know about the implementation
any deeper than that, however.

I'm curious how other languages' implementations of foreach-type constructs
stack up and how the choice of implementation for the standard list/array
datatype affects the interface.

[1] <http://excess.org/article/2013/02/itergen1/#iterators>

------
danso
Is this actually Stackoverflow or an impostor phishing site? I don't see the
"Question has been closed as not constructive" notice even though this
question meets all the requirements for it.

~~~
tikhonj
Really? It's a purely technical question with one correct answer--exactly the
sort of question that StackOverflow wants. It is _not_ a question that would
breed discussion, have a list of answers or have no correct answer, which is
what gets closed.

~~~
dionidium
I wish that were true. Useful stuff is closed _all the time_. Which is good,
because _we are totally running out of bits on the internet!_

------
SeoxyS
If anybody feels like explaining something else that's also puzzling about the
Zend engine and PHP arrays: I spent a few hours the other day on a WTF
moment while writing a PHP extension in C and querying array zvals.

I was doing something fairly simple: trying to extract values passed as named
arguments to a function and turn them back into simple C types (char * and
int):

    
    
        // capturing hash keys as zvals
        zval **salt_hex_val;
        zval **key_hex_val;
        zval **iterations_val;
        if (    // getting values
                zend_hash_find(hash, "salt", strlen("salt") + 1, (void**)&salt_hex_val) == FAILURE ||
                zend_hash_find(hash, "key", strlen("key") + 1, (void**)&key_hex_val) == FAILURE ||
                zend_hash_find(hash, "iterations", strlen("iterations") + 1, (void**)&iterations_val) == FAILURE ||
                // checking types
                Z_TYPE_PP(salt_hex_val) != IS_STRING ||
                Z_TYPE_PP(key_hex_val) != IS_STRING ||
                (Z_TYPE_PP(iterations_val) != IS_LONG && Z_TYPE_PP(iterations_val) != IS_DOUBLE)
            ) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "Could not extract and check types on required values in hash: salt, key, and iterations.");
            RETURN_NULL();
        }
        
        char *salt_hex;
        char *key_hex;
    
        if (Z_STRLEN_PP(salt_hex_val) != salt_length * 2 ||
                Z_STRLEN_PP(key_hex_val) != key_length * 2) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "Key or Salt length incorrect.");
            RETURN_NULL();
        }
        
        salt_hex = Z_STRVAL_PP(salt_hex_val);
        key_hex = Z_STRVAL_PP(key_hex_val);
    
        int iterations = (Z_TYPE_PP(iterations_val) == IS_LONG ?
            (int)Z_LVAL_PP(iterations_val) :
            (int)Z_DVAL_PP(iterations_val));
    
    

The part that I still don't understand (but that I figured out by
trial-and-error) is why `zend_hash_find` takes a `void **` as its last
argument, when what you actually pass is a `zval ***` cast to `void **`.
What's the purpose of the triple pointer here?

        zend_hash_find(hash, "salt", strlen("salt") + 1, (void**)&salt_hex_val)

~~~
ahomescu1
Here's my understanding of why each pointer is needed, from reading the Zend
source code for a while:

1) The innermost pointer is needed because Zend hash tables actually store a
`zval *` (pointer to zval), not a zval directly. The zval is allocated
separately, then its pointer is stored into the table.

2) The second pointer is needed because Zend tables internally malloc storage
for whatever they store (a `zval *` in this case), then access that data
through a pointer. The `zval *` is memcpy'd into the malloc'd area, and that
area is accessed through a `zval **`. This allows users of zend_hash_xxx to
not only read the `zval *`, but also change it.

3) In C, one way for a function to return a value is by passing a pointer to a
variable that will store the result. Since zend_hash_find returns the internal
`zval **`, you need to pass in a `zval ***`: a pointer to the `zval **`
variable that receives the result. Through that `zval **` you can read and
also change the `zval *` stored in that hash table cell.
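
A minimal C sketch of those three levels, in case it helps. It does not use the Zend API: `item`, `tiny_table`, `tiny_store` and `tiny_find` are hypothetical names, and the "table" holds a single entry to keep it short. Unlike `zend_hash_find`, which takes an untyped `void **`, the out-parameter here is typed as `item ***` so the levels stay visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a zval; not the real Zend struct. */
typedef struct { long value; } item;

/* A one-entry "table" that keeps the stored item* in a heap cell it owns. */
typedef struct {
    const char *key;
    item **slot;    /* level 2: table-owned cell holding an item* (level 1) */
} tiny_table;

/* Store: copy the item* into table-owned storage. Returns 0 or -1. */
static int tiny_store(tiny_table *t, const char *key, item *it) {
    t->slot = malloc(sizeof *t->slot);
    if (t->slot == NULL) return -1;
    *t->slot = it;
    t->key = key;
    return 0;
}

/* Find: report the address of the cell through an out-parameter.
   Level 3: the caller passes the address of its own item** variable. */
static int tiny_find(const tiny_table *t, const char *key, item ***out) {
    if (t->slot == NULL || strcmp(t->key, key) != 0) return -1;
    *out = t->slot;
    return 0;
}

/* Release the cell; the items themselves belong to the caller. */
static void tiny_free(tiny_table *t) {
    free(t->slot);
    t->slot = NULL;
}
```

A caller declares `item **found;` and passes `&found`. Afterwards `*found` is the stored pointer, and assigning to `*found` replaces what the table holds.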

~~~
smsm42
1) Symbol tables store a double pointer, not a single one; see my comment to
the parent for the reason why. There may be hash tables that store single
pointers, but not symbol tables.

------
stormbrew
I think the interesting thing that this highlights about php, perhaps
especially for people who've never worked in it, is the fact that php is an
extremely rare example of a scripting language that has value semantics for
complex objects.

I've always found that an interesting choice.

~~~
unconed
> perhaps especially for people who've never worked in it

At my last job, one of our interview questions for an experienced PHP
programmer was "What makes PHP arrays different or unique, from a computer
science point of view?" Not a single one ever got anywhere close to the right
answer (that they're ordered hash tables, not arrays).

Pretty sure this applies to by-value and by-reference semantics too.

~~~
1SaltwaterC
Most of the time, getting the right answer is a matter of asking the right
question. If most experienced people can't ask a fairly simple question, then
maybe the question is to blame. That bit about the "computer science point of
view" may be a little bit vague. Asking for implementation specifics may be
more appropriate. I do know that arrays in PHP are implemented as hash tables
as this is a common rant about it. But from your question I did not understand
what you mean.

I'm still curious what you mean by "ordered", because PHP arrays don't
order their keys. That's the common rant mentioned above: having to use
asort() to get a properly ordered array:

    
    
      php > $arr = array();
      php > $arr[2] = 2;
      php > $arr[1] = 1;
      php > var_dump($arr);
      array(2) {
        [2]=>
        int(2)
        [1]=>
        int(1)
      }

~~~
stormbrew
Most hash tables have a semi-random order dictated by the hash algorithm in
combination with the bucket count. PHP Arrays are ordered by insertion order
(each slot in the hash table has a next pointer, the last of which is appended
to on insertion).

The order may be unusual or even non-obvious, but it is predictable.
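
A rough C sketch of the idea, assuming integer keys and a fixed number of slots. All names (`bucket`, `ordered_table`, `ot_set`, `ot_keys`, and so on) are made up for illustration, and the insertion-order list is singly linked here for brevity; this is not PHP's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>

#define NSLOTS 8

typedef struct bucket {
    long key, value;
    struct bucket *chain_next;  /* next bucket in the same hash slot */
    struct bucket *list_next;   /* next bucket in insertion order */
} bucket;

typedef struct {
    bucket *slots[NSLOTS];
    bucket *head, *tail;        /* insertion-order list */
} ordered_table;

/* Look a key up through its hash slot. */
static bucket *ot_find(const ordered_table *t, long key) {
    bucket *b = t->slots[(unsigned long)key % NSLOTS];
    for (; b; b = b->chain_next)
        if (b->key == key) return b;
    return NULL;
}

/* Insert or overwrite. Returns 0 on success, -1 if malloc fails. */
static int ot_set(ordered_table *t, long key, long value) {
    bucket *b = ot_find(t, key);
    if (b) { b->value = value; return 0; }  /* overwrite keeps its position */
    b = malloc(sizeof *b);
    if (!b) return -1;
    b->key = key;
    b->value = value;
    unsigned long s = (unsigned long)key % NSLOTS;
    b->chain_next = t->slots[s];
    t->slots[s] = b;
    b->list_next = NULL;                    /* append to the order list */
    if (t->tail) t->tail->list_next = b; else t->head = b;
    t->tail = b;
    return 0;
}

/* Copy keys in iteration order into out; returns how many were written. */
static size_t ot_keys(const ordered_table *t, long *out, size_t max) {
    size_t n = 0;
    for (bucket *b = t->head; b && n < max; b = b->list_next)
        out[n++] = b->key;
    return n;
}

/* Free every bucket and reset the table. */
static void ot_free(ordered_table *t) {
    bucket *b = t->head;
    while (b) { bucket *next = b->list_next; free(b); b = next; }
    t->head = t->tail = NULL;
    for (size_t i = 0; i < NSLOTS; i++) t->slots[i] = NULL;
}
```

Setting key 2 and then key 1, as in the `var_dump` example above, makes `ot_keys` report 2 before 1: iteration follows the insertion list, not the hash slots and not the sorted key order.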

~~~
jeltz
Ruby 1.9 hash tables are also ordered by insertion order.

------
nkozyra
So basically it operates on a copy unless it determines it doesn't need to?

I'm not sure why this is interesting.

~~~
wvenable
It's even less interesting because PHP arrays have value semantics so
someFunc($array) and foreach($array..) aren't really that different. The whole
thing is pretty intuitive.

The answer to the question is pretty deep though.

------
francispelland
Wasn't this a given when working with PHP? You can, after all, iterate by
reference so that you are modifying the array as you go, rather than at the end.

    $array = array(1,2,3,4,5); foreach ($array as &$value) {...}

~~~
narcissus
Just always be sure to unset $value afterwards :)

~~~
xkcdfanboy
I can't count the times that references have caused weird errors in my PHP
code. Definitely a good recommendation.

~~~
narcissus
I hear ya. In fact, I got burned by that problem enough times to make a test
for it in my PHPCS 'coding standard'... which is basically a handful of
standards that look for my stupid, repeated, coding errors :)

~~~
function_seven
My own standard is this:

    
    
        foreach ($array as & $ref) {
            // Do something with $ref
        } unset ($ref);
    

i.e. put the `unset()` call on the same line as the closing brace, forever
"welding" it to that block.

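For anyone wondering why the `unset()` matters: after a by-reference loop the loop variable still aliases the last element, so a later loop that reuses the same variable name writes into that element on every iteration. Below is a loose C analogy using a pointer as the leftover alias. `reuse_alias` and `clear_alias` are hypothetical names, and this only mimics the PHP behaviour; it is not how the engine does it.

```c
#include <assert.h>
#include <stddef.h>

/* First loop: "foreach by reference", the alias visits each element
   and is left on the last one.
   Second loop: "foreach by value" reusing the same variable, which
   writes each element through whatever the alias still points at. */
static void reuse_alias(int *arr, size_t n, int clear_alias) {
    int *ref = NULL;
    for (size_t i = 0; i < n; i++)
        ref = &arr[i];

    if (clear_alias)
        ref = NULL;             /* analogue of unset($ref) */

    int scratch = 0;            /* a fresh variable once the alias is gone */
    for (size_t i = 0; i < n; i++) {
        int *target = ref ? ref : &scratch;
        *target = arr[i];
    }
}
```

Starting from `{1, 2, 3}`, the leftover alias turns the array into `{1, 2, 2}`; with the alias cleared, the array is left untouched.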
