

PHP5 Database Iterators - leftnode
http://leftnode.com/blog/2009/09/php5-database-iterators/

======
leftnode
I'm the author and submitter of the article, and I'd honestly love to hear
what people think about this. I know there's a lot of (understandable) disdain
for PHP, but I think you can do some neat things with it. This is one
application that I've found particularly useful and wanted to share it.

~~~
thaumaturgy
I don't generally comment on anything related to individual programmers'
styles or programming methodologies; I think those are targeted more at
inexperienced programmers that are still developing their technique.

That said, my personal take on things like this is that it defeats PHP's
strengths in an attempt to make it behave like something it isn't. The primary
reasons for coding web applications in PHP instead of Java, Ruby, or Python or
what-have-you are speed and portability. Your code isn't compatible with PHP4
-- which a lot of servers are still running -- and you're spending an awful
lot of code to duplicate functionality that's already built into PHP. You can
already perform a query and get the results back and if you want to iterate
over them, you have an array to step through. That would be maybe around 5
lines of code, and you wouldn't need comments explaining what it does because
it would be clear and obvious. That's a lot less code for the parser to
evaluate, and a lot less code for it to run, which will probably translate to
being able to handle a lot more hits.

It would be interesting to load your code onto an app, and then load a
simpler, more direct version that does the same thing, and see how their
performance compares.

Nicely written article though.

~~~
jrockway
_The primary reason[...] for coding web applications in PHP instead of Java,
Ruby, or Python or what-have-you is speed_

This seems like a bad reason:
[http://shootout.alioth.debian.org/u64q/benchmark.php?test=al...](http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=php&lang2=java&box=1)

PHP even uses more memory than Java on some benchmarks, which is pretty damn
impressive. You have to work quite hard to use more RAM than Java.

(You'll also notice that source size of PHP is not much smaller than the
equivalent Java program. PHP is a very verbose programming language compared
to its other "dynamic language" brethren, and even to non-"dynamic" languages
like Haskell.)

~~~
thaumaturgy
That's a really interesting link, thanks. Something about it kinda smells
though: it doesn't match my experiences with PHP at all. (I can't speak for
Java in this case.)

I've got half of a CMS under development that's just recently -- sadly --
topped a thousand lines of code. It has a built-in parser for its own human-
readable configuration file, which it has to parse every time; it has built-in
cache management; it merges templates; it handles recursion for nested
templates; and it does on-the-fly high quality image resizing. All of my out-
and-back server response times are well under 1 second for all this work.

I'll look around and see if there are other trustworthy benchmarks that
corroborate this.

EDIT: Yeah, IEEE disagrees:
[http://www2.computer.org/portal/web/csdl/doi/10.1109/ICWS.20...](http://www2.computer.org/portal/web/csdl/doi/10.1109/ICWS.2008.71)

I'm not surprised by PHP's memory usage, actually; I long ago figured it must
be enormous just based on the way it handles data types. I don't think the
shootout link is deliberately deceptive; rather, all the tests that I looked
at were massively recursive, and this is not a task for which PHP was designed
-- I think I would expect Java to perform better in those cases.

~~~
igouy
> Something about it kinda smells though: it doesn't match my experiences with
> PHP at all.

Did you look at what the programs do before announcing that "it kinda smells"?

Did you look at the "really interesting link" enough to realize the benchmarks
game website is written in PHP?

Did you look at that particular web page enough to see "PHP is rarely the
bottleneck (HTML slides)" ?

~~~
thaumaturgy
Yes, I saw all of those before making my comment.

None of them were a substantive response to the results of the graph.

When being presented with data, you should answer with more data, not
conjecture.

~~~
igouy
> Yes, I saw all of those before making my comment.

So when you say "it doesn't match my experiences with PHP at all" are you
talking about your experience writing planetary simulations in PHP?

> When being presented with data...

You presented opinion not data.

> all the tests that I looked at were massively recursive

Name them.

~~~
thaumaturgy
This kind of nonsense is precisely why engaging in language debates is a
pointless waste of time. Why, sure, I would love to take time out of my day to
deal with you.

> _So when you say "it doesn't match my experiences with PHP at all" are you
> talking about your experience writing planetary simulations in PHP?_

No.

I was being somewhat unclear earlier so that I wouldn't have to come right out
and say that at first glance the "benchmark" site looked like complete bunk.
The real game would be to figure out in exactly which way it was bunk, and
initially I wasn't interested in that.

The test results posted there -- based on the actual numbers, not on the
horribly-formatted and nearly useless graph -- were showing a nearly 100-fold
disadvantage for PHP relative to Java.

That's where my experience comes in. Now, I happen to know that AT&T is
playing around with an internal presence application for the iPhone and
they're hosting everything on Java servers with Tomcat and they're pushing
millions of messages per day through them with no trouble. So, I know Java is
a reasonable server-side environment. But, there's no way that PHP is nearly
100 _times_ slower, not when the total request time for complex text
operations is averaging 800 milliseconds.

_Those_ are the experiences I was talking about, and they were the reason
that I said simply that the results smelled funny.

> _You presented opinion not data._

Yes, I did, and you missed the point entirely.

I said simply, "the conclusions don't seem right". You responded,
argumentatively, with a couple of pointers to what is presumably your site
(based on your comment history). None of those had anything to do with the
data. You were presenting a completely unrelated slide from a PHP
presentation, plus the fact that the site apparently runs on PHP, and you
wanted to know if I'd even bothered to look at the test programs.

And, you know what? None of those has anything to do with anything. They're
pointless. Useless. Less than useless, they're a waste of time.

_That_ is what I was referring to when I said that data needs to be answered
with data. You and I could sit here and waste a lot of otherwise perfectly
productive hours arguing back and forth about whose website runs PHP and whose
website runs Java, or we could just present some data.

And, that's exactly what I did. I went looking for data, because I wanted to
know if in fact my experiences with PHP over the last however-many years had
been atypical, or if Java had really become that incredibly fast, or for any
other explanation. It took some digging before I could find anything else even
remotely resembling a comparative analysis, and I did: IEEE. Well, they
usually know what they're doing. (Kind of.) And, their results much more
closely matched my experiences.

There, see? Data versus data. Now if you want to try to argue that there's
something wrong with my conclusions, you'll have to look at the data and
answer with more data -- because your opinion isn't a good enough response to
data.

> Name them.

Ah! And here, you did indeed catch me. The very first test I looked at was
binary-trees, in part because the memory usage for it was enormous. While I
did give a cursory glance to fasta, n-body, and mandelbrot, I stopped paying
close attention pretty quickly.

That said, now that you've forced me to pay closer attention to this, I think
I can support my main point even better than before.

The current user-submitted PHP test for pidigits is particularly stupid; it
re-calculates the same digits of pi over and over and over again as it
displays them in groups of ten. Of course it's going to be slow. Moreover, the
test results clearly show PHP running on a single processor, while the Java
version is running in parallel across 4 processors. I haven't worked in Java,
so in the thirty seconds I'm allowing for it I can't say for certain that the
test program for pidigits there is re-calculating the same ratio over and over
again, but it doesn't _look_ like it is. In fact, it looks like it's been
pretty sweetly optimized for this specific test environment.

binary-trees at least is recursive in both implementations.

And for the Fannkuch test, Java is able to take advantage of its built-in
threads library, while PHP is forced to fork() and implement IPC via network
pipes.

And so on. I really can't be moved to analyze the results or the test any
further, because I already found other data that does a better job of
answering to these tests.

Finally, the "benchmark game" itself is completely misleading, and almost
useless as a benchmark of any kind. There are all kinds of arguments out there
on the value of benchmarks, but when you're not even attempting to create a
level playing field between different test cases, then the benchmark becomes
useless. It's much more a game than a benchmark.

After looking at all of it much more closely, I now think that using that site
as any kind of evidence that one language is slower than any other (or vice-
versa) is about as silly as writing a blog post on just how bad C is at being
object-oriented. (C, not C++.)

~~~
jrockway
_But, there's no way that PHP is nearly 100 times slower_

I too am confused that it's only 100x slower. Consider the operation "2 + 2".
In Java with JIT on, this eventually becomes a single instruction, "add".
(Ideally 2 and 2 are already in registers from previous manipulation in the
function, and you don't even need a "load".)

In PHP, this operation is hundreds of instructions. You start with the AST
node. Then you look up the variable names. Then you look up their bindings
(fast because PHP has no scope; in Real Languages this is at least a
linked-list traversal). Then you finally get the object representing the value. Now
it needs to be coerced into the right type for +. Once that is done, we can
finally run the add operation. Then the result needs to be boxed.

You are surprised that that is 100x slower than a native CPU instruction?
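
To make those steps concrete, here is a toy sketch of the sequence described above: look up the bindings, unbox, coerce, add, re-box. It is only an illustration of that description, not how the Zend engine is actually implemented; `box()`, `coerceToNumber()`, `evaluate()` and the array-based node layout are all made-up names for this example.

```php
<?php
// Hypothetical boxed value: a type tag stored next to the payload.
function box(string $type, $value): array
{
    return ['type' => $type, 'value' => $value];
}

// Coerce a boxed value into something "+" can work with.
function coerceToNumber(array $boxed)
{
    if ($boxed['type'] === 'int' || $boxed['type'] === 'float') {
        return $boxed['value'];
    }
    if ($boxed['type'] === 'string' && is_numeric($boxed['value'])) {
        return $boxed['value'] + 0; // numeric string becomes int or float
    }
    return 0; // toy fallback for everything else
}

// Walk a tiny expression tree; $symbols maps variable names to boxed values.
function evaluate(array $node, array $symbols): array
{
    switch ($node['kind']) {
        case 'variable':
            return $symbols[$node['name']]; // name lookup, then binding
        case 'add':
            $left  = coerceToNumber(evaluate($node['left'], $symbols));
            $right = coerceToNumber(evaluate($node['right'], $symbols));
            $sum   = $left + $right; // the one instruction we actually wanted
            return box(is_int($sum) ? 'int' : 'float', $sum); // box the result
    }
    throw new InvalidArgumentException('unknown node kind');
}

$symbols = ['a' => box('int', 2), 'b' => box('string', '2')];
$ast = [
    'kind'  => 'add',
    'left'  => ['kind' => 'variable', 'name' => 'a'],
    'right' => ['kind' => 'variable', 'name' => 'b'],
];
$result = evaluate($ast, $symbols);
```

Every step around the single `+` is bookkeeping that a JIT-compiled static language can skip once the types are known.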

------
daok
Well written article, very clear. Nice job.

The only thing that I do not like is that you would have to put some database
field name in the controller to set data to the persistence instead of having
a DataAccessLayer to handle it. The main problem is that if in the future you
refactor the persistence or change it, you will have to go to multiple places
to be able to make everything work. I might be wrong... but at first look,
this is what I see. Oh by the way, do you always put your email address on
each method of all your code :P ?

~~~
leftnode
Thanks for the comments.

Most of that code was copied from my framework, which is full of Doxygen
comments, and I use my username in our subversion repository and my email
address for the author of each method since several methods can have multiple
authors.

~~~
jrockway
What value do you derive from documenting the author of each method? That
seems like a lot of overhead for very little benefit, especially when your VCS
can do an annotation and give you more detail than one comment can.

------
zackattack
I don't have much formal CS training, and I code primarily in PHP. That being
said, I don't see how this is useful at all. I've started coding more in PHP5
recently: my last two projects have been using classes. I started creating a
DBHandler class. Then, whenever I need a DB query, I have a method like this:

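    // Method on the DBHandler class: collect every aim_sn queued for a customer.
    function getSnsWithoutIds($customerid) {
        // %d forces the customer id to an integer before it reaches the SQL string.
        $query = sprintf("SELECT aim_sn FROM add_queue WHERE aim_id=0 AND cid=%d", $customerid);
        $result = mysql_query($query);
        $sns = array();
        // Copy each row's aim_sn into a plain array for the caller.
        while ($row = mysql_fetch_assoc($result)) {
            array_push($sns, $row['aim_sn']);
        }
        return $sns;
    }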

And then I simply iterate on the array back in the code.

    // $customerid is assumed to be set earlier by the calling code.
    $dbh = new DBHandler();
    $arr = $dbh->getSnsWithoutIds($customerid);
    foreach ($arr as $ele) { doSomething($ele); }

What is the advantage to doing things your way?

~~~
leftnode
Well, imagine that the query returns 1000 (or a million or whatnot) results.
You've looped through it once already in the getSnsWithoutIds() to build the
$sns array (for which you should use $sns[] = $row['aim_sn']; it's faster). So,
you've looped through everything, which takes some time, and now you have a
huge data structure in memory that you may not need.

Then, you loop through it again with the foreach. What if you wanted to
paginate those results quickly? What if you only wanted a subset of the whole
pie?

This class, as I'll show in a future article, can easily be extended to allow
for easy pagination and filtering of your data.
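
A minimal sketch of the general idea, not the class from the article: wrap the row source in an SPL `Iterator` so rows are pulled one at a time, then let `LimitIterator` pick out a page. `RowIterator` and the `$fetch` closure are hypothetical names, and the in-memory `$rows` array is only a stand-in for a real result handle (for example, repeated `mysql_fetch_assoc()` calls).

```php
<?php
// Forward-only iterator over a row source. $fetch is any callable that
// returns the next row as an array, or false when there are no more rows.
final class RowIterator implements Iterator
{
    private $fetch;
    private $row = false;
    private $index = -1;

    public function __construct(callable $fetch)
    {
        $this->fetch = $fetch;
    }

    public function rewind(): void
    {
        // A cursor cannot go backwards, so only the first rewind fetches.
        if ($this->index === -1) {
            $this->next();
        }
    }

    public function valid(): bool
    {
        return $this->row !== false;
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->row;
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->index;
    }

    public function next(): void
    {
        $this->row = call_user_func($this->fetch);
        $this->index++;
    }
}

// Stand-in data source (assumption): a real one would read from the database.
$rows = [
    ['aim_sn' => 'alice'],
    ['aim_sn' => 'bob'],
    ['aim_sn' => 'carol'],
    ['aim_sn' => 'dave'],
];
$fetch = function () use (&$rows) {
    $row = array_shift($rows);
    return $row === null ? false : $row;
};

// "Page 2" with a page size of 2 when counting pages of size 1 offset:
// skip one row, then take two.
$page = new LimitIterator(new RowIterator($fetch), 1, 2);
$seen = [];
foreach ($page as $row) {
    $seen[] = $row['aim_sn'];
}
```

Only the rows needed to reach the end of the requested page are ever fetched, and no full copy of the result set is built in between.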

~~~
zackattack
Well, it takes a trivial amount of time to loop through an array with a
million results. So can you please clarify the benefit of the class as-is,
besides its purported future extensibility? I don't get the time to unset() the
array before I return it, but you do?

BTW I did a test, and it seems that $array[] is a faster way to append.
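
For anyone who wants to repeat that kind of test, a rough sketch is below. The function names and the element count are arbitrary choices for this example, and the timings will vary by machine and PHP version, so no particular numbers are claimed here.

```php
<?php
// Append $n integers using array_push().
function appendWithPush(int $n): array
{
    $a = [];
    for ($i = 0; $i < $n; $i++) {
        array_push($a, $i);
    }
    return $a;
}

// Append $n integers using the $a[] = ... syntax.
function appendWithBrackets(int $n): array
{
    $a = [];
    for ($i = 0; $i < $n; $i++) {
        $a[] = $i;
    }
    return $a;
}

// Wall-clock seconds taken by one call of $fn($n).
function timeIt(callable $fn, int $n): float
{
    $start = microtime(true);
    $fn($n);
    return microtime(true) - $start;
}

$n = 200000; // arbitrary size for the comparison
$pushSeconds    = timeIt('appendWithPush', $n);
$bracketSeconds = timeIt('appendWithBrackets', $n);

printf("array_push: %.4fs, \$a[]: %.4fs\n", $pushSeconds, $bracketSeconds);
```

Both functions build the same array, so any difference is purely in the cost of the append itself.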

~~~
jrockway
_BTW I did a test, and it seems that $array[] is a faster way to append._

Classic PHP-style micro-optimization. If you can't use the right algorithm,
you can at least make the wrong one run 0.34% faster.

