I'm the author and submitter of the article, and I'd honestly love to hear what people think about this. I know there's a lot of (understandable) disdain for PHP, but I think you can do some neat things with it. This is one application that I've found particularly useful and wanted to share it.
I don't generally comment on anything related to individual programmers' styles or programming methodologies; I think those are targeted more at inexperienced programmers that are still developing their technique.
That said, my personal take on things like this is that it defeats PHP's strengths in an attempt to make it behave like something it isn't. The primary reasons for coding web applications in PHP instead of Java, Ruby, Python, or what-have-you are speed and portability. Your code isn't compatible with PHP4 -- which a lot of servers are still running -- and you're spending an awful lot of code to duplicate functionality that's already built into PHP. You can already perform a query and get the results back and if you want to iterate over them, you have an array to step through. That would be maybe around 5 lines of code, and you wouldn't need comments explaining what it does because it would be clear and obvious. That's a lot less code for the parser to evaluate, and a lot less code for it to run, which will probably translate to being able to handle a lot more hits.
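To make the "around 5 lines" claim concrete, here is a minimal sketch of the direct approach: run a query, get a plain array back, and step through it. It assumes PDO with an in-memory SQLite database and a made-up `users` table purely so the snippet runs on its own; none of these names come from the article.

```php
<?php
// Hypothetical setup so the example is self-contained: an in-memory
// SQLite database with a made-up `users` table.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO users (name) VALUES ('alice')");
$db->exec("INSERT INTO users (name) VALUES ('bob')");

// The direct approach: one query, one plain array, one loop.
$rows = $db->query('SELECT id, name FROM users ORDER BY id')
           ->fetchAll(PDO::FETCH_ASSOC);
$names = array();
foreach ($rows as $row) {
    $names[] = $row['name'];
}
```

The trade-off is that `fetchAll()` materializes every row at once, which is fine for small result sets and is exactly the cost the iterator approach tries to avoid for large ones.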
It would be interesting to load your code onto an app, and then load a simpler, more direct version that does the same thing, and see how their performance compares.
To be honest, I'm not really concerned about PHP4 anymore; it's been dead a while.
Thanks for giving me an idea for a followup article too. I'll do exactly that: write a small direct application and compare performance with large datasets.
Also, doing this in a purely procedural method would be an interesting article too.
I don't think it's a strength of PHP to have code portable to older versions of PHP. New versions mean new features, and he's taken full advantage of some of the most powerful features in PHP5. I'll quote the article: "The code in this article was adopted from Artisan System, Leftnode’s PHP5 framework."
I'm curious how many situations there are where new code is stuck in a PHP4 environment. PHP4 is getting quite old and has been end-of-lifed. Most hosts run PHP5 or can be asked to upgrade without much fuss. After all, it has been years since it came out.
As for iterating over a model vs. arrays, having your data in a model has many more advantages than a collection of strings. Chances are you don't need the slight performance you'd gain from sticking with strings, and you'd lose the advantages of working with a framework.
It's like saying ROR is inefficient or slow. The point is frameworks are designed to make development easier. If you need the performance you might gain out of not using the framework, then you're probably at a point where you're past needing a framework to help build your app.
Yeah, PHP4 was EOL'd quite a while back. Unfortunately, the transition from PHP4 to PHP5 wasn't easy for a lot of people, and application developers didn't make this any easier for end-users by having convoluted and often broken upgrade paths.
Of the two hosts I've dealt with regularly, both of them still support PHP4 on their older accounts, and both of them have handled the transition by setting up new servers and having all new accounts installed on their PHP5 servers.
I'm curious about the advantages you say that a model has over an array of strings. I can't honestly think of any.
I don't disagree that frameworks are designed to make development easier. Quite the contrary: that's why I have a strong preference for frameworks that are very clear, very concise, and very lightweight.
I'm way beyond busy right now, but I'm considering writing a competing procedural approach to leftnode's code, just for fun.
PHP even uses more memory than Java on some benchmarks, which is pretty damn impressive. You have to work quite hard to use more RAM than Java.
(You'll also notice that the source size of PHP is not much smaller than the equivalent Java program. PHP is a very verbose programming language compared to its other "dynamic language" brethren, and even to non-"dynamic" languages like Haskell.)
That's a really interesting link, thanks. Something about it kinda smells though: it doesn't match my experiences with PHP at all. (I can't speak for Java in this case.)
I've got half of a CMS under development that's just recently -- sadly -- topped a thousand lines of code. It has a built-in parser for its own human-readable configuration file, which it has to parse every time; it has built-in cache management; it merges templates; it handles recursion for nested templates; and it does on-the-fly high quality image resizing. All of my out-and-back server response times are well under 1 second for all this work.
I'll look around and see if there are other trustworthy benchmarks that corroborate this.
I'm not surprised by PHP's memory usage, actually; I long ago figured it must be enormous just based on the way it handles data types. I don't think the shootout link is deliberately deceptive; rather, all the tests that I looked at were massively recursive, and this is not a task for which PHP was designed -- I think I would expect Java to perform better in those cases.
Did you look at the "really interesting link" enough to realize the benchmarks game website is written in PHP?
What difference does this make?
Did you look at that particular web page enough to see "PHP is rarely the bottleneck (HTML slides)" ?
This is just random user-generated content; every language on the site gets to link to whatever they want.
(As an aside, I am tired of the "my app can be slow because the database is slow" argument. If you are not Doing It Wrong, your database is not slow, and your app can do other things while waiting for IO.)
This kind of nonsense is precisely why engaging in language debates is a pointless waste of time. Why, sure, I would love to take time out of my day to deal with you.
> So when you say "it doesn't match my experiences with PHP at all" are you talking about your experience writing planetary simulations in PHP?
No.
I was being somewhat unclear earlier so that I wouldn't have to come right out and say that at first glance the "benchmark" site looked like complete bunk. The real game would be to figure out in exactly which way it was bunk, and initially I wasn't interested in that.
The test results posted there -- based on the actual numbers, not on the horribly-formatted and nearly useless graph -- were showing a nearly 100-fold disadvantage to using PHP or Java.
That's where my experience comes in. Now, I happen to know that AT&T is playing around with an internal presence application for the iPhone and they're hosting everything on Java servers with Tomcat and they're pushing millions of messages per day through them with no trouble. So, I know Java is a reasonable server-side environment. But, there's no way that PHP is nearly 100 times slower, not when the total request time for complex text operations is averaging 800 milliseconds.
Those are the experiences I was talking about, and they were the reason that I said simply that the results smelled funny.
> You presented opinion not data.
Yes, I did, and you missed the point entirely.
I said simply, "the conclusions don't seem right". You responded, argumentatively, with a couple of pointers to what is presumably your site (based on your comment history). None of those had anything to do with the data. You were presenting a completely unrelated slide from a PHP presentation, plus the fact that the site apparently runs on PHP, and you wanted to know if I'd even bothered to look at the test programs.
And, you know what? None of those has anything to do with anything. They're pointless. Useless. Less than useless, they're a waste of time.
That is what I was referring to when I said that data needs to be answered with data. You and I could sit here and waste a lot of perfectly otherwise productive hours arguing back and forth with whose website runs PHP and whose website runs Java, or we could just present some data.
And, that's exactly what I did. I went looking for data, because I wanted to know if in fact my experiences with PHP over the last however-many years had been atypical, or if Java had really become that incredibly fast, or for any other explanation. It took some digging before I could find anything else even remotely resembling a comparative analysis, and I did: IEEE. Well, they usually know what they're doing. (Kind of.) And, their results much more closely matched my experiences.
There, see? Data versus data. Now if you want to try to argue that there's something wrong with my conclusions, you'll have to look at the data and answer with more data -- because your opinion isn't a good enough response to data.
> Name them.
Ah! And here, you did indeed catch me. The very first test I looked at was binary_trees, in part because the memory usage for it was enormous. While I did give a cursory glance to fasta, n-body, and mandelbrot, I stopped paying close attention pretty quickly.
That said, now that you've forced me to pay closer attention to this, I think I can support my main point even better than before.
The current user-submitted PHP test for pidigits is particularly stupid; it re-calculates the same digits of pi over and over and over again as it displays them in groups of ten. Of course it's going to be slow. Moreover, the test results clearly show PHP running on a single processor, while the Java version is running in parallel across 4 processors. I haven't worked in Java, so in the thirty seconds I'm allowing for it I can't say for certain that the test program for pidigits there is re-calculating the same ratio over and over again, but it doesn't look like it is. In fact, it looks like it's been pretty sweetly optimized for this specific test environment.
binary-trees at least is recursive in both implementations.
And for the fannkuch test, Java is able to take advantage of its built-in threads library, while PHP is forced to fork() and implement IPC via network pipes.
And so on. I really can't be moved to analyze the results or the test any further, because I already found other data that does a better job of answering to these tests.
Finally, the "benchmark game" itself is completely misleading, and almost useless as a benchmark of any kind. There are all kinds of arguments out there on the value of benchmarks, but when you're not even attempting to create a level playing field between different test cases, then the benchmark becomes useless. It's much more a game than a benchmark.
After looking at all of it much more closely, I now think that using that site as any kind of evidence that one language is slower than any other (or vice-versa) is about as silly as writing a blog post on just how bad C is at being object-oriented. (C, not C++.)
> But, there's no way that PHP is nearly 100 times slower
I too am confused that it's only 100x slower. Consider the operation "2 + 2". In Java with JIT on, this eventually becomes a single instruction, "add". (Ideally 2 and 2 are already in registers from previous manipulation in the function, and you don't even need a "load".)
In PHP, this operation is hundreds of instructions. You start with the AST node. Then you look up the variable names. Then you look up their bindings (fast because PHP has no scope, in Real Languages this is at least a linked-list traversal). Then you finally get the object representing the value. Now it needs to be coerced into the right type for +. Once that is done, we can finally run the add operation. Then the result needs to be boxed.
You are surprised that that is 100x slower than a native CPU instruction?
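The chain of steps listed above (look up the name, fetch the boxed value, coerce, add, re-box) can be sketched as a toy tree-walking evaluator. This is an illustration only, not how the Zend Engine is implemented (real PHP compiles scripts to opcodes before running them), and every function and array key here is hypothetical.

```php
<?php
// Toy model of a dynamic "+"; all names are hypothetical.

// A "boxed" value: a type tag plus a payload.
function box($value) {
    return array('type' => gettype($value), 'value' => $value);
}

// Coerce a boxed value to a number, as a dynamic "+" must before adding.
function toNumber(array $boxed) {
    if ($boxed['type'] === 'integer' || $boxed['type'] === 'double') {
        return $boxed['value'];
    }
    return $boxed['value'] + 0; // e.g. the numeric string "2" becomes int 2
}

// Evaluate an AST node against a symbol table of boxed values.
function evaluate(array $node, array $symbols) {
    switch ($node['kind']) {
        case 'var':
            // Look up the variable name and return its boxed value.
            return $symbols[$node['name']];
        case 'add':
            $left  = toNumber(evaluate($node['left'], $symbols));
            $right = toNumber(evaluate($node['right'], $symbols));
            return box($left + $right); // box the result again
    }
    throw new InvalidArgumentException('unknown node kind');
}

$symbols = array('a' => box(2), 'b' => box('2'));
$ast = array(
    'kind'  => 'add',
    'left'  => array('kind' => 'var', 'name' => 'a'),
    'right' => array('kind' => 'var', 'name' => 'b'),
);
$result = evaluate($ast, $symbols);
```

Even in this tiny model, a single addition costs several function calls, array lookups, and a type check per operand, which is the kind of overhead the comment is pointing at.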
> PHP even uses more memory than Java on some benchmarks, which is pretty damn impressive.
True, but in a web scenario, once a PHP script finishes running, it's completely removed from memory, so PHP usually ends up using less memory than other languages, since memory leaks don't get a chance to do that much damage.
As for verbosity, PHP5's object model makes it possible to write PHP that's almost as verbose as Java (eg the code in this article) but you don't have to. You can write PHP that's comparable to Ruby & Python in size (if not in elegance).
You're not the first to try this. Sapphire (the framework powering the SilverStripe CMS) implements this technique and has all of its DataObject select calls return a 'DataObjectSet', which is basically the equivalent of your DB_Iterator.
The only thing that I do not like is that you would have to put some database field names in the controller to set data on the persistence layer, instead of having a DataAccessLayer to handle it. The main problem is that if in the future you refactor the persistence or change it, you will have to go to multiple places to make everything work. I might be wrong... but at first look, this is what I see. Oh, by the way, do you always put your email address on each method of all your code :P ?
Most of that code was copied from my framework, which is full of Doxygen comments, and I use my username in our subversion repository and my email address for the author of each method since several methods can have multiple authors.
What value do you derive from documenting the author of each method? That seems like a lot of overhead for very little benefit, especially when your VCS can do an annotation and give you more detail than one comment can.
I don't have much formal CS training, and I code primarily in PHP. That being said, I don't see how this is useful at all. I've started coding more in PHP5 recently: my last two projects have been using classes. I started creating a DBHandler class. Then, whenever I need a DB query, I have a method like this:
  function getSnsWithoutIds($customerid) {
    $query = sprintf("SELECT aim_sn FROM add_queue WHERE aim_id=0 AND cid=%d", $customerid);
    $result = mysql_query($query);
    $sns = array();
    while($row = mysql_fetch_assoc($result)) {
      array_push($sns, $row['aim_sn']);
    }
    return $sns;
  }
And then I simply iterate on the array back in the code.
  $dbh = new DBHandler();
  $arr = $dbh->getSnsWithoutIds($customerid);
  foreach($arr as $ele) { doSomething($ele); }
Well, imagine that the query returns 1000 (or a million or what not) results. You've looped through it once already in getSnsWithoutIds() to build the $sns array (for which you should use $sns[] = $row['aim_sn']; it's faster). So, you've looped through everything, which takes some time, and now you have a huge data structure in memory that you may not need.
Then, you loop through it again with the foreach. What if you wanted to paginate those results quickly? What if you only wanted a subset of the whole pie?
This class, as I'll show in a future article, can easily be extended to allow for easy pagination and filtering of your data.
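As a rough illustration of the idea (not the article's DB_Iterator), here is a minimal sketch of pulling rows one at a time and taking only a page of them. It assumes PHP 5.5+ for generators; `lazyRows()` and `$fetchNext` are hypothetical names, and a plain array stands in for a database cursor such as a `mysql_fetch_assoc()` loop.

```php
<?php
// Hypothetical helper: yields rows one at a time from a fetch callback
// that returns the next row, or false when there are no more rows.
function lazyRows($fetchNext) {
    while (($row = $fetchNext()) !== false) {
        yield $row;
    }
}

// Stand-in for a database cursor so the example runs on its own.
$data = array('sn1', 'sn2', 'sn3', 'sn4', 'sn5');
$i = 0;
$fetchNext = function () use ($data, &$i) {
    return $i < count($data) ? $data[$i++] : false;
};

// "Page 2" with two rows per page: skip 2 rows, take 2 rows.
// SPL's LimitIterator wraps any iterator, including a generator.
$page = iterator_to_array(new LimitIterator(lazyRows($fetchNext), 2, 2), false);
```

With this shape the full result set is never collected into one array; only the requested page is. For a real database, pushing the offset and limit into the SQL itself would usually be cheaper still.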
Well, it takes a trivial amount of time to loop through an array with a million results. So can you please clarify the benefit of the class as-is, besides its purported future extensibility? I don't get the time to unset() the array before I return it, but you do?
BTW I did a test, and it seems that $array[] is a faster way to append.
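For anyone who wants to repeat that kind of test, a rough micro-benchmark sketch follows. The element count `$n` is an arbitrary choice, and the timings depend on the PHP version and machine, so no particular numbers are claimed here.

```php
<?php
// Rough micro-benchmark sketch; $n is an arbitrary element count.
$n = 100000;

// Append with array_push().
$start = microtime(true);
$a = array();
for ($i = 0; $i < $n; $i++) {
    array_push($a, $i);
}
$pushSeconds = microtime(true) - $start;

// Append with the [] syntax.
$start = microtime(true);
$b = array();
for ($i = 0; $i < $n; $i++) {
    $b[] = $i;
}
$bracketSeconds = microtime(true) - $start;

printf("array_push: %.4fs, []: %.4fs\n", $pushSeconds, $bracketSeconds);
```

Both loops build identical arrays, so any difference in the printed times comes from the append mechanism itself (array_push() is a function call, while [] is a language construct).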
Can you explain the dichotomy of constant space / linear space here? Because constant = {whatever} and mine is linear because it necessarily doubles the constant space?