First, it was something that I always wanted to do. For no particular reason other than I wanted to do it. I knew it was possible, but possible and doing it are two very different things.
Second, it was far easier than I thought. The time to the initial commit (basic working VM) was only about 6 hours of work. So it's not like I spent a year building it...
Third, it could be a useful educational tool. For me, it's a way to learn the intricacies of the Zend VM better (I know it fairly well, but knowing and building give two different amounts of knowledge). But it's also a way to teach others how the VM works. By giving a PHP reference implementation, hopefully more people can understand how the C implementation works (they both operate off the same generic implementation at this point).
Fourth, it can enable certain interesting things. For example, we could hypothetically build an opcode optimizer in PHP which parses the generated opcodes and optimizes them (removing redundant opcodes, precomputing static expressions at compile time, etc). Then, we could build a PECL extension that would render those optimized opcodes directly into the APC cache (or some other opcode cache mechanism).
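To make the optimizer idea concrete, here is a rough sketch of constant folding plus NOP removal over a flat opcode list. It is written in Python only for brevity, and the opcode tuple layout, the operand tags and the `optimize` function are all made up for illustration; none of this is the Zend engine's real opcode format. It also assumes straight-line code with no jumps and operands that behave like plain numbers or strings.

```python
# Hypothetical opcode format (NOT the Zend engine's real layout):
# (opcode_name, result, operand1, operand2), where an operand is
# ("const", value), ("tmp", n), ("var", name), or None when unused.

FOLDABLE = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
    "CONCAT": lambda a, b: str(a) + str(b),
}


def is_const(operand):
    return operand is not None and operand[0] == "const"


def optimize(ops):
    """Fold operations on constants and drop NOPs (single pass, no jumps)."""
    known = {}  # temporaries whose value is already known at compile time
    out = []
    for name, result, op1, op2 in ops:
        # Substitute temporaries that an earlier fold turned into constants.
        op1 = known.get(op1, op1)
        op2 = known.get(op2, op2)
        if name == "NOP":
            continue  # redundant opcode: nothing to emit
        if (name in FOLDABLE and result is not None and result[0] == "tmp"
                and is_const(op1) and is_const(op2)):
            # Both inputs are constants: compute now, emit nothing.
            known[result] = ("const", FOLDABLE[name](op1[1], op2[1]))
            continue
        out.append((name, result, op1, op2))
    return out


# Roughly what `echo 2 * 3 + 4;` might look like in this toy format.
example = [
    ("MUL", ("tmp", 0), ("const", 2), ("const", 3)),
    ("NOP", None, None, None),
    ("ADD", ("tmp", 1), ("tmp", 0), ("const", 4)),
    ("ECHO", None, ("tmp", 1), None),
]

optimized = optimize(example)
# optimized == [("ECHO", None, ("const", 10), None)]
```

A real optimizer would also have to respect PHP's type juggling and control flow, which this sketch ignores entirely.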
Fifth, it can be used to quickly mock up future functionality changes. Consider that it's easier to alter a PHP VM simply because you don't need to worry about memory management at all. So whipping up a POC for a significant feature should be a lot easier in PHP than C (at least for many non-full-time C developers).
Sixth, it can be used to actually debug PHP code without working knowledge of GDB (and the underlying C structures). I wouldn't recommend this, as the chances of us getting it working 100% the same as the C implementation are practically 0, but it's a concept.
Seventh, it could wind up becoming a full implementation (like PyPy). If we can compile the implementation using HipHop, and do some other lower-level tricks, there's a chance we could achieve performance somewhere near the C implementation. I doubt it, but it's possible. Especially if we add a JIT component (or a way of rendering certain opcodes out to machine code)...
Eighth, why not?
Maybe update the GitHub README with this? It would help a lot, IMO.
A conforming, but massively improved (read: JIT) PHP interpreter is really the only way forward for those of us who are so unfortunate. The good news is that there are starting to be viable options in that space.
C is a notorious language in this case, and is the closest to a real chicken-or-the-egg problem: how do you write a C compiler without a C compiler? C is often the lowest-level language compilers are written in, so in order to bootstrap a C compiler for a specific architecture / OS, you need to write the entire compiler in the target language (assembly).
You can still see the bootstrapping part well in GCC: when you compile a new version of GCC, you first compile the compiler using your old version (bootstrapping), and then re-compile the entire codebase again using your new version.
Nowadays, "real" compiler bootstrapping using assembly is hardly ever done. You can simply create a new architecture target for GCC in a different environment (for example, Linux), and compile GCC on Linux with your new environment as the target architecture. You then install this brand new compiler on your brand new OS, et voilà: you skipped the nasty bit and have a working compiler for your new OS.
Nope, you probably don't do that. You use a computer (host) where you already have a C compiler, and you compile the new compiler (which you wrote/ported). Then you run this new compiler, on the host, to generate executable code for the new architecture (target), i.e. cross-compile. You can use this cross-compiler to (again) compile your ported compiler, but to run on the target architecture. You end up with a native compiler for your new architecture.
Though, there is no essential need to have a native compiler for an architecture. For example, you can program an AVR microcontroller in C without running any C compilers on the AVR itself.
Ah, yes, I tried to explain exactly that in the last paragraph I wrote. Is what you're describing different from what I described there?
That's why people like Kernighan and Ritchie (RIP) are so well respected. They wrote the C compiler in the target language first (asm), then rewrote it in C once they had a good enough compiler. Considering that C is the cornerstone of UNIX and its modern variants, I'd say this is no small feat.
C will be around for a long time after many of the current languages, domain specific or not, become just another Wikipedia entry.
None of which is to downplay their achievements at all (I still write C full-time); but it's interesting that even in a period of computing history where operating systems were generally written in assembly language, compilers were bootstrapped from other high level languages.
You just need to be old enough ;-)
Off-topic, but this is a rather amusing benchmark for irrelevance. I suspect that deletionism would not permit it to be true, either.
First thing is that it's clearly a very clean and high quality implementation. The author evidently knows a substantial amount about both the implementation of the Zend engine and its semantics, as shown by the replication of zvals and things like separation.
However, I think that it's missing most of the copy-on-write, reference and separation semantics. These are pretty complex, so that's not unexpected. An example here is the function implementation that copies the parameter list - that's almost certainly not right. I don't see reference counts or separation in assignment either.
That said, I didn't look fully so I could be wrong.
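For anyone unfamiliar with the terms above, here is a rough sketch of what copy-on-write with reference counts and separation means. It is a toy model in Python, not the Zend engine's code and not PHPPHP's; the `Zval` and `SymbolTable` classes and their methods are invented for illustration. It deliberately leaves out PHP references (`&`), which is where most of the complexity lives.

```python
import copy


class Zval:
    """Toy value container: a payload plus a count of names sharing it."""

    def __init__(self, value):
        self.value = value
        self.refcount = 0


class SymbolTable:
    """Hypothetical variable table mapping names to shared Zval objects."""

    def __init__(self):
        self.vars = {}

    def _release(self, name):
        old = self.vars.pop(name, None)
        if old is not None:
            old.refcount -= 1

    def set(self, name, value):
        """$name = <literal>: bind the name to a fresh zval."""
        self._release(name)
        zv = Zval(value)
        zv.refcount = 1
        self.vars[name] = zv

    def assign(self, dst, src):
        """$dst = $src: share the zval instead of copying the payload."""
        zv = self.vars[src]
        zv.refcount += 1      # bump first so self-assignment stays safe
        self._release(dst)
        self.vars[dst] = zv

    def separate(self, name):
        """Give the name a private copy if its zval is shared."""
        zv = self.vars[name]
        if zv.refcount > 1:
            zv.refcount -= 1
            private = Zval(copy.deepcopy(zv.value))
            private.refcount = 1
            self.vars[name] = private
        return self.vars[name]

    def append(self, name, item):
        """$name[] = item: a write, so separate before mutating."""
        self.separate(name).value.append(item)


st = SymbolTable()
st.set("a", [1, 2])
st.assign("b", "a")                         # no copy yet
shared_before_write = st.vars["a"] is st.vars["b"]
st.append("b", 3)                           # the write triggers separation
a_after, b_after = st.vars["a"].value, st.vars["b"].value
```

After the write, `a` still holds `[1, 2]` while `b` holds `[1, 2, 3]`; before it, both names pointed at one shared container.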
I would recommend the author use the test suites from phc, Zend, and HipHop - that would help highlight a lot of the weird corner cases (for example, copying an array that has references in it!)
Anyway, very cool (especially impressive for 6 hours!) and I hope you go on with this :)
(Comment via phone, excuse typos)
As far as the missing parts, yes, we know those are missing. The Zval implementation was a place-holder to let it work. Now that it's working for basic code, the goal is to refactor in an appropriate design for it. Once that happens, we can implement full copy-on-write. It's very much a work-in-progress...
As far as test cases, we've been using the Zend language test cases as part of a guide. Seeing as only about 6% of them pass right now, they are not in the repo. But eventually the goal is to port them in.
Once that happens, the next goal is to get it to host run-tests.php (the PHP core test runner).
From there, there's one goal left: self-hosting.
People are finally collaborating on actually decent solutions (Composer, PHPSpec2, Behat). I see people relying less and less upon PHP's bloated, arcane and mostly dangerous standard library, instead building reusable chunks of PHP with surprisingly acceptable (comparatively beautiful) design (Goutte, Twig, Doctrine).
Drupal, which I have managed to end up working with full time for the last few years, has finally stopped (poorly) reinventing everything it touches: http://www.garfieldtech.com/blog/off-the-island-2013 and I expect that trend to continue.
We are starting to have decent, forward-thinking and flexible frameworks, and projects like phpBB have started actually using them.
I've also seen people actually using IDEs, writing docs, specifying argument types and being aware of the concept of stepping debuggers.
Facebook has sponsored enough alternate implementations of the interpreter that I've begun to lose track, charting the massive room for improvements.
Things are genuinely starting to look better; I still lose sleep over my tools and hoop-jump laden development workflow, but I actually have hope that I might not have to by this time next year.
EDIT: It does not expose a REPL though, which I suppose is what you meant?
(Sure, it's only a fun learning project, I get that)