Hacker News
PHPPHP: a PHP VM in PHP (github.com)
165 points by dbaupp 1635 days ago | 47 comments



I'm the author of this library. I figured I'd answer a couple of questions as to why I wrote this.

First, it was something that I always wanted to do. For no particular reason other than I wanted to do it. I knew it was possible, but possible and doing it are two very different things.

Second, it was far easier than I thought. The time to the initial commit (basic working VM) was only about 6 hours of work. So it's not like I spent a year building it...

Third, it could be a useful education tool. For me, it's a way to learn the intricacies of the Zend VM better (I know it fairly well, but knowing and building give two different amounts of knowledge). But it's also a way of teaching others how the VM works. By giving a PHP implementation reference, hopefully more people can understand how the C implementation works (they both operate off the same generic implementation at this point).

Fourth, it can enable certain interesting things. For example, we could hypothetically build an Opcode optimizer in PHP which parses the generated opcodes and optimizes things (removing redundant opcodes, statically compiling static expressions, etc). Then, we could build a PECL extension that would render those optimized opcodes directly into APC cache (or some other opcode cache mechanism).
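To make the optimizer idea concrete, here is a minimal sketch in Python of folding static expressions and dropping redundant opcodes. Everything in it is an assumption for illustration: the tuple format, the opcode names (`ADD`, `MUL`, `NOP`, `ECHO`) and the `optimize` function are hypothetical and are not the Zend engine's or PHPPHP's actual opcode layout. It also assumes straight-line code with no jumps.

```python
import operator

# Hypothetical toy opcodes, NOT the real Zend opcode format.
# Each op is a tuple: (name, dest, operand1, operand2) for arithmetic,
# or (name, operand, ...) for everything else.
# An int operand is a literal constant; a str operand names a temp/variable.
FOLDABLE = {"ADD": operator.add, "MUL": operator.mul}


def optimize(ops):
    """Fold constant arithmetic and drop NOPs in straight-line code."""
    known = {}  # temp name -> constant value computed at "compile time"
    out = []

    def resolve(operand):
        # Replace a temp with its known constant value, if it has one.
        if isinstance(operand, str):
            return known.get(operand, operand)
        return operand

    for op in ops:
        name = op[0]
        if name == "NOP":
            continue  # redundant opcode: emit nothing
        if name in FOLDABLE:
            _, dest, a, b = op
            a, b = resolve(a), resolve(b)
            if isinstance(a, int) and isinstance(b, int):
                # Both operands are static: compute now, emit nothing.
                known[dest] = FOLDABLE[name](a, b)
                continue
            # Result is only known at runtime, so forget any stale constant.
            known.pop(dest, None)
            out.append((name, dest, a, b))
        else:
            out.append((name, *[resolve(x) for x in op[1:]]))
    return out


program = [
    ("ADD", "t0", 2, 3),
    ("MUL", "t1", "t0", 4),
    ("NOP",),
    ("ECHO", "t1"),
]
optimized = optimize(program)
```

With this toy input, the whole arithmetic chain collapses into a single `ECHO` of a constant. A real optimizer would also have to handle branches, side effects and PHP's type juggling, which this sketch ignores.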

Fifth, it can be used to quickly mock up future functionality changes. Consider that it's easier to alter a PHP VM simply because you don't need to worry about memory management at all. So whipping up a POC for a significant feature should be a lot easier in PHP than C (at least for many non-full-time C developers).

Sixth, it can be used to actually debug PHP code without working knowledge of GDB (and the underlying C structures). I wouldn't recommend this, as the chances of us getting it working 100% the same as the C implementation are practically 0, but it's a concept.

Seventh, it could wind up becoming a full implementation (like PyPy). If we can compile the implementation using HipHop, and do some other lower-level tricks, there's a chance we could achieve performance somewhere near the C implementation. I doubt it, but it's possible. Especially if we add a JIT component (or a way of rendering out to machine code certain opcodes)...

Eighth, why not?


Serious achievement. Well done!


Nice. Your reasons all do make sense. Doing is learning. I will definitely check it out if only to learn more about Zend internals.

Maybe update the github readme with this? It would help a lot, IMO.


I've updated the readme with the above...

Thanks!!!


Brilliant. The PHP source code will need loads of cleaning one day or another anyway, and this could help with making changes to the language...


Anyone who could run a non-conforming PHP could just run something better.

A conforming, but massively improved (read: JIT) PHP interpreter is really the only way forward for us who are so unfortunate. The good news is that there are starting to be viable options in that space.


And what are the exact problems with the current interpreter?


With this brilliant project, is it the end of the nodejs hype?


This has nothing to do with nodejs or any hype around it. Read the parent's comments on why he built this.


I'm surprised this wasn't done earlier. It's actually quite common for a language's compiler to eventually be written in that same language; this is called bootstrapping a compiler [1], and makes the language "self-hosted".

C is a notorious case here, and comes closest to a real chicken-or-the-egg problem: how do you write a C compiler without a C compiler? C is often the lowest-level language compilers are written in, so in order to bootstrap a C compiler for a specific architecture / OS, you need to write the entire compiler in the target language (assembly).

You can still see the bootstrapping part well in GCC: when you compile a new version of GCC, you first compile the compiler using your old version (bootstrapping), and then re-compile the entire codebase again using your new version.

Nowadays, "real" compiler bootstrapping using assembly is hardly ever done. You can simply create a new architecture target for GCC in a different environment (for example, Linux), and compile GCC with your new environment as a target architecture on Linux. You then install this brand new compiler on your brand new OS, et voila: you skipped the nasty bit and have a working compiler for your new OS.

[1] http://en.wikipedia.org/wiki/Bootstrapping_(compilers)


If I wanted to create a C compiler in C as quickly as possible and had no other tools except for an assembler, I'd probably write an interpreter for C in assembly first rather than a compiler. Then I'd write a compiler for C in C and then compile the compiler by running the compiler under the interpreter.


The implication here being that writing an interpreter is easier? I've found it easier to write a compiler.


> how do you write a C compiler without a C compiler? C is often the lowest-level language compilers are written in, so in order to bootstrap a C compiler for a specific architecture / OS, you need to write the entire compiler in the target language (assembly).

Nope, you probably don't do that. You use a computer (host) where you already have a C compiler, and you compile the new compiler (which you wrote/ported). Then you run this new compiler, on the host, to generate executable code for the new architecture (target), i.e. cross-compile. You can use this cross-compiler to (again) compile your ported compiler, but to run on the target architecture. You end up with a native compiler for your new architecture.

Though, there is no essential need to have a native compiler for an architecture. For example, you can program an AVR microcontroller in C without running any C compilers on the AVR itself.


> Nope, you probably don't do that. You use a computer (host) where you already have a C compiler, and you compile the new compiler (which you wrote/ported). Then you run this new compiler, on the host, to generate executable code for the new architecture (target), i.e. cross-compile. You can use this cross-compiler to (again) compile your ported compiler, but to run on the target architecture. You end up with a native compiler for your new architecture.

Ah, yes, I tried to explain exactly that in the last paragraph I wrote. Is what you're describing different from what I described there?


Sorry, I missed that, was distracted by your not-really-correct statement above :)


My mistake. I was trying to illustrate the more theoretical problem of bootstrapping a compiler without a compiler, since I personally find that fascinating. These problems were really solved years ago, and their only place nowadays is in an educational context.


This isn't exactly bootstrapping a compiler. The concept there is that in the end you do not need the foreign compiler. In this case, you still need the native implementation to run the PHP implementation, giving it much less practical purpose than bootstrapping compiled languages. From a theoretical perspective, this is equivalent to bootstrapping, as it is not an intrinsic property of PHP that it does not get compiled.


Which you can then use to recompile your OS easily (FreeBSD, some Linux based OS's) and tune it to a specific machine / configuration.

That's why people like Kernighan and Ritchie (RIP) are so well respected. They wrote C to the target language first (asm) then rewrote C in C once they had a good enough compiler. Considering that C is the cornerstone of UNIX and modern variants I'd say this is no small feat.

C will be around for a long time after many of the current languages, domain specific or not, become just another wikipedia entry.


Actually (according to Ritchie's description of the history of C at http://cm.bell-labs.com/cm/cs/who/dmr/chist.html) they never wrote a C compiler in asm. Instead it was a gradual process of evolution, so the first C compiler would have been written in B; and the B compiler was itself written in B and self-hosting, bootstrapped from a B compiler written in a thing I've never heard of called TMG.

None of which is to downplay their achievements at all (I still write C full-time); but it's interesting that even in a period of computing history where operating systems were generally written in assembly language, compilers were bootstrapped from other high level languages.


Calling B a high level language is something of a stretch.


Nowadays, the same can be said of C.


Nowadays? C was never a high level language and wasn't designed to be. Mid-level yes. As in just above asm.


There was a time being above asm, having non-register data types, functions and libraries was considered being a high-level language. I'd consider C higher level than FORTRAN 77, for instance.

You just need to be old enough ;-)


Yes, I forgot about B. :) "Ritchie wrote a compiler using TMG which produced machine code." vs Ken Thompson's B compiler that produced threaded code.


> C will be around for a long time after many of the current languages, domain specific or not, become just another wikipedia entry.

Off-topic, but this is a rather amusing benchmark for irrelevance. I suspect that deletionism would not permit it to be true, either.


Can you explain what you mean? I hardly think C is a benchmark for irrelevance.


The benchmark the GP offered was "just another wikipedia entry".


So I've studied the Zend implementation in depth as part of my phd, and I'm the author of a lot of the implementation, and all of the optimizations and static analysis in phc. So I know quite a bit about this, and looked through the implementation.

First thing is that it's clearly a very clean and high quality implementation. It's clear the author knows a substantial amount of both the implementation of the Zend engine and its semantics. This is clear from the replication of zvals, and things like separation.

However, I think that it's missing most of the copy-on-write, reference and separation semantics. These are pretty complex, so that's not unexpected. An example here is the function implementation that copies the parameter list - that's almost certainly not right. I don't see reference counts or separation in assignment either.
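For readers unfamiliar with the terms, here is a rough Python sketch of what reference counting and separation on write mean. It is a toy model under loud assumptions: the `ZVal` and `SymbolTable` classes and their methods are hypothetical names, values are plain lists, and PHP references (`&`) are ignored entirely. It is neither the Zend engine's nor PHPPHP's actual code.

```python
class ZVal:
    """Hypothetical stand-in for a zval: a value plus a reference count."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1


class SymbolTable:
    """Toy variable table showing copy-on-write; ignores PHP references."""

    def __init__(self):
        self.vars = {}

    def assign_const(self, name, value):
        # $name = <literal>: drop the old container, create a fresh one.
        self._release(name)
        self.vars[name] = ZVal(value)

    def assign_var(self, dest, src):
        # $dest = $src: share the same container and bump the count.
        # No copy happens here.
        zv = self.vars[src]
        zv.refcount += 1  # bump first so $a = $a stays consistent
        self._release(dest)
        self.vars[dest] = zv

    def append(self, name, item):
        # $name[] = item: a write, so separate first if the value is shared.
        zv = self._separate(name)
        zv.value.append(item)

    def _separate(self, name):
        zv = self.vars[name]
        if zv.refcount > 1:
            # Shared: detach this name onto its own shallow copy.
            zv.refcount -= 1
            zv = ZVal(list(zv.value))
            self.vars[name] = zv
        return zv

    def _release(self, name):
        old = self.vars.pop(name, None)
        if old is not None:
            old.refcount -= 1


table = SymbolTable()
table.assign_const("a", [1, 2])
table.assign_var("b", "a")        # $b = $a; both names share one container
shared_before_write = table.vars["a"] is table.vars["b"]
table.append("b", 3)              # $b[] = 3; forces separation
```

After the append, `$a` still holds `[1, 2]` while `$b` holds its own `[1, 2, 3]`. The copy is deferred until the first write, which is the behaviour that an implementation copying the parameter list eagerly on every call would not reproduce.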

That said, I didn't look fully so I could be wrong.

I would recommend the author use the test suites from phc, Zend, and HipHop - that would help highlight a lot of the weird corner cases (for example, copying an array that has references in it!)

Anyway, very cool (especially impressive for 6 hours!) and I hope you go on with this :)

(Comment via phone, excuse typos)


Well, first, let me say thanks for the compliments!

As far as the missing parts, yes, we know those are missing. The Zval implementation was a place-holder to let it work. Now that it's working for basic code, the goal is to refactor in an appropriate design for it. Once that happens, we can implement full copy-on-write. It's very much a work-in-progress...

As far as test cases, we've been using the Zend language test cases as part of a guide. Seeing as only about 6% of them pass right now, they are not in the repo. But eventually the goal is to port them in.

Once that happens, the next goal is to get it to host run-tests.php (the PHP core test runner).

From there, there's one goal left: self-hosting.

Thanks again!!!


Does PHP have the wind in its sails, or what? Last month I got pointed to PH7 ( http://ph7.symisc.net/quick_intro.html ), which is an embeddable PHP interpreter (bytecode compiler + VM) for C/C++/Objective C host applications (amazing piece of work), and now I'm investigating this stuff.


Absolutely. The dams of NIH have begun to break down. (I've started to see teams use things like Compass instead of concluding "But I have to install Ruby?")

People are collaborating on actually finally decent solutions (Composer, PHPSpec2, Behat). I see people relying less and less upon PHP's bloated, arcane and mostly dangerous standard library, instead building reusable chunks of PHP with surprisingly acceptable (comparably beautiful) design (Goutte, Twig, Doctrine).

Drupal, which I have managed to end up working with full time for the last few years, has finally stopped (poorly) reinventing everything it touches: http://www.garfieldtech.com/blog/off-the-island-2013 and I expect that trend to continue.

We are starting to have decent, forward-thinking and flexible frameworks, and projects like phpBB have started actually using them.

I've also seen people actually using IDEs, writing docs, specifying argument types and being aware of the concept of stepping debuggers.

Facebook has sponsored enough alternate implementations of the interpreter that I've begun to lose track, charting the massive room for improvements.

Things are genuinely starting to look better; I still lose sleep over my tools and hoop-jump laden development workflow, but I actually have hope that I might not have to by this time next year.


Forgive my ignorance, but could this have any application as a kind of sandbox? For example adding a plugin capability to an app that allows user-submitted code to be run safely?


PHPPHP is well on its way to being a simple PHP command line debugger. To my knowledge there currently isn't a good tool for command line debugging on PHP.


phpsh[1] by Facebook is good. Ironically, it's written in Python.

[1] http://www.phpsh.org/


phpsh is probably the best, but I also use Boris https://github.com/d11wtq/boris - "A tiny REPL for PHP"


It seems like a case of sensibility, rather than irony, to me. Using Python, even when developing PHP-related software, just seems like the intelligent thing to do.


Xdebug exposes a DBGp interface which supports breakpoints, stepping, inspecting variables, etc. You can use it with any client that supports DBGp, such as MacGDBp.

EDIT: It does not expose a REPL though, which I suppose is what you meant?


Uh, where's all the code? I'm guessing I'm missing something, but a lot of the classes in the lib/ folder invoke classes that I just can't find in the project. Are these external dependencies that weren't called out in the README, or is not all of the code in master actually functional yet?


A lot of them are installed via composer (see the PHP-Parser project for most of them): https://github.com/nikic/PHP-Parser



Which classes are you specifically missing?


I thought this might be a bit similar to PyPy... turns out it's only similar in idea. But hey it looks fun, and as the author said, why not?


Here's a project which is strikingly similar to (has common tooling with) PyPy:

https://bitbucket.org/fijal/hippyvm (repo) http://morepypy.blogspot.com/2012/07/hello-everyone.html (article)


I made a PHP VM in PHP once, but it was just the eval function (http://php.net/manual/en/function.eval.php).


PHP^2


The FAQ has it right: For the love of god, why?!

(Sure, it's only a fun learning project, I get that)


SCIENCE!



