

PHPPHP: a PHP VM in PHP - dbaupp
https://github.com/ircmaxell/PHPPHP

======
ircmaxell
I'm the author of this library. I figured I'd answer a couple of questions as
to _why_ I wrote this.

First, it was something that I always wanted to do. For no particular reason
other than I wanted to do it. I knew it was possible, but possible and doing
it are two very different things.

Second, it was far easier than I thought. The time to the initial commit
(basic working VM) was only about 6 hours of work. So it's not like I spent a
year building it...

Third, it could be a useful educational tool. For me, it is a way to learn the
intricacies of the Zend VM better (I know it fairly well, but knowing and
building give two different amounts of knowledge). It could also teach others
how the VM works. By giving a PHP reference implementation, hopefully more
people can understand how the C implementation works (they both operate off
the same generic implementation at this point).

Fourth, it can enable certain interesting things. For example, we could
hypothetically build an Opcode optimizer in PHP which parses the generated
opcodes and optimizes things (removing redundant opcodes, statically compiling
static expressions, etc). Then, we could build a PECL extension that would
render those optimized opcodes directly into APC cache (or some other opcode
cache mechanism).
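
A rough illustration of the "statically compiling static expressions" idea, as a minimal sketch only. It is written in Python rather than PHP, and the opcode format (`Op`, `Const`, `Tmp`, the `ADD`/`SUB`/`MUL`/`ECHO` names) is hypothetical; it is not PHPPHP's or the Zend engine's actual opcode layout. It assumes each temporary slot is written exactly once.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Const:
    """A literal operand (hypothetical representation)."""
    value: Union[int, float]


@dataclass(frozen=True)
class Tmp:
    """A temporary result slot, assumed to be written exactly once."""
    slot: int


@dataclass(frozen=True)
class Op:
    """One opcode: a name, up to two operands, and an optional result slot."""
    name: str
    a: object
    b: object = None
    result: Optional[Tmp] = None


# Only side-effect-free arithmetic that cannot fail is folded here.
FOLDABLE = {
    "ADD": lambda x, y: x + y,
    "SUB": lambda x, y: x - y,
    "MUL": lambda x, y: x * y,
}


def fold_constants(ops):
    """Return a new opcode list with constant arithmetic precomputed."""
    known = {}  # Tmp slot -> Const value computed at "compile time"
    out = []
    for op in ops:
        # Substitute operands whose value is already known.
        a = known.get(op.a, op.a)
        b = known.get(op.b, op.b)
        if op.name in FOLDABLE and isinstance(a, Const) and isinstance(b, Const):
            # Both inputs are constants: record the result and drop the opcode.
            known[op.result] = Const(FOLDABLE[op.name](a.value, b.value))
            continue
        out.append(Op(op.name, a, b, op.result))
    return out


# Roughly "echo (2 + 3) * 4;" expressed in the hypothetical format.
program = [
    Op("ADD", Const(2), Const(3), Tmp(0)),
    Op("MUL", Tmp(0), Const(4), Tmp(1)),
    Op("ECHO", Tmp(1)),
]
optimized = fold_constants(program)
```

Here the two arithmetic opcodes disappear and only an `ECHO` of the constant 20 remains. A real optimizer would also have to respect PHP's type juggling and any opcode with side effects, which this sketch ignores.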

Fifth, it can be used to quickly mock up future functionality changes.
Consider that it's easier to alter a PHP VM simply because you don't need to
worry about memory management at all. So whipping up a POC for a significant
feature should be a lot easier in PHP than C (at least for many non-full-time
C developers).

Sixth, it can be used to actually debug PHP code without working knowledge of
GDB (and the underlying C structures). I wouldn't recommend this, as the
chances of us getting it working 100% the same as the C implementation are
practically 0, but it's a concept.

Seventh, it could wind up becoming a full implementation (like PyPy). If we
can compile the implementation using HipHop, and do some other lower-level
tricks, there's a chance we could achieve performance somewhere near the C
implementation. I doubt it, but it's possible. Especially if we add a JIT
component (or a way of rendering out to machine code certain opcodes)...

Eighth, why not?

~~~
lucian303
Nice. Your reasons all do make sense. Doing is learning. I will definitely
check it out if only to learn more about Zend internals.

Maybe update the github readme with this? It would help a lot, IMO.

~~~
ircmaxell
I've updated the readme with the above...

Thanks!!!

------
stingraycharles
I'm surprised this wasn't done earlier. It's actually a quite common practice
for a language to eventually write a compiler for it in said language; this is
called bootstrapping a compiler [1], and makes the language "self-hosted".

C is a notorious language in this case, and is the closest to the real
chicken-or-the-egg problem there is: how do you write a C compiler without a C
compiler? C is often the lowest-level language compilers are written in, so in
order to bootstrap a C compiler for a specific architecture / OS, you need to
write the entire compiler in the target language (assembly).

You can still see the bootstrapping part well in GCC: when you compile a new
version of GCC, you first compile the compiler using your old version
(bootstrapping), and then re-compile the entire codebase again using your new
version.

Nowadays, "real" compiler bootstrapping using assembly is hardly ever done.
You can simply create a new architecture target for GCC in a different
environment (for example, Linux), and compile GCC with your new environment as
a target architecture on Linux. You then install this brand new compiler on
your brand new OS, et voila: you skipped the nasty bit and have a working
compiler for your new OS.

[1] <http://en.wikipedia.org/wiki/Bootstrapping_(compilers)>

~~~
lucian303
Which you can then use to recompile your OS easily (FreeBSD, some Linux based
OS's) and tune it to a specific machine / configuration.

That's why people like Kernighan and Ritchie (RIP) are so well respected. They
wrote C to the target language first (asm) then rewrote C in C once they had a
good enough compiler. Considering that C is the cornerstone of UNIX and modern
variants I'd say this is no small feat.

C will be around for a long time after many of the current languages, domain
specific or not, become just another wikipedia entry.

~~~
pm215
Actually (according to Ritchie's description of the history of C at
<http://cm.bell-labs.com/cm/cs/who/dmr/chist.html>) they never wrote a C
compiler in asm. Instead it was a gradual process of evolution, so the first C
compiler would have been written in B; and the B compiler was itself written
in B and self-hosting, bootstrapped from a B compiler written in a thing I've
never heard of called TMG.

None of which is to downplay their achievements at all (I still write C full-
time); but it's interesting that even in a period of computing history where
operating systems were generally written in assembly language, compilers were
bootstrapped from other high level languages.

~~~
hnriot
Calling B a high level language is something of a stretch.

~~~
rbanffy
Nowadays, the same can be said of C.

~~~
lucian303
Nowadays? C was never a high level language and wasn't designed to be.
Mid-level, yes. As in just above asm.

~~~
rbanffy
There was a time when being above asm and having non-register data types,
functions and libraries was considered being a high-level language. I'd
consider C higher level than FORTRAN 77, for instance.

You just need to be old enough ;-)

------
pbiggar
So I've studied the Zend implementation in depth as part of my phd, and I'm
the author of a lot of the implementation, and all of the optimizations and
static analysis in phc. So I know quite a bit about this, and looked through
the implementation.

First thing is that it's clearly a very clean and high quality implementation.
It's clear the author knows a substantial amount of both the implementation of
the Zend engine and its semantics. This is clear from the replication of
zvals, and things like separation.

However, I think that it's missing most of the copy-on-write, reference and
separation semantics. These are pretty complex, so that's not unexpected. An
example here is the function implementation that copies the parameter list -
that's almost certainly not right. I don't see reference counts or separation
in assignment either.
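
For readers unfamiliar with the terms, here is a minimal sketch of reference counting with separation on write. It is written in Python, and the names (`ZVal`, `SymbolTable`, `assign`, `append`) are hypothetical; it is neither PHPPHP's code nor the Zend engine's, and it deliberately leaves out PHP references (`&`), which are where most of the complexity lives.

```python
class ZVal:
    """Hypothetical value container: a payload plus a reference count."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1


class SymbolTable:
    """Maps variable names to shared ZVal containers."""

    def __init__(self):
        self.vars = {}

    def set(self, name, value):
        # $name = <fresh value>: drop the old container, create a new one.
        self._release(name)
        self.vars[name] = ZVal(value)

    def assign(self, dst, src):
        # $dst = $src: share the container and bump the count; no copy yet.
        z = self.vars[src]
        z.refcount += 1  # bump first so that assign(x, x) stays balanced
        self._release(dst)
        self.vars[dst] = z

    def append(self, name, item):
        # $name[] = item: a write, so the container must be private first.
        z = self._separate(name)
        z.value.append(item)

    def _separate(self, name):
        # Copy the payload only if another variable still shares it.
        z = self.vars[name]
        if z.refcount > 1:
            z.refcount -= 1
            z = ZVal(list(z.value))  # shallow copy of the array payload
            self.vars[name] = z
        return z

    def _release(self, name):
        old = self.vars.pop(name, None)
        if old is not None:
            old.refcount -= 1


table = SymbolTable()
table.set("a", [1, 2])
table.assign("b", "a")          # $b = $a; both names share one container
shared = table.vars["a"] is table.vars["b"]
table.append("b", 3)            # $b[] = 3; $b separates, $a is untouched
```

After the append, `a` still holds `[1, 2]` while `b` holds `[1, 2, 3]`, and each container is back to a reference count of 1. Copying eagerly on every assignment or function call gives the same visible result for this case, but costs a copy even when nothing is ever written.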

That said, I didn't look fully so I could be wrong.

I would recommend the author use the test suites from phc, Zend, and hiphop -
that would help highlight a lot of the weird corner cases (for example,
copying an array that has references in it!)

Anyway, very cool (especially impressive for 6 hours!) and I hope you go on
with this :)

(Comment via phone, excuse typos)

~~~
ircmaxell
Well, first, let me say thanks for the compliments!

As far as the missing parts, yes, we know those are missing. The Zval
implementation was a place-holder to let it work. Now that it's working for
basic code, the goal is to refactor in an appropriate design for it. Once that
happens, we can implement full copy-on-write. It's very much a work-in-
progress...

As far as test cases, we've been using the Zend language test cases as part of
a guide. Seeing as only about 6% of them pass right now, they are not in the
repo. But eventually the goal is to port them in.

Once that happens, the next goal is to get it to host run-tests.php (the PHP
core test runner).

From there, there's one goal left: self-hosting.

Thanks again!!!

------
xtremejames183
PHP has the wind in its sails or what? Last month I got pointed to PH7 (
<http://ph7.symisc.net/quick_intro.html> ) which is an embeddable PHP
interpreter (bytecode compiler + VM) for C/C++/Objective C host applications
(amazing piece of work) and now I'm investigating this stuff.

~~~
lotyrin
Absolutely. The dams of NIH have begun to break down. (I've started to see
teams use things like Compass instead of concluding "But I have to install
Ruby?")

People are collaborating on actually finally decent solutions (Composer,
PHPSpec2, Behat). I see people relying less and less upon PHP's bloated,
arcane and mostly dangerous standard library, instead building reusable chunks
of PHP with surprisingly acceptable (comparably beautiful) design (Goutte,
Twig, Doctrine).

Drupal, which I have managed to end up working with full time for the last
few years, has finally stopped (poorly) reinventing everything it touches:
<http://www.garfieldtech.com/blog/off-the-island-2013> and I expect that trend
to continue.

We are starting to have decent, forward-thinking and flexible frameworks, and
projects like PHPBB have started actually using them.

I've also seen people actually using IDEs, writing docs, specifying argument
types and being aware of the concept of stepping debuggers.

Facebook has sponsored enough alternate implementations of the interpreter
that I've begun to lose track, charting the massive room for improvements.

Things are genuinely starting to look better; I still lose sleep over my tools
and hoop-jump laden development workflow, but I actually have hope that I
might not have to by this time next year.

------
jakejake
Forgive my ignorance, but could this have any application as a kind of
sandbox? For example adding a plugin capability to an app that allows user-
submitted code to be run safely?

------
bananashake
PHPPHP is well on its way to being a simple PHP command line debugger. To my
knowledge there currently isn't a good tool for command line debugging on PHP.

~~~
JCB_K
phpsh[1] by Facebook is good. Ironically, it's written in Python.

[1] <http://www.phpsh.org/>

~~~
dhotson
phpsh is probably the best, but I also use Boris
<https://github.com/d11wtq/boris> - "A tiny REPL for PHP"

------
languagehacker
Uh, where's all the code? I'm guessing I'm missing something, but a lot of the
classes in the lib/ folder invoke classes that I just can't find in the
project. Are these external dependencies that weren't called out in the
README, or is not all of the code in master actually functional yet?

~~~
nikic
Which classes are you specifically missing?

------
chewxy
I thought this might be a bit similar to PyPy... turns out it's only similar
in idea. But hey it looks fun, and as the author said, why not?

~~~
lotyrin
Here's a project which is _strikingly_ similar to (has common tooling with)
PyPy:

<https://bitbucket.org/fijal/hippyvm> (repo)
<http://morepypy.blogspot.com/2012/07/hello-everyone.html> (article)

------
mepcotterell
I made a PHP VM in PHP once, but it was just the eval function
(<http://php.net/manual/en/function.eval.php>).

------
utilitron
PHP^2

------
buster
The FAQ has it right: For the love of god, why?!

(Sure, it's only a fun learning project, I get that)

~~~
krapp
SCIENCE!

