

 Prototype PHP interpreter using the PyPy toolchain - Hippy VM  - kingkilr
http://morepypy.blogspot.com/2012/07/hello-everyone.html

======
sb
There is also another project named HappyJIT doing much of what is described
there. The corresponding paper was presented at last year's Dynamic Languages
Symposium [<http://dl.acm.org/citation.cfm?id=2047854>]

It would be nice if there were comparable benchmark results available, or a
discussion of what is different between both approaches/implementations.

EDIT: bb link didn't work, replaced it with ACM portal link.

~~~
fijal
Indeed, HappyJIT is not included in the comparison here. I've read the
paper (which I of course cannot access via ACM) and their benchmark results
were not that good.

They were using an old version of PyPy and did not use some of the advanced
features of the JIT generator.

~~~
ahomescu1
As one of the authors of Happy, your post got me very interested. We wrote the
paper over a year ago, we used a version of PyPy from back then. It's possible
that we get a bigger speedup with a newer PyPy.

It's interesting that you mention the advanced features. I looked at Hippy and
the most interesting JIT feature Hippy uses is _virtualizable2_, which
virtualizes all function locals and unboxes them. We tried using it ourselves,
but it forces each function to have a static list of variables and no dynamic
variable accesses (like $$x). It looks like Hippy falls back to the regular
implementation for dynamic variable accesses, where the entire list is stored
in a dictionary. Now I'm wondering how much this happens in real-world code,
we assumed it does happen enough times.

Also, I'm working on posting a publicly-available version of the paper. I'll
post a link when I do that.

~~~
fijal
For dynamic variable access you do what PyPy (the Python interpreter) does for
globals. You indeed fall back to a dict, _but_ you keep a dict of cells, that
is, indexes into the list, rather than a dict of variables. That way you have
an extra indirection, but all accesses that are static stay efficient, even if
you have a dynamic access somewhere (which indeed needs to do a lookup, but too
bad). I fear this is not the best place to discuss further though, you have my
mail.
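
A minimal sketch of the "dict of cells" idea described above, in plain Python.
This is an illustration only, not Hippy's or PyPy's actual code: the `Frame`
class and its method names are hypothetical. Locals live in a flat list, static
accesses use an index resolved ahead of time, and dynamic accesses such as
`$$x` go through a name-to-index dict that points into the same list.

```python
class Frame:
    """Function frame whose locals live in a flat list (cheap static access)."""

    def __init__(self, static_names):
        # Name -> index into self.values, built once per function.
        self.cell_index = {name: i for i, name in enumerate(static_names)}
        self.values = [None] * len(static_names)

    # Static access: the compiler already resolved the index.
    def load_fast(self, index):
        return self.values[index]

    def store_fast(self, index, value):
        self.values[index] = value

    # Dynamic access ($$x): one dict lookup to find the index, then the same list.
    def load_dynamic(self, name):
        return self.values[self.cell_index[name]]

    def store_dynamic(self, name, value):
        index = self.cell_index.get(name)
        if index is None:
            # Name unknown at compile time: grow the list and register a new cell.
            index = len(self.values)
            self.cell_index[name] = index
            self.values.append(None)
        self.values[index] = value


frame = Frame(["x", "y"])
frame.store_fast(0, 10)        # $x = 10, index known statically
frame.store_dynamic("x", 11)   # $$name = 11 where $name == "x"
frame.store_dynamic("z", 5)    # variable created at run time
```

Because both paths read and write the same list, a dynamic store to `x` is
visible to later static loads of index 0, and only the dynamic path pays for
the dict lookup.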

~~~
ahomescu1
Right, we can continue this by e-mail.

~~~
Posibyte
Please know that some of us find the technical aspects of this conversation
very fascinating. It's nice to jump into the mind of another developer and see
how they approach problems.

------
nikic
I love this. It's always nice to see PHP being implemented on new platforms
(we already have it to some degree running on C++, on JVM, on .NET, now on
PyPy). Sadly most of those projects don't really make it to a production-ready
state :/

In any case, keep up the work!

~~~
fijal
It's very different to use the PyPy toolchain than it is to run on JVM or .NET.
It does not reuse any bit of the Python VM, but uses RPython as the
implementation language. You don't share bytecode, you don't share
objectmodel. You share the garbage collector and you _generate_ the JIT for
your language (unlike reusing the existing one).
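
To make "you generate the JIT" a bit more concrete, here is a rough sketch of
the shape such an interpreter takes. The toy bytecode and the `run` function
are made up for illustration. In a real RPython interpreter the `JitDriver`
class comes from RPython's `rlib.jit` module; the class below is only a
plain-Python stand-in with a similar surface, so that the sketch runs without
the toolchain.

```python
class JitDriver(object):
    """Stand-in for RPython's JitDriver so the sketch runs as plain Python."""

    def __init__(self, greens, reds):
        # greens: variables that identify a position in the user's program
        # reds:   all other live variables in the interpreter loop
        self.greens, self.reds = greens, reds

    def jit_merge_point(self, **live_vars):
        pass  # under translation, this hint marks the top of the dispatch loop


driver = JitDriver(greens=["pc", "program"], reds=["acc"])


def run(program, acc=0):
    """Interpret a toy bytecode: a list of (opcode, argument) pairs."""
    pc = 0
    while pc < len(program):
        driver.jit_merge_point(pc=pc, program=program, acc=acc)
        op, arg = program[pc]
        if op == "add":
            acc += arg
        elif op == "jump_if_below":  # arg is (limit, target)
            limit, target = arg
            if acc < limit:
                pc = target
                continue
        pc += 1
    return acc


# Keep adding 3 until the accumulator reaches at least 10.
loop = [("add", 3), ("jump_if_below", (10, 0))]
result = run(loop)
```

The interpreter author writes only the dispatch loop and the hints; the
toolchain derives the JIT from them, which is the sense in which the JIT is
generated rather than reused.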

~~~
SoftwareMaven
This is one of the most fascinating things about PyPy, and one of the least
understood (most likely because the primary output of the PyPy interpreter
generator happens to be a Python interpreter written in RPython, a static
language much like Python).

I've been toying with the idea of building a little language, just for fun.
The mess of building an interpreter has always kept me from it, but as I've
spent some time looking at PyPy, I think I might just have a go.

------
apendleton
Given that in the conventional PHP interpreter, everything is executed from
scratch on each request, I wonder how the JIT behaves here, especially since
most PHP programs live and die in less time than it seems to take the PyPy JIT
to warm up. If/when a version with the web side of the stack is implemented,
will JITed code survive across requests?

~~~
fijal
In short - yes. There is nothing stopping the JIT code from surviving across
requests, as long as you keep the process alive.
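
As a loose analogy only (this is not how Hippy is wired up), the sketch below
shows why a long-lived process matters. Python's built-in `compile()` stands in
for the expensive JIT warm-up, and the script name `double.php` and the helper
names are hypothetical. The compiled form is kept in process memory, so only
the first request pays for it.

```python
compiled_cache = {}  # lives exactly as long as the worker process does
compile_count = 0    # how many times the "warm-up" work actually ran


def get_compiled(script_name, source):
    """Return the compiled form of a script, compiling at most once per process."""
    global compile_count
    code = compiled_cache.get(script_name)
    if code is None:
        compile_count += 1
        code = compile(source, script_name, "eval")  # stand-in for JIT warm-up
        compiled_cache[script_name] = code
    return code


def handle_request(script_name, source, params):
    """Serve one request using whatever compiled code the process already holds."""
    return eval(get_compiled(script_name, source), {}, dict(params))


# Three requests handled by the same process: only the first one compiles.
for n in range(3):
    handle_request("double.php", "n * 2", {"n": n})
```

If the process were restarted for every request, `compiled_cache` would start
empty each time and the warm-up cost would be paid again on every request.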

~~~
ithkuil
php over fastcgi is a good example of how to reuse the same php process. I use
<http://php-fpm.org/> and nginx and it's great.

------
look_lookatme
Is there a good resource for learning how to implement languages using PyPy?

~~~
fijal
Linked in the blog post, but because reading is hard, I'll repost it. WARNING:
those articles are long

[http://morepypy.blogspot.com/2011/04/tutorial-writing-
interp...](http://morepypy.blogspot.com/2011/04/tutorial-writing-interpreter-
with-pypy.html)

[http://morepypy.blogspot.com/2011/04/tutorial-
part-2-adding-...](http://morepypy.blogspot.com/2011/04/tutorial-
part-2-adding-jit.html)

[http://tratt.net/laurie/tech_articles/articles/fast_enough_v...](http://tratt.net/laurie/tech_articles/articles/fast_enough_vms_in_fast_enough_time)

~~~
look_lookatme
Thank you.

------
chime
This is very cool. I never thought the improvements in something like PyPy
could end up benefiting PHP in such a direct way.

~~~
shuzchen
From: <http://doc.pypy.org/en/latest/faq.html#what-is-pypy> pypy is "a
framework for implementing interpreters and virtual machines for programming
languages, especially dynamic languages."

One of the big goals of the project is to create a general tool for people to
implement interpreters and VMs. The intention isn't to just provide a JIT vm
for python, but to allow all sorts of new or existing languages to be
(re)implemented easily. Don't be surprised if you start hearing about other
languages being implemented in RPython.

~~~
scribu
Seeing a speedy Ruby implementation in PyPy would be a tipping point, I think.
:)

~~~
lloeki
I started one as a toy project some months ago (in April, tells me the
filesystem), just for kicks.

I used ply to write the parser (aiming at 1.9.3, which has a lex/yacc parser),
and that was the first time I leveraged a LALR parser, so it's very
experimental, and as such code quality is lacking to say the least. Also,
since ply uses (at least) kwargs, it won't pass through the PyPy build step.
Still, early performance between CPython and PyPy gives a clear edge to PyPy.

To give you an idea of how experimental it is, it's not even in my
~/Workspace (which is usually the step before github), but still in
~/Sandbox/pypy/pyby. I actually had a hard time finding it again.

------
krob
The concept is cool, but I think that once all the tokenization is done on the
full construct of the language, it probably won't be substantially faster than
the HipHop C++ converter. This is only a 1.0 with benchmarks based on limited
functionality. As the code-base grows, so will the pains to optimize the
speed.

~~~
shuzchen
I disagree. Have you seen the benchmarks where they showed Python code in pypy
running faster than C equivalents? Sure they were contrived, but they support
the idea that theoretically any JIT runtime should be faster than ahead-of-
time compilation. It makes sense to me, since the JIT runtime has more
information with which to optimize the code - pypy knows what's going on with
the code while it's executing, whereas the hiphop compiler only has
information about the code itself.

In addition, an implementation in pypy should be able to support eval, which
is impossible (or perhaps extremely difficult) with hiphop.

~~~
gsnedders
They were far from all contrived: many were taken from bottlenecks in real
Python applications. Off-hand, at least the django, genshi, and html5lib
benchmarks are very much real code, and none of the three benchmarks are
optimized for PyPy in any way.

~~~
shuzchen
Yes, running Django in pypy gives you enormous speed boosts (I've seen it
myself), but that's not what I was talking about.

I was referring to the benchmarks in which they measured Python code in pypy
running against pure C code (see some examples below). While I'm not familiar
with the benchmarks you're referring to, I doubt anybody implemented all (or
even parts) of Django, genshi or html5lib in C.

[http://morepypy.blogspot.com/2011/02/pypy-faster-than-c-
on-c...](http://morepypy.blogspot.com/2011/02/pypy-faster-than-c-on-carefully-
crafted.html) [http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-
c-a...](http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-again-
string.html)

~~~
gsnedders
See <http://speed.pypy.org/> for the benchmarks I was referring to: that's the
general collection of benchmarks used for PyPy (several, such as the html5lib
one, imported from unladen-swallow), though obviously in specific cases
comparisons are made otherwise.

------
Nitramp
I wonder about the motives behind Facebook's (apparent) decision to continue
investing in PHP.

If I understand it correctly, most of their backend services are implemented
in some other language, and their PHP code is mostly used as a more flexible
templating language. If that's the case, it shouldn't be all too hard to
migrate away from PHP, if they chose to do so.

It seems like they spend a lot of engineering effort in optimizing PHP, and
probably also a lot of CPU cycles in executing it. I assume there is a tipping
point somewhere, when the investment in PHP stops making sense, even given
effort to port legacy code.

There are a lot of arguably nicer alternatives to PHP, and I bet they are not
benefiting from the one really superior feature of PHP (easy deployment on
shared hosting).

~~~
Trufa
I am honestly asking: is there a simple reason why, let's say, Python can't be
easily deployed on shared hosting? Let's take Heroku, probably the easiest
way. It will take some persistence:
<https://devcenter.heroku.com/articles/quickstart> \+
<https://devcenter.heroku.com/articles/python> and that's just the hello
world. So my question is, why is it like that, and can it be solved? What are
the technical limitations? I know complicated things are complicated, and if
there were some simple way around it, it would already be like that, but then
again, what is the reason?

~~~
gtani
Do you mean like how erlang, haskell and ocaml can be run on bare Xen slices?

[http://corp.galois.com/blog/2010/11/30/galois-releases-
the-h...](http://corp.galois.com/blog/2010/11/30/galois-releases-the-haskell-
lightweight-virtual-machine-halv.html)

<http://halvm.org/wiki/>

<http://anil.recoil.org/papers/2010-hotcloud-lamp.pdf>

<http://erlangonxen.org/>

~~~
gsnedders
No: with PHP you can upload a bunch of files over FTP and you're done. With
Python you pretty much always need to ssh into the server after pushing the
files to the server. For someone who knows nothing, this makes PHP much
easier.

------
rosser
Does anyone have any idea what "problems" one "inherits", per the article, in
building dynamic languages on bytecode VMs like Parrot and the JVM? It seems
to be talking about the fact that the JVM was designed for Java _the language_,
and how that's induced some kind of drag on porting other kinds of languages
to it, but I really don't see how that's relevant to Parrot, which was
designed to be language agnostic (modulo its bias towards supporting the needs
of dynamic languages) from the get-go.

~~~
Nitramp
To be fair, while there is invokedynamic, he might be alluding to other
aspects of the JVM (and referencing Parrot hints at that).

E.g. class loading in the JVM is (afaik) still static: classes once loaded
cannot be modified, which is a problem for languages such as Ruby and JS. The
basic data types of the JVM might not match your language, etc.

Not sure if that makes a big difference, in essence you'll have to implement
those semantics in either case, be it in RPython for PyPy or Java for the JVM.

~~~
bascule
Classes can be changed after loading using HotSwap, it's just that their method
signatures can't change:

[http://docs.oracle.com/javase/1.4.2/docs/guide/jpda/enhancem...](http://docs.oracle.com/javase/1.4.2/docs/guide/jpda/enhancements.html)

------
dbaupp
It'd be pretty neat if there was an R[1] interpreter written in RPython, give
any R scripts a free speed boost; that said, R has a pretty hairy grammar, so
it might end up being easier to just switch to (for example) Julia.

[1]: <http://www.r-project.org/>

~~~
disgruntledphd2
I dunno, the grammar you use in R is pretty weird, but it can all be expressed
in simple function calls. For example "["(x, 2) is equivalent to the more
typical x[2]. I've actually been thinking about re-implementing R, and PyPy
does sound like the best way (I was also thinking of that Clojure
implementation in C, given that R started out as a Scheme interpreter). It
would be a lot of work, but the rewards are massive. Reimplementing R the
language is a lot less work than reimplementing the entire set of libraries.
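
A small sketch of that "everything is a function call" view, written in Python
rather than R. The tuple-based tree format and the `desugar` / `evaluate`
helpers are invented for illustration and are not taken from any R
implementation. Index and operator syntax are rewritten into calls to
functions named "[" and "+", so the evaluator only needs to handle calls,
variables and literals.

```python
import operator

# Functions that surface syntax desugars to; the keys mirror R's function names.
BUILTINS = {
    "[": lambda seq, i: seq[i - 1],  # R indexes from 1
    "+": operator.add,
}


def desugar(node):
    """Rewrite ('index', x, i) and ('binop', op, l, r) nodes into ('call', ...)."""
    if not isinstance(node, tuple):
        return node  # literal
    kind = node[0]
    if kind == "index":
        return ("call", "[", [desugar(node[1]), desugar(node[2])])
    if kind == "binop":
        return ("call", node[1], [desugar(node[2]), desugar(node[3])])
    if kind == "call":
        return ("call", node[1], [desugar(arg) for arg in node[2]])
    return node  # ('var', name)


def evaluate(node, env):
    """Evaluate a desugared tree: only calls, variables and literals remain."""
    if isinstance(node, tuple) and node[0] == "call":
        args = [evaluate(arg, env) for arg in node[2]]
        return BUILTINS[node[1]](*args)
    if isinstance(node, tuple) and node[0] == "var":
        return env[node[1]]
    return node


env = {"x": [10, 20, 30]}
tree = ("binop", "+", ("index", ("var", "x"), 2), 1)  # x[2] + 1
core = desugar(tree)
result = evaluate(core, env)
```

The tree still has to be produced by a parser first; the sketch starts from an
already-parsed form and only covers the step after parsing.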

~~~
dbaupp
The function call transformation is good, but one still has to parse the code
to be able to do this. I remember reading an article a while ago that
described the hoops the author had to go through to get a proper R parser
working, though I can't find it now.

~~~
disgruntledphd2
Please submit it if you ever do, I'd love to read it.

~~~
dbaupp
Found it: <http://news.ycombinator.com/item?id=4254598>

(Direct link: [http://shape-of-code.coding-
guidelines.com/2012/02/29/parsin...](http://shape-of-code.coding-
guidelines.com/2012/02/29/parsing-r-code-freedom-of-expression-is-not-always-
a-good-idea/))

------
jamesmoss
I'd love to see a Kickstarter project for this so that the author can be funded
and complete the implementation.

~~~
fijal
From Kickstarter FAQ:

To be eligible to start a Kickstarter project, you need to satisfy the
requirements of Amazon Payments:

    
    
      —You are 18 years of age or older.
      —You are a permanent US resident with a Social Security Number (or EIN).
      —You have a US address, US bank account, and US state-issued ID (driver’s license).
      —You have a major US credit or debit card.
    

So it's a big no-no for a lot of people who happen not to be residents of the
US, like me.

~~~
dbaupp
I have no idea on the details; but is there a member of the PyPy project (or
of the Hippy VM project) who does satisfy those requirements, and who you can
arrange to proxy the funds through?

~~~
lucian1900
That sort of setup might be considered fraud by Kickstarter, or their
financial services.

~~~
dbaupp
The "arrange" was meant to mean: arrange so that it is completely legal & not
considered fraud (etc). I have literally no idea if this is possible though.

------
dkhenry
It would be really nice to see a bigger comparison, not only between different
versions of PHP but also including Python on PyPy, Ruby, and some Java
alternatives for good measure. If I am sticking with PHP this might be nice,
but if I need to do some refactoring to get it to work, I want to know how it
compares to the speedup I might see from refactoring to a different platform.

------
antihero
Is this different from LLVM? Or does it use LLVM? Could someone explain?

~~~
fijal
It's very different. LLVM is a portable assembler. PyPy is a compiler
generation toolchain. Using LLVM means you don't have to write your assembler
backend, however, there is much more to the Just in Time compiler than that.
LLVM is good at optimizing, but all the optimizations are low level. In order
to provide a dynamic language VM, you need to provide optimizations like
escape analysis, frame removal, etc. which are out of scope for the LLVM
project. On the other hand PyPy gives you that level of optimizations for free
(although for example it supports fewer assembler backends and low-level
optimizations might be sub-par).

In short - LLVM is great if you want to write a language like C, PyPy is great
if you want to create a dynamic language VM like Python or PHP.

------
ricardobeat
Would be interesting if someone implemented CoffeeScript using this.

~~~
cglace
I'm confused. Coffeescript compiles to JavaScript which usually runs in a vm
with a jit.

~~~
ricardobeat
That's why it would be interesting to see it as a language on its own.

------
DanWaterworth
Has anyone tried to write a scheme interpreter?

~~~
fijal
<https://bitbucket.org/pypy/lang-scheme> I think it predates the JIT days
though, so a bit of work would be required to make it fast (not too much)

------
nnq
<troll(?)> PLEASE, for the love of god, stop improving PHP and strictly PHP-
related technologies! I know it's fun and you're doing cool stuff... but
you're just like the scientists doing bio weapons research because it's cool
stuff to them! Let's just bury this language once and for all! </troll(?)>
...but really, considering the amount of effort and man-hours the Facebook
guys invested in improving PHP, with Hiphop and now this, it really saddens me
...so many resources (and I mean HUMAN resources, like hours of people's
lives!) almost wasted!

~~~
tylermenezes
Have you seen any of Facebook's extensions to PHP?

~~~
nnq
I was just thinking of Hiphop and XHP, and they both seem like a huge amount
of work and I've heard about improvements to the interpreter to optimize it
for their servers... of course it might all be FUD for competitors, they might
actually have everything coded in Common Lisp behind the scenes ;)

