
Parsing PHP in Go - codezero
https://stephensearles.com/?p=288
======
pbiggar
I worked on a PHP compiler: [http://phpcompiler.org](http://phpcompiler.org).
Although it only supports PHP 5.2, it has a really nice parser, which had a
lot of work put into it (a ton of edge cases in particular).

It's not in Go, but it does a lot of the interesting things you asked about:
static analysis, dead code elimination, transpiling. It also compiles to
(pretty poor) C code.

For static analysis, there's a lot to do. Here's my PhD on the topic:
[http://paulbiggar.com/research/#phd-dissertation](http://paulbiggar.com/research/#phd-dissertation)

I worked on this for about 4 years, and if my experience is indicative of
working on PHP compilers in general, you have a lot of fun and a massive
amount of frustration in front of you.

~~~
girvo
Since scrutinizer-ci took their interesting "PHP-analyser" private I've been
looking for a better static analysis tool for PHP that I can contribute to.
HPHPc is alright in HHVM but learning OCaml is slowing me down, so I'm
definitely going to take a look at your compiler! Well done :)

------
8ig8
And here's a PHP parser written in PHP...

[https://github.com/nikic/PHP-Parser](https://github.com/nikic/PHP-Parser)

~~~
TazeTSchnitzel
Unlike the OP, Nikita's PHP parser is actually used for a lot of things. He
wrote a script to detect code broken by syntax changes in the next version of
PHP, for example.

Anthony Ferrara used it to implement PHP in PHP:
[https://github.com/ircmaxell/PHPPHP](https://github.com/ircmaxell/PHPPHP)

~~~
girvo
nikic and Anthony are some of my favourite PHP developers. The stuff that
people have been doing in PHP, what with HHVM and Hack and the PSR standards
and Composer/Packagist, etc etc. is just amazing!

------
xmattus
Now we just need a Go parser written in PHP, for obvious reasons.

~~~
aikah
What about a Go compiler written in Go? Is it bootstrapped already?

~~~
kyrra
It's being worked on for Go 1.4 (scheduled for release in December). It's the
primary feature they want to get done for that release. There was a talk about
it back in May [0]. rsc is working on c2go to convert the existing compiler to
Go [1][2].

[0] [http://www.confreaks.com/videos/3432-gophercon2014-go-from-c...](http://www.confreaks.com/videos/3432-gophercon2014-go-from-c-to-go)

[1] [https://code.google.com/p/rsc/source/list](https://code.google.com/p/rsc/source/list)

[2] [https://code.google.com/p/rsc/source/browse/#hg%2Fc2go](https://code.google.com/p/rsc/source/browse/#hg%2Fc2go)

------
jamescun
For those interested:

Ruby in Go:
[https://github.com/grubby/grubby](https://github.com/grubby/grubby)

Javascript in Go:
[https://github.com/robertkrimen/otto](https://github.com/robertkrimen/otto)

~~~
tjarratt
Woah, I'm pretty stoked to see someone link to my Ruby implementation (née
Grubby).

It seems like the authors of Golang believe that a lot of problems with
languages (refactoring, updating code to work with new libraries / versions,
etc) can be solved as parsing problems. Hence, Golang has a lot of good tools
for parsing text.

They even ported yacc to Go (via Plan9).
[http://golang.org/cmd/yacc/](http://golang.org/cmd/yacc/)
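Separately from the yacc port, Go's standard library ships a parser for Go source itself (`go/parser`, together with `go/token` and `go/ast`). As a minimal sketch of that tooling, the program below parses a snippet of Go and lists its top-level function and method names; the helper `funcNames` and the sample source are illustrative, not taken from any project mentioned here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcNames parses Go source text and returns the names of its
// top-level function and method declarations, in source order.
func funcNames(src string) ([]string, error) {
	fset := token.NewFileSet() // records position information
	// The source is passed directly as a string, so the filename is
	// only used in positions and error messages.
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		// Imports, types, consts and vars are *ast.GenDecl;
		// functions and methods are *ast.FuncDecl.
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := "package demo\n\nfunc Add(a, b int) int { return a + b }\n\nfunc Double(x int) int { return Add(x, x) }\n"
	names, err := funcNames(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names) // [Add Double]
}
```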

~~~
pjmlp
Such as? Do you mean the Go standard libraries for Go code?

Yacc clones exist for almost every language out there, and there are better
ways to do parsing than yacc with its stone-age error reporting.

~~~
tjarratt
I'd be really delighted if you could show me a better tool than goyacc for
writing a parser in Golang, given a grammar. You're absolutely right that the
error reporting in yacc isn't that modern, but it's very functional, very
powerful and (best of all) a lot of people have experience with it.

I certainly couldn't find any better tools in Golang when I started, but I
wouldn't be surprised if someone had started one since.

~~~
pjmlp
Tools like ANTLR, for example.

Parser generators based on attribute grammars are another example.

The language is called Go.

------
kackontent
How performant are this and other similar projects (pfff, PHP-Parser)? Are any
of them a viable base for improved PHP support in text editors (say vim, st,
atom)?

------
padator
Parsing PHP in OCaml:
[https://github.com/facebook/pfff](https://github.com/facebook/pfff)

try ./pfff -dump_php /path/to/file.php

~~~
VeejayRampay
Parsing, lexing and writing compilers in general are such a breeze in OCaml. I
remember that in college one of our year projects was to write a small
compiler for a subset of PostScript using ocamllex and ocamlyacc, and I
couldn't believe how nice and natural it felt. What a great language.

------
fabriceleal
Personally I would be a lot more interested in a good platform for writing
static analysis tools. I believe the community in general would take a more
immediate benefit (and what a benefit!...) from this than from a lone PHP to
Go transpiler.

------
ck2
PHP has to be one of the most transpiled languages.

C++ (well Hack->C++), .NET, Java, Python and now Go (probably others).

All in various states of incompleteness, though HHVM isn't going away anytime
soon.

~~~
dubcanada
JavaScript has its fair share of transpiledness.

~~~
benaiah
Usually JS is the compilation target, not the language being compiled, though.

------
jamesmoss
Are there any numbers on performance vs. PHP itself?

~~~
RossM
It's not an interpreter - it parses PHP code into an abstract syntax tree
(list of entities like open-if, variable, assignment, etc.).
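To make the "entities" idea concrete, here is a minimal sketch of how AST nodes for a tiny PHP subset might be modelled in Go: one interface plus one struct per node kind. The node types (`Variable`, `Number`, `Assign`, `If`) are hypothetical illustrations, not the OP's actual data structures.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is any element of the (hypothetical) PHP syntax tree.
type Node interface {
	String() string
}

// Variable is a PHP variable reference such as $a.
type Variable struct{ Name string }

// Number is an integer literal.
type Number struct{ Value int }

// Assign is an assignment statement: Target = Value;
type Assign struct {
	Target *Variable
	Value  Node
}

// If is an if statement with a condition and a body of statements.
type If struct {
	Cond Node
	Body []Node
}

func (v *Variable) String() string { return "$" + v.Name }

func (n *Number) String() string { return fmt.Sprint(n.Value) }

func (a *Assign) String() string {
	return a.Target.String() + " = " + a.Value.String() + ";"
}

// String prints the statement back as PHP-like source on one line.
func (i *If) String() string {
	parts := make([]string, len(i.Body))
	for k, stmt := range i.Body {
		parts[k] = stmt.String()
	}
	return "if (" + i.Cond.String() + ") { " + strings.Join(parts, " ") + " }"
}

func main() {
	// The tree a parser might build for: if ($a) { $b = 1; }
	tree := &If{
		Cond: &Variable{Name: "a"},
		Body: []Node{&Assign{Target: &Variable{Name: "b"}, Value: &Number{Value: 1}}},
	}
	fmt.Println(tree) // if ($a) { $b = 1; }
}
```

A real PHP AST needs many more node kinds (expressions, functions, classes), but the interface-plus-structs pattern scales to them in the same way.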

~~~
themartorana
But combined with a transpiler, couldn't it become part of a PHP-to-IL compilation via Go...?

Not that it would be faster or better than say, HHVM or any other of a number
of compilers for PHP [1] but my knowledge of that space is quite limited.

[1] [http://stackoverflow.com/questions/1408417/can-you-compile-p...](http://stackoverflow.com/questions/1408417/can-you-compile-php-code)

~~~
jerf
This could be a step in that process, but in the grand scheme of things
required to compile one language to another, the mere front-end parser is not
generally all that significant a portion of the effort. The vast majority of
the effort would be the bug-for-bug compatible implementation of PHP semantics
and base libraries and functionality.

("Bug-for-bug" here does not mean that PHP has a lot of bugs _per se_. It means
the highest level of compatibility. An emulator of a game console
strives to be "bug-for-bug" compatible, for instance. Programming and
programming languages being what they are, anything less often turns out to be
surprisingly non-linearly less useful, i.e., "80% compatible" isn't anywhere
near "80% useful".)

------
andrewchambers
Go is pretty nice for parser/compiler applications because it is fast and the
runtime doesn't take long to start up.

~~~
pjmlp
ML languages are way better, especially given sum types and pattern matching.

~~~
andrewchambers
I agree for tree manipulation - I don't necessarily agree for writing
recursive descent parsers.

But I admit, I read the source code of the OCaml and Haskell compilers, and it
was pretty nice.
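To illustrate the recursive-descent point, here is a minimal sketch of a hand-written recursive descent parser in Go for integer arithmetic. It is not taken from any project in this thread; the grammar, the `parser` type and `Evaluate` are illustrative assumptions. Each grammar rule is one method, precedence falls out of which rule calls which, and parentheses recurse back into `expr`. To stay short it evaluates while parsing rather than building a tree.

```go
package main

import (
	"fmt"
	"strconv"
)

// parser evaluates integer arithmetic while parsing it (recursive descent).
type parser struct {
	src string
	pos int
	err error // first error seen; results are meaningless once set
}

// peek skips spaces and returns the next byte, or 0 at end of input.
func (p *parser) peek() byte {
	for p.pos < len(p.src) && p.src[p.pos] == ' ' {
		p.pos++
	}
	if p.pos >= len(p.src) {
		return 0
	}
	return p.src[p.pos]
}

// fail records an error, keeping only the first one.
func (p *parser) fail(msg string) {
	if p.err == nil {
		p.err = fmt.Errorf("%s at offset %d", msg, p.pos)
	}
}

// expr = term { ("+" | "-") term }
func (p *parser) expr() int {
	v := p.term()
	for c := p.peek(); p.err == nil && (c == '+' || c == '-'); c = p.peek() {
		p.pos++ // consume the operator
		if r := p.term(); c == '+' {
			v += r
		} else {
			v -= r
		}
	}
	return v
}

// term = factor { ("*" | "/") factor }
func (p *parser) term() int {
	v := p.factor()
	for c := p.peek(); p.err == nil && (c == '*' || c == '/'); c = p.peek() {
		p.pos++ // consume the operator
		switch r := p.factor(); {
		case c == '*':
			v *= r
		case r == 0:
			p.fail("division by zero") // no-op if factor already failed
		default:
			v /= r
		}
	}
	return v
}

// factor = number | "(" expr ")"
func (p *parser) factor() int {
	if p.peek() == '(' {
		p.pos++
		v := p.expr() // the recursive step: back to the top rule
		if p.peek() == ')' {
			p.pos++
		} else {
			p.fail("expected ')'")
		}
		return v
	}
	start := p.pos
	for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
		p.pos++
	}
	n, err := strconv.Atoi(p.src[start:p.pos])
	if err != nil {
		p.fail("expected a number")
	}
	return n
}

// Evaluate parses and evaluates one complete expression.
func Evaluate(src string) (int, error) {
	p := &parser{src: src}
	v := p.expr()
	if p.err == nil && p.peek() != 0 {
		p.fail("unexpected trailing input")
	}
	return v, p.err
}

func main() {
	v, err := Evaluate("2 * (3 + 4) - 5")
	fmt.Println(v, err) // 9 <nil>
}
```

Because each loop consumes operators from left to right, `10 - 4 - 3` is evaluated as `(10 - 4) - 3`, which is the usual left associativity.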

------
ahmett
Please don't make a PHP transpiler. :(

~~~
debaserab2
Why not?

This could be a crucial tool for companies porting legacy PHP code to a new
language.

~~~
UnoriginalGuy
I agree with you (that it could be an important tool).

However I will say that the VB6->VB.Net transpiler which Microsoft produced
(and clearly spent significant amounts of effort on) was pretty terrible. And
that is one of the most "complete" transpilers I know of...

The problem is that for a transpiler to produce "good" output code it needs a
deep understanding of both context and intent. This is particularly important
when converting from one language to another with slightly different
underlying concepts (like VB5-6 vs. VB.Net). Without that understanding it
just produces spaghetti code that will technically compile (*although often it
didn't in the VB6->VB.net example) but is unmaintainable.

I liken it to Microsoft Word's HTML engine. Word can produce websites, and
those websites technically looked correct in most browsers, but they became an
unmaintainable mess in the medium to long term. A lot of transpilers have the
same issue.

The best thing I can say about transpilers is that they're very good as a
starting point (assume 100% refactoring anyway) and for converting simplistic
data storage vehicles (e.g. classes with tons of constants).

~~~
TazeTSchnitzel
There's also the problem of dealing with the standard library. Not everything
has a directly equivalent function. So you'll make your app dependent on an
obscure library based on another language's standard library.

~~~
jonathanyc
More advanced transformers even handle direct transformation of library calls
to "native" library calls in the target language. I think it's mostly tools
that try to take advantage of already-similar syntax (Processing.js
transforming Java to JavaScript, for example) that decide it is easier to do a
relatively simple syntax transformation and then implement some sort of
wrapper for function calls (as you describe) than to do a potentially deeper
and more complicated transformation.
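The library-call mapping described above can be sketched as a lookup table that a transpiler consults when it emits a call: mapped PHP functions become Go standard-library calls (sometimes with reordered arguments), and everything else falls back to a runtime shim. The table entries, `translateCall` and the `phprt` shim package are all hypothetical, and the Go targets only approximate PHP's semantics.

```go
package main

import (
	"fmt"
	"strings"
)

// rewrite builds a Go call expression from the Go source text of
// already-translated arguments.
type rewrite func(args []string) string

// nativeCalls maps a few PHP functions to Go standard-library calls.
// Hypothetical and incomplete; entries assume the expected argument count.
var nativeCalls = map[string]rewrite{
	// strtoupper($s) -> strings.ToUpper(s)
	"strtoupper": func(a []string) string { return "strings.ToUpper(" + a[0] + ")" },
	// strlen($s) -> len(s); both count bytes, not characters
	"strlen": func(a []string) string { return "len(" + a[0] + ")" },
	// str_replace($search, $replace, $subject) with string arguments only
	// (PHP also accepts arrays); Go takes the subject first.
	"str_replace": func(a []string) string {
		return "strings.ReplaceAll(" + a[2] + ", " + a[0] + ", " + a[1] + ")"
	},
	// implode($glue, $pieces) -> strings.Join(pieces, glue)
	"implode": func(a []string) string { return "strings.Join(" + a[1] + ", " + a[0] + ")" },
}

// translateCall emits Go source for a PHP function call. Anything without a
// native mapping goes through the hypothetical runtime shim package phprt.
func translateCall(name string, args []string) string {
	if rw, ok := nativeCalls[name]; ok {
		return rw(args)
	}
	parts := append([]string{fmt.Sprintf("%q", name)}, args...)
	return "phprt.Call(" + strings.Join(parts, ", ") + ")"
}

func main() {
	fmt.Println(translateCall("str_replace", []string{`"a"`, `"b"`, "s"}))
	// strings.ReplaceAll(s, "a", "b")
	fmt.Println(translateCall("array_map", []string{"f", "xs"}))
	// phprt.Call("array_map", f, xs)
}
```

Even this toy shows that a mapping is more than a rename: argument order differs, and every unmapped function still needs a PHP-compatible runtime behind it.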

