
Show HN: A PHP parser written in PHP - nikic
https://github.com/nikic/PHP-Parser
======
aarondf
This is not a snide comment, I'm truly curious. What is the point of "An [x]
parser/interpreter/compiler written in [x]"? I've seen one now for JS and this
one for PHP. I lean more toward the sponge learning [1] side of HN, so forgive
me if this is super obvious.

[1] <http://alexrosen.com/blog/2011/05/sponge-learning/>

edit: grammar.

~~~
EmielMols
For instance, to allow meta-programming (programming code executed at compile
time) in the same language as is being interpreted/compiled.

But more formally, a self-hosting compiler says stuff about the language per
se, see also <http://en.wikipedia.org/wiki/Bootstrapping_(compilers)>.

~~~
Zak
_a self-hosting compiler says stuff about the language per se_

That it's Turing-complete and can write to a file? I think all it really says
is that somebody was hard-headed enough to do it.

------
bradt
This would be a great way to create a PHP-based templating system that just
uses a subset of PHP as syntax. Would be very fast.

~~~
bithive123
This must be a stupid question because I've noticed that it's commonplace, but
why do PHP-based projects try to build templating systems in PHP? Do they not
realize that the original use case for PHP was to turn plain HTML files into
dynamic templates?

I saw this in Horde the other day; mixed in with the usual PHP tags they had
added their own XML tags for doing "if/else" type things in the template. What
is the purpose of that?

~~~
geon
I believe it's an attempt to ensure there is no logic in the template code.

...which I think is retarded and anal, since the important part is separation
of _business_ logic and presentation. There is bound to be some level of
display logic, which is why most template languages have loops and
conditionals. But if you are going that far, why not just use PHP? It is
arguably a very good template language.

------
amosrobinson
Cool... Now define your Node datatype in Haskell deriving Show & Read, and
pretty-print to that. Then you can (more easily) do some interesting analysis
and transforms!

------
jasonlotito
How does this compare to PHP_CodeSniffer and its Tokenizer? You're both
using token_get_all under the hood, but PHPCS has support for CSS/JS as well.

~~~
nikic
PHP_CodeSniffer - as you already say - works with the source code at a token
level. This is necessary, because it looks at the precise formatting of the
code (like whitespace usage).

The parser is more for analyzers that are _not_ interested in the precise
formatting of the code, but want to look at the code from a higher level
perspective.

For example, if you want to do control flow analysis and type inference,
working directly on the tokens would be really hard. An Abstract Syntax Tree
makes this kind of work much easier, as you don't have to think about the tiny
details of the language.
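
To make the token-versus-tree difference concrete, here is a minimal sketch. It is only an analogy: it uses Python's standard `tokenize` and `ast` modules rather than PHP-Parser's API, and the helper names (`token_view`, `tree_view`, `assigned_names`) are made up for illustration. The flat token list keeps every symbol in order, while the tree makes a question like "which variables get assigned?" a simple walk.

```python
import ast
import io
import tokenize

SOURCE = "total = price * (1 + tax)\n"

def token_view(source):
    """Return (token type name, text) pairs: the flat view a tokenizer gives."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[t.type], t.string) for t in tokens
            if t.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)]

def tree_view(source):
    """Return a textual dump of the AST: the nested view a parser gives."""
    return ast.dump(ast.parse(source))

def assigned_names(source):
    """Collect names that are assignment targets by walking the AST."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.append(target.id)
    return names

if __name__ == "__main__":
    print(token_view(SOURCE))      # flat list, parentheses included
    print(tree_view(SOURCE))       # nested Assign / BinOp structure
    print(assigned_names(SOURCE))
```

Doing the same "assigned names" query on the token list would mean re-deriving precedence and grouping by hand, which is the work a parser already did.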

------
mgkimsal
One of the interesting things about Groovy for me has been the AST
transformation stuff - annotating something as @Singleton, then having the
compiler make it into a singleton at compile time, etc.

Certainly this project doesn't get us there immediately, but might give some
neat ideas for future PHP versions to incorporate.
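
As a rough illustration of what "transforming the AST before the code runs" means, here is a small sketch. It is neither Groovy nor PHP: it assumes Python's standard `ast` module as a stand-in, and `ConstantFolder`, `fold_and_eval`, and `count_binops` are hypothetical names. The transformer rewrites the tree (folding `2 * 3` into `6`) and only then is the result compiled and evaluated.

```python
import ast
import operator

# Operators this sketch knows how to fold; anything else is left untouched.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class ConstantFolder(ast.NodeTransformer):
    """Replace `<number> op <number>` subtrees with the computed constant."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first (bottom-up)
        fold = _FOLDABLE.get(type(node.op))
        if (fold is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, (int, float))
                and isinstance(node.right.value, (int, float))):
            folded = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node

def count_binops(tree):
    """Count binary-operation nodes, to observe the effect of the rewrite."""
    return sum(isinstance(n, ast.BinOp) for n in ast.walk(tree))

def fold_and_eval(expression, variables):
    """Parse, transform, compile, and evaluate an expression string."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return eval(compile(tree, "<folded>", "eval"), {}, dict(variables))

if __name__ == "__main__":
    print(fold_and_eval("2 * 3 + x", {"x": 4}))
```

An annotation-driven transform like Groovy's works on the same principle, just keyed off a marker in the source instead of a pattern in the expression.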

------
nazar
Can it compile itself then?

~~~
cfdrake
It's only a parser - it just transforms plain source code into an abstract
syntax tree representation. However, if you wanted to, you could use this tree
for a variety of things - including translating and generating compiled code.
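
A sketch of that last idea, under stated assumptions: this uses Python's standard `ast` module in place of PHP-Parser, and the tiny stack machine (`PUSH`, `ADD`, `SUB`, `MUL`) plus the helpers `emit`, `compile_expr`, and `run` are invented for illustration. It walks the tree in post-order to generate instructions, then executes them.

```python
import ast

# Mapping from AST operator node types to hypothetical stack-machine opcodes.
_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def emit(node, out):
    """Append instructions for an expression subtree (operands first, then op)."""
    if isinstance(node, ast.Expression):
        emit(node.body, out)
    elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        out.append(("PUSH", node.value))
    elif isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        emit(node.left, out)
        emit(node.right, out)
        out.append((_OPS[type(node.op)], None))
    else:
        raise ValueError(f"unsupported syntax: {type(node).__name__}")
    return out

def compile_expr(source):
    """Parse an arithmetic expression and generate stack-machine code for it."""
    return emit(ast.parse(source, mode="eval"), [])

def run(program):
    """Execute the generated instructions on a simple operand stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
            continue
        right, left = stack.pop(), stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
    return stack.pop()

if __name__ == "__main__":
    program = compile_expr("1 + 2 * 3")
    print(program)
    print(run(program))
```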

------
dawsdesign
<?php eval($codeBase); ?>

------
wopsky
Yo dog...

------
drewdrewdrew
PHP inception _brain explode_

------
icheishvili
It seems like you could have saved yourself quite a bit of parsing/lexing work
if you had used the parser that ships with PHP:

<http://us3.php.net/manual/en/function.token-get-all.php>
<http://us3.php.net/manual/en/function.token-name.php>

Very cool nonetheless.

~~~
mfonda
They are very different things. token_get_all just tokenizes the code, but
this tool parses PHP code into an AST. If you look at the source of this
project, you'll notice that it does indeed use token_get_all to handle the
lexing.
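
The two stages can be sketched in a few lines. This is a toy under stated assumptions, not PHP-Parser's code: it is written in Python, handles only integers, `+`, `*`, and parentheses, and the names `lex` and `parse` and the tuple-shaped tree are made up for illustration. `lex` plays the role that token_get_all plays in the project; `parse` is the extra step that turns the flat tokens into a tree.

```python
import re

# A number is a run of digits; any other non-space character is an operator.
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<OP>\S)")

def lex(source):
    """Stage 1: turn text into a flat list of (kind, text) tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]

def parse(tokens):
    """Stage 2: build a nested tree from the tokens (recursive descent)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def primary():
        kind, text = advance()
        if kind == "NUMBER":
            return ("num", int(text))
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():  # '*' binds tighter than '+'
        node = primary()
        while peek()[1] == "*":
            advance()
            node = ("mul", node, primary())
        return node

    def expr():
        node = term()
        while peek()[1] == "+":
            advance()
            node = ("add", node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

if __name__ == "__main__":
    print(lex("1 + 2 * 3"))
    print(parse(lex("1 + 2 * 3")))
```

Note that operator precedence and grouping appear only in the second stage; the token list alone says nothing about which operands belong to which operator.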

~~~
icheishvili
I just went and read Lexer.php to see what you mean. Never mind on my previous
comment :)

Well done with the project--I have a use case for it regarding enforcing a PHP
style guide @ $work.

