How a Perl 5 Program Works

gdp · on Aug 18, 2009

From the "questions" section at the bottom of the article:

> Wait, so Perl 5 doesn't interpret every statement as it parses it?

I'm sorry, was anyone seriously suggesting this? Don't most production-grade interpreters still parse the program before executing it?

The article seems to be premised on a very strange definition of compilation. It's suggesting that building a parse tree and then executing it is compilation. Maybe I missed something, but compilation as an activity generally has a translation flavour about it. This really doesn't. There is no "target" language. There is a data structure that stores a representation of the source code, and then some program transformations (within the same source language semantics). It then executes the original program using the representation stored in this data structure. Sure, it's not like it's executing as it is parsing, but there is no "code generation" here. If you dumped out the data structure storing what is essentially a parse tree, you would basically get the original source program (+/- some syntax).

We're still in the same source language. No "compilation" has happened. Am I missing something? Is there a reason the author is claiming that there are distinct "compile" and "execute" phases when he appears to be describing a fairly run-of-the-mill interpreter implementation?

phaylon · on Aug 18, 2009

How is the transformation into bytecode before execution not a translation? Yes, the bytecode format is perl-specific, but you can't easily get at the real original source code again.

gdp · on Aug 18, 2009

It's not bytecode! There is no bytecode involved at all. It's just a representation of the source program text.

For example (in some pseudo-code):

a = x + 10

Perl 5 builds a tree from this (e.g. Assign(a, Add(x, 10))) and then executes it. Even the traversal of the tree proceeds according to the results of executing the program up to that point.
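The pseudo-code above can be made concrete with a minimal Python sketch of what gdp describes. The tree Assign(a, Add(x, 10)) is built once and then executed by walking it directly, with no translation step in between. The node classes, the `evaluate` function and the `env` dictionary are hypothetical illustrations. They are not Perl internals.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Assign:
    target: str
    expr: "Expr"


Expr = Union[Num, Var, Add]


def evaluate(node, env):
    """Walk the tree directly: the tree means only what this function does with it."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Add):
        return evaluate(node.left, env) + evaluate(node.right, env)
    if isinstance(node, Assign):
        env[node.target] = evaluate(node.expr, env)
        return env[node.target]
    raise TypeError(f"unknown node: {node!r}")


# a = x + 10, parsed into a tree and then executed by walking it
tree = Assign("a", Add(Var("x"), Num(10)))
env = {"x": 32}
evaluate(tree, env)
print(env["a"])  # 42
```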

If we were to take the tree I generated above, we could get the exact same program out of it. I agree that this would be difficult with Perl - not because the parse tree is substantially different from the program text, but rather because it is ambiguous.

We could also target some arbitrary language from that tree. That isn't possible in Perl: the tree has absolutely no meaning other than that given to it through execution. It's not even an abstract syntax tree - it's just a parse tree! The semantics of the tree are exactly the same as the Perl code from which the tree was derived. And I don't mean "semantically equivalent", I mean, "identical by definition", because the source only has meaning because the tree is executed.

In a compiler setting, we can take an arbitrary (valid) program and give its meaning in terms of the target language. An abstract syntax tree has its own semantics, into which the original program text (or parse tree) is translated. The AST is given a semantics by a code generator, which translates constructs in the AST into constructs in the target language (which has yet another set of semantics).
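The compiler setting can be contrasted with the tree-walker in a minimal Python sketch. The same tree, written here as nested tuples, is first translated by a small code generator into instructions for a hypothetical stack machine. That output is a second language with its own semantics, and only the output is executed. The instruction names (PUSH, LOAD, ADD, STORE) and both functions are invented for illustration. This is not a description of what perl does.

```python
def compile_expr(node):
    """Code generator: translate a tree node into a list of stack-machine instructions."""
    kind = node[0]
    if kind == "num":
        return [("PUSH", node[1])]
    if kind == "var":
        return [("LOAD", node[1])]
    if kind == "add":
        return compile_expr(node[1]) + compile_expr(node[2]) + [("ADD", None)]
    if kind == "assign":
        return compile_expr(node[2]) + [("STORE", node[1])]
    raise ValueError(f"unknown node kind: {kind}")


def run(code, env):
    """Execute the target program; the original tree is no longer consulted."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return env


# a = x + 10 as a tree, translated first and run afterwards
tree = ("assign", "a", ("add", ("var", "x"), ("num", 10)))
code = compile_expr(tree)
print(code)  # [('LOAD', 'x'), ('PUSH', 10), ('ADD', None), ('STORE', 'a')]
env = run(code, {"x": 32})
print(env["a"])  # 42
```

The point of the sketch is the intermediate artifact. `code` exists independently of the tree, and it could be saved or handed to a different back end. This is the kind of target language gdp says Perl's tree lacks.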

phaylon · on Aug 18, 2009

I'm sorry if I was unclear, but I still don't see any problem with this being called a compilation of one format into another.

pbiggar · on Aug 18, 2009

Words mean what they mean. You could call any interpreter a compiler if you twisted it, and you'd sort of be right, but you'd be a lot more wrong. Same in this instance. It's an interpreter! And yes, it sort of does a job close enough to compilation that you wouldn't _really_ be incorrect calling it a compiler, but it's not a compiler dammit!!

phaylon · on Aug 19, 2009

I agree that perl is not a compiler, but an interpreter. I disagree however if you say it doesn't perform compilation of the source.

scott_s · on Aug 18, 2009

gdp's objection to calling this compilation, which I share, is that there is only one form.

phaylon · on Aug 19, 2009

No, there isn't. There is the source, and there is the prepared and executed optree. In Perl, building the optree is called the compile-time phase, which makes sense to me. I can't see how the source file and the parsed optree can be the same form. They do the same things, yes, but most source code and compiled forms have that in common.

scott_s · on Aug 19, 2009

As I understand it, the "executed optree" is just the parse tree. The building of that tree is the lexing and parsing phase, not compilation. The parse tree is never translated into another form - neither an abstract syntax tree, nor another language with the same semantics.

http://en.wikipedia.org/wiki/Parse_tree

http://en.wikipedia.org/wiki/Abstract_syntax_tree

If the parse tree is never translated to another form, then I think it's fair to say Perl is interpreted, not compiled.

calcnerd256 · on Aug 18, 2009

Someone should rewrite Perl in Lisp.

draegtun · on Aug 18, 2009

Well Perl Mongers have already re-written Lisp in Perl so I think Lispers could return the favour ;-)

ref: http://search.cpan.org/dist/perl-lisp/

gdp · on Aug 18, 2009

s/in Lisp//

draegtun · on Aug 18, 2009

It's called Perl6 / Rakudo & Parrot

chromatic · on Aug 18, 2009

Did you read the LTU link? Someone seriously suggested that Perl 5 obviously works by reparsing code at run time.

I admit that there's a continuum between "pure compilation" and "pure interpretation", but when you have people claiming that Java is a compiled language, the lines are awfully blurry.

I take "interpreted" in its purest form as "parses and executes a line at a time", which is why I mentioned REPLs in the article.

gdp · on Aug 18, 2009

Oh, that would be a very strange interpretation of interpretation. I'm not sure what you mean by "reparsing code at runtime". Given that there isn't really a "compile time" in a perl-like setting, when are you proposing the parsing happens?

chromatic · on Aug 18, 2009

Read the link. The poster suggested that Perl 5 subroutines take as an invisible argument the remaining source code (presumably from the end of the declaration of that sub to the end of the file) of the program so that they could reparse the remaining code when the subs evaluate to a known value.

In Perl 5 parlance, "compile time" is when parsing happens. "Runtime" is when execution happens.
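The two-phase split can be modelled in a toy Python sketch, assuming a program made only of print-like statements. The whole program is processed in a compile phase first. BEGIN blocks run the moment they are parsed, and everything else is deferred to runtime. As a result, BEGIN output appears before the output of statements written earlier in the file. The function names and the statement format are hypothetical. The real Perl program being modelled is `print "runtime\n"; BEGIN { print "compile time\n" }`, which prints "compile time" first.

```python
def compile_program(statements, log):
    """Compile phase: collect runtime ops, running BEGIN blocks immediately."""
    ops = []
    for kind, message in statements:
        if kind == "BEGIN":
            log.append(message)  # executed as soon as it has been parsed
        else:
            ops.append(message)  # deferred until runtime
    return ops


def run_program(ops, log):
    """Runtime phase: execute the ops collected at compile time, in order."""
    for message in ops:
        log.append(message)


# Models: print "runtime\n"; BEGIN { print "compile time\n" }
program = [("stmt", "runtime"), ("BEGIN", "compile time")]
log = []
ops = compile_program(program, log)
run_program(ops, log)
print(log)  # ['compile time', 'runtime']
```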

gdp · on Aug 19, 2009

Oh, I just read the comment you're referring to, and I think you've completely misinterpreted the commenter. The phrase "program text" is used in a fairly precise way to mean "the tokens unconsumed up to this point in the execution", which appears to be what you asserted in your reply. Of course, given that the parse tree is just a structured representation of the program text, with no pre-determined traversal order, is it not true that the parser needs the context resulting from having evaluated part of the tree in order to decide how to proceed in evaluating the rest of it?

I think you've taken a technically accurate comment and then answered a completely different point.

chromatic · on Aug 19, 2009

The claim is that you must execute a Perl 5 subroutine to discover its arity. That's nonsense.

In the general case, Perl 5 subroutines have well-defined arities: they consume as many arguments as possible, either delimited by parentheses or the statically-unambiguous end of an expression.

In the special case of prototyped subroutines, the parser knows their arities as soon as it encounters their declarations.

None of this has any connection to the claim that "it's necessary to execute the subroutine to discover just how many [tokens] it will consume." If that were true, perl -c would not work.
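chromatic argues that the parser can decide how many terms a call consumes from declarations it has already seen, without running the sub. That rule can be illustrated with a toy Python model. This is not perl's actual parser: the token format, the `declared` dictionary and both function names are hypothetical. An unprototyped name swallows every comma-separated term to the end of the expression. A name declared with a fixed arity stops after that many terms, which is how Perl treats a sub with a `($)` prototype, parsing it as a named unary operator. Nested calls are deliberately left out.

```python
def parse_call(tokens, pos, declared):
    """Parse `name args...` at tokens[pos]; return (call_node, next_pos).

    declared maps each known sub name to a fixed arity (e.g. 1, modelling a Perl
    ($) prototype) or to None for an unprototyped sub, which behaves like a list
    operator and takes every comma-separated term up to the end of the expression.
    """
    name = tokens[pos]
    pos += 1
    args = []
    if tokens[pos] == "(":
        # Explicit parentheses delimit the argument list.
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] != ",":
                args.append(tokens[pos])
            pos += 1
        return (name, args), pos + 1
    arity = declared[name]
    while tokens[pos] not in (";", ")"):
        if arity is not None and len(args) == arity:
            break  # arity reached: remaining terms belong to the enclosing list
        if tokens[pos] != ",":
            args.append(tokens[pos])
        pos += 1
    return (name, args), pos


def parse_statement(tokens, declared):
    """Parse a flat, comma-separated statement ending in ';' (no nested calls)."""
    items, pos = [], 0
    while tokens[pos] != ";":
        tok = tokens[pos]
        if tok == ",":
            pos += 1
        elif tok in declared:
            call, pos = parse_call(tokens, pos, declared)
            items.append(call)
        else:
            items.append(tok)
            pos += 1
    return items


tokens = "foo 1 , 2 , 3 ;".split()
print(parse_statement(tokens, {"foo": None}))  # [('foo', ['1', '2', '3'])]
print(parse_statement(tokens, {"foo": 1}))     # [('foo', ['1']), '2', '3']
```

In both cases the grouping comes from `declared` alone, before anything runs. The one requirement is that the declaration is seen before the call. Perl imposes the same ordering: a prototype only affects calls compiled after it is visible.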

draegtun · on Aug 18, 2009

See also HN discussion on original PerlMonks post that started all this: http://news.ycombinator.com/item?id=761103

draegtun · on Aug 18, 2009

Also see Stackoverflow question on this topic: http://stackoverflow.com/questions/1280594/can-perl-be-stati...

known · on Aug 18, 2009

I love Perl.

mahmud · on Aug 18, 2009

Read that article and you wouldn't love it as much. Of all the high-level languages whose compilation and evaluation model I am familiar with, Perl5's has got to be the worst. I can't even classify it; the intermediate representation and evaluation is not stack based or register based, it's not threaded interpretation, not metacircular, it's not graph reduction or term rewriting, not even "string" based (a la the weird 3rd implementation of Scheme in Kent Dybvig's PhD thesis.) So what on bloody earth is Perl5? At least Basic has the decency to be stupid, both in essence and appearance.

Think of any language family and you could classify them in some fashion. They fit some sort of a theoretical model you could reason about. Perl5 is .. well, whatever Larry knew how to do. However, it does a good job of not sucking for Unix administration; whenever I need it, about once every 6 months, Perl saves me from the trouble of launching splitvt and reading the manuals of several unix utilities. It's like a John Waters or a Roger Corman* movie, so bad it's good.

--

* not to be confused with Richard Waters and the other Roger Corman, both fine Lispers ;-)

gdp · on Aug 18, 2009

It's building a parse tree, and then attempting to use the unambiguous bits like an abstract syntax tree, presumably to side-step the statically unresolvable ambiguity mentioned in the previous article.

And yes, that is as bizarre as it seems.

I think this article is pretty disingenuous. It's trying to pretend that there is something really complicated going on here (e.g. "That's one way in which Perl 5 differs from other language implementations; it manages the artifacts of compilation itself"), when really, it's actually just poor design layered upon poor design.

> Perl 5's execution model isn't quite the same as a traditional compiler (whatever that means) and it's definitely not the same as the traditional notion of an interpreter. There are two distinct phases of execution of a Perl 5 program: compile time and runtime. You must understand the difference to take full advantage of Perl 5.

This is true, but only if you don't really understand what a traditional compiler or a traditional interpreter might entail. Perl is basically doing what Python, Ruby, Matlab and any number of other language implementations do; it just does it in a way that has to account for flaws in the design of the language.

chromatic · on Aug 18, 2009

The much-paraded statically unresolvable ambiguity does not apply when the Perl 5 parser parses Perl 5 code for reasons explained in the article. The parse tree produced is unambiguous.

Grammar modifications in BEGIN blocks make the parsing process a little different from many other languages with which I believe readers are familiar.

gdp · on Aug 19, 2009

Sure, the generation of a parse tree is unambiguous, but the traversal order of that tree is not statically determinable.

I'm not even saying there is anything wrong with this, I'm just arguing that the way you have described the "compilation" process in Perl is basically a textbook description of an interpreter that relies on some runtime information in order to execute the program.

draegtun · on Aug 18, 2009

Perl seems to be the Marmite of the programming languages ;-)

ref: http://www.marmite.com/

sharkbrainguy · on Aug 18, 2009

That "people either love or hate x" is almost always a myth created by vocal minorities.

e.g. I enjoy Marmite but I don't _love_ it

Also consider the following:

- that most people dislike it the first time (i.e. it's an acquired taste)

- most first timers are introduced to it by someone who "loves" it

These make all acquired tastes seem polarising even if they really aren't.

... although maybe that's what you were trying to say about perl.

draegtun · on Aug 18, 2009

re: "acquired taste" - I totally agree.

For example, I hated beer the first time I tasted it... but now I love it ;-)