
Perl Cannot Be Parsed: A Formal Proof (2008) - dbaupp
http://www.perlmonks.org/index.pl?node_id=663393
======
jrockway
The article really doesn't prove that Perl can't be parsed, it proves that
Perl can't be parsed unambiguously.

The example given is:

    
    
        whatever  / 25 ; # / ; die "this dies!";
    

This can be parsed into two variants without running any code or knowing the
arity of whatever.

The case where whatever takes no arguments:

    
    
        8  <@> leave[1 ref] vKP/REFC ->(end)
        1     <0> enter ->2
        2     <;> nextstate(main 1 -e:1) v:{ ->3
        7     <2> divide[t3] vK/2 ->8
        5        <1> entersub[t2] sKS/TARG,1 ->6
        -           <1> ex-list sK ->5
        3              <0> pushmark s ->4
        -              <1> ex-rv2cv sK/128 ->-
        4                 <#> gv[*whatever] s ->5
        6        <$> const[IV 25] s ->7
    

The case where whatever takes a single argument:

    
    
        b  <@> leave[1 ref] vKP/REFC ->(end)
        1     <0> enter ->2
        2     <;> nextstate(main 1 -e:1) v:{ ->3
        6     <1> entersub[t2] vKS/TARG,1 ->7
        -        <1> ex-list K ->6
        3           <0> pushmark s ->4
        4           </> match(/" 25 ; # "/) sM/RTIME ->5
        -           <1> ex-rv2cv sK/128 ->-
        5              <#> gv[*whatever] s ->6
        7     <;> nextstate(main 1 -e:1) v:{ ->8
        a     <@> die[t3] vK/1 ->b
        8        <0> pushmark s ->9
        9        <$> const[PV "this dies!"] s ->a
    

What does this mean practically? If you want to rename "whatever" to something
else, you can do that, since the name of the function is captured no matter
how many arguments it takes. If you want to lint all your regular expressions,
you can do that with the chance that you'll be wrong if you guessed the
prototype wrong.

The only thing you can't do is figure out exactly how the code will run at
runtime. But if you could do that, you wouldn't have a computer program, you'd
just have a lookup table.

Another thing to keep in mind: if a static analysis tool can't understand your
code, perhaps the future maintainer can't either. In that case, why not
rewrite the code to not be ambiguous?

~~~
Vivtek
This perfectly encapsulates my feelings about the Perl parsing question!

An ambiguous parse is still a parse, and yeah, the 'whatever' example is
pathologically unmaintainable code. As you say, if a static parser finds it
ambiguous, you should probably fix it.

~~~
Vivtek
(Or mark it in some way, you know: "This is ambiguous, but it needs to be
because _______" - because frankly, ambiguous parses are destined to be bug
sources.)

------
joosters
As one of the comments in the original article points out, this is pretty
obvious. You just need a BEGIN { } block containing eval(string), where the
string depends on evaluating code (e.g. a random number)

At least the article doesn't jump to big conclusions and make stupid
declarations based on this result.

Unfortunately the commentators do: "This means that things like static code
analysis, code transformation and syntax hilighting will never be reliable."
So? Almost all code is still easily analysed, syntax highlighting works, etc.
It's like stating that you can't do this stuff with C/C++, because the code
depends upon #defines (whose values could be variable, or sourced from
external processes)

~~~
alipang
What does this mean for the potential of compiler optimization in Perl? You
can't really throw around optimizations that work for 99% of the time.

I feel like the type of work that's done by browsers on JIT compiling
Javascript would be really hard if you can't 100% reliably parse the language.

~~~
dbaupp
> You can't really throw around optimizations that work for 99% of the time.

You can, if you can detect (a superset of) the cases when it is invalid.

------
perlgeek
Well, you can't parse perl without running parts of it.

You can't parse lisp without running reader macros.

You can't parse TeX without running parts of it.

You can't parse C without parsing header files. (Ok, that one is as bit
different, because header files are just C code as well).

~~~
0x0
C++ templates are turing complete, so I guess you can't parse C++ without
"running" the templates either.
[http://stackoverflow.com/questions/189172/c-templates-
turing...](http://stackoverflow.com/questions/189172/c-templates-turing-
complete)

~~~
kivikakk
No. The fact that a feature is Turing complete doesn't make the language
unparseable.

C is Turing complete, but you don't need to run the program to parse C.
Likewise for parsing C++ templates.

~~~
0x0
But is parsing C turing-complete?

Because that's what I took away from reading about C++ templates - the turing-
completeness happens at the parsing/preprocessing stage, before the compiler's
code generator even starts translating. Was I wrong? Wouldn't a smart IDE that
resolves types (for example for tab completion or type hints), looking at the
Factorial<N> thing, have to solve for the value for Factorial<4>?

~~~
kivikakk
No. (You need parse all includes to know typedefs, but that's it.)

C++ is awkward for a different reason, but there's nothing unparseable about
it. Resolving types for tab completion/hinting is getting closer to static
analysis (at least, with C++ it is), but parsing is well and truly done by
that stage.

There's nothing about template<4> that can't be parsed, and nothing about the
templates or instantiations thereof causes the syntax to be interpreted
differently; this is probably the requirement for a parser to be Turing
complete.

Perl is actually unparseable without execution: you can modify the allowed
syntax mid-parse, based on some condition known only at runtime. Nothing about
a template argument (for example) in some call causes a later template
definition to be interpreted according to different syntax rules.

~~~
ExpiredLink
> _C++ is awkward for a different reason, but there's nothing unparseable
> about it._

Some corner cases in C++ are ambiguous. Those are 'resolved' by guessing what
may be more suitable in the current context.

------
glurgh
You don't have to be a perl user but this has never been a goal of perl.

<http://bigthink.com/videos/why-perl-is-like-a-human-language>

Love it, hate it, it just isn't what perl is about.

~~~
dasil003
Your comment lacks context. Do you mean parseability of perl has never been a
goal of perl? If you read the article, you'll see it was _the goal of the
author_ , which is why he wrote this proof. So I'm not exactly clear what
you're trying to say.

~~~
driax
The point the grand parent was making seems to me to be rather simple. He says
that the designers of Perl (not the author of the article) didn't consider
parseability of Perl an objective, since they focused more on making a
scripting language for humans. Of course the author (and many others) would
have liked if Perl was statically parseable since a lot of static tools like
documentation require this to be complete. So it just shows that maybe static
parseability should have been an objective of the Perl designers or any other
language designer in the future.

------
draegtun
Previous HN discussion from 3 years ago -
<https://news.ycombinator.com/item?id=761103>

Also came up in a comment a couple of weeks ago -
<https://news.ycombinator.com/item?id=5718637>

------
bbanyc
A language that can be statically parsed won't allow runtime redefinition of
itself, like Lingua::Romana::Perligata.
[http://www.csse.monash.edu.au/~damian/papers/HTML/Perligata....](http://www.csse.monash.edu.au/~damian/papers/HTML/Perligata.html)

As Tom Duff said in the "Duff's Device" post, I'm not sure whether this makes
an argument for or against this feature.

------
trobertson
It should be noted that this is from 2008, and last updated in 2009. I'm not a
Perl user, so I don't know if the language has changed enough since then to
affect the validity of the proof. We also don't know if it holds for Perl 6.

~~~
perlgeek
It also holds for Perl 6 (to the same degree that it requires you to run some
components of the code that is being compiled; Perl 6 does make it easier for
the compiler to know which code it must run to complete the parse, though).

------
tudorconstantin
That's why they say perl is easy to write and hard to read?

I'm a perl developer for almost 3 years and I simply love it.

------
fijal
Er. This proof is seriously flawed, starting with "what is Perl"? How do we
define a language that's not parseable? Does it only mean that you can't know
what the program will do until you run it? That's true for Ruby, Python,
pretty much everything that has any level of meta-programming. Also, it lacks
any formal definition of what is to be proved. Statement: "Perl is
unparseable" is not very well defined. "Perl" is the first problem, the second
problem is "unparseable" - does it mean you can't write a grammar that does
apply to all valid perl programs and does not to all invalid ones? We come
back to what's "valid" and "invalid" program.

Obviously if the statement is "you can't write any useful linting tools",
which seems to be what the author is after, than fine, but you have to define
"useful".

~~~
chrismorgan
Ruby cannot be parsed (e.g. for ``x - y`` the ``-`` could be unary if ``x`` is
callable or binary otherwise).

Python, on the other hand, can be parsed entirely unambiguously.

------
slacka
The Perl language has evolved over the years. Could it be "fixed", so that a
future version could be parsed?

~~~
perlgeek
It could, but then one would sacrifice syntactic extensibility. It's a
deliberate tradeoff.

------
n0body
That's how it's supposed to be. You need perl to parse perl. It's not an
issue, it's a feature of the language being so flexible.

Simply solution is, k.i.s.s

~~~
pekk
This is a troubling thing to say.

------
csense
This is yet another good reason not to use Perl.

------
brokenparser
I knew it!

