

Perl Cannot Be Parsed: A Formal Proof - fogus
http://www.perlmonks.org/?node_id=663393

======
andreyf
Keep in mind that _the term "parse" here is being used in its strict sense to
mean static parsing -- taking a piece of code and determining its structure
without executing it. In that strict sense the Perl program does not parse
Perl. The Perl program executes Perl code, but does not determine its
structure._

A less alarmist title would have been "Perl cannot be parsed unambiguously
without runtime information". A less technical summary:

        whatever  / 25 ; # / ; die "this dies!";

Could be parsed either as (using parentheses to show arguments):

        whatever( / 25 ; # / );
        die "this dies!";

Or:

        whatever / 25; # the rest is a comment

This depends on whether `whatever` is a function of one argument or not, which
you don't know until runtime.

~~~
cabalamat
I once attempted to learn Perl, several years ago. I always felt I was walking
on quicksand because I was never sure how a program should be parsed. It's
nice to confirm that it wasn't just my unfamiliarity with the system; the
actual design of the language is deeply, deeply defective.

~~~
draegtun

       ... language is deeply, deeply defective

With great power comes great responsibility ;-)

If I switch on warnings, it will provide... well, warnings on this example! If
I run Perl::Critic over it, then it spews lots of things that I should be
concerned about!

You're unfortunately over-sensationalising what is otherwise a very good
Perlmonks article.

~~~
cabalamat
> _With great power comes great responsibility_

Most people agree that Lisp is very powerful. Yet Lisp isn't hard to parse, in
fact its syntax is extremely simple.

------
nothingmuch
Interestingly, C suffers from a somewhat similar ambiguity, except that there
it is typedef declarations, rather than code run at compile time, that affect
the parsing.

<http://calculist.blogspot.com/2009/02/c-typedef-parsing-problem.html>

~~~
pmjordan
In C++, the problem becomes so bad that the language designers capitulated in
some instances, and you have to tell the compiler whether something is a type
or an identifier. I haven't tested this example, but you get the point:

      struct test
      {
        typedef int bar;
      };

      template <class T> struct foo
      {
        typename T::bar x; // if you leave off 'typename', it won't compile
      };

      int main()
      {
        foo<test> y;
        y.x = 5;
        return 0;
      }

What I don't quite understand is why, if it won't compile anyway (i.e. there
is no ambiguity, just correct or incorrect), you need to specify it in the
first place. I suspect it somehow makes compiler implementation easier.

~~~
haberman
It's not about making compiler implementations easier -- it's about protecting
template authors against having someone write:

        class MyBadClass {
          static int bar;
        };
        foo<MyBadClass> y;

By writing "typename", the template author can indicate that "T::bar" is
expected to be a type, not a variable. This is explained in more detail here:
<http://pages.cs.wisc.edu/~driscoll/typename.html>.

~~~
pmjordan
Yes, but my argument is that such an instantiation could easily be determined
automatically. The declaration

      T::bar x;

makes no sense under any circumstances if T::bar is not a type. Ergo, putting
typename in front is redundant.

~~~
haberman
Yes, but what if your declaration was:

        T::bar * x;

Now the parse is ambiguous. While the language could say "you only need to use
typename if the declaration would otherwise be ambiguous," it's simpler and
more consistent to say "all qualified dependent types must use typename."

------
teilo
And yet ... it is.

To quote Lao Tsu:

"The Tao that can be parsed is not the true Tao."

Who knew? (Probably Larry)

------
jrockway
The value of this is extremely limited. The Perl that most people write can be
parsed just fine.

(Just as many individual programs can be shown to halt, even though halting is
undecidable for arbitrary programs on arbitrary input.)

~~~
neilk
I don't think so. It is an example of how Perl's interpreter changes what a
program means, based on information gathered at runtime. The halting problem
stuff is just a formalization of the difficulty, to show that in some cases
it's actually impossible to parse.

Try this one:

       #!/usr/bin/perl

       BEGIN {
         my $x = int(rand(2));
         print "randomly picked $x\n";
         if ($x) {
           eval '
             sub foo($) {
               print "executed foo()\n";
             }
           ';
         }
       }
       foo / 25; #/ ; die "DIED\n";
       print "DID NOT DIE\n";

This is technically parseable but only by thinking of it as a program that
branches at every call of 'foo'. And think about what would happen if many
such cases interacted.

"Modern" Perl programmers use a lot of syntax-perverting trickery, so this
isn't as unlikely as it may appear.

~~~
nothingmuch
can you please point to a real example of code like this?

just because a pathological case is possible, doesn't mean that it's actually
common, or even existent.

~~~
scott_s
That it's even possible is relevant - most programming languages can be
parsed.

~~~
nothingmuch
There's nothing preventing symbol table modifications done by Perl modules at
compile time from being reified into some sort of linkage unit, with which a
parser could then statically parse the code.

Similarly it's possible to prove that a certain block of code does nothing but
link in such deterministic units, removing the nondeterminism of function
prototypes.

Most of the source code out there can be parsed statically without resorting
to anything drastic. The semantics of this hypothetical Perl 5 variant do
differ, but as a strict subset it will still properly parse most of the useful
code out there.

~~~
scott_s
That's not the point. It is interesting, from a programming language theory
perspective, that Perl 5 cannot be parsed statically. The person who wrote
the proof sketch cares very much about the practical implications of that
theoretical result because he's writing Perl parsing libraries.

Also, what you're describing is no longer static parsing.

------
rjurney
Another case of science telling us what we already know?

:D

~~~
salvadors
Another case of science confirming that what we always strongly suspected,
but could never be 100% sure about, was indeed correct.

This is (in the general case) more useful than people tend to realise.

------
cjg
The halting theorem, on which this result relies, depends on the infinite
length of the tape in the Turing machine.

~~~
gdp
And? That doesn't change the result. Whether you can implement the Turing
machine or not is irrelevant; what matters is that you are able to frame the
problem as "does this halt?", which is unanswerable in the general case.

"Running out of storage" doesn't really count as "termination", especially
because we can't reasonably answer the question "will this run out of
storage?" in the general case either.

~~~
cjg
If you restrict yourself to a limited amount of storage then it is possible to
decide if a given program halts.

There are not many pieces of code that run with an infinite amount of storage.

~~~
gdp
OK, I'm not sure I agree with that in the general case either, but that still
wouldn't be important to the proof. We can safely assume infinite tape and
then the proof is fine.

If you think we _must_ implement the Turing machine in order for the proof to
be believed, then you've only weakened the proof to "the static parsing
behaviour of Perl is dependent on the length of tape available", which is
basically just reducing the problem to a rather unhelpful interpretation of
"deterministic behaviour", approximately equivalent to the example using
randomisation (except that the random element moves from the program under
consideration to the environment in which it is being considered).

Am I missing why you think this is a significant point?

~~~
cjg
If the tape is finite then there are a finite number of states. In which case
a halting oracle can exist - it detects repeated states.

I can't see how we can assume an infinite tape exists - there has never been
such an implementation, nor is it easy to see how this might be achieved.

The weakened proof only proves that a static parse could be possible if we
give the parser enough tape. The size of tape for the parser is determined by
the size of tape the perl program is allowed.

I'm afraid I don't understand your statements about deterministic behaviour or
randomisation.

~~~
gdp
Why does it need to be a real implementation of a Turing machine? Why does the
(hypothetical) Turing machine need to be implementable? The thing is, if you
can contrive a halting oracle for a machine with a tape of length _n_, then I
can contrive another Turing machine with a tape of length _n+1_, and so on.
The result is, uh... the halting problem.

And once again, I don't see why any of this even matters. Assuming the
existence of some Turing machine (with infinite tape) is a garden-variety
proof technique for these kinds of proofs. I still don't understand why you're
insisting that it must be implementable.

Computer memory is finite - does that mean we've solved the halting problem
for all the programs we care about?

~~~
cjg
If you don't have infinite memory (and who does) then there is no proof that a
halting oracle does not exist. In fact, there is a proof that such an oracle
does exist.

So, yes, we can solve the halting problem where there is finite memory as long
as our oracle is allowed more (perhaps much more) memory than that finite
amount. The fact that the oracle is allowed more memory than the program
obviates your n->n+1 objection.

My point matters because the proof doesn't prove that you can't create a
static parser for a perl program with only finite memory (in practice all perl
programs). In other words, we shouldn't discourage someone from trying to
build a static perl parser - it might well be possible.

If you assume an infinite tape to prove a theorem then that theorem can only
be applied in situations where an infinite tape is available.

~~~
gdp
But you're describing a well-known intractable problem in computability.
Finite-length tape doesn't make the problem any easier to solve, especially
within the sorts of time limits that would be acceptable to users of static
parsers.

E.g. from Minsky (1967), referring to a machine with a million parts:

"Even if such a machine were to operate at the frequencies of cosmic rays, the
aeons of galactic evolution would be as nothing compared to the time of a
journey through such a cycle"

So the conclusion stands. If you presuppose an infinite tape, you get
equivalence to the halting problem, and if you presuppose a finite tape beyond
any non-trivial size, you get complete intractability.

~~~
cjg
Intractable doesn't mean impossible: perhaps someone will come up with a great
new approach. The proof says nothing about what is possible with a finite
tape.

I hope you now feel that I have made a point that is, at least vaguely,
relevant.

------
Dove
This result fills me with visceral glee, for it is very much the way of perl.

~~~
Kaizyn
This is old and incorrect. Adam Kennedy released the PPI module and has been
working steadily on the Padre editor which is capable of refactoring Perl code
as well as parsing it.

Padre: <http://padre.perlide.org/>
PPI: <http://search.cpan.org/dist/PPI/>

~~~
salvadors
No, it's not. PPI's goal is only to do a "good enough" job. It explicitly
states in its documentation that it is not capable of actually parsing Perl
code, and the discussion he gives in that documentation was what led to this
formal proof being developed. Furthermore, in the PerlMonks discussion linked,
Adam also commented: "This totally makes my day. Would you mind if I converted
this to POD and included it in the documentation for PPI?"

