

Introducing Malicious Code Reviews - jasonmc
http://paulbiggar.blogspot.com/2008/12/introducing-malicious-code-reviews.html

======
jbarciauskas
I showed this to my roommate, and he mentioned in his product, the lexer is
similarly written. The reason for this is, apparently, it is a really simple
way of implementing a finite state machine.

Wikipedia mentions the pattern as well:

"The lex/flex family of generators uses a table-driven approach which is much
less efficient than the directly coded approach. With the latter approach the
generator produces an engine that directly jumps to follow-up states via goto
statements. Tools like re2c and Quex have proven (e.g. article about re2c) to
produce engines that are between two to three times faster than flex produced
engines.[citation needed] It is in general difficult to hand-write analyzers
that perform better than engines generated by these latter tools."
(<http://en.wikipedia.org/wiki/Lexical_analysis>)

------
cschneid
Is the PHP community open to refactorings of the code? Do they have the tests
in place to make that possible?

~~~
tptacek
Why would they want to refactor their lexer? They're hard to get right in the
first place.

~~~
cschneid
Mostly, I'm curious about my second question. What's the state of testing the
core of PHP? I'm totally unfamiliar with the implementation.

I understand that it may be dumb to refactor the lexer, but it gets less dumb
if it's supported by hundreds of tests, and far more dumb if it's based on
testing over time.

------
slackerIII
Yawn. Sometimes, code is both messy and perfectly functional. At that point,
the best thing for the business/community is to spend time thinking about
something else until you need to make changes to that chunk of code.

------
tptacek
The best you can come up with is how tangled the code in a hand-written lexer
looks? I may have skimmed a bit towards the end, but, did you actually find a
bug?

~~~
huhtenberg
That code _is_ really bad. Dunno about you, but to me this handcrafted
entanglement looks like the crap that any professional programmer should be
ashamed of:

    
    
      if ( .. )
      {
         for ( .. )
         {
      label:
            ...
         }
      }
      else
      {
         if ( .. )
         {
           ..
         }
         else
           goto label;
         ..
       }
    

The actual code is even worse than this as there's a second goto that jumps
from the top _for_ into one of _else_ blocks at the bottom from where the
control can go back to _for_ via another goto ... I sort of can see how this
code came about, but there is absolutely no excuse for checking in something
this f*cked up.

If they wanted to show off, they should've just used the coroutines :-)

~~~
coliveira
It has been explained that this is just a way of implementing a fast lexical
scanner. This is the kind of code that you don't come up with, it is a
mechanical implementation of a state machine.

~~~
huhtenberg
Mechanical or not, this is a very untidy way of implementing a "fast lexical
scanner". Even the Duff's device looks cleaner and more elegant in comparison.
And it's not like there is a shortage of other implementation options that are
as fast.

------
coliveira
This guy is not doing a fair review. He is just badmouthing the code without
providing alternatives. The code doesn't look good, but a constructive review
would be much more appreciated by everyone. By acting this way, he is just
making a fool of himself.

