I showed this to my roommate, and he mentioned that the lexer in his product is written the same way. The reason, apparently, is that it's a really simple way of implementing a finite state machine.
Wikipedia mentions the pattern as well:
"The lex/flex family of generators uses a table-driven approach which is much less efficient than the directly coded approach. With the latter approach the generator produces an engine that directly jumps to follow-up states via goto statements. Tools like re2c and Quex have proven (e.g. article about re2c) to produce engines that are between two to three times faster than flex produced engines.[citation needed] It is in general difficult to hand-write analyzers that perform better than engines generated by these latter tools." (http://en.wikipedia.org/wiki/Lexical_analysis)
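To make "directly coded" concrete, here's a toy sketch of a scanner in that style, where each DFA state is a label and each transition is a goto. This is just an illustration of the technique the quote describes, in miniature; it is not PHP's actual lexer, and the token names are made up for the example.

```c
#include <ctype.h>

/* Token kinds for this toy scanner (illustrative only). */
enum tok { TOK_NUM, TOK_IDENT, TOK_EOF, TOK_ERR };

/* Directly coded DFA: each state is a label, each transition a goto.
 * Advances *pp past the token it recognizes. */
enum tok scan(const char **pp) {
    const char *p = *pp;

start:
    if (*p == ' ')  { p++; goto start; }   /* skip blanks */
    if (*p == '\0') { *pp = p; return TOK_EOF; }
    if (isdigit((unsigned char)*p)) { p++; goto in_num; }
    if (isalpha((unsigned char)*p) || *p == '_') { p++; goto in_ident; }
    *pp = p + 1;
    return TOK_ERR;

in_num:
    if (isdigit((unsigned char)*p)) { p++; goto in_num; }
    *pp = p;
    return TOK_NUM;

in_ident:
    if (isalnum((unsigned char)*p) || *p == '_') { p++; goto in_ident; }
    *pp = p;
    return TOK_IDENT;
}
```

A table-driven generator like flex would instead loop over a transition table indexed by (state, character); the goto version avoids that indirection, which is where the claimed speedup comes from.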
Mostly, I'm curious about my second question. What's the state of testing the core of PHP? I'm totally unfamiliar with the implementation.
I understand that refactoring the lexer may be a dumb idea, but it gets less dumb if the refactor is backed by hundreds of tests, and far more dumb if the only safety net is having been battle-tested over time.
Yawn. Sometimes, code is both messy and perfectly functional. At that point, the best thing for the business/community is to spend time thinking about something else until you need to make changes to that chunk of code.
The best you can come up with is how tangled the code in a hand-written lexer looks? I may have skimmed a bit towards the end, but did you actually find a bug?
That code is really bad. Dunno about you, but to me this handcrafted entanglement looks like the crap that any professional programmer should be ashamed of:
if ( .. )
{
    for ( .. )
    {
label:
        ...
    }
}
else
{
    if ( .. )
    {
        ..
    }
    else
        goto label;
    ..
}
The actual code is even worse than this: there's a second goto that jumps from the top `for` into one of the `else` blocks at the bottom, from where control can go back into the `for` via yet another goto... I can sort of see how this code came about, but there is absolutely no excuse for checking in something this f*cked up.
If they wanted to show off, they should've just used the coroutines :-)
It has been explained that this is just a way of implementing a fast lexical scanner. This is not the kind of code you come up with by hand; it is a mechanical implementation of a state machine.
Mechanical or not, this is a very untidy way of implementing a "fast lexical scanner". Even Duff's device looks cleaner and more elegant in comparison. And it's not as if there is a shortage of other implementation options that are just as fast.
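For reference, Duff's device is the classic example of a switch jumping into the middle of a loop body to unroll a copy. A sketch (the usual textbook form, assuming count > 0):

```c
#include <stddef.h>

/* Duff's device: the switch jumps into the middle of an unrolled
 * do-while, handling the remainder iterations first, then copying
 * eight bytes per pass. Assumes count > 0. */
void duff_copy(char *to, const char *from, size_t count) {
    size_t n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

At least here the weirdness is confined to one small, well-known idiom, rather than gotos criss-crossing a whole function.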
Didn't look like he pointed out any bugs - but that's not really the main point of a code review. It's all well and good for the compiler to understand the code and have it execute properly, but is it human-readable? Or maintainable? Having looked at the source file, I'd say that particular method is neither.
This guy is not doing a fair review. He is just badmouthing the code without providing alternatives. The code doesn't look good, but a constructive review would be much more appreciated by everyone. By acting this way, he is just making a fool of himself.