
Adventures in the land of substrings and RegExps - jsnell
http://mrale.ph/blog/2016/11/23/making-less-dart-faster.html
======
evmar
I worked on a project where lexing speed was of concern. Because it was C++ I
didn't have the easy option of reaching for regexes so I instead wrote a hand-
rolled lexer as suggested in the post.

But later I found a tool, re2c, where you write regular expressions in your
source code and it expands them out to a fast parser using every last trick
(gotos and lookup tables, see an example of the generated code here[1]). The
result had the delightful and rare property where the final code was both
higher-level (regexes are a succinct way of expressing lexing) and faster than
the lower-level hand-rolled code.

[1] https://github.com/ninja-build/ninja/blob/3082aa69b7be2a2a0607441a7a2615d78aa983d7/src/lexer.cc#L128
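
To give a feel for the lookup-table trick mentioned above, here is a minimal sketch in TypeScript. It is not re2c output (re2c emits C) and not taken from ninja; the character classes and the names `CLASS`, `classOf`, and `scanRun` are made up for illustration. The idea is only that classifying a character becomes one array load instead of a chain of comparisons.

```typescript
// Character classes, chosen for illustration only (hypothetical, not from re2c).
const OTHER = 0, DIGIT = 1, ALPHA = 2, SPACE = 3;

// 128-entry table indexed by ASCII code, built once up front.
const CLASS = new Uint8Array(128);
for (let c = 48; c <= 57; c++) CLASS[c] = DIGIT;   // '0'..'9'
for (let c = 65; c <= 90; c++) CLASS[c] = ALPHA;   // 'A'..'Z'
for (let c = 97; c <= 122; c++) CLASS[c] = ALPHA;  // 'a'..'z'
CLASS[95] = ALPHA;                                  // '_'
CLASS[32] = CLASS[9] = CLASS[10] = CLASS[13] = SPACE;

// Anything outside ASCII is treated as OTHER in this sketch.
function classOf(code: number): number {
  return code < 128 ? CLASS[code] : OTHER;
}

// Returns the end offset of the token starting at `pos` (assumes pos < input.length).
// A token here is a run of same-class characters; OTHER characters are
// always single-character tokens. No substrings are created while scanning.
function scanRun(input: string, pos: number): number {
  const cls = classOf(input.charCodeAt(pos));
  let end = pos + 1;
  if (cls === OTHER) return end;
  while (end < input.length && classOf(input.charCodeAt(end)) === cls) end++;
  return end;
}
```

A real generated lexer goes further and encodes whole state machines this way, but the table lookup is the core of it.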

~~~
mraleph
I guess I should have been more clear in my advice about not using regular
expressions: if you have a tool that allows glueing multiple regular
expressions together and erasing the cost of the abstraction (e.g. no
allocation of temporary match result objects, substrings, etc) - then
certainly that would be a preferred way to code the lexer. Just don't sprinkle
a bunch of unconnected `new RegExp()` / `re.exec` around the code and expect
that to reach peak lexing speed.
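
As a rough illustration of "glueing" the expressions together in plain JavaScript/TypeScript, here is a sketch that folds several token patterns into one sticky (`y` flag) regex and walks the input by `lastIndex` instead of slicing it. The token kinds and the names `TOKEN_RE`, `KINDS`, and `lex` are hypothetical, and this is not the less_dart code. It also still allocates a match array per token, so it only removes part of the overhead described above.

```typescript
// Hypothetical token shape for a tiny expression language (not from the post).
type Kind = "num" | "ident" | "op" | "ws";
type Token = { kind: Kind; start: number; end: number };

// One combined regex, built once. Each alternative is its own capture group.
// The sticky flag makes exec() match exactly at lastIndex, so the input
// string never has to be cut into substrings to "advance" the lexer.
const TOKEN_RE = new RegExp("(\\d+)|([A-Za-z_]\\w*)|([-+*/()=])|(\\s+)", "y");
const KINDS: Kind[] = ["num", "ident", "op", "ws"];

function lex(input: string): Token[] {
  const tokens: Token[] = [];
  TOKEN_RE.lastIndex = 0;
  while (TOKEN_RE.lastIndex < input.length) {
    const start = TOKEN_RE.lastIndex;
    const m = TOKEN_RE.exec(input);
    if (m === null) throw new Error("unexpected character at offset " + start);
    // Work out which alternative matched by checking its capture group.
    let kind: Kind = "ws";
    for (let i = 0; i < KINDS.length; i++) {
      if (m[i + 1] !== undefined) { kind = KINDS[i]; break; }
    }
    // Tokens carry offsets only; whitespace is skipped.
    if (kind !== "ws") tokens.push({ kind, start, end: TOKEN_RE.lastIndex });
  }
  return tokens;
}
```

Calling `lex("x = 42+y")` yields five tokens described by offsets into the original string; the caller can slice out text lazily, and only for the tokens that need it.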

------
aisofteng
I don't understand how anyone could write an O(n^2) substring
implementation and not think, "this seems really wrong".

~~~
taeric
I'm not sure I understand your post. There are tradeoffs to different ways of
implementing substring. They were fairly well covered in this page.

Are there points you disagree with?

------
mrkgnao
Now I'm really interested in the SpiderMonkey/V8 thing.

Also,

> Solution’s simplicity and elegance might provide a welcomed retreat from
> fighting Webpack configs.

I chuckled.

~~~
draw_down
I laughed at this part:

> However when I did necessary changes to the less_dart code I discovered that
> it actually became several times slower. Hmm.

It's almost like those kuh-razy V8 people knew what they were doing and wrote
all that code for a reason.

~~~
__derek__
The author is a compiler engineer on the V8 team.

------
OneOneOneOne
I can recommend Mastering Regular Expressions by Jeffrey Friedl. He goes into
great detail on regexp performance and the reason various expressions are
slow.

I picked up an older revision from AbeBooks or eBay. The Python 2.7 online
docs recommend revision 1 and state that later revs don't cover Python.

