
RegExp Lookbehind Assertions - shawndumas
http://v8project.blogspot.com/2016/02/regexp-lookbehind-assertions.html
======
danso
Wow, I didn't even know there was a way to implement lookbehinds that allows
variable-length patterns:

> _Generally, there are two ways to implement lookbehind assertions. Perl, for
> example, requires lookbehind patterns to have a fixed length. That means
> that quantifiers such as * or + are not allowed. This way, the regular
> expression engine can step back by that fixed length, and match the
> lookbehind the exact same way as it would match a lookahead, from the
> stepped back position._

> _The regular expression engine in the .NET framework takes a different
> approach. Instead of needing to know how many characters the lookbehind
> pattern will match, it simply matches the lookbehind pattern backwards,
> while reading characters against the normal read direction. This means that
> the lookbehind pattern can take advantage of the full regular expression
> syntax and match patterns of arbitrary length._

> _Clearly, the second option is more powerful than the first. That is why the
> V8 team, and the TC39 champions for this feature, have agreed that
> JavaScript should adopt the more expressive version, even though its
> implementation is slightly more complex._

regular-expressions.info has roughly the same explanation here:
[http://www.regular-expressions.info/lookaround.html](http://www.regular-expressions.info/lookaround.html)

I guess I had just assumed that if even Perl didn't implement variable-length
lookbehinds, then the performance or implementation cost must have been severe
enough to justify leaving out such useful flexibility. What is the tradeoff
that .NET and now JavaScript are willing to make?

edit: According to a comment on the posted article, Perl 6 now implements
variable-length lookbehinds:
[http://www.perl6.org/archive/rfc/72.html](http://www.perl6.org/archive/rfc/72.html)
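
For anyone who wants to see the difference concretely, here is a minimal sketch of what the variable-length form allows. It assumes an engine that supports lookbehind as described in the article; the pattern names and sample strings are made up for illustration.

```javascript
// Variable-length lookbehind: \s* inside (?<=...) can match any number of
// spaces, which a fixed-length-only engine would reject.
const amountAfterDollar = /(?<=\$\s*)\d+(?:\.\d+)?/;
const amount = 'Total: $   42.50 due'.match(amountAfterDollar)[0];
console.log(amount); // "42.50"

// Negative lookbehind: a whole number that is not directly preceded by "$".
const plainNumber = /(?<!\$)\b\d+\b/;
const count = 'costs $10 for 3 items'.match(plainNumber)[0];
console.log(count); // "3"
```

In both cases the lookbehind text is checked but not included in the match, so only the digits come back.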

~~~
hashseed
Implementing lookbehind by reading backwards is not more computationally
expensive than reading forward. It adds some complexity to the regexp engine,
yes, but that's manageable.

There are some quirks vs. stepping back and reading forward, like the article
already explains.

I guess Perl initially just did not implement it this way, whatever the reason
was.
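
One of the quirks of reading backwards shows up with capture groups: inside a lookbehind, the rightmost greedy group gets first pick of the characters. A small sketch of that behavior, with variable names chosen only for illustration:

```javascript
// Inside the lookbehind the engine reads right to left, so the second
// (\d+) is matched first and greedily takes "053", leaving "1" for the first.
const backward = /(?<=(\d+)(\d+))$/.exec('1053');
console.log([...backward]); // [ '', '1', '053' ]

// The same groups matched forward: the first (\d+) is greedy instead.
const forward = /^(\d+)(\d+)/.exec('1053');
console.log([...forward]); // [ '1053', '105', '3' ]
```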

------
vmorgulis
> Yang Guo, Regular Expression Engineer

Interesting and specialized job :-)

~~~
hashseed
To clarify, this is not a real job title. Chrome and V8 blog posts are written
by engineers who worked on the feature being blogged about, and can choose a
funky title to go with the name :)

------
prohor
I use regex quite often and I'm happy it's now more powerful in Chrome.
Though, what is so painful about regex is all those flavors of it. I wish it
were standardized across all languages / implementations.

------
lolc
I wonder what code could profit from such a niche feature. I cannot even
imagine a situation where I would want to use this.

~~~
minitech
Ever used \b? This is like that, but for other patterns. Very useful in many
situations, like tokenizing (because you have to match on things the last
match already consumed).
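
To illustrate the \b comparison, here is a rough sketch: \b can be spelled out with lookbehind and lookahead, and the same zero-width trick works with other character classes. The sample strings and variable names are assumptions for the example, not anything from the article.

```javascript
// \b written out by hand: a position with a word character on exactly one side.
const boundary = /(?<=\w)(?!\w)|(?<!\w)(?=\w)/g;
const viaB = 'foo bar'.replace(/\b/g, '|');
const viaLookaround = 'foo bar'.replace(boundary, '|');
console.log(viaB, viaLookaround); // |foo| |bar| |foo| |bar|

// Same idea with other patterns: split where a lowercase letter is followed
// by an uppercase one, without consuming either character.
const words = 'lookBehindAssertion'.split(/(?<=[a-z])(?=[A-Z])/);
console.log(words); // [ 'look', 'Behind', 'Assertion' ]
```

Because the assertions consume nothing, the character that ended one token is still available to be checked when matching the next.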

~~~
lolc
Maybe I've used \b before, but I can't recall using regexes for tokenizing.
They've either been too powerful or not powerful enough for my tokenizing
needs.

