> I mean, if the engine tried matching from the second space, what would be matching the first space? Something has to.
Some regex engines provide an API call that puts an implicit `.*?` at the beginning of the regex so that the semantics of the match are "match anywhere" as opposed to "match only from the start of the string." (This is in fact the difference between Python's `match` and `search` methods.) Assuming the OP was using this type of method, then this difference exactly explains why you can't reproduce it. I can:
>>> import re
>>> haystack = (' ' * 20000) + 'a'
>>> re.match(r'\s+$', haystack)   # is wicked fast
>>> re.search(r'\s+$', haystack)  # chugs a bit
Indeed, this chugs a bit too:
>>> re.match(r'.*?\s+$', haystack)
So technically, the OP left out this little detail, but it's a pretty common thing to find in regex libraries. In fact, Rust's library treats all searches as if they were `re.search` and provides no `re.match` function, instead requiring an explicit `^` to remove the implicit `.*?` prefix.
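If you want rough numbers for the match/search difference rather than "chugs a bit", here is a minimal timing sketch. The helper name `time_regex` and the input sizes are my own choices for illustration, not anything from the OP, and the absolute times will vary by machine and Python version. The thing to watch is how the `search` time grows as the run of spaces gets longer.

```python
import re
import timeit


def time_regex(func, pattern, n, repeats=3):
    """Best-of-`repeats` wall time for func(pattern, haystack).

    Hypothetical helper for illustration only.
    """
    # n spaces followed by a non-space, so \s+$ can never match.
    haystack = " " * n + "a"
    return min(timeit.repeat(lambda: func(pattern, haystack),
                             number=1, repeat=repeats))


if __name__ == "__main__":
    # Arbitrary sizes; doubling n makes any superlinear growth easy to spot.
    for n in (2000, 4000, 8000):
        t_match = time_regex(re.match, r"\s+$", n)
        t_search = time_regex(re.search, r"\s+$", n)
        print(f"n={n:5d}  match={t_match:.6f}s  search={t_search:.6f}s")
```

Both calls return `None` on this input; the only difference is how many starting positions the engine tries before giving up.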
I guess this must have been the scenario. Perl also defaults to re.search but does not exhibit the pathological case, maybe because it knows it can't find a $ in a block of \s.