
> I mean, if the engine tried matching from the second space, what would be matching the first space? Something has to.

Some regex engines provide an API call that puts an implicit `.*?` at the beginning of the regex so that the semantics of the match are "match anywhere" as opposed to "match only from the start of the string." (This is in fact the difference between Python's `match` and `search` methods.) Assuming the OP was using this type of method, this difference exactly explains why you can't reproduce it. I can:

    >>> import re
    >>> haystack = (' ' * 20000) + 'a'
    >>> re.match(r'\s+$', haystack)   # wicked fast
    >>> re.search(r'\s+$', haystack)  # chugs a bit

Indeed, this chugs a bit too:

    >>> re.match(r'.*?\s+$', haystack)

So technically, the OP left out this little detail, but it's a pretty common thing to find in regex libraries. In fact, Rust's library treats all searches as if they were `re.search` and provides no `re.match` function, instead requiring an explicit `^` to remove the implicit `.*?` prefix.
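To make the "match anywhere" semantics concrete, here is a minimal sketch that emulates a search by retrying an anchored match at every start offset. The helper `naive_search` is hypothetical (it is not part of `re`), and this is only a model of the behavior, not how the engine is actually implemented.

```python
import re

# Raw string so that \s reaches the regex engine untouched.
TRAILING_WS = re.compile(r'\s+$')

def naive_search(pattern, text):
    """Emulate a search by trying an anchored match at each start offset.

    Returns (match_or_None, number_of_start_offsets_tried).
    """
    attempts = 0
    for start in range(len(text) + 1):
        attempts += 1
        m = pattern.match(text, start)
        if m is not None:
            return m, attempts
    return None, attempts

haystack = (' ' * 2000) + 'a'
match, attempts = naive_search(TRAILING_WS, haystack)
print(match, attempts)
```

In this model, each attempt that starts inside the run of spaces consumes the remaining spaces, fails at the `a`, and backtracks, so the total work grows roughly with the square of the run length. An anchored match only pays for the first attempt.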



For anyone else who wants to time the examples without copying & pasting each line:

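    python3 -m timeit -n 1 -r 3 -s "import re ; haystack = (' ' * 20000) + 'a'" "re.match(r'\s+$', haystack)"
    1 loops, best of 3: 467 usec per loop

    python3 -m timeit -n 1 -r 3 -s "import re ; haystack = (' ' * 20000) + 'a'" "re.search(r'\s+$', haystack)"
    1 loops, best of 3: 4.23 sec per loop

Options

    -n  how many times to execute the statement per timing run
    -r  how many times to repeat the timer
    -s  setup code (run once before timing)

The statement to time is the final positional argument. Older Python 3 releases also accepted a `-c` flag, but it selected the clock (`time.clock()`) rather than introducing the statement, and current versions of `timeit` reject it.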
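The same comparison can be run from inside Python with the `timeit` module, which also makes it easy to see how the two calls scale. This is a rough sketch: the helper `time_once` and the input sizes are my own choices, and it uses a precompiled pattern rather than the module-level `re.match`/`re.search`, so the numbers will not line up exactly with the command-line runs.

```python
import re
import timeit

pattern = re.compile(r'\s+$')

def time_once(func, haystack):
    # number=1 mirrors "-n 1", repeat=3 mirrors "-r 3";
    # min() picks the best of the three runs.
    return min(timeit.repeat(lambda: func(haystack), number=1, repeat=3))

for n in (1000, 2000, 4000):
    haystack = (' ' * n) + 'a'
    t_match = time_once(pattern.match, haystack)
    t_search = time_once(pattern.search, haystack)
    print(f"n={n:5d}  match: {t_match:.6f}s  search: {t_search:.6f}s")
```

If the search really is quadratic in the length of the whitespace run, doubling `n` should roughly quadruple the `search` time while the `match` time only about doubles.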


I guess this must have been the scenario. Perl also defaults to search semantics (like `re.search`) but does not exhibit the pathological case, maybe because it knows it can't find a `$` in a block of `\s`.


By .? you mean .* ?



