You can parse regular languages with a DFA -- think of a set of states and a transition function saying "if I'm in state x and we see letter c, go to state y".
Equivalently, for any letter you can ask "for each DFA state, where will this letter take me?" Like, "if we see an 'a', state 1 will go to state 5, state 2 will go to state 1, ...". You could represent this as a lookup table with an entry for each state, if you like.
These functions can be composed in the obvious way: just look up where the first one will take you, then look up where the second will go from there. So e.g. from the functions for 'c', 'a', and 't', you can get the function for 'cat', that tells what the DFA will do -- starting at any of its states -- if it next sees 'cat'.
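As a concrete sketch in Haskell (the 3-state DFA and its tables here are invented for illustration): each letter's function is a lookup table from state to state, and composing two tables is just chasing the lookups.

```haskell
import Data.Map (Map, (!))
import qualified Data.Map as Map

-- One letter's transition function as a lookup table: state -> state.
type Table = Map Int Int

-- Hypothetical tables for the letters 'c', 'a', 't' on a 3-state DFA.
tC, tA, tT :: Table
tC = Map.fromList [(0, 1), (1, 0), (2, 2)]
tA = Map.fromList [(0, 2), (1, 2), (2, 0)]
tT = Map.fromList [(0, 0), (1, 1), (2, 1)]

-- Compose two tables: follow the first, then look the result up in the
-- second.  For a constant number of states this is O(1).
compose :: Table -> Table -> Table
compose first second = Map.map (second !) first

-- The table for "cat": where the DFA lands, from each state, after reading it.
tCat :: Table
tCat = tC `compose` tA `compose` tT
```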
(For a DFA with a constant number of states, composing two of these functions takes O(1) time.)
Composition is associative, of course. If you see 'ca' and then 't', this is the same as seeing 'c' then 'at', which is the same as seeing 'cat'.
To run a DFA on some string, you can think of it as equivalent to using the first letter's function to see where you go from the start state, then using the second letter's function on whatever state you end up in after that, and so on... which is the same as using the entire string's function to get to the final state in one jump.
How do you compute the whole string's function? Just compose the ones for its letters. This composition is associative, so you can do it in parallel as a tree-structured thing. For a string of length n, you can do this in O(log n) time as long as you can throw enough processors at it.
Boom, logarithmic-time parallel regular expression matching.
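To make that reduction concrete, here's a Haskell sketch — sequential code shaped like the parallel algorithm. `treeReduce` has the O(log n)-depth tree structure, and every `(<>)` at a given level could run in parallel. (`Dual` is needed because `Endo`'s `(<>)` composes right to left on its own.)

```haskell
import Data.Monoid (Dual (..), Endo (..))

-- One letter's transition function, wrapped so that (<>) composes
-- left to right: Dual reverses Endo's right-to-left composition.
type Step state = Dual (Endo state)

step :: (state -> sym -> state) -> sym -> Step state
step delta c = Dual (Endo (\q -> delta q c))

-- Reduce as a balanced tree; each level could be done in parallel,
-- giving O(log n) depth for n letters.
treeReduce :: Monoid m => [m] -> m
treeReduce []  = mempty
treeReduce [x] = x
treeReduce xs  = treeReduce (pairUp xs)
  where
    pairUp (a : b : rest) = (a <> b) : pairUp rest
    pairUp rest           = rest

-- The whole string's function, applied to the start state in one jump.
runString :: (state -> sym -> state) -> state -> [sym] -> state
runString delta start cs =
  appEndo (getDual (treeReduce (map (step delta) cs))) start
```

Because the composition is associative, `runString` agrees with a plain left-to-right fold over the string, no matter how the tree groups the letters.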
EDIT: I thought this sounded familiar, and sure enough: Danny Hillis and Guy Steele wrote essentially the same thing back in 1986, as one of the examples in "Data Parallel Algorithms". They also give a very clever explanation of how this can be extended to make a parallel lexer, also in logarithmic time.
I do this stuff with permutes and of course permutes are associative.
The limitation is that you can do this only on DFAs that are the size of your biggest usable permute (currently 16, but will be 64 once AVX512VBMI comes out - looking dicey now with the logjam on 10nm Cannonlake :-( ).
The idea also appears in Guy Blelloch's parallel prefix paper https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf. To my knowledge actual practical regex matching using this technique is thin on the ground. For a start, detecting not just the fact that something accepts but that it accepts at a particular state at a particular offset (not part of the formal regular expression definition, but very useful) is less straightforward in these parallel formulations.
I imagine you could also do this for NFAs -- composing them is just a little extra work.
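Sketching that "little extra work" in Haskell: an NFA letter's behaviour is a relation — each state maps to a *set* of possible next states — and composing two letters is relational composition (equivalently, boolean matrix multiplication). The toy representation here is an assumption for illustration.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- An NFA letter's effect: each state maps to the set of states it can reach.
type Rel = Map Int (Set Int)

-- Relational composition: follow the first relation, then the second,
-- taking the union of everything reachable from the intermediate states.
composeRel :: Rel -> Rel -> Rel
composeRel first second =
  Map.map (\mids -> Set.unions [ Map.findWithDefault Set.empty q second
                               | q <- Set.toList mids ])
          first
```

For a constant-size NFA each composition is still O(1), so the same tree-shaped reduction goes through; the monoid elements are just boolean matrices instead of functions.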
Here is a paper on how to do this efficiently with simd instructions https://www.microsoft.com/en-us/research/publication/data-pa...
The result is that you can parse in time O(log n) if you use O(n) processors, so the processor-time cost is O(n log n). (This is probably to be expected.) I think this means that, if you have lots of strings to parse, you're better off with the good old DFA solution, which does O(n) work per string, parsing the different strings in parallel.
{-# LANGUAGE ExistentialQuantification #-}
import Data.Monoid (Dual (..), Endo (..))
import Data.Set (Set, member)

-- A DFA: states, alphabet, transition function, initial state, accepting states.
data DFA state input =
  DFA (Set state) (Set input) (state -> input -> state) state (Set state)

data MonoidMachine monoid input = Monoid monoid => MonoidMachine
  (input -> monoid)  -- map each input symbol to a monoid element
  (monoid -> Bool)   -- decide acceptance from the combined element

-- From the proof of Theorem 1 in the paper.  (Dual reverses Endo's
-- right-to-left composition, so symbols are applied left to right.)
translate :: Ord state => DFA state input -> MonoidMachine (Dual (Endo state)) input
translate (DFA _ _ delta initial accepting) = MonoidMachine f g
  where
    f a = Dual (Endo (\q -> delta q a))
    g (Dual (Endo h)) = h initial `member` accepting

runMonoidMachine :: MonoidMachine monoid input -> [input] -> Bool
runMonoidMachine (MonoidMachine f g) = g . mconcat . map f
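As a self-contained usage sketch, here is the same `g . mconcat . map f` pipeline specialized by hand to a toy two-state DFA accepting strings with an even number of 'a's (the `Dual` wrapper makes `Endo` compose left to right, so letters are applied in reading order):

```haskell
import Data.Monoid (Dual (..), Endo (..))

-- Toy DFA: state 0 = even number of 'a's seen so far (accepting), state 1 = odd.
delta :: Int -> Char -> Int
delta q 'a' = 1 - q
delta q _   = q

-- f maps a letter to its transition function; g applies the combined
-- function to the start state 0 and checks acceptance.
f :: Char -> Dual (Endo Int)
f c = Dual (Endo (\q -> delta q c))

g :: Dual (Endo Int) -> Bool
g (Dual (Endo h)) = h 0 == 0

accepts :: String -> Bool
accepts = g . mconcat . map f
```

Replacing `mconcat` with any balanced (parallel) reduction gives the same answers, since the monoid operation is associative.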