Eleven years on, this remains one of my favorite papers!
There's a pretty variant of this idea introduced in 1995 by Valentin Antimirov called partial derivatives of regular expressions. His idea was to replace the single derivative with a set of partial derivatives that add up to the whole derivative. This simplifies the derivative algorithm quite a bit, and makes it possible to write amazingly concise and efficient regular expression matchers.
E.g., here's an implementation of a regular expression compiler that uses Antimirov derivatives to take a regexp and build a table-driven matcher in about 50 lines of code: http://semantic-domain.blogspot.com/2013/11/antimirov-deriva...
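For a feel of how small this can get, here's a toy sketch of Antimirov partial derivatives in Python (my own illustration over an assumed tuple-based AST, doing a direct NFA-style simulation rather than the table-driven compilation the linked post does):

```python
# A toy Antimirov-derivative matcher (illustrative; not the linked code).
# Regexes are tuples: ('eps',), ('chr', c), ('alt', r, s), ('cat', r, s), ('star', r)

def nullable(r):
    """Does r accept the empty string?"""
    tag = r[0]
    if tag == 'eps':  return True
    if tag == 'chr':  return False
    if tag == 'alt':  return nullable(r[1]) or nullable(r[2])
    if tag == 'cat':  return nullable(r[1]) and nullable(r[2])
    if tag == 'star': return True

def pderiv(r, c):
    """Antimirov partial derivatives of r w.r.t. c: a *set* of regexes
    whose alternation is the (Brzozowski) derivative of r."""
    tag = r[0]
    if tag == 'eps':
        return set()
    if tag == 'chr':
        return {('eps',)} if r[1] == c else set()
    if tag == 'alt':
        return pderiv(r[1], c) | pderiv(r[2], c)
    if tag == 'cat':
        out = {('cat', d, r[2]) for d in pderiv(r[1], c)}
        if nullable(r[1]):
            out |= pderiv(r[2], c)
        return out
    if tag == 'star':
        return {('cat', d, r) for d in pderiv(r[1], c)}

def matches(r, s):
    """NFA-style simulation: the current 'states' are a set of regexes."""
    states = {r}
    for c in s:
        states = set().union(*(pderiv(q, c) for q in states))
    return any(nullable(q) for q in states)

# a(b|c)d, the example from Thompson's paper
r = ('cat', ('chr', 'a'), ('cat', ('alt', ('chr', 'b'), ('chr', 'c')), ('chr', 'd')))
assert matches(r, "abd") and matches(r, "acd") and not matches(r, "abc")
```

The point worth noticing is that pderiv returns a set of regexes, which is one reason the approach stays so concise: the set of partial derivatives stays small without the simplification rules that single Brzozowski derivatives usually need.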
Curiously, in Ken Thompson's 1968 paper on his regex matcher https://www.fing.edu.uy/inco/cursos/intropln/material/p419-t... he says it works by taking Brzozowski derivatives. This was a headscratcher to me since his code seems completely different from the derivative-based regex matchers I'd seen. The answer is, it takes Antimirov derivatives, which didn't have a name yet.
One thing that I just learned that is SUPER WEIRD is that Thompson's paper contains the word "derivative" but it doesn't contain the words NFA, DFA, or even "state"!!!
First paragraph:
In the terms of Brzozowski [1], this algorithm continually takes the left derivative of the given regular expression with respect to the text to be searched.
Thompson learned regular expressions from Brzozowski himself while both were at Berkeley, and he credits the method in the 1968 paper. The now-standard framing of Thompson's paper as an NFA construction and the addition of caching states to produce a DFA were both introduced later, by Aho and Ullman in their various textbooks. Like many primary sources, the original is different from the textbook presentations of it. (I myself am also guilty of this particular misrepresentation.)
However, figure 5 in Thompson's paper is CLEARLY an NFA for the regex a(b|c)d.
What's the connection between derivatives and NFAs? I don't understand why Thompson says his method is derivative-based and not NFA-based.
(The 2007 "Re-examined" paper linked here comments upon the Thompson line in section 6. They qualify Thompson's statement a bit, but I don't understand it fully. )
Annoyingly, now that it's really needed, Wikipedia gives no pronunciation guide. Apparently it's Bzho-ZOFFskyi in Poland (where zh is like the si in vision) or Briz-OWski in the US.
Nope, I'm pretty sure it's not wrong. I believe ϵ is the multiplicative identity ("1") here. Note that ϵ is the empty string, and concatenating it with anything results in the same thing, so Aϵ = A. Hence, using the grammar equation A = Aa + ϵ, we have Aϵ - Aa = A(ϵ - a) = ϵ.
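Spelling the algebra out (reading ϵ as the multiplicative identity 1 and | as +, and solving the grammar equation formally as a power series):

```latex
A = Aa + \epsilon
\;\Longrightarrow\; A\epsilon - Aa = \epsilon
\;\Longrightarrow\; A(\epsilon - a) = \epsilon
\;\Longrightarrow\; A = \epsilon\,(\epsilon - a)^{-1} = \epsilon + a + aa + aaa + \cdots = a^{*}
```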
My formal automata class might have used non-standard terminology, but what I was taught was that while A: Aa and A: ε are productions, A: Aa | ε is just a shorthand to combine the two. Since there is only one start variable, A: Aa | ε is the entire grammar as well. a* is the regular expression representing the language for this grammar.
> but what I was taught was that while A: Aa and A: ε are productions
Huh? This is exactly what I also just said, in response to the different claim you made above, which was that the production is merely "A" rather than "A: Aa" or "A: ε". If you're claiming they are somehow equal (which they clearly are not... one is a symbol and the other is a rule mapping that symbol to a sentential form), then I'm confused how you think A and a* were any less equal to begin with...
If regexps are understood as a kind of "baby" programming language, and the techniques for implementing it are understood as "baby" implementation techniques for more complete languages, does "differentiation" have any applications in implementing more complete programming languages?
As an aside, why do I think of regexps as a kind of "baby" programming language? Because:
- They can be converted to NFAs or DFAs. These NFAs and DFAs resemble assembly languages. The "assembly language" for an NFA is actually quite exotic. Reminds me of compilation.
- NFAs can be understood as a kind of Virtual Machine (see the sketch after this list). DFAs are like a subset of a physical machine.
- Just In Time compilation was first applied to regexps before anything else. This was done by Ken Thompson.
- The interpreter->compiler algorithm in RPython (the toolchain used to produce PyPy) works quite well on regexps. This essentially automates the work that Ken Thompson did.
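To make the VM analogy a little more concrete, here is a toy sketch of my own (not Thompson's code): a hand-compiled instruction list for his example regex a(b|c)d, executed by advancing a set of live instruction pointers in lockstep, essentially a captureless Pike-style VM:

```python
# Toy "regex VM" (my illustration): instructions are CHAR, SPLIT, JMP, MATCH,
# and the matcher keeps a *set* of live program counters ("threads").

# Hand-compiled program for a(b|c)d:
#   0: CHAR 'a'   1: SPLIT 2,4   2: CHAR 'b'   3: JMP 5
#   4: CHAR 'c'   5: CHAR 'd'    6: MATCH
PROG = [('char', 'a'), ('split', 2, 4), ('char', 'b'), ('jmp', 5),
        ('char', 'c'), ('char', 'd'), ('match',)]

def add_thread(threads, pc):
    """Add pc and everything reachable through SPLIT/JMP (epsilon moves)."""
    if pc in threads:
        return
    threads.add(pc)
    op = PROG[pc]
    if op[0] == 'jmp':
        add_thread(threads, op[1])
    elif op[0] == 'split':
        add_thread(threads, op[1])
        add_thread(threads, op[2])

def run(text):
    threads = set()
    add_thread(threads, 0)
    for c in text:
        nxt = set()
        for pc in threads:
            op = PROG[pc]
            if op[0] == 'char' and op[1] == c:
                add_thread(nxt, pc + 1)
        threads = nxt                      # one pass per input character
    return any(PROG[pc][0] == 'match' for pc in threads)

assert run("abd") and run("acd") and not run("abc")
```

Running every thread in lockstep over the input is what keeps this linear time, as opposed to backtracking through one alternative at a time.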
The differential of a term does give you some information about how a term behaves under reduction, so I guess the answer is "maybe" - sounds like a fun project to work on. :)
More generally, there are models of linear logic built on a notion of differentiation (mentioned in the same paper as above), which might translate to a compilation of linear lambda calculus/classical processes/pi calculus/etc.
A Theory of Changes for Higher-Order Languages - Incrementalizing λ-Calculi by Static Differentiation
> [...] We present a program transformation taking programs to their derivatives, which is fully static and automatic, supports first-class functions, and produces derivatives amenable to standard optimization.
The programming language follows a context-free grammar (CFG), which has been written in Backus-Naur Form (BNF). It is technically nonsensical to say "the programming language is BNF". It's like if I were to say "English is a Roman language" because we happen to use the Roman letters for our orthography. Just a technical nitpick.
> BNF is a more expressive language than regexp
The context-free languages are more expressive than the regular languages. BNF is a notation for writing a context-free grammar, and regular expressions are a notation for writing regular languages. (Again, a nitpick of terminology.)
> Just In Time compilation was first applied to regexps before anything else. This was done by Ken Thompson.
Citation for this? I'd be surprised if Thompson was first. Runtime optimization was long the domain of Lisp programmers. Blurring the distinction between compile-time and run-time is one of PG's things that "made Lisp different".
He says this is the first case of regular expressions on a computer, which I believe, but I'm not hearing anything about runtime optimization or compilation.
NFAs are not a "kind of virtual machine". The "assembly" language for an actual NFA is no more complex than the one for a DFA.
They are state machines that can be in a lot of states at once; that's all. There is a debasement of the term NFA, popularized by Friedl, that "NFA" means "any thingy that implements regex that isn't a DFA". This isn't true and should not be promulgated.
They are a virtual machine because the physical hardware can't be in more than one state at the same time. Representing an NFA as an assembly-type program makes it resemble something like the JVM, in that it looks like an assembly language for a non-physical machine. Virtual=non-physical
It's a strained metaphor which casts no light on actual NFA implementation or the true distinction of NFAs and DFAs.
In this case, instead of the distinction between "virtual" and "physical" we have, considerably less pretentiously, a "state id" vs a "set of state ids" (in an actual NFA implementation, said set might well be represented by a bit-vector, whether in a general-purpose register or a SIMD register).
Making tortured analogies to JVMs and virtual machines doesn't actually help explain anything about what's going on. It certainly fails to provide the insight that you might get from the "id vs set-of-ids" comparison (one can instantly intuit the risk of an exponential blowout in NFA->DFA conversion, which does in fact exist).
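For concreteness, here is a toy bit-vector version of that "set of state ids" idea (my own sketch over a hand-written NFA for a(b|c)d, not the implementation being discussed in these comments):

```python
# A minimal sketch of simulating an NFA as a bit-vector of state ids.
# The NFA here is hand-written for a(b|c)d (state numbering is mine):
#   0 --a--> 1,  1 --b--> 2,  1 --c--> 2,  2 --d--> 3,  3 accepting.
N_STATES = 4
START, ACCEPT = 1 << 0, 1 << 3

# trans[(state, char)] = bitmask of successor states
trans = {(0, 'a'): 1 << 1, (1, 'b'): 1 << 2, (1, 'c'): 1 << 2, (2, 'd'): 1 << 3}

def step(states, c):
    """One NFA step: union the successors of every live state."""
    out = 0
    for s in range(N_STATES):
        if states & (1 << s):
            out |= trans.get((s, c), 0)
    return out

def accepts(text):
    states = START
    for c in text:
        states = step(states, c)       # the whole frontier fits in one machine word
    return bool(states & ACCEPT)

assert accepts("abd") and accepts("acd") and not accepts("abc")
# Subset construction would enumerate *sets* of these states as DFA states,
# which is where the potential exponential blowout comes from.
```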
Fair enough. I think that implementation is a dreadful straight NFA implementation - and, IIRC, is not used in RE2 for pure NFA code.
It's not a dreadful way of implementing regexes, as it can do things a straightforward bitset NFA implementation can't.
To be fair, I think your original post does stand, but I guess I am a bit pedantic about the terminology after spending a little bit of time on regex implementation.
In fact, a number of strategies in regex implementation turned out to generalize to broader areas. The bitwise automata are helpful elsewhere, as are many of the strategies one might use to make them fast. The 4 Russians technique ( https://en.wikipedia.org/wiki/Method_of_Four_Russians ) has also cropped up all over the place.
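As one concrete example of a bitwise automaton, here's a sketch of the classic Shift-And scan (Baeza-Yates and Gonnet) for a literal pattern; the bit manipulation is the standard textbook version, the code itself is just my illustration:

```python
# A sketch of the classic Shift-And bitwise automaton for exact string
# matching, one example of the "bitwise automata" idea (illustrative only).

def shift_and_search(pattern, text):
    m = len(pattern)
    # B[c]: bit i is set iff pattern[i] == c
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)
    D = 0                                  # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for pos, c in enumerate(text):
        D = ((D << 1) | 1) & B.get(c, 0)   # advance every partial match in parallel
        if D & (1 << (m - 1)):
            hits.append(pos - m + 1)       # full pattern ends at this position
    return hits

assert shift_and_search("abcab", "xabcabcaby") == [1, 4]
```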
Regarding the strained metaphor: the "virtual machine" is metaphorical, and it can simply be ignored, because any possible implementation of an NFA is actually a DFA.
Proof: a finite amount of memory can be considered a DFA state and modifying that memory between consuming successive input symbols can be considered updating the state.
Yes, I think "virtual machine" is too loaded of a term now and is just confusing the point?
An NFA implementation can be "virtual" in the sense of "virtual memory" being virtual. It is a pretense of a practically unbounded set of states while really being bounded like any other practical implementation. It grows on demand and can handle many practical inputs while possibly hitting its implementation limits on a worst-case input.
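A toy sketch of that "grows on demand, up to a limit" behavior, along the lines of a lazily built, cache-bounded subset construction (all names and numbers here are mine and purely illustrative):

```python
# DFA states are discovered lazily as sets of NFA states, cached, and the
# cache is flushed if it exceeds a limit (illustrative sketch only).

NFA = {(0, 'a'): {1}, (1, 'b'): {2}, (1, 'c'): {2}, (2, 'd'): {3}}   # a(b|c)d again
START, ACCEPTING, CACHE_LIMIT = frozenset({0}), {3}, 64

cache = {}                                   # (dfa_state, char) -> dfa_state

def dfa_step(state, c):
    key = (state, c)
    if key not in cache:
        if len(cache) >= CACHE_LIMIT:        # worst-case input: throw the table away
            cache.clear()
        nxt = frozenset().union(*(NFA.get((s, c), set()) for s in state))
        cache[key] = nxt
    return cache[key]

def accepts(text):
    state = START
    for c in text:
        state = dfa_step(state, c)
    return bool(state & ACCEPTING)

assert accepts("abd") and not accepts("abx")
```

A real implementation would do something smarter than clearing the whole cache, but the shape is the same: DFA states exist only once some input has actually demanded them.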
That discussion covers some of the same ground as the Taylor series idea elsewhere in this discussion. It doesn't get as far as inventing the zipper concept.
You can see all of these as data structures instead of programs, and then derivatives transfer and generalize nicely: see Conor McBride's work on zippers.
One of my programming assignments for a programming languages class happened to be creating a DFA for regular expressions by taking successive derivatives, except slightly modified to use ranges because the naïve method of taking derivatives isn't great when you have to take 2^(2^16) of them when your alphabet is UTF-16 characters.
Actually, now that I think about it, the entire class was strangely centered around taking derivatives of regular expressions…
> except slightly modified to use ranges because the naïve method of taking derivatives isn't great when you have to take 2^(2^16) of them when your alphabet is UTF-16 characters.
The paper we read described how instead of deriving by a single character, one could also derive by sets of characters. I then improvised the range approach by myself - it was very cool being able to handle unicode!
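Roughly the idea as I understand it (my own sketch, not the actual assignment): keep character leaves as code point ranges and take one derivative per range of characters that all behave the same way:

```python
# Toy "derive by ranges" sketch (illustrative). Character leaves are inclusive
# code point ranges; instead of one derivative per code point, the alphabet is
# cut at the leaves' boundaries and one representative per interval is derived.

MAX_CP = 0xFFFF   # pretend the alphabet is the 2^16 UTF-16 code units

# AST: ('empty',) | ('eps',) | ('rng', lo, hi) | ('alt', r, s) | ('cat', r, s) | ('star', r)

def nullable(r):
    t = r[0]
    return (t == 'eps' or t == 'star'
            or (t == 'alt' and (nullable(r[1]) or nullable(r[2])))
            or (t == 'cat' and nullable(r[1]) and nullable(r[2])))

def deriv(r, c):
    """Brzozowski derivative of r with respect to code point c."""
    t = r[0]
    if t in ('empty', 'eps'):
        return ('empty',)
    if t == 'rng':
        return ('eps',) if r[1] <= c <= r[2] else ('empty',)
    if t == 'alt':
        return ('alt', deriv(r[1], c), deriv(r[2], c))
    if t == 'cat':
        left = ('cat', deriv(r[1], c), r[2])
        return ('alt', left, deriv(r[2], c)) if nullable(r[1]) else left
    if t == 'star':
        return ('cat', deriv(r[1], c), r)

def cut_points(r):
    """Boundaries of all character leaves; two code points between the same
    pair of adjacent boundaries always produce the same derivative."""
    t = r[0]
    if t == 'rng':
        return {r[1], r[2] + 1}
    if t in ('alt', 'cat'):
        return cut_points(r[1]) | cut_points(r[2])
    if t == 'star':
        return cut_points(r[1])
    return set()

def ranges_to_derive(r):
    pts = sorted(cut_points(r) | {0, MAX_CP + 1})
    return [(lo, hi - 1) for lo, hi in zip(pts, pts[1:])]

# [a-z]x* : five ranges to derive instead of 65536 code points
r = ('cat', ('rng', ord('a'), ord('z')), ('star', ('rng', ord('x'), ord('x'))))
print(ranges_to_derive(r))   # [(0, 96), (97, 119), (120, 120), (121, 122), (123, 65535)]
for lo, hi in ranges_to_derive(r):
    d = deriv(r, lo)         # same result for every code point in [lo, hi]
```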
I used derivatives in the regex implementation in TXR. The derivative back-end is only used for regexes that contain the "exotic" intersection and complement operators. (Or the non-greedy repetition which transforms to intersection and complement at the AST level).
The derivatives implementation starts with the appearance of the function reg_expand_nongreedy.
Wow, I see that in reg_derivative where it handles compound forms, I'm testing for some vanishingly improbable internal-error cases upfront: presence of uncompiled character classes. Fixed that.
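For reference, these are the standard rules (a generic restatement of Brzozowski's definitions, not TXR's reg_derivative) that make intersection and complement essentially free in a derivative back-end; ν(r) is nullability and D_c is the derivative with respect to a character c:

```latex
\nu(r \mathbin{\&} s) = \nu(r) \wedge \nu(s), \qquad \nu(\lnot r) = \lnot \nu(r)
D_c(r \mathbin{\&} s) = D_c(r) \mathbin{\&} D_c(s), \qquad D_c(\lnot r) = \lnot D_c(r)
```

Complement in particular has no cheap NFA-level construction (you would have to determinize first), which is why it's natural to route regexes containing it through a derivative back-end.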
Note that Parsing with derivatives is different from Matching regexes with derivatives.
Russ Cox likes regexes with derivatives -- they are a "win-win". He doesn't like parsing with derivatives because of the computational complexity. Linked from that article:
Your link doesn't really convince me they addressed the concerns... There seems to be some hedging in the language, like saying that it seems to be efficient in practice, and that it should eventually be efficient in theory too.
I am seeing this error printed directly to the DOM:
Warning: mysql_connect() [function.mysql-connect]: Unknown MySQL server host 'mysql' (1) in /home/ltu/www/includes/database.mysql.inc on line 31