I would argue that Prolog Definite Clause Grammars, which date back to the early 1970s, have all three of these properties. Furthermore, since the context is maintained functionally, by threading additional values through the productions, no "undo actions" are required; Prolog's built-in backtracking is all that's needed.
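To make the threading idea concrete, here is a minimal sketch of the same discipline in Python rather than Prolog (all names are invented for illustration): every production takes and returns an immutable (position, context) pair, so backtracking simply discards a failed candidate and retries from the original values.

    def ident(src, pos, ctx):
        # hypothetical production: consume one identifier, record it in ctx
        end = pos
        while end < len(src) and src[end].isalpha():
            end += 1
        if end == pos:
            return None                       # fail; the caller backtracks
        return end, ctx + (src[pos:end],)     # fresh context; the old one is untouched

    def alt(src, pos, ctx, *productions):
        # ordered choice: on failure, retry from the original (pos, ctx);
        # nothing was mutated, so there is nothing to undo
        for p in productions:
            result = p(src, pos, ctx)
            if result is not None:
                return result
        return None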
Of course, the problem with DCGs is performance: they're exponential in the worst case. But I think they deserve mention in a dissertation like this anyway. Also, any backtracking parser risks exponential worst-case performance; it will be interesting to see how Colm avoids this fate (I've only read the first few pages so far).
Consequently I want to rephrase the first sentence of your third paragraph. The problem with DCGs is performance: they're exponential in the naive case. That might not sound too bad today, but back in 1985 the hype was that Prolog let you program declaratively: just declare what a parse looks like, and reasonable performance would follow even in the naive case. That was the popular selling point for Prolog.
Note that threading the context through the parse tree while maintaining fully generalized parsing requires keeping all versions of the parsing context in memory. Consider building a C++ parser that way, i.e., every time you modify the structures you build in memory, you first make a copy of them.
Obviously this makes no sense for C++, but Clojure and OCaml get away with it, and in Haskell it's the standard way of implementing almost every stateful computation.
To address what you meant ... it's more a matter of what's practical. For example, to parse C++ one needs to do quite a bit of semantic analysis during parsing; a full implementation of the type system is necessary. The data structures then change with every statement or clause parsed, and achieving this with persistent data structures is far too costly.
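A toy sketch of what that semantic feedback looks like (the function and the tiny two-kind symbol table are invented for illustration, not anyone's real parser): the classic ambiguity of "a * b;" can only be resolved by consulting a symbol table that the parser itself keeps updating, and threading that table functionally means building a new one at every declaration.

    def parse_statement(tokens, symtab):
        # symtab maps name -> "type" or "var"; it is threaded functionally,
        # so a failed parse branch can simply be discarded
        a, _star, b = tokens[:3]
        if symtab.get(a) == "type":
            # "a * b;" declares b as a pointer to a
            return ("declaration", a, b), {**symtab, b: "var"}
        # otherwise it is the expression a times b
        return ("multiplication", a, b), symtab

    parse_statement(["T", "*", "p", ";"], {"T": "type"})   # -> a declaration of p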
Edit: indeed I did not follow what you meant.
A simple example would be parsing a sequence of characters and storing that in a linked list. When you get an additional character you don't destructively modify the list; instead, you create a new list node that points to the existing list.
from collections import namedtuple

Node = namedtuple("Node", ["char", "next"])   # one immutable list cell

def process(char, state):
    return Node(char, state)   # the old list `state` is shared, not mutated
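For instance, building up a sequence one character at a time leaves every intermediate version intact and shareable:

    chars = None                      # the empty list
    for c in "abc":
        chars = process(c, chars)
    # every intermediate version is still a valid list: nothing that pointed
    # at it has been disturbed by the later additions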
Your method is probably more efficient. Because you don't need access to all historical versions simultaneously, you only have to keep a stack of undo operations, so you pay only O(1) slowdown per operation. With functional data structures you have, in the worst case, an O(log n) slowdown per operation.
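A minimal sketch of such an undo stack (the class and method names are made up, and it assumes single-threaded use): record the previous value before each write, and backtracking pops the log back to a saved mark.

    _MISSING = object()   # sentinel: the key did not exist before the write

    class UndoLog:
        def __init__(self):
            self.state = {}
            self.undo = []                # stack of (key, previous value)

        def set(self, key, value):
            self.undo.append((key, self.state.get(key, _MISSING)))
            self.state[key] = value       # O(1) work plus one O(1) log entry

        def mark(self):
            return len(self.undo)

        def rollback(self, mark):
            # on backtrack, undo everything recorded since mark(), newest first
            while len(self.undo) > mark:
                key, old = self.undo.pop()
                if old is _MISSING:
                    del self.state[key]
                else:
                    self.state[key] = old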
On the functional side, for example, if you implement a bit vector, update and lookup can both be O(log n), e.g. with a red-black tree. If you want update to be O(1), then lookup becomes O(n), e.g. a linked list. If you want lookup to be O(1), then update becomes O(n), e.g. copying the entire vector on update.
(please correct me if I'm wrong and if you can do better than O(n) for those cases)
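As a tiny illustration of the O(n)-update end of that spectrum (plain Python, nothing library-specific): a persistent update on a flat vector really does copy everything, which is exactly the cost the tree-shaped structures trade away.

    def set_copy(vec, i, x):
        # persistent update on a tuple: O(1) lookup later, at the price of
        # copying all n elements now
        return vec[:i] + (x,) + vec[i + 1:]

    v1 = (0, 0, 0, 0)
    v2 = set_copy(v1, 2, 1)   # v1 is unchanged; v2 is a full copy with one element set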
I use the term "copy" in the general sense to make discussion easier. Conceptually, persistent data structures are a copy, even if they are optimized heavily to incur very little cost.
I think it's generally a hard sell if you try to convince people that they need to write their algorithms in your special language. Parsing tools deliver value because grammars are easier to write than the imperative code that implements those grammars. That value offsets the cost of having to learn a new special-purpose language. But imperative programming languages are already pretty good at tree traversal and transformation, so there's little benefit to using a special-purpose language for this.
I think that the next big thing in parsing will be a runtime that easily integrates into other languages so that the parsing framework can handle only the parsing and all of the tree traversal and transformation can be performed using whatever language the programmer was already using. This requires much less buy-in to a special-purpose language.
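Sketching just the host-language half of that idea (the tree shape below is a stand-in, not any real framework's output): once the parsing runtime hands back a plain data structure, traversal and transformation are ordinary code in whatever language you already use.

    def walk(node, visit):
        # depth-first traversal of a (name, children) tree in plain Python
        name, children = node
        visit(name)
        for child in children:
            walk(child, visit)

    # stand-in for what an embedded parsing runtime might hand back
    tree = ("expr", [("term", []), ("op", []), ("term", [])])
    walk(tree, print)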
You are right: people want to use general-purpose languages for the more complex algorithms. I agree a means of embedding is necessary, and I have kept this in mind, though I have not yet achieved it. I would very much like to be able to parse, transform, then have the option of importing the data into another environment and carrying on there.
There is also a print_xml function, which puts the tree into XML, but at this point it's mostly used for debugging, not for export to other systems. I'm hoping that with time these kinds of features will crop up.
With this approach, you could have a C extension that loads any grammar at runtime and parses it extremely fast.
C/assembly code is orders of magnitude faster at parsing than generating, e.g., Ruby code that does the parsing.
Then I would hazard that it is not yet a language, as without documentation it has no "grammar". At best it is a patois.
I am really interested in DSNP, am fairly well versed in GNU/Linux, and can do some programming, mostly Java and Python. I work as a web frontend developer. Can I be of some help? Do you need testers, peers, documenters?
I need help from people like you, actually. Here's what I've done:
1. defined the protocol
2. implemented it in a C++ daemon that
a) talks to other daemons
b) serves the content managers (frontend UIs)
3. implemented a content manager (a frontend UI) on top of the daemon
What needs to happen next is for step 3 to be repeated by other people who know what they are doing. They don't need to understand the details of the protocol; they just need to understand the basic model, which is just message broadcast, distributed agreement, etc.
Email me for more details; I'll get back to you later tonight.