Unmentioned in the article is the fact that the Rete algorithm, because of how it breaks down and deduplicates predicates, has roughly O(1) scaling behavior with respect to the number of rules in the system, with the trade-off of needing sufficient memory for the objects to be matched and for the predicate network. It is completely fascinating to me that a rule system with one million rules could execute in about the same time as one with a single rule.
I've become mildly obsessed with the potential applications of that behavior, and one of my favorite ideas (which I'm mostly unqualified to implement) is a compiler super-optimizer that stores the rewriting rules as individual rules in the rule engine. This would allow for adding interesting optimizations that might be purposely omitted from existing optimizing compilers because they match infrequently or are expensive to match. It would also allow for optimizations anywhere in the compiler pipeline without having to worry much about modularity, which could open up the possibility of macro-level optimizations at the AST level (for example, rewriting an array of objects as an object of arrays to speed up folds and maps over the array) or at the machine-code level (taking advantage of small differences in instruction sets and other on-die resources).
That being said, I'm pretty sure the existing Rete implementations are poorly suited to matching on subtrees as would probably be required for this type of optimization, and the memory requirements would mean compilation would likely need to be done on servers. Tradeoffs aplenty, but I would still love to see it happen.
RETE is often used as the inference engine for RDF/OWL/SPARQL servers. (I believe Sesame/SAIL was one of the first, in the early 2000s, using Jena [1].) Since there are many graph databases now, some have added RDF/OWL and inference support.
You can find a matching subtree in time independent of the number of patterns using essentially the same idea as the Aho-Corasick method of string pattern matching. This is done in term rewriting systems -- e.g., I just looked up the Pure term-rewriting language, and its matching code appears to work that way (just going by the comments in its source).
You could try out your idea by expressing your optimizer rules as a program in Pure. I don't know whether, after a rewrite, they rescan the tree from the root, or try to get smarter and restart the matcher automaton from a point only just far enough up the tree (and back in the matcher) to allow for any change from the rewrite.
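To give the flavor of matching whose per-node cost doesn't grow with the pattern count, here is a toy bottom-up sketch in Haskell (my own illustration, nothing to do with Pure's actual internals; a real matcher would precompile the classification into tables covering all patterns):

    -- Toy expression language and a finite "state" per node, computed
    -- bottom-up from the node's symbol and its children's states.  In a
    -- real matcher the transition function is a precompiled table, so
    -- per-node cost is independent of how many patterns were compiled in.
    data Expr = Const Int | Add Expr Expr | Mul Expr Expr
      deriving Show

    data State = IsZero | IsOne | Other
      deriving (Eq, Show)

    -- the "transition function": classify a node (for the two patterns
    -- below, only leaf symbols matter)
    state :: Expr -> State
    state (Const 0) = IsZero
    state (Const 1) = IsOne
    state _         = Other

    -- one bottom-up pass applying the rewrites  x + 0 -> x  and  x * 1 -> x
    simplify :: Expr -> Expr
    simplify (Add a b)
      | state b' == IsZero = a'
      | otherwise          = Add a' b'
      where (a', b') = (simplify a, simplify b)
    simplify (Mul a b)
      | state b' == IsOne = a'
      | otherwise         = Mul a' b'
      where (a', b') = (simplify a, simplify b)
    simplify e = e

For example, simplify (Mul (Add (Const 2) (Const 0)) (Const 1)) reduces to Const 2; adding more patterns grows the state set and the table, not the work done per node.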
It might be interesting to apply Rete directly to the tree represented as a set of tuples, for instance along the lines of the sketch below.
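Something like this (a hypothetical encoding, purely to make the idea concrete; the constructor names and layout are my invention): each AST node becomes a working-memory fact of (node id, constructor, child ids), and rule left-hand sides join on the id columns.

    -- Hypothetical encoding of the expression (x + 0) * y as Rete
    -- working-memory facts: (node id, constructor, child node ids).
    -- A rule like "Add n with a child c that is Const 0 => replace n
    -- with its other child" becomes a join on the id columns.
    facts :: [(Int, String, [Int])]
    facts =
      [ (1, "Mul",     [2, 5])
      , (2, "Add",     [3, 4])
      , (3, "Var x",   [])
      , (4, "Const 0", [])
      , (5, "Var y",   [])
      ]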
I'd be surprised if this worked out better in general, but maybe there are some interesting cases.
BTW, the GHC compiler for Haskell lets library authors define optimizations as rewrite rules -- like your array-of-objects to object-of-arrays transformation, I guess.
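For example, the classic map fusion rule, written with GHC's RULES pragma (a standard textbook example of the feature, not necessarily verbatim from any particular library):

    {-# RULES
    "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
      #-}

GHC applies such rules during optimization, so a library author can teach the compiler domain-specific rewrites without touching the compiler itself.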
I did some work with Charles Forgy, and if I remember correctly, the challenge with RETE was that it scaled very well with a large number of rules (as you note) but very poorly with the size of the data set against which those rules were applied. He subsequently developed versions of RETE that addressed some of the weaknesses of the original algorithm, but those versions were not released into the public domain. For example, Rete III (early 2000s) is now owned by Fair Isaac.
Heh. I also ported Forgy's OPS5 to David Betz's XLISP around that same time, posting it to BIX (anyone remember BIX?). I ran a handful of guess-the-animal toys on it, then turned my attention elsewhere.
I remember playing with this and other techniques in the past. Gensym's G2 was one of the most successful commercial implementations back then. I did a knockoff because I wasn't going to pay them. I ran into the same problem others did: maintaining and especially debugging the rules. It got really difficult as the rulebase grew. It was like programming with nothing but if-then rules.
I think our modern languages with fast interpreters or compilers are a better route. The O(1) point saosebastiao brought up is a nice benefit, and it does bring clarity, as another commenter said. So, if we use the technique, it's probably best to use it selectively, as a DSL in a LISP or ML language.
Note: The author's name was a surprise, too. I didn't realize Bruce wrote on that topic.
Those early 2000s must have been the very brief golden age of rules engines/expert systems... but coincidentally, I was just recently thinking about doing some research on the current state of rules engines to see if there might be a viable option today for matching candidates to jobs.
My very first consulting project (yes, I had absolutely no experience), around that time, was implementing Blaze Advisor (now owned by FICO: http://www.fico.com/en/products/fico-blaze-advisor-decision-...) to migrate a mainframe-based automated life underwriting system to a J2EE client/server system.
Performance was a major concern, so I read Forgy's 1982 paper about RETE (which, frankly, I didn't quite understand), but the key takeaway (for me at least) was that all the hard work happens during compilation, so runtime performance shouldn't be an issue. From what I dimly remember, the algorithm iterates through the rule set, building something akin to a decision tree until the tree reaches a stable state; if it never reaches stability, it throws a compile error. Once the tree is built, matching is very fast.
The holy grail was that "business analysts" (remember when that title was in vogue? =) could write the rules, saving you months of precious time and thousands of dollars by not paying programmers (it did not turn out that way, of course).
A lot is known about matching. Much of the work boils down to min-cost network flows on a network whose arcs have costs per unit flow and integer capacities. The problem is then an integer linear program, but in this case the old linear programming simplex algorithm returns integer results for no extra effort, because the constraint matrix is totally unimodular. A nice modification is due to W. Cunningham, one of my profs, later Chair of Waterloo's Department of Combinatorics and Optimization.
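For concreteness, here is the standard formulation (textbook material, not specific to Cunningham's modification):

    \[ \min_x \; c^\top x \quad\text{subject to}\quad A x = b, \;\; 0 \le x \le u \]

where A is the node-arc incidence matrix of the network, b holds the integer supplies and demands, and u the integer arc capacities. Because A is totally unimodular, every vertex of the feasible polyhedron is integral whenever b and u are, and since simplex terminates at a vertex, it returns an integer optimum for free.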
Are you talking about codifying the matching rules as cost functions and then optimizing them as a linear program? That's something I remember doing in college, but it's been a while. =) Are there any existing software tools out there that help in that regard? I'm not sure it fits my use case, but it's good to explore.
I'm working on a platform that helps hourly workers find jobs. Today that matching is fairly straightforward, but it will get more complicated over time as we develop better heuristics for matching candidates. It would be nice to be declarative about the matching rules and let the system figure out how to do the matching.
I wonder how many developers nowadays would consider a rules engine or decision tables to implement those underwriting rules. It gets pretty complex pretty quickly when you look at the rules in the financial services industry (pensions, loans, etc.). I also see the application of DSLs here, with the same idea of being more productive with less code.
I've been using products that have a RETE implementation for a while. While learning Scala/Akka, I decided to give it a try and build a production rule system myself. For a basic implementation, see https://github.com/bridgeworks-nl/scala-rete
At IBM's Watson lab, our team did some work in OPS5 with the RETE algorithm. The resulting program was called an expert system.

The team then continued with a new programming language, YES/L1. Using DeRemer's LALR work in parsing, YES/L1 was a preprocessor to IBM's language PL/I. The language had extensions for expert system rules, and those were based on the RETE algorithm.

IBM made YES/L1 an IBM Program Product, their highest quality software product category, under the name KnowledgeTool. Our team worked with GM to make a deployment. We wrote several papers and gave one at an AAAI IAAI conference at Stanford.
The work continued with Resource Object Data Manager (RODM). The role of RODM was automation for system and network monitoring and management. So, RODM was some active storage that sat between (A) some target systems in a network or server farm and (B) some managing systems for the target systems.

For this work, RODM permitted building software objects. The objects were in an inheritance hierarchy. Each object had fields for data and methods with code. The intention was that the code would be rules written in KnowledgeTool. The objects in RODM were dynamic in that they could be changed during execution, say, roughly like files and directories in a traditional hierarchical file system.
For keeping track of the names there was use of extendible hashing: Ronald Fagin, Jurg Nievergelt, Nicholas Pippenger, H. Raymond Strong, 'Extendible hashing -- a fast access method for dynamic files', ACM Transactions on Database Systems, ISSN 0362-5915, Volume 4, Issue 3, September 1979, pages 315-344. Cartesian trees were used for dynamic storage management.
RODM had transactional integrity with automatic deadlock detection and resolution.

The role of RODM was to (A) provide a place to write software for system and network automation, (B) have the tools, infrastructure, etc. such code needed readily available, (C) allow writing code using expert system rules, e.g., in KnowledgeTool, (D) present to the writer of rules and code an automatically up-to-date object model of the target systems, and (E) via the transactional integrity, etc., provide coordination among several rules and several systems, much as in a relational database.
The most common approach to monitoring was the use of thresholds on variables, one at a time. Such monitoring is necessarily essentially a continually applied statistical hypothesis test, with the null hypothesis that the system is healthy; a detection of a problem is essentially the same as a rejection of that null hypothesis.

Well, to have a more powerful test, that is, better combinations of rates of false alarms and missed detections, it would be good to have the tests be multi-dimensional, that is, to consider several variables jointly instead of separately and independently. So, we wanted some statistical hypothesis tests that were multi-dimensional.
Next, commonly in statistical hypothesis testing one knows the probability distribution of the data under the null hypothesis, but there have been tests that are distribution-free, that is, that make no assumptions about probability distributions. For the work in system monitoring and management, with multi-dimensional data, distribution-free tests were essential, since in practice finding the distributions of multi-dimensional data was hopeless. So, we needed some statistical hypothesis tests that were both multi-dimensional and distribution-free. So, I invented some.
The intention was that such hypothesis tests would commonly be better means of problem detection than the usual rule left-hand sides, i.e., "When CPU_busy > 90% AND paging_rate > 100 AND ...". This work was later shipped by IBM as part of NetView system monitoring and management automation. An intention was that rules written in KnowledgeTool would run in RODM.
I remember the RETE algorithm and some of its applications. The RETE algorithm was used in a project for some work in artificial intelligence expert systems for server farm and network monitoring and management at the IBM Watson lab in Yorktown Heights.
The work started with the programming language OPS5. Then the project invented a new programming language -- Yorktown Expert System Language 1 (YES/L1). This language was a pre-processor to IBM's programming language PL/I. The pre-processing made use of the DeRemer LALR parsing tools. So, YES/L1 was compiled. And YES/L1 had extensions for artificial intelligence expert systems based on the C. Forgy RETE algorithm and its working memory.

IBM shipped YES/L1 as the IBM Program Product KnowledgeTool.

Applying KnowledgeTool, we did joint work with GM. We published several papers and gave one at a Stanford conference of the AAAI IAAI on Innovative Applications of Artificial Intelligence.
The work of the project continued with Resource Object Data Manager (RODM). RODM was more infrastructure for monitoring and management of server farms and networks. The architectural idea of RODM was:

    managing systems <--> RODM <--> managed systems

where the <--> is for two-way communications. So, the role of RODM in the middle was something like the role of a relational database between an applications program and an operating system's file system, that is, to provide infrastructure, functionality, and tools to make the work easier, better organized, more reliable, etc. Or, the managing systems didn't communicate directly with the managed systems but only through RODM.
So, in particular, RODM had, similar to a relational database, transactional integrity. Also, RODM had a monotone locking protocol with automatic deadlock detection and resolution, with roll-back and restart, etc.

RODM was object-oriented with an inheritance hierarchy. Each RODM object had data fields and also methods. Some of the methods were change methods and would fire (execute) when an associated field changed.

The object hierarchy was not static but dynamic, that is, it could change during execution, that is, during real-time monitoring and management. Or, the hierarchy was dynamic something like an operating system's hierarchical file system, which can change during execution of the operating system.
So, to keep track of the names in the object hierarchy, there was use of extendible hashing: Ronald Fagin, Jurg Nievergelt, Nicholas Pippenger, H. Raymond Strong, 'Extendible hashing -- a fast access method for dynamic files', ACM Transactions on Database Systems, ISSN 0362-5915, Volume 4, Issue 3, September 1979, pages 315-344. The associated dynamic storage management was based on Cartesian trees.
The intention was that the methods could be written as rules in KnowledgeTool.

So, broadly, at the architectural level, RODM presented to the managing systems a model of, abstractions of, the managed systems, with infrastructure, tools, handling of the real-time aspects, coordination (e.g., via transactional integrity), etc.

Later, RODM was shipped as part of IBM's NetView systems management product.
The most common approach to system monitoring was thresholds on variables with values from the managed systems, say, CPU busy, page faults per second, disk storage used, database transactions per second, network data rate.

Essentially necessarily, then, monitoring with such variables is a case of statistical hypothesis testing, with rates of false positives (false alarms, Type I error) and false negatives (missed detections, Type II error). Here, what statistical hypothesis testing calls the null hypothesis is the assumption that the target system being monitored is healthy, and a rejection of the null hypothesis is a detection of something wrong with the target system.

Such a statistical hypothesis test has higher power, and such system monitoring has higher quality, when, for a given rate of false alarms, the rate of missed detections is lower, that is, the detection rate of real problems is higher.
In principle the quality can be improved by processing the variables jointly instead of separately. For this, we need a statistical hypothesis test that is multi-dimensional.

Commonly a statistical hypothesis test knows and exploits the probability distribution of the input data assuming that the system is healthy (the conditional distribution given that the system is healthy). Alas, in the context of system monitoring and management, with multi-dimensional data, in practice knowing this probability distribution is asking too much. So, we need statistical hypothesis tests that are distribution-free, that is, that make no assumptions about probability distributions.

So, for improving the methods in the objects in RODM for system monitoring, I invented a large collection of such statistical hypothesis tests, both multi-dimensional and distribution-free.
So, in the sense of rules, expert systems, and the RETE algorithm, one might change a rule from

    When CPU_busy > 95% AND paging_rate > 100 AND transaction_rate < 10 Then system_thrashing = TRUE;

to just a three-dimensional statistical hypothesis test on the three variables CPU_busy, paging_rate, and transaction_rate.
So, in this way we are doing what has been called behavioral monitoring, where we look for what is unusual. See "IDFAQ: What is behavior based Intrusion Detection?" by Herve Debar, IBM Zurich Research Laboratory, with:

"Therefore, the intrusion detection system might be complete (i.e. all attacks should be caught), but its accuracy is a difficult issue (i.e. you get a lot of false alarms)."
Well, commonly with a statistical hypothesis test, and with the tests I invented, we get to select the false alarm rate in advance, in small steps over a wide range, and get essentially exactly that rate in practice. So, such tests solve the problem of "you get a lot of false alarms". That is, one can set the false alarm rate as low as one pleases and get that rate in practice.
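To make the "select the false alarm rate in advance" property concrete, here is a minimal sketch in Haskell of one standard way to get it (my own illustration, essentially a rank/permutation test; the parent comment's invented tests were not published here, so this is not them): score each multi-dimensional observation with any real-valued function -- here, distance from the centroid of a reference sample collected while the system was healthy -- and raise an alarm when the current score ranks in the top k among the n reference scores plus its own.

    -- Distance of an observation from the centroid of the healthy
    -- reference sample; any real-valued score of a multi-dimensional
    -- observation would work here.
    score :: [[Double]] -> [Double] -> Double
    score reference x =
        sqrt (sum (zipWith (\c v -> (v - c) ^ (2 :: Int)) centroid x))
      where
        n        = fromIntegral (length reference)
        centroid = map (/ n) (foldr1 (zipWith (+)) reference)

    -- Alarm when the current score ranks in the top k of the n reference
    -- scores plus its own.  Under the null hypothesis that the current
    -- observation is exchangeable with the healthy reference sample, the
    -- false alarm rate is exactly k / (n + 1) -- multi-dimensional and
    -- distribution-free, chosen in advance in steps of 1 / (n + 1).
    alarm :: Int -> [[Double]] -> [Double] -> Bool
    alarm k reference x = rank <= k
      where
        s    = score reference x
        rank = 1 + length (filter (> s) (map (score reference) reference))

For example, with 99 healthy reference observations, alarm 1 reference [cpu_busy, paging_rate, transaction_rate] fires on a healthy system with probability exactly 1/100, no matter what the underlying distribution is.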