
The Rete Matching Algorithm (2002) - ohaikbai
http://www.drdobbs.com/architecture-and-design/the-rete-matching-algorithm/184405218
======
saosebastiao
Unmentioned in the article is the fact that the Rete algorithm, because of how
it breaks down and deduplicates predicates, has roughly O(1) scaling behavior
with respect to the number of rules in the system, with the trade-off of
needing sufficient memory for the objects to be matched and the predicate
network. It is completely fascinating to me that a rule system with one
million rules could execute in about the same time as one with a single rule.
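
To make the sharing concrete, here is a minimal sketch in Python (my own
illustration, not Forgy's actual network, which also has beta memories and
join nodes): identical conditions across rules collapse into a single alpha
node, so the per-fact matching cost tracks the number of distinct conditions,
not the number of rules.

    from collections import defaultdict

    class AlphaNetwork:
        """Toy alpha network: one node per distinct condition."""

        def __init__(self):
            # (attribute, predicate) -> set of rule names sharing the node
            self.nodes = defaultdict(set)

        def add_rule(self, name, conditions):
            for cond in conditions:
                self.nodes[cond].add(name)  # deduplicated: shared node

        def match(self, fact):
            # Work per fact tracks the number of distinct conditions,
            # however many rules reference them.
            hits = defaultdict(set)
            for (attr, pred), rules in self.nodes.items():
                if attr in fact and pred(fact[attr]):
                    for rule in rules:
                        hits[rule].add(attr)
            return hits

    net = AlphaNetwork()
    cpu_hot = ("cpu_busy", lambda v: v > 90)
    for i in range(1000):            # a thousand rules, one shared condition
        net.add_rule("rule_%d" % i, [cpu_hot])
    print(len(net.nodes))            # 1: one node, not one per rule
    print(len(net.match({"cpu_busy": 95})))  # all 1000 rules activated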

I've become mildly obsessed with the potential applications of that behavior,
and one of my favorite ideas (of which I'm mostly unqualified to implement) is
a compiler super-optimizer, storing the rewriting rules as individual rules in
the rule engine. This would also allow for adding interesting optimizations
that might be purposefully omitted from existing optimizing compilers due to
their infrequent or expensive matching. It would also allow for optimizations
_anywhere_ in the compiler pipeline without having to worry much about
modularity, which could open up the possibility of macro-level optimizations
at the AST level (for example, rewriting an array of objects as an object of
arrays to speed up folds and maps over the array), or the machine code level
(taking advantage of small differences in instruction sets and other on-die
resources).
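
As a toy illustration of what I mean (entirely hypothetical, and in Python
rather than anything a real compiler uses), each rewrite rule is just a
pattern test plus a replacement, the kind of unit a rule engine could index
and deduplicate:

    # Expressions as nested tuples; a rule is a function returning a
    # rewritten expression, or None if its pattern does not match.

    def mul_pow2_to_shift(expr):
        # ("mul", e, c) with c a positive power of two -> ("shl", e, log2(c))
        if (isinstance(expr, tuple) and expr[0] == "mul"
                and isinstance(expr[2], int) and expr[2] > 0
                and expr[2] & (expr[2] - 1) == 0):
            return ("shl", expr[1], expr[2].bit_length() - 1)
        return None

    RULES = [mul_pow2_to_shift]

    def rewrite(expr):
        # One bottom-up pass, applying the first rule that matches.
        if isinstance(expr, tuple):
            expr = (expr[0],) + tuple(rewrite(e) for e in expr[1:])
        for rule in RULES:
            out = rule(expr)
            if out is not None:
                return out
        return expr

    print(rewrite(("mul", ("var", "x"), 8)))  # ('shl', ('var', 'x'), 3)

The naive loop over RULES is exactly what Rete avoids: the engine would
compile the patterns into a shared network instead of trying each rule in
turn.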

That being said, I'm pretty sure the existing Rete implementations are poorly
suited to matching on subtrees as would probably be required for this type of
optimization, and the memory requirements would mean compilation would likely
need to be done on servers. Tradeoffs aplenty, but I would still love to see
it happen.

~~~
hendler
RETE is often used as the inference engine for RDF/OWL/SPARQL servers. (I
believe Sesame/SAIL was one of the first in the early 2000s using Jena [1].)
There are many graph databases now, and some have added RDF/OWL and inference
support.

[1] -
[https://jena.apache.org/documentation/inference/](https://jena.apache.org/documentation/inference/)

Related
[https://en.wikipedia.org/wiki/Semantic_reasoner](https://en.wikipedia.org/wiki/Semantic_reasoner)

Some more background
[http://www.stat.ucla.edu/~tukw/waim05.pdf](http://www.stat.ucla.edu/~tukw/waim05.pdf)

------
mark_l_watson
I ported OPS5 to both Xerox 1108 Lisp Machines and the Macintosh in 1984. I
did this for internal use, but my company also sold these ports.

OPS5 was very hackable. For one project I modified the Rete algorithm to
support multiple data worlds. Fun times!

~~~
dandrews
Heh. I also ported Forgy's VPS5 to David Betz's XLISP around that same time,
posting it to BiX (anyone remember BiX?). Ran a handful of guess-the-animal
toys on it then turned my attention elsewhere.

As you say: fun times!

------
nickpsecurity
I remember playing with this and other techniques in the past. Gensym's G2 was
one of the most successful commercial implementations back then. I did a
knockoff cuz I wasn't going to pay them. I ran into the same problem others
did: maintaining and especially debugging rules. It got really difficult as
the rulebase grew. It was like programming with just if-then rules.

I think our modern languages with fast interpreters or compilers are a better
route. The O(1) point saosebastiao brought up is a nice benefit, and it does
bring clarity, as another commenter said. So, if we use it, it's probably best
to use it selectively as a DSL in a LISP or ML language.

Note: The author's name was a surprise, too. I didn't realize Bruce wrote on
that topic.

------
rch
CLIPS is a useful implementation written in C, that is explicitly distributed
as public domain software:

[https://en.m.wikipedia.org/wiki/CLIPS](https://en.m.wikipedia.org/wiki/CLIPS)

------
clairity
those early 2000s must have been the very brief golden age of rules
engines/expert systems... but coincidentally, i was just recently thinking
about doing some research on the current state of rules engines to see if
there might be a viable option today for matching candidates to jobs.

my very first consulting project (yes, i had absolutely no experience), around
that time, was implementing blaze advisor (now owned by fico:
[http://www.fico.com/en/products/fico-blaze-advisor-decision-rules-management-system](http://www.fico.com/en/products/fico-blaze-advisor-decision-rules-management-system))
to migrate a mainframe-based automated life underwriting system to a j2ee
client/server system.

performance was a major concern, so i read Forgy's 1982 paper about RETE
(which, frankly, i didn't quite understand), but the key takeaway (for me at
least) was that all the hard work happens during compilation, so runtime
performance shouldn't be an issue. from what i dimly remember, the algorithm
iterates through the rule set, building something akin to a decision tree,
until the tree reaches a stable state. if it never reaches stability, it
throws a compile error. once the tree is built, matching is very fast.

the holy grail was that "business analysts" (remember when that title was in
vogue? =) could write the rules, saving you months of precious time and
thousands of dollars by not paying programmers (it did not turn out that way
of course).

~~~
graycat
> for matching candidates to jobs.

A lot is known about matching. Much of the work boils down to min-cost network
flows on a network with arcs that have costs per unit flow and integer
capacities. The problem, then, is integer linear programming, but in this case
the old linear programming simplex algorithm returns integer results at no
extra effort. A nice modification is due to W. Cunningham, one of my profs,
later Chair of Waterloo's Department of Combinatorics and Optimization.
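
If it helps, here is a minimal Python sketch (my framing, not graycat's
formulation) of the assignment version of that matching problem; scipy's
linear_sum_assignment solves it exactly, and the integer answer at no extra
effort reflects the network structure:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # cost[i][j]: cost of assigning candidate i to job j (made-up numbers)
    cost = np.array([
        [4.0, 1.0, 3.0],
        [2.0, 0.0, 5.0],
        [3.0, 2.0, 2.0],
    ])

    rows, cols = linear_sum_assignment(cost)
    for i, j in zip(rows, cols):
        print("candidate %d -> job %d (cost %.1f)" % (i, j, cost[i, j]))
    print("total cost:", cost[rows, cols].sum())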

What matching problem do you have in mind?

~~~
clairity
are you talking about codifying the matching rules as cost functions and then
optimizing that as a linear system of equations? that's something i remember
doing in college, but it's been a while. =) are there any existing software
tools out there that help in that regard? i'm not sure it fits my use case,
but it's good to explore.

i'm working on a platform that helps hourly workers find jobs. today that
matching is fairly straightforward, but it will get more complicated over time
as we develop better heuristics for matching candidates. it would be nice to
be declarative about the matching rules and let the system figure out how to
do the matching.

~~~
graycat
I used to work on a project that was trying to program using rules. I gave up
on rules.

Sorry about that: I just couldn't see rules as an effective way to specify a
problem or get a solution.

------
hukola
I've been using products that have a RETE implementation for a while. While
learning scala/akka, I decided to give it a try and build a production rules
system myself. For a basic implementation, see
[https://github.com/bridgeworks-nl/scala-rete](https://github.com/bridgeworks-nl/scala-rete)

------
dm3
For those interested - there's a Clojure RETE implementation:

[https://github.com/rururu/rete4frames](https://github.com/rururu/rete4frames)

------
mml
If interested, there's a js/node rete called "nools"

------
graycat
At IBM's Watson lab, our team did some work in OPS5 with the RETE algorithm.
The resulting program was called an _expert system_.

The team then continued with a new programming language, YES/L1. Using
DeRemer's LALR work in parsing, YES/L1 was a preprocessor to IBM's language
PL/I. The language had extensions for expert system _rules_, and those were
based on the RETE algorithm.

IBM made YES/L1 an IBM Program Product, its highest quality software product
category, under the name KnowledgeTool.

Our team worked with GM to make a deployment. We wrote several papers and gave
one at an AAAI IAAI conference at Stanford.

The work continued with Resource Object Data Manager (RODM). The role of RODM
was automation for system and network monitoring and management.

So, RODM was some _active_ storage that sat between (A) some _target_ systems
in a network or server farm and (B) some _managing_ systems for the target
systems.

For this work, RODM permitted building software _objects_. The objects were in
an inheritance hierarchy. Each object had fields for data and methods with
code. The intention was that the code would be rules written in KnowledgeTool.

The objects in RODM were _dynamic_ in that they could be changed, say, roughly
like files and directories in a traditional hierarchical file system, during
execution. There was use of

Ronald Fagin, Jurg Nievergelt, Nicholas Pippenger, H. Raymond Strong,
'Extendible hashing—a fast access method for dynamic files', "ACM Transactions
on Database Systems", ISSN 0362-5915, Volume 4, Issue 3, September 1979,
Pages: 315 - 344.

for keeping track of the names, and Cartesian trees for dynamic storage
management.

RODM had _transactional integrity_ with automatic deadlock detection and
resolution.

The role of RODM was to (A) have a place to write software for system and
network automation, (B) have tools, infrastructure, etc. such code needed
readily available, (C) be able to write code using expert system rules, e.g.,
in KnowledgeTool, (D) present to the writer of rules and code an automatically
up-to-date object _model_ of the target systems, and (E) via the transactional
integrity, etc., have _coordination_ among several rules and several systems,
much as in a relational database.

The most common approach to monitoring was the use of _thresholds_ on
variables, one at a time.

Such monitoring is essentially a continually applied statistical _hypothesis
test_ with the _null hypothesis_ that the system is healthy; a _detection_ of
a problem is then essentially the same as a _rejection_ of that null
hypothesis.

Well, to have a more _powerful_ test, that is, better combinations of rates of
false alarms and missed detections, it would be good to have the tests be
_multi-dimensional_, that is, to consider several variables _jointly_ instead
of separately and independently. So, we wanted some statistical hypothesis
tests that were multi-dimensional.

Next, commonly in statistical hypothesis testing, one knows the probability
distribution of the data under the null hypothesis, but there have been tests
that are _distribution-free_, that is, that make no assumptions about
probability distributions. For the work in system monitoring and management,
with multi-dimensional data, _distribution-free_ tests were essential, since
in practice finding the distributions of multi-dimensional data was hopeless.

So, we needed some statistical hypothesis tests that were both
multi-dimensional and distribution-free. So, we invented some.

The intention was that such hypothesis tests would commonly be better means of
problem detection than the usual rule _left hand sides_, i.e.,

    When CPU_busy > 90% AND
    paging_rate > 100 AND

later shipped by IBM as part of NetView system monitoring and management
automation.

An intention was that rules written in KnowledgeTool would run in RODM.

~~~
graycat
This post went up, due to a system error of some kind, before it was
completed. The completed post is now also in this thread at
[https://news.ycombinator.com/item?id=11366724](https://news.ycombinator.com/item?id=11366724)

------
qznc
I still believe it might be a good idea for instruction selection, if some
compiler researcher needs an idea. ;)

------
graycat
I remember the RETE algorithm and some of its applications.

The RETE algorithm was used in a project for some work in artificial
intelligence expert systems for server farm and network monitoring and
management at the IBM Watson lab in Yorktown Heights.

The work started with the programming language OPS5.

Then the project invented a new programming language, Yorktown Expert System
Language 1 (YES/L1). This language was a pre-processor to IBM's programming
language PL/I. The pre-processing made use of the DeRemer LALR parsing tools.
So, YES/L1 was compiled.

And YES/L1 had extensions for artificial intelligence expert systems based on
the C. Forgy RETE algorithm and its _working memory_.

IBM shipped YES/L1 as the IBM Program Product KnowledgeTool.

Applying KnowledgeTool, we did joint work with GM.

We published several papers and gave one at a Stanford conference of the AAAI
IAAI on Innovative Applications of Artificial Intelligence.

The work of the project continued with Resource Object Data Manager (RODM).
RODM was more _infrastructure_ for monitoring and management of server farms
and networks.

The _architectural_ idea of RODM was:

managing systems <--> RODM <--> managed systems

where the <--> denotes two-way communication.

So, the role of RODM _in the middle_ was something like the role of relational
database _between_ an applications program and a computer operating system
file system, that is, to provide infrastructure, functionality, and tools to
make the work easier, better organized, more reliable, etc.

Or, the managing systems didn't communicate directly with the managed systems
but only through RODM.

So, in particular, RODM had, similar to relational database, _transactional
integrity_. Also, RODM had a monotone locking protocol with automatic deadlock
detection and resolution, with roll-back and restart, etc.

RODM was _object-oriented_ with an inheritance hierarchy. Each RODM object had
data fields and also methods. Some of the methods were _change methods_ and
would _fire_ (execute) when an associated field changed.
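
A hypothetical sketch of that change-method mechanism in Python (names
invented for illustration; RODM itself was IBM mainframe software):

    class RodmObject:
        """Toy object whose field writes fire registered methods."""

        def __init__(self):
            self._fields = {}
            self._change_methods = {}  # field name -> list of callbacks

        def on_change(self, field, method):
            self._change_methods.setdefault(field, []).append(method)

        def set(self, field, value):
            self._fields[field] = value
            for method in self._change_methods.get(field, []):
                method(self, value)  # "fire" the change method

    def alert_if_hot(obj, value):
        if value > 95:
            print("alert: cpu_busy =", value)

    node = RodmObject()
    node.on_change("cpu_busy", alert_if_hot)
    node.set("cpu_busy", 97)  # prints: alert: cpu_busy = 97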

The object hierarchy was not _static_ but _dynamic_, that is, it could change
during execution, that is, during real-time monitoring and management. Or, the
hierarchy was _dynamic_, something like an operating system hierarchical file
system that can change during execution of the operating system.

So, to keep track of the names in the object hierarchy, there was use of

Ronald Fagin, Jurg Nievergelt, Nicholas Pippenger, H. Raymond Strong,
'Extendible hashing—a fast access method for dynamic files', "ACM Transactions
on Database Systems", ISSN 0362-5915, Volume 4, Issue 3, September 1979,
Pages: 315 - 344.

The associated dynamic storage management was based on Cartesian trees.

The intention was that the methods could be written as rules in KnowledgeTool.

So, broadly, at the architectural level, RODM presented to the managing
systems a _model_ of, _abstractions_ of, the managed systems with
infrastructure, tools, handling the real-time aspects, coordination, e.g., via
transactional integrity, etc.

Later RODM was shipped as part of IBM's NetView systems management product.

The most common approach to system monitoring was _thresholds_ on variables
with values from the managed systems, say, CPU busy, page faults per second,
disk storage used, database transactions per second, network data rate.

Then, essentially of necessity, monitoring with such variables is a case of
statistical hypothesis testing, with rates of false positives (false alarms,
Type I error) and false negatives (missed detections, Type II error).

Here what statistical hypothesis testing calls the _null hypothesis_ is the
assumption that the target system being monitored is _healthy_, and when the
hypothesis test _rejects the null hypothesis_, that is a _detection_ of
something wrong with the target system.

Such a statistical hypothesis test has higher _power_, and such system
monitoring has higher _quality_, when, for a given rate of false alarms, the
rate of missed detections is lower, that is, when the detection rate of real
problems is higher.

In principle the quality can be improved by processing the variables _jointly_
instead of separately. For this, we need a statistical hypothesis test that is
_multi-dimensional_. Commonly a statistical hypothesis test knows and exploits
the probability distribution of the input data assuming that the system is
healthy (the _conditional distribution_ given that the system is healthy).
Alas, in the context of system monitoring and management, with
multi-dimensional data, in practice, knowing this probability distribution is
asking too much. So, we need statistical hypothesis tests that are
_distribution-free_, that is, that make no assumptions about probability
distributions.

So, for improving the methods in the objects in RODM for system monitoring, I
invented a large collection of such statistical hypothesis tests that were
both multi-dimensional and distribution-free.

So, in the sense of _rules_, expert systems, and the RETE algorithm, one might
change a rule from

    When CPU_busy > 95% AND
    paging_rate > 100 AND
    transaction_rate < 10
    Then system_thrashing = TRUE;

to just a three-dimensional statistical hypothesis test on the three variables
CPU_busy, paging_rate, and transaction_rate.
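
Here is one way such a test can work, a sketch of my own (graycat's invented
tests are not public, so this only illustrates the multi-dimensional,
distribution-free idea): score a new observation by its distance to healthy
history and calibrate the alarm threshold on held-out healthy data, so the
false alarm rate is approximately the chosen value without any distributional
assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated healthy history: (CPU_busy, paging_rate, transaction_rate)
    healthy = np.column_stack([
        rng.uniform(20, 80, 2000),    # CPU busy, percent
        rng.uniform(0, 60, 2000),     # pages per second
        rng.uniform(50, 200, 2000),   # transactions per second
    ])
    train, calib = healthy[:1500], healthy[1500:]

    def score(x, data):
        # Nearest-neighbor distance: large = unlike anything healthy.
        return np.sqrt(((data - x) ** 2).sum(axis=1)).min()

    # Calibrate on healthy data the score never saw, so that about
    # alpha of healthy observations would exceed the threshold.
    alpha = 0.01
    scores = np.array([score(x, train) for x in calib])
    threshold = np.quantile(scores, 1 - alpha)

    def is_healthy(x):
        # Distribution-free test of the null hypothesis "healthy".
        return score(x, train) <= threshold

    print(is_healthy(np.array([55.0, 30.0, 120.0])))  # True: typical
    print(is_healthy(np.array([98.0, 300.0, 3.0])))   # False: thrashing-like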

So, in this way we are doing what has been called _behavioral monitoring_,
where we look for what is unusual.

At

[http://www.sans.org/resources/idfaq/behavior_based.php](http://www.sans.org/resources/idfaq/behavior_based.php)

is

"IDFAQ: What is behavior based Intrusion Detection?"

by Herve Debar, IBM Zurich Research Laboratory, which includes:

"Therefore, the intrusion detection system might be complete (i.e. all attacks
should be caught), but its accuracy is a difficult issue (i.e. you get a lot
of false alarms)."

Well, commonly with a statistical hypothesis test, and with the tests I
invented, we get to select the false alarm rate in advance, in small steps
over a wide range, and get that rate essentially exactly in practice. So, such
tests solve the problem of

"you get a lot of false alarms".

That is, one can set the false alarm rate as low as one pleases and get that
rate in practice.

------
blohard
Holy 1980s era AI, batman!

