
How can one manage thousands of IF…THEN…ELSE rules? - jarospisak
http://programmers.stackexchange.com/questions/103659/how-can-one-manage-thousands-of-if-then-else-rules
======
wickedchicken
Mildly unrelated, but I wanted to point out the awesomeness of the K-map[1]
for handling simple minimization problems like these.

Unfortunately, once your dimensions get large it becomes difficult for humans
to visualize the optimizations, which is where Espresso[2][3] comes in handy.

[1] <http://en.wikipedia.org/wiki/Karnaugh_map>

[2]
[http://embedded.eecs.berkeley.edu/pubs/downloads/espresso/in...](http://embedded.eecs.berkeley.edu/pubs/downloads/espresso/index.htm)

[3] ftp://ftp.cs.man.ac.uk/pub/amulet/balsa/other-software/espresso-
ab-1.0.tar.gz

~~~
pjscott
Ooh, espresso. There was also a tool called eqntott for converting boolean
logic expressions in a human-readable form into the truth tables that Espresso
takes as input. The source code was written in an archaic dialect of C, but
here's a resurrected version that should compile on modern compilers:

<http://code.google.com/p/eqntott/>

I haven't touched this code since 2008, so beware of grues, but I remember it
being a really slick system for minimizing boolean logic when you combine
eqntott with espresso. Just the thing for an electrical engineering student
who's sick of doing K-maps, which described me nicely at the time.

~~~
wickedchicken
"This program was originally written at Berkeley in the early 80s"

This phrase gives me the same feeling I'd get from finding an old Thomas
Bangalter track or something.

~~~
anamax
Trivia - some core parts of espresso are/were part of specmarks, which are
used to evaluate computer designs and implementations.

------
_delirium
While I'm not sure thousands of if/then/else rules are actually the best
knowledge representation for his domain (it sounds like he really wants to
learn some model from data, in which case throwing some off-the-shelf machine
learning may be a better fit), the area of "expert systems" may be relevant if
they really are rules extracted directly from some source that need to be
managed, such as a human domain expert.

Expert systems were particularly big in AI in the '80s, and considerably less
hot now, but still widely used in industry. There are a lot of issues that
come up, most of which aren't really purely technical, such as: 1) how you
extract knowledge from people who might know it; and 2) how you validate that
the knowledge you've extracted is actually what they know, for example by
validating that it produces the same decisions that they would make; and 3)
how you allow updates/revisions to the knowledge base over time. Once you have
the rules and some reason to believe that they're any good, actually managing
and applying them can be done through one of several rule engines, such as
Drools or Jess.

The keyword "knowledge engineering" may also turn up relevant info.

~~~
Dn_Ab
I don't think thousands should be taken literally. I don't think he has
actually built it yet - I sort of saw it as an optimistic estimate =)

------
ck2
Is it just me? It's not a good question, it shows lack of core coding
experience.

For example a video game consists of millions of possibilities/results in
movement based on environment - it's not approached as a collection of
thousands of if/then/else statements.

In fact the movement of cows seems very much like video game coding, it needs
a "cow engine".

~~~
kamaal
How do video games solve it?

Do you have any material one can read to learn more about this?

~~~
seanmcdirmid
Often through physics engines. You express a bunch of constraints then let a
physics engine solve them on each time step (and sometimes across multiple
time steps).

Custom (non-physics) solutions involve declarative sets of constraints being
solved by some sort of engine; such solutions often start to resemble physics
engines even if the physical rules even if the constraints are not exactly
physical.

Learning how to use a physics engine, first, then how those physics engines
work, is a good start to understanding these kinds of problems.

Edit: I'm really surprised the stackexchange answers don't mention this.

~~~
flyinRyan
What would be a good place to start learning about what you're describing
here?

~~~
seanmcdirmid
I would start with Erin Catto's Box2D.

<http://box2d.org/>

There are C#, Flash, Java versions if you aren't into C++. Lots of
documentation on the web, but the getting started guide is good enough.

------
Dn_Ab
Such a great question. Want to predict cow movement while taking into
consideration sudden events, food and sun, using thousands of if then rules?
Sounds like the perfect problem for one of the following tree learners that
not only manage themselves but build the model for you, in order of
suitability to the problem* :

Genetic Programming

ADTree

Random Forest

Decision Tree

Most of the work would be in figuring out the best way to gather all the data.

*opinion.

~~~
SeanLuke
All machine learning techniques build models for you.

ADTrees, Random Forests, and Decision trees are all classifiers, so they serve
this guy's needs. Genetic Programming is not a machine learning technique: it
is an optimization method for a very specific task and is quite unsuited for
his purposes.

~~~
Dn_Ab
No, I am certain you are confusing Genetic _Programming_ [1] for Genetic
_Algorithms_ [2]. I am a fan (do a lot with them) of the prior but have never
really been impressed by the latter.

Wikipedia says Genetic Programming is a specialization of genetic algorithms
where the objects are programs. I think this is the only thing I have ever
seen where the specialization turns out to be more general than the parent.

I'd also like to point out that I although all ML learns models, all those I
gave can learn stuff as what amounts to a bunch of IF-THEN statements. Well GP
need not be so limited. And Random Forests lose the trees in the forest
(harder to interpret) but yeah, that's why I formed that particular list.

[1] <http://en.wikipedia.org/wiki/Genetic_programming>

[2] <http://en.wikipedia.org/wiki/Genetic_algorithm>

~~~
SeanLuke
> No, I am certain you are confusing Genetic Programming [1] for Genetic
> Algorithms [2].

Trust me. I am not.

~~~
Dn_Ab
Then you will know that Genetic Programming can be used for classification,
structured learning, symbolic regression and even meta learning [1]. For
classification you can build a program that searches for a program that best
uses the input to predicts classes. I _have_ done this (there are better
heuristics than the old kind that lead to really big dumb programs). Or you
can combine boosting with Genetic Programming or you can use them in Learning
Classifier Systems. All these lead to classification based on genetic
programming.

[1] <http://en.wikipedia.org/wiki/Eurisko>

As an aside, My definition of Machine learning is more inclusive than yours. I
mean if you are going to separate out optimization then I guess you don't
count stochastic gradient descent or Matrix factorization as part of machine
learning? Machine learning is basically the combination of statistics and
optimization where you can work with a lot of data and the output of your
computation is more important than the model.

------
exDM69
I suggested using Prolog in the comments. His problem description was not
quite good enough for me to assess whether or not Prolog will be a good
choice, but at least it's a candidate.

A Prolog program is essentially a list of rules that are written as an
implication. Something like:

    
    
      moves_to(Cow, Location) :-
        hungry(Cow),
        current_location(Cow, OldLoc),
        food_in(OldLoc, OldFood), food_in(Location, Food),
        Food > OldFood.
    

In human language: a cow moves to a location if it's hungry and there's more
food in that location than in current location.

I should probably write a proper answer.

edit: Added a proper answer to the stackoverflow with an almost complete
introductory Prolog example. It's waiting for your upvotes :)

~~~
cartosys
And we're waiting for a link.

~~~
exDM69
There you go: <http://programmers.stackexchange.com/a/165126/65434>

------
lutusp
You really don't want to think of this in terms of IF ... THEN ... ELSE but as
individual binary or multi-path choices. These are most efficiently handled
with hash tables that choose a route through a logic diagram in a way that is
more intuitive than If ... THEN ... ELSE as well as more efficient.

You start by writing a logic flowchart that you understand, then you reduce
the logic flowchart to code, usually automatically. This is how a logic
flowchart looks:

<http://en.wikipedia.org/wiki/File:LampFlowchart.svg>

Here is the article it appears in:

<http://en.wikipedia.org/wiki/Flowchart>

The single most important parts of this process are:

1\. That you understand the logic diagram and see that it meets the program's
requirements.

2\. That there is a way to turn the logic flowchart into code, ideally without
human intervention.

3\. That the rebuild time be short between adding a new logical step and
testing the resulting program.

~~~
JoeAltmaier
Maybe I'm off the mark, but whenever I see a long runon procedure full of if-
then-else, I think "This is a job for a state machine!" With that approach,
you can exhaustively provide actions for every combination of states and
events. You can also immediately identify the code responsible for handling
any state+event. You can also track which state+event transitions have been
tested automatically.

~~~
lutusp
> Maybe I'm off the mark, but whenever I see a long runon procedure full of
> if-then-else, I think "This is a job for a state machine!"

I'm a big fan of state machines, but systems like this are often multitasking
and (for all practical purposes) stateless. OTOH it might be more accurate to
say their state is formally unpredictable.

A system like the one being described would be more like a multi-threaded
critical-path flowchart, where multiple processes are active at a given time.
And at any time a high-level linear programming result might mandate a change
of strategy based on available resources, even though individual processes
continued to follow the tactical flowchart.

So even though a state machine approach looks attractive (as it always does),
it's a matter of deciding how many states and how many levels. Because of the
nature of the original problem and the number of connections with everyday
reality, the fact that it was a state machine might escape the attention of
even a careful observer.

------
joe_the_user
Even with the clue that he wants to "predict how cows move around in any
landscape", his description seems like it wouldn't be enough any definite
design advice.

What do the states of the cows and the landscape look like? What is changed by
the if-then-else results. Does he already have his rules? Is this for a
biology project or a video game, etc.

You could use a finite automaton system, an array of cow-states, a cow DSL or
something else depending on the answers to these and other questions.

[insert joke about "till the cows come home"]

~~~
ShabbyDoo
Yes. The piece missing for me in most answers I've seen is eventing. Does one
pump a series of events/state changes into the CowModel and see what actions
pop-out? Does the model involve interactions with other cows where the actions
of one cow cause actions of others? If so, one must have an event queue to
handle chain reactions. How discrete are the events? Is there a "temp is 89F"
event where, for every N minutes the cow is in that temperature in direct
exposure, it becomes more likely to seek shade? If so, the "it is 89F" event
must be triggered multiple times. And, such a rule requires that the past
states of the cow be retained. I think the if/then/else logic is only part of
the solution to a simulation problem of this sort.

------
gavanwoolery
A finite state machine is actually not a great solution for this particular
problem (in fact, FSMs are often over-used), IMHO. An expert system is a far
better solution (as one commenter mentioned). The problem with a FSM is that
you still must explicitly specify ALL of the nodes (without some complex
algorithm to automate such a task). An expert system, like prolog, only
requires you to state all of the rules, and then it will figure out any query
for you (via propositional logic, backwards chaining, etc). The number of
nodes typically grows exponentially with the number of rules (if the rules are
inter-related), so for a problem of any significant size an expert system
would probably be the method of choice.

------
JustLikeThat
My favorite part is the third comment in on the question:

"How can one manage thousands of IF…THEN…ELSE rules? By developing a drinking
problem"

~~~
indiecore
That solution is generalizable to the entire domain of computer science.

------
tylermauthe
Such a great answer. This is why I love Stack Exchange.

~~~
psingh
agree.

------
Swizec
Wouldn't this best be handled by something like a decision tree?

<http://en.wikipedia.org/wiki/Decision_tree>

It's basically thousands of if-this-then-that statements, but dressed up into
something you can reason about and most importantly train and build
automagically.

------
dave_sullivan
This really sounds like a machine learning problem--why hand engineer all
these rules yourself (which can be error prone in any case) when a program
could do it for you? If you can figure out how to get a large data set
together and come up with a good general representation of the data, you're
half way there.

I'd consider taking a look at recurrent neural networks if you're looking at
your data as a time series or if its more of a static problem that doesn't
take into account the last N things that came before, you might consider tree
based methods or even DBNs if you can get a lot of data together (say
50000-100000 samples)

Potential pitfall: if you're using a NN based approach, you will judge results
based on test performance, but you won't get as much insight into the "rules"
your network has learned.

------
sandGorgon
I have an axiom "any sufficiently complex if-then-else ruleset can be modeled
as a search problem with boosts".

I have modeled a logistics lookup system (assign "best" courier for an
ecommerce setup) using a search solution. The rules were modeled as documents
in a search index and looked up using weights. Fundamentally, what I did was
flatten an if-then-else hierarchy and assign weights based on (roughly) level
of nesting.

The con of this model is obvious - it is at best a heuristic. However, with
clever overrides built in (as a separate search index?), you can get pretty
close to the ideal solution.

The pro of this model is scalability - when your rules are in millions, this
system scales beautifully.

------
instakill
The question is an interesting one but it's futile in its context. You can't
build a prediction engine for cattle movement. Adding factors like weather,
food etc. seems well and dandy but there's hundreds if not thousands of other
factors that are definitely going to be missing and that will render the
prediction events pointless.

Never mind the fact that you can throw in black swan type of events into the
mix or just unknown unknowns you could never think of (maybe a cow has a frog
phobia and panics when it sees the frog and starts a stampede or whatever it
is that cattle do).

~~~
mmcnickle
You can't predict how an _individual_ cow will move, but it's perfectly
acceptable to use models to predict how cattle generally move. Similar to the
way we use models to predict how humans navigate hallways and find paths
across grass.[1] When you have a bulk of interacting particles, it tends to
remove the "personality" of the individual, leaving you with how the group as
a whole will move.

Philip Ball's (former editor of Nature) book _Critical Mass_ [2] is an
excellent read about the interesting effects that happen when you look at
large number of complex interacting actors. He discusses the effects in
traffic, pedestrian models, finance, plant growth etc. I highly recommend it.

[1]<http://www.youtube.com/watch?v=8SmRBTJ-jeU> This is just a video I picked
out quickly, there are much better resources, I just can't find them on my
phone.

[2][http://www.amazon.co.uk/Critical-Mass-Thing-Leads-
Another/dp...](http://www.amazon.co.uk/Critical-Mass-Thing-Leads-
Another/dp/0099457865/)

------
mark_l_watson
I am late to this discussion, but I would like to ad my experiences with
compiling rules to a Rete network. I have done a lot of hacking on the old
Lisp OPS5 code and found it easy to work with, once you read and understood
Charles Forgy's papers. I hacked OPS5 to support multiple data worlds and a
few other enhancements for specific projects.

The more modern Rete based software projects are probably also easy to work
with.

------
xarien
Man, I must be getting old, first thing I thought of when I saw the question
was a FSM implemented with function pointers in C.

------
ynniv
If only these questions appeared on one's résumé...

~~~
sliverstorm
What, are you insinuating that asking about great big if-then-else trees means
you'd never hire that person? Because, what, you've never been a complete
novice programmer before?

Considering the question-asker has a specific problem to solve, I would even
hazard they probably aren't a programmer by trade.

~~~
jimbobimbo
I'd say the fact that person is asking this question is a very good sign -
they have a gut feeling that something's off with IF-THEN spaghetti.

~~~
ynniv
... a very good sign that they're still too green.

When did HN become so touchy-freely? It's important that people start learning
somewhere. If you show any interest in programming, I highly recommend
learning more. However, there comes a time when you must have a certain amount
of experience. This is a profession where important things are made. Even
people who should know better end up building systems that leak our sensitive
personal information, or let others impersonate us.

I didn't say that such a person shouldn't be hired, but I certainly implied
that the original question is a better indicator of their capability than the
always positive keyword soup and "years experience" that many people slather
on their résumé.

If "avoiding IF-THEN spaghetti" is all you require from contributors, fine.
Let's not pretend that this is better than an early high school effort.

------
yabadab
This problem smells like one that should be tackled with a supervised ML
approach.

Even something as simple as a Decision Tree might be enough?

~~~
kos_mos
of course this is a machine learning problem.

------
tlogan
Prolog.

