How to teach a Bayesian spam filter to play chess

npk · on Nov 2, 2007

On another random note, was anyone else dissapointed when they found out that a neural net is a fancy way of drawing a line through some points. i.e. regression.

pixcavator · on Nov 3, 2007

I was mostly relieved. I wasn't insane after all. Some problems are simple, some hard, and some can't be solved. These approaches are very indirect and people may be tempted to use them without first asking the question: can this problem be solved (in principle!) with the data that we have?

pixcavator · on Nov 2, 2007

How about teaching it how to do ADDITION?

This would be a better experiment because (1) it is simpler and faster, (2) the feedback is unambiguous, (3) the ability to add is verifiable.

Essentially you supply it with all sums of all pairs of numbers from 0 to 99 and then see if it can compute 100+100.

zyroth · on Nov 2, 2007

Bayesian classification works a bit like this: You have a set of inputs and a set of targets. By having seen a history of elements of the powerset of inputs and its manually tagged classes, the bayesian classificator learns how to classify new elements of the powerset of inputs.

Thus, a bayesian classifier has to know the classes and the inputs before. It cannot extrapolate from the sets it was trained on, since the bayesian approach does not see numbers as something that some operations are defined on, but only as symbols.

A simple backprop neural network can learn addition, though.

pixcavator · on Nov 2, 2007

Are you saying that a neural network can learn addition symbolically? Can you recommend a book or site to read about this?

dpapathanasiou · on Nov 2, 2007

Can you recommend a book or site to read about this?

David MacKay wrote a good intro book on Bayesian, neural networks, and related topics: http://www.inference.phy.cam.ac.uk/mackay/itila/

jsackmann · on Nov 2, 2007

Whether you're interested in the book or not, don't miss this page:

http://www.inference.phy.cam.ac.uk/mackay/itila/Potter.html

Should you buy the MacKay text or one by J.K. Rowling?

pixcavator · on Nov 2, 2007

I've looked at the chapter on neurall networks. It does not seem to address operations with symbols, only numbers...

hhm · on Nov 2, 2007

You can always encode symbols as numbers, and otherwise. That's what computers do, that's what you do when you do a sum by yourself, and that's what such a neural network should do too. (You should encode the symbols to a binary input, then decode the neural network output to symbols).

pixcavator · on Nov 3, 2007

So given a function with f('1','1')='2', the computer will figure out that f('1','2')='3', right?

hhm · on Nov 3, 2007

Yes, that's what I'm talking about.

pixcavator · on Nov 3, 2007

aswanson · on Nov 3, 2007

hhm is right. You would end up with a network with connection weights of 1. As long as your sigmoidal transfer functions are biased such that they have no multiplying effect your output would be an addition of the inputs.

pixcavator · on Nov 3, 2007

What if the machine does not know that it is dealing with numbers? It's all symbolic: '1'+'1'='2', etc.

machine · on Nov 3, 2007

What you are talking about is more along the lines of inductive logic programming, where the goal is to induce general symbolic rules from specific examples. Even with ILP I'm not sure how exactly you could phrase things so that the machine could learn "addition" though. An ILP system could probably learn something like a + b = c implies b + a = c (i.e. commutativity) and other properties of addition from examples.

pixcavator · on Nov 3, 2007

If it can learn these properties, it may be even better! Then it can do all computations. So, can it learn commutativity, and other properties? Would it be smart enough to look for them in the first place?

machine · on Nov 3, 2007

I don't know a lot about ILP algorithms, but my understanding of them is that they basically search the space of logical rules for rules that "explain" the examples they are given. More formally, they look for rules from which you could then prove the given examples. So in an ideal world you could give an ILP system a bunch of formulas and it would give you back the Peano axioms or something like that I'm not sure if state of the art ILP systems are good enough to do that in practice or not.

aswanson · on Nov 3, 2007

The neural network doesn't "know" that it is dealing with anything, just as you don't "know" the function your body uses to expand and contract your heart. You could be feeding it stock quotes, rgb pixel values, your daily weight, anything. If the information can reduced to numeric values (it can) the network will determine the relationship in the form of a function.

pixcavator · on Nov 3, 2007

If the input is numbers then what happens is simply extrapolation (or regression etc). If the input is symbols, then what? The fact that you can associate numbers to the symbols does not help. The extrapolation will be meaningless. Unless you happen to have the correspondence '1'->1, '2'->2, etc,

aswanson · on Nov 3, 2007

How are you defining symbols? How are you defining meaningless, or meaning at all for that matter?

pixcavator · on Nov 3, 2007

Suppose you have a function f of two variables and all you know about it is that f(x,y)=x+y for a few values of x,y. With this data, the computer extrapolates linearly and finds f(x,y) for _all_ values of x,y. The computer just "learnt" how to add! If all you have is symbols, however, you have to have numbers associated with those symbols. But you can't expect 1 correspond to '1' and 2 to '2'. As a result, your function f - and its extrapolation - will have nothing to do with addition (f won't even be linear).

aswanson · on Nov 3, 2007

So am I correct in assuming that you are looking for a machine that can do the mapping of a symbol into a form that the function can learn from? Your visual and auditory systems do this, and since they are physical systems this mapping is computable.

By the way, for the addition example , the neural network needs only one example, e.g. 1+1=2, applied repeatedly to discover the addition of any two inputs. There is no extrapolation between examples needed in this case.

pixcavator · on Nov 3, 2007

"Extrapolation is the process of constructing new data points outside a discrete set of known data points."

aswanson · on Nov 3, 2007

I have rethought what you have said, and I think I am now closer to understanding what your question is. You are posing the question of what to do with an input, not the operation itself. For instance, if I feed the network an image of 5 and 3, with the output being the image 8, the network would not perform addition but some other functional mapping. Is that the feature you are suggesting that it should 'understand'?

pixcavator · on Nov 4, 2007

aswanson · on Nov 3, 2007

Right. Confusing extrapolation with interpolation.

zyroth · on Nov 2, 2007

No, I'm saying that a neural network can learn addition.

See, a bayesian cannot really understand that there are relations between two numbers (like "is bigger than" or "is the following number of"), but a neural network can, since addition is part of a NN.

My personal recommendation on machine learning is 'Pattern Recognition and Machine Learning' by Chris Bishop. But you definately do need a solid mathematical background for that.

pixcavator · on Nov 3, 2007

Where would the idea of "is bigger than" or "is the following number of" come from if not from the person who creates the network?

zyroth · on Nov 3, 2007

Training examples.

pixcavator · on Nov 3, 2007

Computers can form concepts, really?

zyroth · on Nov 3, 2007

If you want them to learn a specific concept that we know, they can learn it, yes.

leecs1 · on Aug 5, 2008

If chess playing is a matter of classification (ignoring the complexity of input vector encoding and computational feasibility), then Bayesian filter can surely achieve this goal.

zandorg · on Nov 2, 2007

On a random note, I was first introduced to Paul Graham's writings by his article on the Bayesian spam filter he wrote.