* Instance-based methods can adapt to any kind of pattern if you have enough data. This is a well-known result in machine learning -- the interesting question is simply how much data you need.
* It's quite remarkable how well RBF SVMs do in almost all cases. They even outperformed a 2nd-degree polynomial kernel SVM on data generated from a 2nd-degree polynomial!
* Logistic regression is pretty terrible for anything except "easy" data -- i.e., simple, linearly separable data.
* Random forests are sometimes ok, but tend to prefer axis-aligned decision boundaries (at least in his formulation).
* Meta-point: all the tests shown are with "clean" data -- i.e., there are no "wrong" training examples. This is unrealistic, and in practice makes a HUGE difference for some of these methods. E.g., a lot of the rule-based methods get demolished by even a little bit of wrong data. In contrast, SVMs have slack variables that can tolerate some amount of noise, and would probably shine even more on such data.
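The slack mechanism mentioned above is easy to sketch: in a soft-margin SVM, each training point gets a slack of max(0, 1 - y*f(x)), and the objective pays C times the total slack rather than refusing to fit. The scores below are made up for illustration:

```python
def hinge_slack(y, score):
    """Slack xi = max(0, 1 - y*score) for a label y in {-1, +1}
    and a decision-function score f(x)."""
    return max(0.0, 1.0 - y * score)

# Correctly classified, well outside the margin: zero slack.
print(hinge_slack(+1, 2.5))   # 0.0
# Correct side but inside the margin: small slack.
print(hinge_slack(+1, 0.4))   # 0.6
# A mislabeled ("wrong") training example: large but finite slack --
# the C * sum(slack) penalty absorbs it instead of breaking the fit.
print(hinge_slack(+1, -3.0))  # 4.0
```

This is why a few noisy labels cost an SVM only a bounded penalty, whereas a hard rule that must be consistent with every example can be wrecked by a single bad one.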
I think (not a ML expert by any means) that RBF should probably outperform most fixed-degree polynomial kernels, but it does so at the cost of introducing more structural risk into the model.
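For reference, here are the two kernels being compared, in a minimal pure-Python sketch (the gamma and degree values are arbitrary choices for illustration). The intuition behind the structural-risk point: a degree-d polynomial kernel corresponds to a fixed, finite-dimensional feature space, while the Gaussian RBF kernel corresponds to an infinite-dimensional one, i.e., a richer hypothesis class:

```python
import math

def poly_kernel(x, z, degree=2, c=1.0):
    """Fixed-degree polynomial kernel: (x . z + c) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 0.0], [0.0, 1.0]
print(poly_kernel(x, z))  # (0 + 1)^2 = 1.0
print(rbf_kernel(x, z))   # exp(-0.5 * 2) = exp(-1) ~= 0.368
```

In practice the extra capacity is tamed by tuning gamma (and C) on held-out data, which is part of why SVMs "take longer to tune."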
What do the cognoscenti recommend for doing automatic text classification/categorization? I have been looking at spam filters, and they're mostly boolean-type predicates that return Spam/NotSpam results along with a confidence number. I want to be able to do the same for a large number of categories.
If you have a lot of training data and computational power, an SVM will almost always outperform naive bayes, but it takes longer to train and tune. Try naive bayes first.
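A multinomial naive Bayes classifier also generalizes directly from Spam/NotSpam to many categories, and fits in a few dozen lines of plain Python. A minimal sketch, with add-one smoothing and a tiny made-up training set (the tokenizer and data are purely illustrative):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes text classifier with Laplace smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> #documents
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_lp = None, float("-inf")
        for cat, counts in self.word_counts.items():
            # log prior + sum of smoothed log likelihoods
            lp = math.log(self.doc_counts[cat] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in words:
                lp += math.log((counts[w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = cat, lp
        return best

nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("limited offer buy cheap", "spam")
nb.train("meeting agenda for tuesday", "work")
nb.train("lunch with the team tuesday", "work")
print(nb.classify("buy cheap pills"))        # spam
print(nb.classify("agenda for the meeting")) # work
```

The per-category log score also doubles as a (rough) confidence number, which matches the spam-filter behavior described above.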