For one thing, it represents something that's not really a tree: "Support Vector Machine" and "Neural Networks" each appear more than once as leaf nodes. Like any Chinese Encyclopedia classification, it succumbs to the temptation to add "Other" nodes to keep the branch lengths constant. (They could probably think of a name for what neural nets and the SVM have in common -- they've got all day to think about this stuff, because they get paid to teach and do research; it's not like they're harried practitioners.)
At some point I quit distinguishing regression from classification. There was a time when I knew some tricks for classification, and regression seemed mysterious. Once I got over my mental block, it seemed pretty obvious that much of my box of tricks worked for regression too.
Another issue is that it's not a good graphic for the web. You could probably print this out and read it, but you can't take it in at a glance on the web, which defeats the purpose of it being an infographic.
if you wanted to make things clear, you'd be better off picking ONE problem and ONE method applied to that problem. the real challenges in understanding ML or DM are pretty much the same across all methods.
a graphic like that creates the illusion that it's transferred some knowledge when really it hasn't. that's why infographics are dangerous. if HN were my site, i'd have it reject any article containing an image that's > 800 pixels high.
I look forward to your attempt!
I would put the statistics under modelling or summary analysis, not under exploratory analysis. I wouldn't divide methods by sufficient descriptive statistics -- I'd divide them by domain and loss function. And I'd give the artificial neural network a single named node.
Or perhaps I would arrange them by computational complexity.
That said, this is a good summary in that it's an overview, and overviews are always helpful.
The "explaining the past" section (and, I suppose, the regression section) is covered well by most universities' statistics departments, but the "predicting the future" stuff is usually limited to one or two Artificial Intelligence classes in the computer science faculty.
As in "those that belong to the Emperor", etc., as Borges put it ( http://www.multicians.org/thvv/borges-animals.html )
(I suppose, but I do have a pro-Borges bias ;-)
I'm not an expert, but in my opinion the difference is that Classification is sorting a basket of apples and bananas into two separate baskets, while Regression is predicting which fruit will come out of the basket after X apples and Y bananas.
This is what the comment means when it says the division between regression and classification is often artificial (i.e., imposed by people who are concentrating on taxonomy).
Regression is often used to mean regression over an infinite range, but especially since so many classification techniques are just searching for a regular partition of an infinite range, the distinction isn't made much anymore. Whether the rare techniques that work specifically over a discrete finite range still count as regression depends on who you ask -- usually yes.
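A minimal sketch of that point (my own illustration, not from the thread; the data and the 0.5 cutoff are made up): fit an ordinary least-squares regressor to 0/1 labels, then partition its continuous output at a threshold. The "classifier" is literally a regressor plus a partition of an infinite range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D clusters: class 0 near x=0.0, class 1 near x=3.0
X = np.concatenate([rng.normal(0.0, 0.5, 50), rng.normal(3.0, 0.5, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])

# Ordinary least squares on the 0/1 labels: y ~ a*x + b
A = np.stack([X, np.ones_like(X)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

def classify(x):
    """Partition the regressor's continuous output at 0.5."""
    return (a * x + b >= 0.5).astype(int)

print(classify(np.array([0.1, 2.9])))  # one point from each cluster
```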
While every method I can think of differentiates itself at a lower level than regression/classification, and thus has interpretations in each, it's not always feasible or easy to make the "flip". Oftentimes the approximation steps that make the algorithm tractable depend on the objective function.
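To make that concrete (my own sketch, not from the thread; the data, step size, and iteration count are arbitrary choices): the same linear model under squared loss has a one-step closed-form solution, while under hinge loss it does not, so the solver has to become iterative. Swapping the objective changes the algorithm, not just its interpretation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2, 1, 40), rng.normal(2, 1, 40)])
y = np.concatenate([-np.ones(40), np.ones(40)])  # labels in {-1, +1}
A = np.stack([X, np.ones_like(X)], axis=1)

# Objective 1: squared loss -> one linear-algebra step, closed form.
w_sq, *_ = np.linalg.lstsq(A, y, rcond=None)

# Objective 2: hinge loss max(0, 1 - y*f(x)) -> no closed form,
# so we resort to subgradient descent.
w_h = np.zeros(2)
for _ in range(500):
    margins = y * (A @ w_h)
    # subgradient of the mean hinge loss: -y_i * a_i on active examples
    grad = -(A * y[:, None])[margins < 1].sum(axis=0) / len(y)
    w_h -= 0.1 * grad

# Both weight vectors separate the clusters, but only one came "for free".
print(w_sq, w_h)
```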
I really think this is the way complex topics need to be taught. It's so easy to get caught in the weeds and lose track of where you are in the overall picture; an approach like this is extremely helpful.