What is NOT in this document:
- Although most aspects of the algorithms have been implemented and tested in
software, none of the test results are currently included.
- There is no description of how the algorithms can be applied to practical problems.
Missing is a description of how you would convert data from a sensor or database
into a distributed representation suitable for the algorithms.
So until someone has tried it on something, we don't know how it will perform :s
Basically, what problem (problems?) does this algorithm actually solve?
No references in their paper.
This smells like Wolfram's "New Kind of Science" type of work. They are welcome to prove me wrong, and I like AI topics, but the work is isolated from other peer work even though the AI field is not new. So it's kind of suspicious.
However, Hawkins' work is similar to Deep Neural Nets in that both are unsupervised neural nets (unlike perceptrons and back-prop) and both aim to represent data as sparse representations. Also, here's a similar question I answered on Quora: http://www.quora.com/Machine-Learning/Does-Numenta-build-on-...
In addition, Numenta does use CLA's to solve real problems for customers: https://www.groksolutions.com/solutions.html
Finally, Hawkins' book, On Intelligence, has at least inspired researchers like Stanford's Andrew Ng at a philosophical level which he talks about IIRC in his online ML course. https://class.coursera.org/ml/class
"Systematic experiments on three line-drawing datasets have been carried out to better understand HTM peculiarities and to extensively compare it against other well-known pattern recognition approaches. Our results prove the effectiveness of the new algorithms introduced and that HTM, even if still in its infancy, compares favorably with other existing technologies."
If I recall, Numenta's model is said to excel in the case where the data it's being trained on is similar to the kind the human brain tends to process: sensory data with temporally-local "features" (letters on a page, instruments in a song), where the features form a hierarchy (shapes->letters->words->sentences, frequencies->notes->chords->riffs->melodies, etc.)
This is an extremely misleading statement. There is nothing naive about probabilistic graphical models. It's a massive topic with a huge number of different model variations, training methods, and techniques to avoid overfitting or do feature selection.
More probably, you only want a linear classifier, and a discriminative one at that. But if your data is not linearly separable, or has structure, then you'll want something else again.
I didn't mean that Bayes nets are themselves naive--they're extremely good models for many use-cases. In fact, given enough training data they're eventually ideal models for any use-case, since all ML models are ways of approximating Bayes' law, while Bayes nets just solve for Bayes' law directly. However, because they're so generic, they have no embedded structure that lets them efficiently learn/represent/compress data in any particular domain.
When I have a Big Data problem, and I have a BNN library close to hand, I just throw it at a random subsampling and see what happens. This is what I mean by "naive": since it's a generic solution, you can try applying it to a problem before even studying the structure of the problem domain, and you might just detect a signal soon enough that you don't need to go any further.
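To make the "naive probe" workflow concrete, here's a minimal sketch in plain Python. It uses Gaussian naive Bayes as the simplest stand-in for a generic probabilistic model (a real Bayes-net library is far more powerful, but the workflow is the same): draw a random subsample of a large dataset, fit the generic model with zero study of the problem domain, and check a holdout set for signal. The synthetic dataset and all sizes here are illustrative assumptions, not anything from the thread.

```python
import math
import random

random.seed(1)

# Synthetic "Big Data" stand-in (an assumption for illustration):
# two classes whose three feature means differ by one unit.
big = [([random.gauss(c, 1.0) for _ in range(3)], c)
       for c in (0, 1) for _ in range(5000)]

# The naive probe: grab a small random subsample and fit the most
# generic model available, knowing nothing about the domain.
sample = random.sample(big, 200)

def fit_gnb(data):
    """Gaussian naive Bayes: per-class log-prior plus per-feature mean/variance."""
    stats = {}
    for c in (0, 1):
        rows = [x for x, y in data if y == c]
        params = []
        for j in range(3):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            params.append((mu, var))
        stats[c] = (math.log(len(rows) / len(data)), params)
    return stats

def predict(stats, x):
    def log_posterior(c):
        logp, params = stats[c]
        for v, (mu, var) in zip(x, params):
            logp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return logp
    return max((0, 1), key=log_posterior)

model = fit_gnb(sample)
holdout = random.sample(big, 500)
acc = sum(predict(model, x) == y for x, y in holdout) / len(holdout)
print(acc)  # noticeably above 0.5 here: there's a signal worth chasing
```

If holdout accuracy sits well above chance, you know there's signal before you've invested in a domain-specific model; if it hovers near chance, you go study the structure of the problem first.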
The rest of the machine-learning toolbox, then, is basically a set of models you swap in for your BNN when you have one that hews closely to the structure of your data (i.e., a set of screwdrivers for different kinds of screws.) The specialized model will get quicker training and clearer signals for that data, but not for just any data.
Another way to say it: a Bayes-net implementation is to gzip as other ML models are to JPEG, MP3, etc. Bayes nets can learn anything, but only as well as the raw data alone allows, since they have no domain-specific structure to exploit.
People do use linear regression as the same kind of general-purpose tool, as you say--and linear classifier libraries are much easier to write, so you'll find them as "batteries included" in more math packages--but if you don't know whether the data is linearly separable, linear models aren't going to give you even the slightest hint. A BNN is an extremely inefficient, but completely general, probe.
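To illustrate why a linear model gives you no hint on non-linearly-separable data, here's a small self-contained sketch on toy XOR-shaped data. The perceptron and the 1-nearest-neighbour "probe" are my stand-ins (assumptions for illustration, not anything from the thread): the linear model stays near chance, while a generic, structure-free probe finds the signal immediately.

```python
import random

random.seed(0)

# Noisy XOR: four clearly separated clusters, but no single line
# separates the two classes -- the classic linear blind spot.
def make_xor(n):
    pts = []
    for _ in range(n):
        a, b = random.randint(0, 1), random.randint(0, 1)
        pts.append(((a + random.gauss(0, 0.1), b + random.gauss(0, 0.1)), a ^ b))
    return pts

train, test = make_xor(200), make_xor(200)

# Plain perceptron: w.x + bias, with labels mapped to {-1, +1}.
w, bias = [0.0, 0.0], 0.0
for _ in range(50):
    for (x0, x1), y in train:
        t = 1 if y else -1
        if t * (w[0] * x0 + w[1] * x1 + bias) <= 0:
            w[0] += t * x0
            w[1] += t * x1
            bias += t

def linear_predict(x0, x1):
    return 1 if w[0] * x0 + w[1] * x1 + bias > 0 else 0

# 1-nearest-neighbour: a generic probe with no assumed structure at all.
def nn_predict(x0, x1):
    return min(train, key=lambda p: (p[0][0] - x0) ** 2 + (p[0][1] - x1) ** 2)[1]

def accuracy(predict):
    return sum(predict(*x) == y for x, y in test) / len(test)

linear_acc, nn_acc = accuracy(linear_predict), accuracy(nn_predict)
print(linear_acc, nn_acc)  # linear stuck near chance; 1-NN near perfect
```

Any line through XOR data can classify at most three of the four clusters correctly, so the perceptron's accuracy alone can't tell you whether there's no signal or just no *linear* signal; the generic probe answers that question instantly.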
For example, could Numenta look at two similar pages from a PDF and discover the locations of the title, paragraphs, headings, subheadings, etc.? Those PDF files differ in layout, but a human can immediately determine the hierarchy by categorizing the entities on the page using exactly that kind of temporally-local, hierarchical feature.