
Machine Learning, Big Data, Data Mining, Statistics, Probability FAQ - disgruntledphd2
http://wmbriggs.com/blog/?p=6465
======
tlarkworthy
Um, this is total garbage.

What’s the difference between machine learning, deep learning, big data,
statistics, decision & risk analysis, probability, fuzzy logic, and all the
rest?

Not _NONE_. Machine learning is the super discipline, an ensemble of
techniques. Deep learning is a specific type of machine learning, focusing on
higher order statistics. Big data is a term from business, not academia, but it
deals with the unique challenges of machine learning on very big data sets.
Fuzzy logic was the fashionable way of dealing with uncertainty before
Bayesian logic (a particular flavour of probability) came to the front.
Probability and fuzzy logic should be considered notation.

So what’s the difference between probability and logic?

_Not_ "not much". There are lots of types of logic, and Bayesian probability
can be considered one of them. But when most people think of logic, they
think of propositional logic, boolean algebra, predicate calculus, formal
methods etc. That kind of logic is a billion miles away from probability in
culture and the kind of things you can conclude from its application. Logic is
rigid truths, undecidability. Probability is a representation of uncertainty
and half truths, from which you hope to extract the most likely explanation
from partial or noisy data. Formal logic is next to useless on noisy data.
Probability is a good fit for noisy problems (like perception).
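
To make the noisy-data point concrete, here is a minimal sketch of Bayes' rule
applied to a hypothetical sensor. Everything in it is an assumption made up for
illustration: a door that is either open or closed, a sensor that reports the
right state 80% of the time, and a 50/50 prior. A strict logical reading of the
same data would hit a contradiction at the first disagreeing reading; the
probabilistic version just becomes a bit less sure.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | evidence) from P(H) and the two likelihoods of the evidence."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1.0 - prior) * likelihood_if_false)


def posterior_open(readings, accuracy=0.8, prior=0.5):
    """Posterior probability that the door is open after a list of noisy readings.

    readings: booleans, True meaning the sensor said "open".
    accuracy: assumed probability that a single reading is correct.
    """
    p = prior
    for says_open in readings:
        if says_open:
            # Reading agrees with "open": likely if open, unlikely if closed.
            p = bayes_update(p, accuracy, 1.0 - accuracy)
        else:
            # Reading disagrees with "open": unlikely if open, likely if closed.
            p = bayes_update(p, 1.0 - accuracy, accuracy)
    return p


# Four "open" readings and one contradictory "closed" reading.
readings = [True, True, False, True, True]
p_open = posterior_open(readings)
print(p_open)
```

Each agreeing reading multiplies the odds of "open" by 4 and the disagreeing
one divides them by 4, so the net odds here are 4^3 = 64 to 1 in favour of
"open".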

Logic is classic AI. Probability is Modern AI. There are attempts at
unification (e.g. Bayesian Logic networks, but that's cutting edge stuff).
Saying they are basically the same is lunacy.

~~~
disgruntledphd2
I think that the author was attempting to note that the similarities between
these techniques/models/tools are far greater than the differences.

For me, machine learning is a particular model set atop the edifice of
statistics mixed with coding. I don't agree that big data poses unique
challenges. I was reading a psychometrics book from the late eighties where a
lot of the material was familiar from the big data stuff we read about lately.

Logic is a way of deriving conclusions from propositions/data. Probability is
a method for introducing uncertainty into logical reasoning (IMO, clearly). I
don't understand your point about probability in culture; can you explain? I
would suggest that the last statement in your second-to-last paragraph rather
proves the author's point, but as my perception is noisy, I can't be sure.

And seriously, both probability and logic are related to natural intelligence
(or whatever we want to call ourselves) far more than AI. In fact, the reason
they have been applied to AI is that they are useful tools for humans. When
you say Bayesian logic networks, do you mean Bayesian belief networks (as put
forth by Pearl et al.) or something else?

~~~
tlarkworthy
"I don't agree that big data poses unique challenges": Google does. IBM does
too (Watson). Unique algorithms on unique hardware for the sake of ML. I was
at a machine learning conference with a speaker from Google (sorry, can't
remember who). They said they are not interested in any machine learning with
complexity above O(n log n) or that has to visit a training point more than
once. That's a hell of a lot of literature in the bin for the sake of big data
(e.g. SVMs, back-propagation NNs). So the problems they are solving do have
novel constraints (even if the overall function they are trying to approximate
is the same).
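
As a rough illustration of the "visit each training point once" constraint
(not what Google actually runs, which the talk did not specify), here is a
sketch of single-pass stochastic gradient descent for linear regression. The
data stream, the true coefficients and the learning rate are all made-up
assumptions. The learner consumes a generator, so it physically cannot revisit
a point, and the cost is O(n) in the number of examples.

```python
import random


def sgd_single_pass(stream, n_features, lr=0.05):
    """Fit y ~ w.x + b by SGD, touching every (x, y) pair exactly once."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        pred = b + sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # Gradient step on the squared error of this single example.
        for i in range(n_features):
            w[i] -= lr * err * x[i]
        b -= lr * err
    return w, b


def make_stream(n, seed=0):
    """Hypothetical data source: y = 2*x0 - 3*x1 + 0.5 plus a little noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 2.0 * x[0] - 3.0 * x[1] + 0.5 + rng.gauss(0.0, 0.01)
        yield x, y


w, b = sgd_single_pass(make_stream(20000), n_features=2)
print(w, b)
```

Nothing is stored between examples except the weights, which is the property
that matters at that scale. Methods that need several sweeps over the data or
pairwise comparisons between points do not fit this mould without modification.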

Yeah, I agree probability is a logical inference scheme. I was just commenting
on the kind of logic people think of when they hear the term "LOGIC". That's
classical logic, but you can also argue any system with symbols is logic. I was
just trying to say that the Bayesian people think differently to the formal
logic people. That's what I meant by culture. They are normally very different
sets of people (for an exception, see below).

Bayesian Logic Networks:
<http://ias.cs.tum.edu/_media/spezial/bib/jain09blns.pdf> Probabilistic
inference upon hard logic constraints. != Bayesian Belief Networks (the
standard Bayesian formulation).

------
wookietrader
I really like the standard statistician stance "machine learning is basically
just statistics". There is so much hate in it. :)

The FAQ is full of this. For those who want to know the difference--and there
are two--let me add the following points:

(a) Most machine learners do not care that much about proper modelling. The
point is not about having a _right_ model of the data, but a _useful_ model.
Check Breiman's "Statistical Modeling: The Two Cultures" paper.

(b) Machine learning actually cares about computation (Big O notation and
such), something that is not part of the standard statistics curriculum.

------
maaku
Humorous, but factually incorrect. Please don't try to learn anything from
this FAQ.

------
mtraven
The author seems to be something of a rightwing crank (see his writings on
diversity for example), which of course does not invalidate anything purely
technical he writes on statistics, but it certainly lowers my confidence in
anything he says. Or, in probabilistic terms: being wrong in one area
correlates well with being wrong in others.

~~~
rjdagost
Let me introduce you to the ad hominem fallacy...
<http://en.wikipedia.org/wiki/Ad_hominem>

The group think on HN is getting bad.

~~~
cwhy
But that is for deductive reasoning. Using a person's track record merely as
a reference when judging them is not wrong. That's why things like a
Curriculum Vitae are useful.

By applying your logic, I can also suggest a wiki page for you:
<http://en.wikipedia.org/wiki/Hasty_generalization>

------
md8
What exactly does convergence mean in machine learning?

