
Top data mining algorithms in plain English - Rexxar
http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/
======
aerovistae
Can we have more things like this, please? This is an AMAZING introduction to
these concepts.

I wish more people on Hacker News would make such clear blog posts for their
personal share of cryptic knowledge, helping other people through the door.

~~~
bipin_nag
I'd like that too, especially the side-by-side comparison. The follow-up
discussions are also helpful; someone mentioned C5.0, which I didn't know
about. There are a lot of algorithms and steps involved in solving the
machine learning puzzle. The article was very helpful.

------
stellographer
I found this article useful.

What I was really hoping for was a layman's translation of the maths on
Wikipedia [i.e. how to implement it]. But this is a good jumping-off point for
figuring out which black box to use.

~~~
blumkvist
So you want to do math without knowing how to do math?

~~~
stellographer
I'm just saying it takes about 30 seconds to explain the Metropolis-Hastings
algorithm in plain English, and the Wikipedia article is almost intentionally
esoteric on the matter:

[http://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm](http://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm)

Meanwhile the much more complex Firefly algorithm is adequately explained in
just 3 sentences that are easy to understand:

[http://en.wikipedia.org/wiki/Firefly_algorithm](http://en.wikipedia.org/wiki/Firefly_algorithm)
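The plain-English version of Metropolis-Hastings fits in a few lines of code. Below is a minimal sketch of the random-walk variant for a one-dimensional target, assuming a symmetric Gaussian proposal (so the Hastings correction term cancels) and a target density known only up to a constant. The function name `metropolis_hastings` and its parameters are illustrative, not taken from any library.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target density.

    log_target: returns the log of the (possibly unnormalized) target density.
    The Gaussian proposal is symmetric, so the Hastings ratio reduces to
    p(candidate) / p(current).
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby point by taking a random Gaussian step.
        candidate = x + rng.gauss(0.0, step)
        log_p_candidate = log_target(candidate)
        # Accept with probability min(1, p(candidate) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_candidate - log_p))
        if rng.random() < accept_prob:
            x, log_p = candidate, log_p_candidate
        # On rejection the current point is recorded again.
        samples.append(x)
    return samples
```

For example, `metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)` draws approximately from a standard normal, even though the normalizing constant is never computed. That is the whole point of the method: only ratios of densities are needed.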

------
ma2rten
I feel like some of these algorithms are outdated, i.e. rarely used anymore:
C4.5, Apriori, CART.

Instead I would suggest: Logistic Regression, RandomForest and Neural
Networks.

~~~
shazeline
CART forms the basis of the decision trees used by scikit-learn's
implementation of random forests [1]. CART may be old, but definitely not
outdated.

[1] [https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L542](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L542)
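To make the CART point concrete: CART grows a binary tree by repeatedly choosing the single split that most reduces impurity, and scikit-learn's decision trees use Gini impurity by default. Below is a minimal sketch of one such step on a single numeric feature. It is an illustration of the idea, not scikit-learn's implementation, and the helper names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Find the threshold on one numeric feature minimizing weighted Gini impurity.

    Returns (threshold, impurity); samples with x <= threshold go left.
    threshold is None if no split beats leaving the node unsplit.
    """
    # Sort by feature value only, so labels never need to be comparable.
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    n = len(pairs)
    best = (None, gini(ys))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            # Place the threshold halfway between neighbouring values.
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, score)
    return best
```

A full tree applies `best_split` to every feature, keeps the best one, and recurses on each side until the nodes are pure or a stopping rule kicks in. Random forests reuse this exact step on bootstrapped data with random feature subsets.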

------
tomelders
I'll add my noise to the cacophony and agree that this is a great article. I've
been struggling with an idea for a while, and this has opened up a whole new
horizon of possibility for me. So thanks.

------
diego_moita
Wow! It explains, in 20-30 minutes, what took me a whole semester to study at
university.

I only wish it had a little about Neural Networks and Deep Learning.

------
kiril-me
You can significantly improve C4.5 speed using See5/C5.0

------
zatkin
Wow, this is an awesome read. I especially like how each algorithm starts with
a simple explanation and then dives deeper into the less familiar terminology.

------
Grue3
How convenient, I'm currently tasked with classifying some data and this will
surely come in handy.

------
mrfusion
I'd be interested to see some computer vision algorithms explained this way
too, SIFT, SURF?

~~~
honodk
Have a look at this:
[https://gilscvblog.wordpress.com/2013/08/18/a-short-introduction-to-descriptors/#more-3](https://gilscvblog.wordpress.com/2013/08/18/a-short-introduction-to-descriptors/#more-3)
. I found it very helpful!

------
lessthunk
Great to get specialist knowledge/understanding into the mainstream.

------
leeleelee
Very, very awesome article.

------
natch
Awesome. What is a 'kernel' btw?

~~~
Bostonian
In kernel density estimation, each observed data point is spread out by a
"kernel" function such as a Gaussian, essentially exp(-0.5*x^2), or a uniform
function, f(x) = 1 for |x| < 0.5 and 0 otherwise. So I think of a "kernel" as
a function used to distribute a point mass.
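Here is a small sketch of that idea in code, using the two kernels mentioned above. Each data point contributes a little bump of total area 1/n, and the density estimate at x is the sum of the bumps. The function names and the `bandwidth` parameter (which stretches each bump) are illustrative assumptions, not from any particular library.

```python
import math


def gaussian_kernel(u):
    # Standard normal density: exp(-0.5*u^2) scaled by 1/sqrt(2*pi) so it integrates to 1.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def uniform_kernel(u):
    # A box of width 1 and height 1 centred on 0, so it also integrates to 1.
    return 1.0 if abs(u) < 0.5 else 0.0


def kde(x, data, bandwidth=1.0, kernel=gaussian_kernel):
    """Kernel density estimate at x: each data point spreads its mass via the kernel."""
    n = len(data)
    return sum(kernel((x - xi) / bandwidth) for xi in data) / (n * bandwidth)
```

With `data = [1.0, 2.0, 2.5]` and the uniform kernel, `kde(2.0, data, kernel=uniform_kernel)` counts only the point at 2.0 (the one at 2.5 sits exactly on the box edge), giving 1/3. Swapping in the Gaussian kernel gives a smooth curve instead of a staircase.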

~~~
natch
In English?

------
aaronsnoswell
What an excellent article!

