Top data mining algorithms in plain English

windowshopping · on May 18, 2015

Can we have more things like this, please? This is an AMAZING introduction to these concepts.

I wish more people on Hacker News would make such clear blog posts for their personal share of cryptic knowledge, helping other people through the door.

bipin_nag · on May 19, 2015

I also want that, especially the side-by-side comparison. The follow-up discussions are helpful too; someone mentioned C5.0, which I didn't know about. A lot of algorithms and steps are needed when solving the machine learning puzzle. The article was very helpful.

stellographer · on May 18, 2015

I found this article useful.

What I was really hoping for was a layman's translation of the maths on Wikipedia [i.e. how to implement]. But this is a good jumping-off point for figuring out which black box to use.

profinger · on May 18, 2015

This is what I was thinking as well! The explanation is good but it's the actual steps of the algorithm that are the difficult part!

blumkvist · on May 18, 2015

So you want to do math without knowing how to do math?

stellographer · on May 18, 2015

I'm just saying it takes about 30 seconds to explain the Metropolis-Hastings algorithm in plain English, and the Wikipedia article is almost intentionally esoteric on the matter:

http://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm

Meanwhile, the much more complex Firefly algorithm is adequately explained in just 3 sentences that are easy to understand:

http://en.wikipedia.org/wiki/Firefly_algorithm
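
For reference, the "30 seconds" version of Metropolis-Hastings mentioned above could be sketched roughly as follows: propose a random step from the current point, always accept it if the target density goes up, and otherwise accept it with probability equal to the density ratio. This is only a minimal sketch under assumptions that are not from the thread: the function name, the Gaussian random-walk proposal, the `step` size and the `seed` are all illustrative. Because the proposal is symmetric, this is the simpler Metropolis special case.

```python
import math
import random


def metropolis_hastings(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target (illustrative sketch).

    log_density: log of the (possibly unnormalized) target density.
    step: standard deviation of the Gaussian proposal (an assumed choice).
    """
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        # Propose a nearby point; the proposal is symmetric around x.
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_density(proposal) - log_density(x)
        # Accept uphill moves always, downhill moves with prob. exp(log_accept).
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        # The current point is recorded whether or not the move was accepted.
        samples.append(x)
    return samples


# Usage: draw from a standard normal, given only its unnormalized log density.
draws = metropolis_hastings(lambda x: -0.5 * x * x, start=0.0, n_samples=20000)
```

The sample mean and variance of `draws` should come out close to 0 and 1, though early draws depend on the starting point and are often discarded in practice.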

ma2rten · on May 18, 2015

I feel like some of these algorithms are outdated, i.e. rarely used anymore: C4.5, Apriori, CART.

Instead I would suggest: Logistic Regression, RandomForest and Neural Networks.

shazeline · on May 18, 2015

CART forms the basis of the decision trees used by scikit-learn's implementation of random forests [1]. CART may be old, but definitely not outdated.

[1] https://github.com/scikit-learn/scikit-learn/blob/master/skl...
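
To make the CART idea a bit more concrete: a CART-style classification tree grows by repeatedly picking the split that most reduces impurity, commonly measured with the Gini index. The sketch below searches for the best threshold on a single numeric feature. It is not scikit-learn's code; the helper names `gini` and `best_split` are hypothetical, and a real implementation would search over all features and recurse on each side.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split on one feature.

    Returns (None, gini(ys)) when no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    n = len(pairs)
    best = (None, gini(ys))
    for i in range(1, n):
        # Only split between two distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Impurity of the two children, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Usage: two well-separated classes on a single feature.
threshold, score = best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```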

tomelders · on May 18, 2015

I'll add my noise to the cacophony and agree that this is a great article. I've been struggling with an idea for a while, and this has opened up a whole new horizon of possibility for me. So thanks.

diego_moita · on May 18, 2015

Wow! It explains, in 20-30 minutes, what took me a whole semester to study at university.

I only wish it had a little about Neural Networks and Deep Learning.

kiril-me · on May 18, 2015

You can significantly improve C4.5 speed using See5/C5.0

zatkin · on May 18, 2015

Wow, this is an awesome read. I especially like how each algorithm starts with a simple explanation and then dives deeper into the lesser-known vocabulary.

Grue3 · on May 18, 2015

How convenient, I'm currently tasked with classifying some data and this will surely come in handy.

mrfusion · on May 18, 2015

I'd be interested to see some computer vision algorithms explained this way too. SIFT, SURF?

zild3d · on May 18, 2015

I'd be interested in just about every algorithm explained this way! And as a side note some of these algorithms certainly are used in computer vision, k-means for example in clustering/segmentation.

https://courses.cs.washington.edu/courses/cse576/12sp/notes/...
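
As a rough illustration of the k-means idea mentioned above, the loop alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. This is a minimal sketch, not taken from the linked notes; the function names, the caller-supplied starting centroids and the iteration cap are assumptions made for the example. For image segmentation, the points would typically be pixel colour values.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, n_iter=20):
    """Lloyd-style k-means starting from caller-supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, labels


# Usage: two obvious groups of 2-D points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans(points, centroids=[(0, 0), (10, 10)])
```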

honodk · on May 25, 2015

Have a look at this: https://gilscvblog.wordpress.com/2013/08/18/a-short-introduc... . I found it very helpful!

lessthunk · on May 18, 2015

Great to get specialist knowledge/understanding into the mainstream.

leeleelee · on May 18, 2015

Very, very awesome article.

natch · on May 18, 2015

Awesome. What is a 'kernel' btw?

Bostonian · on May 18, 2015

In kernel density estimation, each observed data point is spread out by a "kernel" function such as a Gaussian, essentially exp(-0.5*x^2), or uniform function, f(x) = 1 for |x| < 0.5. So I think of a "kernel" as a function used to distribute a point mass.
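
A small sketch may make this concrete. Each data point contributes one copy of the kernel centred on itself, and the density estimate at x is the average of those copies. The function names and the `bandwidth` parameter are assumptions added for illustration; the Gaussian here also includes the 1/sqrt(2*pi) normalizing constant so that it integrates to 1, which the shorthand exp(-0.5*x^2) leaves out.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: exp(-0.5*u^2) with its normalizing constant."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def uniform_kernel(u):
    """Box of width 1 and height 1, as in f(x) = 1 for |x| < 0.5."""
    return 1.0 if abs(u) < 0.5 else 0.0


def kde(x, data, bandwidth=1.0, kernel=gaussian_kernel):
    """Kernel density estimate at x: the average of kernels centred on the data."""
    total = sum(kernel((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)


# Usage: estimate the density at a few locations from a handful of observations.
observations = [1.0, 1.2, 0.8, 3.0]
estimates = [kde(x, observations, bandwidth=0.5) for x in (0.0, 1.0, 2.0, 3.0)]
```

A larger `bandwidth` spreads each point's mass more widely and gives a smoother estimate; a smaller one gives a spikier estimate.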

natch · on May 19, 2015

In English?

aaronsnoswell · on May 18, 2015

What an excellent article!