Top data mining algorithms in plain English (rayli.net)
318 points by Rexxar on May 18, 2015 | 22 comments

Can we have more things like this, please? This is an AMAZING introduction to these concepts.

I wish more people on Hacker News would write such clear blog posts about their own share of cryptic knowledge, helping other people through the door.

I also want that, especially the side-by-side comparison. The follow-up discussions are helpful too; someone mentioned C5.0, which I didn't know about. There are a lot of algorithms and steps needed to solve the machine learning puzzle. The article was very helpful.

I found this article useful.

What I was really hoping for was a layman's translation of the maths on Wikipedia [i.e. how to implement it]. But this is a good jumping-off point for figuring out which black box to use.

This is what I was thinking as well! The explanation is good but it's the actual steps of the algorithm that are the difficult part!

So you want to do math without knowing how to do math?

I'm just saying it takes about 30 seconds to explain the Metropolis-Hastings algorithm in plain English, and the Wikipedia article is almost intentionally esoteric on the matter:
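To make the "30 seconds in plain English" point concrete: propose a random point near where you are, move there with probability min(1, p(new)/p(old)), otherwise stay put, and repeat. Below is a minimal Python sketch of that random-walk variant; the Gaussian proposal, the `step` width, and the name `metropolis_hastings` are illustrative assumptions, not taken from the article or from Wikipedia.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target density.

    log_target: log of the target density, known only up to a constant.
    step: std. dev. of the Gaussian proposal (an illustrative tuning choice).
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        # 1. Propose a random nearby point.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # 2. Accept with probability min(1, p(proposal) / p(x)). The proposal
        #    is symmetric, so the Hastings correction term cancels out.
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        # 3. Record the current point (the old one again if we rejected).
        samples.append(x)
    return samples


# Usage: sample a standard normal from its unnormalized log-density.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
```

Working in log space avoids underflow when the density is tiny, and only the ratio of densities is ever needed, which is why the normalizing constant can stay unknown.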


Meanwhile the much more complex Firefly algorithm is adequately explained in just 3 sentences that are obvious to understand:
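For comparison, a minimal sketch of the Firefly algorithm in its commonly cited form: every firefly moves toward each brighter one, attraction decays with distance as beta0*exp(-gamma*r^2), and a small random step alpha*(rand - 0.5) keeps the search exploring. The parameter values, the box bounds, the shrinking `alpha`, and the name `firefly_minimize` are assumptions chosen for illustration.

```python
import math
import random


def firefly_minimize(f, dim, n_fireflies=15, n_iters=100, beta0=1.0,
                     gamma=0.1, alpha=0.2, bounds=(-5.0, 5.0), seed=0):
    """Minimal Firefly algorithm for minimizing f over a box.

    Brightness is -f(x), so a lower cost means a brighter firefly.
    All parameter defaults here are illustrative, not canonical.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    light = [-f(x) for x in xs]
    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    # Attractiveness decays with squared distance r^2.
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Move i toward the brighter j, plus a small random step,
                    # clipped back into the search box.
                    xs[i] = [
                        min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(xs[i], xs[j])
                    ]
                    light[i] = -f(xs[i])
        alpha *= 0.97  # shrink the random step over time (a common heuristic)
    best = max(range(n_fireflies), key=lambda k: light[k])
    return xs[best], -light[best]


def sphere(x):
    return sum(v * v for v in x)


best, cost = firefly_minimize(sphere, dim=2)
```

Because a firefly only moves when a brighter one exists, the brightest point found so far is never lost. `gamma` should be tuned to the size of the search box: a large value makes distant fireflies effectively ignore each other.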


I feel like some of these algorithms are outdated, i.e. rarely used anymore: C4.5, Apriori, CART.

Instead I would suggest: Logistic Regression, RandomForest and Neural Networks.

CART forms the basis of the decision trees used by scikit-learn's implementation of random forests [1]. CART may be old, but definitely not outdated.

[1] https://github.com/scikit-learn/scikit-learn/blob/master/skl...
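To show what "CART-style" means here: the tree greedily picks the binary split (feature, threshold) that most lowers the weighted Gini impurity, then recurses on both halves. The sketch below is a toy in plain Python, assuming numeric features and using hypothetical helper names (`gini`, `best_split`, `build_tree`, `predict`). scikit-learn's documentation describes its trees as an optimized version of CART, and its real implementation is far more elaborate.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Try every feature and midpoint threshold; keep the lowest weighted Gini."""
    best = (None, None, gini(y))  # (feature, threshold, score); must beat the parent
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (feature, threshold, score)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a binary tree recursively; leaves hold the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    feature, threshold, _ = best_split(X, y)
    if feature is None:  # no split lowers the impurity, so stop here
        return majority
    goes_left = [row[feature] <= threshold for row in X]
    left_X = [row for row, g in zip(X, goes_left) if g]
    left_y = [lab for lab, g in zip(y, goes_left) if g]
    right_X = [row for row, g in zip(X, goes_left) if not g]
    right_y = [lab for lab, g in zip(y, goes_left) if not g]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left_X, left_y, depth + 1, max_depth),
        "right": build_tree(right_X, right_y, depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf, following each node's threshold test."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


X = [[2.0, 1.0], [3.0, 5.0], [10.0, 2.0], [11.0, 4.0]]
y = ["a", "a", "b", "b"]
tree = build_tree(X, y)
```

Nodes are plain dicts so that any non-dict value can serve as a leaf label; a random forest would grow many such trees on bootstrap samples with random feature subsets.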

I'll add my noise to the cacophony and agree that this is a great article. I've been struggling with an idea for a while, and this has opened up a whole new horizon of possibility for me. So thanks.

Wow! It explains, in 20-30 minutes, what took me a whole semester to study at university.

I only wish it had a little about Neural Networks and Deep Learning.

You can significantly improve C4.5 speed using See5/C5.0

Wow, this is an awesome read. I especially like how each algorithm starts with a simple explanation and then dives deeper into the lesser-known vocabulary.

How convenient, I'm currently tasked with classifying some data and this will surely come in handy.

I'd be interested to see some computer vision algorithms explained this way too, SIFT, SURF?

I'd be interested in just about every algorithm explained this way! And as a side note some of these algorithms certainly are used in computer vision, k-means for example in clustering/segmentation.
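As an example of k-means used for segmentation, here is a minimal sketch of Lloyd's algorithm that clusters grayscale pixel intensities into k groups. Pixels are represented as 1-tuples so the same code would work for RGB triples; the function names and the random initialization are assumptions for illustration, not any particular library's API.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm. points: list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Usage: split six grayscale intensities into "dark" and "bright" segments.
pixels = [(1.0,), (2.0,), (1.5,), (100.0,), (101.0,), (99.0,)]
centroids, labels = kmeans(pixels, k=2)
```

For an actual image you would flatten it into a list of pixel tuples, run the same loop, and reshape `labels` back to the image size to get a segmentation mask.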


Have a look at this: https://gilscvblog.wordpress.com/2013/08/18/a-short-introduc... . I found it very helpful!

Great to get specialist knowledge/understanding into the mainstream.

Very, very awesome article.

Awesome. What is a 'kernel' btw?

In kernel density estimation, each observed data point is spread out by a "kernel" function such as a Gaussian, essentially exp(-0.5*x^2), or a uniform function, f(x) = 1 for |x| < 0.5 and 0 elsewhere. So I think of a "kernel" as a function used to distribute a point mass.
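A small sketch of the estimate described above: each data point contributes a kernel scaled by a bandwidth h, and the density at x is (1/(n*h)) * sum of K((x - x_i)/h). The Gaussian kernel here includes the 1/sqrt(2*pi) factor that "essentially" leaves out, so the estimate integrates to 1; the function names and the bandwidth value are assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: exp(-0.5*u^2), normalized to integrate to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def uniform_kernel(u):
    """Box of width 1 and height 1: 1 for |u| < 0.5, else 0."""
    return 1.0 if abs(u) < 0.5 else 0.0


def kde(x, data, bandwidth, kernel=gaussian_kernel):
    """Average of kernels centred on each data point, rescaled by the bandwidth."""
    return sum(kernel((x - xi) / bandwidth) for xi in data) / (len(data) * bandwidth)


# Usage: estimated density of three observations at x = 0.
data = [-1.0, 0.0, 2.0]
density_at_zero = kde(0.0, data, bandwidth=0.5)
```

Swapping `kernel=uniform_kernel` turns the estimate into a smoothed histogram; the bandwidth controls how far each point's mass is spread.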

In English?

What an excellent article!
