
Ask HN: Clean code to learn ML feature-extraction techniques - datashovel
The ML community has done an amazing job disseminating information about the field, making it accessible to large numbers of people like myself who aren't professionally involved in the field.

That said, the one area that has stumped me, and where I haven't been able to find good clean code to learn from, is feature extraction. I was wondering if anyone here can point me toward some clean code that a non-expert in the field could learn from.

I'm definitely aware that there are books / tutorials on the subject, but none of that has "made it click" for me yet. To be honest, I feel like most of my real "advances" have come while looking at code (after familiarizing myself at a high level with the theory and math).

For example, the TensorFlow Playground source code appears to be a gold mine of good clean code that a novice can grok.

EDIT: If beggars can be choosers, I'm most interested in seeing a practical implementation that uses a clustering algorithm (such as k-means) to build up a set of features from image data. Such a technique is discussed in the following video.

https://www.youtube.com/watch?v=wZfVBwOO0-k
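For readers hunting for the same thing: a minimal sketch of the clustering-on-patches idea mentioned in the EDIT (cluster small image patches with k-means, then use distances to the centroids as features). All shapes, parameter values, and the random stand-in images are illustrative assumptions, not from the post or the video.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for real grayscale images: 100 images of 16x16 pixels.
images = rng.random((100, 16, 16))

def random_patches(imgs, patch_size=6, n_patches=2000):
    """Sample random flattened patches from a stack of images."""
    patches = []
    for _ in range(n_patches):
        img = imgs[rng.integers(len(imgs))]
        y = rng.integers(img.shape[0] - patch_size + 1)
        x = rng.integers(img.shape[1] - patch_size + 1)
        patches.append(img[y:y + patch_size, x:x + patch_size].ravel())
    return np.array(patches)

patches = random_patches(images)

# Normalize each patch (zero mean, unit variance) before clustering.
patches = (patches - patches.mean(axis=1, keepdims=True)) / (
    patches.std(axis=1, keepdims=True) + 1e-8)

# Cluster the patches; the centroids act as a learned feature dictionary.
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(patches)

def featurize(patch):
    """One feature per centroid: negative distance to that centroid."""
    d = np.linalg.norm(kmeans.cluster_centers_ - patch.ravel(), axis=1)
    return -d

feats = featurize(patches[0])
print(feats.shape)  # one 32-dim feature vector per patch
```

In the full technique, such patch features would be pooled over regions of each image to form the image-level feature vector fed to a classifier.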
======
karterk
What task are you learning the features for? If it's classification, instead
of treating it as a two step (feature generation followed by classification)
problem, I suggest that you just use a multi-layer neural network with a
sigmoid function at the final layer so that you directly predict the output
class from raw pixels. This way the feature engineering is taken care of by
the algorithm (the weights of the hidden layers).

To give an idea of what I mean, see:
[http://karpathy.github.io/2015/10/25/selfie/](http://karpathy.github.io/2015/10/25/selfie/)
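A hedged sketch of the suggestion above: skip hand-built features and let a small multi-layer network learn them directly from raw pixels. This uses scikit-learn's built-in 8x8 digits dataset and `MLPClassifier` as stand-ins (scikit-learn's multiclass output layer is softmax rather than a single sigmoid, so this is an approximation of the comment's setup); the layer sizes are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)   # 8x8 digit images, already flattened
X = X / 16.0                          # scale pixel values to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One hidden layer with logistic (sigmoid) units; its learned weights
# play the role that hand-engineered features would otherwise play.
clf = MLPClassifier(hidden_layer_sizes=(64,), activation="logistic",
                    max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(round(clf.score(X_test, y_test), 2))
```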

~~~
datashovel
Thank you for the link. I will look into this. Some of the reasons I get the
idea that I need to incorporate this additional step (feature detection) are:

1) I'm building this primarily for a robot, so my impression is that building
up a set of primitive features, and then building on top of them, will be more
efficient for soft real-time use cases.

2) I have done what you suggest (more or less), for example as a test with the
MNIST digit classification data. I was only getting an approx. 95% success
rate using this basic technique, which at first glance appears impressive
until you realize what is considered "state of the art".

3) My goal is also to incorporate additional layers of data (which I'm under
the impression will help improve my overall success rate), such as edge
detection, corner detection, and most importantly 3D data (i.e. distance from
the camera, based on data extracted from Google Tango). So if I'm correct
about #1, that this will be a more efficient calculation, then incorporating
all the additional data (and extracting features from that as well) will slow
things down only incrementally.
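Point 3 above can be sketched in a few lines: compute an extra "layer" of data and stack it as an additional channel alongside the raw pixels. The finite-difference edge map below is an illustrative stand-in for a real edge detector, and the random image is a placeholder for a camera frame; a Tango depth map would be stacked the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 16))       # stand-in for a grayscale camera frame

# Cheap edge detector: finite-difference gradients + magnitude.
gy, gx = np.gradient(img)
edges = np.hypot(gx, gy)

# Stack channels: raw intensity + edge magnitude (+ depth, corners, ...).
stacked = np.stack([img, edges], axis=-1)
print(stacked.shape)  # height x width x channels
```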

~~~
karterk
1) Once the training is done, the prediction part is simply matrix
multiplications - I would think that's probably faster and far easier to
parallelize than any other advanced feature-extraction algorithm you will be
performing in real time on the image.
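To make point 1 concrete: once the weights are fixed, a forward pass is just a couple of matrix multiplications plus elementwise nonlinearities. The weights below are random placeholders, and the 784-dimensional input is an assumed flattened 28x28 image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder trained weights: 784 inputs -> 128 hidden units -> 10 classes.
W1, b1 = rng.standard_normal((784, 128)), np.zeros(128)
W2, b2 = rng.standard_normal((128, 10)), np.zeros(10)

def predict(x):
    h = np.maximum(x @ W1 + b1, 0)   # hidden layer: matmul + ReLU
    logits = h @ W2 + b2             # output layer: another matmul
    return int(np.argmax(logits))    # predicted class index

x = rng.standard_normal(784)         # a flattened 28x28 "image"
print(predict(x))
```

Both matmuls map straight onto BLAS/GPU kernels, which is why inference parallelizes so well.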

2) Did you try using a CNN?

I might be grossly wrong here, but having gone down this path before, I have
come to the conclusion that clumsy hand-crafted feature engineering on images
usually performs worse than letting the algorithm figure it out. It's much
harder to scale, and you will end up running into a lot of edge cases.

~~~
datashovel
Thank you for the insights. I have not implemented a CNN yet, but it is
certainly on my radar. In fact, some other reading the other night pointed me
in that direction, so I am exploring that at the moment. With regard to the
scale and/or efficiency of the algorithms, I would guess that if anyone is
wrong here it would be me :)

