
Machine Learning from scratch: Bare bones implementations in Python - eriklindernoren
https://github.com/eriklindernoren/ML-From-Scratch
======
schmit
One quick comment: in general it is a bad idea to compute the inverse of a
matrix (to solve a linear system). It's much better to compute the QR
factorization or SVD instead (or simply call a least-squares solver).

See for example: [https://www.johndcook.com/blog/2010/01/19/dont-invert-
that-m...](https://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/)
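
To make the point concrete, here is a minimal NumPy sketch of the alternatives on a small system. The matrix `A` and vector `b` are made up for illustration and are not taken from the repository.

```python
import numpy as np

# Made-up 3x3 system A x = b, for illustration only.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Discouraged: form the explicit inverse, then multiply.
x_inv = np.linalg.inv(A) @ b

# Square systems: solve directly (LAPACK LU factorization under the hood).
x_solve = np.linalg.solve(A, b)

# Via QR: A = Q R, so R x = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Least-squares solver (SVD-based); also copes with non-square or
# rank-deficient matrices.
x_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
```

On a well-conditioned toy system like this one all four results agree; the difference in accuracy and cost shows up on large or ill-conditioned matrices.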

~~~
JustFinishedBSG
It's usually even better to use iterative methods

~~~
thearn4
GMRES is definitely my go-to these days. Though it is worth noting that direct
methods do have the benefit of letting you quickly solve many successive linear
problems involving the same matrix but different right-hand sides. But
iterative methods scale so well for large sparse problems that they are very
often the only tool to consider.

Block Krylov methods are a thing, but I haven't experimented with them yet.
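
A rough sketch of the "factor once, solve many times" benefit of direct methods, using NumPy's QR factorization and a hand-written back substitution. The matrix, the right-hand sides, and the helper `back_substitute` are all made up for the example.

```python
import numpy as np

def back_substitute(R, y):
    """Solve R x = y for an upper-triangular R by back substitution."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the already-known part of the row, then divide by the pivot.
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Made-up matrix shared by all the systems below.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Factor once: this is the expensive O(n^3) step.
Q, R = np.linalg.qr(A)

# Reuse the factors for every new right-hand side: O(n^2) work per solve.
right_hand_sides = [np.array([1.0, 2.0, 3.0]),
                    np.array([0.0, 1.0, 0.0]),
                    np.array([5.0, -1.0, 2.0])]
solutions = [back_substitute(R, Q.T @ b) for b in right_hand_sides]
```

In practice one would more likely reach for `scipy.linalg.lu_factor` and `scipy.linalg.lu_solve`, which follow the same pattern.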

~~~
JustFinishedBSG
For solving linear systems there are recent mind-blowing methods by R. M.
Gower and P. Richtarik:

[https://arxiv.org/abs/1506.03296](https://arxiv.org/abs/1506.03296)

[http://www.maths.ed.ac.uk/~richtarik/papers/SDA.pdf](http://www.maths.ed.ac.uk/~richtarik/papers/SDA.pdf)

Plus R. M. Gower is fantastically nice and enthusiastic, so there's that.

And thanks for the link!

~~~
stabbles
Only scrolled through it, but AFAIK if its results are comparable to Kaczmarz,
then note that it is extremely slow.

If your goal is to solve, say, an LS problem, why not go for CGLS?
[http://web.stanford.edu/group/SOL/software/cgls/](http://web.stanford.edu/group/SOL/software/cgls/)
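
For readers unfamiliar with it, here is a minimal sketch of the textbook CGLS recurrence (conjugate gradients applied to the normal equations without forming A^T A). It is not the Stanford SOL code linked above; the function name, defaults, and test data are assumptions made for the example.

```python
import numpy as np

def cgls(A, b, tol=1e-10, max_iter=100):
    """Approximately minimise ||A x - b||_2 with the CGLS iteration."""
    x = np.zeros(A.shape[1])
    r = b - A @ x          # residual in data space
    s = A.T @ r            # gradient of the least-squares objective (up to sign)
    p = s.copy()           # search direction
    gamma = s @ s
    gamma0 = gamma
    if gamma0 == 0.0:
        return x           # x = 0 already solves the problem
    for _ in range(max_iter):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        # Stop when the normal-equation residual has shrunk enough.
        if gamma_new <= tol ** 2 * gamma0:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

# Made-up overdetermined problem: fit a line to four points.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 3.1, 4.9, 7.2])
x_cgls = cgls(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```

Each iteration needs only one product with `A` and one with `A.T`, which is why this style of method suits large sparse problems.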

------
imdsm
Great resource, but it could be a phenomenal resource if you documented each
method and explained how and why it does what it does.

Don't get me wrong, having working code to play with is key, but when you
don't fully grasp the concepts behind it, an explanation can become so
valuable.

That being said, you've included names, so research can be done. Great work
and I hope you're enjoying it!

~~~
eriklindernoren
Given the amount of publicity this repository has gained, I will make sure to
put more work into documenting the code. And also add references. Thank you
for your feedback!

------
compactmani
This is a nice project. I think it would be great to add references used for
the implementations and some tests that demonstrate they return what is
expected (or perhaps the same result as sklearn).

~~~
eriklindernoren
Those are great suggestions. I will look into adding that.

------
onvalleysilic
Just tried it with an equities dataset
[http://54.174.116.134/recommend/datasets](http://54.174.116.134/recommend/datasets)
and it seems to have performed nicely. Great work!

~~~
eriklindernoren
Awesome. Thank you!

------
f311a
Similar project:
[https://github.com/rushter/MLAlgorithms](https://github.com/rushter/MLAlgorithms)

------
Jasamba
This is impressive, and kind of exactly what I am in the process of doing. It's
certainly a better way to get familiar with the internal workings of these
methods than just tuning parameters like an oblivious, albeit theoretically
informed, monkey. How long did it take you to do them?

~~~
eriklindernoren
It certainly is! Thanks. :) I have been working on it for about three weeks
now.

~~~
JustFinishedBSG
> I have been working on it for about three weeks now.

That's not a whole lot, you're quick

------
victor106
Would you suggest any books/resources to learn the theory behind these
implementations so a newbie can follow along?

~~~
saboot
Pattern Recognition and Machine Learning by Bishop is one of the canonical
textbooks. It helps to have a linear algebra background, though it includes a
refresher.

~~~
tnecniv
Bishop is good but reads a little too much like a literature review sometimes.
That may or may not be a problem depending on what you are looking for.

------
fnl
This could become a fantastic resource for anybody who is teaching machine
learning.

One vital improvement suggestion to make that path attractive would be if the
Jupyter notebook format were used. It would be easier to add more
documentation and references.

But in any case, thanks for sharing!

~~~
bayonetz
Better yet, write your own simple versions. That's what we did in undergrad
and grad classes in AI, ML, Neural Nets, etc. There is nothing like building
one yourself, even a simple model like kNN!

------
onlyrealcuzzo
This is awesome! I'm working on something similar for JavaScript. Definitely
will be using yours for reference. Thanks, dude!

~~~
eriklindernoren
Thank you!

------
metaobject
In your RandomForest implementation, on the line in fit() where you're
building the training subsets to give to each tree, it appears that your
bagging approach doesn't use a 'sampling with replacement' strategy.

    idx = np.random.choice(range(n_features), size=self.max_features, replace=False)

It would appear that the replace=False prevents the 'sampling with
replacement' behavior usually implemented by bagging algorithms. Should the
replace=False be changed to replace=True?
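
For reference, a minimal sketch of the two kinds of sampling a random forest usually involves. Bagging draws rows (training samples) with replacement, while the per-tree feature subset, which is what the quoted line selects, is commonly drawn without replacement so that no feature appears twice. The sizes and the names `row_idx` and `col_idx` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, max_features = 10, 6, 3

# Bagging: draw row indices WITH replacement (a bootstrap sample),
# so some rows may repeat and others may be left out.
row_idx = rng.choice(n_samples, size=n_samples, replace=True)

# Feature subsampling: draw column indices WITHOUT replacement,
# so each selected feature appears exactly once.
col_idx = rng.choice(n_features, size=max_features, replace=False)
```

A training subset for one tree would then be something like `X[row_idx][:, col_idx]`, assuming `X` is a 2-D NumPy array of shape `(n_samples, n_features)`.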

~~~
eriklindernoren
Thank you for your feedback! I have read up on the feature bagging part of the
algorithm and I believe that you are correct. This is fixed in the latest
commit.

------
mrcactu5
scikit-learn is excellent, but its implementations are a bit too complicated
to learn from.

This is for people who don't just want to tune parameters but want to build the
whole thing from scratch.

I can buy a pie with all the fixings from a bakery, or I can buy the
ingredients myself and make it exactly to my liking. It may not be
professional.

~~~
eriklindernoren
Glad that you liked it!

------
screwston
A friend sent me a link to this - nice work, and I happen to be intermittently
working on a very similar (and unfortunately similarly named) project -
[https://github.com/jarfa/ML_from_scratch/](https://github.com/jarfa/ML_from_scratch/).
Check my commit history if you suspect me of copying you ;)

I don't think I'll be implementing as many algorithms as you, though. I should
force myself to work on more projects outside my comfort zone.

~~~
eriklindernoren
Oh, cool! Haha, that's unfortunate. Maybe I should have done a better job
finding an original name. Good luck to you anyways!

------
ussser
Cool! How long did it take to learn and implement these models?

~~~
eriklindernoren
I have taken some ML courses during my university studies and have also done
some model implementations in other programming languages. So I didn't have to
start from scratch. But I have been working on this project for about three
weeks now.

------
edshiro
Nice! I have started brushing up my maths and reading about machine learning
in general. Next step is to get my feet wet in the implementation. I think
looking at your project can give me a good idea as to how to implement some of
the most basic algorithms. Good luck!

~~~
eriklindernoren
Awesome! :) Doing the nitty-gritty has been a great way of learning the
limitations and benefits of using certain models for a given task. Good luck
to you as well!

------
grzm
For the future, if it meets the guidelines, this likely should have been a
Show HN:

[https://news.ycombinator.com/showhn.html](https://news.ycombinator.com/showhn.html)

------
jbrambleDC
This is awesome. I am currently building a decision tree from scratch in Java
and will use yours as a reference.

One comment I have: in kNN, it is best to ensure that the neighbors list
occupies O(k) space.
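
One way to get that O(k) bound is a bounded max-heap, sketched below with the standard library only. This is an illustration, not the repository's code; the function name `k_nearest` and the sample points are made up.

```python
import heapq
import math

def k_nearest(train_points, query, k):
    """Return the k points closest to `query`, holding at most k candidates."""
    if k <= 0:
        return []
    heap = []  # max-heap emulated with negated distances; never exceeds k entries
    for idx, point in enumerate(train_points):
        dist = math.dist(point, query)  # Euclidean distance (Python 3.8+)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif dist < -heap[0][0]:
            # Closer than the current worst candidate: swap it out.
            heapq.heapreplace(heap, (-dist, idx))
    # Largest negated distance first means nearest point first.
    return [train_points[idx] for _, idx in sorted(heap, reverse=True)]

points = [(0, 0), (1, 1), (5, 5), (2, 2), (9, 9)]
nearest = k_nearest(points, (1.2, 1.2), 2)
```

The scan is O(n log k) in time and the candidate list stays at k entries, instead of sorting all n distances.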

------
dnautics
Nice project! I'm doing something similar in Julia, with the added advantage
that as I build it the numerical types are variadic so I can play around with
numbers that aren't IEEE FPs.

------
searchfaster
Very nice project! Very, very useful for an ML beginner like myself. Thank you
very much!

------
jogundas
Very cool! I have actually been planning to do exactly what you did, sir :)

~~~
natch
Not sure what you had in mind, but a Python 3 version of this would be great!

~~~
jogundas
Thanks for the comment, will have this in mind!

------
peter_retief
I feel happy to see the wonderful work you share so freely.

~~~
eriklindernoren
Glad you liked it!

------
joelberman
Very nice project! Learning stuff makes me happy.

------
Winterflow3r
This is really cool and inspiring!

~~~
eriklindernoren
Thank you! The response here has been amazing. :)

------
sp4ke
Amazing, thanks for sharing :)

------
thinkr42
This is awesome!

~~~
eriklindernoren
Glad you like it!

------
mmrr88
[http://vschool.io/en/apply/](http://vschool.io/en/apply/)

------
SvenDowideit
Deliver and release stuff that people actually use. Or work on projects that
do.

Delivering value trumps painting every day

