Machine Learning from scratch: Bare bones implementations in Python | 694 points by eriklindernoren on Feb 25, 2017 | 63 comments

 One quick comment: in general it is a bad idea to compute the inverse of a matrix to solve a linear system. It is much better to compute a QR factorization or SVD instead (or simply call a least-squares solver). See for example: https://www.johndcook.com/blog/2010/01/19/dont-invert-that-m...
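A minimal NumPy sketch of the difference (the matrix and data here are made up for illustration): forming `inv(A.T @ A)` squares the condition number, while `np.linalg.lstsq` solves the same least-squares problem via SVD.

```python
import numpy as np

# Ill-conditioned least-squares problem: two nearly collinear columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
A[:, 4] = A[:, 3] + 1e-8 * rng.standard_normal(100)  # near-duplicate column
b = rng.standard_normal(100)

# Fragile: explicitly form the normal equations and invert.
x_inv = np.linalg.inv(A.T @ A) @ (A.T @ b)

# Preferred: an SVD-based least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# The SVD-based solution attains a residual no worse than the inverse-based one.
print(np.linalg.norm(A @ x_lstsq - b) <= np.linalg.norm(A @ x_inv - b) + 1e-6)
```

The same pattern applies to square systems: prefer `np.linalg.solve` (or a factorization) over `np.linalg.inv(A) @ b`.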
 It's usually even better to use iterative methods
 GMRES is definitely my go-to these days. It is worth noting that direct methods do have the benefit of letting you quickly solve many successive linear systems involving the same matrix but different right-hand sides. But iterative methods scale so well for large sparse problems that they are very often the only tool to consider. Block Krylov methods are a thing, but I haven't experimented with them yet.
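Both points can be seen in a small SciPy sketch (the tridiagonal system here is made up for illustration): GMRES solves iteratively, while a sparse LU factorization is computed once and reused across right-hand sides.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres, splu

# A sparse, diagonally dominant tridiagonal system (illustrative only).
n = 1000
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Iterative: GMRES stops once the (default) relative residual is reached.
x_it, info = gmres(A, b)  # info == 0 signals convergence

# Direct: factor once, then reuse the LU factors for many right-hand sides.
lu = splu(A)
x_dir = lu.solve(b)
x_dir2 = lu.solve(2 * b)  # second rhs, no refactorization needed

print(info == 0, np.allclose(x_it, x_dir, atol=1e-3))
```

For well-conditioned systems like this one the two answers agree to the iterative tolerance; for harder problems GMRES is usually paired with a preconditioner.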
 For solving linear systems there are recent mind-blowing methods by R. M. Gower and P. Richtarik: https://arxiv.org/abs/1506.03296 and http://www.maths.ed.ac.uk/~richtarik/papers/SDA.pdf. Plus R. M. Gower is fantastically nice and enthusiastic, so there's that. And thanks for the link!
 Only scrolled through it, but AFAIK if its results are comparable to Kaczmarz, then note that it is extremely slow. If your goal is to solve, say, an LS problem, why not go for CGLS? http://web.stanford.edu/group/SOL/software/cgls/
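For reference, SciPy ships LSQR, which is mathematically equivalent to CG applied to the normal equations (the same family as CGLS) but numerically more reliable. A minimal sketch with made-up data:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# Overdetermined least-squares problem (data made up for illustration).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)

# LSQR: a Krylov method for min ||Ax - b||, usable with sparse or implicit A.
x = lsqr(A, b, atol=1e-10, btol=1e-10)[0]

# Agrees with a dense SVD-based solve on this small problem.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref, atol=1e-6))
```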
 Has anyone implemented them in production? I read their paper a while ago, but thought that the dual methods they showed might have practical limitations (like insane cache misses compared to more standard methods).
 The problems you sketch can already be overcome by using the approximate eigenspectrum of the matrix, obtained as a by-product of Krylov methods. This has been invented many times, and is used in 'thick restart', 'augmented Krylov subspace methods', and 'deflation'. In particular this was developed for scattering problems with many electromagnetic incident waves hitting an object, changing only the rhs. Also, you can use an approximate LU decomposition as a preconditioner for Krylov methods. Similarly, Tikhonov regularization (solving for a range of slightly shifted matrices "A + lambda I", where lambda is a parameter) is easily tackled using Krylov subspace methods, by noting that the Krylov subspace is invariant under shifts like these. So an orthonormal basis for the Krylov subspace must be found only once, and can then be reused for every lambda of interest.
 Depends on the number of unknowns, the sparsity of the matrix, whether you want 3 correct digits or 16, etc. Direct methods such as Gaussian elimination with pivoting are proven to be stable. Iterative methods are not, but can be a lot cheaper in computational cost. They can also stop when a certain relative residual is reached, unlike direct methods. If iterative methods like Krylov subspace methods were stable, they could actually be seen as direct methods themselves, as the Krylov subspace has a dimension of at most N, where N is the number of unknowns; so after at most N iterations the solution could be extracted exactly from the search space. In practice this is not the case due to rounding errors.
 Thank you for the feedback. :) I plan on fixing this soon!
 Not very important, and for a learning project I find using cvxpy a better idea as it's more readable (like you did), but: solving the full quadratic optimization problem for SVMs is basically impossible at scale. You are forming an n^2 matrix, so I'm going to let you imagine what happens when n = 100,000. Usually people use either approximation methods (Incomplete Cholesky, Nystrom) or solve it exactly but iteratively (SMO, Pegasos...). I'm implementing them for class right now, so it's still fresh in my head, haha.
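To illustrate the iterative route: Pegasos runs stochastic subgradient descent on the primal SVM objective and never materializes the n x n kernel matrix. This is a minimal linear-kernel sketch with made-up toy data, not the repo's implementation:

```python
import numpy as np

def pegasos(X, y, lam=0.01, n_iters=20000, seed=0):
    """Pegasos sketch: stochastic subgradient descent on the primal
    SVM objective lam/2 ||w||^2 + mean(hinge loss)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)  # standard Pegasos step size
        if y[i] * (w @ X[i]) < 1:   # margin violated: hinge term is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                       # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

# Toy linearly separable data (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((50, 2)) + 3,
               rng.standard_normal((50, 2)) - 3])
y = np.array([1] * 50 + [-1] * 50)
w = pegasos(X, y)
print(np.mean(np.sign(X @ w) == y))  # training accuracy
```

Each step touches a single sample, so memory stays O(d) regardless of n, which is the whole point versus the dense QP.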
 Great resource, but it could be a phenomenal resource if you documented each method and explained how and why it does what it does. Don't get me wrong, having working code to play with is key, but when you don't fully grasp the concepts behind it, an explanation becomes very valuable. That being said, you've included names, so research can be done. Great work and I hope you're enjoying it!
 Given the amount of publicity this repository has gained, I will make sure to put more work into documenting the code, and also add references. Thank you for your feedback!
 This is a nice project. I think it would be great to add references used for the implementations, and some tests that demonstrate they return what is expected (or perhaps the same results as sklearn).
 Those are great suggestions. I will look into adding that.
 Just tried it with an equities dataset http://54.174.116.134/recommend/datasets and it seems to have performed nicely. Great work!
 Awesome. Thank you!
 Similar project: https://github.com/rushter/MLAlgorithms
 This is impressive, and kind of exactly what I am in the process of doing. It's certainly a better way to get familiar with the internal workings of these methods than just tuning parameters like an oblivious, albeit theoretically informed, monkey. How long did it take you to do them?
 It certainly is! Thanks. :) I have been working on it for about three weeks now.
 > I have been working on it for about three weeks now.

That's not a whole lot, you're quick!
 Would you suggest any books/resources to learn the theory behind these implementations so a newbie can follow along?
 Pattern Recognition and Machine Learning by Bishop is one of the canonical textbooks. It helps to have a linear algebra background, though it includes a refresher.
 Bishop is good but reads a little too much like a literature review sometimes. That may or may not be a problem depending on what you are looking for.
 Thanks everyone for the suggestions...will check these books out
 How does that compare to Andrew Ng's course?
 I can recommend Applied Predictive Modeling by Max Kuhn: http://appliedpredictivemodeling.com/
 Introduction to statistical learning http://www-bcf.usc.edu/~gareth/ISL/
 This one is interesting for seeing the other, "statistical" side of the industry versus the machine-learning people. For example, I don't think gradient descent is used even once in that book.
 This book is a prerequisite for anyone who wants to get into machine learning, imo.
 While a great book, most of this is just implementations of sklearn.
 This could become a fantastic resource for anybody who is teaching machine learning. One vital improvement suggestion to make that path attractive would be to use the Jupyter notebook format; it would be easier to add more documentation and references. But in any case, thanks for sharing!
 Better yet, write your own simple versions. That's what we did in undergrad and grad classes in AI, ML, Neural Nets, etc. There is nothing like building one yourself, even a simple model like kNN!
 This is awesome! I'm working on something similar for JavaScript. Definitely will be using yours for reference. Thanks, dude!
 Thank you!
 In your RandomForest implementation, on the line in fit() where you're building the training subsets to give to each tree, it appears that your bagging approach doesn't use a 'sampling with replacement' strategy:

    idx = np.random.choice(range(n_features), size=self.max_features, replace=False)

It would appear that replace=False prevents the 'sampling with replacement' behavior usually implemented by bagging algorithms. Should replace=False be changed to replace=True?
 Thank you for your feedback! I have read up on the feature bagging part of the algorithm and I believe that you are correct. This is fixed in the latest commit.
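A minimal sketch of the bootstrap draw discussed above (array sizes here are made up for illustration): with replacement, each tree's subset is the same size as the original data, but some rows repeat while others are left out entirely.

```python
import numpy as np

# Bootstrap (bagging) draw: n indices sampled WITH replacement.
rng = np.random.default_rng(0)
n_samples = 10
idx = rng.choice(n_samples, size=n_samples, replace=True)

print(len(idx) == n_samples)  # subset is the same size as the original set
print(np.all((idx >= 0) & (idx < n_samples)))
# Typically some indices repeat, so roughly a third of the rows are
# "out of bag" and can be used for error estimation.
```

With replace=False and size equal to n_samples, every tree would see exactly the same data and the ensemble would lose its variance-reduction benefit.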
 scikit-learn is excellent, but their implementations are a bit too complicated to learn from. This is for people who don't just want to tune parameters but build the whole thing from scratch. I can buy a pie with all the fixings from a bakery, or I can buy the ingredients myself and make it exactly to my liking. It may not be professional, though.
 A friend sent me a link to this - nice work, and I happen to be intermittently working on a very similar (and unfortunately similarly named) project - https://github.com/jarfa/ML_from_scratch/. Check my commit history if you suspect me of copying you ;)I don't think I'll be implementing as many algorithms as you though, I should force myself to work on more projects outside my comfort zone.
 Oh, cool! Haha, that's unfortunate. Maybe I should have done a better job finding an original name. Good luck to you anyways!
 Cool! How long did it take to learn and implement these models?
 I have taken some ML courses during my university studies and have also done some model implementations in other programming languages. So I didn't have to start from scratch. But I have been working on this project for about three weeks now.
 Nice! I have started brushing up my maths and reading about machine learning in general. Next step is to get my feet wet in the implementation. I think looking at your project can give me a good idea as to how to implement some of the most basic algorithms. Good luck!
 Awesome! :) Doing the nitty-gritty has been a great way of learning the limitations and benefits of using certain models for a given task. Good luck to you as well!
 For the future, if it meets the guidelines, this likely should have been a Show HN.
 This is awesome. I am currently building a decision tree from scratch in Java and will use yours as a reference. One comment I have: in kNN, it is best to ensure that the neighbors list occupies O(k) space.
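The O(k)-space idea above can be sketched with a bounded max-heap: keep only the k best candidates seen so far, evicting the farthest one whenever a closer point appears. This is an illustrative sketch, not the repo's actual implementation:

```python
import heapq
import numpy as np

def knn_indices(X_train, x, k):
    """Indices of the k training points nearest to x, using O(k) extra
    space: a max-heap (via negated distances) of the best candidates."""
    heap = []  # (-distance, index); the root is the FARTHEST kept point
    for i, row in enumerate(X_train):
        d = np.linalg.norm(row - x)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif -heap[0][0] > d:  # closer than the current farthest: replace it
            heapq.heapreplace(heap, (-d, i))
    return [i for _, i in sorted(heap, reverse=True)]  # nearest first

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
print(knn_indices(X, np.array([0.5]), k=2))  # the two points nearest 0.5
```

Sorting the full distance list instead would cost O(n) extra space and O(n log n) time; the heap keeps it at O(n log k) time and O(k) space.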
 Nice project! I'm doing something similar in Julia, with the added advantage that as I build it the numerical types are parametric, so I can play around with numbers that aren't IEEE FPs.
 Very nice project! Very useful for an ML beginner like myself. Thank you very much!
 Very cool! I have actually been planning to do exactly what you did, sir :)
 Not sure what you had in mind, but a Python 3 version of this would be great!
 Thanks for the comment, will keep this in mind!
 Thanks! :)
 I'm happy to see this wonderful work that you share so freely.