A Brief Introduction to Machine Learning for Engineers (2017) (arxiv.org)
379 points by lainon on Feb 25, 2018 | 38 comments



This makes a good case for not learning machine learning three years ago. If you wait long enough for new paradigms to coalesce, you can use your energy more wisely during the chaotic transient period (e.g. by spending your effort on time-invariant parts of human knowledge, such as gaining a deeper understanding of parts of pure mathematics, or even philosophy or history), while you wait for a new set of fundamentals to converge and really solid pedagogical explanations to surface.

The same could be said about skipping the earlier period of JavaScript frameworks. Had you used your time to learn things that are still useful today, and will remain useful decades into the future, while waiting for the industry to jump here and there before eventually converging to what seems to be a fairly robust steady state (React), you'd probably be better off today than if you had followed along with the hype cycle month after month.

A counter argument is that you might miss out on a lot of the monetary rewards that come from learning the much-desired skill that others don't have the time or willpower to tackle in the period where doing so involves a lot of friction.


> This makes a good case for not learning machine learning three years ago.

Have you looked at the monograph at all? This could have been written 5 to 10 years ago, maybe even 15 years ago; only a few sections would have looked a little different. In fact, I like to recommend Mitchell's Machine Learning book (I think it was written in the 90s) as an introduction for people with a serious interest.

There is currently a lot of hype around machine learning algorithms, because we see good progress in things like computer vision / pattern recognition. This kicks off a marketing machinery that really blurs reality.

In reality, we have an established body of methods and modelling techniques that are sufficient, because the available data is the bottleneck for prediction quality. The actual challenge is to come up with a valuable business proposition, not necessarily to build the predictive model.


I'd like to point out that there is a split within Machine Learning between a "classical" branch and a "Deep Learning"-based one. While Deep Learning is just neural networks with Big Data on GPUs, it has obliterated many areas of classical ML. What you really want to learn is Deep (+ Reinforcement) Learning, or what is coming to be known as differentiable programming. For classical ML, learn how to use Spark's MLlib and take Ng's Coursera course; that might be all you need from a practical point of view.


Deep learning has been shown to obliterate much of the computer vision area and other domains where classical approaches such as feature engineering are naturally hard. This comes at the price of Deep Learning models' massive hunger for data to achieve acceptable prediction accuracy. So with use cases like image recognition, the relative performance of deep learning models is much better than that of classical algorithms.

In general (you'll probably find specific counterexamples for this, but as I said _in general_) deep learning models may start to underperform classical models when you look at medium-sized data sets. And I can assure you, there are many such data sets, with many applications for machine learning. And while the world talks about deep learning, I know of many companies where random forests, support vector machines or Bayesian models have been running for years (which means, cross-validated by data unavailable during model development) with a prediction performance that a business can depend on.

I agree with you that Reinforcement learning will become much more important as a technology in the coming years, but only if exploration is cheap enough. I expect Deep Reinforcement learning not to be the answer, at least not in its current state, but I can very well imagine that we will see more experiments that put machine-learning algorithms inside the reinforcement-learning loop. I personally would hope to see more research in the Bayesian reinforcement learning area.


I know, DL works only if you have massive datasets, and DRL is even worse in terms of the number of training episodes it needs. Maybe you've heard about the recent craziness of using GANs to generate training set fillers when your training sets are small, i.e. you have only 1,000 examples but need 10,000 for reasonable performance. Instead of gathering more examples, you use a GAN to create believable training data, and it seems to work quite well (e.g. a bump from 60% accuracy to 80%, where a bigger training dataset of real examples would bump you to 90%).
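
Roughly, the workflow looks like this. Just a minimal sketch of the idea (mine, not a reference implementation), assuming tabular float features, a tiny fully-connected GAN in PyTorch, and a random forest as the downstream classifier:

  import numpy as np
  import torch
  import torch.nn as nn
  from sklearn.ensemble import RandomForestClassifier

  def train_gan(X_real, latent_dim=16, epochs=2000):
      # fit a very small GAN on the real examples of one class
      X = torch.tensor(X_real, dtype=torch.float32)
      n, d = X.shape
      G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, d))
      D = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
      opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
      opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
      bce = nn.BCELoss()
      ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
      for _ in range(epochs):
          fake = G(torch.randn(n, latent_dim))
          # discriminator step: real -> 1, fake -> 0
          opt_d.zero_grad()
          (bce(D(X), ones) + bce(D(fake.detach()), zeros)).backward()
          opt_d.step()
          # generator step: try to make the discriminator say 1 on fakes
          opt_g.zero_grad()
          bce(D(fake), ones).backward()
          opt_g.step()
      return G

  def augment(X_small, y_small, per_class=9000, latent_dim=16):
      # grow a small labelled set with GAN-generated "filler" examples
      X_parts, y_parts = [X_small], [y_small]
      for c in np.unique(y_small):
          G = train_gan(X_small[y_small == c], latent_dim)
          with torch.no_grad():
              fake = G(torch.randn(per_class, latent_dim)).numpy()
          X_parts.append(fake)
          y_parts.append(np.full(per_class, c))
      return np.vstack(X_parts), np.concatenate(y_parts)

  # start from ~1,000 real rows, train the classifier on ~10,000
  # X_big, y_big = augment(X_small, y_small)
  # clf = RandomForestClassifier().fit(X_big, y_big)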

What I observed is that many ML companies now run two pipelines in parallel, one based on Deep Learning and the other on classical ML, then cherry pick solutions that work best for the problem/scale they have.


> I know, DL works only if you have massive datasets, and DRL is even worse in terms of the number of training episodes it needs. Maybe you've heard about the recent craziness of using GANs to generate training set fillers when your training sets are small, i.e. you have only 1,000 examples but need 10,000 for reasonable performance. Instead of gathering more examples, you use a GAN to create believable training data, and it seems to work quite well (e.g. a bump from 60% accuracy to 80%, where a bigger training dataset of real examples would bump you to 90%).

Sounds a bit like Baron Münchhausen pulling himself and the horse on which he was sitting out of a mire by his own hair.

I'd assume that instead of pulling such stunts, a reasonable generative model might have done the trick.

> What I observed is that many ML companies now run two pipelines in parallel, one based on Deep Learning and the other on classical ML, then cherry pick solutions that work best for the problem/scale they have.

Putting it this way, I agree. And my personal addendum here is: classical ML outperforms DL more often than the hype might make people think.


> Sounds a bit like Baron Münchhausen pulling himself and the horse on which he was sitting out of a mire by his own hair.

It sounds crazy, but you've likely seen what NVidia did with high-resolution synthetic faces using their progressive GANs; I'd totally use them as training examples without any hesitation.


> Have you looked at the monograph at all? This could have been written 5 to 10 years ago

The counterpoint is that a _lot_ could have been written 5 to 10 years ago, and only 10% of it would still be relevant. This is the relevant 10%.


Looking at the table of contents, it is a subset of PRML [Bishop 2006]. And I suppose it is more than 10% of PRML. And the things contained in PRML but not in the monograph are still relevant, I'd say (Kalman filters, ...).

[Bishop 2006] http://www.springer.com/de/book/9780387310732


A lot of the recent successes in ML haven't been based on new breakthroughs in theory. For instance deep neural networks have been studied for decades now. Same with the reinforcement learning algorithms behind advances like AlphaGo. So I don't think skipping ahead five years will necessarily help you.

That said, there was a lot of AI work with hand-crafted features and expert systems that has now basically been rendered obsolete by deep learning. But advances generally don't emerge out of a vacuum, and it helps to have some background knowledge of previous work to have a good idea of what has and hasn't worked in the past.


We have replaced hand-crafted features with hand-crafted architectures.

While performance might be important, at least the features were easy to understand by non-experts. :)


I find it's much easier to understand why 3 CNN layers applied on the raw image can learn convolutions that are relevant to solving my task... than it is to understand what is being done by "artisanal hand-engineered authentic patches" with a couple obscure dimensionality reduction algorithms thrown in, and an SVM on top.
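
For concreteness, the "3 CNN layers on the raw image" pipeline is literally something like this (a throwaway PyTorch sketch of my own; layer widths and the 10-class head are arbitrary):

  import torch.nn as nn

  # raw RGB image in, class scores out; no hand-engineered features anywhere
  three_layer_cnn = nn.Sequential(
      nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
      nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
      nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
      nn.Linear(64, 10),
  )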

Expert systems were also incredibly hard to build and debug, and weren't nearly as useful as ML systems are nowadays.


Looking through the TOC - it seems like most of this was in just as "fairly robust" a steady state three years ago as it is now? Was there some part of this that struck you as having solidified in the last few years?


Funnily enough this is a strategy I’ve been using for years.

At University I used to wait a while for others to complete their projects and then pick their brains about how they went about solving them.

Now in industry I don’t bother learning new tech until I start seeing those skills show up on job sites or if I have a very particular need that requires a particular technology as part of the solution.

This strategy has treated me well but I believe has only done so because not everyone takes this approach. I’m also grateful for the time and effort others put in to allow me to essentially freeload off of their hard work. Every now and then I do find something I can be truly passionate about to help others do the same.


Relevant xkcd (https://xkcd.com/989/)


This is my reasoning about blockchain. It was the first time I realised that just because I'm better disposed to understand something than the average person doesn't mean I have to. Even if that something will be part of everyday life for every person. Society will just take care of packaging it into a familiar interface, e.g. a credit card.

I then however realised that it would be a shame to let my advantage go to waste. So I use this:

> a lot of the monetary rewards

to partially motivate myself.


Programming frameworks have lots of stuff that becomes obsolete over time. You don't really learn a complex, open-ended and expanding mathematical subject like machine learning the way you describe.

There is no book you just read and then you understand it and it's behind you. You are constantly making new connections and deepening your understanding as you learn more. Ten years is a short time for things to start to sink in and form a connected whole in the mind.


> A counter argument is that you might miss out on a lot of the monetary rewards that come from learning the much-desired skill that others don't have the time or willpower to tackle in the period where doing so involves a lot of friction

Perhaps it can be argued that one is not in a position to take advantage of such situations. Switching jobs is not necessarily cheap.


Another counter-argument is that you're still learning about frameworks in general, which should help you pick up new ones quicker or provide a more comprehensive base of knowledge to draw upon when developing your own.


This appears to be a thorough overview of machine learning. Even though many bases are covered, I wish there was more on how to create or select features for machine learning in a systematic way. Feature extraction gets one paragraph, but feature engineering and selection aren't mentioned much in the 200+ pages!

I don't have a PhD in machine learning, but I have spent many years using it as a tool to solve problems. While the details here can get you a long way, without understanding feature engineering or feature selection you will have a hard time building accurate models.

For any engineers looking for more on feature engineering after reading this, I maintain an open source library for automated feature engineering called Featuretools (https://github.com/featuretools/featuretools). We also have demos on our website (https://www.featuretools.com/demos) if you want to see it in action.


Is there anything for supervised document classification? Like raw plain text -> features?


I was wondering how Featuretools differs from https://github.com/AxeldeRomblay/MLBox and https://github.com/crawles/automl_service,

and the proprietary, newly launched Driverless AI (from H2O).


Featuretools focuses on handling data with relational structure and timestamps. Here's an example to explain those two key points.

Imagine you have a relational database from a retail store with tables for customers, transactions, products, and stores.

Featuretools can make a feature matrix for any entity in the database using an algorithm called Deep Feature Synthesis. We wrote a blog post about it here: https://www.featurelabs.com/blog/deep-feature-synthesis/. Basically, it tries to stack dataset-agnostic "feature primitives" to construct features similar to what human data scientists would create. This means that a data scientist can go from building models about their customers to models about their stores in one line of code.

One aspect worth highlighting is that Featuretools can be extended with custom primitives to expand the set of features it can produce. As the repo of primitives grows, everyone in the community benefits because primitives aren't tied to a specific dataset or use case. Some of our demos highlight this functionality to increase scores on the Kaggle leaderboards.

Featuretools is good at handling time. When performing feature engineering on completely raw data, it is important not to mix up time. When your data is timestamped, you can tell Featuretools to create features at any point in time and it automatically slices the data for you (even across relationships between tables!). You want to avoid situations similar to training a machine learning model on stock market data from 2017, testing that it works on data from 2016, and then deploying it and expecting to make money in 2018. You can read more about how Featuretools handles time here: https://docs.featuretools.com/automated_feature_engineering/...
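
To make the retail example concrete, here's roughly what that looks like (toy dataframes; check the docs for the exact API, which may differ between releases):

  import pandas as pd
  import featuretools as ft

  # made-up data for the retail example above
  customers = pd.DataFrame({
      "customer_id": [1, 2],
      "join_date": pd.to_datetime(["2017-01-05", "2017-03-15"]),
  })
  transactions = pd.DataFrame({
      "transaction_id": [10, 11, 12],
      "customer_id": [1, 1, 2],
      "amount": [25.0, 40.0, 10.0],
      "transaction_time": pd.to_datetime(["2017-02-01", "2017-06-01", "2017-04-01"]),
  })

  es = ft.EntitySet(id="retail")
  es = es.entity_from_dataframe(entity_id="customers", dataframe=customers,
                                index="customer_id", time_index="join_date")
  es = es.entity_from_dataframe(entity_id="transactions", dataframe=transactions,
                                index="transaction_id", time_index="transaction_time")
  es = es.add_relationship(ft.Relationship(es["customers"]["customer_id"],
                                           es["transactions"]["customer_id"]))

  # one line of Deep Feature Synthesis per target entity; cutoff_time keeps
  # rows from after that point out of the aggregations
  fm, feature_defs = ft.dfs(entityset=es, target_entity="customers",
                            cutoff_time=pd.Timestamp("2017-05-01"))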


Documents like this (like traditional books) on technical subjects need to be read very carefully, requiring a time commitment that is large relative to the length of a human life or career. Also, they have a huge amount of overlap with other similar documents, so critical choices must be made about what to try to read.

It would be nice to have a high-quality and widely-used set of community ratings for such documents. E.g. a place where

- new documents can be added

- they are categorized (automatically should be doable) according to subject matter, level, etc

- some sort of community voting system (perhaps augmented by automated scoring by well-established predictors) scores each document for its utility/recommendability, in each of the subject areas that it covers.

Does anything like that exist for arXiv?

Do people put general expository material on arXiv? (E.g. lecture notes, textbooks, etc).


"Applying Machine Learning Like a Responsible Adult", a 30 minute talk from GDC 2017, is a nice introduction to common machine learning approaches and pitfalls for game developers:

https://www.youtube.com/watch?v=RLsKzkxWpK8


I think the concept of Machine Learning clashes in many ways with the core values of an engineer. Engineering is about constructing new things and making them correct by design. Machine learning, on the other hand, is throwing data at a problem, waiting for the data to be processed, and hoping that the resulting models make sense. While both are useful, it requires such a different mindset that I wonder if engineers would feel satisfied working on such solutions.


Using machine learning is indeed more like science than engineering. After all, it is just building a model based on data that you have seen. The advantage of using ML is that the model can be arbitrarily complex and something you'd never be able to come up with manually. But that model is only a part of a larger project.

To the engineer it seems no different to designing a solution in the context of accepted scientific theories. You can't engineer the theories, they are accepted based on evidence. But you can build the project around it.


>Engineering is about constructing new things

Engineering is about solving problems under constraints. ML is a tool in the toolbox to use against problems when the constraints are appropriate, similar to just about any algorithm.


ML in practice (from my experience) is mostly getting and cleaning data. As far as training, testing, and deploying a model goes, to an engineer it's just more algorithms.

While some algorithms may have more explainability than others, the engineer cares about whether they solve the business problem at hand.


The root of the word "engineer" means "to give birth to". In that sense, using Machine Learning to evolutionarily give rise to new models is actually quite fitting.


Another very interesting resource on the subject is the book Introduction to Statistical Learning by Gareth James et al. [1].

The book introduces the foundational concepts of statistical learning (classification, regression, cross-validation) and algorithms such as support vector machines.

It is also available as a PDF from the website [1].

[1] http://www-bcf.usc.edu/~gareth/ISL/


Funny that the abstract aims specifically at electrical engineers, since electrical engineers have been doing machine learning for about six decades now, under different names, since the invention of the Wiener filter, the Kalman filter, etc.


I wish someone had said clearly to me a year ago (when I started ML): "Machine Learning is about MATH, deal with it".

It would have been much easier.


I think it's very contentious to some people (though I agree that at this stage, ML == mathematics).

I've read quite a few people on the internet who will swear up and down you don't need the mathematics to apply basic models to business problems. I disagree, and I kind of find it weird to divorce the mathematics from it.


Yes and no. Of course it is absurd to claim you don't need math. On the other hand, I think the point is sometimes exaggerated. You'll see tutorials that are thorough linear algebra primers and claim that you need all of it for ML, when in fact you'd only need it to get a thorough understanding of the inner workings. At the same time, I have seen highly educated math experts pretty much fail to understand that an L2 norm isn't the best loss function for a business context where a deviation from the truth actually means linear costs. So I'd argue that being able to map business problems into the shallow math domain is much more important than mastering the deep math domain.
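
A tiny made-up illustration of that L2-vs-linear-costs point: under squared error the best constant prediction is the mean, but if your real cost grows linearly with the deviation, the median is what you actually want.

  import numpy as np

  y = np.array([1.0, 1.0, 1.0, 1.0, 100.0])       # made-up demand figures with one outlier
  mean_pred, median_pred = y.mean(), np.median(y)  # L2-optimal vs L1-optimal constant prediction

  def linear_cost(pred):
      # business cost that grows linearly with the deviation from the truth
      return np.abs(y - pred).sum()

  print(linear_cost(mean_pred), linear_cost(median_pred))  # the "mathematically nice" L2 choice costs noticeably more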


I know ML is the new hotness, but engineers (proper engineers) have used it for years in basic areas like condition monitoring:

- hooking up sensor input from your oil pumps to a neural network to understand the statistics of your population and predict damage.

https://en.wikipedia.org/wiki/Condition_monitoring


Funny how the author pushes the language to extremes:

  - title: a "brief" introduction (it's 206 pages!)

  - chapter 1: a "gentle" introduction through Linear Regression, where gentle means that the relationships and equations are provided in all their notational beauty, but without the motivation or meaning part.




