
A Brief Introduction to Machine Learning for Engineers (2017) - lainon
https://arxiv.org/abs/1709.02840
======
tw1010
This makes a good case for not learning machine learning three years ago. If
you wait long enough for new paradigms to coagulate, you can use your energy
more wisely during the chaotic transient period (e.g. by spending your effort
on time-invariant parts of human knowledge, such as gaining a deeper
understanding of parts of pure mathematics, or even philosophy or history),
while you wait for a new set of fundamentals to converge, and really solid
pedagogical explanations to surface.

The same could be said about skipping the earlier period of javascript
frameworks. Had you used your time to learn things that are still useful
today, and will remain useful decades in the future, while waiting for the
industry to jump here and there, eventually converging to what seems to be a
fairly robust steady state (react), then you'd probably be better off today
than if you followed along with the hype cycle month after month.

A counter-argument is that you might miss out on a lot of the monetary rewards
that come from learning the much-desired skill that others don't have the
time or willpower to tackle in the period where doing so involves a lot of
friction.

~~~
wirrbel
> This makes a good case for not learning machine learning three years ago.

Have you looked at the monograph at all? This could have been written 5, 10,
even 15 years ago; maybe a few sections would have looked a little
different. In fact, I like to recommend Mitchell's Machine Learning book (I
think it was written in the 90s) as an introduction for people with a serious
interest.

There is currently a lot of hype around machine learning algorithms,
because we see good progress in things like computer vision / pattern
recognition. This kicks off a marketing machinery that really blurs
reality.

In reality, we have an established body of methods and modelling techniques
that are sufficient, because the available data is the bottleneck for
prediction quality. The actual challenge is to come up with a valuable
business proposition, not necessarily to build the predictive model.

~~~
bitL
I'd like to point out that there is a split within Machine Learning
into "classical" ML and "Deep Learning"-based ML. While Deep Learning is just
neural networks with Big Data on GPUs, it has obliterated many areas of
classical ML. What you really want to learn is Deep (+ Reinforcement)
Learning, or what is coming to be known as differentiable programming. For
classical ML, learn how to use Spark's MLlib and take Ng's Coursera course;
that might be all you need from a practical point of view.

~~~
wirrbel
Deep learning has indeed obliterated much of the computer vision area
and other domains where classical approaches such as feature engineering are
naturally hard. This comes at the price of Deep Learning models' massive
hunger for data to achieve acceptable prediction accuracy. So for use
cases like image recognition, etc., the relative performance of deep learning
models is much better than that of classical algorithms.

In general (you'll probably find specific counterexamples, but as I
said, _in general_) the relative performance of deep learning models relative
to classical machine learning may drop below that of classical models when
you start looking at medium-sized data sets. And I can assure you, there are
many such data sets, with many applications for machine learning. And while
the world talks about deep learning, I know of many companies where random
forests, support vector machines or Bayesian models have been running for
years (which means they have effectively been cross-validated on data
unavailable during model development) with a prediction performance that a
business can depend on.

I agree with you that reinforcement learning will become much more important
as a technology in the coming years, but only if exploration is cheap
enough. I expect Deep Reinforcement Learning not to be the answer, at least
not in its current state, but I can very well imagine that we will see more
machine-learning-algorithm-in-the-reinforcement-learning-loop experiments. I
personally would hope to see more research in the Bayesian reinforcement
learning area.

~~~
bitL
I know, DL works only if you have massive datasets, DRL is even worse for the
number of training episodes. Maybe you've heard about recent craziness of
using GANs to generate training set fillers when your training sets are small,
i.e. you have only 1,000 examples but need 10,000 for reasonable performance.
Instead of gathering more examples, you use GAN to create believable training
data, and it seems to be working quite well (i.e. a bump from 60% accuracy to
80% while bigger training dataset with real examples would bump you to 90%).

What I observed is that many ML companies now run two pipelines in parallel,
one based on Deep Learning and the other on classical ML, then cherry pick
solutions that work best for the problem/scale they have.
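
A minimal sketch of that two-pipeline setup (the models, data, and scoring
here are placeholders, not anyone's actual production pipeline): cross-validate
one classical model and one neural model on the same data and keep the winner.

```python
# Fit a classical model and a small neural net on the same (synthetic) data,
# then keep whichever cross-validates better.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

candidates = {
    "classical (random forest)": RandomForestClassifier(random_state=0),
    "deep-ish (MLP)": MLPClassifier(max_iter=500, random_state=0),
}

# Mean 5-fold cross-validation accuracy per pipeline.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> picking:", best)
```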

~~~
wirrbel
> I know, DL works only if you have massive datasets, DRL is even worse for
> the number of training episodes. Maybe you've heard about recent craziness
> of using GANs to generate training set fillers when your training sets are
> small, i.e. you have only 1,000 examples but need 10,000 for reasonable
> performance. Instead of gathering more examples, you use GAN to create
> believable training data, and it seems to be working quite well (i.e. a bump
> from 60% accuracy to 80% while bigger training dataset with real examples
> would bump you to 90%).

Sounds a bit like Baron Münchhausen pulling himself and the horse on which he
was sitting out of a mire by his own hair.

I'd assume that instead of pulling such stunts, a reasonable generative model
might have done the trick.
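
One way such a "reasonable generative model" could look, as a sketch (the
Gaussian mixture, the toy data, and the 100 -> 1,000 padding are purely
illustrative): fit a cheap generative model to the small real sample and draw
synthetic points from it.

```python
# Fit a simple generative model to a small real sample and sample synthetic
# "filler" examples from it instead of training a GAN.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
real = rng.normal(loc=[0.0, 3.0], scale=1.0, size=(100, 2))  # small real set

gm = GaussianMixture(n_components=2, random_state=0).fit(real)
synthetic, _ = gm.sample(900)  # pad the training set from 100 to 1,000 rows

augmented = np.vstack([real, synthetic])
print(real.shape, "->", augmented.shape)
```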

> What I observed is that many ML companies now run two pipelines in parallel,
> one based on Deep Learning and the other on classical ML, then cherry pick
> solutions that work best for the problem/scale they have.

Putting it this way, I agree. And my personal addendum here is: classical ML
outperforms DL more often than the hype might make people think.

~~~
bitL
> Sounds a bit like Baron Münchhausen pulling himself and the horse on which
> he was sitting out of a mire by his own hair.

It sounds crazy, but you've likely seen what NVidia did with high-resolution
synthetic faces using their progressive GANs; I'd totally use them as training
examples without any hesitation.

------
kmax12
This appears to be a thorough overview of machine learning. Even though many
bases are covered, I wish there were more on how to create or select features
for machine learning in a systematic way. Feature extraction gets one
paragraph, but feature engineering and selection aren't mentioned much in the
200+ pages!

I don't have a PhD in machine learning, but I have spent many years using it
as a tool to solve problems. While the details here can get you a long way,
without understanding feature engineering or feature selection you will have
a hard time building accurate models.

For any engineers looking for more on feature engineering after reading this,
I maintain an open source library for automated feature engineering called
Featuretools
([https://github.com/featuretools/featuretools](https://github.com/featuretools/featuretools)).
We also have demos on our website
([https://www.featuretools.com/demos](https://www.featuretools.com/demos)) if
you want to see it in action.

~~~
bistro17
I was wondering how featuretools differs from
[https://github.com/AxeldeRomblay/MLBox](https://github.com/AxeldeRomblay/MLBox)
[https://github.com/crawles/automl_service](https://github.com/crawles/automl_service)

and the proprietary, newly launched Driverless AI (from H2O).

~~~
kmax12
Featuretools focuses on handling data with relational structure and
timestamps. Here's an example to explain those two key points.

Imagine you have a relational database from a retail store with tables for
customers, transactions, products, and stores.

Featuretools can make a feature matrix for any entity in the database using an
algorithm called Deep Feature Synthesis. We wrote a blog post about it here:
[https://www.featurelabs.com/blog/deep-feature-
synthesis/](https://www.featurelabs.com/blog/deep-feature-synthesis/).
Basically, it tries to stack dataset-agnostic "feature primitives" to
construct features similar to what human data scientists would create. This
means that a data scientist can go from building models about their customers
to models about their stores in one line of code.
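
A hand-rolled illustration of what stacking primitives across a relationship
means (the tables and column names here are made up, not Featuretools code):
aggregate a child table of transactions up to the customer level with SUM and
MEAN primitives.

```python
# Aggregate transaction-level rows up to the customer entity, the kind of
# feature Deep Feature Synthesis would generate automatically.
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 20.0, 5.0, 7.0, 3.0],
})

# The SUM and MEAN primitives applied across the customer -> transactions
# relationship, one row of features per customer.
features = transactions.groupby("customer_id")["amount"].agg(
    SUM_amount="sum", MEAN_amount="mean"
)
print(features)  # customer 1: sum 30, mean 15; customer 2: sum 15, mean 5
```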

One aspect worth highlighting is that Featuretools can be extended with custom
primitives to expand the set of features it can produce. As the repository of
primitives grows, everyone in the community benefits, because primitives
aren't tied to a specific dataset or use case. Some of our demos highlight
this functionality to increase scores on Kaggle leaderboards.

Featuretools is also good at handling time. When performing feature
engineering on completely raw data, it is important not to mix up time. When
your data is timestamped, you can tell Featuretools to create features at any
point in time and it automatically slices the data for you (even across
relationships between tables!). You want to avoid situations like training a
machine learning model on stock market data from 2017, testing that it works
on data from 2016, and then deploying it and expecting to make money in 2018.
You can read more about how Featuretools handles time here:
[https://docs.featuretools.com/automated_feature_engineering/...](https://docs.featuretools.com/automated_feature_engineering/handling_time.html)
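
The cutoff-time idea can be sketched in plain pandas (the data is made up):
when building features for a prediction made at time t, only rows observed
before t are allowed into the feature computation.

```python
# Filter a timestamped event table by a cutoff time so that features never
# use information from the future relative to the prediction point.
import pandas as pd

events = pd.DataFrame({
    "customer_id": [1, 1, 1],
    "amount": [10.0, 20.0, 30.0],
    "time": pd.to_datetime(["2016-01-05", "2016-06-01", "2017-02-01"]),
})

cutoff = pd.Timestamp("2017-01-01")
visible = events[events["time"] < cutoff]  # no peeking into the future

# A feature (total spend) built only from pre-cutoff data.
print(visible["amount"].sum())  # 30.0: only the two 2016 rows count
```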

------
Myrmornis
Documents like this (like traditional books) on technical subjects need to be
read very carefully, requiring a time commitment that is large relative to the
length of a human life or career. Also, they have a huge amount of overlap
with other similar documents, so critical choices must be made about what to
try to read.

It would be nice to have a high-quality and widely-used set of community
ratings for such documents. E.g. a place where

\- new documents can be added

\- they are categorized (automatically should be doable) according to subject
matter, level, etc

\- some sort of community voting system (perhaps augmented by automated
scoring by well-established predictors) scores each document for its
utility/recommendability, in each of the subject areas that it covers.

Does anything like that exist for arXiv?

Do people put general expository material on arXiv? (E.g. lecture notes,
textbooks, etc).

------
cpeterso
"Applying Machine Learning Like a Responsible Adult", a 30 minute talk from
GDC 2017, is a nice introduction to common machine learning approaches and
pitfalls for game developers:

[https://www.youtube.com/watch?v=RLsKzkxWpK8](https://www.youtube.com/watch?v=RLsKzkxWpK8)

------
amelius
I think the concept of Machine Learning clashes in many ways with the core
values of an engineer. Engineering is about constructing new things and making
them correct by design. Machine learning, on the other hand, is throwing data
at a problem, waiting for the data to be processed, and hoping that the
resulting models make sense. While both are useful, they require such
different mindsets that I wonder whether engineers would feel satisfied
working on such solutions.

~~~
cup-of-tea
Using machine learning is indeed more like science than engineering. After
all, it is just building a model based on data that you have seen. The
advantage of using ML is that the model can be arbitrarily complex, something
you'd never be able to come up with manually. But that model is only a part
of a larger project.

To the engineer it seems no different to designing a solution in the context
of accepted scientific theories. You can't engineer the theories, they are
accepted based on evidence. But you can build the project around it.

------
joaovictortr
Another very interesting resource on the subject is the book An Introduction
to Statistical Learning by Gareth James et al. [1].

The book introduces the foundational concepts of statistical learning
(classification, regression, cross-validation) and algorithms such as support
vector machines.

It is also available as a PDF on the website [1].

[1] [http://www-bcf.usc.edu/~gareth/ISL/](http://www-bcf.usc.edu/~gareth/ISL/)

------
cosmic_ape
Funny that the abstract aims specifically at electrical engineers, since
electrical engineers have been doing machine learning for about six decades
now, under different names: since the invention of the Wiener filter, the
Kalman filter, etc.

------
eddd
I wish someone had said clearly to me a year ago (when I started ML):
"Machine Learning is about MATH, deal with it".

It'd have been much easier.

~~~
verletx64
I think it's very contentious to some people (though I agree that at this
stage, ML == mathematics).

I've read quite a few people on the internet who will swear up and down that
you don't need the mathematics to apply basic models to business problems. I
disagree, and I find it kind of weird to divorce the mathematics from it.

~~~
wirrbel
Yes and no. Of course it is absurd to claim you don't need math. On the other
hand, I think it is sometimes exaggerated. You'll see tutorials that are
thorough linear algebra primers and claim that you need all of it for ML,
when in fact you'd only need it to get a thorough understanding of the inner
workings. Then again, I have seen highly educated math experts pretty much
fail to understand that an L2 norm isn't the best loss function for a
business context where a deviation from the truth actually means linear
costs. So I'd argue that being able to map business problems into the shallow
math domain is much more important than mastering the deep math domain.
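
The L2-vs-linear-cost point can be made concrete with a tiny pure-Python
example (the numbers are made up): under a linear cost, the best constant
prediction is the median, while least squares picks the mean, which a few
outliers drag around.

```python
# Toy data with one outlier; all numbers are illustrative.
data = [1, 2, 2, 3, 100]

mean = sum(data) / len(data)           # the L2 (least-squares) optimum
median = sorted(data)[len(data) // 2]  # the L1 (absolute-error) optimum

def linear_cost(pred, ys):
    """Total cost when each unit of deviation from the truth costs one unit."""
    return sum(abs(y - pred) for y in ys)

print("mean:", mean, "cost:", linear_cost(mean, data))
print("median:", median, "cost:", linear_cost(median, data))
# Under linear costs, the median is the cheaper prediction.
```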

------
kingkongjaffa
I know ML is the new hotness, but engineers (proper engineers) have used it
for years in basic areas like condition monitoring:

\- hooking up sensor input from your oil pumps to a neural network to learn
the statistics of your pump population and predict damage.

[https://en.wikipedia.org/wiki/Condition_monitoring](https://en.wikipedia.org/wiki/Condition_monitoring)

------
udev
Funny how the author pushes the language to extremes:

\- title: a "brief" introduction (it's 206 pages!)

\- chapter 1: a "gentle" introduction through Linear Regression, where gentle
means that the relationships and equations are provided in all their
notational beauty, but without the motivation or meaning part.


