
An Introduction to Model-Based Machine Learning - gk1
https://blog.dominodatalab.com/an-introduction-to-model-based-machine-learning/
======
plusepsilon
The author uses "Model-Based" in place of "Bayesian."

I transitioned from using Bayesian models in academia to using machine
learning models in industry. One of the core differences in the two paradigms
is the "feel" when constructing models. For a Bayesian model, you feel like
you're constructing the model from first principles. You set your conditional
probabilities and priors and see if it fits the data. I'm sure probabilistic
programming languages facilitated that feeling. For machine learning models,
it feels like you're starting from the loss function and working back to get
the best configuration.
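That "build from first principles" feel can be made concrete with the simplest Bayesian building block, a conjugate Beta-Binomial model. This is a generic sketch, not from the article; the function name and all numbers are invented for illustration.

```python
# Sketch of constructing a model "from first principles": state a prior,
# state a likelihood, update on data. Conjugacy gives a closed-form
# posterior here; all numbers are invented for illustration.

def posterior_beta(prior_a, prior_b, heads, flips):
    """Beta(a, b) prior + Binomial likelihood -> Beta posterior."""
    return prior_a + heads, prior_b + (flips - heads)

# A weak prior belief that the coin is roughly fair...
a, b = 2.0, 2.0
# ...then observe 7 heads in 10 flips and update.
a, b = posterior_beta(a, b, heads=7, flips=10)

posterior_mean = a / (a + b)      # 9 / 14
print(round(posterior_mean, 3))   # prints 0.643
```

A probabilistic programming language automates exactly this update step for models where the posterior has no closed form.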

Much of the underlying machinery behind Bayesian vs. machine learning models
is the same. Hidden Markov Models are Hidden Markov Models whether they have a
prior or not. But this difference in feel influences how you build models and
hence, the results.
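The "same machinery" point can be sketched in code: the HMM forward algorithm below is identical whether the transition matrix comes from plain maximum-likelihood counts or from counts smoothed with a symmetric Dirichlet prior. This is a hypothetical toy example; function names and all numbers are invented.

```python
# Same machinery, with or without a prior: only the parameter
# estimates differ, the forward pass does not. Toy numbers throughout.

def estimate_transitions(counts, alpha=0.0):
    """Row-normalise transition counts; alpha > 0 adds a symmetric
    Dirichlet prior (add-alpha smoothing), alpha == 0 is plain MLE."""
    return [[(c + alpha) / (sum(row) + alpha * len(row)) for c in row]
            for row in counts]

def forward(pi, A, B, obs):
    """Standard HMM forward pass: returns P(observations | model)."""
    a_t = [pi[s] * B[s][obs[0]] for s in range(len(pi))]
    for o in obs[1:]:
        a_t = [sum(a_t[r] * A[r][s] for r in range(len(pi))) * B[s][o]
               for s in range(len(pi))]
    return sum(a_t)

counts = [[8, 2], [1, 9]]           # observed state-to-state transitions
A_mle = estimate_transitions(counts)             # "machine learning" flavour
A_map = estimate_transitions(counts, alpha=1.0)  # "Bayesian" flavour
pi, B = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1]
# The same forward() runs on both; only the estimates differ.
print(forward(pi, A_mle, B, obs), forward(pi, A_map, B, obs))
```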

Now that optimization algos for Bayesian models are catching up, Bayesian ML
might become a thing.

Cool stuff.

~~~
dj-wonk
The blog post author (Daniel Emaasit) wasn't the first to use the "model-based
machine learning" phrase. He cites "Model-based machine learning" by
Christopher M. Bishop.

------
vertis
The author's initial feelings about ML are similar to my own. It's such a
broad subject that it feels like you could read and study for years and still
not cover everything. Worse still, it's ever-changing.

~~~
amelius
I'd like to see a high-level overview of the kind of problems that can be
solved by the different kinds of ML, and their applications.

~~~
denzil_correa
The only material I have seen which does this, or comes close to it, is Andrew
Ng's Machine Learning class on Coursera and Academic Earth.

------
blahi
What is a good textbook that will take me from 0 to practical proficiency with
Bayesian Nets, if I have experience with regression models, hand built and
machine learned?

~~~
apathy
Do what the author did: take Koller's course and follow the research that
interests you. In this case, propagation through factor graphs is also a
good analogy for when you start looking at backpropagation in NNs and the
autoencoder.

~~~
blahi
This is a very fragmented approach, and going down that road is very
frustrating. Every field is filled with hype and with people pushing their own
agendas and publicity to further their careers. Speaking from experience with
self-teaching discriminative modeling, the overwhelming majority of the books
are either superficial or needlessly complicated, and in both cases badly
written as well.

Finding a book that hit the sweet spot for regressions wasn't easy but was
doable. I was hoping there would be something similar with Bayesian
Nets/Generative Models.

~~~
Florin_Andrei
> _Finding a book that hit the sweet spot for regressions wasn't easy but was
> doable._

Could you provide an example?

~~~
blahi
Regression Modeling Strategies. You need at least some notion of what
regressions and probability are all about, but if you have the basics covered,
this book will take you through 80% of the journey; the rest is some googling
to figure out concepts that might be murky.

This is a book that emphasizes practical applications without getting too
bogged down in the math details. If, on the other hand, you are a math whiz,
Elements of Statistical Learning is THE book, but it expects you to be very
proficient in math.

Both books are seriously underrated, which is a funny thing to say because you
will find only praise for them, but they deserve even more.

~~~
apathy
I disagree re: ESL and RMS, but Harrell's book is superb. It's really more
aimed at biostatisticians and clinicians, though.

If the math in ESL gives you trouble, you might prefer
[http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf)
(and I'm not just saying this because the first author was one of my advisers,
although I do think that he and Daniela are particularly gifted teachers).

If the math in ESL is too trivial for you, there's always
[https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS_corrected_1.4.16.pdf](https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS_corrected_1.4.16.pdf),
which covers some graphical modeling strategies in later chapters and even
kicks the tires of the autoencoder (imho perhaps the greatest recent advance
in neural networks for practitioners) along the way.

Koller's course and Ng's course are also good.

Ultimately I feel like you have to get the math right or you'll never acquire
the intuition that helps you design your own approaches. But you also have to
put in the work.

That reminds me, Tibshirani's Stanford course (which accompanies ISL and ESL)
is terrific. Better than those other two, actually. I wish Harrell would offer
one.

~~~
blahi
You disagree with what regarding ESL and RMS?

>Ultimately I feel like you have to get the math right or you'll never acquire
the intuition that helps you design your own approaches.

But what do you mean by that? Do I really need it if my applications are not
as demanding as Netflix's? I feel like many people consider anything less than
PhD-level understanding laughable, which is simply not true. The majority of
analysts out there are doing just fine with canned procedures. Is there
something like canned procedures for Bayesian Nets?

~~~
apathy
I disagree that RMS is necessarily better than ESL in some meaningful way. The
two are complementary. It's like saying a gravel truck is better than a
motorcycle: it all depends on what you want to do with it.

Re: "do I really need it?": hell if I know, I'm not you. But my assertion was
specifically that if you want to design your own methods (i.e. do research)
you need to understand what they are doing. This doesn't seem like a
controversial position; an expert is simply a master of the fundamentals.

~~~
blahi
But I thought I made it clear that I am not looking to do research. I'm not
even looking for state of the art performance. I am looking for that 30% of
the skills which allow me to do 70% of the tasks. Like RMS. Because outside of
Google/Microsoft/Amazon et al, domain knowledge beats superior math skills 10
out of 10 times.

~~~
apathy
You should do whatever you like, but remember that domain knowledge and math
skills are not mutually exclusive. If the problem you need to solve for a
major customer or project happens to be in that 30%, it may come in handy.

Linear algebra and calculus (to a lesser degree) are foundational for a great
many things. Got missing data? K-NN or nuclear norm matrix completion (or
marginalizing over the rest) can help. Systems of differential equations? Use
a matrix exponential.
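For instance, the k-NN imputation idea can be sketched in a few lines. This is a toy illustration under invented data; the function name is hypothetical, not a library API.

```python
# Minimal k-nearest-neighbour imputation: fill each missing value with
# the mean of that column over the k rows closest on observed columns.
# Toy data and naive O(n^2) search, for illustration only.

def knn_impute(rows, k=2):
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, val in enumerate(row):
            if val is None:
                # mean squared distance over columns both rows observe
                def dist(other):
                    ds = [(x - y) ** 2 for x, y in zip(row, other)
                          if x is not None and y is not None]
                    return sum(ds) / len(ds) if ds else float("inf")
                donors = sorted(
                    (r for r in rows if r is not row and r[j] is not None),
                    key=dist)[:k]
                filled[i][j] = sum(r[j] for r in donors) / len(donors)
    return filled

data = [[1.0, 2.0], [1.1, 2.2], [5.0, 9.0], [1.05, None]]
# The gap in the last row is filled from its two nearest neighbours.
print(knn_impute(data, k=2)[3])
```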

You are free to do whatever you like. A bus driver doesn't need to know how to
rebuild an engine. But if you want to race cars you'll get a lot further if
you do know how.

~~~
blahi
If I have missing values, I can use multiple imputation or even simple
averaging and the hit I will take will be negligible. I simply do not work in
the remaining 30%. I can tell if the work is going to be above my head and
refuse the project in those cases.

So after all this back and forth, I still don't know if there is a book
similar to RMS in scope, for Bayesian Nets.

~~~
apathy
Try [http://www.r-bayesian-networks.org](http://www.r-bayesian-networks.org)

------
tartakovsky
It seems this is a rebranding of probabilistic graphical models.

------
et2o
So what did he actually do with this model? Why is it better to do it this
way? What is the point? There's no evidence or output here.

~~~
dj-wonk
Just to touch on one part of your question -- did you see the case study
section? There are five conclusions:

> 1. This approach provides a systematic process of developing bespoke models
> tailored to our specific problem.

> 2. It provides transparency to our model as we explicitly defined our model
> assumptions by leveraging prior knowledge about traffic congestion.

> 3. The approach allows handling of uncertainty in a principled manner using
> probability theory.

> 4. It does not suffer from overfitting as the model parameters are learned
> using Bayesian inference and not optimization.

> 5. Finally, MBML separates the model development from inference which
> allows us to build several models and use the same inference algorithm to
> learn the model parameters. This in turn helps to quickly compare several
> alternative models and select the best model that is explained by the
> observed data.

