
Gaussian Processes for Machine Learning (2006) - privong
http://www.gaussianprocess.org/gpml/
======
vladislav
There are some interesting connections between GPs and neural networks. Deep
neural networks with random weights converge to Gaussian processes as the
layer widths go to infinity, so the two may not be so far apart in practice.
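A quick numpy sketch of that concentration (my illustration, not from the paper): sample the output of a wide one-hidden-layer network with i.i.d. random weights many times, and by the central limit theorem the output distribution over weight draws looks Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_net_output(x, width, rng):
    # One-hidden-layer tanh network with i.i.d. Gaussian weights,
    # output scaled by 1/sqrt(width) so its variance stays O(1).
    W1 = rng.normal(0, 1, size=(width, x.shape[0]))
    b1 = rng.normal(0, 1, size=width)
    W2 = rng.normal(0, 1, size=width)
    h = np.tanh(W1 @ x + b1)
    return W2 @ h / np.sqrt(width)

x = np.array([0.5, -1.0])
samples = np.array([random_net_output(x, width=2048, rng=rng)
                    for _ in range(5000)])
# Over random weight draws, the output is approximately Gaussian
# with mean 0; a histogram of `samples` is close to a bell curve.
print(samples.mean(), samples.std())
```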

[https://arxiv.org/pdf/1711.00165.pdf](https://arxiv.org/pdf/1711.00165.pdf)

------
jkabrg
Where have people used Gaussian Processes to good effect? And how do they
compare to competing models? There appears to be a lot of theory in this book,
and I'm wondering how much of it is useful to applied data science.

~~~
cultus
I'm a data scientist who uses Gaussian processes all the time. They are:

1. Typically very accurate.

2. Grounded in sound theory, with good uncertainty estimates.

3. Easy to tune, since they're a Bayesian model.
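A minimal sketch of points 2 and 3 (my illustration, using scikit-learn's GaussianProcessRegressor, which the comment doesn't name): the kernel hyperparameters are fit automatically by maximizing the marginal likelihood, and the predictive standard deviation comes for free.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=40)

# Length scale and noise level are tuned by maximizing the
# marginal likelihood during fit() -- no manual grid search.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_test = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
print(mean)  # posterior mean
print(std)   # posterior standard deviation: the uncertainty estimate
```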

The main competing models for some of the same tasks are gradient boosted
decision trees and sometimes neural networks. GBTs win over NNs for most tasks
in practice, although they don't get much hype. GPs do well on smooth data
in my experience, with GBTs winning on data that a limited number of
bespoke decision-tree splitting rules can represent well.

Interestingly, damn near anything, including neural networks, linear
regression, and GBTs, can be interpreted as a Gaussian process (or an
approximation of one) for a suitable choice of covariance function. GP
predictions are just functions in the reproducing kernel Hilbert space
defined by the covariance function, and that can include most anything.
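One concrete instance of that correspondence (my sketch, not from the comment): a GP with the linear covariance function k(x, x') = x · x' and a unit-variance Gaussian prior on the weights reproduces Bayesian linear regression exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, size=30)

noise = 0.1 ** 2
X_new = rng.normal(size=(5, 3))

# GP regression with linear covariance k(x, x') = x . x',
# i.e. Gram matrix K = X X^T.
K = X @ X.T
alpha = np.linalg.solve(K + noise * np.eye(30), y)
gp_mean = (X_new @ X.T) @ alpha

# Bayesian linear regression with w ~ N(0, I):
# posterior mean of w is (X^T X + noise * I)^-1 X^T y.
w = np.linalg.solve(X.T @ X + noise * np.eye(3), X.T @ y)
blr_mean = X_new @ w

print(np.allclose(gp_mean, blr_mean))  # the two predictions coincide
```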

GPs with full covariance matrices don't scale beyond a few thousand
examples (exact inference is O(n^3)), but approximations exist that scale to
large datasets.
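A bare-bones numpy sketch of exact GP regression (my illustration): the Cholesky factorization of the n x n covariance matrix is the O(n^3) step that limits scaling.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential covariance between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

noise = 0.1 ** 2
K = rbf(X, X) + noise * np.eye(len(X))

# The O(n^3) bottleneck: factorizing the n x n covariance matrix.
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

X_star = np.array([[1.0], [2.5]])
K_star = rbf(X_star, X)
mean = K_star @ alpha                 # posterior mean at the test points
v = np.linalg.solve(L, K_star.T)
var = rbf(X_star, X_star).diagonal() - (v ** 2).sum(axis=0)  # posterior variance
print(mean, var)
```

Sparse/inducing-point approximations replace the n x n factorization with an m x m one for m << n pseudo-inputs, which is how GP libraries handle large datasets.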

~~~
Xcelerate
> GBTs win over NN for most tasks in practice, although they don't get much
> hype

I've always thought GBDTs get too much hype. As a data scientist, it seems
like everyone wants to immediately throw a random forest or GBDT at the
problem without knowing anything else about it.

~~~
mxwsn
Yeah, I think in the data science community, GBDTs are appropriately hyped,
since their dominant performance on Kaggle has been well known for some time
now. On top of that, GBDTs are so easy to run; taken together, it's
probably always correct to run a GBDT as one of the first things you do
after you've got the data wrangled. Of course, as a PhD-in-training data
scientist, I feel disappointed (either in myself or in the task) if I can't
think of a more interesting and better-performing method than a GBDT :)

------
wodenokoto
Does it require Matlab or are all tasks doable in R or Python? (I.e, does it
rely on some sort of GP-library for Matlab?)

~~~
awav
No, the book itself doesn't rely on any framework. Nevertheless, you can look
at [https://github.com/GPflow/GPflow](https://github.com/GPflow/GPflow).

