
Probabilistic programming from scratch - bryanrasmussen
https://www.oreilly.com/learning/probabilistic-programming-from-scratch
======
inlineint
A relevant book is Think Bayes [1] by Allen B. Downey. It also uses only pure
Python to solve simple problems of Bayesian inference. I like the simplicity
with which it communicates the ideas of the Bayesian approach.

But this article is not a duplicate of that book: it puts more focus on
sampling.

[1] [http://greenteapress.com/wp/think-bayes/](http://greenteapress.com/wp/think-bayes/)
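A minimal pure-Python Bayes-table update, in the spirit of the book and article the comment describes (this is an illustrative sketch, not code from either source): which of two coins produced an observed head?

```python
# Bayesian update as a table: multiply prior by likelihood, renormalize.
# Hypotheses and probabilities below are invented for illustration.

def update(prior, likelihood):
    """Return the posterior P(hypothesis | data)."""
    posterior = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

prior = {"fair": 0.5, "biased": 0.5}       # P(hypothesis)
likelihood = {"fair": 0.5, "biased": 0.9}  # P(heads | hypothesis)

posterior = update(prior, likelihood)
print(posterior["biased"])  # 0.45 / 0.70 ≈ 0.643
```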

------
deckar01
I recently learned about Prolog and stumbled upon ProbLog along the way. I
imagine automatically generating probabilistic rules from Bayesian inference
on a dataset would be really useful. I wonder how it would compare to other
machine learning techniques in accuracy and efficiency? I am always a little
skeptical about the randomness of neural networks. This seems like a more
deterministic alternative.

~~~
dimitry12
Both Machine Learning (ML) and Probabilistic Programming (PP) work towards
building a mathematical object (a model) that takes an observation as input
and produces a prediction as output. Both ML and PP are about finding the
parameters of the model, which requires bits of information.

Main differences:

1. The first difference is the type of information used to find the
parameters of the model.

In ML, the information used to "train" the model is called the "training set"
and usually consists of observation-prediction pairs (or just observations,
for unsupervised learning) of the same kind as the desired observation-
prediction pair. In other words, with ML you "fit" your model using the same
type of data you will later use the model on. At least, this is the most
common ML scenario.

In the problems PP specializes in, this is not as prevalent. For example,
with PP we may see a model that predicts where a planet is, while the model
itself is trained on apples falling from a tree.
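A toy sketch of that point (all numbers and models here are invented for illustration): infer a latent parameter g from one kind of data (fall times of dropped apples), then reuse it in an entirely different predictive model (a pendulum's period).

```python
# Grid-approximation inference of g from noisy fall times, followed by
# a prediction from an unrelated model that shares the parameter.
import math

drop_height = 5.0                          # meters (assumed)
observed_fall_times = [1.02, 0.99, 1.01]   # seconds, invented noisy data

def likelihood(g):
    """Gaussian noise (sigma = 0.05 s) around the predicted time sqrt(2h/g)."""
    predicted = math.sqrt(2 * drop_height / g)
    sq_err = sum((t - predicted) ** 2 for t in observed_fall_times)
    return math.exp(-sq_err / (2 * 0.05 ** 2))

gs = [8.0 + i * 0.01 for i in range(400)]  # grid over plausible g values
weights = [likelihood(g) for g in gs]
total = sum(weights)
posterior = [w / total for w in weights]

# Posterior-mean g, then a prediction about a different system entirely:
g_mean = sum(g * p for g, p in zip(gs, posterior))
pendulum_period = 2 * math.pi * math.sqrt(1.0 / g_mean)  # 1 m pendulum
```

The training data (fall times) and the prediction (a period) never appear in the same observation-prediction pair; they are linked only through the inferred parameter.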

2. The type of output is the second difference.

ML focuses on producing a single value: the most likely prediction (a maximum
a posteriori estimate). PP produces a probability distribution over the
predicted quantity.

Arguably, the probability distribution of the prediction is more useful for
any problem where we need to _use_ the prediction in some decision-making
process, because then we can calculate the expected cost of a (mis)decision
over the possible values of the prediction.
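A small sketch of this contrast, with invented numbers: a grid posterior for a conversion rate after 7 successes in 20 trials, the single MAP value an ML-style approach would report, and an expected cost computed over the whole distribution.

```python
# Grid approximation of the posterior over theta with a uniform prior.
thetas = [i / 100 for i in range(1, 100)]
kernel = [t ** 7 * (1 - t) ** 13 for t in thetas]   # binomial likelihood kernel
total = sum(kernel)
posterior = [k / total for k in kernel]              # full distribution (PP view)

# The single "most likely" value (ML/MAP view):
map_theta = thetas[max(range(len(thetas)), key=lambda i: posterior[i])]
print(map_theta)  # 0.35, i.e. 7/20

# With the full distribution we can weigh a decision's cost over all
# plausible theta values, not just the MAP point. The cost function
# below is hypothetical:
def cost(theta):
    return 100.0 if theta < 0.3 else -20.0

expected_cost = sum(cost(t) * p for t, p in zip(thetas, posterior))
```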

Practically, many decision-making situations can be transformed into
_estimating_ some quantity (instead of taking some arbitrary action), and an
ML model can be built to directly produce a prediction that is at the same
time a decision.

3. The speed of prediction is the third difference.

An ML model, after the "slow" training phase, is a plain function that can be
evaluated very quickly for any observation to produce a prediction (think:
the forward pass in a deep-learning network).

With PP, for most common models, you need to run a "slow" inference process
(MCMC-like) for every new observation to produce a prediction. It is
possible, in some cases, to design a probabilistic model such that it only
needs to be fitted once and can then be transformed into an analytical
expression (a function) producing predictions from observations.
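The per-observation cost can be seen in a simple rejection sampler, the pure-Python technique the linked article builds from scratch (the code below is an illustrative sketch, not the article's exact code): every new data point means re-running the whole simulation loop.

```python
# Approximate P(bias | data) for a coin by simulating flips from the
# prior and keeping only parameter draws that reproduce the data.
import random

def posterior_samples(n_heads, n_flips, n_samples=2000):
    """Rejection sampling: accept a bias draw if it replays the data."""
    kept = []
    while len(kept) < n_samples:
        bias = random.random()                              # uniform prior
        heads = sum(random.random() < bias for _ in range(n_flips))
        if heads == n_heads:                                # accept on match
            kept.append(bias)
    return kept

# Each new observation requires a fresh run of the sampler:
samples = posterior_samples(n_heads=6, n_flips=10)
estimate = sum(samples) / len(samples)  # posterior mean, near 7/12
```

Contrast this with the trained ML model above: there, prediction is one cheap function evaluation; here, every query pays for thousands of simulations.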

As a result of these differences, the people applying PP and ML tend to be of
different characters :) Still, I expect ML and PP to converge over time.
Specifically, I notice that:

- ML models are used to represent probability distributions inside
probabilistic models; and

- trained ML models are being interpreted in terms of their information
content (in probabilistic terms, as if the ML model were a probabilistic
model).

