
Probabilistic Programming & Bayesian Methods for Hackers - luu
http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
======
nohuck13
The "understanding-first" approach is so right, especially with a topic like
Bayesian methods where the logic makes _so much sense_, and yet is so easy to
miss by brute-force memorization of equations. IPython notebook is a great
choice. I've only skimmed the first couple of chapters, but I've cloned the
repo and will definitely spend some time with this.

This seems inspired by the awesome Yudkowsky article that comes from a similar
(introductory, not programming-specific) place:
[http://yudkowsky.net/rational/bayes/](http://yudkowsky.net/rational/bayes/)

For anyone who hasn't used IPython notebook and is interested in scientific
computing in Python: check it out. The ability to mix prose and live Python,
with effortless plotting, stored in git and shareable via links or nbviewer,
is just magic. That probably seems mundane, but for things like exploratory
data analysis across a team, it's a game-changer. Another staple in this
stack is Pandas (
[http://pandas.pydata.org/](http://pandas.pydata.org/) )
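To make that concrete, here's a minimal sketch of the kind of cell you'd write in a notebook (the dataset and filename are made up purely for illustration; assumes pandas and matplotlib are installed):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; in a notebook you'd use %matplotlib inline
import matplotlib.pyplot as plt

# Hypothetical dataset, purely for illustration
df = pd.DataFrame({"day": range(10),
                   "visits": [3, 5, 4, 8, 7, 9, 12, 10, 14, 13]})
print(df.describe())          # quick summary table, rendered inline

df.plot(x="day", y="visits")  # one-liner plot
plt.savefig("visits.png")     # in a notebook the figure just renders inline
```

In a real notebook the summary table and figure appear directly under the cell, interleaved with your prose.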

~~~
tokipin
you should see the kinds of things Mathematica notebooks can do

~~~
gcr
Ooh, have any examples?

~~~
tokipin
The most powerful stuff revolves around the _Dynamic_ symbol[1]. If you use
this, for example:

    Dynamic[Graphics[Disk[p]]]

And then somewhere else, either manually or programmatically, you change the
value of p, you will see the disk move around. (In link [1], open up the "Neat
Examples" dropdown).

Basically, with _Dynamic_ you are saying "I want this updated as soon as any
of its components are updated." And the system takes care of the rest.
Furthermore, the dynamic content can be completely arbitrary. Mathematica's
_Manipulate_ symbol[2] is essentially a wrapper around _Dynamic_. See [3] for
countless examples built using _Manipulate_.

To be clear, these aren't "generated programs." All this dynamism resides
comfortably and natively inside notebooks. (Mathematica also allows data that
persists across sessions, through _DynamicModule_.)

See the last example in [4], combine that with the fact that you can have
arbitrary expressions anywhere, and you can start to get a sense of the power
the system gives you. For example, you can create an ad-hoc tool for making
diagrams or whatever, and then plop that tool right in the middle of a piece
of source code, inline.

As another example of the power of Mathematica's notebooks, links [1] [2] and
[4] are HTML exports of Mathematica notebooks. Note all of the fancy
formatting.

But wait, there's more: Mathematica notebooks are themselves fundamentally
Mathematica expressions (essentially M-expressions; Mathematica is in part a
Lisp on top of symbolic semantics). Thus you can construct and alter them
programmatically, etc. In other words, Mathematica is a ridiculously
homoiconic system, not only in the language sense but also in the broader
systemic sense. As an example, [5] is the Mathematica expression behind the
notebook of [4].

[1]
[http://reference.wolfram.com/mathematica/ref/Dynamic.html](http://reference.wolfram.com/mathematica/ref/Dynamic.html)

[2]
[http://reference.wolfram.com/mathematica/ref/Manipulate.html](http://reference.wolfram.com/mathematica/ref/Manipulate.html)

[3] [http://demonstrations.wolfram.com/](http://demonstrations.wolfram.com/)

[4]
[http://reference.wolfram.com/mathematica/ref/DynamicSetting....](http://reference.wolfram.com/mathematica/ref/DynamicSetting.html)

[5] [http://pastebin.com/58GYSCFy](http://pastebin.com/58GYSCFy)

------
martincmartin
For those who want a more solid take on machine learning, and who still
remember their math and probability/statistics (i.e., advanced undergrads or
new grad students), the best texts seem to be:

The Elements of Statistical Learning by Hastie, Tibshirani and Friedman,
available for free on line.

Pattern Recognition and Machine Learning by Chris Bishop. Very Bayesian.

Machine Learning: A Probabilistic Perspective by Kevin Murphy. Also Bayesian,
although not as Bayesian as Bishop. The most recent of the three, and
therefore covers a few topics not covered elsewhere, like deep learning and
conditional random fields. The first few printings are full of errors and
confusing passages, but it should improve before too long.

Did I miss any?

~~~
stiff
The best introductory book, and also the most cohesive one, is "Learning from
Data" by Yaser Abu-Mostafa, accompanied by great video lectures:

[http://amlbook.com/](http://amlbook.com/)

It differs from other books in that all the material is treated from the
unified perspective of statistical learning theory and VC dimension; as a
result, the book feels less like a hodgepodge of unrelated techniques and
more like an introduction to a coherent field.

Hastie and Tibshirani also have a new, less mathematically demanding book out:

[http://www.amazon.com/Introduction-Statistical-Learning-
Appl...](http://www.amazon.com/Introduction-Statistical-Learning-Applications-
Statistics/dp/1461471370/ref=sr_1_1?s=books&ie=UTF8&qid=1378737829&sr=1-1)

------
xmpir
I also like "Think Bayes" even though it is "just" a book.
[http://www.greenteapress.com/thinkbayes/](http://www.greenteapress.com/thinkbayes/)

~~~
darkxanthos
I've been working through both books, and Think Bayes is more accessible (and
it's also free). I recommend going through it before getting to PyMC.

One huge reason is that the author has implemented everything in Python, in a
way that lets you read his code to more fully understand what's happening.
I'm on chapter 5 or 6 and he hasn't even touched on MCMC yet, which is most
welcome.

------
pilooch
The title is a bit confusing, as probabilistic programming is a research field
in itself that the book seems not to touch upon. See [http://probabilistic-
programming.org/wiki/Home](http://probabilistic-programming.org/wiki/Home)

~~~
jkldotio
I don't see any confusion at all. The first few paragraphs of the link say
it's based on PyMC, which itself appears in your link under "Existing
probabilistic programming systems". So it's a book that's a practical guide to
using one of the systems you reference.

~~~
imurray
As a researcher with MCMC interests, I agree with the grandparent post. Used
as technical jargon "probabilistic programming" tends to mean: specify a model
using a programming language, a compiler then works out how to do inference in
that model, and writes the inference code for you.

PyMC is a toolkit that makes it easier to write inference code for a wide
range of models, but isn't as automatic as the field of probabilistic
programming promises.

As it says on the linked site, it lists more than probabilistic programming
systems, and PyMC falls into the latter categories of things it lists:

 _Below we have compiled a list of probabilistic programming systems including
languages, implementations/compilers, as well as software libraries for
constructing probabilistic models and toolkits for building probabilistic
inference algorithms._

~~~
ogrisel
PyMC follows the library approach to probabilistic programming rather than
inventing yet another application-specific language that only a niche of
developers would be willing to spend time learning.

Despite not introducing a new language syntax or DSL, PyMC is still
probabilistic programming in the sense that you have _Python variables_ that
represent random variables with prior distributions, and you then use those
to derive new random variables via _deterministic Python expressions or
functions_. Finally, you plug in an inference engine to invert the execution
order and derive a posterior distribution on the unobserved variables.
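As a toy illustration of that flow (not PyMC's actual API; just the idea, with the inference engine replaced by a brute-force grid approximation):

```python
import numpy as np

# "Prior": a discretized Uniform(0, 1) over an unknown coin bias p
grid = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(grid) / len(grid)

# "Deterministic expression": likelihood of the observed data given p
# (say we observed 7 heads in 10 flips)
heads, flips = 7, 10
likelihood = grid**heads * (1.0 - grid)**(flips - heads)

# "Inference engine": invert the execution order with Bayes' rule
posterior = prior * likelihood
posterior /= posterior.sum()

print("posterior mean of p:", (grid * posterior).sum())
```

PyMC swaps the brute-force grid for MCMC sampling, which is what makes the same idea workable in more than a couple of dimensions.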

~~~
tristanz
I tend to agree with Ian that it's confusing to conflate probabilistic
programming and libraries that support Bayesian inference. In PyMC variables
represent random variables, but these variables can't be used with Python
constructs like conditionals and loops. Python is used to construct a DAG,
which is then executed.

I think a better definition of a probabilistic programming language is: a
language where you can replace any variable of type T with Random<T>. The
line isn't entirely clear, but library approaches in languages like Python
don't fit, since they can't handle control flow. BUGS/JAGS/Stan might
qualify, although they are very limited declarative languages. Their
motivation is primarily a compact modeling syntax, not a real programming
language.

There's no need for probabilistic programming languages to be some esoteric
DSL. You can convert languages like Python or Matlab into a probabilistic
programming language with a lightweight compiler transformation:
[http://www.mit.edu/~wingated/papers/lightweight_pp.pdf](http://www.mit.edu/~wingated/papers/lightweight_pp.pdf).
Actually doing inference efficiently, however, remains as challenging as
ever.
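To make the control-flow point concrete, here is a toy generative program in plain Python where an `if` branches on a random draw, with inference done by (very inefficient) rejection sampling; this is a sketch of the semantics, not of how a real system would implement it:

```python
import random

def generative_program(rng):
    # A latent random variable flows through ordinary control flow:
    fair = rng.random() < 0.5        # is the coin fair?
    p = 0.5 if fair else 0.9         # branch on a random value
    heads = sum(rng.random() < p for _ in range(10))
    return fair, heads

def prob_fair_given(observed_heads, n=100_000, seed=0):
    # Rejection sampling: run the program forward, keep only runs that
    # match the observation, and read the posterior off the survivors.
    rng = random.Random(seed)
    kept = [fair for fair, heads in (generative_program(rng) for _ in range(n))
            if heads == observed_heads]
    return sum(kept) / len(kept)

print(prob_fair_given(9))  # 9 heads out of 10 strongly favors the biased coin
```

A compiler transformation like the one in the linked paper automates roughly this conditioning, and replaces the hopeless rejection loop with something like MCMC over execution traces.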

~~~
jkldotio
>it's confusing to conflate probabilistic programming and libraries that
support Bayesian inference

But it's a generic term, so you could say the same about functional
programming or logic programming, both of which can be done in Python even if
there are more advanced or integrated systems elsewhere. I don't really think
most people care, besides perhaps PL researchers, at which portion of the
stack things are happening or being optimised; if you are using the relevant
mathematics and statistics, that's what you are doing. I think people are
playing semantics to say it only means one thing when it's obviously used in
a general way and sometimes in a specific way.

The bottom line is that the guy who wrote the book thinks it's probabilistic
programming, ogrisel does, I do, and the people who run [http://probabilistic-
programming.org/wiki/Home](http://probabilistic-programming.org/wiki/Home)
seem to be referring to it as probabilistic programming as well. I don't buy
Ian's argument that it's part of some latter category on the site: PyMC is
directly linked in a section titled "Existing probabilistic programming
systems". They use "as well as" to link the two groups, so either the first
group is "systems" and the rest are still "probabilistic programming" just
without "systems", or they are all "probabilistic programming systems" if "as
well as" is operating in that way. The arguments against this seem to be
splitting hairs and playing semantics far too much, when n-grams regularly
have more than one meaning. Indeed, it's amusing to see _probabilistic_
people arguing for one interpretation rather than saying that there could be
more than one and that it depends on context (an NLP program trying to
disambiguate the meaning of a given n-gram would look at other words present,
topic models for the document, et cetera).

------
alexholehouse
This is fantastic, both in terms of content and in terms of delivery.

I've been toying with ideas relating to some kind of textbook-killing general
publishing platform for a while, but it's not something I'll implement (in
the next 5 years, anyway). Obviously people are doing this kind of thing
already, but this is certainly the closest implementation I've seen to the
ideas I've been thinking about.

------
MaxGabriel
I read a little of this book when it was on Hacker News a few months ago, but
only half a chapter. Did anyone read through the whole book and do you have a
review?

------
nrox
Looking forward to reading Chapters X1, X2 for Machine Learning. Thanks for
all the work.

------
dantiberian
To run this in IPython (once IPython is installed and the notebooks are
cloned), run in your terminal:

    ipython notebook Chapter1_Introduction/Chapter1_Introduction.ipynb

------
suprjami
You listen to Programming Throwdown too huh?

