Hacker News
Free Python ebook: Bayesian Methods for Hackers (github.com/camdavidsonpilon)
395 points by blazin_billy on June 4, 2013 | 50 comments

"This book has an unusual development design. The content is open-sourced, meaning anyone can be an author. Authors submit content or revisions using the GitHub interface."

What a wonderful thing -- this book looks fantastic, but the approach to making it really takes the cake. I really hope interactive notebooks (iPython based or otherwise) and multiple authors collaborating on Github will become widespread.

There are both Wikibooks and Wikiversity - last time (several years ago) I looked on Wikibooks there were actually very good physiology and biology textbooks that had been open sourced. Not sure if development has tapered off, but it seemed a pretty robust community for a while.

It sounds like a perfect fit for the kind of fields that have grown so vast you need an expert for every sub-section. Like physiology and biology, basically.

The "Modern Perl" book did this a few years ago as well. The crowd sourced nature of it helps keep the content up to date and error free.

I should like to give a contrarian comment about this, because it is on top of the front page and seems to be being received positively. This book is probably not a good way to learn about statistical inference. It has quite confused explanations of both Bayesian and frequentist approaches. The preface seems to imply that programmers, by virtue of being able to use computers, don't need to take a rigorous mathematical course in Bayesian methods. However the text actually uses mathematical notation throughout, and as far as I could tell it is often not explained. I noticed at least one case where a probability distribution (gamma) was described only through plots i.e. without specifying its pdf or how you could derive the pdf. I think the kind of discourse that this book exemplifies is halfway to cargo cult 'statistics'.

I've got exactly the same feeling. Could you suggest a good introductory textbook on MCMC? They left it as a mysterious black box, and I'm very uncomfortable with using mathematical black boxes I don't understand.

David Mackay covers MCMC in his exhaustive book titled "Information Theory, Inference, and Learning Algorithms" available for free here: http://www.inference.phy.cam.ac.uk/itprnn/book.html

MCMC methods are covered near the end of the book. It should be enough to familiarize yourself with and understand the basic concepts of MCMC. Anything more in-depth will require a strong mathematical background.

BTW : There are probably a ton of books that cover MCMC out there - that's just one I liked and which is freely downloadable.

You can also get a PDF of Barber's BRML or look in Murphy's ML text, which isn't freely available as a PDF.

Check inside title page, make sure you get 3rd printing of Murphy's: http://www.cs.ubc.ca/~murphyk/MLbook/errata.html


I have some background (grad student in cfd, thinking about switching to some sort of data analysis later on) but my measure theory and probability skills are rusty (on the other hand numerical linear algebra, functional calculus and complex analysis are superb). What would be a good book for my level?

You'll have no trouble.

I'm sorry! I misread your question as 'Would this be a good book for my level?'.

I can honestly say this book changed the way I think about everything. I can't recommend it highly enough.

http://www.amazon.com/Data-Analysis-Bayesian-Tutorial-Public... is short and reputable on Bayesian statistics. On MCMC specifically, I don't know, but MCMC is really a kind of algorithm that lets you find the answer to a mathematical question (so I think understanding the math is the right thing to start with).

PS. There is a second edition of that book, but I've heard that the first edition is better, because the second edition added a different author and expanded the book.
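To make the "MCMC is really a kind of algorithm" point concrete, here is a minimal sketch of random-walk Metropolis, one of the simplest MCMC methods. It is not the sampler the book or PyMC uses; the function names, the step size, and the standard-normal target are all assumptions picked for illustration.

```python
import math
import random

def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: sample from an unnormalized 1-D density."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby point by adding Gaussian noise.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), done in log space.
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        # On rejection the current point is recorded again.
        samples.append(x)
    return samples

def log_std_normal(x):
    # Log of the standard normal density, up to an additive constant.
    return -0.5 * x * x

samples = metropolis(log_std_normal, start=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should land near 0 and 1 for this target
```

The only thing the sampler needs is the density up to a constant, which is why it is handy for posteriors where the normalizing integral is hard.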

Would you perhaps be willing to contribute to the project in order to improve its explanations?

Well, no, because I don't know any good reasons for using Bayesian methods (except when prior probability distributions can be found objectively through some previous experiment etc).

how do you reconcile "I don't know any good reasons for using Bayesian methods" with the fact that Bayesian methods revolutionized spam-filtering? (or maybe you disagree they did?)

Naive Bayes revolutionized spam filtering because it is incredibly easy to implement and understand, and was reasonably effective for early spam, not because it was the best model for detecting spam. There's a reason we started seeing ads for "v1agra" and snippets of prose -- it's pretty easy to game Naive Bayes.

On the other hand, the GP's assertion that there is seldom a need for using Bayesian methods is also unwarranted; they are the basis for so many machine learning algorithms in common use -- particle filters, for example.

That's a good question and I was asking myself that after I wrote that comment. I think my objection is more to the 'Bayesian' and less to the 'methods', if that makes sense. That is, I think constructing and updating models using Bayes' theorem can be (as people doing spam-filtering have shown) a good way of making predictions, but that it is the frequentist properties of the models that actually matter (cf. the ubiquity of cross-validation: 'the proof of the pudding is in the eating'), not the fact that they let you maintain a probability distribution over parameters.

To add to a comment below -- naive Bayes is a simple classifier which doesn't really have much in common with full-on Bayesian methods.
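For a sense of how little there is to naive Bayes, here is a rough sketch of a word-count classifier with add-one smoothing. The toy messages and the `train`/`classify` names are made up for illustration, and this is not how any particular spam filter is implemented.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, text). Returns per-label word counts and doc counts."""
    word_counts = {}
    label_counts = Counter()
    for label, text in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log prior plus summed log likelihoods."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out a class.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, purely for illustration.
toy = [
    ("spam", "cheap viagra buy now"),
    ("spam", "buy cheap pills now"),
    ("ham", "meeting notes for the bayesian book"),
    ("ham", "notes on the pymc chapter"),
]
model = train(toy)
print(classify("buy viagra now", *model))
print(classify("v1agra bayesian chapter notes", *model))
```

The second message shows the gaming trick mentioned above: a misspelled keyword plus a few innocuous words is enough to tip this toy model toward "ham".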

Is this perhaps a suggestion that something you don't want to personally contribute to shouldn't be criticised?

>without specifying its pdf or how you could derive the pdf

Would you or anyone happen to know of a good book that discusses the derivation of various advanced probability distributions? It is quite frustrating that every ML or stats book I come across runs through various distributions without giving the reader any sort of motivation or intuition behind them. Without that intuition how am I supposed to have any idea when to apply one vs another?

I honestly can't recommend a book for this. The best resource I've found is MathWorld. I've picked up a bunch of very helpful intuitions from it, including:

- Cauchy: the horizontal distance from the origin at which an arrow shot at a random angle from a point below the origin hits the x-axis

- Gamma: how long you have to wait for the nth event in a Poisson process

I'm sure there must be books that I haven't read.
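Both intuitions above are easy to check by simulation. A rough sketch using only the standard library; the shooter sitting exactly one unit below the origin, the rate of 2.0, and n = 3 are arbitrary choices for illustration.

```python
import math
import random

rng = random.Random(42)
N = 100_000

# Cauchy: shoot from (0, -1) at an angle drawn uniformly from (-pi/2, pi/2),
# measured from the vertical, and record where the arrow crosses the x-axis.
hits = [math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(N)]
# A standard Cauchy has quartiles at -1 and +1, so about half the hits
# should land in [-1, 1].
frac_inside = sum(-1.0 <= h <= 1.0 for h in hits) / N

# Gamma: the waiting time for the n-th event of a Poisson process is the
# sum of n independent exponential gaps with the same rate.
n, rate = 3, 2.0
waits = [sum(rng.expovariate(rate) for _ in range(n)) for _ in range(N)]
# Gamma with shape n and this rate has mean n / rate.
mean_wait = sum(waits) / N

print(frac_inside, mean_wait)
```

The sample mean is deliberately not used for the Cauchy draws: that distribution has no mean, so the quartile check is the stable one.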

It's amazing how popular the term "Bayesian" is amongst people who don't really know what it means or quite where it fits in the context of other statistical paradigms.

To spare others frustrations, you need to run IPython from the same directory as the notebook you want to view. So to see chapter one:

    (your-virtualenv) ~/path/to/the/book/Chapter1_Introduction $ ipython notebook --pylab inline
and then it will be linked on the list. Importing from different directories apparently doesn't work.

I had a heck of a time getting scipy working in a virtualenv on Mac OS X Mountain Lion. If you're looking for an easy install script, I whipped one up here:


Improvements welcome!

I started using Anaconda on OSX to skip the install headaches.

I read this book a few weeks ago. I loved the ipython notebook format. Being able to edit the code for each figure and play around with parameters while reading was a treat. As a bonus, I learned as much about making nice figures as I did about Bayesian Methods. There were quite a few typos, but it wasn't much of a context switch to edit the source as I found them and then submit patches when I was done. I probably wouldn't have taken the time if it wasn't as convenient.

ipython looks really interesting, maybe even deserving its own HN thread.

This is the second time I've seen it in the last month, I noticed it in the documentation of a Python static blog generator, Nikola [1].

[1] https://github.com/ralsina/nikola/blob/master/nikola/plugins...

I love IPython: it's definitely worth it if you want to regularly use a REPL to explore things like your Django objects, filter for things, or test scripts with test data. I really like it for exploring a codebase which I am not 100% familiar with, as its tab completion is excellent.

Run it with ipython --pprint though, so you get automatic pretty printing. I also recommend using the qtconsole plugin, as it is Much Nicer.

Basically, if you like bpython, this is better in every way I can think of. If you like the plain python REPL, give this a shot anyway. :) You may be pleasantly surprised.

You can also use the ipython notebook instead of qtconsole if for some reason you refuse to install Qt on your system (I have a GTK-using friend who so refuses).

It also has a nice client-server structure too:

I think there's a vim or emacs python extension that connects to an ipython kernel to execute python snippets.

"PDF versions are coming. PDFs are the least-prefered method to read the book, as pdf's are static and non-interactive. If PDFs are desired, they can be created dynamically using Chrome's builtin print-to-pdf feature."

While I agree PDFs are antiquated, I still like them for casual, off-the-grid reading, and opening many different pages and printing to PDF is not feasible or easy to organize once on my iPad for reading. All the same, I'll check this out.

I hate reading substantial things on any kind of backlit screen. Somehow my attention seems to wander. But I find that I can focus somewhat better on pdfs than webpages. I suppose it's some sort of philistinic nostalgia for the ultimate static and non-interactive medium that is the paper book.

That said, this idea looks awesome. I'd still appreciate a pdf to supplement this, though :).

Why should it be mere nostalgia? The typesetting on PDFs is typically far higher quality than on web pages.

I prefer PDF simply because PDFs usually have better typography than web browsers for reading sessions. Frankly the chapters and fonts from the github page look ... well ... bad in Chrome anyway.

A very interesting way of making a book/"store of knowledge".

I really really like this. Can these python interactive books be constructed to show and run code snippets of different languages?

Yes, with a bit of glue code, you can have 'cell magics' to run code in another language. This notebook has examples with R code:


There are extensions for Cython, Octave and C too. And there's a generic mechanism for scripting languages (%%script), but that only captures stdout, rather than moving objects between different languages.

I'm on the first chapter, but I can tell this is the perfect book for me. Very well written. Easy to follow for a non-mathematician and overall, a perfect introduction to a field I'm very interested in learning. I've worked through books such as Programming Collective Intelligence, but this closes the gap between blindly following along and actually understanding the fundamentals, which that book lacks.

The controversy of this survey [1] could have been avoided if the survey authors had used the coin flip algorithm in Chapter 2. (Navigate to Chapter 2 on Github and Ctrl+F on "Example: Cheating" without quotes. Maybe someone should submit a pull request to add anchors to the HTML output.)

[1] https://news.ycombinator.com/item?id=5777578
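For anyone who hasn't reached Chapter 2 yet, the coin-flip scheme is roughly this: each respondent flips a coin in private, answers honestly on heads, and on tails flips again and lets that second flip decide the answer. A hedged sketch of the idea follows. The book fits it with PyMC and reports a posterior; the version below just inverts the expected "yes" rate to get a point estimate, and the 20% true rate is an arbitrary assumption.

```python
import random

def simulate_survey(true_rate, n_students, rng):
    """Return the number of 'yes' answers under the coin-flip privacy scheme."""
    yes = 0
    for _ in range(n_students):
        cheated = rng.random() < true_rate
        if rng.random() < 0.5:
            # First flip is heads: answer honestly.
            answer = cheated
        else:
            # First flip is tails: a second flip decides the answer.
            answer = rng.random() < 0.5
        yes += answer
    return yes

rng = random.Random(1)
n = 100_000
yes = simulate_survey(true_rate=0.2, n_students=n, rng=rng)

# P(yes) = 0.5 * p + 0.25, so solving for p gives a point estimate.
estimate = 2 * (yes / n - 0.25)
print(estimate)
```

No individual "yes" reveals anything for certain, yet the aggregate still recovers the underlying rate, at the cost of needing a larger sample.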

Wow the interactive book sounds like a fantastic idea.

I would still like to read text parts on my Nook. How would one go about converting this to epub?

View each chapter in the browser and save as web page - complete. Then use Calibre (http://calibre-ebook.com/) to convert the html to your ebook format of choice. It's kind of a pain, but I've done this for my Kindle with a bunch of web material.

I've only skimmed very briefly each chapter, but I'm a pretty big fan of PyMC and am extremely impressed from what I saw regarding its use and how to think about optimization problems from a Bayesian framework. Chapter 5 and the Dark World example were particularly interesting.

Good book, but I'd like to see a more detailed explanation of MCMC's inner workings. I'm very uncomfortable with using mathematical black boxes I don't understand. Can anyone suggest a good introductory textbook on MCMC?

It has much more than MCMC, but the "Probabilistic Graphical Models" textbook by Daphne Koller (of Coursera) and Nir Friedman is a thorough introduction to the subject. It includes a discussion of MCMC that will leave you with a deeper understanding than the typical shallow treatment.

As an alternative, you can try watching about 90 minutes of lecture starting here:


but it will drop you right into the fray without much context.

Thank you, this looks exactly like the book I need.

Weirdly enough, just yesterday I started having a look at this[1], which seems to be very related...

[1] http://www.greenteapress.com/thinkbayes/

There should really be a standard interactive-ebook file format (something like a pdf + ipython, or an open-source CDF). Low power devices could easily degrade to text only with good typography.

I've been following this for awhile, and am very impressed with how the book is coming together, both from a process and a content standpoint.

I couldn't agree more. The content has been relevant from day one and it's been looking even better ever since. I've passed it in a few lectures already and people really love it already :-)
