EDIT: haha, I am downvoted for speaking the truth. The parent and I have both read lots of papers like these, and this paper makes only a modest contribution. It basically says: we fudged the inference, the end results look similar, so it's a good optimization. But there will probably be lots of specific models that "need" the mixing freedom the approximation removes, so the algorithm will only work for a specific subspace of MCMC problems. MCMC is basically impossible to debug, so we don't know how well it works overall. THIS IS THE SAME CONCLUSION AS EVERY OTHER MCMC APPROXIMATION PAPER. The only reason this is on HN is its heritage. I do not think this paper is revolutionary (unlike some other papers coming out of Google).
EDIT 2: evidence
"Fully Parallel Inference in Markov Logic Networks" - Max-Planck
"Hybrid Parallel Inference for Hierarchical Dirichlet Process"
"A split-merge mcmc algorithm for the hierarchical dirichlet process"
"Parallel Implementations of Probabilistic Inference" a 1996 review paper!
You might say these papers are not exactly the same. Fine, but the given paper's own final justification is:
"Depending on the model, the resulting draws can be nearly indistinguishable"
NOTE the keywords: "depending" and "nearly"
Your subsequent edit is useful - thank you. I just wish you'd said that in the first place and left out the swipe.
I thought arguing that it was an average contribution would come across as even more snide. It's perfectly good work, just not in the same league as MapReduce or self-driving cars.
Not least, you might be wrong. It might be getting more air not because it's from Google, but simply because it has more visibility in general. My feeling is that other, perhaps more scholarly and more in-depth, articles don't see the light of day simply because they are in more obscure places. You can help by pointing at other sources of similar material and explaining why they are potentially of more value.
A single line of "It's by a Googler" was unenlightening, and to me came off as pure snark.
The subsequent edit was useful, although the attitude is, well, "sharp".
Can you please post the title/author of the ones that are most relevant?
It is not obvious that this can work at all in some cases. Consider, for example, a clustering model: if there are two clusters, but one machine calls them A, B and the other machine calls them B, A, averaging the draws will give you useless results.
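To make that concrete, here's a toy numpy sketch (made-up numbers, not from the paper) of the label-switching failure:

    # Two machines each run MCMC on a 2-component mixture and recover
    # the same cluster means, but with the labels permuted.
    import numpy as np

    # Draws of (mean_A, mean_B) from machine 1: clusters near -5 and +5
    machine1 = np.array([[-5.1, 5.0], [-4.9, 5.2], [-5.0, 4.8]])
    # Machine 2 found the same clusters but labelled them the other way
    machine2 = machine1[:, ::-1]

    # Naive draw-by-draw averaging splits the difference between the
    # permuted labels and lands near 0: a mean neither machine ever saw.
    print(np.mean([machine1, machine2], axis=0))
    # -> values all near 0: useless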
So the contribution of this paper is finding a set of models on which naive averaging works, and showing an efficient mapreduce implementation of it.
That said, I don't find the paper particularly interesting.
Academics tend to trivialize the implementation (there were some pretty strong critics of his talk in my dept), but some kudos are in order for the implementation work, even if the algorithm itself isn't revolutionary.
My key question would be whether parallel MCMC breaks any independence considerations. (Message passing works because, due to the modelled independence, each node needs only information from its neighbours to summarise the rest of the network.)
What's a good example of where this breaks down? I'm thinking weather simulations, but I don't know the domain well enough.
This paper doesn't look like it has a magic solution, though. It seems to propose an ad hoc sharding approach and then shows that it sometimes works.
The "all the data" problem is interesting. It seems solvable if each node summarizes the likelihood function for part of the data. I'll have to read the paper and see if that's what they did.