Hacker News
Dirichlet Process Mixture Models in Pyro (pyro.ai)
53 points by apsec112 36 days ago | 2 comments

Without getting too far into the Bayesian vs. frequentist debate, I find the whole field of Bayesian nonparametrics to be very mathematically satisfying. Hierarchical models effectively come with a hyperparameter search "built-in".

It's also quite appealing that many of these probabilistic models made by statisticians are "fuzzy" generalizations of ad-hoc algorithms originally developed for practical reasons. In the same way that Gaussian Mixture Models are a "fuzzy" generalization of K-means, Dirichlet Process Mixture Models are a "fuzzy" generalization of an adaptive K-means algorithm that increments K whenever it detects an outlier, i.e. a point that lies farther than some threshold from every existing cluster center. Kulis & Jordan 2012 [1] summarize this connection nicely and call the adaptive algorithm DP-means.
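
To make the "increment K on outliers" idea concrete, here is a minimal sketch of DP-means in plain Python. It assumes squared Euclidean distance and a user-chosen penalty `lam` (the λ in Kulis & Jordan): a point whose squared distance to every centroid exceeds `lam` opens a new cluster. The function names are illustrative, not taken from the paper's code or from Pyro, and dropping empty clusters after each pass is a choice made for this sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def dp_means(data, lam, max_iter=100):
    """Sketch of DP-means: K-means that opens a new cluster whenever a
    point's squared distance to every existing centroid exceeds lam."""
    centroids = [mean(data)]  # start with one cluster at the global mean
    labels = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest centroid, or a brand-new cluster
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                # x is an "outlier" for every cluster: increment K
                centroids.append(tuple(x))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Update step: recompute means, dropping clusters left empty
        members = {}
        for i, k in enumerate(labels):
            members.setdefault(k, []).append(data[i])
        keep = sorted(members)
        remap = {old: new for new, old in enumerate(keep)}
        labels = [remap[k] for k in labels]
        centroids = [mean(members[k]) for k in keep]
        if not changed:
            break
    return centroids, labels
```

Kulis & Jordan show that this kind of update monotonically decreases the penalized objective sum_i ||x_i - mu_{z_i}||^2 + lam * K, so `max_iter` is only a safeguard here. A larger `lam` yields fewer clusters; a very large one leaves every point in the single initial cluster.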

If you're wondering where to start learning a topic like this, it helps to know about latent variable models and expectation-maximization first; see, for example, my own notes [2] on the topic. From there you can move on to variational inference, as well as topics relevant to modern deep learning such as amortized inference and variational autoencoders.
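
As a concrete instance of a latent variable model fit by EM, here is a minimal sketch of EM for a one-dimensional Gaussian mixture in plain Python. It assumes the number of components is fixed by the length of the initial means `mus`; the names `em_gmm_1d` and `min_var` (a variance floor against collapse) are illustrative, not from the notes [2] or from Pyro.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, mus, n_iter=100, min_var=1e-6):
    """EM for a 1-D Gaussian mixture; mus holds one initial mean per component."""
    k = len(mus)
    n = len(xs)
    mus = list(mus)
    # Initialise every variance to the overall data variance, weights uniform
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities r[j] = p(z_i = j | x_i, current parameters)
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, variances from the soft counts
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, var_j)
    return weights, mus, variances
```

Replacing the soft responsibilities with hard nearest-mean assignments, and letting a shared variance shrink toward zero, recovers K-means; that is the sense in which a GMM is a "fuzzy" K-means.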

[1] Kulis & Jordan 2012, "Revisiting K-Means: New Algorithms via Bayesian Nonparametrics" (https://people.eecs.berkeley.edu/~jordan/papers/kulis-jordan...)
[2] Notes on expectation-maximization (https://benrbray.com/static/notes/eecs445-f16-em-notes.pdf)

Have you looked at capsule-routing algorithms that use expectation-maximization to generate layer outputs in deep neural networks? For example, in https://research.google/pubs/pub46653/ and https://arxiv.org/abs/1911.00792, each output capsule in a layer is generated by a Gaussian mixture model and has an associated "activation value" that increases to the extent that its GMM can explain (i.e., generate) the input data "better" than the GMMs of the other output capsules in the same layer. Do you think these routing algorithms, which are differentiable end-to-end, could be generalized to Dirichlet Process Mixture Models?
