Ask HN: What companies are using probabilistic programming?
251 points by boltzmannbrain on June 3, 2018 | 33 comments
Probabilistic programming systems (PPS) define languages that decouple modeling from inference, so that generative models can be composed easily and run against a common inference engine. The main advantage over writing ML systems in ordinary deterministic code (e.g. Python) is concise, modular modeling: the developer doesn't have to write a custom inference algorithm for each model/problem. For more info see, for example, [1] and [2].
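To make the "common inference engine" point concrete, here's a minimal sketch in plain Python (not any particular PPS): two unrelated models, each expressed only as a log density, both run through the same generic Metropolis sampler. The function and model names are illustrative.

```python
import math
import random

def metropolis(log_prob, x0, n_samples, step=1.0, seed=0):
    """Generic Metropolis sampler: needs nothing from a model but its log density."""
    rng = random.Random(seed)
    x, lp, samples = x0, log_prob(x0), []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_prob(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Two different "models", one inference engine.
def model_gaussian(x):   # unnormalized log density of N(3, 1)
    return -0.5 * (x - 3.0) ** 2

def model_laplace(x):    # unnormalized log density of Laplace(-1, 1)
    return -abs(x + 1.0)

draws_a = metropolis(model_gaussian, 0.0, 20000)
draws_b = metropolis(model_laplace, 0.0, 20000)
```

A real PPS does the same decoupling, but derives the log density (and much better samplers) automatically from a generative program.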

I'm curious though, what applications of PPS are realized in practice? Notably Uber [3] and Google [4] are developing/supporting their own (deep learning focused) PPS, but is it known if/how they're used within these companies? Are the frameworks (Pyro [5] and Edward [6], respectively) used by other companies?

[1] Frank Wood (Microsoft) tutorial: https://www.youtube.com/watch?v=Te7A5JEm5UI

[2] MIT ProbComp lab's page of resources: http://probcomp.csail.mit.edu/resources/

[3] https://eng.uber.com/pyro/

[4] https://medium.com/tensorflow/introducing-tensorflow-probability-dca4c304e245

[5] http://pyro.ai/

[6] http://edwardlib.org/

Improbable [0] is building (and has open-sourced) Keanu, "a general purpose probabilistic programming library built in Java", where Bayesian networks are represented as DAGs. Its feature list includes probabilistic programming operators and distributions, auto-differentiation, inference, maximum a posteriori, Metropolis-Hastings, Hamiltonian Monte Carlo, Sequential Monte Carlo (particle filtering), and support for Kotlin. [1]

[0] https://improbable.io/

[1] https://github.com/improbable-research/keanu
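Of the features on that list, Sequential Monte Carlo is probably the least familiar to newcomers. Here's a self-contained bootstrap particle filter sketch in plain Python (not Keanu's Java API; the model and parameters are made up): a latent random walk observed through Gaussian noise, tracked by propagating, weighting, and resampling particles.

```python
import math
import random

rng = random.Random(1)

def simulate(T=50, q=0.5, r=1.0):
    """Latent random walk x_t = x_{t-1} + N(0, q); observation y_t = x_t + N(0, r)."""
    xs, ys, x = [], [], 0.0
    for _ in range(T):
        x += rng.gauss(0.0, q)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r))
    return xs, ys

def particle_filter(ys, n=500, q=0.5, r=1.0):
    """Bootstrap particle filter: propagate, weight by likelihood, resample."""
    particles = [0.0] * n
    means = []
    for y in ys:
        # Propagate each particle through the transition model.
        particles = [p + rng.gauss(0.0, q) for p in particles]
        # Weight by the observation likelihood N(y | p, r), up to a constant.
        weights = [math.exp(-0.5 * ((y - p) / r) ** 2) for p in particles]
        total = sum(weights)
        means.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Multinomial resampling to concentrate on plausible particles.
        particles = rng.choices(particles, weights=weights, k=n)
    return means

true_xs, obs = simulate()
est = particle_filter(obs)
```

In a PPS like Keanu this machinery is hidden behind the inference API; you only write the generative model.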

I met some of their team at an academic seminar and what they're doing is pretty ambitious; and goes beyond probabilistic programming. It sounds more like they're building traditional modelling tools on top of a PPL [0], where the PPL helps with the model calibration.

[0] https://github.com/deselby-research/

We use PPLs at Triplebyte for matching software engineers to jobs where we predict they're a strong fit. We recently published https://triplebyte.com/blog/bayesian-inference-for-hiring-en... which starts to explain our framework, though PPLs would have to be a "part 2" blog post if anyone's interested.

Would love to see that follow-up. There are so many domains, like yours, where it's unlikely (from a model selection perspective) that indicators are conditionally independent; whether it's hiring candidates, or matching companies and funding sources (as we're doing at my company, Belstone - we're hiring!), or building better dating sites, or recommending products, or implementing public policies, there are underlying hidden variables that capture aptitude/appropriateness of a subject to a certain aspect of the domain.

There's tons of academic literature on how to handle this, and accelerating industry support for the frameworks mentioned by OP... but the act of building an early-stage software engineering culture that is amenable to the large amounts of experimentation (often exciting, often frustrating, incredibly hard to time-predict against business needs and runway allocation) is something where I think the industry is still finding best practices. Were PPLs the right move, with the benefit of your hindsight, for that problem? Were they more promising than deep learning given challenges of properly collecting data at scale? The process of choosing a system, measuring it against more naive/heuristic approaches, deciding how to put it into production and integrate with existing software/pipelines - and reliably hiring the right people for those jobs, to make things a bit meta for Triplebyte! - that's a narrative in search of thought leaders.

Stan is probably the most mature, stable PPL, and it's an extremely popular tool, although it isn't really deep-learning-adjacent so gets much less hype.

It's generally used more for modeling and prediction than for creating "products", if that makes sense. More popular among people with statistics or social science backgrounds than among programmers and computer scientists.

If you dig around Stan-related websites you can see various companies and institutions that use it. One I found quite quickly was Metrum Research Group, which does consulting work for the pharmaceutical industry.


Generable is a Stan-based startup which employs some of the core Stan devs.


Some? There is only one (out of 31) listed at http://mc-stan.org/about/team/

Do you know of any details on how/why probabilistic programming is used over traditional methods? And why Stan? My assumption is b/c it's a legacy framework.

It's definitely not a legacy framework, it's very actively developed and improved.

A lot of very useful statistical models can't benefit from the GPU, and for them I think Stan is the better tool. There's a reason it's basically the default choice for people who want to use probabilistic programming for Bayesian statistics.

It's probably not the best tool for AI/ML type models. But for statisticians who want to use Bayesian methods it's close to perfect.

Thanks, that's good to know.

Haha I knew "legacy" would ruffle some feathers, by which I mean Stan was pretty much the first application-ready PPS on the block -- robust toolbox of methods, actively developed/supported.

The last slide of this presentation on SMC inference in PPLs is a nice view of the PPS landscape (not to mention the whole deck is a great intro by Lawrence Murray): http://www.it.uu.se/research/systems_and_control/education/2...

Yeah, I see what you're getting at now. And that is an excellent diagram.

I've never actually used BUGS or JAGS for anything but I would consider at least BUGS to truly be legacy software. To me the word "legacy" means you shouldn't pick it for a new project.

GPU and MPI support in Stan coming hopefully this summer!

pymc3 provides this for Python in a way that is very concise and modular (certainly much more concise than tensorflow-probability) -- and it is an open question if TensorFlow might be used to replace Theano as the backend execution engine for the next versions.

In particular, pymc3's pipeline is incredibly impressive: it automatically transforms bounded or discrete random variables into unconstrained continuous ones, runs auto-tuned variational Bayes (ADVI) as an initialization step to find good settings and seed values for NUTS, and then uses an optimized NUTS implementation for the MCMC sampling.

For most problems, you use a simple pymc3 context manager and from there on it acts kind of like a mutually recursive let block in some functional languages: you define random and deterministic variables that inter-depend on each other and are defined by their distribution functions, with your observational data indicating which values are used for determining the likelihood portion of the model.

After the context manager exits, you can just start drawing samples from the posterior distribution right away.

I've used it with great success for several large-scale hierarchical regression problems.

The plan of record is already to build pymc4 on top of TensorFlow: https://medium.com/@pymc_devs/theano-tensorflow-and-the-futu...

At Semantic Machines [0] we rely heavily on probabilistic programming to build state-of-the-art dialogue systems. In particular, we use a library called PNP (probabilistic neural programming) on top of Dynet to allow us to express structured prediction problems in a simple and elegant form. If there are questions I am happy to elaborate to the extent I can. (Also, we are hiring! My email is jwolfe@.)

[0] http://www.semanticmachines.com/

Just a quick correction – Frank Wood is not at Microsoft, but at UBC:


Microsoft Research does have multiple excellent researchers working on probabilistic programming. Infer.NET in particular is a highly advanced piece of technology for models in which you would use message passing algorithms to perform inference:


Thanks. I believe he's with Oxford (although may have multiple appointments), and that video is from Microsoft Research.

Just to clear up further confusion: Frank was indeed at Oxford previously – he moved to UBC this Spring. The tutorial actually took place at NIPS 2015 in Montreal.

Last I saw him (last year) he was still a prof at Oxford, pushing for a startup called « invrea » (that does try to use PP [1]). I think UBC was before that.

[1] http://invrea.com/index.php

Given Avi Bryant[1] recently released Rainier[2] from there, I'd guess Stripe is.

[1] https://twitter.com/avibryant [2] https://github.com/stripe/rainier

Yes, we are - Rainier is used in production, though so far it's a very small part of our overall ML efforts.

Anglican (https://probprog.github.io/anglican/) is also based on a prototype by Frank Wood, and uses Clojure syntax.

Facebook is working on probabilistic programming. Rather than develop it as a library, they're trying to provide language support directly. It was recently discussed at a conference; you could ask Erik Meijer for the details (https://twitter.com/headinthebox/status/993972303863070720).

I just made public my Master's thesis project, which I completed at the University of Oxford.

It is called CPProb, and it is a general-purpose C++ probabilistic programming library that uses a form of variational inference to learn proposals for importance sampling.

It aims to be usable directly in preexisting C++ codebases. As part of the thesis, I also wrote a tutorial on particle filters and SMC-like methods, and described the design choices one faces when implementing such a system.

The C++ library with the corresponding Pytorch-based neural network and the tutorial can be found in


and are available under an MIT license.
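For readers unfamiliar with the inference strategy mentioned above, here is a tiny self-normalized importance sampling sketch in plain Python (not CPProb's C++ API; the target and proposal are made up). A CPProb-style system would learn the proposal parameters rather than fix them:

```python
import math
import random

rng = random.Random(42)

def log_target(x):
    """Unnormalized log posterior: here secretly N(2, 0.5)."""
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def importance_estimate(f, n=50000, prop_mu=0.0, prop_sd=3.0):
    """Estimate E[f(x)] under the target by sampling from a fixed Gaussian proposal.
    Systems like CPProb instead *learn* prop_mu/prop_sd so the proposal hugs the target."""
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(prop_mu, prop_sd)
        # log importance weight = log target - log proposal (constants cancel on normalizing)
        log_q = -0.5 * ((x - prop_mu) / prop_sd) ** 2 - math.log(prop_sd)
        w = math.exp(log_target(x) - log_q)
        num += w * f(x)
        den += w
    return num / den

posterior_mean = importance_estimate(lambda x: x)
```

The closer the proposal is to the posterior, the lower the weight variance, which is exactly why learning the proposal pays off.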

I've used rjags with the R programming language. It's quite old and based on JAGS, but it's quite straightforward, though Edward, Stan, and pymc3 seem to be the state of the art.

My project is quite simple, but you can check it via the homepage[0] or directly[1].

[0]http://www.vladovukovic.com [1]https://bit.ly/2Krtkfi

I designed a system which used probabilistic programming (Stan) to combine various deep learning based feature extractors to predict social behaviours.

Seems quite interesting! Do you have a write-up?

No, sorry.

I may be able to answer specific questions.

I'm working on probabilistic programming right now, and a lot of the papers I'm reading are from Microsoft Research. They have a cool Infer.NET project, and an ecosystem based on it is beginning to form. For example, take a look at Tabular, an Excel add-on for doing "Bayesian inference for the masses" on top of Infer.NET: https://www.microsoft.com/en-us/research/project/tabular/
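The message-passing inference that Infer.NET automates reduces, in the simplest case, to passing likelihood messages along a Bayes net. A toy illustration in plain Python (not Infer.NET's .NET API; the numbers are made up) for a two-node net Rain -> WetGrass with WetGrass observed:

```python
# Tiny two-variable Bayes net: Rain -> WetGrass, both binary.
p_rain = [0.8, 0.2]                  # P(Rain=0), P(Rain=1)
p_wet_given_rain = [[0.9, 0.1],      # P(Wet=0|Rain=0), P(Wet=1|Rain=0)
                    [0.2, 0.8]]      # P(Wet=0|Rain=1), P(Wet=1|Rain=1)

# Observe Wet=1: send the likelihood message back to Rain and normalize.
message = [p_wet_given_rain[r][1] for r in (0, 1)]   # P(Wet=1 | Rain=r)
unnorm = [p_rain[r] * message[r] for r in (0, 1)]
z = sum(unnorm)
posterior_rain = [u / z for u in unnorm]             # P(Rain | Wet=1)
```

Infer.NET generalizes this to large factor graphs with algorithms like expectation propagation, compiling the schedule of messages for you.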

Overall though, by freeing developers from writing custom inference algorithms, all the work gets pushed to the language designer/implementer. It is not at all clear to me that one (or even a few) generic inference algorithms will be able to satisfy the needs of different problem domains. So rather than one general-purpose PPL, we may end up with multiple PPLs specialized to different problem domains.

Charles River might use PPLs for determining drone targets.

Not really, but we do use our Figaro probabilistic programming language for all sorts of things: predictive health maintenance, modeling complex systems like food insecurity, graph data mining for link analysis, onboard autonomy for satellites, predicting where a cyber attack may hit next based on different types of data, and understanding the evolution of malware. See https://www.cra.com/work/case-studies/figaro for more info.

We use kind of a tiny in-house PPL for document analysis at https://scanye.io/. We haven't open-sourced it though.
