
Ask HN: What companies are using probabilistic programming? - boltzmannbrain
Probabilistic programming systems (PPS) define languages that decouple modeling from inference, such that any generative model can be easily composed and run with a common inference engine. The main advantage over traditional ML systems written in deterministic code (e.g. Python) is concise, modular modeling, where the developer doesn't have to write custom inference algorithms for each model/problem. For more info see, for example, [1] and [2].

I'm curious though, what applications of PPS are realized in practice? Notably Uber [3] and Google [4] are developing/supporting their own (deep-learning-focused) PPS, but is it known if/how they're used within these companies? Are the frameworks (Pyro [5] and Edward [6], respectively) used by other companies?

[1] Frank Wood (Microsoft) tutorial: https://www.youtube.com/watch?v=Te7A5JEm5UI

[2] MIT ProbComp lab's page of resources: http://probcomp.csail.mit.edu/resources/

[3] https://eng.uber.com/pyro/

[4] https://medium.com/tensorflow/introducing-tensorflow-probability-dca4c304e245

[5] http://pyro.ai/

[6] http://edwardlib.org/
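To make the modeling/inference split concrete, here is a minimal pure-Python sketch (not any particular PPS; the coin model and all names are illustrative): the model is an ordinary function, and a generic likelihood-weighting engine infers a posterior without knowing anything about the model's internals.

```python
import random

random.seed(0)

# The model is just a function: it draws latent values with `sample` and
# scores observed data with `weight`. The generic engine below never inspects
# its internals -- that's the modeling/inference decoupling.
def coin_model(sample, weight):
    p = sample(random.random)       # prior: coin bias p ~ Uniform(0, 1)
    weight(p ** 7 * (1 - p) ** 3)   # likelihood of observing 7 heads in 10 flips
    return p

def likelihood_weighting(model, runs=20000):
    """Generic importance-sampling inference: run the model many times, weight
    each run by the likelihood it assigned to the data, and return the
    weighted posterior mean of the model's return value."""
    total_w = total_wx = 0.0
    for _ in range(runs):
        w = 1.0
        def weight(lik):
            nonlocal w
            w *= lik
        x = model(lambda draw: draw(), weight)
        total_w += w
        total_wx += w * x
    return total_wx / total_w

posterior_mean = likelihood_weighting(coin_model)
# the analytic posterior is Beta(8, 4), whose mean is 2/3
```

The same engine would work unchanged for any other model written against the `sample`/`weight` interface, which is the composability the post describes.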
======
O_nlogn
Improbable [0] is building (and has open-sourced) Keanu, "a general purpose
probabilistic programming library built in Java", where Bayesian networks are
represented as DAGs. Its feature list includes probabilistic programming
operators and distributions, auto-differentiation, inference (maximum a
posteriori, Metropolis-Hastings, Hamiltonian Monte Carlo, sequential Monte
Carlo / particle filtering), and support for Kotlin. [1]

[0] [https://improbable.io/](https://improbable.io/)

[1] [https://github.com/improbable-research/keanu](https://github.com/improbable-research/keanu)
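For readers unfamiliar with the inference methods in that list, here is a rough, self-contained Python sketch of the simplest one, random-walk Metropolis-Hastings (purely illustrative; Keanu's actual implementation is in Java and far more capable):

```python
import math
import random

random.seed(1)

def log_posterior(p):
    """Unnormalized log-posterior for a coin's bias p after seeing 7 heads in
    10 flips, under a Uniform(0, 1) prior."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return 7 * math.log(p) + 3 * math.log(1 - p)

def metropolis_hastings(logp, x0, steps=20000, scale=0.1):
    """Random-walk MH: propose a nearby point, accept it with probability
    min(1, posterior ratio); otherwise stay put."""
    x, lp = x0, logp(x0)
    chain = []
    for _ in range(steps):
        prop = x + random.gauss(0, scale)
        lp_prop = logp(prop)
        if math.log(random.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

chain = metropolis_hastings(log_posterior, 0.5)
posterior_mean = sum(chain[5000:]) / len(chain[5000:])  # discard burn-in
# the exact posterior is Beta(8, 4) with mean 2/3
```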

~~~
tiger_entropy
I met some of their team at an academic seminar, and what they're doing is
pretty ambitious; it goes beyond probabilistic programming. It sounds more
like they're building traditional modelling tools on top of a PPL [0], where
the PPL helps with the model calibration.

[0] [https://github.com/deselby-research/](https://github.com/deselby-research/)

------
compumike
We use PPLs at Triplebyte for matching software engineers to jobs where we
predict they're a strong fit. We recently published
[https://triplebyte.com/blog/bayesian-inference-for-hiring-en...](https://triplebyte.com/blog/bayesian-inference-for-hiring-engineers)
which starts to explain our framework, though PPLs would have to be a "part 2"
blog post if anyone's interested.

~~~
btown
Would love to see that follow-up. There are so many domains, like yours, where
it's unlikely (from a model selection perspective) that indicators are
conditionally independent; whether it's hiring candidates, or matching
companies and funding sources (as we're doing at my company, Belstone - we're
hiring!), or building better dating sites, or recommending products, or
implementing public policies, there are underlying hidden variables that
capture aptitude/appropriateness of a subject to a certain aspect of the
domain.

There's tons of academic literature on how to handle this, and accelerating
industry support for the frameworks mentioned by OP... but the act of building
an early-stage software engineering culture that is amenable to the large
amounts of experimentation (often exciting, often frustrating, incredibly hard
to time-predict against business needs and runway allocation) is something
where I think the industry is still finding best practices. Were PPLs the
right move, with the benefit of your hindsight, for that problem? Were they
more promising than deep learning given challenges of properly collecting data
at scale? The process of choosing a system, measuring it against more
naive/heuristic approaches, deciding how to put it into production and
integrate with existing software/pipelines - and reliably hiring the right
people for those jobs, to make things a bit meta for Triplebyte! - that's a
narrative in search of thought leaders.

------
currymj
Stan is probably the most mature, stable PPL, and it's an extremely popular
tool, although it isn't really deep-learning-adjacent so it gets much less hype.

It's generally used more for modeling and prediction than for creating
"products", if that makes sense. More popular among people with statistics or
social science backgrounds than among programmers and computer scientists.

If you dig around Stan-related websites you can see various companies and
institutions that use it. One I found quite quickly was Metrum Research Group,
which does consulting work for the pharmaceutical industry.

[https://metrumrg.com/](https://metrumrg.com/)

~~~
marmaduke
Generable is a Stan-based startup which employs some of the core Stan devs.

[http://www.generable.com](http://www.generable.com)

~~~
boltzmannbrain
Do you know of any details on how/why probabilistic programming is used over
traditional methods? And why Stan? My assumption is b/c it's a legacy
framework.

~~~
currymj
It's definitely not a legacy framework; it's very actively developed and
improved.

A lot of very useful statistical models can't benefit from the GPU, and for
them I think Stan is the better tool. There's a reason it's basically the
default choice for people who want to use probabilistic programming for
Bayesian statistics.

It's probably not the best tool for AI/ML type models. But for statisticians
who want to use Bayesian methods it's close to perfect.

~~~
boltzmannbrain
Thanks that's good to know.

Haha I knew "legacy" would ruffle some feathers, by which I mean Stan was
pretty much the first application-ready PPS on the block -- robust toolbox of
methods, actively developed/supported.

The last slide of this presentation on SMC inference in PPLs is a nice view of
the PPS landscape (not to mention the whole deck is a great intro by Lawrence
Murray):
[http://www.it.uu.se/research/systems_and_control/education/2...](http://www.it.uu.se/research/systems_and_control/education/2017/smc/schedule/lecture17.pdf)

~~~
currymj
Yeah, I see what you're getting at now. And that is an excellent diagram.

I've never actually used BUGS or JAGS for anything but I would consider at
least BUGS to truly be legacy software. To me the word "legacy" means you
shouldn't pick it for a new project.

------
mlthoughts2018
pymc3 provides this for Python in a way that is very concise and modular
(certainly much more concise than tensorflow-probability) -- and it is an open
question whether TensorFlow might replace Theano as the backend execution
engine for the next versions.

In particular, pymc3's use of ADVI to automatically transform discrete or
bounded random variables into unconstrained continuous random variables, carry
out an initialization process with auto-tuned variational Bayes to infer good
settings and seed values for NUTS, and then use an optimized NUTS
implementation for the MCMC sampling, is incredibly impressive.

For most problems, you use a simple pymc3 context manager and from there on it
acts kind of like a mutually recursive let block in some functional languages:
you define random and deterministic variables that inter-depend on each other
and are defined by their distribution functions, with your observational data
indicating which values are used for determining the likelihood portion of the
model.

After the context manager exits, you can just start drawing samples from the
posterior distribution right away.

I've used it with great success for several large-scale hierarchical
regression problems.
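To illustrate the shape of that workflow (a stdlib-only toy, not pymc3's actual API; see the pymc3 docs for the real thing): you declare the model inside a context manager, then draw posterior samples once it exits. The naive grid sampler here stands in for what pymc3 actually does with ADVI and NUTS.

```python
import math
import random

random.seed(2)

class ToyModel:
    """A toy stand-in for the context-manager workflow described above.
    (Illustrative only: real pymc3 compiles a Theano graph and runs NUTS,
    not this naive grid sampler.)"""

    def __enter__(self):
        self.logp_terms = []  # log-density contributions, as functions of mu
        return self

    def __exit__(self, *exc):
        return False

    def normal_prior(self, mu0, sd):
        self.logp_terms.append(lambda m: -0.5 * ((m - mu0) / sd) ** 2)

    def normal_obs(self, data, sd=1.0):
        self.logp_terms.append(
            lambda m: sum(-0.5 * ((y - m) / sd) ** 2 for y in data))

    def sample(self, draws=20000):
        """Evaluate the unnormalized posterior on a grid over [-10, 10] and
        resample points from it in proportion to their density."""
        grid = [-10 + 20 * i / 2000 for i in range(2001)]
        logps = [sum(t(m) for t in self.logp_terms) for m in grid]
        top = max(logps)
        weights = [math.exp(lp - top) for lp in logps]
        return random.choices(grid, weights=weights, k=draws)

with ToyModel() as model:
    model.normal_prior(0.0, 10.0)           # mu ~ Normal(0, 10)
    model.normal_obs([2.1, 1.9, 2.3, 2.2])  # y_i ~ Normal(mu, 1)

trace = model.sample()
posterior_mean = sum(trace) / len(trace)
# conjugate answer: about (4 * 2.125) / (4 + 0.01), i.e. roughly 2.12
```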

~~~
shoyer
The plan of record is already to build pymc4 on top of TensorFlow:
[https://medium.com/@pymc_devs/theano-tensorflow-and-the-futu...](https://medium.com/@pymc_devs/theano-tensorflow-and-the-future-of-pymc-6c9987bb19d5)

------
w01fe
At Semantic Machines [0] we rely heavily on probabilistic programming to build
state-of-the-art dialogue systems. In particular, we use a library called PNP
(probabilistic neural programming) on top of Dynet to allow us to express
structured prediction problems in a simple and elegant form. If there are
questions I am happy to elaborate to the extent I can. (Also, we are hiring!
My email is jwolfe@.)

[0] [http://www.semanticmachines.com/](http://www.semanticmachines.com/)

------
janwillem
Just a quick correction – Frank Wood is not at Microsoft, but at UBC:

[http://www.cs.ubc.ca/~fwood/index.html](http://www.cs.ubc.ca/~fwood/index.html)

Microsoft Research does have multiple excellent researchers working on
probabilistic programming. Infer.NET in particular is a highly advanced piece
of technology for models in which you would use message passing algorithms to
perform inference:

[http://infernet.azurewebsites.net](http://infernet.azurewebsites.net)

~~~
boltzmannbrain
Thanks. I believe he's with Oxford (although may have multiple appointments),
and that video is from Microsoft Research.

~~~
janwillem
Just to clear up further confusion: Frank was indeed at Oxford previously – he
moved to UBC this Spring. The tutorial actually took place at NIPS 2015 in
Montreal.

------
zellyn
Given Avi Bryant[1] recently released Ranier from there, I'd guess Stripe is.

[1] [https://twitter.com/avibryant](https://twitter.com/avibryant) [2]
[https://github.com/stripe/rainier](https://github.com/stripe/rainier)

~~~
avibryant
Yes, we are - Rainier is used in production, though so far it's a very small
part of our overall ML efforts.

------
kimi
Anglican
[https://probprog.github.io/anglican/](https://probprog.github.io/anglican/)
is also based on a prototype by Frank Wood, using Clojure syntax.

------
arxanas
Facebook is working on probabilistic programming. Rather than develop it as a
library, they're trying to provide language support directly. It was recently
discussed at a conference; you could ask Erik Meijer for the details
([https://twitter.com/headinthebox/status/993972303863070720](https://twitter.com/headinthebox/status/993972303863070720)).

------
Lezcano
I just made public my master's thesis project, which I completed at the
University of Oxford.

It is called CPProb, and it is a C++ general-purpose probabilistic programming
library that uses a version of variational inference to learn proposals for
importance sampling.

It aims to be usable directly in preexisting C++ codebases. For the thesis I
also wrote a tutorial on particle filters via SMC-like methods, and I
described the design choices one faces when implementing one of these systems.

The C++ library, with the corresponding PyTorch-based neural network and the
tutorial, can be found at

[https://github.com/Lezcano/CPProb](https://github.com/Lezcano/CPProb)

and are available under an MIT license.
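As a taste of what an SMC-style method does (an illustrative pure-Python sketch, unrelated to CPProb's actual code): a bootstrap particle filter tracking a hidden 1-D Gaussian random walk from noisy observations.

```python
import math
import random

random.seed(3)

def bootstrap_filter(observations, n_particles=2000, step_sd=1.0, obs_sd=1.0):
    """Minimal bootstrap (SMC) particle filter for a hidden 1-D random walk
    x_t = x_{t-1} + N(0, step_sd), observed as y_t = x_t + N(0, obs_sd).
    Returns the filtered posterior mean of x_t at each step."""
    particles = [0.0] * n_particles
    filtered_means = []
    for y in observations:
        # 1. propagate each particle through the transition model
        particles = [x + random.gauss(0, step_sd) for x in particles]
        # 2. weight particles by the observation likelihood
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        filtered_means.append(
            sum(w * x for w, x in zip(weights, particles)) / total)
        # 3. resample particles in proportion to the weights
        particles = random.choices(particles, weights=weights, k=n_particles)
    return filtered_means

# track a hidden state drifting from 0 toward 5
means = bootstrap_filter([0.2, 1.1, 2.0, 3.1, 3.9, 5.0])
```

The propagate/weight/resample loop is the core of every SMC method; real systems add tricks like adaptive resampling and learned proposals (which is where CPProb's variational proposals come in).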

------
rbalicki
[https://www.ferolabs.com/](https://www.ferolabs.com/)

------
praeconium
I've used rjags for the R programming language, which is quite old and based
on JAGS, but it's quite straightforward, though Edward, Stan and pymc3 seem to
be state of the art.

My project is quite simple, but you can check it via my homepage[0] or
directly[1]:

[0][http://www.vladovukovic.com](http://www.vladovukovic.com)
[1][https://bit.ly/2Krtkfi](https://bit.ly/2Krtkfi)

------
nl
I designed a system which used probabilistic programming (Stan) to combine
various deep learning based feature extractors to predict social behaviours.

~~~
curiousgal
Seems quite interesting! Do you have a write-up?

~~~
nl
No, sorry.

I may be able to answer specific questions.

------
kccqzy
I'm working on probabilistic programming right now, and a lot of the papers
I'm reading are from Microsoft Research. They have a cool Infer.NET project,
and an ecosystem based on it is beginning to form. For example, take a look at
Tabular, an Excel add-on for doing "Bayesian inference for the masses" based
on Infer.NET:
[https://www.microsoft.com/en-us/research/project/tabular/](https://www.microsoft.com/en-us/research/project/tabular/)

Overall though, by freeing developers from writing custom inference
algorithms, all the work gets pushed to the language designer/implementer. It
is not at all clear to me that one (or even a few) generic inference
algorithms will be able to satisfy the needs of different problem domains. So
there may end up being not one general-purpose PPL but multiple ones for
different problem domains.

------
sitkack
Charles River might use PPLs for determining drone targets.

~~~
mrmiasma
Not really, but we do use our Figaro probabilistic programming language for
all sorts of things: predictive health maintenance, modeling complex systems
like food insecurity, graph data mining for link analysis, onboard autonomy
for satellites, predicting where a cyber attack may hit next based on
different types of data, and understanding the evolution of malware. See
[https://www.cra.com/work/case-studies/figaro](https://www.cra.com/work/case-studies/figaro)
for more info.

------
krotton
We use kind of a tiny in-house PPL for document analysis at
[https://scanye.io/](https://scanye.io/). We haven't open-sourced it though.

