A general-purpose probabilistic programming system with programmable inference (github.com/probcomp)
238 points by espeed on June 28, 2019 | hide | past | favorite | 72 comments

Sounds spiffy. Also, I must find time to learn Julia. One problem with these 'out of a box' cookbook systems is that you only know approximately what you are doing. If it works, that may be fine, but for statistical systems the outcome depends heavily on the formulation, is strongly influenced by random or irrational correlations, and is more art than science. If you remove the element of experience that animates the "art", you get nonsense. Big box programming tools can make it hard to know what correlations are being established, and this is true even with simple neural networks. By the time they have many layers and filters in them, all hope is fled. You buy the result or you don't: simple as that.

Nowhere have I experienced more of a look-under-the-hood feeling than when I am using Julia.

The source code of the packages sits on my computer; I can (and sometimes do) modify it, and I can still compile the same function to SIMD or GPU.

Although I have never used Julia, this is also possible in Ruby, and I think the feature is seriously underrated – i.e. being able to jump quickly to the definition of external code, modify it, and run it. When you work with multiple libraries/repos this is very valuable when debugging and saves a lot of time.

It should also be possible with Node.js I think (node_modules).

Don't get me wrong, I love Ruby, and my code was originally in Ruby, but the 20x speedup and better math libraries in Julia without sacrificing the speed of development too much made the switch for me very easy.

Actually translating Ruby code to Julia was much easier than I thought (the only real difference is the indexing).

The 1-based indexing really sucks in the PTX assembly output of Julia as well: I see a lot of useless increment and decrement operations where I don't expect them.
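For readers wondering where those extra ops come from: a 1-based surface index has to be decremented to a 0-based machine offset on every access. A toy Python sketch of the idea (my own illustration, nothing to do with Julia's actual codegen):

```python
# Toy sketch: a "1-based" view over Python's 0-based lists. Every
# access pays an extra decrement, the kind of index bookkeeping that
# can surface as extra add/sub instructions in compiled output if the
# compiler cannot hoist or fold it away.
class OneBased:
    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, i):
        if not 1 <= i <= len(self.data):
            raise IndexError("1-based index out of range: %d" % i)
        return self.data[i - 1]  # the decrement: surface index -> machine offset

a = OneBased([10, 20, 30])
print(a[1], a[3])  # -> 10 30
```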

Makes sense to write it in Julia. Pretty much the only non-commercial language that deals natively with mathematical objects in a way that makes sense to non-CS people doing statistics. Plus, it is fast out of the box.

Like, when I first saw how you work with matrices in Python, I had a stroke.

I don't really get why we use Python for everything. The MATLAB-style syntax is infinitely better for ML, stats and math.

> The MATLAB-style syntax is infinitely better for ML, stats and math.

Syntax is a secondary concern for all but short-term, small-scale programming.

When it comes to making scalable software systems (scalable in every sense of the term, not just performance but also feature growth etc.), syntax is really not the biggest concern; as such, optimizing for syntax can be detrimental to other, more important aspects (e.g. composability, reusability, programmability), and focusing on syntax means you're not paying attention to the really critical stuff.

I mostly work in Clojure, where _all_ operations use prefix syntax (e.g. you write (+ A B C) instead of (A + B + C) for addition, and + is just a function). Trust me, it's unfamiliar at first, but it makes reasoning about the program MUCH saner than all the special cases and irregularities that language designers bake into a language just to make some operations infix.

RPN FTW. Good ol' LISP. :)

In Haskell, you can turn any function into an infix operator with backquotes. As a qualitative, stylistic, ergonomic opinion, I find Haskell much more flexible and beautiful than most brace-, paren- and prolixity-heavy languages. It can present too many features for pragmatic use, unlike languages such as Rust or Go, where features are intentionally constrained or disallowed for effective software production.

> RPN FTW. Good ol' LISP. :)

Actually Lisp uses ordinary old polish notation. Reverse Polish Notation is used by stack-oriented systems like HP calculators and Forth.

Not really; the motivating point of Łukasiewicz's notation was specifically to eliminate parentheses.

Well choosing between

(- (+ 1 2 3 4 5) 1)


- + + + + 1 2 3 4 5 1

The first one seems much more readable to me...

Lisp functions can take fewer or more than two arguments while many mathematical operators take only two.

And parentheses don't serve precisely the same role in Lisp syntax as they do in the more free-form world of maths.
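To make the arity point concrete: parenthesis-free Polish notation is only unambiguous when every operator has a fixed arity. A small Python sketch (my own illustration) that evaluates the binary-operator version of the example above:

```python
# Minimal evaluator for parenthesis-free (Polish) notation, assuming
# every operator is strictly binary, which is the property the
# notation relies on to drop parentheses. Lisp's variadic
# (+ 1 2 3 4 5) breaks that assumption, so Lisp keeps the parens.
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def eval_polish(tokens):
    """Evaluate a Polish-notation token list such as ['+', '2', '3']."""
    def walk(it):
        tok = next(it)
        if tok in OPS:  # operator: consume exactly two operand subtrees
            return OPS[tok](walk(it), walk(it))
        return float(tok)  # operand
    return walk(iter(tokens))

# (- (+ 1 2 3 4 5) 1) rewritten with binary +:
print(eval_polish('- + + + + 1 2 3 4 5 1'.split()))  # -> 14.0
```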

We have macros and LISP is a building material, not a language.

One line of code is just enough to get something that works (defmacro infix [a op b] `(~op ~a ~b))

There are libraries for that too. Clojure: https://github.com/rm-hull/infix

Different things to care about. I have worked on stats in research, at startups and at large banks. A lot of production/infrastructure-related things certainly profit from the points you mention.

But 99% of the statistical applications I have seen differ. For someone using math, "programmability" and the other aspects you mention are certainly better in a language that emulates how we think about the problems. While Clojure may indeed work well for some areas of math, Julia is certainly appropriate for statistics and probability - in my opinion the best - and Python isn't very good for either.

Everything else is almost always secondary. I would also argue that Julia is absolutely fine for developing large, scalable programs, not worse than Python.

Since Clojure has no statistics / ML ecosystem afaik, I am not sure why you mention it. This is absolutely crucial to even being considered in my point. That is why R dominates statistics still, by a large margin.

> Since Clojure has no statistics / ML ecosystem afaik, I am not sure why you mention it.

I mentioned it to illustrate my point, that semantics and other aspects matter much more than syntax in many cases.

(Btw Clojure does have a ML / stats ecosystem, in part via the JVM, though certainly not as developed as Python / Julia / R. For instance, Anglican is a probabilistic programming language embedded in Clojure: https://github.com/probprog/anglican).

> For someone using math, "programmability" and the other aspects you mention are certainly higher in a language that emulates how we think about the problems.

Sure, but syntax is just about notation - semantics are much more important to achieving nearness to a mental model. If you can't express your mental model in another notation than the one you're familiar with, then you probably don't have a very deep understanding of said model.

Continuing with my example, prefix notation for matrix multiplication does not hurt _at all_ my ability to reason about linear algebra - it sometimes even clarifies it.

I also think you misinterpret what I meant by programmability, which is not the same thing as 'ease of programming' - more like how smoothly various parts of a program interact. If for the sake of syntactic sugar you've introduced a proliferation of different programming constructs with no unifying abstraction, then the other parts of the program will need to make a proliferation of case distinctions as well - that's one way to hurt programmability.

To add on what @valw said:

Clojure's ML/stats ecosystem is moving fast. Several important libraries are under construction and will mature in a few months. Imho, it is worth following this year for anyone interested in languages for ML/stats.

In addition to probabilistic programming libraries such as Metaprob and Anglican mentioned above, here are some libraries worth mentioning: https://github.com/MastodonC/kixi.stats https://github.com/generateme/fastmath https://github.com/techascent/tech.ml

Anyone have a comparison of Anglican vs Gen? According to https://github.com/probprog/anglican/blob/master/doc/devel.m... the programmable inference seems to be a feature of both.

Good question. Disclaimer: I’m in the lab that made Gen & was on the paper, so not impartial :)

Anglican is implemented in Clojure, and can be extended (by writing new Clojure code) to support new general-purpose inference engines. Creating those extensions requires an understanding of both the statistics and the PL concepts used in Anglican’s backend; you are essentially writing a new interpreter for arbitrary Anglican code.

Gen provides high level abstractions for writing custom inference algorithms for _specific models/problems_ (not entire general-purpose inference engines). Those abstractions don’t require reasoning about PL concepts like continuation-passing style, nor do they require the user to do any math by hand. Of course, since Gen is just Julia code, you can still reach in and implement new inference engines (just as in Anglican/Clojure) if you’re an expert. But I wouldn’t expect people who are not probabilistic programming researchers to do this (in either Anglican or Gen).
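For readers unfamiliar with continuation-passing style, here is a minimal Python illustration of the concept (a toy, not Anglican or Gen internals): in CPS a function never returns normally; it hands its result to an explicit "rest of the computation" callback, which is what lets an interpreter pause, resume, and re-run a program at each random choice.

```python
# Continuation-passing style (CPS): each function takes an extra
# argument k, the continuation, and "returns" by calling it with the
# result. PPL backends interpret models in a form like this so the
# inference engine can take control at every intermediate step.

def add_cps(a, b, k):
    return k(a + b)      # pass the sum to the rest of the computation

def square_cps(x, k):
    return k(x * x)

# Direct style would be: square(add(1, 2)) == 9
result = add_cps(1, 2, lambda s: square_cps(s, lambda r: r))
print(result)  # -> 9
```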

Making scalable software systems is a secondary concern to most people.

R, SAS, Octave, Mathematica, MATLAB, Excel (lol), FORTRAN and many more come to mind. Julia seems more like flavor of the month.

As an example, I know of a Monte Carlo nuclear reactor simulator that was written in tens of millions of lines of Fortran over a period of 3+ decades by non-software engineer nuclear engineers.

Also, have you used an HP 48 calculator? You can snag an emulator app and a ROM to use it on most smartphones, and it does symbolic integration and differentiation.

Comrade Dyatlov? COMRADE DYATLOV?!

I wish HN had gold to give.

Can you tell anything about that simulator? it sounds wild:)

There's progress in Python with the new `a @ b` operator for matrix multiplication.

The email operator.

In Romance languages the at-sign is called an arobase/arroba.
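For the curious, `a @ b` (PEP 465, Python 3.5+) dispatches to the `__matmul__` special method, so any type can opt in. A minimal pure-Python sketch (NumPy arrays support `@` natively):

```python
# The @ operator calls __matmul__ on the left operand. A tiny
# row-major matrix type implementing textbook matrix multiplication:
class Mat:
    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):
        # zip(*other.rows) iterates the columns of the right operand
        return Mat([[sum(a * b for a, b in zip(row, col))
                     for col in zip(*other.rows)]
                    for row in self.rows])

a = Mat([[1, 2], [3, 4]])
b = Mat([[5, 6], [7, 8]])
print((a @ b).rows)  # -> [[19, 22], [43, 50]]
```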

Seems to be a Julia lib unless I'm missing something?

Github says it is

    Julia 100.0% 
(click colored bar at the top of a repo)

The problem with frameworks like this comes when you try to do something that isn't strictly standard, or lack understanding of what the framework is actually doing. Things will no doubt break, and then you have a 15-layer framework stack trace to try to debug.

You can't dodge knowledge with frameworks; you can only use frameworks to amplify your knowledge.

This is exactly the premise of the paper, which isn't presenting a framework but rather a language for general, open-ended computation for ML.

I haven't had that problem with Julia-based probabilistic programming frameworks. I stick functions from pre-existing packages into Turing.jl and it seems to work fine as long as the library is compatible with AD.

I'm excited for the probabilistic programming symposium at JuliaCon :). We had so many submissions around PP languages and approaches that we decided to bunch them all together to get people talking about the pros/cons of various approaches and find out what kinds of problems to collectively tackle. PP seems like a very promising field.

Could anyone point me in the direction of resources to learn more about probabilistic programming?

(e.g.: text books, practical applications, introductory articles)

MIT Probabilistic Computing Project http://probcomp.csail.mit.edu/

The Design and Implementation of Probabilistic Programming Languages (https://dippl.org) by Noah D. Goodman https://cocolab.stanford.edu/ndg.html

Stanford CS 228: Probabilistic Graphical Models https://cs228.stanford.edu and book by Daphne Koller http://openclassroom.stanford.edu/MainFolder/CoursePage.php?...

ProbTorch: Library for deep generative models that extends PyTorch https://github.com/probtorch/probtorch

Anglican: Probabilistic programming language integrated with Clojure and ClojureScript https://probprog.github.io/anglican/index.html

Discussion: https://news.ycombinator.com/item?id=18585465

[shameless plug]

Ranked programming is like probabilistic programming but you don't use probabilities. Instead, you state how your program normally behaves and how it may exceptionally behave. Conceptually it's very similar to probabilistic programming, but the underlying uncertainty formalism is replaced with ranking theory.

You can find an implementation of this idea (based on Scheme/Racket) here:


For more detailed information check the paper linked to on that page.
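To give a flavor of the idea (my own toy sketch in Python, not the linked library's API): ranks behave like degrees of surprise - 0 is the normal case, higher is more exceptional - and ranks of independent choices add, much like negative log-probabilities:

```python
# Toy sketch of ranked programming: values carry integer ranks
# (0 = normal, higher = more exceptional). Combining independent
# choices adds ranks; normalization keeps the least surprising
# outcome at rank 0. All names here are my own invention.

def either_or(normal, exceptional, rank=1):
    """A choice that is normally `normal`, exceptionally `exceptional`."""
    return [(normal, 0), (exceptional, rank)]

def both(xs, ys, f):
    """Combine two ranked choices with f; keep the lowest rank per value."""
    out = {}
    for x, rx in xs:
        for y, ry in ys:
            v = f(x, y)
            out[v] = min(out.get(v, float('inf')), rx + ry)
    m = min(out.values())  # normalize so the most normal outcome has rank 0
    return sorted(((v, r - m) for v, r in out.items()), key=lambda p: p[1])

coin = either_or('heads', 'tails')  # normally heads, exceptionally tails
print(both(coin, coin, lambda a, b: (a, b)))
# ('heads', 'heads') gets rank 0, one tail rank 1, two tails rank 2
```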

- A Personal Viewpoint on Probabilistic Programming https://www.youtube.com/watch?v=TFXcVlKqPlM by Dan Roy (I think this is the best intro to PP there is)

- Probabilistic Models of Cognition https://probmods.org/ by Noah D. Goodman, Joshua B. Tenenbaum & contributors

- An Introduction to Probabilistic Programming https://arxiv.org/abs/1809.10756 By Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, Frank Wood

I think this is an exciting time for programming language designers. Just as the blockchain space has produced new languages like Solidity and Vyper, I believe there will be new programming languages in AI that make it easier for programmers to express themselves (something different from Julia, R, Swift, or Python).

Right now people solve the problem with frameworks like Keras, so programmers have an easier time expressing themselves (for example when designing neural networks). Imagine a new programming language in which it is much clearer and easier to create a neural network (in deep learning).

There is a missing piece in the Julia ecosystem - first-class support for GNNs (Graph Neural Networks). Anyone who wants to help can pick up an issue[1] at the GitHub repository of FluxML[2].

[1] https://github.com/FluxML/Flux.jl/issues/625

[2] https://fluxml.ai/

There's a JSoC developing this IIRC.

There seems to be an opportunity for a language to fill the data science/AI space for people who want something more performant and safe (ie, statically typed) than Python. I see people coming into /r/Haskell and other forums all the time looking for something that fits this category but there's never a really good answer that is better than the compromise that is Python with some C or whatever based optimizations.

Do you know how raw Julia and Nim stack up on those fronts?

No I don't sorry I'm only a spectator to those threads.

I'd personally at a minimum try out Julia if I was doing data science often. I don't know enough about Nim yet.

Bunch of tutorials using the language: https://probcomp.github.io/Gen/tutorials.html

Paper: https://dl.acm.org/citation.cfm?id=3314221.3314642 (is this article paywalled? I'm on my university's network so I don't have a paywall)

The syntax doesn't seem to be changed compared to Julia (so is it actually a new language or just a library? Not sure, haven't seen Julia code in a while.)

>The syntax doesn't seem to be changed compared to Julia (so is it actually a new language or just a library?

And the semantics are the same, with some additions. This is a good thing, as it allows interop with Julia code and low mental overhead, and shows the power of Julia :)

Very exciting; I've found using Stan really hard - I really struggle to model real problems - so perhaps this will make it easier. I also far prefer Julia to R in project work (not analytics or episodic data science, but developing systems, which I find hard to structure and decompose in R).

I wonder if the HN crowd think this system is AI ?

Random comment: by episodic data science I mean one-off question-answering quests. These are often created by incidents or by new stakeholders asking unexpected questions. Basically: collect, curate, explore, answer, explain, dump.

Pretty clickbaity, this looks like another probabilistic programming framework. Others include Pyro (pyro.ai) and Edward (http://edwardlib.org/).

Maybe the title's clickbaity, but steps forward in probabilistic programming are great. From the paper, it looks interesting. https://dl.acm.org/citation.cfm?id=3314221.3314642

My observation is that a lot of the MIT News and MIT Technology Review titles read like this.

It seems to work to get exposure for work being done there.

This press release is super weird. They are trying really really hard to not mention Julia.

A "new language" versus "a DSL built on Julia".

"Swift for Tensorflow" is still Swift.

I wonder why.

Maybe the link has changed, but now one can read:

“Building off concepts used in their earlier probabilistic-programming system, Church, the researchers incorporate several custom modeling languages into Julia, a general-purpose programming language that was also developed at MIT.”

Maybe HN just needs to ban submitting university press releases.

They don't even link to Julia like they have for other languages...

Typical MIT press release balognia; their publicist really should get a job in industry and stop misleading the scientific community.

What they really mean is "new probabilistic graphical model language." Yet another BUGS/JAGS/Stan-like system.

It's not a PGM package. It's designed to work with arbitrary Julia programs and can be used for program induction etc. In that regard it's more like WebPPL or Church than Stan.

OP originally linked it to some breathy MIT press release which said

"New AI programming language goes beyond deep learning General-purpose language works for computer vision, robotics, statistics, and more" https://news.mit.edu/2019/ai-programming-gen-0626

I'm pretty sure most sane people agree this is a giant pile of meaningless marketing horse pookey, which is par for the course with MIT these days. That's what I was responding to. And characterizing it as "like stan" is a lot closer to descriptive than anything in that article. I picture whoever writes this stuff for MIT as wearing tap out t-shirts with neck tattoos.

Sure, the article is largely meaningless. But I've spent a couple months playing with this package and it is a lot more general than Stan etc.

This goes way way beyond graphical models.

The title is confusing. Can anyone please explain if this is Julia or a new programming language?

It's a set of macros and helper functions built on top of Julia to provide a sort of probabilistic programming DSL. The actual project describes itself as a "programming system" rather than a programming language.

This is based on Julia.

The only place where they make it clear this is a Julia package is in the installation instructions:

"Then, install the Gen package with the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and then run: pkg> add https://github.com/probcomp/Gen"


I was really looking forward to probabilistic programming, but then I saw this project is based on Julia. Not too keen on learning Julia as I already know Python/C++/Java/Scala/R. Wish it was based on C++/Python, at least to get started with. Please let me know if there are other interesting probabilistic programming frameworks available, or tutorials to get started with.

Edit: https://medium.com/tensorflow/an-introduction-to-probabilist... There is support in tensorflow for probabilistic programming. How is this any different?

Julia is fast becoming one of the great programming languages of our era. Julia is not going away, its usage is only becoming more pervasive. Imagine if you had learned Python 20 years ago. Julia's founding sponsor is Dr. Jeremy Kepner [1] who is the founder of MIT Lincoln Laboratory Supercomputing Center [2] and the founder of GraphBLAS [3]. Kepner's got a good eye for things, he knows what's coming down the pipe and what will be important in a language of the future. Learning Julia now means you'll be ahead of the curve for years to come. Your time will not be wasted.

[1] https://en.wikipedia.org/wiki/Julia_(programming_language)#S...

[2] http://www.mit.edu/~kepner/

[3] http://graphblas.org

Spoiler alert: There is a GSoC student working on getting GraphBLAS working together with Julia.

Julia needs a PyTorch or TensorFlow equivalent.
