After that, for practical applications, Gen looks very promising. The MIT group already presented a very early prototype 2 years ago at PPAML.
But if you don't understand the fundamentals (generative processes, mixture models, hierarchical models, non-parametric models, etc.) you will be lost. Note that v1 of Probmods was written in Church (a Scheme dialect), which has some advantages for creating meta-models.
An interesting embedded language for probabilistic programming and meta-programming, implemented in Clojure by the same organization: https://github.com/probcomp/metaprob
If you want a non-mathematician friendly introduction, "Probabilistic Programming and Bayesian Methods for Hackers" [0] is a good place to start. For a motivating application, chapter 2 walks through using probabilistic programming for A/B testing.
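To give a flavor of that A/B testing application, here is a stdlib-only sketch (my own illustration, not the book's code, which uses PyMC; all the visitor and conversion numbers are made up). With a Beta(1, 1) prior on each conversion rate, the posterior is a conjugate Beta update, and comparing posterior samples gives the probability that variant B truly beats variant A:

```python
import random

# Hypothetical observed data: conversions out of visitors for each variant.
visitors_a, conversions_a = 1500, 45
visitors_b, conversions_b = 1500, 61

# With a Beta(1, 1) prior, the posterior over a true conversion rate is
# Beta(1 + conversions, 1 + non-conversions) -- a standard conjugate update.
def posterior_samples(conversions, visitors, n=100_000):
    return [random.betavariate(1 + conversions, 1 + visitors - conversions)
            for _ in range(n)]

random.seed(0)
samples_a = posterior_samples(conversions_a, visitors_a)
samples_b = posterior_samples(conversions_b, visitors_b)

# Probability that B's true rate beats A's, estimated by comparing samples.
p_b_beats_a = sum(b > a for a, b in zip(samples_a, samples_b)) / len(samples_a)
print(f"P(rate_B > rate_A) = {p_b_beats_a:.2f}")
```

The nice part is that you get a full probability statement ("B beats A with probability p") instead of a p-value, and the same pattern extends to models where no conjugate shortcut exists.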
You have some existing scientific model (most likely a differential equation). You want to estimate the parameters of the model from data, but get distributions instead of point estimates so that you can quantify the uncertainty. PP is the answer.
Being able to run generic code and packages from a whole language is really freeing, because people in stats generally don't have the same models as a systems biologist, pharmacometrician, computational fluid dynamics researcher, etc., so you know in advance that there will never be "built-in support" for any of these disciplines. And that's fine, as long as the system is extensible.

I like the Julia-based probabilistic programming frameworks because they let you put entire differential equation solvers in there, and estimate the parameters of some scientific model in a way where you get posterior distributions that quantify the uncertainty. Stan hard-codes one or two ODE solvers for "similar" functionality, but here a few hundred ODE, SDE, DDE, DAE, PDE, jump diffusion, etc. methods all come along for the ride.

Making something compatible with a PP framework like Turing.jl is essentially just making it compatible with the AD framework the PP language uses (for derivative-based sampling). I write the differential equation solvers, and we were able to get PP working without a change to the PP or DiffEq libraries, which is quite a win in my book.
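The "solver inside the model" workflow can be sketched in a few lines of plain Python (this is a toy stand-in for the Turing.jl + DifferentialEquations.jl setup, not their API; the decay model, noise level, and tuning constants are all made up). An Euler ODE solver sits inside the likelihood, and a gradient-free random-walk Metropolis sampler, which needs no AD at all, produces a posterior over the decay-rate parameter k:

```python
import math
import random

def solve_decay(k, u0=10.0, dt=0.1, steps=30):
    """Forward-Euler solve of du/dt = -k*u -- the 'scientific model'."""
    u, traj = u0, []
    for _ in range(steps):
        u += dt * (-k * u)
        traj.append(u)
    return traj

random.seed(1)
true_k, noise = 0.5, 0.3
# Synthetic noisy observations of the trajectory.
data = [u + random.gauss(0, noise) for u in solve_decay(true_k)]

def log_posterior(k):
    if k <= 0:  # flat prior on k > 0
        return -math.inf
    pred = solve_decay(k)  # the ODE solver runs inside the likelihood
    return -sum((d - p) ** 2 for d, p in zip(data, pred)) / (2 * noise ** 2)

# Random-walk Metropolis over k.
samples, k = [], 1.0
lp = log_posterior(k)
for _ in range(5000):
    k_new = k + random.gauss(0, 0.05)
    lp_new = log_posterior(k_new)
    if random.random() < math.exp(min(0.0, lp_new - lp)):
        k, lp = k_new, lp_new
    samples.append(k)

posterior = samples[1000:]  # discard burn-in
mean_k = sum(posterior) / len(posterior)
print(f"posterior mean of k = {mean_k:.2f}")
```

A real framework replaces the hand-rolled sampler with NUTS/HMC (which is where AD compatibility comes in) and the Euler loop with a production solver, but the structure, a simulator embedded in a probabilistic model, is the same.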
This might not answer your question directly, but it is perhaps useful to know that the probabilistic programming approach can also be applied in non-probabilistic (non-statistical) ways.
I developed a ranked programming language, which is like a probabilistic programming language without probabilities. Instead, you state how your program normally behaves and how it may exceptionally behave. Conceptually it's very similar to probabilistic programming, but the underlying uncertainty formalism is replaced with ranking theory, which works with integer-valued degrees of surprise.
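The core idea can be sketched in a few lines of Python (my own illustration, not the Racket library's API): a ranked value maps each outcome to an integer degree of surprise, where 0 means normal and higher means more surprising; alternatives keep their least surprising derivation (min), and sequential composition adds ranks along a path:

```python
# Minimal sketch of ranking-theoretic computation (illustrative names only).

def normally(normal, exceptional, surprise=1):
    """A ranked value: normal outcome at rank 0,
    exceptional outcome at the given surprise rank."""
    return {normal: 0, exceptional: surprise}

def ranked_bind(ranked, f):
    """Sequence two ranked computations: ranks add along a path,
    and each outcome keeps its least surprising derivation (min)."""
    out = {}
    for value, rank in ranked.items():
        for value2, rank2 in f(value).items():
            total = rank + rank2
            out[value2] = min(out.get(value2, total), total)
    return out

# Two components in series that each normally work but may exceptionally fail.
comp1 = normally("ok", "fail")
circuit = ranked_bind(comp1, lambda s:
                      normally("ok", "fail") if s == "ok" else {"fail": 0})

print(circuit)  # {'ok': 0, 'fail': 1}
```

So "the circuit fails" ends up at rank 1: surprising, but only to the first degree, with no probabilities assigned anywhere.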
You can find an implementation of this idea (based on Scheme/Racket) here:
"We focus on the case where we know what is normal and what is exceptional, but where the probabilistic meaning of these terms is unknown or irrelevant." Interesting stuff!
In a "non-ML" context, it's my favorite way to fit models. Even if I don't "need" it, I prefer it.
This is because I prefer to use Bayesian modeling techniques when I can, and PP is the best and most expressive method for implementing and fitting Bayesian models that I know of.