As a concrete example, you might write a program expressing your beliefs about how the scenes you see come to be: some random objects are placed at some random positions, lighting sources are placed, and then a rendering algorithm converts that scene description into the light intensities perceived by your eye. If you write this up in a probabilistic programming language, you can then ask questions like: supposing I see this image, what are some likely scenes (positions and shapes of objects) that might have produced it? How does that distribution over scenes change if I know that the light source was here, or that there was a small chair there, or that my vision isn't that great and I should trust the image's pixels a little less? Section 1.3 of the introduction linked above lists many more applications for which probabilistic programming turns out to be useful: breaking CAPTCHAs, constrained procedural graphics, automatic program induction/synthesis, recursive multi-agent reasoning... and, as the authors write at the end of this section, "the list goes on."
Of course, solving the inference task in full generality, for any program and query that the user might write, is very difficult; you can design pathological cases where it boils down to solving the halting problem, and is therefore uncomputable. Different languages take different approaches to addressing this difficulty. They might limit you to writing certain kinds of programs for which a limited class of inference algorithms can always be made efficient; they might try to plan a fancy inference algorithm based on the specifics of your model; they might just expect you to learn which programs will be tractable and which won't; or they might provide abstractions that make it possible for you to safely experiment with writing new inference algorithms (or combinations of existing ones) specialized for your use case. Probabilistic programming is still a young field and this is an area of active research.
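To make the forward direction of the scene example concrete, here is a hypothetical toy version in Python. All the names (`sample_scene`, `render`, `add_noise`) are made up for illustration, and the "renderer" is just a one-dimensional brightness falloff rather than real graphics:

```python
import random

random.seed(0)

# Toy "scene prior": one object at a random position on a 1-D retina.
def sample_scene():
    return random.uniform(0.0, 10.0)

# Crude "renderer": pixel brightness falls off with distance to the object.
def render(position, n_pixels=10):
    return [1.0 / (1.0 + (px - position) ** 2) for px in range(n_pixels)]

# Sensor noise: "my vision isn't that great".
def add_noise(pixels, noise=0.05):
    return [p + random.gauss(0.0, noise) for p in pixels]

# Forward run: scene -> image. Inference asks the reverse question:
# given an observed image, which scene positions are likely?
scene = sample_scene()
image = add_noise(render(scene))
```

Running the program forward samples an image; the inference queries in the paragraph above ("supposing I see this image, what scenes might have produced it?") all amount to inverting this sampler.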
Sounds like a device where I can write a program, run it over some inputs, then hand the device the program and the outputs and have it magically infer the possible inputs. That is basically logic programming. Then we add a probabilistic layer on top of that.
Maybe the probabilistic part hides some significant simplifications. Even relatively simple programs are nontrivial to express logically. See, for example, the pure treatment of arithmetic in http://okmij.org/ftp/Prolog/Arithm/arithm.pdf.
What kind of programs are practically feasible with Probabilistic Programming?
In (deterministic) logic programming, the default search strategy is backtracking. We search the space of "inputs that could have produced this output" exhaustively. Each set of inputs we try either works or it doesn't, and the outcome tells us nothing about whether similar inputs will behave similarly.
In probabilistic programming, we can use the information we have about probability densities to tell whether we're on the right track; as we search, we can tell if we're getting "warmer" or "colder." Suppose I have the simple probabilistic program:
a = normal(0, 1) # mean 0, stddev 1
b = normal(0, 1)
c = normal(a+b, 0.1) # c concentrates tightly around a+b
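To show the "warmer/colder" idea concretely (this is my own sketch, not from the linked introduction): suppose we observe c = 2.0 and want likely values of a and b. Reading `normal(mu, sigma)` as a Gaussian density, a plain Metropolis-Hastings search always accepts proposals that raise the posterior density and only sometimes accepts ones that lower it:

```python
import math, random

random.seed(1)

def log_norm(x, mu, sigma):
    # log of a Gaussian density, dropping the constant term
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma)

C_OBS = 2.0  # the observed value of c

# Unnormalized log posterior of (a, b) given c = C_OBS.
def log_post(a, b):
    return (log_norm(a, 0.0, 1.0)
            + log_norm(b, 0.0, 1.0)
            + log_norm(C_OBS, a + b, 0.1))

a, b = 0.0, 0.0
samples = []
for step in range(20000):
    a2 = a + random.gauss(0.0, 0.15)  # propose a nearby point
    b2 = b + random.gauss(0.0, 0.15)
    # "warmer" moves are always taken; "colder" ones only sometimes
    if math.log(random.random()) < log_post(a2, b2) - log_post(a, b):
        a, b = a2, b2
    if step >= 2000:  # discard burn-in
        samples.append((a, b))

mean_sum = sum(a + b for a, b in samples) / len(samples)
```

In this toy case the conjugate math can be done by hand: the posterior mean of a + b is about 1.99, so the sampler's `mean_sum` can be checked against it. Backtracking search has no analogue of this density signal.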
There is research in probabilistic logic programming languages, which can be seen as combining some of the strengths of both paradigms, though that's not covered by the introduction linked in the OP.
If we were to delve into details a little further, being able to make predictions from a generative stochastic model would call for efficient samplers, and being able to infer parameters might call for good variational optimization schemes (and various other methods on the landscape of Bayesian inference).
by allowing general programming language constructs, you can define very rich and interesting data-generating distributions.
one can then execute the program to generate samples. more importantly, given observed data, one can "use Bayes" to update the priors into posteriors.
this is usually computationally nontrivial due to the Bayes rule denominator aka "partition function", but there are various good algorithms to do it approximately. One of the most popular (used in Stan) is known as Hamiltonian Monte Carlo.
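for flavor, here is a self-contained toy HMC sampler targeting a one-dimensional standard normal. this is only the bare leapfrog-plus-accept/reject core; Stan's actual implementation adds step-size and mass-matrix adaptation and the NUTS termination criterion, none of which is shown here:

```python
import math, random

random.seed(0)

# Target: unnormalized log-density of a standard normal (the constant
# "partition function" term cancels in the accept/reject step).
def log_p(x):
    return -0.5 * x * x

def grad_log_p(x):
    return -x

def hmc_step(x, step=0.1, n_leapfrog=20):
    p = random.gauss(0.0, 1.0)        # resample momentum
    x_new, p_new = x, p
    # leapfrog integration of Hamiltonian dynamics
    p_new += 0.5 * step * grad_log_p(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += step * p_new
        p_new += step * grad_log_p(x_new)
    x_new += step * p_new
    p_new += 0.5 * step * grad_log_p(x_new)
    # Metropolis correction on the total energy (Hamiltonian)
    h_old = -log_p(x) + 0.5 * p * p
    h_new = -log_p(x_new) + 0.5 * p_new * p_new
    if math.log(random.random()) < h_old - h_new:
        return x_new
    return x

x, samples = 0.0, []
for _ in range(5000):
    x = hmc_step(x)
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

the gradient is what lets HMC make long, informed moves through the posterior instead of a blind random walk -- the same "warmer/colder" signal mentioned upthread, but used continuously.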
What's often most interesting is the general character of the posterior -- its modes, the shape of the variance around those modes, etc. High-quality sampling helps with this, whereas being able to calculate the PDF exactly at particular points sometimes does not.