Two additions to this:
- AFAIK, the probability distribution is based on how common each adjacency is in the input. If the input is just a set of rules rather than a small sample map, every adjacency is equally likely, but if the input is an image showing (for example) that only one in ten roads dead-ends, that ratio carries over to the generated output.
- It can also do an "overlapping" model, where it sets one pixel at a time but considers overlapping tiles of 2x2 or 3x3 pixels. This breaks away from the grid structure, but requires a full sample image, not just a list of tiles.
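
To make both points concrete, here is a rough Python sketch of how those statistics could be pulled out of a sample. It is not taken from any particular implementation. The function names, the tiny `sample` map, and the choice to count only left-to-right neighbours are all assumptions made for illustration, and as far as I know real implementations also look at the other directions and often at rotations and reflections.

```python
from collections import Counter

def adjacency_weights(sample):
    """Count how often tile a sits immediately to the left of tile b."""
    counts = Counter()
    for row in sample:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] += 1
    return counts

def overlapping_patterns(sample, n=2):
    """Count every n x n window in the sample (no wrap-around at the edges)."""
    height, width = len(sample), len(sample[0])
    counts = Counter()
    for y in range(height - n + 1):
        for x in range(width - n + 1):
            # Each window is stored as a tuple of row slices so it can be a dict key.
            window = tuple(sample[y + dy][x:x + n] for dy in range(n))
            counts[window] += 1
    return counts

# Hypothetical sample map: R = road, G = grass.
sample = [
    "RRRR",
    "GGRG",
    "GGRG",
]

weights = adjacency_weights(sample)
patterns = overlapping_patterns(sample, n=2)
```

The counts in `weights` would act as relative weights when picking a neighbour, so an adjacency that is rare in the sample stays rare in the output. The keys of `patterns` are the overlapping 2x2 tiles, which is why that model needs a full sample image to draw from.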