Technically, it's not pattern matching. It's estimating conditional probabilities and sampling from them. And under the hood, what it actually does is decided by the building blocks (like QKV attention, aka a probabilistic hashmap) and the optimization used, regardless of any theory behind it.
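To make that a bit more concrete, here's a rough toy sketch of both ideas. Everything in it is my own illustration, not taken from any real model: the names `soft_lookup` and `sample_next`, the tiny vectors, and the three-word vocabulary are all made up. It assumes plain scaled dot-product attention with a single query and no learned projections.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_lookup(query, keys, values):
    """Attention as a 'probabilistic hashmap' (hypothetical helper).

    A normal hashmap returns the one value whose key matches exactly.
    Here every key matches to some degree, and the result is the
    weighted average of all values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


def sample_next(logits, vocab, rng):
    """Estimate P(next token | context) from logits, then sample from it."""
    probs = softmax(logits)
    token = rng.choices(vocab, weights=probs, k=1)[0]
    return token, probs


# Toy data: two keys, two values, and a query that mostly matches key 0.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, weights = soft_lookup([4.0, 0.0], keys, values)

# Toy logits over a made-up vocabulary; the seed only makes runs repeatable.
vocab = ["cat", "dog", "fish"]
token, probs = sample_next([2.0, 1.0, 0.1], vocab, random.Random(0))

print("attention weights:", weights)
print("blended value:", out)
print("next-token probabilities:", probs)
print("sampled token:", token)
```

The point of the toy: the lookup never returns a single "matched" entry, only a blend weighted by similarity, and the output token is a draw from a distribution rather than a retrieved pattern.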