Models do know things. Facts are encoded in their parameters. Look at some of the interpretability research to see that. They aren't just Markov chains.
Nope. They don't know any specific facts. The training data produces a probability matrix that reflects which words are likely to be found in relation to other words, allowing the model to generate novel combinations of words that are coherent and understandable. But there is no mechanism involved for determining whether those novel expressions are actually factual representations of reality.
Again, read the papers. They absolutely do know facts, and that can be seen in the activations. Your description is oversimplified. It's easy to get models to emit statistically improbable but correct sequences of words. They are not just looking at which words appear near each other; that alone doesn't produce the kind of output LLMs are capable of.
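To make "seen in the activations" concrete: one common technique in those papers is a linear probe, where you train a tiny classifier on the model's hidden states and check whether it can read something like the truth of a statement back out. Here's a rough sketch of the idea, with the model (GPT-2), the layer index, and the example sentences chosen arbitrarily for illustration, not taken from any particular paper:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

# Toy true/false statements; a real probe uses thousands, with a held-out test split.
statements = [
    ("Paris is the capital of France", 1),
    ("Berlin is the capital of Germany", 1),
    ("Tokyo is the capital of Japan", 1),
    ("Madrid is the capital of Spain", 1),
    ("Paris is the capital of Italy", 0),
    ("Berlin is the capital of Brazil", 0),
    ("Tokyo is the capital of Canada", 0),
    ("Madrid is the capital of Egypt", 0),
]

feats, labels = [], []
for text, label in statements:
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    # Hidden state of the last token at a middle layer (the layer choice here is arbitrary).
    feats.append(out.hidden_states[6][0, -1].numpy())
    labels.append(label)

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print(probe.score(feats, labels))  # training accuracy; a real probe reports held-out accuracy
```

The toy script itself proves nothing; the point is just to show what "reading a fact out of the activations" looks like in practice, which the actual papers do carefully with held-out data, layer sweeps, and controls.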
Exactly. People forget that we did make systems that were just Markov chains long before LLMs, like the famous Usenet poster "Mark V. Shaney" (created by Rob Pike of Plan 9 and Golang fame), which was trained on Usenet posts in the 1980s. You didn't need deep learning or any sort of neural nets for that. It could come up with sentences that sometimes made some sort of sense, but that was it. The oversimplified way LLMs are sometimes explained makes it sound like they are no different from Mark V. Shaney, but they obviously are.
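For anyone who hasn't seen one, a word-pair Markov chain generator fits in a couple dozen lines of Python. This is a toy sketch of the general technique, not Pike's actual program, and the one-line corpus is a stand-in for the Usenet archive Mark V. Shaney was actually fed:

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    # The whole "model": a table mapping each `order`-word prefix to the
    # words observed to follow it in the training text.
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, order=2, length=30):
    # Start from a random prefix, then repeatedly pick one of the words
    # that followed the current prefix somewhere in the training data.
    prefix = random.choice(list(chain))
    output = list(prefix)
    for _ in range(length):
        followers = chain.get(tuple(output[-order:]))
        if not followers:
            break
        output.append(random.choice(followers))
    return " ".join(output)

# Stand-in corpus; Mark V. Shaney used real Usenet posts.
corpus = "the cat sat on the mat and the dog sat on the rug and the cat saw the dog"
print(generate(build_chain(corpus)))
```

Feed that table a few megabytes of real posts and you get Shaney-style output: locally grammatical, occasionally funny, globally meaningless. There is nowhere in that lookup table for a fact to live, which is exactly where the comparison to LLMs breaks down.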