There's very little difference between a contextual predictive model like this and the guts of a compressor.
If your prediction is good enough that you can always narrow each character down to two candidates, each with a 50% chance of being correct, then obviously you can compress your input down to one bit per character by storing just enough information to tell you which candidate to pick. More generally, you can use arithmetic coding to do the same thing with an arbitrary set of letter probabilities, and a probability for every possible next character is exactly what you get as the output of a neural network.
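To make the arithmetic coding step concrete, here is a toy sketch in Python. It uses exact fractions rather than the fixed-width integer arithmetic a real coder would use, it assumes the decoder is told the message length out of band, and the names (`encode`, `decode`, `coin`, `skewed`) are made up for illustration. The `model` argument stands in for the neural network: any function that maps the text so far to a dict of next-character probabilities.

```python
from fractions import Fraction
import math

def subinterval(low, width, probs, symbol):
    """Return the slice of [low, low + width) that belongs to `symbol`."""
    cum = Fraction(0)
    for s, p in probs.items():
        if s == symbol:
            return low + width * cum, width * p
        cum += p
    raise KeyError(symbol)

def encode(text, model):
    """Narrow [0, 1) once per character, then name a point inside the result."""
    low, width = Fraction(0), Fraction(1)
    for i, ch in enumerate(text):
        low, width = subinterval(low, width, model(text[:i]), ch)
    # Smallest nbits with 2**-nbits <= width: some multiple of 2**-nbits
    # is then guaranteed to fall inside the final interval.
    nbits = 0
    while Fraction(1, 2 ** nbits) > width:
        nbits += 1
    code = math.ceil(low * 2 ** nbits)
    return code, nbits

def decode(code, nbits, length, model):
    """Replay the same narrowing, picking whichever slice contains the point."""
    x = Fraction(code, 2 ** nbits)
    low, width = Fraction(0), Fraction(1)
    out = ""
    for _ in range(length):
        probs = model(out)
        for s in probs:
            lo, w = subinterval(low, width, probs, s)
            if lo <= x < lo + w:
                out, low, width = out + s, lo, w
                break
    return out

# Hypothetical stand-ins for the network: they ignore the context entirely.
def coin(context):
    return {"a": Fraction(1, 2), "b": Fraction(1, 2)}

def skewed(context):
    return {"a": Fraction(3, 4), "b": Fraction(1, 4)}

code, nbits = encode("abba", coin)
print(nbits, decode(code, nbits, 4, coin))    # 4 abba  (one bit per character)
code, nbits = encode("aaaa", skewed)
print(nbits, decode(code, nbits, 4, skewed))  # 2 aaaa  (good guesses cost less)
```

The 50/50 model lands on exactly one bit per character, and the better the model's guesses, the wider the final interval and the fewer bits it takes to point into it.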
When the blog post says the model achieved a performance of "1.57 bits per character", that's just another way of saying "if we used the neural network as a compressor, this is how well it would perform."
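For what that number means in practice, here is a rough sketch, assuming "bits per character" is the average of -log2(p) over the probabilities the model assigned to the characters that actually occurred, which is the cost an ideal arithmetic coder would pay. The function names are mine, not from the blog post.

```python
import math

def bits_per_character(p_actual):
    """Average ideal code length, given the probability the model
    assigned to each character that actually came next."""
    return sum(-math.log2(p) for p in p_actual) / len(p_actual)

def compressed_size_bytes(n_chars, bpc):
    """Approximate output size of an ideal coder at `bpc` bits per character."""
    return n_chars * bpc / 8

print(bits_per_character([0.5, 0.5, 0.5]))     # 1.0, the coin-flip case
print(compressed_size_bytes(1_000_000, 1.57))  # about 196,250 bytes
```

So at 1.57 bits per character, a million characters of 8-bit text would shrink to a bit under 20% of their original size, not counting the size of the network itself.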
It's a compression of Wikipedia in the sense that the NN generates probability estimates for the next character given the previous ones; the gibberish is simply what you get by greedily asking the NN, over and over, for the most likely next character. However, plug it into an arithmetic coder and start feeding in an actual Wikipedia corpus, and hey presto! A pretty high-performance Wikipedia compressor [1], which works well on Wikipedia text but not so well on other texts (like this one, with its lack of brackets).
[1] http://prize.hutter1.net/