In the training phase, the network learns to predict the next note from the past few beats. The training data is a collection of MIDI files converted to a text format. In generative mode, the sampled notes are simply fed back into the network as input at each step. The same approach can be used to generate DeepDrumpf[2] or any other kind of time series.
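For a concrete picture of that generative feedback loop, here is a minimal NumPy sketch in the spirit of Karpathy's min-char-rnn gist. The toy note vocabulary, the dimensions, and the random (untrained) weights are placeholders of my own; a real model would first learn Wxh, Whh, and Why from the training data:

    import numpy as np

    # Toy "note" vocabulary and random, untrained weights -- placeholders,
    # not a trained model.
    vocab = list("CDEFGAB ")
    V, H = len(vocab), 32
    rng = np.random.default_rng(0)
    Wxh = rng.normal(0, 0.01, (H, V))   # input  -> hidden
    Whh = rng.normal(0, 0.01, (H, H))   # hidden -> hidden
    Why = rng.normal(0, 0.01, (V, H))   # hidden -> output logits

    def sample(seed_ix, n):
        # Generate n tokens, feeding each sample back in as the next input.
        h = np.zeros(H)
        x = np.zeros(V); x[seed_ix] = 1.0
        out = []
        for _ in range(n):
            h = np.tanh(Wxh @ x + Whh @ h)       # recurrent state update
            p = np.exp(Why @ h); p /= p.sum()    # softmax over next token
            ix = rng.choice(V, p=p)              # sample the next token
            x = np.zeros(V); x[ix] = 1.0         # the feedback step
            out.append(vocab[ix])
        return "".join(out)

    print(sample(0, 40))

The key bit is the second-to-last line of the loop body: the sampled token becomes the next input, so the network conditions on its own output.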
I believe Karpathy's char-rnn is from 2015 (that is at least the year on the 100-line gist and the "Unreasonable Effectiveness of Recurrent Neural Networks" blog post[0]), but char-level RNN language models date back to at least 2011 with [1].
So I don't think we should call him the inventor, though he definitely popularised the idea with his great writing and examples.
[0] http://karpathy.github.io/2015/05/21/rnn-effectiveness/
[1] Sutskever, Ilya, James Martens, and Geoffrey E. Hinton. "Generating text with recurrent neural networks." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
[2] https://twitter.com/deepdrumpf
Funny, this is DeepDrumpf badmouthing Andrej Karpathy, the inventor of this algorithm (char-rnn):
https://twitter.com/karpathy/status/705558159964803072