That video is fantastic; I watched it a week or so ago.
Your explanation also explains why computer-performed music is so off. It still has that uncanny valley effect. So when Sony had a computer generate a "Beatles-esque pop song", they still had a human perform and produce it. But at that point there's so much creativity and human-added value on top of it that I don't think it's fair to call it computer generated imho.
Yes. I can tell you a little more about that, too, since I used to research this stuff and still think about it a lot.
One of my models of music is that it's an external model of a regulated system, one that parallels and trains our own habits and responses. E.g. a song demonstrates tension and release similar to our own lives. The level of tension in a song before release occurs can inform us how much tension we should accept before performing some release activity.
Music's rhythms also inform the pace of our work. E.g. verse-chorus-verse represents switching between two different activities. Even the pitch of a single note acts as a reference for the intensity of sensation we should use in our own lives. E.g. thrash metal listeners enjoy sudden shifts into massive intensity that then holds there. Dubstep listeners are training themselves for unusual but rather intense aesthetics leading up to a disproportionate release. Classical music tends to be for "long-chain thinkers" tumbling ideas over from various perspectives, e.g. writers, politicians, and doctors, not factory workers.
With that as a background, consider that a live instrument is also a physical system with a human controlling it interactively. The live system is a bit different every time. Here's the critical part: the human must listen and provide instantaneous feedback to a varying system in order to present the piece of music as a proper response model of a regulated system. If the player fails to do this, the model communicated by the performance is different.
In open-loop systems, such as a sequencer, there is no (or limited) interaction between the player and the sound, so an incidental model emerges. That incidental model represents an unintended, and therefore most likely irrelevant, model of how to interact with reality. E.g. it relieves tension where no relief was needed, or it lingers on an idea long after a human's novelty-seeking circuit has starved.
Some people, e.g. in discussions of unstable filters like the TB-303, chalk up the differences between performances to the instrument being random... However, they're missing the closed-loop portion of the performance, in which the performer reacts to the unpredictability of the instrument in order to maintain the model. In other words, the score and notes are not the music; the performer's response to the environment the score sets up is the music.
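If it helps to see the open-loop vs. closed-loop difference in numbers, here's a rough toy sketch. Everything in it is made up for illustration: the `drift` function is a stand-in for an unstable instrument (not a model of a real TB-303 filter), and the "performer" is just a simple proportional correction with a hypothetical `gain`. With `gain=0` the control never reacts to what was heard, like a sequencer; with a positive `gain` the player hears each note and nudges the next one.

```python
import math


def drift(t):
    """Made-up instrument drift: a slow wobble plus a gradual detune."""
    return 0.3 * math.sin(t / 10.0) + 0.002 * t


def perform(target, steps, gain):
    """Return the mean absolute gap between the intended and heard sound.

    gain == 0 models a sequencer (open loop): the control never changes.
    gain > 0 models a player who listens and corrects the next note.
    """
    control = target
    total_error = 0.0
    for t in range(steps):
        heard = control + drift(t)   # the instrument colors the output
        error = target - heard       # what the listener-performer notices
        total_error += abs(error)
        control += gain * error      # closed loop: react to what was heard
    return total_error / steps


open_loop = perform(target=1.0, steps=200, gain=0.0)
closed_loop = perform(target=1.0, steps=200, gain=0.5)
print(f"open loop mean error:   {open_loop:.3f}")
print(f"closed loop mean error: {closed_loop:.3f}")
```

The closed-loop run stays much closer to the intended sound because each correction only has to absorb one step's worth of drift. A real performer is obviously doing something far richer than this, so treat it as a cartoon of the idea, not a model of performance.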
To return to your uncanny valley observation, the "unstable filter creates variations" crowd has a parallel in the use of Perlin noise to subtly animate human models so they don't look so dead. However, that approach is incomplete because it doesn't use (short-term) feedback to determine when the movement suffices to be convincing. That feedback is the essence of performance.
In theory, computer scientists could implement these feedback models in performance to make the sounds more realistic. They could be used in synthesis, but the playback would still require observation of the listener! Which is possible. Personally, I just prefer playing electronic instruments live over using sequencers. It's only the sounds of electronic music I like, the zaps, peowms, zizzes, pews, and poonshes, etc. I don't care for electronics/computers to perform for me.