I have some questions (mainly to improve my own understanding):
1. Since the data is MIDI-encoded, would a convolution hold any merit here? I suppose you could render to an MP3 and analyze the audio itself, but that seems very computationally expensive and prone to overfitting. (A sketch of the kind of convolution I mean follows this list.)
2. If we're training a scoring classifier, we'd need labeled data, but getting those labels seems very challenging, not least because of how subjective our impressions of melodies can be (for instance, a fan of atonality would rate melodies drastically differently from a fan of pop). Do you have any ideas on how to mitigate this?
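To make question 1 concrete, here's a minimal sketch of the approach I'm picturing: rasterize the MIDI into a piano-roll (pitch × time) grid and convolve over that directly, skipping audio rendering entirely. The note tuples, grid dimensions, and kernel size below are all hypothetical, just to illustrate the shape of the idea.

```python
# Sketch: convolving over a piano-roll built from symbolic MIDI data.
# All note data and dimensions here are made up for illustration.
import numpy as np
import torch

def to_piano_roll(notes, n_pitches=128, n_steps=64):
    """Rasterize (pitch, onset_step, duration_steps) tuples into a binary grid."""
    roll = np.zeros((n_pitches, n_steps), dtype=np.float32)
    for pitch, onset, duration in notes:
        roll[pitch, onset:onset + duration] = 1.0
    return roll

# Hypothetical melody: a C4-E4-G4 arpeggio, 8 steps per note.
notes = [(60, 0, 8), (64, 8, 8), (67, 16, 8)]
roll = torch.from_numpy(to_piano_roll(notes))
roll = roll.unsqueeze(0).unsqueeze(0)  # shape: (batch, channel, pitch, time)

# A kernel spanning one octave in pitch and 8 time steps could pick up
# local interval/rhythm patterns straight from the symbolic data.
conv = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=(12, 8))
features = conv(roll)
print(features.shape)  # torch.Size([1, 4, 117, 57])
```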