MIDI songs? I checked and couldn't find any at the link you posted. Most were module formats such as XM, Protracker, S3M and Impulse Tracker. Those have nothing to do with MIDI other than that they also produce music.
At one point (I think maybe in connection with some mobile phones being able to play .midi files?), "MIDI songs" came to refer, incorrectly, to a style or type of music rather than to the transport/protocol we use for sending notes between instruments and devices, or to the file format.
Ever since, I've assumed that when someone says "MIDI music" they really mean "really basic/simple music", or sometimes just straight up "chiptune".
It has nothing to do with MIDI really, just a misnomer.
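To make the "transport/protocol" point concrete, here is a minimal sketch of what a single MIDI Note On message looks like on the wire: three bytes and no audio at all. The helper name `note_on` is made up for illustration and does not come from any library.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte MIDI Note On message.

    channel: 0-15, note: 0-127 (60 is middle C), velocity: 0-127.
    `note_on` is a hypothetical helper, not a library function.
    """
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15, note and velocity 0-127")
    # Status byte: upper nibble 0x9 means Note On, lower nibble is the channel.
    return bytes([0x90 | channel, note, velocity])


msg = note_on(0, 60, 100)
print(msg.hex(" "))
```

How those bytes end up sounding depends entirely on whatever instrument or synth receives them, which is why "MIDI" cannot really name a style of music.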
Right. Basically using simple waveforms, either from samples or from an onboard chip like the ZX Spectrum's. Tracker modules with more "normal" samples we simply referred to as modules, or mods for short.
IMHO, a “chiptune” is music for an FM synthesis chip, like on the NES, the SID chip in Commodore 64, or the AdLib sound card for PC. A “mod” or “tracker music” is music made for a range of platforms, within a rather narrow window of time, that could play digital samples but could not reasonably store entire digitally recorded songs, like the Amiga, Atari ST, or early PCs like 386s or 486s.
Neither the NES nor the SID employs FM synthesis. I'm not even sure what the collective noun is for these. Wikipedia tells me it's PSG (programmable sound generator).
The same behavior could be (and was) teased out of a MOD player if you chose samples with only a handful of sample points, like 12. You could also draw up a sawtooth in paint and use that as a sample. These are down-to-earth honest true Scotsman chiptunes.
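A rough sketch of that trick, assuming signed 8-bit sample data (as in MOD files) and a naive nearest-neighbor resampler. The function names and the 44100 Hz output rate are illustrative assumptions, not how any particular tracker is implemented.

```python
def tiny_sawtooth(points=12):
    """One cycle of a rising sawtooth as signed 8-bit values (-128..127)."""
    return [round(-128 + 256 * i / points) for i in range(points)]


def render(sample, freq_hz, n_frames, rate=44100):
    """Loop a one-cycle sample so that it repeats freq_hz times per second.

    Uses nearest-neighbor stepping through the sample; real players
    differ in how they step and interpolate (assumption for this sketch).
    """
    out = []
    phase = 0.0
    step = freq_hz * len(sample) / rate  # sample points advanced per output frame
    for _ in range(n_frames):
        out.append(sample[int(phase) % len(sample)])
        phase += step
    return out


saw = tiny_sawtooth()
frames = render(saw, 440.0, 1000)
print(saw)
```

With only 12 points per cycle, the looped sample behaves much like a fixed-waveform oscillator, which is where the chip-like sound comes from.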
I'd say 'MIDI music' became a catch-all for music that's represented as data that is in turn triggering samples, rather than being a pure audio file. Might be actual MIDI or might be tracker music etc.
No, those are the original formats. MOD (Soundtracker [0], Protracker, etc.) dates from the late eighties; IT (Impulse Tracker), XM (FastTracker 2) and S3M (ScreamTracker 3) are all from the nineties.
As far as I've seen, local OSS video understanding models just really aren't there yet. I briefly looked at facial recognition models but a good amount of signal was actually in the video's audio instead of the raw video frames. Depends on the accuracy you're looking for at the end of the day.