My next iteration of that project was going to be creating NTSC output instead of digital TTL... so I'm pretty excited to see this, as the DAC on the dev board I used should be able to create "artifact colors" like this, I think.
As with other non-composite CRTs, the 5154 doesn't actually have a pixel clock so you don't need to output pixels at exactly 14.318181MHz - the timings of the horizontal and vertical sync signals are the critical thing. Using a 16MHz or 13.714MHz pixel clock would probably be more convenient for a 96MHz microcontroller.
Composite NTSC will create some interesting new challenges, though. I suspect the fact that some of your pixels are 6 cycles and some are 7 cycles will cause visible banding. It might be possible to eliminate this by taking your 14.318MHz source data and doing a proper band-limited resampling of it to 16MHz or 13.714MHz instead of the nearest-neighbor interpolation you're essentially doing at the moment. You'd probably need an output DAC with more than 1 or 2 bits of dynamic range to do that, though. One of my pending projects is to do something very similar, generating composite output from a VGA card (pixel clock 14.161MHz).
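The band-limited resampling idea above can be sketched in a few lines. This is a rough illustration, not the pending project's actual code: it assumes SciPy's polyphase resampler and uses the fact that the CGA pixel rate is exactly 4x the NTSC color carrier (315/22 MHz ≈ 14.3181818 MHz), which makes 16MHz a rational ratio of 352/315.

```python
import numpy as np
from scipy.signal import resample_poly

# CGA pixel rate: 4 * (315/88 MHz color carrier) = 315/22 MHz.
# Ratio to 16 MHz: 16 / (315/22) = 352/315 (already in lowest terms).
def resample_to_16mhz(samples_14318):
    # Polyphase FIR resampling: band-limited interpolation, so no
    # pixel ends up 6 output cycles wide while its neighbor gets 7.
    return resample_poly(samples_14318, up=352, down=315)
```

The output needs more than 1-2 bits of DAC resolution, as noted above, because the resampled values fall between the original levels.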
If I'm reading this correctly, they basically took advantage of the Nyquist frequency[1] of the NTSC carrier signal to push out frequencies unsupported by CGA?
Not exactly... The Nyquist frequency is half of the sampling rate of a digital signal. There isn't really any sampling going on here. The CGA does generate a signal with a pixel rate of 14.318MHz, so the Nyquist frequency of that is 7.16MHz. But that discreteness is not directly related to the color carrier frequency of 3.58MHz - inside a (sufficiently old) composite monitor, all the processing is analog.
As for whether this method of generating colour was supported by IBM - as far as I can tell, it wasn't. There is no mention of it in IBM documentation, and the artifact colours in later IBM composite machines (in particular the PCjr) were different. That didn't stop games programmers from using it, though (at least in the classic and reasonably consistent 16-colour variant).
It's surprisingly straightforward - take your input composite signal C and multiply it by a sine wave and a cosine wave (both at the 3.58MHz carrier frequency). Then take the three resulting waveforms (C, C·sin and C·cos) and filter each to remove frequencies of 3.58MHz and above. If your sample rate is 14.318MHz (4 samples per color carrier cycle) then any filter kernel whose polynomial has (1,1,1,1) as a factor will work for this. I use (1,4,7,8,7,4,1). The resulting three waveforms are Y, I and Q, which you can plug into the matrix at http://en.wikipedia.org/wiki/YIQ to get RGB. You'll need to fix up the phase to get the right colours (this is what the color burst pulse does in the actual hardware). You might have to swap sin and cos as well - I'm just writing this from memory.
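Here's a sketch of that decode in numpy. The function name and the `phase` parameter are my own; everything else follows the recipe above ((1,4,7,8,7,4,1) is (1,1,1,1) convolved with (1,3,3,1), so it has the required (1,1,1,1) factor and nulls 3.58MHz exactly):

```python
import numpy as np

SAMPLES_PER_CYCLE = 4  # sample rate 14.318MHz / carrier 3.58MHz

def decode_composite(c, phase=0.0):
    """Decode a composite scanline `c` (1-D array, 4 samples per color
    carrier cycle) into a 3xN array of R, G, B. `phase` stands in for
    the hue fix-up the color burst provides in real hardware."""
    n = np.arange(len(c))
    w = 2 * np.pi * n / SAMPLES_PER_CYCLE + phase
    # Multiply by cosine and sine of the carrier frequency.
    i_raw = c * np.cos(w)
    q_raw = c * np.sin(w)

    # Low-pass kernel with a (1,1,1,1) factor, normalized to unity gain.
    k = np.array([1, 4, 7, 8, 7, 4, 1], dtype=float)
    k /= k.sum()
    y = np.convolve(c, k, mode='same')
    i = 2 * np.convolve(i_raw, k, mode='same')  # x2: demodulation halves amplitude
    q = 2 * np.convolve(q_raw, k, mode='same')

    # YIQ -> RGB matrix from the Wikipedia page linked above.
    m = np.array([[1.0,  0.956,  0.619],
                  [1.0, -0.272, -0.647],
                  [1.0, -1.106,  1.703]])
    return m @ np.vstack([y, i, q])
```

For a flat gray input the I and Q channels decode to zero and R=G=B=Y, as you'd expect; for real CGA output you'd sweep `phase` until the known colors look right.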
This is basically correct, but the I and Q signals need to be filtered much more drastically (down to a bandwidth of about 1.5MHz) to avoid artifacts, and more luma detail can be obtained by using a comb filter (summing adjacent lines to cancel the color signal, since the chroma phase on each line is shifted 180 degrees from the previous one).
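A minimal sketch of that two-line comb, assuming a standards-compliant signal (227.5 carrier cycles per line, so chroma inverts on each successive line) and a 2-D array of carrier-locked composite samples with one row per scanline:

```python
import numpy as np

def comb_filter(field):
    """Two-line comb: average adjacent scanlines to recover luma
    (opposite-phase chroma cancels), difference them to recover chroma
    (common luma cancels). `field` is rows-of-scanlines."""
    above = np.roll(field, 1, axis=0)  # previous scanline (wraps at top)
    luma = (field + above) / 2
    chroma = (field - above) / 2
    return luma, chroma
```

Note this only separates cleanly where adjacent lines carry similar picture content; vertical luma edges leak into chroma, which is why real decoders blend comb and notch filtering.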
This is true for proper standards-compliant NTSC signals, but a comb filter won't help you for CGA since the CGA card generates an integer number of color carrier cycles per scanline (228, not 227.5). So the phase isn't reversed each line and there's no way to get any extra detail.
It's cool to see this property exploited to such an extent.