done! paste this into shadertoy (and I de-lurked on HN after 3 years to do this; who knew)
nice effect, but it pains me that a multicore CPU implementation can be SO SLOW. modern PCs are fast, you know? not just the GPU... oh well.
true, true! and I apologise for sounding whingy before, I do not mean to rag on you or the OP (I know nothing about what is good/idiomatic Haskell and how that relates to efficient Haskell). but it still feels damn slow: multiple seconds to make that image!
to put money where my (gut's?) mouth is, the dumb transliteration of my WebGL shader to C++, compiled by MSVC in release mode on my win32 machine, takes 100ms to compute a frame at 800x600, on a single core, with precisely no tuning or effort.
with #pragma omp magic, equivalent in pain to the OP's point about almost-free parallelisation in GHC, I imagine that would drop to around 20ms on 8 cores. and if I used an SSE vector class, probably another 2x, but that could legitimately be disallowed as overly complex.
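for anyone curious what the "#pragma omp magic" amounts to, here is a minimal sketch, not the actual port. `shade` and `render` are hypothetical names, and the maths inside `shade` is a made-up placeholder rather than the real effect; the only point is that each pixel is independent, so a single pragma on the row loop is all the parallelisation there is. no timings are claimed for this version.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shader body: maps a pixel coordinate to a
// grey level. The real effect's maths is NOT reproduced here.
inline std::uint8_t shade(int x, int y, int width, int height) {
    float u = (x + 0.5f) / width;   // normalised coordinates in (0, 1)
    float v = (y + 0.5f) / height;
    float s = 0.5f + 0.5f * std::sin(10.0f * u) * std::cos(10.0f * v);  // in [0, 1]
    return static_cast<std::uint8_t>(s * 255.0f);
}

// Renders a width x height greyscale image, one byte per pixel, row-major.
std::vector<std::uint8_t> render(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<std::uint8_t> image(static_cast<std::size_t>(width) * height);

    // Rows are independent, so one pragma spreads them across cores.
    // If OpenMP is not enabled (/openmp on MSVC, -fopenmp on GCC/Clang),
    // the pragma is ignored and the loop simply runs on one core.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            image[static_cast<std::size_t>(y) * width + x] = shade(x, y, width, height);
        }
    }
    return image;
}
```

calling `render(800, 600)` fills a 480,000-byte buffer. the loop index is a signed `int` on purpose, since older OpenMP versions (including the one MSVC supports by default) require that.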
my point being, you're right, GPUs stomp over CPUs for this kind of work! but my gut told me that this image should not take long for 'even' a CPU to produce: 10 or 20ms without effort, sub-millisecond with effort (bytes rather than floats, asm, etc).
maybe I'm just lamenting the abuse of our modern CPUs, which are fantastically fast machines, even for stuff that they are not designed to excel at, like this.
"Embarrassingly parallel floating point operations" "for the win". The things that GPUs are better at than CPUs are very poorly modeled by the words "dynamic" or "realtime". They are happy to do long-term batch computations (and getting happier about it), and there are plenty of dynamic real-time things they are bad at, because they involve lots of branching.