As far as I'm aware OpenCL performance, while portable, is still somewhat inferior to that of straight CUDA on NVIDIA GPUs, hence my CUDA suggestion. I didn't actually look at the algorithms used in either so I can't comment, perhaps I'll do that tomorrow. What do you mean by using an FFT "for" a convolution? The algorithm is convolving the (presumably 2D) FFT of the game board with a disk filter?
SmoothLife uses two disks around each point to determine the instantaneous rate of change. Inside each disk the values are summed. This is a convolution with a disk filter.
As it says on that page, FFT is often used for convolution because it is fast: after applying a discrete Fourier transform to the kernel and the image, the resulting images must only be multiplied together before applying an inverse FFT.
Ah, thanks for the information. I'm familiar with signal processing, so the only way I'm used to seeing convolutions is through the multiplication of FFTs. I wasn't even considering the "regular" way.