Forgive my ignorance, but do you have any good references on multi-threaded cellular automata implementations? This is something I've been dabbling with myself.
You keep two grids. Clear grid 2, then go over grid 1: for each cell, calculate the new state from grid 1 and write it to the same cell in grid 2. At the end of a step you swap the grids.
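Since grid 1 is only read during a step and each cell of grid 2 is written exactly once, the loop over cells has no data races, so you can just slap a good old "#pragma omp parallel for" on it with OpenMP (a nice guide is here [0]).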
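To make that concrete, here is a rough sketch of the two-grid scheme with the pragma on the row loop. The rule (Conway's Game of Life), the wrap-around edges, the 64x64 size, and the names grid_new, step and run are all my own picks for illustration, not anything from the guide. Because step writes every cell of the second grid, this sketch skips the explicit clear.

```c
#include <stdlib.h>

/* Assumed for illustration: a 64x64 grid with wrap-around (torus) edges. */
enum { W = 64, H = 64 };

/* Hypothetical helper: allocate a grid with every cell dead (0). */
static unsigned char *grid_new(void) {
    return calloc((size_t)W * H, 1);
}

/* Count the live cells among the 8 neighbours of (x, y). */
static int count_neighbors(const unsigned char *g, int x, int y) {
    int n = 0;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0) continue;
            int nx = (x + dx + W) % W;
            int ny = (y + dy + H) % H;
            n += g[ny * W + nx];
        }
    }
    return n;
}

/* One generation: read only from cur, write only to next.
   Rows are independent, so they can be split across threads. */
static void step(const unsigned char *cur, unsigned char *next) {
    #pragma omp parallel for
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            int n = count_neighbors(cur, x, y);
            int alive = cur[y * W + x];
            /* Example rule (Game of Life): birth on 3, survival on 2 or 3. */
            next[y * W + x] = (unsigned char)(n == 3 || (alive && n == 2));
        }
    }
}

/* Run `steps` generations, swapping the two pointers after each one.
   Returns whichever buffer holds the final state. */
static unsigned char *run(unsigned char *a, unsigned char *b, int steps) {
    for (int i = 0; i < steps; i++) {
        step(a, b);
        unsigned char *tmp = a;
        a = b;
        b = tmp;
    }
    return a;
}
```

Build with something like gcc -O2 -fopenmp. Without -fopenmp the pragma is simply ignored and the same code runs on one thread, which is handy for checking that both versions give identical results.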