It's useful any time you are accessing higher-rank arrays in a spatially coherent manner without any directional bias.
Bit interleaving only works with power-of-two dimensions. That's limiting. But for the cache benefits you don't need self-similarity at all levels. For less regularly sized arrays, you can play the same trick with only some of the lower bits, and then it's usually called tiling.
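A rough sketch in C of both variants, assuming 2D indices; the tile size of 8 and the function names are arbitrary choices for illustration, not any particular hardware's layout:

    #include <stdint.h>
    #include <stddef.h>

    /* Spread the low 16 bits of v so abcd... becomes 0a0b0c0d... */
    static uint32_t part1by1(uint32_t v)
    {
        v &= 0x0000ffff;
        v = (v | (v << 8)) & 0x00ff00ff;
        v = (v | (v << 4)) & 0x0f0f0f0f;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    /* Full bit interleaving (Morton / Z-order): nearby (x, y) pairs land
       near each other in memory with no directional bias, but dimensions
       must be padded to powers of two. */
    static uint32_t morton2d(uint32_t x, uint32_t y)
    {
        return part1by1(x) | (part1by1(y) << 1);
    }

    /* Tiling: only the low bits get grouped; above that the layout is
       plain row-major over tiles, so dimensions only need to be padded
       to a multiple of the tile size, not to a power of two. */
    enum { T = 8 };

    static size_t tiled_index(size_t x, size_t y, size_t width_in_tiles)
    {
        size_t tile_x = x / T, tile_y = y / T;   /* which tile */
        size_t in_x   = x % T, in_y   = y % T;   /* position inside it */
        return (tile_y * width_in_tiles + tile_x) * T * T + in_y * T + in_x;
    }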
Scientific computing people also do this whenever they want to lay out a matrix in memory so that it efficiently supports both row-coherent and column-coherent access, which is required for basic operations like matrix multiplication. You can formulate tiled matrix multiplication in terms of smaller tile-by-tile multiplications.
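A minimal sketch of that formulation in C, assuming square row-major matrices whose dimension is a multiple of the tile size and an output that the caller has zeroed; the tile size of 64 is a placeholder you'd tune so three tiles fit in cache:

    #include <stddef.h>

    enum { TS = 64 };  /* tile size, assumed to divide n */

    /* C += A * B, computed as a sum of TS x TS tile-by-tile products.
       Each innermost triple loop touches only one tile of A, B and C,
       which is what gives both row- and column-coherent access. */
    void matmul_tiled(int n, const double *A, const double *B, double *C)
    {
        for (int i0 = 0; i0 < n; i0 += TS)
            for (int j0 = 0; j0 < n; j0 += TS)
                for (int k0 = 0; k0 < n; k0 += TS)
                    for (int i = i0; i < i0 + TS; i++)
                        for (int k = k0; k < k0 + TS; k++) {
                            double a = A[(size_t)i * n + k];
                            for (int j = j0; j < j0 + TS; j++)
                                C[(size_t)i * n + j] += a * B[(size_t)k * n + j];
                        }
    }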
The earliest GPUs supported only square power-of-two-sized textures using swizzling. Later ones like the GeForce FX supported non-square, non-power-of-two textures using a simple memory unit with tiling. But the total number of tiles in use was a limited, non-scalable resource, and once you exceeded the limit you fell off a performance cliff. Finally, the last two generations of GPUs (starting with the G80 microarchitecture in NVIDIA's case) have a more scalable multi-level tiling approach that combines the benefits of both swizzling and tiling, albeit with higher hardware complexity than either of them.
My coworker wrote a great blog post on fast multi-level tiling in software: http://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-s...