My startup (TidePowerd: http://www.tidepowerd.com) has a product called GPU .NET which JIT-compiles CIL (.NET bytecode) directly into GPU machine code; essentially, we've extended the .NET VM onto the GPU to make GPGPU coding as seamless as we possibly could.
We'll be releasing a new version next week with a much-improved API; if you're experimenting with GPGPU coding, please give it a try -- feedback is super-helpful in shaping the API into something that's both really powerful and really easy to pick up and start coding with.
Oh, and GPU .NET is written in F# -- which we don't support just yet for writing your GPU code, but we're hard at work adding that (likely around the end of November)!
For some things, like how your structs are organized/laid out, we haven't automatically optimized that yet -- but one advantage of using .NET (vs. native code like CUDA or OpenCL) is that the CLR spec allows a lot of freedom in implementation; so in the future, we could pretty easily implement some code to analyze your data layout / access patterns and reorganize things under the hood for better performance. All without you needing to rewrite your code, of course ;)
As time goes by, though, and we solidify the rest of our codebase, we'll be able to spend more time adding optimizations to the JIT compilers to get your code running as fast as the hardware allows, as often as possible.
"auto" lets the JIT compiler organize the fields in any order, with whatever padding bytes it wants. This is the CLR's default (although the C# compiler marks structs as "sequential" unless you tell it otherwise), and changing it is basically a tradeoff between speed now (where the JIT compiler may not recognize where it can optimize something) and speed later (when we add a new optimization and your code automatically executes faster).
"sequential" requires the JIT compiler to lay out the fields in the order they're defined, but it can add any padding bytes, etc. it wants to.
"explicit" forces you to specify the offset of each field, and forbids the compiler from re-ordering the fields or padding them in any way. It's rare to use this unless it's to handle interop'ing with a C library which uses some weird data structure as a parameter. You might get a speedup from using it in your GPU code, but since you've taken everything out of the hands of the compiler, there's little room for improvement/optimization.
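For reference, here's roughly what those three options look like in C#. Treat it as a sketch: the Particle structs and their fields are made up for illustration, and this is plain .NET code rather than anything specific to GPU .NET.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example types; only the attributes come from the framework.

// "auto": the runtime may reorder the fields and pad them however it likes.
[StructLayout(LayoutKind.Auto)]
struct AutoParticle
{
    public byte Flags;
    public double Mass;
    public int Id;
}

// "sequential": fields stay in declaration order; padding may still be added.
[StructLayout(LayoutKind.Sequential)]
struct SequentialParticle
{
    public byte Flags;
    public double Mass;
    public int Id;
}

// "explicit": you give the byte offset of every field yourself.
[StructLayout(LayoutKind.Explicit)]
struct ExplicitParticle
{
    [FieldOffset(0)]  public double Mass;
    [FieldOffset(8)]  public int Id;
    [FieldOffset(12)] public byte Flags;
}

static class LayoutDemo
{
    static void Main()
    {
        // Marshal reports the unmanaged (interop) view of a type. It only works
        // for sequential and explicit layouts, so AutoParticle is left out here.
        Console.WriteLine("sequential size: " + Marshal.SizeOf(typeof(SequentialParticle)));
        Console.WriteLine("explicit size:   " + Marshal.SizeOf(typeof(ExplicitParticle)));
        Console.WriteLine("explicit Id at:  " + Marshal.OffsetOf(typeof(ExplicitParticle), "Id"));
    }
}
```

The exact sizes printed depend on the platform's packing rules, which is why they aren't hard-coded in the comments. The one thing you can count on is that the explicit offsets are exactly what you wrote.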
Check out the docs on [StructLayout] for more info: http://msdn.microsoft.com/en-us/library/system.runtime.inter...