Definitely the easiest way I've seen to begin mucking around with GPU development.
My startup (TidePowerd : http://www.tidepowerd.com) has a product called GPU .NET which JIT-compiles CIL (.NET bytecode) directly into GPU machine code; essentially, we've extended the .NET VM onto the GPU to make GPGPU coding as seamless as we possibly could.
We'll be releasing a new version next week with a much-improved API; if you're experimenting with GPGPU coding, please give it a try -- feedback is super-helpful in shaping the API into something that's both really powerful and really easy to pick up and start coding with.
Oh, and GPU .NET is written in F# -- which we don't support just yet for writing your GPU code, but we're hard at work to add that (likely around the end of November)!
We don't yet automatically optimize some things, like how your structs are organized/laid out -- but one advantage of using .NET (vs. native code like CUDA or OpenCL) is that the CLR spec allows a lot of freedom in implementation; so in the future, we could pretty easily implement some code to analyze your data layout / access patterns and reorganize things under the hood for better performance. All without you needing to rewrite your code, of course ;)
As time goes by, though, and we solidify the rest of our codebase, we'll be able to spend more time adding optimizations to the JIT compilers to get your code running as fast as the hardware allows, as often as possible.
"auto" lets the JIT compiler organize the fields in any order, with any padding bytes, etc. it wants to. This is the default, and changing it is basically a tradeoff between speed now (where the JIT compiler may not recognize where it can optimize something) and speed later (when we add a new optimization and your code automatically executes faster).
"sequential" requires the JIT compiler to layout the fields in the order they're defined, but it can add any padding bytes, etc. it wants to.
"explicit" forces you to specify the offset of each field, and forbids the compiler from re-ordering the fields or padding them in any way. It's rare to use this unless it's to handle interop'ing with a C library which uses some weird data structure as a parameter. You might get a speedup from using it in your GPU code, but since you've taken everything out of the hands of the compiler, there's little room for improvement/optimization.
Check out the docs on [StructLayout] for more info: http://msdn.microsoft.com/en-us/library/system.runtime.inter...
On the other hand, as CPUs get more/better vector units, what's to stop a rep add or rep mul instruction from automatically vectorizing/parallelizing things for you?
Now, what follows is just my opinion: I think in the (relatively near) future you'll see a lot of data-crunching, high-performance code switching away from C to some of the newer functional languages, or even back to some older languages like FORTRAN -- it's easier to express certain kinds of data-parallelism in those languages, which makes less work for the developer while also making it easier for the compiler to generate optimal, vectorized code.
We need more of these technical links on HN.
Did it? I've been here for over 1300 days and I don't remember it ever being called Startup News.
I guess my memory isn't what it used to be. :)
Keep downvoting, though, just because you don't like hearing the truth about how this site is turning into what I described previously -- or, better yet, into emo posts about how xyz startup is failing and the founders wanting to know why people don't like them.