At our startup we are creating our own many-core processor, SiliconSqueak, and a VM along the lines of David Ungar's work. Writing non-deterministic software is fun; you just need to radically change your perspective on how to program. For Lisp and Smalltalk programmers this change of outlook is easy. We welcome coders who want to learn about it.
I find it curious, and slightly scary, that as the world stampedes toward increasingly parallel computing models, most of us in ASIC design are increasingly thwarted by the limitations of functional simulation, which, by and large, is single-threaded. We're supposed to be designing to keep up with Moore's law, yet our simulator performance has pretty much flat-lined. Even more alarming, I've heard very few ASIC people even talk about it.
First of all, we try to avoid simulating the ASIC design by debugging the design in FPGAs instead. We then simulate the working design in software on our own small supercomputer built from those FPGAs. Simulating on many cores and running the design in FPGAs should bring us to the point where we can do wafer-scale integration at 180 nm. Imagine 10,000 cores on an 8-inch wafer.
Our software stack uses adaptive compilation to reconfigurable hardware, so we can identify hotspots in the code and compile them to the FPGA at runtime. Eventually we will be able to write and debug the whole ASIC in our software, at runtime, on the FPGA.
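The general shape of that kind of adaptive compilation can be sketched in a few lines: count invocations in the interpreter, and once a method crosses a hotness threshold, hand it to the reconfigurable back end. This is only an illustrative sketch, not SiliconSqueak's actual runtime; `compile_to_fpga`, `invoke`, and the threshold value are all hypothetical names chosen for the example.

```python
from collections import Counter

HOT_THRESHOLD = 1000   # hypothetical cutoff for "hot" code
counts = Counter()     # invocation counts per method
compiled = {}          # methods already handed to the back end

def compile_to_fpga(method):
    # Placeholder: a real system would emit an FPGA configuration here.
    # We just return the method itself so the sketch stays runnable.
    return method

def invoke(method, *args):
    if method in compiled:
        return compiled[method](*args)   # fast path: "runs on the FPGA"
    counts[method] += 1
    if counts[method] >= HOT_THRESHOLD:
        compiled[method] = compile_to_fpga(method)
    return method(*args)                 # slow path: interpreted

def square(x):
    return x * x

for i in range(2000):                    # square becomes hot and is compiled
    invoke(square, i)
```

The point of the counter-based design is that the decision of what to offload is made from observed runtime behavior rather than ahead of time, which is what lets the whole stack be developed and debugged live on the FPGA.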
Simulating a single core is not too hard because our microcoded processor is small. The ring network connecting the cores, caches, memory, and four 10 Gbps off-chip communication channels is harder to simulate, though.
The way I imagine this working (please correct me if I'm wrong) is that there is quite a bit of speculative computation, and that various results get thrown away. Given that, what happens if you measure speed in results per watt rather than per second? Does the per-watt value tend toward a limit even as the per-second value increases? If so, will the re-introduction of determinism (via some new mechanism) be necessary to improve efficiency down the line?
I have not encountered any need to throw away results, i.e. speculative computation as you describe it. One example of programming non-deterministically is the swarm model (ants). You can, for example, write the text editor of a word processor this way, as described in http://www.vpri.org/pdf/tr2011004_steps11.pdf
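To make the swarm idea concrete, here is a minimal sketch (my own illustration, not code from the STEPS report): each character in a text buffer is an agent that only looks at its predecessor's position and places itself next to it. The agents run in an arbitrary, shuffled order each round, yet the layout still converges to the same result, so nothing speculative ever has to be discarded.

```python
import random

def layout(chars, width, rounds=100):
    """Swarm-style text layout: each character agent positions itself
    relative to its predecessor only; update order is non-deterministic."""
    pos = [(0, 0)] * len(chars)          # (row, col) guesses, start arbitrary
    agents = list(range(1, len(chars)))  # agent 0 is anchored at (0, 0)
    for _ in range(rounds):
        random.shuffle(agents)           # non-deterministic schedule
        for i in agents:
            row, col = pos[i - 1]        # look at the predecessor only
            col += 1
            if col >= width:             # wrap to the next line
                row, col = row + 1, 0
            pos[i] = (row, col)
    return pos

p = layout(list("hello swarm"), width=4)
```

No matter how the shuffle interleaves the agents, every round the first still-misplaced agent sees a correct predecessor and fixes itself, so the swarm settles into the unique correct layout without any central coordinator.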
Our SiliconSqueak processor is at the FPGA stage right now; we will make the first ASIC version this year. At this time I have no hard numbers on the performance-per-watt and price-per-performance of the ASIC. The energy use of the FPGA per result is an order of magnitude below that of an Intel x86 core, and we expect the ASIC to be much better.
Regardless of the efficiency of the code, I see no need to re-introduce determinism down the line.