
Stealthy startup Soft Machines launches virtual CPU cores - kjhughes
http://www.pcworld.com/article/2838018/stealthy-startup-soft-machines-launches-virtual-cpu-cores-that-trounce-traditional-processors.html
======
lgeek
I'm going to remain skeptical until they publish more details. The report[0]
linked from their homepage seems to contain more/better information, but it's
not much.

What seems pretty clear is that they use a proprietary ISA and have developed
dynamic binary translators for ARM and x86 code. It's not clear if it's a VLIW
architecture or if the ISA has any other properties required by the hardware.

From this report it sounds like they're essentially doing thread-level
speculation in hardware.

Based on the linked article I would have been tempted to think they have
reconfigurable pipelines, but based on the report I'm fairly sure that was
just a misunderstanding.

I generally think it's a bad sign when a company implements relatively well
known concepts from research, then avoids the standard terminology and
tries to pitch its creation as something 100% new and original. In any case,
there's no need to rush to judgment; I expect they'll publish better technical
information in time.

[0] http://www.softmachines.com/wp-content/uploads/2014/10/MPR-11303.pdf

~~~
ChuckMcM
Came here to say exactly the same thing. I get that you could create a fabric
of hard cores which could be combined into a complete execution pipeline, and
I love the idea of imagining something like an FPGA with integer execution
units rather than complex logic blocks as building blocks. But it feels like
the Lego Technic kit that builds a motorcycle: sure, you can build anything
you want with the parts, but the motorcycle is the only thing that makes
sense. In the same way I wonder whether a programmable fabric layered over a
series of chip building blocks wouldn't resolve down to a 'best' (or 'least
bad') solution and nothing else, at which point why not just add a metal mask
and make the chip non-programmable?

------
graycat
The article didn't make very clear the differences between Soft Machines and
some now quite old work by Michael J. Flynn on _universal host machine_ and
some work by Kemal Ebcioglu on using _very long instruction word_ (VLIW) on
code for nearly any instruction set.

Last I heard, Ebcioglu's execution timing simulations were getting 9:1 speedup
on IBM's 370 code via 24-way VLIW.

My favorite old idea was to find and offer some programming language
constructs that, in their implementation, could make good use of multiple
threads without the programmer having to consider multiple threads.

~~~
sharpneli
In reality VLIW turned out worse (see Itanium as an example) than the
currently favored superscalar, out-of-order execution with register renaming.
Both techniques try to exploit the instruction-level parallelism inherent in
pretty much all code.

The reason for the failure is that VLIW is stuck with whatever schedule the
compiler could find, while OoO finds one dynamically. In the end the only
real drawbacks of OoO vs VLIW are the chip area and the extra power
consumption required for the scheduling logic.
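To make that concrete, here is a toy Python sketch (all names invented, nothing like real hardware) of the bundling both approaches chase: grouping a linear instruction stream into bundles of mutually independent operations. A VLIW compiler forms these bundles statically; OoO hardware discovers them on the fly inside a scheduling window.

```python
# Toy illustration of instruction-level parallelism: group a linear
# instruction stream into bundles whose members have no dependencies
# on each other. Instruction = (dest_reg, src_regs); all invented.

def schedule(instrs):
    """Greedy list scheduling: each bundle holds instructions whose
    source registers were all produced in earlier bundles."""
    bundles = []
    ready = {}  # register -> index of first bundle where it is valid
    for dest, srcs in instrs:
        # earliest bundle this instruction may join
        earliest = max((ready.get(r, 0) for r in srcs), default=0)
        if earliest >= len(bundles):
            bundles.append([])
        bundles[earliest].append((dest, srcs))
        ready[dest] = earliest + 1
    return bundles

prog = [
    ("r1", ["a"]),         # load a
    ("r2", ["b"]),         # load b   (independent of r1)
    ("r3", ["r1", "r2"]),  # add      (depends on both loads)
    ("r4", ["c"]),         # load c   (independent again)
]
for i, bundle in enumerate(schedule(prog)):
    print(i, [dest for dest, _ in bundle])
```

Note that the scheduler pulls `r4` up into the first bundle even though it comes after the dependent add in program order, which is exactly the reordering an OoO core does in hardware.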

~~~
tittat
Yeah. One has to wonder what kind of performance we could get from our
processors if we weren't so stubborn about the way we write our programs.

~~~
CyberDildonics
Well, if programs are written to apply simple instructions to lots of data,
there are huge speedups to be had on current x86 processors. Structured this
way, memory latency stops being a bottleneck and SIMD instructions can be
used much more often. Provided unnecessary heap allocations have already been
removed, restructuring a program like this can yield very substantial speedups.
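As a hypothetical illustration of that data layout, here is the array-of-structs vs struct-of-arrays transformation in Python (invented field names; plain Python won't vectorize this, but the same contiguous layout in C is what lets a compiler emit SIMD):

```python
# "Simple instructions over lots of data": keep each field in its own
# contiguous sequence instead of scattering whole records on the heap.

# Array of structs: each record is a separate heap object.
aos = [{"x": i, "y": 2 * i, "mass": 1.0} for i in range(8)]

# Struct of arrays: one contiguous sequence per field.
soa = {
    "x":    [p["x"] for p in aos],
    "y":    [p["y"] for p in aos],
    "mass": [p["mass"] for p in aos],
}

# The hot loop touches only the fields it needs, in order: cache- and
# SIMD-friendly when written with this layout in a compiled language.
momentum_x = [m * x for m, x in zip(soa["mass"], soa["x"])]
print(momentum_x)
```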

------
markrages
I've seen this movie before. The next step is to hire Linus Torvalds.

~~~
epoxyhockey
For those that don't get the reference:
[http://en.wikipedia.org/wiki/Transmeta](http://en.wikipedia.org/wiki/Transmeta)

~~~
melling
It's also covered in the article:

"Industry skepticism is likely. The notion of abstracting software from chip
hardware has been tried by companies such as Transmeta, a startup born in the
mid-1990s that labored for years in secret on technology based on translating
computing instructions in novel ways. The startup ultimately failed."

------
cfallin
The idea of breaking a single thread into parallel parts automatically at
runtime has been thrown around in academia for a while -- anyone interested
should search for "dynamic multithreading" and "thread-level speculation". So
industry is going to be (rightly) skeptical. _But_ if they've managed to build
something that actually works, and on real code (not just toy benchmarks),
this is a huge accomplishment.

~~~
sharpneli
Modern CPUs do this at a very limited level, basically by looking at the
instruction stream and executing in parallel those instructions that have no
dependencies on each other.

What I'm wondering most is how on earth they can get so many instructions per
clock with a short pipeline. Without knowing the details of how they compiled
the SPEC benchmark it's really hard to say. Who knows, maybe they cheated and
ran parts of the benchmark in parallel on their CPU and not on the others,
with "it's the natural way for this chip!" as an excuse.

~~~
lgeek
What superscalar and out-of-order processors are exploiting is instruction
level parallelism, while their technology seems to use thread level
speculation.

> Without knowing the details of how they compiled the SPEC benchmark it's
> really hard to say. Who knows, maybe they cheated and ran parts of the
> benchmark in parallel on their CPU and not on the others

That's most likely the case, but I wouldn't consider it cheating. As long as
from the software perspective only a single thread is running (and SPEC CPU
2000 and 2006 are single threaded), I think it's fair game. The whole point of
their project is to expose parallelism without requiring the programmer /
execution environment to explicitly support it.

~~~
sharpneli
> seems to use thread level speculation

If the software is compiled into a normal single-threaded program, then what
else is there left except instruction-level parallelism?

And if you can compile it to work with two threads, then we already have
Hyper-Threading to take advantage of that even on a single core.

Their [Soft Machines'] latest patent is basically an OoO method in overdrive;
it exploits only instruction-level parallelism. And based on that, their
claim that their pipeline is short is not really valid.

~~~
cfallin
Thread-level speculation _is_ exploiting ILP. In some contexts it's compiler-
assisted but in this context I would imagine it is fully microarchitectural
(i.e., in hardware/translation firmware, running a single instruction stream
of user code, invisible to the user). Dynamic multithreading (Akkary 1998?)
did this by splitting the thread at predictable points like function
calls/returns and (IIRC) backward branches. So it's still ILP within a single
thread, but at a much longer distance than what an OoO scheduling window
provides.
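A heavily simplified, hypothetical sketch of that split-at-the-call idea in Python (sequential code standing in for the two hardware contexts; `slow_lookup` and the return-value predictor are invented for illustration):

```python
# Toy model of thread-level speculation at a call site, in the spirit
# of the dynamic multithreading described above. The continuation after
# the call runs speculatively with a *predicted* return value; if the
# prediction turns out wrong, the speculative work is squashed and
# redone. Real designs do all of this in hardware, in parallel.

def speculate_call(fn, arg, continuation, predict):
    predicted = predict(arg)               # return-value predictor
    spec_result = continuation(predicted)  # "thread 2" runs ahead
    actual = fn(arg)                       # "thread 1" runs the callee
    if actual == predicted:
        return spec_result, "committed"
    # misprediction: squash, redo the continuation with the real value
    return continuation(actual), "squashed"

def slow_lookup(x):
    return x % 7

res, status = speculate_call(
    slow_lookup, 21,
    continuation=lambda r: r + 100,
    predict=lambda x: 0,   # predictor guesses the common return value
)
print(res, status)   # 100 committed  (21 % 7 == 0, prediction correct)
```

The payoff is that when the predictor is right, the post-call work was effectively done "for free" in parallel with the callee; when it is wrong, you pay a squash, which is why prediction accuracy governs whether any of this helps.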

------
philliproso
Not sure if the article didn't go into enough detail, or I'm not technical
enough in this field. But would this be able to bypass the GIL in Python and
other interpreters? I.e., is "VISC's secret sauce" to hook into the raw
machine-code instruction stream, determine whether an instruction changes
anything in other cached/executing instructions, and if not, pipe it to
another core? Then when Python issues instructions, the chip could immediately
determine whether they can be piped to multiple cores.

------
rwmj
SemiAccurate has a bit more information. I'm still skeptical that a magic
box which turns serial code into threaded code, free of any overhead from
Amdahl's law, can be built.
http://semiaccurate.com/2014/10/23/soft-machines-breaks-cover-visc-architecture/
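For a sense of that ceiling, a quick sketch of Amdahl's law (the parallel fraction of 90% is just an example figure):

```python
# Amdahl's law: even a perfect parallelizing "magic box" is capped by
# the serial fraction of the code. For parallel fraction p on n cores:
#   speedup(n) = 1 / ((1 - p) + p / n)

def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# 90% parallel code on 4 cores: nowhere near 4x.
print(round(amdahl_speedup(0.90, 4), 2))       # 3.08
# Even with effectively infinite cores the limit is 1 / (1 - p) = 10x.
print(round(amdahl_speedup(0.90, 10**9), 2))   # 10.0
```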

------
jewel
Can someone help my ignorance? I understand the appeal of automatic
parallelization, but what is the advantage to creating your own chip? It seems
to me that this is a translation that could be done in software either at
runtime or at compile time.

Trying to launch a line of processors, even without any of the translation
magic, seems like a very difficult venture all by itself.

~~~
wmf
The auto-parallelization requires new hardware features that don't exist in
existing processors.

~~~
sharpneli
I'd really love to know what those might be. Unfortunately Soft Machines
doesn't give any details.

EDIT: Jackpot! Google patent search to the rescue. Based on their patents
from the last few years (the latest was published in March 2014), one can get
an understanding of what the fuss is all about. I'll have to read those
through today.

------
dbancajas
Wow. This is like the holy grail of processor performance optimization. It's
so weird that thousands of PhDs in academia have not been able to solve this,
yet 250 engineers can. If this is true, it will drastically alter the course
of research in computing. I'd expect "low-hanging fruit" optimizations in
ISCA, MICRO, ASPLOS, HPCA and the like in the coming years. Again, assuming
this technology delivers.

------
3327
Another great, informative article full of details by the WSJ.

Here is a better article:

http://www.pcworld.com/article/2838018/stealthy-startup-soft-machines-launches-virtual-cpu-cores-that-trounce-traditional-processors.html

~~~
dang
Ok, we changed the url to that from
http://blogs.wsj.com/digits/2014/10/23/secretive-startup-unveils-universal-chip/.
Thanks!

------
msoad
Multithreading shouldn't be my job as a programmer. I love the idea. If this
takes off and becomes standard, it's going to save us from dealing with
multithreading issues.

