
Intel Bringing SIMD to JavaScript - deathtrader666
https://01.org/blogs/tlcounts/2014/bringing-simd-javascript
======
jmpe
Anyone experimented with this yet? I'd like to know how this is resolved when
the architecture doesn't support Intel's SIMD approach — they map the objects
pretty closely to the instructions (SIMD.float32x4.sub and the like).

I'm trying to figure out what happens when you port this to ARM NEON, and how
you catch it on architectures that don't support NEON (it's often missing on
Marvell and Allwinner chips).

~~~
sunfish
I'm a Mozilla engineer involved in this. NEON support is very important and
we're designing the spec to support it well.

CPUs that lack SIMD units can support the functionality (though not the
performance of course), and there's even a polyfill library that can lower
this API into scalar operations for SIMD-less browsers too.
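
To give a sense of what such a lowering looks like, here's a minimal, hypothetical sketch of a scalar fallback in plain JS (the shapes follow the SIMD.js proposal, but this is a simplified illustration — the real polyfill at github.com/johnmccutchan/ecmascript_simd covers far more operations):

```javascript
// Hypothetical scalar polyfill sketch: each "vector" is just an object
// holding four lanes, and each op applies lane-wise.
var SIMD = {};

SIMD.float32x4 = function (x, y, z, w) {
  // Math.fround rounds each lane to float32 precision, matching what
  // real 4-lane single-precision hardware would produce.
  return {
    x: Math.fround(x),
    y: Math.fround(y),
    z: Math.fround(z),
    w: Math.fround(w)
  };
};

SIMD.float32x4.add = function (a, b) {
  return SIMD.float32x4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
};

SIMD.float32x4.mul = function (a, b) {
  return SIMD.float32x4(a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w);
};

// Usage: multiply four floats by 2 in one "instruction".
var v = SIMD.float32x4.mul(
  SIMD.float32x4(1, 2, 3, 4),
  SIMD.float32x4(2, 2, 2, 2)
);
// v is { x: 2, y: 4, z: 6, w: 8 }
```

On a SIMD-less machine the engine can execute exactly this; on SSE or NEON hardware each call maps to a single vector instruction.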

~~~
IvanK_net
It would be great if you could detect SIMDable operations in classic JS (e.g.
in loops) and use SIMD to execute them. I think that adding low-level
features to a high-level language is not good practice.

~~~
sunfish
We will probably do that too at some point, but it won't replace explicit
SIMD, just as widely-available auto vectorization support in C++ hasn't
eliminated the need for explicit SIMD extensions there either.

One thing to keep in mind is that most programmers probably won't want to use
this feature directly; it'll be used in libraries that expose higher-level
APIs. It's still true that every feature we add increases overall clutter, but
SIMD seems sufficiently useful and sufficiently self-contained that it's worth
the tradeoff.

------
ronjouch
Holy hell, Gary Bernhardt was right all along and the future will be METAL...
[https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript](https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript)

------
rasz_pl
Wonder what native SIMD support can do to projects like MPEG decoders in pure
javascript.

MPEG1 in javascript is pretty fast and runs at native speeds on phones

[https://github.com/phoboslab/jsmpeg](https://github.com/phoboslab/jsmpeg)

but MPEG4/H.264 is not

[https://github.com/mbebenita/Broadway](https://github.com/mbebenita/Broadway)

~~~
X-Cubed
The H.264 demo runs at 60+ FPS on the main thread on Chrome 37 (dev) on
Windows for me. Very impressive.

------
IvanK_net
I think their effort would be much more useful if they focused on WebCL. It is
already standardized (unlike their "SIMD" object). A CPU implementation of
WebCL that utilizes SIMD would probably offer much better performance than
any current Javascript engine.

~~~
azakai
WebCL is not yet standardized, there is still ongoing discussion about that
last I heard.

In any case, the two are not mutually exclusive, different people are working
on each.

~~~
IvanK_net
The version 1.0.0 was standardized in March 2014 -
[https://www.khronos.org/registry/webcl/specs/1.0.0/](https://www.khronos.org/registry/webcl/specs/1.0.0/)

~~~
azakai
Oh, interesting, I wasn't aware of that!

Do you know what the implementation status and intention to implement is,
among browsers?

~~~
IvanK_net
WebCL implementations for Firefox and Chromium have existed for two years, but
they have not been included in any alpha, beta or canary release. Browser
developers don't seem to care about WebCL, but they do care about WebGL, which
is strange.

Samsung made it work in Tizen, but sadly that is not a very widespread OS.
[https://www.youtube.com/watch?v=TurCVdaUTMY](https://www.youtube.com/watch?v=TurCVdaUTMY)

------
twotwotwo
Stuff like this, and asm.js, and WebKit's crazy LLVM-based DFG optimizations,
all lead me to think Native Client will end up being looked at as a
transitional, niche tech: you keep tuning the JS engine until it's not
unacceptably slower than native for asm.js-type code, and you keep hooking
webpages up with more and more native capabilities and code (graphics, SIMD,
crypto/compression), and eventually very few NaCl use cases are left. I don't
see other vendors getting on the NaCl bandwagon because they don't want to
depend on Google's code and it's a lot of work to implement--so maybe NaCl
ends up remembered as the toolchain some companies used to port some apps to
Chrome OS and that's mostly it. Sort of a shame; I _like_ NaCl, just don't see
a great path for it compared to iterating on existing technologies.

~~~
zbowling
Threads... Native Client has threads. Javascript does not. I'm writing a
scheduler for Emscripten that supports pthreads, but it's crazy because the
output is still single threaded and no longer looks like the readable
javascript it used to produce. Bending over backwards. ...and don't get me
started on web workers.

~~~
erid
Please start, I'm curious as to why web workers doesn't fit the needs like
using threads, I have a fairly basic knowledge of web workers and not much
practice with them, but I would like to know if possible what are the
limitations.

~~~
zbowling
Javascript is inherently single threaded. It goes back to its roots. It would
be extremely difficult to make it multithreaded. Apps developed in Javascript
are heavily asynchronous and avoid blocking for too long because of this
limitation. In some ways this is good.

In the right situation, for example in node.js, this discourages anyone from
doing anything that would block the single thread handling requests for very
long, and makes it easier to handle the C10K problem in a much more
interesting way (where traditional servers create a thread per request, or
queue them to handle one by one from a pool, node does it all from a single
thread). In other ways this can be bad: it causes developers (in the case of
Node) to not do a lot of the heavy lifting in Javascript itself and to hand
off to native code that can do long, hard, potentially blocking things on
real threads (database queries, persistence backends, file IO).

The thing is that some code is easier to write with threads in mind. If I know
there are 4 processors on a system, I can make 4 threads and (potentially) use
all four processors effectively at the same time. This is not really possible
with Javascript. On a device like the Firefox phone (which is all about
Javascript), heavy apps that could really benefit from multiple cores have no
way to take advantage of them (unless they use web workers).

There is also just an epic amount of legacy in apps that are inherently built
around the concepts provided by threads. Javascript is basically becoming the
bytecode of the web — things like asm.js and Emscripten are pushing it this
way. But it's not really easy to port all that code to fit a model where
threads are not allowed.

Conceptually, web workers are more like processes than threads. They're a
shitty workaround, since we can't easily have threads in Javascript without
breaking the way the language works (which could break the web). That means we
have to split code into different processes where everything is isolated. On
top of that, you can't share memory — you have to message and marshal data
between web workers. This is awkward and impossible to target from something
like Emscripten, making it near impossible to port anything that expects
threads to exist (theoretically, a C app that uses separate processes instead
of threads would fit better, but that's almost unheard of). Most operating
systems provide threads (pthreads most often, except on Windows).

My solution is to make a scheduler that is used by my fork of Emscripten. All
functions use return-on-call semantics, and a "scheduler" in Javascript
decides which function should be called next from a list of virtual thread
"stacks" (arrays), or whether it should yield to the browser (setTimeout). The
"scheduler" is single threaded itself, so these aren't really "threads", but
it does make it possible to emulate them when porting native code.
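
The cooperative idea described here can be sketched in a few lines with ES6 generators as the virtual "threads" (this is a hypothetical illustration of the scheduling concept, not the actual Emscripten fork's scheduler, which works on compiled output; a real version would yield to the browser with setTimeout rather than run synchronously):

```javascript
// Hypothetical cooperative scheduler: each generator is a virtual thread,
// and yield marks the points where the "thread" gives up control.
function Scheduler() {
  this.threads = []; // queue of suspended generators (virtual thread stacks)
}

Scheduler.prototype.spawn = function (genFn) {
  this.threads.push(genFn());
};

Scheduler.prototype.run = function () {
  // Round-robin: resume each thread until its next yield; drop finished ones.
  while (this.threads.length > 0) {
    var thread = this.threads.shift();
    if (!thread.next().done) {
      this.threads.push(thread);
    }
  }
};

// Usage: two "threads" whose steps interleave as if preempted at each yield.
var log = [];
var sched = new Scheduler();
sched.spawn(function* () { log.push('a1'); yield; log.push('a2'); });
sched.spawn(function* () { log.push('b1'); yield; log.push('b2'); });
sched.run();
// log is ['a1', 'b1', 'a2', 'b2']
```

Emscripten-compiled code has no natural yield points, which is why the fork has to rewrite every function into return-on-call form — effectively inserting the equivalent of these yields everywhere.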

------
cjreyes
Would these changes eventually make their way into node.js (through V8)?

~~~
mantrax5
Considering the architecture of Node.JS is not suited to compute-heavy tasks,
I wonder what kind of code you're hoping to optimize in Node.JS with SIMD?

~~~
nevi-me
Any improvements are still welcome, of course. There are a number of
people/entities building desktop apps on Node, and those tend to do
'compute-heavy tasks'; a developer whose piece of code runs in a mean of 10
seconds would also welcome an optimisation that runs it in less than that.

Not knowing much, I think it'll be interesting to see how general purpose
applications would benefit from SIMD if it's accessed from a higher level.
Does that mean that if I want to loop through 103 items and run arithmetic
operations on them I'd have to do the following, (let's say I'm multiplying
each item in items[] by 2, and items.length % 4 !== 0):

    
    
      var batch = [], 
        results = [], 
        i = 0, 
        len = items.length, 
        mod = len % 4, 
        a, b = SIMD.float32x4(2, 2, 2, 2), 
        c;
      items.forEach(function (item) {
        if (i < mod) {
          // handle the len % 4 leftover items as plain scalar ops
          results.push(item * 2);
          i++;
        } else {
          batch.push(item);
          if (batch.length === 4) {
            a = SIMD.float32x4(batch[0], batch[1], batch[2], batch[3]);
            c = SIMD.float32x4.mul(a, b);
            results.push(c.x, c.y, c.z, c.w);
            batch = [];
          }
        }
      });
    

Of course this is the interpretation of a non-CS graduate who taught himself
JS, some of the stuff mentioned at
[https://01.org/node/1495](https://01.org/node/1495) seems a bit over my head.
It'd be great if V8 would (unless it already does) transparently handle
creating SIMD-optimised code where one is looping through an array or the like
instead.

(edited to fix code, hopefully)

~~~
kevingadd
That kind of 'pack non-SIMD into SIMD, do SIMD op, unpack back into non-SIMD'
thing tends to be slower than just doing non-SIMD ops in most cases.

You'd want to convert your non-SIMD data into a big stream of SIMD data up
front, then do lots of operations on it, and then after that perhaps unpack
it. Most SIMD scenarios just keep data in SIMD format indefinitely.

(Sometimes a compiler can use SIMD operations on arbitrary data by maintaining
alignment requirements, etc. That sort of optimization might be possible for
the JS runtime, but seems unlikely for anything other than typed arrays.)
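
In plain (non-SIMD) JS, that pattern looks like this hypothetical sketch: convert once into a contiguous typed array, do all the work in place, and only unpack when a plain array is actually needed (with the proposed API, the loop body would become one float32x4 op per four elements instead of one scalar op per element):

```javascript
// Pack once, operate in bulk, unpack once at the end — the SIMD-friendly
// shape for the "multiply everything by 2" example above.
function doubleAll(items) {
  // One conversion up front into contiguous float32 storage.
  var data = Float32Array.from(items);
  for (var i = 0; i < data.length; i++) {
    // With SIMD this loop would step 4 lanes at a time over the same buffer.
    data[i] *= 2;
  }
  // Unpack only when the caller really needs a plain array.
  return Array.from(data);
}

var doubled = doubleAll([1, 2, 3]);
// doubled is [2, 4, 6]
```

The per-element pack/unpack in the earlier snippet is exactly the overhead this shape avoids: the conversion cost is paid once per array, not once per vector.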

------
bwhthd
I was doing research on parallel computing last summer. Does anybody know if
this SIMD object is similar to the ParallelArray object Intel made in
Rivertrail? Or are there any similarities between the two libraries?

------
iamsalman
Yet another parallel framework. Without a 3rd-party eco-system of APIs for
matrix math, any framework is doomed to just add noise, not value. Sure,
there's _some_ benefit in getting marginal speedup on some algorithms, but for
real speedup you need to know the parallel architecture of the processor
(GPU, CPU or APU), which means a learning curve. The GPGPU industry has long
been trying to abstract away the fine details and offer a plug-and-play,
easy-to-learn framework, but then we suffer performance losses, and it really
doesn't make sense to invest in GPUs for the kind of performance gains you get
with these high-level APIs.

~~~
AshleysBrain
It's not parallel, a framework, or a GPU feature. It's single-instruction-
multiple-data (SIMD) which is used to speed up single threaded execution on a
CPU when working with lists of numbers.

~~~
LnxPrgr3
My understanding is that architectures are different enough that the fastest
SIMD strategy is sometimes CPU-dependent.

The author of FFTS, for example, chose a different strategy on ARM than
x86_64:
[http://anthonix.com/ffts/preprints/tsp2013.pdf](http://anthonix.com/ffts/preprints/tsp2013.pdf)

He found himself writing the NEON code in assembly entirely by hand because
vector intrinsics didn't even expose CPU features he wanted to use—even in C,
where vector intrinsics are CPU-specific.

Having access to SIMD is definitely better than not having it, but it really
should be paired with good optimized implementations of things like BLAS and
FFT libraries.

------
higherpurpose
How will this run on AMD chips?

~~~
4ad
I don't know, but why would there be any difference?

~~~
viraptor
Probably because of Intel's history of breaking code on AMD. Like this:
[http://www.swallowtail.org/naughty-intel.shtml](http://www.swallowtail.org/naughty-intel.shtml)

~~~
mzs
Having dealt with cpuid, there were plenty of caveats in how AMD did things
vs how Intel did things. I can totally see Intel first checking whether it
was Intel and then punting, just due to complexity and testing. Keep in mind
that this article mentions the P4 and Athlon, so Intel would also have had to
care about Cyrix and Transmeta, which were different as well.

------
api
Flying saucer, meet duct tape. Duct tape, meet saucer.

------
mantrax5
"There is encouraging evidence that SIMD will enable a whole new class of
application domains and high-performance libraries in JavaScript."

Anyone take a guess what those might be (honest question)?

In the past SIMD has been the primary way to accelerate audio and graphics
related compute tasks, but with WebGL and shaders, JS users already have a
very powerful vector processing unit at their fingertips.

~~~
ahoge
Check "Bringing SIMD to the Web via Dart" by John McCutchan:

[https://www.youtube.com/watch?v=CKh7UOELpPo](https://www.youtube.com/watch?v=CKh7UOELpPo)

He's also the guy who wrote the proposal for ECMAScript:

[https://github.com/johnmccutchan/ecmascript_simd](https://github.com/johnmccutchan/ecmascript_simd)

Also, using SIMD is way easier than using shaders. You just write something
like this:

    
    
      double average (Float32x4List list) {
        var n = list.length;
        var sum = new Float32x4.zero();
        for (int i = 0; i < n; i++) {
          sum += list[i];
        }
        var total = sum.x + sum.y + sum.z + sum.w;
        return total / (n * 4);
      }
    

Instead of:

    
    
      double average (Float32List list) {
        var n = list.length;
        var sum = 0.0;
        for (int i = 0; i < n; i++) {
          sum += list[i];
        }
        return sum / n;
      }

~~~
general_failure
I am still looking for practical examples — unless there is a use case for
averaging a gazillion numbers on the client (I would like to know what that
use case is).

~~~
spankalee
There's a skeletal animation demo that was made to show off Dart SIMD support.
The bottleneck was the animation, not the 3D rendering, and using SIMD allowed
almost 4x the number of characters to be drawn.

~~~
yoklov
This is _exactly_ what people should use WebGL for. Software skinning sucks,
esp. if you can avoid it (e.g. with a vertex shader).

(That said, there are plenty of other areas for using SIMD in JS)

------
pekk
Why not bring SIMD to Lua or Python? Or Lisp, or Haskell? Why not just bring
SIMD on the web to anyone who wants it?

Will we have to explain to our grandchildren why they can only write code in
some flavor of Javascript? Will it make sense, then, to wave our hands at
ActiveX? Or do we have any better ideas for the future?

Mozilla's mission to protect the web from all languages but Javascript is
locking us into a future where there will be no choice and nothing better than
Javascript, because there will be neither an audience nor hardware support for
anything but Javascript.

~~~
dragonwriter
> Will we have to explain to our grandchildren why they can only write code in
> some flavor of Javascript?

No, because while JavaScript is the "common" language of the web, it's not the
only language you can code in on the web -- there are plenty of
implementations of other languages with it as a compilation target.

The value of having a guaranteed-to-be-everywhere _target_ language for the
web, even if it isn't the preferred _development_ language of every developer,
is fairly obvious.

> Mozilla's mission to protect the web from all languages but Javascript is
> locking us into a future where there will be no choice and nothing better
> than Javascript

As long as JavaScript keeps getting better -- and, particularly, as long as it
is spurred on in that by efforts which propose alternative standard languages
for the web with compelling stories so that JS _has_ to keep moving forward in
order to be acceptable as the universal, guaranteed target language -- that's
fine. "Nothing better than JavaScript" isn't a real limitation if JavaScript
is a moving target.

~~~
gfxmonk
> "Nothing better than JavaScript" isn't a real limitation if JavaScript is a
> moving target

JS is only a "moving target" in the sense that stuff is being added to it. If
you could make a perfect language by just adding things, then we'd be fine.

But the nature of the language itself is not going to change, because that
would break backwards compatibility. The type system, prototype inheritance,
`this`, type coercions, etc. There are plenty of undesirable things in JS
which we're stuck with (unless we break compatibility, in which case it might
as well be a different language).

------
CmonDev
"without the need to rely on any native plugins" \- if I need a specific
version of specific browsers (which I do), then it's no different (though
sand-boxed, which plugins could be as well).

~~~
azakai
Well, every new web feature will only work on new enough browsers. That was
true for HTML5 video, for new HTML5 input tags, ES5 stuff in JS, etc.

The difference though compared to plugins, is that

1\. Plugins must be installed by the user.

2\. Plugins are binary and usually do not exist for all OSes/browsers (e.g.,
no more Flash for linux, Silverlight never worked on linux, etc.)

3\. Plugins are nonstandard, created and controlled by a single corporation,
typically not open source, etc.

~~~
_random_
"new enough browsers" \- so it will work automatically in mobile Safari? Cool!
Unless we are talking about _specific_ browsers rather than just new ones.

1\. Browsers are installed by the user.

2\. Browsers are binary too. It seems Mozilla prefers C++/Rust to build them
rather than JavaScript.

3\. New web tech is mostly developed by Mozilla, who is de-facto paid by
Google.

~~~
azakai
1\. Browsers are often preinstalled for the user (especially, but not only, on
mobile OSes). Also, many browsers auto-update, so new features become
automatically available eventually, without the user installing anything.

2\. The point is that while browsers are binary, they can then run a huge set
of portable apps. As opposed to all those apps not being portable.

3\. Everything Mozilla does is open source, and not just Mozilla but also most
browsers today are open source - Chromium and WebKit in particular. This is a
very open space.

~~~
CmonDev
"This is a very open space" \- then why is it locked to one legacy scripting
language, or some sort of crappy transpilation they themselves don't use? It
is effectively closed to new languages.

~~~
azakai
First, adding more options takes work. People need to volunteer to do that
work, and prove that adding more VMs to the web can be effective (there are
many technical challenges, like cross-VM garbage collection, sandboxing
issues, etc.). People simply haven't shown this is practical yet.

But, people have meanwhile shown that cross-compiling to JS is practical, from
things like CoffeeScript to C++. This is opening up the space to new
languages, but it takes time and effort as well - again, the speed depends on
how many people volunteer to help out.

------
general_failure
It is wrong to expose such low-level features in a programming language —
these are exposed as actual types. More and more stuff is being added to
HTML, CSS and JS without any thought as to what each language is meant for.
Just because it can be done does not mean it should be. A better approach is
to advance type inference and compiler technology, and to add language
primitives that assist such code generation.

The web page shows a Mandelbrot, which is a terrible example because those are
best done with shaders. In fact there is no real use case for this. There will
always be specific things that cannot be done, and no language can address
them all. How many apps need SIMD-based physics? In fact I am not even sure
what kind of physics needs SIMD.

~~~
ahoge
Auto-vectorization is an extremely complicated topic. This stuff is slow. Not
a big deal with AOT compilation, but it's of course a huge deal if you only
have a few msecs to spare.

Also, a machine can't just jumble the operations around because that would
change the result.

    
    
      >> 0.1 + 0.2 + 0.3
      0.6000000000000001
      >> 0.1 + (0.2 + 0.3)
      0.6
    

So, this is apparently something you have to do yourself. A compiler won't
know what kind of data will be fed to that function. It can't make an informed
decision.

> _In fact I am not even sure what kind of physics needs SIMD._

It generally just means that you use some physics library which makes use of
SIMD. Without having to do anything special, your game will run drastically
better and use less energy to boot.

That's the primary use case: using libraries which use SIMD. Most people won't
bother doing it themselves.

~~~
general_failure
If speed and compilation are a concern, let's start with having proper types
in javascript. That will speed things up a whole lot more, and save a lot
more battery, than this.

I am not saying this won't make things faster. There are many things you can
do to make things faster but this really is such a niche feature (for web
based apps).

~~~
pcwalton
> If speed and compilation is a concern, let's start with having proper types
> in javascript.

That is a _much_ harder problem, especially since you brought up performance.
Getting the interaction between dynamically typed and statically typed code to
work in a way that's both easy and natural to use and allows compilers to get
significant benefits from the statically typed code is an unsolved research
problem.

