
Google removes SIMD.js support from Chromium - chroem-
https://bugs.chromium.org/p/v8/issues/detail?id=4124&can=2&start=0&num=100&q=&colspec=ID%20Type%20Status%20Priority%20Owner%20Summary%20HW%20OS%20Component%20Stars&groupby=&sort=
======
millstone
This is a good decision. SIMD.js never made sense to me. Writing vector code
that outperforms scalar code requires not only fine-grained control over
issues like alignment, but also requires targeting specific SIMD instruction
sets. You must write your vector code with the emitted assembly in mind,
because if you fall off the fast path and the compiler switches to scalar code
or does a library call or something, you've lost any performance gain and then
some.

A generic SIMD API on a high-level language like JavaScript is as bad as it
gets. I looked at what it would take to port some of my vector code to SIMD.js
and it wasn't remotely plausible.

Also, saving 10% of the v8 code size is enormous.

The right path forwards is for JITs to emit SIMD code when possible, and for
JS engines to provide OpenCL-like GPGPU-targeted APIs, following the trail
that WebGL blazed with JS shaders.

~~~
sillysaurus3
_This is a good decision. SIMD.js never made sense to me. Writing vector code
that outperforms scalar code requires not only fine-grained control over
issues like alignment, but also requires targeting specific SIMD instruction
sets. You must write your vector code with the emitted assembly in mind,
because if you fall off the fast path and the compiler switches to scalar code
or does a library call or something, you've lost any performance gain and then
some._

I think this highlights the absurdity of writing vector code in higher-level
constructs rather than the direct ASM. Writing in ASM is not that hard, and
the gains can be enormous. I've seen graphics operations sped up by 5x or
more, just by writing hand-optimized vector ops in assembly.

This is one of Visual Studio's core strengths vs GCC. Writing an __asm { }
block is much more pleasant than the syntax GCC forces on you. Unfortunately
this inertia is one reason games tend to be windows-only, even in the post-
Apple era.

In other words, phrases like "You must write your vector code with the emitted
assembly in mind, because if you fall off the fast path and the compiler
switches to scalar code" invoke a feeling of "What? Why does the compiler get
any say in what the vector code looks like? Oh, right, everybody shies away
from writing ASM by hand nowadays." Which is valid, but sometimes having ultra
control is extremely nice. It just feels cool to get a 2x speedup by being
persistent and clever. (Those speedups come with future technical debt,
though, so there are good arguments against it.)

~~~
sddfd
The compiler should vectorize functions automatically, without user
intervention.

Writing assembly code to use SIMD instructions is a vote of no confidence in
your compiler.

If your compiler of choice can't do automatic vectorization, switch to one that
can, or get people to implement that feature in your compiler of choice.

~~~
dagss
Well, the compilers don't do a good enough job, end of story (at least using
C; I'm not sure about OpenCL). To achieve top performance you have to make
sure everything in the computation is organized in the right way, and you have
to try out multiple strategies at the high level of the program to see which
achieves the best performance at the low level. At that point, contorting your
SIMD logic into regular C in the hope that the C compiler _maybe_ figures out
what you meant is just frustrating anyway.

What you are saying is basically "you don't need top performance, just deal
with what the compiler gives you".

But sometimes you work on stuff where the whole point is to be best in class
in performance (libsharp in my case).

If you think that what you are suggesting is possible, feel free to improve
the compilers so that normal C code beats OpenBLAS. You will be famous. I will
wait....

~~~
sddfd
Writing inline assembly to get SIMD performance is likely to cost an
architecture expert an immense amount of time, and it doesn't scale.

So my point stands: If a compiler can't produce vectorised code, the compiler
needs to be improved. Spending time on improving the compiler is sustainably
spent time. Spending time on programming SIMD in assembly by hand likely is
not.

I believe you that the vectorization support at the moment is not good enough,
and I completely understand that not everyone can spend time on improving the
compiler.

However, it seems you know exactly what a compiler should be able to do and
what feedback from the compiler, or annotations for the compiler, would be
helpful for your use-cases where optimization somehow didn't figure out what
to vectorize.

I was just pointing out that communicating with compiler people and getting
them to improve the compiler is likely a better move than asking for better
support for inline assembly.

~~~
dagss
The problem is not communication. To quote Paul Graham in a keynote from some
years back: people have been waiting for "sufficiently smart compilers" to cut
programmers out of the loop for 30 years. It is a neat idea in theory. In
practice it is just very hard: you need to design the algorithm for the
hardware, so it is sort of an AI problem. Programmers usually do still fill a
role in coming up with the best algorithms...

The point is simply that C doesn't have the concepts you want to program with.
Vectorization only gets you so far, there are many other things you can do
with AVX/SSE. It doesn't need to be ASM, it could be supersets of C instead
(like CUDA).

Intrinsic functions for AVX are what I use, and I don't see the problem with
using those. They are, in effect, just such a superset of C.

In practice a few numerical computations (BLAS, FFTs, etc) are reused a lot by
many. It is worth it to write those few libraries in assembly (or at least
intrinsics).

For the rest we just have to live with a small performance penalty.

I am just saying "if you want to go the extra mile to get high performance,
you can beat compilers". In most scenarios of course it makes most sense to
not bother.

~~~
sddfd
I get your point.

However, I don't think it is useful to consider a compiler to be a replacement
for programmers. Compilers are tools for programmers. The more time a compiler
saves a programmer, the better it is.

I think there is a middle ground between writing inline assembly and fully
automatic vectorization that would be less time intensive than manual
vectorization and more predictable and available earlier than fully automatic
vectorization. I wonder what would have to be done to find it and provide
support for it in GCC/LLVM.

~~~
corysama
So... SIMD intrinsics?

You also need to take into account that a huge chunk of the work in SIMD is
the need to rearrange your data to be more amenable to the CPU. C++ compilers
are very restricted in what they can do for you there.

------
sowbug
Interesting fact: the owner of that bug wrote Raster Blaster
<[https://en.wikipedia.org/wiki/Raster_Blaster>](https://en.wikipedia.org/wiki/Raster_Blaster>)
and Pinball Construction Set
<[https://en.wikipedia.org/wiki/Pinball_Construction_Set>](https://en.wikipedia.org/wiki/Pinball_Construction_Set>).

~~~
rashkov
Thanks. The links don't work unfortunately -- no need to format them, but that
is a fun detail to know about

~~~
sowbug
Oops, bad habit. Thanks. Too late to edit, so for posterity:
[https://en.wikipedia.org/wiki/Raster_Blaster](https://en.wikipedia.org/wiki/Raster_Blaster)
and
[https://en.wikipedia.org/wiki/Pinball_Construction_Set](https://en.wikipedia.org/wiki/Pinball_Construction_Set)

------
pluma
This seemed shocking, as I thought SIMD was still on track for inclusion in
the next ECMAScript release: the proposal was at stage 3 the last time I
checked (stage 4 means "it's in the spec", so stage 3 generally means
"shipping ASAP"). At first I thought Google had single-handedly decided to
violate the spec.

However it turns out the SIMD proposal has been dead since at least April:
[https://github.com/tc39/ecmascript_simd/commit/c6ca655dcbfc0...](https://github.com/tc39/ecmascript_simd/commit/c6ca655dcbfc014d82f3ec500c2e41db34a1dee8)

So this isn't really "Google kills SIMD" but "Google kills implementation of
dropped proposal".

It's still surprising to see proposals get canned this late into the process.
The only other example of a late-stage retraction I can think of is
Object.observe, which was originally expected to become part of the 2016
edition: [https://esdiscuss.org/topic/an-update-on-object-observe](https://esdiscuss.org/topic/an-update-on-object-observe)

------
tlb
Better explanation at
[https://bugs.chromium.org/p/v8/issues/detail?id=6020&desc=2](https://bugs.chromium.org/p/v8/issues/detail?id=6020&desc=2)

------
ndesaulniers
Comment #145 back on Feb 13 actually has the commit:
[https://bugs.chromium.org/p/v8/issues/detail?id=4124#c145](https://bugs.chromium.org/p/v8/issues/detail?id=4124#c145)

The bug links to the actual discussion ("The V8 binary has a considerable
amount of code that could be trimmed"):
[https://bugs.chromium.org/p/v8/issues/detail?id=5948](https://bugs.chromium.org/p/v8/issues/detail?id=5948)

There's a nice treemap of the size of pieces of v8 in the first comment on
that bug.

Comment #3 on the bug notes that SIMD.js is large:
[https://bugs.chromium.org/p/v8/issues/detail?id=5948#c3](https://bugs.chromium.org/p/v8/issues/detail?id=5948#c3)

Finally, comment #6 "There's no reason to keep the simd.js stuff around. I'll
take it out ASAP.":
[https://bugs.chromium.org/p/v8/issues/detail?id=5948#c6](https://bugs.chromium.org/p/v8/issues/detail?id=5948#c6)

Someone notes that v8 is reduced by 500KB on Android. It looks like the change
got reverted a few times for breaking Node.js tests; someone notes "can you
wait a day until we sort out Node first?" and then SIMD.js gets removed again
the next day. So I assume they sorted it out with Node, whatever that entailed.

Also, I'm trying to learn how the macroassembler is laid out; it seems pretty
neat. I'm kinda baffled that s390 and ppc are supported. I was kinda surprised
about mips too, but I guess that's an Android-supported platform...

A lot of the code that looks like it's being assembled by the macro assembler
has a double underscore followed by a space and then an argument list, which
looks curious at first glance.

Looks like the double underscores are defined as:

#define __ ACCESS_MASM(masm)

