I've had open PRs for a year now:
Very straightforward changes. I emailed Ashley about helping with wasm-pack, no response (in response to https://github.com/rustwasm/wasm-pack/issues/928)
Updates to wasm-bindgen shouldn't require a change to wasm-pack. It should expose ways to pass arbitrary flags down
While having a webpack plugin is nice, I've since given up on wasm-pack and added a build-wasm script to my package.json: https://github.com/serprex/openEtG/commit/9997fb098d168920bb...
This way, if someone wants to contribute to openEtG, they don't need to install my wasm-pack fork. Ideally wasm-pack-plugin would skip wasm-pack and use wasm-bindgen directly
I do hope wasm-bindgen is able to be adequately resourced. It's a pleasure to program wasm modules in Rust
That's almost exactly what I'd expect from an optimal compiler.
Graviton2 has 3 scalar integer ALUs and 2 128-bit vector units. Scalar code can do 3 integer ops per cycle; 4-lane vector code can do 8. 8/3 is +167%. Intel processors typically have 4 scalar ALUs and 3 vector units: 12/4 = 3x.
Zen has 4 units for 128-bit vectors, though until Zen3 not all units can do all operations, so the speedup in AMD land would be 2x-8x depending on application (although code doing brief 128-bit vector work would be limited by Zen having only 1 vector write port).
Pretty sure the "equivalent" is at least 10 times slower.
AMD64 CPUs don't have SIMD instructions for multiplying 64-bit integers (until AVX-512DQ's vpmullq, which few CPUs support). The wasm32::u64x2_mul WASM function must be emulated somehow, and the emulation is going to take many instructions and cycles.
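To make the cost concrete, here's a rough sketch (my own helper, not from any particular WASM engine) of how a 64-bit-lane multiply can be emulated on SSE2, where the only vector multiply of this shape is the 32×32→64 pmuludq:

```cpp
#include <emmintrin.h>  // SSE2
#include <cstdint>

// Emulate a 64-bit-lane multiply (what wasm's u64x2 mul needs) using SSE2.
// With a = aH*2^32 + aL and b = bH*2^32 + bL, the low 64 bits of a*b are
// aL*bL + ((aH*bL + aL*bH) << 32); the aH*bH term overflows past bit 63.
static __m128i mul_u64x2(__m128i a, __m128i b) {
    __m128i a_hi  = _mm_srli_epi64(a, 32);
    __m128i b_hi  = _mm_srli_epi64(b, 32);
    __m128i lo_lo = _mm_mul_epu32(a, b);      // pmuludq: low 32 bits per lane
    __m128i hi_lo = _mm_mul_epu32(a_hi, b);
    __m128i lo_hi = _mm_mul_epu32(a, b_hi);
    __m128i cross = _mm_slli_epi64(_mm_add_epi64(hi_lo, lo_hi), 32);
    return _mm_add_epi64(lo_lo, cross);
}
```

That's three multiplies plus four shifts/adds per vector, versus a single instruction for the 32-bit case.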
Theoretically, yes. Practically, I think that’s a “sufficiently smart compiler” class of problem, insanely hard to solve. Especially given that WASM is typically JIT-compiled, the runtime simply doesn’t have time for expensive optimizations.
Integer SIMD is weird on AMD64. Even state-of-the-art C++ compilers fail to emit optimal code for rather simple use cases. A trivial example is computing the sum of bytes: I have yet to see a compiler which would optimize that code into _mm_sad_epu8 / _mm_add_epi64 instructions.
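For reference, the hand-written version with those two intrinsics is tiny. A sketch (my own function name, and it assumes the buffer length is a multiple of 16): _mm_sad_epu8 against zero horizontally sums each 8-byte half into a 64-bit lane, so the inner loop is just a load, a psadbw, and an add.

```cpp
#include <emmintrin.h>  // SSE2
#include <cstdint>
#include <cstddef>

// Sum all bytes of a buffer; n must be a multiple of 16 in this sketch.
uint64_t sum_bytes(const uint8_t* p, size_t n) {
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    for (size_t i = 0; i < n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i*)(p + i));
        // psadbw vs zero: two 64-bit partial sums, one per 8-byte half
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    // fold the two 64-bit lanes together
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (uint64_t)_mm_cvtsi128_si64(acc);
}
```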
Detecting every way of doing a 32-bit multiply with a 64-bit mul operator is impossible, yes. But there only needs to be one way of doing it that the compiler knows about, and then people can use that idiom.
It's not pretty, but it works. Compare the common scalar int rotate: x86 can do it in one instruction, but C doesn't have an operator for it. The way to do it in C is to use an idiom that optimizers are known to recognize.
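Concretely, the rotate idiom in question looks like this; the masking is what keeps a shift count of zero well-defined (shifting a 32-bit value by 32 is UB in C), and GCC, Clang, and MSVC all recognize the pattern and emit a single rotate instruction on x86:

```cpp
#include <cstdint>

// Rotate left by n; the masks make every n (including 0) well-defined,
// and compilers lower the whole expression to one rol instruction on x86.
uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << (n & 31)) | (x >> (-n & 31));
}
```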
Too much magic for my taste. If the compiler will be doing that anyway, why not expose an intrinsic we can use? The SSE instruction in question is rather efficient to emulate on NEON; it only takes two instructions, vmovn_u64 and vmull_u32.
It’s the same with scalar code. When I need to rotate an integer, I normally use intrinsics instead of relying on the compiler to optimize the code. Recently, the C++ language even added these things to its standard library, in the <bit> header in C++20.
IMO, relying on such compiler optimization is fragile in the long run, for 2 reasons.
1. These are undocumented implementation details. Compiler developers don’t make any guarantees they will continue to support these things in exactly the same way.
2. Most real-life software is developed by multiple people. It’s too easy for developers to neglect comments, and slightly change the code in a way which no longer has a shortcut in the compiler.