

Binary data processing in JavaScript - varunkumar
http://blog.varunkumar.me/2012/07/binary-data-processing-in-javascript.html

======
kombine
Does Int32Array look horrible only to me? What if I want an array of my own
custom types - say vec2f? Do I have to use two Float32Arrays? JavaScript is a
horrible language where you have to trade off between performance and
expressibility. Seriously, this whole web development thing needs to be fixed.

~~~
mraleph
You can use one Float32Array: x component gets even indices, y gets odd ones.
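A minimal sketch of that interleaved layout (the helper names are illustrative, not a standard API):

```javascript
// vec2f values packed into one Float32Array:
// element i occupies indices 2*i (x) and 2*i + 1 (y).
var count = 3;
var vecs = new Float32Array(2 * count);

function setVec2(arr, i, x, y) {
  arr[2 * i] = x;
  arr[2 * i + 1] = y;
}

function getVec2(arr, i) {
  return { x: arr[2 * i], y: arr[2 * i + 1] };
}

// Both values below are exactly representable in float32.
setVec2(vecs, 1, 3.5, -2.25);
```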

The next standard might have better binary data support:

<http://wiki.ecmascript.org/doku.php?id=harmony:binary_data>

I am not sure, though, where you see a tradeoff between expressibility and
performance. Typed arrays came to our world only recently, when the needs of
WebGL gave birth to them, so they ended up being bolted on top of an existing
language.

~~~
kombine
"You can use one Float32Array: x component gets even indices, y gets odd
ones."

That is exactly my point about choosing between performance and
expressibility. But then how do I write algorithms for them in a generic
fashion? In C++ I have templates and template functions, so I can write an
algorithm that will work for any array type. And I'm not even talking about
the STL, which decouples algorithms from containers by making the latter
provide a hierarchy of iterators. Other languages like Haskell also provide
compile-time polymorphism. And here we are in 2012 - there is an ad-hoc
solution that only works in a few browsers, and a proper one is still down the
road.

------
azakai
A few things to add to this discussion about typed arrays:

1\. The list of support statuses does not seem to reflect that Float64Array
was missing from Safari until recently, and I think it might still be missing
from mobile Safari.

2\. Typed array performance of Uint32Arrays can differ from the other integer
array types: unsigned int values do not fit in the 31-bit signed integers that
most JS engines optimize ints for. This is most significant in V8, which does
not use NaN-boxing, so when it sees a big integer it makes it a boxed double,
leading to bad performance; see for example
<http://code.google.com/p/v8/issues/detail?id=2097>. It looks like the
benchmarks done here did not check those values, but they happen a lot in
practice with things like compiled C code.
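A small sketch of the value range in question:

```javascript
// Values above 2^31 - 1 do not fit the signed small-integer range most
// engines optimize for; reading them back from a Uint32Array yields a plain
// number that the engine may have to represent internally as a (boxed) double.
var u = new Uint32Array(1);
u[0] = 0xFFFFFFFF;      // 4294967295, far outside the int31/int32 range
var unsigned = u[0];    // 4294967295 - unsigned semantics are preserved
var signed = u[0] | 0;  // -1 - the same 32 bits reinterpreted as signed
```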

3\. Aside from points 1 and 2, overall typed array support and performance are
predictable and good across browsers (except IE, but hopefully IE10 will fix
that). Aside from random access there is also the .set() method, which lets
you copy large amounts of typed data efficiently (it was not benchmarked here,
but it should basically be a memcpy, so likely consistent across browsers).
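For example, a sketch of copying with .set():

```javascript
// .set() copies an entire typed array (or a region of one) in a single call -
// roughly a memcpy - instead of an element-by-element JS loop.
var src = new Float32Array([1, 2, 3, 4]);
var dst = new Float32Array(8);
dst.set(src, 2); // copy all of src into dst, starting at index 2
```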

~~~
mraleph
> _This is most significant in V8 which does not use NaNboxing, so when it
> sees a big integer it makes it a boxed double_

NaN-tagging has nothing to do with the issue you are linking to; e.g., V8 does
not allocate a boxed double for every floating-point value it reads from a
Float64Array.

Let me expand on my own comments [vegorov@chromium.org] from the issue, so
that the problem V8 has becomes more understandable.

The reason for Issue 2097 is the somewhat "patchy" support V8 has for uint32
values in optimized code. A uint32 value in JavaScript has a somewhat dual
nature: it can be treated bitwise as a signed int32 as long as it participates
only in truncating operations[1]; but once it escapes into an unsafe place
where its "unsigned-ness" can be observed, it might have to become a full
double if it does not fit into the positive part of the int32 range.

V8's optimizing pipeline does not have a proper analysis to determine where a
uint32 value can be efficiently represented as an int32 without affecting
semantics. It also does not want to pessimistically represent all values read
from a Uint32Array as doubles, so it just chooses to assume a non-truncating
int32 representation for them, and as a result the code deoptimizes when it
sees a value that does not fit [another bit that is missing here is type
feedback on the keyed load IC that would tell Hydrogen that it is too
optimistic in its assumption].

V8 actually has the same problem with x >>> 0.

[1] - examples of truncating operations are: bitwise ops, stores into integer
typed arrays, and arithmetic operations when they do not overflow beyond 53
bits and their results are used only in truncating operations.
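To illustrate the footnote, a few truncating operations sketched in plain JS:

```javascript
// Truncating operations observe only the low 32 (or fewer) bits of a value,
// so an engine may safely keep intermediates as int32 without changing the
// observable result.
var big = Math.pow(2, 32) + 5; // exceeds the uint32 range
var orTrunc = big | 0;         // 5 - bitwise OR truncates to the low 32 bits
var shiftTrunc = big >>> 0;    // 5 - unsigned shift truncates likewise

var i8 = new Int8Array(1);
i8[0] = 257;                   // a store into an integer typed array truncates
var stored = i8[0];            // 1 - only the low 8 bits survive
```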

~~~
azakai
Interesting, thanks. I am not sure I fully follow, though: why is V8 so much
slower on this than other engines that also do not have special code to handle
uint32s? We are talking slowdown factors of 10x-15x, so it isn't a minor
optimization here or there - what is the cause?

~~~
mraleph
Optimized code can't deal with a uint32 that is out of range, so it
deoptimizes, and effectively you end up running entirely _unoptimized_ code,
which is obviously much slower than optimized code... and of course this
penalizes V8 even more, because unboxed doubles and int32s are possible in V8
_only_ in optimized code.

It is true that if V8 were using NaN-tagging it would suffer less from running
entirely unoptimized code. But my point is: it doesn't have to use NaN-tagging
to run this code efficiently; it just needs to ensure it doesn't deoptimize
for nothing.

~~~
azakai
Ok, I see.

So, out of the 10-15x slowdown, how much is due to deoptimizing and how much
is due to not NaN-boxing, if it's possible to estimate that? The distinction
should be important in the case of a very large codebase whose performance is
not concentrated in a few small loops: presumably most of the time you will be
running unoptimized code (and then whether you NaN-box or not becomes
important).

~~~
mraleph
I am not sure I entirely understand. If you are running cold code, then
performance does not matter and you can tolerate quickly allocating a small
number of boxes, which will be just as quickly reclaimed by the scavenger once
you are done with them. If you are running hot code --- then it _should_ be
optimized in a way that minimizes the number of boxes produced.

In other words: ideally, an application should be running unoptimized code if
and only if that code is either cold or cannot be improved by optimization;
all other cases are bugs.

I can't split the 10-15x between deoptimization and boxing, because for V8 the
cost of "erroneous" deoptimization _includes_ the cost of boxing: you can't
have _un_boxed numbers in unoptimized code.

As I said earlier, it is true that unoptimized code heavily manipulating
doubles could become faster if V8 used NaN-tagging (or another technique that
would allow it to maintain unboxed doubles on unoptimized frames). But the
speed of unoptimized code should _not_ matter (see above).

Another thing to keep in mind is that with NaN-tagging on ia32 you pay a
memory overhead: every object slot that can contain a primitive number becomes
twice as large. This is not nice if you don't have a lot of numbers floating
around.

Overall, let me reiterate: I am not arguing against NaN-tagging. I am just
clarifying that Issue 2097 is caused by a wrong decision in the Hydrogen
pipeline, not by the fact that V8 does not use NaN-tagging.

~~~
azakai
I see now what you are saying about that issue: the lack of NaN-boxing makes
it worse, but at its core it is a deoptimization issue. Which is good; I hope
this is fixed soon (so emscripten-compiled code runs more consistently across
browsers).

> I am not sure I entirely understand. If you are running cold code, then
> performance does not matter and you can tolerate quickly allocating a small
> number of boxes, which will be just as quickly reclaimed by the scavenger
> once you are done with them. If you are running hot code --- then it should
> be optimized in a way that minimizes the number of boxes produced.

Let's say that performance matters in the application, but it is huge in code
size and all of the code matters, not just a few small parts. Would you call
all the code hot, and would V8 optimize the entire application? (i.e., how is
'hot' defined in V8?)

~~~
mraleph
Hot is currently defined as "function called more than X times" and "function
contains a loop that took a backedge more than Y times". So it is defined on a
per-function basis.

I can hardly speculate on how V8 will behave on some abstract application.
That depends very heavily on what the code looks like. But ultimately V8 will
try to optimize everything that falls under the criteria outlined above.

~~~
azakai
I see, thanks.

I ask because performance on small benchmarks has been quite good, except for
the uint32 issue mentioned above - often around 3x slower than native code -
but on very large codebases it is often much slower, and I do not know why.

~~~
mraleph
Unfortunately (as you probably know yourself) I don't have a magical answer
that would speed everything up. As for V8, there are quite a number of limits
that you might be hitting with generated code (e.g. the number of locals, the
size of the function, etc.), and there can be some bugs or not-done-yet things
affecting performance. Profiling and looking at the generated code (and filing
bugs) is the only suggestion I can give here.

~~~
azakai
Ok, yeah, I understand there is no simple answer here. Thanks for your
clarifications earlier.

------
mraleph
A couple of things to be aware of from a V8 perspective:

\- To make the "creation" test fairer for normal arrays, they should be
preallocated with new Array(arraySize) if arraySize does not exceed 90000.
This will ensure that you are not wasting time reallocating the backing store
as it grows.

\- It was pointed out to the test author a year ago that having a single
test_SMTH function and calling it with different array types causes it to
become polymorphic --- which affects the generated code. V8 has become much,
much better at handling polymorphism of this sort, but if you create a
test_SMTH_ARRAYTYPE for each combination of test and array type, you'll see
results not distorted by the polymorphism.
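A sketch of the monomorphic variant (the names are illustrative, not from the benchmark): instead of one shared test function, give each array type its own function object, so each gets its own type feedback.

```javascript
// Each call to makeSumTest returns a fresh function object, so every array
// type exercises its own a[i] access site instead of one shared,
// polymorphic site.
function makeSumTest() {
  return function (a) {
    var total = 0;
    for (var i = 0; i < a.length; i++) total += a[i];
    return total;
  };
}

var test_sum_float32 = makeSumTest();
var test_sum_int32 = makeSumTest();

var sumF = test_sum_float32(new Float32Array([1, 2, 3])); // 6
var sumI = test_sum_int32(new Int32Array([4, 5]));        // 9
```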

~~~
varunkumar
Any idea as to how other JS engines handle polymorphism?

~~~
mraleph
I am not very familiar with the internals of other engines. AFAIK both
SpiderMonkey and Safari have polymorphic inline caches for a.foo property
access sites. I can't say whether they leverage PICs for a[i] access sites.

But from a quick glance over the JavaScriptCore sources, they do not seem to
handle any kind of polymorphism for a[i] sites in their new optimizing
compiler (aka the DFG). Additionally, they seem to handle polymorphism for
a.foo sites in the DFG only if foo always has the same offset in all the
structures the site has seen. I might be wrong about this, though; it was just
a quick flight over the source, without even checking it out to disk.

~~~
varunkumar
Yes, other engines do not seem to handle polymorphism well. I just amended the
code to use a separate test_something_arraytype for each array type, and it
turns out that the performance of the other browsers has improved
significantly; they now look closer to Chrome. Check out the updated charts in
the blog post.

~~~
mraleph
Thanks for the update!

I suggest you add a note to the text indicating that you are no longer using
the original tests. It would be even cooler if you published the results of
both the original (polymorphic) and your (non-polymorphic) runs.

I find the results for "random read" particularly interesting. I need to take
a look at the generated code. It might be that there is something in the way
V8 compiles the integer modulo operation that makes it slower than
SpiderMonkey.

[Though for this test to be comparable across browsers, you should really fix
the function init_randlist not to use Math.random but a pseudo-random
generator with a _fixed_ seed instead. Uncontrolled sources of randomness
should be discouraged in microbenchmarks, as they lead to flaky,
unreproducible benchmarks. Though I don't think that's the reason here.]

------
jacobolus
This test should also add strings where the least significant byte of each
character is used as a byte (i.e. '\x00' = 0 up through '\xff' = 0xff). For
many operations, these end up being faster than arrays of numbers.
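A sketch of that representation:

```javascript
// One character per byte: the character code (0-255) is the byte value.
var bytes = "\x00\x7f\xff";
var first = bytes.charCodeAt(0); // 0
var last = bytes.charCodeAt(2);  // 255
// Strings are immutable, so "writing" means building a new string:
var appended = bytes + String.fromCharCode(0x10);
```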

~~~
varunkumar
But the immutable nature of strings should slow down the whole conversion,
right? Correct me if I am wrong.

------
chris_wot
jDataView handles this already. Except for Internet Explorer, of course, which
can't handle arrays with null characters in them. No matter what you do, IE
won't let you see beyond the null char. No other browser has this issue.

I once pointed this out, but got a "holy edge case, Batman!" [1], and got
voted down to -1...

1\. <http://news.ycombinator.com/item?id=3953368>

------
tezza
Hmm, I must have a differing understanding of 'support':

    Global user stats*: Support: 52.22%

Yet no version of IE supports it; that ranks support close to 0% IMO.

I'm no IE lover, but lots of users still have it as their main browser.

~~~
masklinn
> I must have a differing understanding of 'support'

One which makes no sense apparently.

> Yet no version of IE supports it

Which is why _global user support_ is 52%: 48% of users have browsers which do
not implement typed arrays.

------
chmike
What is the status of the Blob data type?

