
Why iPhone Xs performance on JavaScript is so good - Bootvis
https://twitter.com/codinghorror/status/1049082262854094848?s=21
======
faitswulff
@steipete puts it pretty well: JavaScript really made it. We now tweak CPUs to
make it faster.

[https://twitter.com/steipete/status/1047415826083729408](https://twitter.com/steipete/status/1047415826083729408)

~~~
tzs
That was a trick Dave Fotland used in the '80s to make his Go program, "Many
Faces of Go", faster. He donated some key code from his evaluation function to
SPEC and they used it as one of the parts in the SPEC integer CPU benchmarks.
CPU makers tweaked CPUs to do well on those benchmarks, and hence on his code.

~~~
chaboud
I've talked with him about those days... Though I haven't chatted with him
about Go since AlphaGo flipped the whole world over.

I worked very hard on pulling the same trick at Sony. We'd made Vegas
scriptable, and, once it was part of Bapco's Sysmark, we received immense
support from chip vendors. Machines, engineers, tools, instructions added to
instruction sets... It was a nutty time.

Of course, the biggest wins worked everywhere, like the 6755399441055744
double-precision rounding trick, or the use of memory-mapped file regions to
cache the rendering tree. Still, becoming part of a performance benchmark is a
great way to get attention.
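
The magic-number trick referenced above can be sketched in a few lines of C. This is a hedged illustration of the general technique, not the actual Vegas code: the constant 6755399441055744.0 (2^52 + 2^51) targets doubles, and the bit-reading step assumes round-to-nearest mode, a little-endian host, and inputs comfortably inside the int32 range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Magic-number rounding: adding 2^52 + 2^51 to a double forces its integer
   part into the low bits of the mantissa, so the rounded result can be read
   back directly instead of paying for a float-to-int conversion instruction.
   Assumes round-to-nearest FP mode, a little-endian host, and |x| well
   inside the int32 range. */
static int32_t fast_round(double x)
{
    x += 6755399441055744.0;      /* 2^52 + 2^51 */
    int32_t lo;
    memcpy(&lo, &x, sizeof lo);   /* low 32 bits of the mantissa */
    return lo;
}
```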

------
olliej
This instruction cannot result in a significant performance improvement for
any JS code that isn't absolutely perf-bound on just converting floats to
integers. If your code is successfully making that your bottleneck, your code
has problems. None of the major benchmarks (I can't even think of micro-
benchmarks that could really achieve this) spend significant time doing
double-to-integer conversions.

And as a nail in the coffin for this nonsense: JavaScriptCore does not use or
even emit this instruction:
[https://mobile.twitter.com/saambarati/status/104920213252247...](https://mobile.twitter.com/saambarati/status/1049202132522479616)

~~~
mschuetz
> If your code is successfully making that your bottleneck your code has
> problems.

I disagree. My use cases involve loading, processing, rendering, and saving
point clouds with millions to billions of points. For precision and file size
reasons, coordinates are stored as 3 x int32 instead of 3 x float or 3 x
double. Every time I load the points from disk I have to convert them to
floats for rendering, or doubles for processing. Vice-versa for storing the
processed results to disk.

Just because you don't need it, doesn't mean nobody needs it.

~~~
maccard
> For precision and file size reasons, coordinates are stored as 3 x int32
> instead of 3 x float or 3 x double

I get storing them as float instead of double, but int instead of float for
size/precision doesn't really make sense to me. A float is 32 bits, so it will
be the same size as an int32. If it's precision you're worried about, you're
going to lose that precision as soon as you convert from int to float to use
it.

~~~
mschuetz
Single-precision floats lose precision as coordinates get larger, hence
"floating" point. Point cloud data often consists of outdoor/aerial scans
covering multiple kilometers, and floats cannot accurately represent such
large coordinate values. Translating to the origin and/or rescaling only
works to a limited extent and isn't a very robust solution.

Integers, on the other hand, can be used to represent coordinates with fixed
precision. E.g. if you want to store coordinates in millimeter precision, you
multiply your double-precision meter coordinates by 1000 and cast the result
to 32-bit integers.

> If it's precision you're worried about,you're going to lose the precision as
> soon as you convert it from an int to a float to use.

Yes, that is why you convert ints to doubles for processing and only use
floats for tasks where limited accuracy is okay, e.g. rendering.

~~~
maccard
> Single-precision floats lose precision as coordinates get larger, hence
> "floating" point.

I've plenty of experience with floating point numbers.

> Translating to origin and/or rescaling only works to a limited extent and
> isn't a very robust solution.

Curious as to why you think this isn't the solution? Double precision just
kicks the can further down the road, whereas a fixed-point offset origin gives
you the "best" of both (albeit with slightly more code).

> Yes, that is why you convert ints to doubles for processing and only use
> floats for tasks where limited accuracy is okay, e.g. rendering.

Right, so you store as 32 bit values, and do the processing in double
precision, but convert to single precision for rendering (from double
precision)? So you were never going to store them as float on disk.

~~~
olliej
Presumably they don’t think that’s the problem because doubles already exceed
the precision of the integer types they’re using.

I do get this decision - point cloud data for scanned geography tends to be
uniformly separated over a large area, and a 32-bit float very rapidly starts
to accrue significant error if the geometry is large. Presumably ye olde
fixed-point arithmetic would be more precise (for this specific purpose) than
a 32-bit float (which has thrown away significant amounts of precision for the
exponent).

But again, a 64-bit float has 53 bits of precision, which is larger than the
int32 in their data format. It's also probably enough precision for more or
less anything outside of the extreme ends of science :)
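
A quick sketch of the precision argument, using an invented 100 km example: at that magnitude a single-precision float's step size is about 8 mm, so a millimeter offset vanishes, while a double's 53-bit mantissa resolves it easily.

```c
#include <assert.h>

/* Near 100 km (expressed in meters) a float's ulp is 2^-7 m (~8 mm), so
   adding 1 mm rounds away to nothing; a double easily resolves it. */
static int float_keeps_mm(void)
{
    float f = 100000.0f;
    float g = f + 0.001f;   /* stored back at float precision */
    return g != f;          /* 0: the millimeter is lost */
}

static int double_keeps_mm(void)
{
    double d = 100000.0;
    double e = d + 0.001;
    return e != d;          /* 1: the millimeter survives */
}
```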

------
jjgod
Jeff Atwood/DHH’s conclusion regarding Speedometer 2 is not completely true
according to Filip Pizło
([https://webkit.org/blog/author/fpizlo/](https://webkit.org/blog/author/fpizlo/)):
[https://twitter.com/filpizlo/status/1049132270773198848](https://twitter.com/filpizlo/status/1049132270773198848)

In this case I’d trust Filip more :)

~~~
codinghorror
As I said in my reply to that, it could also be in combination with the much
faster memory / caches on A12 -- see
[https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-
re...](https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-
unveiling-the-silicon-secrets/3)

------
zbjornson
[https://developer.arm.com/docs/100069/latest/a64-floating-
po...](https://developer.arm.com/docs/100069/latest/a64-floating-point-
instructions/fjcvtzs) (FJCVTZS: Floating-point Javascript Convert to Signed
fixed-point, rounding toward Zero) is the instruction they're talking about.
One main difference (vs FCVTZS) is that it sets the Z flag depending on
whether the conversion was exact, and it's mod 2^32 on overflow.

~~~
simias
I think this is the point where it starts being hard to argue that ARM is RISC
anymore. x86 has BCD support and special facilities for handling NUL-terminated
strings (i.e. C strings), but now ARM has instructions with "Javascript" in the
name. At least Jazelle was a separate extension.

I mean I suppose it makes sense for them to do that, especially if the
performance benefits are as huge as outlined in this tweet, but damn it feels
dirty.

I assume the reason for this is that in JS, as far as I know, all numbers are
stored as floats, right? So you keep casting everywhere you need an integer?
I assumed that JS implementations were a bit more clever and kept them as ints
whenever possible, but maybe it's not as simple as I had imagined.

~~~
zbjornson
Not all numbers are stored and operated on as doubles, no. They are treated as
if they were, though. V8 and SpiderMonkey (idk about the others) store and
handle integers that fit in 32 bits as 32-bit integers.
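
The small-integer ("Smi") representation mentioned above can be sketched with simple pointer tagging. This is a toy illustration of the concept on a 64-bit build, not V8's actual bit layout:

```c
#include <assert.h>
#include <stdint.h>

/* Toy pointer tagging: integers that fit in 32 bits live inline in a
   pointer-sized word with the low bit clear; heap objects (e.g. boxed
   doubles) would carry the low bit set. Illustrative only -- real engines
   choose tag bits and widths differently. */
typedef uintptr_t Value;

static int is_small_int(Value v)
{
    return (v & 1) == 0;              /* tag bit clear => inline integer */
}

static Value box_small_int(int32_t i)
{
    return (Value)(uint32_t)i << 1;   /* shift the payload past the tag bit */
}

static int32_t unbox_small_int(Value v)
{
    return (int32_t)(uint32_t)(v >> 1);
}
```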

------
blinkingled
Without clicking the link i knew this was going to be Jeff Atwood - something
about iPhone's JS performance seems to crack him up perpetually! :) Also see
Dan Kaminsky's reply -

"I took a quick look at Speedometer 2.0 and it seemed to be driven by the
speed of the browser cache implementation, as in how much was explicitly in
memory, what poked the file system, how async was implemented. Not CPU bound."

~~~
xrd
That's pretty serious, right? Changes the entire conversation if this is not
truly improving execution speed?

~~~
blinkingled
Yes - and even if it does, it doesn't matter for this particular benchmark,
which isn't CPU bound.

------
thought_alarm
2008: I never thought I'd be walking around with a full Unix machine in my
pocket.

2018: I never thought I'd be walking around with a full Symbolics Lisp Machine
in my pocket.

~~~
blihp
2028: Why does battery life still suck?

------
hden
So Apple has its own browser, its own JIT compiler(1), which emits its own
instruction set, which then runs on its own CPUs.

Heck, let's propose the JPU (JavaScript Processing Unit) for server-side code.

1\. [https://webkit.org/blog/5852/introducing-the-b3-jit-
compiler...](https://webkit.org/blog/5852/introducing-the-b3-jit-compiler/)

~~~
ridiculous_fish
Imagine if Apple had its own operating system running its own static compiler!

~~~
cm2187
With the end of Moore's law, that may be the way forward: making CPUs with
more of the software implemented in hardware.

------
sercand
According to [https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-
re...](https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-
unveiling-the-silicon-secrets/4) the reason is the new memory subsystem. The
new chip is not just fast at Speedometer 2 but ~40% faster at almost all other
benchmarks. Also, all iPhone devices are faster with iOS 12's JavaScriptCore.

~~~
codinghorror
Per
[https://images.anandtech.com/doci/13392/SPEC2006-eff_575px.p...](https://images.anandtech.com/doci/13392/SPEC2006-eff_575px.png)

SPECint2006 -- 36.93 --> 44.92 is about 20% better

Speedometer2 -- 90 --> 125 is 38% better

That is nearly 2x what SPECint would predict. (It's also not changes in Mobile
Safari / iOS 12 because every device benchmarked was on the same version of
iOS 12).

------
breakingcups
This says WebKit doesn't emit these instructions yet:
[https://bugs.webkit.org/show_bug.cgi?id=184023](https://bugs.webkit.org/show_bug.cgi?id=184023)

------
nikkwong
Kind of ironic given that Safari continues to lag behind in implementing web
APIs. Heck, they still don't even support IntersectionObserver, which was
introduced in like 2014. I really do share the sentiment of others in calling
Safari the new IE; developing for Safari on iOS is a total drag. I guess the
performance and specs teams must be compartmentalized :)

------
vbezhenar
I think it's an inevitable path of processor evolution. When it's hard to
increase performance, processors will include more useful bits of
functionality for popular runtimes.

------
wruza
As if double to int/ptr was js-specific.

~~~
angelsl
No, but this one does it the way JavaScript wants it. FCVTZS has always been
there. FJCVTZS[1] is new.

[1]:
[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0801g/hko1477562192868.html)

Did someone say ARM is RISC?

~~~
userbinator
Indeed, it could be said that ARM _not_ being pure RISC is what keeps it
competitive with x86. I wonder how long it'll be before x86 also gets a
similar set of instructions...

~~~
Dylan16807
The instruction is just specifying particular semantics, that doesn't really
disqualify it from being RISC. The instruction doesn't do _more_ than the
baseline variant.

------
im3w1l
If there are any twitter devs here, clicking Greg Parker's tweet breaks the
back button for me.

~~~
zenexer
What browser? Working in latest stable Chrome on Windows.

~~~
im3w1l
Detailed steps: Firefox on Windows.

1\. Click through to Jeff's tweet from Hacker News.

2\. Click Greg's tweet.

3\. Click back (works; we get to Jeff's tweet).

4\. Successive back clicks never get to Hacker News.

------
jayd16
Fixing JavaScript with hardware? It's not exactly genius, but it seems worth it.

Only Apple had the #courage.

~~~
kbumsik
ARM did it, not Apple.

~~~
JohnBooty
Apple did it, not ARM.

Apple designs the chips:

[https://en.wikipedia.org/wiki/Apple-
designed_processors](https://en.wikipedia.org/wiki/Apple-designed_processors)

Specifically,

    
    
        Apple has an ARM "architectural license", which is for
        companies "designing their own CPU cores using the ARM
        instruction sets. These cores must comply fully with the
        ARM architecture. Companies that have designed cores that
        implement an ARM architecture include Apple, AppliedMicro,
        Broadcom, Cavium (now: Marvell), Nvidia, Qualcomm, and
        Samsung Electronics."
    

[https://en.wikipedia.org/wiki/ARM_architecture#Licensing](https://en.wikipedia.org/wiki/ARM_architecture#Licensing)

~~~
kbumsik
Again, ARM did it, not Apple.

ARM designed that JS-optimized instruction. Apple just follows the ARM
specification (and has to) in its own implementation.

Don't confuse ISA design with chip implementation design.

~~~
dannyw
Don't think for a sec that ARM designs ISAs without talking to vendors like
Apple.

------
forrestthewoods
Hardware instructions created to handle JavaScript specifics makes me so very
sad. What a terrible state of affairs.

~~~
FridgeSeal
I agree.

All the things we could be doing and we're making CPU improvements for the
flaming pile of hot trash that is JS? Why?

~~~
koffiezet
Maybe because it's a considerable part of the code that CPUs are executing
these days? Certainly UI stuff that should be responsive and fast.

Even many 'native' apps, mobile or desktop (Electron), are written in
JavaScript, and how much of your time do you spend in your browser or Electron
apps every day? Think VS Code, Atom, Spotify, Slack, Facebook, Reddit,
Twitter, ...

