

Introducing the JetStream benchmark suite - ingve
https://www.webkit.org/blog/3418/introducing-the-jetstream-benchmark-suite/

======
nnethercote
The situation where browser vendors create browser benchmarks isn't healthy.
And SunSpider is a total joke at this point, full of toy programs and
microbenchmarks that have been optimized to death, so I'm disappointed that
JetStream is including most of it.

(For example, did you know that every major JS engine now has a daylight
savings offset cache, something which is entirely useless for any real code,
but substantially speeds up the date benchmarks in SunSpider? Bleh.)

The state of C/C++/Fortran benchmarking has been much better -- not perfect,
but much better -- for years, thanks to the SPEC benchmarks, which are created
by an independent company that invites submissions for real-world programs
that are suitable for benchmarks.

I wish there was something similar for JavaScript and HTML, though the fact
that you have to pay for copies of the SPEC benchmarks isn't good.

~~~
om2
Did you read the article? It explained why we used fast-running tests from
SunSpider combined with longer-running tests from Octane and the LLVM test
suite. You may have a fair point about the date tests being mildly silly. But
many of the SunSpider tests (for instance 3d-raytrace, tagcloud, crypto-md5)
are pretty realistic workloads, if fast-running. They are arguably more
realistic than some subtests of Octane (like splay).

I agree with you that a neutral and fair benchmark would be good.
Unfortunately, the non-browser-vendor-created browser benchmarks out there
tend to be uniformly much worse. I think the only way to get something that is
high quality and also neutral is to create a neutral group that multiple
browser vendors join.

~~~
nnethercote
> But many of the SunSpider tests (for instance 3d-raytrace, tagcloud, crypto-
> md5) are pretty realistic workloads, if fast-running. They are arguably more
> realistic than some subtests of Octane (like splay).

I stand by my opinion that SunSpider is a poor benchmark suite that should
have been retired years ago.

Octane also has some bad benchmarks, and I agree that splay is one of them.
I've written about this before at
[https://blog.mozilla.org/nnethercote/2012/08/24/octane-minus-v8/](https://blog.mozilla.org/nnethercote/2012/08/24/octane-minus-v8/).

And you admit that at least some of the SunSpider and Octane benchmarks are
bad, and yet you included them anyway? Argh.

I must say I also dislike this trend of creating benchmark suites that have
entirely new names yet consist mostly of tests from prior suites.

> I think the only way to get something that is high quality and also neutral
> is to create a neutral group that multiple browser vendors join.

I agree 100% with you there. Robohornet was an attempt at this, but was
fatally flawed by the fact that the benchmarks themselves were terrible.

~~~
om2
"Bad" is relative. Some tests are not as reflective of real-world workloads as
some other tests. But they still test real aspects of VM performance. I don't
think these tests force you to do things that are actively counterproductive,
and if you did, the weight of the other tests would punish you. The scoring
uses a geometric mean, so no single test can dominate the overall score.
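(For the curious, here's a quick sketch of why a geometric mean resists single-test dominance. This is illustrative Python, not JetStream's actual scoring code:)

```python
import math

def geometric_mean_score(scores):
    """Combine per-test scores (higher is better) into one overall score
    using the geometric mean: the n-th root of the product."""
    product = 1.0
    for s in scores:
        product *= s
    return product ** (1.0 / len(scores))

# With an arithmetic mean, an 8x win on one of four tests would drag the
# average from 100 up to 275. With a geometric mean, multiplying any one
# score by k only multiplies the total by k**(1/n), no matter how large
# that test's raw number is.
balanced = geometric_mean_score([100, 100, 100, 100])  # 100.0
one_huge = geometric_mean_score([800, 100, 100, 100])  # ~168.2, not 275
```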

Did Robohornet get non-Google browser vendor involvement? It doesn't seem so
currently, and I don't recall any particular push to get involvement. It seems
like it was single-browser-vendor in practice (and also not very well
designed).

I will hedge this by mentioning that there is a totally neutral benchmark that
is actually good and more realistic than any of these. JSBench is created from
the JS in real websites, was not created by any browser vendor, and has had
advice and input from multiple vendors.
[http://jsbench.cs.purdue.edu](http://jsbench.cs.purdue.edu) It's not very
good at modeling advanced / bleeding-edge web apps, though. Also, Safari spanks
other browsers so hard on this benchmark that it's not a useful optimization
tool.

~~~
nnethercote
> Did Robohornet get non-Google browser vendor involvement?

There was a stewardship committee that had lots of non-Google people on it:
[https://github.com/robohornet/robohornet/wiki/Committee-Membership](https://github.com/robohornet/robohornet/wiki/Committee-Membership).
There was even a Mozilla person on it at one point (someone I didn't know; I
can't remember who), but they asked to have their name removed once we
discovered and explained how bad the benchmarks were. It turns out that
person had only had tangential involvement anyway.

But I just found the technical advisory committee, which consisted of three
Google Chrome people:
[https://github.com/robohornet/robohornet/wiki/Technical-Advisors](https://github.com/robohornet/robohornet/wiki/Technical-Advisors). So
it was more Google-heavy than I realized.

------
cromwellian
Is it just me, or do these tests seem more heavily weighted towards C compiled
to JS than most other suites? Emscripten tests seem to make up 1/4 of the
suite. That's a good exercise for the FTL JIT, but Emscripten-compiled code is
rather atypical of the majority of web apps and isn't idiomatic JS.

------
wolf550e
This is also interesting: [http://browserbench.org/JetStream/in-depth.html](http://browserbench.org/JetStream/in-depth.html)

