

Google Criticizes JavaScript Benchmarks, Modifies Sunspider - peternorton
http://www.conceivablytech.com/7210/products/chrome-declares-sunspider-browser-benchmark-outdated

======
othermaciej
I'm pretty surprised that Google chose to fork SunSpider without even
suggesting their change to the maintainer (me) first, let alone providing a
patch. That's not to say I necessarily agree that running each test 50 times
is a better test, but it seems somewhat outside open source norms of courtesy
to fork and then blog about it without even proposing your change first.

~~~
pbiggar
I would generally agree, but to any observer it looks like SunSpider isn't
being actively developed: <http://bit.ly/lScLRi>

\- my correctness bugs have gone unanswered since August,

\- other Mozillians' bugs have not been fixed either (at least one since 2009),

\- the SunSpider svn logs show that there is no work being done on the
sunspider tests themselves, since the release of 0.9.1 in January 2010,

\- the last fix to the harness was 5 months ago (which broke shell tests for
everyone except JSC).

If you [1] want to remain the "maintainer" you have to show that you're
maintaining it. While this isn't how I would have gone about it (and I was
thinking about it), I can't really fault Google for this.

[1] And by "you", I mean "Apple", not you personally.

~~~
othermaciej
If they filed bugs and got impatient, I'd understand. This blog post is the
first I've heard of this particular idea though.

Your bugs are good ones and should be addressed. I will try to do an update
soon. FYI, if you want a patch approved, you should flag it for review (and
make sure it follows the style guide); your patch would have been noticed much
sooner had you flagged it.

~~~
pbiggar
> If they filed bugs and got impatient, I'd understand.

I think it's acceptable to look at bugs others have filed to determine whether
to file bugs.

> I will try to do an update soon.

Great, thanks! (I know you won't like this, but it sort of validates Google's
approach. By making a big splash about this, they accomplish far more than
they would have by filing bugs and waiting, or contacting you directly.)

> FYI, if you want a patch approved, you should flag it for review

Funnily enough, we're dealing with problems like this at Mozilla too - a
contribution comes in, but is missed because the review flag isn't set, or it
needs work, or whatever. We're adding dashboards and metrics and things to
avoid this happening at Mozilla - and it happens a lot.

I'll mark them for review now.

------
kalleboo
I was hoping they'd discuss how SunSpider doesn't test the DOM, Canvas or
other things that web apps commonly do. I haven't felt my apps have really
been performance constrained by basic JS language performance, but rather DOM
tree searches, element creation, innerHTML parsing, etc. Benchmarking a
JavaScript Raytracer and encryption engine is a noble cause for advancing JS
going forward (especially with projects such as node.js gaining momentum), but
it doesn't feel very relevant to me, right now, today.

What other benchmarks are there? Apple's Sunspider, Google's V8 Benchmark
Suite, Mozilla's Kraken, that JSLint-based one... Does anyone know of a good
link where someone has deconstructed all these JS benchmarks and what parts of
a browser they actually test?

~~~
kenjackson
Yes, running a benchmark that doesn't test the right things 50x more doesn't
really fix the core issue.

I am curious though why IE gets slower. At worst it should run about the same
speed.

~~~
mbelshe
The great thing and the worst thing about benchmarks is that vendors optimize
to them :-)

IE9 claims speed, but the only benchmark it performs better than other
browsers on is the SunSpider benchmark. If it is so fast, why does it only do
well on a single benchmark? It has already been documented that trivial
changes to SS make IE9's performance change dramatically:
[http://blog.mozilla.com/rob-sayre/2010/11/16/reporting-a-bug...](http://blog.mozilla.com/rob-sayre/2010/11/16/reporting-a-bug-on-a-fragile-analysis/)

Personally, I believe SunSpider is just too easy to game. Historically, this
problem comes up now and then with compiler vendors and benchmarks. The only
solution is to have long-running benchmarks which exercise many parts of the
JS engine so that gaming is not so easy.

~~~
kenjackson
Actually the IE9 SS issue was a bug. It's since been fixed and no longer
displays this weird behavior in IE9.

IE9 also does well on other benchmarks, like the Facebook graphics
benchmark, where I think the only browser to beat it was FF4.

~~~
mbelshe
There were multiple issues. There was a straight up bug, which was fixed. But
the perf delta remains.

~~~
mbelshe
You can see this for yourself - run the test referenced in Rob's post above.
IE9, right now, reports a 13-14x slowdown on that test with trivial changes.

------
azakai
There are several inaccuracies in this article:

* "Mozilla is tinkering with a SpiderMonkey JavaScript engine that supports Google’s V8 core and is currently developed as “V8Monkey”" - I assume they mean the side project some people are doing, to let SpiderMonkey support the V8 API, which would let it be used say in node.js. The article implies that that is the latest "buzz" in JS engines, but there are plenty of actual recent stories in that area, and all major JS engines are constantly improving.

* "Mozilla Firefox 5 Beta Build 1" - No such thing exists. Firefox 5 is in Aurora, which is pre-beta, and there is no Firefox Beta at the moment (just nightly, aurora, and stable).

* They use Chromium nightly, but not a Firefox nightly (Firefox 6), which is odd. For the other browsers - Opera and IE - I don't know if nightly builds are available, but for Firefox they are.

As to the content, I agree with Google that it is interesting to see
'steady-state' results from running a benchmark many times. For some web
pages, that clearly matters. However, the normal SunSpider is still very
interesting - it says something about the speed of smaller web pages that try
to load quickly, but still need some amount of JavaScript. Quick loading and
responsiveness are very important there, and in fact that is exactly the sort
of thing Google usually stresses in its own websites, so it is ironic that
Chrome is the slowest there. But in any case, both benchmarks are interesting.
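
The two measurement modes can be kept apart with a small timing harness. What follows is a minimal Python sketch, not the SunSpider harness itself: `time_runs`, `summarize`, and `sample_workload` are hypothetical names, and the workload is a stand-in rather than a real SunSpider test. It reports the first (cold) run separately from the mean of the remaining runs, which approximates the steady state that 50 repetitions would emphasize.

```python
import time
from statistics import mean


def time_runs(workload, runs):
    """Return the wall-clock duration in seconds of each of `runs` calls."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations, warmup=1):
    """Split timings into the cold first run(s) and the steady-state rest."""
    cold = durations[:warmup]
    steady = durations[warmup:]
    return {"cold_mean": mean(cold), "steady_mean": mean(steady)}


def sample_workload():
    # Hypothetical stand-in for a single benchmark test.
    return sum(i * i for i in range(10_000))


timings = time_runs(sample_workload, runs=50)
report = summarize(timings)
print(f"cold: {report['cold_mean'] * 1000:.3f} ms, "
      f"steady: {report['steady_mean'] * 1000:.3f} ms")
```

In an engine with a JIT the two numbers can differ substantially; in plain CPython they will usually be close, so the sketch only illustrates the bookkeeping.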

~~~
magicalist
ironic maybe, but when the difference between the best and worst performers
over the entire benchmark is 3 frames at 60fps, it isn't unreasonable to think
of sunspider as indicative of not much any more.

as js engine improvements are starting to level off, the tradeoffs that have
to be made become more complex. optimizing for one case can cause a regression
in another (just take a look at the heuristics used for when tracing should
kick in for trace-/jaegermonkey). at some point, insisting on monotonically
decreasing times for a test that takes a quarter of a second becomes pretty
much akin to the megahertz or megapixel races.
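
For scale, the "3 frames at 60fps" figure can be converted to milliseconds. This is only a worked piece of arithmetic; `frames_to_ms` is a hypothetical helper name.

```python
def frames_to_ms(frames, fps):
    """Convert a number of frames at a given refresh rate to milliseconds."""
    return frames * 1000.0 / fps


gap_ms = frames_to_ms(3, 60)
print(f"3 frames at 60 fps = {gap_ms:.0f} ms")  # prints "3 frames at 60 fps = 50 ms"
```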

~~~
ErikCorry
I agree, but I do have one nitpick: I've seen this suggestion several places,
but I don't really think it's accurate to say that JS engine improvements are
levelling off. See <http://blog.chromium.org/2010/12/new-crankshaft-for-v8.html>.
Improvements to SunSpider scores, on the other hand, are
levelling off, as you note.

------
justincormack
So you run the tests a lot of times so the JIT compiler kicks in and is
guaranteed to compile the code. I can see that makes it run faster, but not
sure it makes it any more of a real benchmark...

------
js2
[http://blog.chromium.org/2011/05/updating-javascript-benchma...](http://blog.chromium.org/2011/05/updating-javascript-benchmarks-for.html)

------
natmaster
We already knew sunspider was outdated last year:
[http://weblogs.mozillazine.org/asa/archives/2010/10/some_sun...](http://weblogs.mozillazine.org/asa/archives/2010/10/some_sunspider_numbe.html)

It's funny that Google should criticize a benchmark for 'not reflecting the
real world' when their own V8 benchmark suite includes a machine-translated
Scheme test (how many websites actually do that?).

Making the sunspider test run more times doesn't make it any better; all it
does is tune the benchmark to look better for their particular JIT settings.
They spend more time compiling and optimizing the javascript, which can only
be taken advantage of after many runs, rather than being smart and balancing
optimization time against how much gain you get from actually running with the
optimizations.
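
The tradeoff between optimization time and run-time gain can be written as a simple cost model. The sketch below is illustrative only: the linear model, the function names, and every number in it are assumptions made up for the example, not measurements of any real engine.

```python
import math


def total_time(compile_ms, per_run_ms, runs):
    """Total cost: a one-off compile cost plus a fixed cost per run."""
    return compile_ms + per_run_ms * runs


def break_even_runs(extra_compile_ms, saved_per_run_ms):
    """Smallest run count at which the heavier optimization stops losing."""
    if saved_per_run_ms <= 0:
        raise ValueError("optimization must save time per run to break even")
    return math.ceil(extra_compile_ms / saved_per_run_ms)


# Hypothetical engines: a cheap baseline compiler and a slower optimizing one.
baseline = {"compile_ms": 1.0, "per_run_ms": 10.0}
optimizing = {"compile_ms": 41.0, "per_run_ms": 6.0}

extra_compile = optimizing["compile_ms"] - baseline["compile_ms"]
saved_per_run = baseline["per_run_ms"] - optimizing["per_run_ms"]
print(break_even_runs(extra_compile, saved_per_run))  # prints 10

for runs in (1, 50):
    print(runs, total_time(runs=runs, **baseline), total_time(runs=runs, **optimizing))
```

Under these made-up numbers the baseline wins for a single run and the optimizing configuration wins at 50 runs, which is the effect the comment above attributes to the modified benchmark.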

~~~
ryanpetrich
JavaScript as a target language isn't unheard of--Google Web Toolkit compiles
Java to JavaScript. You're right in that machine translating Scheme to
JavaScript is very uncommon.

------
itissid
Should these tests not be a black box? If you know these tests you can cheat
on them, right? So the idea of making a black box benchmark is that a closed
system randomly chooses tests to run (without you knowing which ones or their
nature), runs them, and just gives you the results as a composite metric to
tell you how good the engine was. And put in an extra restriction that you
can't test more than once every week (say), or some time frame like that, so
that no one can test infinitely to game it. Folks into machine learning would
know what I am talking about here, right?
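
A rough sketch of that idea, assuming a hidden pool of tests, a random sample per submission, a geometric mean as the composite metric, and a one-week rate limit per submitter. The class, its parameters, and the `engine` callable (which takes a test and returns a positive score) are all hypothetical; the choice of geometric mean is also an assumption, since the comment does not say how the composite is formed.

```python
import random
import time
from statistics import geometric_mean

WEEK_SECONDS = 7 * 24 * 60 * 60


class BlackBoxBenchmark:
    """Runs a random, undisclosed subset of tests and returns one number."""

    def __init__(self, hidden_tests, sample_size,
                 min_interval=WEEK_SECONDS, rng=None):
        self._tests = list(hidden_tests)
        self._sample_size = sample_size
        self._min_interval = min_interval
        self._rng = rng if rng is not None else random.Random()
        self._last_run = {}  # submitter -> time of last accepted submission

    def run(self, submitter, engine, now=None):
        now = time.time() if now is None else now
        last = self._last_run.get(submitter)
        if last is not None and now - last < self._min_interval:
            raise PermissionError("rate limit: one submission per interval")
        self._last_run[submitter] = now
        # The submitter never learns which tests were chosen.
        chosen = self._rng.sample(self._tests, self._sample_size)
        scores = [engine(test) for test in chosen]
        return geometric_mean(scores)


bench = BlackBoxBenchmark(hidden_tests=range(1, 11), sample_size=3,
                          rng=random.Random(0))
first_score = bench.run("vendor-a", engine=lambda test: 2.0, now=0)
```

A second submission from the same submitter within the week raises `PermissionError`, so the composite score is the only feedback available between attempts.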

