JavaScript engines (SpiderMonkey, V8, et al.) have seen a tremendous amount of optimisation work on their VMs over the past several years.
Unless I'm thinking about this all wrong, doesn't this just show that SpiderMonkey currently does a bad job of optimising asm.js (or at least the type of asm.js produced by non-JIT PyPy.js)?
I suppose? Define 'bad job' - asm.js is super new and it's a very low level representation of an application. From what I know about it (and my experience trying to produce optimal jitted JS without using asm.js at runtime), it's very easy to produce valid, fast-looking JS that turns out to run really slow.
A lot of the problems here require global optimization - thinking about high-level dataflow, object shapes, types, and so on. asm.js eliminates some of those problems, but not all of them: if you bounce between integer and float a lot, that will cost you more in asm.js; if you miss cache and branch-mispredict a lot, that will cost you more in asm.js; and so on. A lot of the optimizations you get 'for free' in a normal JS environment aren't meaningful for asm.js, so you may have to do a lot more work in your compiler (or in this case, JIT) to hand it code that is already high quality.
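To make the int/float bouncing concrete, here's a hand-rolled sketch in the asm.js style (module and function names are made up, and it's written by hand rather than emitted by a real compiler). Every crossing between int and double has to be spelled out as an explicit coercion - `+x` to go to double, `~~x` or `x|0` to come back - and each one is real work the generated code pays for:

```javascript
// Hypothetical asm.js-style module: note the explicit coercions on every
// int <-> double boundary. A compiler that bounces types like this pays
// for a conversion on each crossing, where a tracing JIT on plain JS
// might have specialized the whole loop to one representation.
function MixedModule(stdlib) {
  "use asm";
  function mix(n) {
    n = n | 0;                        // parameter declared as int
    var i = 0;
    var acc = 0.0;                    // double accumulator
    for (i = 0; (i | 0) < (n | 0); i = (i + 1) | 0) {
      acc = acc + (+(i | 0)) / 2.0;   // int -> double conversion every iteration
    }
    return ~~acc | 0;                 // double -> int truncation on the way out
  }
  return { mix: mix };
}

var mix = MixedModule(globalThis).mix;
console.log(mix(4)); // 0/2 + 1/2 + 2/2 + 3/2, truncated -> 3
```

Outside an asm.js-validating engine this just runs as ordinary (slower) JavaScript, which is also the fallback behaviour the spec intends.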
For example, let's say you've got a variable that bounces between containing a float, an integer, and null. In normal circumstances, SpiderMonkey and V8 may be able to do clever things like using type information to determine that it's safe to always store that variable as a given type, and back out of that optimization the moment a value of another type is written to it. There are also tricks like NaN-boxing, where you pack smaller types into doubles by exploiting quirks in the IEEE floating-point format - meaning that the runtime can just allocate 64-bit slots for everything, instead of having to allocate room for a type tag plus the largest possible type.
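The IEEE quirk NaN-boxing leans on can be demonstrated from JavaScript itself with typed-array aliasing. This is not how SpiderMonkey actually lays out its values - the tag bit chosen here is invented - it just shows that a NaN's mantissa bits are free to carry a payload:

```javascript
// A double whose exponent bits are all ones and whose mantissa is non-zero
// is a NaN, and the low mantissa bits are ignored by arithmetic - so a
// runtime can hide a tag and a 32-bit payload inside one 64-bit slot.
// (Hypothetical layout; assumes a little-endian host, as x64/arm64 are.)
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const u32 = new Uint32Array(buf);

function boxInt(i) {
  u32[1] = 0x7ff80000 | 0x10000;   // quiet-NaN bits + made-up "int" tag
  u32[0] = i >>> 0;                // 32-bit payload in the low word
  return f64[0];                   // the whole value travels as one double
}

function unboxInt(d) {
  f64[0] = d;
  return u32[0] | 0;
}

const boxed = boxInt(42);
console.log(Number.isNaN(boxed));  // true - to the FPU it's just a NaN
console.log(unboxInt(boxed));      // 42 - but the payload is recoverable
```

(In a real engine the unused NaN space encodes pointers, booleans, etc., and ordinary doubles are stored as-is - a type check is just a bit test on the high word.)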
In asm.js, you need to figure all that stuff out yourself, and do it up front in your compiler (or JIT). It expects you to have all your types figured out, so if you hand it the naive solution to that problem (type tag plus largest possible type) you will end up using more memory, and potentially paying higher costs from always checking the tag and always copying the whole thing, whereas a more dynamic runtime would be able to do on-the-fly optimizations based on observations about your code. It's a trade-off.
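Here's what that naive solution looks like, sketched over a typed-array heap of the kind asm.js code manages (the slot layout and tag values are made up for illustration): every value gets a fixed-size slot big enough for the largest type, plus a tag, and every read pays for a tag check.

```javascript
// Hypothetical boxed-value layout: 16 bytes per slot -
// a 4-byte tag, 4 bytes padding, then 8 bytes for the largest payload.
const TAG_NULL = 0, TAG_INT = 1, TAG_FLOAT = 2;  // invented tag values

const heap = new ArrayBuffer(1024);
const i32 = new Int32Array(heap);
const f64 = new Float64Array(heap);

function store(slot, tag, value) {
  i32[slot * 4] = tag;                              // tag at byte slot*16
  if (tag === TAG_INT) i32[slot * 4 + 2] = value | 0;   // payload at +8
  else if (tag === TAG_FLOAT) f64[slot * 2 + 1] = +value;
}

function load(slot) {
  // every access branches on the tag before it can touch the payload
  switch (i32[slot * 4]) {
    case TAG_INT:   return i32[slot * 4 + 2];
    case TAG_FLOAT: return f64[slot * 2 + 1];
    default:        return null;
  }
}

store(0, TAG_INT, 7);
store(1, TAG_FLOAT, 2.5);
console.log(load(0), load(1), load(2)); // 7 2.5 null
```

Compared to NaN-boxed 8-byte slots, this doubles the memory per value and adds a branch to every load - exactly the kind of cost a dynamic runtime avoids by specializing at run time.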
One other area where this can come up is virtual calls. Virtual calls are incredibly pervasive in languages like Java and C#, to the extent that HotSpot and the CLR actually do very clever dynamic optimizations on virtual calls (and interfaces, in the C# case) that can outperform equivalent patterns in native code. This doesn't mean that overall Java or C# deliver better performance - they don't - but this is one of the strengths you get from a tightly integrated JIT and language definition. At present, at least, this sort of optimization is also outside the domain of asm.js - asm.js code is compiled ahead of time, not by a recompiling JIT, so the engine isn't going to do any magic to make your virtual calls and other pointer indirection any faster.
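In asm.js the idiom for a virtual call is an index into a fixed function table, as in this hand-written sketch (the module and its functions are invented for illustration). The engine does exactly what the code says - a masked table load and an indirect call - with no inline caching or devirtualization behind it:

```javascript
// Hypothetical asm.js-style module with a function table. The `& 1` mask
// keeps the index in-bounds, as asm.js requires (table sizes are powers
// of two); there's no speculation that could inline either callee.
function ShapeModule(stdlib) {
  "use asm";
  function areaSquare(s) { s = +s; return s * s; }
  function areaCircle(r) { r = +r; return 3.141592653589793 * r * r; }
  function area(kind, x) {
    kind = kind | 0;
    x = +x;
    return +AREA_FUNCS[kind & 1](x);   // indirect "virtual" dispatch
  }
  var AREA_FUNCS = [areaSquare, areaCircle];
  return { area: area };
}

var area = ShapeModule(globalThis).area;
console.log(area(0, 3.0)); // 9 - dispatched through the table
```

A JIT watching this call site in plain JS might notice it's monomorphic and inline the target; the AOT-compiled asm.js version keeps paying for the indirection on every call.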