Everyone is saying "nothing", but I reviewed the patches (I did /not/ bother watching the video, however, as it was long and I was somewhere I felt uncomfortable playing sound; so maybe the video says "we didn't do anything" ;P). What they did was remove all of the pointer aliasing that violates C's strict-aliasing rules (one of C's optimization weaknesses), which let them kick the optimizer up to -O3, helping some already tightly-coded algorithms actually avoid touching memory.
It's not double the performance. It's a demo showing that, with the software GLES implementation (which isn't used in production), they can run that benchmark at 60fps. The non-Linaro toolchain's GLES renders at less than 60fps on this benchmark and then degrades to 30fps because the setup is double-buffered and vsync'd.
Vsync doesn't convert anything in the range of 31..59 FPS to 30 FPS. Vsync prevents fractional frame updates. You can still have any integer number of frames per second with vsync enabled. Anything less than 60FPS will just look like a dropped frame.
(Double-buffering isn't particularly relevant to FPS that I can see. It just prevents you seeing the frame while it's being drawn, in practice it's mostly useful for reducing flicker from overdraw.)
It's the combination of double buffering and vsync. If the pure rendering performance is 45fps, then when a render completes the next one can't progress until the buffer currently being displayed becomes available at the next vsync; hence it degrades to 30fps.
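A toy model makes the mechanism clear. This sketch assumes a 60Hz panel, a renderer that blocks on the buffer swap, and a swap that only lands on a vsync boundary; the numbers (22ms for "45fps raw") are illustrative:

```python
import math

VSYNC = 1000.0 / 60.0  # ms between vsync pulses on a 60 Hz panel

def effective_fps(render_ms, frames=600):
    """Displayed frame rate when every frame takes render_ms to draw
    and the swap must wait for the next vsync (double-buffered)."""
    t = 0.0
    for _ in range(frames):
        t += render_ms                           # draw into the back buffer
        t = math.ceil(t / VSYNC - 1e-9) * VSYNC  # block until the next vsync
    return frames / (t / 1000.0)

print(round(effective_fps(22.0)))  # 45 fps of raw speed displays at 30
print(round(effective_fps(16.0)))  # comfortably under 16.67 ms: full 60
```

A 22ms frame always misses the 16.67ms deadline, so every frame ends up occupying two vsync intervals (33.3ms), which is exactly the 45-to-30 collapse described above.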
Ah OK. I was assuming a world in which triple-buffering was a given; in other words, that vsync in practice was a rendering implementation detail for avoiding tearing, rather than something that blocked the renderer. I can see how things are different on a resource-constrained device that can't afford triple-buffering.
I'll add that usually you don't see a constant rate of rendering performance. Render time for a frame wanders above or below 16.67ms (with the design target being under that, if 60fps is the aim) depending on scene complexity or how busy other tasks are. It's not usually the case that every frame takes 22ms (i.e. 45fps) to produce, so it would be very unusual to see a game / app flip from 60fps straight to a locked 30fps. Rather, the odd frame will take slightly more than 16.67ms, with most falling under. The statistical distribution will give an FPS rate somewhere between 30 and 60.
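This is easy to see with a toy model of a double-buffered, vsync'd renderer fed randomly varying frame times. The 15ms mean and 2.5ms spread below are made-up numbers purely for illustration; the point is that only the frames that spill over 16.67ms cost an extra vsync interval:

```python
import math
import random

VSYNC = 1000.0 / 60.0  # ms per vsync interval at 60 Hz

def displayed_fps(frame_times_ms):
    """Average displayed rate when each swap waits for the next vsync."""
    t = 0.0
    for ft in frame_times_ms:
        t += ft                                  # render the frame
        t = math.ceil(t / VSYNC - 1e-9) * VSYNC  # swap on the vsync boundary
    return len(frame_times_ms) / (t / 1000.0)

random.seed(1)
# Most frames under 16.67 ms, the odd one spilling over the deadline:
times = [random.gauss(15.0, 2.5) for _ in range(600)]
fps = displayed_fps(times)
print(f"{fps:.1f} fps")  # somewhere between 30 and 60, not locked to either
```

Every frame under 16.67ms displays at the next vsync; every frame slightly over costs a whole extra interval, so the average settles between the two locked rates.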
What's being demonstrated here isn't a real-world use case; rather, it's a demonstration that the software (CPU-based) GLES implementation can run this test at 60fps. On production devices (and also available on the Panda board) the actual GPU driver/libraries for the SGX540 are used.
Practically, this means the Linaro toolchain won't double the performance of this test in a fuller environment.
Additionally, it's fairly specious to say it 'doubles' the performance: on Android things are typically vsync'd and double-buffered, so if you render at under 60fps you degrade to 30fps.
IMO this is mildly disingenuous, or at best naive, and from a technical perspective Linaro should really do a better job of making their efforts relevant.
They'd be better off showing off improvements in browser performance because the CPU is still used for some 2D rendering.
Actually, in many cases Android does use the software renderer.
Take a look at /system/lib/egl/egl.cfg
In many of the cases I've looked at (though it varies by device), the software renderer does get picked up. If you remove its entry, things often speed up across the board, because the hardware renderer is forced to be used.
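For reference, on devices of this era the file tends to look something like this; each line is `<display> <implementation> <tag>`, with implementation 0 being the software renderer. The SGX library tag below is illustrative and varies per device:

```
0 0 android
0 1 POWERVR_SGX540_120
```

Deleting the `0 0 android` line is the "remove the entry for the software renderer" step described above, leaving only the hardware implementation for the EGL loader to pick.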
In practice it doesn't very often, unless a very weird EGL configuration is requested that the h/w version doesn't support. But your point stands that you'd be better off removing the s/w renderer.