That's to optimize the scene, not the engine itself :)
There could be some tradeoffs to those suggestions as well. For example, using unindexed geometry for simple meshes can still be slower if there are many vertex attributes. It's also not uncommon to render tessellated meshes - there's a sweet spot in triangle size for mobile GPUs, at least tile-based ones like PowerVR. With VR barrel distortion applied in the vertex shader during the main pass, you definitely don't want cubes made out of only 12 triangles.
Vertex count isn't that important a metric anyway; you can push a few million polygons in a few hundred draw calls to mobile GPUs every frame and still run a smooth 30 FPS. Desktop is an order of magnitude higher (5k draw calls/frame is common). The number of draw calls, the cost of their shaders, and how fast the CPU can push them matter much more. There's little difference between 20k and 40k polygon meshes, but there's a huge one between 20 and 40 draw calls. It's creating batches that's costly, not running them.
We also had heuristics to determine an appropriate device pixel ratio without completely disabling the scaling. So for mobile devices with a ratio of 3, instead of tripling the pixel count we'd settle for a ratio in between. Text projected in 3D was just unreadable on iPhone without this, and going all the way to 3x was overkill.
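Roughly, the heuristic looked something like this (a minimal sketch in TypeScript; the 0.5 blend factor and the cap of 2 are illustrative assumptions, not the exact numbers we shipped):

    // Pick a render ratio between 1x and native DPI instead of either extreme.
    function pickRenderRatio(deviceRatio: number): number {
      if (deviceRatio <= 1) return 1;                   // low-DPI screens render 1:1
      // Move only part of the way toward native DPI: ratio 3 -> 2, ratio 2 -> 1.5.
      return Math.min(1 + (deviceRatio - 1) * 0.5, 2);
    }

    // Usage: size the canvas backing store by the chosen ratio.
    const ratio = pickRenderRatio(window.devicePixelRatio || 1);
    // canvas.width = cssWidth * ratio; canvas.height = cssHeight * ratio;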
I did call freeze() on materials, but the material/effect caches were trashed quite often and the bind() implementation is very expensive; it does quite a few hash lookups and indirections. A lot of our uniforms had to be updated every frame, so we ended up separating the materials from their parameters and indexing the latter with bitfields. Setting a shader was just looping through a dirty bitfield and doing a minimum of uniform uploads. This also made global parameters quite easy (a binary OR on the material/global parameter bitfields). There were only three arrays of contiguous memory to touch to fully set up a shader (values, locations, descriptors), and they could be reused between materials, so it was very CPU-cache friendly.
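To make that a bit more concrete, here's a rough sketch of the bitfield approach (hypothetical names, plain WebGL2, and every parameter assumed to be a vec4 for brevity; not the engine's actual API):

    // Assumes <= 32 parameters per shader and that the material's program
    // is already bound with useProgram().
    interface ParamTable {
      dirty: number;                                    // bitfield of changed parameters
      values: Float32Array;                             // packed parameter values (contiguous)
      locations: WebGLUniformLocation[];                // uniform location per slot
      descriptors: { offset: number; size: number }[];  // where each value lives in `values`
    }

    function flushUniforms(
      gl: WebGL2RenderingContext,
      params: ParamTable,
      globalDirty: number                               // global parameters fold in with a binary OR
    ): void {
      let dirty = params.dirty | globalDirty;
      while (dirty !== 0) {
        const slot = 31 - Math.clz32(dirty);            // highest dirty slot
        const d = params.descriptors[slot];
        // Only the changed uniforms get uploaded; everything is a vec4 here.
        gl.uniform4fv(params.locations[slot],
                      params.values.subarray(d.offset, d.offset + d.size));
        dirty &= ~(1 << slot);                          // clear the bit and continue
      }
      params.dirty = 0;
    }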
Looking at the profiler, most of the lost performance came from the engine, not the scene.