I looked through the source code and saw string concatenation to set the active texture unit, among many other inefficiencies. Everything does isReady() checks multiple times per frame (rather than creating the object only once it's ready). The material parameters are indexed in the worst possible way. The stats gathering code is still active in release builds. The list goes on and on :)
Some caches are even implemented with bugs. Half the code uses a cache in one way and the other half in a different way. That was fun to dig into.
I tried to optimize Babylon for a few days at work before giving up - my general rule of thumb is that if I'm about to refactor more than 20% of a codebase, it's much, much faster to rewrite it, provided I'm already familiar with the problem space. It took about two weeks to write a proprietary renderer that ran circles around Babylon - though it only supported static meshes.
Babylon was useful for prototyping, but the mobile performance isn't there; even after days of profiling and optimizing every slow path, it was still an order of magnitude slower than an in-house renderer designed for performance from the ground up.
That's to optimize the scene, not the engine itself :)
There could be some tradeoffs to those suggestions as well. For example, using unindexed geometry for simple meshes can still be slower if there are many vertex attributes. It's also not uncommon to render tessellated meshes - there's a sweet spot in triangle size for mobile GPUs, at least tile-based ones like PowerVR. With VR barrel distortion applied in the vertex shader during the main pass, you definitely don't want cubes made out of only 12 triangles.
Vertex count isn't that important a metric anyway; you can push a few million polygons in a few hundred draw calls to mobile GPUs every frame and still run at a smooth 30 FPS. Desktop is an order of magnitude higher (5k draw calls/frame is common). The number of draw calls, the cost of their shaders and how fast the CPU can push them are much more important. There's little difference between 20k and 40k polygon meshes, but there's a huge one between 20 and 40 draw calls. It's creating batches that's costly, not running them.
We also had heuristics to determine an appropriate device pixel ratio without completely disabling the scaling. So for mobile devices with a ratio of 3, instead of tripling the pixel count we'd settle for a ratio in between. Text projected in 3D was just unreadable on iPhone without this, and going all the way to 3x was overkill.
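A sketch of the idea, assuming a Babylon-style engine object (the 2x knee and the 0.25 blend factor are illustrative, not our production values):

    // TypeScript sketch: soften the device pixel ratio instead of using it
    // as-is or clamping it to 1. The numbers are made up for illustration.
    declare const engine: { setHardwareScalingLevel(level: number): void };

    function softenedDPR(): number {
      const dpr = window.devicePixelRatio || 1;
      return dpr <= 2 ? dpr : 2 + (dpr - 2) * 0.25; // a 3x phone lands at 2.25x
    }

    // Render at the softened resolution (hardware scaling is 1/ratio).
    engine.setHardwareScalingLevel(1 / softenedDPR());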
I did call freeze() on materials, but the material/effect caches were trashed quite often and the bind() implementation is very expensive; it does quite a few hash lookups and indirections. A lot of our uniforms had to be updated every frame, so we ended up separating the materials from their parameters and indexing the latter with bitfields. Setting up a shader was just looping through a dirty bitfield and doing the minimum number of uniform uploads. This also made global parameters easy (a binary OR of the material and global parameter bitfields). There were only 3 arrays of contiguous memory to touch to fully set up a shader (values, locations, descriptors), and they could be reused between materials, so it was very CPU-cache friendly.
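A rough sketch of the shape of it (names and types are made up, not our actual code):

    // Three parallel, contiguous arrays fully describe a shader's parameters.
    const enum ParamType { Float, Vec3, Mat4 }

    function flushUniforms(gl: WebGLRenderingContext,
                           materialBits: number,
                           globalBits: number,
                           values: Float32Array[],
                           locations: WebGLUniformLocation[],
                           descriptors: ParamType[]): void {
      // Fold global parameters into the material's dirty bits with a single OR.
      let dirty = (materialBits | globalBits) >>> 0;
      while (dirty !== 0) {
        const i = 31 - Math.clz32(dirty);    // index of the highest dirty bit
        dirty = (dirty & ~(1 << i)) >>> 0;   // clear it
        switch (descriptors[i]) {
          case ParamType.Float: gl.uniform1fv(locations[i], values[i]); break;
          case ParamType.Vec3:  gl.uniform3fv(locations[i], values[i]); break;
          case ParamType.Mat4:  gl.uniformMatrix4fv(locations[i], false, values[i]); break;
        }
      }
    }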
Looking at the profiler, most of the lost performance came from the engine, not the scene.
Here's the string concatenation to set the active texture unit. (By the way, the fastest way to do it is "gl.TEXTURE0 + channel" instead of building a string to look up the proper constant.)
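Paraphrased (the slow version is the shape of what the engine did, not the exact line):

    declare const gl: WebGLRenderingContext;
    const channel = 3;

    // Slow: build the string "TEXTURE3", then look the constant up by name.
    gl.activeTexture((gl as any)["TEXTURE" + channel]);

    // Fast: TEXTURE0..TEXTURE31 are consecutive integer enums, so just add.
    gl.activeTexture(gl.TEXTURE0 + channel);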
As for the broken cache, I think it was Engine._activeTexturesCache; sometimes it's indexed by texture channel, other times by GL enum values (this makes the cache array explode to ~30k elements and causes cache misses in half the code paths).
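From memory, the mismatch looked roughly like this (paraphrased, not the actual source):

    declare const gl: WebGLRenderingContext;
    declare const texture: WebGLTexture;
    const channel = 7;
    const cache: (WebGLTexture | null)[] = [];

    cache[channel] = texture;               // path A: keyed by unit index (7)
    cache[gl.TEXTURE0 + channel] = texture; // path B: keyed by the GL enum
                                            // (33991), so the array balloons
                                            // past 30k slots and the two paths
                                            // never hit each other's entries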
From what I remember, lots of caches are needlessly trashed many times per frame.
There's also noticeable overhead from all of those "private static <constant> = value;" fields exposed through public getters.
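The pattern, simplified:

    class Material {
      private static _TriangleFillMode = 0;

      // Every read of Material.TriangleFillMode goes through an accessor
      // call instead of a plain constant lookup.
      public static get TriangleFillMode(): number {
        return Material._TriangleFillMode;
      }
    }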
You won't see it in the code. Run it through the debugger; the value of "channel" is sometimes the value of the GL enum rather than the index of the texture unit.