Seriously, the last time I saw something that good and detailed was an investigation into cache line miss performance back in 2002 on the Intel P4 at NetApp. That investigation figured out that the transaction rate in Intel's memory controller resulted in an 18+% reduction in throughput and up to 50% increase in latency for file system operations.
Chromium and Google Chrome have those in their dev tools too, and they allow you to create graphs of your own. If you output trace info in that format, it will give you interactive flame graphs. Not just for web dev, mind you; it could be anything you can imagine.
Here is a random blog post explaining how to use it if you are interested: https://aras-p.info/blog/2017/01/23/Chrome-Tracing-as-Profil...
And here is a video showing a demoscene coder using the visualization to trace what is going on in his code while making a piece of software multithreaded. https://www.youtube.com/watch?v=zCICjD4J0nA&t=12m15s
Notice also that these flame graphs don't have to correspond to time spent in a function. You label the spans yourself, which means you are limited only by your own imagination. For example, I seem to remember that in the video above Ferris used it for something other than just looking at time spent in each function call.
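To make that concrete, here is a minimal sketch (in Python; the labels and file name are made up) of emitting events in Chrome's Trace Event JSON format. Loading the resulting file in chrome://tracing gives you the interactive flame graph view, and the "name" field can be any label you like, not just a function name:

    import json, time

    events = []

    def span(name, f):
        # Record a "complete" event ("ph": "X") with a start timestamp and
        # duration, both in microseconds, as chrome://tracing expects.
        start = time.perf_counter()
        result = f()
        events.append({
            "name": name,   # any label you want, not necessarily a function name
            "ph": "X",
            "ts": int(start * 1e6),
            "dur": int((time.perf_counter() - start) * 1e6),
            "pid": 1,
            "tid": 1,
        })
        return result

    span("load assets", lambda: time.sleep(0.05))
    span("simulate frame", lambda: time.sleep(0.01))

    with open("trace.json", "w") as f:
        json.dump({"traceEvents": events}, f)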
This solution to "network transparency" is nothing more than pushing whole-screen updates directly over the wire. So why not use established protocols like VNC?
For instance, sub-buffer updates have certain constraints that make them very fast in the local case but would require a lot of data to be serialized over the wire every frame, and networks do not have the bandwidth for that.
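To put rough numbers on that (the resolution and frame rate here are illustrative assumptions, not anything from a real protocol), shipping uncompressed full-frame or large sub-buffer updates adds up quickly:

    # Back-of-the-envelope estimate, assuming an uncompressed 1920x1080 surface
    # at 32 bits per pixel, updated 60 times per second.
    width, height = 1920, 1080
    bytes_per_pixel = 4
    fps = 60

    bytes_per_frame = width * height * bytes_per_pixel   # ~8.3 MB per frame
    bytes_per_second = bytes_per_frame * fps              # ~498 MB/s

    print(bytes_per_frame / 1e6, "MB per frame")
    print(bytes_per_second * 8 / 1e9, "Gbit/s uncompressed")  # ~4 Gbit/s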
"network transparency" is an anti-goal in protocol design for the same reason "rpc that acts like a function call" is inherently flawed - the network adds a lot of complexity and different design constraints.
Games that try to squeeze every ounce out of the hardware with tricks for extra FPS are not well suited to serialization. I agree with that.
So, the very abstraction of "textures, models, display lists, and draw commands" is no longer what is being managed by the graphics stack. That is just one legacy abstraction which could be emulated by an application or gateway service. As people have stated elsewhere, one can continue to operate something like X Windows to keep that legacy protocol. Or, one can run a web browser to offer HTML+js+WebGL as another legacy and low-trust interface.
But, one cannot expect all application developers to limit themselves to these primitive, legacy APIs. They want and need the direct bypass that lets them put the modern GPU to good use. They are going to invest in different application frameworks and programming models that help them in this work. I hope that the core OS abstractions for sharing this hardware can be made robust enough to host a mixture of such frameworks as well as enabling multi-user sharing and virtualization of GPU hardware in server environments.
To provide transparently remote applications in this coming world, I think you have to accept that the whole application will have to run somewhere that colocates the host and GPU device resources, if the original developer has focused on that local rendering model. Transparency needs to be added at the input/output layer, where you can put the application's virtual window or virtual full-screen video output through a pipe to a different screen that the application doesn't really know or care about.
If you purposely design graphics devices, you can make many simplifications and optimizations, because you can abstract all tasks as drawing primitives. That makes serialization very easy.
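As a toy illustration of why a primitive-based abstraction serializes so easily (the opcode and field layout here are invented for the example), a whole drawing operation reduces to a handful of fixed-size integers:

    import struct

    # Hypothetical wire format: 1-byte opcode followed by four 16-bit coordinates.
    OP_FILL_RECT = 0x01

    def encode_fill_rect(x, y, w, h):
        # The entire command fits in 9 bytes, regardless of how many pixels it
        # touches, which is what makes streaming it over a socket cheap.
        return struct.pack("!BHHHH", OP_FILL_RECT, x, y, w, h)

    def decode(buf):
        op, x, y, w, h = struct.unpack("!BHHHH", buf)
        return op, (x, y, w, h)

    msg = encode_fill_rect(10, 20, 300, 200)
    print(len(msg), decode(msg))   # 9 (1, (10, 20, 300, 200))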
The core of the GPU is really computational data transforms on arrays of data. But there is a whole spectrum to these computational methods rather than just a few discrete modes. This is where application-specific code is now supplied to define the small bits of work as well as to redefine the entire pipeline, e.g. of a multi-pass renderer. The differences between "transforms and lighting", "texturing and shading", "z-buffering and blending", or even "ray-casting and ray-tracing" are really more in the intent of the application programmer than in actual hardware. The core hardware features are really there to support different data types/precisions, SIMD vs MIMD parallelism, fused operations for common computational idioms, and memory systems that balance the hardware for certain presumed workloads.
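A CPU-side caricature of that view (plain NumPy standing in for a shader or compute kernel, so purely illustrative): whether you call the pass "lighting" or "shading", the hardware sees the same kind of element-wise transform applied across an array.

    import numpy as np

    # One million RGB "fragments"; the same array could just as well hold
    # vertices, texels, or arbitrary compute data.
    fragments = np.random.rand(1_000_000, 3).astype(np.float32)

    def shade(rgb, light_intensity):
        # An element-wise transform applied uniformly to every entry, which is
        # the shape of work GPUs parallelize, whatever name the pass is given.
        return np.clip(rgb * light_intensity, 0.0, 1.0)

    lit = shade(fragments, 1.4)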
> So why not use established protocols like VNC?
Serialized GLX was introduced by SGI before direct rendering was even possible. It is what the creators of OpenGL originally envisioned (graphics terminals connected to servers). If anything, DRI, which came afterwards, was the hack.
Serialized GLX doesn't make sense in every context, I agree with that. But it is great to have the option. X11 offers that option, Wayland does not. Of course you could write your own proprietary client-server architecture on top of the Wayland protocol. But why reinvent the wheel?
Not the whole screen. Whole windows.
You can use VNC, but it won't integrate well with other windows.
The server should be sending a ticket, an encryption of the serial number, to the client, and expecting that back. It should be salted by the client id.
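A minimal sketch of one way to build such a ticket (this uses a keyed MAC rather than literal encryption, and the exact fields are my assumptions, not anything mandated by a protocol): the server derives the ticket from the serial number, salted with the client id, under a secret only the server knows, and later verifies what the client sends back.

    import hmac, hashlib, os

    SERVER_SECRET = os.urandom(32)   # known only to the server

    def make_ticket(serial: int, client_id: int) -> bytes:
        # Authenticate the serial, salted by the client id, so a client cannot
        # forge tickets for serials (or for other clients) it never received.
        msg = serial.to_bytes(8, "big") + client_id.to_bytes(8, "big")
        return hmac.new(SERVER_SECRET, msg, hashlib.sha256).digest()

    def check_ticket(serial: int, client_id: int, ticket: bytes) -> bool:
        return hmac.compare_digest(ticket, make_ticket(serial, client_id))

    t = make_ticket(serial=42, client_id=7)
    assert check_ticket(42, 7, t)
    assert not check_ticket(42, 8, t)   # a different client cannot reuse it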
Being able to export a DMA-BUF is necessary anyway for multi-GPU setups.
EDIT: fixed a s/GPU/CPU/ typo
And there is no reason to use HTTPS for Wayland instead of a lower-level protocol.
No, it's not; SSH’s transport layer component, as I understand it, provides functionality loosely comparable to TLS, but SSH does not rely on, assume, use, or incorporate TLS in any way.