
Weep for Graphics Programming - samps
http://adriansampson.net/blog/opengl.html
======
heavenlyhash
Upvoted simply because -- notwithstanding all of the entirely valid complaints
-- this is actually the short, sweet introduction to GL that I would've loved
when I first picked it up.

All of these things took me some serious time to learn. Partly because of my
sheer incredulity at the stringliness of it all. Finding good GL tutorials is
hard, after all: "surely", I thought, "these are just bad examples, and I
should keep looking for a better way".

A quick, frank throwdown of "this is the way it is" like this would've gotten
me over that hump much faster. Even if it's crushing, I'm filing this link
away for "must read" when someone asks me where to get started in GL
programming.

~~~
pjc50
There are surprisingly many systems where the tutorials have to start with
"First, paste this thousand lines of boilerplate startup code. Then you can
add 'hello, world' at the bottom." FPGA programming is another example.

~~~
Impossible
I consistently read complaints about boilerplate code in low-level graphics
APIs, but I'm never clear about what people want from a low-boilerplate
graphics API. I'm strongly against low-level graphics APIs sacrificing
performance or control in order to appeal to beginners or casual users, but
from the comments on Hacker News it seems like there is a need for something
more accessible. Elsewhere in this thread someone said that drawing a quad
should be a one-liner; this assumes a ton of opinionated default state, and
what makes sense might vary based on your use case and the power of your
hardware (2D quads for UI? 3D quads to draw a heightfield? Lit or unlit?
What BRDF? Shadows? Etc.)

To me this low-boilerplate API implies something closer to Processing,
Openframeworks, THREE.js or Cinder, but I rarely see those recommended as
options in OpenGL threads like this. Is this a marketing problem? Is it
because programmers are taught that real programmers use mostly raw OpenGL if
they want to take advantage of the graphics pipeline? Is it because HN users
are curious and this is a learning exercise? I'm asking because I'd like to
look into writing a library, framework or set of tutorials that appeals to
programmers who are interested in graphics programming, who find modern OpenGL
too verbose or intimidating, and whose needs are not met by an existing
library, framework or game engine. This might end up being an educational
resource that helps people unfamiliar with graphics or game technology
understand their options based on their use case. I imagine that many of the
people who think they need to learn OpenGL would be completely happy with
Unity3D, for example.

~~~
angersock
_" I imagine that many of the people that think they need to learn OpenGL
would be completely happy with Unity3D, for example."_

This isn't quite my experience. Unity3D can put a _lot_ of bullshit between
you and a simple update loop for testing out an idea spike.

Many times, you really just want a "poll events/update state/draw shit" loop,
bereft of anything else.

OpenGL 1.x and GLUT were actually pretty much _perfect_ for this use case--the
fixed-function pipeline made splatting things up really easy to understand. As
hardware moved away from that model, and as people kept wanting to bolt on
more features, life got too complicated, and the only solution was to say
"Okay, screw it, you all write the shaders you want".

Your point about "what makes sense" is a good one--people don't seem to grasp
that in order to have the functionality they might want in a graphics API at
the performance they think they need, they need to have a toolset that lets
them build large pipelines.

When you're building what is by nature a big refinery, you shouldn't complain
about boilerplate. At that scale, _it's not_.

~~~
Impossible
I agree that Unity3D is overkill for many use cases, although it's not many
steps to get something that looks roughly like a fixed-function OpenGL 1.x and
GLUT application working for quick, code-only prototypes, so I'm curious what
you think is a _lot_ of bullshit. I find that Processing has less boilerplate
and more functionality than oldschool OpenGL + GLUT, but it gets dismissed by a
lot of people because it's seen as more of a learning or non-coder environment,
even though it's usually sufficient for many quick graphics prototypes.

This is why I asked if it's more of a marketing/perception problem, or maybe a
workflow problem, than a real problem.

Maybe the answer is to build a portable environment that completely emulates
fixed function OpenGL from the 90s and can build all the NeHe tutorials,
although that sounds really regressive to me.

~~~
vitd
Here are things that made learning OpenGL extremely painful for me:

- I came from doing graphics on the CPU, where you were often handed a buffer
of ARGB pixels and did whatever the heck you wanted to them. You could call OS
or library routines to draw simple 2D shapes, text, or images, or you could
write directly to the buffer if you were generating a pattern or writing a ray
tracer. Understanding textures, framebuffers, 3D polygons etc. was really
strange. (What do you mean I have to draw geometry if I just want to display a
JPEG? WTF?)

- Understanding the state machine and having to hook up so much crap for
simple operations. I can't tell you how many times I wrote, or even copied
existing working code into, a program and it resulted in nothing on the
screen, and no errors. It was almost always because I either hadn't enabled
some state or hadn't bound some thing to some particular location. And there's
no way to debug that other than stumbling around and asking vague questions
online, because you don't have any idea what's going on, unless you're already
an expert and know what to look for.

- The complete lack of modern language features in the API, such as using
enums instead of #defines. This means you can do stuff like call
glBegin(GL_LINE), and if you don't check glGetError() (another stupid pattern!)
you'll never realize it's supposed to be glBegin(GL_LINES); you just won't
see anything rendered. (See the sketch after this list.)

- Having 20 different versions of each function for every combination of data
type and size. (e.g. glColor3b, glColor3f, glColor4f, glColor4i, etc., etc.)
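
To make that third point concrete, here is a minimal sketch of the
silent-failure mode (my illustration, not the poster's; GL_LINE is a real
constant intended for glPolygonMode, so the compiler can't catch the mix-up):

    /* Nothing is drawn and nothing crashes: */
    glBegin(GL_LINE);              /* wrong: GL_LINE is a polygon mode, not a primitive */
    glVertex2f(-0.5f, 0.0f);
    glVertex2f( 0.5f, 0.0f);
    glEnd();

    /* The only clue is the sticky error flag, which you must poll yourself: */
    GLenum err = glGetError();     /* returns GL_INVALID_ENUM */
    if (err != GL_NO_ERROR)
        fprintf(stderr, "GL error: 0x%x\n", err);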

The entire API is horrible. I'm sure it's all related to how SGI machines
worked in the early 90s and how graphics hardware evolved since then, but it
made the whole thing shitty to work with. We have a solution now, which is to
use one of the new APIs. Vulkan honestly doesn't look much better. I've only
dipped my toes into Metal, but it looks like it has solved some of these
issues.

I almost wonder if libraries that are more specialized would be useful? Unity
seems as hard to learn as OpenGL (and if you want to understand why things are
going wrong, you probably have to understand OpenGL anyway). Something
specific to image processing might be easier to understand for people doing
that, for example. I'm sure there are other niches that would benefit from
libraries that hide the complexity of OpenGL (or whichever library you build
it on top of).

~~~
khedoros
Use SDL or SFML. They both let you write RGBA into a buffer and throw it onto
the screen. I think they both handle hardware stretching and scaling, too.
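
For instance, a minimal sketch with SDL2 (the window size and the XOR test
pattern are placeholder choices of mine):

    #include <SDL.h>

    int main(void) {
        SDL_Init(SDL_INIT_VIDEO);
        SDL_Window *win = SDL_CreateWindow("pixels", SDL_WINDOWPOS_CENTERED,
                                           SDL_WINDOWPOS_CENTERED, 320, 240, 0);
        SDL_Renderer *ren = SDL_CreateRenderer(win, -1, 0);
        /* A streaming texture plays the role of the old "buffer of pixels": */
        SDL_Texture *tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                             SDL_TEXTUREACCESS_STREAMING, 320, 240);
        static Uint32 pixels[320 * 240];

        for (;;) {
            SDL_Event ev;
            while (SDL_PollEvent(&ev))
                if (ev.type == SDL_QUIT) { SDL_Quit(); return 0; }
            for (int y = 0; y < 240; y++)         /* write whatever you want */
                for (int x = 0; x < 320; x++)
                    pixels[y * 320 + x] = 0xFF000000u | (((x ^ y) & 0xFF) * 0x010101u);
            SDL_UpdateTexture(tex, NULL, pixels, 320 * sizeof(Uint32));
            SDL_RenderCopy(ren, tex, NULL, NULL); /* stretched to the window */
            SDL_RenderPresent(ren);
        }
    }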

> Having 20 different versions of each function for every combination of data
> type and size.

Welcome to C land.

> I'm sure it's all related to how SGI machines worked in the early 90s and
> how graphics hardware evolved since then

Fixed pipeline would be, I suppose. The programmable pipeline is what you get
when you have GL 1.x users reinvent the API. It's great for graphics experts
and a barrier for everyone else.

------
kvark
In the Rust ecosystem, we've been experimenting with different ways to make
CPU-GPU interaction safer when it comes to graphics abstractions.

In gfx-rs [1] one defines a Pipeline State Object with a macro, and then all
the input/output data and resources are just fields in a regular Rust struct
[2]. So assignment pretty much works as one expects. All the compatibility
checks are done at run (init) time, when the shader programs are
linked/reflected.

In vulkano [3] the Rust structures are generated at compile time by analyzing
your SPIR-V code.

[1] https://github.com/gfx-rs/gfx

[2] https://github.com/gfx-rs/gfx/blob/2c00b52568e5e7da3df227d415eab9f55feba5a9/examples/shadow/main.rs#L314

[3] https://github.com/tomaka/vulkano

~~~
vvanders
I've also been really impressed with glium. I don't think I've had as nice a
blank-editor-to-quad-on-the-screen experience in a long while. At least not
since the days of fixed-function.

I say this as someone who's written a couple of commercial renderers, to
varying degrees from scratch. Normally it's 1-2 days just to set up all the
various infrastructure for handling shaders/uniforms/etc.

~~~
vvanders
To expand on this a tad, here's the intro code for glium; getting a triangle
on the screen is just one page of text:

[https://github.com/tomaka/glium/blob/master/examples/tutoria...](https://github.com/tomaka/glium/blob/master/examples/tutorial-02.rs)

That includes: window creation and event handling, a fragment shader, a vertex
shader, varying declaration and binding. Uniforms are pretty trivial on top of
this (also done via macros, so there are no raw strings).

The whole tutorial is really solid:
[http://tomaka.github.io/glium/book/tuto-01-getting-started.html](http://tomaka.github.io/glium/book/tuto-01-getting-started.html)

Next time I'm starting up a GL project it's going to be hard for me to argue
for C/C++, given how nice some of the semantics around Rust's OpenGL libraries
are.

~~~
continuational
But still, _just_ one page of code? Ideally, drawing a quad should be a one-
liner.

~~~
vvanders
If you want to use a higher-level framework then a one-liner might make sense.

For OpenGL you want control over state changes, uniforms and a slew of other
things. Otherwise your performance/control will suffer significantly.

Look at the things I listed there; that's the bare minimum you need to do the
above, and fitting it in one page is pretty impressive.

~~~
kbutler
OpenGL is set up to optimize for all those things, instead of optimizing for
the learner/simple case.

It's definitely a valid approach, but it means for simple cases, you're copy &
pasting a lot of boilerplate for performance, control, and a slew of other
things you don't need.

You have to learn/paste a lot before you get a little reward.

------
Athas
I'm not sure why bytecode is supposed to be significantly better than storing
the shaders as strings. Sure, you get rid of a complex parser and can more
easily use it as a compiler target, but the core problem remains: the compiler
still can't reason across the device boundary. The CPU compiler does not
understand what happens on the GPU, and the GPU compiler has no idea what the
CPU is going to do to it.

Although I'm not a big fan of CUDA, it does have an advantage in that it
combines both worlds in a single language. This _does_ permit optimisations
that cross device boundaries, but I have no idea whether the CUDA compiler
does any of this.

Apologies for tooting my own horn, but I have been working on a language that
can optimise on both sides of the CPU-GPU boundary:
[http://futhark-lang.org/](http://futhark-lang.org/) (Although it is more
high-level and less graphics-oriented than what I think the author is looking
for.)

~~~
corysama
The issue is that every driver implementation ever ships with a slightly
different implementation of that complex parser. That means (every mobile
device)X(every OS update) and (every PC add-in card)X(every driver update) all
have the potential to misinterpret/reject your shaders in slightly different
ways.

Case in point: I recently learned that "#line 0" is a spec violation. A single
OS revision of a single Android device refused to compile it. This is after
shipping that line in every shader in multiple games for multiple years to
millions of happy customers.

Apparently, Unreal Engine's boot process involves compiling a series of test
shaders to determine which common compiler bugs the game needs to work around
for this particular run.

~~~
sillysaurus3
_The issue is that every driver implementation ever ships with a slightly
different implementation of that complex parser. That means (every mobile
device)X(every OS update) and (every PC add-in card)X(every driver update) all
have the potential to misinterpret/reject your shaders in slightly different
ways._

This doesn't have anything to do with shaders as strings. It's been true since
the beginning of shader programming, whether bytecode shaders or otherwise.

~~~
ufo
His point is that shaders-as-strings have a larger surface area for
misinterpretation than shaders-as-bytecode.

------
zamalek
I spent a lot of time making toy engines with XNA and carried many of its
lessons over to my toy engines in C++.

 _If your asset can be built, then your asset should be built._ Content
pipelines.

> Shaders are Strings

If you're using Vulkan, compile your shaders to SPIR-V alongside your project.
If you're using OpenGL you're mostly out of luck - but there's still no reason
to inline the shaders even with the most rudimentary content manager. Possibly
do a syntax pass while building your project.
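
For example, a minimal sketch of the Vulkan side (my sketch, assuming the .spv
was produced at build time with something like glslangValidator, and with
error handling elided):

    #include <vulkan/vulkan.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Load precompiled SPIR-V bytecode and wrap it in a shader module. */
    VkShaderModule load_spirv(VkDevice device, const char *path) {
        FILE *f = fopen(path, "rb");
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fseek(f, 0, SEEK_SET);
        uint32_t *code = malloc(size);   /* SPIR-V is a stream of 32-bit words */
        fread(code, 1, size, f);
        fclose(f);

        VkShaderModuleCreateInfo info = {
            .sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
            .codeSize = (size_t)size,
            .pCode = code,
        };
        VkShaderModule module;
        vkCreateShaderModule(device, &info, NULL, &module);
        free(code);
        return module;
    }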

> Stringly Typed Binding Boilerplate

If you build your assets first then you can generate the boilerplate code with
strong bindings (grabbing positions etc. in the ctor, publicly exposed
getters/setters). I prototyped this in XNA but never really made a full
solution.

Graphics APIs are zero-assumption APIs - they don't care how you retrieve
assets. Either write a framework or use an opinionated off-the-shelf engine.
Keeping it this abstract allows for any sort of imaginable scenario. For
example: in my XNA prototype, the effects (shaders) were deeply aware of my
camera system. In the presence of strong bindings I wouldn't have been able to
do that.

Changing the way that the APIs work (to something not heterogeneous) would
require Khronos, Microsoft and Apple to define the interaction interface for
_every single language that can currently use those APIs_.

It's loosely defined like this for a reason - these are low-level APIs. It's
up to you to make a stronger binding for your language.

~~~
zanny
You can technically make shader binaries with OpenGL; the problem is Khronos
never standardized a bytecode for it, so you are basically shipping three
binaries, one for each GPU vendor's own intermediary (if they even have one; I
only know AMD does):

[https://www.opengl.org/sdk/docs/man/html/glShaderBinary.xhtm...](https://www.opengl.org/sdk/docs/man/html/glShaderBinary.xhtml)

~~~
robbies
Noooo, please don't do that. That is not the intention of that API at all!

The API was built so that the cost of compilation by the driver would only be
paid once (likely at install, or at driver update time). So the idea is that
while you are installing your PC game, it is compiling all the shaders against
the current driver rev and hardware in your machine. Then when you play the
game, it just loads the locally compiled binaries so your game doesn't hitch
at random times when the driver compiler decides to kick off a compilation
pass.
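
A minimal sketch of that intended flow (my sketch, using the
glGetProgramBinary/glProgramBinary pair from desktop GL mentioned below, with
the disk I/O elided):

    GLint len = 0, ok = GL_FALSE;
    GLenum format = 0;
    glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &len);
    void *blob = malloc(len);
    glGetProgramBinary(program, len, NULL, &format, blob);
    /* ...persist blob and format to disk, keyed on GPU and driver version... */

    /* On a later run, try the cached blob first: */
    glProgramBinary(program, format, blob, len);
    glGetProgramiv(program, GL_LINK_STATUS, &ok);
    if (!ok) {
        /* driver or hardware changed: fall back to recompiling from source */
    }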

However, the binary is very sensitive to the driver version (as it owns the
compiler), and the hardware itself. The binaries will differ across vendors,
vendor architectures, and even sub-revisions inside of a single architecture.

Bonus: this API was provided as part of the GL_ARB_ES2_compatibility
extension, in order to ease porting from OpenGL ES 2 to (big) OpenGL. Desktop
OpenGL has a different API/extension, glProgramBinary, that is the preferred
one there. glShaderBinary might not even really work on any desktop
implementation.

Probably the best source for this, though this information is
relatively...obfuscated:
[https://community.amd.com/thread/160440](https://community.amd.com/thread/160440)

------
junke
Young fool, only now, at the end, do you understand... code is data.

CEPL demos:
[https://www.youtube.com/watch?v=2Z4GfOUWEuA&list=PL2VAYZE_4w...](https://www.youtube.com/watch?v=2Z4GfOUWEuA&list=PL2VAYZE_4wRKKr5pJzfYD1w4tKCXARs5y)

I get it, there are many good reasons to stick to C++ if you want to ship a
game today.

~~~
leetNightshade
Even with C++, a modern flexible C++ game engine should be mostly data-driven.
For flexibility and faster iteration times, being able to change data at
run-time and see results almost instantly is necessary, given the level of
quality people demand from games these days.

Granted that's not the case across the board, but people have been pushing for
data-driven C++ game engines for years.

~~~
xyience
As far as I'm aware, the state of the art tooling at some studios is far
greater than any demos from the Lisp world, though I don't think it's beyond a
much smaller number of Lisp programmers to accomplish. One simple example:
being able to take any pixel on the screen and trace back through what calls
produced it. Studios also usually support live changes as you edit code/data
(oftentimes the code is in a simpler language like Lua when it doesn't need
to be core code; I hope no one pretends C++ is anywhere close to as flexible
at changing live as Lisp...), and of course all sorts of world / level / map
editors with varying degrees of integration with your asset pipeline -- having
a "game"/simulation whose purpose is to build your actual game is often nicer
to work in than the code such a tool outputs directly, even after a macro
treatment. I don't care if object x is at vector v and y is at vector u, I
just want this character standing here and that character standing there, and
maybe have a tool to quickly snap them to the same plane. (Incidentally, this
indifference to absolute values in the coordinate system, at least up front,
is what turned me off from PovRay on Linux...)

Then there's a rich variety of third party tooling for specific things you
become aware of when you start to have studio levels of money to buy licenses,
and those typically don't have Lisp wrappers.

Still, CEPL is pretty sweet, and in the related bin it's neat to see things
like Flappybird-from-the-ground-up-in-the-browser through ClojureScript and
Figwheel. For a beginner who doesn't have access to the best tooling in the
industry, or even an amateur who just wants to make a simple game, there are a
lot of good (and good enough) options. Even behemoth IDEs that force you to
relaunch a scene to load your newly compiled changes aren't so bad: since the
language you're working in probably starts with C, state management is harder
than with more functional-programming-friendly languages, so it's much easier
to reason about your game scene when you're always starting with a fresh
state. Raw OpenGL with C++ should probably be discouraged unless the person is
explicitly aiming to become a pro at a big studio (in which case they might be
better off learning DirectX)...

~~~
junke
Thanks for your comment, this is very instructive.

~~~
xyience
Thanks for all your Lisp comments. As an extra reference, the tool I was
thinking of for the "debug this pixel" feature was the venerable PIX:
[http://www.garagegames.com/community/blogs/view/14251](http://www.garagegames.com/community/blogs/view/14251)
It was a great standalone tool, and its more modern successor is better
integrated with Visual Studio:
[https://channel9.msdn.com/Events/GDC/GDC-2015/Solve-the-Tough-Graphics-Problems-with-your-Game-Using-DirectX-Tools](https://channel9.msdn.com/Events/GDC/GDC-2015/Solve-the-Tough-Graphics-Problems-with-your-Game-Using-DirectX-Tools)
On the OpenGL side of things, the situation has become a lot better than it
was in the past, with things like
[https://apitrace.github.io/](https://apitrace.github.io/)

------
rjmunro
I don't think the representation of shaders as strings vs bytecode is a
problem at the low level. Strings are slightly inefficient, but you could just
pretend that they are bytecode, where the bytes are all restricted to the
ASCII range.

What is required is a language where the compiler can analyse the code and
decide what to do on the CPU and what to do on the GPU as part of its
optimisation. It could even emit different versions for different CPU / GPU
combinations, or JIT compile on the target machine once it knows what the
actual capabilities are (maybe in some cases it makes sense to do more on the
CPU if the GPU is less powerful). You could possibly also run more or even all
code on the CPU to enable easier debugging.

The language could be defined at high enough level that it can be statically
analysed and verified, which I think would answer the criticisms in the
article.

------
haxiomic
The haxe-language project 'hxsl' takes a pretty good stab at improving the
situation. With hxsl you write your GPU and CPU code in the same language
(haxe) and it generates shader strings (in GLSL or AGAL) at compile time,
along with CPU code for whatever platform you're targeting.

There's not a lot of documentation around it at the moment but the author
explains a little about it in this video
[https://youtu.be/-WeGME_T9Ew?t=31m49s](https://youtu.be/-WeGME_T9Ew?t=31m49s)

Source code on github
[https://github.com/ncannasse/heaps/tree/master/hxsl](https://github.com/ncannasse/heaps/tree/master/hxsl)

~~~
ordinary
Sounds like hxsl doesn't actually solve most of the problems described in the
article, but just obscures them in a layer of indirection. If you're
generating shader _strings_ at compile-time, you still have to compile them at
runtime, for instance, just like before, and the GPU/CPU gap is only
marginally reduced.

It might solve some other problems with GPU programming, though?

~~~
jdonaldson
I wouldn't call it obscuring; you can see exactly what is getting passed to
the card, just the same as if you had written it manually.

The key is that you're dealing with the shader language by using a language,
rather than strings. You can use static type information. You can refactor
more easily. You can create abstractions over the minor variations in card
behaviors more easily.

While hxsl doesn't directly address debugging, it greatly improves your
ability to create code examples to isolate the problem.

hxsl does about as much as you could hope for with shader languages without
requiring changes to the cards themselves (e.g. using bytecode, which hasn't
been standardized yet, and which I wouldn't hold my breath for).

In some sense, the arrangement between app logic and shader logic is still a
bit like client-server serialization. You still have to pass the data and the
commands in a payload, since the two platforms typically do not share memory
or have useful common architecture.

~~~
ordinary
Yeah, you're right. I shouldn't have used that word.

------
Negative1
I think his conclusion is valid; compilers can now handle register assignment
and other boilerplate pretty well (Metal does this superbly, and SPIR-V seems
very good).

My only qualm is that he treats the aspects he deems outdated as unnecessary,
which is just not true. For a long time there was no alternative. Introducing
high-level languages for programmable shading was a huge deal, and it actually
decreased a lot of complexity: it simplified a lot of stuff that was quite
difficult before GLSL/HLSL came along.

He seems to be making some kind of rallying call for change, but the next
generation is already here. We'll have to keep supporting the old approach
for a while longer, but the problem really has been solved (to some extent).

Also, old person rant: "Back in my day, we only had 8 register combiners and 4
blend states. And we liked it!"

------
kirillkh
When writing my first OpenGL code, I was stunned by the amount of boilerplate
required. It is so bad it reminds me of COBOL. No one in their right mind
would tolerate that monstrosity in a normal CPU program (except, possibly,
some hardcore C++ fans).

I think in order to solve this, we need three things: 1) an intermediate
representation for compiled GPU code (the bytecode mentioned in the article
sounds like it); 2) cross-platform GPU programming support in our programming
languages' standard libraries; 3) compiler plugins and DSLs that output the IR
and link to the library.

~~~
datenwolf
> When writing my first OpenGL code, I was stunned by the amount of
> boilerplate required.

Huh, actually the amount of boilerplate required for getting a triangle to the
screen with OpenGL is minimal:

- create a context
- fill a buffer with the triangle vertex data
- load the buffer into OpenGL
- create a vertex shader implementing the vertex-to-position transformation
- create a fragment shader implementing pixel colorization
- issue draw commands
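
Concretely, a minimal sketch of those steps (assuming context creation and
extension loading are already handled by something like GLFW and glad, and
with error checking elided):

    const char *vs_src =
        "#version 330 core\n"
        "layout(location = 0) in vec3 pos;\n"
        "void main() { gl_Position = vec4(pos, 1.0); }\n";
    const char *fs_src =
        "#version 330 core\n"
        "out vec4 color;\n"
        "void main() { color = vec4(1.0, 0.0, 0.0, 1.0); }\n";

    float verts[] = { 0.0f, 0.5f, 0.0f,  -0.5f, -0.5f, 0.0f,  0.5f, -0.5f, 0.0f };

    /* fill a buffer with the vertex data and load it into OpenGL */
    GLuint vao, vbo;
    glGenVertexArrays(1, &vao); glBindVertexArray(vao);
    glGenBuffers(1, &vbo); glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void *)0);
    glEnableVertexAttribArray(0);

    /* compile and link the two shaders */
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &vs_src, NULL); glCompileShader(vs);
    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, &fs_src, NULL); glCompileShader(fs);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs); glAttachShader(prog, fs); glLinkProgram(prog);

    /* issue draw commands (inside the render loop) */
    glUseProgram(prog);
    glDrawArrays(GL_TRIANGLES, 0, 3);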

~~~
xigency
And yet you've just described a decent amount of boilerplate.

The fact that future iterations of OpenGL and OpenGL ES have nuked the fixed
function pipeline is sort of annoying. I'm not saying there should be more
codepaths in the implementation, but if there were a default compiled fixed
function vertex shader and fragment shader, it would be helpful.

What you've just described for rendering one primitive includes compiling two
shaders, linking arguments, and specifying byte arrays before issuing
commands, when this task used to be as simple as:

    glBegin(GL_TRIANGLES);
        glColor3f(1.0f, 0.0f, 0.0f);
        glVertex3f(0.0f, 1.0f, 1.0f);
        glColor3f(0.0f, 1.0f, 0.0f);
        glVertex3f(-0.866f, -0.5f, 1.0f);
        glColor3f(0.0f, 0.0f, 1.0f);
        glVertex3f(0.866f, -0.5f, 1.0f);
    glEnd();

Being able to draw a triangle in eight lines of code is much more encouraging
to someone wanting to make a 3D game for mobile or the web than an estimated
50 lines, including two shaders they had to borrow from a textbook.

I would understand if immediate mode were gone, which might be better
regardless. Still, things should be as simple as a glDrawArrays call without
compiling shaders.

Not to mention, when working as a solo game developer, shaders feel like an
orphan between artists and programmers.

~~~
datenwolf
> when this task used to be as simple as:

a) Immediate Mode has been out of fashion since before the turn of the
century. Even the good old NVidia GeForce2 of 1998 already had OpenGL
extensions that are essentially vertex buffer objects; you can in fact use the
GL_ARB_vertex_buffer_object extension on a GeForce2. If you don't believe me:
I'm currently headed to my mother's, and I have an old box stored in her
cellar with a GeForce2 in it; I could give interested parties SSH into it (you
just have to live with an old Gentoo installation that I haven't touched for
some 10 years or so).

b) There's a certain complexity cutoff where the whole shader loading
boilerplate plus shader code is less than what it takes to set up an
equivalent fixed-function pipeline. Making an educated estimate, I'd say the
break-even point is: enabling a register combiner on two textures, one a cube
map, the other a normal+shininess map, setting DOT3 combining of normal with
vertex secondary color (for normal mapping), directing the normal map into the
cubemap texture coordinate lookup, and fading between reflection and dull
shading based on shininess. Sounds complex? Indeed it is, but keep in mind
that this kind of thing was already possible with the GeForce3 (at least; I
think it may even work on the GeForce2, but after over 14 years since I last
wrote a program targeting it, I'm a bit shaky on the details).

Anyway, to set up this kind of register combining you'd need between 4 and 5
calls of glTexEnvi per texture, another 3 glTexEnvi calls for setting up the
secondary color muxing mode, and another 3 calls for setting the final stage
fading mode. Add to that the hours of twisting your brain trying to wrap your
mind around all the relevant state switches in the register combiners.
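
To give a flavor, a partial sketch of just one such stage (my illustration,
assuming the ARB texture_env_combine/dot3 extensions; the exact source/operand
wiring for the full effect described above would be longer):

    glActiveTexture(GL_TEXTURE0);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE);
    glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB, GL_DOT3_RGB);
    glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE0_RGB, GL_TEXTURE);       /* normal map texel */
    glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND0_RGB, GL_SRC_COLOR);
    glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE1_RGB, GL_PRIMARY_COLOR); /* light vector packed into vertex color */
    glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND1_RGB, GL_SRC_COLOR);
    /* ...and that's one stage; the cubemap lookup and the final
       reflection/dull fade each need a similar block of state. */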

With shaders you simply write down what you want.

~~~
xigency
Yes, I'm aware of how both immediate mode and vertex arrays work in OpenGL.

The problem, and this is even getting away from what the article addresses, is
that a green developer approaching the API must learn an entirely new
language (shaders) before drawing a single primitive. In mattbee's example,
what is the beginner supposed to think of the line "#version 410" or "in vec3
vp", which is not even C code?
I'm also not saying that all of the legacy functions from the fixed-function
pipeline should be maintained, with all of their specific parameters, but that
those attributes should be bound with a default shader that supports all of
the same capabilities. So, using GLSL attribute accessors with each platform
having the same default shader (and default parameter names). Then, wrap the
IM example in arrays, call a few binds, and make a draw call. That's much
simpler than writing two embedded programs inside your first program. I think
it would have been very beneficial if OpenGL ES had been rolled out with a
design like this.

Maybe it's because graphics programming is so feature focused that it creates
a problem. For me, the issue is the obstacles between getting shapes on the
screen, especially in contexts like WebGL or mobile.

~~~
angersock
The beginner needs to rise to the task.

GLSL isn't complex _at all_, not least because its problem domain is
basically pure math.

It should take less than an afternoon to understand how to write a shader if
they have any familiarity with math.

If they don't know math, _they shouldn't be doing shader programming until
they learn_.

~~~
xigency
Writing a fixed function shader is to learning graphics programming what
writing a Makefile is to learning C++. I'm not saying it's not important, or
not important in the long run, but what introductory programming book would
start out showing how to use a linker instead of how to write a program? In
this case things are compounded by the fact that there is no linker, which
means the programmer has to shuffle around string programs at runtime.

I agree, though, that shader writing should help with learning 3D math. Saying
GLSL is not complex is a bit of an overstatement, though: the language is
simple, but GLSL doesn't behave like C in certain statements, and not knowing
everything will make things complicated.

You've also described a chicken-and-egg problem. Can't make a 3D app till you
learn GLSL, can't use GLSL until you make a 3D app.

It also feels absurd to be writing any static language code wrapped in a
string, in JavaScript, to be sent to a GPU and then compiled. The user doesn't
have newlines, let alone syntax highlighting. Sure, most WebGL developers will
just use Three.js but then they aren't really learning anything, anyway, and
we still have these patchwork solutions.

~~~
angersock
I disagree that it's inappropriate for beginners, and comparing it to a build
system is incorrect.

Shaders are a step in the pipeline. Beginners should be learning the pipeline:

1. Geometry is instantiated and draw parameters set.

2. Geometry vertices are transformed by a vertex shader.

3. Transformed vertices in a primitive are used to sample across its geometry,
producing pixel fragments.

4. Pixel fragments are transformed by a fragment shader.

5. Transformed fragments are written to a framebuffer.

That's the whole thing, and GLSL neatly handles 2-4. Step 1 is kind of a pain
in the ass, but not terrible. Step 5 is usually simple, but no worse than 1.
We shouldn't be protecting users from dealing with the (simple) facts of life
in a graphics pipeline.

As for JavaScript not having multi-line strings... that's hardly the fault of
WebGL, and honestly the simplicity of passing around strings means that as the
language gets more interesting support (for interpolations and whatever) the
API will be unaffected.

~~~
xigency
It's a stretch to say that GLSL shaders are broken, since they are functional
and every OpenGL application uses them, but it's hard to say that they are
implemented in a logical way as they are now, especially at first glance.

------
kelvin0
It gets much worse than that for anyone who has shipped AAA games on multiple
consoles: 1) ubershader program files (#ifdefs peppered generously
everywhere); 2) each console has its own quirks and gotchas that you try to
abstract; 3) the CPU/GPU interface is a nightmare and debugging is as painful
as it gets.

~~~
fsloth
"CPU/GPU interface is a nightmare and debugging is as painful as it gets."

I suppose this is one of the biggest main paint points. Not a verbose API, but
the fact that the drivers are so complex that they often break the expectation
that you can trust the underlying implementation if it's provided by a vendor.

No linguistic overlay is going to fix the underlying drivers provided by
graphics card providers.

Before figuring out how to make the interface elegant I claim it should first
become robust. If it's not robust then wrapping it in an elegant interface
will only make things worse.

~~~
tehrei
Vulkan and DirectX 12 are attempting to mitigate the driver problem by
removing as much of the driver as possible.

I've always sort of suspected AMD's Mantle initiative, which grew into the
above two technologies, was at least partly motivated by their inability to
ship a driver with the same performance as nVidia's...

~~~
yoklov
Sadly, it looks very much like both Vulkan and DX12 are going the way of their
predecessors. By which I mean tons of extensions, and nobody really thinks
that they'll be able to avoid working around buggy games the same way previous
drivers had to, do they?

Maybe I'm being too cynical, but these seem like issues inherent in the
situation, and not something caused by particular quirks in gl/dx.

~~~
Guvante
> nobody really thinks that they'll be able to avoid working around buggy
> games the same way previous drivers had to, do they?

It depends. The running understanding was that when told to do X the driver
actually did Y, so the game did Z, which the driver transformed to X.
Unfortunately a later update caused Z to become W, so a game-specific fix was
added to make it "correctly" do X.

If Vulkan/DX12 can avoid the shell games then the whack-a-mole could in theory
be eliminated.

> Maybe I'm being too cynical, but these seem like issues inherent in the
> situation, and not something caused by particular quirks in gl/dx.

It actually might be, given the huge amount of underlying work the drivers
were doing to maintain the facade they were presenting. By removing much of
the facade it could help.

------
Arnsaste
He listed some of the downsides of the OpenGL approach to shaders, but he
forgot to address the (IMHO) really big advantage: You can use OpenGL in every
programming language that is able to call C functions, which is basically
every programming language.

Somebody in this thread mentioned CUDA, which is great, but it also has the
downside that you practically have to use C++ on the CPU side. You can also
use e.g. Python, but when you do you are back to compiling CUDA code at
runtime.

Sure, you could argue that we can simply create a new programming language for
applications that use the graphics card (or use C++ like CUDA). The problem
with this is that few people would use it just because it's a bit easier to do
CPU-GPU communication. And there is a second, much larger problem: different
graphics API vendors would create different programming languages, which makes
it much harder for an application to support multiple graphics APIs, as a lot
of games do today with DirectX and OpenGL.

Perhaps there is another, better solution, but I can't see it right now.

------
chongli
What we really need is a project to do for GPUs what RISC-V[0] is about to do
for CPUs. It's high time we as a society broke away from the stranglehold
proprietary companies such as Nvidia and Intel have over us. It's time for a
completely open computing platform for all of us to use to the fullest
advantage of society.

[0]
[https://en.m.wikipedia.org/wiki/RISC-V](https://en.m.wikipedia.org/wiki/RISC-V)

~~~
eropple
That sounds great. Are you going to pay for it?

That isn't an attempt at snark--RISC-V is interesting. It isn't _competitive_.
GPU technology is very interesting. This, without dump trucks of cash, won't
be competitive, either.

There is a question of _cui bono_ here, and you need to answer that to make
what you advocate make sense.

~~~
chongli
_dump trucks of cash_

How about $30,000[0]? Seriously, it's not the 90s anymore. The cost of making
a chip is not insane and it's going to go down every year. As Moore's Law
slows to a halt, we enter the golden age of computer architecture[1].

[0]
[http://www.eetimes.com/author.asp?doc_id=1327291](http://www.eetimes.com/author.asp?doc_id=1327291)

[1] [https://www.youtube.com/watch?v=mD-njD2QKN0](https://www.youtube.com/watch?v=mD-njD2QKN0)

~~~
eropple
The cost of making a chip anybody wants to use is not $30,000, though. nVidia
gets my money because their GPU technology is good enough to, at a reasonable
price point, do absolutely anything I'm likely to encounter in the next 3-5
years. You're fighting against significant economies of scale as well as a
significant technical deficit.

Don't get me wrong: it would be awesome if this existed. But I think that,
like RISC-V, it's predicated on a heavy hitter getting something significant
out of doing it. I'm not sure who will do that without the ROI in play.

~~~
chongli
I agree that you're not soon likely to replace your shiny $400 Nvidia graphics
card with something designed by a bunch of students and faculty at Berkeley.
However, what about your phone? If you watch Prof Patterson's talk, you'll be
astounded at what they've been able to do with modern tools and a small team
of students. Now think of some larger, slightly-more funded teams such as
LowRISC[0] and you begin to see the possibilities. Raspberry Pi proved there
is demand for this sort of thing.

[0] [http://www.lowrisc.org/](http://www.lowrisc.org/)

~~~
eropple
"It's open" is not and has never been an effective selling point for consumer
products, and it seems that you're effectively trying to sell these things to
consumers. "It's open _and is better_ is," and in my experience that rarely
exists (again, for consumer products). If it is as easy as you posit (and this
isn't my field, I know a little about CPUs and GPUs but mobile isn't really
relevant to me), then what's stopping a company with significant talent and
cash reserves can appropriate the same approaches and lap you with their
closed tech? I don't see how you're not going to _have to_ offer an inferior
good at superior-good prices.

Technically speaking, if anything, phones strike me as a _harder_ place to
enter, due to the limitations on power budget and the significant premium on
marginal performance. I'm not saying it can't be built--I'm saying, who would
both want it _and_ not benefit from business-as-usual, closed development?

~~~
chongli
The equation is the same one as for software: why did Google use the Linux
kernel instead of developing their own kernel for Android? Because Linux is
available and free to use; duplicating the effort of Linux would not have been
worth it at all.

The RISC-V architecture is superior to all of the major commercial
architectures out there. It's also free to use and has a significant amount of
headroom for private extension. So what's stopping a company like Apple from
using it for their next phone? They already have the necessary human resources
to design an SoC. The main barrier is software recompilation. Well, the RISC-V
target is specifically designed to be easy to recompile to: it's not in any
way a radical departure from the common ideas out there.

------
exDM69
The issues pointed out by OP are mostly gone in Vulkan, the new graphics API
from Khronos. Shaders are shipped in the SPIR-V bytecode binary format, and
the binding of resources to shader inputs is more memory-oriented.

Time to go learn a new API :)

~~~
panic
The article does mention Vulkan:

 _Direct3D and the next generation of graphics APIs—Mantle, Metal, and
Vulkan—clean up some of the mess by using a bytecode to ship shaders instead
of raw source code. But pre-compiling shader programs to an IR doesn’t solve
the fundamental problem: the interface between the CPU and GPU code is
needlessly dynamic, so you can’t reason statically about the whole,
heterogeneous program._

------
joeld42
CUDA sounds like the kind of "heterogeneous" model the author is suggesting.
Unfortunately it has its own set of problems.

I'm not really sure what kind of API the author wants. Sure, it's annoying
that data in GPU-land and CPU-land are difficult to get to work together. But
that's not the API's fault; it's because they are physically very far removed
from each other. They don't share memory (and if they did you'd still have to
lock and manage it). You could make an API that made them seem more
transparent to the programmer, but then you're back to the GL 1.0 mess where
you have zero control and the driver does dumb things at random times because
it doesn't know anything about the program that is executing.

~~~
vvanders
If you're on mobile GPU chipsets the latter is actually more common (CPU/GPU
being in the same memory space).

That said, it doesn't really change a whole lot; each platform has its own
unique performance characteristics and I don't think you'll ever correctly
represent that at the API level. APIs tend to move much slower than the
hardware platforms that power them.

------
csabahruska
In lambdacube 3d you can program the CPU/GPU (OpenGL+GLSL) in one typed
language. ([http://lambdacube3d.com/](http://lambdacube3d.com/))

------
_yosefk
OpenCL was moving to LLVM IR, away from programs-as-source-code-strings, at
some point. Still, you ought to have a JIT step if you want to be able to ship
the same binary to multiple systems with GPUs not compatible at the binary
level. And BTW, shipping LLVM IR on the CPU would make things better in terms
of supporting CPUs with incompatible ISAs... whether that ever overtakes
native binaries is a question (it did to an extent, of course, with the JVM
and what-not). Of course you could move the JIT off the client device
completely and do it at the App Store - prepare a bunch of binaries for all
the devices out there - but still, the developers will have shipped IR.

Once you get to shipping IR instead of strings, which is nice I guess in that
it should make initialization somewhat faster and the GPU drivers somewhat
smaller, I'm not sure what you're going to get from being able to treat a
CPU/GPU program as a single whole. Typically the stuff running on the GPU or
any other sort of accelerator does not call back into the CPU - these are
"leaf functions", optimized as a separate program, pretty much. I guess it'd
be nice to be able to optimize a given call to such a function automatically
("here the CPU passes a constant so let's constant-fold all this stuff over
here.") The same effect can be achieved today by creating a GPU wrapper code
passing the constants and having the CPU call that, avoiding this doesn't
sound like a huge deal. Other than that, what big improvement opportunities am
I missing due to the CPU compiler not caring what the GPU compiler is doing
and what code the GPU is running?

(Not a rhetorical question, I expect to be missing something; I work on
accelerator design but I haven't ever thought very deeply about an integrated
CPU+accelerator compiler, except for a high-level language producing code for
both the CPU and some accelerators - but this is a different situation, and
there you don't care what say OpenGL or OpenCL do, you generate both and
you're the compiler and you use whatever opportunities for program analysis
that your higher level language gives you. Here I think the point was that we
miss opportunities due to not having a single compiler analyzing our C/C++ CPU
code and our shader/OpenCL/... code as a single whole - it's this area where I
don't see a lot of missed opportunities and asking where I'm wrong.)

~~~
vilya
In CUDA you can write your device kernel as a templated function and the
compiler will specialise it based on how it's called by the host. In some
cases this can result in big speed-ups while keeping the code simple and
maintainable. You could get the same speed from specialising the code by hand,
or with a separate code generation tool; but having that ability built in to
the compiler makes it very easy to use.

It's also handy for the compiler to be able to check for errors in shader
invocations. Looking up parameters by name and setting them dynamically adds a
whole class of potential runtime errors that don't exist in straight C++
programs.

~~~
_yosefk
func<a,b>(c,d) isn't faster than func(a,b,c,d) if the compiler sees that a,b
are constants and the call to func is in the same file as its definition; the
only thing CPU+GPU code adds here relatively to CPU-only code is, perhaps you
need to write a wrapper along the lines of func_ab(c,d) { func(a,b,c,d) }, a
bit ugly but the code is still pretty simple and maintainable. (And unlike the
case with templates at least you can make sense of the symbol table etc. etc.)

As to runtime errors because of misspelling a parameter name: this is going to
be fixed the first time you run the program, and you live happily ever after.
I don't mind these errors in Python very much, and I don't mind them in C
programs using strings to refer to variable names in whatever context that may
happen.

Overall the amount of "dynamism" in CPU/GPU programming IMO does not waste
almost any optimization opportunities, nor does it make the program
significantly harder to get right, but I'm prepared to change my mind given
counterexamples. (Certainly in Python it's really easy to demonstrate missed
optimization opportunities relative to C due to its dynamism... The amount of
dynamism in CPU/GPU programming, however, is IMO trivial, and hence the issues
resulting from it are also rather trivial.)

------
xigency
It seems unlikely that any of this will change, because control is in the
hands of relatively few powerful organizations who honestly have no interest
in making life easier for software engineers working on graphics programming.
The engineers who do have to put up with it have long since given up or
chalked it up to experience, and they would probably resist any changes now
simply because change requires more work and learning how to do things all
over again.

The hardware developers (AMD/nVidia/Intel) have an interest in not changing
their device drivers, the API vendors (OpenGL and DirectX) have little
interest in redeveloping technology, and the software developers with the most
capital, game engine developers, have already found workarounds and hire
enough engineers to plug leaks in their lifeboats. The state of tools for game
developers is so shoddy that trying to retrofit language compilers and shader
compilers to work together seems like a drawn-out task.

It's sort of a David and Goliath situation if you think you can change the
graphics programming landscape on your own. Plus, we all know how poorly these
standards are developed over time.

~~~
zanny
I don't think it's that bad now. We are somewhat limited by SPIR-V, but we now
have the capabilities to write our own graphics APIs and shader languages that
can compile to Vulkan/OpenCL and SPIR-V and go on from there. It is nothing
close to as bad as the goliath of OpenGL hegemony.

~~~
xigency
I might not be as qualified to talk about this since I am not actually working
on the leading edge, but I know from past experience that development hasn't
been pleasant.

------
vanderZwan
Would Halide[0] be what he's looking for? It's a DSL, sure, but one that is
embedded in C++, so you can at least program your whole pipeline in place,
giving you a good grasp of the flow of the program(s).

[0] [http://halide-lang.org/](http://halide-lang.org/)

~~~
mwilcox
I believe Halide is limited to 2-dimensional (image processing) use cases?
Either way, it's designed to make it easier to find optimal parallel
implementations rather than to write code that can fluidly execute across a
CPU and GPU.

~~~
vanderZwan
Well the thing is that it's targeting "x86/SSE, ARM v7/NEON, CUDA, Native
Client, and OpenCL."

Anyway, as far as I know it is limited to 2D raster images so far, yes. Since
it's part research language with some very interesting features I do hope that
it gets extended to more general cases, and at least make it easier to work
with 3D use cases (image volumes and such) too.

------
ixtli
Very well written. I get the sense that the reason we're all here now is that
historically the people who use these APIs as client programmers have been
somewhat bad at explaining what they want.

~~~
marssaxman
Rather: it's because the companies building the hardware have jealously
guarded their machine languages, treating their drivers as a source of
competitive advantage. We'd have fixed the problems ourselves years ago if we
could, but GPU manufacturers like to keep it all locked up tight and secret,
so we can't.

------
IvanK_net
I did not find anything interesting or useful in this article.

The author just says that OpenGL and "shaders as strings" are bad, without
explaining why. Then the author speaks about some mysterious "heterogeneity"
without specifying what it means.

"We need programming models that let us write one program that spans multiple
execution contexts" - what does that mean? What should that model look like?
How should it be different? It is like saying "cars with 4 wheels are bad,
cars with a different number of wheels would be better".

~~~
viraptor
I thought the problem was pretty much spelled out in that post:

> [...] doesn’t solve the fundamental problem: the interface between the CPU
> and GPU code is needlessly dynamic, so you can’t reason statically about the
> whole, heterogeneous program.

It's the same thing as raw SQL vs an abstraction (could be an ORM, but not
necessarily). Not only are your assumptions not verified (is there a table X,
will I receive the columns I expect, etc.), you cannot tell if the constructs
are even valid until runtime (is "SELECT %s from %s" valid?). Abstractions let
you do things like `table.where(abc=123)`, which is guaranteed to be a valid
construct and can be verified against the schema at compile time. It can still
fail at runtime, but it has far fewer failure modes and is clearer when
reading.

~~~
rwmj
10 years ago I wrote a thing called PG'OCaml which integrates OCaml and SQL in
a type-safe way. You write code like:

    PGSQL(dbh) "insert into employees (name, salary, email)
                values ($name, $salary, $?email)"

and it (at compile time) creates the correct PREPARE+EXECUTE pair and checks
types across the languages. (It's not obvious but it is also free of SQL
injections.)

For example if your OCaml 'salary' variable was a string but the column is an
int then the above program wouldn't compile. If you change the type of the db
column after compiling the program then you get a runtime error instead.

Still around although now maintained by a group of authors:
[http://pgocaml.forge.ocamlcore.org/](http://pgocaml.forge.ocamlcore.org/)
(git source:
[https://github.com/darioteixeira/pgocaml](https://github.com/darioteixeira/pgocaml))

It's only possible because OCaml has real macros. It wouldn't be easy to do
this in C/C++ (I guess? Maybe with an LLVM-based pre-processing parser or
something.)

~~~
fauigerzigerk
This used to be how a lot of SQL code was written. I think it's about as old
as SQL itself. See Oracle's Pro*C for instance. It's a C precompiler that
generates API calls and optionally does type checking. There's even an
ANSI/ISO standard for embedded SQL.

~~~
chiph
Yep. I used to use the Sybase SQL precompiler in OS/2 with Watcom C++.
Nullable columns were a pain because you had to define a separate boolean
variable to hold the null status for each one.

------
DonHopkins
DreemGL [1] [2] addresses these problems by compiling JavaScript code into
shaders.

"DreemGL is an open-source multi-screen prototyping framework for mediated
environments, with a visual editor and shader styling for webGL and DALi
runtimes written in JavaScript." [3]

[1]
[https://github.com/dreemproject/dreemgl](https://github.com/dreemproject/dreemgl)

[2]
[http://docs.dreemproject.org/docs/api/index.html#!/guide/dre...](http://docs.dreemproject.org/docs/api/index.html#!/guide/dreem_in_10_part1)

[3] [https://dreemproject.org/](https://dreemproject.org/)

~~~
DonHopkins
Here's a talk about the nuts and bolts of how DreemGL compiles JavaScript into
shaders:

[https://youtu.be/L-apMRjlBOM?t=12m6s](https://youtu.be/L-apMRjlBOM?t=12m6s)

------
rawnlq
One interesting library for building shaders is:
[http://acko.net/blog/shadergraph-2/](http://acko.net/blog/shadergraph-2/) or
[https://github.com/unconed/shadergraph](https://github.com/unconed/shadergraph)

Made by the guy behind MathBox.

------
robbies
I think the author has good intentions (don't we all), but I don't think he
understands enough about graphics programming to make some of the
proclamations/requests in the article. Not that I blame him...it's hard to get
an understanding of GPU programming outside the scope of a game dev or IHV.

> To define an object’s appearance in a 3D scene, real-time graphics
> applications use shaders... Eh, the shaders are just a part of the GPU
> pipeline that transforms your vertices, textures, and shaders into something
> interesting on the screen.

This is already oversimplifying what GPUs are trying to do for the base case
of graphics.

> the interface between the CPU and GPU code is needlessly dynamic, so you
> can’t reason statically about the whole, heterogeneous program.

Ok, so what is the proposed solution here? You have a variety of IHVs (NV,
AMD, Intel, ImgTec, ARM, Samsung, Qualcomm, etc). Each vendor has a set of
active architectures that each have their own ISA. And even then, there are
sub-archs that likely require different accommodations in ISA generation
depending on the sub-rev.

So in the author's view of just the shader code, you already have the large
problem of unifying the varieties of ISA under some... homogeneous ISA, like
x86. That's a non-trivial problem. What's the motivation here? How will you
get vendors to comply?

I think right now, SPIR-V, OpenCL, and CUDA aren't doing a _bad_ job in trying
to create a common programming model where you can target multiple hardware
revs with some intermediate representation, but until all the vendors team up
and agree on an ISA, I don't see how to fix this.

_On top of that_, that isn't even really the only important bit of
programming that happens on GPUs. GPUs primarily operate on command buffers,
of which there is nary a mention in the article. So even if we address the
shader cores inside the GPU, what about a common model for programming command
buffers directly? Good luck getting vendors to unify on that.
Vulkan/DX12/Metal are good (even great) efforts in exposing the command buffer
model. You couldn't even _see_ this stuff in OpenGL and pre-DX12 (though there
were display lists and deferred contexts, which kinda exposed the command
buffer programming model).

> To use those parameters, the host program’s first step is to look up
> location handles for each variable...

Ok, I don't blame the author for complaining about this model, but this is an
introductory complaint. You can bind shader inputs to 'registers', which map
to API slots. So with some planning, you don't need to query location handles
if you specify them in the shader in advance. I think this functionality
existed in Shader Model 1.0, though I can't find any old example code for it
(2001?).
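
For what it's worth, a minimal sketch of the GL flavor of this (my sketch,
assuming GL 4.3+ or ARB_explicit_uniform_location; mvp_matrix is a
hypothetical float[16]):

    const char *vs_src =
        "#version 430 core\n"
        "layout(location = 0) in vec3 pos;\n"       /* attribute slot fixed at 0 */
        "layout(location = 3) uniform mat4 mvp;\n"  /* uniform slot fixed at 3 */
        "void main() { gl_Position = mvp * vec4(pos, 1.0); }\n";

    /* No glGetAttribLocation/glGetUniformLocation string lookups: the slots
       are agreed on in advance, like 'registers' in HLSL. */
    glEnableVertexAttribArray(0);
    glUniformMatrix4fv(3, 1, GL_FALSE, mvp_matrix);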

That being said, I certainly don't blame the author for not knowing this, as I
think this is a common mistake made by introductory graphics programmers,
because the educational resources are poor. I don't think I ever learned it in
school...only in the industry was this exposed to me, to my great joy. Though
I am certain many smarter engineers figured it out unprompted.

> OpenGL’s programming model espouses the simplistic view that heterogeneous
> software should comprise multiple, loosely coupled, independent programs.

Eh, I don't think I want a common programming model across CPUs and GPUs. They
are fundamentally different machines, and I don't think it makes sense to try
to lump them together. We don't assume that we can use the same programming
methodologies for a single- vs a multi-threaded program; I know plenty have
tried, but I thought the best way of addressing the differences was education
and tools. Likewise, I'd advocate that the most effective way GPU programming
will become more accessible is education and tools. I have hope that the
current architecture of the modern 'explicit' API will facilitate that
movement.

~~~
xigency
"Eh, I don't think I want a common programming model across CPUs and GPUs.
They are fundamentally different machines, and I don't think it makes sense to
try to lump them together."

Well, except for APUs, which I think are an unstated target for this
discussion. Or on-board graphics chips.

In these cases, there is almost certainly extra work being done, regardless of
your programming philosophy.

~~~
robbies
I'd actually argue that not much changes, except that APUs mean you don't have
to worry about different memory pools and moving stuff across the bus. That's
a whole other problem, as far as streaming data management goes. If we're just
talking about GPU programming, in my experience even APUs still feel like two
separate processors instead of some... combined thing.

------
onetimePete
Coding for a GPU is basically embedded programming - and the JIT step is
needed to avoid platform adaptation costs.

------
scrumper
Slightly OT, but the literate presentation was nice. Any particular tool or
technique you used for that?

Thanks

~~~
tyrust
From the Makefile [0] it looks like they are using Docco [1].

I agree that it's a very nice presentation of docs+code.

[0] -
[https://github.com/sampsyo/tinygl/blob/master/Makefile#L25](https://github.com/sampsyo/tinygl/blob/master/Makefile#L25)

[1] - [https://jashkenas.github.io/docco/](https://jashkenas.github.io/docco/)

------
cosmicexplorer
I think this is what OpenACC
([https://en.wikipedia.org/wiki/OpenACC](https://en.wikipedia.org/wiki/OpenACC))
was created to do; its support is pretty iffy right now, though.

~~~
QuantumRoar
OpenACC is to GPU programming what OpenMP is to parallel programming. It is
intended for general-purpose GPU computing, i.e. numerical computations on
your GPU, not for replacing anything like OpenGL.

------
tdsamardzhiev
Can somebody with Direct3D experience give an opinion on whether it's any
better?

~~~
zamalek
Shaders can be built in advance. The raw interface remains dynamic/stringly
accessed. It's mostly the same.

------
sklogic
Source code or IR or whatever else would always be needed, I'm afraid, with
late (and likely unpredictable) compilation. Passing an image of a different
format may trigger a dynamic shader (or OpenCL kernel) recompilation on some
GPUs, for example.

