Weep for Graphics Programming (adriansampson.net)
367 points by samps on May 3, 2016 | 154 comments

Upvoted simply because -- notwithstanding all of the entirely valid complaints -- this is actually the short, sweet introduction to GL that I would've loved when I first picked it up.

All of these things took me some serious time to learn. Partly because of my sheer incredulity at the stringliness of it all. Finding good GL tutorials is hard, after all: "surely", I thought, "these are just bad examples, and I should keep looking for a better way".

A quick, frank throwdown of "this is the way it is" like this would've gotten me over that hump much faster. Even if it's crushing, I'm filing this link away for "must read" when someone asks me where to get started in GL programming.

There are surprisingly many systems where the tutorials have to start with "First, paste this thousand lines of boilerplate startup code. Then you can add 'hello, world' at the bottom." FPGA programming is another example.

I consistently read complaints about boilerplate code in low level graphics APIs, but I'm never clear about what people want from a low boilerplate graphics API. I'm strongly against low level graphics APIs sacrificing performance or control in order to appeal to beginners or casual users, but from the comments on Hacker News it seems like there is a need for something more accessible. Elsewhere in this thread someone said that drawing a quad should be a one-liner, but this assumes a ton of opinionated default state, and what makes sense might vary based on your use case and the power of your hardware (2D quads for UI? 3D quads to draw a heightfield? lit or unlit? what BRDF? shadows? etc.).

To me this low boilerplate API implies something closer to Processing, Openframeworks, THREE.js or Cinder, but I rarely see those recommended as options in OpenGL threads like this. Is this a marketing problem? Is it because programmers are taught that real programmers use mostly raw OpenGL if they want to take advantage of the graphics pipeline? Is it because HN users are curious and this is a learning exercise? I'm asking because I'd like to look into writing a library, framework or set of tutorials that appeals to programmers that are interested in graphics programming, find modern OpenGL too verbose or intimidating and whose needs are not met by an existing library, framework or game engine. This might end up being an educational resource that helps people unfamiliar with graphics or game technology understand their options based on their use case. I imagine that many of the people that think they need to learn OpenGL would be completely happy with Unity3D, for example.

The boilerplate is boilerplate because there are a few exceptional cases in scope where you might do something slightly differently. Since you can't completely abstract it out, it becomes boilerplate of some form. Either you're using boilerplate to set up a configuration object to feed into an initializer, or you're using initialization boilerplate.

It turns out that with graphics APIs, you actually have to layer many, many abstractions on top before you get into useful territory, which is why I'd say that 99% of use cases are better off using a high level library/engine even if all you want to do is draw a few things.

I've written my own graphics engines on top of OpenGL, and in my experience the set of people "whose needs are not met by an existing library, framework or game engine" is actually much slimmer than that set of people might think. Sure, Unity or Unreal or even Ogre3D might be overkill and might not give you all of the nuances you need, but they actually make some excellent abstractions that are well worth the tradeoffs and workarounds.

In most cases, if you need to do "custom stuff" that a "mid level" engine like Ogre3D offers, either your project is so sophisticated that it's worth redoing boilerplate, or you're asking for more than you really need, or you just want the fun of building your own engine, because it's really a lot of fun. What Unity and Unreal add on top of that are even more abstractions and tools for an entire production pipeline of artists, modelers, and programmers. But they do a great job of covering the vast majority of use cases. It's like if you told me that your application asks for things jQuery can't do... I'd question what kind of situation you're in that jQuery does things incompatibly.

This post though, about shaders, has a point regardless of OpenGL boilerplate. Shaders suck everywhere, even in engines. You're probably writing your own shaders that do 90% of what most shaders do and 10% of specifically what you needed, and you're probably trying to work through all of the complexities of "uniform" variables and how other bits and pieces of the rendering pipeline fit together. That part always sucks regardless of OpenGL, D3D, Unity, Unreal, etc.

"I imagine that many of the people that think they need to learn OpenGL would be completely happy with Unity3D, for example."

This isn't quite my experience. Unity3D can put a lot of bullshit between you and a simple update loop for testing out an idea spike.

Many times, you really just want a "poll events/update state/draw shit" loop, bereft of anything else.
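For what it's worth, the loop being described is tiny once the windowing and rendering are stubbed out. Here is a minimal sketch in Python, assuming hypothetical `poll_events`, `update` and `draw` callbacks that the caller supplies. None of these names come from GLUT, Unity or any real library.

```python
def run_loop(poll_events, update, draw, max_frames, dt=1.0 / 60.0):
    """Bare poll/update/draw loop with a fixed timestep and no renderer attached."""
    state = {"frame": 0, "time": 0.0, "running": True}
    while state["running"] and state["frame"] < max_frames:
        # 1. Poll events: whatever the (hypothetical) window system queued.
        for event in poll_events():
            if event == "quit":
                state["running"] = False  # finish this frame, then stop
        # 2. Update state by one fixed timestep.
        update(state, dt)
        state["time"] += dt
        # 3. Draw the current state.
        draw(state)
        state["frame"] += 1
    return state


if __name__ == "__main__":
    # Scripted events stand in for a real event queue.
    scripted = iter([[], ["key:left"], ["quit"]])
    final = run_loop(
        poll_events=lambda: next(scripted),
        update=lambda state, dt: None,
        draw=lambda state: print("frame", state["frame"]),
        max_frames=100,
    )
    print("stopped after", final["frame"], "frames")
```

Everything an engine adds (scenes, components, asset import) sits outside these three calls, which is why the loop itself is quick to prototype against.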

OpenGL 1.x and GLUT were actually pretty much perfect for this use case--the fixed-function pipeline made splatting up things really easy to understand. As hardware moved away from that model, and as people kept wanting to bolt on more features, life got too complicated, and the only solution was to say "Okay, screw it, you all write the shaders you want".

Your point about "what makes sense" is a good one--people don't seem to grasp that in order to have the functionality they might want in a graphics API at the performance they think they need, they need to have a toolset that lets them build large pipelines.

When you're building what is by nature a big refinery, you shouldn't complain about boilerplate. At that scale, it's not.

I agree that Unity3D is overkill for many use cases, although it's not many steps to get something that looks roughly like a fixed function OpenGL 1.x and GLUT application working for quick, code only prototypes, so I'm curious what you think is a lot of bullshit. I find that Processing has less boilerplate and more functionality than oldschool OpenGL + GLUT but it gets dismissed by a lot of people because it's seen as more of a learning or non-coder environment, even if it's usually sufficient for many quick graphics prototypes.

This is why I asked if it's more of a marketing/perception problem, or maybe a workflow problem, than a real problem.

Maybe the answer is to build a portable environment that completely emulates fixed function OpenGL from the 90s and can build all the NeHe tutorials, although that sounds really regressive to me.

Here are things that made learning OpenGL extremely painful for me:

- I came from doing graphics on the CPU where you were often handed a buffer of ARGB pixels and did whatever the heck you wanted to them. You could call OS or library routines to draw simple 2D shapes, text, or images, or you could write directly to the buffer if you were generating a pattern or writing a ray tracer. Understanding textures, framebuffers, 3D polygons etc. was really strange. (What do you mean I have to draw geometry if I just want to display a JPEG? WTF?)

- Understanding the state machine and having to hook up so much crap for simple operations. I can't tell you how many times I wrote, or even copied existing working code into a program, and it resulted in nothing on the screen, and no errors. It was almost always because I either hadn't enabled some state or hadn't bound some thing to some particular location. And there's no way to debug that other than stumbling around and asking vague questions online because you don't have any idea what's going on, unless you're already an expert and know what to look for.

- The complete lack of modern language features in the API, such as using enums instead of #defines. This means you can do stuff like call glBegin(GL_LINE) and if you don't check glGetError() (another stupid pattern!) you'll never realize it's supposed to be glBegin(GL_LINES), but you just won't see anything rendered.

- Having 20 different versions of each function for every combination of data type and size. (e.g. glColor3b, glColor3f, glColor4f, glColor4i, etc., etc.)
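To make the GL_LINE / GL_LINES complaint concrete, here is a rough sketch of what a typed wrapper could look like. It is written in Python as a stand-in, since the point is the idea rather than the binding language. The numeric values match the constants in gl.h; the `begin` function is a hypothetical wrapper, not part of any real binding.

```python
from enum import IntEnum


class PrimitiveMode(IntEnum):
    """Subset of the values glBegin accepts."""
    POINTS = 0x0000
    LINES = 0x0001
    LINE_LOOP = 0x0002
    LINE_STRIP = 0x0003
    TRIANGLES = 0x0004


class PolygonMode(IntEnum):
    """Values glPolygonMode accepts. Note LINE here is not LINES above."""
    POINT = 0x1B00
    LINE = 0x1B01
    FILL = 0x1B02


def begin(mode):
    """Hypothetical typed stand-in for glBegin: rejects the wrong enum family."""
    if not isinstance(mode, PrimitiveMode):
        raise TypeError(f"begin() expects a PrimitiveMode, got {mode!r}")
    # A real wrapper would pass this raw GLenum value on to glBegin.
    return int(mode)


if __name__ == "__main__":
    print(begin(PrimitiveMode.LINES))
    try:
        begin(PolygonMode.LINE)  # the classic typo, now caught at the call site
    except TypeError as err:
        print("rejected:", err)
```

With plain #defines both names are just integers, so the mistake only shows up later as GL_INVALID_ENUM from glGetError(), if you remember to ask.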

The entire API is horrible. I'm sure it's all related to how SGI machines worked in the early 90s and how graphics hardware evolved since then, but it made the whole thing shitty to work with. We have a solution now, which is to use one of the new APIs. Vulkan honestly doesn't look much better. I've only dipped my toes into Metal, but it looks like it has solved some of these issues.

I almost wonder if libraries that are more specialized would be useful? Unity seems as hard to learn as OpenGL (and if you want to understand why things are going wrong, you probably have to understand OpenGL anyway). Something specific to image processing might be easier to understand for people doing that, for example. I'm sure there are other niches that would benefit from libraries that hide the complexity of OpenGL (or whichever library you build it on top of).

I don't mean to sound dismissive, but a lot of what you just mentioned is because you are trying to use OpenGL in a way it wasn't intended.

OGL is meant for 3D work--if you want a framebuffer, an SDL_Surface or some native GUI element is probably a better fit for manual CPU hackery. The reason you find it so uncomfortable is that you are trying to use a refinery to bake a waffle. Textures, framebuffers, and polygons are all kinda central to how 3D graphics (and thus OGL) work--they require some work to understand, but it's not wasted work, if you are doing 3D pipeline stuff.

If you're trying to do simple operations, you probably shouldn't be using OGL directly. For debugging, RenderMonkey was quite a good tool.

Lacking modern language features means keeping a clean C ABI is easier. Having #defines means that it takes little work to make sure that any language can invoke different things. Using enums, or even better strongly-typed/templated arguments à la C++, would harm that portability.

The many variants on the APIs are actually quite useful under certain circumstances, and it is always pretty clear how they are used (and again, remember that this is meant for a C ABI).


I agree that a simpler environment (like Processing, mentioned above) would be helpful. That said, OGL is hardly bad for the reasons you listed.

Use SDL or SFML. They both let you write RGBA into a buffer and throw it onto the screen. I think they both handle hardware stretching and scaling, too.

> Having 20 different versions of each function for every combination of data type and size.

Welcome to C land.

> I'm sure it's all related to how SGI machines worked in the early 90s and how graphics hardware evolved since then

Fixed pipeline would be, I suppose. The programmable pipeline is what you get when you have GL 1.x users reinvent the API. It's great for graphics experts and a barrier for everyone else.

> Vulkan honestly doesn't look much better.

What? Vulkan seems to give pretty good access to hardware command buffers etc. It's a pretty good API for low overhead graphics.

> I've only dipped my toes into Metal, but it looks like it has solved some of these issues.

I looked into it, and couldn't even find proper C bindings. Isn't it also only available on some OS X and iOS devices?

Something like Three.JS has some serious weaknesses: one of which is the obsessive object orientation, which makes it essentially impossible to share a program and VBO between draws. More objects -> even worse overhead.
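A toy model of why sharing a program and VBO between draws matters: count how many binds a frame needs if a bind is skipped whenever the state is already current, before and after sorting draws by state. The `Draw` tuple and both functions are made up for illustration and do not model Three.JS internals.

```python
from collections import namedtuple

# Hypothetical draw record: which shader program and vertex buffer it needs.
Draw = namedtuple("Draw", ["program", "vbo", "name"])


def count_binds(draws):
    """Count program/VBO binds, skipping a bind when that state is already current."""
    binds = 0
    current_program = None
    current_vbo = None
    for d in draws:
        if d.program != current_program:
            binds += 1
            current_program = d.program
        if d.vbo != current_vbo:
            binds += 1
            current_vbo = d.vbo
    return binds


def sort_by_state(draws):
    """Group draws that share a program and VBO so they run back to back."""
    return sorted(draws, key=lambda d: (d.program, d.vbo))


if __name__ == "__main__":
    frame = [
        Draw("lit", "cube", "a"),
        Draw("unlit", "quad", "b"),
        Draw("lit", "cube", "c"),
        Draw("unlit", "quad", "d"),
    ]
    print("submission order:", count_binds(frame), "binds")
    print("sorted by state: ", count_binds(sort_by_state(frame)), "binds")
```

An API where every object owns its own program and buffer cannot do this kind of grouping for you, which is the overhead being complained about.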

Honestly, I think that it's necessary to expose people to something like OpenGL, because it at least somewhat communicates the real costs of operations. I don't think it's exactly a marketing problem so much as a practicality problem. If you want to ship something which Three.JS can render with decent performance, it's a good tradeoff. If your abstraction prevents you from implementing what you want, it's an unacceptable tradeoff.

FPGA programming? Surely not. If you've got a nicely supported dev board then it's simply a case of loading up a board support package, selecting some pins, and your first program can be three lines long. I will admit the toolchain can be a pain to set up but that's an entirely different problem.

My aha moment came by reading Unity's 'built in shaders', which are fairly simple. I found that most shader tutorials are about rendering things far more complex than a simple sprite.

You can download just the Unity shaders via the download dropdown: https://unity3d.com/get-unity/download/archive

Those shaders give you (battle tested) base cases for really common functionality.

I learned a lot about how WebGL works by studying the "twgl" library, which is basically a thin wrapper around WebGL that automates all the boilerplate stuff. It's true to the WebGL way of doing things, and doesn't add much beyond that.

"This library's sole purpose is to make using the WebGL API less verbose."


If anyone is looking for a more in-depth 'OpenGL For Dummies' tutorial, I wholeheartedly recommend Anton's OpenGL Tutorials^. It's written in a similar style to this blog post - plain English with simple minimal examples to follow along with - but it also goes into the math behind a lot of common rendering techniques.

It's really good, and I wish that I'd known about it years ago.

^ http://antongerdelan.net/opengl/

You just gave me flashbacks to giant chunks of GL boilerplate C code. (shudders... smiles)

In Rust ecosystem, we've been experimenting with different ways to make CPU-GPU interaction safer when it comes to graphics abstractions.

In gfx-rs [1] one defines a Pipeline State Object with a macro, and then all the input/output data and resources are just fields in a regular Rust struct [2]. So assignment pretty much works as one expects. All the compatibility checks are done at run (init) time, when the shader programs are linked/reflected.
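The init-time check described here can be sketched roughly as follows. This is a Python analogue, not the gfx-rs API: `uniform`, `PipelineData` and `check_interface` are invented for illustration, and the "reflected" interface is just a dict standing in for whatever the driver reports after linking.

```python
from dataclasses import dataclass, field, fields


def uniform(glsl_type):
    """Declare a dataclass field that maps to a shader uniform of this GLSL type."""
    return field(metadata={"glsl_type": glsl_type})


@dataclass
class PipelineData:
    """Hypothetical CPU-side struct; field names must match the shader's uniforms."""
    transform: tuple = uniform("mat4")
    light_pos: tuple = uniform("vec3")


def check_interface(data_cls, reflected):
    """Compare the struct against reflected uniforms; return a list of mismatches."""
    errors = []
    declared = {f.name: f.metadata["glsl_type"] for f in fields(data_cls)}
    for name, glsl_type in declared.items():
        if name not in reflected:
            errors.append(f"{name}: not found in shader")
        elif reflected[name] != glsl_type:
            errors.append(f"{name}: struct says {glsl_type}, shader says {reflected[name]}")
    for name in reflected:
        if name not in declared:
            errors.append(f"{name}: shader uniform has no struct field")
    return errors


if __name__ == "__main__":
    # Pretend this dict came from shader reflection at link time.
    reflected = {"transform": "mat4", "light_pos": "vec4"}
    for problem in check_interface(PipelineData, reflected):
        print(problem)
```

The check still happens at run time, but it happens once, up front, and it fails with a name instead of rendering nothing.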

In vulkano [3] the Rust structures are generated at compile time by analyzing your SPIR-V code.

  [1] https://github.com/gfx-rs/gfx
  [2] https://github.com/gfx-rs/gfx/blob/2c00b52568e5e7da3df227d415eab9f55feba5a9/examples/shadow/main.rs#L314
  [3] https://github.com/tomaka/vulkano

I've also been really impressed with glium. I haven't had as nice a blank editor -> quad on the screen experience in a long while, at least not since the days of fixed-function.

I say this as someone who's written ~2 commercial renderers, to varying degrees from scratch. Normally it takes 1-2 days just to set up all the various infra for handling shaders/uniforms/etc.

To expand on this a tad, here's the intro code for glium, getting a triangle on the screen is just one page of text:


That includes: window creation and event handling, fragment shader, vertex shader, varying declaration and binding. Uniforms are pretty trivial on top of this (also done via macros, so there are no raw strings).

The whole tutorial is really solid: http://tomaka.github.io/glium/book/tuto-01-getting-started.h...

Next time I'm starting up a GL project it's going to be hard for me to argue C/C++ given how nice some of the semantics around Rust's OpenGL libraries are.

But still, just one page of code? Ideally, drawing a quad should be a one-liner.

If you want to use a higher-level framework then one-liner might make sense.

For OpenGL you want control over state changes, uniforms and a slew of other things. Otherwise your performance/control will suffer significantly.

Look at the things I listed there, that's the bare minimum you need to do the above, and fitting that in one page is pretty impressive.

OpenGL is set up to optimize for all those things, instead of optimizing for the learner/simple case.

It's definitely a valid approach, but it means for simple cases, you're copy & pasting a lot of boilerplate for performance, control, and a slew of other things you don't need.

You have to learn/paste a lot before you get a little reward.

There is also kiss3d for some higher-level rendering:


GPipe also provides a type-safe API to OpenGL, but in Haskell [1].

[1] http://tobbebex.blogspot.se/

Is eholk no longer active in Rust-land? I remember reading his post [1] about compiling Rust for GPUs (i.e. app code + OpenCL kernels from a single source) back in 2012 and getting quite excited. Presumably generating SPIR-V wouldn't be vastly different in principle.

[1] http://blog.theincredibleholk.org/blog/2012/12/05/compiling-...

eholk was engaged with Rust as a part of an internship for his doctorate back in 2012; since then I guess he's just been occupied by other research.

I'm not sure why bytecode is supposed to be significantly better than storing the shaders as strings. Sure, you get rid of a complex parser and can more easily use it as a compiler target, but the core problem remains: the compiler still cannot reason across the device boundary. The CPU compiler does not understand what happens on the GPU, and the GPU compiler has no idea what the CPU is going to do with it.

Although I'm not a big fan of CUDA, it does have an advantage in that it combines both worlds in a single language. This does permit optimisations that cross device boundaries, but I have no idea whether the CUDA compiler does any of this.

Apologies for tooting my own horn, but I have been working on a language that can optimise on both sides of the CPU-GPU-boundary: http://futhark-lang.org/ (Although it is more high-level and less graphics-oriented than what I think the author is looking for.)

The issue is that every driver implementation ever ships with a slightly different implementation of that complex parser. That means (every mobile device)X(every OS update) and (every PC add-in card)X(every driver update) all have the potential to misinterpret/reject your shaders in slightly different ways.

Case in point: I recently learned that "#line 0" is a spec violation. A single OS revision of a single Android device refused to compile it. This is after shipping that line in every shader in multiple games for multiple years to millions of happy customers.

Apparently, Unreal Engine's boot process involves compiling a series of test shaders to determine which common compiler bugs the game needs to work around for this particular run.

> The issue is that every driver implementation ever ships with a slightly different implementation of that complex parser. That means (every mobile device)X(every OS update) and (every PC add-in card)X(every driver update) all have the potential to misinterpret/reject your shaders in slightly different ways.

This doesn't have anything to do with shaders as strings. It's been true since the beginning of shader programming, whether bytecode shaders or otherwise.

His point is that shaders-as-strings have a larger surface area for misinterpretation than shaders-as-bytecode.

> The issue is that every driver implementation ever ships with a slightly different implementation of that complex parser. That means (every mobile device)X(every OS update) and (every PC add-in card)X(every driver update) all have the potential to misinterpret/reject your shaders in slightly different ways.

I can confirm this happens a LOT (at least with ESSL). In its defense, it usually is a mistake I made, but some implementations are definitely more lenient than others.

Couldn't a bytecode format have the same problem? Probably not with respect to parsing, but the semantics could still differ.

In theory, yes. Neither system represents perfection. But what's more important in practice is the likelihood of problems. Parsing bytecode is trivial. Thus, problems with drivers inconsistently rejecting D3D shader bytecode have been relatively rare.

Parsing GLSL is quite hard. Thus, problems with drivers inconsistently rejecting GLSL text have been relatively common.
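To give a feel for how little there is to get wrong in a binary container, here is a sketch that reads a module header. It uses SPIR-V (mentioned elsewhere in the thread) as a stand-in rather than D3D bytecode, since its layout is publicly specified: five 32-bit words holding the magic number, version, generator, ID bound and a reserved schema word. The function name and the returned dict are my own.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def parse_spirv_header(blob):
    """Decode the five-word SPIR-V header; the magic number reveals the byte order."""
    if len(blob) < 20:
        raise ValueError("too short for a SPIR-V header")
    for order in ("<", ">"):
        magic, version, generator, bound, schema = struct.unpack(order + "5I", blob[:20])
        if magic == SPIRV_MAGIC:
            return {
                "byte_order": "little" if order == "<" else "big",
                # Version word layout: 0 | major | minor | 0 (one byte each).
                "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
                "generator": generator,
                "bound": bound,
                "schema": schema,
            }
    raise ValueError("bad magic number")


if __name__ == "__main__":
    # A hand-built header for illustration, not a real compiled shader.
    header = struct.pack("<5I", SPIRV_MAGIC, 0x00010000, 0, 8, 0)
    print(parse_spirv_header(header))
```

There is no tokenizer, preprocessor or grammar in that path, so there are far fewer places for two drivers to disagree than with GLSL text. Semantic differences can still exist further down, as the parent comment points out.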

OpenGL's text requirement makes great sense in theory, but encouraged lots of problems in practice.

The software engineers who write drivers for graphics hardware have, unfortunately, an ill-earned reputation for being unable to code their way out of wet paper bags.

The fact that the GLSL standard forced every hardware vendor to vend a fully-functional compiler for a C-like language (even a simplified C-like language) is no small part of the source of that reputation.

You know what's great fun? Getting errors that only apply to desktop GLSL when trying to compile valid GLES GLSL. Can you guess how that particular vendor shortcutted implementing their ES compiler for that Android OS revision? Can you guess how likely it is that those Android customers will ever get a driver update? ;P

Can empathize with that. The problem with OpenGL is not really the API but the impossibility of knowing that your code is actually correct. It is simply too hard to cover enough combinations of GPU x driver x OS when testing OpenGL code. It's quite common for drivers to lie or to reject something valid.

I was poking around on your site, and I took a look at your publications list. I found this paper: http://futhark-lang.org/publications/icfp16.pdf

But it has no authors! I assume it was a double-blind submission to ICFP, which required removing the author names. When posting it publicly, though, I think you should put the author names back on. You don't want your work floating around the internet without your names attached.

I think the article's author does not necessarily bemoan the lack of a comprehensive integration, but the complete lack of any non-textual interface definition.

I agree, CUDA took a step in the right direction by allowing annotated C++ source to be compiled for the device. As an HPC developer, I'm using the Kokkos C++ library https://github.com/kokkos/kokkos to switch between CUDA, OpenMP, or Serial modes from a single source code.

Futhark looks really nice, but what does the future look like for it?

(I'm currently working on a Python hpc codebase with pyopencl and would be interested in reducing some of the manual labor required for cpu-gpu coordination)

Well, I still have a year left on my PhD, and I hope (with some reason) to get funding to continue after that. But I must emphasize that the language is still a little young and raw.

I just read the paper and it looks better than most of the alternatives for GPU programming.

Besides the good reasons brought up by other posters, game devs tend to be very secretive of their shader code, so they like the bytecode as an obfuscation layer.

I spent a lot of time making toy engines with XNA and carried much of its lessons to my toy engines in C++.

If your asset can be built, then your asset should be built. Content pipelines.

> Shaders are Strings

If you're using Vulkan, compile your shaders to SPIR-V alongside your project. If you're using OpenGL you're mostly out of luck - but there's still no reason to inline the shaders even with the most rudimentary content manager. Possibly do a syntax pass while building your project.

> Stringly Typed Binding Boilerplate

If you build your assets first then you can generate the boilerplate code with strong bindings (grabbing positions etc. in the ctor, publicly exposed getters/setters). I prototyped this in XNA but never really made a full solution.
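A rough idea of what that build step could look like, sketched in Python rather than XNA: scan the shader source for uniform declarations and emit a small class that looks up each location once in its constructor. The regex only handles the simplest `uniform <type> <name>;` form (no arrays, layout or precision qualifiers), and `get_location` is a hypothetical callback standing in for something like glGetUniformLocation bound to a linked program.

```python
import re

# Matches only the simplest declarations, e.g. "uniform mat4 u_mvp;".
UNIFORM_RE = re.compile(r"^\s*uniform\s+(\w+)\s+(\w+)\s*;", re.MULTILINE)


def find_uniforms(glsl_source):
    """Return (glsl_type, name) pairs for each simple uniform declaration."""
    return UNIFORM_RE.findall(glsl_source)


def generate_bindings(class_name, glsl_source):
    """Emit Python source for a class with one attribute per uniform location."""
    uniforms = find_uniforms(glsl_source)
    lines = [
        f"class {class_name}:",
        '    """Generated from shader source; do not edit by hand."""',
        "    def __init__(self, get_location):",
    ]
    if not uniforms:
        lines.append("        pass")
    for glsl_type, name in uniforms:
        lines.append(f"        self.{name} = get_location({name!r})  # {glsl_type}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    shader = "uniform mat4 u_mvp;\nuniform vec3 u_light_dir;\nvoid main() { }\n"
    print(generate_bindings("BasicShader", shader))
```

After this, a typo in a uniform name becomes an attribute error in host code instead of a silently ignored string lookup.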

Graphics APIs are zero-assumption APIs - they don't care how you retrieve assets. Either write a framework or use an opinionated off-the-shelf engine. Keeping it this abstract allows for any sort of imaginable scenario. For example: in my XNA prototype, the effects (shaders) were deeply aware of my camera system. In the presence of strong bindings I wouldn't have been able to do that.

Changing the way that the APIs work (to something not heterogeneous) would require Khronos, Microsoft and Apple to define the interaction interface for every single language that can currently use those APIs.

It's loosely defined like this for a reason - these are low-level APIs. It's up to you to make a stronger binding for your language.

You can technically make shader binaries with OpenGL; the problem is that Khronos never standardized a bytecode for it, so you are basically shipping three binaries, one for each GPU vendor's own intermediate format (if they even have one; I only know AMD does):


Noooo, please don't do that. That is not the intention of that API at all!

The API was built so that the cost of compilation by the driver would only be done once (likely at install, or at driver update time). So the idea is that when you are installing your PC game, it is compiling all the shaders against the current driver rev and hardware in your machine. Then when you play the game, it just loads the locally compiled binaries so your game doesn't hitch at random times when the driver compiler decides to kick off a compilation pass.

However, the binary is very sensitive to the driver version (as it owns the compiler), and the hardware itself. The binaries will differ across vendors, vendor architectures, and even sub-revisions inside of a single architecture.

Bonus: This API was provided as part of the GL_ARB_ES2_compatibility extension, in order to ease porting from OpenGL ES 2 to (big) OpenGL. OpenGL has a different API/extension, glProgramBinary, that is preferred for OpenGL. glShaderBinary might not even really work on any desktop implementation.
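The caching scheme described above could be sketched like this. It is a Python mock with no GL calls in it: the vendor, renderer and version strings are assumed to come from glGetString, and `compile_fn` is a hypothetical stand-in for compiling, linking and then fetching the blob with glGetProgramBinary.

```python
import hashlib


def binary_cache_key(vendor, renderer, driver_version, shader_source):
    """Key a cached binary on everything that can invalidate it."""
    digest = hashlib.sha256()
    for part in (vendor, renderer, driver_version, shader_source):
        data = part.encode("utf-8")
        # Length prefix so ("ab", "c") and ("a", "bc") cannot collide.
        digest.update(len(data).to_bytes(8, "little"))
        digest.update(data)
    return digest.hexdigest()


class ProgramBinaryCache:
    """In-memory stand-in for an on-disk cache written at install or first run."""

    def __init__(self):
        self._store = {}

    def get_or_compile(self, key, compile_fn):
        # Only pay the compile cost when this exact driver/hardware/source
        # combination has not been seen before.
        if key not in self._store:
            self._store[key] = compile_fn()
        return self._store[key]


if __name__ == "__main__":
    cache = ProgramBinaryCache()
    source = "void main() { }"
    key = binary_cache_key("ExampleVendor", "Example GPU", "1.2.3", source)
    blob = cache.get_or_compile(key, lambda: b"driver-specific bytes")
    print(key[:16], len(blob))
```

Because the key includes the driver version and renderer string, a driver update simply misses the cache and recompiles. A real implementation would also need to fall back to compiling from source when the driver refuses a cached binary.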

Probably the best source for this, though this information is relatively...obfuscated: https://community.amd.com/thread/160440

The Vulkan comment is particularly relevant, and the reason that technology exists is because of OpenGL's quirks.

This whole article is whinging against things that people have known for a while and have been working to fix. Sometimes it's not easy.

Young fool, only now, at the end, do you understand... code is data.

CEPL demos: https://www.youtube.com/watch?v=2Z4GfOUWEuA&list=PL2VAYZE_4w...

I get it, there are many good reasons to stick to C++ if you want to ship a game today.

Even with C++, a modern flexible C++ game engine should be mostly data driven. For flexibility and being able to have faster iteration times, changing data at run-time and seeing results almost instantly is necessary, with the level of quality people demand from games these days.

Granted that's not the case across the board, but people have been pushing for data driven C++ game engines for years.

As far as I'm aware the state of the art tooling at some studios is far greater than any demos from the Lisp world, though I don't think it's beyond a much smaller number of Lisp programmers to accomplish. e.g. one simple feature being able to take any pixel on the screen and trace back through what calls produced it. Studios also usually support live changes as you edit code/data (often times the code is in a simpler language like Lua when it doesn't need to be core code, I hope no one pretends C++ is anywhere close to as flexible at changing live as Lisp...), and of course all sorts of world / level / map editors with varying degrees of integration with your asset pipeline -- having a "game"/simulation whose purpose is to build your actual game is often nicer to work in than the code such a tool outputs directly, even after a macro treatment. I don't care if object x is at vector v and y is at vector u, I just want this character standing here and that character standing there, and maybe have a tool to quickly snap them to the same plane. (Incidentally this uncaring about absolute values in the coordinate system at least up front is what turned me off from PovRay on Linux...)

Then there's a rich variety of third party tooling for specific things you become aware of when you start to have studio levels of money to buy licenses, and those typically don't have Lisp wrappers.

Still, CEPL is pretty sweet, and in the related bin it's neat to see things like Flappybird-from-the-ground-up-in-the-browser through ClojureScript and Figwheel. For a beginner who doesn't have access to the best tooling in the industry, or even an amateur who just wants to make a simple game, there are a lot of good (and good enough) options. Even behemoth IDEs that force you to relaunch a scene to load your newly compiled changes aren't so bad: the language you're working in probably starts with C, so state management is harder than in more functional-programming-friendly languages, and it's much easier to reason about your game scene when you always start from a fresh state. Raw OpenGL with C++ should probably be discouraged unless the person is explicitly aiming to become a pro at a big studio (in which case they might be better off learning DirectX)...

Thanks for your comment, this is very instructive.

Thanks for all your Lisp comments. As an extra reference, the tool I was thinking of for the "debug this pixel" feature was the venerable PIX: http://www.garagegames.com/community/blogs/view/14251 It was a great standalone tool, and its more modern successor is better integrated with Visual Studio: https://channel9.msdn.com/Events/GDC/GDC-2015/Solve-the-Toug... On the OpenGL side of things the situation has become a lot better than it was in the past with things like https://apitrace.github.io/

I do like CEPL, but feel compelled to point out that it creates GLSL strings from its shader microlanguage.

I like CEPL a lot too, but I agree, it doesn't address the OP's pain points.

I am not enough of a gpu guy to know, but I thought that was what Harlan [1] was trying to address. It had used miniKanren originally as a region and type inferencer.

  [1]  https://github.com/eholk/harlan

CEPL does not try to fix the underlying problem, but to provide a decent way of working with it. The Varjo compiler already seems quite evolved, and sure, it could be better. On the other hand, this seems to be a single-person project (http://techsnuffle.com/).

Equivalent types: https://www.youtube.com/watch?v=jSR9eXLXH-A

CEPL and Varjo updates: https://www.youtube.com/watch?v=hBHDdYayOkE&index=12&list=PL...

CEPL/Varjo creator here. Thanks for the shout-out but my compiler is very basic, it has potential though and I'm looking forward to people making higher level approaches/languages that use CEPL & Varjo under the hood.

Regarding the author's issues, the APIs are very ugly, especially in GL where you have two APIs smashed together (pre v3.1 and post). I'm not qualified to comment on DirectX but I feel in the GL case there is a fairly nice API buried in the madness, and that with a suitably flexible language you can make working with it rather pleasant without resorting to tools that totally separate you from the lower levels.

Vulkan is no panacea (nor dx12) because with control comes the associated complexity. Vulkan is going to be awesome for folks making engines, languages or those who just want absolute control. But Khronos are very open that Vulkan & SPIRV are not replacements for GL/GLSL. The increased variety of APIs that will be possible for people to make WILL be exciting though and that is worthy of celebration, as will be mixing and matching the above.

Re the "impoverished GPU-specific programming languages" comment. Limiting the scope of a language can give you valuable places to optimize and one of the primary concerns for this hardware is speed. As an exercise/strawman add pointers to an imaginary gpu language and work out how to keep the performance you currently have. This is going to become more evident as languages come out that compile to SPIRV and we get to see behind the curtain at just how hard it is to make code run fast against the various hardware approaches to render that exist in GPUs (for a case in point see 'tiled rendering').

I'm unsure about the 'interface too flexible' issue, as GPU resources are still valuable enough that being able to unload code is still useful, and at that point you have to rely on the pipeline you load following the interface you expect it to, as you would with any shared library. Even when you take away all the API ugliness you are still left with the fact that you are talking to a separate computer with separate concerns. I'd rather embrace this in a way that feels natural to your host language than hide this separation, which makes certain issues (e.g. synchronization) harder to deal with.

Regardless, this is still a valuable post and an issue worth getting fired up about. It's always exciting to see people making different ways of leveraging these awesome pieces of hardware.

Speaking to the DX APIs: 9.0c was pretty much the same kind of mess GL is now. Two APIs, the fixed function and programmable pipelines, mashed together to create something totally gross.

10 restructured the whole interface significantly. The API is actually quite nice these days. With the caveat that the API overall is very performance conscious so GPU considerations show up in the API idioms and thus in client code.

> Young fool, only now, at the end, do you understand... code is data.

And clearly storing C in strings is a saner way to handle code than as Lisp stored in S-expressions. Who could possibly doubt it?

Who doesn't like having to escape double-quotes and newlines or the simple yet powerful aesthetics of programs entirely styled in font-lock-string-face?

I don't think the representation of shaders as strings vs bytecode is a problem at the low level. Strings are slightly inefficient, but you could just pretend that they are bytecode, where the bytes are all restricted to the ASCII range.

What is required is a language where the compiler can analyse the code and decide what to do on the CPU and what to do on the GPU as part of its optimisation. It could even emit different versions for different CPU / GPU combinations or JIT compile on the target machine once it knows what the actual capabilities are (maybe in some cases it makes sense to do more on the CPU if the GPU is less powerful). You could possibly also run more or even all code on the CPU to enable easier debugging.

The language could be defined at high enough level that it can be statically analysed and verified, which I think would answer the criticisms in the article.

The Haxe-language project 'hxsl' takes a pretty good stab at improving the situation. With hxsl you write your GPU and CPU code in the same language (Haxe) and it generates shader strings (in GLSL or AGAL) at compile time along with CPU code for whatever platform you're targeting.

There's not a lot of documentation around it at the moment but the author explains a little about it in this video https://youtu.be/-WeGME_T9Ew?t=31m49s

Source code on github https://github.com/ncannasse/heaps/tree/master/hxsl

Sounds like hxsl doesn't actually solve most of the problems described in the article, but just obscures them in a layer of indirection. If you're generating shader strings at compile-time, you still have to compile them at runtime, for instance, just like before, and the GPU/CPU gap is only marginally reduced.

It might solve some other problems with GPU programming, though?

I wouldn't call it obscuring, you can see exactly what is getting passed to the card just the same as if you had written it manually.

The key is that you're dealing with the shader language by using a language, rather than strings. You can use static type information. You can refactor more easily. You can create abstractions over the minor variations in card behaviors more easily.

While hxsl doesn't directly address debugging, it greatly improves your ability to create code examples to isolate the problem.

hxsl does about as much as you could hope for with shader languages, without requiring changes to the cards themselves (e.g. using byte code, which hasn't been standardized yet, and I wouldn't hold my breath for).

In some sense, the arrangement between app logic and shader logic is still a bit like client-server serialization. You still have to pass the data and the commands in a payload, since the two platforms typically do not share memory or have useful common architecture.

Yeah, you're right. I shouldn't have used that word.

That's correct, it makes the situation easier for the programmer rather than improving the pipeline directly, however there are benefits for the added layer of abstraction in terms of flexibility and optimization. When your target platform supports bytecode shaders you can serve those instead, but the developer doesn't have to think about it - the compiler can just choose the best option available. Since the compiler now knows how the shader integrates with the rest of the code it can start merging shaders where optimal and improving dead code elimination.

Ideally this sort of system would be part of the standard graphics pipeline but looks like we could be waiting a while to see that.

Isn't this kinda normal with all the query-builders for DBs? Why isn't it used more extensively with shaders? Because they're Turing-complete?

I doubt it. SQL (with enough extensions) can also be Turing complete (or pretty close to it).

I think his conclusion is valid; compilers can now handle register assignment and other boilerplate pretty well (Metal does this superbly and SPIR-V seems very good).

My only qualm is that he treats these aspects that he deems outdated as unnecessary, which is just not true. For a long time there was no other alternative. Introducing High Level languages for programmable shading was a huge deal and it actually decreased a lot of complexity. In reality it simplified a lot of stuff that was quite difficult before GLSL/HLSL came along.

He seems to be making some kind of rallying call for change but the next generation is already here. We'll have to keep supporting that old approach for a while longer but the problem really has been solved (to some extent).

Also, old person rant: "Back in my day, we only had 8 register combiners and 4 blend states. And we liked it!"

When writing my first OpenGL code, I was stunned by the amount of boilerplate required. It is so bad it reminds me of COBOL. No one in their right mind would tolerate that monstrosity in a normal CPU program (except, possibly, some hardcore C++ fans).

I think in order to solve this, we need three things: 1) Intermediate representation for compiled GPU code. Bytecode mentioned in the article sounds like it. 2) Cross-platform GPU programming support in our programming languages' standard libraries. 3) Compiler plugins and DSLs that output the IR and link to the library.

> When writing my first OpenGL code, I was stunned by the amount of boilerplate required.

Huh, actually the amount of boilerplate required for getting a triangle to the screen with OpenGL is minimal.

- create a context
- fill a buffer with the triangle vertex data
- load the buffer to OpenGL
- create a vertex shader implementing the vertex-to-position transformation
- create a fragment shader implementing pixel colorization
- issue draw commands

And yet you've just described a decent amount of boilerplate.

The fact that future iterations of OpenGL and OpenGL ES have nuked the fixed function pipeline is sort of annoying. I'm not saying there should be more codepaths in the implementation, but if there were a default compiled fixed function vertex shader and fragment shader, it would be helpful.

What you've just described for rendering one primitive includes compiling two shaders, linking arguments, and specifying byte arrays before issuing commands, when this task used to be as simple as:

        glBegin(GL_TRIANGLES);
        glColor3f(1.0f, 0.0f, 0.0f);
        glVertex3f(0.0f, 1.0f, 1.0f);
        glColor3f(0.0f, 1.0f, 0.0f);
        glVertex3f(-0.866f, -0.5f, 1.0f);
        glColor3f(0.0f, 0.0f, 1.0f);
        glVertex3f(0.866f, -0.5f, 1.0f);
        glEnd();

Being able to draw a triangle in eight lines of code is much more encouraging to someone wanting to make a 3D game for mobile or the web than an estimated 50 lines, including two shaders they had to borrow from a textbook.

I would understand if immediate mode were gone, which might be better regardless. Still, things should be as simple as a glDrawArrays call without compiling shaders.
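For what it's worth, here is a rough sketch of what "wrapping the immediate-mode example in an array" looks like on the CPU side. It is plain C with no GL calls, so it compiles anywhere. The `Vertex` struct and the helper function names are made up for illustration; in a real program the array would be handed to `glBufferData`, the stride and offsets to `glVertexAttribPointer`, and the draw would be `glDrawArrays(GL_TRIANGLES, 0, 3)`.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex: position followed by colour (hypothetical layout). */
typedef struct {
    float position[3];
    float color[3];
} Vertex;

/* The layout below relies on there being no padding between the fields. */
static_assert(sizeof(Vertex) == 6 * sizeof(float), "Vertex must be tightly packed");

/* The same three vertices the glColor3f/glVertex3f calls above describe. */
static const Vertex triangle[3] = {
    { {  0.0f,    1.0f, 1.0f }, { 1.0f, 0.0f, 0.0f } },
    { { -0.866f, -0.5f, 1.0f }, { 0.0f, 1.0f, 0.0f } },
    { {  0.866f, -0.5f, 1.0f }, { 0.0f, 0.0f, 1.0f } },
};

/* Values a real program would pass to glBufferData / glVertexAttribPointer. */
size_t triangle_byte_size(void)    { return sizeof triangle; }
size_t triangle_vertex_count(void) { return sizeof triangle / sizeof triangle[0]; }
size_t vertex_stride(void)         { return sizeof(Vertex); }
size_t position_offset(void)       { return offsetof(Vertex, position); }
size_t color_offset(void)          { return offsetof(Vertex, color); }
```

The data part really is this small; the boilerplate people complain about is everything around it (context, buffer binding, and the two shaders).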

Not to mention, when working as a solo game developer, shaders feel like an orphan between artists and programmers.

> when this task used to be as simple as:

a) Immediate Mode has been out of fashion since before the turn of the century. Even the good old NVidia GeForce2 of 1998 already had OpenGL extensions that are essentially vertex buffer objects; you can in fact use the GL_ARB_vertex_buffer_object extension on a GeForce2. If you don't believe me, I'm currently headed to my mother's and I have an old box stored in her cellar with a GeForce2 in it; I could give interested parties SSH access to it (you just have to live with an old Gentoo installation that I haven't touched for some 10 years or so).

b) There's a certain complexity cutoff where the whole shader loading boilerplate plus shader code is less than what it takes to set up an equivalent fixed function pipeline. Making an educated estimate, I'd say that the break-even point is enabling a register combiner on two textures, one a cube map, the other a normal+shininess map, setting DOT3 combining of normal with vertex secondary color (for normal mapping), directing the normal map into the cubemap texture coordinate lookup and fading between reflection and dull shader based on shininess. Sounds complex? Indeed it is, but keep in mind that this kind of thing was already possible with the GeForce3 (at least; I think it may even work on the GeForce2, but it has been over 14 years since I last wrote a program targeting it, so I'm a bit shaky on the details).

Anyway, to set up this kind of register combining you'd need 4 to 5 calls of glTexEnvi per texture, another 3 glTexEnvi calls for setting up the secondary color muxing mode and another 3 calls for setting the final stage fading mode. Add to that the hours of twisting your brain trying to wrap your mind around all the relevant state switches in the register combiners.

With shaders you simply write down what you want.
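To make that concrete, here is a small CPU-side sketch in C of the arithmetic a `GL_DOT3_RGB` combiner stage performs, written the way you would state it in a shader: as an expression. The function names are invented for illustration. The formula is the DOT3 one from the texture-environment combine mode, where a colour channel in [0, 1] encodes a vector component in [-1, 1].

```c
#include <assert.h>
#include <stddef.h>

/* Clamp to the [0, 1] range that fixed-function colour outputs use. */
float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/*
 * What a DOT3_RGB combiner stage computes for two RGB arguments a and b:
 *   4 * ((a.r - 0.5)(b.r - 0.5) + (a.g - 0.5)(b.g - 0.5) + (a.b - 0.5)(b.b - 0.5))
 * The result is replicated to all colour channels; only the scalar is
 * returned here.
 */
float dot3_rgb(const float a[3], const float b[3])
{
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (int i = 0; i < 3; ++i)
        sum += (a[i] - 0.5f) * (b[i] - 0.5f);
    return clamp01(4.0f * sum);
}
```

In GLSL the same stage is roughly `clamp(dot(2.0 * n - 1.0, 2.0 * l - 1.0), 0.0, 1.0)`: one line instead of a handful of state-setting calls.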

Yes, I'm aware of how both immediate mode and vertex arrays work in OpenGL.

The problem, and this is even getting away from what the article addresses, is that a green developer approaching the application must learn an entirely new language (shaders) before drawing a single primitive. In mattbee's example, what is the beginner supposed to think of the line "#version 410" or "in vec3 vp" which is not even C code?

I'm also not saying that all of the legacy functions from the fixed-function pipeline should be maintained, with all of their specific parameters, but that those attributes should be bound with a default shader that supports all of the same capabilities. So, using GLSL attribute accessors with each platform having the same default shader (and default parameter names). Then, wrap the IM example in arrays, call a few binds, and make a draw call. That's much simpler than writing two embedded programs inside your first program. I think it would have been very beneficial if OpenGL ES had been rolled out with a design like this.

Maybe it's because graphics programming is so feature focused that it creates a problem. For me, the issue is the obstacles to getting shapes on the screen, especially in contexts like WebGL or mobile.

The beginner needs to rise to the task.

GLSL isn't complex at all, least of all because its problem domain is basically pure math.

It should take less than an afternoon to understand how to write a shader if they have any familiarity with math.

If they don't know math, they shouldn't be doing shader programming until they learn.

Writing a fixed function shader is to learning graphics programming what writing a Makefile is to learning C++. I'm not saying it's not important, or not important in the long run, but what introductory programming book would start out showing how to use a linker instead of how to write a program? In this case things are compounded by the fact that there is no linker, which means the programmer has to shuffle around string programs at runtime.

I agree, though, that shader writing should help with learning 3D math. Saying GLSL is not complex is a bit of an overstatement, though. The language is simple, but the fact is that GLSL doesn't behave like C with certain statements, and not knowing everything will make things complicated.

You've also described a chicken-and-egg problem. Can't make a 3D app till you learn GLSL, can't use GLSL until you make a 3D app.

It also feels absurd to be writing any static language code wrapped in a string, in JavaScript, to be sent to a GPU and then compiled. The user doesn't have newlines, let alone syntax highlighting. Sure, most WebGL developers will just use Three.js but then they aren't really learning anything, anyway, and we still have these patchwork solutions.

I disagree that it's inappropriate for beginners, and comparing it to a build system is incorrect.

Shaders are a step in the pipeline. Beginners should be learning the pipeline:

1. Geometry is instantiated and draw parameters set.

2. Geometry vertices are transformed by a vertex shader.

3. Transformed vertices in primitive are used to sample across its geometry, producing pixel fragments.

4. Pixel fragments are transformed by a fragment shader.

5. Transformed fragments are written to a framebuffer.

That's the whole thing, and GLSL neatly handles 2-4. Step 1 is a kinda pain in the ass, but not terrible. Step 5 is usually simple, but no worse than 1. We shouldn't be protecting users from dealing with the (simple) facts of life in a graphics pipeline.
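A toy CPU model of those five steps, as a sketch only: this is not how any driver or GPU implements them, and every name here (`vertex_shader`, `fragment_shader`, `draw_triangle`, the 16x16 framebuffer) is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { FB_W = 16, FB_H = 16 };

typedef struct { float x, y; } Vec2;
typedef struct { unsigned char r, g, b; } Pixel;

/* Step 5 target: a tiny framebuffer. */
static Pixel framebuffer[FB_H][FB_W];

void clear_framebuffer(void)
{
    memset(framebuffer, 0, sizeof framebuffer);
}

/* Step 2: "vertex shader", mapping [-1, 1] coordinates to pixel coordinates. */
Vec2 vertex_shader(Vec2 v)
{
    Vec2 out = { (v.x + 1.0f) * 0.5f * FB_W, (1.0f - v.y) * 0.5f * FB_H };
    return out;
}

/* Step 4: "fragment shader", here just a constant red. */
Pixel fragment_shader(void)
{
    Pixel red = { 255, 0, 0 };
    return red;
}

/* Edge function: its sign says which side of the line a-b the point p is on. */
static float edge(Vec2 a, Vec2 b, Vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Step 1 is the caller supplying verts. Returns the number of fragments written. */
int draw_triangle(const Vec2 verts[3])
{
    assert(verts != NULL);
    Vec2 s[3];
    for (int i = 0; i < 3; ++i)
        s[i] = vertex_shader(verts[i]);              /* step 2 */

    int written = 0;
    for (int y = 0; y < FB_H; ++y) {
        for (int x = 0; x < FB_W; ++x) {
            Vec2 p = { x + 0.5f, y + 0.5f };         /* step 3: sample pixel centres */
            float e0 = edge(s[0], s[1], p);
            float e1 = edge(s[1], s[2], p);
            float e2 = edge(s[2], s[0], p);
            /* Inside if all edge functions agree in sign (either winding). */
            int inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                         (e0 <= 0 && e1 <= 0 && e2 <= 0);
            if (inside) {
                framebuffer[y][x] = fragment_shader();   /* steps 4 and 5 */
                ++written;
            }
        }
    }
    return written;
}
```

The two functions a GLSL programmer writes are `vertex_shader` and `fragment_shader`; the loops in between are what the hardware does for you.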

As for JavaScript not having multi-line strings... that's hardly the fault of WebGL, and honestly the simplicity of passing around strings means that as the language gets more interesting support (for interpolations and whatever) the API will be unaffected.

It's a stretch to say that GLSL shaders are broken, since they are functional and every OpenGL application uses them, but it's hard to say that they are implemented in a logical way as they are now, especially at a cursory glance.

> but what introductory programming book would start out showing how to use a linker instead of how to write a program?

A good one!

Just take a glance over at StackOverflow and consider how many questions there essentially boil down to a lack of understanding of how the pieces in the compiler toolchain fit together.

In workshop class a teacher will also start with explaining how to properly handle a tool before showing how to apply said tool to a problem.

You're skirting around the issue.

The original comment was - "When writing my first OpenGL code, I was stunned by the amount of boilerplate required."

Your response was - "the amount of boilerplate required ... is minimal."

It's clearly not minimal because we've outlined ways it could be reduced. Your counter-example of writing larger programs doesn't solve the issue of drawing a single triangle. Don't confuse a local minimum for a global minimum and don't confuse your experience with every newbie's experience.

Although I would say it looks minimal compared to an early D3D application.

> You're skirting around the issue.

Sorry, I didn't mean to.

> The original comment was - "When writing my first OpenGL code, I was stunned by the amount of boilerplate required."

Well, you have to consider what the minimum requirements are to draw something at all. Not just at a computer display, but also in the physical world.

You need a canvas to draw on (window). You have to prepare your drawing tools (GL context). You have to sketch out what you want to draw. You have to prepare your paint (either setting up the OpenGL fixed function pipeline or writing shaders) and, last but not least, do the actual drawing.

Old style OpenGL (fixed function pipeline with immediate mode vertex calls) and modern OpenGL (shader based) pretty much even out in the number of characters to type for drawing a triangle (fixed function slightly wins if using client side vertex arrays). The main difference is that the fixed function pipeline gives you a default state that will show you something. But that's something that only OpenGL does; you'll find it in no other graphics system (and I include all offline renderers there).

> It's clearly not minimal because we've outlined ways it could be reduced

But this reduction works only for the simplest, most basic images drawn. As soon as you're drawing something slightly more complex than an RGB triangle the shader based approach quickly wins out. The very first shader languages were created not for programmable hardware (which didn't exist yet) but to kill the dozens of lines of OpenGL code required to set up a register combiner and replace them with an API that'd allow the programmer to write down the intention in a high level manner as mathematical expressions.

Also, let's not forget that if you were really after simplifying things for newbies, the right approach would be providing a library that does simple shader setup. Completely ignoring OpenGL for a moment, you'd hardly put a newbie through the paces of directly touching the native APIs to set up a GL context (well, okay, that was how I learnt it, because I couldn't make GLUT work with my compiler back then). So maybe provide something like a GLU-2 or GLU-3 for that; the GLU spec has not been maintained for two decades.

I feel this discussion is moot (regarding OpenGL) until everybody participating has implemented at least one complex OpenGL scene done with just register combiners. I'll throw in my purely register combiners based water ripples effect I implemented some 16 years ago (have the sources for that somewhere, but if I don't find them I can reimplement) and compare that with the straightforwardness that is a shader based approach.

It's staggering! The above is immediate-mode code which is what's gone and that's how a lot of people learned OpenGL 1-3.

Anton Gerdelan's tutorials are the best I've found, and just look at how much typing there is.

Once you can get through that mess, the effort/reward ratio seems to be a bit more incremental, but the triangle program is a pretty good illustration of that first cliff, and how much it's grown.

Fundamentally, the biggest problem with the fixed-function model and the vertex-plotting calls in the "red book" is they don't scale. They just don't. :(

There's a reason OpenGL ES threw most of that baby and bathwater out---it's not to diminish the footprint of the library on storage (even in an embedded device, memory's cheap for long-term storage). It's that those function calls are each a stack setup, library operation, stack teardown.... Per function call. You're more-or-less tripling the cost of describing your vertices.

We can ignore that cost on desktop machines (well, we COULD, if people didn't want to play the latest and greatest games in CRYENGINE or what have you ;) ). The cost was not ignorable in time-constrained, power-constrained architectures.

Simpler to reason about, but flatly the wrong solution for performant graphics on a coprocessor.

My argument is not to keep the interfaces and functionality of the OLD pipeline, but to maintain a reasonable, fixed-function shader that is a default out-of-the-box. The most general purpose fragment/vertex shader for drawing a triangle and maybe texturing it is 120 lines, tops, which once compiled in the driver would be a couple of hundred bytes. The real cost of not including that feature, or including it in the software library, is preventing new developers from easily experimenting with 3D programming in the browser and on portable devices, which is a fairly large loss of creativity in an emerging area.

> My argument is not to keep the interfaces and functionality of the OLD pipeline, but to maintain a reasonable, fixed-function shader that is a default out-of-the-box.

There's very little use of such a catch-all shader in a driver. This is a very narrow requirement and only newbies and super simple programs would benefit from it. Providing such a default shader fits more in a companion library like GLU, that could also cover things like image pyramid generation and generating meshes for primitive solids.

Well, we have WebGL but we don't have WebGLU.

It's shitty.

That's what I'm saying.

And the worst part about it is that the shaders have to be stored as strings. It's not as if it's possible to write shaders in JavaScript (even though it could be possible), so it's really an outrageous expectation when using WebGL from square one to have to write GLSL shaders inline in this specific language (to put a triangle on the screen). Even streaming them as text resources, there is not an easy way to write or debug a fragment shader in the browser that isn't stroke inducing. In fact, it doesn't seem that there's a reasonable way to do that AT ALL.

Your "very little use" case covers about 10,000 snippets that could be easy to share if they didn't end up looking like this, first result from Google for "Pastebin Webgl": http://pastebin.com/uiHGium5

Compare a decently developed web standard like HTML 5 video: http://pastebin.com/znbHgG3r

Wow, a 40 line demo that is easy to demonstrate.

The whole domain of 3D programming is full of such arrogant programmers that it's no wonder that everything continues to be difficult even 20 years on. It takes expert knowledge. Why even bother using a graphics card or language, when it's trivial for a programmer to write their own software rasterizer? This is obviously how someone should learn to program, if they're going to do it the RIGHT WAY.

Spare me.

Or go on beating that dead horse.

I think you missed:

- spend a few hours working out why your display window just shows black

> - spend a few hours working out why your display window just shows black

I hardly ever had this kind of problem… and I've been doing OpenGL programming for almost two decades now. Yes, when implementing a new shader (idea) I am often greeted with a blank window, too. This is where the "stringiness" of OpenGL is actually a huge benefit.

See, you don't have to recompile your program just to change things in the shader. In fact, in development builds of my OpenGL programs I regularly place a filesystem monitor on each shader source file, so that the shader gets automatically reloaded and recompiled whenever the file changes. Combine that with a GLSL compiler error log parser (the GLSL compiler will emit helpful diagnostics) for your favourite editor and you can quickly iterate your development cycles.
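A minimal sketch of such a log parser in C, for the curious. The info log format is not standardized and varies by driver, so the two line shapes recognized here (`ERROR: 0:12: message` and `0(12) : error C1008: message`) are assumptions about commonly seen output, not a spec. The struct and function names are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    int  source;        /* index of the source string given to the compiler */
    int  line;          /* line number within that source string */
    char message[256];  /* rest of the diagnostic, without the newline */
} ShaderDiagnostic;

/* Copy the remainder of the log line into the diagnostic, dropping CR/LF. */
static void copy_message(ShaderDiagnostic *out, const char *rest)
{
    snprintf(out->message, sizeof out->message, "%s", rest);
    out->message[strcspn(out->message, "\r\n")] = '\0';
}

/* Parse one line of an info log. Returns 1 on success, 0 if unrecognized. */
int parse_shader_log_line(const char *text, ShaderDiagnostic *out)
{
    assert(text != NULL && out != NULL);
    int consumed = 0;

    /* Shape A (assumed): "ERROR: <source>:<line>: <message>" */
    if (sscanf(text, "ERROR: %d:%d: %n", &out->source, &out->line, &consumed) == 2
        && consumed > 0) {
        copy_message(out, text + consumed);
        return 1;
    }

    /* Shape B (assumed): "<source>(<line>) : <message>" */
    consumed = 0;
    if (sscanf(text, "%d(%d) : %n", &out->source, &out->line, &consumed) == 2
        && consumed > 0) {
        copy_message(out, text + consumed);
        return 1;
    }
    return 0;
}
```

From there it is a small step to print `file:line: message` in whatever form your editor's quickfix or compile mode already understands.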

Just to give you an idea of how well this works and how quickly you can develop with OpenGL if you know how to do it properly: This past February our company showed our new realtime videorate 4D-OCT system at Photonics West/BiOS expo. On the first day of the expo, about 10 minutes before opening, my CEO asked me how difficult it would be to hack a variance visualization into it. Since we had a developer build of the software on the demo system I could make the changes in-situ in the running program. It took me less than 5 minutes to implement that new display mode shader… from scratch. Considering that even the smallest change in the program's C++ files would still require a linking stage of at least 10 s, and that after each launch it takes about 2 minutes for the program to go through all the required calibration, I couldn't have done it that quickly otherwise.


For those wondering how to effectively debug shaders: First get the vertex shader working, only then start with the fragment shader. Until the vertex shader works, use the following as the fragment shader:

        out vec4 o_fragcolor;
        void main() { o_fragcolor = vec4(1., 0., 0., 1.); }

i.e. a solid red. For debugging the vertex shader you may introduce some varyings that you pass into the fragment color. Once you've got the vertex shader doing what you want it to do you can tackle the fragment shader.

And always remember: Don't link the shader text hard into your program binary. Load them from external files instead and have a way to signal a reload.
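One cheap way to get that reload signal, sketched in C: poll the file's modification time (and size) once per frame and re-read the text when either changes. This assumes a POSIX `stat`; the `WatchedShader` struct and function names are invented for illustration. A real filesystem monitor (inotify, kqueue and the like) avoids the polling, and mtime usually has one-second granularity, which is why the size is compared too.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

typedef struct {
    const char *path;    /* shader source file to watch */
    time_t      mtime;   /* modification time at last load */
    off_t       size;    /* file size at last load */
    char       *source;  /* current shader text, NUL-terminated, or NULL */
} WatchedShader;

/* Read a whole file into a freshly malloc'ed, NUL-terminated string. */
static char *read_text_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    char *text = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long length = ftell(f);
        if (length >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            text = malloc((size_t)length + 1);
            if (text != NULL) {
                size_t got = fread(text, 1, (size_t)length, f);
                text[got] = '\0';
            }
        }
    }
    fclose(f);
    return text;
}

/* Returns 1 if the file was (re)loaded, 0 if unchanged or unreadable. */
int shader_reload_if_changed(WatchedShader *ws)
{
    assert(ws != NULL && ws->path != NULL);
    struct stat st;
    if (stat(ws->path, &st) != 0)
        return 0;
    if (ws->source != NULL && st.st_mtime == ws->mtime && st.st_size == ws->size)
        return 0;

    char *text = read_text_file(ws->path);
    if (text == NULL)
        return 0;
    free(ws->source);
    ws->source = text;
    ws->mtime  = st.st_mtime;
    ws->size   = st.st_size;
    return 1;  /* caller would now recompile and relink the shader */
}
```

On a `1` return you would pass `ws->source` to `glShaderSource`, recompile, and keep the old program around if compilation fails, so a typo never leaves you with a dead window.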

There's nothing intrinsic to graphics work there, you're just talking about content-driven development which is common in a bunch of domains.

You're not hitting the black screen because you've been doing this, by your own admission, for 20 years. Kudos, good for you.

The problem is getting fresh blood in: people who have a hunger to learn but don't have a deep understanding of shaders, pipelines, vertices, etc. It's really easy to make a trivial mistake in that boilerplate (I've seen someone fight something for hours until they thought to turn off culling; they just had their winding wrong).

Case in point from one of the foremost figures in the domain:


PIX was about the most sane step in the direction of fixing this, sadly tools seem to have slid backwards of late.

Your advice is sound, if maddening (it's maddening to have to debug a black box; I've hit bugs in shader compilers before that had me tearing my hair out. Things should not be fixable by multiplying a quantity by 1.0 in a sane world ;) ).

Ha, the things I've seen shader compilers do. One of the funniest (or so I think) was how the NVidia shader compiler messed up its intermediary assembly code:

The NVidia GLSL compiler (which is based on the Cg compiler), to this day, first compiles from GLSL into assembly as specified by the GL_ARB_*_program extensions plus extra instructions specified in the GL_NV_*_program extensions. Nothing wrong with that. But what's wrong is applying the LC_NUMERIC locale to floating point literals emitted into the assembly code, because if your system happens to be set to a locale where the decimal separator is a comma (,) and not a period (.) the following assembly stage will interpret the commas as list separators. NVidia's beta drivers of November 2006 for their then newly released G80 GPUs had this very bug (it also happens that I reported it first, or so I think); NVidia got the report just in time to apply an in-situ workaround before the next non-beta driver release (at least so I was told back then).
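The general shape of that bug, and one way to avoid it, can be sketched in C. This is an illustration only, not NVidia's actual fix: `printf`-family functions honour `LC_NUMERIC`, so generated code needs the separator normalized back to a period. The function name `format_float_literal` is made up.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a float literal for generated shader/assembly text so that the
 * decimal separator is always '.', whatever LC_NUMERIC is set to.
 * Returns the number of characters written (excluding the NUL), or -1
 * if the buffer is too small.
 */
int format_float_literal(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "%.6f", value);    /* honours LC_NUMERIC */
    if (n < 0 || (size_t)n >= size)
        return -1;

    const char *sep = localeconv()->decimal_point; /* "," in many locales */
    size_t sep_len = strlen(sep);
    if (sep_len > 0 && strcmp(sep, ".") != 0) {
        char *hit = strstr(buf, sep);
        if (hit != NULL) {
            /* Replace the locale's separator with a single '.' */
            memmove(hit + 1, hit + sep_len, strlen(hit + sep_len) + 1);
            *hit = '.';
            n -= (int)sep_len - 1;
        }
    }
    return n;
}
```

The blunter alternative is to never call `setlocale(LC_NUMERIC, ...)` in the process at all, which a driver loaded into someone else's application cannot guarantee.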

The core problem is library infrastructure and tooling of our programming language.

> Compiler plugins and DSLs

Name me a non-Lisp language that supports that. A lot of LLVM based languages are making progress in that direction (by providing an ability to hook into the LLVM IR at least). But this problem spans all programming languages; graphics is just a pain point because of it.

The amount of boilerplate seems much larger with Vulkan, given what they say :)

Vulkan is shaping up to be much more capable of supporting abstraction layers above it. I imagine in 5 years we will have a common "standard" Vulkan wrapper that abstracts away the boilerplate and behaves like OpenGL did, except we aren't going to be restricted to it - if it ever diverges from hardware realities again we can just drop it and use Vulkan again or write another abstraction layer.

Gets much worse than that, for anyone having shipped AAA games on multiple consoles: 1) Ubershader program files (#ifdefs peppered generously everywhere) 2) Each console has its own quirks and gotchas that you try to abstract 3) CPU/GPU interface is a nightmare and debugging is as painful as it gets.

"CPU/GPU interface is a nightmare and debugging is as painful as it gets."

I suppose this is one of the biggest pain points. Not a verbose API, but the fact that the drivers are so complex that they often break the expectation that you can trust the underlying implementation if it's provided by a vendor.

No linguistic overlay is going to fix the underlying drivers provided by graphics card providers.

Before figuring out how to make the interface elegant I claim it should first become robust. If it's not robust then wrapping it in an elegant interface will only make things worse.

Vulkan and DirectX 12 are attempting to mitigate the driver problem by removing as much of the driver as possible.

I've always sort of suspected AMD's Mantle initiative, which grew into the above two technologies, was at least partly motivated by their inability to ship a driver with the same performance as nVidia's...

Sadly, it looks very much like both Vulkan and DX12 are going the way of their predecessors. By which I mean tons of extensions, and nobody really thinks that they'll be able to avoid working around buggy games the same way previous drivers had to, do they?

Maybe I'm being too cynical, but these seem like issues inherent in the situation, and not something caused by particular quirks in gl/dx.

> nobody really thinks that they'll be able to avoid working around buggy games the same way previous drivers had to, do they?

It depends. The running understanding was that when told to do X the driver actually did Y, so the game did Z, which the driver transformed to X. Unfortunately a later update caused Z to become W, so a game-specific fix was added to make it "correctly" do X.

If Vulkan/DX12 can avoid the shell games then the whack-a-mole could in theory be eliminated.

> Maybe I'm being too cynical, but these seem like issues inherent in the situation, and not something caused by particular quirks in gl/dx.

It actually might be, given the huge amount of underlying work the drivers were doing to maintain the facade they were presenting. Removing much of the facade could help.

He listed some of the downsides of the OpenGL approach to shaders, but he forgot to address the (IMHO) really big advantage: You can use OpenGL in every programming language that is able to call C functions, which is basically every programming language.

Somebody in this thread mentioned CUDA, which is great, but it also has the downside that you practically have to use C++ on the CPU side. You also can use e.g. python, but when you do you are back to compiling CUDA code at runtime.

Sure, you could argue that we can simply create a new programming language for applications that use the graphics card (or use C++ like CUDA). The problem with this is that few people would use it just because it's a bit easier to do CPU-GPU communication. And there is a second, much larger problem: different graphics API vendors would create different programming languages, which makes it much harder for an application to support multiple graphics APIs, as a lot of games do today with DirectX and OpenGL.

Perhaps there is another better solution, but I can't see it right now.

What we really need is a project to do for GPUs what RISC-V[0] is about to do for CPUs. It's high time we as a society broke away from the stranglehold proprietary companies such as Nvidia and Intel have over us. It's time for a completely open computing platform for all of us to use to the fullest advantage of society.

[0] https://en.m.wikipedia.org/wiki/RISC-V

That sounds great. Are you going to pay for it?

That isn't an attempt at snark--RISC-V is interesting. It isn't competitive. GPU technology is very interesting. This, without dump trucks of cash, won't be competitive, either.

There is a question of cui bono here, and you need to answer that to make what you advocate make sense.

> dump trucks of cash

How about $30,000[0]? Seriously, it's not the 90s anymore. The cost of making a chip is not insane and it's going to go down every year. As Moore's Law slows to a halt, we enter the golden age of computer architecture[1].

[0] http://www.eetimes.com/author.asp?doc_id=1327291

[1] https://www.youtube.com/watch?v=mD-njD2QKN0

The cost of making a chip anybody wants to use is not $30,000, though. nVidia gets my money because their GPU technology is good enough to, at a reasonable price point, do absolutely anything I'm likely to encounter in the next 3-5 years. You're fighting against significant economies of scale as well as a significant technical deficit.

Don't get me wrong: it would be awesome if this existed. But I think that, like RISC-V, it's predicated on a heavy hitter getting something significant out of doing it. I'm not sure who will do that without the ROI in play.

I agree that you're not soon likely to replace your shiny $400 Nvidia graphics card with something designed by a bunch of students and faculty at Berkeley. However, what about your phone? If you watch Prof Patterson's talk, you'll be astounded at what they've been able to do with modern tools and a small team of students. Now think of some larger, slightly-more funded teams such as LowRISC[0] and you begin to see the possibilities. Raspberry Pi proved there is demand for this sort of thing.

[0] http://www.lowrisc.org/

"It's open" is not and has never been an effective selling point for consumer products, and it seems that you're effectively trying to sell these things to consumers. "It's open and is better" is, and in my experience that rarely exists (again, for consumer products). If it is as easy as you posit (and this isn't my field, I know a little about CPUs and GPUs but mobile isn't really relevant to me), then what's stopping a company with significant talent and cash reserves from appropriating the same approaches and lapping you with their closed tech? I don't see how you're not going to have to offer an inferior good at superior-good prices.

Technically speaking, if anything, phones strike me as a harder place to enter, due to the limitations on power budget and the significant premium on marginal performance. I'm not saying it can't be built--I'm saying, who would both want it and not benefit from business-as-usual, closed development?

The equation is the same one as for software: why did Google use the Linux kernel instead of developing their own kernel for Android? Because Linux is available and free to use; duplicating the effort of Linux would not have been worth it at all.

The RISC-V architecture is superior to all of the major commercial architectures out there. It's also free to use and has a significant amount of headroom for private extension. So what's stopping a company like Apple from using it for their next phone? They already have the necessary human resources to design an SoC. The main barrier is software recompilation. Well, the RISC-V target is specifically designed to be easy to recompile to: it's not in any way a radical departure from the common ideas out there.

> Raspberry Pi proved there is demand for this sort of thing.

RPi proved nothing of the sort. You are still relying on a commercial vendor to provide the VLSI chips. And, I assure you, they didn't make that chip for the RPi.

Open chips for phones? I couldn't stop laughing. Phones are all about minimizing power consumption. That requires that you be on nearly the newest fab technology.

If OpenRISC/LowRISC/<foo>RISC were serious, they've got a really good market to jam themselves into--Bluetooth Low Energy and WiFi chips. The margins on those chips have gotten so small that everybody is doing anything to avoid having to pay ARM money.

So, why aren't those chip vendors adopting the "open" stuff instead of hand-rolling their own?

The issues pointed out by OP are mostly gone in the new Vulkan API, which is the new graphics API from Khronos. Shaders are shipped in SPIR-V bytecode binary format and the binding of resources to shader inputs is more memory-oriented.

Time to go learn a new API :)

The article does mention Vulkan:

Direct3D and the next generation of graphics APIs—Mantle, Metal, and Vulkan—clean up some of the mess by using a bytecode to ship shaders instead of raw source code. But pre-compiling shader programs to an IR doesn’t solve the fundamental problem: the interface between the CPU and GPU code is needlessly dynamic, so you can’t reason statically about the whole, heterogeneous program.

CUDA sounds like the kind of "heterogeneous" model the author is suggesting. Unfortunately it has its own set of problems.

I'm not really sure what kind of API the author wants. Sure, it's annoying that data in GPU-land and CPU-land are difficult to get to work together. But that's not the API's fault, it's because they are physically very far removed from each other. They don't share memory (and if they did you'd still have to lock and manage it). You could make an API that made them seem more transparent to the programmer, but then you're back to the GL 1.0 mess where you have zero control and the driver does dumb things at random times because it doesn't know anything about the program that is executing.

If you're on mobile GPU chipsets, the latter is actually more common (CPU and GPU being in the same memory space).

That said, it doesn't really change a whole lot: each platform has its own unique performance characteristics and I don't think you'll ever correctly represent that at the API level. APIs tend to move much slower than the hardware platforms that power them.

In lambdacube 3d you can program the CPU/GPU (OpenGL+GLSL) in one typed language. (http://lambdacube3d.com/)

OpenCL was moving to LLVM IR away from programs-as-source-code-strings at some point. Still, you ought to have a JIT step if you want to be able to ship the same binary to multiple systems with GPUs not compatible at the binary level. And BTW shipping LLVM IR on the CPU would make things better in terms of supporting CPUs with incompatible ISAs... whether that ever overtakes native binaries is a question (it did to an extent, of course, with the JVM and what-not.) Of course you could move the JIT off the client device completely and do it at the App Store - prepare a bunch of binaries for all the devices out there - but still, the developers will have shipped IR.

Once you get to shipping IR instead of strings, which is nice I guess in that it should make initialization somewhat faster and the GPU drivers somewhat smaller, I'm not sure what you're going to get from being able to treat a CPU/GPU program as a single whole. Typically the stuff running on the GPU or any other sort of accelerator does not call back into the CPU - these are "leaf functions", optimized as a separate program, pretty much. I guess it'd be nice to be able to optimize a given call to such a function automatically ("here the CPU passes a constant so let's constant-fold all this stuff over here.") The same effect can be achieved today by creating GPU wrapper code that passes the constants and having the CPU call that; avoiding this doesn't sound like a huge deal. Other than that, what big improvement opportunities am I missing due to the CPU compiler not caring what the GPU compiler is doing and what code the GPU is running?

(Not a rhetorical question, I expect to be missing something; I work on accelerator design but I haven't ever thought very deeply about an integrated CPU+accelerator compiler, except for a high-level language producing code for both the CPU and some accelerators - but this is a different situation, and there you don't care what say OpenGL or OpenCL do, you generate both and you're the compiler and you use whatever opportunities for program analysis that your higher level language gives you. Here I think the point was that we miss opportunities due to not having a single compiler analyzing our C/C++ CPU code and our shader/OpenCL/... code as a single whole - it's this area where I don't see a lot of missed opportunities and am asking where I'm wrong.)

In CUDA you can write your device kernel as a templated function and the compiler will specialise it based on how it's called by the host. In some cases this can result in big speed-ups while keeping the code simple and maintainable. You could get the same speed from specialising the code by hand, or with a separate code generation tool; but having that ability built in to the compiler makes it very easy to use.

It's also handy for the compiler to be able to check for errors in shader invocations. Looking up parameters by name and setting them dynamically adds a whole class of potential runtime errors that don't exist in straight C++ programs.
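
To make that class of errors concrete, here is a minimal sketch in Python. `FakeProgram` is a hypothetical stand-in for a linked shader program, not a real binding, and the uniform names are made up. It only mirrors two documented OpenGL behaviours: `glGetUniformLocation` returns -1 for a name that is not an active uniform, and `glUniform*` calls on location -1 are silently ignored.

```python
class FakeProgram:
    """Hypothetical stand-in for a linked shader program (no real GL calls)."""

    def __init__(self, uniform_names):
        # Assign locations 0, 1, 2, ... to the declared uniforms.
        self._locations = {name: i for i, name in enumerate(uniform_names)}
        self.values = {}

    def get_uniform_location(self, name):
        # Mirrors glGetUniformLocation: unknown names yield -1, not an error.
        return self._locations.get(name, -1)

    def set_uniform(self, location, value):
        # Mirrors glUniform*: writes to location -1 are silently dropped.
        if location == -1:
            return
        self.values[location] = value


program = FakeProgram(["uModel", "uLightPos"])

good = program.get_uniform_location("uLightPos")
typo = program.get_uniform_location("uLightPoss")  # misspelled on purpose

program.set_uniform(good, (1.0, 2.0, 3.0))
program.set_uniform(typo, (9.0, 9.0, 9.0))  # no exception, no effect

print(good, typo)      # 1 -1
print(program.values)  # {1: (1.0, 2.0, 3.0)}
```

Nothing in this flow raises on the misspelled name; the second write just never lands, which is the kind of mistake a statically checked binding could reject up front.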

func<a,b>(c,d) isn't faster than func(a,b,c,d) if the compiler sees that a,b are constants and the call to func is in the same file as its definition; the only thing CPU+GPU code adds here relative to CPU-only code is, perhaps you need to write a wrapper along the lines of func_ab(c,d) { func(a,b,c,d) }, a bit ugly but the code is still pretty simple and maintainable. (And unlike the case with templates at least you can make sense of the symbol table etc. etc.)

As to runtime errors because of misspelling a parameter name, this is going to be fixed the first time you run the program and you live happily ever after; I don't mind these errors in Python very much and I don't mind it in C programs using strings to refer to variable names in whatever context that may happen.

Overall the amount of "dynamism" in CPU/GPU programming IMO does not result in wasting almost any optimization opportunities nor does it make the program significantly harder to get right, but I'm prepared to change my mind given counterexamples (certainly in Python it's really easy to demonstrate missed optimization opportunities relative to C due to its dynamism... The amount of dynamism in CPU/GPU programming however is IMO trivial and hence the issues resulting from it are also rather trivial.)

I also don't see how you can possibly optimize GPU code based on some static CPU code but, on some game consoles, you can share data structures between GPU and CPU, which is nice. E.g. instead of binding a bunch of named parameters you just pass a pointer to a native C struct. This, apart from greatly simplifying the code, is a performance advantage as well (the parameter binding is not free and consumes both CPU and GPU time). Though, I don't see how this could be implemented in a platform-independent way.
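
For illustration only, the difference between the two styles can be sketched with Python's `ctypes`. The `LightParams` struct, its fields, and its layout are assumptions made up for this example; no console SDK or graphics API is involved, and the "upload" is just a byte copy.

```python
import ctypes

Vec3 = ctypes.c_float * 3


class LightParams(ctypes.Structure):
    # Illustrative layout only; a real one must match what the GPU side
    # expects, including its alignment rules.
    _fields_ = [
        ("position", Vec3),
        ("intensity", ctypes.c_float),
        ("color", Vec3),
        ("enabled", ctypes.c_uint32),
    ]


params = LightParams(Vec3(1.0, 2.0, 3.0), 0.5, Vec3(1.0, 1.0, 1.0), 1)

# Style 1: one named binding per parameter (four lookups, four calls).
named_calls = [(name, getattr(params, name)) for name, _ in LightParams._fields_]

# Style 2: hand over the whole struct as one block of bytes.
blob = bytes(params)

print(len(named_calls), "named bindings vs", len(blob), "bytes in one upload")

# The receiving side can reinterpret the same bytes with the same layout.
roundtrip = LightParams.from_buffer_copy(blob)
print(list(roundtrip.position), roundtrip.intensity)
```

The struct version only works when both sides agree on the exact byte layout, which is presumably where the platform-independence concern comes from.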

It seems unlikely that any of this will change, because control is in the hands of relatively few powerful organizations who honestly have no interest in making life easier for software engineers working on graphics programming; and those engineers who do have to put up with it have long since given up or chalk it up to experience; and those developers would probably resist any changes now simply because change requires more work and learning how to do things all over again.

The hardware developers (AMD/nVidia/Intel) have an interest in not changing their device drivers, the software vendors (OpenGL and DirectX) have little interest in redeveloping technology, and the software developers with the most capital, game engine developers, have already found workarounds and hire enough engineers to plug leaks in their lifeboats. The state of tools for game developers is so shoddy as well that trying to retrofit language compilers and shader compilers to work together seems like a drawn out task.

It's sort of a David and Goliath situation if you think you can change the graphics programming landscape on your own. Plus, we all know how poorly these standards are developed over time.

> who honestly have no interest in making life easier

Hrm...I've thought about this a lot, and I don't think it's so much a lack of interest as a lack of time. Or at least...not being able to demonstrate to their employers, in monetary terms, that it would be worth the engineers' time to generate educational resources and better tools.

I do think AMD is trying to get better in this area with their GPUOpen stuff, though of course, you never know what might happen if AMD suddenly becomes top dog.

I don't think it's that bad now. We are somewhat limited by SPIR-V, but we now have the capabilities to write our own graphics APIs and shader languages that can compile to Vulkan/OpenCL and SPIR-V and go on from there. It is nothing close to as bad as the goliath of OpenGL hegemony.

I might not be as qualified to talk about this since I am not actually working on the leading edge, but I know from past experience that development hasn't been pleasant.

With Apple releasing Metal, I think the vendors do have an interest in making developers' lives easier if it helps to draw them into their platform. I'm just glad OpenGL is still supported at this point.

Would Halide[0] be what he's looking for? It's a DSL, sure, but one that is embedded in C++, so you can at least program your whole pipeline in place, giving you a good grasp of the flow of the program(s).

[0] http://halide-lang.org/

I believe Halide is limited to 2-dimensional (Image Processing) use cases? Either way, it's designed to make it easier to find optimal parallel implementations rather than write code that can fluidly execute across a CPU and GPU.

Well the thing is that it's targeting "x86/SSE, ARM v7/NEON, CUDA, Native Client, and OpenCL."

Anyway, as far as I know it is limited to 2D raster images so far, yes. Since it's part research language with some very interesting features I do hope that it gets extended to more general cases, and at least make it easier to work with 3D use cases (image volumes and such) too.

Very well written. I get the sense that the reason we're all here now is that historically the people who use these APIs as client programmers have been somewhat bad at explaining what they want.

Rather: it's because the companies building the hardware have jealously guarded their machine languages, treating their drivers as a source of competitive advantage. We'd have fixed the problems ourselves years ago if we could, but GPU manufacturers like to keep it all locked up tight and secret, so we can't.

I did not find anything interesting or useful in this article.

The author just says that OpenGL and "shaders as strings" are bad, without explaining why. Then the author speaks about some mysterious "Heterogeneity" without specifying what it means.

"We need programming models that let us write one program that spans multiple execution contexts" - what does that mean? What should that model look like, and how should it be different? It is like saying "cars with 4 wheels are bad, cars with a different number of wheels would be better".

I thought the problem is pretty much spelled out in that post:

> [...] doesn’t solve the fundamental problem: the interface between the CPU and GPU code is needlessly dynamic, so you can’t reason statically about the whole, heterogeneous program.

It's the same thing as raw SQL vs abstraction (could be ORM, but not necessarily). Not only are your assumptions not verified (is there table X, will I receive the columns I expect, etc.), you cannot tell if the constructs are even valid until runtime (is "SELECT %s from %s" valid?). Abstractions let you do things like `table.where(abc=123)` which is guaranteed to be a valid construct and can be verified against schema at compile time. It can still fail at runtime, but has far fewer failure modes and is clearer when reading.
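
A rough sketch of that contrast in Python follows. The `Table` class is a toy written for this example and is not modelled on any real ORM; the schema and column names are assumptions. Python has no compile step, so the check here happens when the query is built, before anything reaches a database, rather than at compile time.

```python
class Table:
    """Tiny illustrative query builder; not modelled on any real ORM."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = dict(columns)  # column name -> expected Python type

    def where(self, **conditions):
        # Reject unknown columns and wrongly typed values up front.
        for column, value in conditions.items():
            if column not in self.columns:
                raise KeyError(f"{self.name} has no column {column!r}")
            expected = self.columns[column]
            if not isinstance(value, expected):
                raise TypeError(
                    f"{self.name}.{column} expects {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        clause = " AND ".join(f"{column} = ?" for column in conditions)
        sql = f"SELECT * FROM {self.name} WHERE {clause}"
        return sql, tuple(conditions.values())


employees = Table("employees", {"name": str, "salary": int})

# Raw string: nothing checks the pieces until the database sees the result.
raw = "SELECT %s FROM %s" % ("salry", "employes")

# Checked construct: values stay out of the SQL text, names are validated.
sql, args = employees.where(salary=50000)
print(sql, args)  # SELECT * FROM employees WHERE salary = ? (50000,)

try:
    employees.where(salry=50000)  # misspelled column
except KeyError as err:
    print("rejected:", err)
```

The raw string with two typos is built without complaint, while the misspelled keyword argument is rejected before any SQL exists.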

10 years ago I wrote a thing called PG'OCaml which integrates OCaml and SQL in a type safe way. You write code like:

    PGSQL(dbh) "insert into employees (name, salary, email)
                values ($name, $salary, $?email)"
and it (at compile time) creates the correct PREPARE+EXECUTE pair and checks types across the languages. (It's not obvious but it is also free of SQL injections.)

For example if your OCaml 'salary' variable was a string but the column is an int then the above program wouldn't compile. If you change the type of the db column after compiling the program then you get a runtime error instead.

Still around although now maintained by a group of authors: http://pgocaml.forge.ocamlcore.org/ (git source: https://github.com/darioteixeira/pgocaml)

It's only possible because OCaml has real macros. It wouldn't be easy to do this in C/C++ (I guess? - maybe with an LLVM-based pre-processing parser or something)

This used to be how a lot of SQL code was written. I think it's about as old as SQL itself. See Oracle's Pro*C for instance. It's a C precompiler that generates API calls and optionally does type checking. There's even an ANSI/ISO standard for embedded SQL.

Yep. I used to use the Sybase SQL precompiler in OS/2 with Watcom C++. Nullable columns were a pain because you had to define a separate boolean variable to hold the null status for each one.

Just last night I was thinking about what it would look like to have a language that would type check your SQL strings. Thanks for sharing PG'OCaml!

totally right about the parallel with SQL.

This paper is from a few years ago about verifying programs that span multiple languages / runtimes:


We need to get these concepts into mainstream tools. SQL certainly seems to be the low-hanging fruit (simple, straightforward typing, widely used and understood, frequent errors lead to security problems).

I think the author of the article may be failing to understand why the dynamism is there. It's, unfortunately, not needless.

The shader API is an API where the developer provides source code at runtime and then modifies inputs to the program compiled from the source code that runs on the user's GPU. It's a source-code level interface because it makes nearly zero assumptions about what capabilities the underlying hardware will have; interpretation and compilation of the source is left largely up to the vendor (literally, in the sense that the compiler is part of the GPU's driver kit and not part of the OpenGL library---an issue that has caused me HOURS of headache when encountering a bug in a closed-source graphics card's compiler, let me tell ya ;) ).

I strongly suspect a more compile-time-checkable API would lack the flexibility needed to capture all shader behavior---both when the API was codified and in the future. It's a pain in the ass to use, but I'm not convinced it's avoidable pain.

Yes, that is the problem: the compiler is embedded in the GPU driver. We can't do better because we can't build our own compilers.

Communication between the part of the program that runs on the CPU and the part that runs on the GPU is typeless and requires manually marshalling arguments. Javascript programmers will shrug at this, but for people used to strong type checking and better automarshalling of RPC it's very disappointing.

As a fun exercise, try sketching out what a stronger-typed API for shaders and shader parameters would look like.

It's a useful exercise in understanding why such a thing wasn't done.
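
One possible starting point for that exercise, as a hedged Python sketch: describe the host-side parameter block as a dataclass and compare it with the uniforms a shader declares. Everything here is hypothetical (`PhongParams`, `check_params`, the `GLSL_TO_PY` mapping, the uniform names), and the declared-uniform dictionaries are hand-written stand-ins for what would normally come from reflection on a linked shader.

```python
from dataclasses import dataclass, fields

# Hypothetical mapping from GLSL type names to host-side Python types.
GLSL_TO_PY = {"float": float, "int": int, "vec3": tuple}


@dataclass
class PhongParams:
    uShininess: float
    uLightCount: int
    uLightPos: tuple


def check_params(params_type, declared_uniforms):
    """Compare a host-side parameter block with a shader's declarations.

    declared_uniforms maps uniform name -> GLSL type name.
    Returns a list of readable mismatches (empty when both sides agree).
    """
    problems = []
    host = {f.name: f.type for f in fields(params_type)}
    for name, glsl_type in declared_uniforms.items():
        if name not in host:
            problems.append(f"shader declares {name}, host block lacks it")
        elif host[name] is not GLSL_TO_PY[glsl_type]:
            problems.append(
                f"{name}: shader says {glsl_type}, host says {host[name].__name__}"
            )
    for name in host:
        if name not in declared_uniforms:
            problems.append(f"host block has {name}, shader does not declare it")
    return problems


shader_v1 = {"uShininess": "float", "uLightCount": "int", "uLightPos": "vec3"}
shader_v2 = {"uShininess": "float", "uLightCount": "float", "uLightPosition": "vec3"}

print(check_params(PhongParams, shader_v1))  # []
for problem in check_params(PhongParams, shader_v2):
    print(problem)
```

Even in this toy form the check can only run once the shader's declarations are known, so how early it happens depends on when the host toolchain gets to see the shader.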

DreemGL [1] [2] addresses these problems by compiling JavaScript code into shaders.

"DreemGL is an open-source multi-screen prototyping framework for mediated environments, with a visual editor and shader styling for webGL and DALi runtimes written in JavaScript." [3]

[1] https://github.com/dreemproject/dreemgl

[2] http://docs.dreemproject.org/docs/api/index.html#!/guide/dre...

[3] https://dreemproject.org/

Here's a talk about the nuts and bolts of how DreemGL compiles JavaScript into shaders:


One interesting library for building shaders is: http://acko.net/blog/shadergraph-2/ or https://github.com/unconed/shadergraph

Made by the guy behind MathBox.

I think the author has good intentions (don't we all), but I don't think he understands enough about graphics programming to make some of the proclamations/requests in the article. Not that I blame him...it's hard to get an understanding of GPU programming outside the scope of a game dev or IHV.

> To define an object’s appearance in a 3D scene, real-time graphics applications use shaders...

Eh, the shaders are just a part of the GPU pipeline that transforms your vertices, textures, and shaders into something interesting on the screen.

This is already oversimplifying what GPUs are trying to do for the base case of graphics.

> the interface between the CPU and GPU code is needlessly dynamic, so you can’t reason statically about the whole, heterogeneous program.

Ok, so what is the proposed solution here? You have a variety of IHVs (NV, AMD, Intel, ImgTec, ARM, Samsung, Qualcomm, etc). Each vendor has a set of active architectures that each have their own ISA. And even then, there are sub-archs that likely require different accommodations in ISA generation depending on the sub-rev.

So in the author's view of just the shader code, you already have the large problem of unifying the varieties of ISA under some...homogeneous ISA, like an x86. That's a non-trivial problem. What's the motivation here? How will you get vendors to comply?

I think right now, SPIR-V, OpenCL, and CUDA aren't doing a _bad_ job in trying to create a common programming model where you can target multiple hardware revs with some intermediate representation, but until all the vendors team up and agree on an ISA, I don't see how to fix this.

On top of that, that isn't even really the only important bit of programming that happens on GPUs. GPUs primarily operate on command buffers, of which there is nary a mention in the article. So even if we address the shader cores inside the GPU, what about a common model for programming command buffers directly? Good luck getting vendors to unify on that. Vulkan/DX12/Metal are good (even great) efforts in exposing the command buffer model. You couldn't even _see_ this stuff in OpenGL and pre-DX12 (though there were display lists and deferred contexts, which kinda exposed the command buffer programming model).

> To use those parameters, the host program’s first step is to look up location handles for each variable...

Ok, I don't blame the author for complaining about this model, but this is an introductory complaint. You can bind shader inputs to 'registers', which map to API slots. So with some planning, you don't need to query location handles if you specify them in the shader in advance. I think this functionality existed in Shader Model 1.0, though I can't find any old example code for it (2001?).
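
The `register` binding described above is the Direct3D/HLSL model; I'm assuming the nearest GLSL counterpart is an explicit `layout(location = N)` qualifier. The sketch below is Python with a GLSL string embedded in it. The shader, the variable names, and the `declared_locations` helper are all made up for illustration, and no GL context is created: the helper just parses the source to show that the host constants and the shader agree.

```python
import re

# GLSL with explicit locations. Attribute locations need GLSL 3.30+;
# explicit uniform locations need GLSL 4.30+ (or ARB_explicit_uniform_location).
VERTEX_SHADER = """
#version 430 core
layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 0) uniform mat4 uModel;
layout(location = 1) uniform mat4 uProjection;
void main() {
    gl_Position = uProjection * uModel * vec4(aPosition, 1.0);
}
"""

# Host-side constants agreed on in advance; no glGetUniformLocation needed.
ATTR_POSITION, ATTR_NORMAL = 0, 1
UNIFORM_MODEL, UNIFORM_PROJECTION = 0, 1

DECL = re.compile(r"layout\(location\s*=\s*(\d+)\)\s+(in|uniform)\s+\w+\s+(\w+);")


def declared_locations(source):
    """Return {(qualifier, name): location} for every explicit declaration."""
    return {(kind, name): int(loc) for loc, kind, name in DECL.findall(source)}


locations = declared_locations(VERTEX_SHADER)

# Sanity check that the planned slots match what the shader declares.
assert locations[("in", "aPosition")] == ATTR_POSITION
assert locations[("uniform", "uProjection")] == UNIFORM_PROJECTION
print(locations)
```

With slots fixed like this, the host can call `glUniformMatrix4fv(UNIFORM_MODEL, ...)` directly, at the cost of keeping the two sets of numbers in sync by hand.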

That being said, I certainly don't blame the author for not knowing this, as I think this is a common mistake made by introductory graphics programmers, because the educational resources are poor. I don't think I ever learned it in school...only in the industry was this exposed to me, to my great joy. Though I am certain many smarter engineers figured it out unprompted.

> OpenGL’s programming model espouses the simplistic view that heterogeneous software should comprise multiple, loosely coupled, independent programs.

Eh, I don't think I want a common programming model across CPUs and GPUs. They are fundamentally different machines, and I don't think it makes sense to try to lump them together. I don't think we just assume that we can use the same programming methodologies for a single vs multi-threaded program. I know that plenty tried, but I thought the best method of addressing the differences was education and tools. I'd advocate that the most effective way for GPU programming to become more accessible will be education and tools. I have hope that the current architecture of the modern 'explicit' API will facilitate that movement.

Correct. What the author seems to be missing is that a different approach was already tried in the "fixed-function pipeline" era. Two issues were encountered:

1) Fixed-function utterly failed to capture the explosion of features as graphics card capability took off, having to fall back on nasty APIs like the OpenGL extensions system (where at a certain level of sophistication, more of your code is calling Glext functions than core OpenGL functions---when everything is an extension, nothing is)

2) Compile-time-checkable behavior was accomplished via function calls, but it turned out that making a lot of function calls created significantly non-trivial runtime overhead, and performance is king in graphics. It's hard to get good static checking on a model of "Rip a bunch of bits into a binary buffer and then ship that whole buffer to the GPU," but that model is a lot faster than many-function-call-based alternatives.
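
The trade-off in point 2 can be sketched with Python's `struct` module. The vertex layout is an assumption chosen for the example, and no GPU call is made; the point is only that the data ends up as one untyped blob.

```python
import struct

# Three vertices: position (x, y, z) followed by colour (r, g, b).
vertices = [
    (0.0, 0.5, 0.0, 1.0, 0.0, 0.0),
    (-0.5, -0.5, 0.0, 0.0, 1.0, 0.0),
    (0.5, -0.5, 0.0, 0.0, 0.0, 1.0),
]

VERTEX = struct.Struct("<6f")  # six little-endian 32-bit floats, 24 bytes

# One contiguous blob, ready for a single upload call such as glBufferData.
buffer = b"".join(VERTEX.pack(*v) for v in vertices)
print(len(buffer))  # 72

# The receiver only sees bytes. Reading them with the wrong layout is not
# an error; it just yields different numbers.
as_floats = struct.unpack_from("<6f", buffer, 0)
as_ints = struct.unpack_from("<6i", buffer, 0)
print(as_floats)
print(as_ints)
```

One upload replaces a call per vertex attribute, but nothing in the bytes records what they mean, which is why static checking is hard to bolt on afterwards.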

"Eh, I don't think I want a common programming model across CPUs and GPUs. They are fundamentally different machines, and I don't think it makes sense to try to lump them together."

Well, except for APUs, which I think is an unstated target for this discussion. Or on-board graphics chips.

In these cases, there is almost certainly extra work being done, regardless of your programming philosophy.

I'd actually argue that not that much changes, except APUs mean you don't have to worry about different memory pools, and moving stuff across the bus. That's a whole other problem, as far as streaming data management. If we're just talking about GPU programming, in my experience, even APUs still feel like two separate processors instead of some...combined thing.

Coding for a GPU is basically embedded programming - and the JIT step is needed to avoid platform adaptation costs.

Slightly OT, but the literate presentation was nice. Any particular tool or technique you used for that?


From the Makefile [0] it looks like they are using Docco [1].

I agree that it's a very nice presentation of docs+code.

[0] - https://github.com/sampsyo/tinygl/blob/master/Makefile#L25

[1] - https://jashkenas.github.io/docco/

Thanks both for the reply. I'll check docco out.

I think this is what OpenACC (https://en.wikipedia.org/wiki/OpenACC) was created to do; its support is pretty iffy right now, though.

OpenACC is to GPU programming what OpenMP is to parallel programming. It is intended for general-purpose GPU programming, i.e. numerical computations on your GPU, not for replacing anything like OpenGL.

Can somebody with Direct3D experience give an opinion on whether it's any better?

Shaders can be built in advance. The raw interface remains dynamic/stringly accessed. It's mostly the same.

Source code or IR or whatever else would always be needed, I'm afraid, with a late (and, likely, unpredictable) compilation. Passing an image of a different format may trigger a dynamic shader (or OpenCL kernel) recompilation on some GPUs, for example.
