Render Pipelines design in C++ (marti.works)
72 points by llorens-marti on Mar 20, 2017 | 19 comments

This is a somewhat simplified version of what you will see in a AAA graphics engine (pretty much all of them have abstractions around draw passes/stages and pipelines). The weakness in the approach here, I think, is that it won't handle temporally aliased things particularly well. I've also found that render pipelines are really more of a pipe "tree", with dependent objects becoming inputs for more than one pass. Then you throw multithreading into the mix and start to understand why a simplistic API like this gets complicated really fast in real-world settings.

There is nothing new in the abstraction of passes and pipelines. We also need to think about the practical complications.

I would be curious to see what anyone with more than my armchair experience in graphics thinks of this.

This post is an example of armchair programming, so your armchair experience is more than enough.

On a more serious note, it's software architecture done on paper. Any experienced programmer will tell you it's close to worthless. You don't have to be a graphics expert. Abstraction of passes and pipelines isn't anything new, it even bled into the modern APIs like Vulkan. This post is just some inexperienced programmer's vague description of how he or she would like things to work in theory. It's not worth a discussion.

I think the pipelines in this post mean something different. Vulkan/Metal pipelines refer to the GPU rendering pipeline with vertex and fragment shaders, tessellation, etc., not a sequence of passes.

Variations of this design are quite common in realtime rendering engines after people realised that putting everything into a single giant scene graph isn't a good idea (so at least since the very early 2000's).

For instance, in our engines (Nebula2 and 3) there are so-called "frameshader" files (XML, nowadays it would be JSON of course) which describe a frame as a sequence of passes. The result of a pass is a valid render target texture or the final visible image, and a pass consists of per-material batches (or buckets).

Higher up are 'stages, views, and entities': a stage is a collection of graphics entities, a view is a 'view into a stage' (a view owns a camera and a frameshader), and entities can be models, lights and cameras. Multiple views can be attached to the same stage, and views can depend on each other (although in the >10 years we have used this concept, dependent views were hardly needed).
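To make the frame/pass/batch and stage/view/entity layering above concrete, here is a minimal sketch. All type and field names are hypothetical and chosen for illustration only; they are not Nebula's actual classes or file format.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration; not Nebula's real API.
struct Batch {
    std::string material;          // all draws in a batch share this material
    std::vector<int> entityIds;    // entities drawn with that material
};

struct Pass {
    std::string name;
    std::string renderTarget;      // assumption: empty string means the final visible image
    std::vector<Batch> batches;    // per-material batches (buckets)
};

struct FrameShader {
    std::vector<Pass> passes;      // a frame is a sequence of passes
};

struct Stage {
    std::vector<int> entityIds;    // models, lights and cameras
};

struct View {
    const Stage* stage;            // a view looks into one stage
    int cameraEntityId;            // the camera owned by this view
    FrameShader frameShader;       // how this view turns the stage into an image
};

// Counts how many batches a view would submit for one frame.
std::size_t countBatches(const View& view) {
    std::size_t total = 0;
    for (const Pass& pass : view.frameShader.passes) {
        total += pass.batches.size();
    }
    return total;
}
```

Because a View only points at its Stage, several views (say, a main view and a mirror view) can be attached to the same stage without duplicating the entities.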

> people realised that putting everything into a single giant scene graph isn't a good idea (so at least since the very early 2000's).


A high-level, intuitive (if simplistic) answer would be that you want to batch by type, and a monolithic scene graph does not arrange things by type. Nor is it in general a valuable abstraction for rendering: graphics data is ultimately highly non-hierarchical, and a scene graph is undermined by the constant need for sophisticated traversal logic.
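As a rough illustration of "batching by type": instead of walking a hierarchy, you keep a flat list of draw items and sort it by a type key so that draws sharing a material end up adjacent. The `DrawItem` struct and `sortAndCountBatches` function below are made up for this sketch, and a real engine would typically sort on a richer key (shader, textures, depth, and so on).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat draw record; a real one would carry far more state.
struct DrawItem {
    std::uint32_t materialId;
    std::uint32_t meshId;
};

// Sorts draws so that equal materials are adjacent, then returns the number
// of distinct material runs (i.e. how many material switches are needed).
std::size_t sortAndCountBatches(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  if (a.materialId != b.materialId) {
                      return a.materialId < b.materialId;
                  }
                  return a.meshId < b.meshId;
              });

    std::size_t batches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].materialId != items[i - 1].materialId) {
            ++batches;
        }
    }
    return batches;
}
```

Nothing here depends on a parent/child relationship between objects, which is the point: the order that is good for rendering is unrelated to the order a scene hierarchy would give you.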

Go to Tom Forsyth's blog here: https://tomforsyth1000.github.io/blog.wiki.html and find the "Scene Graphs - just say no" post (I failed at providing a direct link, apologies).

I was hoping for the deconstruction (or reconstruction) of one of the many rendering engines already out there - there's lots to talk about! This doesn't have much meat - reads like an incomplete UML diagram. My experience is mostly porting/profiling/optimizing graphics, not writing entirely new pipelines, but I'll chime in on a couple points:

>> Passes can receive as inputs: [...] The previous Pass outputs in the form of textures in memory (Render Targets)

This feels a bit arbitrarily limited. I find rendering passes more 'graph-like' than a pure linear pipeline. For example, you might render some shadow maps - some of these might be bound to a single camera (e.g. for traditional directional cascaded shadow maps fit to the camera frustum), but other shadowmaps could easily be shared between multiple viewpoints (e.g. point light dual paraboloid shadow maps). And even your 'per-camera' shadowmaps might be reused between multiple viewports for e.g. two very similar eye cameras in VR, whereas other parts of your scene will need to be re-rendered wholesale per-eye (unless you start implementing e.g. NVIDIA's single-pass multi-viewport rendering - how does that fit into this UML diagram?)

And then of course you might have multiple passes reading from the same gbuffer(s) or the scene depth buffer, or even the result of previous frames (temporal anti-aliasing and motion blur), and other things that aren't really per-camera (dynamic light probes for reflections, etc.)

I've also 'fond' memories of bugs from the magic configuration of passes, inferring inputs and outputs from their position in a list. I'd lean strongly towards being completely explicit about inputs and outputs. It's more dumb code, but it'll make things beautifully straightforward to understand, modify, and fix.
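A minimal sketch of what "completely explicit about inputs and outputs" could look like: each pass names the resources it reads and writes, and the execution order is derived from those names rather than from list position. `PassDesc` and `schedulePasses` are hypothetical names for this example, and the scheduler is a deliberately dumb repeated scan rather than anything tuned.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical pass description: every dependency is spelled out by name.
struct PassDesc {
    std::string name;
    std::vector<std::string> inputs;   // resources this pass reads
    std::vector<std::string> outputs;  // resources this pass writes
};

// Returns pass names in an order where every pass runs after the producers
// of its inputs. Resources in externalResources (scene data, last frame's
// color buffer, ...) count as available from the start.
// Throws if some input is never produced or if the passes form a cycle.
std::vector<std::string> schedulePasses(
        const std::vector<PassDesc>& passes,
        const std::set<std::string>& externalResources) {
    std::set<std::string> ready = externalResources;
    std::vector<bool> done(passes.size(), false);
    std::vector<std::string> order;

    bool progressed = true;
    while (order.size() < passes.size() && progressed) {
        progressed = false;
        for (std::size_t i = 0; i < passes.size(); ++i) {
            if (done[i]) continue;
            bool allReady = true;
            for (const std::string& in : passes[i].inputs) {
                if (ready.count(in) == 0) { allReady = false; break; }
            }
            if (!allReady) continue;
            for (const std::string& out : passes[i].outputs) {
                ready.insert(out);
            }
            done[i] = true;
            order.push_back(passes[i].name);
            progressed = true;
        }
    }
    if (order.size() != passes.size()) {
        throw std::runtime_error(
            "a pass reads a resource nobody produces, or passes form a cycle");
    }
    return order;
}
```

With this shape, a gbuffer read by both an SSAO pass and a lighting pass is just two passes listing the same input, and a misconfigured pass fails loudly at setup time instead of silently reading whatever happened to be bound.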

>> The basic concept of boiling things down into passes with inputs and outputs that can be managed as data

This has been overkill for every single one of my hobby projects, just more cruft to wade through, a solution searching for a problem. At least one of the more graphically simple professional titles I've worked on didn't even bother with this, and I didn't miss it. That said, the professional rendering code I'm most familiar with (having ported it from D3D9-only to multiple other rendering APIs) did have something similar, and it seemed reasonable enough. If you want to go "crazy" exposing control of rendering passes directly to your artists to tweak with scripts or other UI, it's a reasonable thing to represent. It can also make adding some debug tooling easier (e.g. profiling information about different passes, adding debug views of the outputs of individual stages, etc.)

A nifty graphics study of DOOM: http://www.adriancourreges.com/blog/2016/09/09/doom-2016-gra...

Hey, it's MaulingMonkey! I spent most of my childhood on #gamedev. (Is Washu still around?)

Thanks for the countless hours debating random programming topics. #gamedev was a pretty wonderful incubator for a 14 year old. Whenever I ran into a wall that couldn't be solved with googling, someone was usually able to give some sort of tip or an arcane D3D incantation to point me in the right direction. Hope you've been well!

I grab dinner with Washu sometimes :). I hear he's been hanging out in the discord chat more often now, but still pops into #gamedev some as well. Also un-retired into game development out of boredom.

Been well, hope the same for you :).

> This way we can describe that the Camera will have a relation of 1:1 with a Render Pipeline.

If all you're making are smartphone games, this might work [1]. But for more complex scenarios, this sort of thing would be a non-starter. You ideally need to support N:1, Cameras to Pipelines. Not just for N=2 for VR, but N=3 for VR+spectator mode, and N=4+ for projection mapping to surfaces as well.

[1] I say "might" because smartphones are getting more powerful and will probably very soon (as in 2 to 3 years, i.e. about when such a new project would be usable) be decent VR platforms.
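One way to picture the N:1 relation argued for above is a single pipeline object that is handed a list of cameras per frame. In this sketch (all names hypothetical), each pass is flagged as either per-camera or shared, so something like a shadow pass runs once while the gbuffer and lighting passes run once per camera. The function only returns labels of the work it would submit; no actual rendering API is involved.

```cpp
#include <string>
#include <vector>

// Hypothetical camera and pass records for illustration.
struct Camera {
    std::string name;
};

struct PassInfo {
    std::string name;
    bool perCamera;   // false: result is shared by all cameras
};

struct RenderPipeline {
    std::vector<PassInfo> passes;

    // Returns "camera:pass" labels in submission order.
    // Shared passes are labelled with the pseudo-camera "shared".
    std::vector<std::string> render(const std::vector<Camera>& cameras) const {
        std::vector<std::string> submitted;
        for (const PassInfo& pass : passes) {
            if (!pass.perCamera) {
                submitted.push_back("shared:" + pass.name);
                continue;
            }
            for (const Camera& cam : cameras) {
                submitted.push_back(cam.name + ":" + pass.name);
            }
        }
        return submitted;
    }
};
```

For VR plus a spectator view you would pass three cameras to the same pipeline; with a strict 1:1 design the shared pass would have to be duplicated or smuggled between pipelines.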

Doesn't this approach incur a lot of cache misses?

Why? All these data structures are mostly just handles to the actual data, which lives in GPU memory.

The concept in this blogpost anticipates a more general architecture. Graphics pipelines are a good fit for dataflow programming, such as FBP [0] or reactive [1]. When you use those styles, the pipeline architecture is more flexible and more straightforward to run concurrently, while the coding challenge turns into one of how to allocate and address different assets and types of assets, create and address temporary data, and describe the node graph connecting these things together.

The practical complication in rendering can be stated like this: combine assets A and B and parameter P to make asset C, then (subsequently) combine asset C with asset D and parameter J to make output O. Some of the assets/parameters change with each frame rendered, others are entirely static, and oversight over GPU time and memory usage is needed, so you have to consider which things are loaded when. Then, while in production, a design change necessitates that a new parameter be added somewhere, and it turns out that you have to reorder which things are processed first and add a custom codepath because one of your target GPUs doesn't support the feature you need without a hack.

[0] https://en.wikipedia.org/wiki/Flow-based_programming [1] https://en.wikipedia.org/wiki/Reactive_programming
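The "A and B and P make C, then C and D and J make O" example can be sketched as a tiny pull-based dataflow graph. This is only an illustration under simplifying assumptions: assets and parameters are plain ints, the class name `DataflowGraph` and its methods are invented here, and a version counter stands in for real change tracking. A derived node is recomputed only when one of its inputs is newer than its last result, so static inputs cost nothing after the first frame.

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

class DataflowGraph {
public:
    using Fn = std::function<int(const std::vector<int>&)>;

    // Source node: an asset or parameter set from outside.
    void setSource(const std::string& name, int value) {
        Node& n = nodes_[name];
        n.value = value;
        n.version = ++clock_;          // mark as changed "now"
    }

    // Derived node: computed from other nodes (needs at least one input).
    void addDerived(const std::string& name,
                    std::vector<std::string> inputs, Fn fn) {
        Node& n = nodes_[name];
        n.inputs = std::move(inputs);
        n.compute = std::move(fn);
        n.version = 0;                 // never computed yet
    }

    int get(const std::string& name) { return evaluate(name).value; }
    int recomputeCount() const { return recomputes_; }

private:
    struct Node {
        int value = 0;
        unsigned long version = 0;     // clock tick of the newest data it reflects
        std::vector<std::string> inputs;
        Fn compute;                    // empty for source nodes
    };

    Node& evaluate(const std::string& name) {
        Node& n = nodes_.at(name);     // throws std::out_of_range if unknown
        if (!n.compute) return n;      // sources are always up to date

        std::vector<int> args;
        unsigned long newest = 0;
        for (const std::string& in : n.inputs) {
            Node& dep = evaluate(in);
            args.push_back(dep.value);
            newest = std::max(newest, dep.version);
        }
        if (newest > n.version) {      // some input changed since last compute
            n.value = n.compute(args);
            n.version = newest;
            ++recomputes_;
        }
        return n;
    }

    std::map<std::string, Node> nodes_;
    unsigned long clock_ = 0;
    int recomputes_ = 0;
};
```

Changing only J recomputes O but leaves C alone; changing A recomputes both. The practical complications described above (GPU memory residency, reordering, per-GPU hacks) are exactly the parts this toy leaves out.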

Why not simply use a software cache (memoization, [1]) to prevent doing things more often than necessary?

[1] https://en.wikipedia.org/wiki/Memoization

All kinds of caches are used in graphics programming. The devil is in the details: Does a generic memoization strategy help you decide when to re-render a shadowmap based on the updating of entities within range of said shadowmap? And when you decide that technically incorrect, stale results are still 'good enough' (based on the duration they've been stale, distance from the camera, size of the resulting shadows, etc.), how do you represent that? How do you decide when shadowmaps can be reused between multiple perspectives? How much can you calculate offline, before the game is even run? If you squint really hard you might be able to couch some of this in memoization terms - but how, exactly, is that helping?
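To show why this does not reduce to plain memoization, here is the kind of hand-written policy the questions above point at. Everything in it is an assumption made up for illustration: the struct fields, the distance bands, and the frame thresholds are not taken from any real engine. The point is only that the "cache key" includes fuzzy, tunable notions like how stale is acceptable at what distance.

```cpp
// Hypothetical per-shadowmap bookkeeping.
struct ShadowMapState {
    int framesSinceRender;     // how long the cached map has been reused
    bool casterMovedInRange;   // did any shadow caster in range change?
    float distanceToCamera;    // distance from the light's area to the viewer
};

// Decides whether to re-render or keep reusing a stale shadow map.
// Thresholds are illustrative guesses, not recommended values.
bool shouldRerender(const ShadowMapState& s) {
    if (!s.casterMovedInRange) {
        return false;          // nothing changed: the cached map is still exact
    }
    int maxStaleFrames = 16;   // far away: tolerate long staleness
    if (s.distanceToCamera < 20.0f) {
        maxStaleFrames = 0;    // close up: stale shadows are visible, update now
    } else if (s.distanceToCamera < 100.0f) {
        maxStaleFrames = 4;    // mid range: update every few frames
    }
    return s.framesSinceRender > maxStaleFrames;
}
```

A generic memoizer would either treat any caster movement as a cache miss (too expensive) or not know about it at all (wrong); the useful behaviour lives in the tuning.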

Memory is tight - GPU memory might be a few gigs, and your artists will fill the space. Time is tight - 16ms/frame max if you're targeting a smooth 60fps, and even higher framerates (and tighter time budgets) are recommended to e.g. reduce nausea for VR. You know exactly how long you need some resources, they are large (over 100MB for a single set of 4k gbuffers), and spilling active data out of the cache can be disastrously slow, or lose necessary information you cannot recreate.
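The numbers above are easy to sanity-check. The sketch below assumes a gbuffer layout of four 3840x2160 render targets at 4 bytes per pixel (for example RGBA8 or a 32-bit depth format); that layout is an assumption for the arithmetic, and real engines vary widely, often using fatter formats.

```cpp
#include <cstdint>

// Time available per frame at a given frame rate, in milliseconds.
constexpr double frameBudgetMs(double fps) {
    return 1000.0 / fps;
}

// Size of one uncompressed render target in bytes.
constexpr std::uint64_t renderTargetBytes(std::uint64_t width,
                                          std::uint64_t height,
                                          std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Assumed layout: 4 targets, 3840x2160, 4 bytes per pixel each.
constexpr std::uint64_t kGbufferTargets = 4;
constexpr std::uint64_t kGbufferBytes4k =
    kGbufferTargets * renderTargetBytes(3840, 2160, 4);

// 3840 * 2160 = 8,294,400 pixels; * 4 bytes * 4 targets = 132,710,400 bytes.
static_assert(kGbufferBytes4k == 132710400ULL,
              "about 133 MB for the assumed 4k gbuffer layout");
```

So even this fairly lean layout lands above 100 MB, and 60 fps leaves roughly 16.7 ms per frame, while a 90 fps VR target leaves about 11.1 ms.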

At some point you'll get rather hands on and start micromanaging a lot of this stuff - it's much simpler to write, debug, and optimize a lot of this if you implement it 'by hand', than to design, write, debug, and optimize an algorithm trying to handle a perfectly generic answer/solution to all these problems.
