The same was true of early games. Gamers loved the bright colors of World of Warcraft, for instance, because displays were bad and it was easier to watch bright colors for hours on end while gaming. Even today my modern TV looks pretty bad in sunlight. As much as I love the muted colors of Breath of the Wild, I have to admit that without closing my curtains it's really hard to tell what's going on if a sunbeam is hitting the TV.
I think as gamers' screens get better we will start to see a transition away from exaggerated brightness and a trend toward more realism, just like Instagram has largely transitioned from a platform that overcompensated for bad smartphone screens to a #nofilter style.
> [W]hy does the LDR photo of your TV look so much better than the LDR tone map image? There’s another tone map in this chain which nobody thought to examine: Nikon’s.
The author isn't merely saying that games are too colorful. He's saying the colors are wrong from both an artistic perspective and from a utilitarian perspective. He's saying this is because the people making decisions about how color should be rendered are either ignorant of how to make the decision (copy-paste code) or make the decision without real awareness of how it will affect either of those properties.
It's physically painful to remember it.
I got a Compaq Portable (8088 clone) for a dollar years ago. It weighs a "tiny" ~35 lbs ("portable"!). I got it to boot up after some problems with the floppy drives. It was so cool to see little BASIC programs displaying silly, fancy, tech-demo graphics. But the whine... oh, the whine... I'm half tempted to replace the built-in screen with a small LCD and just disconnect the blue and red wires to keep it solid green.
Sidenote: I still miss the warm glow of an amber screen though. There's something magical about it.
I worked for an animation studio around that time. They stuck to their CRTs long after they were discontinued because they couldn't certify anything else--I think I remember hearing they paid something like $150 per monitor when they were still manufactured. The closest LCD was 10x as expensive.
They eventually switched to DreamColors when they were released in 2008.
Some app authors and content producers put in the effort to understand and specifically target the iPhone display, e.g. by gamut mapping their images beforehand. But I would guess that “some” to be significantly less than 1%. (Though I suppose you might consider the exaggerated-color Instagram filters we’re discussing here to be one sort of gamut mapping, targeting the relatively muted display.)
At some point Apple started making all of their devices very precisely hit the sRGB spec, so even though iOS still did no active color management, that made things a lot better, at the expense of making it impossible to accurately target colors at all iPhones for a while because different models had different gamuts. If you go look up the historical tech specs and display teardowns for different iPhone generations, I’m sure you can figure out exactly which version was the first to be close to sRGB. [More recently, I believe they now also do some color management, though I haven’t really investigated this in a few years.]
The first couple iPad mini versions also had a smaller-than-sRGB color gamut and no color management (I have one of these). I’m not sure about the earliest full-sized iPads.
Of course, this is all still better than Android where there is extreme fragmentation and every display looks slightly different.
It wasn't perfect nor did it hit sRGB, but still pretty darn good for photos. As you say, they didn't color manage but they did a great job factory calibrating and selecting panels, and thus the lack of color management wasn't really an issue in practice, at least when designing apps. About 2 years after Instagram launched, the iPhone 5 came out which had sRGB.
The point still stands that the displays weren't the reason insta photos were filtered - it was driven by camera limitations and the culture / artistic sensibilities that Instagram instilled in the product.
Earlier iPhones were an even smaller gamut. http://www.displaymate.com/iPhone_3GS_ShootOut_files/image00...
More than just desaturating images that aren’t gamut mapped, however, these iPhones substantially distort color relationships.
They were historically great displays for mobile devices, but they would have been a lot better with some color management involved.
> Your content is matched to the sRGB color space on the authoring platform. This matching is not performed dynamically on the iOS device. Instead, it happens during authoring on your Mac OS X desktop.
> Targeted color management may also occur when you sync content to your mobile device. In fact, iTunes running on the desktop provides color management to the iOS targeted color space when you sync content from iPhoto to your iOS device.
(As you noted, there have also been more recent developments. Since last year there have been iOS devices which support the DCI-P3 gamut and use active color management; some of those devices also support “True Tone”, which adjusts the color mapping based on ambient light.)
IMHO this type of scene is better than any of the examples shown in the article:
I don't understand this statement. Why do you need a certain number of bits to represent a large range? You can encode a range using any number of bits you want, but if you use fewer bits the resolution within may be lower.
Representing a large luminance range isn't a problem of resolution though is it? It's a problem of presenting that contrast ratio on a screen that is not capable of a contrast ratio of 1,000,000:1. If I represented the same luminance range with a thousand bits it doesn't do anything to solve the core problem, does it? So why does the number of bits matter?
Not a graphics programmer, so genuinely asking, not saying the article is wrong!
You are right, the final output dynamic range is limited by your output device, you can't circumvent that.
The 1,000,000:1 is the contrast ratio you might see on a sunny day, and represents the state of the world that your eyes are seeing.
Your eyes aren't capable of 1,000,000:1 contrast either, at least not globally across your field of vision, so we can play tricks in rendering to simulate what you would perceive with your limited eyeballs on a 1M:1 contrast ratio day, and this is what HDR rendering does.
Internally, the rendering code handles luminance well in excess of the final output device's ability to display it. All the game rendering happens internally in HDR, preserving this high level of contrast. The final steps of producing a displayable image convert non-displayable "real" contrast ratios into a post-processed simulation that fools you into perceiving high-contrast scenes. This last part is subjective and artistic, since you are trying to figure out a way to show that high-dynamic-range scene. Common tricks played here are the light bloom from bright areas, the artificial amplification of dark areas, etc.
All these in-between rendering steps compose the result of multiple rendering passes into an image. Each pass which is blended with the results of a previous pass is subject to quantization error. The more passes you do (and modern games do lots), the more this quantization error accumulates, and leads to pretty ugly banding artifacts. You want to do your internal processing in the highest color resolution that you can, so that you only have to worry about the error of quantizing the 1M:1 scene into a 256:1 output device. You don't need the full 20 bits, and common practice is to use 16-bit S10E5 floats since hardware handles this quickly.
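A toy sketch of the quantization-accumulation point (my own illustration with made-up blend gains, not real engine code): quantize the working buffer to 8 bits after every pass, versus keeping it in 16-bit floats and quantizing only once at the end.

```python
import numpy as np

# Hypothetical illustration: run 20 multiplicative "blend passes" over a
# buffer of linear luminance values and compare two pipelines.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=10_000)   # linear luminance in [0, 1]
passes = [rng.uniform(0.9, 1.1, size=scene.size) for _ in range(20)]

# 8-bit path: round-trip through 256 levels after every pass.
buf8 = scene.copy()
for gain in passes:
    buf8 = np.clip(np.round(buf8 * gain * 255.0) / 255.0, 0.0, 1.0)

# float16 path: stay in half floats, quantize to 8 bits once at the very end.
buf16 = scene.astype(np.float16)
for gain in passes:
    buf16 = np.clip((buf16 * gain.astype(np.float16)), 0.0, 1.0)
buf16 = np.round(buf16.astype(np.float64) * 255.0) / 255.0

# Reference: exact math, never quantized.
ref = scene.copy()
for gain in passes:
    ref = np.clip(ref * gain, 0.0, 1.0)

err8 = np.abs(buf8 - ref).mean()
err16 = np.abs(buf16 - ref).mean()
print(err8, err16)  # the repeatedly-quantized 8-bit buffer drifts much further
```

The per-pass rounding errors random-walk away from the reference, which is exactly the banding-prone accumulation the comment describes.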
HDR televisions, while nicer than nothing, don't help you all that much. Displays have generally been 8 bits per channel, giving you 256 values distributed along a gamma curve which maps to a device with perhaps 3,000:1 contrast capability. HDR TVs give you 1024 values along a gamma curve that spans a higher range of contrast, but not all that much higher.
> In the real world, the total contrast ratio between the brightest highlights and darkest shadows during a sunny day is on the order of 1,000,000:1. We would need 20 bits of just luminance to represent those illumination ranges
In practice modeling vision precisely gets really complicated because there are several types of spatial and temporal adaptation and contrast effects that the human visual system can undergo / be affected by, and something like a video game definitely doesn’t have the computation available to handle it perfectly (not to mention the programmers don’t want to implement anything that complex and finicky). There is also a complication that display black is never especially close to zero light, and depends a lot on the light hitting the display. Etc.
So for practical purposes you just have to come up with some kind of “good enough” heuristic.
The assumption seems to be a linear illumination scale with the darkest value at 1 (and true absence of illumination at zero); 1,000,000:1 needs about 20 bits for that.
> So for practical purposes you just have to come up with some kind of “good enough” heuristic
The article says that the highest light level in a scene is 1,000,000 times brighter than the lowest. It does not say, or even imply, that you can see the difference between those levels of light to a one in a million, which is what 20 bits needs.
Can you see what I mean? The factor between the lowest and the highest is entirely separate from the question of how many levels between you can see, which is what directs how many bits of resolution you need.
One thing to add: Wikipedia sayeth "the eye senses brightness approximately logarithmically over a moderate range". If we go with that, then you presumably want to encode brightness logarithmically, and the number of bits you have available will determine the ratio between your adjacent quantized levels. In that case I believe the ratio between adjacent levels would be exp(ln(max_range_ratio)/(2^bits)).
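A quick numeric sketch of that formula (Python, my own illustration; the ~1% Weber fraction mentioned in the comment is the commonly quoted rule of thumb, not a claim from the article):

```python
import math

# Encode a luminance ratio logarithmically across 2**bits levels; the ratio
# between adjacent quantized levels is then
#   max_range_ratio ** (1 / 2**bits) = exp(ln(max_range_ratio) / 2**bits)
def adjacent_level_ratio(max_range_ratio, bits):
    return math.exp(math.log(max_range_ratio) / 2**bits)

# For a 1,000,000:1 range: with 8 bits each step is ~5.5% brighter than the
# last (likely visible banding); with 12 bits it is ~0.34%, below the
# often-quoted ~1% Weber threshold for brightness discrimination.
print(adjacent_level_ratio(1_000_000, 8))
print(adjacent_level_ratio(1_000_000, 12))
```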
I explained it in another comment but basically the "20 bit" value is based on an idealized digital image sensor rather than a game rendering pipeline (which operates in floating point). It has admittedly proven somewhat confusing.
All we know is that the range is N to 1000000N, but we do not know in and of itself that N is the smallest delta perceivable or possible.
I don't know anything about games but at least in cameras the relation you want between the bit depth and dynamic range is determined by what you want to measure rather than a formula. An eye tracker I have is a 10-bit IR camera because a normal 8 bits are insufficient to both note that IR LED reflections are way brighter than everything else, while also having sufficient detail in the lower values that the edges of the pupil are easy to detect. At least, without having the scale be logarithmic or discontinuous or otherwise compressed.
The real benefit of HDR is in the "more values" part since, as the author notes, our displays have a very limited dynamic range anyway.
Hell, I like to think I'm somewhat knowledgeable about these things and I read straight past that line thinking "that's a million, that's about twenty bits, okay".
But you are absolutely right, now that you pointed it out it's obvious.
And indeed you can get the same range using only one bit (per channel, that is) and if you had high enough (very high) resolution and proper dithering, you'd totally get away with it, too. In that case, the tonemapping goes just before the dithering.
I didn't interpret the article's premise as being that, with 20 bits, you could 100% reproduce the light in a scene.
You only need 1 bit if you declare an encoding scheme where 1 == 1 million. You need 20 bits to define 1 million DIFFERENT luminances. Their ratio is subject to an arbitrary scaling.
> To express a ratio of 1000000:1, you need at least 20 bits.
You mean to express all the integer factors in the range 1000000:1, you need at least 20 bits. With 20 bits you can represent 1 times brighter, 2 times brighter ... 1000000 times brighter.
But there's nothing special about those coefficients. 20 bits does not allow you to represent 1.5 times brighter. That's still in the range 1000000:1 but 20 bits isn't enough to represent it.
If we're happy to skip 1.5 times brighter, why can't we skip all the even integer times brighter values, and use 19 bits?
Things get wonky if you don't have a linear scale with a true zero; in such a scale the low end of your N:1 contrast ratio (in the smallest representation) has a value of 1, and the high end has a value of N.
Also, if that type of "wonky" throws off your rendering pipeline, you're bound to get something else wrong.
Such as ever having a linear scale with a small number of bits in your pipeline. The linear scaled brightness stays afloat all the way through (cause floats have this handy feature of being transparently sorta-logarithmic in the way they use their bits, even 16-bit floats beat 20-bit ints for that purpose), only at the very end you apply the tonemap+gamma function(s), then dither, then truncate to fixed (8) bit integer.
edit: And aren’t output color spaces already highly nonlinear due to gamma correction?
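The "floats are transparently sorta-logarithmic" claim can be sanity-checked numerically. A rough sketch (my own toy comparison, with arbitrary units chosen so a 1,000,000:1 span fits inside float16's exponent range): compare worst-case relative quantization error of a 20-bit linear integer scale against 16-bit floats.

```python
import numpy as np

# Luminance samples spanning 1,000,000:1, in units where they fit in float16.
values = np.logspace(-3, 3, 7)            # 1e-3 ... 1e3

# 20-bit linear integer: step size is constant across the range, so relative
# error explodes for small values.
step = (values.max() - values.min()) / (2**20 - 1)
rel_err_int = (step / 2) / values         # worst-case relative error

# 16-bit float: relative step is roughly constant (~2^-11) across the range.
f16 = values.astype(np.float16).astype(np.float64)
rel_err_f16 = np.abs(f16 - values) / values

print(rel_err_int[0], rel_err_f16.max())  # ~0.48 vs ~0.0004
```

The linear 20-bit scale is nearly useless at the dark end (about 48% relative error), while half floats keep sub-0.1% error everywhere, which is the parent's point.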
2^19 = 524,288
2^20 = 1,048,576
Another argument for this formula might come from Shannon information entropy but I'm not 100% sure
EDIT: I might as well link to the author's clarification. I missed it in the noise: https://news.ycombinator.com/item?id=15538738
It is a little odd. I don't think a linear, fixed-precision scale for luminance is used much, though I could be quite mistaken. I'm not very familiar with HDR. Still, I might as well explain a bit further...
Light accumulation is linear. If you have two lights, the amount of light reflecting off the surface is twice as much. Twice as many photons.
Human perception, however, is logarithmic. Something that is twice as bright in terms of photons emitted actually only looks a little brighter to the human eye. To get the same perceived brightness increment again, you need to double the photon count again.
With that in mind, what's the best numeric representation to use? Well, you want a simple way to add up the total amount of light from all sources. The amount of light may vary by orders of magnitude, but it's manageable because your need for precision scales with the magnitude you're dealing with. Basically, you want floating point.
In practice, a float for each color channel means 3 floats (96 bits) per pixel, which is a lot. So, while that might be used during calculations, it's probably stored compressed. For example, RGBE is a 32-bit-per-pixel format: you use 8 bits per color channel as mantissas and scale them all by 2^(E-128), where E is a shared 8-bit exponent. It's kind of inaccurate for the dim channels if luminance varies significantly between colors, but the brightest one is probably going to overwhelm the other channels anyway.
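A toy sketch of a shared-exponent encoding in that spirit (loosely following the Radiance RGBE convention of value = mantissa/256 × 2^(E−128); rounding and edge cases are deliberately simplified):

```python
import math

def rgbe_encode(r, g, b):
    """Pack an HDR pixel into 8-bit mantissas plus a shared biased exponent."""
    m = max(r, g, b)
    if m < 1e-38:
        return (0, 0, 0, 0)
    e = math.frexp(m)[1]              # m = f * 2**e with f in [0.5, 1)
    scale = 256.0 / 2.0**e
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)

def rgbe_decode(rm, gm, bm, e):
    if e == 0:
        return (0.0, 0.0, 0.0)
    scale = 2.0**(e - 128) / 256.0
    return (rm * scale, gm * scale, bm * scale)

# HDR pixel whose channels span a wide range:
r, g, b = 1200.0, 300.0, 0.5
dr, dg, db = rgbe_decode(*rgbe_encode(r, g, b))
print(dr, dg, db)
```

Running this shows exactly the weakness the comment mentions: the brightest channel survives intact, while the dimmest (0.5 next to 1200) quantizes all the way to zero because it shares the bright channel's exponent.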
Imagine a gradient that smoothly transitions from black to white. 8 bit color means only 256 shades of luminance, which would quickly show banding artifacts.
In fact, the entire reason we gamma encode images is because 256 is too small for even a low dynamic range monitor. Gamma encoding is a clever technique to give us more bits to represent shadows, which humans are more sensitive to than highlights.
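A quick way to see the shadow-bits claim concretely (Python; assumes a simple 2.2 power-law gamma rather than the exact piecewise sRGB curve):

```python
# With a 2.2 gamma, far more of the 256 8-bit codes land in the shadows than
# with a linear encoding. Count the codes whose decoded linear luminance
# falls in the darkest 10% of the range.
gamma = 2.2
shadow_codes_gamma = sum(1 for c in range(256) if (c / 255) ** gamma <= 0.10)
shadow_codes_linear = sum(1 for c in range(256) if (c / 255) <= 0.10)
print(shadow_codes_gamma, shadow_codes_linear)  # roughly 90 vs 26
```

Gamma encoding devotes over a third of the available codes to the darkest tenth of the luminance range, which is why shadows band so badly in linear 8-bit.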
You can't take a real range and give a number of bits needed to represent it without also stating what resolution you need. And I think it's unlikely that the resolution that a human can perceive is coincidentally the same as the factor of one extreme to the other.
Much of the post comes from a general assumption that the goal of computer graphics is primarily to replicate how a camera sees the world around it. Thus I think it's easiest to start from the idea of real world light entering a digital image sensor. Light in this setting is not continuous! Each subpixel in an image sensor acts as a photon counter. One photon hits the sensor, the count ticks up by one. There's no question of being able to perceive the values between 1 and 2 because they don't even exist. Either the sensor counted one photon or two. If you were going to literally create a digital camera that can process the entire world, you need 20 bits to count up to a million photons without losing any along the way. So if you were to build the hypothetical rendering pipeline that works on "real world" data about the scene, that 20 bit value would be the input.
As a practical matter, nearly all modern games store lighting levels internally in floating point, in arbitrary units chosen by the developers. Lighting pipelines are not integer based, but they're linear and not perceptual. The conversion to perceptual 8 bit (gamma curve) happens as part of the tone map stage. Doing things in floating point physical units is a better idea than the photon counter anyway, but the line you're confused about was really written with idealized cameras in mind.
(Technically an image sensor is an analog device and the voltage increases with each photon detection by an increment that is subject to noise of all sorts and pre-amplification before hitting ADC. Don't jump me on the photon counter thing.)
But it's rather important, and not even for those reasons. This is not meant to jump on you! (and I really loved your article, like you said it's not a crucial point at all)
The 1:1000000 or 1:2^20 contrast ratio only corresponds to exactly 20 bits if the 1 on the low end of this ratio corresponds to exactly one discrete unit of light (photon). If it's off by a factor of 0.5, 1.618 or whatever, that's what the whole argument is about.
First, the sensor counts not photons but a value relating to photons per second (because exposure time). If the 1 on the low end of the range corresponds to some exact minimum number of photons, it's going to be "one discrete unit of light per <exposure time>". Making the whole thing analog from the start.
Second, those sensors most probably aren't able to count individual photons anyway. The human eye, after about 30 minutes to get optimally adjusted to darkness, can sort of perceive individual photons, or small bunches of maybe 2 or 3, kind of. Those barely-perceptible specks of light in utter darkness aren't the sort of resolution issue we're worrying about in the dark end of these types of scenes. And, as soon as you make a light source that can be described as "emitting single photons" in a certain context, you get uncertainty effects and all that quantum jazz (show me a photon/path tracer renderer that gets the slit experiment correct and you can have your integer photon counters :) ).
So the sensor output values can (and should), for all intents and purposes, be assumed to be an analogue value.
The amount of bits you represent it with just puts an upper bound on your signal-to-noise ratio (as per Shannon entropy). But since we're dealing with 2D images, the distribution of this noise over the spatial resolution (either as a result of sensor noise at the input or explicit noise shaping dithering at the end of the pipeline) also comes into play when considering the quality of the output.
If the signal-to-noise ratio of a sensor output happens to allow for 20 bits of precision, for a sensor that happens to have a 1:2^20 brightness range, that's coincidental. Sure it correlates because higher-end sensors tend to perform better in both range and SNR. But I don't believe that the 1 on the very low end of a discrete range represents precisely one photon per <power-of-two times minimum exposure time>.
 correct me if I'm wrong about this btw. There might be specialized scientific equipment that can, but I doubt even high-end cameras bother to go to the accuracy of single photons. But, I mostly know about digital signal processing, not about state of the art of camera hardware. Yet even if they are able to detect individual photons, that's going to be a probabilistic and per-unit-of-time measurement, so the rest of my argument holds.
 these probably exist, but aren't used for games or photorealistic rendering purposes
To put it in mathematical terms, there is no reason to use log base 2, but the formula is definitely a logarithm. So, again, 20 bits is a minimum bit count conveniently used to compare with other situations (e.g. "if you are not shooting in direct sunlight, you will only need a 12-bit minimum, not a 20-bit minimum, so this method of shooting is OK") and make decisions about your color pipeline.
You will need 20 bits if a human can distinguish 1,000,000 levels between those two extremes
So, for display purposes, 8 bits (with a non-linear mapping) has been sufficient for the brightness range of TVs and monitors until recently. Now we have displays going High Dynamic Range. If you tried to display a 10x wider range of brightness using the old 8-bit format, you'd start to see a lot of ugly banding (posterization), much more than before, because each of those 256 flat steps would be associated with a much larger section of the brightness curve. You'll need at least a couple more bits to keep your brighter-and-darker image looking as smooth as the mid-range-only images did in the past.
For storage purposes, the 16-bit-per-channel floating point OpenEXR format has been used by the film industry as an intermediate format between image processing tools for quite some time now. I think that demonstrates that 16-bit floats are sufficient for storage.
For computational purposes, 16 bit floats can build up a lot of numerical error very quickly. It's OK for small bit of math (moderate real-time shaders), but when things get complicated, it's necessary to go full 32-bit float.
Tell that to D/A converters everywhere.
> If you tried to display a 10x wider range of brightness using the old 8-bit format, you'd start to see a lot of ugly banding (posterization) much more than you have seen before because--
Because you didn't dither, is why.
The only thing you lose with fewer bits is SNR, and thanks to the wonders of noise shaping you can pretty much decide where to put this noise. And I'm just going to make a very broad statement that ever since we managed to climb out of 320x200 resolutions, spatial resolution is where you should put it, not in banding. Maybe even one or two bits in the time resolution if you want to get fancy about it.
Basic dithering can be as easy as just adding a uniform 0..1 random number just before truncation. Triangle noise (difference of two uniforms) can give a bit better visual accuracy for graphics, but has one or two snags that take attention to get right.
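A minimal sketch of dither-before-truncation as just described (NumPy, my own toy numbers): add noise scaled to one quantization step, then truncate. On average the quantized ramp now tracks the original instead of stair-stepping.

```python
import numpy as np

rng = np.random.default_rng(42)
gradient = np.linspace(0.0, 1.0, 4096)        # smooth ramp in [0, 1]
levels = 255.0

# Plain truncation: biased (always rounds down), and visibly banded.
truncated = np.floor(gradient * levels) / levels

# Uniform 0..1 noise before truncation (RPDF dither).
uniform = rng.uniform(0.0, 1.0, gradient.size)
dithered = np.floor(gradient * levels + uniform) / levels

# Triangular noise (difference of two uniforms, TPDF), clipped at the ends.
tpdf = uniform - rng.uniform(0.0, 1.0, gradient.size)
dithered_t = np.clip(np.floor(gradient * levels + 0.5 + tpdf), 0, levels) / levels

# Truncation has a systematic ~half-step bias; dithering removes it, trading
# it for spatially distributed noise.
print(np.mean(truncated - gradient), np.mean(dithered - gradient))
```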
What kind of graphics calculation would build up enough error to matter? You don't exactly need to do A + 10000 - 10000 in a rendering pipeline.
If you're summing huge amounts of numbers in a raytracer it's appropriate. Where else?
And .. even a raytracer doesn't sum that many numbers, maybe a few thousand? So even then it's still a very reasonable question to ask.
 assuming a Monte Carlo pathtracer type of thing with 5-10 bounces or so. BTW, do you think you'd ever need more than five? Unless you're rendering close-to-parallel reflective surfaces, but the appearance of those is often as confusing in real life as it is when rendered correctly.
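For what it's worth, half precision really can fall apart fast in one specific case: a long naive running sum. A toy sketch (deliberately worst-case; a real renderer would accumulate in float32 or use pairwise summation):

```python
import numpy as np

# Sum a few thousand equal sample weights sequentially in float16.
samples = np.full(4096, 0.1, dtype=np.float16)

acc = np.float16(0.0)
for s in samples:
    # Once the running sum reaches 256, the float16 spacing there (0.25) makes
    # each 0.1 increment round away entirely, and the sum stalls.
    acc = np.float16(acc + s)

exact = 4096 * 0.1
print(float(acc), exact)  # the half-precision sum falls far short of ~409.6
```

So "a few thousand" additions is already enough to matter if you do them naively; the fix is higher-precision accumulators, not more bits everywhere.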
Only if you don't dither. Which is like free bits of precision given today's monitor resolutions.
I don't disagree with the rest of your point (although I do believe that representing a 20-bit dynamic range with exactly 20 bits of precision is an arbitrary choice, and not even the best one, IMO).
Just want to emphasize that, a statement being made all over this thread, that "less bits = you gonna get banding", just means you're doing it wrong, thinking about it wrong, and might therefore even end up with banding if you had enough bits but wasted them at some point in the pipeline.
Imagine you have a lightbulb (producing, let's say, max 1000 units of light/brightness [which are two subtly different things]) connected to a dimmer that has, for example, 8 discrete levels. What would you say the dynamic range of this setup is? Obviously, measuring the contrast ratio from the dimmer's zero setting (= light off) to the max setting would give an infinite contrast ratio/dynamic range. That is not very useful, though, so the next best thing is to measure the brightness difference from the lowest on setting ("1") to max, which gives you a reasonable number, in this scenario 125:1000, or 1:8. I hope this example was illuminating.
So I guess the idea is that bit depth fundamentally determines what is the minimum brightness representable without going to zero (where be dragons), and as such determines the max usable dynamic range.
You could have a dimmer that goes from one level of lightness, up to another, with 8 steps in between. And you could have another dimmer that goes between exactly the same two levels of lightness. The same range. but with 1,000,000 steps in between. The range of the two dimmer switches is the same. They both go from the same minimum to the same maximum. Just the number of steps in between is different. The number of bits represents the number of steps.
Why is 1,000,000 steps the required number? 1,000,000 is just the number of integral factors in the range in base 10. But why is an integral factor more important than a fractional one? They could both be perceptible to a human. The range is the same in both cases, and the explanation given was just that the range was so high.
The thing is that you can't set the minimum value arbitrarily, because the scale is anchored to zero. So "1" is always one step above zero, and the step size is a function of bit depth.
So going with our example here, a dimmer with 1,000,000 steps would not have the same range as a dimmer with 8 steps, because the minimum brightness of one is 1/1,000,000th of max, and 1/8th of max for the other (assuming a linear scale for simplicity's sake; the principle applies to nonlinear scales too).
(Pure black might be zero, but that's useless for computing a ratio, so it doesn't give you a resolution.)
Then you need 20 bits, because log2(1,000,000) ≈ 19.93 rounds up to 20 (equivalently, 2^19 < 1,000,000 ≤ 2^20).
If you used only 8 bits, the ratio between highest and lowest (non-zero) luminance would be only 256:1, or 1,000,000:3,906.
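The "linear scale anchored at zero" reading can be put in a few lines (my sketch of this comment's reasoning, not from the article):

```python
import math

# With b bits on a linear scale anchored at zero, the smallest nonzero level
# is 1 and the largest is 2**b - 1, so the usable contrast ratio is
# (2**b - 1):1.
def contrast_ratio(bits):
    return 2**bits - 1

# Inverse: smallest bit count whose ratio covers the target.
def bits_needed(ratio):
    return math.ceil(math.log2(ratio + 1))

print(contrast_ratio(8))            # 255:1 for 8 bits
print(bits_needed(1_000_000))       # 20 bits for a 1,000,000:1 ratio
```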
20 bits lets you represent all the values which are integral in base 10... but so what? Why does the fact that a value is integral in a particular base make it the minimum resolution difference that you need to represent? Why do you need to quantise at that level? There is a luminance of 1.5 times n, isn't there? But 20 bits doesn't let me represent that. 21 bits would. See what I'm saying?
The fact is that if you want good highlights (beautiful clouds, candles, sun glares, etc.) in your image when shooting with a camera, you have to "expose to the right", i.e. if the scene is more dynamic => reduce the amount of light coming into the camera => your shadows get crushed down and become less precise => you need better color resolution if you want to "save" your shadows, OR you need to use another specific colorspace (S-Log shaped).
> It's a problem of presenting that contrast ratio on a screen that is not capable of a contrast ratio of 1,000,000:1
The short answer is: You're right that it's wrong to say that you need 20 bits to cover the range. Because clearly you can cover the range with 10 bits or 100 bits by subdividing differently. More bits is better, but there's no rule of required multiplication of bits for fabricated metrics like integral brightness multiples. 20 bits is just required to cover the range with integers without skipping any, which is an entirely arbitrary decision.
The long answer is:
The reason you want more bits to represent the larger range is exactly as you say because one doesn't actually care about _range_ but rather about the utilization of increments within the range. Now I'm going to describe something that you clearly already intuit for completeness...
Say, for example, that you have only 3 steps (let's call them 1, 2, and 3) in your range instead of a million, but you want to represent the full range of darkest dark to brightest bright with those 3 steps (for simplicity you could also be thinking in black and white instead of color). Now, you take a picture of a normal indoor environment with some gentle indirect light, and maybe all of the measured values would be, on the million scale, between 350k and 650k. If you linearly divide your range into 3 steps, when you look at your resulting image you will have a flat solid block of 2s, with no 1s or 3s at all. And yet your original setting had an absolutely massive 300k spread. But it didn't have significant brights or significant darks, because you were indoors with some indirect light and there was no fireball sun or deep cavern of despair in the image, so you didn't measure any differentiation in values.
So you decide to switch it up and go inside a dark cathedral with some wonderful stained glass windows (everyone loves cathedrals with stained glass when looking at HDR). This time you do the same thing, but instead of measuring all 2s, you measure all 1s with a chunk of all 3s for the sunlit windows, but no 2s at all.
So then you say, "What I should do is take some of those 3s and some of those 1s and pretend that they're actually 2s when differentiation is too high, and take some of those 2s and split them into 1s and 3s when differentiation is too low! That way I'll see more details, because there will be more efficient utilization of the segments of my available output spectrum!"
But _then_ you say, "Oh crap! I can't do that! I didn't actually capture the differences! I captured what I captured, and what I captured was the aforementioned terrible undifferentiated data!"
So the reason you want to be able to represent values on a _densely_ _segmented_ range is just so that you can differentiate things that are only slightly different in absolute overall term (think photon counts, I guess) because aesthetically those slight differences matter. It means you can do things like know when some of your 2s should become 1s and 3s, because the 2s aren't actually 2s anymore; they're a bunch of tiny fractional variations between 1.5 and 2.5, and that's data you can work with.
You can perform tone mapping with any number of bits. If you record 10 increments instead of only 3, you can obviously convert those 10 down to 3 in a way that gives a better image than just starting with 3. And the more bits you have, the more linearly you can record the environment without losing the differentiability between things that are, in absolute terms, on a large scale very similar. And then you get to do fun things like add a light bloom effect around something that is _very_ bright compared to its surroundings without losing details within the very bright thing and without losing details within the very dark surroundings.
Having more representational bits just means you can tone map scenes that have really bright parts, really dark parts, and really medium parts without losing any of the details inside those parts.
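A toy version of the indoor-scene example above, in Python (the numbers mirror the 350k-650k story, scaled to [0, 1]; the capture and tone map are deliberately crude):

```python
import numpy as np

# A dim indoor scene whose true values all sit between 0.35 and 0.65.
scene = np.linspace(0.35, 0.65, 1000)

coarse = np.floor(scene * 3)                   # 3 linear bins: the "1, 2, 3" capture
fine = np.floor(scene * 255) / 255             # 8-bit linear capture

print(np.unique(coarse).size)                  # 1: everything became a "2"
print(np.unique(fine).size)                    # dozens of distinct values survive

# Tone map the fine capture: stretch the occupied range onto the full output.
tonemapped = (fine - fine.min()) / (fine.max() - fine.min())
print(tonemapped.min(), tonemapped.max())      # detail now spans 0..1
```

The 3-step capture is unrecoverable (one flat block of 2s), while the 8-bit capture preserved the small differences that a tone map can then redistribute, which is exactly the parent's point about dense segmentation.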
What he is discussing is about aesthetics. There are two things at play here: the big studio business model relies on making blockbusters, so you pack together a few hundred devs and artists and try to pop out the biggest game you can. The other is that many devs are technically trained but not especially artistically educated. They have a limited understanding of lighting, color, framing a scene, etc.
Someone who has made some great criticisms of this as well as provided real solutions is Eskil Steenberg.
The topics are interesting and the opinions are there, but it certainly could be written better.
As a programmer who has done a small amount of work in graphics but hasn't worked in HDR rendering I have no idea what any of these are....
post process LUT color grade
> It alters between putting the blame on technical limitations and ignorance, so I would have liked to see a suggestion on how to address either of these.
I second this.
color grading = adjusting the color in an image, as you might do in Photoshop or whatever
post process = any image adjustment that happens after the image is originally captured
This author is assuming a basic background in lighting / human vision / film technology.
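To make those definitions concrete, here's a toy sketch (the curve and values are made up, not from any real pipeline): a "post process LUT color grade" ultimately boils down to pushing each pixel of the finished image through a precomputed lookup table.

```python
# Toy post-process color grade: a per-channel LUT precomputes a contrast
# S-curve for every possible 8-bit value, so grading at runtime is just
# one table read per channel.
def build_lut(curve, size=256):
    return [curve(i / (size - 1)) for i in range(size)]

def grade(pixel, lut):
    # "Color grading" as pure lookup -- no per-pixel math at all.
    return tuple(lut[c] for c in pixel)

# Smoothstep-style S-curve: darkens shadows, brightens highlights.
s_curve = build_lut(lambda x: 3 * x**2 - 2 * x**3)

shadows = grade((32, 32, 32), s_curve)        # pushed darker than 32/255
highlights = grade((224, 224, 224), s_curve)  # pushed brighter than 224/255
print(shadows[0] < 32 / 255, highlights[0] > 224 / 255)
```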
If I were to talk about a web framework and say, "we've got our JSON files wrong and our package configs are cluttered, and don't get me started on the XML files," it's clear that it is metonymy.
From the article:
>>> It was Reinhard in years past, then it was Hable, now it’s ACES RRT. And it’s stop #1 on the train of Why does every game this year look exactly the goddamn same?
If the problem the author is facing is a gap between art skills and technical skills, then the post could be less angry and more informative about color theory or rendering technology, maybe bridging the needed skills.
Though if you look to the sidebar, you can see the author’s recent tweets:
> Tonight in "alarming discoveries" is that I went into my WP stats to find heavy traffic from Reddit. From r/pcgaming no less - blech.
> Practical consequence is that I'm going to have to write future entries from more basic principles, because this is getting out of hand.
So maybe you’ll get your wish.
Photoshop is a very basic tool for this. Color grading tools used in the industry are far more advanced than the options offered in PS, take a peek at Resolve (free).
But personally I find Photoshop to be more flexible and more capable than any of these film tools I’ve looked at, none of which seem to be capable of the kind of (very non-standard) workflow I use in Photoshop. Photoshop tools (layers, masks, adjustment layers [especially curves, and more recently full color lookup tables], blend modes) can be combined in very sophisticated ways which can be set up via a macro action, whereas most of these other tools are a bit more prescriptive. For example I can build my own “hue vs. hue” tool (from the Resolve marketing video) out of more basic Photoshop building blocks, and set up an action to pop one up with a keystroke. To be fair Photoshop also has piles of highly constrained single-purpose tools, most of which are outdated and relatively useless.
But none of these tools was ever designed with a user in mind who would think abstractly about building blocks and means of combination. A more abstract, generic functional-boxes-and-arrows kind of interface could probably do better from a flexibility perspective, but can be a pain in the butt to use.
Someone with a color science, user interface design, and art background and a lot of time to experiment who was trying to make a tool for professionals could also certainly do a heck of a lot better. Most of the individual user interface components in these tools [including photoshop] are pretty uncreative and inflexible. Maybe this will be me in a few years when I have some more time, we’ll see.
Admittedly I deal with still images, not video.
But everything shown in that video (other than motion tracking) can be done step for step equivalently in Photoshop, albeit some bits take some nonstandard combinations of tools. Some of it would probably be more convenient to achieve in PS via a different method.
I don’t exactly understand what the compositing order is of “parallel nodes”, but I’m quite convinced I could mimic the effect almost precisely given a technical description.
The nonlinear ordering you mention could be done using smart objects, but that can get annoying, so it’s usually easier in practice to just duplicate the layer (which breaks the link if you change something up the compositing chain later).
1. Pick your tone mapping early in the development process.
2. Look into the models used by the camera / film industry, because they're clearly getting better results.
It's not a complete discussion, but as a student in computer graphics, I found it very informative. The title says Part 1, so presumably there will be more. The links were all very good, too.
A game like Team Fortress 2, released in 2007 (!) looks so much better than many modern games because there is a coherence and style to the art. It's not "HD" for the sake of "HD."
We're in a period where the graphical canvas is getting larger every year, and the temptation is to fill it with as much color and pop as possible. But some restraint really works wonders.
It's been a while, but last I looked, it seemed that the temptation was to use ever muddier and desaturated visuals, with as much glare and shiny surfaces as possible.
But in this decade things are finally feeling more evened out. Lighting models are sophisticated enough to allow for designs similar to a film set or photo shoot, and post-processing is getting past basic glows and filters and into a spectrum of quality/performance tradeoffs.
Of course, the games that don't aim for photorealism always age better. That's been the case since people started digitizing photographs for games.
If the main goal in a game is to make the graphics of yesterday look and feel obsolete, then that game will probably look and feel obsolete tomorrow...
All 3 examples here are excellent, but I want to say: I know a lot of people hate on Destiny, but the art direction, environmental art, and creature design are all magical in terms of tech and results. It honestly feels like concept art brought to life, and anyone dismissing it without even giving it a chance is just missing out on some seriously impressive visuals resting perfectly between realistic and stylistic.
I don't know enough myself about HDR rendering to have an opinion on whether the author is right or wrong about any of this, but people arguing that the author is wrong because games shouldn't look photorealistic are attacking a straw man.
But the industry isn't stupid, and some of the "over-contrasty garbage" is intentional to help people rapidly identify the things that they need to identify. Being able to quickly spot the things that you need to play the game is more important than sensible tone maps and contrast curves.
This is not a medium you observe like film & paintings are, it's a medium you interact with, and rapidly at that. Approaching the visuals with the same perspective as something you only observe is a fundamentally flawed approach.
This article is packed with baseless assertions, unfounded claims, and offhanded but not investigated technical limitations. Hell, right after saying "partly due to technical limitations" the author proceeds to throw out a "nobody in the game industry has an understanding of film chemistry" snark. The author doesn't even bother to elaborate on the technical limitations that they remarked on, opting instead to just hurl insults for no reason.
> [...] some of the "over-contrasty garbage" is intentional to help people rapidly identify the things that they need to identify. Being able to quickly spot the things that you need to play the game is more important than sensible tone maps and contrast curves.
Is this speculation, or were you involved in these discussions? Or, maybe a GDC talk? For all I know, you could be the lead graphics programmer for Naughty Dog, in which case I'd totally take your word for it.
And, actually, I really would like to hear more specific criticisms. As much as I respect Promit, he absolutely could be wrong and a detailed breakdown from someone who knows better would be informative.
> ask WHY games are the way they are, though.
To add to your point, the main reason for "bad HDR" in FPS games is for simulating eye adaptation. Screens don't have a 1,000,000:1 contrast ratio and can't shine as bright as the sun (we wouldn't want them to). This means that games are forced to approximate this effect - inevitably resulting in shortcomings.
Furthermore, game development is a race against time. You have milliseconds to render the scene and apply tone mapping. It may very well be mathematically possible to create automated tone mapping that would make a cinematographer proud, but can you make an algorithm that completes in less than 1 millisecond? I doubt it.
To my point, there are mods on PC that replace the tone mapper (I think ReShade does this) and they absolutely destroy your framerate - however the results are absolutely gorgeous if you know what you're doing.
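The eye-adaptation effect described above is commonly implemented as exposure smoothly chasing the scene's average luminance. Here's a rough sketch; the `speed` and `key` constants are made-up illustrative values, not from any shipping engine:

```python
import math

# Hypothetical eye-adaptation sketch: exposure drifts toward a target
# derived from the average scene luminance, like a slow auto-exposure.
def adapt_exposure(exposure, avg_luminance, dt, speed=1.5, key=0.18):
    target = key / max(avg_luminance, 1e-4)  # aim mid-gray at the scene average
    # Exponential smoothing: fast enough to notice, slow enough not to flicker.
    alpha = 1.0 - math.exp(-speed * dt)
    return exposure + (target - exposure) * alpha

# Walking from daylight (avg luminance ~1.0) into a dark cave (~0.05):
exposure = 0.18
for _ in range(120):                         # two seconds at 60 fps
    exposure = adapt_exposure(exposure, 0.05, 1 / 60)

# Exposure has risen well above its daylight value, brightening the scene,
# but hasn't yet fully converged on the cave target of 3.6.
print(1.0 < exposure < 3.6)
```

The per-frame cost here is a handful of arithmetic ops; the expensive part in real engines is computing the average luminance, which is typically done with a mip-chain or histogram reduction on the GPU.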
How is that different from what photographs do?
> Furthermore, game development is a race against time. You have milliseconds to render the scene and apply tone mapping.
Maybe I know just enough to be ignorant of the topic, but LUTs are just lookup tables. People working in film apply LUTs to the footage they're watching in realtime on normal PC hardware. Fragment shaders have been common in games for years. Last year I was working on a VR project (targeting 11ms per frame) and I think the post-processing took between 0.25 and 1.5ms--any more and frames would have been dropped. For a desktop game, having 16 or 30ms to render many fewer pixels is an eternity, comparatively. The only difference between a LUT and an ICC profile (which most OSes support out of the box) is that ICC profiles are meant to be calibrated against either an input or an output--and they're applied everywhere, in realtime.
I've never actually used ReShade, so I had to look it up. It does way more than color correcting. Depth of field, ambient occlusion, and antialiasing are completely separate from tone mapping and notoriously computationally expensive (not just in video games, but in film as well). I also suspect that since third parties are writing them, they're not always tuned for performance.
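For what it's worth, the cheapness claim is easy to see in miniature: even a full 3D color LUT applied with trilinear interpolation is just eight table reads and a handful of multiplies per pixel. A sketch using an identity LUT (17 is a common grading-LUT resolution; everything else here is illustrative):

```python
import numpy as np

# Sketch of a 3D color LUT with trilinear interpolation -- the operation
# both film LUTs and in-game color grades boil down to. The LUT here is
# the identity (output == input); a real grade would store shifted colors.
N = 17
grid = np.linspace(0.0, 1.0, N)
lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)

def apply_lut(rgb, lut):
    n = lut.shape[0]
    f = np.clip(np.asarray(rgb, dtype=float), 0.0, 1.0) * (n - 1)
    i0 = np.floor(f).astype(int)
    i1 = np.minimum(i0 + 1, n - 1)
    t = f - i0
    out = np.zeros(3)
    # Blend the 8 surrounding lattice entries: trilinear interpolation.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((t[0] if dr else 1 - t[0]) *
                     (t[1] if dg else 1 - t[1]) *
                     (t[2] if db else 1 - t[2]))
                idx = (i1[0] if dr else i0[0],
                       i1[1] if dg else i0[1],
                       i1[2] if db else i0[2])
                out += w * lut[idx]
    return out

# The identity LUT returns the input color up to float error.
print(np.allclose(apply_lut([0.3, 0.6, 0.9], lut), [0.3, 0.6, 0.9]))
```

On a GPU this whole function is a single filtered 3D texture fetch, which is why applying even film-grade LUTs in realtime is cheap.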
A typical developed HDR photo is a single sample taken after the eye has adapted. There are [hopefully] no over- or under-exposed areas.
The dynamic range in HDR games is often used to simulate the entire process of eye adaptation. It's a function over time that intentionally creates over- or under-exposure some of the time.
I worked in animated feature films and had similar critiques about the lighting and look. I felt like they lit everything like a well-exposed photograph and were afraid of letting things fall off into dark or light (part of that could have been poor simulation of the toe or shoulder of the film curve).
Then, when Pixar did Wall-e (2008), they hired a well-known cinematographer, Roger Deakins, as a consultant. DreamWorks also hired him for How to Train Your Dragon (2010). I think that was a big change in how both of those studios approached the look of their CG movies.
Maybe a little snark is good. These big studios have been pumping out bigger and louder games for a decade, and each one is more soulless than the last. There's a reason Minecraft took off the way it did. It's because it was a return to the visually appealing age when pixel colors were in some sense manually selected, resulting in a pleasant palette. I can say without a hint of irony that I prefer the look and feel of the original Doom to any modern blockbuster game.
Are you here to tell me that Minecraft and Doom are paragons of bad gameplay?
I could go on for a while about how Zelda BOTW manages to integrate the design and aesthetic choices, but I'm going to get mugged if I try to use that game as an example again. Maybe I'll use it as part of a different write-up on visual design.
Similarly combat in BOTW happens up close, you don't actually need to distinguish things at range. This is not generally true, especially for the other titles you're comparing BOTW against. BOTW also has problematic cases such as this: https://static.gamespot.com/uploads/scale_super/1552/1552458... That rock face is horrible. Totally washed out in a highly unrealistic way, to say nothing of the ridiculous saturation of the torch light.
I guarantee game developers and game artists know a hell of a lot about HDR and color and lighting, and all kinds of things related to it, probably more than most people involved with movies. They just come at it from a different viewpoint - that of real-time rendering.
The problem with the game doing fake-HDR is that while it looks great in trailers (or... at the very least, looks like the engine is doing something fancy with lighting, "great"-ness notwithstanding) it doesn't match what my real eyes would do in a situation. Unlike film, a video game doesn't have control over where I'm going to point the camera. I do. And sometimes, I need to see over that ridge, but the detail in the shadows is important too, and the game camera can't see what I'm actually looking at (with my eyes) and adjust accordingly.
Breath of the Wild is a great counter example, because it doesn't really try to do the HDR adjust thing at all. It doesn't really need to; sure its daylight scenes are a bit washed out, but real daylight is a bit washed out too. My eyes adjust, just like they would outside, and I can actually see what I'm doing. That's important! With games, tempting though it is to control the camera effects just like you would when shooting film, you can't control the player's actions. It's usually better for gameplay to mellow out the effects, so the player can focus on their actions without being distracted by the 'pretty' effects.
An especially egregious offender in this category is Battlefield 1. When looking out of the driver's hatch of any vehicle, the game will randomly and chaotically fade its "HDR" simulation between the exterior being discernible and a white rectangle (the hatch) set on a solid black background (the vehicle interior). The fade depends on minuscule adjustments in the PoV of the player, which is fixed relative to the outside world, not the vehicle, so you have to constantly track the view port.
Battlefield 1 generally looks best in scenes featuring low (natural) contrast.
> Yeah, and we screwed that up too. We don’t have the technical capability to run real film industry LUTs in the correct color space
(The reason being that even fairly simple color correction pipelines in a real color grading tool keep a mid-range GPU pretty busy at 1920x1080 and higher resolutions).
Of course, this is also how eyeballs work...
So, adjusting the curves, especially at a slow, steady rate? That might be a good idea, and should actually aid visibility if it's done right. Say, walking all the way into a dark cave should lighten things up a bit, since your real eyes (and most cameras) would do exactly that. On paper, this part of the HDR effect works, and if it's done well it enhances the gameplay rather than distracting from it.
Imitating a faster, more jarring camera transition when it fits the mood, especially during cutscenes? That's fine, and the article gives some great examples of this being done well. Scaling up the contrast so that any HDR changes cause you to either lose all the detail in all of the darks, or all the detail in all of the lights? That's the problem. The effect, if it's there at all, needs to be a lot more subtle. This is partly due to limitations of the final RGB space (only 256 values per channel to work with, and weird curves to boot) but mostly due to the fake-HDR effect just being a poor imitation of nature.
That aside, I found the discussion about the misuse of ACES particularly interesting. It's like the completely misunderstood the tool and how it's intended to be used (which isn't uncommon with the tech unfortunately).
Everything is blurry, you can't see far, you get dirt over your vision and there are glares and lens-flares all over.
What's worse is that this is done on purpose, in almost all games. They must think that this is what real life looks like.
A lens that completely separates color channels is defective. It was defective in 1950 and it's defective today, so unless your goal was to replicate the look of hopelessly broken hardware, it's in no way "vintage".
Which I think goes more generally into my issue with game graphics - the taking of interesting effects (depth of field, chromatic aberration, lens flares, etc) and cranking it up to 11 for its own sake.
Also, AA is another thing I generally turn off. It gives quite a performance boost and the resolution these days is high enough that I'm not bothered by it.
> So why the heck do most games look so bad?
And then they go on to pick a game that people are specifically calling out for looking amazing: HZD.
In fact, the problem seems to be:
> But all of them feel videogamey and none of them would pass for a film or a photograph.
To which I would reply: because they are video games. A photograph doesn't pass as film, and film doesn't pass as a photograph (though the two are related by nature, film essentially being a lot of photographs). A video game must serve a different purpose than a photograph or a film. The players interact with the game, and therefore the intent and purpose are different. Glares on the screen aren't just there to add lens flare, but to make it harder to see, because that's part of the challenge. It is harder to see, and you have to deal with that to overcome the challenge. That can be offset by keeping the sun to your back. Same with grime on the screen, or other such things. You want to highlight the things players need to react to, and in such a way that allows them to react quickly.
There is so much more, and yet so far this article is only referencing the pure look and seems to ignore the goals of the medium. I'll reserve judgement until I read the next parts of the articles, but until then, I can't help but wonder if this really matters.
Failure at realistic game graphics is so common, so taken for granted, that the scale of artistic accomplishment shifts downwards: if HZD looks only slightly bad, not constantly, and without disgusting the player too much, it's considered to have "amazing" graphics.
Until graphics and animation are photorealistic I prefer my games to look videogamey. All those 4 screenshots are beautiful and I wouldn't want them any other way.
There is a reason why Skyrim, Mount&Blade, GTAV, etc are still dominating playtime charts years after release. Good enough graphics + deep, flexible mechanics is a much better combination in my opinion.
The bar for "great" graphics is going to change as new hardware comes out, but it's a stretch to say those games were not pushing rendering on their target minspec systems (Xbox 360 and PS3) at the time of release; there aren't any examples of open world games that looked better than either of those games and ran on that hardware.
I know it's fun (although tired) to jump on any thread about game graphics and say "Graphics don't matter, games with great graphics don't have good gameplay!", but sharing knowledge about how to tune and implement postprocess to get better image quality isn't hurting gameplay quality and can lead to games both looking and playing better.
Those are all sandbox games that you effectively never stop playing. People tell me that there's a story and an endgame in Skyrim, but I've played a few hundred hours without doing any main-line quests past the first city. Mount&Blade I've sunk even more into - there isn't even the pretense of a story there, so you're pretty much on your own if you want to role-play, or just take in the enjoyment of riding around, hacking at people, and looting their corpses.
Anyway, the point I was going for is that the last thing we need is them spending even more effort trying to make pretty-looking but mediocre-playing games.
Yes, that sunset is pretty, but now I can't see anything else. I do not want to reproduce my frustrating commute into the sunset in a game.
Another pet peeve of mine: ambient occlusion (crude GI approximation) and especially SSAO (crude approximation of a crude approximation).
Corners don’t look like that:
(Ambient occlusion proper and in moderation I think looks fine, but most games really overdo it.)
That's a huge leap, but it has the benefit of being true.
(To be precise: no one has ever created fully-simulated video capable of fooling human observers anywhere close to 50% of the time. The video has to be reasonably long (>30sec) and complex. But in a double-blind test, almost any nature video will handily beat any synthetic video.)
Out of curiosity, is this from something? It sounds like you're citing something, and for realtime I'd agree. I think people are fooled every day by raytraced images (movie CG, etc.).
> no one knows how to make any fully-simulated video look indistinguishable from reality.
I'd say we know how to do it, we don't know how to do it fast.
If all I do here today is hatch an egg of doubt in your head, I would be delighted. Someone needs to carry the torch. My life has turned to other interests.
It's the central problem. It's as hard as AGI, and it might be as impactful as the invention of the airplane.
Think of it. Fully-simulated video, indistinguishable from reality.
In many ways, it was my first love. The desire to be a gamedev drove me to learn programming. I ended up a graphics dev -- Carmack's old path. My job was to make a certain game engine look better. What an innocent problem... 12 years later, I still feel the pull, the need to blow off everything in my life and bend a computer to my will. To our will. The human mind has never once achieved this goal. And it was the perfect problem... I never cared much about fame or money. But to be the first. Think of it... How can you not want to spend the rest of your life on this? The solution is out there, taunting us. Everyone is pursuing physics, when all we need to do is pursue the fact that video cameras can already generate images that look identical to real life.
The ancients have made up stories about the sky and stars since long before civilization. Put yourself in their shoes, if they even had shoes.
Look up. It's the night sky, far brighter than anything we can see today. Imagine staring up at the infinite complexity, wondering, how does it work? Why do the stars go the way they go? We tell stories; could one of the stories be right?
A few millennia later, one person had an idea: What if we watch the stars very, very carefully, and collect the stories very carefully? We could compare the movements of the stars to the stories, so that the alternative theories might be distinguished from one another.
This was the key to modern science, and the root of wisdom. When you stop thinking about what everyone else is doing, you're free to hit on solutions that everyone else overlooked.
And think of the feeling you'd get when you finally solved it. Can you imagine? You'd get the same rush as the Wright brothers, or Ford when he made the assembly line, or McCarthy when he stumbled across Lisp.
If you or anyone else intends to take on this challenge, know this:
The fact that no one believes you when you say "No one has ever done this, and no one has any idea how to do it," is your biggest advantage.
It means you're free to spend the next five years figuring out that solution that everyone else missed because they were too busy chasing the pipe dream that if you throw enough physics+time at a computer, it will produce synthetic video that fools people into thinking it's real.
The moment people realize that it's probably not that hard, you'll lose your advantage, because every top tech company will start exploring your problem space. Like if in 1902 you'd hinted to a top university the gist of Einstein's thesis. No one would take you seriously. Lucky for you.
So, what's the secret technique? Well, if I knew that, I'd have fulfilled my 12-year dream. But I know a few things that will move you (12-N) years toward the goal.
There is one rule, and one rule alone. You have to force yourself to stay true to it, or else nothing else you do will matter. Here it is:
If you get a dozen people together, and show them a mix of 10 real videos and 10 simulated videos, and those videos are reasonably complex, like clips from a nature documentary, then 12 out of 12 people will effortlessly call out your fake videos as fake and your real videos as real. It's not even close. That's how far away we are from the goal of fully-simulated video indistinguishable from reality.
Maybe I've hooked you at this point. Maybe not. But if anyone comes up with a way to fool those 12 people so completely that their responses are no better than random chance, you win.
Let's call this the "Carmack criterion." If you tried administering the above test to a dozen clones of Carmack, here's how they'd sound: "That's a fake. That one's real. Fake. Fake. Real. Real." No matter how much ornament or showmanship you throw into the video, you can't fool Carmack. He'll report whatever his eyes are telling him.
And as of 2017, he'll be right 100% of the time. His eyes would shout: "None of those fakes were even close to real! Are you kidding? One of the real videos was of a lion taking down a gazelle. I know every artist in the gamedev industry. None of them have ever produced anything approaching that level of quality, even working together."
That, and that alone, is the game. Literally nothing else matters. If you can fool people until their responses are statistically identical to RNG, you've done it. You're world-famous. Yer a wizard.
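"Statistically identical to RNG" is easy to make precise. As a hypothetical sketch (the observer and clip counts are made up for illustration), a plain binomial tail probability tells you whether a panel's correct-call rate is distinguishable from coin flipping:

```python
from math import comb

# Hypothetical sketch of scoring the "Carmack criterion": with 12
# observers each classifying 20 clips (10 real, 10 fake), are their
# correct-call counts consistent with guessing?
def binom_p_at_least(k, n, p=0.5):
    # P(X >= k) for X ~ Binomial(n, p): chance a pure guesser does this well.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_trials = 12 * 20      # total classifications across the panel
correct = 240           # today's reality: everyone gets every clip right
p_perfect = binom_p_at_least(correct, n_trials)

# A panel at roughly chance level (~half correct) is unremarkable; a
# perfect panel is astronomically unlikely under the guessing model.
print(binom_p_at_least(120, n_trials) > 0.5, p_perfect < 1e-60)
```

You've met the criterion only when the panel's accuracy lands inside the fat middle of that distribution, where the guessing model can't be rejected.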
Corollary: you can use the Carmack criterion like a compass for every decision you face. Should you research physically-based rendering, or try to apply machine learning? The latter seems unpromising. Yet Hollywood has been administering the Carmack test to millions of people, most recently with Avatar, which completely rules out physically-based rendering. So we know to spend zero time on it.
As you can see, that razor is so sharp that it will cut away every illusion you might try to cling to that humans are anywhere close to achieving the Carmack criterion. Or that some smart hacker somewhere has a pretty good idea of how to achieve it, or that it's just a matter of letting computers advance another few decades, or any other false reason that those around you like to tell themselves.
But if mainstream ideas are dead-ends, then what should we research?
I hesitate to give concrete suggestions, because the history of science demonstrates that progress isn't made like that. Whatever the real solution is, it's far beyond anything you or I can imagine today. People were forced by mathematics to believe that planets' orbits were elliptical. An ellipse is the only shape that makes the numbers come out right. Yet how many of our ancestors came up with that idea? Even by accident, it's probably too bizarre for anyone to seriously consider it. Not without mathematics.
Yet that's a positive statement: It meant that if someone were audacious enough to trust in mathematics alone, they could determine the right answer. The solution was always there, waiting for you to find it.
To make any progress at all, your ideas will need to seem shockingly different. The whole world has spent two decades going over every inch of physically-based rendering -- presumably hoping that if they put on a different pair of glasses, maybe they'll spot something other than a mountain of evidence that it doesn't work.
So you have to let yourself consider every angle, no matter how strange.
720 frames of 720p video. That's all you need. That's 30 seconds of HD footage. Get a computer to conjure up those 720 frames. Summon DaVinci's ghost, and you win.
Whenever someone finally solves this, you'll think "Oh, right. That technique makes sense." But it only makes sense because you see it works. Till then, that correct answer will seem to be a complete waste of time.
Think of Airbnb, and how awful their idea sounded. Yet when someone spent a couple years exploring the problem space, shazam! Out popped a billion-dollar company.
Since there is ~zero chance these ideas are anywhere close to the right answer, here are the two avenues I left off with:
1. A video camera generates images that pass the Carmack criterion. Ask yourself: why do those videos look real? And why is it so important to judge video, not photos? (It's crucial.)
This is key: Are you absolutely certain you should be ignoring the fact that any old camcorder's videos look real? Whip out your phone. Take a video. That video looks real. Why? Quantify the difference vs footage of the latest game engine.
(Try to avoid using the latest movies as a basis of comparison, because movies mix real-life footage into their VFX. Our criterion of "fully-simulated video" is strict by design: it keeps us honest about our progress. Especially to ourselves.)
2. After meditating for a year on why crappy cellphone videos look real, you may start thinking along the lines of "how can I write a program to mimic the essence of that realism?" It looks real because the colors are exactly right. Think of evolution, and how long we've been evolving. That whole time, every single one of our ancestors was staring at images that they believed were real. Our brains are wired to notice even a hint of strangeness. ("When we notice there's something strange about a video, what exactly is going on there? What do we mean by that?" is another "fun" question.)
Now, wouldn't it be handy if someone knew how to write a program that can mimic real-life data? If only such a technique were possible... We even have an infinite stream of pre-classified data to feed it: phones and webcams.
Maybe that could bring you closer to understanding what the important factors are that the current graphics pipeline can't do. Why is it hard to get the pixels in the simulated video to be the same color as the cell phone ones? Are the materials off? The shadows? If you can't even make the teapot look real, then you've zeroed in on something fundamental that's still going to bite you when you're busy trying to rig antelope skeletons.
Of course it's ridiculously simple, so you may want to increase the bar a little to impress GP :p
If it gives you any sense of hope, I was actually paraphrasing Carmack himself as part of the source of "we already know how". I believe from a Quakecon Keynote (I'm not sure what year unfortunately, the context was PBR and approximations).
The real thing I'm not certain of is: do we want to live in a world where we have the ability to create, in realtime, video content entirely indistinguishable from reality?
The entertainment and simulation benefits would obviously be amazing. The ability to recreate phenomena we can observe in science but can't see, etc.
But there is also the possibility to weaponize that. Seeing is believing, and what do we do in a future where we can't trust anything we see?
Part of me is glad we don't have to ask ourselves these questions yet.
Does that need qualifying with "in a reasonable time frame and/or budget"? Or is it really just that bad?
If you bet $3,000 that "we have no idea what we're doing" is an accurate assessment, you'd win.
How can that be? Because color science is very difficult. Your eyes are designed to fool you.
When you're born into a certain time period -- a random slice of human history -- the probability the dominant school of thought is mistaken is nearly 1. Wouldn't it be remarkable if we were the first generation who figured out all the truths?
The hardest part is admitting to yourself that it might be true. Could it be possible? Has the world collectively been using techniques that are nowhere close to the final answer?
I launched myself into that question with an open mind. As far as I can tell, the answer is yes.
From having worked in the industry, it's pretty accurate to say that most graphics programmers haven't read any books on color, or the human visual system. I nearly didn't. I was dragged into it because I kept getting strange answers when I tried to mix colors and quantify the diffs -- I was trying to do the same experiment that CIE 1931 did, but I got very different results. That led me to the Munsell color system, and to the history of color theory.
If you glance over the history, you'll notice that our understanding of color keeps changing. The models keep being updated; we can never quite figure out whether they're right. If CIE was perfectly accurate, we'd never have invented LAB space, because CIE would perfectly match nature. Right?
Munsell tried a different approach. Rather than coming up with a fine-sounding theory and curve-fitting it to the data, he built a model directly from the data. One of the most powerful techniques at our disposal is to use our own eyes as a null instrument. You have to have absolute confidence in your own judgement -- I cannot overstate how easy it is to fool yourself -- but if you are as methodical as a robot, you can come up with surprising answers. When those answers contradict the established science that everyone believes, you start to worry. Maybe you weren't careful enough, right? They must know what they're doing; this is what everyone believes, after all.
No... At the end of it, you discover that it really is that strange. Color science is one of the hardest fields to quantify. There are hard answers, but only when you strip away all the context your eyes rely on. When you look at something, you see literally a million clues that tell your brain it's a 3D shape and that X color is brighter than Y because of Z. Nature has spent a billion years evolving your brain to process all of that instantly. It's impossible to be consciously aware of everything that's happening.
The only way to answer your question is to (a) come up with a methodical test, (b) conduct it meticulously, then (c) trust in yourself and the fact that you are competent and were extremely careful.
If you do all three of those things, you will be dragged kicking and screaming to the conclusion that not one person anywhere in the world has any idea how to generate 100% synthetic video. We don't even know where to begin. No one knows even roughly what the final techniques might look like.
Think about how integral a good artist is. Every rendering pipeline in the world is built for artist flexibility. When a talented team of artists feel empowered by the tools you write for them, they end up producing a different kind of movie altogether. It's not a matter of degree. The reason movies look incredible is because artists mastered the tools we make for them. That's their role, and this is ours. Both halves are crucial.
Yet what does that imply? Imagine we invent a program that produces perfectly real video. People think it looks like a nature documentary. Now think about everything an artist does in a modern pipeline: they decide which shaders to use. Which materials to apply, and to what. The base color of everything. The shape and the animation. They arbitrarily select the physics. In the real world, when grass changes color from green to brown, it's because the material itself is changing; in a renderer, everything that makes light bounce off grass in a way that looks real is just a set of parameters that artists tweak "till it looks good." It's arbitrary. It makes no sense to say with a straight face that we've created a "physically based renderer" when the artists have complete authority to break every assumption and piece of data that those physics simulations were modeled from.
The fact that they have so much flexibility is a strong hint that we are very far from mastering this. If artists' jobs were mostly identical to a set designer's job -- placing lights, arranging the scene -- then our renderer must look so real that it may as well be reality, right? If it looked perfectly real, there would be no reason to change it, except as a stylistic choice (which is fine, but it's unrelated to our goals).
Now, people will immediately try to convince you that there are engines out there that work that way. Artists are mostly set designers, they say. But all you have to do is look. Take a clip from whatever movie they produced, and put it side by side with a nature video. Then put it next to the most visually cutting-edge movie you can think of. An honest assessment will show a striking difference.
Lack of flexibility kills the art. The flexibility is the only technique we have. The fact that you can get really talented artists together and give them highly advanced tools, and they end up spitting out stuff that looks real -- it's not inherently obvious that we should've been able to invent those techniques! The fact that it's possible at all is amazing. When graphics programmers believe in the ideology of physically-based rendering, they become slaves to hubris. They start thinking it's reasonable to take away the only tools that work.
Ask yourself this: When an artist is free to flex all the parameters until it looks real, what's going on there? What does that mean, in a fundamental sense?
It's a deep question, and I still haven't come up with a complete answer. But I think it's reasonable to say that artists attempt to make the output on the screen match the output that a video camera would have recorded, if the scene were real. Yes, they make a few stylistic tweaks, but all of it still looks awesome. That's why people pay to experience it. It's partly why Star Wars was such a hit. It was believable.
And that, my friend, is the real question. Asking "Do we know how to make something look real, if only we spent enough money or CPU power on it?" turns out not to make any sense. Counterintuitive, yet true. People care about making movies or manipulating images in Photoshop or making games look awesome. They don't care about wasting time trying to coax the computer into generating video that can fool an audience -- they already have a thousand techniques for fooling them! Why generate it when you can mix in actual video from the real world?
As strange as it sounds, I think the full answer is: no one realizes we have no idea how to generate video of complex scenes indistinguishable from reality in a double-blind test, because there's no money in it. Not yet. If you happen to invent it, your company might make a million dollars. But you're more likely to lose a million by trying to achieve that objective.
What about scientists? Surely some of them must have spent their lives trying to answer such a deep and fundamental mystery?
Yes and no. There is a lot of impressive work out there, but scientists are mainly concerned about getting published. Their careers are at stake. If you don't publish, you can't get funding, and your impact comes to an end. And the problem is ambiguous: what does it mean to publish a paper related to the idea that people don't know how to generate synthetic images that look real? Everyone already knows that! You can't write a paper on that. The best you can do is try to come up with a paper about an incremental improvement.
And that's exactly what we see. It's all we see. Negative results in science are mostly discarded -- much of the time, we simply don't hear about them. I am speculating, but I think this would be even worse in color science: it's not very prestigious work. When you run an experiment to validate the CIE model and end up with wildly different answers, what do you do? As a scientist with a deadline that may literally kill your career, how likely are you to chase down this mystery? Or to have the freedom to suddenly pivot, and to make the paper about that?
It felt strange to realize no one knows how to make 100% synthetic video look real. It's like sinking up to your neck in quicksand: an inescapable conclusion, and consequences people would rather not dwell on.
Focus on the data. That's the key. Not opinions, not what the professionals believe, but data -- hard answers, obtained from careful experiments with large sample sizes. Follow the data, and you arrive at some very unexpected truths.
It's hard to know what to even do with the information. What do you even say? I would've dismissed this at 23. "What are the chances that everyone in the world is being sent to college to learn the wrong techniques? And what about all the published research? The guy who inspired me to become a graphics programmer worked so hard on his graphics engine. He spent years thinking about it. You're saying he has no idea what he's doing, and that all the techniques are fundamentally flawed? That they're not even kinda-sorta close?"
All I can say is, look at the data. Pretend you're piloting a plane in complete darkness. You either trust your instruments -- your carefully-designed experiments -- or you don't.
I’ve never played HZD in anything but HDR mode (in fact, I bought the game specifically to test HDR) so admittedly I don’t know the difference there. All I know is in HDR mode, HZD is one of the prettiest games I’ve ever played, with fantastic colors and brilliant transitions in luminance.
HDR I feel is one of those things you just can’t appreciate in screenshots or comparison videos. You have to get a good screen (I can’t see myself coming back from OLED, it’s – no pun intended – a game changer) capable of HDR and just appreciate it for yourself, it’s really something.
I’m sure it’ll be even better as technology matures and game developers become more familiar with it, and as well the market penetration of HDR capable screens increase, but to say they don’t know what they’re doing now seems a bit of a stretch. Especially if all you have to go on are screenshots.
I also get a very strange sense of depth from those images -- it feels like everything is either very close or very far away, without having a sense of how far apart objects actually are.
In comparison, in the BOTW screenshot, I can clearly identify objects of interest (a shrine, a tower, a volcano, a river, a bridge) and have a sense of where they are compared to each other and myself.
That said, I feel like it's not a completely fair comparison -- the "bad" images all seem to have the camera in a dark area looking out and up into a bright area, while the BOTW camera is on top of an exposed mountaintop looking out into the sunset. The lighting is going to look a lot different regardless. However, the HZD image after _that_ is really quite garish and harder to justify.
These effects are beautiful in Colossus because they were chosen for artistic reasons and they are artistically used. The lack of color saturation complements the game's lush, but somehow bleak and foreboding, natural environment. The extreme contrast highlights the differences between the dark and dank indoor environments of the various ruins and temples, and the raw untamed wilderness that lies outside them. And the dynamic exposure adjustment effect is, vaguely, supposed to parallel the main character's eyes adjusting as he moves from one kind of environment to another. (Either that or just "hey look, we can do this on a PS2!")
It's kinda like how the parallax line-scrolling effects in Shadow of the Beast were amazing in that particular game at that particular time, but take the same effect in a shitty game -- like, say, Bubsy -- and it just looks awful and tiresome.
Edit: As far as I know, render engines use the complete dynamic range until the last step, which converts it to a 16-bit output range. So maybe games could let the user decide which tone mapping algorithm to use.
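As a sketch of what that final conversion step could look like, here is a minimal Reinhard-style tone map in Python. The operator choice and the `exposure` parameter are illustrative assumptions, not any particular engine's code; the point is that the HDR buffer upstream is untouched, so swapping in a different operator here (and letting the user pick one) is plausible.

```python
def reinhard_tonemap(radiance: float, exposure: float = 1.0) -> float:
    """Compress an unbounded HDR radiance value into display range [0, 1).

    A stand-in for the last step of the pipeline; a real engine might use
    ACES, Hable, or another operator here instead.
    """
    x = radiance * exposure
    return x / (1.0 + x)  # classic Reinhard: monotonic, never clips
```

Because the operator is monotonic and only runs at the very end, the choice is cosmetic from the engine's point of view, which is exactly why exposing it as a user setting would be cheap.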
This is also why a lot of people prefer older FPS titles, and why games like TF2 use stylized character profiles that stand out. In some games you want interpreting the scene to be a challenge, but in most games you want players to easily tell what they are looking at. High contrast, distinct models, distinct animations, and color differentiation all help achieve that.
But there's a small minority of games that do look great:
I don't think it's fair to put Shovel Knight in your list.
Yacht Club Games tried really hard to recreate NES aesthetics and constraints (limited color palette, sprites, etc.), with careful deviations only where really necessary. I don't think you'll find a more aesthetically pleasing game in the NES library.
Hence my mention of historical reasons that are holding game aesthetics back. I think making good art requires criticizing your childhood tastes, not following them.
> But all of them feel videogamey and none of them would pass for a film or a photograph.
I would definitely prefer games to look bad (or preferably stylized in a way that works for the game) over having a somewhat realistic-looking game (only when you're standing still, so the frame rate doesn't dip) with terrible mechanics and a bad story.
Far too much information and talk about making games look better, and very little about how to make games actually better. Modern gaming to me has become the same as clicking the X on hundreds of popups from your Windows 98 machine. Just with somewhat realistic-looking ugly faces instead of X's.
I'm not sure, but I think one of the issues is the 16-235 channel remapping
or there is some problem in the pipeline between uploading your video and it getting reproduced on screen by the player inside a web browser
videos exported from e.g. FCPX have colour profile attached and look right when played back in Quicktime desktop player, but the contrast always gets messed up when it comes back through YouTube
this is a separate issue from compression artefacts
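For illustration, here is a minimal sketch of the 16-235 ("limited range" or "TV range") remap mentioned above, assuming 8-bit values; the function names are made up:

```python
def full_to_limited(v: int) -> int:
    """Map a full-range [0, 255] value to limited-range [16, 235]."""
    return round(16 + v * (235 - 16) / 255)

def limited_to_full(v: int) -> int:
    """Inverse map; values outside 16-235 are clamped first."""
    v = min(max(v, 16), 235)
    return round((v - 16) * 255 / (235 - 16))
```

Applying the forward conversion twice somewhere in the player/browser chain (or skipping the inverse) is the classic symptom: blacks get lifted and whites get crushed, which shows up on screen as exactly the washed-out contrast described.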
But using stark moral terms for aesthetic judgements seems rather insular at best. When I think about games like Minecraft, Factorio, RimWorld, and so on, I wonder if maybe imitating movie-quality graphics is really as important as the author thinks it is? It's only a small part of the experience.
And HDR doesn't really do anything to materially aid what he's calling out as well done in the first place. It's ancillary. It's not important. What's important is actual artistic intent and structure. That's what's lacking, along with a misunderstanding of why certain techniques work in film and don't when they are uncritically applied to games.
P.S. Unreal Engine reintroduced forward (non-deferred) rendering as an option last year because it's more efficient in VR.
A few months ago I hacked tonemapping straight into the ubershader of some FOSS Quake client; you don't even need a separate buffer to do these things.
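In spirit, "tonemapping straight in the ubershader" just means the operator runs as the final lines of per-pixel shading instead of as a second pass over an intermediate HDR buffer. A rough Python sketch of the idea (the lighting terms, the Reinhard operator, and the gamma value are all illustrative assumptions, not the actual shader):

```python
def shade_pixel(lighting_terms: list, exposure: float = 1.0) -> float:
    """Shade one pixel and tonemap inline, with no intermediate buffer."""
    hdr = sum(lighting_terms)                      # unbounded scene value
    x = hdr * exposure
    mapped = x / (1.0 + x)                         # inline Reinhard
    return mapped ** (1.0 / 2.2)                   # gamma-encode for display
```

The trade-off is that a second pass lets you compute scene-wide statistics (e.g. average luminance for auto-exposure) before mapping, which a purely inline version gives up.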
I remember one game ("Through the woods") which had this problem in a very aggressive form. It was very visible while playing, but it's hard to find a screenshot.
Here, for example, the house on the left (but also the cabin in the center):
What's the exact issue? Is it a tone mapping problem? I'm curious, because I have a rather good monitor, and some dark games (eg. "Alien: Isolation") are equally dark, but don't look as bad as this.
I'm not sure this is the case in your example, but a lot of render engines use an environment light when rays don't detect a collision.
So the camera is looking at the wall of the house and bounces into oblivion. Therefore the environment values are used. And since the stones of the house are gray a lot of environment light bounces back to the camera.
But when the camera is looking at the trees a lot of rays bounce to leaves and the ground returning almost no light back to the camera.
I would even say that your example looks realistic (although way too dark).
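The miss-shader logic described above can be sketched like this; all of the values and names here are made-up illustrations, not any engine's actual numbers:

```python
ENV_LIGHT = 1.0  # environment ("sky") radiance used when a ray misses

def trace(hit_albedo):
    """Radiance toward the camera for one (very simplified) ray.

    hit_albedo is None when the ray escapes the scene without hitting
    geometry, in which case the environment light is returned directly.
    """
    if hit_albedo is None:
        return ENV_LIGHT               # miss: fall back to the environment
    return hit_albedo * ENV_LIGHT      # one-bounce reflection of the sky

grey_wall = trace(0.6)    # stone bounces much of the sky back: bright
leaves    = trace(0.05)   # foliage returns almost nothing: near-black
```

This is why, in the screenshot, the grey house wall reads much brighter than the surrounding trees even under the same lighting.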
But yes, Arri does have a beautiful soft look, along with Fuji digital cameras, and plenty of film stocks do as well. It all depends on taste. Gamers aren't going for soft reality. A lot of hard-edged gamers are going to go for the high-contrast, desaturated ExTreMe ChaLLeNgE look.
Yes, this indicates the article was poorly written. I advise you move on rather than reflexively attack nothing.