Correct SRGB Dithering (thetenthplanet.de)
81 points by ingve 6 months ago | 31 comments



This is not correct dithering, because uniform noise is not correct.

You need triangular noise, and it somehow has to be aware of sRGB. Why? Because otherwise the variance of your dither is not uniform across the signal range, which gives noticeable artifacts. So it's not correct to choose only from floor(c) and floor(c)+1; you actually need to choose from round(c) - 1, round(c) and round(c) + 1 with the right probabilities.

I've gotten stuck on this for days, and even tried to ask on stackexchange, but to no avail: https://math.stackexchange.com/questions/3200249/maintaining...

Yet this is important, as dithering with triangular noise gives vastly superior results. This is 5 grayscale values with the best method I've found (which is not fully mathematically correct, but a close approximation):

https://i.imgur.com/sNq20Oo.png

The yellow line indicates the sRGB signal, the red one the error, and the blue one the variance. Now compare it to OP's method:

https://i.imgur.com/FjBUuCr.png

Notice the significant banding? It's because there are regions where the variance of the dither essentially hits zero, producing a (near) pure color band that's very noticeable.
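
To make the variance point concrete, here's a minimal numpy sketch (my own, not the plotting code; it ignores sRGB and works in a plain linear space): it dithers a ramp over 5 code values with 1-LSB uniform noise and with 2-LSB triangular noise. The uniform variant has error variance dipping to zero at the code values (the bands above), while the triangular one stays at a constant 0.25 LSB^2 (endpoints are left unclipped here).

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100_000

    def dither(c, noise):
        # c is the target in units of quantization steps; round after adding noise
        return np.round(c + noise)

    cs = np.linspace(0.0, 4.0, 65)   # a ramp over 5 code values
    for name, sampler in [
        ("uniform, 1 LSB",    lambda n: rng.uniform(-0.5, 0.5, n)),
        ("triangular, 2 LSB", lambda n: rng.triangular(-1.0, 0.0, 1.0, n)),
    ]:
        mean_err = np.empty_like(cs)
        var = np.empty_like(cs)
        for i, c in enumerate(cs):
            out = dither(c, sampler(N))
            mean_err[i] = out.mean() - c   # ~0 for both: the mean is always right
            var[i] = out.var()             # uniform: dips to 0 at integers; triangular: ~0.25
        print(f"{name}: max |mean err| = {np.abs(mean_err).max():.4f}, "
              f"variance range = [{var.min():.3f}, {var.max():.3f}]")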


Which math package are you using? Mathematica, Macsyma, ...?

Also, I have uBlock Origin, and the imgur .png link shows me the page without the PNG... (EDIT: I just used wget on the exact same URL and it downloads the PNG directly; it boggles my mind why Firefox doesn't just show the PNG, or is this imgur detecting the user agent, etc.?)

Also, that's a good observation that the spatial distribution of variance is noticeable.

Would you mind sharing the code for the last 2 plots you generated?

For incorporating gamma, and using the original HN post terminology, what prevents you from generalizing your technique to the optical (linear) domain instead of the electronic (gamma) domain?

Is it because you need to devise a new family of probability functions so that the expectation value in the optical domain is linearly proportional to the input optical value while having the bins spread according to the gamma curve?


> Would you mind sharing the code for the last 2 plots you generated?

I just quickly edited my code to have OP's method, but didn't keep it after generating the plot. The code is generally a mess, because I kept experimenting with different methods, but this is the current version: https://www.shadertoy.com/view/wlX3RS

> For incorporating gamma, and using the original HN post terminology, what prevents you from generalizing your technique to the optical (linear) domain instead of the electronic (gamma) domain?

Because I believe it is only thanks to linearity that 'dither + quantize' is a valid operation that does what you want. As soon as you enter a non-linear space this no longer holds, and you must treat "dithered quantization" as a single operation, rather than as adding independent noise followed by an oblivious quantization.

So the issue is that our quantization must happen in sRGB space - our output space. Yet the probability distribution according to which we'd like to quantize follows rules formulated in a linear space, which requires a non-trivial and non-linear conversion.

There is a very large family of probability distributions that have the appropriate mean (mean computed in linear space), but I believe if we also try to minimize the fluctuation in variance there should pop out a unique solution. In a fully linear world this is the triangular distribution.


So if I understand correctly, it's a mathematical question:

What function family p(c,k) for integers k satisfies the following conditions:

    E_k[ e2o(k) ]  =  sum_k p(c,k) * e2o(k)  =  c                          (for nearly all c)

    Var_k[ k ]  =  sum_k p(c,k) * k^2  -  (sum_k p(c,k) * k)^2  =  constant   (for nearly all c)

i.e. the expected variance should be constant in the perceptual (gamma compressed) domain, while the expected optical intensity should be the non-dithered original input optical intensity c.
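
If it helps, here's a small numpy helper (my own sketch; srgb_eotf stands in for e2o, and the 5-level quantizer is just an assumed example) that checks a candidate p(c, ·) against exactly these two conditions:

    import numpy as np

    def srgb_eotf(u):
        # e2o: electrical value in [0, 1] -> optical (linear light), IEC 61966-2-1
        u = np.asarray(u, dtype=float)
        return np.where(u <= 0.04045, u / 12.92, ((u + 0.055) / 1.055) ** 2.4)

    def check_conditions(p, c, levels=5):
        """p: dict {code k: probability} for one optical input c in [0, 1].
        Returns (mean error in linear light, variance of the emitted code k)."""
        ks = np.array(list(p.keys()), dtype=float)
        ps = np.array(list(p.values()), dtype=float)
        assert np.isclose(ps.sum(), 1.0) and (ps >= 0).all()
        mean_err = ps @ srgb_eotf(ks / (levels - 1)) - c   # condition 1: want ~0 for all c
        var_k    = ps @ ks**2 - (ps @ ks)**2               # condition 2: want ~constant in c
        return mean_err, var_k

    # e.g. a 50/50 split between codes 1 and 2 hits their average in linear light:
    mid = 0.5 * (srgb_eotf(0.25) + srgb_eotf(0.5))
    print(check_conditions({1: 0.5, 2: 0.5}, mid))         # (~0.0, 0.25)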

May I ask how you came up with the current probability distribution? You call it triangular, but it's not really a hat distribution.


My current probability distribution is just a bunch of curve fitting and solving the equations as I put them in the StackExchange question.

I don't sum over all k, I only sum over k - 1, k and k + 1 where k is the closest integer to c.

I don't believe you can get constant variance, due to the endpoints, where if you wish to have a correct mean (non-negotiable) the variance must approach 0. Therefore I believe the optimal variance curve to be constant in the middle with a small curve towards 0 at the endpoints.

> May I ask how you came up with the current probability distribution? you call it triangular but it's not really a hat distribution.

The reason it's called this way is that in a linear space this effect can be achieved by a regular additive dither using a noise source that follows the triangular distribution (https://en.wikipedia.org/wiki/Triangular_distribution). Why? I don't know; I got it from reading some audio dithering material. But plotting the results does give beautiful near-constant-variance dithering, if we ignore sRGB and assume a linear space, except at the endpoints. This can be corrected; see this answer: https://computergraphics.stackexchange.com/a/8777/10515. That's the origin of the plotting code as well.

You can recover the correct linear p(c, k) by doing a convolution of the triangular distribution with a unit box: https://i.imgur.com/2AKVS4B.png
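
In code, that convolution reduces to evaluating the triangular CDF at the two box edges; a quick sketch of the purely linear case (my own code, no sRGB involved):

    import numpy as np

    def tri_cdf(x):
        # CDF of the triangular distribution on [-1, 1] with mode 0
        x = np.clip(x, -1.0, 1.0)
        return np.where(x < 0.0, (x + 1.0)**2 / 2.0, 1.0 - (1.0 - x)**2 / 2.0)

    def p_linear(c, k):
        """P[round(c + T) == k] for T ~ triangular(-1, 0, 1): the triangular pdf
        convolved with a unit box, evaluated at k - c."""
        return tri_cdf(k + 0.5 - c) - tri_cdf(k - 0.5 - c)

    # Mean is exact and code variance is a constant 0.25 (away from the endpoints):
    ks = np.arange(4)
    for c in [1.2, 1.5, 1.8]:
        ps = p_linear(c, ks)
        print(c, ps.round(4), float(ps @ ks), float(ps @ ks**2 - (ps @ ks)**2))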


I've worked it out on paper, both for functionally known gamma functions (optionally piecewise defined) and for arbitrary monotonically increasing gamma arrays (where naive linear interpolation is used for values of c that do not correspond to integral k's).

I will try to write it out in LaTeX, and will link when finished.

Sadly the Shadertoy site doesn't work on my smartphone (it complains that uints are not available before mobile GLSL 3.0). I tried changing them all to normal ints, but then got stuck on the hash function through which I assume you generate pseudo-random numbers.


Did you end up writing it down?


What happens if you do the triangle distribution in sRGB space, derive the nonlinear system transfer function, then invert that (numerically)? The middle step is something like $t^{-1}(\sum_k p(c, k) t(k))$.
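
If I understand the suggestion, it would look something like this (my own sketch, with an assumed 5-level quantizer and a crude renormalization at the edges): tabulate the transfer function "triangular dither in code space, averaged in linear light", then pre-distort the input through its numerical inverse.

    import numpy as np

    def srgb_eotf(u):
        u = np.asarray(u, dtype=float)
        return np.where(u <= 0.04045, u / 12.92, ((u + 0.055) / 1.055) ** 2.4)

    def tri_cdf(x):
        x = np.clip(x, -1.0, 1.0)
        return np.where(x < 0.0, (x + 1.0)**2 / 2.0, 1.0 - (1.0 - x)**2 / 2.0)

    LEVELS = 5
    ks = np.arange(LEVELS)

    def transfer(c_code):
        # Triangular dither applied in code (sRGB) space, mean taken in linear light.
        p = tri_cdf(ks + 0.5 - c_code) - tri_cdf(ks - 0.5 - c_code)
        p /= p.sum()                               # crude handling of the endpoints
        return p @ srgb_eotf(ks / (LEVELS - 1))

    # The transfer is monotonic, so a tabulated inverse via interpolation suffices.
    grid = np.linspace(0.0, LEVELS - 1, 4097)
    table = np.array([transfer(c) for c in grid])

    def predistort(target_linear):
        """Code-space value whose dithered linear-light mean equals target_linear
        (targets below transfer(0) clamp to 0; the endpoints still need real care)."""
        return np.interp(target_linear, table, grid)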


The plots in the linked StackExchange question were created using Wolfram Notebooks.


The big takeaway here is that you should never perform brightness arithmetic on sRGB values. You need to remove gamma to make brightness linear, do your math, then re-apply gamma.

The full correct formula is:

https://stackoverflow.com/a/13558570

Interestingly gamma correction also includes a linear component. For more info on why:

https://poynton.ca/notes/colour_and_gamma/GammaFAQ.html#gamm...
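
For reference, the conversion pair looks roughly like this (a sketch using the standard IEC 61966-2-1 constants; note the linear segments below the small thresholds):

    import numpy as np

    def srgb_to_linear(u):
        # electrical value in [0, 1] -> linear light; linear below 0.04045
        u = np.asarray(u, dtype=float)
        return np.where(u <= 0.04045, u / 12.92, ((u + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(v):
        # linear light in [0, 1] -> electrical value; linear below 0.0031308
        v = np.asarray(v, dtype=float)
        return np.where(v <= 0.0031308, 12.92 * v, 1.055 * v ** (1 / 2.4) - 0.055)

    # Averaging sRGB 0.2 and 0.8 the correct way gives ~0.60, not the naive 0.5:
    avg_lin = 0.5 * (srgb_to_linear(0.2) + srgb_to_linear(0.8))
    print(float(linear_to_srgb(avg_lin)))   # ~0.600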


I have actually seen a lot of different explanations of why the sRGB EOTF has a linear part, and can come up with a few more theories myself. The Poynton FAQ says it "minimizes the effect of sensor noise in practical cameras and scanners". Wikipedia says "The purpose of the linear section is so the curve does not have an infinite slope at zero, which could cause numerical problems." Poynton's book Digital Video and HD: Algorithms and Interfaces talks about "veiling glare" and then says that the sRGB function almost perfectly inverts L* from the CIELAB color space (a pure 2.2 gamma would not do so). His thesis[1] goes into more detail.

One possible theory why there's a linear term is so that negative values are well-defined. In consumer HDMI, reference black is 10 (as opposed to 0 in sRGB), and reference white is 235 (as opposed to 255 in sRGB).

Since there will no doubt be experts here, I'd love to see an authoritative answer. (I'm considering writing an explanation of sRGB, and if so I'd really like to get this right.)

[1]: http://poynton.ca/PDFs/Poynton-2018-PhD.pdf


Oh wow, I had no idea about video black/white being 16/235, rather than 0/255.

It took a lot of Googling, but I finally found an article that explains why, in case anyone else is interested:

"The 601 system allows colors that are darker than black and brighter than white. This is especially important for cameras, because you may occasionally shoot an object that has a bright spot that is “hotter” than legal white, and might want a way to later recover the detail in this hot spot. Going darker than broadcast black is also used at times for synchronization signals, as well as some primitive keying applications such as “superblack” where systems mask or key out areas of an image “blacker” than black."

https://www.provideocoalition.com/luminance_ranges/


There's also a pragmatic reason: when sampling the PAL/NTSC signal, the cabling will have applied low-pass filtering, causing over- and undershoot of the signal. By setting the signal up to use the range 16/235 on an 8-bit A/D converter, this overshoot does not saturate the converter.

Lots of consumer electronics are designed with these kinds of rules.


Fun fact: the Blender Cycles renderer used sRGB for a long time, which just threw away huge amounts of dynamic range.


From the original 1996 proposal "A Standard Default Color Space for the Internet - sRGB" (https://www.w3.org/Graphics/Color/sRGB):

> The effect of the above equations is to closely fit a straightforward gamma 2.2 curve with an slight offset to allow for invertability in integer math. Therefore, we are maintaining consistency with the gamma 2.2 legacy images and the video industry as described previously.


Right, should have added that to the list of not-quite-consistent explanations. Also, slight typo, black level is 16 not 10.

And to this I'll add another speculative answer, similar to others that have been advanced, but I'm not sure it's ever been explicitly expressed: it means that a piecewise linear approximation in either direction can be done with good accuracy and not a huge number of segments.


One possible reason is that it's hard to construct a numerical approximation (e.g. polynomial approximation) to the power curve around zero. With a linear part you avoid that problem. This is probably what the 'numerical problems' alludes to.


It’s more accurate to say you probably shouldn’t be applying any operations to sRGB values directly. They’re non-linearly “compressed” and most operations assume they’re working in a linear space.

Of course, if you want, you can figure out the equivalent "chain rule" of your operation, but realistically sRGB also mostly exists to compress colors into 8 bits per channel, which doesn't allow for great manipulation. Fun fact: this is why LRBni had a "load 8-bit as sRGB and upconvert to linear floating point" instruction.


To make matters more interesting, the actual gamma curve you should be using is the one corresponding to the display you use to view the image. Very frequently this isn't sRGB or anything close to it (pure power gamma curves are relatively common; if you're lucky, the gamma is somewhere close to 2.4). It is in fact not even rare to encode an image with a gamma curve that's intentionally different from the display that's supposed to be used to view it, in order to, for example, prevent crushing black details or to avoid numerical problems (both reasons for the linear component in sRGB).


For clarity, precision, and a bit of pedantry: the IEC 61966-2-1:1999 standard does not specify an OETF but only an EOTF, so the OETF in the linked article is actually an inverse EOTF.

Some more information here: https://www.colour-science.org/posts/srgb-eotf-pure-gamma-22...



Your error diffusion dither is strange to me. Why do you not use two-dimensional patterns like Floyd-Steinberg or Jarvis-Judice-Ninke?

Is your assumption that you're diffusing error linearly along Morton or Hilbert curves? Because I'm pretty sure that will be outperformed in quality by any common 2D error diffusion pattern.


Rounding to the value that's nearest in linear light is not exactly right (if anything evaluating distances in linear light is a bad idea in general, but that aside). What you want is to find out exactly how much of each colour to blend in order to get closest to the actual value and blend the colours in those proportions. The formula in the OP is the right way to do this.
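
For concreteness, here is my reading of that approach as a numpy sketch (my own code, with an assumed 5-level grayscale quantizer, not taken from the OP): find the two output codes whose linear-light values bracket the target and pick between them with the linear blend weight, so the expected linear light is exact.

    import numpy as np

    def srgb_to_linear(u):
        u = np.asarray(u, dtype=float)
        return np.where(u <= 0.04045, u / 12.92, ((u + 0.055) / 1.055) ** 2.4)

    def dither_srgb_aware(v_lin, levels, rng):
        """Pick one of the two neighbouring output codes so that the EXPECTED
        linear light equals v_lin."""
        codes = np.arange(levels)
        lin = srgb_to_linear(codes / (levels - 1))       # linear light of each code
        hi = np.clip(np.searchsorted(lin, v_lin), 1, levels - 1)
        lo = hi - 1
        t = (v_lin - lin[lo]) / (lin[hi] - lin[lo])      # blend weight of the upper code
        return np.where(rng.random(np.shape(v_lin)) < t, hi, lo)

    rng = np.random.default_rng(1)
    v = np.full(100_000, 0.1)                            # target in linear light
    out = dither_srgb_aware(v, 5, rng)
    print(float(srgb_to_linear(out / 4).mean()))         # ~0.1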


So much software gets this sort of thing wrong by either ignoring or incorrectly accounting for gamma.


All internet browsers have to use 'wrong' blending when combining transparent objects.

If they 'fixed' the bug, and rendered with correct physical-aware alpha blending, a bunch of websites wouldn't look correct anymore.
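
A tiny sketch of the difference (my own example: 50% white over black): blending the sRGB codes gives mid-grey 128, while blending in linear light and re-encoding gives about 188, which is visibly lighter.

    import numpy as np

    def srgb_to_linear(u):
        u = np.asarray(u, dtype=float)
        return np.where(u <= 0.04045, u / 12.92, ((u + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(v):
        v = np.asarray(v, dtype=float)
        return np.where(v <= 0.0031308, 12.92 * v, 1.055 * v ** (1 / 2.4) - 0.055)

    fg, bg, alpha = 1.0, 0.0, 0.5                # white over black at 50% opacity
    naive  = alpha * fg + (1 - alpha) * bg       # what browsers do: blend the sRGB codes
    linear = linear_to_srgb(alpha * srgb_to_linear(fg) + (1 - alpha) * srgb_to_linear(bg))
    print(round(255 * naive), round(255 * float(linear)))   # 128 vs 188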


It's rather disappointing that websites can't request linear ("correct") alpha compositing in any way. I suppose it's a bit of fingerprinting, but browsers can already be fingerprinted by gamut support[0].

[0]: https://developer.mozilla.org/en-US/docs/Web/CSS/@media/colo...


Alpha blending can be another can of worms. There is premultiplied alpha for correct interpolation between color values with alpha, and then there is the lack of per-channel alpha, which would be useful for subpixel antialiasing against a transparent background.


Subpixel anti-aliasing, I suspect, is a technology that will go the same way as video interlacing...

i.e. "It works great, has a positive benefit on the user experience, but adds so much system complexity, limits other operations, and is so hard to implement right, that it fell out of use"

Today subpixel anti-aliasing is implemented mostly for text only; we've given up on implementing it for 2D/3D graphics.


In that case, could they make it a CSS property? Something like `alpha-mode: physical-aware`, defaulting to the current "wrong" blending. Or is this such a fundamental change that it would be infeasible to support both modes?


This would be totally do-able.

The change would be fairly invasive inside the Blink/Webkit rendering engine if you wanted to support both physically aware and non-physically aware blending simultaneously in the same compositing layer (I suspect you'd have to split everything into multiple GPU surfaces - one for physical blending, one for 'wrong' blending, and more if any of those objects are interleaved in zIndex - that might use a lot of RAM in some cases).

A per-page blending mode would probably be easy to implement.


I've worked on code that does this before and I thought I understood it, but something about these explanations is making it way more confusing to me.



