We desperately need some research, based in user studies and using modern display technology, to settle some basic questions:
* What reconstruction filter gives the best results? Is it the same for vector (text) and natural images? By "best" I mean both contrast (sharpness) and lack of visible artifacts.
* For rendering of very thin lines (relevant to text), what gamma curve gives the perception of equal line thickness across the range of subpixel phase? How does it vary with display dpi? (Hint: it's likely not linear luminance)
* What gamma curve yields the perception of equal width of black-on-white and white-on-black thin lines (also relevant for text)? (Hint: likely not linear luminance)
I've seen a number of discussions where people feel they can answer these questions from first principles, along with arguments that doing these things the "correct" way gives results that are less visually appealing than the common assumptions: treating a pixel as a little square (i.e. using a box filter for reconstruction) and ignoring gamma, which effectively means doing alpha compositing in a perceptual color space.
I posit that these questions cannot be solved by argumentation. I think user studies might not be very difficult to do; you could probably get "good enough" results by doing online surveys, though this wouldn't pass standards of academic rigor.
Gamma curves do not affect black and white themselves, only intermediate grays. It’s true that grays are usually used to draw antialiased black and white lines, but we can also think about “ideal” (axis-aligned, pixel-centered, non-antialiased) black and white lines. By framing this as a question about gamma, you’ve implicitly assumed that “ideal” black lines on white would have equal perceived thickness to “ideal” white lines on black.
This is not the case, as typographers have known for decades. The right way to draw perceptually equal-thickness black-on-white and white-on-black lines is to vary the line width (perhaps even by more than a full pixel if the lines are thick enough!). Gamma only comes in afterwards, to help us reproduce the varied widths accurately, and the accurate way to do that is in a linear color space.
Also I totally agree that we need to take into account the perceptual differences between black-on-white and white-on-black even assuming the display technology is perfect. That's one reason doing these studies is not trivial!
This question is less important than the other two, and plausibly the best place to address the effect is in design, rather than rendering.
It's axis-aligned and box-filtered so there are no shades of gray, which means gamma is irrelevant. The white-on-black line appears to have a greater thickness than the black-on-white line.
There's a lot of empirical research showing that reading performance is better with positive-polarity (black-on-white) text than with negative-polarity text [1, 2], probably because the higher overall luminance results in a smaller pupil size and thus a sharper retinal image. So, white-on-black lines appear thicker than black-on-white lines because the eye doesn't focus as sharply on them. This is true regardless of which color space blending is performed in.
Given this fact, if one wants to achieve uniform perceptual line thickness for black-on-white and white-on-black text, a more principled approach than messing with color blending would be to vary line width or the footprint of the antialiasing filter based on the luminance of the text (and possibly the background). This is the approach Apple and Adobe have taken for years with stem darkening/font dilation.
Here's the interesting thing about your observation: doing alpha compositing of text in a perceptual space (as opposed to a linear space) results in a thickening of black-on-white and a thinning of white-on-black. So doing gamma "wrong" actually results in better perceptual matching.
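A minimal sketch of that effect, assuming nothing beyond NumPy (the helper names are mine, not from any particular library): composite a 50%-coverage black glyph edge over white, once correctly in linear light and once directly in the (perceptual) sRGB encoding.

```python
import numpy as np

def srgb_to_linear(c):
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.0031308, 12.92 * c, 1.055 * c ** (1 / 2.4) - 0.055)

coverage = 0.5           # antialiased edge: glyph covers half the pixel
black, white = 0.0, 1.0  # sRGB-encoded values

# "Correct" compositing: blend in linear light, then re-encode.
linear = (1 - coverage) * srgb_to_linear(white) + coverage * srgb_to_linear(black)
correct = float(linear_to_srgb(linear))   # ~0.735, a fairly light gray

# Common shortcut: blend the encoded values directly.
naive = (1 - coverage) * white + coverage * black  # 0.5, a darker gray

print(correct, naive)
```

The naive result is darker, which visually thickens black-on-white strokes (and, symmetrically, thins white-on-black ones), which is exactly the "wrong gamma looks better matched" observation.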
Do you have evidence that either Apple or Adobe varies the amount of stem darkening based on text luminance? I've tested Apple (macOS 10.12 most carefully) and have not seen this.
Color and luminance are a separate and orthogonal issue from filtering. I also know that people get away with compositing without converting to linear space, but I suspect that any benefits they see are just a matter of getting the color curve they want for free, as opposed to applying a similar correction after something has been composited correctly.
Personally, I like the extra blur (and the extra safety and guarantees) you get with a Gaussian. Back in the NTSC days I discovered that vertically blurring interlaced video made it noticeably more clear and visible, even though it was softer.
But if you really do spend time with sharper filters, it is true that Gaussian is softer, and some pros really do want sharper images than a Gaussian can provide.
Having studied graphics and signal processing for a few years in graduate school before that job, I thought I would be good at seeing the differences, and I was a bit shocked how good they were at it, and how not that good I was. :)
Truncate the Gaussian too closely, though, and it's not exactly a Gaussian anymore; you lose the best antialiasing properties. I can totally see how it would be sharper and more comparable to other popular filters. (Normalized and truncated at a 1.1 radius is just slightly outside the 1 std dev line, right?)
Gaussian is my personal choice for large format prints of images with extreme aliasing problems.
IMO the best place to learn to see these distinctions is in careful photo printing (of the type where you spend >30m per image, and where “printing” here is being used in an old-school sense of “all of the manual steps to take a raw image from the camera and turn it into printed output”).
Spend a few months doing that for a few hours per week and your ability to see artifacts, textural details, fine differences in amount of edge contrast, etc. will shoot up. (Obviously the folks who spend 30 years on this are even better.)
Studying signal processing, optics, psychophysics, etc. is also useful for understanding what you are noticing, but it isn’t seeing practice.
Using only one standard deviation cuts off a huge amount of the curve; ideally it would be cut off around the third standard deviation and normalized so that the value after the last sample would be zero.
I basically agree with your points regarding compositing in a linear space, except that I suspect that thin black-on-white lines will come out looking thin and spindly.
If you want a general filter that can give a result without visible aliasing while sacrificing as little sharpness as possible, a 2.2 gauss filter is very hard to beat and I have spent a lot of time trying.
Box filters can be sharper, Lanczos filters can be better for scaling down a final image, etc., but they will alias in the general case. You might not see it in static text, which is fine.
Also, thin black-on-white lines are an extreme outlier, since what you perceive is relative and the entire image matters. They are more in the realm of optical illusions that play off our relative sensitivity.
Does “diameter” here mean “twice the standard deviation”?
If you compare filters at the same width, a Gaussian will be softer, which is why you can lower the diameter.
A ‘Gaussian’ which extends 1.1 units is a non-standard and not clearly defined thing. Are you picking some fraction of 1.1 as the standard deviation and truncating after? (Often Gaussian filters are truncated after 3 standard deviations, or similar.) Are you multiplying by some other window function? ...
Can you link to a more explicit formal description of what you mean?
A curve that uses more standard deviations would have to be wider in pixels to look similar.
You have been saying here “this kernel is better than all of the alternatives” but it’s hard to evaluate that kind of claim without knowing precisely what you mean.
“not formal or strictly defined” is not super encouraging.
Bake a gauss curve out to three standard deviations into a LUT and normalize it. You can look at what PBRT does.
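A sketch of that suggestion in NumPy; the shift that makes the truncated tail land exactly at zero follows the "value after the last would be zero" idea from upthread, but the specific parameter choices are mine.

```python
import numpy as np

def gaussian_lut(sigma=1.0, cutoff_sigmas=3.0, samples=64):
    # Half-profile of a Gaussian from the center out to the cutoff.
    x = np.linspace(0.0, cutoff_sigmas * sigma, samples)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    g -= np.exp(-0.5 * cutoff_sigmas ** 2)  # shift so the tail ends at 0
    g /= g[0]                               # renormalize the peak to 1
    return x, g

x, g = gaussian_lut()
print(g[0], g[-1])   # 1.0 at the center, exactly 0.0 at the cutoff
```

When using this as a filter you'd still divide by the sum of the sampled weights at evaluation time, so the kernel integrates to one regardless of the LUT's peak normalization.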
Keep in mind that I was replying to someone confused by all the choice of filters and was giving him a very solid starting point. This isn't some grandiose claim of scientific exploration, it's experience.
Does this mean that there is not one true right answer for images, as there is for audio? That is, should you use different reconstruction filters depending on the application? Is it accurate to say that a pixel is not a little square unless you're rendering text, in which case it is?
I know this is a slight oversimplification; if you care deeply about latency you might choose between linear and minimum phase, etc. But for consumer applications it's true enough.
The perception of equal line thickness may depend on orientation of the lines (https://res.mdpi.com/vision/vision-03-00001/article_deploy/v...), distance from the fovea, distance from where your attention lies (a can of worms even deeper. I don’t think there’s agreement on whether one can attend to more than one visual location at a time, or what ‘attention’ even is), direction of those distances (vertical will almost certainly be different from horizontal, sight could be better or worse in the nasal direction vs the temporal one), light/dark adaptation of the eyes, whether subjects are color-blind, etc.
I fear that, to get something that’s better than Weber’s law (https://en.wikipedia.org/wiki/Weber–Fechner_law), you need a model so complex that it doesn’t make sense to start using it.
Another place to look is Kevin Larson's work on subpixel rendering, which informed Microsoft's ClearType efforts. But that was done mostly around 10 years ago, when displays also were different. A good representative is .
Here's another pretty good paper I found, but it focuses more on the display technology than the perception side.
So what I'm looking for is adjacent to general psychophysics results on visual perception, but much more specific to what real displays do. That literature is pretty thin on the ground.
Avi C. Naiman and Walter Makous, "Spatial nonlinearities of gray-scale CRT pixels" (1992)
There is plenty of formal signal processing analysis of the aliasing artifacts at different angles created by grids of pixels.
The ImageMagick folks did a bunch of experimentation about resampling filters as used in arbitrary transformations of existing raster images. https://www.imagemagick.org/Usage/filter/nicolas/ – of course what looks best depends significantly on (subjective) preferences and on what the source image is.
Or, perhaps a better posed question. What research-informed model will accurately predict the results of a user study that presents various renderings of antialiased lines on a modern LCD monitor and asks subjects to choose "which line is thicker" types of queries?
> asks subjects to choose "which line is thicker" types of queries
I think you could develop a model for “which line is thicker” given a specific target display without inordinately much trouble; you might have to tweak some parameters for matching particular displays. The harder question is “which line is the right thickness”, especially if you don’t have any correct answers to reference.
We also don’t just care about apparent line thickness but also spatial resolution, aliasing artifacts, ...
The article reminds me of the many mathematical texts I've read insisting that vectors are not tuples of numbers, and that thinking of them as anything other than directions with magnitudes is wrong. Technically, that might be correct, but vectors-as-numbers is much more useful when calculating with them. When you get into more abstract mathematics, and your vectors contain other kinds of algebraic objects, such as polynomials, you are already so accustomed to them that you can think of them as flying burritos if you like.
When I teach graphics programming, I will continue to tell students that pixels are like little boxes.
I break every mathematical object down into three things:
1. The intuition. Why do we have this concept to begin with? What underlying idea are we trying to capture?
2. The definition. These are the axioms.
3. The implementation. This includes every way to communicate the idea, from natural language words to notation to source code.
Without the intuition, you have nothing but a symbol game. It's hollow. Something with rules and notation but no deeper intuition is, arguably, chess.
Without the definitions, you can't think rigorously about your ideas and you don't know if they lead to internal contradiction. You can dump the axioms without losing the intuition; we did this with set theory at the turn of the previous century, when Russell proved that the previous axioms were inconsistent. We saved set theory without having to abandon the notion, the intuition, of sets entirely.
Without implementation, it's just thought, and you can't communicate with anyone. Moreover, without some intuition, the implementation is meaningless, because you have no cognitive frame to use to interpret it.
So the tuple of numbers is one implementation of a vector. It allows you to communicate some aspects of a vector, but without the underlying idea of what a vector means, what concept we're trying to get across, it's just a list of numbers. They might as well be box scores or something.
The tuple model is extremely useful, but incomplete.
Not by the mathematicians' definition. Maybe somewhere in physics such objects are useful, but if you formalized them you'd get something other than a vector space.
There’s certainly a sweet spot, where too much theoretical background takes away from learning graphics, and too little could leave students unprepared, or worse, uninterested.
It’s certainly good & fun to go through at least a little sampling theory.
You might also be interested to know the guy who wrote this paper co-founded Pixar. His graphics advice is worth careful consideration.
> It turns out that thinking of them as little boxes arranged in rectangular grids is very useful.
Do keep in mind that the shape and the arrangement are two separate things. Alvy’s paper is talking about the sample shape, but not talking about the arrangement. The grid arrangement is useful, and Alvy would agree.
> Because that is how computers deal with them. Not as point samples.
I’d be cautious drawing that line. I realize you were talking in part about grid arrangements. But computers treat samples however we teach them to, and you’re unlikely to be writing much code that truly handles pixels as finite square geometry rather than as point samples. Plus, if you’re teaching things like magnification filtering using bilinear or bicubic, then you’re already treating pixels (or perhaps texels) as point samples.
Here’s what you get when you treat pixels as squares: https://news.ycombinator.com/item?id=17845460
I thought about it and came up with a counter-example: subpixel rendering in general and ClearType in particular. The algorithm works by considering the exact arrangement of RGB squares (rectangles, actually) to improve the appearance of rendered text. Theoretically, subpixel rendering can increase the horizontal resolution threefold, which was very useful for DPI-starved screens. Font hinting was also used to fit the text into the confines of the limited pixel grid.
I mostly agree, and see my other top-level comment where I called out LCD panels as having physically square pixels.
It’s a good thought, and it is correct that ClearType considers the subpixel arrangement of LCD elements. But also remember that the arrangement of pixels isn’t Alvy’s main point; he was mainly trying to convey how to think about the shape of samples (pixels).
Even with ClearType you don’t necessarily want to integrate the sub-sub-samples of an LCD sub-pixel with a box (square) filter.
For sub-pixel rendering in general, box filtering is definitely not the best answer. Though yes, lots of people do it and get away with it all the time when sampling quality is not a high priority. Games are a good example, even as they’re improving. Treating pixels as square when sub-sampling causes ringing artifacts that can never be cured by adding more samples. This is actually a really fun thing to do with a class of graphics students because it’s kind of surprising the first time you really get it. For some nasty antialiasing problems only high quality kernels like a Gaussian will integrate sub-samples without artifacts.
Note I’m not talking about LCD sub-pixels there, just normal supersampling. The Wikipedia article on ClearType calls that “grayscale antialiasing” to distinguish it from LCD red-green-blue subpixels. But IMO that’s a bad name since grayscale antialiasing is still referring to filtering color images.
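A small demo of the supersampling point above; all parameters here are my own choices. Downsample a frequency well above the output Nyquist rate, integrating the sub-samples once with box weights and once with a (3-sigma-truncated) Gaussian. The box leaves a much larger aliased residual, and taking more sub-samples won't fix it.

```python
import numpy as np

pixels, sub = 256, 16
x = np.arange(pixels * sub) / sub        # sub-sample positions in pixel units
signal = np.cos(2 * np.pi * 7.3 * x)     # 7.3 cycles/pixel; aliases to 0.3

def resample(kernel):
    # Normalize the kernel, filter at the sub-sample rate, keep 1 sample/pixel.
    k = kernel / kernel.sum()
    return np.convolve(signal, k, mode="same")[::sub]

box = resample(np.ones(sub))             # box exactly one output pixel wide

sigma = 0.5 * sub                        # Gaussian, sigma = 0.5 px
t = np.arange(-3 * sigma, 3 * sigma + 1)
gauss = resample(np.exp(-0.5 * (t / sigma) ** 2))

# Residual aliased amplitude (ignoring the edges of the array):
print(np.abs(box[8:-8]).max(), np.abs(gauss[8:-8]).max())
```

The residual is the frequency leaking through each kernel's stopband; the Gaussian's is orders of magnitude below the box's for this input.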
This is not accurate. Computers generally represent raster images as arrays of numbers (where each entry in the array is called a “pixel”). There are no literal little squares involved. Some code (much of it mediocre) conceives of those arrays of numbers as representing little squares. Other code does not.
> vectors-as-numbers is much more useful when calculating with them
This is super myopic / parochial.
Mathematicians think of “vectors” as elements of an abstract vector space (i.e. anything with well-defined concepts of scalar multiplication and vector addition over some field). This is useful to them because there are many powerful theorems which work in general for any arbitrary vector space, or sometimes for any vector space over the complex number field, or sometimes for any finite-dimensional vector space, or ....
Physicists think of vectors as directed magnitudes, generally some kind of measurable physical quantity in Euclidean 3-space (or Minkowski space). This is useful because many kinds of combinations and relations of directed magnitudes can be computed without reference to any specific coordinate system.
One possible representation of physicists’ vectors (or certain types of mathematicians’ vectors) is an array of numbers.
But an array of numbers by itself is a completely different type of object than a vector. There are no specific well-defined operations on a generic array of numbers; or rather, depending on what it represents there are a wide variety of operations that might be meaningful or reasonable.
There are many kinds of “calculations” which are completely abstract where thinking of vectors as arrays of numbers is unbelievably obscurantist and counterproductive. Proofs and derivations involving coordinates are almost always extremely cumbersome.
There are even many types of concrete calculations on vectors-represented-as-arrays-of-numbers where the most effective algorithm is to first convert to a different representation.
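One concrete instance of that last point, as a sketch (the setup is my own): multiplying two polynomials stored as coefficient arrays is done most efficiently by first converting them to a different representation, point-value form, via the FFT, multiplying pointwise, and converting back.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # 1 + 2x + 3x^2
b = np.array([4.0, 5.0])        # 4 + 5x

n = len(a) + len(b) - 1         # number of coefficients in the product
# Evaluate both at n points (FFT), multiply values, interpolate back.
prod = np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

print(np.round(prod))           # [ 4. 13. 22. 15.] = 4 + 13x + 22x^2 + 15x^3
```

For these tiny inputs direct convolution is of course faster; the representation change pays off asymptotically (O(n log n) vs O(n^2)).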
Most calculations physicists do are the same as pure mathematicians would do. So we use, for example, inner-product and cross-product operators to operate on vectors, without caring about the exact coordinate system used. Only when we have to come up with a final answer to a question like "at what angle does the ball hit the ground, and with what velocity?" will we convert our vectors into a magnitude and direction.
Interesting. This is what Hestenes calls the “coordinate virus”, http://geocalc.clas.asu.edu/pdf/MathViruses.pdf ; very often any specific coordinate system is not inherent but is some arbitrary addition to the space made for convenience in some particular calculation. It is in my opinion a mistake to think of the coordinates as primary.
> operate on vectors, without caring for the exact coordinate system used
This is the opposite of the previous statement.
Not really. I didn't say in the first statement that a specific coordinate system was used. In most systems it doesn't matter how you choose your coordinate system; as long as the basis is orthogonal, treating the coordinates independently of each other will work.
Mathematicians like to think of just about everything as vectors, so that sentence seems a bit off.
Anyway, yeah, starting from abstract definitions is a great way to make a student's life hell. After all, there's a reason universities teach linear algebra twice.
The more I've thought about it since, the more I think we should represent images with hexagonal pixels (i.e., laid out on a hex grid), and that color images should treat the center points of red, green, and blue subpixels as not being right on top of each other. (the third image in the post shows how they would be arranged, which is actually similar to how they are on some displays)
It would be a little harder to deal with for graphics programmers who are working at a pixel level, or at the least, it would require a bit of relearning. But it makes more sense in so many ways. Hex is just a better way of "circle packing" (as you notice if you arrange a bunch of pennies on a table), and of course real world displays tend to have the red, green and blue subpixels offset from one another anyway. (are there any that don't?)
Obviously it isn't easy to change something like this at this point, but still, I find the idea fascinating, and appealing to the OCD efficiency fan in me.
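The "hex is just a better way of circle packing" intuition above can be quantified: the fraction of the plane covered by equal circles centered on each grid is a standard closed-form result, sketched here.

```python
import math

# Packing density of equal circles centered on each grid type.
square_density = math.pi / 4                # circles on a square grid
hex_density = math.pi / (2 * math.sqrt(3))  # circles on a hexagonal grid

print(round(square_density, 4), round(hex_density, 4))  # 0.7854 0.9069
```

So hexagonal sampling covers the plane about 15% more efficiently with the same "circular" sample footprint, which is the penny-packing observation in numbers.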
One cool trick is that you can sprinkle in more efficient white subpixels to allow the display to reduce its power consumption.
We are so used to seeing visual presentations of samples that look like a bar diagram that a lot of people think analog sounds better because the curves are smoother.
Chris Montgomery has a great talk about this.
https://youtu.be/cIQ9IXSUzuM <-- YouTube backup in case it doesn't work (YMMV)
I wish I could force any audiophiles to watch it before they waste their money on snake oil. I think a similar point can be made about analogue synthesis, e.g. You pay Moog £1500 for what is ultimately a fancy box for a relatively simple analogue circuit.
The underlying frustration is that objective (measurable) and subjective (unmeasurable) improvements in sound quality could ever be placed on equal footing with each other. It just wrecks my brain that people would ever spend a dollar on improvements that have an unmeasurable or negligibly measurable impact on the sound when there's still so much opportunity for substantial, measurable improvements.
If you want to improve your audio, focus on speakers, speaker placement, room acoustics, bass management and in-room calibration. Most everything else is relatively marginal (e.g. fancy amps, fancy DACs) or negligible/unmeasurable (e.g. fancy wires).
For most people with mid+ range audio gear, the number one upgrade they can perform is almost always to add targeted sound absorption to their room, not to change any of the electronics.
A lot of this analogue woo applies to synthesisers, as mentioned previously. For £500ish you can buy a million core, trillion transistor monster but for £1500 you get a box with some op amps and a (gasp) digital signal path in some cases.
Except for a philosophical debate about continuity, isn't that true?
> The “stair-steps” you see in your DAW when you zoom up on a digital waveform only exist inside the computer. [...] When digital audio is played back in the Real World, the reconstruction filter doesn’t reproduce those stair-steps – and the audio becomes truly analogue again.
There are two ways this is not true for pixels. First, even on "retina" displays the human visual system can make out spatial frequencies beyond the Nyquist limit of the display (this varies with viewing distance, so it's more of an issue for young people who can get close to their displays). Second, even assuming perfect gamma, the display must clip at black and white because of physical device limitations. Thus, especially for text rendering, only a reconstruction filter with nonnegative weights is generally useful. Such a filter would be an extremely poor choice for audio.
It is true that many of the underlying signal processing principles are the same, and I encourage people to learn and understand those :)
The problem is that in images the Gibbs effect is visible and annoying (the ringing artifact). If the sampling theorem applied, people wouldn't be able to see it, just as they can't hear the difference between square waves shifted by half a sample.
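A small NumPy demo of that ringing, with parameters of my own choosing: reconstruct a step edge with a sinc filter (the sampling-theorem ideal) and with a Gaussian. The sinc overshoots past white, which is the visible Gibbs artifact; the all-positive Gaussian cannot overshoot.

```python
import numpy as np

samples = np.zeros(64)
samples[32:] = 1.0                    # a black-to-white edge

x = np.arange(64.0)
fine = np.linspace(0.0, 63.0, 631)    # reconstruct at 10x resolution

def reconstruct(kernel):
    # Weight every sample by the kernel centered on each fine position,
    # normalizing by the total weight at that position.
    w = kernel(fine[:, None] - x[None, :])
    return (w @ samples) / w.sum(axis=1)

sinc_rec = reconstruct(np.sinc)
gauss_rec = reconstruct(lambda d: np.exp(-0.5 * (d / 0.5) ** 2))

print(sinc_rec.max())    # > 1: overshoot past white, i.e. ringing
print(gauss_rec.max())   # <= 1: no overshoot, at the cost of a softer edge
```

Since the Gaussian's weights are all nonnegative, every reconstructed value is a convex combination of the samples and stays within [black, white]; the sinc's negative lobes are what push values outside that range.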
But there's good information in this article. In many contexts you should be thinking in terms of point-samples, not squares or other areas.
I think it's not such a great idea to try to simplify the definition of pixel in this article because it distracts from the useful info.
> rid the world of the misconception that a pixel is a little geometric square.
> The little square model is simply incorrect. It harms.
> I show why it is wrong in general.
Why the bluster?
He makes a decent case for this point in the domain of graphics processing; what he calls "correct image (sprite) computing".
But the narrow focus on computer graphics undermines these broad generalizations. There are many other domains that can be represented in pixel-based data models. Climate, terrain, population and land cover mapping are just a few domains where the use of a pixel as a "little geometric square" is a perfectly viable approach.
Ultimately, if the message is "think about how your data model maps to reality" - I agree. But why the hyperbole? Why shit on an entire model because it doesn't fit for your very specific use case?
Texels and Voxels can be square/cubic, such as in video game applications where it is accepted and exploited as a fundamental esthetic: worlds are textured with tiled mosaics which reveal their square unit when approached closely, and ditto for worlds made of voxels.
A voxel as a sample of a solid, for instance from a computed tomography, where the fidelity of the reconstruction matters, is subject to different requirements.
For example, you can also have “pixel art”, drawings made up of little squares, which only loosely have to do with pixels as samples for a raster image or physical camera detector elements or physical display elements. https://en.wikipedia.org/wiki/Pixel_art
Is this the reason they forced that horrible bilinear filter on the Windows image viewer for so long (I guess they still do)? It made me crazy, it's so ugly.