For instance, in GDAL there's a whole RFC for dealing with issues related to pixel corners versus pixel centers!
"Traditionally GDAL has treated this flag as having no relevance to the georeferencing of the image despite disputes from a variety of other software developers and data producers. This was based on the authors interpretation of something said once by the GeoTIFF author. However, a recent review of section [section 126.96.36.199] of the GeoTIFF specificaiton has made it clear that GDAL behavior is incorrect and that PixelIsPoint georeferencing needs to be offset by a half a pixel when transformed to the GDAL georeferencing model."
When dealing with cameras, the central (principal) point is rarely at (w/2, h/2). So you're really dealing with two sets of coordinates, camera coordinates and sensor coordinates, that need to be converted between each other.
Integer coordinates are convenient for accessing the sensor pixels, and the camera-to-sensor space transform should theoretically account for the (0.5, 0.5) offset. However, getting a calibration to within 0.5 pixels of accuracy is going to be hard to begin with.
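Here is a minimal sketch of that conversion with the half-pixel offset made explicit. It assumes a plain pinhole model with no lens distortion and hypothetical intrinsics (fx, fy, cx, cy) in pixel units. It also assumes a continuous sensor frame in which pixel (col, row) covers [col, col+1) x [row, row+1). Some calibration tools instead put pixel centers on integer coordinates. In that case the offset is already folded into cx and cy, so check which convention yours uses.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    # Hypothetical pinhole intrinsics, all in pixel units.
    fx: float
    fy: float
    cx: float  # principal point, in the continuous sensor frame
    cy: float


def pixel_to_normalized(col: int, row: int, k: Intrinsics) -> tuple[float, float]:
    """Map an integer pixel address to normalized camera coordinates.

    Assumes pixel (col, row) covers [col, col+1) x [row, row+1) in the same
    continuous frame as (cx, cy), so its center is (col + 0.5, row + 0.5).
    """
    u = col + 0.5
    v = row + 0.5
    return (u - k.cx) / k.fx, (v - k.cy) / k.fy


def normalized_to_pixel(x: float, y: float, k: Intrinsics) -> tuple[float, float]:
    """Inverse mapping, shifted back by 0.5 so a pixel center lands on its
    integer address."""
    return x * k.fx + k.cx - 0.5, y * k.fy + k.cy - 0.5


k = Intrinsics(fx=1000.0, fy=1000.0, cx=645.3, cy=358.9)  # made-up values
print(pixel_to_normalized(640, 360, k))
```

Note that with a 4x4 sensor and the principal point exactly at (2, 2), no pixel center sits on the optical axis. Pixels (1, 1) and (2, 2) land at (-0.5, -0.5) and (0.5, 0.5).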
So, for example, if you’re scaling an image up by 10× with bilinear interpolation, and you need to figure out what to store at address (7, 23) in the output framebuffer, you should convert that to continuous coordinates (7.5, 23.5), scale these continuous coordinates down to (0.75, 2.35), and use that to take the appropriate weighted average of the surrounding input pixels centered at (0.5, 1.5), (1.5, 1.5), (0.5, 2.5), and (1.5, 2.5), which are located at addresses (0, 1), (1, 1), (0, 2), and (1, 2) in the input framebuffer. The result will be different and visually more correct than if you had done the computation without taking the (0.5, 0.5) offset into account. In this case the naive computation would instead give you a combination of the pixels at (0, 2), (1, 2), (0, 3), and (1, 3) in the input framebuffer, and the result would appear to be shifted by a subpixel offset. This was essentially the cause of a GIMP bug that I reported in 2009: https://bugzilla.gnome.org/show_bug.cgi?id=592628.
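The arithmetic above can be sketched in a few lines of Python. This is an illustrative single-channel implementation, and the function names are made up for the example. Border pixels are clamped, and a flag switches to the naive mapping for comparison.

```python
import math


def source_position(x_out: int, scale: float, half_pixel: bool = True) -> tuple[int, float]:
    """Return (left/top input address, weight of the right/bottom neighbour).

    With half_pixel=True, address i covers [i, i+1) with its center at i + 0.5.
    With half_pixel=False this reproduces the naive mapping x_in = x_out / scale.
    """
    if half_pixel:
        # Output pixel center -> continuous input coordinate -> back to
        # address space (subtract the input pixel's own 0.5 offset).
        xs = (x_out + 0.5) / scale - 0.5
    else:
        xs = x_out / scale
    x0 = math.floor(xs)
    return x0, xs - x0


def resize_bilinear(img: list[list[float]], scale: float,
                    half_pixel: bool = True) -> list[list[float]]:
    """Upscale a grayscale image (list of rows) by `scale` with bilinear weights."""
    h, w = len(img), len(img[0])
    out_h, out_w = round(h * scale), round(w * scale)

    def px(r: int, c: int) -> float:
        # Clamp out-of-range neighbours to the nearest border pixel.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    out = []
    for y in range(out_h):
        y0, fy = source_position(y, scale, half_pixel)
        row = []
        for x in range(out_w):
            x0, fx = source_position(x, scale, half_pixel)
            top = px(y0, x0) * (1 - fx) + px(y0, x0 + 1) * fx
            bottom = px(y0 + 1, x0) * (1 - fx) + px(y0 + 1, x0 + 1) * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# The worked example: output address (7, 23) at 10x.
print(source_position(7, 10), source_position(23, 10))                # left neighbour 0, top neighbour 1
print(source_position(7, 10, False), source_position(23, 10, False))  # naive: 0 and 2
print(resize_bilinear([[0.0, 10.0]], 2))                              # rows of [0.0, 2.5, 7.5, 10.0]
print(resize_bilinear([[0.0, 10.0]], 2, half_pixel=False))            # rows of [0.0, 5.0, 10.0, 10.0]
```

Upscaling the two-pixel row [0, 10] by 2x shows the difference. The half-pixel version gives a symmetric ramp. The naive version reaches full brightness one output pixel early, so the content looks shifted left by half an output pixel.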
Otherwise, you're shifting the image around every time you reconstruct. This causes errors and can be very hard to reason about.
But something I also miss in the article: shouldn't a pixel span (-0.5, -0.5) to (0.5, 0.5)?
If you map a pixel to, for example, an LED on a screen, it makes more sense for its location to be the center of the LED.
I have mixed feelings about this memo. It's right about practical aspects of resampling filters, but tries too hard to justify that with sampling theory. For example, pixel-aligned sharp edges exist and are meaningful in images, unlike perfectly square waves in sampling theory.
Still, lots of operations on images can be stated in terms of convolution (or a regularized inverse of it), so it's not like Fourier analysis is entirely useless.
Even if you were somehow able to create a perfect lens, you would not be able to create a perfectly sharp edge with real world objects.
Sometimes, pixels really are little squares. Not always, but not never, either.
It's not a question of the representation, it's a question of quanta. Pixels are data, the little squares are the artifacts your LCD eventually produces with the help of that data.
In any case, that is an extreme edge case of software renderers that doesn't come anywhere close to being a significant part of 2D graphics in real life. Indeed, most 2D graphics is really flat 3D graphics done using GPU routines, and it does not work that way. I know that some extreme edge cases do use coverage-based rasterization, but:
>You need anti-aliasing for the behaviour you are describing, which very rarely works the way you describe
This is a case of anti-aliasing (read the title of the article) and is extremely rarely used. It's essentially irrelevant when discussing how graphics work in real life.
I really cannot overstate just how rarely software rasterizers are used for interactive graphics in 2020; coverage-based rasterizers are an even smaller subset of that. It really makes a ton more sense to use a GPU rasterizer and use MSAA or oversample the whole image.
Yes, MSAA 16x is incredibly expensive on mobile devices, and it provides a worse result than a coverage-based approach. But MSAA 16x is done by an ASIC, and is simpler than coverage-based AA. It is not even close in performance: a GPU ROP trounces any programmable compute unit, because it is specialized, in-silico hardware. And in practice MSAA 8x is more than good enough, especially on mobile devices. You certainly will not notice a difference on a phone with a density of 563 dpi between MSAA 4x and 8x, let alone between 16x and coverage-based AA.
At those scales, the resolution of the phone is literally greater than the optical resolution of the optical system that is your eyes. There is no point in anything beyond MSAA 4x in reality, and a lot of people with displays in the 200 dpi range just use 2X MSAA while they could use 8X MSAA because they really can't tell the difference.
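Here is a back-of-the-envelope check of the dpi argument, as a sketch. The viewing distances are assumptions: roughly 300 mm for a phone and 600 mm for a desktop monitor. The yardstick is the usual rule of thumb that 20/20 acuity resolves about 1 arcminute, i.e. about 60 pixels per degree.

```python
import math


def arcmin_per_pixel(dpi: float, distance_mm: float) -> float:
    """Visual angle subtended by one pixel, in arcminutes."""
    pitch_mm = 25.4 / dpi
    # Exact angle of a pixel centered on the line of sight.
    angle_rad = 2 * math.atan(pitch_mm / (2 * distance_mm))
    return math.degrees(angle_rad) * 60


def pixels_per_degree(dpi: float, distance_mm: float) -> float:
    return 60 / arcmin_per_pixel(dpi, distance_mm)


# Assumed viewing distances: ~300 mm for a phone, ~600 mm for a desktop monitor.
for dpi, dist in [(563, 300), (200, 300), (200, 600)]:
    print(f"{dpi} dpi at {dist} mm: {arcmin_per_pixel(dpi, dist):.2f} arcmin/pixel, "
          f"{pixels_per_degree(dpi, dist):.0f} px/deg")
```

Under these assumptions a 563 dpi phone comes out at roughly half an arcminute per pixel, comfortably below that threshold. For a 200 dpi panel, the answer depends heavily on how close you sit.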
The final nail in the coffin is that these compute-based rasterization engines so far more or less match the performance of CPU rasterization. This is simply unacceptable when GPU direct rasterization can give results nearly indistinguishable at multiple times the performance and much less power usage. This is literally taking something done by a highly optimized, 12-7nm ASIC, and trying to do it through compute for a tiny improvement. It's absurd.
> resulting rendered image is most correctly interpreted as an array of little squares
Still nope. What matters in the end is the viewer’s eyes/brain reconstruction of the image, and given the frequency response of human eyes to typical screens at typical viewing distances, there is little if any practical difference between convolving some eye-like reconstruction filter with pixels thought of as uniform-brightness squares vs. point samples.
If you want to improve your results you’ll get much more bang for your buck from considering RGB subpixels to be point samples offset the appropriate amounts for the given physical display than you’ll get from thinking of any of them as being an area light source instead of a point.
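Here is a sketch of treating subpixels as offset point samples. It assumes a horizontal RGB stripe panel where each subpixel is a third of the pixel wide; BGR, vertical-stripe and PenTile layouts need different offsets. The scene is a hypothetical hard edge at x = 2.4. A real subpixel text renderer would also filter across neighbouring subpixels to tame the color fringes this produces.

```python
from typing import Callable

# Assumed layout: horizontal RGB stripe, each subpixel one third of the pixel
# wide, pixel x spanning [x, x + 1).
SUBPIXEL_OFFSETS = {"r": 1 / 6, "g": 3 / 6, "b": 5 / 6}


def sample_scanline(f: Callable[[float], float], width: int) -> list[tuple[float, float, float]]:
    """Point-sample a continuous intensity f(x) once per subpixel.

    Each channel is sampled at its own physical center rather than the pixel
    center, which triples the horizontal sample positions for grayscale
    content (at the cost of color fringing).
    """
    row = []
    for x in range(width):
        r = f(x + SUBPIXEL_OFFSETS["r"])
        g = f(x + SUBPIXEL_OFFSETS["g"])
        b = f(x + SUBPIXEL_OFFSETS["b"])
        row.append((r, g, b))
    return row


def edge(x: float, at: float = 2.4) -> float:
    """Hypothetical scene: a hard black-to-white edge at x = `at`."""
    return 1.0 if x >= at else 0.0


print(sample_scanline(edge, 4))
# pixel 2 comes out as (0.0, 1.0, 1.0): its red subpixel lies left of the edge
```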
Only we don't care about the data, except when they are rendered as "artifacts" on our screens.
So prioritising the abstract data is putting the cart before the horse. We only captured/generated those data to display them on our LCD.
Of course that's nonsense, because the data has context and the arrangement of those samples or pixels has a purpose.
Sometimes that purpose is to serve as a sampling of a real-world continuous image, other times it is to describe the arrangement and color of tiny little squares on an LCD screen.
Depending on the context you're working in, pixels may or may not be squares.
Therefore, even before images hit the display there is a rationale to avoid using photographic resampling techniques, just because the method of their authoring defined the meaning otherwise.
See also: The gradual evolution of font rendering techniques. Earlier versions of Windows aimed for pixel-grid snapping to produce clean, sharp edges. Newer ones introduced anti-aliasing techniques, but again made trade-offs towards sharp edges, including exploitation of the LCD display format, while the contemporary Apple rendering favored the photographic approach. With the introduction of high DPI the differences have become less pronounced, but there's still disagreement about how best to render vector fonts into pixels.
Whatever pixels "are", they are certainly tools for inducing a certain perception in the eye of the viewer, so we should go by how that works. We're used to this from color itself - no one denies that 0xffff00 is "yellow", despite the physical emission containing no yellow light, because red + green in certain proportions induces the same response in our eyes as light whose wavelength is actually yellow. So why can't we apply the term "squares" to things that our eyes see as squares even if they are not physically squares?
Open this in IrfanView, set the magnification to 100%. It looks about 75%-25% dark grey to light grey unless you really focus on it. This is exactly what you would expect if you conceptualize pixels as points.
So no, a column of white pixels next to a column of black pixels does not look perfectly sharp. If it did, the image above would look perfectly black and white, with no grey. And by the time I approach the pixels enough to see perfect black vs perfect white, I also notice the black gaps between pixels, and very soon the sub-pixels themselves!
This is also why you don't see individual sub-pixels, by the way.
Your eyes largely cannot distinguish the individual squares that pixels are if you are using a non-ancient screen. Your eye does not form an image of the square pixel. It largely loses them as they blend into each other to form an image, in which pixels are much, much closer to points than they are to squares.
Indeed, an edge that goes from 0 to 100 can be considered as part of a wave of twice the frequency but the same amplitude as compared to an edge that goes from 0 to 200. Which is, by the way, why increasing the contrast in an image, especially micro-contrast, in practice increases resolution.
If you take a single point sample, then slowly moving objects will appear to jump an entire pixel at a time, looking awful.
If you antialias, then movement will look smooth, but you'll also notice that when you align to the pixel squares the edge will preserve its sharpness better.
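Here is a toy 1D illustration of that, for a single pixel covering [0, 1) and an object that covers everything to the left of a moving edge. The point sampler lights the pixel only once the pixel center is covered. The box (area) sampler reports the covered fraction. Function names are made up for the example.

```python
def point_sample(edge_pos: float) -> float:
    """Pixel [0, 1) sampled once at its center: lit only if the center is covered.

    The object covers everything left of edge_pos.
    """
    return 1.0 if 0.5 < edge_pos else 0.0


def box_sample(edge_pos: float) -> float:
    """Pixel [0, 1) box-filtered: the fraction of the pixel that is covered."""
    return min(max(edge_pos, 0.0), 1.0)


# Move the edge across the pixel by a tenth of a pixel per frame.
for frame in range(11):
    e = frame / 10
    print(f"edge at {e:.1f}: point {point_sample(e):.1f}  box {box_sample(e):.2f}")
```

The point-sampled value sits at 0 and then jumps straight to 1, while the box-filtered value ramps smoothly. When the edge lands exactly on a pixel boundary (0 or 1), the box filter still yields a pure 0 or 1, which is the preserved sharpness mentioned above.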
You have to be really careful when you're applying wave equations to resolution, especially when declaring that a certain number of samples fully captures an image.
If you want to display a perfect image with point samples, you may need to go as far as 10x the 'retina' density.
Does that mean the lines don't exist?
>looks perfectly sharp, as sharp as a black sheet of paper
They don't. Pixels are so small that they start looking like (or are well into looking like) samples that are used to form an image. If pixels "looked perfectly sharp, as sharp as a black sheet of paper", this wouldn't be the case.
As far as your eyes are concerned, the sharp lines might as well be blurry waves of an appropriate contrast. So in an image sense, the lines don't really exist anymore; all that exists is a blurry brightness function, not infinitely sharp lines.
However, the edge of my window happens not to be green, so it doesn't actually align with the elements of my LCD :) I just so happen not to notice it, because my eyes don't have individual pixel resolution, almost as if there was a low-pass filter of the order > 2 lambda... Food for thought!
Or reformulated: On a low-resolution LCD, an axis-aligned square will appear much sharper than a square drawn at any other angle.
While yes, 6000x4000 is a lot, monitors with higher resolutions are already coming out, so it's relevant right now. In practice, the fraction of images taken at an actual resolution so high that they have to be downscaled on a 6K or 8K monitor is exceedingly thin. Even with a Sony A7RIV, an insane camera, and mind-bogglingly sharp lenses, most of your pictures will not reach the level where you can create a truly sharp line of frequency higher than that between two pixels. That is once you account for the Bayer filter, for sampling (which is very real in photography, as moiré shows), and for depth of field, optical aberrations or motion blur.
So while it is often true, this is increasingly not the case.
In short, it's wrong. You can model an image as an array of point samples - however these are not "pixels".
But it was in the context of old CRT and Sony Trinitron monitors! I was wondering what it'd say about LCD screens, but the memo is from 1995, and the first standalone LCDs only appeared in the mid-1990s and were expensive.
What it says about CRT electron beams no longer applies, but I'm guessing this still does:
> The value of a pixel is converted, for each primary color, to a voltage level. This stepped voltage is passed through electronics which, by its very nature, rounds off the edges of the level steps
> Your eye then integrates the light pattern from a group of triads into a color
> There are, as usual in imaging, overlapping shapes that serve as natural reconstruction filters
This is, for LCDs, usually an array of little squares... sort of (probably more accurately described as an array of little rectangles of different color). Things get more complicated when you start talking about less traditional subpixel arrangements like PenTile, or the behavior of old CRTs (where you don't necessarily have fully discrete pixels at all).
I wonder if there have been any experiments constructing displays with optical filters to provide better reconstruction. I guess the visual analogue would be image upscaling, and in that sense the reconstruction that LCDs etc. provide would be comparable to nearest-neighbor scaling (which generally sucks).
But for graphics, why should this be the goal? Perhaps a better goal is high contrast of edges, and for this the box filter is one of the very best. An additional advantage of the box filter is that its weights are never negative, so there's no clipping beyond white and black. This is especially helpful when rendering text.
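Here is a small sketch of the clipping point. It resamples a hard 0-to-1 step at half-pixel positions with two normalized 4-tap kernels. One is a box, which is never negative. The other is Lanczos with a = 2, chosen here as a stand-in for the sinc-approximating filters. Because the box weights are non-negative and sum to 1, every output is a convex combination of inputs and stays in [0, 1]. The negative lobes of Lanczos produce values slightly below 0 and above 1 next to the edge.

```python
import math
from typing import Callable


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def lanczos2(x: float) -> float:
    """Lanczos kernel with a = 2 (negative lobes for 1 < |x| < 2)."""
    return sinc(x) * sinc(x / 2) if abs(x) < 2 else 0.0


def box(x: float) -> float:
    """Box kernel: never negative."""
    return 1.0 if abs(x) < 1 else 0.0


def resample_half_pixel(samples: list[float], kernel: Callable[[float], float]) -> list[float]:
    """Reconstruct `samples` at x = i + 0.5 using the 4 taps i-1 .. i+2.

    Weights are normalized to sum to 1; only positions where all taps exist
    are evaluated, so no border handling is needed.
    """
    out = []
    for i in range(1, len(samples) - 2):
        x = i + 0.5
        taps = range(i - 1, i + 3)
        weights = [kernel(x - j) for j in taps]
        total = sum(weights)
        out.append(sum(w * samples[j] for w, j in zip(weights, taps)) / total)
    return out


step = [0.0] * 6 + [1.0] * 6  # a hard black-to-white edge
for name, k in [("box", box), ("lanczos2", lanczos2)]:
    vals = resample_half_pixel(step, k)
    print(f"{name:9s} min {min(vals):+.3f}  max {max(vals):+.3f}")
```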
And honestly I believe that those huge sinc-approximating reconstruction filters are overly fetishized even in the audio space. The main reason they sound "nearly perfect" is that the cutoff is safely outside the audible range. Try filtering a perfect slow sawtooth through a brick wall with a cutoff in the audio range, say 8 kHz. It sounds like a very audible "ping" at that frequency, and with pre-echo at that, because of the symmetry of sinc.
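The effect described is the Gibbs phenomenon. For a periodic signal it can be reproduced without any audio code, because an ideal brick-wall low-pass simply drops every Fourier harmonic above the cutoff. The sketch below truncates the Fourier series of a sawtooth. The harmonic count N stands in for the cutoff: for a fundamental f0, an 8 kHz cutoff would mean N = 8000 / f0.

```python
import math


def bandlimited_sawtooth(t: float, harmonics: int) -> float:
    """Sawtooth f(t) = t on (-pi, pi), keeping only the first `harmonics` terms.

    Dropping every term above the cutoff is exactly what an ideal
    brick-wall low-pass filter does to a periodic signal.
    """
    return 2 * sum((-1) ** (k + 1) * math.sin(k * t) / k for k in range(1, harmonics + 1))


N = 100  # harmonics kept below the cutoff
# Sample just before the jump at t = pi, where the ringing is concentrated.
ts = [math.pi * (0.8 + 0.2 * i / 2000) for i in range(2000)]
peak = max(bandlimited_sawtooth(t, N) for t in ts)
print(f"ideal value approaches {math.pi:.4f}, band-limited peak {peak:.4f}")
print(f"overshoot as a fraction of the 2*pi jump: {(peak - math.pi) / (2 * math.pi):.3f}")
```

The peak overshoot stays at roughly 9% of the jump no matter how large N gets; raising the cutoff only squeezes the ripples closer to the edge. The ripples appear on both sides of the jump, including before it, which is the pre-echo.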
AFAIK this would come up in text rendering, where the glyphs and strokes inevitably will not align with the pixel grid, but you would know better about that.
Digital audio sampling rates and resolution are far higher than the limits of perception, so there's sufficient resolution to shift by e.g. half a wavelength for noise cancelling or by fractions of a wavelength for beamforming (at least, if you use the highest sample rates and resolution supported by your equipment, rather than the CD audio standard default settings).
The spatial resolution of computer graphics does not yet have such comfortable headroom. In the best cases, displays have caught up to normal human visual acuity, but usually not to vernier acuity. Once displays outpace eyeballs to the same extent that soundcards outpace ears, it will be possible to shift an image by 1 mm and have it remain as sharp-looking as the original, because a sharp edge smeared across several pixels will still be sharp enough to look "perfectly sharp" to a human.
There are other ways to do it, but they generally have a lot in common.
The problem comes when you try to align raster images with vector ones. Instead of starting a line at 0,0 you need to start it at 0.5,0.5. And heaven help you if you're combining raster images at different resolutions; they'll never line up.
The proper way to work is to put the pixel center at 0,0 and let it extend from -0.5 to 0.5. This works out well with reconstruction or resizing filters, because they're symmetric around 0,0 too.
One common mistake is to not think in floating-point coordinates all the way through. E.g. a rectangle that covers a single pixel should have coordinates like (100.0,100.0)-(101.0,101.0), NOT involving a 0.5 offset. You almost never offset anything by 0.5 in this convention. 1-pixel-wide lines are an exception, but only because then the edges of the line are exactly at pixel boundaries.
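Here is a sketch of box-filter coverage under that convention, where pixel (px, py) is the unit square [px, px+1] x [py, py+1]. The helper names are made up, and only axis-aligned rectangles are handled.

```python
def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def coverage(rect: tuple[float, float, float, float], px: int, py: int) -> float:
    """Fraction of pixel (px, py) covered by rect = (x0, y0, x1, y1).

    Pixel (px, py) is the unit square [px, px+1] x [py, py+1], so no 0.5
    offsets appear anywhere.
    """
    x0, y0, x1, y1 = rect
    return overlap(x0, x1, px, px + 1) * overlap(y0, y1, py, py + 1)


def hline(y_center: float, width: float, x0: float, x1: float) -> tuple[float, float, float, float]:
    """Rectangle for a horizontal line of the given width."""
    return (x0, y_center - width / 2, x1, y_center + width / 2)


single = (100.0, 100.0, 101.0, 101.0)
print(coverage(single, 100, 100), coverage(single, 101, 100))  # 1.0 0.0

# The 1-pixel-line exception: only a half-integer center fills exactly one row.
crisp = hline(100.5, 1.0, 0.0, 200.0)
blurry = hline(100.0, 1.0, 0.0, 200.0)
print([coverage(crisp, 50, y) for y in (99, 100, 101)])   # [0.0, 1.0, 0.0]
print([coverage(blurry, 50, y) for y in (99, 100, 101)])  # [0.5, 0.5, 0.0]
```

A 1-pixel-wide line centered on a half-integer has its edges on pixel boundaries and fills one row. The same line centered on an integer is split 50/50 across two rows.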
There have been some papers that do this though. I can't find the reference but I know it exists.
Can anybody confirm that they investigated that deeply and found that to be true, or is there another explanation?
Hard to disagree, really.