
A Pixel Is Not a Little Square (1995) [pdf] - rinesh
http://alvyray.com/Memos/CG/Microsoft/6_pixel.pdf
======
fsloth
A little bit of background for the paper and a brief summary might be
beneficial:

The author is Alvy Ray Smith. He's one of the pioneers of computer graphics
([http://en.wikipedia.org/wiki/Alvy_Ray_Smith](http://en.wikipedia.org/wiki/Alvy_Ray_Smith)).

As the author states in the introduction: "If you find yourself thinking that
a pixel is a little square, please read this paper."

To get from the point samples to the squares interpretation, you need to read
a little further:

"Sampling Theorem tells us that we can reconstruct a continuous entity from
such a discrete entity using an appropriate reconstruction filter"

This might be proper background reading before or after reading the paper:

[http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_samplin...](http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem)

~~~
k__
The problem most people have here is probably that they don't sample anything.

~~~
tacotime
Yeah at least here in America, bet you can't develop just one metabolic
condition _nudge_ _nudge_ am I right?

------
sehugg
Since this was written in 1995, pixels didn't map 1-to-1 to phosphors on the
CRT. Now, with LED and LCD displays, they do. Not that this invalidates the
paper, but it certainly adds some context.

~~~
darkmighty
This is actually why I believe CRTs look better at a given resolution.
The square reconstruction filter has quite a poor frequency response. If you
look from far enough away (or at a good enough resolution), the cognitive low-
pass filter applied when you see the image will mitigate the problem, but up
close it's quite bad. The smooth fall-off of a CRT pixel's "point spread
function" more closely resembles the ideal sinc filter for reconstruction of
bandlimited signals. The square window has nasty zeros and decays very slowly
in frequency.
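
A minimal numpy sketch of that claim (the Gaussian spot width below is an
illustrative assumption, not a measured CRT value): the box kernel's spectrum
is a sinc, with nulls and slowly decaying sidelobes, while a Gaussian spot's
spectrum falls off fast with no sidelobes.

    import numpy as np

    # Compare the frequency response of a box (square-pixel) reconstruction
    # filter against a smooth Gaussian spot standing in for a CRT.
    x = np.linspace(-8, 8, 4097)
    dx = x[1] - x[0]

    box = (np.abs(x) <= 0.5).astype(float)
    box /= box.sum() * dx                    # unit area
    sigma = 0.4                              # assumed spot width, in pixels
    gauss = np.exp(-x**2 / (2 * sigma**2))
    gauss /= gauss.sum() * dx                # unit area

    f = np.fft.rfftfreq(x.size, d=dx)
    H_box = np.abs(np.fft.rfft(box)) * dx    # ~ |sinc(f)|
    H_gauss = np.abs(np.fft.rfft(gauss)) * dx

    # Sidelobe peaks of the box response decay only like 1/f;
    # the Gaussian response is already negligible there.
    for fi in (1.5, 2.5, 3.5):
        i = np.argmin(np.abs(f - fi))
        print(f"f={fi}: box {H_box[i]:.4f}  gauss {H_gauss[i]:.2e}")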

I think manufacturers don't have a solution to this because the light-
producing elements inherently produce light across a small surface. They can't
make this surface much smaller (and then apply a good blurring filter), since
that would directly reduce brightness. And even if they could, it would
probably be more costly than simply making a larger number of smaller pixels
(which also raises resolution).

I'm sure a lot of manufacturers are not even mindful of these issues, though,
and I suspect a compromise would be possible with some R&D.

EDIT: It's interesting that the same phenomenon shows up in the time domain as
well. Some high-end monitor manufacturers are starting to offer strobed high-
frequency displays that don't exhibit the poor response of a window
reconstruction. With a high enough refresh rate, the time constant of our
visual system is enough to provide a good reconstruction. The difference can
be seen in this video:
video:
[https://www.youtube.com/watch?v=zjTgz9byxuo](https://www.youtube.com/watch?v=zjTgz9byxuo)

~~~
userbinator
_This is actually the reason I believe CRTs look better for a given
resolution._

Having used a CRT for _many_ years and then switching to an LCD, I disagree;
one of the major "wow" moments for me upon looking at an LCD for the first
time (at native resolution) was how sharp and crisp everything was compared to
the ill-defined blurriness of the CRTs I'd used before. I was finally able to
see pixels as "little squares" (even their corners were discernible) and not
the vague blobs they were before. Trying to use a CRT again for anything but
picture-viewing (which naturally doesn't require the same level of sharpness)
feels like my eyes can't focus and I need new glasses.

(That video was terribly nausea-inducing; definitely needs a warning.)

~~~
tacotime
I'm with you on everything except N64 games. I don't know what it is exactly,
but some of the magic is just lost when all of the polygons come into clean,
crisp focus on the screen. Legend of Zelda, Mario Kart, Super Smash: all
reasons I still haven't thrown out my seemingly 500-pound CRT. Maybe I should
just try to induce artificial blur so I can finally rid myself of that beast.

------
dsjoerg
I skimmed it and failed to find an explanation of what problems are caused by
thinking of a pixel as a little square. Anyone?

~~~
gaze
It's exactly the same problem as is caused by thinking of a digital audio
signal as a series of steps, like
[http://upload.wikimedia.org/wikipedia/commons/thumb/1/15/Zer...](http://upload.wikimedia.org/wikipedia/commons/thumb/1/15/Zeroorderhold.signal.svg/2000px-Zeroorderhold.signal.svg.png). This too is completely wrong. A digital image
DOES NOT DEFINE the space between the discrete points, just as a sampled sound
does not define the level between samples.

It's why signal processing folks will insist on plotting sampled signals as
lollipop plots, NOT step (aka zero-order-hold) signals.
[http://www.ling.upenn.edu/courses/ling525/swave2.gif](http://www.ling.upenn.edu/courses/ling525/swave2.gif)

The reason is that the lollipop plot appropriately reflects an ambiguity in a
sampled signal: anything at all could happen between two samples, but all you
see is the value at each sample point. Stated another way, any curve that goes
through those green circles could have been the original sampled signal. I
might also add that the zero-order hold isn't even a minimal-energy or GOOD
interpolation of the original signal. Typically when sampling, you band-limit
the input to the Nyquist frequency. That way a PERFECT reconstruction can be
done. Yes, I know that's weird: if you band-limit a signal and discretely
sample it, you may reconstruct it PERFECTLY. The perfect reconstruction may be
generated through the Whittaker–Shannon interpolation formula. A zero-order
hold will interpolate the signal in a way that creates tons of harmonics
outside the band of interest. It's wrong. It's the wrong way to think about
sampling.
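
A quick numpy sketch of both reconstructions (sample rate and frequencies are
arbitrary choices for illustration):

    import numpy as np

    # Reconstruct a bandlimited signal from its samples two ways:
    # zero-order hold (the staircase) vs. Whittaker-Shannon sinc
    # interpolation: x(t) = sum_n x[n] * sinc(fs * (t - n/fs)).
    fs = 8.0                                   # sample rate (Nyquist = 4 Hz)
    t_s = np.arange(0, 4, 1 / fs)              # sample instants
    t = np.linspace(0, 4, 2001)                # dense "continuous" axis

    def signal(t):
        return np.sin(2 * np.pi * t) + 0.5 * np.sin(2 * np.pi * 2.5 * t)

    x_s = signal(t_s)

    # Zero-order hold: each sample held until the next one.
    zoh = x_s[np.minimum((t * fs).astype(int), x_s.size - 1)]

    # Whittaker-Shannon: sum of shifted, sample-weighted sincs.
    ws = np.sinc(fs * (t[:, None] - t_s[None, :])) @ x_s

    mid = (t > 0.5) & (t < 3.5)                # ignore window-edge truncation
    print("max ZOH error :", np.abs(zoh - signal(t))[mid].max())   # large
    print("max sinc error:", np.abs(ws - signal(t))[mid].max())    # small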

~~~
davrosthedalek
It's not quite the same, and here is why: in audio, the samples from an ADC
are generally very close to point samples indeed. Not quite, because ADCs
cannot sample the input over an infinitesimally small time. Or they really are
point-like if generated algorithmically.

However, this is not true for pixels. The meaning of the pixel values in a
source image is arbitrary. It can be point-like, but I would argue it mostly
isn't.

Pixels in a display device /never/ are point-like. They always fill an area.
That's just a physical reality.

We also cannot use a simple band-pass to limit the (spatial) frequency
content. Take the video linked in the sister comment and assume that t is a
spatial direction.

A pattern like __^^^___^^^___^^, that is, black-white-black-white..., would
show ringing where the slopes don't match pixel borders. What do we do
instead? Either align it, or integrate over the pixel area; i.e., the pixels
overlapping the slopes would get grey values.
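
A minimal sketch of that integrate-over-the-pixel-area idea (the stripe
pattern and the 10x supersampling factor are made up for illustration):

    import numpy as np

    # Area-average a hi-res black/white stripe pattern down to pixels.
    # Wherever a stripe edge falls inside a pixel, the pixel gets an
    # intermediate grey instead of ringing.
    hires = np.zeros(90)
    hires[15:33] = 1.0          # stripe edges deliberately off the pixel grid
    hires[48:66] = 1.0

    pixels = hires.reshape(-1, 10).mean(axis=1)  # 10 hi-res samples per pixel
    print(pixels)               # [0.  0.5 1.  0.3 0.2 1.  0.6 0.  0. ]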

(Edit: realized parent post didn't link the video)

~~~
chipsy
With pixel data, we have low enough resolutions that our thinking is biased
towards the display device the majority of the time; we can author "pixel art"
that expects the results to look square, or very close to square. And when we
resize that art, we don't interpolate the color values (unless we want it to
look like shit); we use a simple drop-sample or nearest-neighbor. But when we
resize photographic images, suddenly we have a reason to do other things. Take
for example this Stack Overflow answer on Lanczos resampling. [0] It describes
exactly what you say we can't and don't use, but it is actually used every day
in Photoshop: a filter kernel that produces ringing artifacts on black-white
patterns. As I allude to with the pixel art, it depends on the source content.
When we get an image off a camera, it's sampled in a way that assumes the
pixels are point impulses, and thus should be processed "as if" they are
convolutions of the sinc function, even though the data itself is "just"
points. Digital photographs always look blurry or noisy when examined pixel by
pixel, and only cohere to our eyes when viewed from a distance. For similar
reasons, pixel artists can't downsample and process arbitrary photos and pass
them off as legitimate pixel art, because the original content isn't designed
towards a display of square or near-square pixels.

[0] [http://gis.stackexchange.com/questions/10931/what-is-lanczos...](http://gis.stackexchange.com/questions/10931/what-is-lanczos-resampling-useful-for-in-a-spatial-context)
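
To see the ringing concretely, here's a small numpy sketch of Lanczos
interpolation across a hard black-to-white edge (this is the generic
windowed-sinc kernel, not a claim about Photoshop's exact pipeline):

    import numpy as np

    def lanczos(x, a=3):
        # Lanczos kernel: sinc(x) * sinc(x/a) for |x| < a, else 0.
        x = np.asarray(x, dtype=float)
        return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

    samples = np.array([0.0] * 8 + [1.0] * 8)   # a hard edge
    pos = np.arange(samples.size)

    t = np.linspace(3, 12, 37)                  # resample at 4x density
    out = np.array([np.sum(samples * lanczos(ti - pos)) for ti in t])

    # Values below 0 and above 1 near the edge are the ringing
    # (undershoot/overshoot) described above.
    print("min:", out.min(), "max:", out.max())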

------
Camillo
Nowadays, a digitized image comes with a color profile that describes the
source device's color model. This is used to maintain accurate color
reproduction as the image is processed and output again.

Shouldn't there also be a "sampling profile" that describes what sort of
filter was used when the image was originally sampled? That would help
programs choose the best resampling method when processing the image.

------
davrosthedalek
Stopped reading at "A pixel is a point sample." No, it's not. Otherwise we
wouldn't have sub-pixel font rendering and hundreds of anti-aliasing filter
modes for graphics cards. The pixel is an /area/ element of your image. And
yes, many times, it's a square.

~~~
gumby
> Stopped reading at "A pixel is a point sample."
> ...yes, many times, it's a square.

While "appeal to authority" is generally a bad way to _end_ a discussion, I
will start by pointing out that this is Alvy Ray Smith writing, and he was
writing in 1995. So while you could end up disagreeing with him (though I hope
you wouldn't), you might consider _why_ he was writing. Your comment shows
that this paper is just as applicable today.

(Since you stopped on page two, you never saw the several examples in the
article of why this matters, including one on scanning and one on
pre-rendering geometric computation.)

Those graphics cards you refer to reflect Smith's influence on their design.
You have sub-pixel rendering today _because_ pixels aren't squares. This was
computationally prohibitive in 1995.

The TL;DR of this brief paper is: "I/O is really, really hard; accepting
shortcuts and defaults only kinda works, and here's why." You benefit from a
lot of work done in the last 20 years to improve the quality of the libraries
and hardware available to you, but still, "photo-realistic" graphics aren't.

By the way, rendered pixels weren't squares in those days (they were not-
always-circular dots on a screen which itself wasn't flat), and they aren't on
today's flat LCDs (you can think of them as rectangular, though the color
elements do not present uniformly through the area), and they sure as hell
aren't on any device with lenses or mirrors (including projectors and
headsets).

~~~
Stratoscope
A couple of things in your interesting comment were a bit confusing to me.
Since you obviously know your stuff, it may be just a difference in
terminology that threw me off. So let me add a couple of notes that hopefully
may help clarify for others.

> You have sub pixel rendering today because pixels aren't squares.

Another way to put this is that color LCD panels don't have pixels at all. The
only thing they have is red, green, and blue "subpixels" that each have a 1:3
aspect ratio. So you could take any three of those subpixels in a row and call
them a "pixel".

Operating systems were traditionally written without any knowledge of these
individual subpixels, and addressed the display on a whole-pixel basis. It's
up to the display hardware to figure out how to map colors onto the physical
device.

But if you know you're rendering on an LCD with the 1:3 subpixels, _and_ you
know the order of those subpixels (usually RGB, sometimes BGR), then you can
give each "pixel" a color that lights up the individual subpixels as you want.

As an aside, in CSS you can think of a color value like "rgb(10,20,30)" as
addressing the individual subpixels, assigning 10, 20, and 30 to the red,
green, and blue subpixels. You just don't know how those physical subpixels
are laid out on the display, so you can't really do subpixel rendering in CSS.

But if the OS knows what order the subpixels are in, it can take advantage of
that. Even though the display hardware/firmware presents an abstraction of
"pixels", the OS still provides individual R, G, and B values for each "pixel"
- thus addressing, in a roundabout way, the individual subpixels.

That's what makes subpixel rendering possible on displays that don't directly
expose the individual subpixels.
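
As a toy illustration of the packing (real renderers such as ClearType also
filter the result to tame color fringes; that step is omitted here):

    import numpy as np

    # Subpixel rendering sketch: a glyph coverage mask computed at 3x the
    # horizontal pixel resolution, with each run of three hi-res columns
    # packed into one pixel's R, G, B channels (assuming RGB stripe order).
    hires_coverage = np.array([
        [0, 0, 1, 1, 1, 0, 0, 0, 0],   # 9 hi-res columns -> 3 pixels wide
        [0, 1, 1, 0, 1, 1, 0, 0, 0],
    ], dtype=float)

    h, w3 = hires_coverage.shape
    rgb = hires_coverage.reshape(h, w3 // 3, 3)  # last axis = (R, G, B)
    print(rgb)   # each channel value drives the matching physical subpixel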

> rendered pixels weren't squares in those days (they were not-always-circular
> dots on a screen which itself wasn't flat)

The circular dots on a CRT aren't pixels and aren't directly related to pixels
at all. Assuming a traditional non-Trinitron CRT, you have three electron guns
for R, G, and B. The beams from these guns go through a shadow mask with
little holes in it, and then the beams hit phosphor dots on the screen. The
shadow mask allows each beam to light up only the phosphor dots for that
color.

As the electron beams sweep across the CRT, each horizontal sweep corresponds
to a row of pixels, and the three beams are modulated brighter or dimmer for
each individual pixel. But there's no relationship between the size and
location of these "pixels" and the actual shadow mask holes or phosphor dots
on the screen. CRT displays could run in multiple resolutions but obviously
the phosphor dots couldn't move to line up with the pixel locations for each
resolution.

Trinitron displays were a bit different, using phosphor stripes instead of
dots and an aperture grille (tiny vertical wires under tension) instead of a
shadow mask with holes in it, but the principle was the same: the phosphor
stripes didn't correspond at all to pixel locations.

------
j2kun
Very interesting article. He does say some strange things, though, like "the
device integrates over [blah]." Is there some natural phenomenon allowing it
to compute the integral without resorting to approximating it by essentially
using little squares? (perhaps the little squares are not directly related to
squares on the image surface, but it's not entirely clear to me that they
aren't just a few steps removed from that)

~~~
fsloth
We are discussing the mathematical model of the system that has been found
most apt for the practical problems that arise in image processing.

When he says "the device integrates over..." what he means is that we can
model the display device as a reconstruction filter integrating over the point
sample lattice.

Remember, we are discussing the mathematical model of image processing.

In the purely physical model the pixels in the memory are probably a linear
array of n-bit numbers, that drive voltages over the illuminants of the
display. However, the "linear array of n-bit numbers in a framebuffer" is a
really awkward model to start discussing e.g. image processing algorithms.

The point sample model, as the author tries to persuade, provides a simple
formalism for a lot of practical problems and is also, from the point of view
of sampling theory, the correct model.
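
In symbols, the displayed intensity is I(x) = sum_n p[n] * h(x - n): the
sample lattice convolved with the device's reconstruction kernel. A tiny 1D
sketch (the triangle kernel here is an arbitrary stand-in for the device's
actual spot shape):

    import numpy as np

    def h(x):
        # triangle kernel = linear interpolation, purely for illustration
        return np.maximum(1.0 - np.abs(x), 0.0)

    p = np.array([0.0, 0.2, 1.0, 0.4, 0.0])  # point samples on lattice 0..4
    n = np.arange(p.size)

    x = np.linspace(0, 4, 9)                 # positions the "device" lights up
    I = np.array([np.sum(p * h(xi - n)) for xi in x])
    print(I)    # [0.  0.1 0.2 0.6 1.  0.7 0.4 0.2 0. ]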

~~~
wtallis
This paper only mentions integration in the context of _sensors_, not
displays. So it's pretty straightforward: the scanner/camera/eyeball is
integrating photons over time and over the surface of the sensor. None of
those are inherently discretized by little squares, because even the
photoreceptors on a chip aren't perfect rectangles and are subject to optical
effects like things being imperfectly focused.

------
bajsejohannes
I can see why you should look at pixels as point samples when resampling a
picture.

However, if I'm sampling, say, a fractal to make a picture, I can't see a
better way than to sample within the square that each pixel provides (assuming
it's grayscale). Should I do a Gaussian sample instead?

~~~
fsloth
"Should I do a gaussian sample instead?"

There are a ton of sampling filters. The main metric of quality for them is
how they look (or rather, how the image looks after being processed by them).
So if the square looks nice, there's absolutely nothing wrong with it.

However, if you feel like geeking out: as resampling filters go, the last time
I looked, the "best" by some empirical estimate was the Mitchell-Netravali
filter (they did a "scientific" sampling, asking a bunch of computer graphics
people what they thought looked best).

There's a paper
[http://www.cs.utexas.edu/~fussell/courses/cs384g-fall2013/le...](http://www.cs.utexas.edu/~fussell/courses/cs384g-fall2013/lectures/mitchell/Mitchell.pdf)

and an entry in the German (no English page!) Wikipedia:

[http://de.wikipedia.org/wiki/Mitchell-Netravali-Filter](http://de.wikipedia.org/wiki/Mitchell-Netravali-Filter)

It gives the brief definition.
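
Since the definition doesn't survive the link here, a sketch of the kernel
(this is the standard two-parameter cubic from their paper; B = C = 1/3 is the
setting their study favored):

    import numpy as np

    def mitchell_netravali(x, B=1/3, C=1/3):
        # Piecewise-cubic Mitchell-Netravali kernel, supported on |x| < 2.
        x = np.abs(np.asarray(x, dtype=float))
        near = ((12 - 9*B - 6*C) * x**3
                + (-18 + 12*B + 6*C) * x**2
                + (6 - 2*B)) / 6
        far = ((-B - 6*C) * x**3
               + (6*B + 30*C) * x**2
               + (-12*B - 48*C) * x
               + (8*B + 24*C)) / 6
        return np.where(x < 1, near, np.where(x < 2, far, 0.0))

    print(mitchell_netravali([0.0, 0.5, 1.0, 1.5]))
    # B=1, C=0 gives the blurry cubic B-spline; B=0, C=0.5 gives the
    # sharper, ringing-prone Catmull-Rom interpolating cubic.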

~~~
jacobolus
The best resource about this I know of on the web is:
[http://www.imagemagick.org/Usage/filter/](http://www.imagemagick.org/Usage/filter/)
and also see
[http://www.imagemagick.org/Usage/filter/nicolas/](http://www.imagemagick.org/Usage/filter/nicolas/)

The Mitchell–Netravali paper is definitely not the last word on the subject.

~~~
bajsejohannes
Thanks! (And thanks for the other replies as well)

This resource is the most interesting to me.

As I got a lot of answers about zooming and resampling, I'll try to restate my
original question:

What I have is some mechanism to sample something as much and in as much
detail as I want. This is why I mentioned a fractal. I don't want to zoom or
resample, just find the optimal way to sample the original data to construct,
say, a 500x500 pixel image. Up until I read this article, I would just sample
randomly in the [0.0, 0.0]-[1.0, 1.0] space and average out the values. If
there's a better way, I'd love to learn it.

The best answer might just be fsloth's "the main metric of quality for them is
how they look."

~~~
jacobolus
> _Up until I read this article, I would just sample randomly in the [0.0,
> 0.0]-[1.0, 1.0] space and average out the values._

What does it mean to sample “random” points in a fractal? In practice you’re
probably sampling in terms of numbers at some specific precision (e.g. the
limit of your floating point type for the range in question). This is still
going to be points in some particular discrete lattice. Depending on the way
the fractal is constructed, this could bias your picture.

Again, I’m not sure that the color of an “area” is meaningful for a fractal.
But yeah, I think fsloth has the right idea: if the purpose of the images is
to be pretty, then do it in whichever way gets you results you prefer.

~~~
ska
Techniques like the chaos game are used to sample according to the associated
fractal measure, which addresses the issue of bias. Issues of floating point
representations and the like are a separate problem, but you can address them
also.

------
pixel
I can verify that I am not a little square.

------
enthdegree
Great read! Very similar ideas apply for image capture:

[http://enthdegree.tumblr.com/post/102480374444/a-model-for-i...](http://enthdegree.tumblr.com/post/102480374444/a-model-for-image-capture)

Under this model, a pixel can be interpreted as both a point-sample and a
square.

------
kejaed
A link to the last time this was discussed on HN:
[https://news.ycombinator.com/item?id=1472175](https://news.ycombinator.com/item?id=1472175)

------
agumonkey
I knew the name was familiar
[http://alvyray.com/Pixar/PixarHistoryRevisited.htm](http://alvyray.com/Pixar/PixarHistoryRevisited.htm)

------
arielby
LCD and LED displays _are_ made of little coloured rectangles, and many
programs are written for them.

~~~
fsloth
We get to the display device after the display pipeline has transformed the
source image. The author is discussing the best way to think about and model
the source image.

The best model for the source image is a lattice of point samples if you want
to manipulate it in any way.

------
marze
Yawn. Steve Jobs was right to fire this guy; you don't need anything more than
this paper to see that.

