Hacker News
Understanding Image Sharpness (2003) (normankoren.com)
138 points by brudgers 62 days ago | 32 comments



Oh wow, this is my father's site! Will try to get him here to answer questions, if anyone is interested.


Thanks for sharing his inspirational story in this old thread: https://news.ycombinator.com/item?id=7540832


I’m curious what he’d say about anti-aliasing for digital imagery, if anything. In CG, there seems to be a difficult balance between sharpness and antialiasing. If you try to get too close to the Nyquist limit, you suffer more aliasing, and to get rid of all signs of aliasing, you have to compromise on sharpness. Personally, I’ve started to prefer using a Gaussian filter, but many people in digital imaging feel it’s way too blurry. My theory is that we are pretty good at seeing sub-pixel detail even in frequencies that are well (e.g. 2x) below the Nyquist limit, so having something less sharp than ideal only compromises the aesthetic, not our ability to see the details of what’s there.

I did a video project a long time ago in old 480i NTSC and found that filtering vertically twice as much as should be called for improved my ability to see fine details (that is, filtering as if the vertical image resolution is 240). I realize that motion and interlaced video is super different than still photography, but my lesson there has been influencing all my filtering decisions ever since...


So, I'm assuming you know this, but the theoretically optimal pixel reconstruction filter is the sinc function. https://en.wikipedia.org/wiki/Sinc_filter For very high-quality work, I guess we have enough processing power to approximate sinc quite well these days. Is that what you mean by trying to get close to the Nyquist limit? The sinc filter does tend to introduce ringing artifacts.

Very high resolutions are now practical, too, which changes things a lot. Probably in favour of using filters which never introduce ringing artifacts, like the Gaussian.

(And things like the Lanczos or Mitchell filters exist as compromises between sinc and Gaussian.)
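A quick way to see the tradeoff (a sketch of mine, not from the thread): a Gaussian kernel is non-negative everywhere, while a Lanczos window, being a windowed sinc, keeps negative lobes, and those lobes are exactly what produces ringing:

```python
import numpy as np

def gaussian(x, sigma=0.5):
    # Non-negative everywhere: cannot overshoot/undershoot (no ringing)
    return np.exp(-x**2 / (2 * sigma**2))

def lanczos(x, a=3):
    # Windowed sinc: negative lobes remain, so some ringing survives
    out = np.sinc(x) * np.sinc(x / a)
    return np.where(np.abs(x) < a, out, 0.0)

x = np.linspace(-3, 3, 601)
print(gaussian(x).min() >= 0)   # Gaussian: no negative lobes
print(lanczos(x).min() < 0)     # Lanczos: negative lobes exist
```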


Yeah, I know about sinc and the ideal theory, but nobody uses sinc for production work. Your filter has to be too wide, and the ringing artifacts are too undesirable.

The max sharpness in a digital image, the Nyquist limit, is 1 cycle across 2 pixels. It’s not possible to get any sharper than that. But actually rendering images with frequencies exactly at that limit generally causes aliasing artifacts of some kind. That’s what I mean about getting close to Nyquist.
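A tiny numpy illustration (mine, not the parent's) of why rendering right at the limit is fragile: a cosine at exactly 1 cycle per 2 pixels either survives at full contrast or vanishes entirely, depending only on its phase relative to the pixel grid:

```python
import numpy as np

n = np.arange(16)               # pixel indices
f = 0.5                         # cycles per pixel: exactly Nyquist

aligned = np.cos(2 * np.pi * f * n)              # samples hit the peaks
shifted = np.cos(2 * np.pi * f * n + np.pi / 2)  # samples hit the zero crossings

print(aligned.max() - aligned.min())  # 2.0: full contrast
print(shifted.max() - shifted.min())  # ~0: the detail vanishes entirely
```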

> Very high resolutions are now practical, too, which changes things a lot. Probably in favor of using filters which never introduce ringing artifacts, like the Gaussian.

Yeah, this is mostly true, and it is one way I justify using a Gaussian.

But it’s important to know and remember that without the right filter, you can get visible ringing artifacts no matter how high your resolution is, and no matter how much you super-sample your pixels. This doesn’t come up all that much in CG film or photography, but I have a digital art project where it matters a lot and comes up all the time. (And I order expensive prints too, where artifacts not very visible on my monitor can become plain & obvious on paper).

> And things like the Lanczos or Mitchell filters exist as compromises between sinc and Gaussian.

Yes, and I suspect those are the most common ones used in VFX and CG film & print production. But those are balancing sharpness and aliasing. For situations where I really need no aliasing, they aren’t good enough.

I can’t usually see the difference between Gaussian, Lanczos and Mitchell without zooming and studying high frequency regions, but I’ve watched VFX supervisors comment on the softness they feel after watching like 2 seconds of film. Some people have an incredible sensitivity to sharpness.


+ My understanding of the way the Nyquist limit plays out in digital photography is that it guarantees no moire. If moire isn't an issue in a scene then it is possible, but not guaranteed, to get higher resolution (that of the individual pixel). Of course resolution and sharpness are not exactly the same thing...that's why 16 megapixels is a resolution, not a sharpness, and why a lot of studio photography uses cameras without anti-aliasing filters (e.g. medium format digital). In the studio, conditions can be set up to avoid moire in ways that aren't practical for sports or wedding photography. Similarly, moire isn't going to be a problem with astro-photography and some subjects in landscape photography.

+ For what it's worth, ringing artifacts are often caused by diffraction or lens aberrations rather than aliasing from sampling frequency (though they can be).

+ The shape of the modulation transfer function (MTF) is different between film and digital. The line pair comparison in the article illustrates it without being explicit. The Nyquist limit means that sharpness falls off a cliff in digital images and the example line pairs go from resolved to gray abruptly. The example line pairs for film fade to gray, what is and isn't sharp is determined by observation rather than an equation. Digital images often appear sharper because they are only ever so sharp and film appears softer because it is kinda sorta sharp beyond the point where it is clearly sharp. It's why film has a different look...or is a different artistic medium than digital when it comes to some forms of expression.
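The difference can be caricatured in a few lines (my illustration; the film curve is just a Gaussian stand-in, not measured data): the digital MTF rolls off and then hits a hard cliff at the Nyquist limit, while film fades gradually and keeps "kinda sorta" resolving past it:

```python
import numpy as np

f = np.linspace(0, 1.2, 1201)   # spatial frequency, 1.0 = Nyquist

# Digital: pixel-aperture rolloff with a hard cutoff at the Nyquist limit.
mtf_digital = np.abs(np.sinc(0.5 * f)) * (f <= 1.0)

# Film: an illustrative gradual (Gaussian) rolloff with no cliff.
mtf_film = np.exp(-(f / 0.8) ** 2)

print(mtf_digital[990])    # just below Nyquist: still resolving strongly
print(mtf_digital[1010])   # just above: abruptly gray (zero)
print(mtf_film[1100])      # film: nonzero response well past Nyquist
```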


Getting close to the Nyquist limit is unsafe. The limit is theoretical; it applies only to an infinitely repeating signal. You'd have to create and then observe an infinite signal for the Nyquist limit to be correct, but of course you can't do this.

That would be 2 samples for each feature detail you want. Things work OK in the real world at about 4 or 5 samples, but it takes 10 or 12 samples to work pretty well.


I'm doing some amateur work on sub-pixel matching at the block level for stereo, currently using cheap APS-C prime-lens commercial cameras (Ricoh GR), and my background is maths, not cameras, but what I've observed (all in RAW) is:

a) Real world scenes often go below the Nyquist limit (e.g. the pattern from a brick wall at sufficient distance)

b) In a camera without a low-pass filter (e.g. the Ricoh GR), in practice you don't see moire, but you can definitely generate it artificially off real-world objects

c) The point spread function seems quite important (e.g. a 1-pixel scene feature for the GR will have half the effect on the sensor of a 2-pixel feature because of the point spread)

d) I'm a bit confused as to why Fourier/DCT all work as well as they do, since the Nyquist assumption for a modern digital camera seems incorrect, but on the whole sub-pixel matching in Fourier space (while the Nyquist assumption is not true) seems roughly on par with matching in the spatial domain - I would love to see something that explains why this is so

Happy to discuss this more as I am still learning
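For what it's worth, the basic frequency-domain trick is easy to sketch (a toy 1D version of my own, not the elphel pipeline): an exact sub-pixel translation shows up as a linear phase ramp in the cross-power spectrum, so the shift can be read straight off the phase:

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_shift(x, d):
    # Shift a periodic signal right by d samples (d may be fractional)
    k = np.fft.fftfreq(len(x))
    return np.fft.ifft(np.fft.fft(x) * np.exp(-2j * np.pi * k * d)).real

def estimate_shift(a, b):
    # Phase of the lowest-frequency cross-spectrum bin gives the shift of
    # a relative to b (toy: assumes a clean periodic shift, |d| < n/2)
    n = len(a)
    cross = np.fft.fft(a)[1] * np.conj(np.fft.fft(b)[1])
    return -np.angle(cross) * n / (2 * np.pi)

base = np.cumsum(rng.standard_normal(256))   # a smooth-ish test signal
shifted = fourier_shift(base, 0.3)           # true sub-pixel shift: 0.3

print(estimate_shift(shifted, base))         # recovers ~0.3
```

With noise, defocus, and non-periodic blocks the estimator needs to be more robust (normalized cross-power over many bins, windowing), which is roughly where methods like normxcorr2 and the lapped transforms come in.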


Manual of Photography might be worth reading.


Thanks for the pointer - I've just skimmed the last 2 chapters and it looks like a good book, but I don't think it solves my specific problem. (Just to give more context: when I said amateur, I mean I'm doing this for fun, not money, but I have previously built an image processing pipeline from raw sensor output and I get the basics of wavelets, Fourier, etc.)

My specific question is: "If I am matching a RAW stereo block (say an 8x8 or 16x16 block) for sub-pixel resolution matching from two similar cameras, why (if at all) is it better to use DCT/Fourier matching, e.g. MATLAB's normxcorr2 or the elphel guys' https://blog.elphel.com/2018/01/complex-lapped-transform-bay... , rather than trying to match in the spatial domain (linear algebra)?" On writing this, I suppose I'd better look at the H.264 and HEVC algos and see if they work in the frequency or spatial domain when they are looking at sub-pixel matches.


Sorry for not being clear. When the signal does not repeat, the Nyquist limit isn't a factor because there is no possibility of moire. That's where the difference between sharpness and resolution comes from...single points don't have frequencies. Between the Nyquist limit and the sensor resolution, there are no guarantees about how or whether a feature will or won't resolve, and the photographer has to use their judgment relative to the image's purpose. An anti-alias filter takes the necessity of such decisions off the photographer's plate in exchange for limiting the absolute possibilities.

Astro-photography is an illustrative example. Celestial objects don't appear in night sky scenes with a regular frequency. So there is no need for an anti-aliasing filter. It just lowers point resolution without any benefit. So cameras for astro-photography often lack AA filters.


The limit remains a factor for non-repeating signals. You get speckle. It shows up in natural scenes like leaves against the sky, fur, grass, sand, etc.


I don't understand how the Gaussian can introduce ringing artifacts. It doesn't have any negative lobes.

If the scene you are sampling contains arbitrarily high frequencies, there's no way to avoid aliasing, yes. But surely ringing per se is always a product of the reconstruction filter. (Fourier transforms etc. are a different matter.)


> I don’t understand how the Gaussian can introduce ringing artifacts. It doesn’t have any negative lobes.

I don’t think Gaussian can. (I hope it didn’t seem like I suggested that.) But a box filter can alias, and it doesn’t have any negative lobes. So can Lanczos & Mitchell, but to a much lesser degree than a box filter.

It’s a fun exercise to plot sin(1/x) and try to get rid of all visible aliasing. It can be surprising to see aliasing when you take 100,000 samples per pixel.

> If the scene you are sampling contains arbitrarily high frequencies, there’s no way to avoid aliasing.

Right, yes exactly. Though Gaussian is pretty dang good, the best I’ve found personally. A lot of samples & a Gaussian that is just a tiny bit soft, and I can usually remove any signs of aliasing.
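The exercise is easy to set up (my sketch, with a jittered Gaussian-weighted sampler as an assumed stand-in for a renderer): near x = 0 the frequency of sin(1/x) exceeds any sample rate, but with enough samples and a Gaussian filter each pixel converges to the local mean rather than sticking at a random alias:

```python
import numpy as np

rng = np.random.default_rng(0)

def render_pixel(x0, x1, spp, sigma=0.5):
    # Monte Carlo estimate of one pixel of sign(sin(1/x)) over (x0, x1),
    # with a Gaussian reconstruction filter (sigma in pixel widths)
    xs = x0 + (x1 - x0) * rng.random(spp)
    t = (xs - (x0 + x1) / 2) / (x1 - x0)
    w = np.exp(-t**2 / (2 * sigma**2))
    return np.sum(w * np.sign(np.sin(1.0 / xs))) / np.sum(w)

# Hundreds of cycles of sin(1/x) fall inside this one "pixel"; the value
# settles near the local mean (~0) instead of a hard +1/-1 alias.
pixel = render_pixel(1e-4, 2e-4, 100_000)
print(pixel)
```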


Gaussian always looks fuzzy, even if you are careful about it.

Your personal preference might be for fuzzy-edged images, but sharper ones will look better to almost all observers, including both professional photographers and laypeople. It depends a lot on the precise details of how you handle the sharpening / resampling filtering; many available tools do a crappy job.

In general the laypeople prefer images to be sharper than you would expect and don’t care much about artifacts (at least, in my experience asking people off the street to pick between two choices of images with different amounts of sharpening), whereas image experts tend to be a bit more conservative if there are noticeable artifacts, especially aliasing, etc.

If you are printing photos on paper, I recommend sharpening beyond your initial inclination, and then sharpening some more, because the printing process tends to bring some fuzzies back.

Note that the human visual system inherently introduces ringing artifacts even if they aren’t there in the original. There’s no inherent problem with amplifying these slightly; the visual effect if you do it subtly will be to imply more contrast than is actually available, rather than obviously appearing like an artifact.

Most types of images will look better if you stretch your available contrast to the extent you can. If you allow some ringing artifacts, you can get away with less real contrast for details, giving you more room to add large-scale contrast between shapes or regions of your picture.
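The classic way to exploit that perceptual effect is unsharp masking: add back a multiple of the difference between the image and a blurred copy, which steepens edges and introduces exactly the slight overshoot described. A minimal sketch (box blur, zero-padded borders; a toy, not a production sharpener):

```python
import numpy as np

def box_blur_1d(row, radius):
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    return np.convolve(row, k, mode='same')   # zero-padded borders (toy)

def unsharp_mask(img, radius=1, amount=1.0):
    # img + amount * (img - blur(img)): steeper edges, slight over/undershoot
    blurred = img.astype(float)
    for axis in (0, 1):
        blurred = np.apply_along_axis(box_blur_1d, axis, blurred, radius)
    return img + amount * (img - blurred)

# A soft ramp edge: sharpening pushes it past the original [0, 1] range,
# implying more contrast than is really there.
edge = np.tile([0.0, 0.0, 0.25, 0.75, 1.0, 1.0, 1.0, 1.0], (8, 1))
out = unsharp_mask(edge)
print(out.max() > 1.0, out.min() < 0.0)   # overshoot and undershoot
```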


> Gaussian always looks fuzzy, even if you are careful about it.

Yeah, I agree, and it even looks fuzzy to me, I’ve just grown accustomed to it, and I rationalize / theorize that I’m not losing detail even if it looks soft.

What I really want is to not be able to see any sign of pixels at all; to be completely unable to tell how large a pixel is, or tell whether the image is high res and soft or sharper but low res.

> many available tools do a crappy job

Lol, you could say that again.


This is all very similar to inter-symbol interference (ISI) in digital comm systems. You don’t want your pixel bleeding into the next, so choosing the appropriate filter to reduce ISI is important.

Typically, root raised cosine filters are used, as an RRC filter is matched to itself and minimizes ISI depending on the bandwidth chosen.

I suppose Shannon’s capacity theorem can be applied to images, trading off bandwidth for SNR. If resolution is bandwidth, and brightness the SNR, what would capacity be in a photo? The sharpness? That would imply sharper photos would be better off with higher pixel sensors than with dynamic range. You can always decimate to increase SNR.

In electronics, it’s easier to make faster converters than increase dynamic range, so oversample and decimate. Same probably with image sensors.
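The oversample-and-decimate idea can be demonstrated directly (my sketch; a coarse quantizer plus uniform dither standing in for a low-bit converter): oversample with dither, average, and you recover precision the single-shot quantizer doesn't have:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, step):
    return np.round(x / step) * step

signal = 0.37        # true level to digitize
step = 0.1           # coarse quantizer step

one_shot = quantize(signal, step)        # 0.4: stuck with ~0.03 of error

# Oversample with dither, then decimate (average) -- rate traded for SNR.
samples = quantize(signal + rng.uniform(-step / 2, step / 2, 10_000), step)
averaged = samples.mean()

print(abs(one_shot - signal))   # ~0.03
print(abs(averaged - signal))   # far smaller after averaging
```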


Um, the first point, in digital imaging, is that a pixel is not a little square. It's a sample. So it doesn't have any extension to blur. Any filtering invariably happens either at sampling or at reconstruction.

https://graphics.stanford.edu/courses/cs248-04/ho6/

Of course sharpness is a function of resolution. I'm not sure it makes sense to talk about capacity, because typically the resolution of the image is non-negotiable. The number of bits needed to encode it is variable and depends on the compression technique used.

So in general, Shannon-style channel thinking operates on a different layer from the sampling and reconstruction processes we are discussing here. It's more relevant to how discrete pixel values are coded, or more elaborate compression mechanisms which exploit coherence in the image.


> I suppose Shannon’s capacity theorem can be applied to images, trading off bandwidth for SNR. If resolution is bandwidth, and brightness the SNR, what would capacity be in a photo? The sharpness? That would imply sharper photos would be better off with higher pixel sensors than with dynamic range.

To perform this tradeoff and capture e.g. 1-bit images with super-high resolution, you'd need the sensor to do error diffusion a la sigma-delta modulation, and AFAIK image sensors cannot currently do this.


Ah, so you are saying the quantization noise can't be averaged out, as with decimation after an ADC.


It could, but you'd have to start with a sensor that had both a high number of bits per sample and super-high spatial frequency. If you had such a sensor, there would be no point in decimating; you'd just use the samples the sensor gave you directly. The point with decimating is to sacrifice some spatial resolution for an increase in bits-per-sample from a low-bps (but high-frequency) sensor. This is done routinely for audio signals but it would be a lot more tricky with a two-dimensional signal. Not impossible, but it would require processing elements between the pixels to diffuse the error properly without losing information.
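A crude 2D version of the idea (my sketch: photon noise is simulated with a random threshold, standing in for a hypothetical 1-bit sensor): bin 8x8 blocks of 1-bit pixels and gray levels come back, at the cost of spatial resolution:

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth horizontal gradient seen by a hypothetical 1-bit sensor whose
# photon noise acts as dither: each pixel reads 0 or 1.
h, w = 8, 256
gradient = np.tile(np.linspace(0.0, 1.0, w), (h, 1))
one_bit = (rng.random((h, w)) < gradient).astype(float)

# Decimate: average 8x8 blocks, trading spatial resolution for gray levels.
binned = one_bit.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3))

# Each binned value now approximates the local gradient level.
err = np.abs(binned[0] - gradient[0, 4::8]).mean()
print(err)   # small: gray levels recovered from purely 1-bit pixels
```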


Isn't the sinc more for 1-dimensional things, like audio? I would expect images to need the Airy disk.

https://en.wikipedia.org/wiki/Airy_disk


Sinc is closely related to Bessel functions (which is what the Airy disc is). In image processing afaik it is interpreted as a radially-symmetric filter, if it's not just used as if it was separable (i.e. on the x axis, then on the y axis).

I don't think it is exactly the same as the Airy disc, but it's a good question. http://mathworld.wolfram.com/SincFunction.html
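They're not identical: the 1D ideal is sinc = sin(x)/x, while the Airy pattern's radial profile is the "jinc", J1(x)/x, whose first zero sits further out (3.83 vs pi). A quick numerical check, using the integral form of J1 so no scipy is needed (my sketch):

```python
import numpy as np

def bessel_j1(x):
    # J1(x) = (1/pi) * integral_0^pi cos(t - x sin t) dt  (trapezoidal rule)
    t = np.linspace(0.0, np.pi, 2001)
    vals = np.cos(t - np.outer(x, np.sin(t)))
    dt = t[1] - t[0]
    return (vals.sum(axis=-1) - 0.5 * (vals[:, 0] + vals[:, -1])) * dt / np.pi

r = np.linspace(2.5, 4.5, 2001)
sinc_profile = np.sin(r) / r          # 1D ideal filter
jinc_profile = bessel_j1(r) / r       # Airy disc radial profile

print(r[np.argmin(np.abs(sinc_profile))])   # first zero near pi ~ 3.14
print(r[np.argmin(np.abs(jinc_profile))])   # first zero near 3.83
```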


There isn’t really any perfect choice for images with a square lattice of sample points. With a one-dimensional signal you don’t get the same annoying patterns from the interaction of the two grids along different dimensions.

If you use an equilateral triangle grid (if you like, think of hexagonal pixels) you can do better.


I’m not sure how I hadn’t stumbled on this site before, but it is crazy good. The amount of information is a bit overwhelming to be honest, but skimming it I see a metric ton of gems for image processing and photography and optics. I see some math & advice that apply to audio processing as well. As someone working in CG, this is going to be a great reference.


My dad clarifies that he hasn't updated this page in about 15 years. Eventually his work on this got sufficiently out of control that he founded www.imatest.com, and that's been keeping him busy ever since.


I was going to say this looks a lot like the stuff on the imatest website.


Having struggled with video autofocus issues, I've asked some videographer friends what the solution is. They've told me that I have to learn to manually focus for shots that the autofocus has trouble with. Also, most, if not all, cinematographers do not use autofocus when shooting movies or films.

I have a newfound respect for the 1st AC (or Focus Puller) on a movie set.

https://en.wikipedia.org/wiki/Focus_puller


That’s really what blocking is all about. You set places for the camera and talent to be (set distance) and the focus should be set accordingly. If people are moving (or camera is moving) you simply pull the focus from one block to the next (like key frames).

There are cases where the AC needs to manually (with eyes) focus, and yeah, those are very skilled individuals. I was looking for a video of Chivo’s focus guy working while Chivo worked the frame on The Revenant, but I can’t find it.

https://youtu.be/a5tgTQXr62M


Another trick is, in chaotic handheld tracking shots, to light up the set and stop the lens down for a large depth of field. If you watch the long scenes in Children of Men, for instance, even objects far away are sharp; this is because the lens is stopped down. That way the subjects stay in focus even if the blocking is not terribly precise. In those long shots, Cuaron uses framing and contrast to direct your eye, rather than focus pulls. He creates depth with haze, which reduces contrast in distant objects.
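The effect of stopping down can be quantified with the standard hyperfocal-distance formula, H = f²/(N·c) + f (focal length f, f-number N, circle of confusion c). A quick sketch (the lens values are illustrative; a full-frame circle of confusion is assumed):

```python
def hyperfocal_mm(focal_mm, f_number, coc_mm=0.03):
    # Focus at H and everything from H/2 to infinity is acceptably sharp
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm

wide_open = hyperfocal_mm(35, 2.8)   # ~14.6 m: deep focus is hard to hold
stopped = hyperfocal_mm(35, 11)      # ~3.7 m: most of the scene stays sharp
print(wide_open / 1000, stopped / 1000)
```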


This might be off-topic, but does anyone know of analyses of the effects of the debayering process in modern digital cameras on the end image? For example, based on experimentation, it seems to me that some of the characteristics of Alexa cameras (e.g. magenta bleed around highlights) seem to be reproducible in digital material if you apply a pixel mask like that of the Alexa sensor and implement an approximation of the debayer process. I'm not quite confident enough of my conclusions to put my foot down on them yet, though.
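One way to explore this (a toy of mine, not the Alexa's actual pipeline): mosaic an image through an RGGB Bayer pattern, demosaic it with naive bilinear interpolation, and look at a hard highlight edge. The channels interpolate from offset sites, so colored fringing appears at the edge even though flat areas reconstruct perfectly:

```python
import numpy as np

def conv3(img, k):
    # 3x3 'same' convolution with zero padding
    h, w = img.shape
    p = np.pad(img, 1)
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def bayer_masks(h, w):
    # RGGB: R at even/even, G at even/odd and odd/even, B at odd/odd
    m = [np.zeros((h, w)) for _ in range(3)]
    m[0][0::2, 0::2] = 1
    m[1][0::2, 1::2] = 1
    m[1][1::2, 0::2] = 1
    m[2][1::2, 1::2] = 1
    return m

def mosaic_and_demosaic(rgb):
    h, w, _ = rgb.shape
    masks = bayer_masks(h, w)
    raw = sum(rgb[..., c] * masks[c] for c in range(3))  # one value per pixel
    k = np.array([[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]])
    out = np.zeros_like(rgb, dtype=float)
    for c in range(3):  # bilinear: normalized average of nearby same-color sites
        out[..., c] = conv3(raw * masks[c], k) / conv3(masks[c], k)
    return out

# A hard vertical white-on-black edge: flat regions survive exactly, but the
# edge picks up colored fringes because R, G, B are sampled at offset sites.
img = np.zeros((8, 8, 3))
img[:, 4:, :] = 1.0
out = mosaic_and_demosaic(img)
fringe = np.abs(out[..., 0] - out[..., 2]).max()  # R/B mismatch = color fringe
print(fringe > 0.1)
```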


Great text, though a bit difficult to understand. I have written a 'digested' version in the process of understanding the concept:

https://epxx.co/artigos/dof_en.html#pmp



