I did a video project a long time ago in old 480i NTSC and found that filtering vertically twice as much as seemed called for improved my ability to see fine detail (that is, filtering as if the vertical image resolution were 240). I realize that motion and interlaced video are very different from still photography, but that lesson has been influencing my filtering decisions ever since...
Very high resolutions are now practical, too, which changes things a lot, probably in favour of using filters which never introduce ringing artifacts, like the Gaussian.
(And things like the Lanczos or Mitchell filters exist as compromises between sinc and Gaussian.)
The max sharpness in a digital image, the Nyquist limit, is 1 cycle across 2 pixels. It’s not possible to get any sharper than that. But actually rendering images with frequencies exactly at that limit generally causes aliasing artifacts of some kind. That’s what I mean about getting close to Nyquist.
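(A minimal numpy sketch of my own to make that concrete: a cosine sampled at exactly 1 cycle per 2 pixels collapses to (-1)^n * cos(phase), so the amplitude you record depends entirely on a phase you don't control.)

```python
import numpy as np

# Sample a cosine at exactly the Nyquist frequency: 1 cycle per 2 samples.
# cos(pi*n + phase) = (-1)^n * cos(phase), so the recorded amplitude is
# purely phase-dependent; at phase = pi/2 the signal vanishes entirely.
n = np.arange(8)
for phase in (0.0, np.pi / 4, np.pi / 2):
    x = np.cos(np.pi * n + phase)   # frequency exactly at Nyquist
    print(f"phase={phase:.2f}  samples={np.round(x, 2)}  amplitude={np.abs(x).max():.2f}")
```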
> Very high resolutions are now practical, too, which changes things a lot. Probably in favour of using filters which never introduce ringing artifacts, like the Gaussian.
Yeah, this is mostly true, and it is one way I justify using a Gaussian.
But it’s important to know and remember that without the right filter, you can get visible ringing artifacts no matter how high your resolution is, and no matter how much you super-sample your pixels. This doesn’t come up all that much in CG film or photography, but I have a digital art project where it matters a lot and comes up all the time. (And I order expensive prints too, where artifacts not very visible on my monitor can become plain & obvious on paper).
> And things like the Lanczos or Mitchell filters exist as compromises between sinc and Gaussian.
Yes, and I suspect those are the most common ones used in VFX and CG film & print production. But those are balancing sharpness and aliasing. For situations where I really need no aliasing, they aren’t good enough.
I can’t usually see the difference between Gaussian, Lanczos and Mitchell without zooming and studying high frequency regions, but I’ve watched VFX supervisors comment on the softness they feel after watching like 2 seconds of film. Some people have an incredible sensitivity to sharpness.
+ For what it's worth, ringing artifacts are often caused by diffraction or lens aberrations rather than by aliasing from the sampling frequency (though they can be).
+ The shape of the modulation transfer function (MTF) differs between film and digital. The line-pair comparison in the article illustrates this without being explicit: because of the Nyquist limit, sharpness in digital images falls off a cliff, and the example line pairs go from resolved to gray abruptly. The film line pairs instead fade to gray; what is and isn't sharp is determined by observation rather than by an equation. Digital images often appear sharper because detail is either fully resolved or gone, while film appears softer because it stays kinda-sorta sharp beyond the point where it is clearly sharp. It's why film has a different look...or is a different artistic medium than digital when it comes to some forms of expression.
That would be 2 samples for each feature you want to resolve. Things work OK in the real world at about 4 or 5 samples per feature, but it takes 10 or 12 to work pretty well.
a) Real-world scenes often go below the Nyquist limit (e.g. the pattern of a brick wall at sufficient distance).
b) In a camera without a low-pass filter (e.g. the Ricoh GR), in practice you don't see moiré, but you can definitely generate it artificially off real-world objects.
c) The point spread function seems quite important (e.g. a 1-pixel scene feature for the GR will have half the effect on the sensor of a 2-pixel feature, because of the point spread).
d) I'm a bit confused as to why Fourier/DCT methods work as well as they do, since the Nyquist assumption for a modern digital camera seems incorrect; yet on the whole, sub-pixel matching in Fourier space (while the Nyquist assumption is not true) seems roughly on par with matching in the spatial domain. I would love to see something that explains why this is so.
Happy to discuss this more as I am still learning
My specific question is: "If I am matching a RAW stereo block (say 8x8 or 16x16), for sub-pixel-resolution matching from two similar cameras, why (if at all) is it better to use DCT/Fourier matching, e.g. MATLAB's normxcorr2 or the Elphel guys' approach https://blog.elphel.com/2018/01/complex-lapped-transform-bay... , rather than to match in the spatial domain (linear algebra)?" On writing this, I suppose I had better look at the H.264 and HEVC algorithms and see whether they work in the frequency or spatial domain when they are looking at sub-pixel matches.
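(For what it's worth, here is a minimal numpy sketch of one common Fourier-side approach: plain phase correlation with a parabolic sub-pixel refinement. It's my own illustration, not the Elphel complex-lapped-transform method and not what normxcorr2 does internally.)

```python
import numpy as np

def subpixel_shift(a, b):
    """Estimate the (dy, dx) shift of b relative to a via phase correlation."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12            # normalize: keep phase only
    corr = np.real(np.fft.ifft2(R))
    py, px = np.unravel_index(np.argmax(corr), corr.shape)

    def refine(c, i):
        # three-point parabola fit around the peak (cyclic neighbours)
        n = len(c)
        y0, y1, y2 = c[(i - 1) % n], c[i], c[(i + 1) % n]
        d = y0 - 2.0 * y1 + y2
        return i + (0.5 * (y0 - y2) / d if d != 0 else 0.0)

    dy, dx = refine(corr[:, px], py), refine(corr[py, :], px)
    h, w = corr.shape
    return (dy - h if dy > h / 2 else dy,
            dx - w if dx > w / 2 else dx)

# Toy usage: recover a known shift of a random image.
rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))
shifted = np.roll(img, (3, -5), axis=(0, 1))
print(subpixel_shift(shifted, img))   # ~ (3.0, -5.0)
```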
Astro-photography is an illustrative example. Celestial objects don't appear in night sky scenes with a regular frequency. So there is no need for an anti-aliasing filter. It just lowers point resolution without any benefit. So cameras for astro-photography often lack AA filters.
If the scene you are sampling contains arbitrarily high frequencies, there's no way to avoid aliasing, yes. But surely ringing per se is always a product of the reconstruction filter. (Fourier transforms etc. are a different matter.)
I don’t think Gaussian can. (I hope it didn’t seem like I suggested that.) But a box filter can alias, and it doesn’t have any negative lobes. So can Lanczos & Mitchell, but to a much lesser degree than a box filter.
It’s a fun exercise to plot sin(1/x) and try to get rid of all visible aliasing. It can be surprising to see aliasing when you take 100,000 samples per pixel.
> if the scene you are sampling contains arbitrarily high frequencies, there’s no way to avoid aliasing.
Right, yes exactly. Though Gaussian is pretty dang good, the best I’ve found personally. A lot of samples & a Gaussian that is just a tiny bit soft, and I can usually remove any signs of aliasing.
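(Here's roughly what I mean, as a 1-D numpy sketch of the sin(1/x) exercise above; the parameters are my own choices. Near the high-frequency end, a box filter leaks structured aliasing through its sinc sidelobes no matter how many samples you take, while a slightly-soft Gaussian suppresses it by orders of magnitude.)

```python
import numpy as np
from scipy.signal import fftconvolve

# Render f(x) = sin(1/x) on [0.0005, 0.1] into 400 "pixels". Toward the
# left edge the local frequency 1/(2*pi*x^2) far exceeds the pixel Nyquist
# rate, so the filter choice decides what survives.
pixels, spp = 400, 10_000
xs = np.linspace(5e-4, 0.1, pixels * spp)
f = np.sin(1.0 / xs)

# Box filter: plain average of all samples in each pixel.
box = f.reshape(pixels, spp).mean(axis=1)

# Gaussian prefilter, stddev ~0.6 pixel widths, then sample pixel centers.
sigma = 0.6 * spp
t = np.arange(-4 * int(sigma), 4 * int(sigma) + 1)
g = np.exp(-0.5 * (t / sigma) ** 2)
g /= g.sum()
gauss = fftconvolve(f, g, mode="same")[spp // 2 :: spp]

print("box residue  :", np.abs(box[:30]).max())    # structured aliasing
print("gauss residue:", np.abs(gauss[:30]).max())  # far smaller
```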
Your personal preference might be for fuzzy-edged images, but sharper ones will look better to almost all observers, including both professional photographers and laypeople. It depends a lot on the precise details of how you handle the sharpening / resampling filtering; many available tools do a crappy job.
In general, laypeople prefer images sharper than you would expect and don’t care much about artifacts (at least, in my experience asking people off the street to pick between two images with different amounts of sharpening), whereas image experts tend to be a bit more conservative when there are noticeable artifacts, especially aliasing, etc.
If you are printing photos on paper, I recommend sharpening beyond your initial inclination, and then sharpening some more, because the printing process tends to bring some fuzzies back.
Note that the human visual system inherently introduces ringing artifacts even if they aren’t there in the original. There’s no inherent problem with amplifying these slightly; the visual effect if you do it subtly will be to imply more contrast than is actually available, rather than obviously appearing like an artifact.
Most types of images will look better if you stretch your available contrast to the extent you can. If you allow some ringing artifacts, you can get away with less real contrast for details, giving you more room to add large-scale contrast between shapes or regions of your picture.
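(A small scipy sketch of the classic way to exploit that: unsharp masking deliberately adds overshoot, i.e. mild ringing, at edges to imply contrast. The radius and amount parameters here are taste, not canon.)

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(img, radius=2.0, amount=1.0):
    """Add back the detail lost to a Gaussian blur.

    The overshoot this creates at edges is exactly the deliberate
    "ringing" discussed above: it implies more local contrast than the
    medium actually has.
    """
    blurred = gaussian_filter(img.astype(np.float64), sigma=radius)
    sharpened = img + amount * (img - blurred)
    return np.clip(sharpened, 0.0, 1.0)  # assuming img is scaled to [0, 1]
```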
Yeah, I agree, and it even looks fuzzy to me, I’ve just grown accustomed to it, and I rationalize / theorize that I’m not losing detail even if it looks soft.
What I really want is to not be able to see any sign of pixels at all; to be completely unable to tell how large a pixel is, or tell whether the image is high res and soft or sharper but low res.
> many available tools do a crappy job
Lol, you could say that again.
Typically, root-raised-cosine filters are used, since an RRC filter is the matched filter for itself (two in cascade give a raised-cosine response) and it minimizes ISI depending on the bandwidth chosen.
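(A quick numerical check of that claim, building the RRC as the square root of a raised-cosine spectrum; the T, beta, and N values are arbitrary choices of mine.)

```python
import numpy as np

# Cascade two RRC filters (transmit + receive matched filter) and confirm
# the combined impulse response nulls at the symbol spacing, i.e. ~no ISI.
T, beta, N = 16, 0.35, 4096          # samples/symbol, roll-off, FFT size
f = np.abs(np.fft.fftfreq(N))        # cycles/sample
f1, f2 = (1 - beta) / (2 * T), (1 + beta) / (2 * T)
H_rc = np.where(f <= f1, 1.0,
       np.where(f <= f2, 0.5 * (1 + np.cos(np.pi * T / beta * (f - f1))), 0.0))
H_rrc = np.sqrt(H_rc)                # root-raised-cosine spectrum

h = np.fft.fftshift(np.real(np.fft.ifft(H_rrc * H_rrc)))
h /= h.max()
c = N // 2
print(np.round(h[c - 4 * T : c + 4 * T + 1 : T], 4))  # ~[0 0 0 0 1 0 0 0 0]
```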
I suppose Shannon’s capacity theorem can be applied to images, trading off bandwidth for SNR. If resolution is bandwidth, and brightness the SNR, what would capacity be in a photo? The sharpness? That would imply sharper photos would be better off with higher-pixel-count sensors than with more dynamic range. You can always decimate to increase SNR.
In electronics, it’s easier to make faster converters than increase dynamic range, so oversample and decimate. Same probably with image sensors.
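(A toy numpy demonstration of that oversample-and-decimate trade; the bit depth, dither, and decimation factor are picked arbitrarily.)

```python
import numpy as np

# Quantize a slow sine to 6 bits with a little dither, then average blocks
# of 64 samples: the decimated signal has much lower RMS error than the
# raw quantized one, trading sample rate for dynamic range.
rng = np.random.default_rng(0)
n, dec = 1 << 16, 64
t = np.arange(n) / n
signal = 0.5 * np.sin(2 * np.pi * 3 * t)      # slow, well within band

step = 1.0 / (1 << 6)                         # 6-bit quantizer step
dithered = signal + rng.uniform(-step / 2, step / 2, n)
quantized = np.round(dithered / step) * step

coarse = quantized.reshape(-1, dec).mean(axis=1)   # decimate by averaging
target = signal.reshape(-1, dec).mean(axis=1)
print("raw RMS error      :", np.sqrt(np.mean((quantized - signal) ** 2)))
print("decimated RMS error:", np.sqrt(np.mean((coarse - target) ** 2)))
```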
Of course sharpness is a function of resolution. I'm not sure it makes sense to talk about capacity, because typically the resolution of the image is non-negotiable. The number of bits needed to encode it is variable and depends on compression technique used.
So in general, Shannon-style channel thinking operates on a different layer from the sampling and reconstruction processes we are discussing here. It's more relevant to how discrete pixel values are coded, or more elaborate compression mechanisms which exploit coherence in the image.
To perform this tradeoff and capture e.g. 1-bit images with super-high resolution you'd need the sensor to do error diffusion a la sigma-delta modulation, and AFAIK image sensors cannot currently do this.
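(In software, a 1-D first-order sigma-delta loop looks like this; just a sketch of the idea, since, as noted, sensors don't do it.)

```python
import numpy as np

def sigma_delta(x):
    # First-order sigma-delta: integrate (input - feedback), output 1 bit.
    out = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc += v
        out[i] = 1.0 if acc >= 0 else -1.0
        acc -= out[i]
    return out

t = np.linspace(0, 1, 4096)
x = 0.8 * np.sin(2 * np.pi * 2 * t)              # input within [-1, 1]
bits = sigma_delta(x)                            # the "1-bit image"
recovered = bits.reshape(-1, 64).mean(axis=1)    # crude decimation filter
print("RMS error after decimation:",
      np.sqrt(np.mean((recovered - x.reshape(-1, 64).mean(axis=1)) ** 2)))
```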
I don't think it is exactly the same as the Airy disc, but it's a good question. http://mathworld.wolfram.com/SincFunction.html
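(They're cousins: both come from a hard frequency cutoff, the sinc from a 1-D interval and the Airy pattern from a 2-D disc. A tiny scipy sketch comparing the two profiles:)

```python
import numpy as np
from scipy.special import j1

# The Airy amplitude is the "jinc" 2*J1(x)/x; its first zero sits at the
# first zero of J1 (~3.8317), versus pi for the sinc, so the rings are
# spaced differently and decay faster.
x = np.linspace(1e-9, 15.0, 2000)
sinc = np.sin(x) / x          # 1-D hard cutoff; first zero at pi
jinc = 2.0 * j1(x) / x        # Airy amplitude; first zero at ~3.8317

zero = x[np.where(np.diff(np.sign(jinc)) != 0)[0][0]]
print(f"first jinc zero ~ {zero:.3f} vs pi ~ {np.pi:.3f} for sinc")
```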
If you use an equilateral triangular grid (if you like, think of hexagonal pixels) you can do better: hexagonal sampling covers the same isotropic (circular) band limit with roughly 13% fewer samples than a square grid.
I have a newfound respect for the 1st AC (or Focus Puller) on a movie set.
There are cases where the AC needs to focus manually (by eye), and yeah, those are very skilled individuals. I was looking for a video of Chivo’s focus puller working while Chivo worked the frame on The Revenant, but I can’t find it.