Problems with image resizing are a much deeper rabbit hole than this. Some important talking points:
1. The form of interpolation (this article).
2. The colorspace used for doing the arithmetic for interpolation. You most likely want a linear colorspace here.
3. Clipping. Resizing is typically done in two passes, first in the x direction and then in y (not necessarily in that order). If the kernel has values outside the range [0, 1] (like Lanczos) and the intermediate result only captures the range [0, 1], you can get clipping in the intermediate image, which can cause artifacts.
4. Quantization and dithering.
5. If you have an alpha channel, using pre-multiplied alpha for interpolation arithmetic.
I'm not trying to be exhaustive here. ImageWorsener's page has a nice reading list[1].
[1] https://entropymine.com/imageworsener/
I've definitely learned a lot about these problems from the viewpoint of art and graphic design. When using Pillow I convert to linear light with high dynamic range and work in that space.
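For what it's worth, a minimal sketch of that workflow with Pillow and numpy (assuming 8-bit sRGB input; the helper names are mine):

    import numpy as np
    from PIL import Image

    def srgb_to_linear(x):
        # sRGB decoding for values in [0, 1]
        return np.where(x <= 0.04045, x / 12.92, ((x + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(x):
        return np.where(x <= 0.0031308, x * 12.92, 1.055 * x ** (1 / 2.4) - 0.055)

    def resize_linear(path, size):
        srgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64) / 255.0
        lin = srgb_to_linear(srgb)
        # Pillow's float mode is single-channel, so resize each channel separately
        planes = [Image.fromarray(lin[..., c].astype(np.float32), mode="F").resize(size, Image.LANCZOS)
                  for c in range(3)]
        out = linear_to_srgb(np.stack([np.asarray(p) for p in planes], axis=-1))
        return Image.fromarray((np.clip(out, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8))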
One pet peeve of mine is algorithms for making thumbnails. Most of the algorithms from the image processing books don't really apply: they interpolate between points based on a small neighborhood, whereas if you are downscaling by a large factor (say 10) the obvious thing to do is to sample all the input pixels that intersect with each pixel in the output image (100 of them in that case).
That box averaging is a pretty expensive convolution, so most libraries downscale images by powers of 2 and then interpolate from the closest such image, which I think is not quite right and could be done better.
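For an integer shrink factor, the box average described above is just a reshape and a mean; a minimal numpy sketch (assuming the dimensions divide evenly by the factor):

    import numpy as np

    def box_downscale(img, factor):
        # average every factor x factor block of input pixels into one output pixel
        h, w, c = img.shape
        assert h % factor == 0 and w % factor == 0
        return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

OpenCV's cv2.resize with interpolation=cv2.INTER_AREA does essentially this kind of area average when shrinking.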
If you downscale by a factor of 2 using bandlimited resampling every time, followed by a single final shrink, you'll theoretically get identical results to a single bandlimited shrinking operation. Of course, real-world image resampling kernels (Lanczos, cubic, magic kernel) are very much truncated compared to the actual sinc kernel (to avoid massive ringing, which looks unacceptable in images), so the results won't be mathematically perfect. And linear/area-based resampling is even less mathematically optimal, although it doesn't cause overshoot.
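A sketch of that iterated scheme with Pillow, purely for illustration (Pillow's Lanczos already widens its filter support when downscaling, so it doesn't actually need this trick):

    from PIL import Image

    def iterated_downscale(img, target_w, target_h):
        # repeatedly halve while we're still more than 2x larger than the target...
        while img.width >= 2 * target_w and img.height >= 2 * target_h:
            img = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)
        # ...then a single final shrink to the exact size
        return img.resize((target_w, target_h), Image.LANCZOS)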
Isn't this generally addressed by applying a gaussian blur before downsizing? I know this introduces an extra processing step, but I always figured this was necessary.
I played a little with FFT Gaussian blur. Instead of averaging hundreds of points per output pixel, it transforms the image and the blur kernel into the frequency domain, performs a pointwise multiplication there, and transforms the result back. It's way faster than the direct convolution.
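A minimal numpy sketch of that trick (single channel, periodic boundaries assumed; the Fourier transform of a Gaussian is again a Gaussian, so the kernel can be built directly in the frequency domain):

    import numpy as np

    def fft_gaussian_blur(channel, sigma):
        # channel: 2D float array; blur by pointwise multiplication in the frequency domain
        h, w = channel.shape
        fy = np.fft.fftfreq(h)[:, None]
        fx = np.fft.fftfreq(w)[None, :]
        kernel_ft = np.exp(-2 * (np.pi ** 2) * (sigma ** 2) * (fx ** 2 + fy ** 2))
        return np.real(np.fft.ifft2(np.fft.fft2(channel) * kernel_ft))

scipy.ndimage.fourier_gaussian builds the same frequency-domain kernel if you'd rather not roll your own.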
Having to process 100 source pixels per destination pixel to shrink 10x seems like an inefficient implementation. If you downsample each dimension individually you only need to process 20 pixels per pixel. This is the same optimization used for Gaussian blur.
> If you downsample each dimension individually you only need to process 20 pixels per pixel.
If you shrink 10x in one direction, then the other, then you first turn 100 pixels into 10, before turning 10 pixels into 1. You actually do more work for a non-smoothed shrink, sampling 110 pixels total.
To benefit from doing the dimensions separately, the width of your sample has to be bigger than the shrink factor. The best case is a blur where you're not shrinking at all, and that's where 20:1 actually happens.
If you sampled 10 pixels wide, then shrunk by a factor of 3, you'd have 100 samples per output if you do both dimensions at the same time, and 40 samples per output if you do one dimension at a time.
Two dimensions at the same time need width^2 samples per output pixel.
Two dimensions, one after the other, need width * (shrink_factor + 1) samples per output pixel.
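Restating those counts in code for the three cases discussed above:

    def samples_2d(width):                 # both dimensions at once
        return width ** 2

    def samples_separable(width, shrink):  # one dimension, then the other
        return width * (shrink + 1)

    print(samples_2d(10), samples_separable(10, 10))  # 100 vs 110: box filter matched to a 10x shrink
    print(samples_2d(10), samples_separable(10, 3))   # 100 vs 40:  wide filter, small shrink
    print(samples_2d(10), samples_separable(10, 1))   # 100 vs 20:  pure blur, no shrink (the 20:1 case)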
> 3. Clipping. Resizing is typically done in two passes, first in the x direction and then in y (not necessarily in that order). If the kernel has values outside the range [0, 1] (like Lanczos) and the intermediate result only captures the range [0, 1], you can get clipping in the intermediate image, which can cause artifacts.
Also, gamut clipping and interpolation[0]. That's a real rabbit hole.
Wow, points 2, 3 and 5 wouldn't have occurred to me even if I tried. Thanks. I now have a mental note to look stuff up if my resizing ever gives results I'm not happy with. :)
Point 2 is the most important one, and getting it wrong is the most egregious error. Even most browsers implement it incorrectly (at least as of the last time I checked; I confirmed it again with Edge).
Here is the most popular article about this problem [1].
Warning: once you start noticing incorrect color blending done in sRGB space, you will see it everywhere.
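A quick way to see the problem, assuming 8-bit sRGB values (the transfer function follows the sRGB spec):

    def linear_to_srgb(x):
        # encode a linear-light value in [0, 1] as sRGB
        return 12.92 * x if x <= 0.0031308 else 1.055 * x ** (1 / 2.4) - 0.055

    naive = (0 + 255) / 2                             # 127.5: averaging the encoded values; displays too dark
    correct = linear_to_srgb((0.0 + 1.0) / 2) * 255   # ~188: average in linear light, then re-encode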
I'm a little bit sympathetic to doing it wrong on gradients (having said that, the SVG spec has an opt-in to do the interpolation in a linear colorspace, and browsers don't implement it). But not for images.
I imagine that, beyond just using linearized sRGB, a perceptually uniform colorspace such as Oklab would bring further improvement. Although I suppose the effect might be somewhat subtle in most real-world images.
For downscaling, I doubt that. If you literally squint or unfocus your eyes, the colors you see are mixed in a linear colorspace. It makes sense for downscaling to follow that.
When image generating AIs first appeared, the color space interpolations were terribly wrong. One could see hue rainbows practically anywhere blending occurred.
I'd also add speed to that list. Resizing is an expensive operation, and correctness is often traded off for speed. I've written code that deliberately skipped the conversion to a linear color space and back in order to gain speed.
A connected rabbit hole is the decoding of lossy formats such as JPEG: in my experience, depending on the library used (OpenCV vs. TensorFlow vs. Pillow), the default decoders give you RGB values that vary by 1-2% from each other.
And also (for humans at least) the rabbit hole of actually displaying the resulting image: various forms of subpixel rendering for screens, various forms of printing... all of which are likely to have a big influence on what counts as "acceptable quality".
Another thing I've experienced: after downsizing a document photo to a mandatory upload size, a character/number was randomly changed (6 to b or d, I don't remember which exactly). I had to convert the document to a PDF, which handled it better.
It would. It would also avoid accumulating quantization errors in the intermediate result. Having said that, there is precedent for keeping the intermediate image's pixels as integer values.
If you're doing interpolation you probably don't want a linear colourspace. At least not linear in the way that light works. Interpolation minimizes deviations in the colourspace you're in, so you want it to be somewhat perceptual to get it right.
Of course, if you're not interpolating but downscaling the image (which isn't really interpolation: the value at a particular position in the image does not remain the same), then you do want a linear colourspace to avoid brightening/darkening details, but you need a perceptual colourspace to minimize ringing, etc. It's an interesting puzzle.
I'd argue that if your ML model is sensitive to the anti-aliasing filter used in image resizing, you've got bigger problems than that. Unless it's actually making a visible change that spoils whatever it is the model is supposed to be looking for. To use the standard cat/dog example, the filter or resampling choice is not going to change what you've got a picture of, and if your model is classifying based on features that change with resampling, it's not trustworthy.
If one is concerned about this, one could intentionally vary the resampling or deliberately add different blurring filters during training to make the model robust to these variations.
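For example, something like this as a training-time augmentation (a sketch with Pillow; the target size and filter set are arbitrary):

    import random
    from PIL import Image

    # resize each training image with a randomly chosen resampling filter
    FILTERS = [Image.NEAREST, Image.BILINEAR, Image.BICUBIC, Image.LANCZOS]

    def random_resize(img, size=(224, 224)):
        return img.resize(size, resample=random.choice(FILTERS))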
You say that “if your model is classifying based on features that change with resampling, it’s not trustworthy.”
I say that choice of resampling algorithm is what determines whether a model can learn the rule “zebras can be recognized by their uniform-width stripes” or not; as a bad resample will result in non-uniform-width stripes (or, at sufficiently small scales, loss of stripes!)
A zebra having stripes that alternate between 5 black pixels, and 4 black pixels + 1 dark-grey pixel, isn’t actually a visible change to the human eye. But it’s visible to the model.
I'm not saying your general argument is wrong, but... zebra stripes are not made out of pixels. A model that requires a photograph of a zebra to align with the camera's sensor grid also has bigger problems.
For those going down this rabbit hole, perceptual downscaling is state of the art, and the closest thing we have to a Python implementation is here (with a citation of the original paper): https://github.com/WolframRhodium/muvsfunc/blob/master/muvsf...
Other supposedly better CUDA/ML filters give me strange results.
I really wish there were better general-purpose imaging libraries that steadily implement/copy these useful filters, so that more people could use them out of the box.
Most of the languages I've worked with are surprisingly lacking in this regard, despite the huge potential use cases.
In the case of Python, for example, Pillow is fine but it has nothing fancy. You can't even fine-tune the parameters of bicubic, let alone use the billions of new algorithms from the video communities.
OpenCV and the ML tools like to reinvent the wheel themselves, but often only the most basic ones (and badly, as noted in this article).
I found https://dl.acm.org/doi/10.1145/2766891 but I don't like the comparisons. Any designer will tell you, after down-scaling you do a minimal sharpening pass. The "perceptual downscaling" looks slightly over-sharpened to me.
I'd love to compare something I sharpened in photoshop with these results.
That implementation is pretty easy to run! The whole Python block (along with some imports) is something like:
    import vapoursynth as vs
    import muvsfunc as muf
    core = vs.core
    clip = core.imwri.Read(img)               # read the source image via the imwri plugin
    clip = muf.ssim_downscale(clip, x, y)     # the SSIM-based perceptual downscale
    clip = core.imwri.Write(clip, imgoutput)  # see the imwri docs for the exact Write arguments
    clip.set_output()
> Any designer will tell you, after down-scaling you do a minimal sharpening pass
This is probably wisdom from bicubic scaling, but you usually don't need further sharpening if you use a "sharp" filter like Mitchell.
Anyway, I haven't run Butteraugli or SSIM metrics against other scalers; I just subjectively observed that ssim_downscale was preserving some edges in video frames that Spline36, Mitchell, and Bicubic were not preserving.
> The definition of scaling function is mathematical and should never be a function of the library being used.
Horseshit. Image resizing, or any other kind of resampling, is essentially always about filling in missing information. There is no mathematical model that will tell you for certain what the missing information is.
Not at all. He is correct that those functions are defined mathematically and that the results should therefore be the same using any libraries which claim to implement them.
Arguably, downscaling does not fill in missing information, it only throws information away. Still, implementations vary a lot here. There might not be consensus on a unique correct way to do downscaling, but there are certain things that you certainly don't want to do, like doing naive linear arithmetic on sRGB color values.
If some of the edges are infinitely sharp, and you know which ones they are by looking at them, as in my example, then the image is using more than all of its available bandwidth at any resolution (it isn't bandlimited).
That's true in the 1D case as well. That requires upsampling with information generation before downsampling. Using priors to guess missing information is an interesting task that will never be finished. It isn't necessary for a satisfactory downsampling result.
One interesting complication for a lot of photos is that the bandwidth of the green channel is twice as high as the red and blue channels due to the Bayer filter mosaic.
Aha, no! Downscaling *into a discrete space by an arbitrary amount* is absolutely filling in missing information.
Take the naive case where you downscale a line of four pixels to two pixels - you can simply discard two of them so you go from `0,1,2,3` to `0,2`. It looks okay.
But what happens if you want to scale four pixels to three? You could simply throw one away but then things will look wobbly and lumpy. So you need to take your four pixels, and fill in a missing value that lands slap bang between 1 and 2. Worse, you actually need to treat 0 and 3 as missing values too because they will be somewhat affected by spreading them into the middle pixel.
So yes, downscaling does have to compute missing values even in your naive linear interpolation!
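The parent's 4-to-3 case, written out with plain linear interpolation (a sketch; the coordinate mapping is the usual pixel-centre convention):

    import numpy as np

    src = np.array([0.0, 1.0, 2.0, 3.0])      # four source pixels
    # three output pixel centres, mapped back into source coordinates
    xs = (np.arange(3) + 0.5) * 4 / 3 - 0.5   # 0.167, 1.5, 2.833 -- none land exactly on an input
    out = np.interp(xs, np.arange(4), src)    # every output value has to be interpolated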
>Take the naive case where you downscale a line of four pixels to two pixels - you can simply discard two of them so you go from `0,1,2,3` to `0,2`. It looks okay.
This is already wrong unless the pixels are band-limited to Nyquist/4. Trivial example where this is not true: an alternating 0, 255, 0, 255 pattern, which becomes solid 0, 0 (or 255, 255) after discarding every other pixel, instead of the mid-grey you'd expect.
For downscaling, area averaging is simple and makes a lot of intuitive sense and gives good results. To me it's basically the definition of downscaling.
Like yeah, you can try to get clever and preserve the artistic intent or something with something like seam carving, but then I wouldn't call it downscaling anymore.
The article talks about downsampling, not upsampling, just so we are clear about that.
And besides, a ranty blog post pointing out pitfalls can still be useful for someone else coming from the same naïve (in a good/neutral way) place as the author.
Now that's an interesting topic for photographers who like to experiment with anamorphic lenses for panoramas.
An anamorphic lens (optically) "squeezes" the image onto the sensor, and afterwards the digital image has to be "desqueezed" (i.e. upscaled in one axis) to give you the "final" image. Which in turn is downscaled to be viewed on either a monitor or a printout.
But the resulting images I've seen so far nevertheless look good. I think that's because natural images don't have that many pixel-level details, and we mostly see downscaled images on the web or in YouTube videos anyway...
By that I mean, I know what bilinear/bicubic/lanczos resizing algorithms are, and I know they should at least have acceptable results (compared to NN).
But I didn't know famous libraries (especially OpenCV, which is a computer vision library!) could have such poor results.
Also, as a side note, IIRC bicubic has a free constant in its equation. So technically, when comparing different implementations, you need to make sure that parameter is the same. But that shouldn't excuse the extremely poor results in some of them.
At least bilinear and bicubic have a widely agreed upon specific definition. The poor results are the result of that definition. They work reasonably for upscaling, but downscaling more than a trivial amount causes them to weigh a few input pixels highly and outright ignore most of the rest.
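For reference, the cubic kernel in question (a sketch of the Keys kernel; as far as I know OpenCV uses a = -0.75 and Pillow uses a = -0.5):

    def cubic_kernel(x, a=-0.5):
        # Keys cubic convolution kernel; "a" is the free constant mentioned above
        x = abs(x)
        if x < 1:
            return (a + 2) * x**3 - (a + 3) * x**2 + 1
        if x < 2:
            return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
        return 0.0

Note it only spans 4 input samples: unless the implementation stretches the kernel by the shrink factor (as Pillow does), a large downscale ends up ignoring most of the input pixels, which is exactly the problem described above.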
I've seen more than one team find that reimplementing an OpenCV capability they use gains them both quality and performance.
This isn't necessarily a criticism of OpenCV: often the OpenCV implementation is, of necessity, quite general, and a specific use case can enable optimizations not available in the general case.
If their worry is differences between the algorithms in libraries across different execution environments, shouldn't they either find a library they like that can be called from all such environments, or, if no single library can be used everywhere, just write their own using their favorite algorithm? Why make all libraries do this the same way? Which one is undeniably correct?
That's basically what they did, which they mention in the last paragraph of the article. They released a wrapper library [0] for Pillow so that it can be called from C++:
> Since we noticed that the most correct behavior is given by the Pillow resize and we are interested in deploying our applications in C++, it could be useful to use it in C++. The Pillow image processing algorithms are almost all written in C, but they cannot be directly used because they are designed to be part of the Python wrapper.
> We, therefore, released a porting of the resize method in a new standalone library that works on cv::Mat so it would be compatible with all OpenCV algorithms.
> You can find the library here: pillow-resize.
Hmmm. With respect to feeding an ML system, are visual glitches and artifacts important? Wouldn't the most important thing be to use a transformation that preserves as much information as possible and captures the relevant structure? If the intermediate picture doesn't look great, who cares, as long as the result is good.
Ooops. Just thought about generative systems. Nevermind.
So, what are the dangers? (What's the point of the article?) That you'll get a different model from the same originals processed by different algorithms?
Comparing resizing algorithms is nothing new, the importance of adequate input data is obvious, and differences in the availability of image processing algorithms are also understandable. Clickbaity.
I was sort of expecting them to describe this danger to resizing: one can feed a piece of an image into one of these new massive ML models and get back the full image - with things that you didn't want to share. Like cropping out my ex.
Is ML sort of like a universal hologram in that respect?
If you upscale (with interpolation) some sensitive image (think security camera), could that be dismissed in court as it "creates" new information that wasn't there in the original image?
The bigger problem is that the pixel domain is not a very good domain to be operating in. How many hours of training and thousands of images are used to essentially learn Gabor filters?
This article throws a red flag on proving negative(s). This is impossible with maths. The void is filled by human subjectivity. In a graphical sense, "visual taste."
Zimg is a gold standard to me, but yeah, you can get better output depending on the nature of your content and hardware. I think ESRGAN is state-of-the-art above 2x scales, with the right community model from upscale.wiki, but it is slow and artifacty. And pixel art, for instance, may look better upscaled with xBRZ.
Image resizing is one of those things that most companies seem to build in-house over and over. There are several hosted services, but obviously sending your users' photos to a 3rd party is pretty weak. For those of us looking for a middle ground: I've had great success with imgproxy (https://github.com/imgproxy/imgproxy), which wraps libvips and is well maintained.
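If you'd rather call libvips directly, pyvips is a thin wrapper; a minimal sketch (filenames are placeholders):

    import pyvips

    # libvips' thumbnail pipeline: shrink-on-load where possible, then a high-quality reduce;
    # pass linear=True if you want the resampling done in linear light
    thumb = pyvips.Image.thumbnail("input.jpg", 320)
    thumb.write_to_file("thumb.jpg")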