"The proposed algorithm can be quickly described as an iterative algorithm that treats color information as a heightmap and 'pushes' pixels towards probable edges using gradient-ascent. This is very likely what learning-based approaches are already doing under the hood (eg. VDSR, waifu2x)."
This is interesting to me because it hints at the direction I really want to see ML stuff go.
Some problems may not lend themselves to this concept, but hear me out: we train models, they start giving reliable output, and then we put them in production with really no idea what the thing is doing inside. Here we have a traditional image processing algorithm that's doing something similar to what the author suspects the ML-based solution is doing... only the author's solution is much more performant. What I think we'd love to see is the ML approach yield a result that not only works, but is transparent in how it works. So plain old human engineers can internalize what the machine learned, and re-implement the solution as a run-of-the-mill algorithm that does the job faster than pretending to be a brain.
Is this feasible?
Perhaps MT (machine teaching?) is the next evolution of ML.
My enthusiasm in this instance is probably tempered by the fact that image resizing is on the simple end of things we're using ML for, I'd think.
It's a two dimensional grid of data points. That's it. I mean, that's certainly not trivial (look at all the algorithms we've come up with just in the last 10-20 years! imagine all the people-hours!) but it pales in complexity to, say, weather models or automated scanning of PET scans for tumors or something.
The output of any given image resizing algorithm can be quickly assessed by eye, so that's a very convenient feedback loop. As opposed to, say, using ML to come up with proposed oil drilling locations, where testing out each proposed drilling spot is a very expensive proposition.
> So plain old human engineers can internalize what the machine learned, and re-implement the solution as a run-of-the-mill algorithm that does the job faster than pretending to be a brain.
Disclaimer, in case it's not blindingly obvious - I am not versed in ML at all.
Any sufficiently complex system acts as a black box when it becomes easier to experiment with than to understand. Hence, black-box optimization has become increasingly important as systems become more complex. - Google Vizier: A Service for Black-Box Optimization, Golovin et al., KDD '17
... via https://github.com/globalcitizen/taoup
I would rather a high level algorithm description as an output -- which could definitely be fed into some sort of compiler that ultimately outputs executable code.
I feel like going straight to executable code isn't solving the problem GP was interested in, which I believe to be the problem of transferring knowledge from machine to engineer in much the way an engineer would transfer it to another engineer.
An algorithm that outputs code without any high level understanding or documentation is about as useful to me in a large project as an intern who can copy-paste from Stack Overflow and produce volumes of code with no documentation, in the long term.
I suspect many (most?) algorithms are sufficiently complex as to make this completely infeasible, but hopefully I'm wrong!
For example, the first impressive ImageNet solvers clearly worked by coming up with a number of characteristics based mainly around various "textures" rather than "shapes", but this wasn't obvious when they were first published. It really seemed like they could "recognise a Panda" etc.
ML can insert "its best guess based on a training set". A human-tuned algo can insert "its output as defined by the handwritten algo", which presumably is based on the human's own "training set" of personal experience.
but the truth of any lossy encoding is that... information is lost, period. best you can do is guess as to what was there.
"Information is lost" is too vague. You're counting bits on disk, but fewer bits does not always mean "less information" when your algorithm gets smarter. Compression is the obvious, classical example. Even for lossy compression, information loss is << change in size.
ML offers the promise to take this to extreme levels: give it a picture of (part of) the NY skyline, and it adds the rest from memory, adjusting weather and time of day to your sample. Is that new information "real"? That's really up to your definition.
The best example of this idea is those CSI-style "Enhance" effects: it used to be true that people on Slashdot and later HN would try to outdo each other with the superior smartitude of saying "That's impossible! Information was lost!".
Funny story: that effect now exists. It's quite obvious that, for example, a low-res image of a license plate still contains some data, and that an algorithm can find a license plate number that maximizes the probability of that specific low-res image. With a bit of ML, those algorithms have become better than the human brain in almost zero time flat.
Turns out the information was still there.
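That "find the plate that maximizes the probability of the observation" idea fits in a few lines. A toy, self-contained sketch, with made-up random glyphs standing in for a real font and camera model:

    import itertools
    import numpy as np

    # Toy world: a "plate" is a short string, each character has a known
    # high-res glyph, and the camera box-averages 8x8 blocks of pixels.
    rng = np.random.default_rng(0)
    ALPHABET = "ABCDEF0123"
    GLYPHS = {c: rng.random((16, 8)) for c in ALPHABET}  # stand-in glyph bitmaps

    def render(plate):
        return np.hstack([GLYPHS[c] for c in plate])

    def downsample(img, k=8):
        h, w = img.shape
        return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

    def enhance(lowres, length=3):
        # Search for the plate whose rendered-then-downsampled image best
        # explains the observation: the information is degraded, not gone.
        candidates = ("".join(t) for t in itertools.product(ALPHABET, repeat=length))
        return min(candidates,
                   key=lambda p: np.sum((downsample(render(p)) - lowres) ** 2))

    print(enhance(downsample(render("FA3"))))  # recovers "FA3" from a 2x3-pixel image

Real systems obviously use a proper imaging model and a smarter search than brute force, but the principle is the same.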
And then you can get fooled instead of actually correctly believing the image was unreadable.
There is no free lunch, even with robust estimators. They will make mistakes. For image quality, it is ok to make a mistake here or there. For actual recognition? Terrible.
Better than the human brain? Show it.
People are pretty good at reading blurry text when trained, but I'm not aware of a test pitting trained people against a machine.
(No, Mechanical Turk does not count as trained at a specific task.)
That's because there was enough information (data) present to extrapolate.
Let's say you take a photo of someone across the room, and downsize it so it's low res, then use machine learning to upscale it.
It will do its best to reconstruct the face and other features based off its data. It might even get pretty close. But it still has no way of knowing where every single freckle or mole on their skin is; it might try placing some based off what it's learned, but they aren't related to the actual person.
Here's another good example: it doesn't know what color the bridge should be. Maybe it was painted white, and should stay white! We humans know other information, such as which bridge that is, so we know what color it should be, but there's not enough data to extrapolate that from the image alone.
And mostly temporally stable, which is not even getting exploited by this cheap but effective superresolution filter.
> Interestingly enough, waifu2x performed very poorly on anime. A plausible explanation is that the network was simply not trained to upscale these types of images. Usually anime-style art has sharper lines and contains much more small detail/texture compared to anime. The distribution of images used to train waifu2x must have been mostly art images from sites like DeviantArt/Danbooru/Pixiv, and not anime.
In the chart, it says to compare "perceptual quality", but the axis is only marked with "blurry" and "less blurry". Sharpness is not the only component of quality, perceptual or otherwise. I can tell that Anime4K's result is indeed very sharp, but the quality of the edges/lines is very unnatural, even in the examples the author provided. I personally would prefer slightly blurry lines with less of the "oily" effect.
Also, I didn't see any comparison with ground truth, i.e. taking a high-resolution image first, resizing it down, using the proposed algorithm (among existing ones) to upscale it back, and then comparing the upscaled results with the original image. I understand it may be hard to find enough examples of 4K anime, but we could do so with 1080p -> 480p -> 1080p etc.
(I am not familiar with this domain; does similar research normally do this in its analysis?)
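For what it's worth, that round trip is easy to script. A minimal sketch, assuming PSNR as the metric and Pillow for the downscale (the function names and the callback interface are mine; plug in whatever upscaler is under test):

    import numpy as np
    from PIL import Image

    def psnr(a, b):
        # Peak signal-to-noise ratio between two 8-bit images.
        mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

    def evaluate(path, upscale, factor=2):
        # Ground-truth round trip: shrink the original, upscale it back, compare.
        truth = Image.open(path).convert("RGB")
        w, h = truth.size
        truth = truth.crop((0, 0, w - w % factor, h - h % factor))
        small = truth.resize((truth.width // factor, truth.height // factor), Image.LANCZOS)
        restored = upscale(small, factor)  # must return an image at the original size
        return psnr(np.asarray(truth), np.asarray(restored))

    # Baseline for comparison: a plain bicubic round trip.
    bicubic = lambda im, f: im.resize((im.width * f, im.height * f), Image.BICUBIC)
    print(evaluate("frame.png", bicubic))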
To my knowledge, not much has changed since 2017, when only a single anime (Clockwork Planet) was produced in 1080p. The only two studios I can name offhand that I know have done 1080p masters are KyoAni and JC Staff.
 2017 reference: https://www.reddit.com/r/anime/comments/65wqeu/spring_2017_a...
Why have they stuck with 720P and 1080P?
Another factor, I believe, is know-how. In my opinion, despite anime being broadcast in 16:9 for so long, it is only in recent years that the extra width has been put to good use during layout.
Being upscaled and released on DVD or BluRay at 1080p (which most anime have been for most of the past decade) is not the same as being produced at 1080p.
I wasn't aware of the two Gundam movies mentioned by fireattack, but I can't confirm they were mastered at 4k and aren't just upscales. So if you could name some of those 4k releases that would be helpful, especially if you can provide information as to them being mastered at 4k and not just upscaled to 4k.
Would you feel better if I said "they upscale 720p and 837p and 900p and 810p and 806p and 873p and 864p and 957p and 878p and 719p to 1080p"? I excluded non-standard resolutions for simplicity since it doesn't really change my greater point: most 1080p releases are just upscales. 19 of 41 listed are 720p and 720p is the most common resolution listed.
Of course, in the end, it's an entirely subjective thing. Personally I hold off on using NGU Sharp and use NGU Anti-Alias instead for the above reasons.
EDIT: this is addressed in the readme:
I think the results are worse!
> Surely some people like sharper edges, some like softer ones. Do try it yourself on a few anime before reaching a definite conclusion. People tend to prefer sharper edges. Also, seeing the comparisons on a 1080p screen is not representative of the final results on a 4K screen, the pixel density and sharpness of the final image is simply not comparable.
EDIT: I just tried this filter on a 4K monitor. To be honest, I don't think it's very good. It reminds me of the bad parts of sharpeners turned up to the max. All the edges turn into a weird, sometimes jagged, smear, and originally blurry but detailed backgrounds just become a weird mess. I really don't think even people who like sharpness will prefer this filter for general viewing, and I find the chart given in the preprint (https://raw.githubusercontent.com/bloc97/Anime4K/master/resu...) extremely dubious.
A lot of the really old cartoons would use a background art image and pan over it with the characters doing stuff to create a sense of motion. Sometimes the characters would move over a still background image but the 'camera' would zoom in.
Something that could extract the full-size background image and apply it to the frames to enlarge the aspect ratio could go a long way toward revitalizing a lot of older cartoons. Especially if it could fill in any gaps using the open-source equivalent of Content Aware Fill (is there a FOSS equivalent?).
I've been trying to get my kids into Space Ghost Coast to Coast, Home Movies, Sealab 2021, the Simpsons, etc. If the video is wide screen they try it and enjoy it. If it's 4:3 they barely give it a chance because it's "too old"
”Adobe Systems acquired a non-exclusive license to seam carving technology from MERL, and implemented it as a feature in Photoshop CS4, where it is called Content Aware Scaling. As the license is non-exclusive, other popular computer graphics applications, among which are GIMP, digiKam, ImageMagick, as well as some stand-alone programs, among which are iResizer, also have implementations of this technique, some of which are released as free and open source software”
Seam carving removes stuff, but the principle is the same. The Gimp plug-in is http://www.logarithmic.net/pfh/resynthesizer, and apparently also can do the filling-in. I haven’t used it, so I don’t know how good it is.
https://perso.crans.org/frenoy/matlab2012/seamcarving.pdf (emphasis added):
”We propose a simple image operator, we term seam-carving, that can change the size of an image by gracefully carving-out OR INSERTING pixels in different parts of the image”
That paper (which I think is the paper introducing the seam carving technique) also has examples of widening pictures.
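For reference, the removal direction of the algorithm is a short dynamic program over an energy map. A rough numpy sketch, my own paraphrase of the paper rather than their code (widening works by duplicating the cheapest seams instead of deleting them):

    import numpy as np

    def energy(gray):
        # Simple gradient-magnitude energy: seams avoid strong edges.
        gy, gx = np.gradient(gray)
        return np.abs(gx) + np.abs(gy)

    def find_vertical_seam(e):
        # cost[i, j] = e[i, j] + cheapest of the three cells above (j-1, j, j+1).
        h, w = e.shape
        cost = e.copy()
        for i in range(1, h):
            left = np.concatenate(([np.inf], cost[i - 1, :-1]))
            right = np.concatenate((cost[i - 1, 1:], [np.inf]))
            cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
        # Backtrack from the cheapest bottom cell, moving at most one column per row.
        seam = np.empty(h, dtype=int)
        seam[-1] = int(np.argmin(cost[-1]))
        for i in range(h - 2, -1, -1):
            j = seam[i + 1]
            lo, hi = max(j - 1, 0), min(j + 2, w)
            seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
        return seam

    def remove_seam(gray, seam):
        # Delete one pixel per row; the image gets one column narrower.
        h, w = gray.shape
        mask = np.ones((h, w), dtype=bool)
        mask[np.arange(h), seam] = False
        return gray[mask].reshape(h, w - 1)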
But for the stated purpose this looks pretty good. For example, see this 720p [https://giant.gfycat.com/AccomplishedBelatedBlueshark.webm] to 1440p [https://giant.gfycat.com/FluidBlissfulCob.webm] test. It's subtle, improves the video, and runs fine (tested via mpv, https://mpv.io/manual/master/#options-glsl-shaders).
Anime4K obviously looks like a filter (I think Photoshop has an effect that looks like that, but I can't remember the name at the moment), particularly at the 4x setting.
Consider a grayscale morphological operator such as erosion. For each pixel, you would replace the value with the minimum value found inside a structuring element surrounding the pixel. This is kind of like a weird morphological operator with a 3x3 box structuring element, where instead of choosing values based on a simple criterion such as 'min' or 'max' you use information from an approximation of the image gradient. If the gradient magnitude is above some threshold, you select the neighbor pixel in the 3x3 structuring element in the opposite direction of the gradient.
This generally has the effect of making the edges more pronounced. Intuitively, you're distorting the image by "pinching" along the edges. To prevent weird color artifacts, they're using edges computed on grayscale data so that the identical morphological filter is applied to each color channel.
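If that description is hard to picture, here's a rough numpy sketch of such an operator. This is my own illustration of the idea, not the repo's actual GLSL, and the threshold is arbitrary:

    import numpy as np

    def push_pixels(img, threshold=0.05):
        # img: float RGB array in [0, 1], shape (H, W, 3).
        # Edges come from luminance, so all three channels move identically.
        gray = img @ np.array([0.299, 0.587, 0.114])
        gy, gx = np.gradient(gray)
        mag = np.hypot(gx, gy)

        # Quantize the gradient direction to the 8-connected neighborhood.
        with np.errstate(invalid="ignore", divide="ignore"):
            dx = np.rint(np.where(mag > 0, gx / mag, 0)).astype(int)
            dy = np.rint(np.where(mag > 0, gy / mag, 0)).astype(int)

        # Where the edge is strong enough, take the 3x3 neighbor opposite the
        # gradient (the "downhill" pixel), clamped at the image borders.
        h, w = gray.shape
        ys, xs = np.mgrid[0:h, 0:w]
        ny = np.clip(ys - dy, 0, h - 1)
        nx = np.clip(xs - dx, 0, w - 1)
        out = img.copy()
        strong = mag > threshold
        out[strong] = img[ny[strong], nx[strong]]
        return out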
It seems similar but not identical to the method described in this paper:
T. A. Mahmoud and S. Marshall, "Edge-Detected Guided Morphological Filter for Image Sharpening," 2008.
In any case, great looking results! Proof that neural networks have not yet made thinking obsolete.
> [...] a big weakness of our algorithm [...] is texture detail, however since upscaling art was not our main goal, our results are acceptable.
That sounds like a multiobjective optimization problem: maximize edge quality and texture detail at the same time. If that problem were solved (to whatever extent its nature or structure permits), the algorithm would be improved, don't you agree?
Did the authors of this algorithm not have the capability to formulate or recognize the multiobjective optimization problem?
Or, if they did have the formulation capabilities, did they lack the capability to solve it? Why, if so? Too difficult? Not enough time? Limited by a resource? Or no intention to have done so, given that they said a specific trade-off was acceptable?
You're welcome to share your speculation or opinion, Hacker News reader.
I'm curious to know your thoughts, is all.
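Since you asked: here is roughly what the formulation looks like in miniature, with both objective models invented for illustration (the real ones would be measured image-quality scores):

    import numpy as np

    # Sweep a single "sharpening strength" knob under two competing toy objectives.
    strengths = np.linspace(0, 1, 11)
    edge_quality = strengths             # toy: edges improve with strength
    texture_quality = 1 - strengths**2   # toy: texture degrades with strength
    points = np.column_stack([edge_quality, texture_quality])

    def pareto_front(pts):
        # Keep a point unless some other point is at least as good in both
        # objectives and different, i.e. unless it is dominated.
        keep = []
        for i, p in enumerate(pts):
            dominated = any(q[0] >= p[0] and q[1] >= p[1]
                            and tuple(q) != tuple(p) for q in pts)
            if not dominated:
                keep.append(i)
        return keep

    # Every setting survives: no strength wins on both objectives at once, so
    # "solving" the problem means picking a point on this front, i.e. a trade-off.
    print(strengths[pareto_front(points)])

In that light, "a specific trade-off was acceptable" is itself a solution to the multiobjective problem; it's picking one point on the front.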
FWIW I tried doing the same thing using waifu2x, but it was about one or two orders of magnitude too slow. I don't remember the details but I think it worked out to about 2 weeks of 24/7 operation on a 1070 to upscale a full show (don't remember if it was 1-cour or 2-cour) to 1080p. Results were okay, gave kind of an oily texture to it but the denoising worked quite well. If it took only a day or two to convert a full show I'd consider doing it on some old 480p shows with bad quality, though I probably would just watch the original video myself.
Maybe some style-transfer related algorithm could be useful in this situation?
Wonder how it works on more "fancy" looking anime like
Also does this run on Linux or Mac? Haven't had a Windows machine in years.
How about using the same algorithm to upscale 540p to 1080p, and compare with 1080p ground truth? Would that not be sufficient?
The application domain for this includes any other sort of abstract logical synthesis, charts and maybe videogames (even ones that look realistic).
Real world content also has sharp boundaries between objects, and whatever part happens to do that work might be shared, but within objects fuzzier is probably better. IIRC someone was making an AI assisted upscaling of DS9 which would probably be closer to a generic algorithm for 'filmed' content.
Anime and cartoons have very specific qualities that allow these types of techniques to be effective (as the other reply explains).
But with 5fps, each frame can be so radically different, I think interpolation is generally just not possible. You can generate something smooth, but it will be so far away from whatever an animator would actually have inserted, that it will seem more strange/surreal than natural, and thus achieve the opposite effect as intended.
E.g. see  which shows animation at 15/30/60fps... you can see that even with the 15, it's hard to imagine an algorithm that would port well to 60. (Use the period on your keyboard to advance frame-by-frame.)
Even high grade interpolation sometimes has this problem - or the comparable "wake of water near moving object" one. Essentially you'd get tons of inpainting kind of artifacts.
By the way, most animated sequences are about 8fps even now.
Aside from the obvious problem of motion interpolation having problems with acceleration/deceleration, there's a lot of nuance in the original animation that gets lost when you try to interpolate from one sprite to the next.
Even if you can avoid obvious artifacts, no interpolation algorithm can create new information, it can only derive from what's already there and guess at what's missing.
EDIT: If you dig through twitter you'll find some tweets from animators explaining why the results are bad. As mere consumers we might be tempted to dismiss that criticism as snobbery but animating is a craft and the interpolated results are objectively worse than the original.
Old cartoons lend themselves to being converted to vector images, and those are easier to animate automatically.
Direct link to video: https://www.youtube.com/watch?v=MjViy6kyiqs
They seem to have built an edge-optimized image upscaler. It prevents the edges from becoming soft during upsampling.
You can clearly see the difference in their comparison pictures (of which they have a metric ton)
The upsampling algorithm in the OP is not based on machine learning but is also fairly domain specific and of limited general applicability.