
Anime4K: Real time high quality video upscaling - bufferoverflow
https://github.com/bloc97/Anime4K
======
jpk
Part of the abstract at the bottom:

"The proposed algorithm can be quickly described as an iterative algorithm
that treats color information as a heightmap and 'pushes' pixels towards
probable edges using gradient-ascent. This is very likely what learning-based
approaches are already doing under the hood (eg. VDSR[1], waifu2x[2])."

This is interesting to me because it hints at the direction I really want to
see ML stuff go.

Some problems may not lend themselves to this concept, but hear me out: we
train models, they start giving reliable output, then we put them in
production while really having no idea what the thing is doing inside. Here we
have a traditional image processing algorithm that's doing something similar
to what the author suspects the ML-based solution is doing... only the
author's solution is much more performant. What I think we'd love to see is
the ML approach yield a result that not only works, but is transparent in how
it works. So plain old human engineers can internalize what the machine
learned, and re-implement the solution as a run-of-the-mill algorithm that
does the job faster than pretending to be a brain.

Is this feasible?

~~~
JohnBooty
Feasible? That seems highly dependent on the task at hand. Worthy? Absolutely!

Perhaps MT (machine teaching?) is the next evolution of ML.

My enthusiasm in this instance is probably tempered by the fact that image
resizing is on the simple end of things we're using ML for, I'd think.

It's a two dimensional grid of data points. That's it. I mean, that's
certainly not trivial (look at all the algorithms we've come up with just in
the last 10-20 years! imagine all the people-hours!) but it pales in
complexity to, say, weather models or automated scanning of PET scans for
tumors or something.

The output of any given image-resizing algorithm can be quickly assessed by
eye, so that's a very convenient feedback loop. As opposed to, say, using ML
to come up with proposed oil drilling locations, where testing out each
proposed drilling spot is a very expensive proposition.

    
    
        So plain old human engineers can internalize what 
        the machine learned, and re-implement the solution 
        as a run-of-the-mill algorithm that does the job
        faster than pretending to be a brain.
    

Perhaps we can cut out the middleman here. Maybe the answer is not for ML
models to come up with human-understandable algorithms. Perhaps the answer is
for them to produce optimized code that implements the algorithms they've
discovered.

Disclaimer, in case it's not blindingly obvious - I am not versed in ML at
all.

~~~
merlincorey
> Perhaps we can cut out the middleman here. Maybe the answer is not for ML
> models to come up with human-understandable algorithms. Perhaps the answer
> is for them to produce optimized code that implements the algorithms they've
> discovered.

I would rather a high level algorithm description as an output -- which could
definitely be fed into some sort of compiler that ultimately outputs
executable code.

I feel like going straight to executable code isn't solving the problem GP was
interested in, which I believe to be the problem of transferring knowledge
from machine to engineer in much the way an engineer would transfer it to
another engineer.

An algorithm that outputs code without any high level understanding or
documentation is about as useful to me in a large project as an intern who can
copy-paste from Stack Overflow and produce volumes of code with no
documentation, in the long term.

~~~
JohnBooty
Yeah, that would of course be preferable. A humanized description like "push
the pixels toward the edges" is of course a wonderful thing.

I suspect many (most?) algorithms are sufficiently complex as to make this
completely infeasible, but hopefully I'm wrong!

------
ladberg
Definitely a breath of fresh air that someone's still trying to do super-
resolution without neural networks. This example shows that at the moment, it
can still be better and MUCH faster to use classical CV techniques for certain
applications.

~~~
gruez
A similar thing happened with upscaling algorithms for video games. AMD's
Contrast Adaptive Sharpening was shown to have superior image quality to
Nvidia's Deep Learning Super Sampling[1]. Plus, the former algorithm works on
every game and doesn't need a training set, unlike the deep learning approach.

[1] [https://www.techspot.com/article/1873-radeon-image-
sharpenin...](https://www.techspot.com/article/1873-radeon-image-sharpening-
vs-nvidia-dlss/)

~~~
kevin_thibedeau
ML implementations can insert detail that was never present in the original
image. You can't get that with other methods. That may or may not be a good
thing depending on the source material and your desired result.

~~~
2bitencryption
"detail that was never present" doesn't exist.

ML can insert "its best guess based on a training set". A human-tuned algo can
insert "its output as defined by the handwritten algo", which presumably is
based on the human's own "training set" of personal experience.

but the truth of any lossy encoding is that... information is lost, period.
best you can do is guess as to what was there.

~~~
IfOnlyYouKnew
This is one of those old talking points people for some reason love...

"Information is lost" is too vague. You're counting bits on disk, but fewer
bits do not always mean "less information" when your algorithm gets smarter.
Compression is the obvious, classical example. Even for lossy compression,
information loss is << change in size.

ML offers the promise to take this to extreme levels: give it a picture of
(part of) the NY skyline, and it adds the rest from memory, adjusting weather
and time of day to your sample. Is that new information "real"? That's really
up to your definition.

The best example of this idea is those CSI-style "Enhance" effects: it used to
be true that people on Slashdot and later HN would try to outdo each other in
superior smartitude by saying "That's impossible! Information was lost!".

Funny story: that effect now exists. It's quite obvious that, for example, a
low-res image of a license plate still contains _some_ data, and that an
algorithm can find a license plate number that maximizes the probability of
that specific low-res image. With a bit of ML, those algorithms have become
better than the human brain in no time flat.

Turns out the information was still there.
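That search can be sketched as a toy maximum-likelihood decoder. Everything here (the 4x4 "glyphs", the box downscale, the noise level) is made up for illustration and has nothing to do with how real plate readers work:

```python
import numpy as np

# Toy "glyphs": 4x4 binary patterns standing in for plate characters.
GLYPHS = {
    "0": np.array([[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]], float),
    "1": np.array([[0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 1, 1]], float),
    "7": np.array([[1, 1, 1, 1], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]], float),
}

def downscale(img, factor=2):
    """Box-filter downscale: average non-overlapping factor x factor blocks."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def best_guess(lowres):
    """Return the glyph whose downscaled rendering best explains the observed
    low-res patch. Under Gaussian noise, minimizing squared error is the same
    as maximizing the likelihood of the observation."""
    scores = {c: np.sum((downscale(g) - lowres) ** 2) for c, g in GLYPHS.items()}
    return min(scores, key=scores.get)

# Degrade a "7" to a noisy 2x2 patch, then recover it.
rng = np.random.default_rng(0)
observed = downscale(GLYPHS["7"]) + rng.normal(0, 0.05, (2, 2))
print(best_guess(observed))  # prints 7: the information was still there
```

The point is not that four pixels hold a full character, but that a strong prior (here, a tiny glyph alphabet) lets the decoder pick the candidate most consistent with what little data remains. With more noise or a bigger alphabet, the argmax can confidently pick the wrong answer.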

~~~
pjc50
This is quite capable of producing a high-res image of _some_ license plate,
yes. But it's only probabilistic: there's no proof that the license plate
definitely has the exact same number as the one in the low-res photo. You have
to allow for the possibility of the system hallucinating the wrong result and
enhancing the certainty of it. While you could use it as input to a police
search it would be grossly unjust to show such an enhanced image to a jury.

~~~
buckminster
Like the Xerox scanner bug which randomly altered digits in numbers. These
problems aren't just theoretical.

[http://www.dkriesel.com/en/blog/2013/0802_xerox-
workcentres_...](http://www.dkriesel.com/en/blog/2013/0802_xerox-
workcentres_are_switching_written_numbers_when_scanning)

------
sand500
Interesting bit since I always figured waifu2x was the best at upscaling:

>Interestingly enough, waifu2x performed very poorly on anime. A plausible
explanation is that the network was simply not trained to upscale these types
of images. Usually anime-style art has sharper lines and contains many more
small details/textures compared to anime. The distribution of images used to
train waifu2x must have been mostly art images from sites like
DeviantArt/Danbooru/Pixiv, and not anime.

~~~
k_sze
Which is a bit ironic because I always thought that the name 'waifu2x' came
from anime/otaku culture, yet it sucks when applied to anime. ¯\\_(ツ)_/¯

~~~
Liquid_Fire
It performs well on anime-style (static) drawings, as opposed to animation.

------
fireattack
I'm not sure I understand how the author compares the quality in the preprint.

In the chart, it says to compare "perceptual quality", but the axis is only
marked with "blurry" and "less blurry". Sharpness is not the only aspect of
(perceptual or otherwise) quality. I can tell that Anime4K's result is indeed
very sharp, but the quality of the edges/lines is very unnatural, even in the
examples the author provided. I personally would prefer slightly blurrier
lines with less of an "oily" effect.

Also, I didn't see any comparison with ground truth, i.e. taking a high-
resolution image first, resizing it down, using the proposed algorithm (among
existing ones) to upscale it back, and then comparing the upscaled results
with the original image. I understand it may be hard to find enough examples
of 4K anime, but we could do so with 1080p -> 480p -> 1080p, etc.

(I am not familiar with this domain; do similar studies normally include this
in their analysis?)
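For what it's worth, that evaluation loop is easy to set up. Here is a minimal sketch with a synthetic "ground truth", a box-filter downscale, and nearest-neighbour standing in for whichever upscaler is under test; PSNR is the usual, if imperfect, metric:

```python
import numpy as np

def box_downscale(img, f):
    """Average non-overlapping f x f blocks (a simple box filter)."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def nearest_upscale(img, f):
    """Nearest-neighbour upscale: repeat each pixel f times along each axis."""
    return np.repeat(np.repeat(img, f, axis=0), f, axis=1)

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

# Synthetic "ground truth": a smooth gradient with a sharp dark line,
# standing in for a high-resolution frame.
gt = np.tile(np.linspace(0, 1, 64), (64, 1))
gt[30:32, :] = 0.0

low = box_downscale(gt, 2)          # the degraded low-res input
restored = nearest_upscale(low, 2)  # plug any upscaler here instead
print(f"PSNR vs ground truth: {psnr(gt, restored):.1f} dB")
```

Swapping `nearest_upscale` for each candidate algorithm and comparing PSNR (or a perceptual metric like SSIM) against the same ground truth is exactly the comparison the preprint seems to be missing.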

~~~
Nadya
There is no ground truth because AFAIK there are no native 4K anime produced
yet. There are _very_ few produced at 1080p. Most 1080p releases from studios
are just upscaled 720p masters. Fansubbers will sometimes release their own
upscale of the 720p when the studio's upscale was done very poorly.

To my knowledge, not much has changed since 2017, when only a single anime
(Clockwork Planet) was produced in 1080p. The only two studios I can name
offhand that I know have done 1080p masters are KyoAni and JC Staff.

[0] 2017 reference:
[https://www.reddit.com/r/anime/comments/65wqeu/spring_2017_a...](https://www.reddit.com/r/anime/comments/65wqeu/spring_2017_anime_resolution_chart/)

~~~
ksec
Excuse my ignorance, but I would have thought anime would be much easier to
produce in 4K or even 8K compared to movies, which require a 4K/8K camera.

Why have they stuck with 720p and 1080p?

~~~
sirn
It is actually harder to produce in 4K/8K: more detail needs to be drawn so
the frame doesn't look too empty, and the lines must not be too thick (e.g. by
drawing on larger paper). TV series are usually drawn on A4 paper with a 1-2
inch margin, while proper theatrical releases are drawn on B4 paper.

Another factor, I believe, is know-how. In my opinion, despite anime being
broadcast in 16:9 for so long, it is only in recent years that the extra width
has been put to good use in layouts.

------
deftnerd
I've also been wondering if there is something similar to Content Aware Fill
that can help process old 4:3 cartoons to 16:9.

A lot of the really old cartoons would use a background art image and pan over
it with the characters doing stuff to create a sense of motion. Sometimes the
characters would move over a still background image but the "camera" would
zoom in.

Something that could extract the full-size background image and apply it to
the frames to widen the aspect ratio could go a long way toward revitalizing a
lot of older cartoons, especially if it could fill in any gaps using the
open-source equivalent of Content Aware Fill (is there a FOSS equal?)

I've been trying to get my kids into Space Ghost Coast to Coast, Home Movies,
Sealab 2021, the Simpsons, etc. If the video is widescreen, they try it and
enjoy it. If it's 4:3, they barely give it a chance because it's "too old".

~~~
Someone
_”using the open-source equivalent of Content Aware Fill (is there a FOSS
equal?)”_

[https://en.m.wikipedia.org/wiki/Seam_carving#Implementations](https://en.m.wikipedia.org/wiki/Seam_carving#Implementations):

 _”Adobe Systems acquired a non-exclusive license to seam carving technology
from MERL, and implemented it as a feature in Photoshop CS4, where it is
called Content Aware Scaling. As the license is non-exclusive, other popular
computer graphics applications, among which are GIMP, digiKam, ImageMagick, as
well as some stand-alone programs, among which are iResizer, also have
implementations of this technique, some of which are released as free and open
source software”_

Seam carving removes stuff, but the principle is the same. The Gimp plug-in is
[http://www.logarithmic.net/pfh/resynthesizer](http://www.logarithmic.net/pfh/resynthesizer),
and apparently can also do the filling-in. I haven’t used it, so I don’t know
how good it is.
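For the curious, the core seam-carving idea fits in a few dozen lines. This is a minimal sketch (gradient energy, dynamic-programming minimal seam, removal), not the MERL/Adobe implementation:

```python
import numpy as np

def energy(img):
    """Gradient-magnitude energy: high where the image changes quickly."""
    gy, gx = np.gradient(img)
    return np.abs(gx) + np.abs(gy)

def find_seam(e):
    """Dynamic programming: find the vertical seam (one column index per row)
    with minimal cumulative energy."""
    h, w = e.shape
    cost = e.copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        up = cost[y - 1]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, up), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):  # backtrack through the cheapest neighbours
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam

def remove_seam(img, seam):
    """Delete one pixel per row along the seam, narrowing the image by 1."""
    h, w = img.shape
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1)

# A flat image with one bright column of "content": the minimal-cost seam
# found here runs through the flat background, so the bright column survives.
img = np.zeros((6, 8))
img[:, 4] = 1.0
seam = find_seam(energy(img))
narrower = remove_seam(img, seam)
```

Content Aware Fill is a different (inpainting) problem, as noted below in the thread, but both rest on the same bet: most of the frame is low-energy, predictable texture that can be removed or synthesized without anyone noticing.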

~~~
NoodleIncident
My only experience with Photoshop is through memes; is Content Aware Fill the
same as Content Aware Scaling? I thought the former tried to guess what was
"behind" something you removed, while the latter just moves the existing
pixels around by guessing which ones need to stay together when you resize
something.

~~~
trynewideas
See the PatchMatch research project and associated papers[1] for more detail.
They are different tools in presentation and implementation within Photoshop
but are based on similar concepts of randomized correspondence.

[1]:
[https://research.adobe.com/project/patchmatch/](https://research.adobe.com/project/patchmatch/)

------
sand500
Didn't realize the madVR NGU algorithm is proprietary. A comparison of various
upscaling algorithms:
[https://artoriuz.github.io/mpv_upscaling.html](https://artoriuz.github.io/mpv_upscaling.html)

------
czr
waifu2x still much nicer for art of course (comparison:
[https://i.imgur.com/4QkIUOc.png](https://i.imgur.com/4QkIUOc.png) 2x,
[https://i.imgur.com/pQDuIpl.png](https://i.imgur.com/pQDuIpl.png) 4x)

but for the stated purpose this looks pretty good. for example, 720p
[[https://giant.gfycat.com/AccomplishedBelatedBlueshark.webm](https://giant.gfycat.com/AccomplishedBelatedBlueshark.webm)]
to 1440p
[[https://giant.gfycat.com/FluidBlissfulCob.webm](https://giant.gfycat.com/FluidBlissfulCob.webm)]
test. is subtle, improves video, and runs fine (tested via mpv,
[https://mpv.io/manual/master/#options-glsl-
shaders](https://mpv.io/manual/master/#options-glsl-shaders)).

~~~
userbinator
What's GT? It performs the best in your examples.

Anime4K obviously looks like a filter (I think Photoshop has an effect that
looks like that, but I can't remember the name at the moment), particularly at
the 4x setting.

~~~
kalleboo
GT = Ground Truth. It's the original image used for the comparison (before
being scaled down and then scaled back up with the different algorithms).

------
neetdeth
I found the preprint somewhat confusing with its talk of approximate residuals
and "pushing" pixels. Let me propose another way to think of this and someone
can tell me if I'm off base. Disclaimer, I haven't read the source code.

Consider a grayscale morphological operator such as erosion. For each pixel,
you would replace the value with the minimum value found inside a structuring
element surrounding the pixel. The Anime4K approach is kind of like a weird
morphological operator with a 3x3 box structuring element, where instead of
choosing values based on a simple criterion such as 'min' or 'max' you use
information from an approximation of the image gradient. If the gradient
magnitude is above some threshold, you select the neighbor pixel in the 3x3
structuring element in the opposite direction of the gradient.

This generally has the effect of making the edges more pronounced.
Intuitively, you're distorting the image by "pinching" along the edges. To
prevent weird color artifacts, they're using edges computed on grayscale data
so that the identical morphological filter is applied to each color channel.
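That description can be sketched directly. This follows the comment above (my reading of it, not Anime4K's actual shader code), with the direction quantized to the 3x3 neighbourhood:

```python
import numpy as np

def push_pixels(luma, channels, threshold=0.1):
    """Where the luma gradient is strong, replace each pixel with its 3x3
    neighbour opposite the gradient direction. The same source pixel is used
    for every channel, so no colour fringing is introduced."""
    h, w = luma.shape
    gy, gx = np.gradient(luma)
    mag = np.hypot(gx, gy)
    # Unit vector opposite the gradient, rounded to a 3x3 neighbour offset.
    with np.errstate(invalid="ignore", divide="ignore"):
        sy = -np.rint(np.where(mag > 0, gy / mag, 0)).astype(int)
        sx = -np.rint(np.where(mag > 0, gx / mag, 0)).astype(int)
    yy, xx = np.mgrid[0:h, 0:w]
    src_y = np.clip(yy + sy, 0, h - 1)
    src_x = np.clip(xx + sx, 0, w - 1)
    strong = mag > threshold
    return [np.where(strong, ch[src_y, src_x], ch) for ch in channels]

# A blurred dark line on a light background: one pass spreads the line's dark
# core outward (erosion-like), making the line more pronounced.
row = np.array([1, 1, 0.6, 0.2, 0.6, 1, 1])
luma = np.tile(row, (5, 1))
(out,) = push_pixels(luma, [luma])
# each row of out is now [1, 0.6, 0.2, 0.2, 0.2, 0.6, 1]
```

Applied to all three colour channels with the luma-derived indices, this reproduces the "identical filter on each channel" property described above.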

It seems similar but not identical to the method described in this paper: T.
A. Mahmoud and S. Marshall, "Edge-Detected Guided Morphological Filter for
Image Sharpening", 2008.
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.384...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.384.8621&rep=rep1&type=pdf)

In any case, great-looking results! Proof that neural networks have not yet
made thinking obsolete.

------
webdva
> [...] the proposed method [...] is tailored to content that puts importance
> on well defined lines/edges while tolerating a sacrifice of the finer
> textures.

and

> [...] a big weakness of our algorithm [...] is texture detail, however since
> upscaling art was not our main goal, our results are acceptable.

That sounds like a multiobjective optimization problem. If this multiobjective
optimization problem were solved (permitting the nature or structure of the
multiobjective optimization problem, of course), then the algorithm would be
improved, don't you agree?

Did the authors of this algorithm not have the capability to formulate or
recognize the multiobjective optimization problem?

Or did they have the formulation capabilities, but not the capability to solve
the multiobjective optimization problem? If so, why? Too difficult? Not enough
time? Limited by a resource? Or no intention to have done so, given that they
said a specific trade-off was acceptable?

You're welcome to share your speculation or opinion, Hacker News reader.

I'm curious to know your thoughts, is all.

~~~
nestorD
I believe they recognise that the problem is a multiobjective optimization
problem (hence the formulation of their sentence), but their algorithm is not
parametrizable: it is a single point on the Pareto front, and you would need
other algorithms to explore the rest of the front.

------
Causality1
I'm not super clear on why speed was a primary goal if the intended
application is upscaling anime. If this were intended for, say, sharpening the
graphical output from a game console, sure, but why does premade video content
like anime need upscaling that only takes 3ms instead of 6ms or even 60ms?

~~~
gwern
So you can simply have it as an option in a media player (as they indeed have
done with theirs), instead of requiring a cumbersome preprocessing pass, which
would additionally produce a much larger file.

------
spython
I have a small resolution video of a (static) scene, and a high resolution
photograph of the same scene. Does anyone know of an upscaling algorithm that
takes an image as auxiliary input?

Maybe some style-transfer related algorithm could be useful in this situation?

------
hnaccy
The examples seem to focus on characters.

Wonder how it works on more "fancy" looking anime like

[https://www.youtube.com/watch?v=eVGbgBy_yo4](https://www.youtube.com/watch?v=eVGbgBy_yo4)

------
ArlenBales
Any video examples? If you want a good subject, take the final fight scene
from the 1080p latest episode of Kimetsu no Yaiba (Ep 19) and upscale to 4K.

Also does this run on Linux or Mac? Haven't had a Windows machine in years.

~~~
throwaway8941
If I understand it correctly, the whole project is one shader file. Sure it's
portable: just grab the GLSL file from the repository and plug it into your
favorite video player.

Edit: uh-huh.

[https://github.com/bloc97/Anime4K/blob/master/GLSL_Instructi...](https://github.com/bloc97/Anime4K/blob/master/GLSL_Instructions.md)
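For example, with mpv (the shader path here is illustrative; see the repository's instructions for the exact files and recommended order):

```shell
# Apply the shader for one playback session:
mpv --glsl-shaders="~/shaders/Anime4K.glsl" video.mkv

# Or bind a key in input.conf to toggle it at runtime:
# CTRL+1 change-list glsl-shaders toggle "~/shaders/Anime4K.glsl"
```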

------
k_sze
Can somebody explain why it would matter that the ground truth be at exactly
2160p resolution?

How about using the same algorithm to upscale 540p to 1080p, and compare with
1080p ground truth? Would that not be sufficient?

~~~
janekm
It's explained in some detail in the article, but in essence, imagine a fine
pen line which in 540p would be less than one pixel wide but in 2160p would be
multiple pixels wide. The problem solved by the Anime4K algorithm is
essentially producing sharp edges for that line when upscaling to 4K, which is
a different problem from upscaling a <1 pixel antialiased line.
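A quick numeric illustration of that point (sizes are arbitrary): a one-pixel line at high resolution survives a 4x downscale only as a faint sub-pixel smear, so the upscaler has to reconstruct it rather than merely sharpen it.

```python
import numpy as np

def box_downscale(img, f):
    """Average non-overlapping f x f blocks (a simple box filter)."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

# A 1-pixel black line on white, standing in for a fine pen line at "2160p".
hi = np.ones((16, 16))
hi[:, 8] = 0.0

# At quarter resolution the line is no longer black, just a slightly grey
# column: one dark pixel averaged with three white ones (0.75, not 0).
lo = box_downscale(hi, 4)
print(lo[0])
```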

------
ttoinou
Interesting. Curious about applying this to normal real world images

~~~
mjevans
It's very likely that the results won't be as desired. Anime is nearly always
a synthetic image which is intended to be clean and geometrically based (even
if there are gradients and more real world additions; it's a synthetic).

The application domain for this includes any other sort of abstract logical
synthesis, charts and maybe videogames (even ones that look realistic).

Real world content also has sharp boundaries between objects, and whatever
part happens to do that work might be shared, but within objects fuzzier is
probably better. IIRC someone was making an AI assisted upscaling of DS9 which
would probably be closer to a generic algorithm for 'filmed' content.

------
Animats
Nice. Can they interpolate frames, too, so that old 5fps anime can get an
upgrade?

~~~
crazygringo
Going from something like 30fps to 60fps, interpolation works decently well in
many cases, because there's already so much information encoded in the 30fps.
And _some_ 15fps can work too.

But with 5fps, each frame can be so radically different that I think
interpolation is generally just not possible. You can generate _something_
smooth, but it will be _so_ far away from whatever an animator would actually
have inserted that it will seem more strange/surreal than natural, and thus
achieve the opposite of the intended effect.
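A toy illustration of why naive (non-motion-compensated) interpolation falls apart on big jumps; the frames are made up:

```python
import numpy as np

# Two frames of a square that jumps 6 pixels between frames, as it might in
# low-framerate animation.
a = np.zeros((10, 10))
a[4:6, 1:3] = 1.0
b = np.zeros((10, 10))
b[4:6, 7:9] = 1.0

# Naive interpolation just cross-fades the frames: the "in-between" frame is
# two ghost squares at half intensity, not one square halfway along its path.
mid = 0.5 * (a + b)
print(mid[4, 1], mid[4, 4], mid[4, 7])  # 0.5 0.0 0.5
```

Motion-compensated interpolation instead searches for correspondences between frames; at 5fps, with poses changing wholesale, there is often no correspondence to find.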

E.g. see [1] which shows animation at 15/30/60fps... you can see that even
with the 15, it's hard to imagine an algorithm that would port well to 60.
(Use the period on your keyboard to advance frame-by-frame.)

[1]
[https://www.youtube.com/watch?v=npMreLeVD6o](https://www.youtube.com/watch?v=npMreLeVD6o)

~~~
JonathanFly
I did some tests on this, trying to find footage I thought would be least well
suited to interpolation:

[https://twitter.com/jonathanfly/status/1156343739738152961](https://twitter.com/jonathanfly/status/1156343739738152961)

[https://www.youtube.com/watch?v=FRHmGEEemoY](https://www.youtube.com/watch?v=FRHmGEEemoY)

------
nitrogen
Are algorithms like this ever used by cartoon-style video games to improve
apparent rendering resolution?

------
somishere
Would there be a web use-case here (i.e. converting the shaders to WebGL) for
e.g. upscaling map tiles?

------
AstralStorm
Is this someone reinventing the xBR series of pixel-art scaling filters?

------
pragmatick
Does anybody know how to use these shaders with PotPlayer?

------
xienze
In the example they’re upscaling 1080p content to 4K. Am I missing something
or is that not particularly impressive? Isn’t it just pixel doubling?

~~~
penagwin
No, if they did that it would look "pixelated".

They seem to have built an edge-optimized image upscaler. It prevents the
edges from becoming soft during the upsampling.

You can clearly see the difference in their comparison pictures (of which they
have a metric ton).

~~~
KaoruAoiShiho
I'm pretty sure this is what Nvidia's DLSS does. Only this works much better
than DLSS, I think.

~~~
SeanBoocock
Per the name (Deep Learning Super Sampling), DLSS uses a trained neural
network to achieve high-quality upsampling. The neural network is trained on
representative output of the game at the internal framebuffer resolution and
at the target output resolution (with SSAA and such).

The upsampling algorithm in the OP is not based on machine learning but is
also fairly domain specific and of limited general applicability.

