
Using Waifu2x to Upscale Japanese Prints - mxfh
http://ejohn.org/blog/using-waifu2x-to-upscale-japanese-prints/
======
ComputerGuru
The "cleanliness" of the resulting images is undeniable, but once you get past
the sheer awe at how crisp and clear the upscaled image is, you'll immediately
notice the loss of detail. It completely does away with any and all texturing,
which is especially noticeable in the last image ([1] vs [2]) - look at the
scales and patterned lines on the snake (?) around his neck and the white
strands in his hair, and of course, the letters have been turned into
(unrecognizable?) squiggles.

Still, in terms of pure shock and awe - they're jaw-droppingly nice for
upscaled versions, to the point where if you didn't have the original, it
wouldn't occur to you that this wasn't it.

1:
[http://ukiyo-e.org/image/mfa/sc165440](http://ukiyo-e.org/image/mfa/sc165440)

2: [http://i.imgur.com/541uG5t.png](http://i.imgur.com/541uG5t.png)

~~~
0x09
To be fair, the demo site provides a configurable level of artifact reduction.
This article uses the highest level. Here it is with none and some:

[http://imgur.com/a/cVVnC](http://imgur.com/a/cVVnC)

~~~
jeresig
Yeah, on second thought, after seeing the low noise reduction result again, I
suspect that may be even better for what I'm looking to achieve. Many of the
details in his rope are preserved and the calligraphy appears to be in better
shape.

------
Daiz
Waifu2x is not actually the first or only image scaler to use neural networks
- NNEDI3[1], an Avisynth[2] filter used for deinterlacing, can also do really
nice image upscaling (and it's a _lot_ faster than waifu2x). Here's an example
of what it can do to the images in the blogpost:

Image 1: [http://i.imgur.com/4cXr51v.png](http://i.imgur.com/4cXr51v.png)

Image 2: [http://i.imgur.com/PZAXeM8.png](http://i.imgur.com/PZAXeM8.png)

It doesn't come with any noise reduction, but nothing stops you from doing
that separately from the upscaling process itself, and that way you should be
able to control it better anyway (I find the reduction options provided by
waifu2x really aggressive even at the low setting; they just kill tons of
detail).

As a sidenote, when talking about something like image scaling, it would be a
good idea to avoid saying something like _"image scaled 2x (normally)"_ as
there are lots of ways to scale images and what's "normal" can vary a lot
depending on what you're using.

[1] [http://bengal.missouri.edu/~kes25c/](http://bengal.missouri.edu/~kes25c/)

[2] [http://avisynth.org](http://avisynth.org)

~~~
jeresig
NNEDI3 is fantastic - thank you for providing a link and some samples!

You're absolutely right that I shouldn't have said "normal". I updated the post
to clarify that this was using "OSX Preview". I did some hunting but didn't
find any obvious pointers as to which algorithm they're using. If anyone knows
offhand I'll be happy to include it!

~~~
jonah
Talk with the imgix.com folks about the CoreImage stuff. They're using the
built-in re-sampling in their product.

Also chat with @deepbluecea who's done a lot of image processing stuff,
including for Apple.

~~~
TD-Linux
I looked at what imgix was using a few weeks ago on HN. The resampling they do
is really poor. You can do much better with imagemagick.

[https://news.ycombinator.com/item?id=9501601](https://news.ycombinator.com/item?id=9501601)

~~~
jonah
Yeah, why they're going through all that hardware effort, I dunno. Simpler
developer workflow I guess. Would be interesting to do a cost/benefit vs. just
using a Linux stack.

~~~
nacs
The 'hardware effort' is to get dramatically improved processing time by using
the GPU since they're trying to do it on a much larger scale.

I have used, and continue to use, ImageMagick and similar software-based
solutions, and they're pretty slow for multi-MB images (but most servers don't
have good GPUs, so it's the only option unless you're building custom racks as
imgix does).

~~~
TD-Linux
Yeah, I'm not super sure about the dramatically improved processing time.
Especially compared to a SIMD-optimized scaler. You have to spend some time
sending the image to the GPU and reading it back too.

Especially if you set imagemagick to use the much worse scaler that imgix
uses, I imagine it'd be pretty fast.

On the other hand, if you replaced imgix's stack with the high quality scalers
from mpv (written as OpenGL pixel shaders), and then compared to expensive CPU
scalers, I would expect a GPU solution to be a win.

Note that imgix also has to recompress the image as PNG or JPEG at the end.
This has to be done on the CPU and is probably more resource intensive than
any of the scaling.

~~~
nacs
You can upload 100s of MBs of texture data to a GPU in milliseconds. Sending
and receiving from GPU doesn't actually take that long in comparison to the
time it takes to process a multi-MB file in software.

------
the8472
The NN was explicitly trained on artifact-free PNG sources of anime fanart,
which it handles quite well according to my own testing.[1]

Its benefits are questionable if used on anything else.

I've also tested it on anime screenshots, and in that case it's pretty much on
par with NNEDI3 (which is computationally much cheaper), because real-world
encodes actually have compression artifacts: those get scaled up too if you
disable noise reduction, or everything is smoothed out too much if you leave
it on.

So if you want to use it on anything else, you really do have to retrain the
NN first; otherwise you get results you could also achieve by other means
(e.g. warpsharp, NNEDI or Photoshop Topaz).

Also, waifu2x only scales luma. Its chroma handling is just regular upscaling
(whatever ImageMagick uses by default, I think), so even that part could be
improved.

[1]
[http://forum.doom9.org/showpost.php?p=1722990&postcount=3](http://forum.doom9.org/showpost.php?p=1722990&postcount=3)
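
The luma/chroma split described above can be sketched in a few lines with
Pillow. `nn_upscale` is a hypothetical placeholder for the neural-net luma
pass (here just Pillow's default bicubic resize, so the snippet runs on its
own):

```python
from PIL import Image

def nn_upscale(plane, factor):
    # Hypothetical stand-in for the waifu2x/NNEDI3 luma scaler;
    # Pillow's resize() defaults to bicubic resampling.
    return plane.resize((plane.width * factor, plane.height * factor))

def upscale_luma_only(img, factor=2):
    # Split the image into luma (Y) and chroma (Cb, Cr) planes.
    y, cb, cr = img.convert("YCbCr").split()
    size = (img.width * factor, img.height * factor)
    # The expensive scaler runs on Y only; chroma gets ordinary
    # interpolation, which is what waifu2x reportedly does.
    y2 = nn_upscale(y, factor)
    cb2, cr2 = cb.resize(size), cr.resize(size)
    return Image.merge("YCbCr", (y2, cb2, cr2)).convert("RGB")
```

Swapping a better chroma scaler into the second step is exactly the
improvement the comment suggests.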

------
gburt
Related, this really cool project:

[http://research.microsoft.com/en-us/um/people/kopf/pixelart/](http://research.microsoft.com/en-us/um/people/kopf/pixelart/)

------
yellowapple
This looks like it could be applied to a real-life "ENHANCE" button. By
training similar algorithms with photographs instead of anime prints, would
this be a feasible means of approximating detail from enlarged photographs,
CSI-style (not quite to the extreme one sees on TV, but perhaps enough for a
police sketch or something)?

~~~
kibwen
Something to keep in mind is that when upscaling, you are actually inventing
(fabricating) detail. Tools like the one presented here are content to invent
detail that looks pleasing to the eye, but if you tried to do something like
this for photographs you wouldn't get anything that would hold up as evidence.
You also wouldn't want to use this to guide a police sketch, because the
"enhanced" image actually contains _false_ information compared to the
original.

~~~
woodman
This upscaling can be considered a form of lossy compression. The pixelated
images, while ugly, contain more information - the process cannot be reversed
due to this loss of information.

> You also wouldn't want to use this to guide a police sketch...

You would, could and can. Take a look at police sketch software; it is a very
manual version of this process. There is a very limited number of potential
variations of the human face; that is why eigenfaces work for facial
recognition. Consider the scenario where you have a photo of a human face,
where one half is occluded. In manually reconstructing the image, you wouldn't
place the reconstructed eye an inch above the chin - because human faces don't
work that way. Neural networks pick up on that. There is software that tailors
use when fitting suits, where a few measurements (like weight and arm length)
can be used to extrapolate the rest of the measurements (like chest size and
torso length). This works because of the limited number of potential human
dimensions.

As far as use of these techniques for evidence... I'd actually prefer it to
reliance on eyewitness accounts, as the algos are open to exact measurement -
unlike most of the other stuff that passes for "criminal science" (humans are
still in the loop for fingerprint analysis, wtf?).

~~~
kibwen
Yes, I'm not trying to say that software is useless for assisting in
approximating detail for facial recognition, but software like this, where in
goes a single image and out pops a single "clean and enhanced" image, with no
manual guidance in between, sounds like it would be fantastically misleading.
Somehow you have to express to the decision-makers (investigators,
prosecutors, jury) that there is error and guesswork involved in this process,
lest you end up with techno-magic like polygraph tests that are popularly
understood to produce evidence that they really don't.

~~~
woodman
The "manual guidance in between" is where CSI is so incredibly screwed up
though (toolmark and fingerprint examiners, polygraph operators, etc). The
only criminal science that is actually reliable has cut humans out of the loop
(dna, forensic document analysis, computer forensics). Even with the reliable
methods, they are still probabilistic, which is exactly how the software we
are discussing would work. As far as misleading decision-makers, well that is
a more fundamental problem with the justice system... we really need to cut as
much human judgement out of the process as possible. I'm looking forward to
the day when speech recognition and language parsing are solved problems,
because formal logic will fix this situation pretty quickly.

~~~
kibwen
It's great to cut out humans from the loop where we can, but we cannot do so
here. As you say, upscaling is lossy (de)compression, and no amount of math is
going to reveal information that fundamentally does not exist in a source
image. Furthermore, neural networks are trivially fooled:
[http://news.cornell.edu/stories/2015/03/images-fool-computer...](http://news.cornell.edu/stories/2015/03/images-fool-computer-vision-raise-security-concerns). I'd actually trust a trained neural network
far less than a human, just like I'd trust the upscaling technique in this
article far less than a human artist. Speed and automation are their
advantages compared to trained humans, not quality.

~~~
woodman
> As you say, upscaling is lossy (de)compression

As are eyewitness accounts, which have been demonstrated to be pretty
useless.

As are fingerprints, a tiny sliver of (maybe?[0]) uniquely identifiable
information.

As are autopsies, where the state of the corpse is maintained only in whatever
the examiner writes down, x-rays, or snaps a polaroid of.

As are bite marks...

So you've got all that, plus your lawyer's sweaty appeals to emotion in a
group of 12 people - of whom four will express a belief in haunted houses and
two will claim to have actually seen a ghost [1]. You'd prefer that over an
application of math that can be challenged and rationally discussed?

> Furthermore, neural networks are trivially fooled...

_A_ neural network was fooled with the equivalent of a hash collision; one
guess as to how to fix that :)

> I'd actually trust a trained neural network far less than a human...

I can't think of a single person I'd trust over math, once maybe Bill Cosby -
but not anymore.

> Speed and automation are their advantages compared to trained humans, not
> quality.

Well in this context I'd say that impartiality and repeatability are pretty
important, which are characteristics more likely to describe a math model than
an individual whose qualifications amount to a mailing address - and all the
training that can be packed into a 20-minute VHS about civic duty played on a
wheeled TV.

[0]
[http://www.academia.edu/447251/The_Current_Position_of_Finge...](http://www.academia.edu/447251/The_Current_Position_of_Fingerprint_Evidence_A_Literature_Review)

[1] [http://www.pewresearch.org/fact-tank/2013/10/30/18-of-americ...](http://www.pewresearch.org/fact-tank/2013/10/30/18-of-americans-say-theyve-seen-a-ghost/)

------
xatnys
Interesting! The effect looks quite similar to warpsharp
([http://avisynth.nl/index.php/WarpSharp](http://avisynth.nl/index.php/WarpSharp)),
a sharpening filter that enjoyed a certain vogue among anime encoders back
when video sources were not as crisp as they are today. There's quite a lot of
detail loss in Resig's ukiyo-e example, but I imagine for most people the most
striking part of it will be how much smoother the result appears.

------
deepnet
Great use case, upscaling print thumbnails.

Norman Tasfi made a neural net upscaler for Flipboard:
[http://engineering.flipboard.com/2015/05/scaling-convnets/](http://engineering.flipboard.com/2015/05/scaling-convnets/)

I expect video upscaling next.

~~~
the8472
> I expect video upscaling next.

There is a DirectShow filter (madVR[1]) for Windows that already offers a
neural-network scaler (NNEDI3, a simpler network than waifu2x) in realtime.

[1]
[http://forum.doom9.org/showthread.php?t=146228](http://forum.doom9.org/showthread.php?t=146228)

~~~
ma2rten
Interesting... If a neural network can be used for upscaling video, it means
you need to send less data over the wire to get the same quality. This means
neural networks can be used as a compression algorithm.

~~~
nrmn
From the Flipboard article:

"The final use case that we thought of was saving bandwidth. A smaller image
could be sent to the client which would run a client side version of this
model to gain a larger image."

This could be applied to GIFs and videos too, but it really depends on the
use case and whether the client would tolerate such a thing.
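
Ignoring compression effects, the raw saving is simple arithmetic: sending
the frame at 1/n resolution and upscaling client-side transmits roughly 1/n²
of the pixels (actual compressed sizes won't scale exactly linearly with
pixel count):

```python
def transfer_fraction(width, height, factor=2):
    # Fraction of pixels transmitted when the frame is sent at
    # 1/factor resolution and upscaled client-side.
    sent = (width // factor) * (height // factor)
    return sent / (width * height)

# A 1920x1080 frame sent as 960x540 moves a quarter of the pixels.
```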

------
CarVac
It performs better on larger features.

Anime is almost never drawn with finer detail than the output resolution, so
artifacts are not a problem. This is a low-resolution scan of something with
very fine detail, which the network is not trained on.

------
mahouse
He forgot to comment on how the filter destroyed the letters.

~~~
jeresig
I'm not sure I'd go so far as to say "destroyed". Compare the text in this
cartouche:
[https://imgur.com/7fGJg4s,iWf4pXG](https://imgur.com/7fGJg4s,iWf4pXG)

At worst it seems comparable to the previous result. At least to my eyes.

~~~
mahouse
For example, in the 4th character, two lines were converted into a stain.

~~~
jeresig
Great point. FWIW, I've updated the post to include some of the cartouches,
along with a cartouche at the "low noise reduction" level. The two lines in
the fourth character appear to still be relatively distinct in this case.

~~~
mahouse
That being said, I don't mean in any way to disrespect the work, because
it's clearly impressive.

------
yohui
I'd just like to express my appreciation for Waifu2x's informative name. More
projects could do with such evocative labels.

~~~
pjc50
My neural network sarcasm detector is confused by this post. I was going to
complain about it being the same kind of dim unintentional sexism as the
original choice of Lena as reference image.

~~~
yohui
It was a backhanded compliment. The name is a bit... unprofessional.

On the other hand, it does help you grok its function, and I suspect the
'memorable' name is at least partially responsible for its popularity.

------
Joona
I did a quick comparison between ImageMagick and Waifu2x using a common
anime-style image: [http://imgur.com/a/teKVY](http://imgur.com/a/teKVY)

------
Gravityloss
The scans look like they have JPEG artifacts?

If you really are working with the original source, you should rescan to PNG
or TIFF, or even just a higher-rate JPEG.

------
georgehm
For comparison's sake, can someone share the time taken to upscale some of
these images?

