

Butchering HQX pixel art scaling filters - ux
http://blog.pkh.me/p/19-butchering-hqx-scaling-filters.html

======
byuu
> Unfortunately, the reference code is more than 12.000 lines of unrolled C
> code. This code is basically duplicated without a thought in every single
> project.

Not by me. Overly bulky code is one of my pet peeves[1]. Here is my
implementation of HQ2x in 6KB, 188 lines of code (not including the header
file, which is 1KB and used for external linking).

[http://pastebin.com/raw.php?i=DsNupdbc](http://pastebin.com/raw.php?i=DsNupdbc)

I made a few observations to reduce the code. First, the 256 patterns have
four cases for each corner. If you rotate the eight surrounding pixels, the
blending rules for one corner match those of another corner (save for two
cases, which appear to be typos in the original algorithm; I asked the author,
and he apparently made the entire unrolled table by hand). With the algorithm
cut down by 75%, the switch statement can be turned into a lookup table.
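A minimal sketch of that rotation, assuming the eight neighbours are packed
into one byte in row-major order (a layout I'm choosing for illustration; the
reference code numbers them differently):

```c
#include <stdint.h>

/* Neighbour bit layout assumed for this sketch (row-major, centre omitted):
 *   0 1 2
 *   3 . 4
 *   5 6 7
 */
static uint8_t rotate_cw(uint8_t p)
{
    /* src[i] = which old bit lands in new bit i after rotating the
     * 3x3 neighbourhood 90 degrees clockwise */
    static const uint8_t src[8] = {5, 3, 0, 6, 1, 7, 4, 2};
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        r |= (uint8_t)(((p >> src[i]) & 1) << i);
    return r;
}
```

Rotating any pattern four times must give back the original, which makes the
symmetry easy to verify mechanically; only one corner's rules then need to be
stored, and the other three corners look up the same table on a rotated
pattern.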

To speed up the algorithm (which further reduces its size), I use an old
16<>32-bit expand/compact trick, a YUV lookup table, and a really clever mask
compare that blargg came up with. At this point, my version is already
significantly faster than the original algorithm. But I also added OpenMP
parallelization support, which really makes things run fast on multicore
systems.
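The 16<>32-bit expand/compact trick is widely used; here is a sketch for
RGB565 pixels blended at a 3:1 weight (the weight and field masks are my
illustration, not necessarily the ones his code uses):

```c
#include <stdint.h>

/* Spread an RGB565 pixel so the three fields have gaps between them:
 * red and blue stay in the low half, green moves to the high half.
 * The gaps let all three channels be weighted with a single multiply
 * without one field overflowing into the next. */
static uint32_t expand565(uint16_t c)
{
    return (c | (uint32_t)c << 16) & 0x07E0F81Fu;
}

static uint16_t compact565(uint32_t c)
{
    c &= 0x07E0F81Fu;
    return (uint16_t)(c | c >> 16);
}

/* Blend a and b with 3:1 weighting, all channels at once. */
static uint16_t blend_3_1(uint16_t a, uint16_t b)
{
    return compact565((expand565(a) * 3 + expand565(b)) >> 2);
}
```

With the fields spread out, one multiply-add and one shift replace three
per-channel unpack/weight/repack sequences, which is where most of the speedup
in the inner blending loop comes from.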

But anyway ... this guy clearly bested me. 560 lines with HQ3x and HQ4x
included is even more impressive. Hats off to him!

([1] I also have inflate in 8KB of code, unzip in (inflate+2KB) of code, PNG
decompression in (inflate+8KB) of code, SHA256 hashing in 4KB of code, an SMTP
client with authentication+MIME attachments+HTML in 8KB of code, an XML parser
(sans DTDs) in 6KB of code, etc.)

~~~
ux
Oh that's awesome. May I ask what method you used to figure out the algorithm?

> I asked the author, he apparently made the entire unrolled table by hand.

what the hell...

~~~
byuu
It was a mostly manual process. I was trying to understand why four rules (one
for each output pixel) were needed inside every case statement. So I wrote a
parser for the original source file that turned each output pixel rule into an
integer representing what the code for it was actually doing.

From there I noticed the strong symmetry when you rotate the pixels, which of
course makes perfect sense; I was just able to confirm it here.

So I tried to reproduce the same tables using a LUT and rotate table, and
matched all but two instances. This seemed really weird to me, since I figured
this would have to be generated unrolled code, right? So I asked Maxim about
it, and he said that they were probably mistakes. I was pretty shocked myself
that he did this by hand.

I posted a few blind comparison screenshots on my message board to see if
anyone could tell the difference (in practice it amounted to one or two pixel
differences in each 256x224 upscaled image), and everyone actually favored
mine, so I just went ahead and ignored those two differences.

I'm curious if the blog author noticed the same thing, or if he noticed the
rotation quicker and only built his tables off of the top-left output rules.

Ultimately it looks like the blog author found a better route to reducing the
comparison rules than I did. But I still don't think we've 100% cracked how
and why the original author chose the rules he did. And since it sounds like
it was done almost by sight, we probably never will.

Still, I'd really like to see if someone could find a way to generate the
comparison rules mathematically. It's very likely that if we had some sort of
scientific basis for when and why to blend each case, we could improve the
visual appearance beyond that of the original algorithm. Obviously perfection
doesn't exist. I'd expect the end result to be a function that takes tuning
variables, and different input images would look better with different tuning
values. But even for the default case, I bet we could do better.
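One sketch of what such a tunable blend decision might look like: a
per-component YUV threshold test, with the thresholds exposed as the tuning
variables described above. The conversion coefficients and default thresholds
here are illustrative choices of mine, not the reference algorithm's
constants:

```c
#include <stdint.h>
#include <stdlib.h>

/* Tuning variables: how far apart two pixels may be in each YUV
 * component before they count as "different" and trigger blending.
 * Defaults are illustrative, not hqx's actual constants. */
static int trY = 48, trU = 7, trV = 6;

/* Convert 0xRRGGBB to rough integer YUV (BT.601-ish luma weights,
 * simplified chroma). */
static void rgb_to_yuv(uint32_t rgb, int *y, int *u, int *v)
{
    int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
    *y = (299 * r + 587 * g + 114 * b) / 1000;
    *u = 128 + (b - *y) / 2;
    *v = 128 + (r - *y) / 2;
}

/* Two pixels are "different" if any component exceeds its threshold;
 * the interpolation rules would fire only on "different" neighbours. */
static int pixels_differ(uint32_t a, uint32_t b)
{
    int ya, ua, va, yb, ub, vb;
    rgb_to_yuv(a, &ya, &ua, &va);
    rgb_to_yuv(b, &yb, &ub, &vb);
    return abs(ya - yb) > trY || abs(ua - ub) > trU || abs(va - vb) > trV;
}
```

A per-image tuner would then sweep trY/trU/trV rather than regenerate the
pattern table, which is what makes the threshold formulation attractive as a
starting point.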

~~~
ux
Well, I just assumed it was symmetrical because I couldn't imagine it being
unrolled manually (I just quickly looked at the visual representations of each
side)... But now that you've pointed this out, I guess I'm going to check for
differences in all the filters. Do you remember in which corner the typos
were?

And yeah, we still haven't really cracked the algorithm. I initially wanted to
write the article as "understanding hqx", but I actually realized I still
don't really get the rules.

Could you share the board post where you discuss your implementation?

~~~
byuu
Please understand that I wrote this code around late 2005, early 2006.

Unfortunately the discussion thread where I talked about this initially (on
the ZSNES forums) has been deleted, or is for some reason not showing up in
Google search results.

The best info I could find is here:
[http://forums.nesdev.com/viewtopic.php?p=82794#p82794](http://forums.nesdev.com/viewtopic.php?p=82794#p82794)

But you can see how hazy my memory is. There I said there seemed to be six
differences, but I was already unsure of the exact number.

I am unable to find any posts where I mentioned what the exact errors were,
nor where I spoke with Maxim about them. But if you really want to find out,
it should only take a few minutes to write a parser for hq2x.c, generate the
tables using my rotation approach, and compare the two. If you do so, please
post which rules are different so I can file the info away somewhere.

Although I will say it's not too terribly important. The difference is
extremely minimal.

------
madlag
Nice stuff! I am wondering how we could use this to improve some of our own
work ...

