Hacker News new | past | comments | ask | show | jobs | submit login
Recursive image rotation (2020) [video] (youtube.com)
59 points by tosh 22 days ago | hide | past | favorite | 51 comments

Used in the 1992 BlitSpin screen saver, part of the XScreenSaver collection:


EDIT: But I can’t find any acknowledgement in either the “Recursive Image Rotation” YouTube video, the original tweet, or the source code.

EDIT2: In a Reddit thread¹, they acknowledge a PDF of a book from 2019². The book itself gives no acknowledgements for the algorithm.

1. https://www.reddit.com/r/compsci/comments/izy2kf/rotating_an...

2. https://jeffe.cs.illinois.edu/teaching/algorithms/book/01-re...

IIRC it was in Smalltalk-80: the language and its implementation from the 1980s. Or perhaps one of the other two books in that series.

Yup, there it is. (Pages 430-432 of that PDF file, figures 20.8 through 20.10) So I guess that 1983 is the earliest year found so far.

We can do almost a decade better :-)

Pages 26 through 28, "The Evolution of Smalltalk: From Smalltalk-72 through Squeak"

Proc. ACM Program. Lang., Vol. 4, No. HOPL, Article 85. Publication date: June 2020.


A decade? That reference gives a date of 1981 for the bitmap-rotating algorithm:

> The Byte article on “The Smalltalk Graphics Kernel” [Ingalls 1981] details Smalltalk methods that, with BitBlt at their core, can lay out and display text, draw lines, and magnify and rotate images—each in little more than a dozen lines of code.


> Daniel H. H. Ingalls. 1981. The Smalltalk Graphics Kernel. Byte 6, 8, 168–194. https://archive.org/details/byte-magazine-1981-08/page/n181/...

In that reference, the bitmap rotation is on page 188 in the original publication (page 202 of the archived document).

That's because the Byte issue was the "coming out" for Smalltalk -- a description of their work but not necessarily new work. Bitblt was first implemented in Smalltalk-74 I believe (meaning ~1974)

Do you have a reference to show that image rotation using BitBlt was written any earlier than 1981?

page 28 '… when I gave a demo at one of the CSL “Dealer” meetings at Parc that featured overlapping windows with BitBlt in Smalltalk-74.'

They only write that BitBlt was demonstrated, not image rotation with BitBlt. So as far as we know currently, the 1981 article was the first mention of image rotation.

There is an algorithm to rotate an image by any angle (not just multiple of 90), and without adding, duplicating or deleting pixels, using three shear operations (which only move rows/columns along one orthogonal axis).


If you rotate a straight line by 45 degrees, there will be deleted pixels. A horizontal line of 100 pixels will consist of ~70 pixels (100 * sqrt(2) / 0.5) after such rotation.

There will always be 100 pixels, however, they will no longer be arranged in a perfectly straight and consistent line any more.

I don’t think so. As I understand the algorithm, no pixels are ever deleted. The 100 pixels would still be there, but probably a bit mushed together to fit on a line 70 pixels in length.

> The 100 pixels would still be there, but probably a bit mushed together to fit on a line 70 pixels in length.

This seems incorrect. It sounds like you are saying that you can store any 100 random values into 70 values. Or 100 bits of information in to 70 bits.

If you could, what would stop you from then storing that 70 bits in 49 bits, then that 49 in ~34 bits, etc and therefor have infinite information storage?

It’s certainly not incorrect, these are not infinitely thin lines. The original line is 100 pixels long and 1 pixel wide. A diagonal stair-stepped line of 70 pixels is an average of about 0.7 pixels wide, which is why you’re deleting about 30%. You need to mush the additional pixels in somewhere between the stair steps to make it an average of 1 pixel wide again.

You're assuming the rotated line maintains the same pixelwise thickness. A straight 1 pixel line edge-to-edge may not translate to a diagonal 1 pixel line corner-to-corner. I think that's what the parent is saying. No data is being 'compressed' here; the shear operation simply preserves all pixels uniquely.

The drawback is that it does require 3 times interpolation. With neares-neighbour interpolation that is not obvious from the example, but it's there.

Each interpolation step causes loss of resolution and the multiple passes might in practice make it slower due to increased memory bandwidth requirements compared to a single pass algorithm.

That of course doesn't take away that it's an interesting trick to know about.

But nearest neighbors interpolation would be quick on a gpu, right?

Here's an some source code for bitmap rotation from the mid 90's:


It's Turbo Pascal with inline ASM.

This is similar to how to swap two numbers without using temporary variable:

    X = 31;
    Y = 25;
    console.log(X, Y); // prints 31 25
    X = X + Y;
    Y = X - Y;
    X = X - Y;
    console.log(X, Y); // prints 25 31

Only if you assume that X+Y never exceeds 9007199254740991 (Number.MAX_SAFE_INTEGER, 2⁵³-1).

Also, that algorithm has nothing in common with the image rotation one; the image rotation algorithm does require a temporary buffer.

This is why I like python:

    X = 31
    Y = 25
    X, Y = Y, X

This still creates a temporary variable. Sorry.

Where? Variables are a language semantic not an implementation detail. A temporary variable is just a new symbolic name. You may be thinking of temporary storage

Semantics. You still have to use additional resources for the swap.

BTW, in Python, "a,b = b,a" is actually faster (execution time) than explicitly using a third variable.

In other words, I am not saying it is a bad technique, just saying it has to use additional resources for the swap, that's all.

In fact, the Python idiom is faster even if you doing something like:

    a,b = b[:],a[:]

> Semantics

I would claim it's not semantics. Variables are meant for humans, not computers. Variable does not necessarily mean storage. If you're outside of interpreted languages, the code you type is very often not the code that is executed. If this were a compiled language, there's no way to know if temporary storage was used or not, without looking at the generated machine code. There's a very real chance that something as useless as swapping variables would be optimized away completely.

This is transforming the address of each pixel one bit (of both x and y coordinate) at a time, from most to least significant.

Grossly inefficient, yet grossly interesting.

I see what it’s doing recursively (splitting into quadrants and moving clockwise) . But I’m a bit startled as to why it rotates, but I’m thinking when it gets to one pixel is key but it still seems like a lateral shift…

If you have

with A, B, C, D square blocks of 2^n×2^n pixels, it correctly moves the pixels in A to where those in B are, those in B to where those in D are, etc.

It doesn’t rotate A, B, C, and D, though. So, we have to rotate the four smaller blocks A, B, C, D.

Let’s assume the algorithm works. Then, we can recurse into each of the smaller blocks.

Eventually, you hit the case where A, B, C, and D are 1×1 pixels (and the image 2×2 pixels). Here, you cannot recurse anymore, but luckily, you notice that you can’t see the difference between a single pixel and that pixel rotated by 90 degrees. So, for sunblocks that small, you don’t have to rotate the pixels. Because of that, the 2×2 case works. Because of that, the 4×4 case works, etc.

You could also say it doesn’t work. If, say, your pixels had a wood grain running horizontally, that grain would still run horizontally after this set of shifts.

Thanks. Helpful and clear. single pixels don't have orientation, I overlooked that.

This lateral shift on a 2x2 grid is equal to rotation. In each iteration tiles are shifted/rotated but its content is not. So the content is split into smaller tiles which are rotated but the content is not, so it is split and so on until you reach pixel level.

I guess this is a nice example of why multimedia people benefit tremendously from high parallelization.

This is more an example of a case where a recursive algorithm is very slow.

Is there any advantage to doing it this way? Partitioning?

I’m not making claims about production graphics algorithms, but this gets really interesting if you consider using quad trees to represent images rather than bitmaps.

The algorithm as presented turns the image into a quadtree as it works, for then re-flattens it. But if we store the image as a quadtree to begin with, we obtain other advantages:

1. We can memoize operations like rotation so that the algorithm goes much faster for images that have large blocks of a single colour or pattern;

2. We can write other operations that recursively apply to the image, like changing one color to another.


Original discussion:


I wouldn't do it exactly this way, but for a CPU implementation I would consider a blocked rotate that decomposes the image into square tiles of about [cache-line x cache-line] in size (e.g., 8x8 for RGBA8 data with a 32 byte cache line). Then cycle sets of four tiles and rotate within them in the process. E.g., cycle and rotate the four A tiles, cycle and rotate the four B tiles, etc:

    A B C D M I E A
    E F G H N J F B
    I J K L O K G C
    M N O P P L H D
    D H L P P O N M
    C G K O L K J I
    B F J N H G F E
    A E I M D C B A
With the right tile sizes and memory alignment you could parallelize these steps without false sharing.

It's easy to parallelize this, and at a certain recursion level the whole block will fit into cache, so this might avoid inefficient memory access compared to a naive for-loop that just moves every pixel to its rotated position.

Except you need to access each pixel log(n) times and it doesn’t obviously work for non-square images (or for non-right rotations).

Compare to the normal way you rotate things by right angles where e.g. new_image(x,y) = old_image(y, old_height - x), which is extremely embarrassingly parallel.

The actual reason is that it is fun.

> new_image(x,y) = old_image(y, old_height - x)

If you implement this as a view (rather than imperatively), you can get the compiler to optimize out four successive rotations to nothing :)


I assume that in an actual implementation you are not really processing every pixel log(n) times, but rather do something like new_image = rotate(old_image, 1), where 1 is the recursion level, and rotate(image, level) calls rotate(subset_of(image), level + 1) recursively until it reaches a single pixel, which is then returned as-is. Combined with return value optimization, that should avoid any temporaries being stored.

But yes, it's mostly fun :)

In that world doesn’t it just compile to the thing I wrote above (if you’re lucky)? In the GP you said it would be more cache-efficient or something but that’s just not true if it compiles to the same code is it?

No, the order in which things happen would be different, and this is what might make it more cache-efficient for rotating large images. It should look like this: https://en.wikipedia.org/wiki/Z-order_curve

I’m really confused by what you mean. If you’re not executing the line I wrote above (in parallel) then what are you doing? The things I could think you meant were:

- moving data in stages like the OP video which might be better for the cache (I don’t believe this) but would require writing to/from me more multiple times per pixel which I think would make it slower

- Accessing memory in a certain order when doing this on a cpu. Certainly that is a reasonable thing to expect to improve performance (roughly breaking the image into blocks that fit exactly into N cache lines, rotating in registers/memory, then writing into exactly N cache lines means you aren’t reading data from RAM/cache that you can’t use), but I don’t think that is related to the recursive process described in the article, unless you mean that you can use this process to get some operations like block(x,y,16) moves to block(s,t,16) and those could be executed cache-efficiently? But then this recursive method is more confusing and more restrictive on the image size than the thing I tried to describe at the start of this bullet point

Not really. It requires far too many memory accesses to be efficient.

No lookup tables

No rotate-by-ninety-degree algorithms use lookup tables.

does the pixel size of the image need to be a power of 2 for this to work without artifacts

yes, you cannot do this to a 3x3 grid of pixels for example.

Any right quadrangle of any dimensions can be rotated with this algorithm by padding it to become a square with 2^n pixels per side, rotating the square, and then removing the padding.

The degenerate case is when the right quadrangle is already a square with 2^n sides, in which case there is no padding to remove.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact