Why recursive data structures?

Const-me · on Jan 3, 2017

Apparently, every computer science problem has easy to understand solution that’s easy to implement but awfully slow in practice.

This is one of them.

Rotating images that way isn’t CPU bound problem. It’s RAM bandwidth and RAM latency bound problem. The optimal data structure for that is not recursive, it’s 2D array of small square blocks of pixels. If pixels are bits, I’d bet on either 8x8 or 16x16 blocks. If they are RGB, one good format for those images, optimized for rotations, is BC7 https://www.opengl.org/registry/specs/ARB/texture_compressio... the decompressor being implemented right in the graphics hardware.

> If we were writing an image manipulation application, we’d provide much snappier behaviour using coloured quadtrees to represent images on screen.

That statement was true 30 years ago, no longer a case. Branch misprediction is expensive, memory latency is huge, RAM is a block device now (the block size typically being 16 bytes for dual-channel DDR). In addition, some displays now show 8+ megapixels, and most show 16M colors.

For an image manipulation application, that recursive thing ain’t good at all.

inimino · on Jan 3, 2017

>> we’d provide much snappier behaviour using coloured quadtrees

> That statement was true 30 years ago, no longer a case.

The colored quadtrees are an optimization over the plain quadtrees earlier in the article, reducing the number of comparisons for some images. That will be more efficient 300 years from now just as much as it was 30 years ago. In context it is perfectly obvious that "much snappier" does not refer to the totally different techniques used by actual graphics software.

It's a toy implementation used as an example, if you get hung up on the example, you miss the whole point.

coldtea · on Jan 3, 2017

>It's a toy implementation used as an example, if you get hung up on the example, you miss the whole point.

When to use such techniques, and whether they make sense in the real world, performance wise, should have been part of the "whole point" all along.

inimino · on Jan 4, 2017

By that argument, any blog post that does not contain a comprehensive software development education can be criticized.

You should ask whether these techniques are useful in writing correct code and if they save developer effort, rather than picking apart irrelevant aspects of the toy example.

I don't think modern graphics code represents individual pixels as strings either, would you dismiss the post for that reason?

munificent · on Jan 3, 2017

> The optimal data structure for that is not recursive, it’s 2D array of small square blocks of pixels. If pixels are bits, I’d bet on either 8x8 or 16x16 blocks.

If the image is very large and contains large contiguous regions of solid colors, then a quadtree that doesn't subdivide homogeneous regions can be a much faster approach than a flat pile-o-pixels.

Either way, this article isn't about rotating images.

Const-me · on Jan 3, 2017

Well, hierarchical representation of those images is indeed a good idea for some data. Saves both RAM space and bandwidth, and CPU time.

Just not the quad-tree way described in the article.

There are ways to implement hierarchical structures that work well on modern hardware. One is hashmaps keeping dense 2 bits per pixel blocks (again, 16-64 bytes per block) with the values saying filled/empty/mixed. Another one is B-trees or B+ trees.

Patient0 · on Jan 3, 2017

How would you improve on the quad-tree representation used to implement hashlife? http://www.drdobbs.com/jvm/an-algorithm-for-compressing-spac...

Const-me · on Jan 3, 2017

I wouldn’t.

I am not convinced hashlife is still the best of them. The invention is from early 80-s. Many old heuristics about caching versus calculation is no longer true today. Here’s my article on the related topic, lookup tables: https://github.com/Const-me/LookupTables

If I wanted to implement efficient life simulator, I’d probably used the fact that with AVX2, it’s possible to count those neighbors for 8x8 block of pixels in a single __m256i register (with 8 neighbors max, 4 bits / value is more than enough). Might be faster than L3 cache access, and will definitely be much faster than main memory access.

That’s why I’d put those pixels in 8x8 blocks, single uint64_t value per block.

Putting those blocks in a hash table takes care about the infinite field. std::unordered_map will work, but if you care about performance prefer CAtlMap from Microsoft, dense_hash_map from Google, or something similar.

braythwayt · on Jan 3, 2017

I believe most modern implementations, including hybrid implementations of HashLife, take advantage of register and cache sizes. HashLife as I might implement it in JavaScript, is really a very naïve approach. I think Bill Gosper even had versions in the 1970s that did things with registers and assembly, not just doing everything in lists.

But the big issue with HashLife is not whether you can write a faster implementation, it's whether the particular pattern(s) you are exploring are amenable to HashLife. It wins for highly repetitive patterns, but is much slower for patterns with a very large amount of entropy in time and space, like methuselahs.

HashLife can be fabulous for exploring purpose-built machines that cycle though regular states.

Const-me · on Jan 3, 2017

Looking at golly-2.8-src. At the first sight, it does not use SIMD. Only uses 32-bit registers in 32-bit builds, 64-bit registers in 64-bit builds.

All x86 CPUs manufactured after 2003 have at least 16 128-bit SIMD registers. Using them for calculations is the only way to get anywhere near advertised CPU performance.

For languages other than C and C++, like JavaScript, where there’s no SIMD instructions, calculations are generally slower, but the hash tables are provided by the platform and hence implemented in highly-optimized C code, HashLife should be perfectly fine.

braythwayt · on Jan 4, 2017

For HashLife, I think the chief problems are around an efficient cache, and especially efficient canonicalization. The algorithm is so fast for universes that suit its domain that I think most other considerations drop by the wayside.

However, besides the fact that not all problems have enough redundancy in time+space to suit it, it also suffers from the fact that it is most efficient when computing massive jumps in generations. So it’s as not as performant for creating animations as it is for doing things like finding the maximum size of a pattern, calculating its growth rate, and so forth.

fanf2 · on Jan 4, 2017

For JavaScript you might do quite well with a 32-way bit-slice implementation, like http://fanf.livejournal.com/93032.html That is, if you can't do it on the GPU with WebGL :-)

coldtea · on Jan 3, 2017

>If the image is very large and contains large contiguous regions of solid colors

In other words, if the image is not very image-y? (but more vector-y).

OskarS · on Jan 3, 2017

It's an entirely plausible scenario. It will rarely be the case with photographs, but graphical images often feature large solid color blocks. It's large reason why the PNG file format exists.

munificent · on Jan 3, 2017

You might still want the precision and detail at the boundaries of the shapes that a raster representation would give you. Think big blotchy cow spots or something along those lines.

empath75 · on Jan 3, 2017

I don't think the point of the article was to present an efficient and novel solution for image rotation. It was just a nicely visual example of a recursive data structure.

hinkley · on Jan 3, 2017

   For every complex problem there is an answer that is clear, simple, and wrong.

ivraatiems · on Jan 3, 2017

A slow solution isn't wrong. Just inefficient.

hinkley · on Jan 3, 2017

We're talking about several orders of magnitude here.

Is a web page that loads in fifty seconds 'correct' or is it wrong? Is a video format that can only decode 2 frames per second correct, or wrong?

Often, timeliness is an unstated part of the requirements, in which case an academically pleasing solution that isn't practical doesn't fulfill the requirements.

yarou · on Jan 3, 2017

I think you're missing the point. A quadtree is a compact way to store a partition of two-dimensional space, much like you'd store a binary heap in an array.

The implementation of a recursive data structure or algorithm can usually take advantage of tail recursion or trampolining. The practical implementation of a theoretical algorithm may look extremely different, much like you wouldn't literally translate abstract pseudocode to the language of your choice.

lowbloodsugar · on Jan 3, 2017

>Is a web page that loads in fifty seconds 'correct' or is it wrong? Is a video format that can only decode 2 frames per second correct, or wrong?

If your condition of satisfaction is "I am presenting information to someone who will read it", then "loads in fifty seconds" is wrong, yes.

zeveb · on Jan 3, 2017

> Is a web page that loads in fifty seconds 'correct' or is it wrong?

Neither: it's probably using JavaScript to perform an AJAX request to get some data in a bespoke markup format, then using JavaScript to parse that markup format and build DOM entries, all because the developer thinks HTML is 'so 90s, dude!'

sangnoir · on Jan 4, 2017

> Is a web page that loads in fifty seconds 'correct' or is it wrong?

Of course it is correct - this is how the internet was/is with a 14k or 56k dialup modem, I don't blame you for not remembering/not experiencing it. I remember watching JPEGs render progressively from a blurry mess one at a time, or sometimes not at all. It took way more than 50 seconds to load and render all elements on a typical website.

coldtea · on Jan 3, 2017

Depends on the business constraints. Wrong is not just "returns invalid results".

There's also the notion of wrong or right solution for a particular problem (which the phrase above already assumes). E.g. for a game where things have to run in 60hz, a slow algorithm would be the wrong solution to use.

The reverse is also true: a solution can return wrong results in the absolute sense (e.g. false positives) and still be correct for our business case. This is the case with things like Bloom filters for example.

themckman · on Jan 3, 2017

> Apparently, every computer science problem has easy to understand solution that’s easy to implement but awfully slow in practice.

I have been doing some data structures/algorithms review as I haven't really used much of that knowledge from my CS degree and, frankly, wasn't that great of a student when I was learning it. I've obviously focused mostly on understanding theoretical performance (Big-O) as I've been implementing and playing with things, but I'd like to know about good resources for things to consider about performance on real systems. I hear about things like cache misses, memory layouts and branch mispredictions, but don't really know how to apply that knowledge or, maybe more importantly, measure the effects of optimizing for some of those things over others. I'd appreciate any advice anyone has to offer.

Const-me · on Jan 3, 2017

That Big-O thing is mostly theoretical. Most useful application is job interviews.

It might stop you from doing some things, like bubble-sorting 1GB datasets, or inserting many elements to the start of a large array.

However, in other cases it’s just fine to bubble-sort (for very small collections), or insert many elements to the start of the array (when you’re reading much more often so the profit from memory locality while reading outweigh the losses from relocating those elements while inserting).

Modern hardware is extremely complex. No amount of knowledge will help you to 100% predict the performance effects of your changes. Two advices.

(1) Learn to profile your code. Both externally, using a sampling profiler that will tell you where’re you spending the CPU time, and internally using e.g. std::chrono::high_resolution_clock that will tell you the performance effect of your changes.

(2) Don’t forget about rollback feature of your version control, and don’t hesitate using that as necessary. I routinely discover some of my changes I viewed as performance optimizations actually decreased performance. The reasons include branch misprediction, compiler optimization issues, cores competition for L3 cache, RAM bandwidth, thermal issues…

fanf2 · on Jan 4, 2017

For a dose of real-world big-O I recommend following http://accidentallyquadratic.tumblr.com

Const-me · on Jan 4, 2017

I don’t think engineers working at Ericsson and Apple (the companies listed in the copyright section of that source file) forgot about big-O. What I think happened, when they wrote that, their assumption was people won’t be using those to pass 16MB blobs. And for small messages, O(n^2) works just fine.

I can understand those people. HTTP is better than Web Sockets for large chunks of data. With HTTP, servers and proxies may cache, clients may resume downloads, clients may check for changes with if-modified-since header, and so on.

The problem here isn’t with big-O, it’s with changing world.

If in the future someone will send gigabytes in that event, the current version will break despite O(n), because RAM is finite. Relatively easy to fix, you just need to save data to a hard drive instead or RAM.

Does it mean it’s a good idea to do so now? Don’t think so, it would be over-engineering and premature optimization, and the current version is currently fine. Just like the original O(n^2) code was fine for small messages.

taeric · on Jan 3, 2017

At face value, I don't actually see where you contradict the article. The recursive nature of this solution is akin to recursive matrix multiplication. Which can be a way to setup memory fetches to stay optimal. (Could... I have no data to say it is.)

Any chance you know of any benchmarks exploring this?

GFK_of_xmaspast · on Jan 3, 2017

Recursive matrix multiplication is in general a bad idea (Strassen, for example, loses a little bit of stability and is not cache-friendly)

taeric · on Jan 3, 2017

To be clear, I can see why coding it purely recursively would be a bad idea. Same for using linked structures, if that was the concern with the quad tree.

However, it seems that you could use the general recursive definition to find some optimizations that one could do. For example, define a recursive structure to break down a problem until it is small enough that a more traditional method works. (So, don't drop down to the size of a pixel, but do drop down till you have things at cache line size?)

Perusing the wikipedia for Strassen, I see that it scoffs at a ~10% benefit for large matrices. Curious how that holds in light of "deep learning" environments. Especially since 1k x 1k matrices are probably not that uncommon nowdays. Seems like every little bit would help.

Edit to add: THinking of the context of the original post. Yeah... probably not needed for anything in a "snappy" user application. I was thinking more for very large data sets.

GFK_of_xmaspast · on Jan 3, 2017

The mention of "around 10% or less" is based on a 2005 paper, and even in the past ten years there's been a lot of change in computer architecture (the machines used in the paper were mostly ultrasparcs, and they write "Note that for the (pentium 3 system), we could find no problem size for which Strassen’s is faster than ATLAS’s").

See also : https://www.quora.com/What-algorithm-does-BLAS-use-for-matri... https://github.com/flame/how-to-optimize-gemm/wiki#notice-on... http://www.cs.utexas.edu/~flame/pubs/GotoTOMS_revision.pdf

taeric · on Jan 3, 2017

That is an interesting note. I remember reading that the lower bound on multiplications was lower due to swapping a lot of them for additions. Knuth indicated that this did not matter in practice, since few people are multiplying large matrices.

My interest here is really just if that is still a good assumption. With the multiplications that happen in "deep" fields, I'm curious if there are now reasons to look to these algorithms.

GFK_of_xmaspast · on Jan 3, 2017

We might be talking at cross-purposes, my angle is that of dense linear algebra, single or double precision, on NUMA architectures with vector operations, and all respect to Knuth but it requires a different model for the machine.

If that's changed, like for example if you're computing over a different ring where arithmetic ops are expense as compared to cache misses and there's no stability issues, then yes the issue is back open and Strassen is put back on the table.

taeric · on Jan 3, 2017

Knuth seems to be in agreement with you, from my understanding. My comments are only of a general interest and curiosity, not of knowledge. (If that makes sense.)

That is, I certainly was not trying to second guess what you are saying. Just noting my curiosity on whether or not this has, or will, change. Knuth's view that I'm quoting is from Vol 2, which is pretty old at this point. I'd be a little surprised if it was outdated. But not shocked.

rrobukef · on Jan 3, 2017

Very large matrices is >50kx50k (maybe even bigger). Strassen is then just a tiny bit better when implementing both cache and allocation aware.(Tested in Fortran)

At least it's at a non practical size.

taeric · on Jan 3, 2017

It is always interesting to see just how valuable a "tiny bit" can be, though.

Granted, I confess most is almost always navel gazing. Increases in the "best" sorting/map algorithm have probably not lead to any noticeable improvements in modern codebases. Usually, those benefit more from being the low hanging fruit, not most computationally complex. (That is, I'm sure in these large matrix problems, there are other low hanging fruit that will yield better improvements.)

iamleppert · on Jan 3, 2017

Recursion is often times less efficient than a well designed and concrete imperative solution, even if the imperative solution uses more memory. This is because most languages, the stack is not free. There's a lot of meta data and house keeping that goes on when you invoke a function to support things like stack traces and introspection.

The provided example (at least to me) is also convoluted and fails to illustrate the elegance of using a recursive data structure.

In general parlance, an affine transform or rotation matrix would both be more elegant and readable.

braythwayt · on Jan 3, 2017

Although I'm satisfied with the choice I made for what to put in my essay, I want to thank you for pointing out the gaps between my essay and the real world, and for doing so in a forceful way without crossing into being querulous.

This kind of discourse is, IMO, an example of Hacker News at its finest.

devty · on Jan 3, 2017

Are you aware of a programming language/compiler that optimizes on recursively written code? I'm finding it difficult to imagine a PL where stack _isn't_ free except tail-recursive code.

andreareina · on Jan 3, 2017

My understanding is that it's the opposite -- function calls expend a cost to save (and later restore) the registers and push the locals onto the stack. Tail recursive functions on the other hand can be implemented as a jump to the top of the function, right after all the stack bookkeeping, and so saves on that expense.

junke · on Jan 3, 2017

Tail-merging doesn't only apply to recursive functions. As soon as you don't need to store intermediate result, you can jump to another function without allocating a frame.

nick0garvey · on Jan 3, 2017

Yes, this is right

lmm · on Jan 3, 2017

Haskell (sometimes) does this kind of thing much more efficiently - it doesn't specify or always provide a stack, it models the program as a graph to be evaluated by reduction. And any language where you can implement continuations allows you to write this kind of thing efficiently (e.g. https://github.com/Atry/memcontinuationed ).

(Ultimately you can do "userspace continuations" in any programming language by explicitly transforming your code into CPS, using an explicit FSM if the language doesn't support tail-calls as such, the only price being developer sanity)

I do think there's a real gap in many languages between the natural way to express graph/tree transformations (recursively) and a cache-friendly/mechanical-sympathy-style runtime - see e.g. https://medium.com/@gkossakowski/kentucky-mule-limits-of-sca... . And really this gap shouldn't be there - the language implementation should be able to transform and fuse this kind of tree-walking operation into an efficient implementation - but at the same time it's hard to see how you'd ever make the language spec permit this without resorting to a full-Haskell "everything is lazy everywhere and the language implementation is allowed to evaluate whatever it likes however many times it likes (including 0) in whatever order".

vanderZwan · on Jan 3, 2017

Maybe an actual stack language like Forth could do so? The stack overhead of calling a word is a lot lower to begin with, and ColorForth for already has tail call optimisation[0]. More advanced concatenative languages like Factor[1] or Kitten[2] might be able to perform more interesting optimisations.

[0] http://wiki.c2.com/?TailCallOptimization

[1] https://factorcode.org/

[2] http://kittenlang.org/

35bge57dtjku · on Jan 3, 2017

I believe even GCC optimizes tail recursion in some cases.

johnfn · on Jan 3, 2017

I really enjoyed this article because it introduced me to a new data structure and some very elegant and (subjectively) beautiful algorithms. I came onto HN expecting some insightful commentary and instead got 80 comments griping about how it wasn't 100% optimized.

Sigh.

braythwayt · on Jan 3, 2017

Author here. Nevertheless, there's learning gold in those comments. Take the gold, leave the mud.

mypalmike · on Jan 2, 2017

  maxIndex = size - 1
  for x in range(0, size):
    for y in range(0, size):
      # swap diagonally, invert horizontally
      out[x][y] = in[maxIndex - y][x]

teddyh · on Jan 3, 2017

The algorithm to rotate images can be illustrated by running

    /usr/lib/xscreensaver/blitspin

blitspin - rotate a bitmap in an interesting way

The blitspin program repeatedly rotates a bitmap by 90 degrees by using logical operations: the bitmap is divided into quadrants, and the quad‐ rants are shifted clockwise. Then the same thing is done again with progressively smaller quadrants, except that all sub-quadrants of a given size are rotated in parallel. So this takes O(16×log2(N)) blits of size NxN, with the limitation that the image must be square, and the size must be a power of 2.

Movie here, for those without XScreenSaver installed:

https://www.youtube.com/watch?v=UTtcwb-UWW8

braythwayt · on Jan 3, 2017

Thanks, I have included the video and a credit to you in the essay.

cousin_it · on Jan 2, 2017

Quadtrees are neat. Though I guess general purpose image processing would be better served by either raster or vector representation. Maybe a better showcase would be something like collision detection or geospatial indexing.

phkahler · on Jan 3, 2017

I have a bit-rotting ray tracing library where I put all geometric primitives in an oct-tree. When you fire a ray through a 2x2x2 cube it intersects at most 4 of the child nodes. You can see that because the ray must originate in one octant, and it must cross one of the 3 partition planes to reach another octant and it can only cross each one once. There are lots of other interesting properties.

One of my favorite algorithms is one that takes an oct-tree and a triangle and inserts the triangle into the tree. I allow primitives to occupy more than one tree node, so first I decide what size node they will occupy and then I use a recursive algorithm to "voxelize" the triangle. I found a heuristic for determining optimal node size.

The ability for node to contain and arbitrary number of primitives and for each primitive to be in an arbitrary number of nodes resulted in a very interesting data structure that allows fast deletes and inserts of the primitives. But I'm really starting to ramble now.

SomeStupidPoint · on Jan 3, 2017

Could you go on a bit more (or post a link to somewhere you do)?

phkahler · on Jan 3, 2017

I've been hoping to put the code online for a long time, but I'm suffering from "it's not cleaned up enough" syndrome. But I'll talk a little here.

I'd been after real time ray tracing since the 1990's so I had a goal of putting both static environments and dynamic objects/players in the same acceleration structure. A lot of the old BSP-tree stuff or other BVH are used only for static geometry and I didn't like that. The problem with those structures is that if you move some things around it may require regeneration of the entire structure. Or at the very least, local updates that result in a different structure than if you regenerated the entire thing.

Part of the reason for that is trying to partition the geometric primitives roughly in two groups so a partition roughly cuts the scene in half. I decided that with a regular octtree structure, which nodes a primitive is contained in could be independent of the other primitives. They may share tree nodes, but I never decide whether to split a node based on content - instead each primitive is inserted into the nodes that make sense for it. This does lead to a few nodes having a lot of content - thing of the node that encloses a triangle fan - but those are rare and you won't typically be shooting lot of rays through them.

Also, with the primitives deciding what nodes they occupy, we can now do local deletes and inserts without having to regenerate the whole world structure.

For inserting a primitive we take its bounding box. If that is outside the root box of the world, I repeatedly double the size of the world by adding a new root node and placing the existing roots children and some new parents inside it - that's a neat recursion too. Next, we traverse the tree from the root calling on only the children that overlap the bounding box until we reach the node size that was requested. This decent is trivial for bounding boxes, but doing it with high precision specifically for triangles was harder.

Once the nodes that need to contain a primitive are reached, it's a matter of adding that primitive to a linked list off the tree node. But each primitive will need to exist in multiple lists. This means not putting the primitives directly in a list, but having a list of little link structures that are added to the nodes list and point back to the primitives.

For deletions, each primitive will need a list of link nodes that point back to it. Deletion consists of following this list of links and deleting them from the list that contains them.

So the links themselves will each be a part of two lists. They are part of a list coming off a primitive and they are part of a list coming off the tree node. It took a lot of thought to shrink the link node data structure, but at present it is 4 pointers in size. It needs to contain two "next" pointers, a "previous" pointer (to allow deletion from the tree nodes list of primitives), and it also has to have pointers to both the tree node AND the primitive.

The reasons for those last two pointers is thus: when checking which primitives a ray intersects, we want to start at the tree node that contains it and quickly get to the primitives for ray/object intersection tests. This is done by following the list of links via their "next" pointer which also following their links to "primitive". For deletions, we start at the primitive and follow the other set of "next" pointers and do deletes from the other list by using both the next and previous pointers. We never need a previous pointer for the list off the primitive because that entire list is always deleted, where the other list may still contain other primitives. There remains a problem that when an Octree node becomes empty, I need to prune back the tree - this is why the links need to also contain a pointer back to the tree node that contained the link. So now we're looking at a link structure that contains 5 pointers {2 nexts, one previous, one primitive, and one treenode}

Then I had the insight that no matter which list we're traversing, we always have a pointer to either the treenode or the primitive. So those two pointers can be combined via sum or XOR into a single value. If your deleting a primitive you just XOR a pointer to the primitive with that value to recover a pointer to the tree node. When you're doing intersection tests, you XOR the treenode pointer with that value to recover a pointer to the primitive. This change resulted in no significant performance change while reducing the structure to a nice size of 4 pointers.

Another problem was that having a linked list with a previous pointer meant the octree node needed to always contain one of these link structures because deletion depended on that "back" pointer pointing at another link. Turns out you can have the back pointer point directly to the "next" pointer of the previous link. That means the octree node now only needs a single pointer to the first "link" node in the list and that links back pointer points directly to that pointer.

So for fitting an arbitrary number of things (primitives) each into an arbitrary number of boxes (tree nodes) we add a single pointer to the thing structure, a single pointer to the box structure, and then use these little link structures that are the size of just 4 pointers.

It was a long journey to arrive at that data structure, but for a while it was real point of pride for me. The structures are nice, but the code is still a bit messy with the remains of several experiments still around. Also notice that the list in one direction does not have a back pointer, so you can never delete a box directly - well you could but it's not as easy. There isn't complete symmetry between the boxes and things.

Is that enough rambling for now? ;-)

cousin_it · on Jan 4, 2017

Nice puzzle! I think you can allow both kinds of deletions while still using four pointers per link node:

* prev1 XOR next1

* prev2 XOR next2

* prev1 XOR prev2

* data1 XOR data2

Because while traversing you always know one of {prev1, prev2} and one of {data1, data2}. Right?

braythwayt · on Jan 2, 2017

If you really want to show quadtrees off...

http://raganwald.com/hashlife/

But yes, I’d love to read more about things like collision detection and geospatial indexing.

bpicolo · on Jan 3, 2017

(You probably know this but) quadtrees are indeed used extensively for both of those things.

Whole buncha programs can use them (though there are other good options as well) for geospatial:

https://www.elastic.co/guide/en/elasticsearch/reference/curr...

skybrian · on Jan 3, 2017

Before reading this, I would have said that we need to use recursion to model constructs like programming languages that are specifically designed to have a recursive structure. They are naturally recursive because someone designed the language that way. (By contrast, building a basic database-backed website doesn't require any recursion, unless you're modeling a hierarchy.)

But with a compiler, there is the same pattern of converting to a recursive data structure (parsing), performing manipulations, and then converting back to a non-recursive structure (object code generation).

dukerutledge · on Jan 2, 2017

Reginald is definitely hinting at hylomorphisms here. I'd be curious to see if stream fusion could be achieved in JS for a hylomorphism.

braythwayt · on Jan 2, 2017

Isn’t `multirec` a function for making hylomorphisms? The recursive dividing is the anamorphism, and the combining of the results is the catamorphism.

As written, however, `multirec` does not efficiently compose. For two constructions to be compatible, they’d have to share their decomposition strategy, including `indivisible` and `divide`. `value` probably composes neatly, but `combine` is a little thorny as the current implementation does two jobs: Processing the data and reconstructing the form of the data.

A composable `multirec` does sound interesting, however. Hmmm...

jaredtobin · on Jan 4, 2017

It's exactly a hylomorphism. Here's a possibly more familiar-looking Haskell form:

    {-# LANGUAGE DeriveFunctor #-}
    {-# LANGUAGE LambdaCase #-}

    import Data.Functor.Foldable
    import Data.List.Split

    data QuadTreeF a r =
        NodeF r r r r
      | LeafF a
      | EmptyF
      deriving Functor

    builder :: [a] -> QuadTreeF a [a]
    builder = \case
      []  -> EmptyF
      [x] -> LeafF x
      xs  -> NodeF a b c d where
        [a, b, c, d] = chunksOf (length xs `div` 4) xs

    consumer :: QuadTreeF a [a] -> [a]
    consumer = \case
      EmptyF            -> []
      LeafF a           -> [a]
      NodeF ul ur lr ll -> concat [ll, ul, ur, lr]

    rotate :: [a] -> [a]
    rotate = hylo consumer builder

taeric · on Jan 3, 2017

This is worth it, if only for the footnotes. Don't skip them!

c2the3rd · on Jan 3, 2017

I liked this article because I enjoy reading about data structures that can be used in places where the "obvious" implementation is a list of lists. A lists of lists data structure is general enough for many applications and readily available in many languages, but when using it, I often get the sense that I am spinning wheels writing far more code than necessary to do the desired manipulations. It makes me happy to read about more novel solutions.

Greek0 · on Jan 3, 2017

The argument would be stronger if he wasn't swapping an O(n) algorithm for O(n log(n)). With the recursive implementation you have to touch every element once for each recursion level.

braythwayt · on Jan 3, 2017

For quadtrees come into their own when there are optimizations that fit the problem domain, such as the coloured quadtrees explained at the end of the post: They are much faster for images with large blank areas.

They aren’t discussed in the post, but quadtrees are also very amenable to memoizing common operations.

lilyball · on Jan 2, 2017

The footnotes say that all the functions can be adjusted to deal with non–power-of-2 squares, but it's not obvious how this is done when working with region quadtrees.

braythwayt · on Jan 2, 2017

One way is to represent squares as a square-that-is-a-power-of-two, plus a “clipping size” that is used when the square is rastorized. If you’re using a lot of optimizations for blank regions, the extra padding has a negligible effect on performance.

lilyball · on Jan 2, 2017

I suppose that would work, but all the operations you do on the region quadtree have to be aware of the "clipping size" as well. For example, the rotation transformation that this article implements would also have to rotate that "clipping size" as the unwanted portion of the square is now in a different location.

braythwayt · on Jan 2, 2017

That depends If you can live with evenly-sized squares, you can always center the clipping square, and you’re good for rotations, flips, and so on.

If you want odd-sized squares, or other shapes, then yes, you’ll need to know how to transform the clipping path/description as well.

All in a day’s work!

jlas · on Jan 2, 2017

One can use recursive properties better if the structure divides equally. So if not having a strict quadtree is okay, they can divide the square into 3x3, 5x5, etc

pacaro · on Jan 3, 2017

For odd size squares, simply duplicate the center row and column before rotating, then deduplicate after.

lilyball · on Jan 2, 2017

Another approach would be to allow a cell's value to be null, meaning it should be clipped, but then you run the risk of having transforms on the region quadtrees that cause the null cells to migrate inwards into the square (such that they cannot simply be clipped off later).

tallanvor · on Jan 3, 2017

Yes, that made me lose a lot of confidence in the article. This shouldn't have been a footnote. It should have been stated in the beginning that the author was only considering powers of two, and the footnote should have been a reference to ways to deal with other sizes - my very first question, before I got to the footnotes, was "How would I deal with a 3x3 square?"

zachrose · on Jan 2, 2017

The 'multirec' function here looks to have a lot in common with Clojure's transducers. Can anyone more knowledgeable do a compare and contrast?

braythwayt · on Jan 2, 2017

`multirec` is a combinator, it transforms functions into other functions. `transducers` are "composable algorithmic transformations.”

`multirec` as presented here is not composable the way transducers are composable. It is, however, an extremely stimulating thing to consider.

https://news.ycombinator.com/item?id=13305176

noobiemcfoob · on Jan 3, 2017

Today I learned a different, much more harsh meaning for "bespoke"

I've never picked up on a hipster connotation from the word, though I did feel it was coming back into vogue.

vostok · on Jan 3, 2017

It's been popular in the derivatives space for a while now.

braythwayt · on Jan 2, 2017

FYI, this article on recursive combinators from 2008:

http://www.lazutkin.com/blog/2008/06/30/using-recursion-comb...

maxdemarzi · on Jan 2, 2017

Got an opportunity to do this recently using a graph mapped on to a K^2 Tree to find out if a relationship existed or not. See https://maxdemarzi.com/2016/12/27/connected-2/

nemoniac · on Jan 3, 2017

Interesting article. While reading it I wrote a version of the code in Racket. YMMV but I think it turns out cleaner, especially towards the end.

https://gist.github.com/bon/801fdb6dbced0efdb83a358bb4f6285f

lucker · on Jan 3, 2017

Umm... or, I could just rotate a square a quarter-turn by iterating over the square's elements:

for (x = 0; x < N; x ++) { for (y = 0; y < N; y ++) { newSquare[x][y] = oldSquare[y][N-1-x]; } }

Of course, in certain representations of the square (such as using a 1-d array) even simpler algorithms can be used.

stephenboyd · on Jan 3, 2017

As a web developer, I appreciate this rare use of JavaScript in a CS lesson.

braythwayt · on Jan 3, 2017

JavaScript is the language most likely to be familiar to the audience I'm trying to reach. But at a certain point, it gets in the way of explaining an idea rather than making it easier.