
The Great Tree-List Recursion Problem - lrsjng
http://cslibrary.stanford.edu/109/TreeListRecursion.html
======
StefanKarpinski
<rant on pointer-heavy data structures>

The continuing emphasis on pointer-heavy data structures in computer science
curricula is really doing a massive disservice to students and all the
employers that go on to hire those students. Pointer-heavy data structures are
TERRIBLE on modern hardware. They were great back in the era when pointers
were small (≤32 bits) and unpredictable memory access was roughly the same
speed as CPU operations. We left that era behind long ago. Today pointers are
a whopping 64 bits, and looking up your next node through a pointer is so slow
that it's the CPU equivalent of taking a ten-year sabbatical.

For example: at what size does it become better to use a list rather than a
vector when doing lots of insertions and deletions, which are O(1) for lists
and O(n) for vectors? NEVER. It is always better to use a vector. Don't take
it from me, this is according to Bjarne Stroustrup:

[https://www.youtube.com/watch?v=YQs6IC-vgmo](https://www.youtube.com/watch?v=YQs6IC-vgmo)

Pointer-heavy data structures like linked lists are so bad for memory locality
and storage overhead on modern hardware that it is impossible for them to
catch up with contiguous memory vectors.
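
The trade-off in Stroustrup's example is easy to reproduce. Here is a minimal C sketch (my own, not from the talk, and all names are mine) of the two sorted-insertion strategies being compared: the vector's O(n) shuffle moves contiguous, prefetch-friendly bytes, while the list's O(n) search chases one pointer, typically one cache miss, per element.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sorted insertion into a plain array: O(n) memmove per insert,
 * but the moved bytes are contiguous and prefetch-friendly. */
static void vec_insert(int *a, int *n, int x) {
    int i = 0;
    while (i < *n && a[i] < x) i++;
    memmove(&a[i + 1], &a[i], (size_t)(*n - i) * sizeof a[0]);
    a[i] = x;
    (*n)++;
}

typedef struct LNode { int val; struct LNode *next; } LNode;

/* Sorted insertion into a singly linked list: O(1) splice once the
 * spot is found, but the O(n) search dereferences a pointer per hop. */
static LNode *list_insert(LNode *head, int x) {
    LNode *node = malloc(sizeof *node);
    node->val = x;
    LNode **p = &head;
    while (*p && (*p)->val < x) p = &(*p)->next;
    node->next = *p;
    *p = node;
    return head;
}
```

Time both over a few hundred thousand random insertions and the array version wins by a wide margin for small elements, which is exactly the point of the talk.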

There are still situations where you need a tree but if you do, try very hard
to use a representation like a binary array heap where the child links are
implicit. Or if you must use "pointers" consider allocating all the values in
a contiguous block and storing the link structure separately as 32-bit or
smaller integer indexes into the data array. This at least improves the memory
overhead and locality.
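
As a rough illustration of that index-based layout (a sketch; the type names and fixed capacity are mine):

```c
#include <assert.h>
#include <stdint.h>

#define NONE (-1)

/* All node payloads live in one contiguous block; the link structure
 * is a separate pair of 32-bit index arrays instead of 64-bit pointers. */
typedef struct {
    double  value[64];   /* payloads, contiguous */
    int32_t left[64];    /* index of left child, or NONE */
    int32_t right[64];   /* index of right child, or NONE */
    int32_t count;
} FlatTree;

static int32_t flat_add(FlatTree *t, double v) {
    int32_t i = t->count++;
    t->value[i] = v;
    t->left[i] = t->right[i] = NONE;
    return i;
}

/* In-order walk by index; out must have room for t->count values. */
static int flat_inorder(const FlatTree *t, int32_t node, double *out, int pos) {
    if (node == NONE) return pos;
    pos = flat_inorder(t, t->left[node], out, pos);
    out[pos++] = t->value[node];
    return flat_inorder(t, t->right[node], out, pos);
}
```

The 32-bit indexes halve the per-link overhead relative to 64-bit pointers, and because the arrays are contiguous a traversal tends to stay in cache.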

</rant>

~~~
rstuart4133
> There are still situations where you need a tree but if you do, try very
> hard to use a representation like a binary array heap where the child links
> are implicit. Or if you must use "pointers" consider allocating all the
> values in a contiguous block and storing the link structure separately as
> 32-bit or smaller integer indexes into the data array. This at least
> improves the memory overhead and locality.

There is an ironic side to that advice. If the nodes in the tree are larger
than half a cache line (i.e. two of them don't fit into one line), then
pointers vs. vector doesn't get you a win, as either way you hit a different
cache line when you access the next element. Stroustrup's example is a nice
"gotcha", but the reality is that most of us don't store integers in a vector
or tree, we store bigger things. That boils down to two cases: either the
thing is so big that storing a couple of pointers alongside it doesn't matter,
or you store a pointer to the bigger thing in your tree, which means you are
using a pointer-heavy data structure anyway.

OK, OK, I've cheated. I get that what you are saying is that using a linked
list to store those pointers to the big objects is always worse than storing
them in an array. But there is a third option which is better than both: embed
the pointers into the big object. If that pointer ends up in the same cache
line as whatever you are using to distinguish the objects, the pointer-heavy
method will be faster than the non-pointer method, and that _is_ an irony.

This pattern has been followed by hoary old C programmers like myself for
decades. It's all over the Linux kernel, for instance. Granted, it ain't so
easy to do in languages other than C. (Even Rust lacks the necessary "I have a
pointer to a linked-list node that I know is embedded in an object of type T,
give me the pointer to the T object" operation, because it's so hard to make
type safe.) But just because most languages don't support it doesn't mean
pointer-heavy data structures are bad.
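
For anyone who hasn't seen the pattern, here is a minimal sketch of the kernel-style embedded link plus the "give me the containing object" operation described above (simplified to a singly linked list; the kernel's real `list_head` is circular and doubly linked, and the `Item` type here is invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next; };

/* Recover the containing struct from a pointer to its embedded link --
 * the operation the comment above says most languages can't express. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct {
    int key;
    double payload;
    struct list_head link;  /* the link lives inside the object itself */
} Item;

/* Push a node onto the front of an intrusive list. */
static void push(struct list_head *head, struct list_head *node) {
    node->next = head->next;
    head->next = node;
}
```

Because the link is embedded, walking the list touches the same cache lines that already hold the object's data, with no separate node allocations at all.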

~~~
StefanKarpinski
> but the reality most of us don't store integers in a vector or tree, we
> store bigger things.

I work in numerical computing where storing 64/32-bit floats is by far the
most common use case. In this world, storing a lot of small, immutable values
is the norm and using pointer-heavy data structures would be a disaster.

That's a good trick; I'll have to think about good ways to expose it safely
in a higher-level language.

------
kazinator
Lol, I made a C library on this very topic twenty years ago.

[http://www.kylheku.com/~kaz/austin.html](http://www.kylheku.com/~kaz/austin.html)

"Austin" provides several container data structures in a single instance. With
a function call it can be instructed to change the representation from any one
of them to any other. E.g. a red-black tree can be told to become an AVL tree
or sorted list and so on.

When I wrote this and announced it, a gentleman from Austria named Carl Van
Tast contributed the AVL implementation. I've not encountered him since; I
remember telling him it was Van-Tast-ic. That could be why I haven't
encountered him since.

The API is in "udict.h" (universal dictionary). The algorithms are enumerated
thus:

    
    
      #define UDICT_LIST      0
      #define UDICT_BST       1
      #define UDICT_REDBLACK  2
      #define UDICT_SPLAY     3
      #define UDICT_AVL       4

------
cousin_it
I think there's a simpler solution, with one loop and no recursion, in O(n)
time and O(1) space.

    
    
        static Node treeToCycle(Node root) {
          // Dummy node in front of the list so the real head needs no special-casing.
          Node pseudoRoot = new Node();
          pseudoRoot.left = null;
          pseudoRoot.right = root;
          Node tail = pseudoRoot; // last node appended to the list so far
          Node rest = root;       // root of the not-yet-flattened remainder
          while (rest != null) {
            if (rest.left == null) {
              // No left subtree: append rest to the list and move right.
              rest.left = tail;
              tail = rest;
              rest = rest.right;
            } else {
              // Right rotation: lift rest's left child above it.
              Node temp = rest.left;
              rest.left = temp.right;
              temp.right = rest;
              rest = temp;
              tail.right = temp;
            }
          }
          // Splice head and tail together to close the cycle.
          Node head = pseudoRoot.right;
          head.left = tail;
          tail.right = head;
          return head;
        }

~~~
DerekL
Let me explain to everyone else how your function works. It's turning a tree
into a list. The tree is rooted at “rest”, and the list starts at “pseudoRoot”
and ends at “tail”.

If the root of the tree has no child on the left, it pops that node and adds
it to the end of the list. If it does have a left child, then it does a _right
rotation_ on the tree, which makes the left side smaller. (See
[https://en.wikipedia.org/wiki/Tree_rotation](https://en.wikipedia.org/wiki/Tree_rotation)
.)

I was wondering if this really runs in linear time. Call a node in the tree
_good_ if it lies on the right-most part of the tree. In other words, a node
is good if the path from the root to that node only goes right at each step.
The root is always good. A node is _bad_ if it is not good. Each right
rotation turns a bad node into a good one, and the pruning removes one good
node. So it is really O(n) time.

~~~
cousin_it
Yes, this is all exactly right. To convince everyone else that I'm not a
wizard - my function is just a slightly modified version of the tree-to-vine
function from the Day-Stout-Warren algorithm.
[https://en.wikipedia.org/wiki/Day%E2%80%93Stout%E2%80%93Warr...](https://en.wikipedia.org/wiki/Day%E2%80%93Stout%E2%80%93Warren_algorithm)

------
benmmurphy
There is an alternative solution where you basically do it using an in-order
traversal. It uses more stack space, because it holds one node pointer in
global state and two node pointers between the left and right recursion,
while the list-building solution holds just a single node pointer during the
left and right recursion.

Let's say you want to convert a tree to a singly linked list, in order. Then
you can just do an in-order traversal of the tree, keep track of the previous
node, and assign previous.left = current_node. This is fine because, if you
are at the successor of a node, the whole of the previous node's left subtree
has already been visited (it is smaller), so you will never need to follow
the left pointer again.

    
    
        previous = nil
        in_order(tree) do |node|
          if previous
            previous.left = node
          end
          previous = node
        end
    

However, you can't do node.right = previous until you have visited the right
subtree of the current node, so you can't use exactly the same code for a
doubly linked list. If you do the fix-up after you have finished processing
the current node, though, it works fine. So if you have an iterator that
gives you both an in-order and a post-order callback, and a way of sharing
state between them, you can just do the fix-up in the post-order callback,
where it is safe.

    
    
        previous = nil
        head = nil
        multi_order(tree) do |node, order, state|
          if order == :inorder
            if previous
              fixup_state = [node, previous]
            else
              head = node
            end
            previous = node
            fixup_state # passed back into the iterator and returned in the post-order callback
          else
            if state
              fixup_node, fixup_previous = state
              fixup_previous.left = fixup_node
              fixup_node.right = fixup_previous
            end
          end
        end
        previous.left = head
        head.right = previous
        head

------
eklitzke
I did this exact problem on Leetcode the other day:
[https://leetcode.com/problems/convert-binary-search-tree-to-...](https://leetcode.com/problems/convert-binary-search-tree-to-sorted-doubly-linked-list/)

~~~
hopler
Login wall

------
mrburton
Why wouldn't you do a pre-order traversal and connect the appropriate nodes?
When they say "recursion is key", that's actually misleading and this problem
doesn't seem hard at all.

What am I missing?

~~~
mwkaufma
The spirit of the problem, as it was presented to us in undergrad, is to
modify the data structure in-place, not by allocating a scratch array.
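
For contrast, the disallowed "easy" version (a sketch; names are mine): do an in-order walk into a scratch array, then wire the circular doubly-linked list up from the array. Correct, O(n) time, but O(n) extra space, which is exactly what the in-place formulation forbids.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { int val; struct Node *left, *right; } Node;

/* Collect node pointers in order into the scratch array. */
static int collect(Node *n, Node **out, int pos) {
    if (!n) return pos;
    pos = collect(n->left, out, pos);
    out[pos++] = n;
    return collect(n->right, out, pos);
}

/* O(n)-extra-space variant: in-order into scratch (which must have
 * room for every node), then link neighbors into a circular list
 * using left as "previous" and right as "next". */
static Node *tree_to_cycle_scratch(Node *root, Node **scratch) {
    int n = collect(root, scratch, 0);
    if (n == 0) return NULL;
    for (int i = 0; i < n; i++) {
        scratch[i]->right = scratch[(i + 1) % n];  /* next */
        scratch[(i + 1) % n]->left = scratch[i];   /* prev */
    }
    return scratch[0];
}
```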

~~~
benmmurphy
I think if you place a limit of O(log(n)) on the space allocated and
guarantee the height of the tree is at most O(log(n)), then that breaks the
scratch-array solution and makes the in-place solution, where the only
scratch space is the stack, the only option. The in-place solution can
probably be fixed to use O(log(n)) space even if the tree is unbalanced,
because you don't need to allocate stack space unless a node has both a left
and a right pointer.

------
tristanj
There is a follow up to this question. Given two (balanced) binary search
trees, where the first BST has N nodes and the second has M nodes, how would
you merge both trees in linear time and constant space?

(The solution is here if you are interested:
[https://stackoverflow.com/questions/7540546/merging-2-binary...](https://stackoverflow.com/questions/7540546/merging-2-binary-search-trees))
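
One approach that fits those bounds (a sketch of the general idea, not necessarily the linked answer's exact code): flatten each tree into a right-spine "vine" using the tree-to-vine half of Day-Stout-Warren, merge the two sorted vines like sorted lists, then rebalance with DSW's vine-to-tree compression (omitted here). Each pass is O(N+M) time and O(1) extra space.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { int val; struct Node *left, *right; } Node;

/* Rotate every left child up until the tree is a sorted list threaded
 * through the right pointers. O(n) time, O(1) space; the left pointers
 * of flattened nodes are no longer meaningful and must not be followed. */
static Node *tree_to_vine(Node *root) {
    Node pseudo = { 0, NULL, root };
    Node *tail = &pseudo, *rest = root;
    while (rest) {
        if (rest->left == NULL) {
            tail = rest;
            rest = rest->right;
        } else {
            Node *tmp = rest->left;   /* right rotation */
            rest->left = tmp->right;
            tmp->right = rest;
            rest = tmp;
            tail->right = tmp;
        }
    }
    return pseudo.right;
}

/* Standard sorted-list merge over the right pointers: O(n+m), O(1). */
static Node *merge_vines(Node *a, Node *b) {
    Node head = { 0, NULL, NULL };
    Node *tail = &head;
    while (a && b) {
        Node **min = (a->val <= b->val) ? &a : &b;
        tail->right = *min;
        tail = *min;
        *min = (*min)->right;
    }
    tail->right = a ? a : b;
    return head.right;
}
```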

------
alkibiades
Don’t get what’s so great about this? I was asked this during a Facebook
interview and solved it on the spot.

~~~
hopler
Congratulations, you are brilliant and earned that elite job.

