

Lampsort: Leslie Lamport's Non-recursive Quicksort - mpweiher
http://bertrandmeyer.com/2014/12/07/lampsort/

======
lisper
TL;DR: if you have an algorithm that makes two recursive calls and it doesn't
matter what order you do them in (e.g. Quicksort) then you can rewrite that
algorithm to use a set of sub-problems instead of a stack of sub-problems.
The latter is what you (implicitly) get when you write an algorithm
recursively.

A much more interesting observation would have been to note that you could
accomplish the same thing by introducing a new language construct that
explicitly noted that the order of two calls doesn't matter. That would allow
a compiler to automatically produce code that used a set-of-intervals data
structure, or -- much more relevant in today's world -- code that used
multiple cores. (But that would have defeated the real purpose of the article,
which was to be a tacit plug for Eiffel, which doesn't have such a construct.)
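
For concreteness, here's a minimal Go sketch of the set-of-subproblems version of quicksort (my own illustration, not the article's Eiffel code; a production sort would pick pivots more carefully):

    package main

    import "fmt"

    // partition rearranges a[lo..hi] around a[hi] (Lomuto scheme) and
    // returns the pivot's final index.
    func partition(a []int, lo, hi int) int {
        pivot, i := a[hi], lo
        for j := lo; j < hi; j++ {
            if a[j] < pivot {
                a[i], a[j] = a[j], a[i]
                i++
            }
        }
        a[i], a[hi] = a[hi], a[i]
        return i
    }

    // lampsort keeps a *set* of unsorted intervals instead of a call stack.
    // Which interval gets pulled out next is deliberately left unspecified.
    func lampsort(a []int) {
        type interval struct{ lo, hi int }
        pending := map[interval]bool{{0, len(a) - 1}: true}
        for len(pending) > 0 {
            var iv interval
            for iv = range pending { // pull an arbitrary member of the set
                break
            }
            delete(pending, iv)
            if iv.lo < iv.hi {
                p := partition(a, iv.lo, iv.hi)
                pending[interval{iv.lo, p - 1}] = true
                pending[interval{p + 1, iv.hi}] = true
            }
        }
    }

    func main() {
        xs := []int{5, 3, 8, 1, 9, 2, 7}
        lampsort(xs)
        fmt.Println(xs) // [1 2 3 5 7 8 9]
    }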

~~~
fiatmoney
"you could accomplish the same thing by introducing a new language construct
that explicitly noted that the order of two calls doesn't matter"

Effectively, this is what Golang's "go" construct does. Combined with channels
you can construct an arbitrary dependency graph as well.
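
A rough sketch of what that looks like, reusing the hypothetical partition helper from the sketch upthread (naive: a real version would stop spawning goroutines below some size threshold):

    // qsortGo sorts a in place; the two order-independent sub-sorts are
    // simply launched with `go`, and a channel signals completion.
    func qsortGo(a []int) <-chan struct{} {
        done := make(chan struct{})
        go func() {
            defer close(done)
            if len(a) < 2 {
                return
            }
            p := partition(a, 0, len(a)-1)
            left, right := qsortGo(a[:p]), qsortGo(a[p+1:])
            <-left // wait for both halves, in either order
            <-right
        }()
        return done
    }

    // usage: <-qsortGo(xs)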

~~~
pacala
A very common need is to do a bunch of independent RPCs. Small wish: allow
returning values from a goroutine, along the lines of "results <- go
rpc(args)".
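
Until something like that exists, the usual idiom is to have each goroutine send its value on a channel; rpc, Args and Result below are hypothetical stand-ins:

    type Args struct{ ID int }
    type Result struct{ Value string }

    func rpc(a Args) Result { return Result{} } // stand-in for a real network call

    // fanOut issues all the calls concurrently and collects every result;
    // the buffered channel plays the role of "results <- go rpc(args)".
    func fanOut(argsList []Args) []Result {
        results := make(chan Result, len(argsList))
        for _, a := range argsList {
            go func(a Args) { results <- rpc(a) }(a)
        }
        out := make([]Result, 0, len(argsList))
        for range argsList {
            out = append(out, <-results)
        }
        return out
    }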

------
jwmerrill
> His trick is to ask the audience to give a non-recursive version of
> Quicksort, and of course everyone starts trying to remove the recursion, for
> example by making the stack explicit or looking for invertible functions in
> calls. But his point is that recursion is not at all fundamental in
> Quicksort.

I'm not completely sure what distinction Meyer is trying to draw between the
implementation he discusses, and "making the stack explicit". He ends up with
a "set" of intervals that he pushes onto and pops from, which might as well
have been a stack of arguments.

Is the point that the order you pull things out of the set doesn't matter, so
demanding that it behave as a stack is overly prescriptive?

~~~
Cushman
Watching the source lecture[0], it's not about how the code is written so much
as how the function is specified. It's easier to describe quicksort as a few
set equations without invoking the concept of recursion, but since we don't
write code like that people can't get to it easily.

The nuts-and-bolts difference is in fact that it's a set and not a stack, but
the underlying question is, why do we all treat the set implementation as the
derived version?

[0]
[http://channel9.msdn.com/Events/Build/2014/3-642](http://channel9.msdn.com/Events/Build/2014/3-642)
~35m

~~~
TheLoneWolfling
> since we don't write code like that people can't get to it easily.

I've lately been trying to write code with no recursion - as in able to be
implemented with each function having a variable indicating where to return
to, no implicit stack in sight. (Statically-determinable stack size, to put it
another way.)

It took me a bit to get the hang of, but I'd argue that it ends up being
easier in the long run. It's far easier to change the priority operator on your
queue than to reason about what behavior different orders of recursion get
you, for example.

Ditto with graph search algorithms. It's enlightening to teach graph search
algorithms as a single algorithm with a queue, where changing the priority
gives you Dijkstra's algorithm (least cost first), DFS (LIFO), BFS (FIFO), A*
(current cost + underestimating heuristic of cost left), or whatever. It's also easy
to show that A* degenerates into Dijkstra's algorithm when you use a null
heuristic (i.e. a heuristic that returns a constant) if you teach them as
variants of the same algorithm.
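
A sketch of that single-algorithm idea in Go, with the frontier pulled out as the only piece that changes (a priority queue keyed on cost, or cost plus heuristic, would turn the same loop into Dijkstra or A*; marking nodes when popped keeps the loop correct for all the variants):

    // frontier is the only piece you swap to change algorithms.
    type frontier interface {
        push(n int)
        pop() int
        empty() bool
    }

    type stack []int // LIFO frontier -> depth-first search

    func (s *stack) push(n int)  { *s = append(*s, n) }
    func (s *stack) pop() int    { n := (*s)[len(*s)-1]; *s = (*s)[:len(*s)-1]; return n }
    func (s *stack) empty() bool { return len(*s) == 0 }

    type queue []int // FIFO frontier -> breadth-first search

    func (q *queue) push(n int)  { *q = append(*q, n) }
    func (q *queue) pop() int    { n := (*q)[0]; *q = (*q)[1:]; return n }
    func (q *queue) empty() bool { return len(*q) == 0 }

    // search returns the order in which nodes reachable from start are
    // visited in graph (adjacency lists). A node may sit in the frontier
    // more than once; it is only visited the first time it is popped.
    func search(graph [][]int, start int, f frontier) []int {
        seen := make([]bool, len(graph))
        var order []int
        f.push(start)
        for !f.empty() {
            n := f.pop()
            if seen[n] {
                continue
            }
            seen[n] = true
            order = append(order, n)
            for _, m := range graph[n] {
                if !seen[m] {
                    f.push(m)
                }
            }
        }
        return order
    }

    // usage: search(g, 0, &stack{}) vs search(g, 0, &queue{})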

------
Cushman
If I understand the argument, the use of a set rather than a stack encodes the
observation that the order of execution of adjacent partition operations isn't
important. That ordering is information you don't have to store, meaning you
can choose any set-like data structure, and being able to choose the access
order is theoretically useful: for example, by following a rule like "partition
the smallest interval first", you could keep the set size ("depth") to a
minimum. (That might be handy due to random pivot choice?) You don't have the
freedom to make that choice in a stack-backed implementation, whether recursive or "non".

So it is reasonable to argue this is a more abstract, "fundamental"
implementation of the sort. Intuitively, I'm not sure it's that different from
"remov[ing] the recursion, for example by making the stack explicit", then
removing the stack, but it's sure interesting.

Edit: See
[https://news.ycombinator.com/item?id=8713160](https://news.ycombinator.com/item?id=8713160)
for more context; this is correct, but the point is not really about how the
code works.

~~~
tedunangst
"Smallest interval first" is exactly what naive recursion gives you.

~~~
TheLoneWolfling
Not quite.

Naive recursion doesn't generally change the order between the two subproblems
at a single stage. It's generally just sort(first half), sort(second half).

Think of the case of a really bad pivot as the first choice - say, the second-
highest element. Naive recursion generally will recurse on the lower chunk
first, whereas smallest chunk first will recurse on the higher chunk.

~~~
tedunangst
Fair point. I assume you get a reasonable pivot.

Recursing on one of the two halves of the "current" partition is the cache
friendliest option. Smallest interval will guarantee this by induction, I
think, but I've never heard that the size of the stack is really a problem in
quicksort. If you continuously partition out only a single element, your
runtime is going to suck, even if you "sort" those elements first.

~~~
TheLoneWolfling
> I've never heard that the size of the stack is really a problem in quicksort

In pathological cases, naive quicksort tends to run out of stack space rather
than just having performance issues.

Smallest-interval ends up with a strict upper limit of O(log n) elements in
the running set, as opposed to O(n) for naive quicksort. This can be useful.
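
A closely related discipline in stack form, reusing the hypothetical partition helper from the sketch upthread: always keep working on the smaller half and defer the larger one, which is what bounds the pending intervals at O(log n):

    func quicksortBounded(a []int) {
        type interval struct{ lo, hi int }
        pending := []interval{{0, len(a) - 1}}
        for len(pending) > 0 {
            iv := pending[len(pending)-1] // pop
            pending = pending[:len(pending)-1]
            for iv.lo < iv.hi {
                p := partition(a, iv.lo, iv.hi)
                left := interval{iv.lo, p - 1}
                right := interval{p + 1, iv.hi}
                if left.hi-left.lo < right.hi-right.lo {
                    pending = append(pending, right) // defer the larger half
                    iv = left                        // keep sorting the smaller half
                } else {
                    pending = append(pending, left)
                    iv = right
                }
            }
        }
    }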

------
rav
Where Quicksort uses the recursion stack to maintain the `intervals`
implicitly, Lampsort instead uses some set data structure to maintain this set
of subproblems explicitly.

It reminds me of the way you turn recursive depth-first search (recurse on
children that have not been visited) into non-recursive depth-first search
(push the children that have not been visited onto the DFS stack). Calling it
"non-recursive" is a bit misleading, since you're trading recursion stack
space for an actual data stack to maintain the recursion you are supposed to
avoid -- but it is a useful program transformation to know.
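
In Go, that transformation looks roughly like this (a sketch on an adjacency-list graph):

    // Recursive version: the call stack holds the pending work.
    func dfsRec(graph [][]int, n int, seen []bool) {
        seen[n] = true
        for _, m := range graph[n] {
            if !seen[m] {
                dfsRec(graph, m, seen)
            }
        }
    }

    // Iterative version: an explicit stack holds the same pending work.
    func dfsIter(graph [][]int, start int) {
        seen := make([]bool, len(graph))
        stack := []int{start}
        for len(stack) > 0 {
            n := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if seen[n] {
                continue
            }
            seen[n] = true
            for _, m := range graph[n] {
                if !seen[m] {
                    stack = append(stack, m)
                }
            }
        }
    }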

~~~
taeric
There are ways to do a DFS of a tree without any sort of stack, though. This
sounds more like a nitpick on stack vs set. You still have a data structure
that grows as lg(n).

~~~
hyt7u
> There are ways to do a DFS of a tree without any sort of stack, though

What do you mean by this? You need a way to say that the unexplored nodes
we've most recently found are the ones we explore first. How do you do that
(with constant time operations) without what's effectively a stack?

~~~
taeric
Threaded trees were what I had in mind. [1] I believe the example there still
talks of checking a visited list, though. Either a misunderstanding on my
part, or on that page. I'll have to check my books. Later, sadly. :)

[1]
[http://en.wikipedia.org/wiki/Threaded_binary_tree](http://en.wikipedia.org/wiki/Threaded_binary_tree)

Edit: Apologies for a quick edit. I remembered I had a short implementation on
my computer. Basically, when you follow a link, you know if it was a regular
link or a thread. If it was a regular link, you could still go left from the
new node. Otherwise, you visit and go right. Again, keep note of whether it
was a thread or not. Repeat until done. (Make sense?)

~~~
al2o3cr
"Again, keep note of whether it was a thread or not."

This sounds like the visited list again...

~~~
taeric
You only have to remember the link you just followed. Basically, make a
boolean "followedThread" and initialize to false. Then, go to the root and
follow this algorithm

    
    
        while curNode != null
            if !followedThread && curNode.left != null && !curNode.left.isThread
                # real left link: keep descending, nothing to visit yet
                set curNode <- curNode.left.node
            else
                # leftmost node of this subtree reached, or arrived via a thread
                visit(curNode)
                set followedThread <- curNode.right.isThread
                set curNode <- curNode.right.node
    

The only node you really have to special-case is the rightmost one (the last
node visited): make its right.node == null, since it has no in-order successor.
Then the loop simply runs while curNode != null.

That make sense? (Also... very possible that I made a mistake here...)
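
For the skeptical, a hypothetical runnable Go version of the same idea; the only per-traversal state is the current node and one boolean, and the tiny tree is threaded by hand for brevity:

    package main

    import "fmt"

    type link struct {
        node     *node
        isThread bool // true: points to the in-order successor, not a child
    }

    type node struct {
        val         int
        left, right link
    }

    // inorder visits the whole tree with no stack and no visited set.
    func inorder(root *node, visit func(int)) {
        cur, followedThread := root, false
        for cur != nil {
            if !followedThread && cur.left.node != nil && !cur.left.isThread {
                cur = cur.left.node // real left child: keep descending
            } else {
                visit(cur.val) // leftmost of this subtree, or reached via a thread
                followedThread = cur.right.isThread
                cur = cur.right.node
            }
        }
    }

    func main() {
        //      2
        //     / \
        //    1   3      thread: 1.right -> 2; 3 is rightmost, so 3.right is nil
        n1, n2, n3 := &node{val: 1}, &node{val: 2}, &node{val: 3}
        n2.left, n2.right = link{node: n1}, link{node: n3}
        n1.right = link{node: n2, isThread: true}
        inorder(n2, func(v int) { fmt.Print(v, " ") }) // prints: 1 2 3
    }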

------
oggy
> Formal specification languages look remarkably like programming languages;
> to be usable for significant applications they must meet the same
> challenges: defining a coherent type system...

Lamport disagreed with this statement before [1], wonder what he'd say
nowadays (he's been known to change his mind). Gossip warning: Lamport gave a
talk at Meyer's university a few years back, bashing OOP. Meyer's a huge OOP
proponent.

[1]: [http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-types.pdf](http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-types.pdf)

------
yason
I hiss at the term "non-recursive quicksort", which is used a lot. Quicksort
is by definition logically recursive: the _same computation recurs_ in the
subarrays, you know.

Whether you maintain the state by using the programming language's own stack,
a separate stack, a queue, or some other data structure either implicitly or
explicitly is a matter of implementation. The amount of space you need to
allocate for the program state as a function of the size of the input is the
same regardless.

In a high-level enough language it doesn't matter, because all you encode, in
that language, is your intent to apply the same process again and again to the
subpartitions until everything is sorted, and a sufficiently smart compiler
could choose whichever implementation best suits the underlying hardware.

~~~
mpweiher
"The same computation recurs"

Hmm...you mean like in a loop?

~~~
yason
Nope.

You can run, for example, bubble sort or insertion sort in a loop, always
iterating over the same set of data (with some fixed amount of state) until
the array is sorted.

Conversely, and by definition, quicksort divides the data into subarrays and
then sub-subarrays recursively, which means that a plain single loop with
constant space for state won't do. You need extra space for storing state, and
the amount of extra space needed depends on the size of the input;
logarithmically, in quicksort's case.

The implementation of recursion can vary but the control and data flow still
does recur in a nested fashion.

------
joelgrus
Here's a Python version I quickly threw together, for people like me who don't
know Eiffel:

[https://gist.github.com/joelgrus/9dc47ebb22243fe990e5](https://gist.github.com/joelgrus/9dc47ebb22243fe990e5)

I think it's right. :)

------
bluecalm
I am not sure what the point is. It's still recursion; it just bypasses the
frame pointer and simulates its own, here with a set, although a stack is the
customary way to do that.

The idea isn't new either. Here is a link to a C implementation of quicksort
without function calls:
[http://www.ucw.cz/libucw/#what](http://www.ucw.cz/libucw/#what) (the simple
implementation is in array-simple.h). This implementation is very similar to
the one in the article and is way (in my experience often more than 2x) faster
than qsort or std::qsort, so it's worth taking a look at.

------
truantbuick
I'm not sure I agree that this somehow characterizes the idea of quicksort
better.

Technically, you can always convert a recursive algorithm to iterative and
back again (and not just with existence arguments -- it can be generally
constructed), so why pick on quicksort specifically?

The only thing achieved here is putting the partitioning of the list and
sorting in two separable parts. Whether that characterizes quicksort any
better is a matter of opinion, I reckon.

------
ultimape
Is this a useful thing to implement as a parallel algorithm, or would I be
better off just using mergesort?

------
tedunangst
First question was why anybody would present this using Eiffel. That's
apparently answered at the bottom, but I still don't understand. Eiffel is
good at being abstract or something? But isn't the code presented a concrete
implementation of a single way to do things?

~~~
jerluc
I think it probably has a lot to do with the fact that the author, Bertrand
Meyer, invented Eiffel
([http://en.wikipedia.org/wiki/Bertrand_Meyer#Computer_languag...](http://en.wikipedia.org/wiki/Bertrand_Meyer#Computer_languages)).

~~~
vram22
Eiffel is considered by some people to be one of the better programming
languages (maybe even near the top of the heap, though of course a lot depends
on the application area you are using it for, just as with any other language).
Features in it may have influenced other languages. I'm not sure, but I think
Design by Contract is one of them: preconditions, postconditions and
invariants, which can be implemented somewhat, using assertions, in languages
that provide them. But Eiffel provides language support for them. Search
for some Eiffel success stories. There was one very good one - I don't have
the link right now - about someone at Hewlett-Packard using it to develop
software for printers, maybe a device driver, after earlier attempts using
other languages turned out to have many problems - IIRC.

Edit: after a quick search, I found an article that is about the same story,
of HP using Eiffel - though I don't think it is the same article I read a
while ago:

[https://archive.eiffel.com/eiffel/projects/hp/creel.html](https://archive.eiffel.com/eiffel/projects/hp/creel.html)

From the article:

``Eiffel is the perfect embedded language...'': an interview of Christopher
Creel, HP CCD (Color laserjet and Consumables Division) How HP used ISE Eiffel
to develop leading-edge printer software, used Design by Contract to preserve
the work of its best designers, and in the process found bugs in its legacy
software, discovered a flaw in a hardware chip, and learned a few lessons --
such as how to do in weeks what used to require months.

I've used Design by Contract principles (in C) in a successful piece of
middleware that I was the team leader for; I ensured (pun intended :) that my
team used them extensively in the code, and the end result met its goals and
was somewhat widely used in projects by the company where it was developed.
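
For anyone curious what that looks like outside Eiffel, a rough sketch of the assertion style (Go here, with a hypothetical check helper; in C it would just be assert(), and Eiffel makes require/ensure first-class keywords that can be switched on and off):

    // check is a stand-in for assert(); a real project would let it be
    // compiled out or turned into logging.
    func check(cond bool, msg string) {
        if !cond {
            panic("contract violated: " + msg)
        }
    }

    // withdraw shows a precondition and a postcondition bracketing the body.
    func withdraw(balance, amount int) int {
        check(amount > 0, "precondition: amount is positive")
        check(amount <= balance, "precondition: amount does not exceed balance")

        newBalance := balance - amount

        check(newBalance >= 0, "postcondition: balance stays non-negative")
        return newBalance
    }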

Edit 2: Bertrand Meyer is also the author of a very well-known and respected
book, Object Oriented Software Construction. It's a big book. I read a lot of
it, some years ago. It has some very good points in it. Bertrand Meyer is one
person who seems to have thought about, and done, a great deal regarding the
process of software development (with a view to getting better results), on
many levels, from theory through practice to industrial application.

[http://en.wikipedia.org/wiki/Bertrand_Meyer](http://en.wikipedia.org/wiki/Bertrand_Meyer)

[http://en.wikipedia.org/wiki/Object-Oriented_Software_Construction](http://en.wikipedia.org/wiki/Object-Oriented_Software_Construction)

~~~
coldcode
Better is always subjective. Eiffel is not exactly in common usage in real
programs. Lots of people think their language is better at something but until
lots of people start using it, we don't really know.

~~~
vram22
> Better is always subjective.

True, and I implied that in my first sentence. It is well known that Eiffel is
not used by a lot of people. That does not automatically mean it is not good,
though.

------
ape4
How does it compare speed-wise?

------
danbruc
require j >= a.upper seems wrong.

