Hacker News new | comments | show | ask | jobs | submit login
Show HN: Write a sort, watch it go (visualsort.appspot.com)
292 points by thedufer on Sept 26, 2011 | hide | past | web | favorite | 80 comments

Sleep sort:

  pos = [0 ... VA.length]
  rpos = [0 ... VA.length]

  swap = (i, j) ->
    VA.swap i, j
    tmp = rpos[j]; rpos[j] = rpos[i]; rpos[i] = tmp
    pos[rpos[i]] = i; pos[rpos[j]] = j

  j = 0

  for i in [0 ... VA.length]
    v = VA.get(i)
    do (v, i) ->
      setTimeout ->
        # Move the element that was originally in position i to position j
        swap pos[i], j++
      , 100 * v

Never heard of this one. It's awesome.

Radix-exchange sort:

  sort = (begin, end, bit) ->
    i = begin
    j = end
    mask = 1 << bit
    while i < j
      while i < j and !(VA.get(i) & mask)
      while i < j and (VA.get(j - 1) & mask)
      if i < j
        VA.swap(i++, --j)

    if bit and i > begin
      sort(begin, i, bit - 1)
    if bit and i < end
      sort(i, end, bit - 1)

  sort(0, VA.length, 30)

Neat, I've never actually seen a radix sort before.

I thought it was a bit clearer to understand what it was doing when I stuck

  VA.locals.bit = bit
at the very top of your sort function.

Hash sort is surprisingly fast:

  for x in [0 ... VA.length]
    ind = VA.get(x)-1
    VA.swap(x, ind + VA.length)
  for x in [0 ... VA.length]
    VA.swap(x, x + VA.length)

In 1/2 as many swaps:

  for x in [0 ... VA.length]
    ind = VA.get(x)-1
    while ind != x
      VA.swap(x, ind)
      ind = VA.get(x)-1

That's cheating somehow...

Trading a tremendous amount of memory for speed. Try replacing a list of integers with a list of Images. You might lock up the browser.

That's pretty cool. Is that related to this? http://arxiv.org/abs/cs/0408040

Merge sort:

   s = (lo, hi) ->
      if lo==hi
      if lo+1==hi
         if VA.gt(lo,hi)
      while lo<=mid && mid<=hi
         if VA.gt(lo,mid)
            VA.insert(mid, lo)

Merge sort w/o using insert (and probably with some bugs...):

    tmp = VA.length
    merge = (start1, start2, end2) ->
      out = start1
      end1 = start2 - 1
      for x in [tmp+out .. tmp+end2]
        if start1 <= end1 && start2 <= end2
          if VA.lt(start1, start2)
            VA.swap(start1, x)
            VA.swap(start2, x)
        else if start1 <= end1
          VA.swap(start1, x)
          VA.swap(start2, x)
      for x in [out..end2]
        VA.swap(x, tmp+x)
    mergesort = (left, right) ->
      if (right - left) == 1
        if VA.gt(left, right)
          VA.swap(left, right)
      else if (right - left) == 0
        /* do nothing */
        split = Math.floor((right-left)/2) + left;
        mergesort(left, split - 1)
        mergesort(split, right)
        merge(left, split, right)
    mergesort(0, VA.length - 1)

Added as a prewritten sort. Thanks!

>Nothing will display if there is an infinite loop.

Did someone solve the halting problem while I wasn't looking?

I assume it just computes every step, and then displays them all.

This is exactly what's going on. You can't get a reflow on the page without giving up control, and I didn't want to require people to write asynchronous code to give up control after every swap. So it runs your code and queues up actions, to be played back when its done. This is also why you have to use the methods on VA instead of working directly on the array.

This might be a dumb question, but can you add numbers about the size of the stack to the right-side statistics? I.e. How many variables and calls to functions are stored in memory at at a time.

I'm not sure if this possible or relevant to record based on how Javascript engines compile and work, but it would seem interesting to know.

The implementation here is doing tricksie things behind the scenes to make this work without instrumenting the javascript engine.

Instead of actually doing the operations, your sorting code outputs a list of actions to perform, which is then displayed on the webpage.

Yes; however, at each step I'm already storing the state of VA.locals. If you can show me how I would get this data, I'd be happy to display it.

Very neat, but here's a couple of additions I'd like to see:

1. Allow for the array to contain values outside of the range of [1..length]. Ideally, allow for duplicate values as well. Since you control the swap and insert operations, you can maintain two separate sets, one of the actual numbers, and one of the "normalized" values that you display as your graph.

2. Give us an operation to highlight a set of lines and have the highlights persist. I wanted to take mayoff's radix-exchange sort and tweak it to highlight the "working set" that it's currently sorting, but there's no way to do that. I'm thinking here that we just need one function VA.persistHighlight() which takes start and end indices and highlights them, and persists those highlights until a new call to VA.persistHighlight(). The "transient" highlights would be layered on top of the persistent one.

1. Requires a bit more work than I have time for right now. I'll keep it in mind.

2. Fantastic idea. This is exactly what I needed when I was struggling to write a quicksort from memory. Its in there now.

I just sent you a pull request that implements 1.

Forgot to mention: source at https://github.com/thedufer/VisualSort

Yay. My first program in CoffeScript. It was fun to implement algorithm of which I only vaguely remembered how it worked. Details bit me few times before I got it working and right.


  fix_heap = (y, size) ->
      y1 = 2*y+1
      y2 = 2*y+2
      if y1 >= size then break
      if !(y2 < size) || VA.gt(y1, y2)
        if VA.lt(y, y1)
          VA.swap(y, y1)
          y = y1
        if VA.lt(y, y2)
          VA.swap(y, y2)
          y = y2
  pull_up = (y) ->
      y0 = (y-1) >> 1
      if y == 0 || VA.lt(y, y0) then break
      VA.swap(y0, y)
      y = y0
  for x in [1 ... VA.length]
  for x in [VA.length-1 ... 0] by -1
    VA.swap(x, 0)
    fix_heap(0, x)

You can also build the heap from the bottom up and change fix_heap a little which altogether is faster and simpler (thanks wiki):

  fix_heap = (y, size) ->
      y1 = 2*y+1
      if y1 >= size then break
      if y1 + 1 < size && VA.gt(y1 + 1, y1) 
      if VA.lt(y, y1)
        VA.swap(y, y1)
        y = y1
  for x in [VA.length >> 1 ... -1] by -1
    fix_heap(x, VA.length)
  for x in [VA.length-1 ... 0] by -1
    VA.swap(x, 0)
    fix_heap(0, x)

I put this implementation in as a prewritten sort. Thanks!

This is fantastic for teaching sorting algorithms visually. I'm going to see if I can forward this to my former professors and teachers.

Agreed. Nice work OP!

It's fantastic. I don't know CoffeeScript so all I did was compare the pre-written ones to each other and see if they did what I expected them to do, but it's a nice visual representation. The only problem I have with it is that the neon yellow on black with a gray/white background can be pain inducing, especially when moving quickly.

Speaking of pain inducing, I found watching the bubble sort very pain inducing! I've always known it was slow compared to other options, but this make you never want to see the thing mentioned again!

Looks like bubble sort is so slow because this implementation has a very high ratio between the costs of a compare and a swap. A swap involves redrawing large parts of the graphical area, while a compare is fast. Bubble sort makes N^2 swaps, as compared to insertion sort which makes N^2 comparisons but only N swaps.

When a swap costs the same as a compare, bubble sort is as fast as the other N^2 sorts.

Turn off "Quick Compare" to make comparisons take longer if you want a different ratio. By default, it takes about 1/15 as long to compare as swap (I thought the visualization looked better this way).

I know Bubble sort is "bad", but there's no reason to make it worse than it needs to be. The current algorithm continues comparing the items that have already "bubbled" (or in this case, "sunk") by iterating the whole length on each pass, rather than shorting the loop to only consider the unsorted portion.

A minor tweak is (changed lines marked with #):

  bubbleSort = ->
  VA.locals.swapped = true
    y = VA.length  # grab initial length
    while VA.locals.swapped
      y--          # shorten sorted portion on each pass
      VA.locals.swapped = false
      for x in [0...y] # only iterate to y
        VA.locals.x = x
        if VA.gt(x, x + 1)
          VA.swap(x, x + 1)
          VA.locals.swapped = true


Neat. I came back 5 hours later, and the change was live. Thanks. (I was going to submit a change via Git when I got home, but you beat me to it!)

Changed to a darker purple. You're right, it was pretty jarring.

I wanted to implement the sleep sort algorithm, but I can't use

  VA = [4, 2, 5, 7];
  for(x=0; x<VA.length; x++) {
      var y = VA[x];
      setTimeout('console.log(' + y + ')', y*1000);
it tells me I can't use "var". And when I remove that I get some syntax error. Any nice alternative way to implement a sleep sort in coffescript?

This is the coffeescript equivalent, not sure how to get it working in the visualisation though as it doesn't involve swaps.

  VA = [4, 2, 5, 7]
  for y in VA
    setTimeout('console.log(' + y + ')', y*1000)

Try this:

    VA = [4, 2, 5, 7]
    for y in VA
      setTimeout "console.log(#{y})", y*1000

I love the stats on the side. Knowing how many comparisons and swaps occur really shows the complexity of the algorithms.

And to segue into my own area of interest with sorting, it would be awesome to see cache and branch behaviour too (though of course this running in the browser, it's not exactly the real behaviour, but still).

How would you expect such things to be calculated? Not sure exactly what you're looking for.

Well, they would need to be simulated.

Cache behavior simulation would be kind of a pain to implement, but I might give it a shot at some point. What is branch behavior? I'm not familiar with the term.

Yeah, I can see that. A big problem is that the set sizes where cache effects start to come into effect are much larger than the sizes you're modelling. A level 1 cache is going to have 4K, which is at least 5x the size you're modelling by default. But you can probably use play sizes, like a cache which only holds 16 or 32 ints.

By branch behaviour, I mean branch prediction. I did some research (http://paulbiggar.com/research/#sorting-tr) before that showed that branch prediction is really important for sorts. For example, insertion sort is way better than selection sort, and radix sort has really really good branch prediction properties, making it beat quicksort (well, this is LSB radixsort, and I think you implemented MSB radixsort, but no doubt it applies somewhat).

So for the branch simulation, implement a 2 bit dynamic saturating counter, and note the number of mispredictions. There are simpler and more complex predictors, but that's probably the simplest that's a reasonable approximation of real life.

I'll definitely implement a cache model to show cache behavior. I'm not sure branch prediction is realistic, though; I'm not actually parsing the code myself, so finding the branch points would be a lot of extra work. I'm considering a scheme where you have to call an extra function inside each branching statement to get useful branch prediction results. Something along the lines of:

    for x in foreach([1...VA.length], "for1")
      y = 0
      while branch(VA.gt(x, y), "while1")
        if branch(y == x, "if1")
      VA.insert(x, y)
Where the `branch`, `foreach` functions are identities over their first argument, and they use the second argument to to track branching for a given branch point (i.e. you must uniquely name each branch, foreach line). A foreach would be a branch run arg0.length + 1 times, with arg0.length true branches followed by a single false branch. Sound at all reasonable?

Needs a shuffle() function - can't bogosort without it.


  shuffle = ->
    for x in [0..VA.length]
      VA.swap x, Math.floor(Math.random()*VA.length)
  sort = ->
    if !checkSorted()
      setTimeout(sort, 10)
  checkSorted = ->
    for x in [0..VA.length-1]
      if VA.gte(x, x+1)
          return false
    return true

You should be able to write one with swaps - go for it.

Had fun writing a gnome-sort in a minute or two:

    pos = 1
    while pos < VA.length
      if (VA.gte(pos, pos-1))
        VA.swap(pos, pos-1)
        if (pos > 1)

The input set is problematic because it contains every number from 0 to VA.length exactly once.

This allows "algorithms" like this:

  for x in [0...VA.length]
    while (VA.get(x) > x)
      VA.swap(VA.get(x), x)

Its not a game; its a learning tool. It doesn't bother me that people can do that.

Also, that's not going to work quite right - the values are [1..VA.length] (see the bottom of the page), while the indices are [0...VA.length].

Nice, though I think static visualizations illustrate the examples more quickly and are easier to compare:


I also love Robert Sedgewick's technique of using angle to encode array value. Animated and static examples here:

http://bl.ocks.org/1243323 http://mbostock.github.com/protovis/ex/sort.html

Implemented my own quicksort that seems to have better performance than the built-in one.

  q = (lo, hi) =>
    # highlight range
    p = VA.get(lo)
    l = lo
    r = hi
    # test pivot position
    t = false
    while (l < r)
      if VA.gt(l,r)
        t = !t
      if t
    if (l > lo)
      q(lo, l - 1)
    if (hi > l + 1)
      q(l + 1, hi)
  q(0,VA.length - 1)

This Quick Sort Visualization is cool too. Were you inspired by this? ;-)


I can't say I'd seen that, but its pretty awesome.

Cocktail sort

  twoWayBubbleSort = ->
    VA.locals.swapped = true
    while VA.locals.swapped
      VA.locals.swapped = false
      for x in [0...VA.length - 1]
        VA.locals.x = x
        if VA.gt(x, x + 1)
          VA.swap(x, x + 1)
          VA.locals.swapped = true
      for x in[(VA.length-1)..1]
        VA.locals.x = x
        if VA.gt(x - 1, x)
          VA.swap(x - 1, x)
          VA.locals.swapped = true


I don't like the insertion sort. Couldn't it do a binary search on the sorted part of the list, when it's looking for the place to put the next element?

I coded up the easiest version of each algorithm, rather than the most efficient. (Quicksort uses the first element as a pivot, rather than something more intelligent) Feel free to email me an algorithm if you think it should be there/is better than what I wrote. The pre-written algorithms weren't exactly my focus.

Fair enough. I'll try to write one if I can learn coffeescript right now :)

How about adding the VA.equals and VA.notequals functions? You've got all the other comparators, but it seems a shame that you don't have these.

I...don't know how I missed them. They're in now.

Stooge sort:

  stoogesort = (lo, hi) ->
    if VA.lt(hi, lo)
      VA.swap(lo, hi)
    if hi - lo > 1
      third = Math.floor((hi - lo + 1)/3)
      stoogesort(lo, hi-third)
      stoogesort(lo+third, hi)
      stoogesort(lo, hi-third)

  stoogesort(0, VA.length-1)
It's probably best to run this on an array that's smaller than 100 numbers...

definitely have the page start with an example.

Makes sense. Starts with a bubble sort now.

Wikipedia has similar animations for some of its articles about sorting algorithms, e.g.:


This is a good set of sorting visualisations too, presented more compactly (but not customisable): http://www.sorting-algorithms.com/

Staring at screen looking at how bars shift and swap, I think merge sort is most elegant visually.

Commenting to save for later

Voting the submission up and then going to http://news.ycombinator.com/saved?id=matlaber would do the same without adding a comment to the thread.

Interesting way to bookmark :-)

Super nice!

The quicksort implementation is incorrect. The code simply picking the left-most element as the pivot, and as a result it becomes pathologically slow for nearly-sorted array -- for example, run bubblesort then quicksort, or run quicksort twice -- not to mention potential stack overflow problem due to O(N) recursive call.

That's still a quicksort - it always has n^2 as its worst case I thought? If you pick the wrong pivot each time.

There are ways to make the n^2 worst case extraordinarily unlikely, and there is a way to make it impossible (though it is highly technical).

Some others have posted methods (random pivot, median of 3). The gnu libc qsort implementation is a good learning tool (it uses median of 3). I think CLRS has the guaranteed n log n version.

> That's still a quicksort - it always has n^2 as its worst case I thought? If you pick the wrong pivot each time.

Pick median of three random elements as pivot; it's hard to pick wrong pivot every time unless your RNG always returns 7.

Why three random elements though? If you expect some structure in the array, then picking pivots based on that structure makes more sense (e.g. first, middle and last for sorted, reverse sorted for partially sorted arrays).

On the other hand, if you are sorting unsanitised input, then random is good, but it should be secure pseudo-random numbers if it's important to your security/performance whilst under attack.

Poor choice of the pivot makes the implementation bad, but absolutely not incorrect.

I said this in another comment, but I coded up the simplest version of each algorithm - the quicksort is correct, just not optimal. The focus was on letting people write their own code, not the pre-written algorithms.

Add the following right above the line "pivot = left" to fix it:

VA.swap(left, Math.floor(Math.random() * (right - left + 1)) + left)

Alternatively you can insert this code for the median of 3 version:

   centre = (left + right) / 2;
   if VA.lt(left, centre)
     if VA.lt(centre, right)
       VA.swap(left, centre)
     else if VA.lt(left, right)
       VA.swap(left, right)
     if VA.lt(centre, right)
       VA.swap(left, centre)
     else if VA.lt(left, right)
       VA.swap(left, right)

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact