
Sorting a Billion Numbers with Julia - josep2
http://mikeinnes.github.io/2016/03/21/sorting.html
======
ljw1001
I'm working on a dataframe library for Java, aimed at large datasets:
[https://github.com/lwhite1/tablesaw](https://github.com/lwhite1/tablesaw).

It's not ready for prime time, but:

Time to sum 1,000,000,000 floats: 1.5 seconds

Time to sort 1,000,000,000 floats: 30.5 seconds

The code to fill a column and sort it:

        // Fill the column with one billion random floats, then sort it in place.
        FloatColumn fc = new FloatColumn("test", 1_000_000_000);
        for (int i = 0; i < 1_000_000_000; i++) {
          fc.add((float) Math.random());
        }
        fc.sortAscending();

~~~
twotwotwo
Neither as fast nor out-of-core but, for fun, here's a parallel sort of 2^30
floats (i5-6260U, two cores running at 2.7GHz):

      $ go run billion.go
      1m32.471498831s

Where billion.go goes:

    package main

    import (
        "fmt"
        "math/rand"
        "time"

        "github.com/twotwotwo/sorts/sortutil"
    )

    func main() {
        // Fill a slice with 2^30 random float64s.
        floats := make([]float64, 1<<30)
        for i := range floats {
            floats[i] = rand.Float64()
        }

        // Time only the parallel sort, not the fill.
        t := time.Now()
        sortutil.Float64s(floats)
        fmt.Println(time.Now().Sub(t))
    }

Out-of-core stuff's cool, and it sometimes seems a shame that it's not
available directly (rather than by exporting data to some other program) and
more widely used across programming environments.

Glad the original poster's experimenting and got something about external
sorting up here.

~~~
ljw1001
This is cool; I'm kinda fascinated by golang.

Tablesaw's not out of core either: it's too much of a pain (for me) to work
with variable-width columns like text that way.

------
rxin
As part of Spark 2.0, we are introducing some neat new optimizations to make a
general engine as efficient as specialized code.

I just tried this on the Spark master branch (i.e. the work-in-progress code
for Spark 2.0). It takes about 1.5 secs to sum 1 billion 64-bit integers using
a single thread, and about 1 sec using 2 threads. This was done on my laptop
(Early 2015 MacBook Pro 13, 3.1GHz Intel Core i7).

We haven't optimized integer sorting yet, so that's probably not going to be
super fast, but the aggregation performance has been pretty good.

      scala> val start = System.nanoTime
      start: Long = 56832659265590
      scala> sqlContext.range(0, 1000L * 1000 * 1000, 1, 2).count()
      res8: Long = 1000000000
      scala> val end = System.nanoTime
      end: Long = 56833605100948
      scala> (end - start) / 1000 / 1000
      res9: Long = 945

Part of the time is actually spent analyzing the query plan, optimizing it,
and generating bytecode for it. If we run this on 10 billion integers, the
time is about 5 secs.
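
For a rough baseline, here's a minimal single-threaded sum in plain Go (a
sketch for comparison only, not Spark code; absolute times will of course vary
by machine):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Sum the integers 0..10^9-1 in one thread: roughly the tight
        // loop that a query engine's generated code is competing with.
        t := time.Now()
        var sum int64
        for i := int64(0); i < 1000000000; i++ {
            sum += i
        }
        fmt.Println(sum, time.Since(t))
    }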

------
en4bz
I was initially skeptical of the 2.5s for the SUM base case, but after a bit
of experimenting I concluded the following. Note this is for the SUM base case
only, and the whole file is loaded into memory for my tests.

MMAP'd from HDD, 1st run - 47s

MMAP'd from HDD, 2nd run - 1.35s (OS caches pages in memory)

MMAP'd from NVMe, 1st run - 6.5s (OS caches dropped)

MMAP'd from NVMe, 2nd run - 1.35s (OS cached again)
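
For reference, a minimal sketch of the mmap-and-sum setup in Go (Unix-only;
"floats.bin" is a hypothetical file name, and the float64s are assumed to be
stored little-endian):

    package main

    import (
        "encoding/binary"
        "fmt"
        "math"
        "os"
        "syscall"
    )

    func main() {
        f, err := os.Open("floats.bin") // hypothetical data file
        if err != nil {
            panic(err)
        }
        defer f.Close()

        fi, err := f.Stat()
        if err != nil {
            panic(err)
        }
        size := int(fi.Size())

        // Map the file read-only. The OS pages it in on demand, which is
        // why a second run over already-cached pages is so much faster.
        data, err := syscall.Mmap(int(f.Fd()), 0, size,
            syscall.PROT_READ, syscall.MAP_SHARED)
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(data)

        var sum float64
        for i := 0; i+8 <= size; i += 8 {
            sum += math.Float64frombits(binary.LittleEndian.Uint64(data[i:]))
        }
        fmt.Println(sum)
    }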

------
brashrat
In a purely functional language like Haskell, you can sort a billion numbers
in nanoseconds, all with no performance-crushing side effects. Yes, it's true,
that's the kind of result you get with lazy evaluation!

When you start searching the resultant sorted list, it might be somewhat
slower than if you sorted using other techniques, but--silver lining--search
times will only improve after that!

I'd give you the actual stats but so far I've only lazy evaluated them.

~~~
nly
The complexity of fully iterating over a lazily-sorted range is more than
'somewhat slower'. At a guess I'd say it's O(n^2) rather than O(n log n).

~~~
im3w1l
Finding sorted(array)[k] is possible in linear (even worst case!) time. Using
that n times would as you note take n^2 time. But we could employ a trick.
Once we have been asked for log n elements, and have thus already spent n log
n time, we could then sort the array.
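
A sketch of that trick in Go (names here are made up for illustration;
quickselect gives expected linear time per query, while the worst-case-linear
bound would need median-of-medians):

    package main

    import (
        "fmt"
        "math/bits"
        "math/rand"
        "sort"
    )

    // lazySorted answers sorted(a)[k] queries. Each query runs quickselect;
    // after about log2(n) queries (once roughly n log n total work has been
    // spent) it sorts once and serves later queries from the sorted slice.
    type lazySorted struct {
        a       []int
        sorted  bool
        queries int
    }

    func (l *lazySorted) kth(k int) int {
        if l.sorted {
            return l.a[k]
        }
        l.queries++
        if l.queries >= bits.Len(uint(len(l.a))) { // ~log2(n) queries reached
            sort.Ints(l.a)
            l.sorted = true
            return l.a[k]
        }
        return quickselect(l.a, k)
    }

    // quickselect partially partitions a around random pivots until the
    // element at index k is in its final sorted position, then returns it.
    func quickselect(a []int, k int) int {
        lo, hi := 0, len(a)-1
        for lo < hi {
            p := a[lo+rand.Intn(hi-lo+1)]
            i, j := lo, hi
            for i <= j {
                for a[i] < p {
                    i++
                }
                for a[j] > p {
                    j--
                }
                if i <= j {
                    a[i], a[j] = a[j], a[i]
                    i++
                    j--
                }
            }
            switch {
            case k <= j:
                hi = j
            case k >= i:
                lo = i
            default:
                return a[k]
            }
        }
        return a[lo]
    }

    func main() {
        ls := &lazySorted{a: rand.Perm(1000)}
        fmt.Println(ls.kth(500), ls.kth(0), ls.kth(999))
    }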

------
mateuszb
One billion numbers (let's say DWORDs) doesn't even use a full 4GB of space.
Load it into memory and sort. With QWORDs it grows to about 8GB. A modern
laptop can still hold twice that amount and sort everything in memory. That is
not a large dataset.
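
A minimal sketch of exactly that in Go (one billion DWORDs as uint32, which
needs roughly 4GB of RAM, and slices.Sort from Go 1.21+):

    package main

    import (
        "fmt"
        "math/rand"
        "slices"
    )

    func main() {
        // 10^9 * 4 bytes is about 3.7 GiB of payload: it fits comfortably
        // in a modern laptop's RAM, so an ordinary in-memory sort suffices.
        a := make([]uint32, 1000000000)
        for i := range a {
            a[i] = rand.Uint32()
        }
        slices.Sort(a)
        fmt.Println(a[0], a[len(a)-1])
    }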

