Sorting a Billion Numbers with Julia (mikeinnes.github.io)
123 points by josep2 on March 21, 2016 | 22 comments



I'm working on a dataframe library for Java for large datasets: https://github.com/lwhite1/tablesaw.

It's not ready for prime time, but:

Time to sum 1,000,000,000 floats: 1.5 seconds

Time to sort 1,000,000,000 floats: 30.5 seconds

The code to fill a column and sort it:

    FloatColumn fc = new FloatColumn("test", 1_000_000_000);
    for (int i = 0; i < 1_000_000_000; i++) {
      fc.add((float) Math.random());
    }
    fc.sortAscending();


Neither as fast nor out-of-core, but, for fun, here's a parallel sort of 2^30 floats (i5-6260U, two cores running at 2.7GHz):

	  $ go run billion.go
	  1m32.471498831s
Where billion.go goes:

    package main

    import (
        "fmt"
        "math/rand"
        "time"

        "github.com/twotwotwo/sorts/sortutil"
    )

    func main() {
        // fill a slice with 2^30 random float64s
        floats := make([]float64, 1<<30)
        for i := range floats {
            floats[i] = rand.Float64()
        }

        // time only the parallel sort itself
        t := time.Now()
        sortutil.Float64s(floats)
        fmt.Println(time.Since(t))
    }
Out-of-core stuff's cool, and it sometimes seems a shame that it's not available directly (rather than by exporting data to some other program) and more widely used across programming environments.

Glad the original poster is experimenting and got something about external sorting up here.


This is cool; I'm kinda fascinated by golang.

Tablesaw's not out of core either: too much of a pain (for me) to work with variable width columns like text that way.


Impressive. What's the speed like when you add a few bytes of payload?


Good question. I'll have to test it later today if I have some time.


As part of Spark 2.0, we are introducing some neat new optimizations to make a general engine as efficient as specialized code.

I just tried it on the Spark master branch (i.e. the work-in-progress code for Spark 2.0). It takes about 1.5 secs to sum up 1 billion 64-bit integers using a single thread, and about 1 sec using 2 threads. This was done on my laptop (early 2015 MacBook Pro 13, 3.1GHz Intel Core i7).

We haven't optimized integer sorting yet, so that's probably not going to be super fast, but the aggregation performance has been pretty good.

  scala> val start = System.nanoTime
  start: Long = 56832659265590
  scala> sqlContext.range(0, 1000L * 1000 * 1000, 1, 2).count()
  res8: Long = 1000000000
  scala> val end = System.nanoTime
  end: Long = 56833605100948
  scala> (end - start) / 1000 / 1000
  res9: Long = 945
Part of the time is actually spent analyzing the query plan, optimizing it, and generating bytecode for it. If we run this on 10 billion integers, the time is about 5 secs.


I was initially skeptical of the 2.5s for the SUM base case, but after a bit of experimenting I concluded the following. Note this is for the SUM base case only, and the whole file is loaded into memory for my tests.

MMAP'd from HDD 1st run - 47s

MMAP'd from HDD 2nd run - 1.35s (OS caches pages in memory)

MMAP'd from NVMe 1st run - 6.5s (OS caches dropped)

MMAP'd from NVMe 2nd run - 1.35s (OS cached again)


In a purely functional language like Haskell, you can sort a billion numbers in nanoseconds, all with no performance-crushing side effects. Yes, it's true, that's the kind of result you get with lazy evaluation!

When you start searching the resultant sorted list, it might be somewhat slower than if you sorted using other techniques, but--silver lining--search times will only improve after that!

I'd give you the actual stats but so far I've only lazy evaluated them.


No offence, but that's absurd. If anyone said their database could do things almost instantly because they can define a view super quickly, versus materialising or running an actual query, they'd be, justifiably and hopefully, laughed out of the room.

But that's pretty much what you've done/contributed.

Of course lazy evaluation is faster if it's so lazy it doesn't actually evaluate anything/does something completely different to the original problem.

Now evaluate it and come back with a wall time...


It's not absurd, it's funny.

But you already knew that - you just hadn't evaluated it yet.


You mean you can define a sorted list in nanoseconds. The actual calculations (i.e. obtaining every result) take much longer...


Haha, in that case I'm going to lazily start using Haskell immediately. There's no learning curve at all!


The complexity of fully iterating over a lazily-sorted range is more than 'somewhat slower'. At a guess I'd say it's O(n^2) rather than O(n log n).


Finding sorted(array)[k] is possible in linear (even worst case!) time. Using that n times would, as you note, take n^2 time. But we could employ a trick: once we have been asked for log n elements, and have thus already spent n log n time, we could then sort the array.
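
For concreteness, here's a rough quickselect sketch in Haskell (my own toy code, nothing from the thread). This version is only expected-linear; getting the worst-case-linear guarantee means picking the pivot with median-of-medians instead of just taking the head:

    import Data.List (partition)

    -- select k xs == sort xs !! k (0-based k-th smallest).
    -- Expected O(n) with this pivot choice; a worst-case O(n) version
    -- would pick the pivot via median-of-medians.
    select :: Ord a => Int -> [a] -> a
    select _ [] = error "select: index out of range"
    select k (p:xs)
      | k < nlt       = select k lt
      | k < nlt + neq = p
      | otherwise     = select (k - nlt - neq) gt
      where
        (lt, rest) = partition (< p) xs
        (gt, eqs)  = partition (> p) rest
        nlt        = length lt
        neq        = 1 + length eqs

    main :: IO ()
    main = print (select 2 [5, 1, 4, 1, 3 :: Int])  -- prints 3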


It should stay n log n. A lazy quicksort can get the min element of a list super fast by only working on the partitions relevant to finding the min. As it requires more elements of the list, it can finish sorting the other partitions. It's the exact same algo as usual.
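
Roughly what that looks like in Haskell, assuming the textbook list quicksort (a sketch, not a production sort): forcing only the head recurses only into the partitions on the path to the minimum, so it's about O(n) expected work, while forcing the whole list is still the usual O(n log n) on average.

    -- Lazy quicksort: head (qsort xs) only forces the partitions on the path
    -- to the minimum; fully evaluating the result costs O(n log n) on average.
    qsort :: Ord a => [a] -> [a]
    qsort []     = []
    qsort (p:xs) = qsort [x | x <- xs, x < p] ++ [p] ++ qsort [x | x <- xs, x >= p]

    main :: IO ()
    main = print (head (qsort [5, 3, 8, 1, 9, 2 :: Int]))  -- prints 1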


It depends on the implementation. If you stuff your list in a (lazily constructed) heap, you can get the next element in O(log n) amortized.
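
For instance, with a toy pairing heap (a sketch to show the shape of the idea, not a tuned implementation): building the heap is O(n), and laziness means that taking the first k elements of the sorted output only pays for k delete-mins, so roughly O(n + k log n) total.

    -- Toy pairing heap: fromList is O(n); each element of toSortedList costs
    -- O(log n) amortized, and laziness means we only pay for the elements we
    -- actually demand.
    data Heap a = Empty | Node a [Heap a]

    merge :: Ord a => Heap a -> Heap a -> Heap a
    merge Empty h = h
    merge h Empty = h
    merge h1@(Node x hs1) h2@(Node y hs2)
      | x <= y    = Node x (h2 : hs1)
      | otherwise = Node y (h1 : hs2)

    mergePairs :: Ord a => [Heap a] -> Heap a
    mergePairs []             = Empty
    mergePairs [h]            = h
    mergePairs (h1 : h2 : hs) = merge (merge h1 h2) (mergePairs hs)

    fromList :: Ord a => [a] -> Heap a
    fromList = mergePairs . map (\x -> Node x [])

    toSortedList :: Ord a => Heap a -> [a]
    toSortedList Empty       = []
    toSortedList (Node x hs) = x : toSortedList (mergePairs hs)

    main :: IO ()
    main = print (take 3 (toSortedList (fromList [9, 4, 7, 1, 8, 2 :: Int])))  -- [1,2,4]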


Are you being sarcastic?


I pointed it out to be lighthearted, but it's true, not sarcasm, and not only is it true, it's also meaningfully instructive to think about the benefits of lazy evaluation.

For example, a priority queue (list of tasks) does not need to be fully sorted, only sorted to the extent that the highest priority item is quickly determinable, and there are algorithms and data structures designed for this behavior.

Sorting (other than to print a sorted list) doesn't pay for itself until you do a number of searches, and associative memory hashes are frequently better if you simply wish to find exact matches again. Even the lowly bubble sort has the not-insignificant benefit of finding the first value in O(n) time, which may be the behavior you need: lazy evaluation can be the exact right way to go when you are instructed to sort but are still awaiting more info as to what the appropriate technique might be.

I think humor is only funny when it's based on truthiness.


This is an out-of-core problem.

What matters when sorting a billion numbers is wall-clock runtime from the start of the sort to the end of materialization or iteration (see http://sortbenchmark.org/). And that almost always means materializing or iterating the entire result set.

It's not clear to me that any Haskell programs have won any real awards in sorting modest amounts of data, since you have to visit all the data and do n log n comparisons.


> Sorting (other than to print a sorted list) doesn't pay for itself till you do a number of searches

Or until you absolutely need to iterate thru a collection in sorted order.

If you need a priority queue, use a priority queue; if you need a sorted container, use a sorted container.


Yes, they are being sarcastic. The list has to be fully materialized and iterated over. Further, the Haskell version probably doesn't have a concept of doing work in memory-sized batches with spill-to-disk, so the runtime would likely be much higher. Of course, I don't know enough Haskell, so it's entirely possible they somehow figured out how to iterate over the input data in sorted order efficiently without materializing it.


1 billion numbers (let's say DWORDs) don't even use a full 4GB of space. Load them into memory and sort. With QWORDs it grows to about 8GB. A modern laptop can still load twice that amount and sort everything in memory. That is not a large dataset.



