Cache-Efficient Functional Algorithms (2014) [pdf] (cmu.edu)
46 points by ingve on April 23, 2016 | hide | past | favorite | 13 comments


I'm getting the impression that our caching problems stem from the way we abstract computer memory. If languages are built on the assumption that you can jump around RAM at no cost, the resulting code will often be less efficient than it could have been. This makes me wonder - what kind of abstraction would help alleviate this problem?


The point of this article is that typical code dealing with persistent/immutable data structures can be optimized to work efficiently in the presence of a cache hierarchy. That is, with persistent data structures one can keep code rather abstract while still allowing for a cache-friendly implementation.
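To make "persistent" concrete, here is a minimal sketch (not from the paper) of the kind of data structure the article analyzes: a persistent binary search tree where insertion copies only the nodes on the search path and shares everything else with the old version. The `Node`/`insert` names are my own.

```python
# Sketch: a persistent binary search tree. Insertion copies only the
# O(log n) nodes on the search path; the rest of the structure is
# shared, unchanged, with the previous version.
from typing import NamedTuple, Optional

class Node(NamedTuple):
    key: int
    left: Optional["Node"]
    right: Optional["Node"]

def insert(t: Optional[Node], key: int) -> Node:
    if t is None:
        return Node(key, None, None)
    if key < t.key:
        return Node(t.key, insert(t.left, key), t.right)   # copy this node only
    if key > t.key:
        return Node(t.key, t.left, insert(t.right, key))
    return t  # key already present: return the old version unchanged

t1 = insert(insert(insert(None, 5), 2), 8)
t2 = insert(t1, 1)           # a new version of the tree...
assert t2.right is t1.right  # ...sharing the untouched right subtree
```

The structural sharing in the last line is what the paper's cost model has to account for: old and new versions coexist, and a smart allocator/collector can still lay them out cache-efficiently.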


But jumping around in DRAM does incur slower access times.


More abstractions! Higher level languages!


Don't you need a lower-level language (lower even than assembly, and maybe even some sort of hardware support)? Virtual memory abstracts away whether an address is in RAM or on disk. Going a step further, you'd need to undo one more layer of abstraction so you know whether you're hitting L1, L2, or main memory.

But to be honest I think plain malloc is a decent abstraction: you control the memory layout, and the rest is writing algorithms with good access patterns.
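A sketch of that "layout plus access pattern" point: the same flat buffer (what a single malloc would give you) traversed in two index orders. In C the first order walks memory sequentially, while the second strides by a full row on every access; Python is used here only to make the index arithmetic explicit, not to show actual cache effects.

```python
# A "matrix" in one flat, row-major buffer: buf[i * COLS + j].
ROWS, COLS = 4, 3
buf = list(range(ROWS * COLS))

def sum_row_major(buf):
    total = 0
    for i in range(ROWS):
        for j in range(COLS):
            total += buf[i * COLS + j]   # consecutive addresses
    return total

def sum_col_major(buf):
    total = 0
    for j in range(COLS):
        for i in range(ROWS):
            total += buf[i * COLS + j]   # stride of COLS between accesses
    return total

# Same result, very different memory traffic in a compiled language.
assert sum_row_major(buf) == sum_col_major(buf) == sum(buf)
```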


Great question.

I'm a huge fan of separating the business specifications from talking about the execution.

-------

Some runtimes will do this under the hood.

RDBMS execution plans are a good example. Plan costs are calculated using data-dependent statistics, and cache-friendliness is taken into account: in postgres, hash joins are sized to fit the L1 cache (tuples per bucket = 10) and the L3 cache (work_mem defaults to 4MB, after which batches are introduced) [1].

Some JITs will also monitor performance and behavior, and try to speculatively prune unused paths.

But this is all letting the runtime deal with it.

-------

Execution should be treated as an orthogonal but tweakable concern. And be exposed in some way to the user.

Halide [2] is one very interesting approach: it lets the user choose:

* between materialization and compute-again

* parallelism levels

* chunk sizes

* scheduling, etc.

However, the storage is implied by the execution chosen, which to me is a constraint that is automatically enforced rather than exposed.

-------

All knobs, and their consequences, should be readily available to both users and compilers.

To me, typed languages could (and should) carry so much more information about performance (or anything else!).

The perfect abstraction should be the seamless integration of 4 languages:

* Logical business requirements (what you want done). Should be user friendly

* Business properties requirements (constraints like liveness, performance envelope, security, or business ones). Declarative

* A language to help in proofs. Think TLA+[3] or the checker framework [4].

* A language to express which implementations are chosen, knobs to configure, how the layout is done (array of structs, struct of arrays?). Think Halide or JAI [5].

Such a combination of languages should be built on richly typed primitives, whose composition propagates type information and provides hooks for supplying more of it.

An implementation of a sum of a value over a range should surface information about:

* which business properties it offers

* which items will benefit from fitting in cache

* which parts can be stopped without lock contention

* different options for the memory layout coming into and out of this algorithm

In the internals, something as simple as a + b would look like:

    @Associative
    @Commutative
    @MapsTo(Algebra.addition)
    @Inlineable
    @CanSafePointBefore
    @WillNotContend
    @NonBlocking
    @CacheSizefunction(argumentsSize + returnSize)
    @BatchingStrategies(Single,Batch.L1Size,Batch.L1Size)
    @WaitOnIOStrategy(Unscheduleable)
    @ArgumentPassing(ref,value,ArrayOfStruct,StructOfArrays)
    @ReturnPassing(ref,value,Array)
    @Properties(case unsigned: ret >= a and ret >= b)
    def plus(
      a @NumberType(uint16, int, IEEEfloat, double, @CustomizeableNumber)
      ,
      b @NumberType(uint16, int, IEEEfloat, double, @CustomizeableNumber)
    ) {
      return @
        a 
        + @Impl(asm_x86_add,simd_x86_add,cuda_add)
          @NumberTypeInference
        b
    }

-------

In short: a language into which anyone can commit type information, with type information that could inform performance.

[1] http://patshaughnessy.net/2016/1/22/is-your-postgres-query-s...

[2] http://halide-lang.org/

[3] http://cseweb.ucsd.edu/~daricket/papers/fm2012.pdf

[4] http://types.cs.washington.edu/checker-framework/current/che...

[5] https://inductive.no/jai/


This is a really interesting article and I'm glad this is getting attention. It's especially refreshing to see a theoretical treatment bridging algorithms with more "modern" hardware implementations.

One gap I do find in this article is the absence of real-world performance tests and of any treatment of how much the constant factors matter in performance tuning. Prokop et al. did some really interesting work in 1999 on cache-oblivious algorithms, with applications to kernels such as the FFT and matrix multiplication. The work is theoretically fascinating, but in practice hand-written or machine-generated kernels continue to dominate. The most famous example of this is probably Kazushige Goto, whose hand-optimized GotoBLAS (now under an open source license as OpenBLAS) still provides some of the fastest linear algebra kernels in the world.
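For readers unfamiliar with the cache-oblivious idea being referenced: the algorithm recursively splits the problem until sub-blocks fit in whatever cache exists, without ever naming a cache size. A minimal sketch of a cache-oblivious matrix multiply in that style (plain Python lists, sizes assumed to be powers of two for brevity; not production code):

```python
# Recursive, cache-oblivious C += A * B on n x n blocks. Each quadrant
# of C is the sum of two half-size products; the recursion reuses each
# sub-block while it would still be cache-resident on real hardware.
def matmul(A, B, C, n, ai=0, aj=0, bi=0, bj=0, ci=0, cj=0):
    if n == 1:
        C[ci][cj] += A[ai][aj] * B[bi][bj]
        return
    h = n // 2
    for (ra, ca, rb, cb, rc, cc) in [
        (0, 0, 0, 0, 0, 0), (0, h, h, 0, 0, 0),   # C11 += A11*B11 + A12*B21
        (0, 0, 0, h, 0, h), (0, h, h, h, 0, h),   # C12 += A11*B12 + A12*B22
        (h, 0, 0, 0, h, 0), (h, h, h, 0, h, 0),   # C21 += A21*B11 + A22*B21
        (h, 0, 0, h, h, h), (h, h, h, h, h, h),   # C22 += A21*B12 + A22*B22
    ]:
        matmul(A, B, C, h, ai + ra, aj + ca, bi + rb, bj + cb, ci + rc, cj + cc)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
matmul(A, B, C, 2)
assert C == [[19, 22], [43, 50]]
```

The contrast with GotoBLAS/OpenBLAS is that the hand-tuned kernels pick explicit block sizes for each cache level, which is exactly where the constant factors the comment mentions come from.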

If you're interested in learning more about how the differences between the two approaches shake out in linear algebra, I recommend "Is Cache-Oblivious DGEMM Viable?" by Gunnels et al.


This is about algorithms over immutable or persistent data structures, not about algorithms operating on functions, a common and unfortunate misnaming.

OCaml is functional but allows arbitrary state mutation. On the other hand, it is typically very straightforward to implement persistent data structures in a non-functional language with GC support.


The book on immutable and persistent data structures is named "Purely Functional Data Structures", so I don't think there's anything wrong here. "Functional" means a lot more than programming with functions nowadays. If you can't look past the letters of the word to see its meaning in a larger context, idk what to say.


I would be very hesitant to accuse Robert Harper of misusing the term 'functional'. I'm not an expert in their field, but I interpret 'functional algorithm' to mean an algorithm whose inputs are transformed by successive applications of functions (in the mathematical sense).


I guess I didn't formulate that precisely enough. I wasn't trying to accuse anybody of misuse. From the books I've read, I got the strong impression that "functional programming" as a term is about treating functions as first-class entities (or even the only entities). Persistent data structures are orthogonal to this. But I guess I just have to accept that in this field it is common to use "functional algorithms" to mean "algorithms for persistent data structures".

Personally I find this unfortunate, as "persistence" is a much more precise term. It also remains the case that one needs GC and must properly account for its cost, which this paper discusses at length.


The confusion is that you are thinking of "function" in the programming-language sense (i.e. a subroutine) rather than in the mathematical sense (i.e. a function called with the same arguments always returns the same value). The mathematical definition of a function necessitates persistent data structures (i.e. you do not modify the dataset but return a new dataset with the modifications).
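The distinction can be shown in two lines: `sorted` behaves like a mathematical function (same argument, same value, argument untouched), while `list.sort` is a subroutine that destroys the old version.

```python
xs = [3, 1, 2]

# Pure / mathematical: returns a new list, leaves the argument alone,
# so the "same call" always yields the same value.
assert sorted(xs) == sorted(xs) == [1, 2, 3]
assert xs == [3, 1, 2]

# Impure / subroutine: mutates in place, the old version is gone.
xs.sort()
assert xs == [1, 2, 3]
```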


Yet persistent data structures do not imply functional programming. For example, one can work with persistent data structures in Java using global variables to pass around references to them. One can also implement the algorithms from the article with an explicit stack to avoid recursion. Yet such atrocities would not change the article's conclusion that the code will be cache-efficient as long as the runtime is sufficiently smart.
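A sketch of exactly that combination: the data below is persistent (immutable nested tuples), but the code is thoroughly imperative, with a global variable and an explicit stack instead of recursion. Persistence of the data does not force a functional style on the algorithm.

```python
# Persistent tree as immutable nested tuples: (key, left, right).
CURRENT_TREE = (5, (2, None, None), (8, None, None))  # global, Java-style

def inorder_keys():
    keys, stack, node = [], [], CURRENT_TREE  # reads the global
    while stack or node is not None:
        while node is not None:
            stack.append(node)    # explicit stack instead of recursion
            node = node[1]        # descend left
        node = stack.pop()
        keys.append(node[0])      # visit
        node = node[2]            # then the right subtree
    return keys

assert inorder_keys() == [2, 5, 8]
```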

So my assumption (which could be wrong) is that by using the word "functional" the article unnecessarily restricts its audience. For example, a Java programmer might not even bother to open it, assuming it is something relevant at best to Haskell or F#. I definitely had that expectation myself.



